discrete-valued feature
Variables whose set of possible values can be put in one-to-one correspondence with a subset of the natural numbers.
Equivalently, variables that have a finite (or countably infinite) set of possible values.
For example, a categorical feature like color with options {red, yellow, blue}, a boolean feature like “love troll 2” with options {yes, no}, or a number of days with options 1, 2, 3, 4, ....
Backlinks
target encoding
Encode the discrete-valued feature using the target value we are trying to predict.
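One common way to do this is to replace each category with the mean of the target over the rows that share that category. The sketch below assumes that variant and, purely for illustration, treats net worth as the target. The function name `mean_target_encoding` and the third data row are hypothetical and not part of the original note.

```python
from collections import defaultdict

# Rows of (favorite color, net worth). The first two rows come from the
# example table used in this note; the third is a hypothetical row added
# so that "red" has more than one sample to average.
rows = [
    ("red", 5000),
    ("blue", 4000),
    ("red", 7000),  # hypothetical
]


def mean_target_encoding(rows):
    """Map each category to the mean target value observed for it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, target in rows:
        totals[category] += target
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


encoding = mean_target_encoding(rows)

# Replace the color in each row by its encoded value, keeping the target.
encoded_rows = [(encoding[color], net_worth) for color, net_worth in rows]
```

In practice the mapping would normally be computed on training data only, since it is built from the target itself.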
one-hot vector encoding
To represent a discrete-valued feature, one can create one feature per possible value and put 1 if the discrete-valued feature takes that value and 0 otherwise, e.g.
| name | favorite color | height | net worth |
| james | red | 1.7 | 5000 |
| josh | blue | 1.8 | 4000 |
to
| name | favorite red | favorite blue | height | net worth |
| james | 1 | 0 | 1.7 | 5000 |
| josh | 0 | 1 | 1.8 | 4000 |
label encoding
Map each value of the discrete-valued feature to a natural number, e.g.
| name | favorite color | height | net worth |
| james | red | 1.7 | 5000 |
| josh | blue | 1.8 | 4000 |
(red -> 1, blue -> 2)
| name | favorite color | height | net worth |
| james | 1 | 1.7 | 5000 |
| josh | 2 | 1.8 | 4000 |
feature
In data science and machine learning, a feature is a measurable property of a phenomenon.
In raw data, it normally refers to a single column of the data set, as in the following:
| name | favorite color | height | net worth |
| james | red | 1.7 | 5000 |
| josh | blue | 1.8 | 4000 |
In this dataset, favorite color is a feature, and height is another one. They both describe some measurable property of people like james and josh.
favorite color would be referred to as a discrete-valued feature, while height is a continuous feature, and the whole row
| james | red | 1.7 | 5000 |