discrete-valued feature
Variables whose set of possible values can be put in one-to-one correspondence with a subset of the natural numbers.
Equivalently, variables that have a finite (or countably infinite) set of possible values.
For example, a categorical feature like color with the set of options {red, yellow, blue}, a boolean feature like “love troll 2” with options {yes, no}, or a number of days with options 1, 2, 3, 4, ...
Backlinks
target encoding
To encode a discrete-valued feature using the target value we are trying to predict.
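As a rough illustration, one common variant replaces each category with the mean of the target observed for that category. The excerpt above does not say which statistic is used, so the mean is an assumption here, and the function name `target_encode` and the toy data (a made-up binary target `bought`) are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value seen for it.

    Mean encoding is assumed for illustration; other statistics are possible.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # Per-category mean of the target.
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories], means


# Hypothetical toy data: a color feature and a binary target.
colors = ["red", "blue", "red", "blue"]
bought = [1, 0, 0, 0]

encoded, means = target_encode(colors, bought)
print(means)
print(encoded)
```

In practice the per-category means would usually be computed on training data only, so that the encoded feature does not leak the target of the rows being predicted.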
one-hot vector encoding
To represent a discrete-valued feature, one can create one feature per possible value, and put 1 in the feature matching the value taken and 0 in the others, e.g.
name | favorite color | height | net worth |
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
to
name | favorite red | favorite blue | height | net worth |
james | 1 | 0 | 1.7 | 5000 |
josh | 0 | 1 | 1.8 | 4000 |
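The transformation above could be sketched in plain Python roughly as follows. The helper name `one_hot_encode` and the `column=value` naming of the new features are assumptions made for this sketch; the table above names them "favorite red" and "favorite blue" instead.

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 feature per distinct value."""
    # Distinct values, kept in order of first appearance.
    values = list(dict.fromkeys(row[column] for row in rows))
    encoded = []
    for row in rows:
        new_row = {}
        for key, val in row.items():
            if key == column:
                # One indicator feature per possible value.
                for v in values:
                    new_row[f"{column}={v}"] = 1 if val == v else 0
            else:
                new_row[key] = val
        encoded.append(new_row)
    return encoded


people = [
    {"name": "james", "favorite color": "red", "height": 1.7, "net worth": 5000},
    {"name": "josh", "favorite color": "blue", "height": 1.8, "net worth": 4000},
]

one_hot = one_hot_encode(people, "favorite color")
for row in one_hot:
    print(row)
```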
label encoding
Map each value of the discrete-valued feature to a natural number, e.g.
name | favorite color | height | net worth |
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
(red -> 1, blue -> 2)
name | favorite color | height | net worth |
james | 1 | 1.7 | 5000 |
josh | 2 | 1.8 | 4000 |
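A minimal sketch of this mapping, assuming labels are handed out in order of first appearance starting from 1 (which reproduces red -> 1, blue -> 2). The function name `label_encode` and the `start` parameter are hypothetical.

```python
def label_encode(values, start=1):
    """Map each distinct value to a natural number, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = start + len(mapping)
    return [mapping[value] for value in values], mapping


favorite_colors = ["red", "blue"]
codes, mapping = label_encode(favorite_colors)
print(mapping)
print(codes)
```

Note that the resulting integers carry an ordering (1 < 2) that the original values may not have, which some models will treat as meaningful.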
feature
In data science and machine learning, a feature is a measurable property of a phenomenon.
In raw data, it normally refers to a single column of the data set, as in the following:
name | favorite color | height | net worth |
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
In this dataset, favorite color is a feature, and height is another one. They both describe some measurable property of people like james and josh. favorite color would be referred to as a discrete-valued feature, while height is a continuous feature, and the whole row
james | red | 1.7 | 5000 |