naive target encoding
Table of Contents
To encode the each value with the mean of target features of the rows which take that value.i.e.
name | favorite color | height | net worth |
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
red would be mapped into \(\frac{5000+7000}{2}\) because there’s 2 rows/peoples with faviour color = red
and their networth
are 5000 and 7000
Blue would be mapped to 4000
Backlinks
target encoding
To encode the discrete-valued feature with the target value we are trying to pridict.
bayesian target encoding
weighted mean of the value of naive target encoding and the mean of all rows.
For the following
name | favorite color | height | net worth |
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
red would be mapped with the following equation
\[ encoding = \frac{n * \text{mean networth of people who fav red} + m * \text{mean networth of all people}}{n + m} \] in which \(n\) is usually number of people who fav red(number of rows with favorite color red), and \(m\) a user-defined number like 2 or 7