bayesian target encoding
A weighted mean of the value from naive target encoding (the mean target of the rows in the same category) and the mean target value over all rows.
For the following
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
red would be encoded with the following equation:
\[ \text{encoding} = \frac{n \cdot \text{mean net worth of people whose favorite color is red} + m \cdot \text{mean net worth of all people}}{n + m} \] in which \(n\) is the number of people whose favorite color is red (the number of rows with favorite color red), and \(m\) is a user-defined smoothing weight such as 2 or 7.
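A minimal sketch of this formula in Python with pandas, assuming the table above with "net worth" as the target; the helper name bayesian_target_encode and the choice m = 2 are illustrative, not from the source:

```python
import pandas as pd

def bayesian_target_encode(df, feature, target, m=2):
    """Weighted mean of each category's target mean (weight n = category count)
    and the global target mean (weight m)."""
    global_mean = df[target].mean()
    stats = df.groupby(feature)[target].agg(["mean", "count"])
    mapping = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return df[feature].map(mapping)

df = pd.DataFrame({
    "name": ["james", "josh", "johnathan"],
    "favorite color": ["red", "blue", "red"],
    "height": [1.7, 1.8, 1.7],
    "net worth": [5000, 4000, 7000],
})
df["favorite color encoded"] = bayesian_target_encode(df, "favorite color", "net worth", m=2)
# red  -> (2 * 6000 + 2 * 5333.33) / (2 + 2) ≈ 5666.67
# blue -> (1 * 4000 + 2 * 5333.33) / (1 + 2) ≈ 4888.89
```

With m = 2, red (two rows, raw mean 6000) is pulled slightly toward the overall mean of about 5333, while blue (one row) is pulled harder, which is the point of the weighting.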
1. reference
StatQuest’s encoding episode
Backlinks
target encoding
Encoding a discrete-valued (categorical) feature with the target value we are trying to predict.
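For example, a minimal sketch of naive target encoding in pandas (column names assumed from the tables above): each category is simply mapped to the mean target value of its rows.

```python
import pandas as pd

df = pd.DataFrame({
    "favorite color": ["red", "blue", "red"],
    "net worth": [5000, 4000, 7000],
})

# Naive target encoding: map each category to the mean target value of its rows.
category_means = df.groupby("favorite color")["net worth"].mean()
df["favorite color encoded"] = df["favorite color"].map(category_means)
# red -> 6000.0, blue -> 4000.0
```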
leave-one-out target encoding
k-fold target encoding in which each fold contains only one row, so that each row’s feature is encoded with bayesian target encoding computed from all of the other rows.
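A minimal leave-one-out sketch under the same assumptions (pandas, the weighted-mean formula above, an illustrative m = 2); loo_bayesian_encode is a hypothetical helper name:

```python
import pandas as pd

def loo_bayesian_encode(df, feature, target, m=2):
    """Leave-one-out: encode each row's category with bayesian target
    encoding computed from every other row."""
    encoded = []
    for i in df.index:
        rest = df.drop(index=i)                        # all rows except row i
        global_mean = rest[target].mean()
        same = rest[rest[feature] == df.loc[i, feature]]
        n = len(same)
        category_mean = same[target].mean() if n > 0 else global_mean
        encoded.append((n * category_mean + m * global_mean) / (n + m))
    return pd.Series(encoded, index=df.index)

df = pd.DataFrame({
    "favorite color": ["red", "blue", "red"],
    "net worth": [5000, 4000, 7000],
})
df["favorite color encoded"] = loo_bayesian_encode(df, "favorite color", "net worth", m=2)
```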
k-fold target encoding
bayesian target encoding, but the training set is split into k equal-sized parts, and each part’s feature is encoded using the target values of all the other parts.
So for
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
joe | blue | 1.8 | 6000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
with k = 3, the six rows are split into three folds of two rows each:
name | favorite color | height | net worth | fold |
---|---|---|---|---|
james | red | 1.7 | 5000 | 1 |
josh | blue | 1.8 | 4000 | 1 |
johnathan | red | 1.7 | 7000 | 2 |
joe | blue | 1.8 | 6000 | 2 |
joel | blue | 1.8 | 5000 | 3 |
johnas | red | 1.7 | 5000 | 3 |
James’ red value (fold 1) is encoded with only
name | favorite color | height | net worth |
---|---|---|---|
johnathan | red | 1.7 | 7000 |
joe | blue | 1.8 | 6000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
And Joe’s blue value (fold 2) with only
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
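A minimal k-fold sketch of this split, assuming contiguous folds (so with k = 3 the folds are james/josh, johnathan/joe, joel/johnas, as in the table above) and the same smoothing weight m; kfold_bayesian_encode and the default m = 2 are assumptions:

```python
import numpy as np
import pandas as pd

def kfold_bayesian_encode(df, feature, target, k=3, m=2):
    """Encode each fold's rows using only the other folds' target values,
    with the same weighted-mean (bayesian) smoothing as above."""
    encoded = pd.Series(index=df.index, dtype=float)
    for positions in np.array_split(np.arange(len(df)), k):  # contiguous folds
        fold = df.index[positions]
        rest = df.drop(index=fold)                            # all rows outside this fold
        global_mean = rest[target].mean()
        stats = rest.groupby(feature)[target].agg(["mean", "count"])
        mapping = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
        encoded.loc[fold] = df.loc[fold, feature].map(mapping).fillna(global_mean)
    return encoded

df = pd.DataFrame({
    "name": ["james", "josh", "johnathan", "joe", "joel", "johnas"],
    "favorite color": ["red", "blue", "red", "blue", "blue", "red"],
    "height": [1.7, 1.8, 1.7, 1.8, 1.8, 1.7],
    "net worth": [5000, 4000, 7000, 6000, 5000, 5000],
})
df["favorite color encoded"] = kfold_bayesian_encode(df, "favorite color", "net worth", k=3, m=2)
```

In practice the fold assignment would usually be shuffled; contiguous folds are used here only to mirror the worked example.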