bayesian target encoding
A weighted mean of the value from naive target encoding (the mean target of the rows in the same category) and the mean target value over all rows.
For the following
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
red would be encoded with the following equation:
\[ \text{encoding} = \frac{n \cdot \text{mean net worth of people whose favorite color is red} + m \cdot \text{mean net worth of all people}}{n + m} \] in which \(n\) is the number of people whose favorite color is red (the number of rows with favorite color red), and \(m\) is a user-defined smoothing weight such as 2 or 7.
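A minimal sketch of this formula in Python with pandas, assuming the table above with "net worth" as the target; the helper name bayesian_target_encode and the choice m = 2 are illustrative, not from the source:

```python
import pandas as pd

def bayesian_target_encode(df, feature, target, m=2):
    """Weighted mean of each category's target mean (weight n = category count)
    and the global target mean (weight m)."""
    global_mean = df[target].mean()
    stats = df.groupby(feature)[target].agg(["mean", "count"])
    mapping = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return df[feature].map(mapping)

df = pd.DataFrame({
    "name": ["james", "josh", "johnathan"],
    "favorite color": ["red", "blue", "red"],
    "height": [1.7, 1.8, 1.7],
    "net worth": [5000, 4000, 7000],
})
df["favorite color encoded"] = bayesian_target_encode(df, "favorite color", "net worth", m=2)
# red  -> (2 * 6000 + 2 * 5333.33) / (2 + 2) ≈ 5666.67
# blue -> (1 * 4000 + 2 * 5333.33) / (1 + 2) ≈ 4888.89
```

With m = 2, red (two rows, raw mean 6000) is pulled slightly toward the overall mean of about 5333, while blue (one row) is pulled harder, which is the point of the weighting.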
1. reference
StatQuest’s encoding episode
Backlinks
target encoding
Encoding a discrete-valued (categorical) feature with the target value we are trying to predict.
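For example, a minimal sketch of naive target encoding in pandas (column names assumed from the tables above): each category is simply mapped to the mean target value of its rows.

```python
import pandas as pd

df = pd.DataFrame({
    "favorite color": ["red", "blue", "red"],
    "net worth": [5000, 4000, 7000],
})

# Naive target encoding: map each category to the mean target value of its rows.
category_means = df.groupby("favorite color")["net worth"].mean()
df["favorite color encoded"] = df["favorite color"].map(category_means)
# red -> 6000.0, blue -> 4000.0
```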
leave-one-out target encoding
k-fold target encoding in which each fold contains only one row, so that each row’s feature is encoded with bayesian target encoding computed from all of the other rows.
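A minimal leave-one-out sketch under the same assumptions (pandas, the weighted-mean formula above, an illustrative m = 2); loo_bayesian_encode is a hypothetical helper name:

```python
import pandas as pd

def loo_bayesian_encode(df, feature, target, m=2):
    """Leave-one-out: encode each row's category with bayesian target
    encoding computed from every other row."""
    encoded = []
    for i in df.index:
        rest = df.drop(index=i)                        # all rows except row i
        global_mean = rest[target].mean()
        same = rest[rest[feature] == df.loc[i, feature]]
        n = len(same)
        category_mean = same[target].mean() if n > 0 else global_mean
        encoded.append((n * category_mean + m * global_mean) / (n + m))
    return pd.Series(encoded, index=df.index)

df = pd.DataFrame({
    "favorite color": ["red", "blue", "red"],
    "net worth": [5000, 4000, 7000],
})
df["favorite color encoded"] = loo_bayesian_encode(df, "favorite color", "net worth", m=2)
```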
k-fold target encoding
bayesian target encoding, but the training set is split into k equal-sized parts, and each part’s feature is encoded using the target values of all the other parts.
So for
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
johnathan | red | 1.7 | 7000 |
joe | blue | 1.8 | 6000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
with k = 3, the six rows are split into three folds of two rows each:
name | favorite color | height | net worth | fold |
---|---|---|---|---|
james | red | 1.7 | 5000 | 1 |
josh | blue | 1.8 | 4000 | 1 |
johnathan | red | 1.7 | 7000 | 2 |
joe | blue | 1.8 | 6000 | 2 |
joel | blue | 1.8 | 5000 | 3 |
johnas | red | 1.7 | 5000 | 3 |
James’ red value (fold 1) is encoded with only
name | favorite color | height | net worth |
---|---|---|---|
johnathan | red | 1.7 | 7000 |
joe | blue | 1.8 | 6000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
And Joe’s blue value (fold 2) with only
name | favorite color | height | net worth |
---|---|---|---|
james | red | 1.7 | 5000 |
josh | blue | 1.8 | 4000 |
joel | blue | 1.8 | 5000 |
johnas | red | 1.7 | 5000 |
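A minimal k-fold sketch of this split, assuming contiguous folds (so with k = 3 the folds are james/josh, johnathan/joe, joel/johnas, as in the table above) and the same smoothing weight m; kfold_bayesian_encode and the default m = 2 are assumptions:

```python
import numpy as np
import pandas as pd

def kfold_bayesian_encode(df, feature, target, k=3, m=2):
    """Encode each fold's rows using only the other folds' target values,
    with the same weighted-mean (bayesian) smoothing as above."""
    encoded = pd.Series(index=df.index, dtype=float)
    for positions in np.array_split(np.arange(len(df)), k):  # contiguous folds
        fold = df.index[positions]
        rest = df.drop(index=fold)                            # all rows outside this fold
        global_mean = rest[target].mean()
        stats = rest.groupby(feature)[target].agg(["mean", "count"])
        mapping = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
        encoded.loc[fold] = df.loc[fold, feature].map(mapping).fillna(global_mean)
    return encoded

df = pd.DataFrame({
    "name": ["james", "josh", "johnathan", "joe", "joel", "johnas"],
    "favorite color": ["red", "blue", "red", "blue", "blue", "red"],
    "height": [1.7, 1.8, 1.7, 1.8, 1.8, 1.7],
    "net worth": [5000, 4000, 7000, 6000, 5000, 5000],
})
df["favorite color encoded"] = kfold_bayesian_encode(df, "favorite color", "net worth", k=3, m=2)
```

In practice the fold assignment would usually be shuffled; contiguous folds are used here only to mirror the worked example.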