naive target encoding

Encode each value of a categorical feature with the mean of the target over the rows that take that value. For example, with net worth as the target:

name       favorite color  height  net worth
james      red             1.7     5000
josh       blue            1.8     4000
johnathan  red             1.7     7000

red would be mapped to \(\frac{5000+7000}{2} = 6000\), because there are 2 rows (people) with favorite color = red, and their net worths are 5000 and 7000.

Blue would be mapped to 4000, since only one row has favorite color = blue.
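
The mapping above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `naive_target_encoding` and the variable names are assumptions made for this example, not part of any library.

```python
from collections import defaultdict


def naive_target_encoding(categories, targets):
    """Map each category to the mean target of the rows that take it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# The example table: favorite color is the feature, net worth is the target.
colors = ["red", "blue", "red"]
net_worth = [5000, 4000, 7000]

encoding = naive_target_encoding(colors, net_worth)
print(encoding)  # {'red': 6000.0, 'blue': 4000.0}
```

Each row's favorite color would then be replaced by `encoding[color]`.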

bayesian target encoding

Encode each value with a weighted mean of its naive target encoding and the mean of the target over all rows.

For the following table:

name       favorite color  height  net worth
james      red             1.7     5000
josh       blue            1.8     4000
johnathan  red             1.7     7000

red would be mapped using the following equation:

\[ \text{encoding} = \frac{n \cdot \text{mean net worth of people whose favorite color is red} + m \cdot \text{mean net worth of all people}}{n + m} \]

where \(n\) is usually the number of people whose favorite color is red (the number of rows with favorite color red), and \(m\) is a user-defined number such as 2 or 7.
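
As a rough sketch of the equation, the snippet below computes the encoding for the example table. The function name `bayesian_target_encoding` is hypothetical, and \(m = 2\) is just one of the example values mentioned above, picked here as an assumption.

```python
from collections import defaultdict


def bayesian_target_encoding(categories, targets, m):
    """Blend each category's mean target with the mean over all rows.

    n is the number of rows in the category; m is the user-defined weight
    given to the overall mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    encoding = {}
    for category, n in counts.items():
        category_mean = sums[category] / n  # the naive target encoding
        encoding[category] = (n * category_mean + m * global_mean) / (n + m)
    return encoding


colors = ["red", "blue", "red"]
net_worth = [5000, 4000, 7000]

encoding = bayesian_target_encoding(colors, net_worth, m=2)
print(encoding)  # red is about 5666.67, blue is about 4888.89
```

With \(m = 0\) this reduces to the naive target encoding; a larger \(m\) pulls every value toward the overall mean (about 5333.33 here), and the pull is strongest for values with few rows, such as blue.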

Author: Linfeng He

Created: 2024-04-03 Wed 23:24