ML Wiki

Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Data Discretization



What if we want to transform a continuous attribute into a categorical one?

Equal-Width Partitioning

Also called "distance partitioning"

we want to divide $X = (x_1, \ldots, x_m)$ into $N$ intervals of equal width
let $A = \min X$ and $B = \max X$
width: $W = \cfrac{B - A}{N}$
the interval boundaries are $A, A + W, A + 2W, \ldots, A + NW = B$
if almost all of your data falls into one such interval, you lose a lot of information
so this method is sensitive to outliers
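
The formula above can be tried out with a minimal sketch in plain Python. The function name `equal_width_bins` and the sample data are illustrative assumptions, not part of the original notes. The data contains one outlier to show how it stretches the range and pushes everything else into the first interval.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    a, b = min(values), max(values)
    width = (b - a) / n_bins  # W = (B - A) / N
    if width == 0:
        # All values are identical, so there is only one meaningful interval.
        return [0] * len(values)
    # The maximum would fall on the upper boundary, so clamp it into the last bin.
    return [min(int((x - a) / width), n_bins - 1) for x in values]


# Hypothetical data with a single outlier (100).
data = [1, 2, 3, 4, 5, 100]
labels = equal_width_bins(data, 3)
print(labels)  # [0, 0, 0, 0, 0, 2]
```

With $A = 1$, $B = 100$ and $N = 3$ the width is $W = 33$, so the five small values share the first interval, the middle interval stays empty, and only the outlier lands in the last one.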

Equal-Depth Partitioning

Also called "frequency partitioning"

divides $X$ into $N$ intervals,
with each interval containing approximately the same number of samples
not sensitive to outliers
the distribution of values is taken into account
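
A rank-based sketch of this idea in plain Python follows. The function name `equal_depth_bins` and the sample data are illustrative assumptions. Each value is assigned a bin from its position in sorted order, so an outlier only occupies one slot in the last interval instead of stretching the range.

```python
def equal_depth_bins(values, n_bins):
    """Assign each value to one of n_bins intervals holding roughly equal counts."""
    n = len(values)
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Map ranks 0..n-1 onto bins 0..n_bins-1 in equal-sized chunks.
        # Note: tied values may end up in neighbouring bins with this simple rule.
        labels[i] = rank * n_bins // n
    return labels


# Same hypothetical data with a single outlier (100).
data = [1, 2, 3, 4, 5, 100]
labels = equal_depth_bins(data, 3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Each of the three intervals receives two samples, even though the last one spans a much wider range of values.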

Entropy-Based Discretization

Uses entropy, computed with respect to a class attribute, to find the best way to split your data

find the cut point $\alpha$ that maximizes the information gain
split the data at $\alpha$
repeat recursively on each part until there are $N$ intervals or no further information gain is possible
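
The procedure can be sketched in plain Python under a few assumptions that the notes do not spell out: every sample has a class label, candidate cut points are midpoints between consecutive distinct values, and at each step the interval whose best split has the highest gain is split next (a greedy variant of the recursion). All function names here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(pairs):
    """Return (gain, alpha) for the best cut of sorted (value, label) pairs, or None."""
    labels = [y for _, y in pairs]
    base = entropy(labels)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point fits between two equal values
        left, right = labels[:i], labels[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted  # information gain of cutting here
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def entropy_cut_points(values, labels, n_bins):
    """Split until there are n_bins intervals or no split gives information gain."""
    intervals = [sorted(zip(values, labels))]
    cuts = []
    while len(intervals) < n_bins:
        candidates = [(best_split(p), k) for k, p in enumerate(intervals)]
        candidates = [(s, k) for s, k in candidates if s is not None]
        if not candidates:
            break  # no interval can be split with positive gain
        (gain, alpha), k = max(candidates)
        part = intervals.pop(k)
        intervals.insert(k, [p for p in part if p[0] > alpha])
        intervals.insert(k, [p for p in part if p[0] <= alpha])
        cuts.append(alpha)
    return sorted(cuts)


# Hypothetical labelled data: class "b" sits between two groups of class "a".
values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
classes = ["a", "a", "a", "b", "b", "b", "a", "a", "a"]
cuts = entropy_cut_points(values, classes, 3)
print(cuts)  # [6.5, 16.0]
```

The two cut points separate the pure "a" and "b" groups, after which every interval has zero entropy and no further split would add information.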

Sources

Data Mining (UFRT)
