Data Normalization
Typically, normalization refers to transforming all values of a continuous variable to a common scale
- it is done at the Data Transformation stage
There are several approaches
Min-Max Normalization
Min-max normalization
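- rescale linearly to the range $[\text{new_min}_A, \text{new_max}_A]$
- for each value $v$ of attribute $A$, calculate $v'= \cfrac{v - \text{min}_A}{\text{max}_A - \text{min}_A} \cdot (\text{new_max}_A - \text{new_min}_A) + \text{new_min}_A$
- the simplest approach
- not always a good choice: outliers stretch $\text{max}_A - \text{min}_A$, so the remaining values get squeezed into a narrow part of the new range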
Example
- income ranges from 12K to 98K
- want to normalize to $[0.0, 1.0]$
- so, for 73.6K we have $\cfrac{73.6-12}{98-12} \approx 0.716$
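The min-max formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `min_max_normalize`, the sample list `incomes`, and the choice to raise an error when all values are equal are assumptions made for this example. It uses an income of 73.6K, which maps to about 0.716 on the 12K to 98K range.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [min(values), max(values)] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # Assumption: treat a constant column as an error, since the formula divides by zero.
        raise ValueError("all values are equal; min-max normalization is undefined")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


# Incomes in thousands; illustrative values spanning the 12K..98K range.
incomes = [12.0, 54.0, 73.6, 98.0]
normalized = min_max_normalize(incomes)
print(round(normalized[2], 3))  # 73.6K maps to about 0.716
```

Note that the minimum and maximum are taken from the data passed in, so a value seen later that falls outside that range would map outside $[\text{new_min}_A, \text{new_max}_A]$.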
$Z$-score Normalization
$v'= \cfrac{v - \mu_A}{\sigma_A}$
- $\mu_A$ is the mean and $\sigma_A$ is the standard deviation of attribute $A$
Example
- assume that $\mu_A = 54\text{K}$ and $\sigma_A = 16\text{K}$
- so 73.6K becomes $\cfrac{73.6 - 54}{16} = 1.225$
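The same calculation can be sketched in Python. The helper names `z_score` and `z_score_normalize` are hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample one is an assumption made here, since the text does not say which is intended. With a mean of 54K and a standard deviation of 16K, an income of 73.6K gets a score of about 1.225.

```python
from statistics import fmean, pstdev


def z_score(v, mean, std):
    """Return how many standard deviations v lies from the mean."""
    if std == 0:
        # Assumption: a zero standard deviation is treated as an error.
        raise ValueError("standard deviation is zero; z-score is undefined")
    return (v - mean) / std


def z_score_normalize(values):
    """Normalize a list using its own mean and population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population, not sample, standard deviation
    return [z_score(v, mean, std) for v in values]


# Worked example with a given mean and standard deviation (in thousands).
z = z_score(73.6, 54.0, 16.0)
print(round(z, 3))  # about 1.225
```

Unlike min-max normalization, the result is not confined to a fixed interval: values far from the mean simply receive large positive or negative scores.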
Usages
Sources