# ML Wiki

## Data Normalization

Typically, normalization refers to

• transforming all values of some continuous variable to the same scale
• it's done at the Data Transformation stage

There are several approaches

• Min-Max
• $Z$-score

## Min-Max Normalization

Min-max normalization

• normalize to scale $[\text{new_min}_A, \text{new_max}_A]$
• for each new value, calculate $v'= \cfrac{v - \text{min}_A}{\text{max}_A - \text{min}_A} \cdot (\text{new_max}_A - \text{new_min}_A) + \text{new_min}_A$
• the easiest model
• not always good - if there are outliers

Example

• income range between 12K to 98K
• want to normalize to $[0.0, 1.0]$.
• so, for 73K have $\cfrac{73-12}{98-12} \approx 0.716$

## $Z$-score Normalization

$v'= \cfrac{v - \mu_A}{\sigma_A}$

Example

• Assume that $\mu = 54K$ and $\sigma = 16K$
• So 73K becomes 1.225