## Feature Filtering

Given $D$ features $f_1, \dots, f_D$ and an outcome $Y$:

• rank the features according to some criterion of "importance"
• keep only the important ones

What counts as "important"?

• the top $d$ features by score
• the features whose scores exceed some threshold
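
The two selection rules can be sketched in a few lines of Python. This is only an illustration: the function name `select_features`, its parameters, and the example scores are all made up here, and the sketch assumes that a higher score means a more important feature.

```python
def select_features(scores, d=None, threshold=None):
    """Return the indices of the selected features, best score first.

    scores    -- one importance score per feature (higher = better, by assumption)
    d         -- if given, keep at most the top d features
    threshold -- if given, keep only features scoring strictly above it
    """
    # Rank feature indices by descending score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if d is not None:
        ranked = ranked[:d]
    if threshold is not None:
        ranked = [i for i in ranked if scores[i] > threshold]
    return ranked


# Hypothetical scores for four features f_1..f_4 (0-based indices below).
scores = [0.1, 2.5, 0.7, 1.9]
top2 = select_features(scores, d=2)             # top-d rule
above = select_features(scores, threshold=0.5)  # threshold rule
print(top2, above)
```

With these scores, the top-2 rule keeps features 1 and 3, while the threshold rule also keeps feature 2.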

Criteria of usefulness:

• All these criteria (scoring functions) capture the intuition that the best features for predicting the outcome $Y$ are the ones whose distributions differ most across the values of $Y$
• Usually these functions measure (in)dependence between $f_i$ and $Y$
• the more dependent a feature is on $Y$, the better it is for classification

E.g. $\chi^2$ measures how much the observed counts differ from the counts expected under the null hypothesis (that $f_i$ and $Y$ are independent)

• lower values indicate less dependency
• so for $\chi^2$ we want to keep the features with the largest values
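
As a rough sketch of how such a score might be computed for one categorical feature, the code below uses the Pearson statistic $\chi^2 = \sum (O - E)^2 / E$ over the cells of the feature-by-outcome contingency table, where $O$ is the observed count and $E$ the count expected under independence. The function name `chi2_score` and the toy data are assumptions made for illustration; library implementations may define the statistic differently (e.g. for non-negative numeric features).

```python
from collections import Counter


def chi2_score(feature, outcome):
    """Pearson chi-squared statistic between a categorical feature and Y."""
    n = len(feature)
    observed = Counter(zip(feature, outcome))  # joint counts O
    f_counts = Counter(feature)                # marginal counts of the feature
    y_counts = Counter(outcome)                # marginal counts of Y
    score = 0.0
    for f, nf in f_counts.items():
        for y, ny in y_counts.items():
            expected = nf * ny / n             # count E expected under independence
            score += (observed[(f, y)] - expected) ** 2 / expected
    return score


# Toy data (hypothetical): one feature copies Y, the other ignores it.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f_dependent = [0, 0, 0, 0, 1, 1, 1, 1]
f_independent = [0, 1, 0, 1, 0, 1, 0, 1]

print(chi2_score(f_dependent, y), chi2_score(f_independent, y))
```

The feature that copies $Y$ gets a large score and the one unrelated to $Y$ gets a score of zero, which is why the features with the largest $\chi^2$ values are the ones to keep.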

