
Given $D$ features $f_1, \dots, f_D$ and an outcome $Y$:

- rank the features according to some criterion of "importance"
- keep only the important ones

What counts as "important"? Two common rules, sketched below:

- keep the top $d$ features
- keep the features with scores above some threshold
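
A minimal sketch of both rules, assuming the importance scores have already been computed (the scores below are made up for illustration):

```python
import numpy as np

# Hypothetical importance scores for D = 6 features (higher = more important).
scores = np.array([0.02, 0.41, 0.13, 0.33, 0.05, 0.27])

# Rule 1: keep the top d features.
d = 3
top_d = np.argsort(scores)[::-1][:d]   # indices of the d largest scores
print(sorted(top_d.tolist()))          # -> [1, 3, 5]

# Rule 2: keep the features whose score is above a threshold.
threshold = 0.1
above = np.flatnonzero(scores > threshold)
print(above.tolist())                  # -> [1, 2, 3, 5]
```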

Criteria of usefulness:

- Information Theory measures (Shannon's Information Measures):
  - Entropy-Based Ranking
  - Information Gain (sketched after this list)
  - Mutual Information
- Odds Ratio
- Chi-Squared Ranking
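
A hand-rolled sketch of Entropy-Based Ranking via Information Gain, $IG(Y; f) = H(Y) - H(Y \mid f)$; for discrete variables this coincides with the Mutual Information $I(f; Y)$. The toy arrays below are made up for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(X) in bits of a discrete array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(f, y):
    """IG(Y; f) = H(Y) - H(Y | f) for a discrete feature f."""
    h_y_given_f = 0.0
    for v in np.unique(f):
        mask = f == v
        h_y_given_f += mask.mean() * entropy(y[mask])
    return entropy(y) - h_y_given_f

# Toy data: f1 tracks y closely, f2 is pure noise.
y  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f1 = np.array([0, 0, 0, 1, 1, 1, 1, 1])
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
print(information_gain(f1, y))  # ~0.55 bits: f1 is informative about y
print(information_gain(f2, y))  # 0.0 bits: f2 is independent of y
```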

- All of these functions capture the intuition that the best features for predicting the outcome $Y$ are the ones whose values are distributed very differently across the values of $Y$
- usually these functions measure the (in)dependence between $f_i$ and $Y$
- the more dependent a feature is on $Y$, the more useful it is for classification (see the library sketch below)
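
As an illustration, assuming scikit-learn is available, its `SelectKBest` transformer implements this rank-and-keep pattern with a dependence score such as estimated mutual information (the synthetic data below is made up):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: column 0 depends on y, columns 1-4 are noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = y + 0.3 * rng.standard_normal(n)
X = np.column_stack([informative, rng.standard_normal((n, 4))])

# Rank all columns by estimated mutual information with y, keep the top 2.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)
print(selector.scores_)                    # column 0 should score highest
print(selector.get_support(indices=True))  # indices of the kept columns
```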

E.g. $\chi^2$ measures how much the observed counts differ from the counts expected under the null hypothesis (independence of $f_i$ and $Y$):

- lower values indicate less dependence
- so for $\chi^2$ ranking we keep the features with the largest values (sketched below)
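
A small sketch, assuming SciPy is available: `scipy.stats.chi2_contingency` computes the $\chi^2$ statistic from a contingency table of feature values vs. outcome values (the counts below are made up):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency tables of a binary feature vs. binary outcome Y
# (rows: feature value 0/1, columns: Y = 0/1).
dependent   = np.array([[30, 5], [5, 30]])    # counts shift strongly with Y
independent = np.array([[18, 17], [17, 18]])  # counts barely shift with Y

for table in (dependent, independent):
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# The dependent feature gets the far larger chi2, so it ranks higher.
```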

- [1] Sebastiani, Fabrizio. "Machine learning in automated text categorization." ACM Computing Surveys 34.1 (2002): 1–47.