ML Wiki

Feature Filtering

Given $D$ features $f_1, \ldots, f_D$ and an outcome $Y$:

rank the features according to some criterion of "importance"
keep only the important ones

Which features count as important?

the top $d$ features by score, or
the ones with scores above some threshold
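
The two selection rules above can be sketched in a few lines of Python. This is only an illustration: the function names `select_top_d` and `select_above_threshold` and the score values are made up for the example, and the scores are assumed to be "higher is better".

```python
def select_top_d(scores, d):
    """Return the indices of the d highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:d]


def select_above_threshold(scores, threshold):
    """Return the indices of the features whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical importance scores for D = 5 features (index 0 stands for f_1).
scores = [0.10, 2.50, 0.70, 3.10, 0.05]

top_2 = select_top_d(scores, d=2)
above_half = select_above_threshold(scores, threshold=0.5)
print(top_2)       # [3, 1]
print(above_half)  # [1, 2, 3]
```

The top $d$ rule fixes the size of the reduced feature set in advance, while the threshold rule lets the size depend on how the scores are distributed.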

Criteria of usefulness:

All these functions capture the intuition that the best features for predicting the outcome $Y$ are the ones that are distributed very differently across the values of $Y$
Usually these functions measure the (in)dependence between $f_i$ and $Y$
The more dependent a feature is on $Y$, the better it is for classification

E.g. $\chi^2$ measures how much the observed results differ from the results expected under the null hypothesis (here, that $f_i$ and $Y$ are independent)

Lower values indicate less dependence between $f_i$ and $Y$
so for $\chi^2$ we want to keep the features with the largest values
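
As a rough illustration of the $\chi^2$ criterion, the sketch below computes the Pearson statistic $\sum (O - E)^2 / E$ over the contingency table of a categorical feature and the class labels, then ranks three toy features by it. The helper `chi2_score`, the feature names and the data are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def chi2_score(feature, labels):
    """Pearson chi-square statistic between two categorical sequences."""
    n = len(feature)
    f_values = sorted(set(feature))
    y_values = sorted(set(labels))

    # Observed counts for every (feature value, label) cell.
    observed = {(f, y): 0 for f in f_values for y in y_values}
    for f, y in zip(feature, labels):
        observed[(f, y)] += 1

    # Marginal totals, used for the counts expected under independence.
    f_totals = {f: sum(observed[(f, y)] for y in y_values) for f in f_values}
    y_totals = {y: sum(observed[(f, y)] for f in f_values) for y in y_values}

    stat = 0.0
    for f in f_values:
        for y in y_values:
            expected = f_totals[f] * y_totals[y] / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


# Toy data (made up): 8 observations, binary outcome Y, three binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "f_a": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to Y: fully dependent
    "f_b": [1, 0, 1, 0, 1, 0, 1, 0],  # same distribution in both classes
    "f_c": [1, 1, 1, 0, 1, 0, 0, 0],  # partially dependent on Y
}

chi2_scores = {name: chi2_score(col, labels) for name, col in features.items()}
ranked = sorted(chi2_scores, key=chi2_scores.get, reverse=True)
print(chi2_scores)  # {'f_a': 8.0, 'f_b': 0.0, 'f_c': 2.0}
print(ranked)       # ['f_a', 'f_c', 'f_b']
```

The feature that matches $Y$ exactly gets the largest score and the one distributed identically in both classes gets zero, which is why the largest values are the ones to keep.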

Sources

Sebastiani, Fabrizio. "Machine learning in automated text categorization." (2002). [1]
