Feature Filtering

Given $D$ features $f_1, \ldots, f_D$ and an outcome $Y$:

  • rank these features according to some criterion of “importance”
  • keep only the important ones

What counts as important?

  • the top $d$ features by score
  • the features with scores above some threshold
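
The two selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `select_top_d` and `select_above_threshold` and the example scores are hypothetical, and the sketch assumes that a higher score means a more important feature.

```python
def select_top_d(scores, d):
    """Return the indices of the d highest-scoring features, best first."""
    # Rank feature indices by score, largest first (assumes higher = more important).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:d]


def select_above_threshold(scores, threshold):
    """Return the indices of features whose score is strictly above the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical importance scores for D = 5 features.
scores = [0.2, 3.5, 1.1, 0.05, 2.7]

top2 = select_top_d(scores, 2)
above = select_above_threshold(scores, 1.0)
```

With a fixed $d$ the number of kept features is known in advance, while a threshold lets that number vary with the data.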

Criteria of usefulness:

E.g. $\chi^2$ measures how much the observed counts differ from the counts expected under the null hypothesis, which here is that the feature and the outcome are independent

  • lower values indicate less dependency between the feature and the outcome
  • so for $\chi^2$ we keep the features with the largest values
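
As a rough illustration of the $\chi^2$ criterion, the sketch below computes the Pearson statistic $\sum (O - E)^2 / E$ from a contingency table of feature value against outcome. The function name `chi2_score` and the example tables are assumptions made for this example, and the code assumes every row and column total is positive so that no expected count is zero.

```python
def chi2_score(table):
    """Pearson chi-squared statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if feature and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: feature absent / present; columns: outcome negative / positive.
independent = [[10, 10], [10, 10]]  # feature says nothing about the outcome
dependent = [[18, 2], [2, 18]]      # feature and outcome mostly agree

score_independent = chi2_score(independent)
score_dependent = chi2_score(dependent)
```

The table in which the feature tracks the outcome gets the larger statistic, so ranking features by this score and keeping the largest values favours the features most dependent on $Y$.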

Sources

  • Sebastiani, Fabrizio. “Machine learning in automated text categorization.” (2002). [http://arxiv.org/pdf/cs/0110053.pdf]