Dimensionality Reduction (DR) refers to techniques for reducing the dimensionality of our data sets

- we have a data set $\{ \mathbf x_i \}$ with $\mathbf x_i \in \mathbb R^D$ and very large $D$
- the goal is to find a mapping $f: \mathbb R^D \to \mathbb R^d$ s.t. $d \ll D$
- for Visualization, the target dimension is usually small, e.g. $d = 2$ or $d = 3$

- DR techniques tend to reduce Overfitting:
- if the dimensionality of the data is $D$ and there are only $N$ examples in the training set, then overfitting becomes likely as $D$ approaches $N$
- so we want $N \gg D$, and reducing $D$ to $d$ makes this easier to achieve

- Note that DR techniques may remove important information
- The aggressiveness of the reduction is $D / d$

In IR these techniques are usually called "Term Selection" rather than "Feature Selection"

Usual IR and indexing techniques for reducing dimensionality are:

- Stop Words Removal
- Stemming or Lemmatization
- less common techniques are Term Strength and Term Contribution
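A toy sketch of the two standard index-reduction steps above (in Python; the hand-rolled stop list and the naive suffix stripping are placeholders — real systems use a curated stop list and a proper stemmer such as Porter's):

```python
# tiny illustrative stop list; production stop lists are much larger
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "to", "and"}

def naive_stem(token):
    # crude suffix stripping, standing in for a real stemming algorithm
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def reduce_terms(tokens):
    # drop stop words, then conflate inflected forms to one index term
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(reduce_terms("the cats are playing in the garden".split()))
# → ['cat', 'play', 'garden']
```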

- Subset Selection ("Wrapper Approach"): take a subset of features, train a model on it, and check whether performance improves
- Feature Filtering: rank features according to some "usefulness" function and keep only the best ones
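A minimal Feature Filtering sketch (Python; document frequency serves as the "usefulness" function here purely for illustration — chi-squared or information gain are common choices in practice):

```python
from collections import Counter

# toy document collection
docs = ["cat sat mat", "dog sat log", "cat dog play"]

def filter_terms(docs, k):
    # usefulness function: document frequency, i.e. in how many
    # documents each term occurs; keep the k highest-ranked terms
    df = Counter(term for doc in docs for term in sorted(set(doc.split())))
    return [term for term, _ in df.most_common(k)]

print(filter_terms(docs, 3))  # the three terms occurring in two documents
```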

Feature Extraction: generate new features based on the original ones

Linear

- Principal Component Analysis (often done via Eigendecomposition or SVD)
- Fisher Discriminant Analysis (also called Linear Discriminant Analysis) - a supervised technique for Dimensionality Reduction
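A minimal PCA-via-SVD sketch (numpy, on synthetic correlated data; illustration only):

```python
import numpy as np

def pca(X, d):
    # center the data, then project onto the top-d right singular
    # vectors of the centered matrix (the d principal components)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T  # (N, d) reduced representation

rng = np.random.default_rng(1)
# mixing random features produces correlated columns, so PCA has work to do
X = rng.normal(size=(100, 10)) @ rng.normal(size=(10, 10))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

Component variances come out in decreasing order, since the singular values of the centered matrix are sorted.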

Non-Linear

- http://www.public.asu.edu/~jtang20/publication/feature_selection_for_classification.pdf
- http://www.public.asu.edu/~jtang20/publication/FSClustering.pdf

- Machine Learning (coursera)
- Machine Learning 1 (TUB)
- Machine Learning 2 (TUB)
- Sebastiani, Fabrizio. "Machine learning in automated text categorization." ACM Computing Surveys (2002).