Dimensionality Reduction
Dimensionality reduction (DR) is a family of techniques for reducing the number of dimensions of a data set:
- we have a data set $\{ \mathbf x_i \}$ with $\mathbf x_i \in \mathbb R^D$, where $D$ is very large
- the goal is to find a mapping $f: \mathbb R^D \to \mathbb R^d$ such that $d \ll D$
- for visualization the target dimension is usually small, e.g. $d = 2$ or $d = 3$
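Here is a minimal sketch of what such a mapping $f$ can look like. It uses a Gaussian random projection, one simple linear choice picked for illustration (the text above does not prescribe a method). The function name `random_projection` and the $1/\sqrt{d}$ scaling are assumptions of this sketch.

```python
import numpy as np

def random_projection(X: np.ndarray, d: int, seed: int = 0) -> np.ndarray:
    """Map each row of X from R^D to R^d with a fixed Gaussian random matrix.

    f(x) = R^T x / sqrt(d), where R has i.i.d. N(0, 1) entries, so squared
    Euclidean norms (and hence pairwise distances) are preserved in expectation.
    """
    _, D = X.shape
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((D, d))   # the same R is used for every point
    return X @ R / np.sqrt(d)

# Usage: 100 points in D = 1000 dimensions mapped to d = 2 for plotting.
X = np.random.default_rng(1).standard_normal((100, 1000))
Y = random_projection(X, d=2)
print(Y.shape)  # (100, 2)
```

Random projections only preserve distances well when $d$ is not too small. For a 2D plot, a data-dependent method such as PCA usually gives a more informative picture.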
Overfitting
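- DR techniques tend to reduce overfitting:
- if the dimensionality of the data is $D$ and there are $N$ examples in the training set,
- then to avoid overfitting the number of examples should be much larger than the number of dimensions, $N \gg D$ (with $D \approx N$ a linear model can already fit the training set almost perfectly)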
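The link between $D$, $N$ and overfitting can be illustrated with ordinary least squares, used here as an assumed stand-in for "a model". The targets below are pure noise, yet once $D$ reaches $N$ the fit reproduces them exactly on the training set. `train_residual` is a hypothetical helper written for this sketch.

```python
import numpy as np

def train_residual(N: int, D: int, seed: int = 0) -> float:
    """Fit least squares on N random examples with D random features and
    pure-noise targets; return the root-mean-square error on the training set."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, D))
    y = rng.standard_normal(N)               # targets unrelated to X
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

# Training error shrinks as D grows towards N = 50 and vanishes at D = N:
# the model has memorized noise, which is the hallmark of overfitting.
for D in (2, 10, 25, 50):
    print(D, round(train_residual(N=50, D=D), 3))
```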
Aggressiveness
- note that DR techniques may sometimes remove important information
- the aggressiveness of a reduction is the ratio $D / d$
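Here is a small sketch of both notions. `aggressiveness` computes $D / d$. `retained_variance` uses the share of variance kept by the top $d$ principal components as one rough proxy for how much information a reduction keeps. That proxy is an assumption, since the text does not fix a measure. Both function names are hypothetical.

```python
import numpy as np

def aggressiveness(D: int, d: int) -> float:
    """Reduction factor D / d (e.g. 10000 terms reduced to 100 gives 100)."""
    if not 0 < d <= D:
        raise ValueError("need 0 < d <= D")
    return D / d

def retained_variance(X: np.ndarray, d: int) -> float:
    """Fraction of total variance captured by the top-d principal components."""
    Xc = X - X.mean(axis=0)                   # center each original dimension
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to component variances
    return float(var[:d].sum() / var.sum())

# Usage: more aggressive reductions (larger D / d) keep less variance.
X = np.random.default_rng(0).standard_normal((200, 50))
for d in (5, 25, 50):
    print(d, aggressiveness(50, d), round(retained_variance(X, d), 3))
```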
In IR these techniques are usually called "term selection" rather than "feature selection".
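Here is term selection with one simple scoring function: document frequency, the number of documents a term occurs in. This criterion and the helper `select_terms_by_df` are assumptions of the sketch. Other scores, such as information gain or $\chi^2$, are also used in practice.

```python
import numpy as np

def select_terms_by_df(counts: np.ndarray, vocab: list[str], k: int):
    """Keep the k terms that occur in the most documents.

    counts: (n_docs, n_terms) term-count matrix.
    Returns the reduced matrix and the kept vocabulary, in original column order.
    """
    df = (counts > 0).sum(axis=0)                        # document frequency per term
    top = np.argsort(-df, kind="stable")[:k]             # ties keep original order
    keep = np.sort(top)
    return counts[:, keep], [vocab[j] for j in keep]

vocab = ["the", "model", "kernel", "zebra"]
counts = np.array([[3, 1, 0, 0],
                   [2, 0, 1, 0],
                   [4, 1, 1, 1]])
reduced, kept = select_terms_by_df(counts, vocab, k=2)
print(kept)  # ['the', 'model']
```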
Common IR and indexing techniques for reducing dimensionality include:
- Term Clustering
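Here is a minimal sketch of term clustering. It assumes terms are grouped by their occurrence profiles across documents using plain k-means, and that each cluster becomes one new feature holding the summed counts of its terms. The similarity measure, the clustering algorithm and the helper names (`cluster_terms`, `merge_clusters`) are all choices made for this example.

```python
import numpy as np

def _nearest(T: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center (squared Euclidean distance) for each row of T."""
    return ((T[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def cluster_terms(counts: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Assign each term (column of a doc-term count matrix) to one of k clusters."""
    T = counts.T.astype(float)                            # one row per term
    T /= np.linalg.norm(T, axis=1, keepdims=True) + 1e-12 # compare profiles, not volume
    centers = T[:k].copy()                                # deterministic init: first k terms
    for _ in range(iters):
        labels = _nearest(T, centers)
        for c in range(k):
            members = T[labels == c]
            if len(members):                              # keep old center if cluster empties
                centers[c] = members.mean(axis=0)
    return _nearest(T, centers)

def merge_clusters(counts: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """New feature c = total count of all terms assigned to cluster c."""
    out = np.zeros((counts.shape[0], k), dtype=counts.dtype)
    for c in range(k):
        out[:, c] = counts[:, labels == c].sum(axis=1)
    return out

# Usage: terms 0-1 co-occur in docs 0-1, terms 2-3 in docs 2-3.
counts = np.array([[2, 3, 0, 0],
                   [1, 2, 0, 0],
                   [0, 0, 4, 5],
                   [0, 0, 1, 2]])
labels = cluster_terms(counts, k=2)
merged = merge_clusters(counts, labels, k=2)
```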
General Techniques
Factor Analysis
Generate new features based on the original ones
Linear
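As one linear example (picked here, since the section does not list methods), principal component analysis builds each new feature as a linear combination of the original ones. The sketch below computes it with NumPy's SVD. `pca` is a hypothetical helper name.

```python
import numpy as np

def pca(X: np.ndarray, d: int):
    """Project centered rows of X onto the top-d principal directions.

    Returns (Z, W, mean) with Z = (X - mean) @ W and W of shape (D, d)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                    # orthonormal columns
    return Xc @ W, W, mean

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 10))
Z, W, mean = pca(X, d=3)            # 3 new features per example
X_hat = Z @ W.T + mean              # approximate reconstruction in R^10
```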
Non-Linear
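As one non-linear example, kernel PCA runs PCA in the feature space implied by a kernel. The RBF kernel, the value of `gamma` and the helper name `rbf_kernel_pca` are assumptions of this sketch, which only projects the training points.

```python
import numpy as np

def rbf_kernel_pca(X: np.ndarray, d: int, gamma: float = 1.0) -> np.ndarray:
    """Kernel PCA with k(x, y) = exp(-gamma * ||x - y||^2).

    Returns the projections of the N training points onto the top-d
    components in feature space, shape (N, d)."""
    sq = (X ** 2).sum(axis=1)
    sqdist = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * np.maximum(sqdist, 0.0))   # clip tiny negative round-off
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one      # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)           # symmetric, ascending order
    idx = np.argsort(eigvals)[::-1][:d]
    lam, alphas = eigvals[idx], eigvecs[:, idx]
    # Projection of training point i on component j is sqrt(lam_j) * alphas[i, j].
    return alphas * np.sqrt(np.maximum(lam, 0.0))

# Usage: two concentric circles, a standard toy case for non-linear methods.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
X = np.vstack([circle, 3 * circle])                 # N = 100 points in R^2
Z = rbf_kernel_pca(X, d=2, gamma=0.5)
```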