Dimensionality Reduction
Dimensionality reduction (DR) techniques map high-dimensional data sets to lower-dimensional representations
- we have a data set $\{ \mathbf x_i \}$ with $\mathbf x_i \in \mathbb R^D$ for very large $D$
- the goal is to find a mapping $f: \mathbb R^D \to \mathbb R^d$ s.t. $d \ll D$
- for Visualization the target dimension is usually small, e.g. $d = 2$ or $d =3$
Overfitting
- DR techniques tend to reduce overfitting:
- if the dimensionality of the data is $D$ and there are $N$ examples in the training set
- then to avoid overfitting we want $N$ to be large relative to $D$; when $D \approx N$ or larger, models can easily overfit, so reducing $D$ helps
Aggressiveness
- Note that DR techniques may remove important information
- the aggressiveness of the reduction is the ratio $D / d$: the larger it is, the more aggressive (and riskier) the reduction
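As a minimal illustration of the ratio above (the dimensions here are made-up examples, not from the notes):

```python
def aggressiveness(D, d):
    # Ratio of original to reduced dimensionality: larger means more aggressive
    return D / d

# e.g. reducing a 10,000-dimensional term space to 100 latent dimensions
aggressiveness(10_000, 100)  # -> 100.0
```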
Feature Selection
Information Retrieval and Text Mining
In IR these techniques are usually called “Term Selection” rather than “Feature Selection”
Usual IR and indexing techniques for reducing dimensionality are:
- Stop Words Removal
- Stemming or Lemmatization
- less common techniques are Term Strength and Term Contribution
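A minimal sketch of the first two term-selection steps in plain Python. The tiny stop-word list and the suffix-stripping "stemmer" are toy stand-ins for what real systems use (curated stop lists and, e.g., the Porter stemmer):

```python
# Hypothetical tiny stop-word list; real systems use larger curated lists
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def remove_stop_words(tokens):
    # Drop high-frequency function words that carry little content
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def crude_stem(token):
    # Very rough suffix stripping -- a stand-in for a real stemmer (e.g. Porter)
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = "the cats are running in the garden".split()
filtered = remove_stop_words(tokens)          # drops "the" and "in"
stemmed = [crude_stem(t) for t in filtered]   # e.g. "cats" -> "cat"
```

Both steps shrink the vocabulary, and hence the dimensionality of the term space, without looking at class labels.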
General Techniques
- Subset Selection (“Wrapper Approach”): take a subset of the features, train a model on it, and check whether it performs better
- Feature Filtering: rank features according to some “usefulness” function and keep only the top-ranked ones
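A sketch of the filtering approach with NumPy, using per-feature variance as a stand-in for the "usefulness" function (any scoring function with the same shape would do):

```python
import numpy as np

def filter_features(X, k, score=np.var):
    # Score each feature (column) independently, keep the k highest-scoring ones
    scores = score(X, axis=0)
    top = np.sort(np.argsort(scores)[::-1][:k])  # indices of the k best features
    return X[:, top], top

# Toy data: the middle feature is constant, so variance ranking drops it
X = np.array([[1.0, 0.0, 10.0],
              [2.0, 0.0, 20.0],
              [3.0, 0.0, 30.0]])
X_reduced, kept = filter_features(X, k=2)  # keeps columns 0 and 2
```

Filtering is cheap because each feature is scored on its own; the wrapper approach is more expensive since it retrains a model per candidate subset.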
Feature Extraction
Generate new features based on the original ones
Linear
- Principal Component Analysis (often done via Eigendecomposition or SVD)
- Fisher Discriminant Analysis (sometimes Linear Discriminant Analysis) - supervised technique for Dimensionality Reduction
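A minimal sketch of PCA via SVD with NumPy (as mentioned above, PCA is often computed this way): center the data, then project onto the top-$d$ right singular vectors.

```python
import numpy as np

def pca(X, d):
    # Center the data, then take the top-d right singular vectors as components
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]        # (d, D) orthonormal principal directions
    Z = Xc @ components.T      # (N, d) reduced representation
    return Z, components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # N = 100 examples, D = 5
Z, W = pca(X, d=2)             # aggressiveness D/d = 2.5
```

This is an unsupervised linear mapping; Fisher/Linear Discriminant Analysis would instead use class labels to choose the projection.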
Non-Linear
Links
- http://www.public.asu.edu/~jtang20/publication/feature_selection_for_classification.pdf
- http://www.public.asu.edu/~jtang20/publication/FSClustering.pdf
Sources
- Machine Learning (coursera)
- Machine Learning 1 (TUB)
- Machine Learning 2 (TUB)
- Sebastiani, Fabrizio. “Machine Learning in Automated Text Categorization.” ACM Computing Surveys (2002). [http://arxiv.org/pdf/cs/0110053.pdf]