# ML Wiki

## Dimensionality Reduction

Dimensionality Reduction (DR) is a family of techniques for reducing the number of dimensions (features) of a data set.

• we have a data set $\{ \mathbf x_i \}$ with $\mathbf x_i \in \mathbb R^D$ and very large $D$
• the goal is to find a mapping $f: \mathbb R^D \to \mathbb R^d$ s.t. $d \ll D$
• for visualization the target dimension is usually small, e.g. $d = 2$ or $d = 3$
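
To make the mapping $f$ concrete, here is a minimal sketch of one possible choice: a linear map $f(\mathbf x) = W \mathbf x$ with a $d \times D$ matrix $W$. The helper name `make_linear_map` and the particular matrix `W` are assumptions made for illustration only; actual DR methods differ in how they choose $f$.

```python
def make_linear_map(W):
    """Return f(x) = W x for a d-by-D matrix W given as a list of rows."""
    D = len(W[0])
    if any(len(row) != D for row in W):
        raise ValueError("all rows of W must have the same length D")

    def f(x):
        if len(x) != D:
            raise ValueError(f"expected a vector of length {D}, got {len(x)}")
        # Each output coordinate is the dot product of one row of W with x.
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    return f


# Hypothetical example: D = 4 input features, d = 2 output features.
# Row 1 averages the first two coordinates, row 2 averages the last two.
W = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
f = make_linear_map(W)
point_2d = f([1.0, 3.0, 10.0, 20.0])
print(point_2d)
```

Here a 4-dimensional point is mapped to a 2-dimensional one that could be drawn on a scatter plot.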

### Overfitting

• DR techniques tend to reduce overfitting:
• if dimensionality of data is $D$ and there are $N$ examples in the training set
• then overfitting becomes likely when $D$ is comparable to or larger than $N$, so it's good to have $D \ll N$

### Aggressiveness

• Note that DR techniques may sometimes remove important information
• The aggressiveness of a reduction is the ratio $D / d$: the larger the ratio, the more information is potentially discarded
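
As a small worked illustration of the ratio $D / d$, the sketch below compares two reductions of the same data. The function name `aggressiveness` and the dimensions used are hypothetical values chosen for the example.

```python
def aggressiveness(D, d):
    """Ratio D / d between the original and the reduced dimensionality."""
    if d <= 0 or D < d:
        raise ValueError("expected 0 < d <= D")
    return D / d


# Hypothetical data set with D = 10,000 features.
ratio_vis = aggressiveness(10_000, 2)     # reduction for a 2-D plot
ratio_mild = aggressiveness(10_000, 500)  # milder reduction before modelling
print(ratio_vis, ratio_mild)
```

The visualization-style reduction is far more aggressive than the second one, so it is more likely to lose important information.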

## Feature Selection

### Information Retrieval and Text Mining

In IR these techniques are usually called "Term Selection" rather than "Feature Selection".

Usual IR and indexing techniques for reducing dimensionality are

### General Techniques

• Subset Selection ("Wrapper Approach"): take a subset of features and check whether the model performs better with it
• Feature Filtering: rank features according to some "usefulness" function and keep the top-ranked ones
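
The two approaches above can be sketched roughly as follows. This is only an illustration under stated assumptions: variance is used as the "usefulness" function, the wrapper is a greedy forward search, and `toy_evaluate` with its `gains` table is a made-up stand-in for what would normally be a model's validation score.

```python
from statistics import pvariance


def filter_rank(X, usefulness):
    """Feature filtering: score each column independently, best first."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [usefulness(col) for col in columns]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def forward_select(n_features, evaluate, max_features):
    """Wrapper approach: greedily grow a subset while the score improves."""
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        score, j = max((evaluate(selected + [j]), j) for j in candidates)
        if score <= best:
            break  # no candidate makes the subset better
        selected.append(j)
        best = score
    return selected


# Toy data: 4 examples, 3 features (feature 0 is constant).
X = [
    [1.0, 0.0, 5.0],
    [1.0, 1.0, 9.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 5.0],
]
ranking = filter_rank(X, pvariance)

# Hypothetical per-feature gains, minus a penalty of 0.1 per feature used.
gains = {0: 0.0, 1: 0.3, 2: 0.5}


def toy_evaluate(subset):
    return sum(gains[j] for j in subset) - 0.1 * len(subset)


subset = forward_select(3, toy_evaluate, max_features=3)
print(ranking, subset)
```

Filtering scores each feature on its own and is cheap. The wrapper re-evaluates whole subsets, so it is usually more expensive.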

## Feature Extraction

Generate new features based on the original ones

Linear
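
As one illustration of a linear extraction method, the sketch below uses Principal Component Analysis computed via SVD. PCA is chosen here as an assumed example and is not named in the text above. The data are synthetic, and the function name `pca_project` is hypothetical. Each new feature is a linear combination of the original ones.

```python
import numpy as np


def pca_project(X, d):
    """Project the rows of X (N x D) onto the top-d principal directions."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:d].T            # D x d projection matrix
    return X_centered @ W   # N x d reduced representation


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # synthetic data: N = 100, D = 10
Z = pca_project(X, d=2)
print(Z.shape)
```

The first extracted feature captures at least as much variance as the second, because the directions returned by the SVD are ordered by singular value.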

Non-Linear