Dimensionality Reduction
Dimensionality reduction (DR) techniques map high-dimensional data sets to lower-dimensional representations
- we have a data set $\{ \mathbf x_i \}$ with $\mathbf x_i \in \mathbb R^D$ for very large $D$
- the goal is to find a mapping $f: \mathbb R^D \to \mathbb R^d$ s.t. $d \ll D$
- for Visualization the target dimension is usually small, e.g. $d = 2$ or $d =3$
Overfitting
- DR techniques tend to reduce overfitting:
- if the dimensionality of the data is $D$ and there are $N$ examples in the training set
- then to avoid overfitting we want $N$ to be large relative to $D$; when $D \approx N$ or larger, models can easily overfit, so reducing $D$ helps
Aggressiveness
- Note that DR techniques may remove important information
- the aggressiveness of the reduction is the ratio $D / d$: the larger it is, the more aggressive (and riskier) the reduction
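As a minimal illustration of the ratio above (the dimensions here are made-up examples, not from the notes):

```python
def aggressiveness(D, d):
    # Ratio of original to reduced dimensionality: larger means more aggressive
    return D / d

# e.g. reducing a 10,000-dimensional term space to 100 latent dimensions
aggressiveness(10_000, 100)  # -> 100.0
```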
Feature Selection
Information Retrieval and Text Mining
In IR these techniques are usually called “Term Selection” rather than “Feature Selection”
Usual IR and indexing techniques for reducing dimensionality are:
- Stop Words Removal
- Stemming or Lemmatization
- less common techniques are Term Strength and Term Contribution
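A minimal sketch of the first two term-selection steps in plain Python. The tiny stop-word list and the suffix-stripping "stemmer" are toy stand-ins for what real systems use (curated stop lists and, e.g., the Porter stemmer):

```python
# Hypothetical tiny stop-word list; real systems use larger curated lists
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def remove_stop_words(tokens):
    # Drop high-frequency function words that carry little content
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def crude_stem(token):
    # Very rough suffix stripping -- a stand-in for a real stemmer (e.g. Porter)
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = "the cats are running in the garden".split()
filtered = remove_stop_words(tokens)          # drops "the" and "in"
stemmed = [crude_stem(t) for t in filtered]   # e.g. "cats" -> "cat"
```

Both steps shrink the vocabulary, and hence the dimensionality of the term space, without looking at class labels.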
General Techniques
- Subset Selection (“Wrapper Approach”): take a subset of the features, train a model on it, and check whether it performs better
- Feature Filtering: rank features according to some “usefulness” function and keep only the top-ranked ones
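A sketch of the filtering approach with NumPy, using per-feature variance as a stand-in for the "usefulness" function (any scoring function with the same shape would do):

```python
import numpy as np

def filter_features(X, k, score=np.var):
    # Score each feature (column) independently, keep the k highest-scoring ones
    scores = score(X, axis=0)
    top = np.sort(np.argsort(scores)[::-1][:k])  # indices of the k best features
    return X[:, top], top

# Toy data: the middle feature is constant, so variance ranking drops it
X = np.array([[1.0, 0.0, 10.0],
              [2.0, 0.0, 20.0],
              [3.0, 0.0, 30.0]])
X_reduced, kept = filter_features(X, k=2)  # keeps columns 0 and 2
```

Filtering is cheap because each feature is scored on its own; the wrapper approach is more expensive since it retrains a model per candidate subset.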
Feature Extraction
Generate new features based on the original ones
Linear
- Principal Component Analysis (often done via Eigendecomposition or SVD)
- Fisher Discriminant Analysis (sometimes Linear Discriminant Analysis) - supervised technique for Dimensionality Reduction
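A minimal sketch of PCA via SVD with NumPy (as mentioned above, PCA is often computed this way): center the data, then project onto the top-$d$ right singular vectors.

```python
import numpy as np

def pca(X, d):
    # Center the data, then take the top-d right singular vectors as components
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]        # (d, D) orthonormal principal directions
    Z = Xc @ components.T      # (N, d) reduced representation
    return Z, components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # N = 100 examples, D = 5
Z, W = pca(X, d=2)             # aggressiveness D/d = 2.5
```

This is an unsupervised linear mapping; Fisher/Linear Discriminant Analysis would instead use class labels to choose the projection.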
Non-Linear
Links
- http://www.public.asu.edu/~jtang20/publication/feature_selection_for_classification.pdf
- http://www.public.asu.edu/~jtang20/publication/FSClustering.pdf
Sources
- Machine Learning (coursera)
- Machine Learning 1 (TUB)
- Machine Learning 2 (TUB)
- Sebastiani, Fabrizio. “Machine Learning in Automated Text Categorization.” ACM Computing Surveys (2002). [http://arxiv.org/pdf/cs/0110053.pdf]