Kernel Methods
A kernel is a generalized dot product
- Latent Semantic Analysis captures semantic relations between terms
- drawback: computationally expensive
- data items are mapped into some high-dimensional space, where models are constructed using only inner products
- the kernel acts as an “interface” between the model and the high-dimensional space
- it does so by defining (often implicitly) a mapping function that transforms the input space into some (possibly very high-dimensional) feature space
For many algorithms, the dot products between data items are all that is needed.
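To make the idea of an implicit mapping concrete, here is a minimal sketch in plain Python. It assumes 2-dimensional inputs and a homogeneous quadratic kernel $k(\mathbf x, \mathbf z) = \langle \mathbf x, \mathbf z \rangle^2$; the helper names `dot`, `phi` and `k_quadratic` are illustrative and do not come from the sources. The kernel value computed in the input space matches the dot product of the explicitly mapped vectors, so the feature vectors never have to be built.

```python
def dot(x, z):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return [x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2]


def k_quadratic(x, z):
    """Quadratic kernel: evaluated in the input space, no feature vectors built."""
    return dot(x, z) ** 2


x, z = [1.0, 2.0], [3.0, 0.5]
explicit = dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = k_quadratic(x, z)    # same value via the kernel
print(explicit, implicit)       # both are 16 up to floating-point rounding
```

For higher degrees or input dimensions the explicit feature space grows quickly, while the cost of evaluating the kernel stays that of one dot product in the input space.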
Use of Kernels
Classification
Text Mining
Latent Semantic Kernels: Use Latent Semantic Analysis
Clustering Analysis
Regression
Support Vector Regression
- Smola, Alex J., and Bernhard Schölkopf. “A tutorial on support vector regression.” 2004. [http://lasa.epfl.ch/teaching/lectures/ML_Phd/Notes/nu-SVM-SVR.pdf]
Anomaly Detection
Gaussian Processes
Kernel Density Estimation
Kernels
Data Span Solution
Choosing a kernel $\equiv$ choosing a feature space
- $k(\mathbf x, \mathbf z) = \langle \varphi(\mathbf x), \varphi(\mathbf z) \rangle$
- given a dataset, apply $k$ to each pair of data items to get the kernel matrix (also called the Gram matrix)
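Building the kernel matrix can be sketched in a few lines of plain Python. The function name `gram_matrix` and the toy dataset are assumptions made for illustration; the plain dot product stands in for $k$, and any other kernel function with the same signature could be passed instead.

```python
def dot(x, z):
    """Linear kernel: the ordinary dot product."""
    return sum(a * b for a, b in zip(x, z))


def gram_matrix(data, kernel):
    """Return K with K[i][j] = kernel(data[i], data[j])."""
    return [[kernel(x, z) for z in data] for x in data]


# Toy dataset of three 2-D points (made up for this example).
data = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
K = gram_matrix(data, dot)
for row in K:
    print(row)
```

The matrix is symmetric because $k(\mathbf x, \mathbf z) = k(\mathbf z, \mathbf x)$, and for $n$ data items it has $n \times n$ entries regardless of the dimensionality of the feature space.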
If we have a linear learning machine with parameters $\mathbf w$,
- then the function we try to learn is modeled by $f(\mathbf x) = \mathbf w^T \varphi(\mathbf x)$
- we can express $\mathbf w$ as a combination of training points: $\mathbf w = \sum_{i = 1}^{n} \alpha_i \, \varphi(\mathbf x_i)$ (i.e. the solution lies in the span of the data)
- then $f(\mathbf x) = \sum_{i = 1}^{n} \alpha_i \, k(\mathbf x_i, \mathbf x)$
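The equivalence between the two forms of $f$ can be checked numerically. The sketch below assumes the simplest case, where $\varphi$ is the identity and $k$ is the plain dot product; the training points `X` and the coefficients `alpha` are made-up values, not the result of any training procedure.

```python
def dot(x, z):
    """Linear kernel; here phi is the identity map."""
    return sum(a * b for a, b in zip(x, z))


# Hypothetical training points and dual coefficients.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
alpha = [0.5, -1.0, 0.25]

# Primal weights: w = sum_i alpha_i * phi(x_i), with phi(x) = x.
w = [sum(a * x[j] for a, x in zip(alpha, X)) for j in range(2)]


def f_primal(x):
    """f(x) = w^T phi(x)."""
    return dot(w, x)


def f_dual(x, kernel=dot):
    """f(x) = sum_i alpha_i * k(x_i, x); w is never needed."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X))


x_new = [2.0, 1.0]
print(f_primal(x_new), f_dual(x_new))  # the two values agree
```

With a non-linear kernel only `f_dual` remains practical, since $\mathbf w$ lives in the feature space and may be very high-dimensional.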
Mercer Kernels
Types
We can combine Kernels:
- given base kernel $k(\mathbf x, \mathbf z)$ (can be a dot product $k(\mathbf x, \mathbf z) = \langle \mathbf x, \mathbf z \rangle$)
- Polynomial Kernel: $k'(\mathbf x, \mathbf z) = \big( k(\mathbf x, \mathbf z) + D \big)^p$
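- Gaussian Kernel: $k'(\mathbf x, \mathbf z) = \exp \left( - \cfrac{k(\mathbf x, \mathbf x) - 2 \, k(\mathbf x, \mathbf z) + k(\mathbf z, \mathbf z)}{\sigma^2} \right)$, where the numerator is the squared distance $\| \varphi(\mathbf x) - \varphi(\mathbf z) \|^2$ in the feature space of the base kernel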
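Both constructions can be written as functions that wrap a base kernel. This is a minimal sketch with illustrative names (`polynomial_kernel`, `gaussian_kernel`) and arbitrary parameter values. The Gaussian case uses the usual negative exponent, so the kernel decays as the feature-space distance grows.

```python
import math


def dot(x, z):
    """Base kernel: the ordinary dot product."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(base, D=1.0, p=2):
    """Wrap a base kernel: k'(x, z) = (k(x, z) + D) ** p."""
    def k(x, z):
        return (base(x, z) + D) ** p
    return k


def gaussian_kernel(base, sigma=1.0):
    """Wrap a base kernel: k'(x, z) = exp(-||phi(x) - phi(z)||^2 / sigma^2)."""
    def k(x, z):
        # Squared feature-space distance, using only base-kernel evaluations.
        sq_dist = base(x, x) - 2.0 * base(x, z) + base(z, z)
        return math.exp(-sq_dist / sigma ** 2)
    return k


x, z = [1.0, 2.0], [2.0, 0.0]
k_poly = polynomial_kernel(dot, D=1.0, p=2)
k_rbf = gaussian_kernel(dot, sigma=2.0)
print(k_poly(x, z))  # (2 + 1) ** 2
print(k_rbf(x, z))   # exp(-5 / 4)
```

Because the wrappers only call `base`, they can be stacked on any valid kernel, for example a latent semantic kernel in place of `dot`.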
References
- Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. “Kernel methods in machine learning.” 2008. [http://www.kernel-machines.org/publications/pdfs/0701907.pdf]
Sources
- Cristianini, Nello, John Shawe-Taylor, and Huma Lodhi. “Latent semantic kernels.” 2002. [http://eprints.soton.ac.uk/259781/1/LatentSemanticKernals_JIIS_18.pdf]