Kernel Methods
Support Vector Machines
A kernel is a generalized dot product
Kernel Methods
- data items are mapped into some high-dimensional space, where we use the Inner Product to construct models
- the kernel acts as an "interface" between the model and the high-dimensional space
- it works by defining (often implicitly) a mapping function that transforms the input space into some (possibly very high-dimensional) feature space
For many algorithms, access to dot products alone is enough; the sketch below shows a kernel returning the same value as an explicit feature map, without ever constructing that map.
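A minimal illustration (the toy vectors and the helper names `phi` and `k` are chosen for the example): on $\mathbb R^2$, the degree-2 polynomial kernel $\langle \mathbf x, \mathbf z \rangle^2$ equals the dot product of the explicit degree-2 monomial features $\varphi(\mathbf x) = (x_1^2, \sqrt 2 \, x_1 x_2, x_2^2)$:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for x in R^2."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, z):
    """Degree-2 polynomial kernel: same value, no feature map built."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(z)))  # 16.0
print(k(x, z))                 # 16.0
```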
Use of Kernels
- Support Vector Machines
- Latent Semantic Kernels: kernels based on Latent Semantic Analysis [3]
- Support Vector Regression (a usage sketch follows this list)
  - Smola, Alex J., and Bernhard Schölkopf. "A tutorial on support vector regression." 2004. [1]
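A minimal usage sketch for kernelized SVR, assuming scikit-learn is available (the toy data and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# RBF-kernel SVR; epsilon sets the width of the insensitive tube
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)
y_pred = model.predict(X)
```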
Kernels
Solution in the Span of the Data
Choosing a kernel $\equiv$ choosing a feature space
- $k(\mathbf x, \mathbf z) = \langle \varphi(\mathbf x), \varphi(\mathbf z) \rangle$
- given a dataset, applying $k$ to each pair of points gives the Kernel Matrix, also called the Gram Matrix (a sketch of computing it follows the display):
$$K = \begin{bmatrix}
k(\mathbf x_1, \mathbf x_1) & k(\mathbf x_1, \mathbf x_2) & \cdots & k(\mathbf x_1, \mathbf x_n) \\
k(\mathbf x_2, \mathbf x_1) & k(\mathbf x_2, \mathbf x_2) & \cdots & k(\mathbf x_2, \mathbf x_n) \\
\vdots & \vdots & \ddots & \vdots\\
k(\mathbf x_n, \mathbf x_1) & k(\mathbf x_n, \mathbf x_2) & \cdots & k(\mathbf x_n, \mathbf x_n) \\
\end{bmatrix}$$
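A minimal sketch of computing $K$ (the helper name `gram_matrix` and the toy data are assumptions):

```python
import numpy as np

def gram_matrix(X, kernel):
    """Apply the kernel to every pair of rows of X."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])
K = gram_matrix(X, lambda x, z: np.dot(x, z))  # linear kernel
# for a valid (Mercer) kernel, K is symmetric positive semi-definite
print(K)
```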
If we have a linear learning machine with parameters $\mathbf w$,
- then the function we try to learn is modeled by $f(\mathbf x) = \mathbf w^T \varphi(\mathbf x)$
- we can express $\mathbf w$ as a combination of training points: $\mathbf w = \sum_{i = 1}^{n} \alpha_i \, \varphi(\mathbf x_i)$ (i.e. the solution lies in the span of the data)
- then $f(\mathbf x) = \sum_{i = 1}^{n} \alpha_i \, k(\mathbf x_i, \mathbf x)$, so only kernel evaluations are needed (see the sketch below)
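The notes above do not fix a particular algorithm for finding the $\alpha_i$; as one concrete instance, kernel ridge regression gives them in closed form via $(K + \lambda I) \alpha = \mathbf y$. A minimal sketch under that assumption (helper names and toy data are arbitrary):

```python
import numpy as np

def fit_alpha(K, y, lam=1e-3):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def predict(X_train, x, alpha, kernel):
    """f(x) = sum_i alpha_i * k(x_i, x): the learned function is
    expressed entirely through kernel evaluations."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))

def kernel(u, v):
    return (np.dot(u, v) + 1.0) ** 2  # polynomial kernel, D=1, p=2

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])         # y = x^2 at the training points

K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
alpha = fit_alpha(K, y)
print(predict(X, np.array([1.5]), alpha, kernel))  # close to 2.25
```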
Mercer Kernels
A Mercer kernel is a symmetric function whose Gram matrix is positive semi-definite for any finite dataset; exactly such kernels correspond to inner products $\langle \varphi(\mathbf x), \varphi(\mathbf z) \rangle$ in some feature space.
Types
We can combine kernels to construct new ones (a sketch follows this list):
- given a base kernel $k(\mathbf x, \mathbf z)$ (it can be a plain dot product, $k(\mathbf x, \mathbf z) = \langle \mathbf x, \mathbf z \rangle$)
- Polynomial Kernel: $k'(\mathbf x, \mathbf z) = \big( k(\mathbf x, \mathbf z) + D \big)^p$ (offset $D$, degree $p$)
- Gaussian Kernel: $k'(\mathbf x, \mathbf z) = \exp \left( - \cfrac{k(\mathbf x, \mathbf x) - 2 \, k(\mathbf x, \mathbf z) + k(\mathbf z, \mathbf z)}{\sigma^2} \right)$, where the numerator is the squared feature-space distance $\| \varphi(\mathbf x) - \varphi(\mathbf z) \|^2$
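A minimal sketch of both constructions over a plain dot-product base kernel (function names and parameter defaults are arbitrary):

```python
import numpy as np

def base(x, z):
    """Base kernel: a plain dot product."""
    return np.dot(x, z)

def polynomial(x, z, D=1.0, p=3):
    """Polynomial kernel built on the base kernel."""
    return (base(x, z) + D) ** p

def gaussian(x, z, sigma=1.0):
    """Gaussian kernel built on the base kernel; the numerator is
    the squared feature-space distance ||phi(x) - phi(z)||^2."""
    sq_dist = base(x, x) - 2.0 * base(x, z) + base(z, z)
    return np.exp(-sq_dist / sigma**2)

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(polynomial(x, z))  # (0 + 1)^3 = 1.0
print(gaussian(x, z))    # exp(-2) ~ 0.135
```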
References
- Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." 2008. [2]
Sources
- Cristianini, Nello, John Shawe-Taylor, and Huma Lodhi. "Latent semantic kernels." 2002. [3]