ML Wiki

This page is currently a draft. Edit it.

This is a stub. Edit it.

Kernel Methods

Support Vector Machines

Kernel is a generalized dot product

Latent Semantic Analysis captures semantic relations between terms
drawback: computationally expensive

Kernel Methods

data items are mapped into some high-dimensional space where we use Inner Product for constructing models
kernel acts as an "interface" between the model and the high-dimensional space
by defining (often implicitly) a mapping function that transform the input space to some (possibly very high dimensional) feature space

Just dot product is enough for many algorithms:

Use of Kernels

Text Mining

Latent Semantic Kernels: Use Latent Semantic Analysis

Regression

Support Vector Regression

Smola, Alex J., and Bernhard Schölkopf. "A tutorial on support vector regression." 2004. [1]

Kernels

Data Span Solution

Choosing a kernel $\equiv$ choosing a feature space

$k(\mathbf x, \mathbf z) = \langle \varphi(\mathbf x), \varphi(\mathbf z) \rangle$
given a dataset, apply $k$ to each pair and get a Kernel Matrix (also called Gram Matrix)

$$K = \begin{bmatrix} k(\mathbf x_1, \mathbf x_1) & k(\mathbf x_1, \mathbf x_2) & \cdots & k(\mathbf x_1, \mathbf x_n) \\ k(\mathbf x_2, \mathbf x_1) & k(\mathbf x_2, \mathbf x_2) & \cdots & k(\mathbf x_2, \mathbf x_n) \\ \vdots & \vdots & \ddots & \vdots\\ k(\mathbf x_n, \mathbf x_1) & k(\mathbf x_n, \mathbf x_2) & \cdots & k(\mathbf x_n, \mathbf x_n) \\ \end{bmatrix}$$

If we have a linear learning machine with parameters $\mathbf w$,

then the function we try to learn is modeled by $f(\mathbf x) = \mathbf w^T \varphi(\mathbf x)$
we can express $\mathbf w$ as a combination of training points: $\mathbf w = \sum_{i = 1}^{n} \alpha_i \, \varphi(\mathbf x_i)$ (i.e. the solution lies in the span of the data)
then $f(\mathbf x) = \sum_{i = 1}^{n} \alpha_i \, k(\mathbf x_i, \mathbf x)$

Mercer Kernels

Types

We can combine Kernels:

given base kernel $k(\mathbf x, \mathbf z)$ (can be a dot product $k(\mathbf x, \mathbf z) = \langle \mathbf x, \mathbf z \rangle)$)
Polynomial Kernel: $k'(\mathbf x, \mathbf z) = \big( k(\mathbf x, \mathbf z) + D \big)^p$
Gaussian Kernel: $k'(\mathbf x, \mathbf z) = \exp \left( \cfrac{k(\mathbf x, \mathbf x) - 2 \, k(\mathbf x, \mathbf z) + k(\mathbf z, \mathbf z)}{\sigma^2} \right)$

References

Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." 2008. [2]

Sources

Cristianini, Nello, John Shawe-Taylor, and Huma Lodhi. "Latent semantic kernels." 2002. [3]

ML Wiki

Contents

Kernel Methods

Use of Kernels

Classification

Text Mining

Clustering Analysis