# ML Wiki

## Kernel Methods

Kernel is a generalized dot product

Kernel Methods

• data items are mapped into some high-dimensional space where we use Inner Product for constructing models
• kernel acts as an "interface" between the model and the high-dimensional space
• by defining (often implicitly) a mapping function that transform the input space to some (possibly very high dimensional) feature space

Just dot product is enough for many algorithms:

## Use of Kernels

### Regression

Support Vector Regression

• Smola, Alex J., and Bernhard Schölkopf. "A tutorial on support vector regression." 2004. [1]

## Kernels

### Data Span Solution

Choosing a kernel $\equiv$ choosing a feature space

• $k(\mathbf x, \mathbf z) = \langle \varphi(\mathbf x), \varphi(\mathbf z) \rangle$
• given a dataset, apply $k$ to each pair and get a Kernel Matrix (also called Gram Matrix)

$$K = \begin{bmatrix} k(\mathbf x_1, \mathbf x_1) & k(\mathbf x_1, \mathbf x_2) & \cdots & k(\mathbf x_1, \mathbf x_n) \\ k(\mathbf x_2, \mathbf x_1) & k(\mathbf x_2, \mathbf x_2) & \cdots & k(\mathbf x_2, \mathbf x_n) \\ \vdots & \vdots & \ddots & \vdots\\ k(\mathbf x_n, \mathbf x_1) & k(\mathbf x_n, \mathbf x_2) & \cdots & k(\mathbf x_n, \mathbf x_n) \\ \end{bmatrix}$$

If we have a linear learning machine with parameters $\mathbf w$,

• then the function we try to learn is modeled by $f(\mathbf x) = \mathbf w^T \varphi(\mathbf x)$
• we can express $\mathbf w$ as a combination of training points: $\mathbf w = \sum_{i = 1}^{n} \alpha_i \, \varphi(\mathbf x_i)$ (i.e. the solution lies in the span of the data)
• then $f(\mathbf x) = \sum_{i = 1}^{n} \alpha_i \, k(\mathbf x_i, \mathbf x)$

### Types

We can combine Kernels:

• given base kernel $k(\mathbf x, \mathbf z)$ (can be a dot product $k(\mathbf x, \mathbf z) = \langle \mathbf x, \mathbf z \rangle)$)
• Polynomial Kernel: $k'(\mathbf x, \mathbf z) = \big( k(\mathbf x, \mathbf z) + D \big)^p$
• Gaussian Kernel: $k'(\mathbf x, \mathbf z) = \exp \left( \cfrac{k(\mathbf x, \mathbf x) - 2 \, k(\mathbf x, \mathbf z) + k(\mathbf z, \mathbf z)}{\sigma^2} \right)$

## References

• Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." 2008. [2]

## Sources

• Cristianini, Nello, John Shawe-Taylor, and Huma Lodhi. "Latent semantic kernels." 2002. [3]