# ML Wiki

## Comparing Learning Algorithms

This section covers how to compare two learning algorithms (rather than two fixed classifiers) using $K$-fold cross-validation and a paired $t$-test.

### Comparing Algorithms

What if we want to compare

• two learning algorithms $A_1$ and $A_2$
• not just two specific classifiers $C_1$ and $C_2$

We want to measure the expected difference in error

• for $R \subset D$ where $D$ is the population, we want to measure
• $E \Big[ \text{error} \big(A_1(R), D \big) - \text{error} \big(A_2(R), D \big) \Big]$
• with $\text{error} \big(A_1(R), D \big)$ being the true error

How to estimate it?

• average the results over many different training sets
• ideally all of these sets should be independent of each other
• but in practice we usually use K-Fold Cross-Validation

### $K$-Fold Cross-Validation

Estimation:

• given a data set $S$
• for $k = 1..K$,
• let $T_k$ be the held-out validation fold,
• and $R_k = S - T_k$ be the training set
• train $C_1 = A_1(R_k)$ and $C_2 = A_2(R_k)$
• let $\delta_k = \text{error}(C_1, T_k) - \text{error}(C_2, T_k)$
• let $\delta^* = \cfrac{1}{K} \sum_{k = 1}^K \delta_k$
• this is the estimate of the expected error $E \Big[ \text{error} \big(A_1(R), D \big) - \text{error} \big(A_2(R), D \big) \Big]$
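The estimation procedure above can be sketched in a few lines of Python. The function and variable names (`k_fold_delta`, the toy `majority` and `always_zero` "algorithms", and the synthetic data set) are all hypothetical, chosen only to make the sketch self-contained; here an "algorithm" is any function that maps a training set to a classifier.

```python
import random

def k_fold_delta(data, A1, A2, K=5, seed=0):
    """Estimate delta* = (1/K) * sum_k [error(A1(R_k), T_k) - error(A2(R_k), T_k)].

    data: list of (x, y) pairs; A1, A2: functions mapping a training
    set to a classifier (a function x -> label).
    """
    rng = random.Random(seed)
    S = data[:]
    rng.shuffle(S)
    folds = [S[i::K] for i in range(K)]  # K roughly equal folds
    deltas = []
    for k in range(K):
        T_k = folds[k]                                    # validation fold
        R_k = [p for j, f in enumerate(folds) if j != k for p in f]
        C1, C2 = A1(R_k), A2(R_k)                         # train both algorithms
        err = lambda C: sum(C(x) != y for x, y in T_k) / len(T_k)
        deltas.append(err(C1) - err(C2))                  # delta_k
    return deltas, sum(deltas) / K                        # delta*

# Toy "algorithms": predict the training majority class vs. always predict 0.
def majority(train):
    labels = [y for _, y in train]
    m = max(set(labels), key=labels.count)
    return lambda x: m

def always_zero(train):
    return lambda x: 0

data = [(i, 1 if i % 3 else 0) for i in range(30)]  # synthetic labeled points
deltas, delta_star = k_fold_delta(data, majority, always_zero, K=5)
```

A negative $\delta^*$ here indicates that the first algorithm has the lower estimated error.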

### $K$-Fold CV Paired $t$-Test

Let's conduct a statistical test of significance:

• assume (under the null hypothesis) that $A_1$ and $A_2$ have equal expected accuracy
• $t = \cfrac{\delta^*}{\sigma}$ follows the Student distribution with $K-1$ degrees of freedom
• $\delta^* = \cfrac{1}{K} \sum_{k = 1}^K \delta_k$ - the estimate of the expected error
• $\sigma = \sqrt{ \cfrac{1}{K \cdot (K - 1)} \sum_{k = 1}^K (\delta_k - \delta^*)^2 }$

Test:

• the null hypothesis $H_0$: $A_1$ and $A_2$ have equal expected error rates
• $H_0$ is not rejected at the $100 \cdot (1 - \alpha)\%$ confidence level
• if $| t | < t_{1 - \alpha / 2, K - 1}$
• where $t_{1 - \alpha / 2, K - 1}$ is the $1 - \alpha / 2$ percentile of the Student distribution with $K-1$ degrees of freedom
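The test statistic is easy to compute directly from the per-fold differences. A minimal sketch (the function name `paired_t_statistic` is my own; the critical value is taken from a standard $t$-table rather than computed, to avoid a SciPy dependency):

```python
import math

def paired_t_statistic(deltas):
    """t = delta* / sigma for the K-fold CV paired t-test."""
    K = len(deltas)
    delta_star = sum(deltas) / K
    sigma = math.sqrt(sum((d - delta_star) ** 2 for d in deltas)
                      / (K * (K - 1)))
    return delta_star / sigma

# Per-fold error differences delta_k for K = 5 folds
# (the numbers from Example 1 below):
deltas = [-0.01, -0.01, 0.01, 0.02, -0.06]
t = paired_t_statistic(deltas)
# |t| ≈ 0.73 < t_{0.975, 4} = 2.776 (from a t-table), so the null
# hypothesis of equal error rates is not rejected at the 95% level.
```

With SciPy available, the critical value could instead be obtained as `scipy.stats.t.ppf(0.975, K - 1)`.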

## Examples

### Example 1

| $k$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $\text{error}(A_1, T_k)$ | 0.12 | 0.15 | 0.14 | 0.16 | 0.11 |
| $\text{error}(A_2, T_k)$ | 0.13 | 0.16 | 0.13 | 0.14 | 0.17 |
| $\delta_k$ | $-0.01$ | $-0.01$ | $0.01$ | $0.02$ | $-0.06$ |
| $(\delta_k - \delta^*)^2$, with $\delta^* = -0.01$ | 0.0000 | 0.0000 | 0.0004 | 0.0009 | 0.0025 |

This gives $\cfrac{\delta^*}{\sigma} \approx -0.73$.

So,

• $\left| \cfrac{\delta^*}{\sigma} \right| = 0.73$ and $t_{0.975, 4} = 2.78$
• since $0.73 < 2.78$, the null hypothesis of equal error rates is not rejected at the 95% confidence level

### Example 2

• given two algorithms $A_1$ and $A_2$
• 10 rounds are performed
Obtained error rates:

| $k$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $A_1$ | 0.305 | 0.322 | 0.207 | 0.206 | 0.31 | 0.41 | 0.277 | 0.26 | 0.215 | 0.26 |
| $A_2$ | 0.224 | 0.145 | 0.224 | 0.196 | 0.207 | 0.204 | 0.221 | 0.194 | 0.162 | 0.35 |

So we calculate and obtain the following table

| $k$ | $A_1$ | $A_2$ | $\delta_k$ | $(\delta_k - \delta^*)^2$ |
|---|---|---|---|---|
| 1 | 0.305 | 0.224 | 0.081 | 0.00027225 |
| 2 | 0.322 | 0.145 | 0.177 | 0.01265625 |
| 3 | 0.207 | 0.224 | $-0.017$ | 0.00664225 |
| 4 | 0.206 | 0.196 | 0.01 | 0.00297025 |
| 5 | 0.31 | 0.207 | 0.103 | 0.00148225 |
| 6 | 0.41 | 0.204 | 0.206 | 0.02002225 |
| 7 | 0.277 | 0.221 | 0.056 | 0.00007225 |
| 8 | 0.26 | 0.194 | 0.066 | 0.00000225 |
| 9 | 0.215 | 0.162 | 0.053 | 0.00013225 |
| 10 | 0.26 | 0.35 | $-0.09$ | 0.02387025 |

From the table:

• $\delta^* = \cfrac{1}{10} \sum \delta_k = 0.0645$
• $\sigma^2 = \cfrac{1}{9 \cdot 10} \sum (\delta_k - \delta^*)^2 \approx 0.0007569$

$t = \cfrac{\delta^*}{\sigma} = \cfrac{0.0645}{\sqrt{0.0007569}} \approx 2.34$

• we have $K - 1 = 9$ degrees of freedom
• for significance level $\alpha = 0.05$ $\Rightarrow$ we need the $1 - \alpha/2 = 0.975$ percentile
• the critical value is $t_{0.975, 9} = 2.26$
• $t = 2.34 \geqslant 2.26 = t_{0.975, 9}$
• $\Rightarrow$ the hypothesis that $A_1$ and $A_2$ have the same error rate is rejected
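The whole computation for Example 2 can be replayed in a few lines of Python (the critical value 2.262 is taken from a $t$-table; with SciPy it would be `scipy.stats.t.ppf(0.975, 9)`):

```python
import math

# Per-fold error rates of A1 and A2 from Example 2 (K = 10)
e1 = [0.305, 0.322, 0.207, 0.206, 0.31, 0.41, 0.277, 0.26, 0.215, 0.26]
e2 = [0.224, 0.145, 0.224, 0.196, 0.207, 0.204, 0.221, 0.194, 0.162, 0.35]

deltas = [a - b for a, b in zip(e1, e2)]              # delta_k
K = len(deltas)
delta_star = sum(deltas) / K                          # 0.0645
sigma = math.sqrt(sum((d - delta_star) ** 2 for d in deltas)
                  / (K * (K - 1)))                    # ≈ 0.0275
t = delta_star / sigma                                # ≈ 2.34
# t_{0.975, 9} = 2.262; since |t| > 2.262, the null hypothesis
# of equal error rates is rejected at the 95% level.
reject = abs(t) >= 2.262
```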