ML Wiki

Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Learning Curves

Learning curves are a useful technique (part of Machine Learning Diagnosis) to:

sanity-check a model
find out how to improve its performance

A “learning curve” is a plot of two functions of $m$, the training set size:

the training set error $J_{\text{train}}(\theta)$
the cross-validation error $J_{\text{cv}}(\theta)$
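Written out, assuming the squared-error cost used in the Coursera course listed under Sources ($m_{\text{cv}}$, the size of the cross-validation set, is notation added here), the two errors are:

$$J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2$$

$$J_{\text{cv}}(\theta) = \frac{1}{2m_{\text{cv}}} \sum_{i=1}^{m_{\text{cv}}} \left( h_{\theta}(x^{(i)}_{\text{cv}}) - y^{(i)}_{\text{cv}} \right)^2$$

In the course, both are measured without any regularization term, even if $\theta$ itself was learned with regularization.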

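To draw the curve, we artificially reduce the training set size: we train on $m = 1$ example, then on $m = 2$ examples, and so on.

For each $m$, $J_{\text{train}}(\theta)$ is computed only on the $m$ examples used for training, while $J_{\text{cv}}(\theta)$ is always computed on the whole cross-validation set.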

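For example, suppose we have the following model:

$h_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2$

For each $m$ we fit $\theta$, calculate $J_{\text{train}}(\theta)$ and $J_{\text{cv}}(\theta)$, and plot both values against $m$. The result is the learning curve of the model.
For $m \le 3$ the quadratic can pass through every training point, so $J_{\text{train}}(\theta)$ starts at 0 and grows with $m$, while $J_{\text{cv}}(\theta)$ starts high and decreases.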
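The procedure above can be sketched in plain Python (standard library only). This is a minimal sketch, not the course's implementation: the noisy quadratic data, the helper names `fit_poly`, `cost` and `learning_curve`, and the tiny ridge term `lam` (which keeps the least-squares system solvable while $m$ is smaller than the number of parameters) are all assumptions.

```python
import random


def fit_poly(xs, ys, degree, lam=1e-6):
    """Least squares for h(x) = theta_0 + theta_1 x + ... + theta_d x^d.

    Solves (X^T X + lam * I) theta = X^T y by Gaussian elimination.
    The tiny ridge term lam is an assumption: it keeps the system
    solvable when m < degree + 1 (e.g. m = 1, 2 for a quadratic).
    """
    n = degree + 1
    rows = [[x ** k for k in range(n)] for x in xs]
    # Augmented matrix [X^T X + lam*I | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    for c in range(n):  # forward elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= factor * a[c][k]
    theta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        theta[i] = (a[i][n] - sum(a[i][k] * theta[k] for k in range(i + 1, n))) / a[i][i]
    return theta


def cost(theta, xs, ys):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    def h(x):
        return sum(t * x ** k for k, t in enumerate(theta))
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def learning_curve(train_x, train_y, cv_x, cv_y, degree):
    """For m = 1, 2, ...: fit on the first m examples, then measure
    J_train on those m examples and J_cv on the whole CV set."""
    curve = []
    for m in range(1, len(train_x) + 1):
        theta = fit_poly(train_x[:m], train_y[:m], degree)
        curve.append((m, cost(theta, train_x[:m], train_y[:m]), cost(theta, cv_x, cv_y)))
    return curve


def true_f(x):
    # Hypothetical underlying function: a quadratic the model can represent
    return 0.5 + x - 2 * x ** 2


rng = random.Random(0)
train_x = [rng.uniform(-1, 1) for _ in range(30)]
train_y = [true_f(x) + rng.gauss(0, 0.1) for x in train_x]
cv_x = [rng.uniform(-1, 1) for _ in range(50)]
cv_y = [true_f(x) + rng.gauss(0, 0.1) for x in cv_x]

curve = learning_curve(train_x, train_y, cv_x, cv_y, degree=2)
for m, j_train, j_cv in curve[:5] + curve[-1:]:
    print(f"m={m:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

Plotting the `J_train` and `J_cv` columns against `m` gives the learning curve. At $m = 1$ the quadratic passes (almost) exactly through its single training point, so `J_train` starts near 0, while `J_cv` starts far above its value at $m = 30$.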

Diagnose High Bias (Underfitting)

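Suppose we fit a straight line to our data: $h_{\theta}(x) = \theta_0 + \theta_1 x$

As $m$ increases, the fitted line stays pretty much the same: a straight line cannot capture the non-linear structure of the data, no matter how many examples it sees.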

If we draw the learning curves, we see that:

as $m$ grows, $J_{\text{train}}(\theta)$ rises and $J_{\text{cv}}(\theta)$ falls until both flatten out close together, i.e. $J_{\text{cv}}(\theta) \to J_{\text{train}}(\theta)$
both errors remain high

$\Rightarrow$ If a learning algorithm is suffering from high bias, getting more training examples will not (by itself) help much
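A minimal sketch of this effect, under assumptions of our own: the hypothetical data is noise-free $y = x^2$ on two fixed grids, and `fit_line` and `cost` are illustrative helpers using the closed-form least-squares line. The curve starts at $m = 2$ because a line through a single point is not unique.

```python
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of h(x) = theta0 + theta1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1


def cost(theta, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


# Hypothetical noise-free data y = x^2, so a straight line must underfit.
train_x = [(2 * j + 1) / 40 - 1 for j in range(40)]  # 40 points in (-1, 1)
random.Random(0).shuffle(train_x)                    # so prefixes spread over the range
train_y = [x ** 2 for x in train_x]
cv_x = [k / 20 - 1 for k in range(41)]               # 41 points in [-1, 1]
cv_y = [x ** 2 for x in cv_x]

curve = []
for m in range(2, len(train_x) + 1):  # a line needs at least 2 distinct points
    theta = fit_line(train_x[:m], train_y[:m])
    curve.append((m, cost(theta, train_x[:m], train_y[:m]), cost(theta, cv_x, cv_y)))

for m, j_train, j_cv in curve[::8] + curve[-1:]:
    print(f"m={m:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

At $m = 40$ both errors lie between 0.04 and 0.05 and are less than 0.01 apart, even though a quadratic could fit this noise-free data exactly: the remaining error comes from the model, not from a lack of data.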

Diagnose High Variance (Overfitting)

Now suppose the model is a polynomial of very high order: $h_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_{100} x^{100}$

at the beginning (small $m$) the model overfits badly
as we increase $m$, it is still able to fit the training data well

So as $m$ increases:

$J_{\text{train}}(\theta)$ increases (the more data we have, the harder it is to fit $h_{\theta}(x)$ to all of it), but only very slowly, and it stays low
$J_{\text{cv}}(\theta)$, on the other hand, decreases, but also very slowly
there is a huge gap between the two errors
closing that gap requires many more training examples

$\Rightarrow$ If a learning algorithm is suffering from high variance (i.e. it overfits), getting more training data is likely to help
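The two diagnoses can be condensed into a rule of thumb applied to the last point of the curves. This is a minimal sketch: `diagnose`, `target_error` and `gap_tolerance` are hypothetical names, and the two thresholds are problem-specific values you would have to choose yourself, since the source gives no numbers.

```python
def diagnose(j_train, j_cv, target_error, gap_tolerance):
    """Classify the final point of a learning curve.

    target_error: hypothetical error level you would accept.
    gap_tolerance: hypothetical largest train/CV gap still considered small.
    """
    gap = j_cv - j_train
    if j_train > target_error:
        # Training error itself is too high: the model underfits.
        return "high bias" if gap <= gap_tolerance else "high bias and high variance"
    # Training error is fine; a large gap to the CV error means overfitting.
    return "high variance" if gap > gap_tolerance else "no problem detected"


# Usage with made-up error values:
print(diagnose(0.20, 0.22, target_error=0.05, gap_tolerance=0.05))  # high bias
print(diagnose(0.01, 0.30, target_error=0.05, gap_tolerance=0.05))  # high variance
```

For "high bias", more data alone is unlikely to help; for "high variance", collecting more training examples is a reasonable next step.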

Sources

Machine Learning (coursera)
