
Suppose you have created a model, but when you test it, you find that it makes large errors on new examples.

What should you try?

- Get more training examples
- Try smaller set of features
- Try getting additional features
- Try adding polynomial features (beware of Overfitting!)
- Try increasing regularization parameter $\lambda$
- Try decreasing $\lambda$

*Diagnosis* - a test you can run to gain insight into what is and is not working in a learning algorithm, and guidance on how to improve its performance.

To test whether we overfit, we can perform Cross-Validation:

- train the model on the training set
- evaluate the model on a held-out test set
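The two steps above can be sketched with plain NumPy; the data, the 70/30 split ratio, and the linear model are all illustrative assumptions, not prescribed by the course:

```python
import numpy as np

# Hypothetical 1-D regression data (all names and values are illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=100)
y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=100)

# Shuffle, then hold out 30% of the data as the test set.
idx = rng.permutation(X.size)
split = int(0.7 * X.size)
train, test = idx[:split], idx[split:]

# Train: fit a straight line on the training set only.
theta1, theta0 = np.polyfit(X[train], y[train], deg=1)

# Evaluate: compare mean squared error on both sets.
def mse(xs, ys):
    return np.mean((theta1 * xs + theta0 - ys) ** 2)

print(f"train error: {mse(X[train], y[train]):.3f}")
print(f"test  error: {mse(X[test], y[test]):.3f}")
```

A large gap between the two printed errors would be the first hint of overfitting.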

The main sources of problems are:

- high bias (underfit)
  - the tendency to consistently learn the same wrong thing
  - you're always missing in the same way
- high variance (Overfitting)
  - the tendency to learn random things irrespective of the real signal
  - you depend too much on the particular training data

Dart-throwing illustration: with high bias the darts consistently land away from the bull's-eye in the same direction; with high variance they scatter widely around it.

How do we distinguish between them and tell which one we are experiencing?

- Suppose we want to fit the parameter $d$ - the degree of polynomial to use
- with $d = 1$ we underfit
- with $d = 2$ we are just right
- with $d = 4$ we overfit
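A minimal sketch of this effect, assuming toy data generated from a quadratic (so $d = 2$ is "just right"); note that training error alone keeps shrinking as $d$ grows, which is why it cannot detect overfitting by itself:

```python
import numpy as np

# Toy data: a quadratic plus noise (illustrative values).
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 15)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=x.size)

train_mse = {}
for d in (1, 2, 4):
    coeffs = np.polyfit(x, y, deg=d)       # least-squares polynomial fit
    residuals = np.polyval(coeffs, x) - y
    train_mse[d] = np.mean(residuals**2)
    print(f"d = {d}: training MSE = {train_mse[d]:.4f}")
```

The training MSE is non-increasing in $d$ because each higher-degree model contains the lower-degree one as a special case.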

We can plot the cost function errors vs degree of polynomial $d$ for

- the training set $J_{\text{train}}(\theta)$
- the cross-validation (or test) set $J_{\text{cv}}(\theta)$

In the case of high bias (underfit):

- both $J_{\text{train}}(\theta)$ and $J_{\text{cv}}(\theta)$ are high
- and $J_{\text{train}}(\theta) \approx J_{\text{cv}}(\theta)$

In the case of high variance (overfit):

- $J_{\text{train}}(\theta)$ is low, but $J_{\text{cv}}(\theta)$ is high
- and $J_{\text{cv}}(\theta) \gg J_{\text{train}}(\theta)$ (much greater)
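One way to see this numerically is to compute $J_{\text{train}}$ and $J_{\text{cv}}$ for several degrees and compare; the data, the 50/50 split, and the particular degrees here are invented for illustration:

```python
import numpy as np

# Toy nonlinear data (illustrative).
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 40)
y = np.sin(1.5 * x) + rng.normal(scale=0.2, size=x.size)

# Split 50/50 into training and cross-validation sets.
idx = rng.permutation(x.size)
tr, cv = idx[:20], idx[20:]

def J(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

errors = {}
for d in (1, 3, 8):
    c = np.polyfit(x[tr], y[tr], deg=d)
    errors[d] = (J(c, x[tr], y[tr]), J(c, x[cv], y[cv]))
    print(f"d = {d}: J_train = {errors[d][0]:.3f}, J_cv = {errors[d][1]:.3f}")
```

Low $d$ tends to show both errors high and close together (bias); high $d$ tends to show a widening gap between them (variance).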

When we try to find the best Regularization parameter for a hypothesis we get similar curves:

- with small $\lambda$ we have high variance
- with large $\lambda$ we have high bias
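This sweep can be sketched with ridge-style regularized least squares. The closed form $\theta = (X^TX + \lambda I)^{-1}X^Ty$ is standard, but the data, the feature degree, and the $\lambda$ grid are invented, and for simplicity the intercept is regularized too (the course excludes $\theta_0$ from the penalty):

```python
import numpy as np

# Toy data and a deliberately flexible polynomial model (illustrative).
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 30)
y = np.sin(1.5 * x) + rng.normal(scale=0.2, size=x.size)

idx = rng.permutation(x.size)
tr, cv = idx[:15], idx[15:]

def design(xs, degree=8):
    # Polynomial feature matrix, flexible enough to overfit 15 points.
    return np.vander(xs, degree + 1)

def fit_ridge(xs, ys, lam):
    # Closed-form L2-regularized least squares; this also penalizes the
    # intercept column, a simplification over the course's formulation.
    X = design(xs)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ ys)

def J(theta, xs, ys):
    return np.mean((design(xs) @ theta - ys) ** 2)

for lam in (0.0, 0.01, 1.0, 100.0):
    theta = fit_ridge(x[tr], y[tr], lam)
    print(f"lambda = {lam:6g}: J_train = {J(theta, x[tr], y[tr]):.3f}, "
          f"J_cv = {J(theta, x[cv], y[cv]):.3f}")
```

Tiny $\lambda$ leaves the flexible model free to overfit; huge $\lambda$ shrinks the weights toward zero and underfits.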

Learning Curves are a technique we can use to:

- sanity-check our algorithm, or
- decide how to improve performance, by
  - diagnosing high bias (underfit)
  - diagnosing high variance (overfit)
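A learning curve can be sketched by training on growing subsets of the training set and recording both errors; everything here - data, model, subset sizes - is an illustrative assumption:

```python
import numpy as np

# Toy linear data; a linear model matches it well, so we expect
# J_train and J_cv to converge as the training set grows.
rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 120)
y = 0.5 * x + rng.normal(scale=0.4, size=x.size)

idx = rng.permutation(x.size)
tr, cv = idx[:80], idx[80:]

def J(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

curve = {}
for m in (5, 20, 80):                 # growing training-set sizes
    sub = tr[:m]
    c = np.polyfit(x[sub], y[sub], deg=1)
    curve[m] = (J(c, x[sub], y[sub]), J(c, x[cv], y[cv]))
    print(f"m = {m:3d}: J_train = {curve[m][0]:.3f}, J_cv = {curve[m][1]:.3f}")
```

With high bias, both curves flatten out at a high error and more data does not help; with high variance there is a persistent gap that more data tends to narrow.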

So, depending on which kind of problem we have, we decide what to try next.

To fix high variance:

- Get more training examples
- Try smaller set of features
- Try increasing regularization parameter $\lambda$

To fix high bias:

- Try getting additional features
- Try adding polynomial features (beware of Overfitting!)
- Try decreasing regularization parameter $\lambda$
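The two checklists can be folded into a toy decision helper; the thresholds below are arbitrary illustrative choices, not values from the course:

```python
def diagnose(j_train, j_cv, baseline):
    """Crude bias/variance diagnosis from training and cross-validation
    error; `baseline` is what we would consider an acceptable error."""
    # Both errors high and close together -> underfitting.
    if j_train > baseline and j_cv - j_train < 0.5 * j_train:
        return "high bias: add features, add polynomial terms, decrease lambda"
    # Large gap between training and cross-validation error -> overfitting.
    if j_cv > 2.0 * j_train:
        return "high variance: get more data, fewer features, increase lambda"
    return "looks ok"

print(diagnose(j_train=2.0, j_cv=2.2, baseline=0.5))   # high-bias case
print(diagnose(j_train=0.1, j_cv=1.5, baseline=0.5))   # high-variance case
```

In practice one would look at the full learning curves rather than two numbers, but the decision logic is the same.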

- Machine Learning (Coursera)
- Domingos, Pedro. "A Few Useful Things to Know about Machine Learning." Communications of the ACM 55.10 (2012).