|
|
Line 82: |
Line 82: |
| * Get more training examples | | * Get more training examples |
| * Try smaller set of features | | * Try smaller set of features |
− | * Try decreasing [[Regularization|regularization parameter]] $\lambda$ | + | * Try increasing [[Regularization|regularization parameter]] $\lambda$ |
| | | |
| To fix high bias: | | To fix high bias: |
| * Try getting additional features | | * Try getting additional features |
| * Try adding polynomial features (beware of [[Overfitting]]!) | | * Try adding polynomial features (beware of [[Overfitting]]!) |
− | * Try increasing [[Regularization|regularization parameter]] $\lambda$ | + | * Try decreasing [[Regularization|regularization parameter]] $\lambda$ |
| | | |
| | | |
Latest revision as of 22:34, 18 December 2017
Machine Learning Diagnosis
Suppose you created a model, but when you tested it, you found that it makes large errors
What should you try?
- Get more training examples
- Try smaller set of features
- Try getting additional features
- Try adding polynomial features (beware of Overfitting!)
- Try increasing regularization parameter $\lambda$
- Try decreasing $\lambda$
Diagnosis - a test that you can run to gain insights what is working with the learning algorithms and what is not, and gain guidance as how to improve the performance.
Evaluating a Hypothesis
To test if we overfit, we can perform Cross-Validation:
- train the model on the training set
- check the model on the test set
Diagnosing Bias vs Variance
the main sources of problems are
- high bias (underfit)
- tendency to constantly learn the same wrong thing
- you're always missing in the same way
- high variance (Overfitting)
- tendency to output random things irrespective to the input data
- you depend too much on the training data
Dart throwing illustration:
Fitting Polynomial
How to distinguish between them and say which one of them we experience?
- Suppose we want to fit parameter $d$ - what degree of polynomial to use (see here)
- with $d = 1$ we underfit
- with $d = 2$ we are just right
- with $d = 4$ we overfit
We can plot the cost function errors vs degree of polynomial $d$ for
- the training set $J_{\text{train}}(\theta)$
- the cross-validation (or test) set $J_{\text{cv}}(\theta)$
in case of bias (underfit) we have
- both $J_{\text{train}}(\theta)$ and $J_{\text{cv}}(\theta)$ are high
- and $J_{\text{train}}(\theta) \approx J_{\text{cv}}(\theta)$
in case of variance (overfit)
- $J_{\text{train}}(\theta)$ is low, $but J_{\text{cv}}(\theta)$ is high
- and $J_{\text{cv}}(\theta) \gg J_{\text{train}}(\theta)$ (much greater)
Fitting Regularization Parameter
When we try to find the best Regularization parameter for a hypothesis we get similar curves:
-
- with small $\lambda$ we have high variance
- with large $\lambda$ we have high bias
Learning Curves is a technique that is used to
What To Do Next?
So, depending on what kind of problem we have, we should decide what to do next
To fix high variance:
To fix high bias:
See Also
Sources