Overfitting

Overfitting (or high variance): if we have too many features, the learned hypothesis may

  • fit the training set very well (with cost function $J(\theta) \approx 0$),
  • but fail to generalize to new examples (i.e., make poor predictions on new data)


Generalization Error

Cross-Validation

The best way to check whether a model overfits:

  • split the data into a training set and a test set
  • train the model on the training set
  • evaluate the model on the training set
  • evaluate the model on the test set
  • generalization error: the difference between the two results; it measures the model's ability to generalize


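Overfitting becomes clearly visible when we plot the training and test errors against model complexity: the training error keeps decreasing, while the test error typically starts to grow, so the generalization error widens.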
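The steps above can be sketched in Python with NumPy. This is a minimal illustration under assumptions not in the original: the data is a made-up noisy sine curve, the "models" are polynomials of increasing degree fitted with `np.polyfit`, and the 20/10 split and the helper `mse` are hypothetical choices. Printing the errors per degree gives the numbers one would plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a noisy sine curve (not from the original notes)
x = rng.uniform(0, 1, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Split the data: 20 random points for training, the remaining 10 for testing
idx = rng.permutation(len(x))
train, test = idx[:20], idx[20:]

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with coefficients `coeffs`."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

results = {}
for degree in range(1, 10):
    # Train on the training set only
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    # Evaluate on both sets; the gap between them is the generalization error
    train_err = mse(coeffs, x[train], y[train])
    test_err = mse(coeffs, x[test], y[test])
    results[degree] = (train_err, test_err)
    print(f"degree={degree}  train={train_err:.4f}  "
          f"test={test_err:.4f}  gap={test_err - train_err:.4f}")
```

Because each polynomial contains the lower-degree ones, the training error can only go down as the degree grows, while the test error typically stops improving; a widening gap is the signature of overfitting.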


High Variance vs High Bias

Generalization error can be decomposed into bias and variance (plus irreducible noise in the data):

  • bias: tendency to consistently learn the same wrong thing
  • variance: tendency to learn random things irrespective of the real signal
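For squared-error loss the decomposition can be written out explicitly. Assume (this setup is not stated in the original) that $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean 0 and variance $\sigma^2$ and is independent of the training set, and let $h_D$ be the hypothesis learned from a random training set $D$. Then at a fixed input $x$:

$$
\mathbb{E}_{D,\varepsilon}\left[(y - h_D(x))^2\right]
= \underbrace{\left(f(x) - \mathbb{E}_D[h_D(x)]\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\left[\left(h_D(x) - \mathbb{E}_D[h_D(x)]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$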

Dart-throwing illustration (figure: high-variance-bias.png)


Underfitting

  • high bias, low variance
  • you're always missing in the same way

Example: a model that

  • always predicts the same value
  • is completely insensitive to the data
  • has very low variance (exactly 0)
  • but has high bias: it is consistently wrong
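This example can be checked by simulation: draw many training sets, fit each model on every set, and measure bias and variance of the predictions at one input. A minimal NumPy sketch, assuming a made-up "real signal" $\sin(2\pi x)$ with Gaussian noise, a constant model that always predicts 0, and a degree-9 polynomial as a flexible model for comparison; `true_f` and `bias2_and_variance` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    """Assumed 'real signal' for this illustration."""
    return np.sin(2 * np.pi * x)

x0 = 0.25                    # input at which bias and variance are measured
n_sets, n_points, noise = 500, 20, 0.2

preds_const, preds_poly = [], []
for _ in range(n_sets):
    # Draw a fresh training set from the same distribution each time
    x = rng.uniform(0, 1, size=n_points)
    y = true_f(x) + rng.normal(scale=noise, size=n_points)
    # Underfitting model: always predicts 0, whatever the training data is
    preds_const.append(0.0)
    # Flexible model for comparison: a degree-9 polynomial fitted to this set
    preds_poly.append(np.polyval(np.polyfit(x, y, deg=9), x0))

def bias2_and_variance(preds):
    """Squared bias and variance of the predictions at x0 across training sets."""
    preds = np.asarray(preds)
    bias2 = (preds.mean() - true_f(x0)) ** 2
    return float(bias2), float(preds.var())

bias2_const, var_const = bias2_and_variance(preds_const)
bias2_poly, var_poly = bias2_and_variance(preds_poly)
print(f"constant:  bias^2={bias2_const:.3f}  variance={var_const:.3f}")
print(f"degree 9:  bias^2={bias2_poly:.3f}  variance={var_poly:.3f}")
```

The constant model has zero variance but a squared bias of 1 at $x_0 = 0.25$ (where the assumed signal equals 1); the polynomial has much smaller bias but nonzero variance.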


Examples

Multivariate Linear Regression

Suppose we have the following data set:

(figure: overfit-dataset-lin.png)

We can fit several Multivariate Linear Regression models to it, treating the powers of $x$ as separate features:

  • linear: $\theta_0 + \theta_1 x$, likely to underfit (high bias)
  • quadratic: $\theta_0 + \theta_1 x + \theta_2 x^2$
  • extreme: $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$, likely to overfit (high variance)

(figure: overfit-dataset-lin-ex.png)
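Since the powers of $x$ are just extra features, all three models are ordinary linear regression on different design matrices. A minimal NumPy sketch, assuming a small made-up data set and a hypothetical helper `design_matrix`; `np.linalg.lstsq` finds the $\theta$ minimizing the squared-error cost $J(\theta)$ (the same solution the normal equation gives when $X^TX$ is invertible).

```python
import numpy as np

# Hypothetical 1-D data set (made up for illustration)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([1.0, 2.1, 2.9, 3.4, 3.6, 3.5, 3.2, 2.4])

def design_matrix(x, degree):
    """Columns 1, x, x^2, ..., x^degree: polynomial terms become extra features."""
    return np.vander(x, N=degree + 1, increasing=True)

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((X theta - y)^2)."""
    m = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * m)

costs = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("extreme", 4)]:
    X = design_matrix(x, degree)
    # Least-squares solution of X theta ~= y
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    costs[name] = cost(theta, X, y)
    print(f"{name:9s} theta={np.round(theta, 3)}  J={costs[name]:.4f}")
```

Because each model contains the previous one, the training cost can only decrease from linear to quadratic to the extreme model, so a lower $J(\theta)$ on the training set alone does not tell us which model generalizes best.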


Logistic Regression

The same applies to Logistic Regression.

Suppose we have the following data set:

(figure: overfit-dataset.png)

  • We may underfit with a straight-line decision boundary
    • $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
  • A quadratic boundary may fit just right, even though it misses some positive examples
    • $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2)$
  • Or we may overfit using a high-order polynomial
    • $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 + \theta_6 x_1^2 x_2 + \theta_7 x_1 x_2^2 + \theta_8 x_1^2 x_2^2 + \theta_9 x_1^3 + ...)$

(figure: overfitting-logreg-ex.png)
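The higher-order logistic regression models are again just a feature mapping followed by $g$. A minimal NumPy sketch, assuming a hypothetical helper `map_feature` that generates every monomial $x_1^i x_2^j$ with $i + j \le d$ (its column order differs slightly from the formulas above). The number of features grows as $(d+1)(d+2)/2$: 3 for a line, 6 for the quadratic boundary, already 28 for degree 6.

```python
import numpy as np

def map_feature(x1, x2, degree):
    """Map two inputs to all monomials x1^i * x2^j with i + j <= degree.

    The first column (i = j = 0) is the constant term multiplying theta_0.
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    columns = []
    for total in range(degree + 1):      # total degree of the monomial
        for j in range(total + 1):       # power of x2
            columns.append(x1 ** (total - j) * x2 ** j)
    return np.column_stack(columns)

def sigmoid(z):
    """Logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Two made-up examples
x1 = np.array([0.5, -1.0])
x2 = np.array([2.0, 0.3])

for d in (1, 2, 6):
    print(f"degree {d}: {map_feature(x1, x2, d).shape[1]} features")

features = map_feature(x1, x2, degree=6)    # 28 columns
theta = np.zeros(features.shape[1])         # untrained parameters, for illustration
probs = sigmoid(features @ theta)           # g(theta^T x) for each example
```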


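The Problem with High-Order Polynomials

  • the polynomial degree is too high
  • the hypothesis has enough parameters to fit almost anything, including the noise in the training set!
  • so it overfits, which results in high variance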
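The claim that such a model can fit almost anything is easy to check: a polynomial of degree $n-1$ has $n$ parameters and can pass exactly through any $n$ points with distinct $x$ values, even when the labels are pure noise. A minimal NumPy sketch, assuming 8 made-up points with random labels:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 8
x = np.linspace(-1, 1, n)
y_random = rng.normal(size=n)     # labels with no relationship to x at all

# A polynomial of degree n - 1 has n parameters: enough to hit every point
coeffs = np.polyfit(x, y_random, deg=n - 1)
train_cost = np.mean((np.polyval(coeffs, x) - y_random) ** 2) / 2

# Near-zero training cost J, although the "pattern" learned is pure noise
print(f"training cost J = {train_cost:.2e}")
```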


Diagnosing

How to Diagnose the Problem

To identify overfitting, we can use Machine Learning Diagnosis.


How to Address the Problem

  • plotting the hypothesis to inspect the fit: doesn't work with many features
  • reducing the number of features
  • Regularization
    • keep all the features, but reduce the magnitude of the parameters $\theta_j$
  • Cross-Validation
    • evaluate candidate hypotheses on a separate cross-validation set and pick the one with the lowest error
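Regularization and cross-validation combine naturally: regularization adds a penalty $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ to $J(\theta)$, and cross-validation chooses $\lambda$ on held-out data. A minimal NumPy sketch under stated assumptions: made-up noisy sine data, a degree-8 polynomial model, the regularized normal equation $\theta = (X^TX + \lambda L)^{-1}X^Ty$ with $L = \mathrm{diag}(0, 1, \dots, 1)$ so that $\theta_0$ is not penalized, and 5-fold cross-validation swapped in for a single cross-validation set. `fit_ridge`, `cv_error` and the grid of $\lambda$ values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data: noisy sine curve, modeled with a degree-8 polynomial
x = rng.uniform(-1, 1, size=40)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.shape)
X = np.vander(x, N=9, increasing=True)    # columns 1, x, ..., x^8

def fit_ridge(X, y, lam):
    """Regularized normal equation; theta_0 (the intercept) is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cv_error(X, y, lam, k=5):
    """Average validation MSE over k folds (k-fold cross-validation)."""
    all_idx = np.arange(len(y))
    errors = []
    for val_idx in np.array_split(all_idx, k):
        train_idx = np.setdiff1d(all_idx, val_idx)
        theta = fit_ridge(X[train_idx], y[train_idx], lam)
        errors.append(np.mean((X[val_idx] @ theta - y[val_idx]) ** 2))
    return float(np.mean(errors))

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
scores = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)       # lambda with lowest CV error
theta_best = fit_ridge(X, y, best_lam)       # refit on all data

for lam in lambdas:
    norm = np.linalg.norm(fit_ridge(X, y, lam)[1:])
    print(f"lambda={lam:<6}  cv_mse={scores[lam]:.4f}  |theta_1..n|={norm:.3f}")
print("chosen lambda:", best_lam)
```

Larger $\lambda$ shrinks $\theta_1, \dots, \theta_n$ toward zero while keeping all the features; choosing $\lambda$ by validation error rather than training error avoids simply picking $\lambda = 0$, which always gives the lowest training cost.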

