# ML Wiki

## Overfitting

Overfitting (high variance): if we have too many features, the learned hypothesis may

• fit the training set very well (with cost function $J(\theta) \approx 0$),
• but fail to generalize to new examples (i.e., make poor predictions on new data)
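A minimal sketch of "$J(\theta) \approx 0$ but no generalization", assuming the usual squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$ (not defined on this page) and a hypothetical noisy linear data source. A degree-$(m-1)$ polynomial through all $m$ training points, built here with Lagrange interpolation, has as many parameters as examples, so it reproduces the training labels exactly and its training cost is 0; on fresh points from the same source its cost is larger.

```python
import random

def cost(h, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def lagrange_interpolator(xs, ys):
    """Return h(x): the unique degree-(m-1) polynomial through all m points."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return h

def true_f(x):
    """Hypothetical ground truth: a straight line."""
    return 2.0 * x + 1.0

rng = random.Random(0)
train_x = [i / 9 for i in range(10)]               # 10 equally spaced inputs in [0, 1]
train_y = [true_f(x) + rng.gauss(0.0, 0.3) for x in train_x]
test_x = [rng.uniform(0.0, 1.0) for _ in range(100)]
test_y = [true_f(x) + rng.gauss(0.0, 0.3) for x in test_x]

h = lagrange_interpolator(train_x, train_y)        # 10 parameters for 10 points
train_cost = cost(h, train_x, train_y)
test_cost = cost(h, test_x, test_y)
print(f"J_train = {train_cost:.6f}, J_test = {test_cost:.6f}")
```

The interpolator has learned the noise in the ten training labels along with the line, which is exactly what the bullets above describe.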

## Generalization Error

### Cross-Validation

A straightforward way to check whether you overfit:

• split the data into a training set and a test set
• train the model on the training set
• evaluate the model on the training set
• evaluate the model on the test set
• generalization error: the difference between the two errors; it measures the model's ability to generalize
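The steps above can be sketched as follows, assuming a hypothetical linear data source, a 70/30 split, the squared-error cost, and a one-feature least-squares line as the model. `split_data` and `fit_line` are illustrative helpers defined here, not a library API.

```python
import random

def split_data(xs, ys, test_fraction, rng):
    """Shuffle the indices, then hold out test_fraction of the examples."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = round(len(xs) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx])

def fit_line(xs, ys):
    """Closed-form least squares for h(x) = theta0 + theta1 * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return lambda x: theta0 + theta1 * x

def cost(h, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [3.0 * x - 2.0 + rng.gauss(0.0, 1.0) for x in xs]   # hypothetical linear data

train_x, train_y, test_x, test_y = split_data(xs, ys, 0.3, rng)   # 70 / 30 split
h = fit_line(train_x, train_y)            # train on the training set only
j_train = cost(h, train_x, train_y)       # error on examples the model has seen
j_test = cost(h, test_x, test_y)          # error on unseen examples
gap = j_test - j_train                    # the difference used as generalization error above
print(f"J_train={j_train:.3f}  J_test={j_test:.3f}  gap={gap:.3f}")
```

For a simple, well-specified model like this line the gap stays small (by chance it can even come out slightly negative); a large, persistent gap is the symptom of overfitting.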

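Overfitting becomes evident when we plot these errors against model complexity: the training error keeps decreasing while the test error starts to grow, so the generalization error widens.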
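Instead of a plot, a minimal sketch that prints the training and test errors for increasing polynomial degree. Assumptions: a hypothetical source $y = \sin(\pi x)$ plus Gaussian noise, 15 equally spaced training points, 200 random test points, and least squares solved through the normal equations with a small hand-written Gaussian elimination (adequate for these low degrees; QR- or SVD-based solvers are preferable in practice because the normal equations become ill-conditioned as the degree grows).

```python
import math
import random

def solve(a, b):
    """Solve a . t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        t[r] = (m[r][n] - sum(m[r][c] * t[c] for c in range(r + 1, n))) / m[r][r]
    return t

def polyfit(xs, ys, degree):
    """Least-squares polynomial via the normal equations (X^T X) theta = X^T y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def cost(theta, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    h = lambda x: sum(t * x ** p for p, t in enumerate(theta))
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

rng = random.Random(1)
def noisy_sin(x):
    """Hypothetical source: sin(pi x) plus Gaussian noise."""
    return math.sin(math.pi * x) + rng.gauss(0.0, 0.2)

train_x = [-1.0 + 2.0 * i / 14 for i in range(15)]
train_y = [noisy_sin(x) for x in train_x]
test_x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
test_y = [noisy_sin(x) for x in test_x]

train_errors, test_errors = [], []
print("degree  J_train   J_test      gap")
for degree in range(10):
    theta = polyfit(train_x, train_y, degree)
    j_train, j_test = cost(theta, train_x, train_y), cost(theta, test_x, test_y)
    train_errors.append(j_train)
    test_errors.append(j_test)
    print(f"{degree:>6}  {j_train:7.4f}  {j_test:7.4f}  {j_test - j_train:7.4f}")
```

Because each higher-degree model contains the lower-degree ones, `J_train` can only go down as the degree grows; `J_test` typically bottoms out at a moderate degree and rises again as the gap opens.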

### High Variance vs High Bias

Generalization error can be decomposed into bias and variance (plus irreducible noise):

• bias: tendency to consistently learn the same wrong thing
• variance: tendency to learn random things irrespective of the real signal
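For the squared-error cost the decomposition can be written out explicitly. A sketch, assuming data generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, where $\hat{f}$ is the hypothesis learned from a randomly drawn training set and the expectations are taken over training sets and noise at a fixed input $x$:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The noise term does not depend on the learner, so only the bias and variance terms can be traded against each other.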

(Figure not included: dart-throwing illustration of bias and variance.)

### Underfitting

• high bias, low variance
• you're always missing in the same way

Example: a model that

• always predicts the same value
• is completely insensitive to the data
• has very low variance (in fact zero)
• but has high bias: it is consistently wrong
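A small Monte Carlo sketch of this example, estimating bias² and variance of the prediction at one query point over many independently drawn training sets. Assumptions: a hypothetical source $\sin(\pi x)$ plus noise, a constant model that always predicts 0, and, for contrast, 1-nearest-neighbour regression (a swapped-in, deliberately high-variance learner not discussed on this page).

```python
import math
import random

def true_f(x):
    return math.sin(math.pi * x)           # hypothetical ground truth

def sample_training_set(rng, n=20, noise=0.3):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def constant_model(xs, ys):
    """Ignores the data entirely: always predicts 0."""
    return lambda x: 0.0

def one_nn_model(xs, ys):
    """1-nearest-neighbour regression: copies the label of the closest x."""
    def h(x):
        i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
        return ys[i]
    return h

def bias_variance(fit, x0, trials, rng):
    """Estimate bias^2 and variance of fit(...)(x0) across random training sets."""
    preds = []
    for _ in range(trials):
        xs, ys = sample_training_set(rng)
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, variance

rng = random.Random(0)
x0 = 0.5                                   # query point where true_f(x0) = 1
results = {}
for name, fit in [("constant", constant_model), ("1-NN", one_nn_model)]:
    results[name] = bias_variance(fit, x0, 2000, rng)
    b, v = results[name]
    print(f"{name:>8}: bias^2={b:.3f}  variance={v:.3f}")
```

The constant model's variance is exactly 0 and its bias² is 1 at $x_0 = 0.5$; 1-NN shows the opposite profile, with small bias and variance close to the label noise.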

## Examples

### Multivariate Linear Regression

Suppose we have a data set with a single input $x$. We can fit any of the following models with multivariate linear regression, treating the powers of $x$ as separate features:

• linear: $\theta_0 + \theta_1 x$, likely to underfit (high bias)
• quadratic: $\theta_0 + \theta_1 x + \theta_2 x^2$
• extreme: $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$, likely to overfit (high variance)
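A minimal sketch of the point above: polynomial regression in one variable is ordinary multivariate linear regression on the feature vector $[1, x, x^2, \dots]$. The data set, its quadratic trend, and the helper names (`poly_features`, `fit_linear_regression`) are hypothetical; least squares is solved via the normal equations.

```python
import random

def poly_features(x, degree):
    """Map a scalar x to the feature vector [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]

def solve(a, b):
    """Solve a . t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        t[r] = (m[r][n] - sum(m[r][c] * t[c] for c in range(r + 1, n))) / m[r][r]
    return t

def fit_linear_regression(X, y):
    """Ordinary least squares on a design matrix X (a list of feature rows)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def cost(theta, X, y):
    """Squared-error cost J = 1/(2m) * sum((theta . x - y)^2)."""
    return sum((sum(t * f for t, f in zip(theta, row)) - yi) ** 2
               for row, yi in zip(X, y)) / (2 * len(y))

rng = random.Random(7)
xs = [i / 2 for i in range(-6, 7)]                             # 13 inputs in [-3, 3]
ys = [0.5 * x * x - x + 1 + rng.gauss(0.0, 0.5) for x in xs]   # hypothetical quadratic trend

costs = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("extreme (4th order)", 4)]:
    X = [poly_features(x, degree) for x in xs]    # same inputs, more features
    theta = fit_linear_regression(X, ys)
    costs[degree] = cost(theta, X, ys)
    print(f"{name:>20}: theta={[round(t, 3) for t in theta]}  J_train={costs[degree]:.4f}")
```

The training cost can only decrease from linear to quadratic to 4th order, which is why the training cost alone cannot tell whether the extreme model overfits.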

### Logistic Regression

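The same applies to logistic regression, where $g$ is the sigmoid function:

• Suppose we have a labeled data set with two features $x_1$ and $x_2$
• We may underfit with just a line (a linear decision boundary):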
• $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
• A quadratic model may fit just right, even if it misclassifies a few positive examples:
• $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2)$
• Or we may overfit using a high-order polynomial model:
• $g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 + \theta_6 x_1^2 x_2 + \theta_7 x_1 x_2^2 + \theta_8 x_1^2 x_2^2 + \theta_9 x_1^3 + ...)$
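The hypotheses above differ only in which polynomial terms of $x_1, x_2$ are fed to $g$. A sketch of that feature mapping, using a hypothetical helper `map_feature` that generates every monomial $x_1^i x_2^j$ with $i + j \le d$; its term order (grouped by total degree) is one convention and differs slightly from the $\theta$ indexing in the formulas above.

```python
import math

def map_feature(x1, x2, degree):
    """All monomials x1^i * x2^j with i + j <= degree, bias term 1 first."""
    features = []
    for total in range(degree + 1):          # total degree of the monomial
        for j in range(total + 1):
            i = total - j
            features.append((x1 ** i) * (x2 ** j))
    return features

def sigmoid(z):
    """Numerically stable logistic function g(z) = 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x1, x2, degree):
    """h_theta(x) = g(theta . map_feature(x1, x2, degree))."""
    f = map_feature(x1, x2, degree)
    assert len(theta) == len(f), "one parameter per feature"
    return sigmoid(sum(t * v for t, v in zip(theta, f)))

for degree in (1, 2, 6):
    print(degree, len(map_feature(0.5, -0.5, degree)))
# 1 3   (theta_0..theta_2: the straight-line boundary)
# 2 6   (theta_0..theta_5: the quadratic boundary)
# 6 28
```

The number of parameters grows as $(d+1)(d+2)/2$ with the degree $d$, so a high-order model quickly gets enough freedom to carve out a boundary around every training point.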

The problem with the high-order model:

• the polynomial degree is too high
• it can fit almost anything, including noise in the training set
• it overfits, which results in high variance

## Diagnosing

### How to Diagnose the Problem

To identify overfitting we can use Machine Learning Diagnosis, for example:

• plotting the hypothesis against the data (doesn't work with many features)

### How to Address the Problem

• reducing the number of features
• regularization: keep all the features but reduce the magnitude of the parameters $\theta_j$
• cross-validation: test your hypotheses on a cross-validation set
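A sketch combining the last two remedies, assuming the common regularized cost $J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$ with $\theta_0$ not penalized, and a hypothetical sin-plus-noise data set. A deliberately flexible degree-8 polynomial is fit with the regularized normal equations for several values of $\lambda$; $\lambda$ is chosen by the error on a separate cross-validation set, and the test set is used only once at the end. The $\lambda$ grid, split sizes, and helper names are assumptions.

```python
import math
import random

def solve(a, b):
    """Solve a . t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        t[r] = (m[r][n] - sum(m[r][c] * t[c] for c in range(r + 1, n))) / m[r][r]
    return t

def features(x, degree=8):
    """Polynomial features [1, x, ..., x^degree]; degree 8 is deliberately too flexible."""
    return [x ** p for p in range(degree + 1)]

def fit_regularized(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y with L = diag(0, 1, ..., 1),
    so the intercept theta_0 is not penalized."""
    k = len(X[0])
    a = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(a, b)

def cost(theta, X, y):
    """Unregularized squared-error cost, used to compare models on held-out data."""
    return sum((sum(t * f for t, f in zip(theta, row)) - yi) ** 2
               for row, yi in zip(X, y)) / (2 * len(y))

rng = random.Random(3)
def target(x):
    """Hypothetical source: sin(pi x) plus Gaussian noise."""
    return math.sin(math.pi * x) + rng.gauss(0.0, 0.2)

train_xs = [-1.0 + 2.0 * i / 14 for i in range(15)]
cv_xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
test_xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
X_train, y_train = [features(x) for x in train_xs], [target(x) for x in train_xs]
X_cv, y_cv = [features(x) for x in cv_xs], [target(x) for x in cv_xs]
X_test, y_test = [features(x) for x in test_xs], [target(x) for x in test_xs]

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
penalties, cv_errors = [], []
for lam in lambdas:
    theta = fit_regularized(X_train, y_train, lam)     # fit on the training set only
    penalties.append(sum(t * t for t in theta[1:]))     # sum of theta_j^2 for j >= 1
    cv_errors.append(cost(theta, X_cv, y_cv))           # compare lambdas on the CV set
    print(f"lambda={lam:<6} sum(theta_j^2)={penalties[-1]:10.3f}  J_cv={cv_errors[-1]:.4f}")

best = min(range(len(lambdas)), key=lambda i: cv_errors[i])
best_theta = fit_regularized(X_train, y_train, lambdas[best])
print(f"chosen lambda={lambdas[best]}, J_test={cost(best_theta, X_test, y_test):.4f}")
```

As $\lambda$ grows, the penalized magnitude $\sum_{j \ge 1} \theta_j^2$ can only shrink, which is the "reduce the magnitude of parameters" effect; too large a $\lambda$ pushes the model back toward underfitting, which the cross-validation error exposes.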