# ML Wiki

## Estimation

Our goal is to estimate theoretical (population) parameters from a sample of data.

Example:

• we want to estimate the probability of getting heads in a coin-flipping experiment
• to do so, we flip a coin 10 times

Experiment:

• Our parameter of interest is $p = p(\text{heads})$
• Data: result of 10 coin flips
• $\hat{p}$ - estimate of $p$, calculated from the data:
$\hat{p} = \cfrac{\text{number of heads}}{\text{total number of flips}}$
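As a small illustration, the estimate is just the fraction of flips that came up heads. The list `flips` below is hypothetical data (1 = heads, 0 = tails), and `estimate_p` is a helper name chosen for this sketch.

```python
# Hypothetical outcomes of 10 coin flips: 1 = heads, 0 = tails
flips = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

def estimate_p(flips):
    """Estimate p = P(heads) as (number of heads) / (total number of flips)."""
    return sum(flips) / len(flips)

p_hat = estimate_p(flips)
print(p_hat)  # 0.6
```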

## Sampling Distribution

• if we repeat the experiment over and over again, we will probably get a different value of $\hat{p}$ each time
• so there is a variability in the estimate
• this is called sampling variability, and it occurs because of the randomness in our data

The probability distribution of all the possible values of an estimator is its sampling distribution.
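The sampling distribution can be approximated by simulation: repeat the 10-flip experiment many times and record $\hat{p}$ each time. This is a minimal sketch assuming a fair coin ($p = 0.5$) and 10,000 repetitions; `sample_p_hat`, `sampling_distribution`, and the fixed seed are illustrative choices, not part of the original text.

```python
import random
from collections import Counter

def sample_p_hat(p, n, rng):
    """Run one experiment: n flips of a coin with P(heads) = p; return p_hat."""
    heads = sum(rng.random() < p for _ in range(n))
    return heads / n

def sampling_distribution(p, n, reps, seed=0):
    """Repeat the experiment `reps` times and collect every estimate of p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sample_p_hat(p, n, rng) for _ in range(reps)]

estimates = sampling_distribution(p=0.5, n=10, reps=10_000)

# Empirical sampling distribution: relative frequency of each value of p_hat
for value, count in sorted(Counter(estimates).items()):
    print(f"{value:.1f}: {count / len(estimates):.3f}")
```

Each estimate can only take the values $0, 0.1, \dots, 1$, and the spread of the printed frequencies is the sampling variability described above.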

### Unbiased estimation

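In our coin flipping example, each flip is a Bernoulli random variable:

• $X \sim \text{Bernoulli}(p)$, and $E(X) = p$
• for a fair coin, $p = 0.5$, so $X \sim \text{Bernoulli}(0.5)$ and $E(X) = 0.5$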

For the entire experiment:

• 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, ..., X_{10}$
• so, $\hat{p} = \cfrac{X_1 + ... + X_{10}}{10} = \bar{X}$
• since $E(X_i) = p$ for every flip, $E(\hat{p}) = E(\bar{X}) = \cfrac{E(X_1) + ... + E(X_{10})}{10} = \cfrac{10 p}{10} = p$
• so $\hat{p}$ is an unbiased estimator of $p$

A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution equals the value of the parameter being estimated.
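Unbiasedness can be checked exactly, without simulation, by summing $\cfrac{k}{n} \cdot P(k \text{ heads})$ over every possible number of heads $k$, where $P(k \text{ heads})$ is the binomial probability. This sketch uses exact fractions; the unfair coin $p = 0.3$ and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n independent flips with P(heads) = p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def expected_p_hat(p, n):
    """E(p_hat) = sum over k of (k / n) * P(k heads), computed exactly."""
    return sum(Fraction(k, n) * binomial_pmf(k, n, p) for k in range(n + 1))

p = Fraction(3, 10)  # an unfair coin, chosen only for illustration
print(expected_p_hat(p, n=10))  # 3/10
```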

### Variance estimation

• For one observation $X \sim \text{Bernoulli}(p)$, the variance $\text{Var}(X)$ is:
$\text{Var}(X) = \sum_{x} (x - E(X))^2 p(x) = (1 - p)^2 p + (0 - p)^2 (1 - p) = p - p^2 = p(1 - p)$
• For $n$ independent observations $X_1, ..., X_n$ with $\hat{p} = \bar{X}$,
since $\text{Var}(\bar{X}) = \text{Var}\left(\cfrac{\sum X_i}{n}\right) = \cfrac{1}{n^2} \sum \text{Var}(X_i) = \cfrac{n p(1 - p)}{n^2}$,
$\text{Var}(\hat{p}) = \cfrac{p(1 - p)}{n}$ and $\text{sd}(\hat{p}) = \sqrt{\cfrac{p(1-p)}{n}}$

So the estimate becomes more precise as the sample size $n$ grows: $\text{sd}(\hat{p})$ shrinks in proportion to $1/\sqrt{n}$.
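Both variance formulas can be verified exactly with fractions: $\text{Var}(X)$ by summing over the two outcomes $x \in \{0, 1\}$, and $\text{Var}(\hat{p})$ by summing $(k/n - p)^2$ against the binomial probabilities (this relies on $E(\hat{p}) = p$). The values $p = 1/4$ and $n = 10$, as well as the helper names, are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, sqrt

def bernoulli_variance(p):
    """Var(X) = sum over x in {0, 1} of (x - E(X))^2 * p(x)."""
    pmf = {1: p, 0: 1 - p}                       # p(x) for both outcomes
    mean = sum(x * px for x, px in pmf.items())  # E(X) = p
    return sum((x - mean) ** 2 * px for x, px in pmf.items())

def variance_p_hat(p, n):
    """Var(p_hat) = sum over k of (k/n - p)^2 * P(k heads); uses E(p_hat) = p."""
    return sum((Fraction(k, n) - p) ** 2 * comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1))

p, n = Fraction(1, 4), 10
print(bernoulli_variance(p), p * (1 - p))     # 3/16 3/16
print(variance_p_hat(p, n), p * (1 - p) / n)  # 3/160 3/160

# sd(p_hat) = sqrt(p(1 - p) / n) shrinks like 1 / sqrt(n)
for size in (10, 100, 1000):
    print(size, round(sqrt(p * (1 - p) / size), 4))
```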

And by the Central Limit Theorem, for large $n$ the sampling distribution of $\hat{p}$ is approximately

$N\left(p, \cfrac{p(1-p)}{n}\right)$
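One way to see the normal approximation at work is to compare the exact binomial probability that $\hat{p}$ lands within $1.96$ standard deviations of $p$ against the same probability under $N\left(p, \cfrac{p(1-p)}{n}\right)$. This sketch assumes $p = 0.5$ and $z = 1.96$ for illustration; it uses `statistics.NormalDist` from the Python standard library and computes the binomial pmf in log space so large $n$ does not overflow.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Binomial pmf in log space (requires 0 < p < 1) to avoid overflow for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def coverage(p, n, z=1.96):
    """P(|p_hat - p| <= z * sd): exact binomial vs. the normal approximation."""
    sd = sqrt(p * (1 - p) / n)
    lo, hi = p - z * sd, p + z * sd
    exact = sum(binomial_pmf(k, n, p) for k in range(n + 1) if lo <= k / n <= hi)
    normal = NormalDist(mu=p, sigma=sd)
    approx = normal.cdf(hi) - normal.cdf(lo)
    return exact, approx

for n in (10, 100, 1000):
    exact, approx = coverage(p=0.5, n=n)
    print(f"n={n}: exact={exact:.4f}, normal approx={approx:.4f}")
```

The gap between the two columns narrows as $n$ grows, which is what "for large $n$" means in practice.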

## Theoretical World Model

Suppose the data come from a Normal distribution $N(\mu, \sigma^2)$, and we're interested in estimating $\mu$

• Say we have $n$ data values $X_1, ..., X_n$ from independent observations
• Estimator of $\mu$ is $\bar{X} = \cfrac{X_1 + ... + X_n}{n}$
• So $E(\bar{X}) = \mu$, and $\bar{X}$ is an unbiased estimator of $\mu$
• Variance of $\bar{X}$ is $\text{Var}(\bar{X}) = \cfrac{\sigma^2}{n}$ and $\text{sd}(\bar{X}) = \cfrac{\sigma}{\sqrt{n}}$
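• If the $X_i$ are normal, $\bar{X} \sim N\left(\mu, \cfrac{\sigma^2}{n}\right)$ exactly; for other distributions with finite variance, the Central Limit Theorem makes this approximately true for large $n$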
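A quick simulation can check that $\bar{X}$ is centred on $\mu$ with standard deviation $\sigma/\sqrt{n}$. This is a sketch under assumed values ($\mu = 10$, $\sigma = 2$, $n = 25$, 20,000 repetitions, fixed seed) that are not from the text; it uses `random.gauss` and the `statistics` module from the Python standard library.

```python
import random
from statistics import fmean, stdev

def sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` independent samples of size n from N(mu, sigma^2); return each sample mean."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma, n = 10.0, 2.0, 25  # illustrative values, not from the text
means = sample_means(mu, sigma, n, reps=20_000)
print(f"mean of X-bar: {fmean(means):.3f} (theory {mu})")
print(f"sd of X-bar:   {stdev(means):.3f} (theory {sigma / n ** 0.5})")
```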

So, for large $n$:

• $\hat{p}$ is approximately distributed as $N\left(p, \cfrac{p(1-p)}{n}\right)$
• $\bar{X}$ is (at least approximately) distributed as $N\left(\mu, \cfrac{\sigma^2}{n}\right)$

To estimate $\sigma^2$ from data, use the unbiased sample variance:

• $s^2 = \cfrac{1}{n-1} \sum (X_i - \bar{X})^2$ (unbiased: $E(s^2) = \sigma^2$)
• not $\hat{\sigma}^2 = \cfrac{1}{n} \sum (X_i - \bar{X})^2$ (biased: $E(\hat{\sigma}^2) = \cfrac{n-1}{n} \sigma^2$)
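The bias of the $\cfrac{1}{n}$ version can be shown exactly by enumerating every possible sample of $n$ coin flips and averaging both estimators, weighted by each sample's probability. In Python's standard library, `statistics.variance` divides by $n - 1$ and `statistics.pvariance` divides by $n$; both accept `Fraction` inputs, which keeps the arithmetic exact. The choice $p = 1/3$, $n = 4$ is an assumption for illustration, so $\sigma^2 = p(1-p) = 2/9$.

```python
from fractions import Fraction
from itertools import product
from statistics import pvariance, variance

def expected_estimators(p, n):
    """Exact expectations of the (n - 1)- and n-denominator variance estimators
    over every possible sample of n independent Bernoulli(p) observations."""
    e_unbiased = e_biased = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        k = sum(outcome)
        prob = p**k * (1 - p) ** (n - k)         # probability of this exact sequence
        sample = [Fraction(x) for x in outcome]  # Fractions keep results exact
        e_unbiased += prob * variance(sample)    # divides by n - 1
        e_biased += prob * pvariance(sample)     # divides by n
    return e_unbiased, e_biased

p, n = Fraction(1, 3), 4
e_unbiased, e_biased = expected_estimators(p, n)
print(e_unbiased, p * (1 - p))                     # 2/9 2/9
print(e_biased, Fraction(n - 1, n) * p * (1 - p))  # 1/6 1/6
```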
