Estimation

Our goal is to be able to estimate theoretical parameters with a data sample.

Example:

  • we want to estimate the probability of getting heads in coin flipping experiment
  • flip a coin 10 times,
  • count number of heads

Experiment:

  • Our parameter of interest is $p = p(\text{heads})$
  • Data: result of 10 coin flips
  • $\hat{p}$ - estimate of $p$
$\hat{p} = \cfrac{\text{# of heads}}{\text{total # of flips}}$
i.e. $\hat{p}$ is calculated from data


Sampling Distribution

  • if we repeat over and over again, each time we will probably have different estimates of $\hat{p}$
  • so there is a variability in the estimate
  • this is called sampling variability, and it occurs because of the randomness in our data


The probability distribution of all the possible values of an estimator is it's sampling distribution.


Unbiased estimation

In our coin flipping example

$X \sim \text{Bernoulli}(0.5)$
  • and $E(X) = 0.5$


For the entire experiment:

  • 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, ..., X_{10}$
  • so, $\hat{p} = \cfrac{X_1 + ... + X_{10}}{10} = \bar{X}$
  • thus, $E(\hat{p}) = p$ since $E(X_i) = p$ and $E(\bar{X}) = \cfrac{10 p}{10} = p$
  • and $\hat{p}$ is called unbiased estimator


A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated


Variance estimation

  • For one observation $X \sim \text{Bernoulli}(p)$, variance $\text{Var}(X)$ is:
$\text{Var}(X) = \sum_{x} (x - E(X))^2 p(X) = (1 - p)^2 p + (0 - p)^2 (1 - p) = p - p^2 = p(1 - p)$
  • For $n$ observations $X_1, ..., X_{n}$ with $\hat{p} = E(X)$
since $\text{Var}(\bar{X}) = \cfrac{\sum X_i}{n}$,
$\text{Var}(\hat{p}) = \cfrac{p(1 - p)}{n}$ and $\text{sd}(\hat{p}) = \sqrt{\cfrac{p(1-p)}{n}}$,

So we get more and more precise answers over time


And by the Central Limit Theorem, for large $n$ the sampling distribution is approximately

$N\left(p, \cfrac{p(1-p)}{n}\right)$


Theoretical World Model

In the Normal Distribution we have $N(\mu, \sigma^2)$, and we're interested in $\mu$

  • Say we have $n$ data values $X_1, ..., X_n$ from independent observations
  • Estimator of $\mu$ is $\bar{X} = \cfrac{X_1 + ... + X_n}{n}$
  • So $E(\bar{X}) = \mu$, and $\bar{X}$ - unbiased estimator of $\mu$
  • Variance of $\bar{X}$ is $\text{Var}(\bar{X}) = \cfrac{\sigma^2}{n}$ and $\text{sd}(\bar{X}) = \cfrac{\sigma}{\sqrt{n}}$
  • And by the Central Limit Theorem we have $\bar{X} \sim N(\mu, \cfrac{\sigma^2}{n})$


So,

  • distribution of $\hat{p} \sim N\left(p, \cfrac{p(1-p)}{n}\right)$
  • distribution of $\bar{X} \sim N\left(\mu, \cfrac{\sigma^2}{n}\right)$


For data, unbiased variance is

  • $\text{Var}(X) = \cfrac{1}{n-1} \sum (X_i - \bar{X})^2$ (unbiased)
  • not $\text{Var}(X) = \cfrac{1}{n} \sum (X_i - \bar{X})^2$ (biased)


Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".

Share your opinion