Biased Estimators


A Point Estimate is biased if

  • the Sampling Distribution of the estimator is not centered around the true value of the parameter being estimated
  • otherwise the Point Estimate is unbiased

Bias of an estimate is the expected difference between the estimated value and the true value
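
In symbols, writing $\theta$ for the true parameter and $\hat{\theta}$ for its estimator (this notation is an assumption, it is not used elsewhere on this page):

  • $\text{Bias}(\hat{\theta}) = E(\hat{\theta} - \theta) = E(\hat{\theta}) - \theta$
  • the estimator is unbiased when $\text{Bias}(\hat{\theta}) = 0$, i.e. $E(\hat{\theta}) = \theta$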

Unbiased Estimation

A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated

Proportion

In the coin flipping example, $p$ is the success probability of a single flip and $\hat{p}$ is the observed proportion of successes

For the entire experiment:

  • 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, \ldots, X_{10}$
  • so, $\hat{p} = \cfrac{X_1 + \ldots + X_{10}}{10} = \bar{X}$
  • thus, $E(\hat{p}) = p$, since $E(X_i) = p$ and, by linearity of expectation, $E(\bar{X}) = \cfrac{10 p}{10} = p$
  • so $\hat{p}$ is an unbiased estimator of $p$
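
The argument above can be checked numerically. The following is a minimal sketch, not part of the original example: the true probability `p = 0.3`, the number of repetitions, and all variable names are assumptions chosen for illustration.

```r
set.seed(42)

p = 0.3          # assumed true success probability (illustrative value)
n.flips = 10     # flips per experiment, as in the example
n.rep = 5000     # number of repeated experiments (assumed)

# each experiment: count successes in 10 flips, then divide by 10 to get p.hat
p.hat = rbinom(n.rep, size=n.flips, prob=p) / n.flips

# the center of the sampling distribution should be close to p
c(p, mean(p.hat))
abs(mean(p.hat) - p)
```

If $\hat{p}$ is unbiased, the average of the simulated `p.hat` values should differ from `p` only by simulation noise, which shrinks as `n.rep` grows.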

Biased Estimation

Standard Deviation

The uncorrected sample standard deviation (dividing by $n$) is a biased estimate of the true standard deviation of the population

  • so we typically use the corrected sample standard deviation, which is
    • $s = \sqrt{\cfrac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$
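
A short derivation sketch of where the $n-1$ divisor comes from, assuming the $x_i$ are independent draws with mean $\mu$ and variance $\sigma^2$ (these symbols and the independence assumption are introduced here for illustration):

  • $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \mu)^2 - n (\bar{x} - \mu)^2$
  • $E\left(\sum_{i=1}^n (x_i - \bar{x})^2\right) = n \sigma^2 - n \cdot \cfrac{\sigma^2}{n} = (n - 1) \sigma^2$
  • so dividing by $n$ gives an expected value of $\cfrac{n-1}{n} \sigma^2$, which is too small on average, while dividing by $n-1$ gives exactly $\sigma^2$
  • for $n = 25$ the factor is $\cfrac{24}{25} = 0.96$

This shows that the correction removes the bias of the variance estimate; taking the square root can still leave a smaller bias in the standard deviation itself.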

We can simulate this to check it

  • suppose that we have the following population
    • (figure: bar plot of the population)
  • we sample with sample size 25 many times (e.g. 5000)
    • each time calculate the biased std as well as the corrected std
  • then plot the sampling distributions
    • (figure: histograms of the two sampling distributions)
    • we see that the center of the corrected std is closer to the real population std
    • note that the real population std itself is computed with the divisor $n$, without the correction
R simulation

```r
# population standard deviation (divisor n, no correction)
sd.population = function(x) {
  n = length(x)
  m = mean(x)
  sqrt(sum((x - m) ^ 2) / n)
}

# build the population and plot it
population = unlist(sapply(X=1:7, FUN=function(x) { rep(x, choose(8, x)) }))
pop = table(population)
b = barplot(pop)
text(x=b, y=pop-4, pop)

set.seed(1231)
sample.1 = rep(NA, 5000)  # corrected std (divisor n - 1)
sample.2 = rep(NA, 5000)  # biased std (divisor n)
size = 25

for (i in 1:5000) {
  s = sample(population, size)
  sample.1[i] = sd(s)
  sample.2[i] = sd.population(s)
}

true.pop = sd.population(population)
biased.center = mean(sample.2)
center = mean(sample.1)

c(true.pop, center, biased.center)
c(abs(true.pop - center), abs(true.pop - biased.center))

# plot both sampling distributions with their centers
x = seq(0, 3, 0.1)
hist(sample.1, col=adjustcolor('blue', 1/4), breaks=35, probability=TRUE,
     xlim=c(0.8, 1.9), main='Sampling Distributions of STD functions',
     xlab='Estimated Value')
abline(v=center, col='blue')
xspline(x=x, y=dnorm(x, mean=center, sd=sd(sample.1)),
        lwd=1, shape=1, lty=2, border="blue")

hist(sample.2, col=adjustcolor('red', 1/4), probability=TRUE, breaks=35, add=TRUE)
abline(v=biased.center, col='red')
xspline(x=x, y=dnorm(x, mean=biased.center, sd=sd(sample.2)),
        lwd=1, shape=1, lty=2, border="red")

# true population std
abline(v=true.pop)
```
