ML Wiki
Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Biased Estimators

A point estimate is "biased" if

  • the sampling distribution of the estimator is not centered around the true value of the parameter being estimated
  • otherwise the point estimate is "unbiased"

The bias of an estimate is the expected difference between the estimated value and the true value.

Unbiased Estimation

A statistic used to estimate a parameter is "unbiased" if the expected value of its sampling distribution is equal to the value of the parameter being estimated.
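In symbols, writing $\theta$ for the parameter and $\hat{\theta}$ for its estimator (notation introduced here for illustration, not taken from the original text), the two definitions above can be sketched as:

$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta, \qquad \hat{\theta} \text{ is unbiased} \iff E(\hat{\theta}) = \theta$$

So an estimator is biased exactly when $\operatorname{Bias}(\hat{\theta}) \neq 0$.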

Proportion

In our coin-flipping example, for the entire experiment:

  • 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, \ldots, X_{10}$
  • so, $\hat{p} = \cfrac{X_1 + \ldots + X_{10}}{10} = \bar{X}$
  • thus, $E(\hat{p}) = p$ since $E(X_i) = p$ and $E(\bar{X}) = \cfrac{10 p}{10} = p$
  • so $\hat{p}$ is an "unbiased estimator" of $p$
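The derivation above can also be checked numerically. The following is a minimal sketch, not part of the original page: the true probability `p = 0.3`, the number of repeated experiments, and the seed are illustrative assumptions.

```r
set.seed(42)

p = 0.3                # assumed true probability of heads (illustrative value)
n.flips = 10           # flips per experiment, as in the example above
n.experiments = 10000  # number of repeated experiments (illustrative value)

# each experiment: 10 Bernoulli outcomes; p.hat is their mean
p.hat = replicate(n.experiments, mean(rbinom(n.flips, size=1, prob=p)))

# the average of the estimates approximates E(p.hat) and should be close to p
mean(p.hat)
```

Individual values of `p.hat` vary a lot from experiment to experiment, but their average stays near `p`, which is what unbiasedness means.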

Biased Estimation

Standard Deviation

The standard deviation computed from a sample with the population formula (dividing by $n$) is a biased estimate of the true population standard deviation

  • so we typically use the sample standard deviation, which divides by $n - 1$ instead of $n$:
    • $s = \sqrt{\cfrac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$
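As a small illustration, the two versions can be computed by hand and compared with R's built-in `sd`, which uses the $n - 1$ denominator. The data vector below is made up for this sketch.

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up data, mean is 5
n = length(x)

# uncorrected: divide the sum of squared deviations by n
s.uncorrected = sqrt(sum((x - mean(x)) ^ 2) / n)

# corrected: divide by n - 1 (this is what sd() computes)
s.corrected = sqrt(sum((x - mean(x)) ^ 2) / (n - 1))

c(s.uncorrected, s.corrected, sd(x))
```

Here the sum of squared deviations is 32, so the uncorrected value is $\sqrt{32/8} = 2$ and the corrected value is $\sqrt{32/7} \approx 2.138$. The corrected value is always the larger of the two.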

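We can simulate this to check it:

  • suppose that we have the following population
    • (bar plot of the population: values 1 to 7 with counts 8, 28, 56, 70, 56, 28, 8)
  • we draw a sample of size 25 many times (e.g. 5000)
    • each time we calculate the biased std as well as the corrected std
  • then we plot the two sampling distributions
    • (histograms of the two sampling distributions, with vertical lines at their means and at the true population std)
    • we see that the mean of the corrected std is closer to the true population std
    • note that the true population std itself is computed with the uncorrected formula (dividing by $n$), since it is not an estimate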
R simulation

```r
# population std: divides by n (no correction)
sd.population = function(x) {
  n = length(x)
  m = mean(x)
  sqrt(sum((x - m) ^ 2) / n)
}

# population: each value x in 1..7 is repeated choose(8, x) times
population = unlist(sapply(X=1:7, FUN=function(x) { rep(x, choose(8, x)) }))
pop = table(population)
b = barplot(pop)
text(x=b, y=pop-4, pop)

set.seed(1231)
sample.1 = rep(NA, 5000)
sample.2 = rep(NA, 5000)
size = 25

# draw 5000 samples of size 25 and compute both estimates for each
for (i in 1:5000) {
  s = sample(population, size)
  sample.1[i] = sd(s)             # corrected std (divides by n - 1)
  sample.2[i] = sd.population(s)  # biased std (divides by n)
}

true.pop = sd.population(population)
biased.center = mean(sample.2)
center = mean(sample.1)

# true std, mean of corrected estimates, mean of biased estimates
c(true.pop, center, biased.center)
# absolute distance of each mean from the true std
c(abs(true.pop - center), abs(true.pop - biased.center))

x = seq(0, 3, 0.1)

# sampling distribution of the corrected std (blue)
hist(sample.1, col=adjustcolor('blue', 1/4), breaks=35, probability=TRUE,
     xlim=c(0.8, 1.9), main='Sampling Distributions of STD functions',
     xlab='Estimated Value')
abline(v=center, col='blue')
xspline(x=x, y=dnorm(x, mean=center, sd=sd(sample.1)),
        lwd=1, shape=1, lty=2, border="blue")

# sampling distribution of the biased std (red), drawn on the same plot
hist(sample.2, col=adjustcolor('red', 1/4), probability=TRUE, breaks=35, add=TRUE)
abline(v=biased.center, col='red')
xspline(x=x, y=dnorm(x, mean=biased.center, sd=sd(sample.2)),
        lwd=1, shape=1, lty=2, border="red")

# true population std (black)
abline(v=true.pop)
```
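A standard result may help explain the direction of the gap seen in the simulation. As a sketch, assume independent draws from a distribution with variance $\sigma^2$, and write $s_n$ for the uncorrected std (notation introduced here). The simulation samples without replacement from a finite population, so the result holds only approximately there.

$$E(s_n^2) = E\left(\cfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2\right) = \cfrac{n-1}{n} \sigma^2 < \sigma^2$$

So the uncorrected variance underestimates $\sigma^2$ on average, by a factor of $24/25 = 0.96$ for $n = 25$. Dividing by $n - 1$ removes this bias for the variance. The corrected std $s$ is still slightly biased low, because the square root is concave and so $E(s) \le \sqrt{E(s^2)} = \sigma$, but it is less biased than $s_n$.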

Sources