Biased Estimators

A Point Estimate is biased if

  • the Sampling Distribution of the estimator is not centered at the true value of the parameter being estimated
  • otherwise the Point Estimate is unbiased


Bias of an estimate is the expected difference between the estimated value and the true value
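
In symbols, writing $\theta$ for the true parameter and $\hat{\theta}$ for its estimator (notation assumed here for illustration):

  • $\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$
  • $\hat{\theta}$ is unbiased exactly when $\text{Bias}(\hat{\theta}) = 0$, i.e. when $E(\hat{\theta}) = \theta$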


Unbiased Estimation

A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated


Proportion

In our coin flipping example, for the entire experiment:

  • 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, ..., X_{10}$
  • so, $\hat{p} = \cfrac{X_1 + ... + X_{10}}{10} = \bar{X}$
  • thus, $E(\hat{p}) = p$ since $E(X_i) = p$ and $E(\bar{X}) = \cfrac{10 p}{10} = p$
  • and so $\hat{p}$ is called an unbiased estimator of $p$
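
The argument above can be checked with a quick simulation. This is a minimal sketch: the true probability `p = 0.3`, the number of repetitions, and the seed are arbitrary values chosen for illustration.

```r
# Hypothetical setup: a coin with true P(heads) = 0.3 (any value in [0, 1] works)
set.seed(42)
p = 0.3
n.flips = 10
n.experiments = 20000

# each rbinom(n.flips, 1, p) call is one experiment of 10 Bernoulli trials;
# its mean is the estimate p.hat for that experiment
p.hat = replicate(n.experiments, mean(rbinom(n.flips, size=1, prob=p)))

# the center of the sampling distribution of p.hat should be close to p
c(p, mean(p.hat))
```

Individual values of `p.hat` vary a lot from experiment to experiment (each is a multiple of 0.1), but their average should settle near `p`, which is what unbiasedness means.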


Biased Estimation

Standard Deviation

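The uncorrected Standard Deviation of a sample (the one that divides by $n$) is a biased estimate of the true standard deviation of the population

  • so we typically use the sample standard deviation with the $n - 1$ correction, which is
    • $s = \sqrt{\cfrac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$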
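
A short derivation of where the $n - 1$ comes from, as a sketch under the assumption (not stated above) that $x_1, ..., x_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$:

  • $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \mu)^2 - n (\bar{x} - \mu)^2$
  • $E\left[ \sum_{i=1}^n (x_i - \mu)^2 \right] = n \sigma^2$ and $E\left[ n (\bar{x} - \mu)^2 \right] = n \cdot \cfrac{\sigma^2}{n} = \sigma^2$
  • hence $E\left[ \cfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \right] = \cfrac{n-1}{n} \sigma^2$, which is smaller than $\sigma^2$
  • while $E\left[ \cfrac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \right] = \sigma^2$

So dividing by $n - 1$ makes the variance estimate unbiased. Taking the square root reduces the bias of the standard deviation but does not remove it entirely.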


We can run a simulation to see that this is true

  • suppose that we have the following population
    • figure (d6f7d488b10e4e819d77def52d4bd26d.png): bar plot of the population values and their counts
  • we draw a sample of size 25 many times (e.g. 5000 times)
    • each time we calculate the biased std as well as the corrected std
  • then plot the sampling distributions
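    • figure (a334404ea02a4ffd877dc57c7f0636b9.png): overlaid histograms of the two sampling distributions, with their centers and the true population std marked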
    • we see that the center of the sampling distribution of the corrected std is closer to the real population std
    • note that the real population std itself is computed by dividing by $n$: it should not be corrected!


R simulation  
# population (uncorrected) standard deviation: divides by n, not by n - 1
sd.population = function(x) {
  n = length(x)
  m = mean(x)
  sqrt(sum((x - m) ^ 2) / n)
}

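# population: each value x in 1..7 is repeated choose(8, x) times
population = unlist(sapply(X=1:7, FUN=function(x) { rep(x, choose(8, x)) }))
pop = table(population)
# bar plot of the counts, with each count printed inside its bar
b = barplot(pop)
text(x=b, y=pop-4, pop)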

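set.seed(1231)
sample.1 = rep(NA, 5000)  # corrected std: sd() divides by n - 1
sample.2 = rep(NA, 5000)  # uncorrected std: sd.population() divides by n

size = 25

# draw 5000 samples of size 25 and record both estimates for each
for (i in 1:5000) {
  s = sample(population, size)
  sample.1[i] = sd(s)
  sample.2[i] = sd.population(s)
}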

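# true population std and the centers of the two sampling distributions
true.pop = sd.population(population)
biased.center = mean(sample.2)
center = mean(sample.1)

c(true.pop, center, biased.center)
# absolute distance of each center from the true value
c(abs(true.pop - center), abs(true.pop - biased.center))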

# grid on which the normal approximations are drawn
x = seq(0, 3, 0.1)

# corrected std: histogram, center, and normal curve with matching mean and sd
hist(sample.1, col=adjustcolor('blue', 1/4), breaks=35,
     probability=TRUE, xlim=c(0.8, 1.9),
     main='Sampling Distributions of STD functions',
     xlab='Estimated Value')
abline(v=center, col='blue')
xspline(x=x, y=dnorm(x, mean=center, sd=sd(sample.1)),
        lwd=1, shape=1, lty=2, border="blue")

# uncorrected std, overlaid on the same plot
hist(sample.2, col=adjustcolor('red', 1/4), probability=TRUE,
     breaks=35, add=TRUE)
abline(v=biased.center, col='red')
xspline(x=x, y=dnorm(x, mean=biased.center, sd=sd(sample.2)),
        lwd=1, shape=1, lty=2, border="red")

# true population std
abline(v=true.pop)



Sources