Biased Estimators

A Point Estimate is biased if

  • the Sampling Distribution of some parameter being estimated is not centered around the true parameter value
  • otherwise a Point Estimate is unbiased

Bias of an estimate is the expected difference between the estimated value and the true value

Unbiased Estimation

A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated


In our coin flipping example

For the entire experiment:

  • 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, ..., X_{10}$
  • so, $\hat{p} = \cfrac{X_1 + ... + X_{10}}{10} = \bar{X}$
  • thus, $E(\hat{p}) = p$ since $E(X_i) = p$ and $E(\bar{X}) = \cfrac{10 p}{10} = p$
  • and $\hat{p}$ is called unbiased estimator

Biased Estimation

Standard Deviation

Standard Deviation is biased estimate of the true standard deviation of the proportion

  • so we typically use the sample standard deviation, which is
    • $s = \cfrac{1}{n-1} \sum_{i=1}^n x_i $

Can simulate it to see that it's true

  • suppose that we have the following population
    • d6f7d488b10e4e819d77def52d4bd26d.png
  • we sample with sample size 25 many times (e.g. 5000)
    • each time calculate biased std as well as corrected std
  • then plot the sampling distributions
    • a334404ea02a4ffd877dc57c7f0636b9.png
    • we see that the corrected std is closer to the real population std
    • note that the real population std should not be corrected!

R simulation  
sd.population = function(x) {
  n = length(x)
  m = mean(x)
  sqrt(sum((x - m) ^ 2) / n)

population = unlist(sapply(X=1:7, FUN=function(x) { rep(x, choose(8, x)) }))
pop = table(population)
b = barplot(pop)
text(x=b, y=pop-4, pop)

sample.1 = rep(NA, 5000)
sample.2 = rep(NA, 5000)

size = 25

for (i in 1:5000) {
  s = sample(population, size)
  sample.1[i] = sd(s)
  sample.2[i] = sd.population(s)

true.pop = sd.population(population) = mean(sample.2)
center = mean(sample.1)

c(true.pop, center,
c(abs(true.pop - center), abs(true.pop -

x = seq(0, 3, 0.1)

hist(sample.1, col=adjustcolor('blue', 1/4), breaks=35,
     probability=T, xlim=c(0.8, 1.9),
     main='Sampling Distributions of STD functions',
     xlab='Estimated Value')
abline(v=center, col='blue')
xspline(x=x, y=dnorm(x, mean=center, sd=sd(sample.1)), 
        lwd=1, shape=1, lty=2, border="blue")

hist(sample.2, col=adjustcolor('red', 1/4), probability=T,
     breaks=35, add=T)
abline(, col='red')
xspline(x=x, y=dnorm(x,, sd=sd(sample.2)), 
        lwd=1, shape=1, lty=2, border="red")