# ML Wiki

## Biased Estimators

A Point Estimate is biased if

• the Sampling Distribution of some parameter being estimated is not centered around the true parameter value
• otherwise a Point Estimate is unbiased

Bias of an estimate is the expected difference between the estimated value and the true value

## Unbiased Estimation

A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated

### Proportion

In our coin flipping example

For the entire experiment:

• 10 coin flips = 10 Bernoulli experiments with outcomes $X_1, ..., X_{10}$
• so, $\hat{p} = \cfrac{X_1 + ... + X_{10}}{10} = \bar{X}$
• thus, $E(\hat{p}) = p$ since $E(X_i) = p$ and $E(\bar{X}) = \cfrac{10 p}{10} = p$
• and $\hat{p}$ is called unbiased estimator

## Biased Estimation

### Standard Deviation

Standard Deviation is biased estimate of the true standard deviation of the proportion

• so we typically use the sample standard deviation, which is
• $s = \cfrac{1}{n-1} \sum_{i=1}^n x_i$

Can simulate it to see that it's true

• suppose that we have the following population
• • we sample with sample size 25 many times (e.g. 5000)
• each time calculate biased std as well as corrected std
• then plot the sampling distributions
• • we see that the corrected std is closer to the real population std
• note that the real population std should not be corrected!

R simulation
sd.population = function(x) {
n = length(x)
m = mean(x)
sqrt(sum((x - m) ^ 2) / n)
}

population = unlist(sapply(X=1:7, FUN=function(x) { rep(x, choose(8, x)) }))
pop = table(population)
b = barplot(pop)
text(x=b, y=pop-4, pop)

set.seed(1231)
sample.1 = rep(NA, 5000)
sample.2 = rep(NA, 5000)

size = 25

for (i in 1:5000) {
s = sample(population, size)
sample.1[i] = sd(s)
sample.2[i] = sd.population(s)
}

true.pop = sd.population(population)
biased.center = mean(sample.2)
center = mean(sample.1)

c(true.pop, center, biased.center)
c(abs(true.pop - center), abs(true.pop - biased.center))

x = seq(0, 3, 0.1)

probability=T, xlim=c(0.8, 1.9),
main='Sampling Distributions of STD functions',
xlab='Estimated Value')
abline(v=center, col='blue')
xspline(x=x, y=dnorm(x, mean=center, sd=sd(sample.1)),
lwd=1, shape=1, lty=2, border="blue")