We want to build a confidence interval (CI) around a point estimate of the population mean
With a sufficiently large sample and no violations of the assumptions, we can use the Normal distribution to model the sampling distribution of the mean
The Normal approximation is crucial here because we use the Normal distribution to find the percentiles that bound the interval
If these conditions are met, we can use the Normal model to find the confidence intervals
Suppose we have a data set that contains data about the whole population
Suppose we take 10k samples of size 50 from it and compute the mean of each
load(url('http://s3.amazonaws.com/assets.datacamp.com/course/dasi/ames.RData'))
population = ames$Gr.Liv.Area

oldpar = par(no.readonly=TRUE)
# fig=c(x1, x2, y1, y2)
par(fig=c(0, 1, 0, 1))
par(mar=c(6, 2, 2, 1))
h = hist(population, col='grey', probability=T, axes=F, xlab='', main='Histogram')
dens = density(population, adjust=2)
lines(dens, col="black", lwd=2)
axis(side=1, pos=c(-max(h$density)/4,0))
axis(side=2)
par(fig=c(0, 1, 0.16, 0.41), new=TRUE)
par(mar=c(0, 2, 0, 1))
boxplot(population, horizontal=TRUE, axes=F, col='grey')
par(oldpar)

set.seed(1237)
n = 50
samp.mean = replicate(10000, mean(sample(population, n)))

plot(x=NA, y=NA, xlim=c(1250, 1750), ylim=c(0, 0.006), axes=F,
     xlab='Estimate of mean', ylab='Frequency', main='Sampling Distribution of mean')
m = mean(samp.mean)
s = sd(samp.mean)
par(xpd=FALSE)
rect(xleft=m-3*s, xright=m+3*s, ybottom=-1, ytop=1, border=NA, col=adjustcolor('blue', 0.1))
rect(xleft=m-2*s, xright=m+2*s, ybottom=-1, ytop=1, border=NA, col=adjustcolor('blue', 0.1))
rect(xleft=m-s, xright=m+s, ybottom=-1, ytop=1, border=NA, col=adjustcolor('blue', 0.1))
hist(samp.mean, probability=T, add=T, breaks=50, col='white')
axis(side = 1)
dens = dnorm(1200:1800, mean=m, sd=s)
lines(1200:1800, dens, col="blue", lwd=2)

qqnorm(samp.mean, col=adjustcolor('orange', 0.1))
qqline(samp.mean)
In this case all the assumptions hold, so we can use the Normal approximation to calculate the confidence intervals
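One way to check that the Normal approximation gives sensible intervals is to measure how often a 95% interval actually covers the true mean. The sketch below does this on a synthetic right-skewed population; `population.sim` and its parameters are made-up stand-ins, not the data set used above, and the population standard deviation is treated as known.

```r
set.seed(42)

# Hypothetical stand-in population (right-skewed), not the original data
population.sim = rexp(20000, rate=1/1500)
mu = mean(population.sim)      # true population mean
sigma = sd(population.sim)     # treated as the known population sd

n = 50
z = qnorm(0.975)               # critical value for a 95% interval

# For each simulated sample, record whether its interval covers mu
covers = replicate(5000, {
  xbar = mean(sample(population.sim, n))
  ME = z * sigma / sqrt(n)
  (xbar - ME <= mu) && (mu <= xbar + ME)
})

coverage = mean(covers)        # should be close to 0.95
coverage
```

If the approximation is adequate, `coverage` should land near the nominal 0.95; a value far below it would suggest that the sample size is too small for this population.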
So the 95% CI with $z$-score is $\bar{x} \pm 1.96 \cdot \sigma / \sqrt{n}$
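A minimal sketch of the $z$-based interval in R, assuming the population standard deviation is known. The sample `x` and the value of `sigma` are illustrative assumptions, not values from these notes.

```r
set.seed(1)

sigma = 500                          # assumed known population sd (hypothetical)
x = rnorm(50, mean=1500, sd=sigma)   # hypothetical sample

n = length(x)
xbar = mean(x)
z = qnorm(0.975)                     # about 1.96 for a 95% interval
ME = z * sigma / sqrt(n)             # margin of error
ci.z = xbar + c(-ME, ME)
ci.z
```

The interval is symmetric around `xbar`, and its width depends only on `sigma`, `n` and the chosen confidence level.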
To compute this CI we need to know $\sigma^2$, but it is a population parameter, so we have to estimate it from the sample with the sample variance $s^2$
To use the Normal approximation we also need a sufficiently large sample
$t$-distribution
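Use the $t$-distribution rather than the Normal distribution when $\sigma$ is unknown and is estimated by the sample standard deviation $s$, especially if the sample is small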
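Shape of $t$ vs Normal: both are bell-shaped and centered at zero, but $t$ has heavier tails, and it approaches the Normal as the degrees of freedom grow
Thus the critical value $t^*_{n-1}$ is larger than the corresponding $z^*$, and the interval is wider
95% CI becomes $\bar{x} \pm t^*_{n-1} \cdot s / \sqrt{n}$, where $t^*_{n-1}$ is the 97.5th percentile of the $t$-distribution with $n - 1$ degrees of freedom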
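To see how the $t$ critical values compare with the Normal one, the following sketch tabulates the 97.5th percentiles for a few degrees of freedom. The particular `df` values are arbitrary choices for illustration.

```r
df = c(5, 10, 30, 100, 1000)     # illustrative degrees of freedom
z.crit = qnorm(0.975)            # Normal critical value for a 95% interval
t.crit = qt(0.975, df=df)        # t critical values, one per df

# Each t critical value exceeds z.crit, and the gap shrinks as df grows
round(rbind(df=df, t=t.crit, z=z.crit), 3)
```

The $t$ values are always larger than the Normal one, so $t$-based intervals are wider, and the difference becomes negligible for large samples.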
In R:
n = 60
xbar = mean(d)
v = var(d)
t = qt(0.025, df=n-1, lower.tail=F)
ME = t * sqrt(v / n)
xbar + c(-ME, ME)
or:
t.test(d, conf.level=0.95)$conf.int
The last chunk actually runs a $t$-test and returns its confidence interval
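As a sanity check, the manual computation and `t.test` should agree. The sketch below wraps the manual steps in a helper and compares the two on simulated data; `t.ci` is a hypothetical helper name and `d` is a made-up sample, neither comes from the original notes.

```r
# Hypothetical helper: t-based CI for the mean of a numeric vector
t.ci = function(d, level=0.95) {
  n = length(d)
  alpha = 1 - level
  t.star = qt(alpha / 2, df=n-1, lower.tail=FALSE)  # upper critical value
  ME = t.star * sd(d) / sqrt(n)                     # margin of error
  mean(d) + c(-ME, ME)
}

set.seed(7)
d = rnorm(60, mean=20, sd=3)     # made-up sample of size 60

manual = t.ci(d)
builtin = as.numeric(t.test(d, conf.level=0.95)$conf.int)
all.equal(manual, builtin)
```

Note that in `t.test` the confidence level is passed as `conf.level`, and the interval is stored in the `conf.int` component of the result.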
Example 1 (99% CI, $\bar{x} = 20$, $s^2 = 5$, $n = 50$):
xbar = 20
v = 5
n = 50
t = qt(0.005, df=n-1, lower.tail=F)
ME = t * sqrt(v / n)
xbar + c(-ME, ME)  # approximately [19.15, 20.85]
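For comparison, the same summary statistics can be plugged into both the Normal-based and the $t$-based formulas. This is only a sketch: it assumes the sample size is 50, takes the degrees of freedom as $n - 1$, and uses the 99% level implied by the 0.005 tail probability.

```r
xbar = 20
v = 5
n = 50                                            # assumed sample size

# Normal-based 99% interval
z = qnorm(0.005, lower.tail=FALSE)
ME.z = z * sqrt(v / n)
ci.z = xbar + c(-ME.z, ME.z)

# t-based 99% interval with n - 1 degrees of freedom
t.star = qt(0.005, df=n-1, lower.tail=FALSE)
ME.t = t.star * sqrt(v / n)
ci.t = xbar + c(-ME.t, ME.t)

round(rbind(z=ci.z, t=ci.t), 2)
```

The $t$-based interval comes out slightly wider, which reflects the extra uncertainty from estimating the variance from the sample.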
Normal-distribution-based CI for mean
t-based CI for mean