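In inferential statistics we estimate a population parameter based on a sample; a confidence interval (CI) gives a range of plausible values for that parameter.
Main idea: the CI should include the true value of the parameter.
The confidence level (e.g. 95%) is the degree of confidence that the interval spans the true parameter.
The concept of the sampling distribution is central here: it describes how the estimate varies from sample to sample, and its spread (the standard error, SE) determines the width of the CI.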
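As a sketch of this idea, the R code below simulates the sampling distribution of the sample mean for an assumed normal population (mean 10, sd 15, sample size 25; all values are illustrative, not from the notes) and compares the spread of the simulated means with the theoretical standard error $\sigma/\sqrt{n}$.

```r
# Assumed population: normal with mean mu and sd sigma (illustrative values)
set.seed(42)
mu = 10
sigma = 15
n = 25
# Draw 10000 samples of size n and keep each sample mean
sample.means = replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
# The SD of the sample means is the standard error; theory says sigma / sqrt(n) = 3
se.theory = sigma / sqrt(n)
se.sim = sd(sample.means)
c(theory = se.theory, simulated = se.sim)
hist(sample.means, breaks = 50, main = 'Sampling distribution of the mean', xlab = 'sample mean')
```

The simulated standard error should land close to 3, and the histogram is centred near the true mean of 10.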
Example
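# Sampling distribution of the estimate: centred on the true mean m, with standard error se
x = seq(-10, 25, 0.3)
m = 10
se = 3.3
plot(x, dnorm(x, mean=m, sd=se), type='l', bty='n', lty=2, ylab='')
abline(v=m, lty=2)
# The mean observed in our single sample, and a curve of the same shape centred on it
m.observed = 5.5
abline(v=m.observed, col='red')
dy = dnorm(x, mean=m.observed, sd=se)
lines(x, y=dy, col='red')
# 95% CI: observed mean +/- 1.96 SE, about (-0.968, 11.968), which contains m = 10
lo = m.observed - 1.96 * se
hi = m.observed + 1.96 * se
c(lo, hi)
# Shade the part of the red curve that lies inside the CI
x1 = min(which(x >= lo)); x2 = max(which(x <= hi))
polygon(x[c(x1, x1:x2, x2)], c(0, dy[x1:x2], 0), col=adjustcolor('red', 0.4), border=NA)
# Labels and a double-headed arrow marking the interval (xpd=NA allows drawing outside the plot region)
par(xpd=NA)
text(m, 0.13, m)
text(m.observed, 0.13, m.observed)
arrows(x0=lo, y0=0.02, x1=hi, y1=0.02, code=3, length=0.15)
text(m.observed, 0.02-0.005, 'confidence interval', cex=0.7)
par(xpd=FALSE)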
A confidence interval consists of two parts: a point estimate (e.g. the sample mean $\bar x$) and a margin of error ($ME$).
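A minimal sketch of the two parts in R, assuming a CI for a mean built with the normal critical value and $SE = s/\sqrt{n}$. The helper `mean_ci` is hypothetical (not from the notes) and the data are simulated for illustration.

```r
# Hypothetical helper: CI for a mean as point estimate +/- margin of error
mean_ci = function(x, conf = 0.95) {
  n = length(x)
  point.estimate = mean(x)
  se = sd(x) / sqrt(n)                   # estimated standard error
  z.star = qnorm(1 - (1 - conf) / 2)     # about 1.96 for conf = 0.95
  me = z.star * se                       # margin of error
  c(lower = point.estimate - me, estimate = point.estimate, upper = point.estimate + me)
}
# Usage on simulated data (illustrative only)
set.seed(1)
x = rnorm(50, mean = 100, sd = 20)
mean_ci(x)
```

The interval is symmetric around the point estimate by construction, because the same margin of error is subtracted and added.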
"95% confident" means that if we took many samples from the population and built a CI from each, then about 95% of these CIs would contain the true parameter being estimated (e.g. the proportion $p$ for a binomial, the mean $\mu$ for a mean).
The simulation below shows that some intervals do miss the true value. Being 95% confident in a CI computed from one sample means that it was produced by a procedure that captures the true parameter about 95% of the time.
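# Ames housing data from the DataCamp "dasi" course; the population is the above-ground living area
load(url('http://s3.amazonaws.com/assets.datacamp.com/course/dasi/ames.RData'))
population = ames$Gr.Liv.Area
set.seed(1237)
n = 50
# 51 samples of size n, one per column
sampl = replicate(51, sample(population, n))
sampl.sd = apply(sampl, MARGIN=2, sd)
sampl.m = apply(sampl, MARGIN=2, mean)
# Margin of error for a 95% CI: 1.96 * SE, with SE estimated as s / sqrt(n)
me = 1.96 * sampl.sd / sqrt(n)
# plot_ci() is not base R; it is assumed to come from the course materials
# and to draw each interval against the true population mean
plot_ci(sampl.m - me, sampl.m + me, mean(population))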
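To put a number on the long-run success rate, the sketch below repeats the procedure 10000 times on an assumed normal population with a known mean (values chosen for illustration) and counts how often the 95% interval contains it. It uses the $t$ critical value `qt(0.975, n - 1)` rather than 1.96, an assumption made so that the nominal 95% holds exactly for normal data when the SD is estimated from the sample.

```r
set.seed(2024)
mu = 50        # true mean of the assumed population
sigma = 10
n = 30
reps = 10000
t.star = qt(0.975, df = n - 1)
covered = replicate(reps, {
  x = rnorm(n, mean = mu, sd = sigma)
  me = t.star * sd(x) / sqrt(n)
  (mean(x) - me <= mu) && (mu <= mean(x) + me)   # does this CI contain mu?
})
coverage = mean(covered)
coverage   # close to 0.95
```

Roughly 5% of the simulated intervals miss `mu`, which is exactly what "95% confident" allows.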
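If the sampling distribution is symmetric (e.g. a normal or $t$-distribution), the CI bounds are obtained by subtracting the margin of error from, and adding it to, the point estimate: point estimate $\pm ME$.
The critical value reflects the confidence level: a higher level requires a larger critical value (e.g. $z^* = 1.96$ for 95% with the normal distribution) and therefore a wider interval, since $ME = z^* \cdot SE$.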
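A short sketch of how the critical value grows with the confidence level, for the normal distribution and, as an assumed comparison, a $t$-distribution with 9 degrees of freedom (a sample of size 10). The last lines turn the critical values into margins of error using the SE of 3.3 from the plot above.

```r
conf = c(0.90, 0.95, 0.99)
alpha = 1 - conf
z.star = qnorm(1 - alpha / 2)        # normal: about 1.645, 1.960, 2.576
t.star = qt(1 - alpha / 2, df = 9)   # t with 9 df: about 1.833, 2.262, 3.250
round(rbind(conf, z.star, t.star), 3)
# Margin of error ME = critical value * SE, with SE = 3.3
me = z.star * 3.3                    # the 95% entry is about 6.468
me
```

The $t$ critical values are larger than the normal ones for the same level, so intervals based on small samples are wider.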
Main types:
It is not always possible to compute a CI with traditional (formula-based) methods.
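One widely used resampling alternative is the percentile bootstrap; it is an assumption here, since the notes do not name a specific method. The sketch below builds a 95% CI for the median of a simulated skewed sample (exponential data, 10000 resamples; all values illustrative), a statistic with no simple textbook SE formula.

```r
set.seed(7)
x = rexp(40, rate = 1 / 20)   # simulated right-skewed sample
# Resample the data with replacement and recompute the median each time
boot.medians = replicate(10000, median(sample(x, replace = TRUE)))
# Percentile method: the middle 95% of the bootstrap medians
ci = quantile(boot.medians, probs = c(0.025, 0.975))
c(estimate = median(x), ci)
```

Unlike the formula-based intervals above, this interval need not be symmetric around the estimate.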
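A method for constructing CIs is robust if its actual coverage stays close to the stated confidence level even when its assumptions (e.g. normality of the data) are not fully met.
The $t$-interval is fairly robust: it works well for normally distributed data and, given a moderate sample size, also for moderately skewed distributions.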
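A sketch that checks this claim by simulation, assuming an exponential population with mean 1 as a standard right-skewed example. The helper `t_coverage` is hypothetical; it estimates how often the 95% $t$-interval contains the true mean for a given sample size.

```r
set.seed(11)
t_coverage = function(n, reps = 5000, mu = 1) {
  t.star = qt(0.975, df = n - 1)
  hits = replicate(reps, {
    x = rexp(n, rate = 1 / mu)      # skewed sample with true mean mu
    me = t.star * sd(x) / sqrt(n)
    abs(mean(x) - mu) <= me         # TRUE if the interval contains mu
  })
  mean(hits)
}
sample.sizes = c(5, 15, 50, 200)
coverage.by.n = sapply(sample.sizes, t_coverage)
names(coverage.by.n) = sample.sizes
coverage.by.n
```

For very small samples the coverage falls noticeably below 95%, and it approaches the nominal level as `n` grows, which is why robustness to skew is usually stated together with a sample-size condition.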