ML Wiki
Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Confidence Intervals

In Inferential Statistics we estimate a parameter of the population based on a sample

  • Point Estimate is just one single plausible value
  • it’s a good idea to expand it a bit and build a confidence interval around the point estimate
  • and use Standard Error as a measure of uncertainty in the Point Estimate to find this interval
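A minimal R sketch of these three steps, assuming we estimate a mean and that the 1.96 Normal percentile is appropriate. The data are simulated purely for illustration, and the names `point.estimate`, `se` and `ci` are not from the original text:

```r
# Hypothetical data: 40 draws from a Normal population (for illustration only)
set.seed(1)
x = rnorm(40, mean=10, sd=2)

n = length(x)
point.estimate = mean(x)          # single plausible value for the mean
se = sd(x) / sqrt(n)              # standard error of the sample mean

# approximate 95% CI: point estimate plus/minus 1.96 standard errors
ci = c(point.estimate - 1.96 * se, point.estimate + 1.96 * se)
ci
```

The interval is symmetric around the point estimate, and its width is driven entirely by the Standard Error.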

Main idea - the CI should include the real parameter

Confidence Level

The degree of confidence with which we expect the interval to span the true parameter is called the “confidence level”

  • e.g. a 95% confidence interval is constructed so that it contains the true parameter in about 95% of repeated samples - i.e. in about 1 case out of 20 it will miss the real parameter

The idea of Sampling Distribution is important here

  • we use it to calculate percentiles of the possible values, as if the sampling distribution were centered at our point estimate
  • so the CI should span the true value

Example

  • we want to estimate the mean
  • suppose we happen to know the sampling distribution: it’s $N(\mu = 10, \sigma = 3.3)$
    • it’s centered around the population mean $\mu$
    • and the Standard Error is 3.3
  • we draw a Point Estimate from the sampling distribution
    • we get $\bar{X} = 5.5$
  • Assuming that the sampling distribution is centered around 5.5, we compute the 95% CI
    • the $z$-value is 1.96, so the interval is $(-0.97, 11.97)$
  • it includes the true value $\mu=10$
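The arithmetic behind this interval, written out with the numbers from the example (point estimate 5.5, Standard Error 3.3, $z = 1.96$):

$$
\bar{X} \pm z \cdot SE = 5.5 \pm 1.96 \times 3.3 = 5.5 \pm 6.468 \approx (-0.97,\ 11.97)
$$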

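Figure: the true sampling distribution centered at $\mu = 10$ (dashed) and the same distribution re-centered at the observed $\bar{X} = 5.5$ (red), with the 95% CI shaded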

R code

```r
x = seq(-10, 25, 0.3)
m = 10
se = 3.3

# true sampling distribution (dashed), centered at the true mean
plot(x, dnorm(x, mean=m, sd=se), type='l', bty='n', lty=2, ylab='')
abline(v=m, lty=2)

# the same distribution re-centered at the observed point estimate (red)
m.observed = 5.5
abline(v=m.observed, col='red')
dy = dnorm(x, mean=m.observed, sd=se)
lines(x, y=dy, col='red')

# 95% CI around the point estimate
lo = m.observed - 1.96 * se
hi = m.observed + 1.96 * se
c(lo, hi)

# shade the area under the red curve between the CI bounds
x1 = min(which(x >= lo)); x2 = max(which(x <= hi))
polygon(x[c(x1, x1:x2, x2)], c(0, dy[x1:x2], 0), col=adjustcolor('red', 0.4), border=NA)

# labels and the arrow marking the interval
par(xpd=NA)
text(m, 0.13, m)
text(m.observed, 0.13, m.observed)
arrows(x0=lo, y0=0.02, x1=hi, y1=0.02, code=3, length=0.15)
text(m.observed, 0.02-0.005, 'confidence interval', cex=0.7)
par(xpd=FALSE)
```

A confidence interval consists of two parts

  • left part - “lower bound”
  • right part - “upper bound”

“95% confident” means that if we took many samples from the population and built a CI from each, then about 95% of these CIs would contain the actual parameter being estimated (e.g. $p$ for a binomial proportion, $\mu$ for a mean)

Figure: 95% CIs computed from 51 samples of size 50, plotted against the population mean

So we see that sometimes the CI indeed doesn’t include the true value, but we’re 95% confident that a CI calculated from any one sample will include it

R code to produce the figure

```r
# the loaded workspace provides the ames data; plot_ci() is expected to come from it as well
load(url('http://s3.amazonaws.com/assets.datacamp.com/course/dasi/ames.RData'))
population = ames$Gr.Liv.Area

set.seed(1237)
n = 50
# 51 samples of size n, one per column
sampl = replicate(51, sample(population, n))

sampl.sd = apply(sampl, MARGIN=2, sd)
sampl.m = apply(sampl, MARGIN=2, mean)

# margin of error of each sample at the 95% level
me = 1.96 * sampl.sd / sqrt(n)
plot_ci(sampl.m - me, sampl.m + me, mean(population))
```
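The repeated-sampling reading of “95% confident” can also be checked without external data. The sketch below assumes a Normal population with a known mean (the values of `mu`, `sigma` and `n.sim` are made up for illustration) and counts how often the interval covers that mean:

```r
# Assumption: a Normal population with known mean, so coverage can be checked
set.seed(42)
mu = 10; sigma = 3; n = 50
n.sim = 1000

covers = replicate(n.sim, {
  s = rnorm(n, mean=mu, sd=sigma)
  me = 1.96 * sd(s) / sqrt(n)          # margin of error for this sample
  (mean(s) - me <= mu) && (mu <= mean(s) + me)
})

mean(covers)   # fraction of intervals that contain mu; should be close to 0.95
```

Any single interval either contains $\mu$ or it doesn’t; the 95% describes the long-run behaviour of the procedure.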

Margin Of Error

If the Sampling Distribution is symmetric (e.g. Normal Distribution or t-Distribution) we can calculate the CI bounds by subtracting and adding the “margin of error” to the point estimate

  • the margin of error is typically a percentile ($z$ or $t$ score) multiplied by the Standard Error
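A small R sketch of the margin of error for a mean, assuming the $t$ score is the right percentile to use. The sample values are made up for illustration, and `t.star` and `me` are names chosen here:

```r
# Hypothetical small sample (values made up for illustration)
x = c(4.1, 5.3, 6.0, 5.5, 4.8, 6.2, 5.1, 5.9)
n = length(x)

se = sd(x) / sqrt(n)                  # standard error of the mean
t.star = qt(0.975, df=n - 1)          # 97.5th percentile of t with n-1 df
me = t.star * se                      # margin of error

c(mean(x) - me, mean(x) + me)         # 95% CI
```

The same bounds should come out of `t.test(x)$conf.int`, which is a convenient way to check the hand calculation.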

Critical Value

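Critical Value is the percentile ($z$ or $t$ score) that corresponds to the chosen confidence level

  • for $\alpha = 0.05$ (0.025 in each tail) the CI is 95%
  • for $\alpha = 0.10$ (0.05 in each tail) the CI is 90%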
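A short sketch of how critical values follow from the confidence level, assuming a two-sided interval where $\alpha$ is the total area left outside the interval and the Normal distribution is used (the names `conf.level` and `z.star` are chosen here):

```r
conf.level = c(0.90, 0.95, 0.99)
alpha = 1 - conf.level

# two-sided interval: alpha/2 is left in each tail
z.star = qnorm(1 - alpha / 2)
round(z.star, 3)   # 1.645 1.960 2.576
```

A higher confidence level gives a larger critical value and therefore a wider interval.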

Types

Main types:

Statistical Simulation

It’s not always possible to calculate everything with traditional methods
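As an illustration, here is a sketch of one common simulation approach, the percentile bootstrap. The text above doesn’t name a specific method, so this choice is an assumption, and the data are simulated. It builds a CI for the median, for which there is no simple standard-error formula:

```r
# Assumption: the bootstrap is used as the simulation method (not named in the text)
set.seed(7)
x = rexp(60, rate=1/5)        # hypothetical skewed sample

# resample with replacement and recompute the statistic each time
boot.medians = replicate(2000, median(sample(x, replace=TRUE)))

# percentile 95% CI for the median
ci = quantile(boot.medians, c(0.025, 0.975))
ci
```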

Extra Stuff

Robustness

A method for constructing CIs is “robust” if

  • the resulting CIs include the theoretical parameter in approximately the percentage of cases claimed by the confidence level
  • even if not all necessary conditions for the CIs are satisfied

Methods based on the $t$-distribution are quite robust: they work well for the Normal Distribution and hold up reasonably well for skewed distributions
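Robustness can be checked empirically. The sketch below assumes an exponential population with rate 1 (true mean 1) as the skewed case and $n = 30$; both are arbitrary choices for illustration. It measures how often the $t$-based 95% CI covers the true mean:

```r
# Assumption: Exp(1) as the skewed population (true mean = 1), n = 30
set.seed(123)
n = 30; true.mean = 1

covers = replicate(2000, {
  s = rexp(n, rate=1)
  ci = t.test(s, conf.level=0.95)$conf.int
  ci[1] <= true.mean && true.mean <= ci[2]
})

mean(covers)   # observed coverage, to compare with the nominal 0.95
```

The closer the observed coverage is to 0.95, the more robust the method is for this population and sample size.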

Relationship with Hypothesis Testing

Additional Resources

See Also

Sources