Central Limit Theorem
The Central Limit Theorem (C.L.T.) explains why the Normal Distribution is so widespread
- when the values of a Random Variable are the sum of a large number of independent Random Variables with finite Variances
- then the Distribution of this RV is approximately Normal
Experiments
Sampling Distribution
The C.L.T. allows us to assume that the Sampling Distribution of the Mean approaches the Normal Distribution as the sample size grows
- we want to show this experimentally for the Sampling Distribution of the Mean
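For reference, the claim can be written out as a short worked equation. This is a sketch that assumes the sample consists of $n$ independent draws $X_1, …, X_n$ from a single distribution with mean $\mu$ and finite variance $\sigma^2$ (these symbols are introduced here for illustration and are not part of the original notes):

$$\bar{X}_n = \frac{1}{n} \sum_{i = 1}^{n} X_i, \qquad \mathbb{E}[\bar{X}_n] = \mu, \qquad \text{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$

so for large $n$ the mean $\bar{X}_n$ is approximately distributed as $N(\mu, \sigma^2 / n)$. For example, for the Exponential Distribution with rate 1 we have $\mu = 1$ and $\sigma = 1$, so with $n = 100$ the sample mean has standard deviation $1 / \sqrt{100} = 0.1$.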
Assume we want to sample from 3 distributions:

- Uniform Distribution (blue line)
- Lognormal Distribution (orange line)
- Exponential Distribution (red line)
These three distributions have different degrees of skewness
Uniform:
Lognormal:
Exponential:
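The different degrees of skewness of the three distributions can also be put into numbers. The sketch below is an illustration only: the helper `skewness`, the seed and the sample size are assumptions made here, not part of the original experiment. It compares the sample skewness with the known theoretical values, using the same distribution parameters as the experiment.

```r
# sample skewness: third central moment divided by the cubed standard deviation
# (hypothetical helper, defined here for illustration)
skewness = function(x) {
  d = x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(1)   # seed chosen only for reproducibility
n = 100000    # sample size chosen only for illustration

skew.sample = c(
  uniform     = skewness(runif(n, min=0, max=4)),
  lognormal   = skewness(rlnorm(n, meanlog=0.1, sdlog=0.5)),
  exponential = skewness(rexp(n))
)

# theoretical skewness of each distribution
s2 = 0.5^2   # sdlog^2 of the lognormal distribution
skew.theory = c(
  uniform     = 0,
  lognormal   = (exp(s2) + 2) * sqrt(exp(s2) - 1),
  exponential = 2
)

print(round(rbind(sample=skew.sample, theory=skew.theory), 2))
```

The uniform distribution is symmetric (skewness 0), the lognormal with `sdlog=0.5` is moderately skewed (about 1.75), and the exponential is the most skewed of the three (skewness 2).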
R code of the experiment
```
# save only the settable graphical parameters so they can be restored at the end
default.par = par(no.readonly=TRUE)
set.seed(18213)

# densities of the three distributions from which we sample
x = seq(-0.1, 4.1, 0.1)
yn = dlnorm(x, meanlog=0.1, sdlog=0.5)
yu = dunif(x, min=0, max=4)
ye = dexp(x)

plot(x, yn, type='l', ylim=c(0, 1), col="orange", lwd=2,
     main='the distributions from which we sample')
lines(x, yu, col="blue", lwd=2)
lines(x, ye, col="red", lwd=2)

m = 3000  # number of samples drawn for each sample size

# draw m samples with FUN and plot the distribution of their means:
# histogram with a fitted normal curve (blue) and a kernel density (red), plus a normal Q-Q plot
generate = function(m, FUN, main, xlim, ylim, breaks=13) {
  sd.x = replicate(m, mean(FUN()))  # sampling distribution of the mean

  par(mfcol=c(1,2))
  hist(sd.x, breaks=breaks, probability=TRUE, main='', xlim=xlim, ylim=ylim)

  x = seq(min(sd.x), max(sd.x), 0.01)
  y = dnorm(x=x, mean=mean(sd.x), sd=sd(sd.x))
  lines(x=x, y=y, col="blue", lwd=2)

  dens = density(sd.x, adjust=2)
  lines(dens, col="red", lwd=2)

  qqnorm(sd.x, col="orange", pch=19, main='')
  qqline(sd.x, lwd=2)

  mtext(main, side=3, outer=TRUE, line=-3)
  par(mfcol=c(1,1))
}

# each gen.* returns a function that draws one sample of size n
gen.uniform = function(n) {
  function() { runif(n, min=0, max=4) }
}
gen.lnorm = function(n) {
  function() { rlnorm(n, meanlog=0.1, sdlog=0.5) }
}
gen.exp = function(n) {
  function() { rexp(n) }
}

library(animation)

# one GIF per distribution, one frame per sample size;
# the file names are chosen here so that the GIFs do not overwrite each other
n.vec = c(1:20, 50)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.uniform(n), xlim=c(0,4), ylim=c(0, 1.4),
             paste('Uniform Distribution, sample size = ', n))
  }
}, movie.name='uniform.gif', interval=0.3)

n.vec = c(1:40, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.lnorm(n), xlim=c(0,3), ylim=c(0, 1.8),
             paste('Lognormal Distribution, sample size = ', n))
  }
}, movie.name='lognormal.gif', interval=0.3)

n.vec = c(1:50, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.exp(n), xlim=c(0,3), ylim=c(0, 1.8),
             paste('Exponential Distribution, sample size = ', n))
  }
}, movie.name='exponential.gif', interval=0.3)

# static plots for the largest sample size (n is 100 after the last loop)
n = 100
generate(m, gen.uniform(n), xlim=c(1.5,2.5), ylim=c(0, 4),
         paste('Uniform Distribution, sample size = ', n))
generate(m, gen.lnorm(n), xlim=c(1,1.5), ylim=c(0, 6),
         paste('Lognormal Distribution, sample size = ', n))
generate(m, gen.exp(n), xlim=c(0.5,1.5), ylim=c(0, 4),
         paste('Exponential Distribution, sample size = ', n))

par(default.par)
```
- this is taken from OpenIntro, figure 4.20
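The approach to normality can also be checked without plots. A normal distribution has skewness 0, and for the mean of $n$ independent draws the skewness shrinks as the skewness of a single draw divided by $\sqrt{n}$. The sketch below is an illustration only (the helper `skewness` and the chosen sample sizes are assumptions made here, not part of the OpenIntro experiment). It measures this for the Exponential Distribution, whose skewness is 2.

```r
set.seed(18213)
m = 3000   # number of samples per sample size, as in the experiment

# sample skewness (hypothetical helper, defined here for illustration)
skewness = function(x) {
  d = x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

n.vec = c(1, 5, 20, 100)   # sample sizes chosen only for illustration

# skewness of the simulated sampling distribution of the mean for each n
skew.means = sapply(n.vec, function(n) skewness(replicate(m, mean(rexp(n)))))

result = data.frame(n=n.vec,
                    observed=round(skew.means, 2),
                    theory=round(2 / sqrt(n.vec), 2))
print(result)
```

The observed values are themselves random, so they only roughly follow the `theory` column, but they should decrease towards 0 as `n` grows.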
Theorem (Lyapunov)
If a random variable $X$ represents the sum of a very large number of mutually independent random variables, each of which has a negligible influence on the entire sum, then $X$ has a distribution close to normal.
TODO: proof
Application
Let $X_i$ be a sequence of independent random variables, each having an expected value and variance:
$\mathbb{E}[X_i] = a_i, \text{Var}(X_i) = b_i^2$
- Introduce the notation $S_n = X_1 + … + X_n$, $A_n = \sum_{i = 1}^{n} a_i$, $B_n^2 = \sum_{i = 1}^{n} b_i^2$
- Then $F_n(x) = P\left(\frac{S_n - A_n}{B_n} < x\right)$ is the distribution function of the normalized sum
The central limit theorem is applicable to the sequence $X_i$ if, for every $x$,
$\lim_{n \rightarrow \infty} P\left(\frac{S_n - A_n}{B_n} < x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2} dz$
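The limit above can be illustrated numerically for variables that are independent but not identically distributed. The sketch below is an assumption-laden example, not part of the original notes: it takes $X_i$ uniform on $(0, c_i)$ with widths $c_i$ cycling through 1, 2, 3, so that $a_i = c_i / 2$ and $b_i^2 = c_i^2 / 12$, and compares the empirical distribution function of the normalized sum with the standard normal one (`pnorm`). The seed, `n`, `m` and the evaluation points are arbitrary choices.

```r
set.seed(1)   # seed chosen only for reproducibility
n = 200       # number of summands X_1, ..., X_n
m = 5000      # number of simulated sums

# X_i ~ Uniform(0, c_i), widths cycling through 1, 2, 3 (illustrative choice)
c.vec = rep(c(1, 2, 3), length.out=n)
A.n = sum(c.vec / 2)            # A_n: sum of the expected values a_i
B.n = sqrt(sum(c.vec^2 / 12))   # B_n: square root of the sum of the variances b_i^2

# m realizations of the normalized sum (S_n - A_n) / B_n
z = replicate(m, (sum(runif(n, min=0, max=c.vec)) - A.n) / B.n)

x = c(-2, -1, 0, 1, 2)
F.n = sapply(x, function(t) mean(z < t))   # empirical estimate of F_n(x)

print(round(rbind(x=x, empirical=F.n, normal=pnorm(x)), 3))
```

The `empirical` and `normal` rows should agree up to simulation noise, which is of the order of $1 / \sqrt{m}$.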





