
Central Limit Theorem

The Central Limit Theorem (C.L.T.) explains why the Normal Distribution is so widespread.

Experiments

Sampling Distribution

The C.L.T. allows us to assume that sampling distributions approach the Normal Distribution as the sample size grows

  • we want to show this experimentally for the sampling distribution of the mean

Assume we want to sample from 3 distributions with different degrees of skewness:

Uniform:

  • (figures: density plot and sampling-distribution animation)

Lognormal:

  • (figures: density plot and sampling-distribution animation)

Exponential:

  • (figures: density plot and sampling-distribution animation)
R code of the experiment:

```r
default.par = par()
set.seed(18213)

# the three distributions from which we sample
x = seq(-0.1, 4.1, 0.1)
yn = dlnorm(x, meanlog=0.1, sdlog=0.5)
yu = dunif(x, min=0, max=4)
ye = dexp(x)

plot(x, yn, type='l', ylim=c(0, 1), col="orange", lwd=2,
     main='the distributions from which we sample')
lines(x, yu, col="blue", lwd=2)
lines(x, ye, col="red", lwd=2)

m = 3000  # number of simulated samples per sampling distribution

# draw m samples with FUN, plot the histogram of their means
# with a fitted normal curve, plus a normal Q-Q plot
generate = function(m, FUN, main, xlim, ylim, breaks=13) {
  sd.x = replicate(m, mean(FUN()))

  par(mfcol=c(1,2))
  hist(sd.x, breaks=breaks, prob=T, main='', xlim=xlim, ylim=ylim)

  x = seq(min(sd.x), max(sd.x), 0.01)
  y = dnorm(x=x, mean=mean(sd.x), sd=sd(sd.x))
  lines(x=x, y=y, col="blue", lwd=2)

  dens = density(sd.x, adjust=2)
  lines(dens, col="red", lwd=2)

  qqnorm(sd.x, col="orange", pch=19, main='')
  qqline(sd.x, lwd=2)

  mtext(main, side=3, outer=TRUE, line=-3)
  par(mfcol=c(1,1))
}

# samplers of size n for each distribution
gen.uniform = function(n) { function() { runif(n, min=0, max=4) } }
gen.lnorm = function(n) { function() { rlnorm(n, meanlog=0.1, sdlog=0.5) } }
gen.exp = function(n) { function() { rexp(n) } }

require(animation)

# animate how the sampling distribution changes as n grows
n.vec = c(1:20, 50)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.uniform(n), xlim=c(0,4), ylim=c(0, 1.4),
             main=paste('Uniform Distribution, sample size = ', n))
  }
}, interval=0.3)

n.vec = c(1:40, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.lnorm(n), xlim=c(0,3), ylim=c(0, 1.8),
             main=paste('Lognormal Distribution, sample size = ', n))
  }
}, interval=0.3)

n.vec = c(1:50, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.exp(n), xlim=c(0,3), ylim=c(0, 1.8),
             main=paste('Exponential Distribution, sample size = ', n))
  }
}, interval=0.3)

# zoomed-in views (n is still 100 from the last loop)
generate(m, gen.uniform(n), xlim=c(1.5,2.5), ylim=c(0, 4),
         main=paste('Uniform Distribution, sample size = ', n))
generate(m, gen.lnorm(n), xlim=c(1,1.5), ylim=c(0, 6),
         main=paste('Lognormal Distribution, sample size = ', n))
generate(m, gen.exp(n), xlim=c(0.5,1.5), ylim=c(0, 4),
         main=paste('Exponential Distribution, sample size = ', n))

par(default.par)
```
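The same effect can be checked numerically without any plotting. A minimal sketch (the seed, `m`, and `n` below are arbitrary choices, not part of the experiment above): for Exponential(rate = 1), the population mean and standard deviation are both 1, so by the C.L.T. the sample means should center on 1 and their spread should shrink like $\sigma/\sqrt{n}$.

```r
# Minimal numeric check of the C.L.T. (no plotting):
# Exponential(rate = 1) has population mean = 1 and sd = 1,
# so sample means should have mean ~ 1 and sd ~ 1/sqrt(n).
set.seed(42)   # arbitrary seed
m <- 5000      # number of simulated samples
n <- 50        # size of each sample

sample.means <- replicate(m, mean(rexp(n)))

mean(sample.means)   # close to 1
sd(sample.means)     # close to 1/sqrt(50)
```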


Theorem (Lyapunov)

If a random variable $X$ represents the sum of a very large number of mutually independent random variables, each of which has a negligible influence on the entire sum, then $X$ has a distribution close to normal.
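One standard way to make "negligible influence on the entire sum" precise is Lyapunov's condition (stated here for reference, with $a_i = \mathbb{E}[X_i]$, $B_n^2 = \sum_{i=1}^{n} \text{Var}(X_i)$, and $\delta$ any positive constant):

$\lim_{n \rightarrow \infty} \frac{1}{B_n^{2+\delta}} \sum_{i=1}^{n} \mathbb{E}\left[ |X_i - a_i|^{2+\delta} \right] = 0$

If this holds for some $\delta > 0$, the normalized sum converges in distribution to the standard Normal.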

TODO: proof

Application

Let $X_i$ be a sequence of independent random variables, each having an expected value and variance:

$\mathbb{E}[X_i] = a_i, \text{Var}(X_i) = b_i^2$

  • Introduce the notation $S_n = X_1 + \ldots + X_n$, $A_n = \sum_{i = 1}^{n} a_i$, $B_n^2 = \sum_{i = 1}^{n} b_i^2$
  • Then $F_n(x) = P\left(\frac{S_n - A_n}{B_n} < x\right)$ is the distribution function of the normalized sum

The central limit theorem applies to the sequence $X_i$ if

$\lim_{n \rightarrow \infty} P\left(\frac{S_n - A_n}{B_n} < x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2} \, dz$

i.e. the distribution of the normalized sum converges to the standard Normal distribution.
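This limit can be checked numerically. A minimal sketch (the choice of uniform summands, the seed, and the sizes below are all illustrative): draw independent but non-identically distributed $X_i$, form the normalized sum $(S_n - A_n)/B_n$ many times, and compare its empirical CDF with the standard Normal CDF.

```r
# Sketch: normalized sum of independent, non-identically distributed
# variables approaching the standard Normal.
set.seed(7)
n <- 200     # number of summands X_1, ..., X_n
m <- 4000    # number of simulated normalized sums

a <- 1:n / n               # X_i ~ Uniform(0, 2*a_i), so E[X_i] = a_i
b2 <- (2 * a)^2 / 12       # Var(X_i) for Uniform(0, 2*a_i)
A.n <- sum(a)              # A_n = sum of the means
B.n <- sqrt(sum(b2))       # B_n^2 = sum of the variances

normalized <- replicate(m, (sum(runif(n, min = 0, max = 2 * a)) - A.n) / B.n)

# empirical CDF vs. standard Normal CDF at a few points
sapply(c(-1, 0, 1), function(x) c(empirical = mean(normalized < x),
                                  normal = pnorm(x)))
```

The two rows of the resulting matrix should agree closely, which is exactly the statement of the limit above for $x \in \{-1, 0, 1\}$.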
