
Central Limit Theorem

The Central Limit Theorem (C.L.T.) explains why the Normal Distribution is so widespread.

Experiments

Sampling Distribution

The C.L.T. allows us to assume that sampling distributions approach the Normal Distribution as the sample size grows

  • we want to show this experimentally for the sampling distribution of the mean

Assume we want to sample from 3 distributions with different degrees of skewness:

Uniform:

  • (figures: density plot and sampling-distribution animation)

Lognormal:

  • (figures: density plot and sampling-distribution animation)

Exponential:

  • (figures: density plot and sampling-distribution animation)
R code of the experiment:

```r
default.par = par()
set.seed(18213)

# the three distributions from which we sample
x = seq(-0.1, 4.1, 0.1)
yn = dlnorm(x, meanlog=0.1, sdlog=0.5)
yu = dunif(x, min=0, max=4)
ye = dexp(x)

plot(x, yn, type='l', ylim=c(0, 1), col="orange", lwd=2,
     main='the distributions from which we sample')
lines(x, yu, col="blue", lwd=2)
lines(x, ye, col="red", lwd=2)

m = 3000  # number of simulated samples per sampling distribution

# draw m samples with FUN, plot the histogram of their means
# with a fitted normal curve, plus a normal Q-Q plot
generate = function(m, FUN, main, xlim, ylim, breaks=13) {
  sd.x = replicate(m, mean(FUN()))

  par(mfcol=c(1,2))
  hist(sd.x, breaks=breaks, prob=T, main='', xlim=xlim, ylim=ylim)

  x = seq(min(sd.x), max(sd.x), 0.01)
  y = dnorm(x=x, mean=mean(sd.x), sd=sd(sd.x))
  lines(x=x, y=y, col="blue", lwd=2)

  dens = density(sd.x, adjust=2)
  lines(dens, col="red", lwd=2)

  qqnorm(sd.x, col="orange", pch=19, main='')
  qqline(sd.x, lwd=2)

  mtext(main, side=3, outer=TRUE, line=-3)
  par(mfcol=c(1,1))
}

# samplers of size n for each distribution
gen.uniform = function(n) { function() { runif(n, min=0, max=4) } }
gen.lnorm = function(n) { function() { rlnorm(n, meanlog=0.1, sdlog=0.5) } }
gen.exp = function(n) { function() { rexp(n) } }

require(animation)

# animate how the sampling distribution changes as n grows
n.vec = c(1:20, 50)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.uniform(n), xlim=c(0,4), ylim=c(0, 1.4),
             main=paste('Uniform Distribution, sample size = ', n))
  }
}, interval=0.3)

n.vec = c(1:40, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.lnorm(n), xlim=c(0,3), ylim=c(0, 1.8),
             main=paste('Lognormal Distribution, sample size = ', n))
  }
}, interval=0.3)

n.vec = c(1:50, 100)
saveGIF({
  for (n in n.vec) {
    generate(m, gen.exp(n), xlim=c(0,3), ylim=c(0, 1.8),
             main=paste('Exponential Distribution, sample size = ', n))
  }
}, interval=0.3)

# zoomed-in views (n is still 100 from the last loop)
generate(m, gen.uniform(n), xlim=c(1.5,2.5), ylim=c(0, 4),
         main=paste('Uniform Distribution, sample size = ', n))
generate(m, gen.lnorm(n), xlim=c(1,1.5), ylim=c(0, 6),
         main=paste('Lognormal Distribution, sample size = ', n))
generate(m, gen.exp(n), xlim=c(0.5,1.5), ylim=c(0, 4),
         main=paste('Exponential Distribution, sample size = ', n))

par(default.par)
```
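The same effect can be checked numerically without any plotting. A minimal sketch (the seed, `m`, and `n` below are arbitrary choices, not part of the experiment above): for Exponential(rate = 1), the population mean and standard deviation are both 1, so by the C.L.T. the sample means should center on 1 and their spread should shrink like $\sigma/\sqrt{n}$.

```r
# Minimal numeric check of the C.L.T. (no plotting):
# Exponential(rate = 1) has population mean = 1 and sd = 1,
# so sample means should have mean ~ 1 and sd ~ 1/sqrt(n).
set.seed(42)   # arbitrary seed
m <- 5000      # number of simulated samples
n <- 50        # size of each sample

sample.means <- replicate(m, mean(rexp(n)))

mean(sample.means)   # close to 1
sd(sample.means)     # close to 1/sqrt(50)
```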


Theorem (Lyapunov)

If a random variable $X$ represents the sum of a very large number of mutually independent random variables, each of which has a negligible influence on the entire sum, then $X$ has a distribution close to normal.
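One standard way to make "negligible influence on the entire sum" precise is Lyapunov's condition (stated here for reference, with $a_i = \mathbb{E}[X_i]$, $B_n^2 = \sum_{i=1}^{n} \text{Var}(X_i)$, and $\delta$ any positive constant):

$\lim_{n \rightarrow \infty} \frac{1}{B_n^{2+\delta}} \sum_{i=1}^{n} \mathbb{E}\left[ |X_i - a_i|^{2+\delta} \right] = 0$

If this holds for some $\delta > 0$, the normalized sum converges in distribution to the standard Normal.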

TODO: proof

Application

Let $X_i$ be a sequence of independent random variables, each having an expected value and variance:

$\mathbb{E}[X_i] = a_i, \text{Var}(X_i) = b_i^2$

  • Introduce the notation $S_n = X_1 + \ldots + X_n$, $A_n = \sum_{i = 1}^{n} a_i$, $B_n^2 = \sum_{i = 1}^{n} b_i^2$
  • Then $F_n(x) = P\left(\frac{S_n - A_n}{B_n} < x\right)$ is the distribution function of the normalized sum

The central limit theorem applies to the sequence $X_i$ if

$\lim_{n \rightarrow \infty} P\left(\frac{S_n - A_n}{B_n} < x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2} \, dz$

i.e. the distribution of the normalized sum converges to the standard Normal distribution.
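This limit can be checked numerically. A minimal sketch (the choice of uniform summands, the seed, and the sizes below are all illustrative): draw independent but non-identically distributed $X_i$, form the normalized sum $(S_n - A_n)/B_n$ many times, and compare its empirical CDF with the standard Normal CDF.

```r
# Sketch: normalized sum of independent, non-identically distributed
# variables approaching the standard Normal.
set.seed(7)
n <- 200     # number of summands X_1, ..., X_n
m <- 4000    # number of simulated normalized sums

a <- 1:n / n               # X_i ~ Uniform(0, 2*a_i), so E[X_i] = a_i
b2 <- (2 * a)^2 / 12       # Var(X_i) for Uniform(0, 2*a_i)
A.n <- sum(a)              # A_n = sum of the means
B.n <- sqrt(sum(b2))       # B_n^2 = sum of the variances

normalized <- replicate(m, (sum(runif(n, min = 0, max = 2 * a)) - A.n) / B.n)

# empirical CDF vs. standard Normal CDF at a few points
sapply(c(-1, 0, 1), function(x) c(empirical = mean(normalized < x),
                                  normal = pnorm(x)))
```

The two rows of the resulting matrix should agree closely, which is exactly the statement of the limit above for $x \in \{-1, 0, 1\}$.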
