## Q-Q Plot

### Probability Plot

A probability plot is a graphical technique for comparing two data sets

- e.g. two empirical samples
- or an empirical sample vs a theoretical distribution

Commonly used:

- P-P plot, "Probability-Probability" or "Percent-Percent" plot;
- Q-Q plot, "Quantile-Quantile" plot, which is more commonly used.
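To make the difference between the two concrete, here is a minimal sketch in R that computes the coordinates of both plots for one sample against a fitted normal distribution. The sample `x` is simulated and all variable names are assumptions made for illustration; `ppoints()` is used as one common choice of plotting positions, not the only one.

```r
# Hypothetical sample, simulated purely for illustration
set.seed(1)
x <- rnorm(100, mean = 10, sd = 2)

x_sorted <- sort(x)
p <- ppoints(length(x))   # plotting positions strictly between 0 and 1

# Q-Q plot: theoretical quantiles vs ordered sample values
qq_theoretical <- qnorm(p, mean = mean(x), sd = sd(x))
qq_sample <- x_sorted

# P-P plot: theoretical cumulative probabilities vs empirical ones
pp_theoretical <- pnorm(x_sorted, mean = mean(x), sd = sd(x))
pp_empirical <- p

par(mfrow = c(1, 2))
plot(pp_theoretical, pp_empirical, main = "P-P plot")
abline(0, 1)              # reference line y = x
plot(qq_theoretical, qq_sample, main = "Q-Q plot")
abline(0, 1)
par(mfrow = c(1, 1))
```

Both plots should hug the reference line when the sample matches the distribution; the P-P plot lives on the probability scale (0 to 1), the Q-Q plot on the scale of the data.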

### Normal Probability Plot

It's a special case of Q-Q plots:

- a Q-Q plot against the standard normal distribution;

The normal probability plot is formed by:

- Vertical axis: Ordered response values
- Horizontal axis: Normal order statistic medians or means (see rankit [1])

Constructing:

- order the observations
- determine the percentile for each
- identify the $z$-score for each percentile
- create a scatterplot
    - observation (vertical) vs
    - $z$-score (horizontal)

If the data are normally distributed, the $z$-score of each observation should approximately match the $z$-score of its percentile, so the points should fall close to a straight line.
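The construction steps above can be sketched by hand in R and compared with the built-in `qqnorm()`. This is only an illustration: the sample `obs` is simulated (hypothetical heights), the variable names are made up, and `ppoints()` is assumed as the rule for assigning percentiles.

```r
# Hypothetical observations, simulated for illustration
set.seed(42)
obs <- rnorm(50, mean = 165, sd = 6.5)

sorted_obs <- sort(obs)          # 1. order the observations
pct <- ppoints(length(obs))      # 2. percentile (plotting position) for each
z <- qnorm(pct)                  # 3. z-score for each percentile

# 4. scatterplot: observation (vertical) vs z-score (horizontal)
plot(z, sorted_obs,
     xlab = "Theoretical z-score", ylab = "Ordered observations",
     main = "Normal probability plot (by hand)")

# Compare the horizontal coordinates with those computed by qqnorm()
qq <- qqnorm(obs, plot.it = FALSE)
same_x <- all.equal(sort(qq$x), z)
print(same_x)
```

The hand-built coordinates should agree with those from `qqnorm()`, since it relies on the same kind of plotting positions.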

### Example 1

Evaluating the Normal Distribution (see [2])

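    # Load the body dimensions data and keep the female subjects (sex == 0)
    load(url("http://www.openintro.org/stat/data/bdims.RData"))
    fdims <- subset(bdims, sex == 0)
    # Normal Q-Q plot of height with a reference line
    qqnorm(fdims$hgt, col = "orange", pch = 19)
    qqline(fdims$hgt, lwd = 2)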

Does it look similar to a real normal distribution? It does. To check, let's simulate a normal sample with the same size, mean and standard deviation and compare:

    set.seed(123)
    sim.norm <- rnorm(n = length(fdims$hgt), mean = mean(fdims$hgt), sd = sd(fdims$hgt))
    qqnorm(sim.norm, col = "orange", pch = 19, main = "Normal Q-Q Plot of simulated data")
    qqline(sim.norm, lwd = 2)

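We can also plot several simulations next to the data:

    # Q-Q plot of the data followed by Q-Q plots of simulated normal samples
    # with the same size, mean and standard deviation
    qqnormsim <- function(dat, dim = c(2, 2)) {
      par(mfrow = dim)
      qqnorm(dat, main = "Normal QQ Plot (Data)")
      qqline(dat)
      # fill the remaining panels with simulations
      for (i in seq_len(prod(dim) - 1)) {
        simnorm <- rnorm(n = length(dat), mean = mean(dat), sd = sd(dat))
        qqnorm(simnorm, main = "Normal QQ Plot (Sim)")
        qqline(simnorm)
      }
      par(mfrow = c(1, 1))  # restore the single-panel layout
    }
    qqnormsim(fdims$hgt)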

Height looks like it is indeed approximately normal.

### Example 2

Let's take a look at another variable, weight, from the same data set as in Example 1:

    hist(fdims$wgt)

It looks a bit skewed.

    qqnorm(fdims$wgt, col = "orange", pch = 19)
    qqline(fdims$wgt, lwd = 2)

    qqnormsim(fdims$wgt)

Weight is most likely not normally distributed.
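For readers without access to the data set, the contrast between the two examples can be reproduced with simulated data. The sketch below is an assumption-laden illustration, not part of the original analysis: the right-skewed sample is drawn from a log-normal distribution with arbitrarily chosen parameters, and `qq_cor()` is a hypothetical helper that uses the correlation between theoretical and sample quantiles as a rough measure of how straight the Q-Q plot is.

```r
set.seed(7)
n <- 200
# Hypothetical right-skewed sample (log-normal, parameters chosen arbitrarily)
skewed <- rlnorm(n, meanlog = 4, sdlog = 0.6)
# Normal sample with the same mean and standard deviation
normal_sample <- rnorm(n, mean = mean(skewed), sd = sd(skewed))

par(mfrow = c(1, 2))
qqnorm(skewed, main = "Right-skewed (simulated)")
qqline(skewed)
qqnorm(normal_sample, main = "Normal (simulated)")
qqline(normal_sample)
par(mfrow = c(1, 1))

# Hypothetical helper: correlation of the Q-Q plot coordinates
qq_cor <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)
  cor(qq$x, qq$y)
}
print(c(skewed = qq_cor(skewed), normal = qq_cor(normal_sample)))
```

In the skewed panel the points in the upper tail typically bend away from the line, while the normal panel stays close to it; the correlation is expected to be lower for the skewed sample.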

## Sources