$Z$ Tests
This is a family of statistical tests that use the Normal model as the distribution of the test statistic under $H_0$
- in most cases a $z$ test is a restricted version of the corresponding $t$-test, so $t$-tests are usually the better choice; for large degrees of freedom the $t$-distribution is very close to the Normal anyway, so both give nearly the same result
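To see the claim about large degrees of freedom numerically, here is a small R sketch (R is the language of the code later in these notes). It compares the two-sided 95% critical values of the $t$-distribution with the Normal value $\approx 1.96$; the chosen degrees of freedom are arbitrary illustration values.

```r
# Two-sided 95% critical values: t-distribution vs. standard Normal
dfs    <- c(5, 10, 30, 100, 1000)   # illustrative degrees of freedom
t_crit <- qt(0.975, df = dfs)       # t critical values, one per df
z_crit <- qnorm(0.975)              # Normal critical value, about 1.96

# The gap t_crit - z_crit shrinks toward 0 as df grows
comparison <- data.frame(df = dfs,
                         t_crit = round(t_crit, 3),
                         gap = round(t_crit - z_crit, 3))
print(comparison)
```

The $t$ critical value is always a bit larger than the Normal one, which is why the $t$-test is the more cautious choice for small samples.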
The following are $z$ tests:
- one-sample $z$-test - for comparing the mean of a sample against a hypothesized (null) value
- paired $z$-test - for the matched pairs setup
- two-sample $z$-test - for comparing the means of two samples
Assumptions
- Observations are independent (if the sample is random and covers less than 10% of the population, this is reasonable to assume)
- Sample size is large enough for the Central Limit Theorem (C.L.T.) to apply
- The data have at most moderate skew and only a few, not too extreme, outliers
If these assumptions hold, we can use the $z$ statistic
- if sample size is smaller, then it's better to use $t$-tests
- if the distribution is strongly skewed or has extreme outliers, use simulation-based methods (TODO: add link)
- but in any case the observations have to be independent
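The two conditions that depend only on counts can be checked mechanically. The sketch below is a hypothetical helper, not part of any package; it assumes the sample is random and that the population size is known, and it leaves skew and outliers to a visual check such as `hist()` or `boxplot()`.

```r
# Hypothetical helper: checks only the count-based conditions for a z test.
# Assumes random sampling; skew and outliers must still be judged from a plot.
check_z_conditions <- function(n, population_size, min_n = 30) {
  c(
    independence = n < 0.10 * population_size,  # 10% rule
    large_sample = n >= min_n                    # rule of thumb for the C.L.T.
  )
}

# Usage with a hypothetical population of 5000 students
check_z_conditions(n = 110, population_size = 5000)
```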
One-Sample $z$ Test
Example 1: One-Sided
Assume we have the following:
Sample: 110 students; the conditions are met:
- less than 10% of all students are sampled
- the sample is sufficiently large ($n = 110 \geqslant 30$)
- there are only a couple of outliers, which is acceptable for a sample of size $n = 110$
So we can apply the Normal Model and do the following test:
- $H_0: \mu = 7$ - on average, students sleep 7 hours
- $H_A: \mu > 7$ - on average, students sleep more than 7 hours
Calculate
- the sample mean: $\bar{x} = 7.42$
- the standard error: $\text{SE}_{\bar{x}} = \cfrac{s_x}{\sqrt{n}} = \cfrac{1.75}{\sqrt{110}} \approx 0.17$
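The same standard error in R, as a quick check of the arithmetic (variable names are illustrative):

```r
# Standard error of the sample mean from the summary statistics
s_x <- 1.75          # sample standard deviation
n   <- 110           # sample size
se  <- s_x / sqrt(n) # about 0.167
round(se, 2)
```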
$Z$ score:
- then compute the $Z$ score: $Z = \cfrac{\bar{x} - \text{null value}}{\text{SE}_{\bar{x}}} = \cfrac{7.42 - 7}{0.17} \approx 2.47$
- then calculate the $p$-value for this test statistic; since $H_A: \mu > 7$, only the upper tail counts: $p = P(Z \geqslant 2.47)$
so, under $H_0$ the probability of observing a sample mean at least as large as $\bar{x} = 7.42$ is only $p \approx 0.007$
- our significance level is $\alpha = 0.05$; we compare $p$ with $\alpha$:
- $p = 0.007 < 0.05 = \alpha$,
- $\Rightarrow$ we reject $H_0$ in favor of $H_A$: what we observe would be very unusual under $H_0$, which casts doubt on $H_0$ and provides strong evidence for $H_A$
- conclusion: on average, students sleep more than 7 hours
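A sketch of the $Z$ score and the one-sided $p$-value in R, reusing the rounded $\text{SE}_{\bar{x}} = 0.17$ so the numbers match the hand calculation. `pnorm(..., lower.tail = FALSE)` gives the upper-tail probability $P(Z \geqslant z)$ of the standard Normal.

```r
# One-sided (upper-tail) z test for H_A: mu > 7
x_bar <- 7.42
mu_0  <- 7      # null value
se    <- 0.17   # rounded standard error, as in the text
z <- (x_bar - mu_0) / se
p <- pnorm(z, lower.tail = FALSE)  # P(Z >= z) under H_0
round(c(z = z, p = p), 3)
```

Keeping the unrounded SE (about 0.167) gives $z \approx 2.52$ and $p \approx 0.006$; the conclusion does not change.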
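The whole procedure can be wrapped in a small helper. `one_sample_z_test` is a hypothetical name, not a function from any package: a sketch that takes summary statistics, supports the three usual alternatives, and applies the same decision rule as above (reject $H_0$ when $p < \alpha$).

```r
# Hypothetical helper: one-sample z test from summary statistics
one_sample_z_test <- function(x_bar, mu_0, s, n, alpha = 0.05,
                              alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  se <- s / sqrt(n)            # standard error of the mean
  z  <- (x_bar - mu_0) / se    # standardized test statistic
  p  <- switch(alternative,
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),
    greater   = pnorm(z, lower.tail = FALSE),
    less      = pnorm(z))
  list(z = z, p_value = p, reject_h0 = p < alpha)
}

# The sleep example: H_A: mu > 7
res <- one_sample_z_test(x_bar = 7.42, mu_0 = 7, s = 1.75, n = 110,
                         alternative = "greater")
res$reject_h0
```

Because it uses the unrounded SE, its $z$ and $p$ differ slightly from the hand calculation (about 2.52 and 0.006), with the same decision.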
Other $z$-tests
Paired $z$-test
Analogously to the paired $t$-test, we can use the $z$ statistic to analyze matched pairs data, provided the sample size is sufficiently large
Example:
- two samples: textbook prices at the local bookstore and on Amazon
- $\mu_\text{dif} = \mu_l - \mu_a$ - the mean difference in price (local minus Amazon)
- $H_0: \mu_\text{dif} = 0$ - there's no difference in price
- $H_A: \mu_\text{dif} \ne 0$ - there's some difference
- $\bar{x}_\text{dif} = 12.76$
- standard error: $\text{SE}_{\bar{x}_\text{dif}} = \cfrac{s_\text{dif}}{\sqrt{n_\text{dif}}} = 1.67$
- $Z = \cfrac{\bar{x}_\text{dif} - 0}{\text{SE}_{\bar{x}_\text{dif}}} = \cfrac{12.76}{1.67} \approx 7.64$
- this is a very large $z$ score, so the $p$-value will be tiny, but let's calculate it anyway
- $p < 0.0001$ (two-sided), far below $\alpha = 0.05$, so we reject $H_0$: there is a difference in price
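The arithmetic in this example can be checked from the rounded summary values alone, without the raw data. For the two-sided alternative, the $p$-value doubles the upper-tail probability of $|Z|$.

```r
# Paired z test from summary values: H_0: mu_dif = 0 vs. H_A: mu_dif != 0
x_bar_dif <- 12.76   # mean price difference
se_dif    <- 1.67    # standard error of the mean difference
z <- (x_bar_dif - 0) / se_dif
p <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value
c(z = round(z, 2), p = signif(p, 2))
```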
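library(openintro)  # provides the `textbooks` data set
data(textbooks)
hist(textbooks$diff, col = 'yellow')  # price differences, local minus Amazon
n <- length(textbooks$diff)
s <- sd(textbooks$diff)
se <- s / sqrt(n)  # standard error of the mean difference
x.bar.nul <- 0  # null value: no difference in price
x.bar.dif <- mean(textbooks$diff)
z <- (x.bar.dif - x.bar.nul) / se  # already on the standard Normal scale
z
# two-sided p-value: z is already standardized, so use N(0, 1), not sd = se
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
p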
Sources