$Z$ Tests

This is a family of statistical tests that use the Normal Model to compute test statistics

  • most of the time $Z$ tests are restricted versions of $t$-tests, so it's usually advisable to use $t$-tests instead: for large degrees of freedom the $t$-distribution is very close to the Normal distribution, so little is lost
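To get a feeling for how quickly the $t$-distribution approaches the Normal one, the following minimal sketch compares the upper 2.5% critical values for a few degrees of freedom. The chosen degrees of freedom and the variable names are only illustrative.

```r
# Upper 2.5% critical values: t-distribution vs. standard Normal
dfs <- c(5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# The gap between the t and Normal critical values shrinks as df grows
comparison <- data.frame(df = dfs, t_crit = t_crit, gap = t_crit - z_crit)
print(comparison)
```

The gap is always positive (the $t$-distribution has heavier tails), but it becomes negligible for large degrees of freedom.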


The family includes the following $z$ tests:

  • one-sample $z$-test - for comparing the mean of a sample against some given mean
  • paired $z$-test - for Matching Pairs setup
  • two-sample $z$-test - for comparing the means of two samples
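As a rough sketch of the two-sample case: the $z$ statistic divides the difference of the two sample means by the standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$. The helper `two_sample_z` and all the numbers below are hypothetical and serve only as an illustration.

```r
# Hypothetical helper (not from the original text): two-sample z-test
# computed from summary statistics of two independent samples
two_sample_z <- function(mean1, sd1, n1, mean2, sd2, n2, null_diff = 0) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)         # SE of the difference in means
  z <- (mean1 - mean2 - null_diff) / se
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value
  list(se = se, z = z, p = p)
}

# Made-up summary statistics, for illustration only
res <- two_sample_z(mean1 = 7.4, sd1 = 1.8, n1 = 100,
                    mean2 = 7.0, sd2 = 1.6, n2 = 120)
print(res)
```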


Assumptions

  • Observations are independent (for a random sample, this is usually considered satisfied if less than 10% of the population is sampled)
  • Sample size is sufficiently large (usually $n \geqslant 30$) so that the C.L.T. holds
  • Moderate skew, few outliers (not too extreme)
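The role of the sample size can be illustrated with a small simulation. This is a sketch under assumptions: the population is taken to be exponential with mean 1 (a skewed distribution picked only for illustration), and $n = 110$ is an arbitrary choice.

```r
set.seed(1)

# Skewed population: exponential with mean 1 (an assumption for illustration)
n <- 110
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# By the C.L.T. the sample means should be centered near 1
# with a standard deviation near 1 / sqrt(n)
print(c(mean = mean(sample_means),
        sd = sd(sample_means),
        theory_sd = 1 / sqrt(n)))
```

Calling `hist(sample_means)` on the result should show a roughly bell-shaped histogram even though the population itself is skewed.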

If these assumptions hold, then we can use the $z$ statistic

  • if sample size is smaller, then it's better to use $t$-tests
  • if the distribution is strongly skewed or has extreme outliers, use simulation-based methods
  • but in any case the observations have to be independent
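One possible simulation-based alternative is a bootstrap test of the mean. The text does not say which simulation method is meant, so this is only a sketch of one common option; the data are simulated and the null value `mu0` is hypothetical.

```r
set.seed(42)

# Hypothetical small, skewed sample (simulated, not real data)
x <- rexp(20, rate = 1 / 8)
mu0 <- 7

# Shift the sample so that its mean equals mu0, i.e. make H0 true
x_null <- x - mean(x) + mu0

# Resample with replacement to approximate the distribution of the mean under H0
boot_means <- replicate(10000, mean(sample(x_null, replace = TRUE)))

# One-sided p-value: share of resampled means at least as large as the observed mean
p_boot <- mean(boot_means >= mean(x))
print(p_boot)
```

Note that resampling the observations independently still relies on the independence assumption.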


One-Sample $z$ Test

Example 1: One-Sided

Assume we have the following

Sample: 110 students, conditions are met:

  • less than 10% of all students are sampled,
  • the sample size is sufficiently large: $n = 110 \geqslant 30$
  • only a couple of outliers, which is acceptable for sample size $n=110$


So we can apply the Normal Model and do the following test:

  • $H_0: \mu = 7$ - students sleep 7 hours on average
  • $H_A: \mu > 7$ - students sleep more than 7 hours on average


Calculate

  • the sample mean: $\bar{x} = 7.42$
  • the Standard Error: $\text{SE}_{\bar{x}} = \cfrac{s_x}{\sqrt{n}} = \cfrac{1.75}{\sqrt{110}} \approx 0.17$

$Z$ score:

  • then compute the $Z$ score: $Z = \cfrac{\bar{x} - \text{null value}}{\text{SE}_{\bar{x}}} = \cfrac{7.42 - 7}{0.17} = 2.47$
  • then calculate the $p$-value for this test statistic


So, under $H_0$, the probability of observing a sample mean at least as large as $\bar{x} = 7.42$ is only $p = 0.007$

  • our level of significance is $\alpha = 0.05$, so we compare $\alpha$ and $p$:
  • $p = 0.007 < 0.05 = \alpha$,
  • $\Rightarrow$ we reject $H_0$ in favor of $H_A$: what we observe would be so unusual under $H_0$ that it casts doubt on $H_0$ and provides strong evidence for $H_A$
  • so we conclude that on average students sleep more than 7 hours
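The calculation in this example can be reproduced from the summary statistics alone. This is a minimal sketch with illustrative variable names. Because it keeps the unrounded standard error, the $z$ score comes out slightly larger than the hand-computed 2.47, but the conclusion is the same.

```r
# Summary statistics from the example
n <- 110
x_bar <- 7.42
s <- 1.75
mu0 <- 7                            # null value

se <- s / sqrt(n)                   # standard error, about 0.167
z <- (x_bar - mu0) / se             # about 2.52 (2.47 if SE is first rounded to 0.17)
p <- pnorm(z, lower.tail = FALSE)   # one-sided p-value for H_A: mu > 7

alpha <- 0.05
print(c(se = se, z = z, p = p))
print(p < alpha)                    # TRUE means that H0 is rejected
```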


Other $z$-tests

Paired $z$-test

Analogously to the paired $t$-test, we can use the $z$ statistic to analyze matched pairs data, provided that the sample size is sufficiently large

Example:

  • two paired samples: prices of the same textbooks at a local bookshop and on Amazon
  • $\mu_\text{dif} = \mu_l - \mu_a$ - the mean of difference in the price
  • $H_0: \mu_\text{dif} = 0$ - there's no difference in the price
  • $H_A: \mu_\text{dif} \ne 0$ - there's some difference
  • $\bar{x}_\text{dif} = 12.76$
  • Standard Error: $\text{SE}_{\bar{x}_\text{dif}} = \cfrac{s_\text{dif}}{\sqrt{n_\text{dif}}} = 1.67$
  • $Z = \cfrac{\bar{x}_\text{dif} - 0}{\text{SE}_{\bar{x}_\text{dif}}} = \cfrac{12.76}{1.67} \approx 7.64$
  • this is a very large $z$ score, but let's calculate the $p$-value anyway
  • the two-sided $p$-value is practically zero ($p < 0.0001$), far less than $\alpha = 0.05$, so we reject $H_0$


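library(openintro)
data(textbooks)

# distribution of the paired price differences
hist(textbooks$diff, col = 'yellow')

n <- length(textbooks$diff)
s <- sd(textbooks$diff)
se <- s / sqrt(n)             # standard error of the mean difference

x.bar.nul <- 0                # null value of the mean difference
x.bar.dif <- mean(textbooks$diff)

z <- (x.bar.dif - x.bar.nul) / se
z

# z is already standardized, so the two-sided p-value comes from the standard Normal
p <- pnorm(abs(z), lower.tail = FALSE) * 2
p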



Sources