Two-Sample t-test


This type of $t$-test is used when we want to compare the means of two different samples.

Suppose that we have two samples $a$ and $b$ of sizes $n_a$ and $n_b$, respectively.
We’re interested in inferring something about $\mu_a - \mu_b$.
The point estimate in this case is $\bar{X}_a - \bar{X}_b$.
The standard error is $\text{SE}_{\bar{X}_a - \bar{X}_b} = \sqrt{\text{SE}^2_a + \text{SE}^2_b } = \sqrt{ s^2_a / n_a + s^2_b / n_b}$
- because, for independent samples, $\text{SE}^2_{\bar{X}_a - \bar{X}_b} = \text{var}[\bar{X}_a - \bar{X}_b] = \text{var}[\bar{X}_a] + \text{var}[\bar{X}_b] = \text{SE}^2_a + \text{SE}^2_b$
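
The point estimate and standard error can be computed directly from two samples. The following is a minimal R sketch; the data are simulated with `rnorm` purely for illustration, and the names `a`, `b`, `se_a`, `se_b` and `se_diff` are hypothetical.

```r
# Hypothetical data: two independent samples (values are illustrative only)
set.seed(1)
a <- rnorm(40, mean = 10, sd = 2)
b <- rnorm(55, mean = 9, sd = 3)

# Point estimate of mu_a - mu_b
point_estimate <- mean(a) - mean(b)

# Standard error of each sample mean: s / sqrt(n)
se_a <- sd(a) / sqrt(length(a))
se_b <- sd(b) / sqrt(length(b))

# For independent samples the squared standard errors add
se_diff <- sqrt(se_a^2 + se_b^2)

print(c(estimate = point_estimate, se = se_diff))
```

Note that it is the squared standard errors that add, so `se_diff` is always smaller than `se_a + se_b`.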

The test has the following form:

$H_0: \mu_a = \mu_b$, or $H_0: \mu_a - \mu_b = 0$
$H_A: \mu_a \neq \mu_b$ or $H_A: \mu_a - \mu_b \neq 0$ (two-sided, can also be $<$ or $>$)
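
In R, the direction of $H_A$ corresponds to the `alternative` argument of `t.test`. The sketch below uses simulated data (an assumption made only for illustration) to show the three choices side by side.

```r
# Hypothetical data, simulated only to illustrate the three alternatives
set.seed(2)
a <- rnorm(30, mean = 5.5, sd = 1)
b <- rnorm(30, mean = 5.0, sd = 1)

two_sided <- t.test(a, b, alternative = "two.sided")  # H_A: mu_a != mu_b
greater   <- t.test(a, b, alternative = "greater")    # H_A: mu_a >  mu_b
less      <- t.test(a, b, alternative = "less")       # H_A: mu_a <  mu_b

print(c(two_sided = two_sided$p.value,
        greater   = greater$p.value,
        less      = less$p.value))
```

Because the $t$ distribution is symmetric, the two one-sided $p$-values sum to 1 and the two-sided $p$-value is twice the smaller of them.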

So, the test statistic is (from here on the two samples are indexed by $1$ and $2$ rather than $a$ and $b$):

$T = \cfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$
$T \approx t_{\text{df}}$
$\text{df}$ depends on a few things, discussed below

What is $\text{df}$ there?

Welch-Satterthwaite Approximation for df is
$\text{df} = \cfrac{( s_1^2 / n_1 + s_2^2 / n_2 )^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }$

This can be a non-integer value, but that’s fine: the $t$ distribution is defined for non-integer degrees of freedom.
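
The statistic and the Welch-Satterthwaite degrees of freedom can be checked by hand against `t.test`, which does not assume equal variances by default (`var.equal = FALSE`). This is a sketch on simulated data; the variable names are hypothetical.

```r
# Hypothetical data with clearly unequal variances
set.seed(3)
x1 <- rnorm(25, mean = 100, sd = 10)
x2 <- rnorm(40, mean = 95, sd = 20)

# Squared standard errors of the two sample means
v1 <- var(x1) / length(x1)
v2 <- var(x2) / length(x2)

# Test statistic under H_0: mu_1 - mu_2 = 0
t_stat <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

# Built-in Welch test for comparison
fit <- t.test(x1, x2)
print(c(t = t_stat, df = df, p = p_value))
print(fit)
```

The manual values should agree with `fit$statistic`, `fit$parameter` and `fit$p.value`.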

Can we “pool” the samples?
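Yes, but only under the assumption that $\sigma_1^2 = \sigma_2^2$ (in other words, we assume that the population variances are equal).

We can then replace $s_1^2$ and $s_2^2$ by the “pooled variance”:

$s^2_{\text{pooled}} = \cfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$, and in this case $\text{df} = n_1 + n_2 - 2$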
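
A minimal R sketch of the pooled test, assuming the standard pooled estimator (a weighted average of the two sample variances with weights $n_i - 1$) and $n_1 + n_2 - 2$ degrees of freedom. The data are simulated and the variable names are hypothetical; the result is compared with `t.test(..., var.equal = TRUE)`.

```r
# Hypothetical data generated with equal population variances
set.seed(4)
x1 <- rnorm(20, mean = 50, sd = 5)
x2 <- rnorm(30, mean = 47, sd = 5)
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# The pooled variance replaces both s_1^2 and s_2^2 in the standard error
t_stat <- (mean(x1) - mean(x2)) / sqrt(s2_pooled / n1 + s2_pooled / n2)
df <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

# Built-in pooled test for comparison
fit <- t.test(x1, x2, var.equal = TRUE)
print(c(t = t_stat, df = df, p = p_value))
```

If the equal-variance assumption is doubtful, the Welch version is the safer choice, which is presumably why it is the default in `t.test`.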

Example

We then calculate

We have the following test

$p$-value:

$P(| \bar{X}_1 - \bar{X}_2 | \geqslant 4.2 ) = $
$P \left( \left| \cfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} \right| \geqslant \cfrac{4.2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} \right) \approx $
$P\left( | t_{\text{df}} | \geqslant \cfrac{4.2}{\sqrt{181.5 / 281 + 231 / 119}} \right) = 0.0097$

This is pretty small, so we reject $H_0$.
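
The arithmetic in this $p$-value can be reproduced from the summary statistics alone. The sketch below reads $181.5$ and $231$ as the sample variances and $281$ and $119$ as the sample sizes, as the formula suggests. It also assumes that the degrees of freedom come from the Welch-Satterthwaite approximation, since the value of $\text{df}$ used is not stated here.

```r
# Summary statistics taken from the formula above (their interpretation is an assumption)
diff_means <- 4.2          # observed difference of the sample means
s2_1 <- 181.5; n_1 <- 281  # sample variance and size, group 1
s2_2 <- 231;   n_2 <- 119  # sample variance and size, group 2

# Squared standard errors and the standard error of the difference
v1 <- s2_1 / n_1
v2 <- s2_2 / n_2
se <- sqrt(v1 + v2)

# Test statistic under H_0: mu_1 - mu_2 = 0
t_stat <- diff_means / se

# Assumption: df from the Welch-Satterthwaite approximation
df <- (v1 + v2)^2 / (v1^2 / (n_1 - 1) + v2^2 / (n_2 - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)
print(c(t = t_stat, df = df, p = p_value))
```

Under these assumptions the statistic is about $2.6$ on roughly $200$ degrees of freedom, and the $p$-value should land close to the $0.0097$ quoted above.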

Life expectancy in E.Asia and Pacific vs S.Asia

We then calculate

Our test:

$p$-value:

$P(| \bar{X}_1 - \bar{X}_2 | \geqslant 6.1 ) = $
$P \left( \left| \cfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} \right| \geqslant \cfrac{6.1}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} \right) \approx $
$P ( | t_{\text{df}} | \geqslant 1.90 ) \approx 0.09$

This is not so small, so we can’t reject $H_0$: it might be true that $\mu_1 = \mu_2$.

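```r
# column 6 of `skeletons` holds the measurement being compared;
# `sex` is assumed to be visible in the workspace (e.g. after attach(skeletons))
male = skeletons[sex == '1', 6]
female = skeletons[sex == '2', 6]

# two-sided Welch t-test of H_0: mu_male - mu_female = 0
t.test(male, female, mu=0, conf.level=0.95, alternative='two.sided')
```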
