Simplest ANOVA
It's an $F$-Test
It's a parametric test of Variance:
- it's parametric because it's based on the Normality assumption
Comparing Means
Comparing means of several groups
- We can compare means of two groups using Two-Sample $t$-test
- But sometimes we want to compare means across many groups
First idea: do Pairwise comparisons
- but with many comparisons we are likely to find a difference just by chance, even when there is none
- so we need a correction for multiple comparisons (e.g. the Bonferroni Correction)
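A rough sketch of why uncorrected pairwise testing is risky. It assumes the pairwise tests are independent, which is not exactly true for tests sharing groups, so the number is only indicative. The group count `k = 4` and all variable names are illustrative.

```r
# Chance of at least one false positive when all pairs of k groups are tested,
# assuming (for illustration only) that the pairwise tests are independent
alpha = 0.05
k = 4                           # hypothetical number of groups
m = choose(k, 2)                # number of pairwise comparisons
fwer = 1 - (1 - alpha)^m        # P(at least one Type I error) under H_0
alpha.bonf = alpha / m          # Bonferroni-corrected per-test level
cat('comparisons:', m, ' familywise error rate:', round(fwer, 3),
    ' corrected alpha:', round(alpha.bonf, 4), '\n')
```

With 4 groups there are already 6 comparisons, and the chance of at least one false positive is far above the nominal 5%.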
Test Of Independence
One-Way ANOVA can be used to analyze the relationships between two variables
- numerical and categorical
- we group the numerical one by associated categories
- and find the means of each category
- then we test if the means are the same
- if we cannot reject that they are, we have no evidence that the two variables are dependent
Post-ANOVA Comparison
suppose we reject $H_0$
The One-Way ANOVA Test
ANOVA: compare many means in a single hypothesis
- are the means of several groups equal?
- so it generalizes the $t$-test to more than two groups
- used in Bivariate Analysis to test whether a numerical and a categorical variable are independent
- it assesses the variability of the group means relative to the variability among individual observations
- the question it answers: "is the variability in the sample means so large that it seems unlikely to be entirely due to chance?"
General Idea:
- simultaneously consider many groups
- evaluate if the sample means differ more than we'd expect from natural variation
Conditions:
- observations are independent
- data within each group are approximately normally distributed
- variability across the groups is approximately equal
Hypotheses:
- $H_0: \mu_1 = \mu_2 = ... = \mu_k$
- $H_A:$ at least one $\mu_i$ is different from the rest
Estimating variability:
- the variability between the group means is called MSG: mean square between groups
- $\text{df}_G = k - 1$ for $k$ groups
- let $\bar{x}$ be the mean of all observations across all groups and $\bar{x}_i$ the sample mean of group $i$
- $\text{MSG} = \cfrac{1}{\text{df}_G} \cdot \text{SSG} = \cfrac{1}{k - 1} \sum_{i=1}^k n_i (\bar{x}_i - \bar{x})^2$
- $\text{SSG}$ - sum of squares between groups
- $n_i$ - sample size of group $i$
on its own MSG says little: we compare it with a benchmark, $\text{MSE}$:
$\text{MSE}$: mean square error (pooled variance estimate)
- $\text{df}_E = n - k$, where $n = n_1 + ... + n_k$ is the total number of observations
- $\text{SST} = \sum_{i=1}^n (x_i - \bar{x})^2$
- sum of squares total
- this is calculated over all observations in the data set
- $\text{SSE} = \text{SST} - \text{SSG} = (n_1 - 1) \cdot s^2_1 + (n_2 - 1) \cdot s^2_2 + ... + (n_k - 1) \cdot s^2_k$
- sum of squared errors
- $s^2_i$ - sample variance of group $i$
- $\text{MSE} = \cfrac{1}{\text{df}_E} \cdot \text{SSE}$ - standardized form of $\text{SSE}$
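The decomposition $\text{SST} = \text{SSG} + \text{SSE}$ can be checked numerically. This is a minimal sketch on made-up toy data (three groups of three observations). The data and the variable names are assumptions for illustration only.

```r
# Hypothetical toy data: three groups (A, B, C) of three observations each
x = c(1, 2, 3,  2, 4, 6,  5, 6, 7)
g = factor(rep(c('A', 'B', 'C'), each=3))

grand.mean = mean(x)                 # mean of all observations
group.mean = tapply(x, g, mean)      # sample mean of each group
n.i = tapply(x, g, length)           # sample size of each group
s2.i = tapply(x, g, var)             # sample variance of each group

ssg = sum(n.i * (group.mean - grand.mean)^2)   # sum of squares between groups
sst = sum((x - grand.mean)^2)                  # sum of squares total
sse = sum((n.i - 1) * s2.i)                    # sum of squared errors

mse = sse / (length(x) - nlevels(g))           # SSE / df_E
c(SSG=ssg, SSE=sse, SST=sst, MSE=mse)
```

For this data the between-group and within-group parts (24 and 12) add up to the total of 36.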
if $H_0$ is true then the differences between the sample means are due to chance, and MSG and MSE should be approximately equal
Then we can calculate the test statistic
$F = \cfrac{\text{MSG}}{\text{MSE}}$
- $\text{MSG}$ = between-group variability
- $\text{MSE}$ = within-group variability
$F$ is an $F$ statistic that follows the $F$-distribution
it has 2 associated parameters: $\text{df}_1$ and $\text{df}_2$
for ANOVA they are $\text{df}_G$ and $\text{df}_E$
the larger $\text{MSG}$ relative to $\text{MSE}$, the larger $F$ is, and the stronger the evidence against $H_0$
We use the upper tail of the $F$-distribution to compute the $p$-value
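The whole test needs only the group sizes, means and standard deviations. Below is a sketch of that calculation. The helper `anova_from_summary` is a hypothetical name (it is not part of base R), and the inputs are made-up toy values.

```r
# One-way ANOVA F test from per-group summary statistics.
# anova_from_summary is a hypothetical helper written for this sketch.
anova_from_summary = function(n, means, sds) {
  k = length(n)                                   # number of groups
  N = sum(n)                                      # total number of observations
  grand.mean = sum(n * means) / N
  msg = sum(n * (means - grand.mean)^2) / (k - 1) # SSG / df_G
  mse = sum((n - 1) * sds^2) / (N - k)            # SSE / df_E
  f = msg / mse
  # upper tail of the F distribution with (k - 1, N - k) degrees of freedom
  p = pf(f, df1=k - 1, df2=N - k, lower.tail=FALSE)
  list(MSG=msg, MSE=mse, F=f, df=c(k - 1, N - k), p.value=p)
}

# made-up example: three groups of size 3
res = anova_from_summary(n=c(3, 3, 3), means=c(2, 4, 6), sds=c(1, 2, 1))
print(res)
```

Passing `lower.tail=FALSE` to `pf` returns the upper-tail probability directly, which avoids the small loss of precision in `1 - pf(...)` when the $p$-value is tiny.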
- $X$ with $K$ possible values $x_1, ..., x_K$ - categorical variable (e.g. Job)
- and $Y$ - continuous variable (e.g. Age)
- $\mu = \text{mean}(Y)$: mean value of $Y$
- $\mu_k$: mean value of $Y$ for records such that $X = x_k$
- $N_k$ : number of records such that $X = x_k$
- $N = \sum_k N_k$ : total number of records
- Interclass Variance: $\text{Inter} = \cfrac{1}{K-1} \cdot \sum_k N_k \cdot (\mu_k - \mu)^2$
- Intraclass Variance: $\text{Intra} = \cfrac{1}{N-K} \cdot \sum_k \sum_{j : X = x_k} ( y_j - \mu_k )^2$
- variance inside each group
- to evaluate the association between $X$ and $Y$ calculate $F = \cfrac{\text{Inter}}{\text{Intra}}$
- the null hypothesis $H_0$: all means $\mu_k$ are equal (i.e. assume independence)
- under $H_0$ the $F$-ratio follows $F_{K-1,N-K}$: the $F$-distribution with $K-1$ and $N-K$ degrees of freedom
- if independent, the means should be the same for all classes, Inter and Intra estimate the same variance, and $F$ should be close to 1
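A small simulation can illustrate how the $F$-ratio behaves when $X$ and $Y$ really are independent. This is only a sketch: the design (3 classes of 20 records), the seed and the number of repetitions are arbitrary assumptions.

```r
set.seed(1)
K = 3                                   # hypothetical number of classes
n.per.class = 20                        # hypothetical records per class
x = gl(K, n.per.class)                  # categorical variable X

# simulate Y independently of X many times and collect the F-ratio
f.null = replicate(2000, {
  y = rnorm(K * n.per.class)
  unname(oneway.test(y ~ x, var.equal=TRUE)$statistic)
})

mean(f.null)                            # average F-ratio under H_0
# share of simulated F-ratios above the 5% critical value of F(K-1, N-K)
mean(f.null > qf(0.95, df1=K - 1, df2=K * n.per.class - K))
```

The average simulated ratio should land near 1, and roughly 5% of the ratios should exceed the critical value, which is what the $F_{K-1,N-K}$ reference distribution predicts.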
Example 1: Baseball
Batting Performance
we have 4 categories of baseball players:
- outfielders: $\text{OF}$
- infielders: $\text{IF}$
- designated hitters: $\text{DH}$
- catchers: $\text{C}$
Is there any difference in performance (measured by on-base percentage, OBP)?
- $H_0: \mu_\text{OF} = \mu_\text{IF} = \mu_\text{DH} = \mu_\text{C}$
- $H_A$: at least one is different
we approximate each $\mu$ by $\bar{x}$
Summary statistics (source: table 5.27, OpenIntro)

| | OF | IF | DH |
|---|---|---|---|
| Sample size ($n_i$) | 120 | 154 | 14 |
| Sample mean ($\bar{x}_i$) | 0.334 | 0.332 | 0.348 |
| Sample SD ($s_i$) | 0.029 | 0.037 | 0.036 |
(source: fig 5.28, OpenIntro)
We see that DH and C look really different. Why don't we just check if $\mu_\text{DH} = \mu_\text{C}$?
- the primary issue: we're inspecting the data before doing the check
- this is called Data Snooping (or Data Fishing)
- naturally we'd pick the groups with the largest differences for the formal test
- but this inflates the Type I Error rate
- it's also called Prosecutor's Fallacy
- $\text{MSG} = 0.00252$
- $\text{MSE} = 0.00127$
- $k = 4$ groups, $\text{df}_G = 4 - 1 = 3$
- $n = n_1 + n_2 + n_3 + n_4 = 327$, $\text{df}_E = n - k = 323$
- $F = \cfrac{\text{MSG}}{\text{MSE}} = \cfrac{0.00252}{0.00127} \approx 1.994$ (from the unrounded mean squares)
- $p$-value is 0.115 > 0.05
- so we fail to reject $H_0$ at the significance level $\alpha=0.05$
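The $p$-value can be recomputed from the reported $F$ statistic and the two degrees of freedom. A minimal sketch, where the variable names are illustrative:

```r
f.stat = 1.994                  # F statistic reported above
df.g = 4 - 1                    # df_G = k - 1
df.e = 327 - 4                  # df_E = n - k
# upper tail of the F distribution: compare with the 0.115 quoted above
p.value = pf(f.stat, df1=df.g, df2=df.e, lower.tail=FALSE)
round(p.value, 3)
# critical value at alpha = 0.05: H_0 is rejected only if F exceeds it
f.crit = qf(0.95, df1=df.g, df2=df.e)
f.stat > f.crit
```

Comparing $F$ with the critical value and comparing the $p$-value with $\alpha$ are two views of the same decision.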
Example 2: Statistics Class
A course is in high demand, so it is run several times in one semester
- e.g. it's run 3 times: the scores of each run are the sets $A, B, C$
- are there significant differences between the runs?
- $H_0: \mu_A = \mu_B = \mu_C$
- $H_A:$ the average score differs for at least one run
| Group | $n$ | Min. | Mean | Max. |
|---|---|---|---|---|
| $A$ | 58 | 44 | 75.10 | 100 |
| $B$ | 55 | 38 | 71.96 | 100 |
| $C$ | 51 | 45 | 78.94 | 100 |
We run the ANOVA test and obtain the following summary table:

| | Df | Sum Sq | Mean Sq | $F$ value |
|---|---|---|---|---|
| lecture | 2 | 1290.11 | 645.06 | 3.48 |
| Residuals | 161 | 29810.13 | 185.16 | |
The $p$-value is less than $\alpha=0.05$, so we reject $H_0$
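The summary table contains everything needed to recompute the $F$ value and its $p$-value. A minimal sketch using only the numbers from the table (variable names are illustrative):

```r
ms.lecture = 1290.11 / 2        # Mean Sq = Sum Sq / Df for lecture
ms.resid = 29810.13 / 161       # Mean Sq = Sum Sq / Df for residuals
f.value = ms.lecture / ms.resid
# upper-tail probability of F(2, 161)
p.value = pf(f.value, df1=2, df2=161, lower.tail=FALSE)
round(c(F=f.value, p=p.value), 3)
```

The recomputed $F$ agrees with the 3.48 in the table, and the $p$-value comes out at about 0.033, below $\alpha = 0.05$.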
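# per-lecture summaries: five-number summary, SD and sample size
by(classData$m1, classData$lecture, summary)
by(classData$m1, classData$lecture, sd)
by(classData$m1, classData$lecture, length)
# side-by-side box plots with custom group labels
boxplot(classData$m1 ~ classData$lecture, col='skyblue', axes=FALSE)
axis(side=1, at=1:3, labels=c('A', 'B', 'C'))
axis(side=2)
# classical one-way ANOVA (equal variances assumed)
oneway.test(classData$m1 ~ classData$lecture, var.equal=TRUE)
# or, to get the full ANOVA table
aov1 = aov(classData$m1 ~ classData$lecture)
summary(aov1)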
Post-ANOVA processing: use pairwise $t$-tests to compare $A$, $B$ and $C$
- With Bonferroni Correction, $\alpha^* = \alpha / 3 = 0.05 / 3 \approx 0.017$
- $A$ vs $B$: $p$-value is 0.228, don't reject
- $A$ vs $C$: $p$-value is 0.148, don't reject
- $B$ vs $C$: $p$-value is 0.01, reject
In R:
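# the reported p-values are already Bonferroni-adjusted,
# so compare them with alpha, not with alpha*
pairwise.t.test(classData$m1, classData$lecture,
                alternative='two.sided', p.adjust.method='bonferroni')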
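The Bonferroni decision can also be checked by hand from the three unadjusted $p$-values listed above. A minimal sketch, with illustrative vector names:

```r
# unadjusted p-values of the three pairwise t-tests quoted above
p.raw = c('A vs B'=0.228, 'A vs C'=0.148, 'B vs C'=0.01)
alpha = 0.05
alpha.star = alpha / length(p.raw)      # Bonferroni-corrected level

p.raw < alpha.star                      # which comparisons are rejected
# equivalent view: multiply the p-values by the number of tests
# (capped at 1) and compare them with the original alpha
p.adj = p.adjust(p.raw, method='bonferroni')
p.adj < alpha
```

Both views give the same decisions: only $B$ vs $C$ is rejected.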
Example 3: Donuts
Example from [1]
- study of donuts: the relationship between the amount of absorbed fat and the type of fat
- is there any relationship?
The data:
| Fat1 | Fat2 | Fat3 |
|---|---|---|
| 164 | 178 | 175 |
| 172 | 191 | 193 |
| 168 | 197 | 178 |
| 177 | 182 | 171 |
| 156 | 185 | 163 |
| 195 | 177 | 176 |
We run ANOVA and get the following:
- $F = 5.4063$
- num $\text{df} = 3$, denom $\text{df} = 20$
- $p\text{-value} = 0.006876$
file = ''
# read the wide table (one column per fat type) and convert it to long format
donuts = read.table(file, header=TRUE)
donuts = stack(donuts)
boxplot(donuts$values ~ donuts$ind)
oneway.test(donuts$values ~ donuts$ind, var.equal=TRUE)
# the p-value is small, so we reject the hypothesis of equal absorption
Same, done in steps:
groups = 4
# variance between groups (MSG)
df.g = groups - 1
tot.mean = mean(donuts$values)
group.mean = tapply(donuts$values, donuts$ind, mean)
n = tapply(donuts$values, donuts$ind, length)
inter = sum(n * (group.mean - tot.mean) ^ 2) / df.g
# variance inside each group (MSE)
df.e = length(donuts$values) - groups
intra.1 = tapply(donuts$values, donuts$ind, FUN=function(data) {
  m = mean(data)
  sum((data - m) ^ 2)
})
intra = sum(intra.1) / df.e
# F statistic and its upper-tail p-value
F.stat = inter / intra
p = pf(F.stat, df1=df.g, df2=df.e, lower.tail=FALSE)
# density of the F distribution, with the observed statistic marked
x = seq(0, 6, 0.05)
y = df(x, df1=df.g, df2=df.e)
plot(x, y, type='l')
abline(v=F.stat, lty=2)
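The example can also be run without the external file by typing the data in directly. This sketch rests on an assumption: the `Fat4` column is not listed in the data above, so its values here are assumed ones, chosen because they reproduce the reported $F = 5.4063$ with 3 and 20 degrees of freedom. The names `donuts.wide` and `donuts.long` are illustrative.

```r
# Donut data typed in by hand; the Fat4 values are an assumption (see the note above)
donuts.wide = data.frame(
  Fat1 = c(164, 172, 168, 177, 156, 195),
  Fat2 = c(178, 191, 197, 182, 185, 177),
  Fat3 = c(175, 193, 178, 171, 163, 176),
  Fat4 = c(155, 166, 149, 164, 170, 168)   # assumed, not listed in the table above
)
# long format: column 'values' holds the measurements, 'ind' the fat type
donuts.long = stack(donuts.wide)

res = oneway.test(values ~ ind, data=donuts.long, var.equal=TRUE)
res$statistic      # F
res$parameter      # numerator and denominator degrees of freedom
res$p.value

# the same test as a full ANOVA table
summary(aov(values ~ ind, data=donuts.long))
```

If the assumed `Fat4` values were different, the statistic would no longer match the reported one, so the match is the only support for them.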