Cramer's Coefficient


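Note about the $\chi^2$ test of independence:

- When the size of a data set increases, the gap between observed and expected counts also increases, even if the distribution (the proportions in each cell) remains unchanged.
- The $\chi^2$ statistic therefore grows with the sample size, so any departure from independence, however small, eventually leads us to reject the independence hypothesis.
- Cramer's coefficient solves this by dividing $\chi^2$ by its maximum possible value, which gives a measure of association that does not depend on the size of the data set.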
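
The size effect can be made explicit with a short derivation. The notation here is introduced only for illustration: $O_{ij}$ are the observed counts, $E_{ij}$ the expected counts under independence, and $k$ a hypothetical factor by which every cell of the table is multiplied. The expected counts are (row total × column total) / grand total, so they are multiplied by $k$ as well, and

$$
\chi^2_{(k)} = \sum_{i,j} \frac{(k\,O_{ij} - k\,E_{ij})^2}{k\,E_{ij}} = k \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = k\,\chi^2
$$

So a data set ten times larger with exactly the same proportions has a $\chi^2$ statistic ten times larger.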

Definition

Cramer's coefficient $V$ is

$V = \sqrt{ \cfrac{\chi^2}{\chi^2_\text{max} } }$

with $\chi^2_\text{max} = N \times ( \min(r, c) - 1 )$ where
- $N$ is the total number of observations (tuples)
- $r$ and $c$ are the numbers of rows and columns of the contingency table, i.e. the numbers of distinct values of the two attributes

$V \in [0, 1]$: 0 means complete independence and 1 means perfect association.

Example

Consider the same example as for $\chi^2$ Test

Small Dataset

| | Male | Female | Total |
|---|---|---|---|
| Blois | 55 | 45 | 100 |
| Tours | 20 | 30 | 50 |
| Total | 75 | 75 | 150 |

Bigger Dataset

| | Male | Female | Total |
|---|---|---|---|
| Blois | 550 | 450 | 1000 |
| Tours | 200 | 300 | 500 |
| Total | 750 | 750 | 1500 |

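The $\chi^2$ statistic is 3 for the small dataset and 30 for the bigger one, but $V$ is the same for both:

$V = \sqrt{ 3 / 150 } = \sqrt{ 30 / 1500 } \approx 0.14$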
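
The numbers in this example can be checked by hand. As a sketch for the small dataset, the expected count of each cell under independence is (row total × column total) / grand total: $100 \times 75 / 150 = 50$ for both Blois cells and $50 \times 75 / 150 = 25$ for both Tours cells. Then

$$
\chi^2 = \frac{(55-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} = 0.5 + 0.5 + 1 + 1 = 3
$$

The table has 2 rows and 2 columns, so $\chi^2_\text{max} = 150 \times (2 - 1) = 150$ and $V = \sqrt{3 / 150} \approx 0.141$. In the bigger dataset every count is multiplied by 10, which gives $\chi^2 = 30$ and $\chi^2_\text{max} = 1500$, so the ratio is unchanged.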

R

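```r
# Cramer's V for two categorical vectors x and y of equal length
cv.test = function(x, y) {
  # Pearson chi-square without the Yates continuity correction
  chi2 = chisq.test(x, y, correct=FALSE)$statistic
  # chi2_max = N * (min(number of levels of x, number of levels of y) - 1)
  CV = sqrt(chi2 / (length(x) * (min(length(unique(x)), length(unique(y))) - 1)))
  print.noquote("Cramér V / Phi:")
  return(as.numeric(CV))
}
```

So we can get Cramer's V as

```r
helpdata = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(helpdata, cv.test(female, homeless))
```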

or, starting from a contingency table `x` rather than two vectors:

```r
cv.test <- function(x) {
  # The Pearson chi-square (without the Yates correction) is divided by
  # the sum of the table cells times (the smaller of the number of rows and columns, minus 1).
  # $statistic extracts only the X^2 value, which is passed on to sqrt().
  CV <- sqrt(chisq.test(x, correct=FALSE)$statistic / (sum(x) * min(dim(x) - 1)))

  print.noquote("Cramér V / Phi:")
  return(as.numeric(CV))
}
```
