Cramer's Coefficient

Note about $\chi^2$ Test of Independence:

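  • when the size of a data set increases, the $\chi^2$ statistic (the gap between observed and expected counts) increases proportionally
  • this happens even if the distribution (the cell proportions) remains unchanged
  • thus, with a large enough data set, we reject the independence hypothesis even when the association is weak
  • Cramer’s Coefficient provides a solution for that: it normalizes $\chi^2$ by its maximal possible value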
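
The effect described in this note can be checked with a minimal R sketch. It assumes the two contingency tables from the example on this page (the bigger table is the small one multiplied by 10); the variable names are illustrative only.

```r
# Counts taken from the example on this page (rows: city, columns: gender)
small <- matrix(c(55, 45,
                  20, 30),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Blois", "Tours"), c("Male", "Female")))

# Same proportions, ten times more observations
big <- small * 10

# Pearson chi-square test without the Yates continuity correction
test_small <- chisq.test(small, correct = FALSE)
test_big   <- chisq.test(big, correct = FALSE)

# The statistic grows by the same factor as the sample size ...
chi2_small <- unname(test_small$statistic)
chi2_big   <- unname(test_big$statistic)
print(c(small = chi2_small, big = chi2_big))

# ... so the p-value shrinks although the proportions are identical
print(c(small = test_small$p.value, big = test_big$p.value))
```

At the 5% level, independence is not rejected for the small table but is rejected for the bigger one, although both describe the same distribution.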

Definition

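The Cramer’s coefficient $V$ is defined as follows:

  • $V = \sqrt{ \cfrac{\chi^2}{\chi^2_\text{max} } }$
  • with $\chi^2_\text{max} = N \times ( \min(r, c) - 1 )$ where
    • $N$ is the total number of observations (tuples), and $r$ and $c$ are the numbers of rows and columns of the contingency table, i.e. the numbers of distinct values of the two attributes
  • $V \in [0, 1]$
  • 0 means independence, and 1 means perfect association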
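
As a special case, consider a table with two rows and two columns. This is only a sketch: $r$ and $c$ are used here for the numbers of rows and columns of the contingency table and $N$ for the total count. Since $\min(r, c) - 1 = 1$, the normalization reduces to a division by $N$:

$$V = \sqrt{ \cfrac{\chi^2}{N \times (\min(2, 2) - 1)} } = \sqrt{ \cfrac{\chi^2}{N} } = |\phi|$$

where $\phi$ is the phi coefficient, so for a $2 \times 2$ table Cramer’s coefficient coincides with the absolute value of $\phi$.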

Example

Consider the same example as for $\chi^2$ Test

Small Dataset

| | Male | Female | Total |
|---|---|---|---|
| Blois | 55 | 45 | 100 |
| Tours | 20 | 30 | 50 |
| Total | 75 | 75 | 150 |

Bigger Dataset

| | Male | Female | Total |
|---|---|---|---|
| Blois | 550 | 450 | 1000 |
| Tours | 200 | 300 | 500 |
| Total | 750 | 750 | 1500 |

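Here $\chi^2 = 3$ for the small dataset and $\chi^2 = 30$ for the bigger one, but Cramer’s coefficient is the same in both cases:

$V = \sqrt{ 3 / 150 } = \sqrt{ 30 / 1500 } \approx 0.14$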
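
The arithmetic behind these values can be reproduced by hand. The following is a minimal sketch, not a reference implementation: `cramers_v` is a hypothetical helper name introduced here, and it assumes its input is a matrix of counts with no empty rows or columns.

```r
# Hypothetical helper: Cramer's V computed directly from a table of counts
cramers_v <- function(tab) {
  n <- sum(tab)
  # Expected counts under independence: (row total * column total) / n
  expected <- outer(rowSums(tab), colSums(tab)) / n
  # Pearson chi-square statistic
  chi2 <- sum((tab - expected)^2 / expected)
  # Maximal chi-square: n * (min(rows, columns) - 1)
  chi2_max <- n * (min(dim(tab)) - 1)
  sqrt(chi2 / chi2_max)
}

# Counts from the two tables of the example
small <- matrix(c(55, 45,
                  20, 30), nrow = 2, byrow = TRUE)
big <- small * 10

v_small <- cramers_v(small)
v_big   <- cramers_v(big)

# Both values are approximately 0.14
print(round(c(small = v_small, big = v_big), 2))
```

For the small table the expected counts are 50, 50, 25 and 25, which gives $\chi^2 = 25/50 + 25/50 + 25/25 + 25/25 = 3$.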

R

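```r
# Cramer's V for two categorical vectors x and y of equal length
cv.test <- function(x, y) {
  # Pearson chi-square statistic, without the Yates continuity correction
  chi2 <- chisq.test(x, y, correct = FALSE)$statistic
  # Normalize by N * (min(number of distinct values of x and of y) - 1)
  CV <- sqrt(chi2 / (length(x) * (min(length(unique(x)), length(unique(y))) - 1)))
  print.noquote("Cramér V / Phi:")
  return(as.numeric(CV))
}
```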

So we can get Cramer’s V as

```r
helpdata = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(helpdata, cv.test(female, homeless))
```


or, starting from a contingency table `x` instead of two vectors:

```r
# Cramer's V for a contingency table (matrix of counts) x
cv.test <- function(x) {
  # The Pearson chi-square statistic (without the Yates correction) is divided
  # by the sum of the table cells multiplied by the smaller of the number of
  # rows and the number of columns, minus 1.
  # $statistic passes only the X^2 value into sqrt().
  CV <- sqrt(chisq.test(x, correct = FALSE)$statistic / (sum(x) * min(dim(x) - 1)))
  print.noquote("Cramér V / Phi:")
  return(as.numeric(CV))
}
```

  • http://en.wikipedia.org/wiki/Phi_coefficient
  • http://sas-and-r.blogspot.fr/2011/06/example-839-calculating-cramers-v.html
  • http://home.hib.no/ansatte/gbj/cramer_v.htm
  • http://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V
  • http://stats.stackexchange.com/questions/105795/understanding-chi2-and-cram%C3%A9rs-v-results/105827#105827

Sources

  • Data Mining (UFRT)
  • http://en.wikipedia.org/wiki/Analysis_of_variance