Mining Association Rules

Association Rules

Association rule mining:

  • Finding all the rules $X \to Y$ such that
  • $P(X \land Y) \geqslant \text{min_supp}$ and
  • $P(Y | X) \geqslant \text{min_con}$
  • these are predictive patterns

E.g.

  • $\text{wings} \to \text{beak}$
  • $\text{wings} \land \text{beak} \to \text{fly}$


Association rules

  • $X \to Y$ is an association rule if
  • $X$ and $Y$ are itemsets
  • $X \cap Y = \varnothing$
  • $X$ is called the body
  • $Y$ is called the head (conclusion)


Motivation

Consider this example

  • we data with customers and their purchases
  • as-rules-ex1.png


pasta t.souse red wine seafood white wine salami
1
2
3
4
5


Pattern 1: How to organize supermarket?

  • we see that seafood and white wine usually go together, so put them together
  • these are associations that are not always observed in practice


Pattern 2: What to promote?

  • we see a rule $\text{pasta} \land \text{souse} \to \text{red wine}$
  • it doesn't always hold, but it's a typical pattern
  • so promote red wine


Generation of Association Rules


Algorithm

Generate(frequent itemsets $F$, confidence threshold $\theta$)

  • $R \leftarrow \varnothing$
  • for each $X \in F$, for each $Y \subset X$
    • if $\text{conf}(Y \subset X) = \cfrac{\text{supp}(X)}{\text{supp}(Y)} \geqslant \theta$
    • then $R \leftarrow R \cup \{ Y \to X - Y \}$
  • return $R$


Complexity

It computes association rules in polynomial time


Examples

Example 1

${A, B, C, D, E, F}$

  • $T_1 = \{A,B,D,E\}$
  • $T_2 = \{A,B,C,D,F\}$
  • $T_3 = \{B,D,F\}$
  • $T_4 = \{C,E,F\}$

Task:

  • Find rules $X \to A$ ($A$ = Apple) with support threshold 50%
  • Calculate the confidence of these rules
  • Are there redundant rules?


Frequent items with support 50%:

  • $[A, B, C, D, E, F, DF, BF, AD, BD, AB, CF, BDF, ABD]$ - calculated with Apriori
  • ones that involve $A$: $[A, AD, AB, ABD]$
  • so, rules are $\varnothing \to A, B \to A, D \to A, BD \to A$


Confidence:

  • $\text{conf}(\varnothing \to A) = \cfrac{\text{supp}(\varnothing \to A)}{\text{supp}(\varnothing)} = \cfrac{2}{4}$
  • $\text{conf}(B \to A) = \cfrac{\text{supp}(B \to A)}{\text{supp}(B)} = \cfrac{2}{3}$
  • $\text{conf}(D \to A) = \cfrac{\text{supp}(D \to A)}{\text{supp}(D)} = \cfrac{2}{3}$
  • $\text{conf}(BD \to A) = \cfrac{\text{supp}(BD \to A)}{\text{supp}(BD)} = \cfrac{2}{3}$

Redundant rules

  • the rule $BD \to A$ is redundant because we have $B \to A$ and $D \to A$


See Also

Sources

Machine Learning Bookcamp: Learn machine learning by doing projects. Get 40% off with code "grigorevpc".

Share your opinion