Eclat

This is an algorithm for Frequent Pattern Mining based on Depth-First Search traversal of the itemset Lattice

  • but it's rather a DFS traversal of the prefix tree than lattice
  • and the Branch and Bound method is used for stopping


Downward Closure

This method uses the property of this Lattice:

  • downward-closure.png
  • $X \subseteq Y, \text{supp}(Y) = t \Rightarrow \text{supp}(X) \geqslant t$
  • $X \supseteq Y, \text{supp}(X) < \text{min_t} \Rightarrow \text{supp}(Y) < \text{min_t}$


Idea

  • TidList - list of transaction identifiers
  • represent each item $i$ as a list of transaction where it participated ("inverted list")
  • and calculate support as the size of intersection of TidLists


Example

  • items ${a,b,c,d,e,f}$
  • 4 transactions: $T_1: abc, T_2: acdef, T_3: abc, T_4: de$
  • $N = 4$, # of transactions

TidLists:

  • $\text{Tid}(a) = \{ T_1, T_2, T_3 \}$
  • $\text{Tid}(b) = \{ T_1, T_3 \}$
  • $\text{Tid}(c) = \{ T_1, T_2, T_3 \}$
  • $\text{Tid}(d) = \{ T_2, T_4 \}$
  • $\text{Tid}(e) = \{ T_2, T_4 \}$
  • $\text{Tid}(f) = \{ T_2 \}$

Support:

  • $\text{supp}(ab) = \cfrac{\big| \text{Tid}(a) \cap \text{Tid}(b) \big|}{N} = \cfrac{\big| \{T_1, T_2\} \big|}{N} = \cfrac{2}{4} = 0.5$


Eclat Algorithm

Algorithm

Eclat(prefix $X$, items $I$)

  • let $C$ be candidate itemsets and remove non-frequent items of $C$
  • $C = \big\{ X \cup {i} \ | \ \forall i \in I \ : \ \text{freq}(X \cup {i}) \geqslant \text{min_th} \big\}$
  • $F = \varnothing$
  • $C_\text{it} \leftarrow C$
  • for each frequent item $i$ added to $C$
    • let $X_i = X \cap \{i\}$
    • update $C_\text{it} = C_\text{it} - \{ i \}$
    • $F \leftarrow F + X_i + \text{Eclat}(X_i, C_\text{it})$
  • return $F$
th = 2

def eclat(prefix, items, D):
    if not items: return
    candidates = [(prefix | {i}, frequency(D, prefix | {i}), i)
                  for i in items if i not in prefix]
    frequent = filter(lambda x: x[1] >= th, candidates)

    for new_prefix, freq, i in frequent:
        frequent_items.append(new_prefix)
        items = items - {i}
        eclat(new_prefix, items, D)

eclat(set(), set(I), D)


Example

  • items ${a,b,c,d,e,f}$
  • threshold $t = 3$
  • 4 transactions
    • $\{ abc, acdef, abc, de \}$
  • notation: (item to add; prefix $\to$ frequency(prefix))


Let's run the algorithm on this input:

  1. eclat(prefix: $\{\}$, items: $acbedf$) (step 1)
    • candidates $(a: a \to 3), (c: c \to 3), (b: b \to 2), (e: e \to 2), (d: d \to 2), (f: f \to 1)$
    • frequent patterns $(a: a \to 3), (c: c \to 3), (b: b \to 2), (e: e \to 2), (d: d \to 2)$
    • adding $a$ to $F$
  2. eclat(prefix: $a$, items: $cbedf$) (step 2)
    • candidates $(c: ac \to 3), (b: ab \to 2), (e: ae \to 1), (d: ad \to 1), (f: af \to 1)$
    • frequent patterns $(c: ac \to 3), (b: ab \to 2)$
    • adding $ac$ to $F$
  3. eclat(prefix: $ac$, items: $bedf$) (step 3)
    • candidates $(b: acb \to 2), (e: ace \to 1), (d: acd \to 1), (f: acf \to 1)$
    • frequent patterns $(b: acb \to 2)$
    • adding $acb$ to $F$
  4. eclat(prefix: $acb$, items: $edf$) (step 4)
    • candidates $(e: acbe \to 0), (d: acbd \to 0), (f: acbf \to 0)$
    • frequent patterns $\{\}$
    • adding $ab$ to $F$
  5. eclat(prefix: $ab$, items: $edf$) (step 5)
    • candidates $(e: abe \to 0), (d: abd \to 0), (f: abf \to 0)$
    • frequent patterns $\{\}$
    • adding $c$ to $F$
  6. eclat(prefix: $c$, items: $bedf$) (step 6)
    • candidates $(b: cb \to 2), (e: ce \to 1), (d: cd \to 1), (f: cf \to 1)$
    • frequent patterns $(b: cb \to 2)$
    • adding $cb$ to $F$
  7. eclat(prefix: $cb$, items: $edf$) (step 7)
    • candidates $(e: cbe \to 0), (d: cbd \to 0), (f: cbf \to 0)$
    • frequent patterns $$
    • adding $b$ to $F$
  8. eclat(prefix: $b$, items: $edf$) (step 8)
    • candidates $(e: be \to 0), (d: bd \to 0), (f: bf \to 0)$
    • frequent patterns $\{\}$
    • adding $e$ to $F$
  9. eclat(prefix: $e$, items: $df$) (step 9)
    • candidates $(d: ed \to 2), (f: ef \to 1)$
    • frequent patterns $(d: ed \to 2)$
    • adding $ed$ to $F$
  10. eclat(prefix: $ed$, items: $f$) (step 10)
    • candidates $(f: edf \to 1)$
    • frequent patterns $\{\}$
    • adding $d$ to $F$
  11. eclat(prefix: $d$, items: $f$) (step 11)
    • candidates $(f: df \to 1)$
    • frequent patterns $\varnothing$

Result

$[a, ac, acb, ab, c, cb, b, e, ed, d]$


Algorithm with TidList

To be able to calculate support quicker,

  • use TidList



Links

Presentations

Implementations


See Also

Sources