Eclat
This is an algorithm for Frequent Pattern Mining based on Depth-First Search traversal of the itemset Lattice
- but it's rather a DFS traversal of the prefix tree than lattice
- and the Branch and Bound method is used for stopping
Downward Closure
This method uses the property of this Lattice:
-
- $X \subseteq Y, \text{supp}(Y) = t \Rightarrow \text{supp}(X) \geqslant t$
- $X \supseteq Y, \text{supp}(X) < \text{min_t} \Rightarrow \text{supp}(Y) < \text{min_t}$
Idea
- TidList - list of transaction identifiers
- represent each item $i$ as a list of transaction where it participated ("inverted list")
- and calculate support as the size of intersection of TidLists
Example
- items ${a,b,c,d,e,f}$
- 4 transactions: $T_1: abc, T_2: acdef, T_3: abc, T_4: de$
- $N = 4$, # of transactions
TidLists:
- $\text{Tid}(a) = \{ T_1, T_2, T_3 \}$
- $\text{Tid}(b) = \{ T_1, T_3 \}$
- $\text{Tid}(c) = \{ T_1, T_2, T_3 \}$
- $\text{Tid}(d) = \{ T_2, T_4 \}$
- $\text{Tid}(e) = \{ T_2, T_4 \}$
- $\text{Tid}(f) = \{ T_2 \}$
Support:
- $\text{supp}(ab) = \cfrac{\big| \text{Tid}(a) \cap \text{Tid}(b) \big|}{N} = \cfrac{\big| \{T_1, T_2\} \big|}{N} = \cfrac{2}{4} = 0.5$
Eclat Algorithm
Algorithm
Eclat(prefix $X$, items $I$)
- let $C$ be candidate itemsets and remove non-frequent items of $C$
- $C = \big\{ X \cup {i} \ | \ \forall i \in I \ : \ \text{freq}(X \cup {i}) \geqslant \text{min_th} \big\}$
- $F = \varnothing$
- $C_\text{it} \leftarrow C$
- for each frequent item $i$ added to $C$
- let $X_i = X \cap \{i\}$
- update $C_\text{it} = C_\text{it} - \{ i \}$
- $F \leftarrow F + X_i + \text{Eclat}(X_i, C_\text{it})$
- return $F$
th = 2
def eclat(prefix, items, D):
if not items: return
candidates = [(prefix | {i}, frequency(D, prefix | {i}), i)
for i in items if i not in prefix]
frequent = filter(lambda x: x[1] >= th, candidates)
for new_prefix, freq, i in frequent:
frequent_items.append(new_prefix)
items = items - {i}
eclat(new_prefix, items, D)
eclat(set(), set(I), D)
Example
- items ${a,b,c,d,e,f}$
- threshold $t = 3$
- 4 transactions
- $\{ abc, acdef, abc, de \}$
- notation: (item to add; prefix $\to$ frequency(prefix))
Let's run the algorithm on this input:
- eclat(prefix: $\{\}$, items: $acbedf$) (step 1)
- candidates $(a: a \to 3), (c: c \to 3), (b: b \to 2), (e: e \to 2), (d: d \to 2), (f: f \to 1)$
- frequent patterns $(a: a \to 3), (c: c \to 3), (b: b \to 2), (e: e \to 2), (d: d \to 2)$
- adding $a$ to $F$
- eclat(prefix: $a$, items: $cbedf$) (step 2)
- candidates $(c: ac \to 3), (b: ab \to 2), (e: ae \to 1), (d: ad \to 1), (f: af \to 1)$
- frequent patterns $(c: ac \to 3), (b: ab \to 2)$
- adding $ac$ to $F$
- eclat(prefix: $ac$, items: $bedf$) (step 3)
- candidates $(b: acb \to 2), (e: ace \to 1), (d: acd \to 1), (f: acf \to 1)$
- frequent patterns $(b: acb \to 2)$
- adding $acb$ to $F$
- eclat(prefix: $acb$, items: $edf$) (step 4)
- candidates $(e: acbe \to 0), (d: acbd \to 0), (f: acbf \to 0)$
- frequent patterns $\{\}$
- adding $ab$ to $F$
- eclat(prefix: $ab$, items: $edf$) (step 5)
- candidates $(e: abe \to 0), (d: abd \to 0), (f: abf \to 0)$
- frequent patterns $\{\}$
- adding $c$ to $F$
- eclat(prefix: $c$, items: $bedf$) (step 6)
- candidates $(b: cb \to 2), (e: ce \to 1), (d: cd \to 1), (f: cf \to 1)$
- frequent patterns $(b: cb \to 2)$
- adding $cb$ to $F$
- eclat(prefix: $cb$, items: $edf$) (step 7)
- candidates $(e: cbe \to 0), (d: cbd \to 0), (f: cbf \to 0)$
- frequent patterns $$
- adding $b$ to $F$
- eclat(prefix: $b$, items: $edf$) (step 8)
- candidates $(e: be \to 0), (d: bd \to 0), (f: bf \to 0)$
- frequent patterns $\{\}$
- adding $e$ to $F$
- eclat(prefix: $e$, items: $df$) (step 9)
- candidates $(d: ed \to 2), (f: ef \to 1)$
- frequent patterns $(d: ed \to 2)$
- adding $ed$ to $F$
- eclat(prefix: $ed$, items: $f$) (step 10)
- candidates $(f: edf \to 1)$
- frequent patterns $\{\}$
- adding $d$ to $F$
- eclat(prefix: $d$, items: $f$) (step 11)
- candidates $(f: df \to 1)$
- frequent patterns $\varnothing$
Result
- $[a, ac, acb, ab, c, cb, b, e, ed, d]$
Algorithm with TidList
To be able to calculate support quicker,
Links
Presentations
Implementations
See Also
Sources