ML Wiki

Constraint-Based Pattern Mining

Goal:

• to enumerate all patterns that satisfy some constraint $q$
• $q \equiv m(X, D) \geqslant \text{threshold}$
• so it's a general case of Frequent Pattern Mining
• where constraint is $\text{freq}(X, D) \geqslant \text{min_frec}$

Example

Example:

• suppose $q \equiv \sum_{x \in X} x.\text{price} \leqslant 8$
• prices: $(a \to 1, b \to 2, c \to 5, d \to 4, e \to 6, f \to 4)$
• the same idea as in Frequent Pattern Mining
• enumerate all subsets, check ones in which this constraint is satisfied
• solution:
• $(\{\} \to 0), (a \to 1), (b \to 2), (c \to 5), (d \to 4), (e \to 6),$
• $(f \to 4), (ab \to 3), (ac \to 6), (ad \to 5), (ae \to 7), (af \to 5),$
• $(bc \to 7), (bd \to 6), (be \to 8), (bf \to 6), (df \to 8), (abc \to 8),$
• $(abd \to 7), (abf \to 7)$
• • note that here when we add an item the price always increases!

Properties

Properties of constraints in Lattice

Anti-Monotone

Anti-monotone constraint

• A constraint $q$ is anti-monotone iff
• when an itemset $X$ satisfies $q$, then any $Y \subseteq X$ also satisfies $q$.
• $q(X) \Rightarrow \forall Y \subseteq X : q(Y)$

Monotone

Monotone constraint

• A constraint $q$ is monotone iff
• when an itemset $X$ satisfies $q$, then any $Y \supseteq X$ also satisfies $q$.
• $q(X) \Rightarrow \forall Y \supseteq X : q(Y)$

Example

Example 1

• Frequency is a anti-monotonic constraint:
• $\text{freq}(X) \geqslant \gamma \Rightarrow \forall Y \supseteq X: \text{freq}(Y) > \gamma$
• sum of prices if a monotonic constraint:
• $\sum_{x \in X} x.\text{price} \leqslant \gamma \Rightarrow \forall Y \subseteq X: \sum_{y \in Y} y.\text{price} \leqslant \gamma$

Example 2

• $T_1 = \{A,B,D,E\}, T_2 = \{A,B,C,D,F\}, T_3 = \{B,D,F\}, T_4 = \{C,E,F\}$
• $p(A) = 2, p(B) = 4, p(C) = 1, p(D) = 3, p(E) = 4, p(F) = 4$
• Consider this constraint: $\sum_{x \in X} p(x) \leqslant 6$
• Assume that $X$ is an itemset. When we add something to $X$, the sum only can grow
• so this constraint is monotonic:
• $\forall X \subseteq Y, \sum_{y \in Y} p(y) \leqslant 6 \Rightarrow \sum_{x \in X} p(X) \leqslant 6$
• and if we remove an item, we're sure that the price will decrease

Since it's anti-monotone, can use Apriori to find all itemsets that satisfy it

• Level 1: $A, B, C, D, E, F$
• Level 2: $AB, AC, AD, AE, AF, BC, CD, CE, CF$
• Level 3: $ACD$

Summary

Downward ClosureUpward Closure  • $freq(X, D) \geqslant t$
• $\min(X.\text{val}) \geqslant t$
• $\max(X.\text{val}) \leqslant t$
• $\text{sum}(X.\text{val}) \leqslant t$
• $X \subseteq \{A, B, C\}$
• $freq(X, D) \leqslant t$
• $\min(X.\text{val}) \leqslant t$
• $\max(X.\text{val}) \geqslant t$
• $\text{sum}(X.\text{val}) \geqslant t$
• $X \supseteq \{A, B, C\}$

Convertible Constraints

Consider this dataset

• $T_1 = \{A,B,D,E\}, T_2 = \{A,B,C,D,F\}, T_3 = \{B,D,F\}, T_4 = \{C,E,F\}$
• $p(A) = 2, p(B) = 4, p(C) = 1, p(D) = 3, p(E) = 4, p(F) = 4$

And the following constraint

• $\text{avg}(X.\text{price}) \leqslant 3$
• it's not anti-monotonic:
• $\text{avg}(B.\text{price}) = 4 \not \leqslant 3$, not satisfied
• $\text{avg}(AB.\text{price}) = 3 \leqslant 3$, satisfied
• it's not monotonic either:
• $\text{avg}(A.\text{price}) = 2$ and $\text{avg}(AB.\text{price}) = 3$

but we can convert it into anti-monotone by ordering items by price acs:

• $C < A < D < B, E, F$
• now can traverse the search space in BFS-way - with Eclat
• need to enumerate items in lexicographical ordering for that:
• $C, CA, CAD, ...$

Now we're sure that the average always increases

• $\text{avg}(\varnothing) = 0, \text{avg}(C) = 1/1, \text{avg}(CA) = 4/2, \text{avg}(CAD) = 7/3, \text{avg}(CADF) = 11/4, \text{avg}(CADFE) = 16/5$

Papers

• Heikki Mannila, Hannu Toivonen: Levelwise Search and Borders of Theories in Knowledge Discovery. Data Min. Knowl. Discov. 1(3): 241-258 (1997) (about anti-monotone constraints)
• Jian Pei, Jiawei Han: Can we push more constraints into frequent pattern mining? KDD 2000: 350-354 (about convertible constraints)