Rule-Based Classifier

Rule-based classifiers

  • use a set of IF-THEN rules for classification
  • if {condition} then {conclusion}
  • if part - condition stated over the data
  • then part - a class label, consequent


1-Rule

Example

Suppose we have the following dataset

Titanic dataset [1]
pclass sex age sibsp parch fare embarked survived
3 male 22 1 0 7.25 S 0
1 female 38 1 0 71.28 C 1
3 female 26 0 0 7.93 S 1
1 female 35 1 0 53.10 S 1
3 male 35 0 0 8.05 S 0
3 male 0 0 8.46 Q 0
1 male 54 0 0 51.86 S 0
3 male 2 3 1 21.08 S 0
3 female 27 0 2 11.13 S 1
2 female 14 1 0 30.07 C 1
3 female 4 1 1 16.70 S 1
1 female 58 0 0 26.55 S 1

If we visualize the data by sex

  • color-coding survived with blue and not survived with red
  • rules-tit-sex.png
  • so we see that most women survived $\to$ can assume that all survived
  • e.g. have a rule $x.\text{women} \Rightarrow \text{survived}$


Rules

  • there could be many rules like this
  • if most children under 4 survived, assume all children under 4 survived
  • if most people with 1 sibling survived, assume all people with 1 sibling survided
  • calculate misclassification error (error rate) and take the best!


Some variables have more than 2 values

  • rules-tit-plcass.png
  • if pclass = 1 then survived=yes
  • if pclass = 2 then survived=yes
  • if pclass = 3 then survived=no


One Rule Algorithm

  • for each attribute $A$
  • for each value $V$ of $\text{dom}(A)$
  • create a rule
    • find the most frequent class $c$
    • create a rule "if $A = V$ then $c$
    • calculate the error rate of this rule
  • select attribute with best error rate for its rules


Can see 1-Rule as a one-level Decision Tree (Data Mining)

  • one branch for each value
  • each branch assigns most frequent class


Several Conditions

We also can consider several attributes at the same time

  • rules-tit-2cl.png
  • IF pclass='1' AND sex='female' THEN survive=yes
  • IF pclass='2' AND sex='female' THEN survive=yes
  • IF pclass='2' AND sex='male' THEN survive=no
  • IF pclass='3' AND sex='male' THEN survive=no
  • ...


Can go further

  • IF pclass='3' AND sex='female' AND age < 4 THEN survive=yes
  • IF pclass='3' AND sex='female' AND age >= 4 THEN survive=no
  • IF pclass='1' AND sex='male' AND age < 5 THEN survive=yes


Getting Rules

Where to extract these rules from?

  • from Decision Trees
    • each path from top to the bottom is a rule, and the leaf is a class
  • sequential covering - for learning rules directly (PRISM algorithm [2])
    • repeatedly removes a portion of the dataset
    • the portion - instances covered by the most promising rule at each stage


Sequential Covering Algorithm

PRISM(dataset $D$):

  • $R$ - resulting rule dataset, $R \leftarrow \varnothing$
  • for each class $C$
  • while $D \not \equiv \varnothing$
    • $r \leftarrow$ FindNextRule($C, D$)
    • $R \leftarrow R \cup \{ r \}$
    • remove from $D$ all instances correctly classified by $r$


Finding the next rule [3]

  • FindNextRule($C, D$) subroutine
  • uses Depth-First Search to construct the next rule for class $C$
  • we know the consequent for this rule: it's $C$
  • so we need to construct only antecedent (предыдущий член отношения)
    • start with an empty antecedent,
    • iteratively add most promising "attribute=value" constraints
    • use error rate to get the best one
  • continue DFS until the rule is specific enough to make no classification errors in the given dataset


FindNextRule(class $C$, dataset $D$):

  • let $A$ be all attributes in $D$
  • let $r$ be the initial rule $r: \varnothing \to C$
    • not examining anything, just always returning $C$
  • while $r$ incorrectly classifies some non-$C$ instances in $D$
    • let $\text{ant}(r) \to C$ be the rule computed at the previous iteration
      • $\text{ant}(r)$ is the antecedent of $r$;
      • it means take the rule from the previous iteration of the rule creation loop
      • (or an empty rule if this is the first iteration)
    • for each pair $(a, v)$ s.t. $a \in A$ and $v \in \text{dom}(a)$
      • consider rule $\text{ant}(r) \land (a = v) \to C$
      • calculate the accuracy of this rule
    • let $(a^*, v^*)$ be the pair with the best accuracy
    • so update $r$ by adding this condition:
      • let $r: \text{ant}(r) \land (a^* = v^*) \to C$
    • remove attribute $a^*$ from $A$:
      • $A \leftarrow A - \{ a^* \}$
  • return $r$


Example

A good example of PRISM can be found at [4]


Strategies for Learning Rules

General-to-Specific

  • start with an empty rule
  • add constraints to eliminate negative examples
  • stop when only positive examples are covered

Specific-to-General

  • start with a rule that identifies a single random instance
  • remove constraints to cover more positive examples
  • stop when further generalization starts covering negatives


Evaluation

Each rule can be evaluated using these measures

  • coverage: # of data points that satisfy conditions
  • accuracy = # of correct predictions / coverage


Other measures of "promisingness"


Sources and Links