ML Wiki

Rule-Based Classifier

Rule-based classifiers

use a set of IF-THEN rules for classification
if {condition} then {conclusion}
if part - condition stated over the data
then part - a class label, consequent

1-Rule

Example

Suppose we have the following dataset

Titanic dataset [1]
pclass	sex	age	sibsp	parch	fare	embarked	survived
3	male	22	1	0	7.25	S	0
1	female	38	1	0	71.28	C	1
3	female	26	0	0	7.93	S	1
1	female	35	1	0	53.10	S	1
3	male	35	0	0	8.05	S	0
3	male		0	0	8.46	Q	0
1	male	54	0	0	51.86	S	0
3	male	2	3	1	21.08	S	0
3	female	27	0	2	11.13	S	1
2	female	14	1	0	30.07	C	1
3	female	4	1	1	16.70	S	1
1	female	58	0	0	26.55	S	1

If we visualize the data by sex

color-coding survived with blue and not survived with red
so we see that most women survived $\to$ can assume that all survived
e.g. have a rule $x.\text{women} \Rightarrow \text{survived}$

Rules

there could be many rules like this
if most children under 4 survived, assume all children under 4 survived
if most people with 1 sibling survived, assume all people with 1 sibling survided
calculate misclassification error (error rate) and take the best!

Some variables have more than 2 values

if pclass = 1 then survived=yes
if pclass = 2 then survived=yes
if pclass = 3 then survived=no

One Rule Algorithm

for each attribute $A$
for each value $V$ of $\text{dom}(A)$
create a rule
- find the most frequent class $c$
- create a rule "if $A = V$ then $c$
- calculate the error rate of this rule
select attribute with best error rate for its rules

Can see 1-Rule as a one-level Decision Tree (Data Mining)

one branch for each value
each branch assigns most frequent class

Several Conditions

We also can consider several attributes at the same time

IF pclass='1' AND sex='female' THEN survive=yes
IF pclass='2' AND sex='female' THEN survive=yes
IF pclass='2' AND sex='male' THEN survive=no
IF pclass='3' AND sex='male' THEN survive=no
...

Can go further

IF pclass='3' AND sex='female' AND age < 4 THEN survive=yes
IF pclass='3' AND sex='female' AND age >= 4 THEN survive=no
IF pclass='1' AND sex='male' AND age < 5 THEN survive=yes

Getting Rules

Where to extract these rules from?

from Decision Trees
- each path from top to the bottom is a rule, and the leaf is a class
sequential covering - for learning rules directly (PRISM algorithm [2])
- repeatedly removes a portion of the dataset
- the portion - instances covered by the most promising rule at each stage

Sequential Covering Algorithm

PRISM(dataset $D$):

$R$ - resulting rule dataset, $R \leftarrow \varnothing$
for each class $C$
while $D \not \equiv \varnothing$
- $r \leftarrow$ FindNextRule($C, D$)
- $R \leftarrow R \cup \{ r \}$
- remove from $D$ all instances correctly classified by $r$

Finding the next rule [3]

FindNextRule($C, D$) subroutine
uses Depth-First Search to construct the next rule for class $C$
we know the consequent for this rule: it's $C$
so we need to construct only antecedent (предыдущий член отношения)
- start with an empty antecedent,
- iteratively add most promising "attribute=value" constraints
- use error rate to get the best one
continue DFS until the rule is specific enough to make no classification errors in the given dataset

FindNextRule(class $C$, dataset $D$):

let $A$ be all attributes in $D$
let $r$ be the initial rule $r: \varnothing \to C$
- not examining anything, just always returning $C$
while $r$ incorrectly classifies some non-$C$ instances in $D$
- let $\text{ant}(r) \to C$ be the rule computed at the previous iteration
  - $\text{ant}(r)$ is the antecedent of $r$;
  - it means take the rule from the previous iteration of the rule creation loop
  - (or an empty rule if this is the first iteration)
- for each pair $(a, v)$ s.t. $a \in A$ and $v \in \text{dom}(a)$
  - consider rule $\text{ant}(r) \land (a = v) \to C$
  - calculate the accuracy of this rule
- let $(a^*, v^*)$ be the pair with the best accuracy
- so update $r$ by adding this condition:
  - let $r: \text{ant}(r) \land (a^* = v^*) \to C$
- remove attribute $a^*$ from $A$:
  - $A \leftarrow A - \{ a^* \}$
return $r$

Example

A good example of PRISM can be found at [4]

Strategies for Learning Rules

General-to-Specific

start with an empty rule
add constraints to eliminate negative examples
stop when only positive examples are covered

Specific-to-General

start with a rule that identifies a single random instance
remove constraints to cover more positive examples
stop when further generalization starts covering negatives

Evaluation

Each rule can be evaluated using these measures

coverage: # of data points that satisfy conditions
accuracy = # of correct predictions / coverage

Other measures of "promisingness"

Information Gain
see page 38 at [5]

Sources and Links

ML Wiki

Rule-Based Classifier
Revision as of 13:50, 27 May 2014 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Rule-Based Classifier

1-Rule

Example

One Rule Algorithm

Several Conditions

Getting Rules

Sequential Covering Algorithm

Example

Strategies for Learning Rules

Evaluation

Sources and Links

Rule-Based ClassifierRevision as of 13:50, 27 May 2014 by Alexey (Talk | contribs) (diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Rule-Based Classifier

1-Rule

Example

One Rule Algorithm

Several Conditions

Getting Rules

Sequential Covering Algorithm

Example

Strategies for Learning Rules

Evaluation

Sources and Links

Rule-Based Classifier
Revision as of 13:50, 27 May 2014 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)