Fast Effective Rule Induction By William W. Cohen
Overview Rule Based Learning Rule Learning Algorithm Pruning Techniques Modifications to IREP Evolution of Ripper Conclusion
Goal of the Paper The goal of this paper is to develop a rule learning algorithm that performs efficiently on large, noisy datasets and is competitive in generalization performance with more mature symbolic learning methods, such as decision trees.
Concepts to Refresh - Overfit-and-simplify strategy - Separate and conquer - Pruning
Separate and Conquer General Idea:
1. Learn one rule that covers some of the positive examples.
2. Remove the examples covered by that rule.
3. Repeat until no positive examples are left.
Sequential Covering Algorithm
Sequential-Covering(class, attributes, examples, threshold T)
  RuleSet = {}
  Rule = Learn-one-rule(class, attributes, examples)
  While (performance(Rule) > T) do
    a. RuleSet += Rule
    b. examples = examples \ {examples classified correctly by Rule}
    c. Rule = Learn-one-rule(class, attributes, examples)
  Sort RuleSet based on the performance of the rules
  Return RuleSet
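The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the paper's implementation: the names `sequential_covering`, `toy_learn_one_rule`, and `precision` are hypothetical, examples are assumed to be `(attributes, is_positive)` pairs, and the extra check for an empty example list is an added safeguard that the pseudocode does not state.

```python
def sequential_covering(examples, learn_one_rule, performance, threshold):
    """Learn rules one at a time, removing the examples each rule gets right."""
    remaining = list(examples)
    scored = []
    while remaining:  # added safeguard: never learn from an empty set
        rule = learn_one_rule(remaining)
        score = performance(rule, remaining)
        if score <= threshold:
            break
        scored.append((score, rule))
        # "Classified correctly by the rule" = positive examples the rule covers.
        remaining = [(x, y) for x, y in remaining if not (rule(x) and y)]
    # Sort the rule set by performance, best rule first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in scored]


def precision(rule, examples):
    """Fraction of covered examples that are positive (0.0 if nothing is covered)."""
    covered = [y for x, y in examples if rule(x)]
    return sum(covered) / len(covered) if covered else 0.0


def toy_learn_one_rule(examples):
    """Hypothetical stand-in learner: the single test color == v with best precision."""
    best_value, best_prec = None, -1.0
    for v in sorted({x["color"] for x, _ in examples}):
        prec = precision(lambda x, v=v: x["color"] == v, examples)
        if prec > best_prec:
            best_value, best_prec = v, prec
    return lambda x, v=best_value: x["color"] == v


data = (
    [({"color": "red"}, True)] * 3
    + [({"color": "blue"}, True)] * 2
    + [({"color": "blue"}, False)]
    + [({"color": "green"}, False)] * 3
)
rules = sequential_covering(data, toy_learn_one_rule, precision, threshold=0.5)
```

In this toy run the loop stops once only negative examples remain, because no candidate rule then exceeds the threshold.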
Pruning Why do we need pruning? Techniques of pruning: 1. Reduced Error Pruning 2. Grow 3. Incremental Reduced Error Pruning
IREP Algorithm
How to build a rule in IREP? First, the uncovered examples are randomly partitioned into two subsets: a growing set and a pruning set. Next, a rule is grown. The implementation of GrowRule is a propositional version of FOIL.
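The random partition can be sketched as below. The function name `split_grow_prune` and the `seed` parameter are hypothetical, and the default of two thirds of the examples going to the growing set is an assumption; the slides do not state the proportion.

```python
import random


def split_grow_prune(examples, grow_fraction=2 / 3, seed=None):
    """Randomly partition examples into (growing_set, pruning_set)."""
    rng = random.Random(seed)      # local generator, so the global state is untouched
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * grow_fraction)  # assumed proportion, see lead-in
    return shuffled[:cut], shuffled[cut:]


grow_set, prune_set = split_grow_prune(range(9), seed=0)
```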
Grow Rule GrowRule begins with an empty conjunction of conditions and considers adding any condition of the form An = v, Ac <= θ, or Ac >= θ, where An is a nominal attribute and v is a legal value for An, or Ac is a continuous attribute and θ is some value for Ac that occurs in the training data.
Grow Rule GrowRule repeatedly adds the condition that maximizes FOIL's information gain criterion until the rule covers no negative examples from the growing set.
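A greedy growing loop of this kind might look like the sketch below. It assumes the usual form of FOIL's gain, p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 count the positives and negatives covered before a condition is added and p1 and n1 count them afterwards. The rule representation (a list of `(attribute, op, value)` tuples) and all function names are hypothetical, and stopping when no condition has positive gain is an added safeguard against looping forever.

```python
import math


def covers(rule, x):
    """A rule is a list of (attribute, op, value) conditions; all must hold."""
    for attr, op, value in rule:
        if op == "==" and x[attr] != value:
            return False
        if op == "<=" and not x[attr] <= value:
            return False
        if op == ">=" and not x[attr] >= value:
            return False
    return True


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for a condition that leaves p1 positives, n1 negatives."""
    if p1 == 0:
        return float("-inf")
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def candidate_conditions(examples, nominal, continuous):
    """All tests An == v, Ac <= t, Ac >= t using values seen in the examples."""
    conds = set()
    for x, _ in examples:
        for a in nominal:
            conds.add((a, "==", x[a]))
        for a in continuous:
            conds.add((a, "<=", x[a]))
            conds.add((a, ">=", x[a]))
    return sorted(conds)  # fixed order makes tie-breaking deterministic


def grow_rule(grow_set, nominal, continuous):
    """Add the highest-gain condition until no covered example is negative."""
    rule, covered = [], list(grow_set)
    while any(not y for _, y in covered):
        p0 = sum(1 for _, y in covered if y)
        n0 = len(covered) - p0
        if p0 == 0:
            break
        best, best_gain = None, float("-inf")
        for cond in candidate_conditions(covered, nominal, continuous):
            sub = [y for x, y in covered if covers([cond], x)]
            gain = foil_gain(p0, n0, sum(sub), len(sub) - sum(sub))
            if gain > best_gain:
                best, best_gain = cond, gain
        if best is None or best_gain <= 0:
            break  # added safeguard: no condition improves the rule
        rule.append(best)
        covered = [(x, y) for x, y in covered if covers([best], x)]
    return rule


grow_set = [
    ({"color": "red", "size": 1.0}, True),
    ({"color": "red", "size": 2.0}, True),
    ({"color": "red", "size": 5.0}, False),
    ({"color": "blue", "size": 1.0}, False),
    ({"color": "blue", "size": 4.0}, False),
]
grown = grow_rule(grow_set, nominal=["color"], continuous=["size"])
```

On this toy growing set the first condition chosen is `color == red`, which still covers one negative example, so a second condition on `size` is added to exclude it.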
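Pruning After growing, the rule is immediately pruned. IREP considers deleting any final sequence of conditions from the rule and chooses the deletion that maximizes the function v(Rule, PrunePos, PruneNeg) = (p + (N - n)) / (P + N), where P and N are the total numbers of examples in PrunePos and PruneNeg, and p and n are the numbers of those examples covered by Rule.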
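As a rough sketch of this pruning step, the code below scores every prefix of the rule with v and keeps the best one. The rule representation (a list of `(attribute, op, value)` tuples) and the function names are hypothetical. Two choices here are assumptions rather than anything stated in the slides: at least one condition is always kept, and ties are resolved in favor of the longer rule.

```python
def covers(rule, x):
    """A rule is a list of (attribute, op, value) conditions; all must hold."""
    for attr, op, value in rule:
        if op == "==" and x[attr] != value:
            return False
        if op == "<=" and not x[attr] <= value:
            return False
        if op == ">=" and not x[attr] >= value:
            return False
    return True


def prune_value(rule, prune_pos, prune_neg):
    """v = (p + (N - n)) / (P + N): fraction of pruning examples handled correctly."""
    P, N = len(prune_pos), len(prune_neg)
    p = sum(1 for x in prune_pos if covers(rule, x))
    n = sum(1 for x in prune_neg if covers(rule, x))
    return (p + (N - n)) / (P + N)


def prune_rule(rule, prune_pos, prune_neg):
    """Delete the final sequence of conditions that maximizes v."""
    # Deleting a final sequence leaves a prefix; try the full rule first so
    # that ties keep the longer rule (an assumption, see lead-in).
    prefixes = [rule[:k] for k in range(len(rule), 0, -1)]
    return max(prefixes, key=lambda r: prune_value(r, prune_pos, prune_neg))


rule = [("color", "==", "red"), ("size", "<=", 2.0), ("size", ">=", 1.5)]
prune_pos = [
    {"color": "red", "size": 1.0},
    {"color": "red", "size": 2.0},
    {"color": "red", "size": 1.8},
]
prune_neg = [
    {"color": "blue", "size": 1.0},
    {"color": "red", "size": 5.0},
]
pruned = prune_rule(rule, prune_pos, prune_neg)
```

Here the last condition excludes a positive pruning example, so dropping it raises v, while dropping the second condition as well would let a negative example back in.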
IREP The IREP algorithm handles: - Two-class problems - Multiple classes - Missing attributes
Experiments with IREP The First Graph
CPU times for C4.5, IREP, and RIPPER2
Improvements to IREP Improving IREP requires three modifications: 1. The rule-value metric 2. The stopping criterion 3. Rule optimization
Evolution of RIPPER First, IREP* is used to obtain the initial rule set. This rule set is then optimized, and finally rules are added to cover any remaining positive examples using IREP*. This leads to a new algorithm, namely RIPPER.