RIPPER Fast Effective Rule Induction


RIPPER: Fast Effective Rule Induction
Machine Learning 2003
Merlin Holzapfel & Martin Schmidt
Mholzapf@uos.de, Martisch@uos.de

Rule Sets - Advantages
- easy to understand
- usually better than decision tree learners
- representable in first-order logic, hence easy to implement in Prolog
- prior knowledge can be added

Rule Sets - Disadvantages
- scale poorly with training set size
- problems with noisy data, which is likely in real-world data

Goal: develop a rule learner that is efficient on noisy data and competitive with C4.5 / C4.5rules.

The Problem of Overfitting
- an overfitted concept also covers the noisy cases
- an underfitted concept is too general
- solution: pruning
  - reduced error pruning (REP)
  - post-pruning
  - pre-pruning

Post-Pruning (C4.5)
"Overfit & simplify", bottom-up:
- construct a tree that overfits the training data
- convert the tree to rules
- prune every rule separately
- sort the rules according to accuracy
- consider this order when classifying

Pre-Pruning
- some examples are ignored during concept generation
- the final concept does not classify all training data correctly
- can be implemented in the form of stopping criteria

Reduced Error Pruning (REP)
Separate-and-conquer:
- split the data into a training set and a validation set
- construct a tree that overfits the training set
- until further pruning would reduce accuracy: evaluate the impact of pruning each rule on the validation set, and remove the rule whose removal improves accuracy most
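The pruning loop above can be illustrated with a minimal Python sketch that greedily deletes whole rules while validation accuracy does not drop. The predicate representation of rules and all function names here are our own illustration, not from the paper.

```python
# Sketch of reduced error pruning on a rule list (assumption: a "rule"
# is a predicate over an example, and the set predicts positive if any
# rule fires).

def accuracy(rules, examples):
    """Fraction of (example, label) pairs the rule set classifies correctly."""
    correct = 0
    for x, y in examples:
        pred = any(r(x) for r in rules)
        correct += (pred == y)
    return correct / len(examples)

def reduced_error_prune(rules, validation):
    """Greedily drop whole rules while doing so does not hurt
    validation-set accuracy (ties prefer the simpler rule set)."""
    rules = list(rules)
    improved = True
    while improved and rules:
        improved = False
        base = accuracy(rules, validation)
        best_i, best_acc = None, base
        for i in range(len(rules)):
            cand = rules[:i] + rules[i + 1:]
            acc = accuracy(cand, validation)
            if acc >= best_acc:  # keep the deletion that helps most
                best_i, best_acc = i, acc
        if best_i is not None:
            del rules[best_i]
            improved = True
    return rules
```

A noisy rule that only hurts on the validation set is deleted, while rules that carry the accuracy survive.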

Time Complexity of REP
- REP has a time complexity of O(n⁴)
- the initial overfitting phase alone has a complexity of O(n²)
- the alternative concept Grow is faster in benchmarks, but its time complexity is still O(n⁴) with noisy data

Incremental Reduced Error Pruning (IREP)
- introduced by Fürnkranz & Widmer (1994)
- competitive error rates
- faster than REP and Grow

How IREP Works
- iterative application of REP
- random split of the sets: a bad split has a negative influence (but not as bad as with REP)
- each rule is pruned immediately after it is grown (top-down approach), so no overfitting occurs

Cohen's IREP Implementation
Build rules until a new rule results in a too-large error rate:
- divide the data randomly into a growing set (2/3) and a pruning set (1/3)
- grow a rule from the growing set
- immediately prune the rule: from the final sequence of conditions, delete the condition that maximizes the function v, and repeat until no deletion improves the value of v
- add the pruned rule to the rule set
- delete every example covered by the rule (p/n)
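The grow/prune cycle can be sketched in Python. This is a simplification under stated assumptions: rules are conjunctions of attribute = value tests, and growing greedily maximizes precision, whereas Cohen grows rules with a FOIL-style information gain; all helper names are ours.

```python
# Sketch of Cohen's grow/prune cycle (simplified).

def covers(rule, x):
    """A rule is a list of (attribute, value) tests, all of which must hold."""
    return all(x.get(a) == v for a, v in rule)

def counts(rule, pos, neg):
    p = sum(1 for x in pos if covers(rule, x))
    n = sum(1 for x in neg if covers(rule, x))
    return p, n

def v_metric(rule, pos, neg):
    # IREP's pruning metric: v = (p + (N - n)) / (P + N)
    P, N = len(pos), len(neg)
    p, n = counts(rule, pos, neg)
    return (p + (N - n)) / (P + N) if (P + N) else 0.0

def grow_rule(grow_pos, grow_neg, attrs):
    """Add one condition at a time until no negatives remain covered."""
    rule, pos, neg = [], list(grow_pos), list(grow_neg)
    unused = list(attrs)
    while neg and unused:
        best, best_prec = None, -1.0
        for a in unused:
            for val in {x.get(a) for x in pos}:
                p, n = counts(rule + [(a, val)], pos, neg)
                if p and p / (p + n) > best_prec:
                    best, best_prec = (a, val), p / (p + n)
        if best is None:
            break
        rule.append(best)
        unused.remove(best[0])
        pos = [x for x in pos if covers(rule, x)]
        neg = [x for x in neg if covers(rule, x)]
    return rule

def prune_rule(rule, prune_pos, prune_neg):
    """Delete the final sequence of conditions that maximizes v."""
    best, best_v = rule, v_metric(rule, prune_pos, prune_neg)
    for k in range(len(rule) - 1, 0, -1):
        vv = v_metric(rule[:k], prune_pos, prune_neg)
        if vv >= best_v:
            best, best_v = rule[:k], vv
    return best
```

Note that pruning only considers dropping a suffix of the rule, mirroring the "final sequence of conditions" restriction on the slide.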

Cohen's IREP - Algorithm

IREP and Multiple Classes
- order the classes by increasing prevalence: C1, ..., Ck
- find a rule set that separates C1 from the other classes: IREP(PosData = C1, NegData = C2, ..., Ck)
- remove all instances covered by that rule set
- find a rule set that separates C2 from C3, ..., Ck, and so on
- Ck remains as the default class
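This class-ordering wrapper can be sketched with the binary learner left abstract; the function names and the `learn_binary` interface are our own illustration.

```python
# Sketch of the multi-class wrapper: rarest class first, last class is
# the default (assumption: learn_binary(pos, neg) returns a predicate).
from collections import Counter

def irep_multiclass(examples, learn_binary):
    counts = Counter(y for _, y in examples)
    # classes ordered by increasing prevalence: C1, ..., Ck
    order = [c for c, _ in sorted(counts.items(), key=lambda kv: kv[1])]
    rulesets, remaining = [], list(examples)
    for c in order[:-1]:
        pos = [x for x, y in remaining if y == c]
        neg = [x for x, y in remaining if y != c]
        pred = learn_binary(pos, neg)
        rulesets.append((c, pred))
        # remove all instances covered by the learned rule set
        remaining = [(x, y) for x, y in remaining if not pred(x)]
    default = order[-1]

    def classify(x):
        for c, pred in rulesets:
            if pred(x):
                return c
        return default  # Ck remains as the default class
    return classify
```

Ordering by increasing prevalence means the rarest classes get explicit rules, and the most common class is covered by the default.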

IREP and Missing Attributes
- if attribute A of an instance is missing, then all tests involving A fail

Differences: Cohen vs. the Original
- pruning: delete a final sequence of conditions vs. a single final condition
- stopping condition: error rate > 50% vs. accuracy(rule) < accuracy(empty rule)
- application: missing attributes, numerical variables, and multiple classes vs. two-class problems only

Time Complexity of IREP
- O(m log² m), where m is the number of examples (assuming a fixed rate of classification noise)

37 Benchmark Problems

Generalization Performance
- IREP performs worse on the benchmark problems than C4.5rules
- won-lost-tie ratio: 11-23-3
- error ratio: 1.13 excluding mushroom, 1.52 including mushroom

Improving IREP
Three modifications:
- an alternative metric for the pruning phase
- a new stopping heuristic for rule adding
- post-pruning of the whole rule set (non-incremental pruning)

The Rule-Value Metric
- the old metric, v = (p + (N - n)) / (P + N), is not intuitive
- example (fixed P, N): R1 with p1 = 2000, n1 = 1000 vs. R2 with p2 = 1000, n2 = 1; the metric prefers R1
- it also leads to occasional failure to converge
- IREP* therefore uses a new metric: v* = (p - n) / (p + n)
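The slide's example can be checked numerically. The old metric is v = (p + (N - n)) / (P + N) and the IREP* replacement is v* = (p - n) / (p + n) (both from Cohen, 1995); the totals P and N below are assumed values chosen only to be large enough to cover both rules' counts.

```python
def v_irep(p, n, P, N):
    """Original IREP pruning metric."""
    return (p + (N - n)) / (P + N)

def v_star(p, n):
    """IREP*'s replacement metric."""
    return (p - n) / (p + n)

P, N = 3000, 1001      # assumed fixed totals of positives / negatives
p1, n1 = 2000, 1000    # R1: covers many examples, but a third are negative
p2, n2 = 1000, 1       # R2: covers fewer examples, almost all positive
```

Under the old metric R1 narrowly wins (2001/4001 vs. 2000/4001), even though a third of its coverage is wrong; v* instead strongly prefers the nearly pure R2 (≈0.998 vs. ≈0.333).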

Stopping Condition
- the 50% heuristic often stops too soon with moderately sized example sets
- it is sensitive to the 'small disjunct problem'
- solution: after a rule is added, compute the total description length of the rule set plus its misclassifications (DL = C + E)
- if the current DL is more than d bits larger than the smallest DL seen so far, stop: min(DL) + d < DL_current
- d = 64 in Cohen's implementation
- this is an MDL (Minimum Description Length) heuristic
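The MDL stopping check reduces to a one-line comparison over the history of description lengths; a sketch (the function name is ours):

```python
def should_stop(dl_history, d=64):
    """Stop adding rules once the latest total description length
    exceeds the smallest DL seen so far by more than d bits
    (d = 64 in Cohen's implementation)."""
    return dl_history[-1] > min(dl_history) + d
```

Because the comparison is against the minimum seen so far, a temporary rise in DL of at most d bits does not stop rule adding, which is what makes this heuristic less eager than the 50% rule.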

IREP*
- IREP* is IREP improved by the new rule-value metric and the new stopping condition
- won-lost-tie: 28-8-1 against IREP, 16-21-0 against C4.5rules
- error ratio: 1.06 (IREP: 1.13), and 1.04 (IREP: 1.52) including mushroom

Rule Optimization
- post-prunes the rules produced by IREP*
- the rules are considered in turn; for each rule Ri, two alternatives are constructed: Ri' (a new rule) and Ri'' (based on Ri)
- the final rule is chosen according to MDL

RIPPER
1. use IREP* to obtain a rule set
2. perform rule optimization
3. use IREP* to cover the remaining positive examples
RIPPER = Repeated Incremental Pruning to Produce Error Reduction
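These steps, with the optimization/covering pair iterated k times as in RIPPERk, can be sketched as an outer loop; the learner and optimizer are left abstract and all names are our own.

```python
def ripper_k(pos, neg, irep_star, optimize, k=2):
    """RIPPERk outer loop (sketch).

    Assumptions: irep_star(pos, neg) returns a list of rules,
    optimize(rules, pos, neg) returns an improved rule list, and a
    rule is any predicate over examples.
    """
    rules = irep_star(pos, neg)               # step 1: initial rule set
    for _ in range(k):
        rules = optimize(rules, pos, neg)     # step 2: rule optimization
        # step 3: cover positives the optimized rule set misses
        uncovered = [x for x in pos if not any(r(x) for r in rules)]
        if uncovered:
            rules = rules + irep_star(uncovered, neg)
    return rules
```

With k = 1 this is plain RIPPER; larger k repeats the optimize-and-cover pair.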

RIPPERk: apply steps 2 and 3 k times.

RIPPER Performance
- won-lost-tie: 28-7-2 against IREP*

Error Rates
- RIPPER is clearly competitive

Efficiency of RIPPERk
- the modifications do not change the time complexity

Reasons for Efficiency
- RIPPER finds a model with IREP* and then improves it: the first model is efficient to build and has roughly the right size, and the optimization takes linear time
- C4.5, in contrast, applies an expensive optimization/improvement process to a large initial model
- RIPPER is therefore especially more efficient on large, noisy datasets

Conclusions
- IREP is an efficient rule learner for large noisy datasets, but it performs worse than C4.5
- IREP was improved to IREP*, and IREP* to RIPPER
- RIPPER iterated k times is RIPPERk
- RIPPERk is more efficient than C4.5 and performs better

References
- William W. Cohen: Fast Effective Rule Induction (1995)
- J. Fürnkranz & G. Widmer: Incremental Reduced Error Pruning (1994)
- William W. Cohen: Efficient Pruning Methods (1993)