
CSci 8980: Data Mining (Fall 2002)
Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science, University of Minnesota

Rule-Based Classifiers
- Classify instances by using a collection of "if...then..." rules
- Rule: (Condition) → y
  - where Condition is a conjunction of attribute tests and y is the class label
  - LHS: rule antecedent or condition
  - RHS: rule consequent
  - Examples of classification rules:
    (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
    (Taxable Income < 50K) ∧ (Refund=Yes) → Cheat=No
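
To make the representation concrete, here is a minimal sketch (not part of the original slides) of one way such if...then rules could be encoded in Python; the Rule class and the attribute names are purely illustrative:

```python
# Minimal, hypothetical representation of a classification rule:
# the antecedent is a conjunction of attribute=value tests, the
# consequent is a class label.

class Rule:
    def __init__(self, conditions, label):
        self.conditions = conditions  # dict: attribute -> required value
        self.label = label            # rule consequent (predicted class)

    def covers(self, instance):
        # All conditions in the antecedent must be satisfied.
        return all(instance.get(a) == v for a, v in self.conditions.items())

# (Blood Type=Warm) AND (Lay Eggs=Yes) -> Birds
r = Rule({"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds")
print(r.covers({"Blood Type": "Warm", "Lay Eggs": "Yes", "Can Fly": "No"}))  # True
```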

Classifying Instances with Rules
- A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
  Rule r: (Age < 35) ∧ (Status = Married) → Cheat=No
  Instances:
    x1: (Age=29, Status=Married, Refund=No)
    x2: (Age=28, Status=Single, Refund=Yes)
    x3: (Age=38, Status=Divorced, Refund=No)
  => Only x1 is covered by rule r
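
As a small illustration of this coverage check (same three instances as above; the helper function is hypothetical):

```python
# Which instances does r: (Age < 35) AND (Status = Married) -> Cheat=No cover?

def rule_covers(instance):
    return instance["Age"] < 35 and instance["Status"] == "Married"

instances = {
    "x1": {"Age": 29, "Status": "Married",  "Refund": "No"},
    "x2": {"Age": 28, "Status": "Single",   "Refund": "Yes"},
    "x3": {"Age": 38, "Status": "Divorced", "Refund": "No"},
}
covered = [name for name, x in instances.items() if rule_covers(x)]
print(covered)  # ['x1'] -- only x1 satisfies both conditions
```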

Rule-Based Classifiers
- Rules may not be mutually exclusive
  - More than one rule may cover the same instance
  - Solution?
    - Order the rules
    - Use voting schemes
- Rules may not be exhaustive
  - May need a default class
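
One common way to address both issues is an ordered rule list (decision list) with a default class; the sketch below is an assumed illustration, not a construction prescribed by the slides:

```python
# Hypothetical ordered rule list: rules are tried in order and the first
# rule that covers the instance fires; a default class handles instances
# that no rule covers (making the rule set effectively exhaustive).

rules = [
    (lambda x: x.get("Give Birth") == "Yes", "Mammals"),
    (lambda x: x.get("Can Fly") == "Yes" and x.get("Give Birth") == "No", "Birds"),
]
default_class = "Reptiles"

def classify(instance):
    for predicate, label in rules:
        if predicate(instance):
            return label
    return default_class

print(classify({"Give Birth": "No", "Can Fly": "No"}))  # Reptiles (default)
```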

Example of Rule-Based Classifier

Advantages of Rule-Based Classifiers
- As highly expressive as decision trees
- Easy to interpret
- Easy to generate
- Can classify new instances rapidly
- Performance comparable to decision trees

Basic Definitions
- Coverage of a rule:
  - Fraction of all instances that satisfy the antecedent of the rule
- Accuracy of a rule:
  - Fraction of the instances that satisfy the antecedent that also satisfy the consequent
- Example: (Status=Single) → No
  Coverage = 40%, Accuracy = 50%
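
A small worked example, using an assumed ten-record toy table chosen to reproduce the 40% / 50% figures above:

```python
# Toy data (assumed) for the rule (Status=Single) -> Cheat=No.
data = [
    {"Status": "Single",   "Cheat": "No"},
    {"Status": "Single",   "Cheat": "Yes"},
    {"Status": "Single",   "Cheat": "No"},
    {"Status": "Single",   "Cheat": "Yes"},
    {"Status": "Married",  "Cheat": "No"},
    {"Status": "Married",  "Cheat": "No"},
    {"Status": "Married",  "Cheat": "No"},
    {"Status": "Divorced", "Cheat": "Yes"},
    {"Status": "Divorced", "Cheat": "No"},
    {"Status": "Married",  "Cheat": "No"},
]
covered = [x for x in data if x["Status"] == "Single"]   # antecedent holds
correct = [x for x in covered if x["Cheat"] == "No"]     # consequent also holds
coverage = len(covered) / len(data)                      # 4/10 = 0.40
accuracy = len(correct) / len(covered)                   # 2/4  = 0.50
print(coverage, accuracy)
```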

Building Classification Rules
- Generate an initial set of rules:
  - Direct method:
    - Extract rules directly from data
    - e.g., RIPPER, CN2, Holte's 1R
  - Indirect method:
    - Extract rules from other classification models (e.g., decision trees)
    - e.g., C4.5rules
- Rules are pruned and simplified
- Rules can be ordered to obtain a rule set R
- The rule set R can be further optimized

Indirect Method: From Decision Trees to Rules
- One rule is generated for each path from the root of the tree to a leaf
- The resulting rules are mutually exclusive and exhaustive
- The rule set contains as much information as the tree
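
A minimal sketch of the tree-to-rules conversion, using a small hypothetical tree (not the one drawn on the slide):

```python
# Hypothetical decision tree: an internal node is (attribute, branches),
# a leaf is just a class label. One rule is emitted per root-to-leaf path,
# so the rules are mutually exclusive and exhaustive by construction.

tree = ("Refund",
        {"Yes": "No",                                 # leaf: Cheat=No
         "No": ("Status",
                {"Married": "No",
                 "Single": "Yes",
                 "Divorced": "Yes"})})

def tree_to_rules(node, conditions=()):
    if isinstance(node, str):                         # leaf: emit one rule
        return [(list(conditions), node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules

for antecedent, label in tree_to_rules(tree):
    print(" AND ".join(f"({a}={v})" for a, v in antecedent), "->", label)
```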

Rules Can Be Simplified
- Initial rule: (Refund=No) ∧ (Status=Married) → No
- Simplified rule: (Status=Married) → No

Indirect Method: C4.5rules
- Extract rules from an unpruned decision tree
- For each rule r: A → y,
  - consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A
  - compare the pessimistic error rate of r against that of each r'
  - prune r if some r' has a lower pessimistic error rate
  - repeat until the generalization error can no longer be improved
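
A hedged sketch of the conjunct-dropping loop; the pessimistic_error helper below is only a stand-in (C4.5rules uses an upper confidence bound on the training error, which is not reproduced here):

```python
# Hypothetical sketch of C4.5rules-style rule simplification: greedily drop
# a conjunct whenever doing so does not increase the (estimated) error.

def pessimistic_error(conditions, label, data):
    # Stand-in estimate; the real C4.5rules statistic differs.
    covered = [x for x in data if all(x.get(a) == v for a, v in conditions)]
    if not covered:
        return 1.0
    errors = sum(1 for x in covered if x["class"] != label)
    return (errors + 1) / (len(covered) + 2)   # crude Laplace-style correction

def simplify_rule(conditions, label, data):
    conditions = list(conditions)              # list of (attribute, value) tests
    best = pessimistic_error(conditions, label, data)
    improved = True
    while improved and conditions:
        improved = False
        for c in list(conditions):
            trial = [t for t in conditions if t != c]
            err = pessimistic_error(trial, label, data)
            if err <= best:                    # dropping c does not hurt
                conditions, best, improved = trial, err, True
                break
    return conditions
```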

Indirect Method: C4.5rules
- Instead of ordering the rules, order subsets of rules
  - Each subset is a collection of rules with the same rule consequent (class)
  - Compute the description length of each subset
    - Description length = L(error) + g × L(model)
    - g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)

Direct Method: Sequential Covering
1. Start from an empty rule
2. Find the conjunct (a test condition on an attribute) that optimizes a given objective criterion (e.g., entropy or FOIL's information gain)
3. Remove the instances covered by the conjunct
4. Repeat steps (2) and (3) until a stopping criterion is met, e.g., stop when all instances belong to the same class or all attributes have the same values
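
A compact sketch of the covering loop under simplifying assumptions (conjuncts are scored by rule accuracy rather than entropy or FOIL's gain, and all names below are hypothetical):

```python
# Hypothetical sketch of sequential covering: grow one rule for the target
# class, remove the instances it covers, and repeat until none remain.

def grow_rule(data, target_class, attributes):
    conditions = {}
    covered = data
    while True:
        best = None
        for a in attributes:
            if a in conditions:
                continue
            for v in {x[a] for x in covered}:
                subset = [x for x in covered if x[a] == v]
                acc = sum(x["class"] == target_class for x in subset) / len(subset)
                if best is None or acc > best[0]:
                    best = (acc, a, v, subset)
        if best is None:
            break                                   # no attribute left to test
        acc, a, v, covered = best
        conditions[a] = v
        if acc == 1.0:                              # rule is now pure
            break
    return conditions

def sequential_covering(data, target_class, attributes):
    rules, remaining = [], list(data)
    while any(x["class"] == target_class for x in remaining):
        rule = grow_rule(remaining, target_class, attributes)
        rules.append((dict(rule), target_class))
        remaining = [x for x in remaining
                     if not all(x.get(a) == v for a, v in rule.items())]
    return rules

# Example (assumed toy data):
data = [
    {"Outlook": "Sunny", "Windy": "No",  "class": "Play"},
    {"Outlook": "Sunny", "Windy": "Yes", "class": "Don't"},
    {"Outlook": "Rain",  "Windy": "No",  "class": "Don't"},
]
print(sequential_covering(data, "Play", ["Outlook", "Windy"]))
```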

Example of Sequential Covering

Example of Sequential Covering…

Methods for Adding Conjuncts
- CN2:
  - Start from an empty conjunct: {}
  - Add the conjunct that minimizes the entropy measure: {A}, {A,B}, ...
  - Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER:
  - Start from an empty rule: {} => class
  - Add the conjunct that maximizes FOIL's information gain measure:
    - R0: {} => class (empty rule)
    - R1: {A} => class
    - Gain(R0, R1) = t × [ log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0)) ]
    - where
      t:  number of positive instances covered by both R0 and R1
      p0: number of positive instances covered by R0
      n0: number of negative instances covered by R0
      p1: number of positive instances covered by R1
      n1: number of negative instances covered by R1
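
A small sketch of the gain computation (the function name and the numbers are illustrative; base-2 logarithms assumed):

```python
import math

# FOIL's information gain for extending rule R0 (covering p0 positives,
# n0 negatives) into rule R1 (covering p1 positives, n1 negatives).
# Since R1 specializes R0, the positives covered by both rules, t, equal p1.

def foil_gain(p0, n0, p1, n1):
    t = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: a conjunct narrows coverage from 50 pos / 50 neg to 10 pos / 2 neg.
print(foil_gain(p0=50, n0=50, p1=10, n1=2))   # about 7.4
```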

Direct Method: Sequential Covering…
- Uses a general-to-specific search strategy
- Greedy approach
- Unlike decision tree induction (which uses simultaneous covering), it does not explore all possible paths
  - Searches only the current best path
  - Beam search: maintain the k best paths
- At each step,
  - a decision tree chooses among several alternative attributes for splitting
  - sequential covering chooses among alternative attribute-value pairs

Direct Method: RIPPER
- For a 2-class problem, choose one of the classes as the positive class and the other as the negative class
  - Learn rules for the positive class
  - The negative class becomes the default class
- For a multi-class problem
  - Order the classes according to increasing class prevalence (the fraction of instances that belong to a particular class)
  - Learn the rule set for the smallest class first, treating the rest as the negative class
  - Repeat with the next smallest class as the positive class
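
A hedged sketch of this class-ordering loop (learn_rules_for is a hypothetical placeholder for the rule-growing procedure described on the following slides):

```python
from collections import Counter

# Hypothetical sketch of RIPPER's multi-class strategy: learn rule sets for
# classes in order of increasing prevalence; the most frequent class is left
# as the default.

def ripper_class_order(data, learn_rules_for):
    counts = Counter(x["class"] for x in data)
    classes = sorted(counts, key=counts.get)        # rarest class first
    rule_sets, remaining = [], list(data)
    for c in classes[:-1]:                          # all but the largest class
        rules = learn_rules_for(remaining, positive_class=c)
        rule_sets.append((c, rules))
        remaining = [x for x in remaining if x["class"] != c]
    return rule_sets, classes[-1]                   # (rule sets, default class)
```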

Direct Method: RIPPER
- Growing a rule:
  - Start from an empty rule
  - Add conjuncts as long as they improve FOIL's information gain
  - Stop when the rule no longer covers negative examples
  - Prune the rule immediately using incremental reduced error pruning
  - Measure for pruning: v = (p − n) / (p + n)
    - p: number of positive examples covered by the rule in the validation set
    - n: number of negative examples covered by the rule in the validation set
  - Pruning method: delete any final sequence of conditions that maximizes v
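
A sketch of the pruning step under an assumed representation (a rule is an ordered list of predicate functions); only final sequences of conditions are considered for deletion, following the slide:

```python
# Hypothetical sketch of incremental reduced error pruning with
# v = (p - n) / (p + n) measured on a separate validation set.

def prune_metric(conditions, validation, positive_class):
    covered = [x for x in validation if all(test(x) for test in conditions)]
    if not covered:
        return float("-inf")
    p = sum(1 for x in covered if x["class"] == positive_class)
    n = len(covered) - p
    return (p - n) / (p + n)

def prune_rule(conditions, validation, positive_class):
    # Delete the final sequence of conditions that maximizes v,
    # i.e., keep the prefix of the rule with the best metric.
    best_len = len(conditions)
    best_v = prune_metric(conditions, validation, positive_class)
    for k in range(len(conditions) - 1, 0, -1):
        v = prune_metric(conditions[:k], validation, positive_class)
        if v >= best_v:
            best_len, best_v = k, v
    return conditions[:best_len]
```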

Direct Method: RIPPER
- Building a rule set:
  - Use the sequential covering algorithm
    - Find the best rule that covers the current set of positive examples
    - Eliminate both positive and negative examples covered by the rule
  - Each time a rule is added to the rule set, compute the new description length
    - Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far

Direct Method: RIPPER
- Optimizing the rule set:
  - For each rule r in the rule set R
    - Consider two alternative rules:
      - Replacement rule (r*): grow a new rule from scratch
      - Revised rule (r'): add conjuncts to extend the rule r
    - Compare the rule set containing r against the rule sets containing r* and r'
    - Choose the rule set that minimizes the description length (MDL principle)
  - Repeat rule generation and rule optimization for the remaining positive examples

Example

C4.5 versus C4.5rules versus RIPPER
C4.5rules:
  (Give Birth=No, Can Fly=Yes) → Birds
  (Give Birth=No, Live in Water=Yes) → Fishes
  (Give Birth=Yes) → Mammals
  (Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
  ( ) → Amphibians
RIPPER:
  (Live in Water=Yes) → Fishes
  (Have Legs=No) → Reptiles
  (Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
  (Can Fly=Yes, Give Birth=No) → Birds
  ( ) → Mammals

C4.5 versus C4.5rules versus RIPPER
- C4.5 and C4.5rules: (results shown on slide)
- RIPPER: (results shown on slide)