CS4445 Data Mining, B term 2014, WPI. Solutions to HW4: Classification Rules using RIPPER. By Chiying Wang.


Slide 2: Car Dataset

The Car dataset contains 22 instances, four predictive attributes (Buying, Maint, Persons, Safety) and one class attribute.

Instance  Buying  Maint  Persons  Safety  Class
1         med     vhigh  more     low     unacc
2         med     vhigh  2        med     unacc
3         vhigh          more     med     unacc
4         med     high   4        low     unacc
5         high    med    4        high    good
6         low     med    2                unacc
7         low     high   2        high    unacc
8         low     vhigh  more     med     acc
9         med     vhigh  4        med     acc
10        med     vhigh  4        med     acc
11        vhigh          4        med     unacc
12        med     med    more     med     acc
13        med     vhigh  2        med     unacc
14        med     med    4        low     unacc
15        med     vhigh  more     low     unacc
16        med     low    4        med     acc
17        high    low    2        high    unacc
18        high    med    4        low     unacc
19        med     low    4                unacc
20        high           4        low     unacc
21        low     med    4        high    good
22        low            2        high    unacc

Slide 3: RIPPER 1st Rule: Selecting a consequent

We use the RIPPER algorithm to construct the first rule for the Car dataset. A rule has the form antecedent -> consequent. RIPPER builds rules for the least frequent class first. The frequencies of the class values in the dataset are:

Class          Frequency
class = good   2
class = acc    5
class = unacc  15

'class = good' has the lowest frequency, so we choose it as the consequent of the first rule. Thus, we start from the empty rule '-> class = good'.
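As a quick side illustration (not part of the original solution), this choice can be reproduced from the class column alone; a minimal Python sketch, assuming the 22 class labels are available as a plain list:

    from collections import Counter

    # Class column of the 22 training instances (slide 2), in instance order.
    classes = ["unacc", "unacc", "unacc", "unacc", "good", "unacc", "unacc",
               "acc", "acc", "acc", "unacc", "acc", "unacc", "unacc", "unacc",
               "acc", "unacc", "unacc", "unacc", "unacc", "good", "unacc"]

    freq = Counter(classes)
    print(freq)                       # Counter({'unacc': 15, 'acc': 5, 'good': 2})

    # RIPPER grows rules for the least frequent class first.
    consequent = min(freq, key=freq.get)
    print(consequent)                 # 'good'  -> start from the empty rule '-> class = good'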

Slide 4: RIPPER 1st Rule: Candidate conditions for the 1st condition

Next we look for the first condition of the antecedent of the rule '-> class = good'. There are two instances in the dataset with class = good:

Buying  Maint  Persons  Safety  Class
high    med    4        high    good
low     med    4        high    good

We only need to consider conditions that appear in these two positive instances. For the rule '-> class = good', the candidate conditions are:

Buying = high,  Maint = med,  Persons = 4,  Safety = high,  Buying = low
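To make the enumeration concrete, a small sketch (my own illustration, with hypothetical variable names): collect every attribute = value pair that occurs in a positive instance.

    # The two positive (class = good) instances, as attribute/value dictionaries.
    positives = [
        {"Buying": "high", "Maint": "med", "Persons": "4", "Safety": "high"},
        {"Buying": "low",  "Maint": "med", "Persons": "4", "Safety": "high"},
    ]

    # Every attribute = value pair occurring in a positive instance is a candidate condition.
    candidates = []
    for inst in positives:
        for attr, value in inst.items():
            if (attr, value) not in candidates:
                candidates.append((attr, value))

    print(candidates)
    # [('Buying', 'high'), ('Maint', 'med'), ('Persons', '4'), ('Safety', 'high'), ('Buying', 'low')]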

Slide 5: RIPPER 1st Rule: Comparing candidate conditions

Next, we compute FOIL's information gain for each candidate condition and add the one with the highest gain as the first condition of the antecedent. All 22 training instances (the table on slide 2) are used in this step.

Slide 6: RIPPER 1st Rule: Calculating information gain

FOIL's information gain of adding a condition to a rule is

    gain = p1 * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )

where p0 and n0 are the numbers of positive and negative instances covered by the rule before the condition is added, and p1 and n1 are the corresponding counts after adding it.

We start by determining p0 and n0 for the rule '-> class = good' before any condition is added.

p0 is the number of instances with class = good. There are two such instances, so p0 = 2:

Instance  Buying  Maint  Persons  Safety  Class
5         high    med    4        high    good
21        low     med    4        high    good

n0 is the number of instances with class ≠ good. These are the remaining 20 instances (the dataset of slide 2 without instances 5 and 21), so n0 = 20.

Slide 7: 1st Rule: Info gain for candidate 1 (Buying = high)

Consider the first candidate condition, "Buying = high". We calculate the information gain of adding it to the empty rule, obtaining the rule Buying = high -> class = good.

Measure  Value
p0       2    (instances with class = good; slide 6)
n0       20   (instances with class ≠ good; slide 6)
p1       1    (instances with Buying = high and class = good)
n1       3    (instances with Buying = high and class ≠ good)

The instance with Buying = high and class = good:

Instance  Buying  Maint  Persons  Safety  Class
5         high    med    4        high    good

The instances with Buying = high and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
17        high    low    2        high    unacc
18        high    med    4        low     unacc
20        high           4        low     unacc
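For concreteness, a minimal Python sketch of FOIL's information gain (my own helper function, not from the homework), applied to the counts above:

    import math

    def foil_gain(p0, n0, p1, n1):
        # FOIL's information gain of specializing a rule.
        # (p0, n0): positives/negatives covered before adding the condition.
        # (p1, n1): positives/negatives covered after adding the condition.
        return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

    # Candidate 1: Buying = high -> class = good
    print(round(foil_gain(p0=2, n0=20, p1=1, n1=3), 4))   # 1.4594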

Slide 8: 1st Rule: Info gain for candidate 2 (Maint = med)

For "Maint = med", the extended rule is Maint = med -> class = good.

Measure  Value
p0       2    (instances with class = good; slide 6)
n0       20   (instances with class ≠ good; slide 6)
p1       2    (instances with Maint = med and class = good)
n1       4    (instances with Maint = med and class ≠ good)

The instances with Maint = med and class = good:

Instance  Buying  Maint  Persons  Safety  Class
5         high    med    4        high    good
21        low     med    4        high    good

The instances with Maint = med and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
6         low     med    2                unacc
12        med     med    more     med     acc
14        med     med    4        low     unacc
18        high    med    4        low     unacc

Slide 9: 1st Rule: Info gain for candidate 3 (Persons = 4)

For "Persons = 4", the extended rule is Persons = 4 -> class = good.

Measure  Value
p0       2    (instances with class = good; slide 6)
n0       20   (instances with class ≠ good; slide 6)
p1       2    (instances with Persons = 4 and class = good)
n1       9    (instances with Persons = 4 and class ≠ good)

The instances with Persons = 4 and class = good:

Instance  Buying  Maint  Persons  Safety  Class
5         high    med    4        high    good
21        low     med    4        high    good

The instances with Persons = 4 and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
4         med     high   4        low     unacc
9         med     vhigh  4        med     acc
10        med     vhigh  4        med     acc
11        vhigh          4        med     unacc
14        med     med    4        low     unacc
16        med     low    4        med     acc
18        high    med    4        low     unacc
19        med     low    4                unacc
20        high           4        low     unacc

Slide 10: 1st Rule: Info gain for candidate 4 (Safety = high)

For "Safety = high", the extended rule is Safety = high -> class = good.

Measure  Value
p0       2    (instances with class = good; slide 6)
n0       20   (instances with class ≠ good; slide 6)
p1       2    (instances with Safety = high and class = good)
n1       3    (instances with Safety = high and class ≠ good)

The instances with Safety = high and class = good:

Instance  Buying  Maint  Persons  Safety  Class
5         high    med    4        high    good
21        low     med    4        high    good

The instances with Safety = high and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
7         low     high   2        high    unacc
17        high    low    2        high    unacc
22        low            2        high    unacc

Slide 11: 1st Rule: Info gain for candidate 5 (Buying = low)

For "Buying = low", the extended rule is Buying = low -> class = good.

Measure  Value
p0       2    (instances with class = good; slide 6)
n0       20   (instances with class ≠ good; slide 6)
p1       1    (instances with Buying = low and class = good)
n1       4    (instances with Buying = low and class ≠ good)

The instance with Buying = low and class = good:

Instance  Buying  Maint  Persons  Safety  Class
21        low     med    4        high    good

The instances with Buying = low and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
6         low     med    2                unacc
7         low     high   2        high    unacc
8         low     vhigh  more     med     acc
22        low            2        high    unacc

Slide 12: 1st Rule: Choosing the 1st condition

We now have FOIL's information gain for each of the candidate conditions:

Candidate condition  Information gain
Buying = high        1 * (log2(1/(1+3)) - log2(2/(2+20))) = 1.459
Buying = low         1 * (log2(1/(1+4)) - log2(2/(2+20))) = 1.138
Maint = med          2 * (log2(2/(2+4)) - log2(2/(2+20))) = 3.749
Persons = 4          2 * (log2(2/(2+9)) - log2(2/(2+20))) = 2.000
Safety = high        2 * (log2(2/(2+3)) - log2(2/(2+20))) = 4.275

We select "Safety = high", the condition with the highest information gain. Since its gain is > 0, we add it to the rule:

Safety = high -> class = good
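A compact check of the table above (again an illustrative sketch, reusing the foil_gain helper from the example after slide 7):

    import math

    def foil_gain(p0, n0, p1, n1):
        # FOIL's information gain: p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).
        return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

    # (p1, n1) counts for each first-round candidate, taken from slides 7-11; p0 = 2, n0 = 20.
    candidates = {
        "Buying = high": (1, 3),
        "Buying = low":  (1, 4),
        "Maint = med":   (2, 4),
        "Persons = 4":   (2, 9),
        "Safety = high": (2, 3),
    }

    for cond, (p1, n1) in sorted(candidates.items(),
                                 key=lambda kv: -foil_gain(2, 20, *kv[1])):
        print(f"{cond:15s} {foil_gain(2, 20, p1, n1):.3f}")
    # Safety = high comes out on top with gain 4.275.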

Slide 13: 1st Rule: Checking the termination criterion

Next we determine whether the growing of this rule should stop. The current rule, Safety = high -> class = good, still covers the negative examples shown below, so the construction continues to refine the rule.

Buying  Maint  Persons  Safety  Class
low     high   2        high    unacc
high    low    2        high    unacc
low            2        high    unacc
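The stopping test of the growing phase amounts to a coverage check; a small sketch under the assumption that the covered instances are available as dictionaries (the Maint value of the last instance is not shown in the slides and is left as None):

    def covers(rule, instance):
        # A rule covers an instance if every condition attr = value holds for it.
        return all(instance.get(attr) == value for attr, value in rule)

    # The five instances covered by "Safety = high" (see slide 15), with their classes.
    covered = [
        ({"Buying": "high", "Maint": "med",  "Persons": "4", "Safety": "high"}, "good"),
        ({"Buying": "low",  "Maint": "high", "Persons": "2", "Safety": "high"}, "unacc"),
        ({"Buying": "high", "Maint": "low",  "Persons": "2", "Safety": "high"}, "unacc"),
        ({"Buying": "low",  "Maint": "med",  "Persons": "4", "Safety": "high"}, "good"),
        ({"Buying": "low",  "Maint": None,   "Persons": "2", "Safety": "high"}, "unacc"),
    ]

    rule = [("Safety", "high")]
    covers_negatives = any(cls != "good" for inst, cls in covered if covers(rule, inst))
    print(covers_negatives)   # True -> the rule still covers negatives, so growing continues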

Slide 14: RIPPER 1st Rule: Candidate conditions for the 2nd condition

Next we look for a second condition for the antecedent. We only need to consider conditions that appear in the two instances with Safety = high and class = good:

Buying  Maint  Persons  Safety  Class
high    med    4        high    good
low     med    4        high    good

For the rule 'Safety = high and … -> class = good', the candidate conditions are:

Buying = high,  Maint = med,  Persons = 4,  Buying = low

Slide 15: RIPPER 1st Rule: Comparing the 2nd candidate conditions

Next, we determine the information gain for each of these candidate conditions; one of them will become the second condition of the antecedent. The instances used in this step are the five instances covered by the current rule Safety = high -> class = good, renumbered 1-5 below (original instances 5, 7, 17, 21 and 22):

Instance  Buying  Maint  Persons  Safety  Class
1         high    med    4        high    good
2         low     high   2        high    unacc
3         high    low    2        high    unacc
4         low     med    4        high    good
5         low            2        high    unacc

p1 is the number of these instances with class = good: instances 1 and 4 above, so p1 = 2.
n1 is the number of these instances with class ≠ good: instances 2, 3 and 5 above, so n1 = 3.

Slide 16: 1st Rule: Info gain for 2nd candidate 1 (Buying = high)

For "Buying = high", the extended rule is Safety = high and Buying = high -> class = good.

Measure  Value
p1       2    (instances with Safety = high and class = good; slide 15)
n1       3    (instances with Safety = high and class ≠ good; slide 15)
p2       1    (instances with Safety = high, Buying = high and class = good)
n2       1    (instances with Safety = high, Buying = high and class ≠ good)

The instance with Safety = high, Buying = high and class = good:

Instance  Buying  Maint  Persons  Safety  Class
1         high    med    4        high    good

The instance with Safety = high, Buying = high and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
3         high    low    2        high    unacc

Slide 17: 1st Rule: Info gain for 2nd candidate 2 (Maint = med)

For "Maint = med", the extended rule is Safety = high and Maint = med -> class = good.

Measure  Value
p1       2    (instances with Safety = high and class = good; slide 15)
n1       3    (instances with Safety = high and class ≠ good; slide 15)
p2       2    (instances with Safety = high, Maint = med and class = good)
n2       0    (instances with Safety = high, Maint = med and class ≠ good: none)

The instances with Safety = high, Maint = med and class = good:

Instance  Buying  Maint  Persons  Safety  Class
1         high    med    4        high    good
4         low     med    4        high    good

Slide 18: 1st Rule: Info gain for 2nd candidate 3 (Persons = 4)

For "Persons = 4", the extended rule is Safety = high and Persons = 4 -> class = good.

Measure  Value
p1       2    (instances with Safety = high and class = good; slide 15)
n1       3    (instances with Safety = high and class ≠ good; slide 15)
p2       2    (instances with Safety = high, Persons = 4 and class = good)
n2       0    (instances with Safety = high, Persons = 4 and class ≠ good: none)

The instances with Safety = high, Persons = 4 and class = good:

Instance  Buying  Maint  Persons  Safety  Class
1         high    med    4        high    good
4         low     med    4        high    good

Slide 19: 1st Rule: Info gain for 2nd candidate 4 (Buying = low)

For "Buying = low", the extended rule is Safety = high and Buying = low -> class = good.

Measure  Value
p1       2    (instances with Safety = high and class = good; slide 15)
n1       3    (instances with Safety = high and class ≠ good; slide 15)
p2       1    (instances with Safety = high, Buying = low and class = good)
n2       2    (instances with Safety = high, Buying = low and class ≠ good)

The instance with Safety = high, Buying = low and class = good:

Instance  Buying  Maint  Persons  Safety  Class
4         low     med    4        high    good

The instances with Safety = high, Buying = low and class ≠ good:

Instance  Buying  Maint  Persons  Safety  Class
2         low     high   2        high    unacc
5         low            2        high    unacc

Slide 20: 1st Rule: Choosing the 2nd condition

We now have the information gain for each candidate second condition:

Candidate condition  Information gain
Buying = high        1 * (log2(1/(1+1)) - log2(2/(2+3))) = 0.322
Buying = low         1 * (log2(1/(1+2)) - log2(2/(2+3))) = -0.263
Maint = med          2 * (log2(2/(2+0)) - log2(2/(2+3))) = 2.644
Persons = 4          2 * (log2(2/(2+0)) - log2(2/(2+3))) = 2.644

There is a tie for the highest information gain between "Maint = med" and "Persons = 4"; we can pick either of them. Let's select the first of the two, "Maint = med". Since its gain is > 0, we add this condition to the rule:

Safety = high and Maint = med -> class = good
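The same illustrative helper verifies the second-round gains; the "before" coverage is now that of Safety = high -> class = good (p = 2, n = 3):

    import math

    def foil_gain(p_before, n_before, p_after, n_after):
        # FOIL's information gain, with (p_before, n_before) the coverage of the current rule
        # and (p_after, n_after) the coverage after adding the candidate condition.
        return p_after * (math.log2(p_after / (p_after + n_after))
                          - math.log2(p_before / (p_before + n_before)))

    # (p2, n2) counts for each second-round candidate, taken from slides 16-19.
    second_round = {
        "Buying = high": (1, 1),
        "Buying = low":  (1, 2),
        "Maint = med":   (2, 0),
        "Persons = 4":   (2, 0),
    }

    for cond, (p2, n2) in second_round.items():
        print(f"{cond:15s} {foil_gain(2, 3, p2, n2):+.3f}")
    # Maint = med and Persons = 4 tie at +2.644; the walkthrough picks Maint = med.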

Slide 21: 1st Rule: Checking the termination criterion

Again we determine whether the construction should stop. The current rule does not cover any negative examples, only the positive examples below:

Buying  Maint  Persons  Safety  Class
high    med    4        high    good
low     med    4        high    good

Therefore, there is no need to add more conditions to the rule. RIPPER's construction of the first rule is now complete!

Slide 22: RIPPER: Pruning the first rule

The first rule: Safety = high and Maint = med -> class = good

To prune this rule, the RIPPER algorithm prepares a validation set that is kept separate from the training dataset, and evaluates the rule and its pruned versions over the validation set with the metric

    v = (p - n) / (p + n)

where p is the number of positive examples in the validation set covered by the rule, and n is the number of negative examples in the validation set covered by the rule.

Pruning method: first we consider pruning the last condition of the rule, "Maint = med". If the v value of the rule Safety = high -> class = good is no lower than the v value of the rule Safety = high and Maint = med -> class = good, then we (1) remove the last condition "Maint = med" from the rule, and (2) repeat this pruning step recursively on Safety = high -> class = good. Otherwise, we stop the pruning procedure (that is, we do not consider removing any further conditions of the rule).
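A rough sketch of this pruning loop (my own illustration; the validation examples below are hypothetical, only there to make the snippet runnable):

    def v_metric(p, n):
        # RIPPER's rule-value metric on the validation set: v = (p - n) / (p + n).
        return (p - n) / (p + n) if (p + n) > 0 else 0.0

    def coverage(rule, validation_set, target="good"):
        # Return (p, n): positive/negative validation examples covered by the rule.
        covered = [inst for inst in validation_set
                   if all(inst.get(attr) == val for attr, val in rule)]
        p = sum(1 for inst in covered if inst["Class"] == target)
        return p, len(covered) - p

    def prune(rule, validation_set, target="good"):
        # Repeatedly drop the final condition while doing so does not lower v.
        while len(rule) > 1:
            v_full = v_metric(*coverage(rule, validation_set, target))
            v_pruned = v_metric(*coverage(rule[:-1], validation_set, target))
            if v_pruned >= v_full:
                rule = rule[:-1]      # removing the last condition does not hurt: prune it
            else:
                break                 # removing the last condition lowers v: stop pruning
        return rule

    # Hypothetical validation examples (not from the homework).
    validation = [
        {"Buying": "med", "Maint": "med", "Persons": "4", "Safety": "high", "Class": "good"},
        {"Buying": "low", "Maint": "low", "Persons": "2", "Safety": "high", "Class": "unacc"},
    ]
    print(prune([("Safety", "high"), ("Maint", "med")], validation))
    # -> [('Safety', 'high'), ('Maint', 'med')]  (pruning would lower v on this toy set)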