Classification with Decision Trees and Rules Evgueni Smirnov

Overview Classification Problem Decision Trees for Classification Decision Rules for Classification

Classification Task Given: an instance space X defined by variables {Xi}, i ∈ 1..N, where each Xi is a discrete or continuous variable; a finite class set Y; training data D ⊆ X × Y. Find: the class y ∈ Y of an instance x ∈ X.

Instances, Classes, Instance Spaces friendly robots A class is a set of objects in a world that are unified by a reason. A reason may be a similar appearance, structure or function. Example. The set: {children, photos, cat, diplomas} can be viewed as a class “Most important things to take out of your apartment when it catches fire”.

head = square body = round smiling = yes holding = flag color = yellow X Instances, Classes, Instance Spaces friendly robots

head = square body = round smiling = yes holding = flag color = yellow X friendly robots H smiling = yes  friendly robots M Instances, Classes, Instance Spaces

Classification problem (figure: the instance space X with the target class H and the model M from the previous slides).

Decision Trees for Classification Classification Problem Definition of Decision Trees Variable Selection: Impurity Reduction, Entropy, and Information Gain Learning Decision Trees Overfitting and Pruning Handling Variables with Many Values Handling Missing Values Handling Large Data: Windowing

Decision Trees for Classification A decision tree is a tree where: – each interior node tests a variable; – each branch corresponds to a value of that variable; – each leaf node is labelled with a class (class node). (Figure: a tree with internal nodes A1, A2, A3, branch values a11, a12, a13, a21, a22, a31, a32, and leaves labelled c1 or c2.)

A simple database: playtennis
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Decision Tree for Playing Tennis (figure): Outlook = sunny → test Humidity (high → no, normal → yes); Outlook = overcast → yes; Outlook = rainy → test Windy (false → yes, true → no).

Classification with Decision Trees Classify(x: instance, node: a node of the DT): if node is a classification (leaf) node then return the class of node; else determine the child of node that matches x and return Classify(x, child). (Figure: the example tree from the previous slide.)
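
The recursive procedure above can be written down directly. Here is a minimal Python sketch (mine, not from the slides): it assumes the tree is stored as nested dicts, with an internal node of the form {"var": name, "children": {value: subtree}} and a leaf being just a class label, and it hand-codes the playtennis tree from the previous slides for illustration.

```python
def classify(x, node):
    """Route instance x (a dict: variable -> value) down the tree recursively."""
    if not isinstance(node, dict):            # classification (leaf) node
        return node
    value = x[node["var"]]                    # test this node's variable on x
    return classify(x, node["children"][value])

# The playtennis tree from the slides, written by hand for illustration.
tennis_tree = {
    "var": "Outlook",
    "children": {
        "Sunny":    {"var": "Humidity", "children": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"var": "Wind", "children": {"Strong": "No", "Weak": "Yes"}},
    },
}

print(classify({"Outlook": "Sunny", "Temperature": "Cool",
                "Humidity": "Normal", "Wind": "Weak"}, tennis_tree))   # -> Yes
```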

Decision Tree Learning Basic Algorithm: 1. Xi ← the "best" decision variable for a node N. 2. Assign Xi as the decision variable for node N. 3. For each value of Xi, create a new descendant of N. 4. Sort the training examples to the leaf nodes. 5. IF the training examples are perfectly classified THEN stop, ELSE iterate over the new leaf nodes.

Variable Quality Measures Splitting the playtennis data on Outlook (figure):
Outlook = Sunny (Temp, Hum, Wind, Play): Hot High Weak No; Hot High Strong No; Mild High Weak No; Cool Normal Weak Yes; Mild Normal Strong Yes
Outlook = Overcast: Hot High Weak Yes; Cool Normal Strong Yes
Outlook = Rain: Mild High Weak Yes; Cool Normal Weak Yes; Cool Normal Strong No; Mild Normal Weak Yes; Mild High Strong No

Variable Quality Measures Let S be a sample of training instances and p_j the proportion of instances of class j (j = 1,…,J) in S. Define an impurity measure I(S) that satisfies: – I(S) is minimal only when p_i = 1 and p_j = 0 for j ≠ i (all objects are of the same class); – I(S) is maximal only when p_j = 1/J for all j (there is exactly the same number of objects of all classes); – I(S) is symmetric with respect to p_1, …, p_J.

Reduction of Impurity: Discrete Variables The "best" variable is the variable Xi that determines a split maximizing the expected reduction of impurity: ΔI(S, Xi) = I(S) − Σj (|S_xij| / |S|) · I(S_xij), where S_xij is the subset of instances from S for which Xi = xij.

Information Gain: Entropy Let S be a sample of training examples, p+ the proportion of positive examples in S, and p− the proportion of negative examples in S. Then the entropy measures the impurity of S: E(S) = − p+ log2 p+ − p− log2 p−.
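
As a quick check of the formula, here is a small Python helper (an illustrative sketch, not part of the slides) that computes the binary entropy from positive and negative counts:

```python
import math

def entropy(pos, neg):
    """E(S) = -p+ log2 p+ - p- log2 p-, with 0*log2(0) taken as 0."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            e -= p * math.log2(p)
    return e

# The full playtennis sample contains 9 "yes" and 5 "no" instances:
print(round(entropy(9, 5), 3))   # 0.94
```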

Entropy Example In the playtennis dataset we have two target classes: yes and no. Out of the 14 instances (listed in the table on the earlier slide), 9 are classified yes and 5 no, so E(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.94.

Information Gain Information Gain is the expected reduction in entropy caused by partitioning the instances from S according to a given discrete variable Xi: Gain(S, Xi) = E(S) − Σj (|S_xij| / |S|) · E(S_xij), where S_xij is the subset of instances from S for which Xi = xij.
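
A compact Python sketch of this measure (again illustrative, with helper names of my own choosing): it represents the sample as a list of attribute dicts plus a parallel list of class labels, partitions it by the values of one attribute, and subtracts the weighted entropies of the parts. On the 14 playtennis instances it gives Gain(S, Outlook) ≈ 0.246, the value quoted later in the gain-ratio example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a list of class labels (multi-class version of E(S))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, X) = E(S) - sum_j |S_xj|/|S| * E(S_xj) for a discrete attribute."""
    partitions = defaultdict(list)
    for row, label in zip(rows, labels):
        partitions[row[attribute]].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder
```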

Example Which attribute should be tested at the Sunny branch (the Outlook = Sunny subset from the previous slide, with entropy 0.970)?
Gain(S_sunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Gain(S_sunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
Humidity gives the largest gain, so it is tested next.

Continuous Variables To handle a numeric variable such as Temp, sort the instances by its value and evaluate candidate thresholds (midpoints between adjacent values).
Temp: 80 85 83 75 68 65 64 72 75 70 69 72 81 71
Play: No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No
Candidate binary splits and their values of I: Temp < 64.5: I = 0.048; Temp < 66.5: I = 0.010; Temp < 70.5: I = 0.045; Temp < 73.5: I = 0.001; Temp < 77.5: I = 0.025; Temp < 80.5: I = 0.000; Temp < 84: I = 0.113.
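
A sketch of the threshold search in Python (my illustration; the slide's I values depend on which impurity measure is plugged in, so the numbers need not match exactly): sort the instances by the numeric value, try the midpoints between adjacent distinct values as thresholds, and keep the binary split with the largest information gain.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split value < t."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue                       # only midpoints between distinct values
        t = (v1 + v2) / 2
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        gain = (base - len(left) / len(pairs) * entropy(left)
                     - len(right) / len(pairs) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Temperature and Play values from the slide:
temps = [80, 85, 83, 75, 68, 65, 64, 72, 75, 70, 69, 72, 81, 71]
plays = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
         "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, plays))
```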

ID3 Algorithm Informally: – Determine the variable with the highest information gain on the training set. – Use this variable as the root and create a branch for each of the values the attribute can have. – For each branch, repeat the process with the subset of the training set that is sorted down that branch.
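
Putting the pieces together, a recursive ID3 sketch in Python (illustrative only; it reuses the entropy and information_gain helpers from the sketch above and produces trees in the nested-dict format used by the classify sketch earlier):

```python
from collections import Counter, defaultdict

def id3(rows, labels, attributes):
    """Grow a decision tree: pick the highest-gain attribute, branch, recurse."""
    if len(set(labels)) == 1:                      # all instances in one class
        return labels[0]
    if not attributes:                             # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[best]].append((row, label))
    node = {"var": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    for value, subset in groups.items():
        sub_rows, sub_labels = zip(*subset)
        node["children"][value] = id3(list(sub_rows), list(sub_labels), remaining)
    return node
```

Run on the 14 playtennis instances with attributes Outlook, Temperature, Humidity and Wind, this should reproduce the tree shown earlier, with Outlook at the root.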

Hypothesis Space Search in ID3 The hypothesis space is the set of all decision trees defined over the given set of variables. ID3's hypothesis space is a complete space; i.e., the target tree is there! ID3 performs a simple-to-complex, hill-climbing search through this space.

Hypothesis Space Search in ID3 The evaluation function is the information gain. ID3 maintains only a single current decision tree. ID3 performs no backtracking in its search. ID3 uses all training instances at each step of the search.

Decision Trees are Non-linear Classifiers A2<0.33 ? good A1<0.91 ? A1<0.23 ? A2<0.91 ? A2<0.75 ? A2<0.49 ? A2<0.65 ? good badgood bad good yesno

Posterior Class Probabilities A leaf can be labelled with the class distribution of the training instances it covers instead of a single class (figure): Outlook = Sunny: 2 pos, 3 neg (P_pos = 0.4, P_neg = 0.6); Outlook = Overcast: 3 pos, 0 neg (P_pos = 1.0, P_neg = 0.0); Outlook = Rainy, Windy = False: 2 pos, 0 neg (P_pos = 1.0, P_neg = 0.0); Outlook = Rainy, Windy = True: 0 pos, 2 neg (P_pos = 0.0, P_neg = 1.0).

Overfitting Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some hypothesis h' ∈ H such that h has a smaller error than h' over the training instances, but h' has a smaller error than h over the entire distribution of instances.

Reasons for Overfitting Noisy training instances. Consider a noisy training example: Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No. This instance clashes with the following training instances, which reach the same Outlook = Sunny, Humidity = Normal leaf: Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes and Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes. (Figure: the original playtennis decision tree.)

Reasons for Overfitting (Figure: to accommodate the noisy instance, the tree grows additional tests under Outlook = Sunny, Humidity = Normal: first on Windy, then on Temp, ending in leaves supported by very few instances.) Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No. Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes. Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes.

Reasons for Overfitting A small number of instances is associated with some leaf nodes. In this case it is possible for coincidental regularities to occur that are unrelated to the actual class borders (figure: an area of the instance space with probably wrong predictions).

Approaches to Avoiding Overfitting Pre-pruning: stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data Post-pruning: Allow the tree to overfit the data, and then post-prune the tree.

Pre-pruning (Figure: the full playtennis tree next to a pre-pruned tree in which some branches under Outlook are stopped early and left as '?' leaves.) It is difficult to decide when to stop growing the tree. A possible scenario is to stop when a leaf node would get fewer than m training instances; the figure shows an example for one particular value of m.

Validation Set A validation set is a set of instances used to evaluate the utility of nodes in decision trees. The validation set has to be chosen so that it is unlikely to suffer from the same errors or fluctuations as the set used for decision-tree training. Usually, before pruning, the training data is split randomly into a growing set and a validation set.

Reduced-Error Pruning (Sub-tree replacement) Split data into growing and validation sets. Pruning a decision node d consists of: 1.removing the subtree rooted at d. 2.making d a leaf node. 3.assigning d the most common classification of the training instances associated with d. Outlook sunnyovercastrainy HumidityWindy highnormal no falsetrue yes no 3 instances2 instances Accuracy of the tree on the validation set is 90%.

Reduced-Error Pruning (Sub-tree replacement) Split data into growing and validation sets. Pruning a decision node d consists of: 1.removing the subtree rooted at d. 2.making d a leaf node. 3.assigning d the most common classification of the training instances associated with d. Outlook sunnyovercastrainy Windy no falsetrue yes no Accuracy of the tree on the validation set is 92.4%.

Reduced-Error Pruning (Sub-tree replacement) Split data into growing and validation sets. Pruning a decision node d consists of: 1.removing the subtree rooted at d. 2.making d a leaf node. 3.assigning d the most common classification of the training instances associated with d. Do until further pruning is harmful: 1.Evaluate impact on validation set of pruning each possible node (plus those below it). 2.Greedily remove the one that most improves validation set accuracy. Outlook sunnyovercastrainy Windy no falsetrue yes no Accuracy of the tree on the validation set is 92.4%.
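
For concreteness, here is a compact Python sketch of greedy sub-tree replacement (my illustration, not the slides' code). It works on the nested-dict trees of the earlier sketches and reuses classify from there: every internal node is tentatively replaced by a leaf labelled with the majority growing-set class at that node, the replacement that most improves validation-set accuracy is kept, and this repeats until no replacement helps.

```python
from collections import Counter

def accuracy(tree, rows, labels):
    """Fraction of instances the tree classifies correctly."""
    return sum(classify(r, tree) == y for r, y in zip(rows, labels)) / len(labels)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) of every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, child in tree["children"].items():
            yield from internal_nodes(child, path + (value,))

def replaced(tree, path, leaf):
    """Copy of the tree with the node at `path` replaced by `leaf`."""
    if not path:
        return leaf
    children = dict(tree["children"])
    children[path[0]] = replaced(children[path[0]], path[1:], leaf)
    return {"var": tree["var"], "children": children}

def majority_at(tree, path, rows, labels):
    """Most common growing-set class among instances routed along `path`."""
    routed = []
    for r, y in zip(rows, labels):
        node, ok = tree, True
        for value in path:
            if r.get(node["var"]) != value:
                ok = False
                break
            node = node["children"][value]
        if ok:
            routed.append(y)
    return Counter(routed or labels).most_common(1)[0][0]

def reduced_error_prune(tree, grow, grow_y, val, val_y):
    """Greedy sub-tree replacement driven by validation-set accuracy."""
    while True:
        best_tree, best_acc = tree, accuracy(tree, val, val_y)
        for path in internal_nodes(tree):
            leaf = majority_at(tree, path, grow, grow_y)
            candidate = replaced(tree, path, leaf)
            acc = accuracy(candidate, val, val_y)
            if acc > best_acc:
                best_tree, best_acc = candidate, acc
        if best_tree is tree:          # no replacement improved accuracy: stop
            return tree
        tree = best_tree
```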

Reduced-Error Pruning (Sub-tree replacement) (Figure: a sequence of progressively pruned trees T1 to T5, from the full tree with an extra Temp test down to a single 'yes' leaf, with their errors on the growing set (GS) and validation set (VS): Error_GS = 0%, Error_VS = 10%; Error_GS = 6%, Error_VS = 8%; Error_GS = 13%, Error_VS = 15%; Error_GS = 27%, Error_VS = 25%; Error_GS = 33%, Error_VS = 35%. The tree with the lowest validation-set error, 8%, is kept.)

Reduced Error Pruning Example

Reduced-Error Pruning (Sub-tree raising) Split data into growing and validation sets. Raising a sub-tree with root d consists of: 1.removing the sub-tree rooted at the parent of d. 2.place d at the place of its parent. 3.Sort the training instances associated with the parent of d using the sub-tree with root d. Outlook sunnyovercastrainy HumidityWindy highnormal no falsetrue yes no 3 instances2 instances Accuracy of the tree on the validation set is 90%.

Reduced-Error Pruning (Sub-tree raising) Split data into growing and validation sets. Raising a sub-tree with root d consists of: 1.removing the sub-tree rooted at the parent of d. 2.place d at the place of its parent. 3.Sort the training instances associated with the parent of d using the sub-tree with root d. Humidity highnormal noyes Accuracy of the tree on the validation set is 73%. So, No!

Rule Post-Pruning IF (Outlook = Sunny) & (Humidity = High) THEN PlayTennis = No IF (Outlook = Sunny) & (Humidity = Normal) THEN PlayTennis = Yes ………. 1.Convert tree to equivalent set of rules. 2.Prune each rule independently of others. 3.Sort final rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances. Outlook sunnyovercastrainy HumidityWindy normal no falsetrue yes no false

Decision Trees are non-linear. Can we make them linear? A2<0.33 ? good A1<0.91 ? A1<0.23 ? A2<0.91 ? A2<0.75 ? A2<0.49 ? A2<0.65 ? good bad good bad good yes no

Oblique Decision Trees (figure: the test x + y < 1 separates Class = + from Class = −) The test condition may involve multiple attributes, which gives a more expressive representation, but finding the optimal test condition is computationally expensive!

Variables with Many Values Problem: such tests often show a high reduction of impurity (extreme example: a split on an object id), yet they are not good splits: they fragment the data too quickly, leaving insufficient data at the next level. Two solutions: – change the splitting criterion to penalize variables with many values; – consider only binary splits. (Figure: a split on Letter with branches a, b, c, …, y, z.)

Variables with Many Values Example: outlook in playtennis: – InfoGain(outlook) = 0.246 – SplitInformation(outlook) = 1.577 – GainRatio(outlook) = 0.246 / 1.577 = 0.156 Problem: the gain ratio favours unbalanced tests.
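
The penalizing term here is Quinlan's SplitInformation, the entropy of the partition itself: SplitInformation(S, X) = −Σj (|S_j|/|S|) log2(|S_j|/|S|), and GainRatio(S, X) = Gain(S, X) / SplitInformation(S, X). A tiny Python check of the numbers on the slide (illustrative):

```python
import math

def split_information(subset_sizes):
    """Entropy of the partition induced by an attribute's values."""
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s)

# Outlook splits the 14 playtennis instances into subsets of size 5, 4 and 5:
si = split_information([5, 4, 5])
print(round(si, 3))            # ~1.577
print(round(0.246 / si, 3))    # gain ratio ~0.156, as on the slide
```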

Variables with Many Values

Missing Values 1. If node n tests variable Xi, assign the most common value of Xi among the other instances sorted to node n. 2. If node n tests variable Xi, assign a probability to each possible value of Xi, estimated from the observed frequencies of the values of Xi among the instances at node n; these probabilities are then used in the information gain measure.

Windowing If the data do not fit into main memory, use windowing: 1. Select randomly n instances from the training data D and put them in the window set W. 2. Train a decision tree DT on W. 3. Determine the set M of instances from D misclassified by DT. 4. W = W ∪ M. 5. IF NOT StopCondition THEN GO TO 2.
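
A sketch of the windowing loop in Python (illustrative; the train_tree and classify arguments are assumed to be supplied by the caller, e.g. the ID3 and classify sketches above):

```python
import random

def windowing(data, labels, train_tree, classify, n=100, max_iter=20):
    """Train on a random window, add back misclassified instances, retrain."""
    indices = set(random.sample(range(len(data)), min(n, len(data))))
    tree = None
    for _ in range(max_iter):
        window_x = [data[i] for i in sorted(indices)]
        window_y = [labels[i] for i in sorted(indices)]
        tree = train_tree(window_x, window_y)
        missed = {i for i in range(len(data))
                  if i not in indices and classify(data[i], tree) != labels[i]}
        if not missed:            # stop condition: DT is consistent with all of D
            break
        indices |= missed         # W = W U M
    return tree
```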

Summary Points 1. Decision tree learning provides a practical method for concept learning. 2. ID3-like algorithms search a complete hypothesis space. 3. The inductive bias of decision trees is a preference (search) bias. 4. Overfitting the training data is an important issue in decision tree learning. 5. A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc.

Learning Decision Rules Decision Rules Basic Sequential Covering Algorithm Learn-One-Rule Procedure Pruning

Definition of Decision Rules Example: if you run the Prism algorithm from Weka on the weather data you will get the following set of decision rules: if outlook = overcast then PlayTennis = yes; if humidity = normal and windy = FALSE then PlayTennis = yes; if temperature = mild and humidity = normal then PlayTennis = yes; if outlook = rainy and windy = FALSE then PlayTennis = yes; if outlook = sunny and humidity = high then PlayTennis = no; if outlook = rainy and windy = TRUE then PlayTennis = no. Definition: decision rules are rules of the form "if <conditions> then concept C".

Why Decision Rules? Decision rules are more compact. Decision rules are more understandable. Example: Let X ∈ {0,1}, Y ∈ {0,1}, Z ∈ {0,1}, W ∈ {0,1}. The rules are: if X = 1 and Y = 1 then 1; if Z = 1 and W = 1 then 1; otherwise 0. (Figure: the equivalent decision tree has to repeat the tests on Z and W in several branches.)

Why Decision Rules? Decision boundaries of decision trees Decision boundaries of decision rules

How to Learn Decision Rules? 1.We can convert trees to rules 2.We can use specific rule-learning methods

Sequential Covering Algorithms
function LearnRuleSet(Target, Attrs, Examples, Threshold):
  LearnedRules := ∅
  Rule := LearnOneRule(Target, Attrs, Examples)
  while performance(Rule, Examples) > Threshold do
    LearnedRules := LearnedRules ∪ {Rule}
    Examples := Examples \ {examples covered by Rule}
    Rule := LearnOneRule(Target, Attrs, Examples)
  sort LearnedRules according to performance
  return LearnedRules
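
Below is an illustrative Python rendering of this scheme with a simple greedy, top-down LearnOneRule (accuracy-driven; the names and the performance threshold are my own choices, not prescribed by the slides). A rule is represented as a dict of attribute = value tests; the empty dict plays the role of "IF true".

```python
def covers(rule, row):
    """A rule is a dict of attribute -> required value; {} covers everything."""
    return all(row.get(a) == v for a, v in rule.items())

def rule_accuracy(rule, rows, labels, target):
    covered = [y for r, y in zip(rows, labels) if covers(rule, r)]
    return sum(y == target for y in covered) / len(covered) if covered else 0.0

def learn_one_rule(target, attributes, rows, labels):
    """Top-down: start from the most general rule and greedily add the single
    attribute test that most improves accuracy on the current examples."""
    rule = {}
    while True:
        best, best_acc = None, rule_accuracy(rule, rows, labels, target)
        for a in attributes:
            if a in rule:
                continue
            for v in {r[a] for r in rows}:
                cand = dict(rule, **{a: v})
                acc = rule_accuracy(cand, rows, labels, target)
                if acc > best_acc:
                    best, best_acc = cand, acc
        if best is None:
            return rule
        rule = best

def learn_rule_set(target, attributes, rows, labels, threshold=0.8):
    """Sequential covering: learn a rule, drop the examples it covers, repeat."""
    learned = []
    while rows:
        rule = learn_one_rule(target, attributes, rows, labels)
        if not rule or rule_accuracy(rule, rows, labels, target) <= threshold:
            break
        learned.append(rule)
        kept = [i for i, r in enumerate(rows) if not covers(rule, r)]
        rows, labels = [rows[i] for i in kept], [labels[i] for i in kept]
    return learned
```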

IF true THEN pos Illustration

IF A THEN pos Illustration

IF true THEN pos IF A THEN pos IF A & B THEN pos Illustration

IF true THEN pos Illustration IF A & B THEN pos

IF true THEN pos IF C THEN pos Illustration IF A & B THEN pos

IF true THEN pos IF C THEN pos IF C & D THEN pos Illustration IF A & B THEN pos

Learning One Rule To learn one rule we use one of the strategies below: Top-down: –Start with maximally general rule –Add literals one by one Bottom-up: –Start with maximally specific rule –Remove literals one by one

Bottom-up vs. Top-down Top-down: typically more general rules Bottom-up: typically more specific rules

Learning One Rule Bottom-up: example-driven (the AQ family). Top-down: generate-then-test (CN2).

Example of Learning One Rule

Heuristics for Learning One Rule When is a rule "good"? High accuracy; less important: high coverage. Possible evaluation functions: – relative frequency: nc/n, where nc is the number of correctly classified instances and n is the number of instances covered by the rule; – m-estimate of accuracy: (nc + m·p)/(n + m), where nc is the number of correctly classified instances, n is the number of instances covered by the rule, p is the prior probability of the class predicted by the rule, and m is the weight of p; – entropy.
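
The m-estimate is easy to check numerically; a tiny sketch (illustrative, with m = 2 chosen arbitrarily):

```python
def m_estimate(n_correct, n_covered, prior, m=2.0):
    """(nc + m*p) / (n + m): with m = 0 this is the relative frequency nc/n;
    larger m pulls the estimate toward the class prior p, which penalizes
    rules supported by only a handful of instances."""
    return (n_correct + m * prior) / (n_covered + m)

# A rule covering 3 instances, all correctly, for a class with prior 9/14:
print(round(m_estimate(3, 3, 9 / 14), 3))   # 0.857, noticeably below 1.0
```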

How to Arrange the Rules 1.The rules are ordered according to the order they have been learned. This order is used for instance classification. 2.The rules are ordered according to their accuracy. This order is used for instance classification. 3.The rules are not ordered but there exists a strategy how to apply the rules (e.g., an instance covered by conflicting rules gets the classification of the rule that classifies correctly more training instances; if an instance is not covered by any rule, then it gets the classification of the majority class represented in the training data).

Approaches to Avoiding Overfitting Pre-pruning: stop learning the decision rules before they reach the point where they perfectly classify the training data. Post-pruning: allow the decision rules to overfit the training data, and then post-prune the rules.

Post-Pruning 1.Split instances into Growing Set and Pruning Set; 2.Learn set SR of rules using Growing Set; 3.Find the best simplification BSR of SR. 4.while (Accuracy(BSR, Pruning Set) > Accuracy(SR, Pruning Set) ) do 4.1 SR = BSR; 4.2 Find the best simplification BSR of SR. 5. return BSR;

Incremental Reduced Error Pruning (figure contrasting post-pruning with incremental pruning, where the data is repeatedly split into portions D1, D2, D3, …).

Incremental Reduced Error Pruning 1. Split the training set into a growing set and a validation set; 2. Learn a rule R using the growing set; 3. Prune the rule R using the validation set; 4. if performance(R, training set) > Threshold: 4.1 add R to the set of learned rules; 4.2 remove from the training set the instances covered by R; 4.3 go to 1; 5. else return the set of learned rules.

Summary Points 1.Decision rules are easier for human comprehension than decision trees. 2.Decision rules have simpler decision boundaries than decision trees. 3.Decision rules are learned by sequential covering of the training instances.

Lab 1: Some Details

Model Evaluation Techniques Evaluation on the training set: too optimistic (figure: the classifier is trained and evaluated on the same training set).

Model Evaluation Techniques Hold-out method: the data is split into a training set and a test set (figure); the estimate depends on the make-up of the test set. To improve the precision of the hold-out method, it is repeated many times.

Model Evaluation Techniques k-fold Cross Validation (figure: the data is split into k folds; each fold is used once as the test set while the classifier is trained on the remaining k−1 folds, and the k accuracy estimates are averaged).
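
A minimal k-fold cross-validation sketch in Python (illustrative; train(rows, labels) and classify(row, model) are assumed to be provided by the caller, e.g. the ID3 and classify sketches above):

```python
import random

def k_fold_cv(rows, labels, train, classify, k=10, seed=0):
    """Shuffle, split into k folds, and average the test-fold accuracies."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        if not test_idx:
            continue                   # fewer instances than folds
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(classify(rows[i], model) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)
```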

Intro to the ARFF data format:
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {TRUE, FALSE}
@data
sunny,hot,high,FALSE,FALSE
sunny,hot,high,TRUE,FALSE
overcast,hot,high,FALSE,TRUE
rainy,mild,high,FALSE,TRUE
rainy,cool,normal,FALSE,TRUE
rainy,cool,normal,TRUE,FALSE
overcast,cool,normal,TRUE,TRUE
…

References Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. Russell, S., & Norvig, P. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall.