Decision Tree Saed Sayad 9/21/2018

Decision Tree (Mitchell 97)
Decision tree induction is a simple but powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while, at the same time, an associated decision tree gets incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.

Dataset
Outlook  | Temp | Humidity | Windy | Play
-----------------------------------------
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Decision Tree
Outlook is the root attribute node, with value nodes (branches) Sunny, Overcast and Rain.
Sunny leads to Humidity (High -> No, Normal -> Yes), Overcast leads directly to Yes, and Rain leads to Wind (Strong -> No, Weak -> Yes).
The tree consists of attribute nodes, value nodes and leaf nodes.

Frequency Tables
Play | Frequency
----------------------
No   | 5 / 14 = 0.36
Yes  | 9 / 14 = 0.64

Frequency Tables ...
Outlook  | No  Yes
--------------------
Sunny    | 3   2
Overcast | 0   4
Rainy    | 2   3
Temp     | No  Yes
--------------------
Hot      | 2   2
Mild     | 2   4
Cool     | 1   3
Humidity | No  Yes
--------------------
High     | 4   3
Normal   | 1   6
Windy    | No  Yes
--------------------
False    | 2   6
True     | 3   3

Entropy
Entropy measures the impurity of S, where S is a set of examples.
Entropy(S) = - p log2 p - q log2 q
p is the proportion of positive examples and q is the proportion of negative examples.

Entropy: One Variable
Play: Yes 9 / 14 = 0.64, No 5 / 14 = 0.36
Entropy(Play) = - p log2 p - q log2 q
              = - (0.64 * log2 0.64) - (0.36 * log2 0.36)
              = 0.94

Entropy: One Variable ...
Example: Entropy(5, 3, 2) = - (0.5 * log2 0.5) - (0.3 * log2 0.3) - (0.2 * log2 0.2) = 1.49
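As a quick check of these numbers, here is a minimal Python sketch of the entropy calculation (the function name and the counts-as-a-list interface are my own choices, not part of the slides):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))     # Play: 9 Yes, 5 No -> 0.94
print(round(entropy([5, 3, 2]), 2))  # three classes     -> 1.49
```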

Entropy: Two Variables
Outlook  | No  Yes |
---------------------
Sunny    | 3   2   | 5
Overcast | 0   4   | 4
Rainy    | 2   3   | 5
                   | 14
Each subset is weighted by (size of the subset) / (size of the set):
E(Play, Outlook) = (5/14)*0.971 + (4/14)*0.0 + (5/14)*0.971 = 0.693

Information Gain
Gain(S, A) = E(S) - E(S, A)
Example: Gain(Play, Outlook) = 0.940 - 0.693 = 0.247
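As a rough check, here is a sketch that reuses the entropy function above on the (No, Yes) counts from the frequency tables; the dictionary layout is my own choice. Printing the gain of every attribute also previews the root-selection slides that follow:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def conditional_entropy(table):
    """E(S, A): entropy of each subset, weighted by subset size / set size."""
    total = sum(sum(row) for row in table.values())
    return sum(sum(row) / total * entropy(row) for row in table.values())

# (No, Yes) counts per attribute value, copied from the frequency tables.
tables = {
    "Outlook":  {"Sunny": (3, 2), "Overcast": (0, 4), "Rainy": (2, 3)},
    "Temp":     {"Hot": (2, 2), "Mild": (2, 4), "Cool": (1, 3)},
    "Humidity": {"High": (4, 3), "Normal": (1, 6)},
    "Windy":    {"False": (2, 6), "True": (3, 3)},
}

e_play = entropy([9, 5])  # 0.940
for name, table in tables.items():
    print(name, round(e_play - conditional_entropy(table), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048
```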

Selecting The Root Node
Play = [9+, 5-], E = 0.940
Split on Outlook:
Sunny: [2+, 3-], E = 0.971
Overcast: [4+, 0-], E = 0.0
Rain: [3+, 2-], E = 0.971
Gain(Play, Outlook) = 0.940 - ((5/14)*0.971 + (4/14)*0.0 + (5/14)*0.971) = 0.247

Selecting The Root Node ...
Play = [9+, 5-], E = 0.940
Split on Temp:
Hot: [2+, 2-], E = 1.0
Mild: [4+, 2-], E = 0.918
Cool: [3+, 1-], E = 0.811
Gain(Play, Temp) = 0.940 - ((4/14)*1.0 + (6/14)*0.918 + (4/14)*0.811) = 0.029

Selecting The Root Node ...
Play = [9+, 5-], E = 0.940
Split on Humidity:
High: [3+, 4-], E = 0.985
Normal: [6+, 1-], E = 0.592
Gain(Play, Humidity) = 0.940 - ((7/14)*0.985 + (7/14)*0.592) = 0.152

Selecting The Root Node ...
Play = [9+, 5-], E = 0.940
Split on Windy:
Weak: [6+, 2-], E = 0.811
Strong: [3+, 3-], E = 1.0
Gain(Play, Windy) = 0.940 - ((8/14)*0.811 + (6/14)*1.0) = 0.048

Selecting The Root Node ...
Outlook: Gain = 0.247
Temperature: Gain = 0.029
Humidity: Gain = 0.152
Windy: Gain = 0.048
Outlook has the largest information gain, so it is selected as the root node.

Decision Tree - Classification
Outlook becomes the root attribute node; Sunny leads to Humidity (High -> No, Normal -> Yes), Overcast leads directly to Yes, and Rain leads to Wind (Strong -> No, Weak -> Yes).

ID3 Algorithm
1. A <- the "best" decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a new descendant.
4. Sort the training examples to the leaf nodes according to the attribute value of the branch.
5. If all training examples are perfectly classified (same value of the target attribute) stop, else iterate over the new leaf nodes.
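Below is a compact Python sketch of these steps, assuming each example is a dictionary and the tree is returned as nested dicts of the form {attribute: {value: subtree-or-leaf}}; the representation and helper names are my own, not from the slides:

```python
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def id3(examples, target, attributes):
    """Build a tree as nested dicts {attribute: {value: subtree-or-leaf}} from dict examples."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                   # perfectly classified: return a leaf
        return labels[0]
    if not attributes:                          # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                                # information gain of attribute a
        e = 0.0
        for v in {ex[a] for ex in examples}:
            subset = [ex[target] for ex in examples if ex[a] == v]
            e += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - e

    best = max(attributes, key=gain)            # step 1: the "best" decision attribute
    tree = {best: {}}
    for v in {ex[best] for ex in examples}:     # step 3: one descendant per value
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attributes if a != best])
    return tree
```

Called as id3(dataset, "Play", ["Outlook", "Temp", "Humidity", "Windy"]) on the 14 examples above (one dict per row), it should return a tree rooted at Outlook, matching the earlier figure.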

Converting Tree to Rules
Each path from the root (Outlook) down to a leaf becomes one rule:
R1: IF (Outlook=Sunny) AND (Humidity=High) THEN Play=No
R2: IF (Outlook=Sunny) AND (Humidity=Normal) THEN Play=Yes
R3: IF (Outlook=Overcast) THEN Play=Yes
R4: IF (Outlook=Rain) AND (Wind=Strong) THEN Play=No
R5: IF (Outlook=Rain) AND (Wind=Weak) THEN Play=Yes
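A small sketch of this conversion for the nested-dict tree representation used in the ID3 sketch above (the function name and dictionary layout are assumptions of that sketch, not of the slides):

```python
def tree_to_rules(tree, conditions=()):
    """Walk a nested-dict tree and yield one IF ... THEN ... rule per leaf."""
    if not isinstance(tree, dict):                        # leaf: emit a rule
        yield "IF " + " AND ".join(f"({a}={v})" for a, v in conditions) + f" THEN Play={tree}"
        return
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attr, value),))

tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}
for rule in tree_to_rules(tree):
    print(rule)  # prints the five rules R1-R5 above (without the rule numbers)
```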

Decision Tree - Regression

Dataset - Numeric
Outlook  | Temp | Humidity | Windy | Players
--------------------------------------------
Sunny    | Hot  | High     | False | 25
Sunny    | Hot  | High     | True  | 30
Overcast | Hot  | High     | False | 46
Rainy    | Mild | High     | False | 45
Rainy    | Cool | Normal   | False | 52
Rainy    | Cool | Normal   | True  | 23
Overcast | Cool | Normal   | True  | 43
Sunny    | Mild | High     | False | 35
Sunny    | Cool | Normal   | False | 38
Rainy    | Mild | Normal   | False | 46
Sunny    | Mild | Normal   | True  | 48
Overcast | Mild | High     | True  | 52
Overcast | Hot  | Normal   | False | 44
Rainy    | Mild | High     | True  | 30

Decision Tree - Regression
The same tree structure as before (Outlook at the root, Humidity under Sunny, Wind under Rain), but each leaf now holds a numeric value instead of a class label (the figure shows leaf values 30, 45, 50, 55 and 25). The tree again consists of attribute nodes, value nodes and leaf nodes.

Standard Deviation and Mean
Players: 25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30
SD(Players) = 9.32
Mean(Players) = 39.79
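A quick check of these two numbers in Python, using the Players column as tabulated above; note that the slides use the population standard deviation (divide by n), which is what statistics.pstdev computes:

```python
from statistics import mean, pstdev

players = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
print(round(pstdev(players), 2))  # population standard deviation -> 9.32
print(round(mean(players), 2))    # mean -> 39.79
```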

Standard Deviation and Mean ...
Outlook  | SD     Mean
------------------------
Sunny    | 7.78   35.20
Overcast | 3.49   46.25
Rainy    | 10.87  39.20
Temp     | SD     Mean
------------------------
Hot      | 8.95   36.25
Mild     | 7.65   42.67
Cool     | 10.51  39.00
Humidity | SD     Mean
------------------------
High     | 9.36   37.57
Normal   | 8.73   42.00
Windy    | SD     Mean
------------------------
False    | 7.87   41.36
True     | 10.59  37.67

Standard Deviation versus Entropy
For a classification tree the impurity measure is entropy, Entropy(S) = - p log2 p - q log2 q; for a regression tree it is the standard deviation of the target values in S.

Information Gain versus Standard Deviation Reduction
For a classification tree attributes are compared by information gain, Gain(S, A) = E(S) - E(S, A); for a regression tree they are compared by standard deviation reduction, SDR(S, A) = SD(S) - SD(S, A).

Selecting The Root Node
Play = [14], SD = 9.32
Split on Outlook:
Sunny: [5], SD = 7.78
Overcast: [4], SD = 3.49
Rain: [5], SD = 10.87
SDR(Play, Outlook) = 9.32 - ((5/14)*7.78 + (4/14)*3.49 + (5/14)*10.87) = 1.662
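A minimal check of this number in Python; the per-value subset sizes and standard deviations are the ones tabulated above:

```python
sd_all = 9.32
subsets = {"Sunny": (5, 7.78), "Overcast": (4, 3.49), "Rainy": (5, 10.87)}  # value: (size, SD)
n = sum(size for size, _ in subsets.values())
sdr = sd_all - sum(size / n * sd for size, sd in subsets.values())
print(round(sdr, 3))  # standard deviation reduction for Outlook -> 1.662
```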

Selecting The Root Node ...
Play = [14], SD = 9.32
Split on Temp:
Hot: [4], SD = 8.95
Mild: [6], SD = 7.65
Cool: [4], SD = 10.51
SDR(Play, Temp) = 9.32 - ((4/14)*8.95 + (6/14)*7.65 + (4/14)*10.51) = 0.481

Selecting The Root Node ...
Play = [14], SD = 9.32
Split on Humidity:
High: [7], SD = 9.36
Normal: [7], SD = 8.73
SDR(Play, Humidity) = 9.32 - ((7/14)*9.36 + (7/14)*8.73) = 0.275

Selecting The Root Node ...
Play = [14], SD = 9.32
Split on Windy:
Weak: [8], SD = 7.87
Strong: [6], SD = 10.59
SDR(Play, Windy) = 9.32 - ((8/14)*7.87 + (6/14)*10.59) = 0.284

Selecting The Root Node ...
Outlook: SDR = 1.662
Temperature: SDR = 0.481
Humidity: SDR = 0.275
Windy: SDR = 0.284
Outlook gives the largest standard deviation reduction, so it again becomes the root node.

Decision Tree - Regression
The resulting regression tree has the same structure as the classification tree (Outlook at the root, Humidity under Sunny, Wind under Rain), with a numeric value at each leaf (the figure shows leaf values 30, 45, 50, 55 and 25).

Decision Tree - Issues
Working with continuous attributes (discretization)
Overfitting and pruning
Super attributes (attributes with many values)
Working with missing values
Attributes with different costs

Discretization
Equally probable intervals: creates a set of N intervals with the same number of elements.
Equal width intervals: the original range of values is divided into N intervals with the same range.
Entropy based: for each numeric attribute, instances are sorted and, for each possible threshold, a binary <, >= test is considered and evaluated in exactly the same way that a categorical attribute would be.
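A sketch of the first two strategies in Python; the function names and the sample numeric values are illustrative assumptions (the deck's own Temp attribute is categorical):

```python
def equal_width_cuts(values, n):
    """Split the range [min, max] into n intervals of equal width; return the n-1 cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + i * width for i in range(1, n)]

def equal_frequency_cuts(values, n):
    """Choose cut points so that each interval holds roughly the same number of values."""
    ordered = sorted(values)
    step = len(ordered) / n
    return [ordered[int(i * step)] for i in range(1, n)]

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]  # hypothetical numeric temperatures
print(equal_width_cuts(temps, 3))      # [71.0, 78.0]
print(equal_frequency_cuts(temps, 3))  # cut points near the 1/3 and 2/3 ranks
```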

Avoid Overfitting
Overfitting occurs when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error. Remedies:
Stop growing when the data split is not statistically significant (chi-squared test).
Grow the full tree, then post-prune.
Minimum description length (MDL): minimize size(tree) + size(misclassifications(tree)).

Post-pruning
First build the full tree, then prune it. The fully grown tree shows all attribute interactions, but some subtrees might be due to chance effects.
Two pruning operations: subtree replacement and subtree raising.
Possible strategies: error estimation, significance testing, the MDL principle. (witten & eibe)

Subtree Replacement
Bottom-up: consider replacing a tree only after considering all its subtrees. (witten & eibe)

Subtree Replacement (figure). (witten & eibe)

Error Estimation
Treat the observed error rate f as a proportion from N trials with true error rate p. The transformed value of f, (f - p) / sqrt(p(1 - p)/N), is approximately standard normal (i.e. subtract the mean and divide by the standard deviation).
Resulting equation: Pr[ (f - p) / sqrt(p(1 - p)/N) <= z ] = c.
Solving for p gives the error estimate used on the next slide. (witten & eibe)

Error Estimation
The error estimate for a subtree is the weighted sum of the error estimates for all its leaves.
The error estimate for a node (upper bound):
e = ( f + z^2/(2N) + z * sqrt(f/N - f^2/N + z^2/(4N^2)) ) / ( 1 + z^2/N )
If c = 25% then z = 0.69 (from the normal distribution).
f is the error on the training data; N is the number of instances covered by the leaf. (witten & eibe)
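A small sketch of this estimate in Python (the function name is my own). It reproduces the leaf estimates used in the example on the next slide (0.47 and 0.72); for the parent node with f = 5/14 it gives roughly 0.45, close to the 0.46 quoted there:

```python
from math import sqrt

def pessimistic_error(f, n, z=0.69):
    """Upper confidence bound on the true error rate, given observed error f over n instances
    (z = 0.69 corresponds to the default c = 25% confidence level)."""
    return (f + z * z / (2 * n) + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)

print(round(pessimistic_error(2 / 6, 6), 2))    # leaf with f = 0.33, N = 6 -> 0.47
print(round(pessimistic_error(1 / 2, 2), 2))    # leaf with f = 0.5,  N = 2 -> 0.72
print(round(pessimistic_error(5 / 14, 14), 2))  # parent node -> about 0.45
```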

Example: three leaves have f = 0.33, e = 0.47; f = 0.5, e = 0.72; and f = 0.33, e = 0.47. Combined using the ratios 6:2:6 this gives 0.51. The parent node has f = 5/14, e = 0.46; since e < 0.51, prune! (witten & eibe)

Super Attributes
The information gain measure G(S, A) is biased toward attributes that have a large number of values over attributes that have a smaller number of values. These 'super attributes' are easily selected as the root, resulting in a broad tree that classifies the training data perfectly but performs poorly on unseen instances. We can penalize attributes with large numbers of values by using an alternative selection measure, the gain ratio:
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

Super Attributes ...
Outlook  | No  Yes |
---------------------
Sunny    | 3   2   | 5
Overcast | 0   4   | 4
Rainy    | 2   3   | 5
                   | 14
SplitInformation(Play, Outlook) = - ((5/14)*log2(5/14) + (4/14)*log2(4/14) + (5/14)*log2(5/14)) = 1.577
Gain(Play, Outlook) = 0.247
GainRatio(Play, Outlook) = 0.247 / 1.577 = 0.156
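A quick check in Python (the helper name is my own; the gain value is the unrounded 0.2467 from the earlier calculation, which is why the ratio comes out at the slide's 0.156):

```python
from math import log2

def split_information(sizes):
    """Entropy of the partition itself: how finely the attribute splits the data."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s > 0)

split = split_information([5, 4, 5])  # Outlook: Sunny 5, Overcast 4, Rainy 5
gain = 0.2467                         # unrounded Gain(Play, Outlook)
print(round(split, 3))                # -> 1.577
print(round(gain / split, 3))         # -> 0.156
```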

Missing Values
Options for handling missing attribute values:
Most common value
Most common value at node K
Mean or median
Nearest neighbor
...
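For instance, the two simplest options could be sketched like this (the function name and the tiny example rows are illustrative, not from the slides):

```python
from collections import Counter
from statistics import median

def impute(rows, attribute, numeric=False):
    """Replace missing values (None) with the most common value, or the median if numeric."""
    present = [r[attribute] for r in rows if r[attribute] is not None]
    fill = median(present) if numeric else Counter(present).most_common(1)[0][0]
    return [dict(r, **{attribute: fill}) if r[attribute] is None else r for r in rows]

rows = [{"Outlook": "Sunny"}, {"Outlook": None}, {"Outlook": "Rainy"}, {"Outlook": "Sunny"}]
print(impute(rows, "Outlook"))  # the missing Outlook becomes "Sunny"
```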

Attributes with Different Costs
Sometimes the best attribute for splitting the training examples is very costly. In order to make the overall decision process more cost effective we may wish to penalize the information gain of an attribute by its cost.

Decision Trees:
are simple, quick and robust
are non-parametric
can handle complex datasets
can use any combination of categorical and continuous variables and missing values
are not incremental or adaptive
are sometimes not easy to read
...

Thank You!

Appendix (figure): the Play labels of the 14 examples grouped by each attribute value, for Outlook (Sunny, Overcast, Rainy), Humidity (High, Normal), Windy (False, True) and Temperature (Hot, Mild, Cool).

Appendix (figure): the Players values grouped by each attribute value, with the standard deviation of each group. Outlook: Sunny SD=7.78, Overcast SD=3.49, Rainy SD=10.87. Windy: False SD=7.87, True SD=10.59. Humidity: High SD=9.36, Normal SD=8.73. Temperature: Hot SD=8.95, Mild SD=7.65, Cool SD=10.51.

Overfitting (Mitchell 97)
Consider the error of hypothesis h over
the training data: error_train(h)
the entire distribution D of data: error_D(h)
Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that
error_train(h) < error_train(h') and error_D(h) > error_D(h')

Pruning
Pruning steps:
Step 1. Grow the decision tree with respect to the training set.
Step 2. Randomly select and remove a node.
Step 3. Replace the node with its majority classification.
Step 4. If the performance of the modified tree is just as good or better on the validation set as the current tree, then set the current tree equal to the modified tree.
While not done, go to Step 2. A sketch of a simplified variant follows.
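The sketch below follows these steps for the nested-dict trees built by the ID3 sketch earlier, with one simplification: instead of picking nodes at random it tries each of the root's immediate subtrees in turn. The helper names and tree representation are my own assumptions:

```python
import copy
from collections import Counter

def classify(tree, example, default):
    """Follow the branches of a nested-dict tree down to a leaf label."""
    while isinstance(tree, dict):
        (attr, branches), = tree.items()
        tree = branches.get(example.get(attr), default)
    return tree

def accuracy(tree, examples, target, default):
    return sum(classify(tree, ex, default) == ex[target] for ex in examples) / len(examples)

def prune(tree, train, validation, target):
    """Try replacing each immediate subtree of the root with the majority class of the
    training examples that reach it; keep the change if validation accuracy does not drop."""
    default = Counter(ex[target] for ex in train).most_common(1)[0][0]
    if not isinstance(tree, dict):
        return tree
    (attr, branches), = tree.items()
    for value, subtree in list(branches.items()):
        if not isinstance(subtree, dict):
            continue                                     # already a leaf
        reaching = [ex[target] for ex in train if ex.get(attr) == value]
        if not reaching:
            continue
        candidate = copy.deepcopy(tree)
        candidate[attr][value] = Counter(reaching).most_common(1)[0][0]   # Step 3
        if accuracy(candidate, validation, target, default) >= accuracy(tree, validation, target, default):
            tree = candidate                             # Step 4: keep the simpler tree
    return tree
```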

Pruning ...
Rule post-pruning:
Step 1. Grow the decision tree with respect to the training set.
Step 2. Convert the tree into a set of rules.
Step 3. Remove antecedents that result in a reduction of the validation set error rate.
Step 4. Sort the resulting list of rules by their accuracy and use this sorted list as the sequence for classifying unseen instances.