
Constructing Decision Trees

A Decision Tree Example The weather data example:

ID code  Outlook   Temperature  Humidity  Windy  Play
a        Sunny     Hot          High      False  No
b        Sunny     Hot          High      True   No
c        Overcast  Hot          High      False  Yes
d        Rainy     Mild         High      False  Yes
e        Rainy     Cool         Normal    False  Yes
f        Rainy     Cool         Normal    True   No
g        Overcast  Cool         Normal    True   Yes
h        Sunny     Mild         High      False  No
i        Sunny     Cool         Normal    False  Yes
j        Rainy     Mild         Normal    False  Yes
k        Sunny     Mild         Normal    True   Yes
l        Overcast  Mild         High      True   Yes
m        Overcast  Hot          Normal    False  Yes
n        Rainy     Mild         High      True   No

~continues Decision tree for the weather data: the root tests Outlook (sunny / overcast / rainy); the sunny branch tests Humidity (high -> no, normal -> yes); the overcast branch is a "yes" leaf; the rainy branch tests Windy (false -> yes, true -> no).

The Process of Constructing a Decision Tree Select an attribute to place at the root of the decision tree and make one branch for every possible value. Repeat the process recursively for each branch.
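A minimal sketch of this recursive procedure in Python is given below. It is not code from these slides: the function names, the dict-based row representation, and the helper functions are illustrative, and the splitting criterion (information gain, defined on the next slides) is built in for concreteness.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        base -= len(subset) / len(rows) * entropy(subset)
    return base

def build_tree(rows, attributes, target):
    """Recursively place the highest-gain attribute at each node."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:      # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in {r[best] for r in rows}:            # one branch per attribute value
        branch_rows = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(branch_rows, remaining, target)
    return tree

# Usage (rows are dicts): build_tree(rows, ["Outlook", "Temperature", "Humidity", "Windy"], "Play")
```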

Which Attribute Should Be Placed at a Certain Node One common approach is based on the information gained by placing a certain attribute at this node.

Information Gained by Knowing the Result of a Decision In the weather data example, there are 9 instances for which the decision to play is "yes" and 5 instances for which it is "no". The information gained by knowing the result of the decision is therefore -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940 bits.

The General Form for Calculating the Information Gain Entropy of a decision = -P1 log2 P1 - P2 log2 P2 - ... - Pn log2 Pn, where P1, P2, ..., Pn are the probabilities of the n possible outcomes.
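As a quick numerical check of the formula, a few illustrative lines of Python (not from the slides) reproduce the 0.940-bit figure for the 9 "yes" / 5 "no" split of the weather data:

```python
import math

def entropy(counts):
    """Entropy of a discrete distribution given by outcome counts, in bits."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # ~0.940 bits for the whole weather data set
print(entropy([4, 0]))   # 0.0 bits: a pure node (e.g. the overcast branch) needs no further information
```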

Information Further Required If "Outlook" Is Placed at the Root If Outlook is placed at the root, the instances split as sunny: 2 "yes" / 3 "no", overcast: 4 "yes" / 0 "no", rainy: 3 "yes" / 2 "no". The information still required is the weighted average of the branch entropies: (5/14) x 0.971 + (4/14) x 0 + (5/14) x 0.971 ≈ 0.693 bits.

Information Gained by Placing Each of the 4 Attributes Gain(outlook) = 0.940 bits - 0.693 bits = 0.247 bits. Gain(temperature) = 0.029 bits. Gain(humidity) = 0.152 bits. Gain(windy) = 0.048 bits.
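These numbers can be checked with a short, self-contained Python sketch. The dataset literal and the function names below are written out for illustration; they are not code from the original slides.

```python
import math
from collections import Counter

# The 14 weather instances, in the order of the table above (Outlook, Temperature, Humidity, Windy, Play).
WEATHER = [
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Windy": 3}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr_index):
    """Information gained by splitting `rows` on the given attribute column."""
    base = entropy([r[-1] for r in rows])
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        base -= len(subset) / len(rows) * entropy(subset)
    return base

for name, idx in ATTRS.items():
    print(name, round(gain(WEATHER, idx), 3))
# Expected: Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```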

The Strategy for Selecting an Attribute to Place at a Node Select the attribute that gives the largest information gain. In this example, it is the attribute "Outlook", which becomes the root with three branches: sunny (2 "yes", 3 "no"), overcast (4 "yes"), and rainy (3 "yes", 2 "no").

The Recursive Procedure for Constructing a Decision Tree The operation discussed above is applied to each branch recursively to construct the decision tree. For example, for the branch "Outlook = sunny" (2 "yes", 3 "no", entropy 0.971 bits), we evaluate the information gained by each of the remaining 3 attributes. Gain(Outlook=sunny; Temperature) = 0.971 - 0.4 = 0.571. Gain(Outlook=sunny; Humidity) = 0.971 - 0 = 0.971. Gain(Outlook=sunny; Windy) = 0.971 - 0.951 = 0.020.

Similarly, we evaluate the information gained by each of the remaining 3 attributes for the branch "Outlook = rainy" (3 "yes", 2 "no", entropy 0.971 bits). Gain(Outlook=rainy; Temperature) = 0.971 - 0.951 = 0.020. Gain(Outlook=rainy; Humidity) = 0.971 - 0.951 = 0.020. Gain(Outlook=rainy; Windy) = 0.971 - 0 = 0.971. Humidity is therefore chosen below the sunny branch and Windy below the rainy branch, giving the tree shown earlier.
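Continuing the earlier sketch (this snippet assumes the WEATHER list, ATTRS mapping, and gain function defined in the previous code block), the per-branch gains can be reproduced by filtering the rows first:

```python
# Reuses WEATHER, ATTRS and gain() from the previous sketch.
sunny = [r for r in WEATHER if r[0] == "Sunny"]
rainy = [r for r in WEATHER if r[0] == "Rainy"]

for name, idx in ATTRS.items():
    if name == "Outlook":
        continue  # Outlook has already been used on this path
    print(name, "sunny:", round(gain(sunny, idx), 3), "rainy:", round(gain(rainy, idx), 3))
# Expected: Temperature 0.571 / 0.02, Humidity 0.971 / 0.02, Windy 0.02 / 0.971
```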

The Over-fitting Issue Over-fitting occurs when decision rules that fit the training set accurately are created from an insufficient number of samples. As a result, these decision rules may not work well on more general cases.

Example of the Over-fitting Problem in Decision Tree Construction A node holding 11 "Yes" and 9 "No" samples (prediction = "Yes") is split on a binary attribute A_i (values 0 and 1), giving one child with 8 "Yes" and 9 "No" samples (prediction = "No") and another child with 3 "Yes" and 0 "No" samples (prediction = "Yes").

Hence, with the binary split, we gain more information. However, if we look at the pessimistic error rate, i.e. the upper bound of the confidence interval of the error rate, we may reach a different conclusion. With observed error rate f on N samples and a constant z determined by the confidence level, the pessimistic error rate is

  e = ( f + z^2/(2N) + z * sqrt( f/N - f^2/N + z^2/(4N^2) ) ) / ( 1 + z^2/N )

Note that the pessimistic error rate is a function of the confidence level used.

The pessimistic error rates under 95% confidence are computed with this formula for the parent node and for each of the two children.

Therefore, the average pessimistic error rate at the children (weighted by the number of samples in each child) turns out to be higher than the pessimistic error rate at the parent. Since the pessimistic error rate increases with the split, we do not want to keep the children. This practice is called "tree pruning".
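A small sketch of this check, under the assumption that the pessimistic error rate is the one-sided upper confidence bound used in C4.5-style pruning; the function name and the z value for "95% confidence" are illustrative choices, not taken from the slides.

```python
import math

def pessimistic_error(errors, n, z=1.645):
    """Upper bound of the confidence interval for the true error rate.

    errors : misclassified samples at the node
    n      : total samples at the node
    z      : normal quantile for the chosen confidence level
             (1.645 for a one-sided 95% bound -- an assumption here)
    """
    f = errors / n                       # observed error rate
    num = f + z * z / (2 * n) + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return num / (1 + z * z / n)

# Parent node: 11 "Yes", 9 "No" -> predicts "Yes", so 9 errors out of 20.
parent = pessimistic_error(9, 20)

# Children: (8 Yes, 9 No) predicts "No" -> 8 errors of 17; (3 Yes, 0 No) predicts "Yes" -> 0 errors of 3.
children = (17 / 20) * pessimistic_error(8, 17) + (3 / 20) * pessimistic_error(0, 3)

print(parent, children)  # prune the split if the children's weighted rate is the larger one
```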

Tree Pruning Based on the χ² Test of Independence For the split above we construct the corresponding contingency table:

         A_i=0   A_i=1   Total
Yes        3       8      11
No         0       9       9
Total      3      17      20

For this table the χ² statistic is about 2.89, which is below χ²(1, 0.05) = 3.84. Therefore, we should not split the subroot node if we require that the χ² statistic be larger than χ²(k, 0.05), where k is the number of degrees of freedom of the corresponding contingency table.
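A sketch of this pruning check in plain Python (the function name and the hard-coded 5% critical value are illustrative; scipy.stats.chi2_contingency with correction=False computes the same statistic):

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows, e.g. [[3, 8], [0, 9]] for Yes/No counts over A_i = 0, 1.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Table from the slide: Yes = (3, 8), No = (0, 9).
stat, dof = chi_square([[3, 8], [0, 9]])
print(stat, dof)          # ~2.89 with 1 degree of freedom
print(stat > 3.841)       # False: below the 5% critical value, so do not split
```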

Constructing Decision Trees Based on the χ² Test of Independence Using the following example, we can construct a contingency table accordingly. A node holds 75 "Yes"s out of 100 samples (prediction = "Yes"); splitting on A_i gives branches with 45 "Yes"s out of 50 samples (A_i=0), 20 "Yes"s out of 25 samples (A_i=1), and 10 "Yes"s out of 25 samples (A_i=2):

         A_i=0   A_i=1   A_i=2   Total
Yes        45      20      10      75
No          5       5      15      25
Total      50      25      25     100

Here the χ² statistic is about 22.7 with 2 degrees of freedom, well above χ²(2, 0.05) = 5.99. Therefore, we may say that the split is statistically robust.

Assume that we have another attribute A_j to consider. The same 100 samples (75 "Yes") split into 50 "Yes" out of 75 samples for A_j=0 and 25 "Yes" out of 25 samples for A_j=1:

         A_j=0   A_j=1   Total
Yes        50      25      75
No         25       0      25
Total      75      25     100

Now, both A_i and A_j pass our criterion. How should we make our selection? We can make our selection based on the significance levels of the two contingency tables.

The A_i split gives χ² ≈ 22.7 on 2 degrees of freedom, while the A_j split gives χ² ≈ 11.1 on 1 degree of freedom; the former corresponds to a smaller p-value, i.e. a higher significance level. Therefore, A_i is preferred over A_j.
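If scipy is available, the comparison can be scripted as follows. The table values come from the slides above; correction=False disables Yates' continuity correction so the 2x2 table is treated the same way as the 2x3 one, and the exact printed numbers should be treated as illustrative.

```python
from scipy.stats import chi2_contingency

# Split on A_i: Yes/No counts over A_i = 0, 1, 2.
ai = [[45, 20, 10],
      [ 5,  5, 15]]
# Split on A_j: Yes/No counts over A_j = 0, 1.
aj = [[50, 25],
      [25,  0]]

for name, table in [("A_i", ai), ("A_j", aj)]:
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    print(name, round(stat, 2), dof, p)

# A_i yields the smaller p-value (higher significance), so A_i is preferred.
```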

Termination of Split due to Low Significance Level If a subtree splits a node holding 15 "Yes"s out of 20 samples into branches with 9 "Yes"s out of 10 samples, 4 "Yes"s out of 5 samples, and 2 "Yes"s out of 5 samples, then χ² ≈ 4.53 < χ²(2, 0.05) = 5.99. In this case, we do not want to carry out the split.

A More Realistic Example and Some Remarks In the following example, a bank wants to derive a credit evaluation tree for future use based on the records of existing customers. As the data set shows, it is highly likely that the training data set contains inconsistencies. Furthermore, some values may be missing. Therefore, in most cases it is impossible to derive perfect decision trees, i.e. decision trees with 100% accuracy.

~continues

Education     Annual Income   Age      Own House   Sex      Credit ranking (Class)
College       High            Old      Yes         Male     Good
High school   -----           Middle   Yes         Male     Good
High school   Middle          Young    No          Female   Good
College       High            Old      Yes         Male     Poor
College       High            Old      Yes         Male     Good
College       Middle          Young    No          Female   Good
High school   High            Old      Yes         Male     Poor
College       Middle          -----                Female   Good
High school   Middle          Young    No          Male     Poor

~continues A quality measure of decision trees can be based on accuracy; there are alternative measures, depending on the nature of the application. Overfitting is a problem caused by making the derived decision tree fit the training set too closely. As a result, the decision tree may work less accurately in the real world.

~continues There are two situations in which overfitting may occur: an insufficient number of samples at the subroot, and attributes that are highly branched. A conventional practice for handling missing values is to treat them as possible attribute values; that is, each attribute has one additional attribute value corresponding to "missing".

Alternative Measures of Quality of Decision Trees The recall rate and precision are two widely used measures: recall = |C ∩ C'| / |C| and precision = |C ∩ C'| / |C'|, where C is the set of samples in the class and C' is the set of samples which the decision tree puts into the class.
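A minimal sketch of the two measures as set operations; the function name and the example sets are hypothetical, chosen only to illustrate the definitions.

```python
def recall_and_precision(actual_positives, predicted_positives):
    """Recall = |C ∩ C'| / |C|, precision = |C ∩ C'| / |C'|.

    actual_positives    : set C  -- samples that truly belong to the class
    predicted_positives : set C' -- samples the decision tree assigns to the class
    """
    hits = len(actual_positives & predicted_positives)
    recall = hits / len(actual_positives)
    precision = hits / len(predicted_positives)
    return recall, precision

# Hypothetical example: 10 true class members, the tree labels 8 samples as the class,
# 6 of which are correct.
C = set(range(10))
C_prime = set(range(4, 12))
print(recall_and_precision(C, C_prime))   # (0.6, 0.75)
```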

~continues A situation in which the recall rate is the main concern: “A bank wants to find all the potential credit card customers”. A situation in which precision is the main concern: “A bank wants to find a decision tree for credit approval.”