Decision Trees: Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning.


Decision Trees: Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning (avoiding overfitting through pruning; numeric and missing attributes).

Example of a Decision Tree. Example: learning to classify stars. [Diagram: the root node tests Luminosity (<= T1 vs. > T1); one branch leads to a node testing Mass (<= T2 vs. > T2); the three leaves assign Type A, Type B, and Type C.]

Short vs Long Hypotheses. We mentioned that the top-down, greedy approach to constructing decision trees encodes a preference for short hypotheses over long hypotheses. Why is this the right thing to do? Occam's Razor: prefer the simplest hypothesis that fits the data. The principle goes back to William of Occam (c. 1320) and has been the subject of great debate in the philosophy of science.

Issues in Decision Tree Learning. Practical issues in building a decision tree can be enumerated as follows: 1) How deep should the tree be? 2) How do we handle continuous attributes? 3) What is a good splitting function? 4) What happens when attribute values are missing? 5) How do we improve computational efficiency?

How deep should the tree be? Overfitting the Data. A tree overfits the data if we let it grow deep enough that it begins to capture "aberrations" in the data that harm its predictive power on unseen examples. [Diagram: a tree splitting on size and humidity grows extra branches at thresholds t2 and t3 to capture a few examples that are possibly just noise.]

Overfitting the Data: Definition. Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h' in H such that h has better classification accuracy than h' on the training data D but worse classification accuracy than h' on unseen (test) data. [Diagram: accuracy vs. size of the tree; accuracy on the training data keeps rising while accuracy on the testing data falls off in the overfitting region.]
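The definition can be made concrete with a toy numeric check (made-up data, not from the slides): h memorizes the training set, noise included, while h' is the simple underlying rule.

```python
# true concept: class 'A' if x < 3 else 'B'; (2, 'B') below is a noisy label
train = [(0, 'A'), (1, 'A'), (2, 'B'), (3, 'B'), (4, 'B'), (5, 'B')]
test  = [(0, 'A'), (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B'), (5, 'B')]

memorized = dict(train)

def h(x):               # h: memorizes the training data, noise included
    return memorized[x]

def h_prime(x):         # h': the simple threshold rule
    return 'A' if x < 3 else 'B'

def acc(f, data):
    return sum(f(x) == y for x, y in data) / len(data)

print(acc(h, train), acc(h_prime, train))   # 1.0 vs ~0.83: h looks better on D...
print(acc(h, test), acc(h_prime, test))     # ~0.83 vs 1.0: ...but is worse on unseen data
```

This is exactly the definition above: h beats h' on D yet loses to h' off D, so h overfits.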

Causes of Overfitting the Data. What causes a hypothesis to overfit the data? 1) Random errors or noise: examples have incorrect class labels or incorrect attribute values. 2) Coincidental patterns: by chance, examples seem to deviate from a pattern due to the small size of the sample. Overfitting is a serious problem that can cause strong performance degradation.

Solutions for Overfitting the Data. There are two main classes of solutions: 1) Stop growing the tree early, before it begins to overfit the data. In practice this solution is hard to implement because it is not clear what makes a good stopping point. 2) Grow the tree until the algorithm stops, even if the overfitting problem shows up; then prune the tree as a post-processing step. This method has found great popularity in the machine learning community.

Decision Tree Pruning. 1) Grow the tree to fit the training data. 2) Prune the tree to avoid overfitting the data.

Methods to Validate the New Tree. Training and Validation Set Approach: divide the dataset D into a training set TR and a validation set TE; build a decision tree on TR; test pruned trees on TE to decide the best final tree.

Training and Validation. There are two approaches: A. Reduced Error Pruning. B. Rule Post-Pruning. The dataset D is divided into a training set TR (normally 2/3 of D) and a validation set TE (normally 1/3 of D).
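The split can be sketched as follows (a shuffled 2/3 - 1/3 split; the function name and the fixed seed are arbitrary choices for illustration):

```python
import random

def split_dataset(examples, seed=0):
    """Shuffle, then split 2/3 / 1/3 into training (TR) and validation (TE)."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = 2 * len(shuffled) // 3      # integer arithmetic: first 2/3 go to TR
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))                # stand-in for 30 labeled examples
tr, te = split_dataset(data)
print(len(tr), len(te))               # 20 10
```

Shuffling before splitting matters: if the dataset is ordered by class, a plain prefix split would give TR and TE very different class distributions.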

Reduced Error Pruning. Main idea: 1) Consider all internal nodes in the tree. 2) For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it does not harm accuracy on the validation set. 3) Pick the node n* whose removal yields the best performance and prune its subtree. 4) Go back to (2) until no more improvements are possible.
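The loop above can be sketched in code. This is a minimal, hypothetical implementation (not from the slides): trees are simple Node objects, examples are (attribute-dict, label) pairs, and ties on validation accuracy are broken in favor of pruning, which is one common choice since it yields smaller trees at equal accuracy.

```python
class Node:
    def __init__(self, attr=None, label=None):
        self.attr = attr        # attribute tested at this node; None for a leaf
        self.branches = {}      # attribute value -> child Node
        self.label = label      # most common class among training examples here

def classify(node, example):
    while node.attr is not None:
        node = node.branches[example[node.attr]]
    return node.label

def accuracy(tree, examples):
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def internal_nodes(node):
    if node.attr is None:
        return []
    return [node] + [m for c in node.branches.values() for m in internal_nodes(c)]

def reduced_error_prune(tree, validation):
    while True:
        best, best_acc = None, accuracy(tree, validation)
        for n in internal_nodes(tree):
            attr, branches = n.attr, n.branches
            n.attr, n.branches = None, {}          # tentatively make n a leaf
            acc = accuracy(tree, validation)
            if acc >= best_acc:                    # prune on ties: prefer smaller trees
                best, best_acc = n, acc
            n.attr, n.branches = attr, branches    # undo the tentative prune
        if best is None:
            return tree
        best.attr, best.branches = None, {}        # commit the best prune

# tiny demo: the x2 split only captures noise, so pruning helps
leaf_a, leaf_a2, leaf_b = Node(label='A'), Node(label='A'), Node(label='B')
sub = Node(attr='x2', label='A'); sub.branches = {0: leaf_a2, 1: leaf_b}
root = Node(attr='x1', label='A'); root.branches = {0: leaf_a, 1: sub}
val = [({'x1': 0, 'x2': 0}, 'A'), ({'x1': 1, 'x2': 0}, 'A'), ({'x1': 1, 'x2': 1}, 'A')]
pruned = reduced_error_prune(root, val)
print(pruned.attr is None, accuracy(pruned, val))   # True 1.0
```

Each iteration removes at least one internal node, so the loop always terminates.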

Example. Original tree and possible trees after pruning. [Diagram.]

Example. Pruned tree and possible trees after the 2nd pruning. [Diagram.]

Example. The process continues until no improvement is observed on the validation set. [Diagram: accuracy on the validation data vs. size of the tree; stop pruning the tree at the peak.]

Reduced Error Pruning: Disadvantages. If the original dataset is small, separating examples out for validation may leave you with few examples for training. [Diagram: with a small dataset, the training set is too small, and so is the validation set.]

Rule Post-Pruning. Main idea: 1) Convert the tree into a rule-based system. 2) Prune every single rule first by removing redundant conditions. 3) Sort the rules by accuracy.

Example. [Original tree: the root tests x1; the ~x1 branch tests x2 (leaves A, B); the x1 branch tests x3 (leaves A, C).] Rules: ~x1 & ~x2 -> Class A; ~x1 & x2 -> Class B; x1 & ~x3 -> Class A; x1 & x3 -> Class C. Possible rules after pruning (based on the validation set): ~x1 -> Class A; ~x1 & x2 -> Class B; ~x3 -> Class A; x1 & x3 -> Class C.
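The pruning of individual rule conditions can be sketched as follows; this is a hypothetical example, with rules represented as lists of (attribute, value) conditions and a made-up validation set. A condition is dropped whenever doing so does not lower the rule's precision on the validation set, which again breaks ties toward shorter rules.

```python
def covers(conds, example):
    return all(example.get(a) == v for a, v in conds)

def precision(conds, label, examples):
    """Fraction of covered validation examples that carry the rule's label."""
    matched = [y for x, y in examples if covers(conds, x)]
    return sum(y == label for y in matched) / len(matched) if matched else 0.0

def prune_rule(conds, label, validation):
    conds = list(conds)
    improved = True
    while improved:
        improved = False
        for c in list(conds):
            trial = [d for d in conds if d != c]   # try dropping one condition
            if precision(trial, label, validation) >= precision(conds, label, validation):
                conds, improved = trial, True
                break
    return conds

# rule "~x1 & ~x2 -> Class A" with a made-up validation set (0 = false, 1 = true)
validation = [({'x1': 0, 'x2': 0, 'x3': 0}, 'A'),
              ({'x1': 0, 'x2': 1, 'x3': 0}, 'A'),
              ({'x1': 1, 'x2': 0, 'x3': 1}, 'C')]
print(prune_rule([('x1', 0), ('x2', 0)], 'A', validation))   # [('x1', 0)], i.e. ~x1 -> Class A
```

Note that each rule is pruned independently, which is why the pruned rule set can be more general than any single tree.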

Advantages of Rule Post-Pruning. The language is more expressive. It improves interpretability. Pruning is more flexible. In practice this method yields high-accuracy performance.

Decision Trees: Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning (avoiding overfitting through pruning; numeric and missing attributes).

Discretizing Continuous Attributes. Example: attribute temperature. 1) Order all values in the training set. 2) Consider only those cut points where there is a change of class. 3) Choose the cut point that maximizes information gain. [Diagram: class labels plotted along the temperature axis.]
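The three steps above can be sketched in code. This is a minimal sketch with made-up temperature data; it assumes attribute values at class boundaries are distinct, and takes candidate cuts as midpoints between consecutive values where the class changes.

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    pairs = sorted(zip(values, labels))               # 1) order all values
    base, n = entropy(labels), len(pairs)
    best_gain, best_cut = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                                  # 2) only class-change boundaries
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if gain > best_gain:                          # 3) maximize information gain
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

temps = [40, 48, 60, 72, 80, 90]
play  = ['no', 'no', 'yes', 'yes', 'yes', 'no']
print(best_cut_point(temps, play))   # cut 54.0, gain ~0.459
```

Here the two candidate boundaries are 54 (between 48/no and 60/yes) and 85 (between 80/yes and 90/no); the first yields the larger gain, so "temperature <= 54" becomes the binary split.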

Claude Shannon (1916-2001). Founded information theory in 1948 with his paper "A Mathematical Theory of Communication". Awarded the Alfred Noble Prize for his master's thesis. Worked at MIT and Bell Labs. Met with Alan Turing, Marvin Minsky, John von Neumann, and Albert Einstein. Creator of the "Ultimate Machine".

Missing Attribute Values. Suppose we are at a node n in the decision tree classifying example X = (luminosity > T1, mass = ?). Different approaches: 1) Assign the most common value for that attribute in node n. 2) Assign the most common value in n among examples with the same classification as X. 3) Assign a probability to each value of the attribute based on the frequency of those values in node n; each fraction is propagated down the tree.
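Approach (3) can be sketched as follows, with hypothetical counts: the example's weight is divided in proportion to each attribute value's frequency at the node, and each fraction continues down the corresponding branch.

```python
def split_fractions(weight, value_counts):
    """Split an example's weight across attribute values, in proportion
    to each value's frequency among training examples at this node."""
    total = sum(value_counts.values())
    return {v: weight * c / total for v, c in value_counts.items()}

# X = (luminosity > T1, mass = ?): at the node splitting on mass, suppose
# 60 training examples had mass <= T2 and 40 had mass > T2 (made-up counts)
fractions = split_fractions(1.0, {'mass<=T2': 60, 'mass>T2': 40})
print(fractions)   # {'mass<=T2': 0.6, 'mass>T2': 0.4}
```

At a leaf, the fractions reaching each class are summed and the class with the largest total weight is predicted.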

Summary. Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis. The hypothesis space is very powerful: all possible DNF formulas. We prefer shorter trees over larger trees. Overfitting is an important issue in decision-tree induction. Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning. Techniques exist to deal with continuous attributes and missing attribute values.