Fall 2004 TDIDT Learning CS478 - Machine Learning

Decision Tree
Internal nodes → tests on some property
Branches from internal nodes → values of the associated property
Leaf nodes → classifications
An individual is classified by traversing the tree from its root to a leaf

Sample Decision Tree

Decision Tree Learning
Learning consists of constructing a decision tree that allows the classification of objects.
Given a set of training instances, a decision tree is said to represent the classifications if it properly classifies all of the training instances (i.e., it is consistent).

TDIDT
Function Induce-Tree(Example-set, Properties)
  If all elements in Example-set are in the same class, then return a leaf node labeled with that class
  Else if Properties is empty, then return a leaf node labeled with the majority class in Example-set
  Else
    Select P from Properties (*)
    Remove P from Properties
    Make P the root of the current tree
    For each value V of P
      Create a branch of the current tree labeled by V
      Partition_V ← Elements of Example-set with value V for P
      Induce-Tree(Partition_V, Properties)
      Attach the result to branch V
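A minimal Python sketch of the Induce-Tree pseudocode above, assuming each example is a dict of property → value pairs plus a "class" key; the names (induce_tree, majority_class, CLASS_KEY) and the pluggable select criterion are illustrative, not from the original slides:

```python
from collections import Counter

CLASS_KEY = "class"   # assumed key holding each example's label

def majority_class(examples):
    """Most frequent class label in the example set."""
    return Counter(e[CLASS_KEY] for e in examples).most_common(1)[0][0]

def induce_tree(examples, properties, select):
    """Recursively build a decision tree as nested dicts {property: {value: subtree_or_leaf}}.

    `select(examples, properties)` is the pluggable choice marked (*) above;
    ID3 would pick the property with the greatest information gain.
    """
    classes = {e[CLASS_KEY] for e in examples}
    if len(classes) == 1:                   # all examples in the same class: leaf
        return classes.pop()
    if not properties:                      # no properties left: majority-class leaf
        return majority_class(examples)

    p = select(examples, properties)        # Select P from Properties (*)
    remaining = [q for q in properties if q != p]
    tree = {p: {}}                          # P is the root of the current (sub)tree
    for v in {e[p] for e in examples}:      # one branch per observed value V of P
        partition_v = [e for e in examples if e[p] == v]
        tree[p][v] = induce_tree(partition_v, remaining, select)
    return tree
```

Plugged together with an information-gain criterion (sketched after the Information Gain slide below), this recovers ID3.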

Illustrative Training Set

ID3 Example (I)

ID3 Example (II)

ID3 Example (III)

Non-Uniqueness
Decision trees are not unique: given a set of training instances, there generally exist many decision trees that represent the classifications.
The learning problem states that we should seek not only consistency but also generalization. So, …

TDIDT’s Question
Given a training set, which of all the decision trees consistent with that training set has the greatest likelihood of correctly classifying unseen instances of the population?

ID3’s (Approximate) Bias
ID3 (and its family) prefers the simplest decision tree that is consistent with the training set.
Occam’s Razor Principle: “It is vain to do with more what can be done with less... Entities should not be multiplied beyond necessity.”
That is, always accept the simplest answer that fits the data / avoid unnecessary constraints.

ID3’s Property Selection
Each property of an instance may be thought of as contributing a certain amount of information to its classification. For example, in determining the shape of an object, the number of sides contributes a certain amount of information to the goal, while color contributes a different amount.
ID3 measures the information gained by making each property the root of the current subtree and subsequently chooses the property that produces the greatest information gain.

Discussion (I)
In terms of learning as search, ID3 works as follows:
Search space = the set of all possible decision trees
Operations = adding tests to a tree
Form of hill-climbing: ID3 adds a subtree to the current tree and continues its search (no backtracking, so it is susceptible to local minima)
It follows that ID3 is very efficient, but its performance depends on the criteria for selecting properties to test (and their form).

Discussion (II)
ID3 handles only discrete attributes. Extensions to numerical attributes have been proposed, the best known being C4.5 (and its successor C5.0).
Experience shows that TDIDT learners tend to produce very good results on many problems.
Trees are most attractive when end users want interpretable knowledge from their data.

Entropy (I)
Let S be a set of examples from c classes. The entropy of S is:
Entropy(S) = − Σ_{i=1..c} p_i log2(p_i)
where p_i is the proportion of examples of S belonging to class i. (Note: we define 0 log 0 = 0.)

Entropy (II)
Intuitively, the smaller the entropy, the purer the partition.
Based on Shannon’s information theory (c = 2):
If p1 = 1 (resp. p2 = 1), then the receiver knows the example is positive (resp. negative). No message need be sent.
If p1 = p2 = 0.5, then the receiver needs to be told the class of the example. A 1-bit message must be sent.
If 0 < p1 < 1 (and p1 ≠ 0.5), then the receiver needs less than 1 bit on average to know the class of the example.
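As a hedged illustration (not part of the slides), entropy can be computed directly from a list of class labels; the printed checks mirror the three cases above:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, in bits (0·log 0 treated as 0)."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)

print(entropy(["+"] * 10))             # 0.0   -> pure set, no message needed
print(entropy(["+"] * 5 + ["-"] * 5))  # 1.0   -> a full 1-bit message is needed
print(entropy(["+"] * 9 + ["-"] * 1))  # ~0.47 -> less than 1 bit on average
```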

Information Gain
Let p be a property with n outcomes. The information gained by partitioning a set S according to p is:
Gain(S, p) = Entropy(S) − Σ_{i=1..n} (|S_i| / |S|) Entropy(S_i)
where S_i is the subset of S for which property p has its ith value.
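Continuing the same illustrative Python (assumed names, reusing entropy() from the previous sketch), the gain of a property is the entropy of the set minus the weighted entropies of its partitions:

```python
def information_gain(examples, prop, class_key="class"):
    """Gain(S, p): Entropy(S) minus the weighted entropy of the partitions S_i."""
    n = len(examples)
    remainder = 0.0
    for v in {e[prop] for e in examples}:               # one partition S_i per value of p
        subset = [e[class_key] for e in examples if e[prop] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy([e[class_key] for e in examples]) - remainder

# ID3's selection rule (*) can then be plugged into induce_tree as:
# select = lambda exs, props: max(props, key=lambda p: information_gain(exs, p))
```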

Play Tennis
What is the ID3-induced tree?
[Training table omitted: attributes OUTLOOK (Sunny, Overcast, Rain), TEMPERATURE (Hot, Mild, Cool), HUMIDITY (High, Normal), WIND (Weak, Strong); target PLAY TENNIS (Yes, No).]

ID3’s Splitting Criterion
The objective of ID3 at each split is to increase information gain or, equivalently, to lower entropy. It does so as much as possible.
Pros: easy to do
Cons: may lead to overfitting

Overfitting
Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h’ ∈ H such that h has smaller error than h’ over the training examples, but h’ has smaller error than h over the entire distribution of instances.

Avoiding Overfitting
Two alternatives:
Stop growing the tree before it begins to overfit (e.g., when the data split is not statistically significant)
Grow the tree to full (overfitting) size and post-prune it
Either way, when do I stop? What is the correct final tree size?

Approaches
Use only the training data and a statistical test to estimate whether expanding/pruning is likely to produce an improvement beyond the training set
Use MDL to minimize size(tree) + size(misclassifications(tree))
Use a separate validation set to evaluate the utility of pruning
Use richer node conditions and accuracy

Reduced Error Pruning
Split the dataset into training and validation sets
Induce a full tree from the training set
While the accuracy on the validation set increases:
  Evaluate the impact of pruning each subtree, replacing its root by a leaf labeled with the majority class for that subtree
  Remove the subtree whose removal most increases validation set accuracy (greedy approach)
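A minimal sketch of this pruning loop over the dict-based trees produced by the earlier induce_tree sketch; classify, subtrees, and replace_subtree are assumed helper names, not from the slides:

```python
def classify(tree, example):
    """Traverse the tree from its root to a leaf (a class label)."""
    while isinstance(tree, dict):
        prop = next(iter(tree))
        tree = tree[prop].get(example[prop])   # None if this value was never seen in training
    return tree

def accuracy(tree, examples, class_key="class"):
    return sum(classify(tree, e) == e[class_key] for e in examples) / len(examples)

def subtrees(tree, path=()):
    """Yield (path, subtree) for every internal node; path is a tuple of (property, value) tests."""
    if isinstance(tree, dict):
        yield path, tree
        prop = next(iter(tree))
        for v, child in tree[prop].items():
            yield from subtrees(child, path + ((prop, v),))

def replace_subtree(tree, path, leaf):
    """Return a copy of `tree` with the subtree at `path` replaced by a leaf label."""
    if not path:
        return leaf
    prop, v = path[0]
    branches = dict(tree[prop])
    branches[v] = replace_subtree(branches[v], path[1:], leaf)
    return {prop: branches}

def reduced_error_prune(tree, train, validation):
    """Greedily replace subtrees by majority-class leaves while validation accuracy improves."""
    best_score = accuracy(tree, validation)
    while True:
        best_candidate = None
        for path, _ in subtrees(tree):
            reaching = [e for e in train if all(e[p] == v for p, v in path)]
            if not reaching:
                continue
            candidate = replace_subtree(tree, path, majority_class(reaching))
            score = accuracy(candidate, validation)
            if score > best_score:             # the pruning that most increases validation accuracy
                best_score, best_candidate = score, candidate
        if best_candidate is None:             # no pruning helps any more: stop
            return tree
        tree = best_candidate
```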

Rule Post-pruning
Split the dataset into training and validation sets
Induce a full tree from the training set
Convert the tree into an equivalent set of rules (one rule per root-to-leaf path)
For each rule, remove any preconditions whose removal increases the rule’s accuracy on the validation set
Sort the rules by estimated accuracy
Classify new examples using the resulting ordered set of rules
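The conversion step can be sketched over the same dict trees, turning each root-to-leaf path into one rule (assumed helper name, not from the slides):

```python
def tree_to_rules(tree, preconditions=()):
    """Return a list of (preconditions, class) pairs, one rule per root-to-leaf path."""
    if not isinstance(tree, dict):                     # leaf: IF preconditions THEN class
        return [(list(preconditions), tree)]
    prop = next(iter(tree))
    rules = []
    for v, child in tree[prop].items():
        rules.extend(tree_to_rules(child, preconditions + ((prop, v),)))
    return rules

# Pruning then tries dropping each (property, value) precondition of a rule and keeps the
# drop whenever the rule's accuracy on the validation set does not decrease.
```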

Discussion
Reduced-error pruning produces the smallest version of the most accurate subtree.
Rule post-pruning is more fine-grained and is possibly the most widely used method.
In all cases, pruning based on a validation set is problematic when the amount of available data is limited.

Accuracy vs Entropy
ID3 uses entropy to build the tree and accuracy to prune it.
Why not use accuracy in the first place? How? How does it compare with entropy? Is there a way to make it work?

Other Issues
The text briefly discusses the following aspects of decision tree learning:
Continuous-valued attributes
Alternative splitting criteria (e.g., for attributes with many values)
Accounting for costs

Unknown Attribute Values
Alternatives:
Remove examples with missing attribute values
Treat a missing value as a distinct, special value of the attribute
Replace a missing value with the most common value of the attribute: overall, at node n, or at node n among examples with the same class label
Use probabilities
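The "most common value" alternatives can be illustrated with a small helper (assumed names; the "at node n" variants correspond to passing only the examples that reach that node):

```python
from collections import Counter

def most_common_value(examples, prop, class_label=None, class_key="class"):
    """Most frequent observed value of `prop`, optionally restricted to one class label."""
    observed = [e[prop] for e in examples
                if e.get(prop) is not None
                and (class_label is None or e[class_key] == class_label)]
    return Counter(observed).most_common(1)[0][0]

def fill_missing(examples, prop, per_class=False, class_key="class"):
    """Return copies of the examples with missing values of `prop` filled in."""
    result = []
    for e in examples:
        if e.get(prop) is None:
            label = e[class_key] if per_class else None
            e = {**e, prop: most_common_value(examples, prop, label, class_key)}
        result.append(e)
    return result
```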