Decision Tree Learning


Decision Tree Learning (Chapter 3): decision tree representation; the ID3 learning algorithm; entropy and information gain; overfitting.

Review example: Image Categorization (two phases). Training: training images and their labels are turned into image features and used for classifier training, producing a trained classifier. Testing: a test image is turned into image features and passed to the trained classifier, which outputs a prediction (e.g. “Outdoor”).

Inductive Learning: learning a function from examples. Occam’s Razor: prefer the simplest hypothesis consistent with the data. Decision tree learning is one of the most widely used inductive learning methods.

Decision Tree Example. (Figure: a tree over the attributes Color {red, green, blue}, Shape {round, square}, and Size {big, small}; + examples are filled with blue, - examples with red.) Each internal node corresponds to a test, each branch corresponds to one outcome of the test, and each leaf node assigns a classification.

PlayTennis: Training Examples. (Table of samples, each described by attribute values and a target attribute, PlayTennis.)

A Decision Tree for the concept PlayTennis. The root tests Outlook (Sunny / Overcast / Rain): the Sunny branch tests Humidity (High → No, Normal → Yes), the Overcast branch is Yes, and the Rain branch tests Wind (Strong → No, Weak → Yes). Key questions: finding the most suitable attribute for the root, and identifying redundant attributes, like Temperature.

Converting a tree to rules

Decision trees can represent any Boolean function. For the PlayTennis tree: If (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak) then Yes. “A disjunction of conjunctions of constraints on attribute values.”
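Written as code, that rule is just the disjunction of conjunctions. A minimal sketch; the attribute names follow the PlayTennis example, and the function itself is illustrative, not part of the slides:

```python
def play_tennis(outlook, humidity, wind):
    # Disjunction of conjunctions read off the PlayTennis tree.
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "High", "Weak"))       # False
print(play_tennis("Overcast", "High", "Strong"))  # True
```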

Decision trees can represent any Boolean function, but in the worst case this requires exponentially many nodes; XOR (parity) is an extreme case.

Decision tree, decision boundaries

Decision Regions

Decision Trees: one of the most widely used and practical methods for inductive inference. They approximate discrete-valued functions and can be extended to continuous-valued functions. They can be used for classification (most common) or for regression problems. (Regression: analyzing the relationship between two continuous variables, e.g. fitting a line to a set of points.)

Decision Trees for Regression (Continuous values)

Divide and Conquer. Internal decision nodes: univariate tests use a single attribute xi (discrete xi: an n-way split for n possible values; continuous xi: a binary split such as xi > wm); multivariate tests use more than one attribute. Leaves: class labels for classification, numeric values for regression. Once the tree is trained, a new instance is classified by starting at the root and following the path dictated by the test results for that instance.
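A minimal sketch of that classification walk over a hand-built univariate tree; the node layout and field names are assumptions for illustration:

```python
class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at an internal node
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label if this is a leaf

def classify(node, instance):
    # Start at the root and follow the branch matching each test result.
    while node.label is None:
        node = node.children[instance[node.attribute]]
    return node.label

# The PlayTennis tree from the earlier slide, built by hand.
tree = Node("Outlook", {
    "Sunny": Node("Humidity", {"High": Node(label="No"),
                               "Normal": Node(label="Yes")}),
    "Overcast": Node(label="Yes"),
    "Rain": Node("Wind", {"Strong": Node(label="No"),
                          "Weak": Node(label="Yes")}),
})
print(classify(tree, {"Outlook": "Rain", "Wind": "Weak"}))  # Yes
```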

Decision tree learning algorithm. Learning process: finding the tree from the training set. For a given training set there are many trees that encode it without any error. Finding the smallest tree is NP-complete (Quinlan 1986), hence we are forced to use some (local) search algorithm to find reasonable solutions.

ID3: the basic decision tree learning algorithm. Basic idea: a decision tree can be constructed by considering the attributes of instances one by one. Which attribute should be considered first? The height of a decision tree depends on the order in which attributes are considered. ==> Entropy

Which attribute is “best”? Two candidate splits of S = [29+, 35-]: A1=? with branches True: [21+, 5-] and False: [8+, 30-]; A2=? with branches True: [18+, 33-] and False: [11+, 2-]. Entropy: large entropy => more information.

Entropy. Entropy(S) = -p+ log2 p+ - p- log2 p-, where S is the set of training examples, p+ is the proportion of positive examples, and p- is the proportion of negative examples. Exercise: calculate the entropy in two cases (note that 0 log2 0 = 0): (a) p+ = 0.5, p- = 0.5; (b) p+ = 1, p- = 0. Question: why the minus signs in the equation?
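A quick check for the exercise above; a minimal sketch, not part of the slides, using the convention that terms with p = 0 (and p = 1) contribute nothing:

```python
import math

def entropy(p_pos):
    p_neg = 1.0 - p_pos
    # Drop terms where p*log2(p) is 0 anyway (p = 0 or p = 1).
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if 0 < p < 1)

print(entropy(0.5))  # 1.0 -> maximal uncertainty
print(entropy(1.0))  # 0   -> no uncertainty
```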

Entropy: a measure of uncertainty. Example from information theory: the probability of heads for a fair coin versus a coin with two heads. High entropy = high uncertainty = a more random phenomenon = more information = more bits needed to encode it.

Entropy. For multi-class problems with c categories, entropy generalizes to Entropy(S) = -Σ (i = 1..c) p_i log2 p_i, where p_i is the proportion of examples in S belonging to class i. Q: why log2?

Information Gain. Gain(S, A): the expected reduction in entropy due to sorting S on attribute A: Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|S_v| / |S|) · Entropy(S_v), where S_v is the subset of S having value v for attribute A.
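A small sketch of the same computation over (positive, negative) counts, reproducing the worked example on the next slides; the function and variable names are illustrative assumptions:

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total)
                for c in (pos, neg) if c > 0)

def gain(parent, subsets):
    """parent and subsets are (pos, neg) pairs; the subsets partition the parent."""
    total = sum(parent)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n)
                                  for p, n in subsets)

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # A1: 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # A2: 0.12
```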

Information Gain. For the two candidate splits shown earlier: Entropy(S) = ?, Gain(S, A1) = ?, Gain(S, A2) = ? (A1 splits [29+, 35-] into [21+, 5-] and [8+, 30-]; A2 splits it into [18+, 33-] and [11+, 2-].) Entropy([29+, 35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99

Information Gain (cont’d). For A2: Entropy([18+, 33-]) = 0.94 and Entropy([11+, 2-]) = 0.62, so Gain(S, A2) = Entropy(S) - 51/64 · Entropy([18+, 33-]) - 13/64 · Entropy([11+, 2-]) = 0.12. For A1: Entropy([21+, 5-]) = 0.71 and Entropy([8+, 30-]) = 0.74, so Gain(S, A1) = Entropy(S) - 26/64 · Entropy([21+, 5-]) - 38/64 · Entropy([8+, 30-]) = 0.27. A1 has the higher gain, so it is placed higher in the tree.

ID3 for the PlayTennis example

ID3: The Basic Decision Tree Learning Algorithm. What is the “best” attribute? (“best” = the one with the highest information gain.) Answer: Outlook.

ID3 (cont’d). (Figure: after splitting on Outlook, the training examples D1–D14 are distributed among the Sunny, Overcast, and Rain branches; the Overcast branch is already pure and labelled Yes.) What are the “best” next attributes for the remaining branches? Humidity and Wind.

PlayTennis Decision Tree. Outlook? Sunny → Humidity? (High → No, Normal → Yes); Overcast → Yes; Rain → Wind? (Strong → No, Weak → Yes).

Stopping criteria: each leaf node contains examples of only one class, or the algorithm has run out of attributes.

ID3
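A compact sketch of the ID3 procedure outlined in the preceding slides (recursive, greedy attribute selection by information gain); the helper names and data layout are illustrative assumptions, not the textbook's pseudocode:

```python
from collections import Counter
import math

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    total = len(labels)
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    # Stopping criteria: all examples of one class, or no attributes left.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedily pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree
```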

Overfitting in Decision Trees. Why “over”-fitting? A model can become more complex than the true target function (concept) when it tries to satisfy noisy data as well. (Figure: accuracy on the training data versus accuracy on the test data as a function of hypothesis complexity.)

Overfitting in Decision Trees

Overfitting Example: testing Ohm’s Law, V = IR. Experimentally measure 10 points (current I versus voltage V) and fit a curve to the resulting data. A 9th-degree polynomial fits the training data perfectly (a polynomial of degree n-1 can fit n points exactly). “Ohm was wrong, we have found a more accurate function!”

Overfitting Example: testing Ohm’s Law, V = IR (cont’d). Better generalization is obtained with a linear function that fits the training data less accurately.

Avoiding overfitting the data. How can we avoid overfitting? There are two approaches: (1) early stopping: stop growing the tree before it perfectly classifies the training data; (2) pruning: grow the full tree, then prune it (reduced-error pruning, rule post-pruning). The pruning approach is found to be more useful in practice.
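As one concrete illustration, a hedged sketch of reduced-error pruning over the nested-dict trees produced by the id3() sketch above. Passing a single majority label in is a simplification: Mitchell's version replaces each node with the most common class of the training examples at that node.

```python
def classify(tree, example):
    # A leaf is just a class label; an internal node is {attribute: {value: subtree}}.
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, examples, labels):
    return sum(classify(tree, ex) == lab
               for ex, lab in zip(examples, labels)) / len(labels)

def prune(tree, val_examples, val_labels, majority_label):
    if not isinstance(tree, dict):             # already a leaf
        return tree
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():  # prune children bottom-up
        tree[attr][value] = prune(subtree, val_examples, val_labels,
                                  majority_label)
    # Replace this subtree by a leaf if validation accuracy does not decrease.
    if accuracy(majority_label, val_examples, val_labels) >= \
       accuracy(tree, val_examples, val_labels):
        return majority_label
    return tree
```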

Other issues in decision tree learning: incorporating continuous-valued attributes; alternative measures for selecting attributes; handling training examples with missing attribute values; handling attributes with different costs.
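For the first issue, continuous-valued attributes are commonly handled by trying binary thresholds: sort the examples by the attribute and consider candidate thresholds at midpoints between adjacent values where the class label changes. A small sketch (the example values below are illustrative):

```python
def candidate_thresholds(values, labels):
    pairs = sorted(zip(values, labels))
    # Midpoints between adjacent values whose labels differ.
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]]

print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                           ["No", "No", "Yes", "Yes", "Yes", "No"]))  # [54.0, 85.0]
```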

Strengths and Advantages of Decision Trees: rule extraction from trees; a decision tree can be used for feature extraction (e.g. seeing which features are useful); interpretability (human experts may verify and/or discover patterns); it is a compact and fast classification method.

Your Assignments. HW1 is uploaded; due date: 94/08/14. Proposal: same due date, one page maximum. Include the following information: project title; data set; project idea (approximately two paragraphs); software you will need to write; papers to read (include 1–3 relevant papers); teammate (if any) and work division. We expect projects done in a group to be more substantial than projects done individually.