Decision Trees: Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning

Decision Trees. Outline: Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning (avoiding overfitting through pruning; numeric and missing attributes).

Illustration Example: Learning to classify stars. [Decision tree figure: the root node tests Luminosity (> r1 vs. <= r1); one branch leads to a node testing Mass (> r2 vs. <= r2); the leaves are the classes Type A, Type B, and Type C.]

Definition. A decision-tree learning algorithm approximates a target concept using a tree representation, where each internal node corresponds to an attribute and every terminal node corresponds to a class. There are two types of nodes: Internal node: splits into different branches according to the different values the corresponding attribute can take (example: luminosity > r1 or luminosity <= r1). Terminal node: decides the class assigned to the example.
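One minimal way to represent such a tree in code is sketched below (Python; the node classes, attribute names, and thresholds are illustrative assumptions, not part of these slides):

```python
# A minimal sketch, assuming attribute tests of the form "value <= threshold"
# as in the star example above. Names and thresholds are illustrative.

class InternalNode:
    """Splits into branches according to the value of one attribute."""
    def __init__(self, attribute, threshold, left, right):
        self.attribute = attribute   # e.g. "luminosity"
        self.threshold = threshold   # e.g. r1
        self.left = left             # subtree followed when value <= threshold
        self.right = right           # subtree followed when value > threshold

class TerminalNode:
    """Decides the class assigned to the example."""
    def __init__(self, label):
        self.label = label           # e.g. "Type C"

def classify(node, example):
    """Follow the branches that match the example until a terminal node is reached."""
    while isinstance(node, InternalNode):
        value = example[node.attribute]
        node = node.left if value <= node.threshold else node.right
    return node.label

# Hypothetical use, with made-up thresholds r1 = 10 and r2 = 5:
tree = InternalNode("luminosity", 10,
                    left=TerminalNode("Type C"),
                    right=InternalNode("mass", 5,
                                       left=TerminalNode("Type B"),
                                       right=TerminalNode("Type A")))
print(classify(tree, {"luminosity": 12, "mass": 3}))   # -> "Type B"
```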

Classifying Examples. [Same decision tree: Luminosity (> r1 / <= r1) at the root, Mass (> r2 / <= r2) below it, leaves Type A, Type B, Type C.] A new example X, described by its Luminosity and Mass values, is classified by following the branches whose tests it satisfies until a leaf is reached; that leaf gives the assigned class.

Appropriate Problems for Decision Trees:
- Attributes are both numeric and nominal.
- Target function takes on a discrete number of values.
- Data may have errors.
- Some examples may have missing attribute values.

Decision Trees. Outline (revisited): Definition, Mechanism, Splitting Functions, Issues in Decision-Tree Learning (avoiding overfitting through pruning; numeric and missing attributes).

Historical Information. Ross Quinlan – Induction of Decision Trees. Machine Learning 1(1): 81-106, 1986 (over 8,000 citations).

Historical Information Leo Breiman – CART (Classification and Regression Trees), 1984.

Mechanism. There are different ways to construct trees from data. We will concentrate on the top-down, greedy search approach. Basic idea:
1. Choose the best attribute a* to place at the root of the tree.
2. Separate the training set D into subsets {D1, D2, ..., Dk}, where each subset Di contains examples having the same value for a*.
3. Recursively apply the algorithm on each new subset until examples have the same class or there are few of them.
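Step 2, partitioning D by the values of a*, can be sketched as follows (assumed Python; the toy records are illustrative, not from the slides):

```python
from collections import defaultdict

def partition_by_attribute(examples, attribute):
    """Separate examples into subsets D1..Dk, one per distinct value of the attribute."""
    subsets = defaultdict(list)
    for example in examples:
        subsets[example[attribute]].append(example)
    return dict(subsets)

# Toy illustration: split on "size"
D = [{"size": "> r1", "class": "P"},
     {"size": "<= r1", "class": "NP"},
     {"size": "> r1", "class": "P"}]
print(partition_by_attribute(D, "size"))   # two subsets, one per size value
```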

Illustration. Attributes: size and humidity. Size has two values: > r1 or <= r1. Humidity has three values: > r2, (> r3 and <= r2), or <= r3. [Figure: examples plotted in the size-humidity plane, with threshold r1 on the size axis and thresholds r2, r3 on the humidity axis. Class P: poisonous; Class N: not-poisonous.]

Illustration. Suppose we choose size as the best attribute. [Figure: the root node tests size; the > r1 branch is a leaf labeled P, while the <= r1 branch is still undecided (marked "?"). Class P: poisonous; Class N: not-poisonous.]

Illustration. Suppose we choose humidity as the next best attribute. [Figure: the root node tests size; the > r1 branch is a leaf labeled P; the <= r1 branch tests humidity, whose three branches (> r2; > r3 and <= r2; <= r3) lead to leaves labeled P, NP, and NP.]

Formal Mechanism.
Create a root for the tree.
If all examples are of the same class, or the number of examples is below a threshold, return that class.
If no attributes are available, return the majority class.
Let a* be the best attribute.
For each possible value v of a*:
  Add a branch below a* labeled "a* = v".
  Let Sv be the subset of examples where attribute a* = v.
  Recursively apply the algorithm to Sv.
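The same procedure as a compact sketch (assumed Python; the list-of-dicts data layout and the choose_attribute parameter, which would be one of the splitting functions discussed next, are assumptions rather than part of the original slides):

```python
from collections import Counter

def build_tree(examples, attributes, choose_attribute, min_examples=5):
    """Top-down greedy construction following the steps listed above.
    examples: list of dicts mapping attribute names (plus "class") to values.
    choose_attribute(examples, attributes): returns the best attribute a*."""
    labels = [e["class"] for e in examples]
    majority = Counter(labels).most_common(1)[0][0]

    # Return a class (terminal node) if examples are pure, too few, or no attributes remain.
    if len(set(labels)) == 1 or len(examples) < min_examples or not attributes:
        return majority

    a_star = choose_attribute(examples, attributes)
    tree = {a_star: {}}
    for v in {e[a_star] for e in examples}:              # one branch per value "a* = v"
        S_v = [e for e in examples if e[a_star] == v]
        remaining = [a for a in attributes if a != a_star]
        tree[a_star][v] = build_tree(S_v, remaining, choose_attribute, min_examples)
    return tree
```

Passing a splitting function based on the entropy measures introduced next (e.g. information gain) turns this skeleton into an ID3-style learner.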

What attribute is the best to split the data? Let us recall some definitions from information theory. A measure of uncertainty or entropy associated with a random variable X is defined as H(X) = - Σ pi log pi, where the logarithm is in base 2. This is the "average amount of information or entropy of a finite complete probability scheme" (An Introduction to Information Theory, F. Reza).

There are two possible complete events A and B (example: flipping a biased coin).
P(A) = 1/256, P(B) = 255/256: H(X) ≈ 0.0369 bit.
P(A) = 1/2, P(B) = 1/2: H(X) = 1 bit.
P(A) = 7/16, P(B) = 9/16: H(X) ≈ 0.989 bit.
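A quick numerical check of these values, as a small Python sketch (not part of the original slides):

```python
import math

def entropy(probabilities):
    """H(X) = -sum(p * log2 p), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1/256, 255/256]))   # ~0.0369 bit
print(entropy([1/2, 1/2]))         # 1.0 bit
print(entropy([7/16, 9/16]))       # ~0.989 bit
```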

Entropy is a function concave downward. [Plot: H(p) = -p log2 p - (1-p) log2(1-p) for p between 0 and 1; the maximum is 1 bit at p = 1/2.]

Illustration (setup repeated). Attributes: size and humidity. Size has two values: > r1 or <= r1. Humidity has three values: > r2, (> r3 and <= r2), or <= r3. [Figure: examples in the size-humidity plane; Class P: poisonous, Class N: not-poisonous.]

Splitting based on Entropy. [Figure: the size-humidity plane with the split at size = r1 highlighted.] Size divides the sample in two: S1 = {6P, 0NP}, S2 = {3P, 5NP}. H(S1) = 0; H(S2) = -(3/8) log2(3/8) - (5/8) log2(5/8).

Splitting based on Entropy. [Figure: the size-humidity plane with the splits at humidity = r2 and r3 highlighted.] Humidity divides the sample in three: S1 = {2P, 2NP}, S2 = {5P, 0NP}, S3 = {2P, 3NP}. H(S1) = 1; H(S2) = 0; H(S3) = -(2/5) log2(2/5) - (3/5) log2(3/5).

Information Gain. IG(A) = H(S) - Σv (|Sv|/|S|) H(Sv), where H(S) is the entropy of all examples, the sum runs over all possible values v of attribute A, and H(Sv) is the entropy of the subsample obtained by keeping the examples where A = v.
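A sketch of this computation in Python (the list-of-dicts data layout with a "class" key follows the earlier sketches and is an assumption, not something the slides specify):

```python
import math
from collections import Counter

def entropy_of(examples):
    """Entropy H(S) of the class distribution within a set of examples."""
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """IG(A) = H(S) - sum over values v of (|Sv|/|S|) * H(Sv)."""
    total = len(examples)
    gain = entropy_of(examples)
    for v in {e[attribute] for e in examples}:
        S_v = [e for e in examples if e[attribute] == v]
        gain -= (len(S_v) / total) * entropy_of(S_v)
    return gain
```

For the size split above, H(S) ≈ 0.940, H(S1) = 0, and H(S2) ≈ 0.954, so IG(size) ≈ 0.940 - (8/14)·0.954 ≈ 0.395.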

Components of IG(A), for A = size. [Same figure as the size split.] H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14); H(S1) = 0; H(S2) = -(3/8) log2(3/8) - (5/8) log2(5/8); |S1|/|S| = 6/14, |S2|/|S| = 8/14.

Gain Ratio. Define the entropy of the attribute itself: H(A) = - Σj pj log pj, where pj is the probability that attribute A takes value vj. Then GainRatio(A) = IG(A) / H(A).

Gain Ratio. [Same figure as the size split.] H(size) = -(6/14) log2(6/14) - (8/14) log2(8/14), where |S1|/|S| = 6/14 and |S2|/|S| = 8/14.
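A sketch of the gain-ratio computation (Python, reusing the information_gain sketch shown earlier; again an illustration rather than the slides' own code):

```python
import math
from collections import Counter

def attribute_entropy(examples, attribute):
    """H(A): entropy of the distribution of the attribute's own values."""
    counts = Counter(e[attribute] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(examples, attribute):
    """GainRatio(A) = IG(A) / H(A); uses information_gain from the earlier sketch."""
    h_a = attribute_entropy(examples, attribute)
    return information_gain(examples, attribute) / h_a if h_a > 0 else 0.0
```

For the size attribute, H(size) = -(6/14) log2(6/14) - (8/14) log2(8/14) ≈ 0.985, so GainRatio(size) ≈ 0.395 / 0.985 ≈ 0.401.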