
ECE 471/571 – Lecture 20 Decision Tree 11/19/15

2 Nominal Data Descriptions that are discrete and without any natural notion of similarity or even ordering

3 Some Terminology
Decision tree
Root
Link (branch) - directional
Leaf
Descendant node
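To make the terms concrete, here is a minimal Python sketch of a binary decision-tree node; the class name Node and its fields (query, left, right, label) are illustrative choices, not part of the lecture.

import operator  # not required; shown only to hint that queries are simple tests

class Node:
    """A node in a binary decision tree.

    An internal node holds a property query and links (branches) to its two
    descendant nodes; a leaf holds the category label assigned to every
    pattern that reaches it. The root is simply the node with no parent.
    """

    def __init__(self, query=None, left=None, right=None, label=None):
        self.query = query    # function x -> bool, e.g. lambda x: x["weight"] < 80
        self.left = left      # descendant followed when query(x) is True
        self.right = right    # descendant followed when query(x) is False
        self.label = label    # category label if this node is a leaf

    def is_leaf(self):
        return self.label is not None

    def classify(self, x):
        """Follow links from this node down to a leaf and return its label."""
        if self.is_leaf():
            return self.label
        return (self.left if self.query(x) else self.right).classify(x)

A one-split tree is then just nested calls, e.g. Node(query=lambda x: x["weight"] < 80, left=Node(label="class 1"), right=Node(label="class 2")), and a pattern is classified with root.classify(sample).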

4 CART
Classification and regression trees - a generic tree-growing methodology
Issues studied:
How many splits from a node?
Which property to test at each node?
When to declare a leaf?
How to prune a large, redundant tree?
If the leaf is impure, how to classify?
How to handle missing data?

5 Number of Splits
Binary trees: any multi-way split can be expressed as a sequence of binary splits, so binary trees keep full expressive power while remaining comparatively simple to train

6 Node Impurity – Occam's Razor
The fundamental principle underlying tree creation is that of simplicity: we prefer simple, compact trees with few nodes.
Occam's (or Ockham's) razor is a principle attributed to the 14th-century logician and Franciscan friar William of Occam; Ockham was the village in the English county of Surrey where he was born. The principle states that "Entities should not be multiplied unnecessarily": when you have two competing theories which make exactly the same predictions, the one that is simpler is the better.
Stephen Hawking explains in A Brief History of Time: "We could still imagine that there is a set of laws that determines events completely for some supernatural being, who could observe the present state of the universe without disturbing it. However, such models of the universe are not of much interest to us mortals. It seems better to employ the principle known as Occam's razor and cut out all the features of the theory which cannot be observed."
Everything should be made as simple as possible, but not simpler.

7 Property Query and Impurity Measurement
We seek a property query T at each node N that makes the data reaching the immediate descendant nodes as pure as possible.
We want the impurity i(N) to be 0 if all the patterns that reach the node bear the same category label.
Entropy impurity (information impurity)
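The entropy (information) impurity named above is the standard one; written out in LaTeX notation, with P(\omega_j) denoting the fraction of patterns at node N that belong to category \omega_j:

i(N) = - \sum_j P(\omega_j) \log_2 P(\omega_j)

It is 0 when all patterns at N share one label and maximal when all categories are equally represented.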

8 Other Impurity Measurements Variance impurity (2-category case) Gini impurity Misclassification impurity
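For reference, the standard forms of these three measures, in the same notation as the entropy impurity above:

Variance impurity (2-category case): i(N) = P(\omega_1)\, P(\omega_2)
Gini impurity: i(N) = \sum_{i \ne j} P(\omega_i)\, P(\omega_j) = 1 - \sum_j P^2(\omega_j)
Misclassification impurity: i(N) = 1 - \max_j P(\omega_j)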

9 Choose the Property Test?
Choose the query that decreases the impurity as much as possible.
N_L, N_R: left and right descendant nodes
i(N_L), i(N_R): impurities of the descendant nodes
P_L: fraction of patterns at node N that will go to N_L
Solve for extrema (in general only local extrema)
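In this notation, the drop in impurity produced by a query T at node N is

\Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R)

and the query chosen at N is the one that maximizes \Delta i(N).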

10 Example
Node N: 90 patterns in ω1, 10 patterns in ω2
Split candidate:
70 ω1 patterns & 0 ω2 patterns to the right
20 ω1 patterns & 10 ω2 patterns to the left
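A short Python sketch of the arithmetic for this example, combining the entropy impurity and the impurity-drop formula from the previous slides (the helper name entropy_impurity is an illustrative choice):

import math

def entropy_impurity(counts):
    """i(N) = -sum_j P(w_j) log2 P(w_j), computed from the class counts at a node."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

i_N = entropy_impurity([90, 10])   # node N: 90 w1 + 10 w2  -> ~0.469 bits
i_L = entropy_impurity([20, 10])   # left:   20 w1 + 10 w2  -> ~0.918 bits
i_R = entropy_impurity([70, 0])    # right:  70 w1 +  0 w2  ->  0 (pure)
P_L = 30 / 100                     # fraction of patterns sent to the left

drop = i_N - P_L * i_L - (1 - P_L) * i_R   # impurity drop, ~0.19 bits
print(f"i(N)={i_N:.3f}  i(N_L)={i_L:.3f}  i(N_R)={i_R:.3f}  drop={drop:.3f}")

With these numbers the candidate split removes roughly 0.19 bits of impurity, mostly because the right descendant becomes pure.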

11 When to Stop Splitting?
Two extreme scenarios:
Split too far - overfitting (each leaf is one sample)
Stop too early - high error rate
Approaches:
Validation and cross-validation - e.g., 90% of the data set as training data, 10% of the data set as validation data
Use a threshold on the impurity reduction - may yield an unbalanced tree, and the threshold is hard to choose
Minimum description length (MDL) - i(N) measures the uncertainty of the training data, while the size of the tree measures the complexity of the classifier itself
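As a sketch of how the threshold and MDL ideas can be written down (following the usual CART treatment; the threshold β and the weight α below are illustrative symbols, not given on the slide): with the threshold approach, splitting of a node N stops when the best achievable impurity drop is small,

\max_T \Delta i(N) \le \beta

while the MDL-flavoured approach grows the tree that minimizes a global cost trading leaf impurity against tree size,

\alpha \cdot \mathrm{size} + \sum_{\text{leaf nodes } N} i(N)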

12 When to Stop Splitting? (cont'd)
Use a stopping criterion based on the statistical significance of the reduction of impurity
Use the chi-square statistic to test whether the candidate split differs significantly from a random split
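A sketch of the statistic in question, for a binary split of a two-category node (the counts n_{iL} and n_{ie} are spelled out here because the slide does not define them):

\chi^2 = \sum_{i=1}^{2} \frac{(n_{iL} - n_{ie})^2}{n_{ie}}

where n_{iL} is the number of ω_i patterns the candidate split sends to the left descendant and n_{ie} = P_L n_i is the number a purely random split would send there. Splitting continues only if χ² exceeds the critical value for the chosen significance level.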

13 Pruning
An alternative to stopping splitting early
Early stopping suffers from the horizon effect: a lack of sufficient lookahead
Instead, let the tree grow fully, i.e., beyond any putative horizon; then all pairs of neighboring leaf nodes are considered for elimination (merging), accepted when the resulting increase in impurity is small

14 Instability

15 Other Methods
Quinlan's ID3
C4.5 (successor and refinement of ID3)

16 MATLAB Routine
classregtree
Classification tree vs. regression tree: which one is built depends on whether the target variable is categorical or numeric
