ECE 471/571 – Lecture 12 Decision Tree

Nominal Data
Descriptions that are discrete and without any natural notion of similarity or even ordering.

Some Terminology
- Decision tree
- Root
- Link (branch): directional
- Leaf
- Descendent node

CART
Classification and regression trees: a generic tree-growing methodology. Issues studied:
- How many splits from a node?
- Which property to test at each node?
- When to declare a leaf?
- How to prune a large, redundant tree?
- If the leaf is impure, how to classify?
- How to handle missing data?

Number of Splits
Binary trees: any multiway split can be expressed as a sequence of binary splits, so binary trees retain full expressive power while keeping training comparatively simple.

Node Impurity – Occam's Razor
The fundamental principle underlying tree creation is that of simplicity: we prefer simple, compact trees with few nodes.
http://math.ucr.edu/home/baez/physics/occam.html
Occam's (or Ockham's) razor is a principle attributed to the 14th-century logician and Franciscan friar William of Ockham. Ockham was the village in the English county of Surrey where he was born. The principle states that "entities should not be multiplied unnecessarily": when you have two competing theories which make exactly the same predictions, the simpler one is the better.
Stephen Hawking explains in A Brief History of Time: "We could still imagine that there is a set of laws that determines events completely for some supernatural being, who could observe the present state of the universe without disturbing it. However, such models of the universe are not of much interest to us mortals. It seems better to employ the principle known as Occam's razor and cut out all the features of the theory which cannot be observed."
Everything should be made as simple as possible, but not simpler.

Property Query and Impurity Measurement
We seek a property query T at each node N that makes the data reaching the immediate descendent nodes as pure as possible. Let i(N) denote the impurity of node N; we want i(N) to be 0 if all the patterns that reach the node bear the same category label.
Entropy impurity (information impurity)
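The entropy impurity itself appeared as an equation image on the original slide; the standard form (an assumption here, following the usual Duda, Hart & Stork definition) is

$$ i(N) = -\sum_j P(\omega_j)\,\log_2 P(\omega_j), $$

where $P(\omega_j)$ is the fraction of patterns at node N that belong to category $\omega_j$.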

Other Impurity Measurements
- Variance impurity (2-category case)
- Gini impurity
- Misclassification impurity
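The corresponding formulas were also equation images on the slide; the standard definitions (assumed here) are

$$ i(N) = P(\omega_1)\,P(\omega_2) \qquad \text{(variance impurity, 2-category case)} $$
$$ i(N) = 1 - \sum_j P^2(\omega_j) = \sum_{i \neq j} P(\omega_i)\,P(\omega_j) \qquad \text{(Gini impurity)} $$
$$ i(N) = 1 - \max_j P(\omega_j) \qquad \text{(misclassification impurity)} $$

The form $1 - \sum_j P^2(\omega_j)$ is the one used in the worked example below.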

Choose the Property Test?
Choose the query that decreases the impurity as much as possible.
- NL, NR: left and right descendent nodes
- i(NL), i(NR): their impurities
- PL: fraction of patterns at node N that will go to NL
Solve for the extrema (possibly only local extrema).
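The impurity drop being maximized was shown as a formula on the slide; in the notation above it is

$$ \Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R), $$

and the query T at node N is chosen to maximize $\Delta i(N)$.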

Example
Node N: 90 patterns in w1, 10 patterns in w2.
Split candidate: 70 w1 patterns & 0 w2 patterns to the right; 20 w1 patterns & 10 w2 patterns to the left.
If using Gini impurity: i(N) = 1 - 0.9*0.9 - 0.1*0.1 = 1 - 0.81 - 0.01 = 0.18, and
di(N) = 0.18 - [0.3*(1 - (2/3)*(2/3) - (1/3)*(1/3)) + 0.7*(1 - 1*1)] = 0.18 - 0.3*(4/9) ≈ 0.18 - 0.13 = 0.05.
If using misclassification impurity: i(N) = 0.1 and di(N) = 0.1 - 0.1 = 0, so this split produces no decrease in misclassification impurity.
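As a quick check of this arithmetic, here is a small MATLAB sketch; the anonymous helper functions and variable names are illustrative additions, not part of the original slide.

gini = @(c) 1 - sum((c/sum(c)).^2);      % Gini impurity from class counts
mis  = @(c) 1 - max(c)/sum(c);           % misclassification impurity
PL = 30/100;                             % fraction of patterns going left
dGini = gini([90 10]) - PL*gini([20 10]) - (1-PL)*gini([70 0])   % ~0.047
dMis  = mis([90 10])  - PL*mis([20 10])  - (1-PL)*mis([70 0])    % 0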

When to Stop Splitting?
Two extreme scenarios:
- Overfitting: keep splitting until each leaf holds one sample
- High error rate: stop splitting too early
Approaches:
- Validation and cross-validation: e.g., 90% of the data set as training data, 10% as validation data
- Threshold on the impurity reduction: can yield an unbalanced tree, and the threshold is hard to choose
- Minimum description length (MDL): i(N) measures the uncertainty of the training data, while the size of the tree (number of nodes) measures the complexity of the classifier itself (see the sketch below)
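The MDL criterion was given as a formula on the slide; a common form (an assumption, following the usual CART treatment) is to stop growing the tree when a global cost such as

$$ \alpha \cdot \mathrm{size} + \sum_{\text{leaf nodes } N} i(N) $$

reaches its minimum, where size is the number of nodes and $\alpha$ is a positive constant trading tree complexity against the remaining uncertainty in the training data.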

When to Stop Splitting? (cont'd)
Use a stopping criterion based on the statistical significance of the reduction in impurity: a chi-square statistic tests whether the candidate split differs significantly from a random split.
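One common form of the statistic for a two-category problem (assumed here; the slide showed it as an image) is

$$ \chi^2 = \sum_{i=1}^{2} \frac{(n_{iL} - n_{ie})^2}{n_{ie}}, $$

where $n_{iL}$ is the number of category-$\omega_i$ patterns the candidate split sends to the left child and $n_{ie} = P_L\, n_i$ is the number a purely random split would be expected to send there; splitting stops when $\chi^2$ falls below a chosen significance threshold.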

Pruning
Another approach to deciding when to stop splitting. Stopped splitting suffers from the horizon effect, i.e., a lack of sufficient look-ahead. Pruning instead lets the tree grow fully, beyond any putative horizon, and then considers all pairs of neighboring leaf nodes for elimination.

Instability
Small changes in the training data can change the splits chosen near the root and hence produce very different trees.

Other Methods
- Quinlan's ID3
- C4.5 (successor and refinement of ID3)
http://www.rulequest.com/Personal/

MATLAB Routine
classregtree builds either a classification tree or a regression tree, depending on whether the target variable is categorical or numeric (e.g., the predicted price of a consumer good is numeric, so a regression tree would be used).
http://www.mathworks.com/help/stats/classregtree.html
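A minimal usage sketch for this routine, assuming the older classregtree interface and the Fisher iris data that ships with the Statistics Toolbox (argument details may vary across MATLAB releases):

load fisheriris                        % meas: 150x4 features, species: class labels
t = classregtree(meas, species);       % categorical targets -> classification tree
view(t)                                % display the fitted tree
pred = eval(t, [5.1 3.5 1.4 0.2]);     % classify one new measurement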

http://www.simafore.com/blog/bid/62482/2-main-differences-between-classification-and-regression-trees

Random Forest
Potential issue with decision trees: overfitting.
Ensemble learning methods (Prof. Leo Breiman):
- Bagging (bootstrap aggregating): proposed by Breiman in 1994 to improve classification by combining the classifications of randomly generated training sets
- Random forest: bagging + random selection of features at each node to determine a split

MATLAB Implementation

nTrees = 50;                               % number of trees to bag (value assumed)
B = TreeBagger(nTrees, train_x, train_y);  % train an ensemble of bagged trees
pred = B.predict(test_x);                  % predicted labels for the test set
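A hedged end-to-end sketch using the built-in Fisher iris data; the 50-tree count and the 120/30 hold-out split are illustrative choices, not from the original slide.

load fisheriris                                 % meas: 150x4, species: class labels
idx   = randperm(150);
train = idx(1:120);   test = idx(121:150);      % simple hold-out split
B     = TreeBagger(50, meas(train,:), species(train));
pred  = B.predict(meas(test,:));                % cell array of predicted labels
acc   = mean(strcmp(pred, species(test)))       % hold-out accuracy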

Reference
[CART] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software, 1984.
[Bagging] L. Breiman, "Bagging predictors," Machine Learning, 24(2):123-140, August 1996. (citation count: 16,393)
[RF] L. Breiman, "Random forests," Machine Learning, 45(1):5-32, October 2001.