Statistical Learning. Dong Liu, Dept. EEIS, USTC.


Chapter 8. Decision Tree
Outline: Tree model, Tree building, Tree pruning, Tree and ensemble

Taxonomy
How does a biologist determine the category of an animal? By a hierarchy of rules: Kingdom of Animalia, Phylum of Chordata, Class of Mammalia, Order of Carnivora, and so on.

Taxonomy as tree (figure: the taxonomic hierarchy drawn as a tree)

Decision tree (example figure): a tree that splits on Refund (Yes / No), MarSt (Single, Divorced / Married), and TaxInc (<= 80K / > 80K), with class labels YES / NO at the leaves.

Using a decision tree (figure): start from the root of the tree and follow the branch whose condition is satisfied (Refund, then MarSt, then TaxInc) until a leaf (NO or YES) is reached.

Tree models
A tree model consists of a set of conditions and a set of base models, organized in a tree.
- Each internal node represents a condition on the input attributes; a condition is a division (split) of the input space.
- Each leaf node represents a base model. For classification: a class (simplest case) or a classifier. For regression: a constant (simplest case) or a regressor.

Chapter 8. Decision Tree: Tree building

Tree induction
Assume we have defined the form of the base models. How do we find the optimal tree structure (the set of conditions, i.e. the division of the input space)? Exhaustive search is computationally expensive, so we use a heuristic approach: Hunt's algorithm.

Hunt's algorithm
Input: a set of training data $\mathcal{D} = \{(\boldsymbol{x}_n, y_n)\}$. Output: a classification or regression tree $T$.
Function $T$ = Hunt_Algorithm($\mathcal{D}$):
- If $\mathcal{D}$ need not or cannot be divided, return a leaf node.
- Else:
  - Find an attribute of $\boldsymbol{x}$, say $x_d$, and decide a condition $g(x_d)$.
  - Divide $\mathcal{D}$ into $\mathcal{D}_1, \mathcal{D}_2, \ldots$ according to the output of $g(x_d)$.
  - $T_1$ = Hunt_Algorithm($\mathcal{D}_1$), $T_2$ = Hunt_Algorithm($\mathcal{D}_2$), ...
  - Let $T_1, T_2, \ldots$ be the children of $T$.
- Return $T$.
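A minimal runnable sketch of Hunt's algorithm for classification, assuming numeric attributes, two-way splits of the form $x_d \le t$, and the Gini-gain criterion introduced later in the chapter. The function names and the dictionary-based tree representation are illustrative choices, not the lecture's reference implementation.

```python
# A minimal sketch of Hunt's algorithm for classification (illustrative, not the lecture's
# reference code). Assumes numeric attributes, two-way splits x_d <= t, and Gini gain.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def hunt(X, y):
    # D need not be divided (labels are pure) or cannot be divided (inputs are identical)
    if len(np.unique(y)) == 1 or len(np.unique(X, axis=0)) == 1:
        classes, counts = np.unique(y, return_counts=True)
        return {"leaf": classes[np.argmax(counts)]}        # leaf node: the majority class
    # Find an attribute x_d and a condition g(x_d) = [x_d <= t] maximizing the Gini gain
    best = None
    for d in range(X.shape[1]):
        for t in np.unique(X[:, d])[:-1]:
            left = X[:, d] <= t
            gain = gini(y) - (left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left]))
            if best is None or gain > best[0]:
                best = (gain, d, t)
    _, d, t = best
    left = X[:, d] <= t
    return {"attr": d, "thr": t,
            "left": hunt(X[left], y[left]),                # T_1 = Hunt_Algorithm(D_1)
            "right": hunt(X[~left], y[~left])}             # T_2 = Hunt_Algorithm(D_2)

# Tiny usage example with one numeric attribute and binary labels (assumed toy data)
X = np.array([[60.0], [70.0], [85.0], [90.0], [95.0]])
y = np.array([1, 1, 0, 0, 1])
print(hunt(X, y))
```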

Example of Hunt's algorithm (1): start with the whole training set $\mathcal{D}$. Does $\mathcal{D}$ need to be divided? Yes: its class labels take different values. Can $\mathcal{D}$ be divided? Yes: its input attributes take different values. So the root splits on Refund (Yes / No), with subtrees $T_1$ and $T_2$ still to be determined.

Example of Hunt's algorithm (2): $\mathcal{D}_1$ (the records with Refund = Yes) does not need to be divided, so $T_1$ is a leaf node (class NO).

Example of Hunt's algorithm (3): $\mathcal{D}_2$ (Refund = No) is divided again, on MarSt (Single, Divorced vs. Married), giving subtrees $T_{21}$ and $T_{22}$.

Example of Hunt's algorithm (4): $\mathcal{D}_{22}$ cannot be divided further, so $T_{22}$ is a leaf node. The finished tree splits on Refund, MarSt, and TaxInc (<= 80K: NO, > 80K: YES).

Find an attribute and decide a condition
- Discrete values: multi-way split (e.g. MarSt: Single / Married / Divorced) or two-way split (e.g. {Single, Divorced} vs. Married, or Single vs. {Married, Divorced}).
- Continuous values: two-way or multi-way split.
Which attribute and which condition shall be selected? Define a criterion that describes the "gain" of dividing a set into several subsets.
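As a small illustration of the two-way option for a discrete attribute, the sketch below enumerates the candidate binary groupings of the MarSt values from the slide; the helper itself is hypothetical and not part of the lecture material.

```python
# Enumerate the candidate two-way groupings of a discrete attribute (illustrative sketch).
from itertools import combinations

values = ["Single", "Married", "Divorced"]                 # MarSt values from the slide
splits = []
for r in range(1, len(values) // 2 + 1):
    for group in combinations(values, r):
        rest = tuple(v for v in values if v not in group)
        if 2 * r == len(values) and group > rest:          # skip mirror duplicates for even sizes
            continue
        splits.append((set(group), set(rest)))

for left, right in splits:
    print(left, "vs.", right)
# e.g. {'Single'} vs. {'Married', 'Divorced'}, {'Married'} vs. {'Single', 'Divorced'}, ...
```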

Purity of set
The purity of a set describes how easily the set can be classified, e.g. compare the two sets with 0/1 labels {0,0,0,0,0,0,0,0,0,1} and {0,1,0,1,0,1,0,1,0,1}.
Measures ($p_0$ and $p_1$ stand for the fractions of class 0 and class 1):
- Entropy: $-p_0 \log p_0 - p_1 \log p_1$
- Gini index: $1 - p_0^2 - p_1^2$
- Misclassification error (when predicting the dominant class): $\min(p_0, p_1)$
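A short sketch that evaluates the three purity measures on the two example sets above; it assumes binary labels and base-2 logarithms for the entropy.

```python
# Purity measures for a binary-labeled set (sketch of the slide's formulas, log base 2 assumed).
from math import log2

def purity(labels):
    p0 = labels.count(0) / len(labels)
    p1 = 1.0 - p0
    entropy = -sum(p * log2(p) for p in (p0, p1) if p > 0)   # -p0 log p0 - p1 log p1
    gini = 1.0 - p0**2 - p1**2                               # 1 - p0^2 - p1^2
    misclass = min(p0, p1)                                   # error when predicting the dominant class
    return entropy, gini, misclass

print(purity([0,0,0,0,0,0,0,0,0,1]))   # nearly pure:  (0.469, 0.18, 0.1)
print(purity([0,1,0,1,0,1,0,1,0,1]))   # evenly mixed: (1.0, 0.5, 0.5)
```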

Criteria to find the attribute and decide the condition
- Information gain: $g = H(\mathcal{D}) - \sum_i \frac{|\mathcal{D}_i|}{|\mathcal{D}|} H(\mathcal{D}_i)$, where $H(\mathcal{D})$ is the entropy.
- Information gain ratio: $gr = g \big/ \left(-\sum_i \frac{|\mathcal{D}_i|}{|\mathcal{D}|} \log \frac{|\mathcal{D}_i|}{|\mathcal{D}|}\right)$, which penalizes splits into too many subsets.
- Gini index gain: $gig = G(\mathcal{D}) - \sum_i \frac{|\mathcal{D}_i|}{|\mathcal{D}|} G(\mathcal{D}_i)$, where $G(\mathcal{D})$ is the Gini index.
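A sketch of the three criteria computed for one split of a parent set into subsets, reusing the entropy and Gini definitions above; the example split at the end is an assumed illustration, not a dataset from the slides.

```python
# Sketch of the three split criteria: information gain, its ratio, and Gini index gain.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_criteria(parent, subsets):
    w = [len(s) / len(parent) for s in subsets]                                 # |D_i| / |D|
    g   = entropy(parent) - sum(wi * entropy(s) for wi, s in zip(w, subsets))   # information gain
    iv  = -sum(wi * np.log2(wi) for wi in w)                                    # split information
    gig = gini(parent) - sum(wi * gini(s) for wi, s in zip(w, subsets))         # Gini index gain
    return g, g / iv, gig                                                       # gain, gain ratio, Gini gain

parent = np.array([0]*7 + [1]*3)                       # 7 class-0 and 3 class-1 samples
subsets = [np.array([0]*7), np.array([1]*3)]           # a split that separates them perfectly
print(split_criteria(parent, subsets))                 # about (0.881, 1.0, 0.42)
```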

Example: Gini index gain. Before the split, $G(\mathcal{D}) = 0.42$; after the split {TaxInc <= 97}, $gig = 0.12$ (computed on the training samples shown in the figure).

Chapter 8. Decision Tree: Tree pruning

Control the complexity of the tree
Using Hunt's algorithm, we build a tree that is as accurate as possible, which may incur over-fitting. Two ways to control the complexity (and thus avoid over-fitting):
- Early termination: stop splitting if the gain is less than a threshold, if the tree is too deep, or if the set is too small.
- Tree pruning: remove branches from the tree so as to minimize the joint cost $C_\alpha(T) = C(T) + \alpha|T|$, where $C(T)$ is the empirical risk (e.g. error rate on the training data) and $|T|$ is the tree complexity (e.g. the number of leaf nodes).

Tree pruning example (1): the fully grown tree (figure, splitting on Refund, MarSt, and TaxInc) has 4 leaf nodes and makes 1 error on the 10 training samples, so $C(T) = 1/10$, $|T| = 4$.

Tree pruning example (2): we have different pruning selections. One choice (figure) leaves a tree with 3 leaf nodes and $C(T) = 3/10$, $|T| = 3$.

Tree pruning example (2), continued: a second pruning choice (figure) also leaves 3 leaf nodes, with $C(T) = 2/10$, $|T| = 3$.

Tree pruning example (2), continued: a third pruning choice (figure) leaves 3 leaf nodes, with $C(T) = 1/10$, $|T| = 3$.

Tree pruning example (2), continued: among the three pruned trees with $|T| = 3$ ($C(T) = 3/10$, $2/10$, and $1/10$), select the one with minimal $C(T)$, i.e. $C(T) = 1/10$.

Tree pruning example (3): continue pruning, keeping 2 leaf nodes. The candidates give $C(T) = 3/10$, $4/10$, and $4/10$, so the best 2-leaf tree has $C(T) = 3/10$, $|T| = 2$.

Tree pruning example (4): continue pruning, keeping 1 leaf node (predict NO everywhere): 6 correct, 4 errors, so $C(T) = 4/10$, $|T| = 1$.

Tree pruning example (5): in summary, the best subtrees of each size are $C(T) = 1/10$ with $|T| = 4$; $C(T) = 1/10$ with $|T| = 3$; $C(T) = 3/10$ with $|T| = 2$; and $C(T) = 4/10$ with $|T| = 1$. Therefore the optimal tree depends on $\alpha$: for $\alpha \ge 0.15$, one leaf node; for $0 \le \alpha \le 0.15$, three leaf nodes; for $\alpha = 0$, four leaf nodes.
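Using the $(C(T), |T|)$ values summarized above, a few lines are enough to check which subtree minimizes $C_\alpha(T) = C(T) + \alpha|T|$ for a given $\alpha$; the particular $\alpha$ values below are illustrative.

```python
# Choose the subtree minimizing C_alpha(T) = C(T) + alpha * |T|, using the example's numbers.
subtrees = [(1/10, 4), (1/10, 3), (3/10, 2), (4/10, 1)]    # (C(T), |T|) pairs from above

def best_subtree(alpha):
    return min(subtrees, key=lambda ct: ct[0] + alpha * ct[1])

print(best_subtree(0.0))    # (0.1, 4): four leaf nodes (tied with the 3-leaf tree at alpha = 0)
print(best_subtree(0.05))   # (0.1, 3): three leaf nodes
print(best_subtree(0.2))    # (0.4, 1): one leaf node
```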

Chapter 8. Decision Tree: Tree and ensemble

Decision tree for regression
Consider the simplest case, where each leaf node corresponds to a constant. Each time we find an attribute and decide a condition, we minimize the (e.g. quadratic) cost
$$\min_{d,t}\Bigl[\min_{c_1}\sum_{i:\,x_{id}\le t}(y_i - c_1)^2 + \min_{c_2}\sum_{i:\,x_{id}> t}(y_i - c_2)^2\Bigr].$$
The final regression tree is then a piecewise constant function.
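A sketch of one split search for the regression case: for each attribute $d$ and threshold $t$, the optimal constants $c_1, c_2$ under the quadratic cost are the subset means, and the split with the smallest total cost is kept. The toy data at the end are an assumed example.

```python
# One split search for a regression tree: fit constants c1, c2 (subset means) and pick
# the attribute d and threshold t with the smallest quadratic cost (illustrative sketch).
import numpy as np

def best_regression_split(X, y):
    best = None
    for d in range(X.shape[1]):
        for t in np.unique(X[:, d])[:-1]:
            left = X[:, d] <= t
            c1, c2 = y[left].mean(), y[~left].mean()       # optimal constants under quadratic cost
            cost = np.sum((y[left] - c1) ** 2) + np.sum((y[~left] - c2) ** 2)
            if best is None or cost < best[0]:
                best = (cost, d, t, c1, c2)
    return best                                            # (cost, d, t, c1, c2)

# Assumed toy data: two clearly separated groups of targets
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.2])
print(best_regression_split(X, y))                         # splits at x <= 3 with c1 ~ 1.03, c2 ~ 5.1
```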

Equivalence of decision tree and boosting tree for regression
Hunt's algorithm is "divide and conquer": conditions plus base models, where each base model is a constant. Boosting builds a linear combination of base models (Model 1 + Model 2 + Model 3 in the figure), where each base model is a decision stump. Both approaches yield piecewise constant functions, so they are equivalent.
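A tiny numeric illustration of this equivalence (an assumed example, not from the slides): the same piecewise constant function can be written either as a small tree with constant leaves or as a sum of decision stumps.

```python
# The same piecewise constant function written as a tree and as a sum of stumps (sketch).
def as_tree(x):
    # "conditions + base models": a small tree whose leaves are constants
    if x < 2:
        return 1.0
    elif x < 5:
        return 3.0
    else:
        return 7.0

def as_sum_of_stumps(x):
    # "linear combination of base models": a constant plus two decision stumps
    return 1.0 + 2.0 * (x >= 2) + 4.0 * (x >= 5)

assert all(as_tree(x) == as_sum_of_stumps(x) for x in [0, 1, 2, 3, 4.9, 5, 6, 100])
```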

Implementation
- ID3: uses information gain.
- C4.5: uses information gain ratio (by default); one of the most famous classification algorithms.
- CART: uses the Gini index (for classification) and the quadratic cost (for regression); only 2-way splits. According to $C_\alpha(T) = C(T) + \alpha|T|$, increase $\alpha$ gradually to get a series of subtrees, then determine which subtree is optimal by validation (or cross-validation).
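The pruning recipe in the last bullet corresponds to CART's cost-complexity pruning. A sketch with scikit-learn, whose trees are CART-style and expose `cost_complexity_pruning_path` and a `ccp_alpha` parameter; the Iris data and the validation split are illustrative choices, assuming scikit-learn is installed.

```python
# Cost-complexity pruning with scikit-learn's CART-style trees (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Increasing alpha yields a series of subtrees of the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Determine the optimal subtree on held-out validation data (ties go to the larger alpha,
# i.e. the simpler tree)
scores = []
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    scores.append((tree.score(X_val, y_val), alpha))
best_score, best_alpha = max(scores)
print(f"best alpha = {best_alpha:.4f}, validation accuracy = {best_score:.3f}")
```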

Remarks on tree models
- Easy to interpret.
- Irrelevant/redundant attributes can be filtered out.
- Good at discrete variables.
- How to handle complex conditions, e.g. $x_1 + x_2 < c$? Use an oblique tree.

Random forest
A combination of decision trees and ensemble learning. Following bagging, first generate multiple datasets (bootstrap samples), each of which gives rise to a tree model. During tree building, consider only a random subset of the features when splitting.
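A sketch of the two ingredients (a bootstrap sample per tree and a random feature subset per split), assuming scikit-learn is available; the hand-rolled loop and `RandomForestClassifier` implement the same idea.

```python
# Random forest = bagging of trees + a random feature subset at each split (sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# By hand: bootstrap-sample the data for each tree, restrict each split to sqrt(p) features
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))             # bootstrap sample of the training set
    t = DecisionTreeClassifier(max_features="sqrt", random_state=0).fit(X[idx], y[idx])
    trees.append(t)
votes = np.stack([t.predict(X) for t in trees])            # one row of predictions per tree
by_hand = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)   # majority vote

# The same idea, packaged: scikit-learn's RandomForestClassifier
rf = RandomForestClassifier(n_estimators=25, max_features="sqrt", random_state=0).fit(X, y)
print((by_hand == y).mean(), rf.score(X, y))               # training accuracy of both forests
```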

Chap 5. Non-Parametric Supervised Learning: chapter summary
Dictionary and toolbox: decision tree, Gini index, pruning (of a decision tree), CART, C4.5, Hunt's algorithm, information gain and information gain ratio, random forest.