
1 Statistical Learning Dong Liu Dept. EEIS, USTC

2 Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble

3 Taxonomy
How does a biologist determine the category of an animal? By applying a hierarchy of rules: Kingdom of Animalia, Phylum of Chordata, Class of Mammalia, Order of Carnivora, and so on.

4 Taxonomy as a tree
(Figure: the taxonomic hierarchy drawn as a tree.)

5 Decision tree
(Figure: a decision tree over the attributes Refund, MarSt, and TaxInc. Root: Refund? Yes leads to leaf NO; No leads to MarSt. MarSt = Married leads to leaf NO; MarSt = Single or Divorced leads to TaxInc. TaxInc <= 80K leads to leaf NO; TaxInc > 80K leads to leaf YES.)

6 Using a decision tree
Start from the root of the tree and follow the branch that matches each test: Refund? (Yes: NO; No: go to MarSt), MarSt? (Married: NO; Single, Divorced: go to TaxInc), TaxInc? (<= 80K: NO; > 80K: YES), until a leaf gives the prediction.

7 Tree models
A tree model consists of a set of conditions and a set of base models, organized in a tree.
Each internal node represents a condition on the input attributes; each condition is a division (split) of the input space.
Each leaf node represents a base model.
Classification: a class (simplest case) or a classifier.
Regression: a constant (simplest case) or a regressor.
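As an illustration only (not from the slides), here is a minimal Python sketch of such a structure; the class and field names are my own choices:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class TreeNode:
    """One node of a tree model: internal nodes hold a condition, leaves hold a base model."""
    condition: Optional[Callable[[dict], Any]] = None   # maps an input record to a branch key
    children: Dict[Any, "TreeNode"] = field(default_factory=dict)
    base_model: Any = None                               # e.g. a class label or a constant

    def predict(self, x: dict) -> Any:
        """Follow the conditions from this node down to a leaf and return its base model."""
        if self.condition is None:          # leaf node
            return self.base_model
        branch = self.condition(x)          # evaluate the split condition on the input
        return self.children[branch].predict(x)

# Toy usage (unrelated to the slides' example): a one-split tree on "Refund".
leaf_a, leaf_b = TreeNode(base_model="NO"), TreeNode(base_model="YES")
root = TreeNode(condition=lambda x: x["Refund"], children={"Yes": leaf_a, "No": leaf_b})
print(root.predict({"Refund": "Yes"}))  # -> "NO"
```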

8 Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble

9 Tree induction
Assume we have defined the form of the base models.
How do we find the optimal tree structure (the set of conditions, i.e. the division of the input space)? Exhaustive search is computationally expensive, so we use a heuristic approach: Hunt's algorithm.

10 Hunt's algorithm
Input: a set of training data D = {(x_n, y_n)}
Output: a classification or regression tree T
Function T = Hunt_Algorithm(D):
  If D need not or cannot be divided, return a leaf node.
  Else:
    Find an attribute of x, say x_d, and decide a condition g(x_d).
    Divide D into D_1, D_2, ..., according to the output of g(x_d).
    T_1 = Hunt_Algorithm(D_1), T_2 = Hunt_Algorithm(D_2), ...
    Let T_1, T_2, ..., be the children of T.
    Return T.
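A hedged Python sketch of this recursion for classification (assuming majority-class leaves and a caller-supplied split chooser named choose_split, which is not defined in the slides):

```python
from collections import Counter

def hunt_algorithm(D, choose_split):
    """D is a non-empty list of (x, y) pairs.

    choose_split(D) returns (g, branches) or None when D cannot be usefully divided;
    g maps an input x to a branch key, branches lists the possible keys.
    Returns a nested dict representing the tree (a sketch, not the slides' exact code).
    """
    labels = [y for _, y in D]
    # Need not be divided: all labels equal. Cannot be divided: no useful split found.
    split = choose_split(D) if len(set(labels)) > 1 else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}  # majority class as base model
    g, branches = split
    children = {}
    for b in branches:
        D_b = [(x, y) for x, y in D if g(x) == b]   # divide D according to g's output
        if not D_b:                                  # empty branch: fall back to parent majority
            children[b] = {"leaf": Counter(labels).most_common(1)[0][0]}
        else:
            children[b] = hunt_algorithm(D_b, choose_split)
    return {"condition": g, "children": children}
```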

11 Example of Hunt's algorithm (1)
(Figure: the full training set D and the current tree T, whose root splits on Refund with branches Yes and No leading to undetermined subtrees T_1 and T_2.)
Does D need to be divided? Yes, because D's class labels have different values.
Can D be divided? Yes, because D's input attributes have different values.

12 Example of Hunt's algorithm (2)
(Figure: the subset D_1 with Refund = Yes; in T, the Yes branch now ends in the leaf NO, while T_2 is still undetermined.)
Does D_1 need to be divided? No, so T_1 is a leaf node.

13 Example of Hunt's algorithm (3)
(Figure: the subset D_2 with Refund = No; T_2 becomes an internal node splitting on MarSt, with branches Single, Divorced and Married leading to undetermined subtrees T_21 and T_22.)

14 Example of Hunt's algorithm (4)
(Figure: the subset D_22 with Refund = No and MarSt = Married; the Single, Divorced branch grows a TaxInc split with leaves NO for <= 80K and YES for > 80K.)
Can D_22 be divided? No, so T_22 is a leaf node (NO).

15 Find an attribute and decide a condition
Discrete values: multi-way split (e.g. MarSt into Single / Married / Divorced) or two-way split (e.g. MarSt into {Single, Divorced} / Married, or into Single / {Married, Divorced}).
Continuous values: two-way or multi-way split.
Which attribute and which condition shall be selected? Define a criterion that describes the "gain" of dividing a set into several subsets.

16 Purity of a set
The purity of a set describes how easily the set can be classified, e.g. compare two sets with 0-1 classes: {0,0,0,0,0,0,0,0,0,1} vs. {0,1,0,1,0,1,0,1,0,1}.
Measures (p_0 and p_1 stand for the fractions of class 0 and class 1):
Entropy: -p_0 log p_0 - p_1 log p_1
Gini index: 1 - p_0^2 - p_1^2
Misclassification error (if predicting the dominant class): min(p_0, p_1)
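A small Python sketch of these three measures for a binary set (the helper name is my own, not from the slides):

```python
import math

def purity_measures(labels):
    """Return (entropy, gini, misclassification error) for a list of 0/1 labels."""
    p1 = sum(labels) / len(labels)
    p0 = 1.0 - p1
    entropy = -sum(p * math.log2(p) for p in (p0, p1) if p > 0)  # 0*log(0) treated as 0
    gini = 1.0 - p0**2 - p1**2
    misclass = min(p0, p1)
    return entropy, gini, misclass

# The two example sets from the slide: the first is much purer than the second.
print(purity_measures([0]*9 + [1]))   # low entropy and Gini index
print(purity_measures([0, 1]*5))      # maximal entropy and Gini index for two classes
```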

17 Criterion to find an attribute and decide a condition
Information gain: g = H(D) - Σ_i (|D_i|/|D|) H(D_i), where H(D) is the entropy.
Information gain ratio: gr = g / (-Σ_i (|D_i|/|D|) log(|D_i|/|D|)), which suppresses splits into too many subsets.
Gini index gain: gig = G(D) - Σ_i (|D_i|/|D|) G(D_i), where G(D) is the Gini index.
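Continuing the sketch above (reusing purity_measures and the math import, with illustrative names only), the three criteria can be computed like this:

```python
def entropy(labels):
    return purity_measures(labels)[0]

def gini(labels):
    return purity_measures(labels)[1]

def information_gain(parent, subsets):
    """parent: list of labels; subsets: list of label lists produced by the split."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

def gain_ratio(parent, subsets):
    """Information gain divided by the split information, penalizing many small subsets."""
    n = len(parent)
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets if s)
    return information_gain(parent, subsets) / split_info if split_info > 0 else 0.0

def gini_gain(parent, subsets):
    n = len(parent)
    return gini(parent) - sum(len(s) / n * gini(s) for s in subsets)
```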

18 Example: Gini index gain
Before the split: G(D) = 0.42 (consistent with a 3 vs. 7 class distribution over the 10 training samples: 1 - 0.3^2 - 0.7^2 = 0.42).
After splitting with {TaxInc <= 97}: gig = 0.12.
(Figure: the table of training samples.)

19 Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble

20 Control the complexity of the tree
Using Hunt's algorithm, we build a tree that is as accurate as possible, which may incur over-fitting.
Two ways to control the complexity (and thus avoid over-fitting):
Early termination: stop splitting if the gain is less than a threshold, if the tree is too deep, or if the set is too small.
Tree pruning: remove branches from the tree so as to minimize the joint cost C_alpha(T) = C(T) + alpha * |T|, where C(T) is the empirical risk (e.g. error rate on the training data) and |T| is the tree complexity (e.g. number of leaf nodes).

21 Tree pruning example (1)
(Figure: the unpruned tree. Refund = Yes: leaf NO, 3 correct. Refund = No, MarSt = Married: leaf NO, 2 correct, 1 error. Refund = No, MarSt = Single or Divorced, TaxInc <= 80K: leaf NO, 1 correct; TaxInc > 80K: leaf YES, 3 correct.)
C(T) = 1/10, |T| = 4

22 Tree pruning example (2)
There are different possible prunings.
(Figure: one pruning choice that removes the Refund test, leaving a tree rooted at MarSt with three leaf nodes.)
C(T) = 3/10, |T| = 3

23 Tree pruning example (2)
(Figure: another pruning choice that collapses the TaxInc subtree into the leaf YES. Refund = Yes: NO, 3 correct; Married: NO, 2 correct, 1 error; Single or Divorced: YES, 3 correct, 1 error.)
C(T) = 2/10, |T| = 3

24 Tree pruning example (2)
(Figure: another pruning choice that removes the MarSt test, so Refund = No leads directly to the TaxInc split. Refund = Yes: NO, 3 correct; TaxInc <= 80K: NO, 3 correct, 1 error; TaxInc > 80K: YES, 3 correct.)
C(T) = 1/10, |T| = 3

25 Tree pruning example (2)
Among the three-leaf candidates, select the tree with minimal C(T).
(Figure: the three candidate trees, with C(T) = 1/10, 3/10, and 2/10, all with |T| = 3; the tree that keeps the Refund and TaxInc tests, with C(T) = 1/10, is selected.)

26 Tree pruning example (3)
Continue pruning, keeping 2 leaf nodes.
(Figure: the candidate two-leaf trees, with C(T) = 3/10, 4/10, and 4/10, all with |T| = 2; the best one splits only on Refund, with Yes: NO, 3 correct and No: YES, 4 correct, 3 errors.)

27 Tree pruning example (4)
Continue pruning, keeping 1 leaf node: a single leaf NO, with 6 correct and 4 errors.
C(T) = 4/10, |T| = 1

28 Tree pruning example (5)
In summary, the best subtree of each size has:
C(T) = 1/10, |T| = 4
C(T) = 1/10, |T| = 3
C(T) = 3/10, |T| = 2
C(T) = 4/10, |T| = 1
Therefore, the optimal tree depends on alpha:
alpha >= 0.15: one leaf node
0 <= alpha <= 0.15: three leaf nodes
alpha = 0: four leaf nodes (at alpha = 0 the three- and four-leaf trees have equal joint cost)
(Figure: the four-leaf tree and the selected three-leaf tree.)
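A small Python check of this selection rule, purely illustrative, using the costs and sizes listed above (alphas are chosen away from the tie points):

```python
# Best subtree of each size: (empirical risk C(T), number of leaves |T|)
subtrees = [(1/10, 4), (1/10, 3), (3/10, 2), (4/10, 1)]

def best_subtree(alpha):
    """Return the (C, |T|) pair minimizing the joint cost C(T) + alpha * |T|."""
    return min(subtrees, key=lambda ct: ct[0] + alpha * ct[1])

for alpha in (0.0, 0.05, 0.2):
    c, size = best_subtree(alpha)
    print(f"alpha={alpha}: optimal tree has {size} leaf nodes (C={c})")
```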

29 Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble

30 Decision tree for regression
Consider the simplest case: each leaf node corresponds to a constant.
Each time we find an attribute and decide a condition, we minimize a (e.g. quadratic) cost:
min_{d,t} [ min_{c1} Σ_{i: x_id <= t} (y_i - c1)^2 + min_{c2} Σ_{i: x_id > t} (y_i - c2)^2 ]
The final regression tree is thus a piecewise-constant function.
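A hedged numpy sketch of this split search for one node; the inner minimizations over the constants are solved by the subset means, and the function name is my own:

```python
import numpy as np

def best_regression_split(X, y):
    """Find (d, t) minimizing the total squared error of a two-way split.

    X: (n, D) array of continuous attributes; y: (n,) array of targets.
    The optimal constant for each side is that side's mean of y.
    """
    best = (np.inf, None, None)
    n, D = X.shape
    for d in range(D):
        for t in np.unique(X[:, d])[:-1]:          # candidate thresholds (both sides non-empty)
            left, right = y[X[:, d] <= t], y[X[:, d] > t]
            cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
            if cost < best[0]:
                best = (cost, d, t)
    return best  # (cost, attribute index d, threshold t)
```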

31 Equivalence of decision tree and boosting tree for regression
Hunt's algorithm: "divide and conquer", conditions + base models, where each base model is a constant.
Boosting: a linear combination of base models (Model 1 + Model 2 + Model 3), where each base model is a decision stump.
Both approaches yield piecewise-constant functions, so they are equivalent.
(Figure: the same piecewise-constant function shown as a tree of constants and as a sum of stumps.)
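A tiny illustration of that equivalence on one input variable (the thresholds 3 and 5 and the constants are my own example, not from the slides):

```python
def tree_form(x):
    """Piecewise-constant function written as a decision tree over thresholds 3 and 5."""
    if x <= 3:
        return 1.0
    elif x <= 5:
        return 3.0
    else:
        return 4.0

def boosting_form(x):
    """The same function written as a constant plus two decision stumps."""
    return 1.0 + (2.0 if x > 3 else 0.0) + (1.0 if x > 5 else 0.0)

# Both representations agree everywhere.
assert all(tree_form(x) == boosting_form(x) for x in [0, 3, 4, 5, 7])
```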

32 Implementation
ID3: uses information gain.
C4.5: uses information gain ratio (by default); one of the most famous classification algorithms.
CART: uses the Gini index (for classification) and quadratic cost (for regression); only 2-way splits.
For pruning, according to C_alpha(T) = C(T) + alpha * |T|, increase alpha gradually to get a series of subtrees, then determine which subtree is optimal by validation (or cross-validation).
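For reference, scikit-learn's CART-style trees expose this cost-complexity pruning; a minimal sketch, with a stand-in dataset and cross-validation used to pick alpha:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# Increasing alpha yields a nested series of pruned subtrees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha (i.e. the subtree) with the best cross-validated accuracy.
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5
    ).mean(),
)
print("selected alpha:", best_alpha)
```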

33 Remarks on tree models
Easy to interpret.
Irrelevant/redundant attributes can be filtered out.
Good at handling discrete variables.
How to handle complex conditions such as x_1 + x_2 < c? Use an oblique tree, whose splits are linear combinations of attributes.

34 Random forest
A combination of decision trees and ensemble learning.
Following bagging, first generate multiple datasets (bootstrap samples), each of which gives rise to a tree model.
During tree building, consider only a random subset of features at each split.
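A minimal scikit-learn sketch of these two ingredients, bootstrap sampling and random feature subsets (again with a stand-in dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    bootstrap=True,        # each tree sees a bootstrap sample of the data
    max_features="sqrt",   # random feature subset considered at each split
    random_state=0,
)
print(cross_val_score(forest, X, y, cv=5).mean())
```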

35 Chap 5. Non-Parametric Supervised Learning: chapter summary
Dictionary: decision tree, Gini index, pruning (of a decision tree), Hunt's algorithm, information gain, information gain ratio.
Toolbox: CART, C4.5, random forest.

