1
Statistical Learning Dong Liu Dept. EEIS, USTC
2
Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble
3
Taxonomy
How does a biologist determine the category of an animal?
A hierarchy of rules: Kingdom Animalia → Phylum Chordata → Class Mammalia → Order Carnivora → …
4
Taxonomy as tree
5
Decision tree
An example decision tree: the root tests Refund (Yes → leaf NO; No → MarSt); MarSt tests marital status (Married → leaf NO; Single, Divorced → TaxInc); TaxInc tests taxable income (≤ 80K → leaf NO; > 80K → leaf YES).
6
Using decision tree
Start from the root of the tree and follow the branch whose condition matches the input at each internal node, until a leaf is reached. For the example tree: Refund (Yes → NO; No → MarSt), then MarSt (Married → NO; Single, Divorced → TaxInc), then TaxInc (≤ 80K → NO; > 80K → YES).
7
Tree models
A tree model consists of a set of conditions and a set of base models, organized in a tree.
Each internal node represents a condition on the input attributes; a condition is a division (split) of the input space.
Each leaf node represents a base model:
Classification: a class (simplest case) or a classifier
Regression: a constant (simplest case) or a regressor
8
Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble
9
Tree induction
Assume we have defined the form of the base models.
How to find the optimal tree structure (set of conditions, division of the input space)? Exhaustive search is computationally expensive.
Heuristic approach: Hunt's algorithm
10
Hunt's algorithm
Input: a set of training data 𝒟 = {(𝒙_n, y_n)}
Output: a classification tree or regression tree T
Function T = Hunt_Algorithm(𝒟)
  If 𝒟 does not need to be, or cannot be, divided, return a leaf node
  Else
    Find an attribute of 𝒙, say x_d, and decide a condition g(x_d)
    Divide 𝒟 into 𝒟_1, 𝒟_2, …, according to the output of g(x_d)
    T_1 = Hunt_Algorithm(𝒟_1), T_2 = Hunt_Algorithm(𝒟_2), …
    Let T_1, T_2, …, be the children of T
    Return T
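A minimal Python sketch of this recursive procedure, assuming categorical attributes and using Gini-index gain to choose the splitting attribute; the slide leaves the condition g(x_d) and the stopping test open, so these are illustrative choices rather than the algorithm's prescribed ones.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def hunt(data, labels, attributes):
    """data: list of dicts, labels: list of classes. Returns a nested dict or a class."""
    # D need not be divided (pure labels) or cannot be divided (no attributes left)
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(a):  # Gini gain of a multi-way split on attribute a
        groups = {}
        for x, y in zip(data, labels):
            groups.setdefault(x[a], []).append(y)
        after = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        return gini(labels) - after

    best = max(attributes, key=gain)           # the condition g(x_d): split on x[best]
    subsets = {}
    for x, y in zip(data, labels):             # divide D into D_1, D_2, ...
        subsets.setdefault(x[best], []).append((x, y))
    children = {}
    for value, subset in subsets.items():      # T_i = Hunt_Algorithm(D_i)
        xs, ys = zip(*subset)
        children[value] = hunt(list(xs), list(ys), [a for a in attributes if a != best])
    return {"split_on": best, "children": children}

# Tiny illustrative usage (hypothetical records, not the slide's dataset)
data = [{"Refund": "Yes", "MarSt": "Single"}, {"Refund": "No", "MarSt": "Married"},
        {"Refund": "No", "MarSt": "Single"}]
print(hunt(data, ["NO", "NO", "YES"], ["Refund", "MarSt"]))
```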
11
Example of Hunt's algorithm (1)
𝒟 = the full training set; T = a root node splitting on Refund (Yes / No), with subtrees T_1 = ? and T_2 = ?
Does 𝒟 need to be divided? Yes: its class labels take different values.
Can 𝒟 be divided? Yes: its input attributes take different values.
12
Example of Hunt's algorithm (2)
𝒟_1 = the subset with Refund = Yes. Does 𝒟_1 need to be divided? No: T_1 becomes a leaf node with class NO. T_2 = ? remains to be built.
13
Example of Hunt's algorithm (3)
𝒟_2 = the subset with Refund = No. T_2 splits on MarSt (Single, Divorced vs. Married), with subtrees T_21 = ? and T_22 = ?
14
Example of Hunt’s algorithm (4)
𝒟 22 = 𝑇= Refund Yes No NO MarSt Single, Divorced Married TaxInc 𝑇 22 =? NO <= 80K > 80K NO YES D22 can be divided? No! T22 is a leaf node 2019/2/23 Chap 8. Decision Tree
15
Find an attribute and decide a condition
Discrete values: multi-way split (e.g. MarSt → Single / Married / Divorced) or two-way split (e.g. MarSt → {Single, Divorced} / Married, or Single / {Married, Divorced})
Continuous values: two-way or multi-way split
Which attribute and which condition shall be selected? Define a criterion that describes the "gain" of dividing a set into several subsets.
16
Purity of set
The purity of a set describes how easily the set can be classified, e.g. compare two sets with 0-1 classes: {0,0,0,0,0,0,0,0,0,1} vs. {0,1,0,1,0,1,0,1,0,1}
Measures (p_0 and p_1 stand for the fractions of class 0 and class 1):
Entropy: −p_0 log p_0 − p_1 log p_1
Gini index: 1 − p_0² − p_1²
Misclassification error (when predicting the dominant class): min(p_0, p_1)
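A small sketch of the three measures, evaluated on the two example sets above; the slide does not fix the logarithm base, so base 2 is assumed here.

```python
import math

def purity_measures(labels):
    """Entropy, Gini index, and misclassification error of a 0/1-labelled set."""
    n = len(labels)
    p0, p1 = labels.count(0) / n, labels.count(1) / n
    entropy = -sum(p * math.log2(p) for p in (p0, p1) if p > 0)
    gini = 1 - p0 ** 2 - p1 ** 2
    misclass = min(p0, p1)
    return entropy, gini, misclass

print(purity_measures([0] * 9 + [1]))   # nearly pure set: (0.469, 0.18, 0.1)
print(purity_measures([0, 1] * 5))      # maximally impure set: (1.0, 0.5, 0.5)
```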
17
Criterion to find attribute and decide condition
Information gain: g = H(𝒟) − Σ_i (|𝒟_i|/|𝒟|) H(𝒟_i), where H(𝒟) is the entropy
Information gain ratio: gr = g / (−Σ_i (|𝒟_i|/|𝒟|) log(|𝒟_i|/|𝒟|)), which penalizes splits into too many subsets
Gini index gain: gig = G(𝒟) − Σ_i (|𝒟_i|/|𝒟|) G(𝒟_i), where G(𝒟) is the Gini index
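A minimal sketch of the three criteria, assuming the grouping of 𝒟 into subsets 𝒟_1, 𝒟_2, … by the candidate condition has already been done, and again assuming base-2 logarithms.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_criteria(parent, subsets):
    """parent: labels of D; subsets: labels of D_1, D_2, ... after a candidate split."""
    weights = [len(s) / len(parent) for s in subsets]
    info_gain = entropy(parent) - sum(w * entropy(s) for w, s in zip(weights, subsets))
    split_info = -sum(w * math.log2(w) for w in weights)   # denominator of the gain ratio
    gain_ratio = info_gain / split_info
    gini_gain = gini(parent) - sum(w * gini(s) for w, s in zip(weights, subsets))
    return info_gain, gain_ratio, gini_gain

# e.g. a two-way split of 5 labels into a pure subset and a mixed one
print(split_criteria([0, 0, 0, 1, 1], [[0, 0], [0, 1, 1]]))
```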
18
Example: Gini index gain
Before the split: G(𝒟) = 0.42. After splitting with {TaxInc ≤ 97}: gig = 0.12.
[Table of training samples omitted.]
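These numbers can be checked by hand under an assumption about the omitted training table that is consistent with them: if 3 of the 10 records are labelled YES, then G(𝒟) = 1 − 0.3² − 0.7² = 0.42; and if the TaxInc ≤ 97 split produces a pure group of 4 records and a group of 6 records with 3 YES and 3 NO, the weighted Gini after the split is 0.4 · 0 + 0.6 · 0.5 = 0.3, so gig = 0.42 − 0.3 = 0.12.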
19
Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble
20
Control the complexity of tree
Using Hunt's algorithm, we build a tree that is as accurate as possible, which may incur over-fitting.
Two ways to control the complexity (and thus avoid over-fitting):
Early termination: stop splitting if the gain is less than a threshold, if the tree is too deep, or if the set is too small
Tree pruning: remove branches from the tree so as to minimize the joint cost C_α(T) = C(T) + α|T|, where C(T) is the empirical risk (e.g. error rate on the training data) and |T| is the tree complexity (e.g. number of leaf nodes)
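A small sketch of the pruning objective used as a selection rule over candidate subtrees; the (risk, leaf-count) pairs below are taken from the worked pruning example on the following slides.

```python
def joint_cost(empirical_risk, num_leaves, alpha):
    """C_alpha(T) = C(T) + alpha * |T|"""
    return empirical_risk + alpha * num_leaves

# (C(T), |T|) of the candidate subtrees summarized at the end of the pruning example
candidates = {"4 leaves": (0.1, 4), "3 leaves": (0.1, 3),
              "2 leaves": (0.3, 2), "1 leaf": (0.4, 1)}
for alpha in (0.0, 0.05, 0.2):
    best = min(candidates, key=lambda k: joint_cost(*candidates[k], alpha))
    print(alpha, best)   # larger alpha favors smaller trees
```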
21
Tree pruning example (1)
The full tree has 4 leaves. Training accuracy per leaf: Refund = Yes → NO (3 correct); MarSt = Married → NO (2 correct, 1 error); TaxInc ≤ 80K → NO (1 correct); TaxInc > 80K → YES (3 correct).
C(T) = 1/10, |T| = 4
22
Tree pruning example (2)
We have different pruning selections.
Option 1: remove the Refund test and make MarSt the root: Married → NO (3 correct, 1 error); Single, Divorced → TaxInc (≤ 80K → NO, 1 correct; > 80K → YES, 3 correct, 2 error).
C(T) = 3/10, |T| = 3
23
Tree pruning example (2)
We have different pruning selections.
Option 2: prune the TaxInc node, replacing it with a YES leaf: Refund = Yes → NO (3 correct); MarSt = Single, Divorced → YES (3 correct, 1 error); Married → NO (2 correct, 1 error).
C(T) = 2/10, |T| = 3
24
Tree pruning example (2)
We have different pruning selections.
Option 3: prune the MarSt node, so Refund = No goes directly to the TaxInc test: Refund = Yes → NO (3 correct); TaxInc ≤ 80K → NO (3 correct, 1 error); TaxInc > 80K → YES (3 correct).
C(T) = 1/10, |T| = 3
25
Tree pruning example (2)
Select the tree with minimal C(T). Among the three 3-leaf candidates: pruning MarSt (Refund → NO / TaxInc → NO, YES) gives C(T) = 1/10; pruning Refund (MarSt → TaxInc / NO) gives C(T) = 3/10; pruning TaxInc (Refund → NO / MarSt → YES, NO) gives C(T) = 2/10. The first candidate, with C(T) = 1/10, is selected.
26
Tree pruning example (3)
Continue pruning, keeping 2 leaf nodes. Candidates: split only on Refund (Yes → NO, 3 correct; No → YES, 4 correct, 3 error), giving C(T) = 3/10; split only on MarSt (Single, Divorced → YES, 3 correct, 3 error; Married → NO, 3 correct, 1 error), giving C(T) = 4/10; split only on TaxInc (≤ 80K → NO, 3 correct, 1 error; > 80K → YES, 3 correct, 3 error), giving C(T) = 4/10. All have |T| = 2.
27
Tree pruning example (4)
Continue pruning, keeping 1 leaf node: the whole training set is predicted NO (6 correct, 4 error).
C(T) = 4/10, |T| = 1
28
Tree pruning example (5)
In summary:
C(T) = 1/10, |T| = 4 (the full tree)
C(T) = 1/10, |T| = 3 (Refund = Yes → NO; Refund = No → TaxInc → NO / YES)
C(T) = 3/10, |T| = 2
C(T) = 4/10, |T| = 1
Therefore the optimal tree depends on α:
α ≥ 0.15: one leaf node
0 ≤ α ≤ 0.15: three leaf nodes
α = 0: four leaf nodes (tied with the three-leaf tree)
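The 0.15 boundary can be checked directly from the summary (a derivation not spelled out on the slide): the three-leaf tree beats the one-leaf tree as long as 1/10 + 3α ≤ 4/10 + α, i.e. α ≤ 0.15; and at α = 0 the four-leaf and three-leaf trees both achieve C_α(T) = 1/10, so they tie.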
29
Chapter 8. Decision Tree
Tree model
Tree building
Tree pruning
Tree and ensemble
30
Decision tree for regression
Consider the simplest case: each leaf node corresponds to a constant.
Each time we find an attribute and decide a condition, we minimize the (e.g. quadratic) cost:
min_{d,t} [ min_{c_1} Σ_{i: x_id ≤ t} (y_i − c_1)² + min_{c_2} Σ_{i: x_id > t} (y_i − c_2)² ]
The final regression tree is thus a piecewise constant function.
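A minimal sketch of this split search: for every attribute d and threshold t, the optimal constants c_1, c_2 are the subset means, so we keep the split with the smallest total squared error. The data in the usage example are made up for illustration.

```python
import numpy as np

def best_split(X, y):
    """X: (n, p) array, y: (n,) array. Returns (attribute d, threshold t, cost)."""
    best = (None, None, np.inf)
    for d in range(X.shape[1]):
        for t in np.unique(X[:, d])[:-1]:          # candidate thresholds for attribute d
            left, right = y[X[:, d] <= t], y[X[:, d] > t]
            # the inner minimizations: the best constant on each side is the mean
            cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if cost < best[2]:
                best = (d, t, cost)
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.1, 3.0, 3.1])
print(best_split(X, y))   # splits at t = 2.0, separating the two level sets
```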
31
Equivalence of decision tree and boosting tree for regression
Hunt's algorithm: "divide and conquer", conditions + base models, where each base model is a constant.
Boosting: a linear combination of base models (Model 1 + Model 2 + Model 3), where each base model is a decision stump.
Both approaches produce piecewise constant functions, so they are equivalent.
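One way to see the equivalence (a detail not spelled out on the slide): for a one-dimensional input, a sum of decision stumps c_0 + Σ_j a_j · 1[x > t_j] is a piecewise constant function whose breakpoints are the stump thresholds; conversely, any piecewise constant function with K pieces can be written as a constant plus K − 1 stumps. So the function classes reached by recursive splitting and by summing stumps coincide.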
32
Implementation
ID3: uses information gain.
C4.5: uses information gain ratio (by default); one of the most famous classification algorithms.
CART: uses the Gini index (for classification) and quadratic cost (for regression); only 2-way splits. For pruning, according to C_α(T) = C(T) + α|T|, increase α gradually to obtain a series of subtrees, then determine the optimal subtree by validation (or cross-validation).
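As a hedged illustration (scikit-learn is not mentioned on the slides): its DecisionTreeClassifier follows the CART approach, and its cost-complexity pruning interface mirrors the "increase α gradually, then validate" procedure described above.

```python
# Cost-complexity pruning with scikit-learn: ccp_alpha plays the role of alpha
# in C_alpha(T) = C(T) + alpha * |T|.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Increasing alpha yields a nested series of subtrees; pick the best by validation
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
scores = [(a, DecisionTreeClassifier(random_state=0, ccp_alpha=a)
              .fit(X_tr, y_tr).score(X_val, y_val)) for a in path.ccp_alphas]
print(max(scores, key=lambda s: s[1]))   # (best alpha, validation accuracy)
```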
33
Remarks on tree models
Easy to interpret.
Irrelevant/redundant attributes can be filtered out.
Good at handling discrete variables.
How to handle complex conditions such as x_1 + x_2 < c? Use an oblique tree, which splits on linear combinations of attributes.
34
Random forest
A combination of decision trees and ensemble learning.
Following bagging, first generate multiple datasets (bootstrap samples), each of which gives rise to a tree model.
During tree building, consider only a random subset of features at each split.
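A minimal from-scratch sketch of this recipe: bootstrap sampling plus a random feature subset per tree. For brevity the feature subset is drawn once per tree rather than at every split, which is a simplification of mine, not the slide's prescription; the usage data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, n_trees=25, n_feat=None, rng=np.random.default_rng(0)):
    n, p = X.shape
    n_feat = n_feat or max(1, int(np.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                  # bootstrap sample of the data
        cols = rng.choice(p, size=n_feat, replace=False)   # random feature subset
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((cols, tree))
    return forest

def random_forest_predict(forest, X):
    # Majority vote over the trees (assumes integer class labels)
    votes = np.stack([tree.predict(X[:, cols]) for cols, tree in forest])
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
forest = random_forest_fit(X, y)
print((random_forest_predict(forest, X) == y).mean())   # training accuracy
```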
35
Chap 5. Non-Parametric Supervised Learning
Chapter summary
Dictionary / Toolbox: decision tree, Gini index, pruning (of decision tree), Hunt's algorithm, information gain and information gain ratio, CART, C4.5, random forest