Decision Tree Pruning

Problem Statement
- We would like to output a small decision tree (model selection).
- Tree building normally runs until the training error is zero.
- Option 1: stop early, when the split index function decreases only slightly. Con: may miss structure that appears deeper in the tree.
- Option 2: prune after building the full tree.

Pruning
- Input: a tree T and a sample S.
- Output: a tree T'.
- Basic pruning: T' is a sub-tree of T; we may only replace inner nodes by leaves.
- More advanced: an inner node may also be replaced by one of its children.

Reduced Error Pruning
- Split the sample into two parts, S1 and S2.
- Use S1 to build a tree.
- Use S2 to decide where to prune.
- Process every inner node v after all of its children have been processed:
  - Compute the observed error of T_v and of leaf(v) on the examples reaching v.
  - If leaf(v) makes fewer errors, replace T_v by leaf(v).
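
A minimal sketch of this bottom-up pass, assuming a simple binary-tree representation. The Node class and the helpers below (including the convention that every node stores its training-set majority label) are illustrative assumptions, not from the slides:

```python
# Sketch of reduced-error pruning; the tree representation is an assumption.

class Node:
    def __init__(self, feature=None, left=None, right=None, label=None):
        self.feature = feature   # split feature index (inner nodes)
        self.left = left
        self.right = right
        self.label = label       # majority label (used when the node acts as a leaf)

    def is_leaf(self):
        return self.left is None and self.right is None

def route(node, x):
    """Follow example x down to a leaf and return that leaf's label."""
    while not node.is_leaf():
        node = node.left if x[node.feature] == 0 else node.right
    return node.label

def errors(node, sample):
    """Observed errors of the subtree on the (x, y) pairs reaching it."""
    return sum(1 for x, y in sample if route(node, x) != y)

def leaf_errors(node, sample):
    """Errors if the subtree were replaced by a leaf with its majority label."""
    return sum(1 for _, y in sample if y != node.label)

def reduced_error_prune(node, sample):
    """Process children first (on the pruning set S2), then replace the
    subtree by a leaf if the leaf makes no more errors than the subtree."""
    if node.is_leaf():
        return node
    left_s = [(x, y) for x, y in sample if x[node.feature] == 0]
    right_s = [(x, y) for x, y in sample if x[node.feature] == 1]
    node.left = reduced_error_prune(node.left, left_s)
    node.right = reduced_error_prune(node.right, right_s)
    if leaf_errors(node, sample) <= errors(node, sample):
        node.left = node.right = None   # prune: v becomes leaf(v)
    return node
```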
Reduced Error Pruning: Example

Pruning via CV & SRM
- For each pruning size, compute the minimal-error pruning of that size.
- This yields at most m different sub-trees (one candidate per size).
- Select among the candidate prunings using:
  - cross validation,
  - structural risk minimization (SRM),
  - or any other index method.

Finding the Minimum Pruning
Procedure Compute
- Inputs: k (allowed number of errors), T (a tree), S (a sample).
- Outputs: P (the pruned tree), size (the size of P).

Procedure Compute: base cases
  IF IsLeaf(T) THEN
    IF Errors(T) <= k THEN size = 1 ELSE size = infinity
    P = T; return
  IF Errors(root(T)) <= k THEN
    size = 1; P = root(T); return

Procedure Compute: recursive case
  FOR i = 0 TO k DO
    Call Compute(i,   T[0], S_0, size[i,0], P[i,0])
    Call Compute(k-i, T[1], S_1, size[i,1], P[i,1])
  size = min_i { size[i,0] + size[i,1] + 1 }
  I = argmin_i { size[i,0] + size[i,1] + 1 }
  P = MakeTree(root(T), P[I,0], P[I,1])
What is the time complexity?
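
A sketch of the same recursion in Python, reusing the Node class and helpers from the reduced-error-pruning sketch above. Here T[0], T[1] correspond to node.left and node.right, and S_0, S_1 to the sub-samples they receive; the binary-split assumption is mine:

```python
import math

INF = math.inf

def compute(k, node, sample):
    """Return (size, pruned_tree): the smallest pruning of `node` that makes
    at most k errors on `sample`; size is INF when no such pruning exists."""
    if node.is_leaf():
        return (1, node) if leaf_errors(node, sample) <= k else (INF, None)
    # Replacing the whole subtree by a single leaf is valid if it fits the budget.
    if leaf_errors(node, sample) <= k:
        return 1, Node(label=node.label)
    # Otherwise split the error budget k between the two children.
    s0 = [(x, y) for x, y in sample if x[node.feature] == 0]
    s1 = [(x, y) for x, y in sample if x[node.feature] == 1]
    best_size, best_tree = INF, None
    for i in range(k + 1):
        size0, p0 = compute(i, node.left, s0)
        size1, p1 = compute(k - i, node.right, s1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_tree = Node(feature=node.feature, left=p0, right=p1,
                             label=node.label)
    return best_size, best_tree
```

As written the recursion recomputes subproblems; memoizing on (node, error budget) keeps the work polynomial, and the Drawbacks slide below notes the overall cost across all budgets is O(m^2) since |T| = O(m).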

Cross Validation
- Split the sample into S1 and S2.
- Build a tree using S1.
- Compute the candidate prunings (one per size).
- Select using S2: output the candidate with the smallest error on S2.
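
The selection step is a one-liner given the candidates and the `errors` helper from the first sketch:

```python
def select_by_cv(candidates, s2):
    """Pick the candidate pruning with the smallest observed error on the
    held-out set S2 (candidates: one minimal pruning per size)."""
    return min(candidates, key=lambda t: errors(t, s2))
```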

SRM
- Build a tree T using the full sample S.
- Compute the candidate prunings; let k_d be the size of the minimal pruning that makes d errors.
- Select among them using the SRM formula (observed error plus a size-dependent penalty).
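
The SRM formula itself is not recoverable from the slide; the score below is a hypothetical stand-in with the generic shape "observed error + sqrt(complexity / m)", matching the form of the penalty a() used later. The `h_size` parameter (standing for |H|) and the constants are assumptions:

```python
import math

def tree_size(node):
    """Number of nodes in the tree."""
    if node.is_leaf():
        return 1
    return 1 + tree_size(node.left) + tree_size(node.right)

def srm_score(train_errors, size, m, h_size, delta):
    """Hypothetical SRM-style score: empirical error plus a complexity
    penalty that grows with the size of the pruning."""
    penalty = math.sqrt((size * math.log(h_size) + math.log(1 / delta)) / m)
    return train_errors / m + penalty

def select_by_srm(candidates, sample, h_size, delta):
    m = len(sample)
    return min(candidates,
               key=lambda t: srm_score(errors(t, sample), tree_size(t),
                                       m, h_size, delta))
```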

Drawbacks
- Running time: since |T| = O(m), computing all candidate prunings takes O(m^2) time.
- Many passes over the data: a significant drawback for large data sets.

Linear Time Pruning
- A single bottom-up pass, in linear time.
- Uses an SRM-like formula.
- Local soundness.
- Competitive with any pruning.

Algorithm
- Process a node after processing its children.
- Local parameters at node v:
  - T_v: the current sub-tree at v, of size size_v.
  - S_v: the sample reaching v, of size m_v.
  - l_v: the length of the path leading to v.
- Local test: prune T_v down to a leaf when
  obs(T_v, S_v) + a(m_v, size_v, l_v, δ) > obs(root(T_v), S_v)
  where root(T_v) is v kept as a single leaf.
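
A sketch of this single bottom-up pass, reusing the helpers from the earlier sketches (`errors`, `leaf_errors`, `tree_size`) and the penalty function a() sketched after the next two slides; the `h_size` parameter again stands for |H| and is an assumption:

```python
def linear_prune(node, sample, depth, h_size, delta):
    """One bottom-up pass: process children first, then apply the local test."""
    if node.is_leaf():
        return node
    s0 = [(x, y) for x, y in sample if x[node.feature] == 0]
    s1 = [(x, y) for x, y in sample if x[node.feature] == 1]
    node.left = linear_prune(node.left, s0, depth + 1, h_size, delta)
    node.right = linear_prune(node.right, s1, depth + 1, h_size, delta)
    m_v = len(sample)
    if m_v == 0:                      # no data reaches v: keep a single leaf
        node.left = node.right = None
        return node
    size_v = tree_size(node)
    obs_subtree = errors(node, sample) / m_v
    obs_leaf = leaf_errors(node, sample) / m_v
    # Local test: prune when the penalized subtree error is no better
    # than the error of v as a single leaf.
    if obs_subtree + a(m_v, size_v, depth, h_size, delta) > obs_leaf:
        node.left = node.right = None
    return node
```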

The Function a()
- Parameters:
  - paths(H, l): the set of paths of length l over H.
  - trees(H, s): the set of trees of size s over H.
- Formula (up to constants, matching the O(sqrt(r_v/m_v)) bound in the Global Analysis slide below):
  a(m_v, size_v, l_v, δ) = O( sqrt( (log|paths(H, l_v)| + log|trees(H, size_v)| + log(m_v/δ)) / m_v ) )

The Function a(): Finite Class H
- |paths(H, l)| <= |H|^l.
- |trees(H, s)| <= (4|H|)^s.
- Plugging these counting bounds into the formula gives the penalty explicitly.
- Infinite classes: replace the log-cardinality terms by the VC dimension.
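
With these counting bounds, a concrete (illustrative) instantiation of a() for a finite class, used by the `linear_prune` sketch above; the constants are assumptions since the slides only state the asymptotic shape:

```python
import math

def a(m_v, size_v, l_v, h_size, delta):
    """Penalty for the local test, for a finite class H of h_size hypotheses,
    using |paths(H,l)| <= |H|**l and |trees(H,s)| <= (4*|H|)**s."""
    r_v = (l_v * math.log(h_size)            # log |paths(H, l_v)|
           + size_v * math.log(4 * h_size)   # log |trees(H, size_v)|
           + math.log(m_v / delta))
    return math.sqrt(r_v / m_v)
```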

Example
[Figure: the penalty a(m_v, size_v, l_v, δ) for l_v = 3, plotted as a function of size_v and the sample size m_v.]

Local Uniform Convergence
- Sample S; for a condition c, S_c = { x ∈ S | c(x) = 1 } and m_c = |S_c|.
- Finite classes C (conditions) and H (hypotheses).
- True conditional error: e(h|c) = Pr[ h(x) ≠ f(x) | c(x) = 1 ]; obs(h|c) is its observed counterpart on S_c.
- Lemma: with probability 1-δ, e(h|c) is close to obs(h|c), uniformly over all h ∈ H and c ∈ C.
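
The lemma's bound was a formula image; a standard Hoeffding-plus-union-bound form for finite C and H, consistent with the r_v term defined on the next slide but an assumption as to the exact constants, is:

```latex
\Pr\Bigl[\;\exists\, h \in H,\ c \in C:\ \bigl|e(h \mid c) - \mathrm{obs}(h \mid c)\bigr|
  > \sqrt{\tfrac{\ln|H| + \ln|C| + \ln(2/\delta)}{2\,m_c}}\;\Bigr] \;\le\; \delta
```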

Global Analysis: Notation
- T: the original tree (depends on S).
- T*: the pruned tree.
- T_opt: the optimal pruning.
- r_v = (l_v + size_v) log|H| + log(m_v/δ)
- a(m_v, size_v, l_v, δ) = O( sqrt( r_v / m_v ) )

Sub-Tree Property
- Lemma: with probability 1-δ, T* is a sub-tree of T_opt.
- Proof sketch: assume all the local lemmas hold, so every pruning step performed reduces the true error. Suppose T* kept a subtree that lies outside T_opt; then adding that subtree to T_opt would improve it, contradicting the optimality of T_opt.

Comparing T* and T_opt
- Additional pruned nodes: V = { v_1, …, v_t }, the nodes T* prunes but T_opt keeps.
- Additional error:
  e(T*) - e(T_opt) = Σ_i ( e(v_i) - e(T_opt at v_i) ) · Pr[v_i]
  where e(v_i) is the error of v_i as a leaf.
- Claim: with high probability this sum is small.

Analysis
- Lemma: with probability 1-δ, if Pr[v_i] > 12 (l_opt log|H| + log(1/δ)) / m = b, then Pr[v_i] < 2·obs(v_i).
- Proof: relative Chernoff bound, with a union bound over the |H|^l paths.
- Define V' = { v_i ∈ V | Pr[v_i] > b }.
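
For reference, the relative (multiplicative) Chernoff bound invoked here: if obs(v_i) is the empirical frequency of an event of probability Pr[v_i] over m independent draws, then

```latex
\Pr\bigl[\,\mathrm{obs}(v_i) < \Pr[v_i]/2\,\bigr] \;\le\; e^{-m\,\Pr[v_i]/8}
```

so once Pr[v_i] exceeds the threshold b, the failure probability is small enough to survive the union bound over all |H|^l paths.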

Analysis of V - V'
- The contribution of the nodes in V - V' is bounded by size_opt · b, since there are at most size_opt such nodes and each has Pr[v_i] <= b.

Analysis of V'
- Σ_{v ∈ V'} m_v <= l_opt · m (each example is counted at most l_opt times along its path).
- Σ_{v ∈ V'} r_v <= size_opt · ( l_opt log|H| + log(m/δ) ).
- Putting it all together: combining these sums with Pr[v_i] < 2·obs(v_i) bounds the additional error e(T*) - e(T_opt) of the nodes in V', which together with the size_opt · b term above gives the competitiveness guarantee.