Decision Tree Pruning: The Problem of Overfitting and Approaches to It
The error rate on the training data is overly optimistic. Generalization problem: larger trees can be less accurate on unseen test data. Approaches:
- early stopping criteria (e.g., a χ² significance test on candidate splits; see the sketch after this list)
- use a validation set (test accuracy on a held-out subset of examples)
- convert the tree to rules and prune them
- use a complexity measure to optimize the size of the tree
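For the first approach, here is a minimal sketch of a χ² significance test on a candidate split, using scipy; the branch counts and the 0.05 threshold are illustrative assumptions, not from the slides.

# Pre-pruning sketch: test whether a candidate split is statistically significant
# (hypothetical class counts, not from the lecture).
from scipy.stats import chi2_contingency

# Rows: branches of the candidate split; columns: class counts (+, -)
observed = [[8, 2],   # left branch:  8 positive, 2 negative
            [3, 7]]   # right branch: 3 positive, 7 negative

chi2, p_value, dof, expected = chi2_contingency(observed)

# Stop splitting (make this node a leaf) if the split is not significant
if p_value > 0.05:
    print("split not significant; stop here")
else:
    print(f"keep the split (chi2={chi2:.2f}, p={p_value:.3f})")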
Reduced-Error Pruning
Set aside a validation set: withhold a subset (roughly 1/3) of the training data to use for pruning. Note: you should randomize the order of the training examples first.
For each internal node:
- sum the errors the existing subtree makes on the validation examples reaching it
- calculate the error at the same node if it were converted to a leaf with the majority class label
Prune the node with the highest reduction in error. Repeat until error starts to increase; errors have to be re-calculated for all nodes after each pass.
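A minimal sketch of this procedure, assuming a simple tree of categorical splits; the Node class, field names, and helper functions below are hypothetical illustrations, not the lecture's implementation.

# Reduced-error pruning sketch on a toy decision-tree representation.
class Node:
    def __init__(self, majority, feature=None, children=None):
        self.majority = majority        # majority class label at this node
        self.feature = feature          # attribute tested here (None for a leaf)
        self.children = children or {}  # attribute value -> child Node

    def is_leaf(self):
        return not self.children

def predict(node, x):
    # Follow branches to a leaf; fall back to the majority class on unseen values.
    while not node.is_leaf():
        child = node.children.get(x.get(node.feature))
        if child is None:
            return node.majority
        node = child
    return node.majority

def subtree_errors(node, examples):
    # Validation errors made by the subtree rooted at this node.
    return sum(predict(node, x) != y for x, y in examples)

def leaf_errors(node, examples):
    # Validation errors if this node were collapsed to a majority-class leaf.
    return sum(y != node.majority for _, y in examples)

def route(node, examples):
    # Distribute the examples reaching this node among its children.
    buckets = {value: [] for value in node.children}
    for x, y in examples:
        value = x.get(node.feature)
        if value in buckets:
            buckets[value].append((x, y))
    return buckets

def internal_nodes(node, examples):
    # Yield (internal node, validation examples reaching it) pairs, top-down.
    if node.is_leaf():
        return
    yield node, examples
    for value, subset in route(node, examples).items():
        yield from internal_nodes(node.children[value], subset)

def reduced_error_pruning(root, validation):
    while True:
        best_gain, best_node = 0, None
        for node, reaching in internal_nodes(root, validation):
            gain = subtree_errors(node, reaching) - leaf_errors(node, reaching)
            if gain > best_gain:
                best_gain, best_node = gain, node
        if best_node is None:     # no prune reduces validation error: stop
            break
        best_node.children = {}   # collapse the best node into a leaf

The loop re-scores every remaining internal node on each pass, matching the note above that errors have to be re-calculated after every pruning step.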
[Figure: example subtree for reduced-error pruning, with class counts labelling the nodes: 2+,3-; 4+,2-; 3+,2-; 2+,1-; 2+; 2+,2-; 2-]
Pessimistic Pruning (see Mingers, 1989)
Avoids the need for a validation set, so the tree can be trained on more examples. Uses a conservative estimate of the true error at each node, based on the training examples. "Continuity correction" to the error count at each node: add N/2 to the observed errors, where N is the number of leaves in the subtree. Prune a node unless the estimated error rate of its subtree is more than one standard error below that of the pruned node: r'(subtree) < r'(pruned) - SE.
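A small worked example of this test with made-up counts; the standard-error formula used below is the usual binomial standard error of the corrected error count, which is an assumption beyond what the slide states.

# Pessimistic-pruning test at one node (illustrative counts).
import math

n = 100          # training examples reaching the node
leaves = 4       # number of leaves in the subtree
e_subtree = 8    # observed training errors summed over the subtree's leaves
e_pruned = 12    # observed training errors if the node became a single leaf

# Continuity correction: add 1/2 per leaf (comparing corrected counts over the
# same n is equivalent to comparing corrected error rates).
corr_subtree = e_subtree + 0.5 * leaves   # 10.0
corr_pruned = e_pruned + 0.5              # 12.5

# Standard error of the corrected subtree error count (binomial assumption)
se = math.sqrt(corr_subtree * (n - corr_subtree) / n)   # 3.0

# Keep the subtree only if it beats the pruned node by more than one SE
if corr_subtree < corr_pruned - se:
    print("keep the subtree")
else:
    print("prune: replace the subtree with a leaf")   # here: 10.0 >= 9.5, so prune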
Minimum-Error Pruning
Recursive (bottom-up): decide whether to convert each internal node to a leaf. Error estimate (on training data): for each node, compare the estimated error if the node is pruned to a leaf against the weighted average of the estimated error rates of its subtrees, and prune if the leaf estimate is no worse. The expected error rate at a node is E = (n - n_c + k - 1) / (n + k), where n is the number of examples at the node, k is the number of classes, and n_c is the number of examples in the majority class.
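A worked instance of this estimate; the node and child class counts below are made up for illustration (k = 2 classes).

# Minimum-error pruning decision at one node (illustrative counts).
def expected_error(n, n_c, k):
    # Expected error rate: (n - n_c + k - 1) / (n + k)
    return (n - n_c + k - 1) / (n + k)

k = 2                                   # two classes (+ / -)
e_pruned = expected_error(14, 10, k)    # parent node 10+,4-  -> 0.3125

e_left = expected_error(9, 8, k)        # left child 8+,1-    -> ~0.18
e_right = expected_error(5, 3, k)       # right child 2+,3-   -> ~0.43
e_subtree = (9 / 14) * e_left + (5 / 14) * e_right   # weighted average ~0.27

# Keep the split only if the subtree's weighted estimate is lower
print("prune" if e_pruned <= e_subtree else "keep subtree")   # keep subtree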
Rule Post-Pruning
Convert the tree to rules, one for each path from the root to a leaf. From each rule, remove the antecedent whose deletion most decreases the error rate on a validation set (or use a pessimistic estimate of error); repeat until the error starts to increase. Sort the final rule set by accuracy.
Example rules (PlayTennis tree):
Outlook=sunny ^ Humidity=high -> No
Outlook=sunny ^ Humidity=normal -> Yes
Outlook=overcast -> Yes
Outlook=rain ^ Wind=strong -> No
Outlook=rain ^ Wind=weak -> Yes
Compare the first rule to its generalizations Outlook=sunny -> No and Humidity=high -> No: calculate the accuracy of all three versions on the validation set and keep the best one.
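A minimal sketch of the antecedent-dropping loop for one rule; the rule representation, helper names, and the tiny validation set are hypothetical illustrations.

# Rule post-pruning sketch: greedily drop antecedents while accuracy does not fall.
def matches(antecedents, example):
    return all(example.get(attr) == val for attr, val in antecedents)

def rule_accuracy(antecedents, label, validation):
    covered = [(x, y) for x, y in validation if matches(antecedents, x)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(antecedents, label, validation):
    best = list(antecedents)
    while len(best) > 1:
        # All versions of the rule with one antecedent removed
        candidates = [best[:i] + best[i + 1:] for i in range(len(best))]
        top = max(candidates, key=lambda c: rule_accuracy(c, label, validation))
        if rule_accuracy(top, label, validation) >= rule_accuracy(best, label, validation):
            best = top           # removal did not increase error: keep it
        else:
            break                # any further removal hurts: stop
    return best

rule_antecedents = [("Outlook", "sunny"), ("Humidity", "high")]
validation = [
    ({"Outlook": "sunny", "Humidity": "high", "Wind": "weak"}, "No"),
    ({"Outlook": "sunny", "Humidity": "normal", "Wind": "strong"}, "Yes"),
    ({"Outlook": "rain", "Humidity": "high", "Wind": "strong"}, "No"),
]
print(prune_rule(rule_antecedents, "No", validation))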
Cost-Complexity Pruning (see Mingers, 1989; 'err-comp')
On the training examples the initial tree has no errors, but replacing subtrees with leaves increases the error count. "Cost-complexity" is a measure of the average error reduced per leaf. Calculate the number of errors at each node if it were collapsed to a leaf, and compare it to the errors made by the subtree's leaves, taking into account that the subtree uses more nodes.
Example (node 26, 200 training examples, a subtree with 4 leaves):
R(26, pruned) = 15/200, R(26, subtree) = 10/200
Cost-complexity is balanced when R(n, pruned) + α = R(n, subtree) + α·N(subtree):
15/200 + α = 10/200 + 4α, so α ≈ 0.0083
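The slide's arithmetic, written out as a short check:

# Cost-complexity break-even alpha for the node-26 example above.
n_examples = 200
r_pruned = 15 / n_examples    # error rate if node 26 becomes a leaf
r_subtree = 10 / n_examples   # error rate of node 26's subtree
n_leaves = 4                  # leaves in that subtree

# R(pruned) + alpha = R(subtree) + alpha * N(subtree)
alpha = (r_pruned - r_subtree) / (n_leaves - 1)
print(f"alpha = {alpha:.4f}")   # 0.0083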
Calculate α for each internal node; prune the node with the smallest α.
Repeat, creating a series of trees T0, T1, T2, ... of decreasing size. Pick the tree with the minimum error on a validation set, or the smallest tree within one standard error of that minimum (the one-standard-error rule).
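For reference, scikit-learn implements this minimal cost-complexity pruning procedure; a brief sketch with an illustrative dataset, using the plain minimum-validation-error selection rule (the one-standard-error variant is omitted):

# Cost-complexity pruning path and validation-based selection with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# The pruning path gives the alphas at which successive nodes drop out (T0, T1, ...)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print(best_alpha, best_acc)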