Published by Kimberly Morton. Modified over 9 years ago.
1 Universidad de Buenos Aires, Master's in Data Mining and Knowledge Discovery
   Machine Learning, 5: Decision Tree Induction (2/2)
   Eduardo Poggi (eduardopoggi@yahoo.com.ar), Ernesto Mislej (emislej@google.com)
   Autumn 2005
2 Decision Trees: Definition; Mechanism; Splitting Functions; Hypothesis Space and Bias; Issues in Decision-Tree Learning; Avoiding overfitting through pruning; Numeric and Missing attributes
3 Example of a Decision Tree
   Example: learning to classify stars.
   [Figure: decision tree with internal nodes Luminosity and Mass, branches labeled > T1, <= T1, > T2, <= T2, and leaves Type A, Type B, Type C.]
4 Short vs Long Hypotheses
   We mentioned that a top-down, greedy approach to constructing decision trees denotes a preference for short hypotheses over long hypotheses.
   Why is this the right thing to do?
   Occam's Razor: prefer the simplest hypothesis that fits the data. The principle dates back to William of Occam (1320) and is still the subject of great debate in the philosophy of science.
5 Issues in Decision Tree Learning
   Practical issues while building a decision tree can be enumerated as follows:
   1) How deep should the tree be?
   2) How do we handle continuous attributes?
   3) What is a good splitting function?
   4) What happens when attribute values are missing?
   5) How do we improve the computational efficiency?
6 How deep should the tree be? Overfitting the Data
   A tree overfits the data if we let it grow deep enough that it begins to capture "aberrations" in the data that harm its predictive power on unseen examples.
   [Figure: plot with labels size, humidity, and thresholds t2, t3. Annotation: possibly just noise, but the tree is grown deeper to capture these examples.]
7 Overfitting the Data: Definition
   Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h' in H such that h has better classification accuracy than h' on D but worse classification accuracy than h' on a separate dataset D' (for example, unseen test data).
   [Figure: accuracy (0.5 to 1.0) versus size of the tree, with curves for training data and testing data and a region labeled overfitting.]
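
The definition can be checked mechanically once two hypotheses and two datasets are at hand. The following is a minimal sketch, not taken from the slides: the function names, the threshold hypotheses, and the toy data (including the one noisy example) are all made up for illustration.

```python
def accuracy(hypothesis, dataset):
    """Fraction of (x, label) pairs the hypothesis classifies correctly."""
    correct = sum(1 for x, label in dataset if hypothesis(x) == label)
    return correct / len(dataset)


def overfits(h, h_alt, D, D_prime):
    """True if h beats h_alt on D but loses to h_alt on D_prime."""
    return (accuracy(h, D) > accuracy(h_alt, D)
            and accuracy(h, D_prime) < accuracy(h_alt, D_prime))


# Hypothetical data: the underlying concept is "B if x >= 5, else A".
# The training example (3, "B") is noise.
D = [(1, "A"), (3, "B"), (6, "B"), (8, "B")]
D_prime = [(2, "A"), (3, "A"), (4, "A"), (7, "B")]


def h(x):
    """Hypothesis that bends its threshold to fit the noisy example."""
    return "B" if x >= 3 else "A"


def h_alt(x):
    """Simpler hypothesis that ignores the noisy example."""
    return "B" if x >= 5 else "A"


h_overfits = overfits(h, h_alt, D, D_prime)
```

Here `h` is perfect on `D` but wrong on half of `D_prime`, while `h_alt` misses only the noisy training example, so `h_overfits` is true.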
8 Causes of Overfitting the Data
   What causes a hypothesis to overfit the data?
   1) Random errors or noise: examples have an incorrect class label or incorrect attribute values.
   2) Coincidental patterns: by chance, examples seem to deviate from a pattern because the sample is small.
   Overfitting is a serious problem that can cause strong performance degradation.
9 Solutions for Overfitting the Data
   There are two main classes of solutions:
   1) Stop growing the tree early, before it begins to overfit the data.
      + In practice this solution is hard to implement because it is not clear what a good stopping point is.
   2) Grow the tree until the algorithm stops, even if overfitting shows up, then prune the tree as a post-processing step.
      + This method has found great popularity in the machine learning community.
10 Decision Tree Pruning
   1) Grow the tree to learn the training data.
   2) Prune the tree to avoid overfitting the data.
11 Methods to Validate the New Tree
   1. Training and Validation Set Approach
      Divide dataset D into a training set TR and a testing set TE.
      Build a decision tree on TR.
      Test pruned trees on TE to decide the best final tree.
   [Figure: dataset D split into Training TR and Testing TE.]
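
The split of D into TR and TE can be sketched as below. This is an illustrative helper rather than anything prescribed by the slides: the name `split_dataset`, the fixed seed, and the toy dataset are assumptions, and the 2/3 default mirrors the proportion the deck gives later for TR.

```python
import random


def split_dataset(D, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of D and split it into (TR, TE)."""
    rng = random.Random(seed)   # local generator, so the split is repeatable
    shuffled = list(D)          # copy: the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of nine example identifiers.
D = list(range(9))
TR, TE = split_dataset(D)
```

Shuffling before cutting matters when D is stored in some order (for instance sorted by class), since a plain slice would then give TR and TE different class mixes.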
12 Methods to Validate the New Tree
   2. Use a statistical test
      Use the whole dataset D for training.
      Use a statistical test (e.g., chi-squared) to decide whether to expand a node.
   [Figure: a tree node with the question "Should I expand or not?"]
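
One way to read "use a statistical test to decide whether to expand" is a chi-squared test of independence between the branch an example falls into and its class. The sketch below assumes a binary split and two classes (one degree of freedom) and a 0.05 significance level, for which the critical value is 3.841; the function names and the example counts are hypothetical.

```python
def chi_squared_statistic(branches):
    """Chi-squared statistic for a candidate split.

    branches: one list of class counts per branch, e.g. [[8, 2], [1, 9]].
    """
    n_classes = len(branches[0])
    class_totals = [sum(b[c] for b in branches) for c in range(n_classes)]
    total = sum(class_totals)
    statistic = 0.0
    for branch in branches:
        branch_total = sum(branch)
        for c in range(n_classes):
            # Count expected if class were independent of the branch.
            expected = branch_total * class_totals[c] / total
            if expected > 0:
                statistic += (branch[c] - expected) ** 2 / expected
    return statistic


# Critical value for 1 degree of freedom at the 0.05 level.
# Only valid for 2 branches x 2 classes; other shapes need another value.
CRITICAL_95_DF1 = 3.841


def should_expand(branches, critical=CRITICAL_95_DF1):
    """Expand the node only if the split is statistically significant."""
    return chi_squared_statistic(branches) > critical


informative = [[10, 0], [0, 10]]   # split separates the classes perfectly
uninformative = [[5, 5], [5, 5]]   # split leaves the class mix unchanged
```

With these counts the informative split has a statistic of 20 and is expanded, while the uninformative one has a statistic of 0 and is not.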
13 Methods to Validate the New Tree
   3. Use an encoding scheme to capture the size of the tree and the errors made by the tree.
      Use the whole dataset D for training.
      Use the encoding scheme to know when to stop growing the tree.
      The method is known as the minimum description length principle.
14 Training and Validation
   There are two approaches:
   A. Reduced Error Pruning
   B. Rule Post-Pruning
   [Figure: dataset D split into Training TR (normally 2/3 of D) and Testing TE (normally 1/3 of D).]
15 Reduced Error Pruning
   Main idea:
   1) Consider all internal nodes in the tree.
   2) For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it does not harm accuracy on the validation set.
   3) Pick the node n* that yields the best performance and prune its subtree.
   4) Go back to (2) until no more improvements are possible.
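
The four steps can be sketched as follows. The `Node` class, the helper names, and the toy tree and validation set are hypothetical. Following step 2, a prune is accepted when validation accuracy does not drop; ties between equally good candidates go to the node nearest the root, which is a choice of this sketch rather than something the slides specify.

```python
class Node:
    """Hypothetical tree node; `majority` is the most common training class
    among the examples that reached this node."""

    def __init__(self, majority, attribute=None, children=None, label=None):
        self.majority = majority
        self.attribute = attribute       # attribute tested (internal nodes)
        self.children = children or {}   # attribute value -> child Node
        self.label = label               # class label (leaves only)

    def is_leaf(self):
        return self.label is not None


def leaf(label):
    return Node(majority=label, label=label)


def classify(node, x):
    while not node.is_leaf():
        child = node.children.get(x[node.attribute])
        if child is None:                # unseen value: fall back to majority
            return node.majority
        node = child
    return node.label


def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)


def internal_nodes(node):
    if node.is_leaf():
        return []
    found = [node]
    for child in node.children.values():
        found.extend(internal_nodes(child))
    return found


def reduced_error_prune(tree, validation):
    """Prune in place until every remaining prune would hurt accuracy."""
    while True:
        base = accuracy(tree, validation)
        best_node, best_acc = None, None
        for node in internal_nodes(tree):
            node.label = node.majority   # temporarily treat node as a leaf
            acc = accuracy(tree, validation)
            node.label = None            # restore the subtree
            if acc >= base and (best_acc is None or acc > best_acc):
                best_node, best_acc = node, acc
        if best_node is None:
            return tree
        best_node.label = best_node.majority   # make the prune permanent
        best_node.children = {}


# Hypothetical tree: the test on "b" only captured noise in the training data.
inner = Node(majority="A", attribute="b", children={0: leaf("A"), 1: leaf("B")})
tree = Node(majority="A", attribute="a", children={0: leaf("B"), 1: inner})
validation = [({"a": 0, "b": 0}, "B"), ({"a": 0, "b": 1}, "B"),
              ({"a": 1, "b": 0}, "A"), ({"a": 1, "b": 1}, "A")]

before = accuracy(tree, validation)
pruned = reduced_error_prune(tree, validation)
after = accuracy(pruned, validation)
```

In this example the subtree testing `b` is replaced by a leaf, while the root survives because collapsing it would lower validation accuracy.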
16 Example
   [Figure: the original tree and the possible trees after pruning.]
17 Example
   [Figure: the pruned tree and the possible trees after the 2nd pruning.]
18 Example
   The process continues until no improvement is observed on the validation set.
   [Figure: accuracy (0.5 to 1.0) on the validation data versus size of the tree, with a mark labeled "Stop pruning the tree".]
19 Reduced Error Pruning: Disadvantages
   If the original dataset is small, setting examples aside for validation may leave you with few examples for training.
   [Figure: a small dataset D split into Training TR and Testing TE; the training set is too small and so is the validation set.]
20 Rule Post-Pruning
   Main idea:
   1) Convert the tree into a rule-based system.
   2) Prune each rule by removing redundant conditions.
   3) Sort the rules by accuracy.
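
Steps 2 and 3 can be sketched as below, under stated assumptions: a rule is a pair (conditions, class), a condition counts as redundant when dropping it does not lower the rule's accuracy on the validation data, and accuracy is measured over the examples the rule covers. The names and the toy data are hypothetical.

```python
def rule_accuracy(conditions, label, data):
    """Accuracy of a rule over the examples whose attributes match it."""
    covered = [y for x, y in data
               if all(x[attr] == value for attr, value in conditions)]
    if not covered:
        return 0.0
    return sum(y == label for y in covered) / len(covered)


def prune_rule(conditions, label, data):
    """Greedily drop conditions whose removal does not lower accuracy."""
    conditions = list(conditions)
    changed = True
    while changed and conditions:
        changed = False
        current = rule_accuracy(conditions, label, data)
        for condition in conditions:
            shorter = [c for c in conditions if c != condition]
            if rule_accuracy(shorter, label, data) >= current:
                conditions = shorter
                changed = True
                break                    # re-evaluate from the shorter rule
    return tuple(conditions), label


def sort_rules(rules, data):
    """Order rules from most to least accurate on the validation data."""
    return sorted(rules, key=lambda r: rule_accuracy(r[0], r[1], data),
                  reverse=True)


# Hypothetical validation data in which x2 is irrelevant to the class.
validation = [({"x1": 0, "x2": 0}, "A"), ({"x1": 0, "x2": 1}, "A"),
              ({"x1": 1, "x2": 0}, "C"), ({"x1": 1, "x2": 1}, "C")]

pruned = prune_rule((("x1", 0), ("x2", 0)), "A", validation)
ranked = sort_rules([((("x2", 0),), "A"), ((("x1", 1),), "C")], validation)
```

The rule "x1 = 0 and x2 = 0 implies A" loses its `x2` condition because the shorter rule is just as accurate, while the `x1` condition stays because removing it would halve the accuracy.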
21 Example
   [Figure: original tree with root x1, internal nodes x2 and x3, branches labeled 0 and 1, and leaves A, B, A, C.]
   Rules:
      ~x1 & ~x2 -> Class A
      ~x1 & x2 -> Class B
      x1 & ~x3 -> Class A
      x1 & x3 -> Class C
   Possible rules after pruning (based on validation set):
      ~x1 -> Class A
      ~x1 & x2 -> Class B
      ~x3 -> Class A
      x1 & x3 -> Class C
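
The conversion from tree to rules is one rule per root-to-leaf path. The sketch below reproduces the four rules listed on this slide; the tree shape is inferred from those rules (branch 0 of x1 leads to the test on x2, branch 1 to the test on x3), and the nested-tuple representation and function names are assumptions made for illustration.

```python
def tree_to_rules(node, conditions=()):
    """Return one (conditions, class) rule per root-to-leaf path."""
    if isinstance(node, str):            # a leaf is just its class name
        return [(conditions, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, conditions + ((attribute, value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule in the slide's notation, with ~ marking a 0 value."""
    literals = [name if value == 1 else "~" + name
                for name, value in conditions]
    return " & ".join(literals) + " -> Class " + label


# Internal node: (attribute, {value: subtree}); leaf: class name.
tree = ("x1", {0: ("x2", {0: "A", 1: "B"}),
               1: ("x3", {0: "A", 1: "C"})})

rules = tree_to_rules(tree)
rendered = [format_rule(conditions, label) for conditions, label in rules]
```

Because each rule carries its own copy of the path conditions, a condition can later be dropped from one rule without affecting the others, which is not possible when pruning the tree itself.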
22 Advantages of Rule Post-Pruning
   The language is more expressive.
   Interpretability improves.
   Pruning is more flexible.
   In practice this method yields high accuracy.
23 Decision Trees: Definition; Mechanism; Splitting Functions; Hypothesis Space and Bias; Issues in Decision-Tree Learning; Avoiding overfitting through pruning; Numeric and Missing attributes
24 Discretizing Continuous Attributes
   Example: attribute temperature.
   1) Order all values in the training set.
   2) Consider only those cut points where there is a change of class.
   3) Choose the cut point that maximizes information gain.
   temperature values: 97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2
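
The three steps can be sketched as follows. The temperature values are the ones on the slide, but the class labels are invented for illustration, since the transcript does not preserve them. Placing each cut point midway between two neighboring values is a common convention and an assumption here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def candidate_cut_points(values, labels):
    """Midpoints between consecutive sorted values whose classes differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2)
    return cuts


def information_gain(values, labels, cut):
    """Entropy reduction from splitting into value <= cut and value > cut."""
    left = [c for v, c in zip(values, labels) if v <= cut]
    right = [c for v, c in zip(values, labels) if v > cut]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_cut_point(values, labels):
    return max(candidate_cut_points(values, labels),
               key=lambda cut: information_gain(values, labels, cut))


temperature = [97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2]
# Hypothetical class labels, NOT from the slides.
fever = ["no", "no", "no", "no", "yes", "no",
         "yes", "yes", "yes", "yes", "yes"]

cuts = candidate_cut_points(temperature, fever)
best = best_cut_point(temperature, fever)
```

With these made-up labels only three of the ten gaps between neighboring values are candidates, and the cut between 99.0 and 99.2 gives the highest gain. Restricting the search to class changes is what keeps step 3 cheap.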
25 Missing Attribute Values
   We are at a node n in the decision tree. Different approaches:
   1) Assign the most common value for that attribute in node n.
   2) Assign the most common value in n among examples with the same classification as X.
   3) Assign a probability to each value of the attribute based on the frequency of those values in node n. Each fraction is propagated down the tree.
   Example: X = (luminosity > T1, mass = ?)
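
The three approaches differ only in which counts they consult. The sketch below is illustrative: the examples at node n, the categorical mass values "high" and "low", and the function names are all hypothetical, and a missing value is represented as `None`.

```python
from collections import Counter


def value_counts(examples, attribute, label=None):
    """Count observed values of `attribute`, optionally within one class."""
    return Counter(x[attribute] for x, y in examples
                   if x[attribute] is not None
                   and (label is None or y == label))


def most_common_value(examples, attribute, label=None):
    """Approach 1 (label=None) or approach 2 (label = class of X)."""
    return value_counts(examples, attribute, label).most_common(1)[0][0]


def value_fractions(examples, attribute):
    """Approach 3: weight with which X is sent down each branch."""
    counts = value_counts(examples, attribute)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Hypothetical examples that reached node n, as (attributes, class) pairs.
node_examples = [({"mass": "high"}, "A"), ({"mass": "high"}, "A"),
                 ({"mass": "high"}, "B"), ({"mass": "low"}, "B"),
                 ({"mass": "low"}, "B"), ({"mass": None}, "B")]

by_node = most_common_value(node_examples, "mass")
by_class = most_common_value(node_examples, "mass", label="B")
fractions = value_fractions(node_examples, "mass")
```

Here approach 1 fills in "high", approach 2 fills in "low" for an example of class B, and approach 3 sends 0.6 of the example down the "high" branch and 0.4 down the "low" branch.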
26 Summary
   Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis.
   The hypothesis space is very powerful: all possible DNF formulas.
   We prefer shorter trees to larger trees.
   Overfitting is an important issue in decision-tree induction. Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning.
   Techniques exist to deal with continuous attributes and missing attribute values.
27 Assignments
   Read Chapter 3 of Mitchell, from Section 3.7 onward.