Artificial Intelligence
7. Decision trees
Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka
Outline
- What is a decision tree?
- How to build a decision tree
  - Entropy
  - Information gain
- Overfitting
  - Generalization performance
  - Pruning
Lecture slides: http://www.jaist.ac.jp/~tsuruoka/lectures/
Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)
- A decision tree represents a disjunction of conjunctions
- Successfully applied to a broad range of tasks
  - Diagnosing medical cases
  - Assessing the credit risk of loan applications
- Nice characteristics
  - Understandable to humans
  - Robust to noise
A decision tree
Concept: PlayTennis

Outlook
- Sunny -> Humidity
  - High -> No
  - Normal -> Yes
- Overcast -> Yes
- Rain -> Wind
  - Strong -> No
  - Weak -> Yes
Classification by a decision tree
Instance: <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
Following the tree above: Outlook = Sunny, so test Humidity; Humidity = High, so the instance is classified as No.
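Concretely, the walk down the tree takes only a few lines of code. A minimal Python sketch (not from the slides): the tree is hand-encoded as nested dicts, and classify() is an illustrative helper, not the lecture's notation.

```python
# The PlayTennis tree above, hand-coded as nested dicts
# (an illustrative encoding, not the lecture's).
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(node, instance):
    # Walk down the tree, testing one attribute per internal node.
    while isinstance(node, dict):
        attribute = next(iter(node))            # attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node                                 # leaf label: "Yes" / "No"

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(tree, instance))                 # -> No
```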
Disjunction of conjunctions
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Each path from the root to a Yes leaf corresponds to one conjunction.
Problems suited to decision trees
- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values
Training data (the PlayTennis data of Mitchell (1997), Chapter 3)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
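For the sketches on the following slides, here is the same table as Python data (a convenience for illustration; the column order follows the table above).

```python
# The PlayTennis training data as Python tuples:
# (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]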
Which attribute should be tested at each node?
- We want to build a small decision tree
- Information gain: how well a given attribute separates the training examples according to their target classification
  - Defined as the reduction in entropy
- Entropy: the (im)purity of an arbitrary collection of examples
Entropy
If there are only two classes (positive and negative):
  $Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$
where $p_{+}$ and $p_{-}$ are the proportions of positive and negative examples in $S$.
In general, for $c$ classes:
  $Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$
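A small Python sketch of the definition, assuming the labels are given as a plain list of class names; entropy() is an illustrative helper, not library code.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = sum_i -p_i * log2(p_i) over the class proportions p_i.
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

# The [9+, 5-] PlayTennis collection:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # -> 0.94
```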
Information gain
The expected reduction in entropy achieved by splitting the training examples on attribute $A$:
  $Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
where $S_v$ is the subset of $S$ for which attribute $A$ has value $v$.
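A matching sketch of the gain formula, reusing the entropy() helper above; rows are tuples as in the earlier DATA list, and the attribute is addressed by column index (an assumption of this encoding).

```python
from collections import defaultdict

def information_gain(rows, attribute, label=-1):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v),
    # where v ranges over the values of attribute A (a column index here).
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[label])
    labels = [row[label] for row in rows]
    return entropy(labels) - sum(len(s) / len(rows) * entropy(s)
                                 for s in subsets.values())
```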
Example
The PlayTennis training data contain 9 positive and 5 negative examples, [9+, 5-]:
  $Entropy(S) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) = 0.940$
Computing information gain
Splitting on Humidity:
- High -> [3+, 4-], Entropy = 0.985
- Normal -> [6+, 1-], Entropy = 0.592
  $Gain(S, Humidity) = 0.940 - (7/14) \cdot 0.985 - (7/14) \cdot 0.592 = 0.151$
Splitting on Wind:
- Weak -> [6+, 2-], Entropy = 0.811
- Strong -> [3+, 3-], Entropy = 1.000
  $Gain(S, Wind) = 0.940 - (8/14) \cdot 0.811 - (6/14) \cdot 1.000 = 0.048$
Which attribute is the best classifier?
Information gain for each of the four attributes:
- Gain(S, Outlook) = 0.246
- Gain(S, Humidity) = 0.151
- Gain(S, Wind) = 0.048
- Gain(S, Temperature) = 0.029
Outlook yields the largest information gain, so it is tested at the root.
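Rerunning the computation with the sketches from the earlier slides reproduces these numbers up to rounding:

```python
# Recompute the gains over DATA with information_gain() from above.
for index, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    print(name, round(information_gain(DATA, index), 3))
# -> Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (0.247/0.152 vs. the slide's 0.246/0.151: the slide rounds the
#  intermediate entropies to three digits before combining them)
```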
Splitting the training data with Outlook
{D1, D2, …, D14} [9+, 5-]
- Sunny: {D1, D2, D8, D9, D11} [2+, 3-] -> ? (needs further splitting)
- Overcast: {D3, D7, D12, D13} [4+, 0-] -> Yes
- Rain: {D4, D5, D6, D10, D14} [3+, 2-] -> ? (needs further splitting)
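The recursive construction, sketched as code reusing information_gain() from above. The id3() function and its dict encoding of the tree are illustrative assumptions; the full ID3 algorithm in Mitchell also handles cases (e.g. empty branches) that this simplified version never reaches.

```python
from collections import Counter

def id3(rows, attributes):
    # Choose the attribute with the highest information gain,
    # split the examples on its values, and recurse on each subset.
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:                  # pure subset -> leaf
        return labels[0]
    if not attributes:                         # nothing left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a))
    rest = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], rest)
                   for value in {row[best] for row in rows}}}

# Column 0 is Outlook; it is chosen at the root, exactly as above.
print(id3(DATA, [0, 1, 2, 3]))
```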
Overfitting
Growing each branch of the tree deeply enough to classify the training examples perfectly is not a good strategy: the resulting tree may overfit the training data.
Overfitting: the tree explains the training data very well but performs poorly on new data.
Alleviating the overfitting problem
Several approaches:
- Stop growing the tree earlier
- Post-prune the tree
How can we evaluate the classification performance of the tree on new data?
- Separate the available data into two sets of examples: a training set and a validation (development) set
Validation (development) set
Use a portion of the original training data to estimate the generalization performance.

Before: [ Original training set ][ Test set ]
After:  [ Training set ][ Validation set ][ Test set ]
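A sketch of this split, reusing DATA and id3() from the earlier slides. With only 14 examples it is purely illustrative; the 70/30 ratio, the random seed, and the fallback label for attribute values unseen in the training split are all arbitrary assumptions.

```python
import random

random.seed(0)
rows = DATA[:]                    # in practice, the original training set
random.shuffle(rows)
cut = int(0.7 * len(rows))
train, validation = rows[:cut], rows[cut:]

tree = id3(train, [0, 1, 2, 3])   # train only on the training portion

def predict(node, row):
    # Follow the tree; fall back to "No" for attribute values that
    # never occurred in the training split (an assumption, not ID3).
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches.get(row[attribute], "No")
    return node

# Estimate generalization performance on the held-out validation set.
accuracy = sum(predict(tree, row) == row[-1]
               for row in validation) / len(validation)
print(f"validation accuracy: {accuracy:.2f}")
```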