1 Artificial Intelligence 7. Decision trees
Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka

2 Outline
What is a decision tree?
How to build a decision tree
  Entropy
  Information gain
Overfitting
  Generalization performance
  Pruning
Lecture slides

3 Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)
A decision tree represents a disjunction of conjunctions of constraints on attribute values.
Successfully applied to a broad range of tasks: diagnosing medical cases, assessing the credit risk of loan applicants.
Nice characteristics: understandable to humans, robust to noise.

4 A decision tree
Concept: PlayTennis
Outlook?
  Sunny -> Humidity?
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind?
    Strong -> No
    Weak -> Yes

5 Classification by a decision tree
Instance: <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
Starting at the root, Outlook = Sunny leads to the Humidity node, and Humidity = High leads to the leaf No.
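The walk from root to leaf can be sketched in code. This is a minimal illustration, not from the lecture: the tree is encoded as nested dicts and `classify` is a hypothetical helper.

```python
# The PlayTennis tree from the slides, as nested dicts:
# {attribute: {value: subtree-or-leaf-label}}
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Test one attribute per node, following the branch that matches
    the instance, until a leaf label (a plain string) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                 # attribute tested here
        node = node[attribute][instance[attribute]]  # follow matching branch
    return node

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(tree, instance))  # -> No
```

Note that Temperature is never consulted: the tree only tests the attributes on the path taken.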

6 Disjunction of conjunctions
The tree above is equivalent to:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

7 Problems suited to decision trees
Instances are represented by attribute-value pairs
The target function has discrete output values
Disjunctive descriptions may be required
The training data may contain errors
The training data may contain missing attribute values

8 Training data
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

9 Which attribute should be tested at each node?
We want to build a small decision tree.
Information gain: how well a given attribute separates the training examples according to their target classification, measured as a reduction in entropy.
Entropy: the (im)purity of an arbitrary collection of examples.

10 Entropy
If there are only two classes:
Entropy(S) = -p+ log2 p+ - p- log2 p-
where p+ and p- are the proportions of positive and negative examples in S.
In general, for c classes:
Entropy(S) = sum_{i=1}^{c} -p_i log2 p_i
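A minimal sketch of the two-class entropy (the function name is illustrative, not from the lecture):

```python
import math

def entropy(pos, neg):
    """Entropy, in bits, of a collection with `pos` positive
    and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                      # 0 log 0 is taken to be 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(round(entropy(9, 5), 3))   # -> 0.94  (the 14 PlayTennis examples)
print(entropy(7, 7))             # -> 1.0   (maximum impurity)
print(entropy(14, 0))            # -> 0.0   (pure collection)
```

Entropy is 0 for a pure collection and peaks at 1 bit when the two classes are equally represented.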

11 Information Gain
The expected reduction in entropy achieved by splitting the training examples:
Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
where S_v is the subset of S for which attribute A has value v.

12 Example
For the 14 training examples above (9 positive, 5 negative):
Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) ≈ 0.940

13 Computing Information Gain
Humidity: High [3+,4-], Normal [6+,1-]
Wind: Weak [6+,2-], Strong [3+,3-]
Gain(S, Humidity) ≈ 0.151
Gain(S, Wind) ≈ 0.048
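When the per-branch counts are already known, the gain can be computed straight from them. A minimal sketch (function names are illustrative), using the counts for Humidity and Wind from the 14 PlayTennis examples:

```python
import math

def entropy(pos, neg):
    """Two-class entropy from positive/negative counts."""
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / (pos + neg)
            h -= p * math.log2(p)
    return h

def gain(total_pos, total_neg, branches):
    """Gain = Entropy(S) minus the size-weighted entropy of each branch.
    `branches` maps attribute value -> (pos, neg) counts."""
    total = total_pos + total_neg
    remainder = sum((p + n) / total * entropy(p, n)
                    for p, n in branches.values())
    return entropy(total_pos, total_neg) - remainder

print(round(gain(9, 5, {"High": (3, 4), "Normal": (6, 1)}), 3))  # Humidity -> 0.152 (Mitchell rounds to 0.151)
print(round(gain(9, 5, {"Weak": (6, 2), "Strong": (3, 3)}), 3))  # Wind -> 0.048
```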

14 Which attribute is the best classifier?
Information gain of each attribute:
Gain(S, Outlook) ≈ 0.246
Gain(S, Humidity) ≈ 0.151
Gain(S, Wind) ≈ 0.048
Gain(S, Temperature) ≈ 0.029
Outlook gives the largest gain, so it is tested at the root.

15 Splitting training data with Outlook
{D1, D2, ..., D14} [9+,5-]
Sunny: {D1, D2, D8, D9, D11} [2+,3-] -> ?
Overcast: {D3, D7, D12, D13} [4+,0-] -> Yes
Rain: {D4, D5, D6, D10, D14} [3+,2-] -> ?
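The per-branch counts for this split can be checked mechanically. A small sketch (not from the lecture) grouping the 14 days by Outlook, with the Outlook values and labels taken from Mitchell (1997):

```python
from collections import defaultdict

# day -> (Outlook, PlayTennis)
data = {
    "D1": ("Sunny", "No"),      "D2": ("Sunny", "No"),
    "D3": ("Overcast", "Yes"),  "D4": ("Rain", "Yes"),
    "D5": ("Rain", "Yes"),      "D6": ("Rain", "No"),
    "D7": ("Overcast", "Yes"),  "D8": ("Sunny", "No"),
    "D9": ("Sunny", "Yes"),     "D10": ("Rain", "Yes"),
    "D11": ("Sunny", "Yes"),    "D12": ("Overcast", "Yes"),
    "D13": ("Overcast", "Yes"), "D14": ("Rain", "No"),
}

branches = defaultdict(lambda: [0, 0])   # Outlook value -> [pos, neg]
for outlook, label in data.values():
    branches[outlook][0 if label == "Yes" else 1] += 1

print(dict(branches))  # Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
```

The Overcast branch is pure (all positive), so it becomes a Yes leaf; the other two branches are split further on the remaining attributes.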

16 Overfitting
Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy: the resulting tree may overfit the training data.
Overfitting: the tree explains the training data very well but performs poorly on new data.

17 Alleviating the overfitting problem
Several approaches:
Stop growing the tree earlier
Post-prune the tree
How can we evaluate the classification performance of the tree on new data?
Separate the available data into two sets of examples: a training set and a validation (development) set.

18 Validation (development) set
Use a portion of the original training data to estimate the generalization performance.
Original training set -> training set + validation set
Test set (kept separate, untouched until the final evaluation)
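The hold-out idea can be sketched in a few lines. This is an illustrative split (names and the 20% fraction are assumptions, not from the lecture):

```python
import random

def split_training_data(examples, validation_fraction=0.2, seed=0):
    """Hold out a fraction of the original training set for validation.
    A fixed seed keeps the split reproducible."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - validation_fraction))
    return examples[:cut], examples[cut:]

train, validation = split_training_data(range(14), validation_fraction=0.2)
print(len(train), len(validation))  # -> 11 3
```

The tree is grown (and pruned) using only the training portion; the validation portion estimates performance on unseen data, guiding decisions such as when to stop pruning.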

