1 Artificial Intelligence 7. Decision trees
Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka

2 Outline
What is a decision tree?
How to build a decision tree
  Entropy
  Information gain
Overfitting
  Generalization performance
  Pruning
Lecture slides

3 Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)
A decision tree represents a disjunction of conjunctions of constraints on attribute values.
Successfully applied to a broad range of tasks: diagnosing medical cases, assessing the credit risk of loan applicants.
Nice characteristics: understandable to humans, robust to noise.

4 A decision tree
Concept: PlayTennis
Outlook?
  Sunny -> Humidity?
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind?
    Strong -> No
    Weak -> Yes

5 Classification by a decision tree
Instance: <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
Starting at the root, Outlook = Sunny leads to the Humidity node, and Humidity = High leads to the leaf No.
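The walk from root to leaf can be sketched in code. This is a minimal illustration, not from the lecture: the tree is encoded as nested dicts and `classify` is a hypothetical helper.

```python
# The PlayTennis tree from the slides, as nested dicts:
# {attribute: {value: subtree-or-leaf-label}}
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Test one attribute per node, following the branch that matches
    the instance, until a leaf label (a plain string) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                 # attribute tested here
        node = node[attribute][instance[attribute]]  # follow matching branch
    return node

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(tree, instance))  # -> No
```

Note that Temperature is never consulted: the tree only tests the attributes on the path taken.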

6 Disjunction of conjunctions
The tree above is equivalent to:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

7 Problems suited to decision trees
Instances are represented by attribute-value pairs
The target function has discrete output values
Disjunctive descriptions may be required
The training data may contain errors
The training data may contain missing attribute values

8 Training data
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

9 Which attribute should be tested at each node?
We want to build a small decision tree.
Information gain: how well a given attribute separates the training examples according to their target classification, measured as a reduction in entropy.
Entropy: the (im)purity of an arbitrary collection of examples.

10 Entropy
If there are only two classes:
Entropy(S) = -p+ log2 p+ - p- log2 p-
where p+ and p- are the proportions of positive and negative examples in S.
In general, for c classes:
Entropy(S) = sum_{i=1}^{c} -p_i log2 p_i
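A minimal sketch of the two-class entropy (the function name is illustrative, not from the lecture):

```python
import math

def entropy(pos, neg):
    """Entropy, in bits, of a collection with `pos` positive
    and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                      # 0 log 0 is taken to be 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(round(entropy(9, 5), 3))   # -> 0.94  (the 14 PlayTennis examples)
print(entropy(7, 7))             # -> 1.0   (maximum impurity)
print(entropy(14, 0))            # -> 0.0   (pure collection)
```

Entropy is 0 for a pure collection and peaks at 1 bit when the two classes are equally represented.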

11 Information Gain
The expected reduction in entropy achieved by splitting the training examples:
Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
where S_v is the subset of S for which attribute A has value v.

12 Example
For the 14 training examples above (9 positive, 5 negative):
Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) ≈ 0.940

13 Computing Information Gain
Humidity: High [3+,4-], Normal [6+,1-]
Wind: Weak [6+,2-], Strong [3+,3-]
Gain(S, Humidity) ≈ 0.151
Gain(S, Wind) ≈ 0.048
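When the per-branch counts are already known, the gain can be computed straight from them. A minimal sketch (function names are illustrative), using the counts for Humidity and Wind from the 14 PlayTennis examples:

```python
import math

def entropy(pos, neg):
    """Two-class entropy from positive/negative counts."""
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / (pos + neg)
            h -= p * math.log2(p)
    return h

def gain(total_pos, total_neg, branches):
    """Gain = Entropy(S) minus the size-weighted entropy of each branch.
    `branches` maps attribute value -> (pos, neg) counts."""
    total = total_pos + total_neg
    remainder = sum((p + n) / total * entropy(p, n)
                    for p, n in branches.values())
    return entropy(total_pos, total_neg) - remainder

print(round(gain(9, 5, {"High": (3, 4), "Normal": (6, 1)}), 3))  # Humidity -> 0.152 (Mitchell rounds to 0.151)
print(round(gain(9, 5, {"Weak": (6, 2), "Strong": (3, 3)}), 3))  # Wind -> 0.048
```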

14 Which attribute is the best classifier?
Information gain of each attribute:
Gain(S, Outlook) ≈ 0.246
Gain(S, Humidity) ≈ 0.151
Gain(S, Wind) ≈ 0.048
Gain(S, Temperature) ≈ 0.029
Outlook gives the largest gain, so it is tested at the root.

15 Splitting training data with Outlook
{D1, D2, ..., D14} [9+,5-]
Sunny: {D1, D2, D8, D9, D11} [2+,3-] -> ?
Overcast: {D3, D7, D12, D13} [4+,0-] -> Yes
Rain: {D4, D5, D6, D10, D14} [3+,2-] -> ?
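The per-branch counts for this split can be checked mechanically. A small sketch (not from the lecture) grouping the 14 days by Outlook, with the Outlook values and labels taken from Mitchell (1997):

```python
from collections import defaultdict

# day -> (Outlook, PlayTennis)
data = {
    "D1": ("Sunny", "No"),      "D2": ("Sunny", "No"),
    "D3": ("Overcast", "Yes"),  "D4": ("Rain", "Yes"),
    "D5": ("Rain", "Yes"),      "D6": ("Rain", "No"),
    "D7": ("Overcast", "Yes"),  "D8": ("Sunny", "No"),
    "D9": ("Sunny", "Yes"),     "D10": ("Rain", "Yes"),
    "D11": ("Sunny", "Yes"),    "D12": ("Overcast", "Yes"),
    "D13": ("Overcast", "Yes"), "D14": ("Rain", "No"),
}

branches = defaultdict(lambda: [0, 0])   # Outlook value -> [pos, neg]
for outlook, label in data.values():
    branches[outlook][0 if label == "Yes" else 1] += 1

print(dict(branches))  # Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
```

The Overcast branch is pure (all positive), so it becomes a Yes leaf; the other two branches are split further on the remaining attributes.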

16 Overfitting
Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy: the resulting tree may overfit the training data.
Overfitting: the tree explains the training data very well but performs poorly on new data.

17 Alleviating the overfitting problem
Several approaches:
Stop growing the tree earlier
Post-prune the tree
How can we evaluate the classification performance of the tree on new data?
Separate the available data into two sets of examples: a training set and a validation (development) set.

18 Validation (development) set
Use a portion of the original training data to estimate the generalization performance.
Original training set -> training set + validation set
Test set (kept separate, untouched until the final evaluation)
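The hold-out idea can be sketched in a few lines. This is an illustrative split (names and the 20% fraction are assumptions, not from the lecture):

```python
import random

def split_training_data(examples, validation_fraction=0.2, seed=0):
    """Hold out a fraction of the original training set for validation.
    A fixed seed keeps the split reproducible."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - validation_fraction))
    return examples[:cut], examples[cut:]

train, validation = split_training_data(range(14), validation_fraction=0.2)
print(len(train), len(validation))  # -> 11 3
```

The tree is grown (and pruned) using only the training portion; the validation portion estimates performance on unseen data, guiding decisions such as when to stop pruning.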

