Machine Learning: Decision Tree Learning
CMPT 420 / CMPG 720
Learning from Examples
An agent is learning if it improves its performance on future tasks after making observations about the world. One class of learning problem: from a collection of input-output pairs, learn a function that predicts the output for new inputs, e.g., weather forecasting or game playing.
Why learning? The designer cannot anticipate all changes
A program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change. Programmers sometimes have no idea how to program a solution themselves, e.g., recognizing faces.
Types of Learning
Supervised learning: observes example input-output pairs and learns a function, e.g., a spam detector
Unsupervised learning: correct answers are not given, e.g., clustering
Supervised Learning
Learning a function/rule from specific input-output pairs is also called inductive learning. Given a training set of N example pairs (x1,y1), (x2,y2), ..., (xN,yN), where each yi was generated by an unknown target function y = f(x). Problem: find a hypothesis h such that h ≈ f. h generalizes well if it correctly predicts the value of y for novel examples (the test set).
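As a toy illustration (a sketch, not from the slides; the target f, hypothesis h, and data below are all made up), a hypothesis is checked for generalization on examples it never saw during training:

# The "unknown" target function (known here only so we can check h)
def f(x):
    return 2 * x + 1

# Training set of (x, y) example pairs generated by f
training_set = [(x, f(x)) for x in range(10)]

# A hypothesis; here it happens to match f exactly
def h(x):
    return 2 * x + 1

# h generalizes well if it predicts y = f(x) on novel examples (the test set)
test_set = [(x, f(x)) for x in (42, 99, -7)]
print(all(h(x) == y for x, y in test_set))  # True: h agrees with f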
Supervised Learning
When the output y is one of a finite set of values (sunny, cloudy, rainy), the learning problem is called classification; with only two values it is Boolean or binary classification, e.g., a spam detector or male/female faces. When y is a number (tomorrow’s temperature), the problem is called regression.
Inductive learning method
The points are in the (x, y) plane, where y = f(x). We approximate f with h: construct/adjust h to agree with f on the training set.
Inductive learning method
Construct/adjust h to agree with f on the training set, e.g., linear fitting:
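A minimal sketch of linear fitting with NumPy (the data points are hypothetical):

import numpy as np

# Hypothetical training points that are roughly linear
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Least-squares fit of h(x) = a*x + b (a degree-1 polynomial)
a, b = np.polyfit(xs, ys, deg=1)
print(f"h(x) = {a:.2f}*x + {b:.2f}")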
Inductive learning method
Construct/adjust h to agree with f on the training set, e.g., curve fitting:
Inductive learning method
Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples), e.g., curve fitting. How do we choose from among multiple consistent hypotheses?
Inductive learning method
Ockham’s razor: prefer the simplest hypothesis consistent with the data (after the 14th-century English philosopher William of Ockham)
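To make the choice concrete, here is a small sketch with made-up data: a degree-4 polynomial passes through all five points exactly (it is consistent), while the much simpler degree-1 line leaves only small residuals and will usually generalize better:

import numpy as np

# Hypothetical, noisy but nearly linear data
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.0, 1.1, 1.9, 3.2, 3.9])

# Degree 4 interpolates all 5 points (max error ~0);
# degree 1 is simpler but leaves small residuals.
for deg in (1, 4):
    coeffs = np.polyfit(xs, ys, deg)
    max_err = np.abs(ys - np.polyval(coeffs, xs)).max()
    print(f"degree {deg}: max training error = {max_err:.3f}")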
Learning decision trees
One of the simplest and yet most successful forms of machine learning. A decision tree represents a function that takes as input a vector of attribute values and returns a “decision” – a single output.
Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
Alternate: is there an alternative restaurant nearby?
Bar: is there a comfortable bar area to wait in?
Fri/Sat: is today Friday or Saturday?
Hungry: are we hungry?
Patrons: number of people in the restaurant (None, Some, Full)
Price: price range ($, $$, $$$)
Raining: is it raining outside?
Reservation: have we made a reservation?
Type: kind of restaurant (French, Italian, Thai, Burger)
WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
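In code, each situation can be represented as a mapping from attribute names to values (a sketch; the layout and the sample values below are my own, not from the slides):

# One training example: attribute name -> value, plus its classification
example = {
    "Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
    "Patrons": "Some", "Price": "$$$", "Raining": False,
    "Reservation": True, "Type": "French", "WaitEstimate": "0-10",
}
label = True  # True = we wait for a table in this situation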
Attribute-based representations
Examples are described by attribute values. A training set of 12 examples, e.g., situations where I will/won't wait for a table. The classification of each example is positive (T) or negative (F).
Decision tree
Decision tree without the Price and Type attributes
Goal: to find the most compact decision tree
Constructing the Decision Tree
Recursion: test the most important attribute first, then divide the problem up into smaller subproblems that can be solved recursively.
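A sketch of this recursion in Python (the function names and the (attribute-dict, label) data layout are my own). Choosing the "most important" attribute is stubbed out here; the information-gain version defined a few slides below replaces it:

from collections import Counter

def plurality_value(labels):
    # Majority label, used when examples or attributes run out
    return Counter(labels).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Placeholder: just take the first attribute. Replaced later by
    # the information-gain version once entropy has been defined.
    return attributes[0]

def learn_tree(examples, attributes, parent_examples=()):
    # examples: list of (attribute_dict, label) pairs
    if not examples:                     # no examples left: inherit majority
        return plurality_value([y for _, y in parent_examples])
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:            # all examples agree: leaf node
        return labels[0]
    if not attributes:                   # no tests left: majority leaf
        return plurality_value(labels)
    a = choose_attribute(attributes, examples)  # most important attribute first
    branches = {}
    for v in {ex[a] for ex, _ in examples}:     # one subtree per value of a
        subset = [(ex, y) for ex, y in examples if ex[a] == v]
        rest = [b for b in attributes if b != a]
        branches[v] = learn_tree(subset, rest, examples)
    return (a, branches)                 # internal node: (attribute, subtrees)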
Choosing a good attribute
Which is a better choice?
Choosing the Best Attribute:
Information theory (Shannon and Weaver, 1949)
Entropy: a measure of the uncertainty of a random variable
A coin that always comes up heads --> 0 bits
A flip of a fair coin (heads or tails) --> 1 bit
The roll of a fair four-sided die --> 2 bits
Formula for Entropy
H(p1, ..., pn) = -p1 log2 p1 - ... - pn log2 pn
Suppose we have a collection of 10 examples, 5 positive, 5 negative:
H(1/2, 1/2) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1 bit
Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = -0.01 log2(0.01) - 0.99 log2(0.99) ≈ 0.08 bits
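The same calculations in a few lines of Python (a sketch):

import math

def entropy(probs):
    # Shannon entropy in bits; terms with p = 0 contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))          # coin that always comes up heads -> 0.0
print(entropy([0.5, 0.5]))     # fair coin -> 1.0 bit
print(entropy([0.25] * 4))     # fair four-sided die -> 2.0 bits
print(entropy([0.01, 0.99]))   # 1 positive in 100 -> ~0.08 bits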
Choosing a good attribute
Which is a better choice?
Information gain
Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after the test: Gain(A) = H(before the split) - expected H(after splitting on A). Choose the attribute with the largest IG.
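Continuing the earlier sketch, information gain can be computed from label counts, replacing the placeholder choose_attribute:

import math
from collections import Counter

def label_entropy(labels):
    # Entropy (bits) of the empirical label distribution
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, examples):
    # Original entropy minus expected entropy after testing `attribute`
    labels = [y for _, y in examples]
    n = len(examples)
    by_value = {}
    for ex, y in examples:               # partition the labels by value
        by_value.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(ls) / n * label_entropy(ls)
                    for ls in by_value.values())
    return label_entropy(labels) - remainder

def choose_attribute(attributes, examples):
    # The most important attribute is the one with the largest gain
    return max(attributes, key=lambda a: information_gain(a, examples))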
Example contd. Decision tree learned from the 12 examples:
Day Outlook  Temperature Humidity Wind   PlayTennis
D1  Sunny    Hot         High     Weak   No
D2  Sunny    Hot         High     Strong No
D3  Overcast Hot         High     Weak   Yes
D4  Rain     Mild        High     Weak   Yes
D5  Rain     Cool        Normal   Weak   Yes
D6  Rain     Cool        Normal   Strong No
D7  Overcast Cool        Normal   Strong Yes
D8  Sunny    Mild        High     Weak   No
D9  Sunny    Cool        Normal   Weak   Yes
D10 Rain     Mild        Normal   Weak   Yes
D11 Sunny    Mild        Normal   Strong Yes
D12 Overcast Mild        High     Strong Yes
D13 Overcast Hot         Normal   Weak   Yes
D14 Rain     Mild        High     Strong No
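Encoding this table as (attribute-dict, label) pairs and applying the information_gain sketch above reproduces the standard gain values for the root split:

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = ("Outlook", "Temperature", "Humidity", "Wind")
examples = [(dict(zip(attrs, r[:4])), r[4]) for r in rows]

# Outlook ~0.247, Humidity ~0.152, Wind ~0.048, Temperature ~0.029,
# so Outlook is tested at the root
for a in attrs:
    print(a, round(information_gain(a, examples), 3))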