1
Machine Learning Group, University College Dublin
Decision Trees
What is a Decision Tree?
How to build a good one…
2
D Trees 2 Classifying Apples & Pears
3
A Decision Tree
[Figure: decision tree testing Width (threshold 55) and Height (threshold 59), with leaves Apple and Pear]
4
A Decision Tree
[Figure: the tree testing Width (threshold 55) and Height (threshold 59), shown with a second tree that tests Height/Width (threshold 1.2); leaves are Apple and Pear]
5
Decision Trees
Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification
Cannot readily represent: XOR; M of N; expressions over (A B) and (C D E) [logical operators lost in extraction]
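The node, branch, and leaf roles described above can be illustrated with a minimal sketch. The `Leaf` and `Node` classes, the `classify` helper, and the toy tree are hypothetical names and data chosen for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # classification assigned at this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    # one child per attribute value (each branch corresponds to a value)
    branches: Dict[str, Union["Node", "Leaf"]] = field(default_factory=dict)


def classify(tree, example):
    """Follow the branch matching the example's value until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label


# Hypothetical toy tree, invented for this sketch.
toy_tree = Node("Raining", {
    "yes": Leaf("Go"),
    "no": Node("Hungry", {"yes": Leaf("Stay"), "no": Leaf("Go")}),
})

print(classify(toy_tree, {"Raining": "no", "Hungry": "yes"}))
```

Each path from the root to a leaf reads as a conjunction of attribute tests, and the tree as a whole is a disjunction of those paths.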
6
When to consider D-Trees
Instances described by attribute-value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data
Classification can be done using a few features
Examples: equipment or medical diagnosis, credit risk analysis
7
D-Tree Example
Alternative: Whether there is a suitable alternative restaurant nearby.
Bar: Is there a comfortable bar area?
Fri/Sat: True on Friday or Saturday nights.
Hungry: How hungry is the subject?
Patrons: How many people are in the restaurant?
Price: Price range.
Raining: Is it raining outside?
Reservation: Does the subject have a reservation?
Type: Type of restaurant.
Stay?: Stay or Go
8
D-Tree Example
9
D-Tree Example
Very good D-Tree
Classifies all examples correctly
Very few nodes
The objective in building a decision tree is to choose attributes so as to minimise the depth of the tree
10
Top-down induction of D-Trees
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant of the node
4. Sort the training examples to the leaf nodes
5. If the training examples are perfectly classified, then stop; else repeat recursively over the new leaf nodes
Which attribute is best?
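The five steps above can be sketched as a short recursive function. This is a rough illustration under stated assumptions, not the lecturer's implementation: the names `build_tree`, `split`, and `errors_after_split` are hypothetical, the data rows are invented, and "best" is decided here by a simple stand-in score (misclassifications after the split) rather than by any particular measure from the slides. The `score` parameter lets another measure be swapped in.

```python
from collections import Counter


def split(examples, attribute):
    """Group (features, label) pairs by their value of `attribute`."""
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups


def errors_after_split(examples, attribute):
    """Stand-in score: errors made if each branch predicts its majority class."""
    total = 0
    for group in split(examples, attribute).values():
        labels = [label for _, label in group]
        total += len(labels) - Counter(labels).most_common(1)[0][1]
    return total


def build_tree(examples, attributes, score=errors_after_split):
    labels = [label for _, label in examples]
    # Step 5: stop when perfectly classified (or no attributes are left).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the lowest score as the decision attribute.
    best = min(attributes, key=lambda a: score(examples, a))
    remaining = [a for a in attributes if a != best]
    # Steps 3-4: one descendant per value; sort the examples down to it and recurse.
    return (best, {value: build_tree(group, remaining, score)
                   for value, group in split(examples, best).items()})


# Invented rows; only the attribute names echo the restaurant example.
data = [
    ({"Patrons": "None", "Type": "Thai"}, "Go"),
    ({"Patrons": "Some", "Type": "Thai"}, "Stay"),
    ({"Patrons": "Some", "Type": "Burger"}, "Stay"),
    ({"Patrons": "None", "Type": "Burger"}, "Go"),
]
tree = build_tree(data, ["Type", "Patrons"])
print(tree)
```

On these rows the sketch selects Patrons at the root, because splitting on it leaves no misclassified examples, whereas splitting on Type leaves every branch mixed.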
11
Good and Bad Attributes
A perfect attribute divides examples into categories of one type (e.g. Patrons).
A poor attribute produces categories of mixed type (e.g. Type).
How can we measure this?
12
Entropy
S is a sample of training examples
p is the proportion of positive examples in S
q is the proportion of negative examples in S
Entropy measures the impurity of S
$\mathrm{Entropy}(S) = -p \log_2 p - q \log_2 q$
13
Entropy
Entropy(S) = expected number of bits needed to encode the class (positive or negative) of a randomly drawn member of S (under the optimal, shortest-length code)
Why?
Information theory: an optimal-length code assigns $-\log_2 p$ bits to a message having probability p.
So the expected number of bits to encode the class of a random member of S, with classes in ratio p:q, is $-p \log_2 p - q \log_2 q$
i.e. $\mathrm{Entropy}(S) = -p \log_2 p - q \log_2 q$
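A small sketch of the two-class entropy formula may help make the numbers concrete. The function name `entropy` and the sample proportions are chosen here for illustration; the only assumption beyond the formula is the usual convention that a term with zero probability contributes 0 bits.

```python
import math


def entropy(p):
    """Entropy in bits of a two-class sample with proportion p of positives."""
    q = 1.0 - p  # proportion of negatives
    # Convention: a class with probability 0 contributes nothing to the sum.
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)


for p in (0.5, 0.25, 1.0):
    print(p, round(entropy(p), 3))
```

An evenly mixed sample (p = 0.5) needs a full bit per example, a pure sample (p = 1.0) needs none, and a 1:3 mix falls in between at about 0.811 bits.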
14
Information Gain
Gain(S, A) = expected reduction in entropy due to sorting on A
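The "expected reduction" is commonly written as Entropy(S) minus the size-weighted average entropy of the subsets S_v produced by each value v of A. The slide text does not spell this formula out, so the sketch below assumes that standard definition. The function names and the data rows are hypothetical; only the attribute names echo the restaurant example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(examples, attribute):
    """Entropy of the whole sample minus the weighted entropy after sorting on `attribute`."""
    labels = [label for _, label in examples]
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attribute], []).append(label)
    # Weight each subset's entropy by the fraction of examples that reach it.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented rows for illustration.
data = [
    ({"Patrons": "None", "Type": "Thai"}, "Go"),
    ({"Patrons": "Some", "Type": "Thai"}, "Stay"),
    ({"Patrons": "Some", "Type": "Burger"}, "Stay"),
    ({"Patrons": "None", "Type": "Burger"}, "Go"),
]
for attribute in ("Patrons", "Type"):
    print(attribute, gain(data, attribute))
```

On these rows Patrons separates the classes completely and gains the full 1 bit, while Type leaves every subset evenly mixed and gains nothing.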
15
D-Tree Example
16
Minimal D-Tree
17
Summary
ML avoids some KE effort
Recursive algorithm for building D-Trees
Using information gain (entropy) to select the discriminating attribute
Example
Important People
Claude Shannon http://en.wikipedia.org/wiki/Claude_Shannon
William of Ockham http://en.wikipedia.org/wiki/William_of_Ockham