Slide 1: ID3 and Decision Trees
Tuan Nguyen, May 2008
Slide 2: The ID3 Algorithm
ID3 is an algorithm for constructing a decision tree. It uses entropy to compute the information gain of each attribute, and at each step the attribute with the best (highest) gain is selected.
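The whole procedure is compact enough to sketch in full. Below is a minimal, illustrative Python version of the loop just described — not the presenter's code; the representation (examples as dicts with a `label` key, trees as nested dicts) is an assumption made for the sketch:

```python
import math
from collections import Counter

def entropy(examples):
    # E(S) over the class labels present in the sample.
    total = len(examples)
    counts = Counter(e["label"] for e in examples)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def gain(examples, attr):
    # Expected entropy reduction from splitting on attr.
    total = len(examples)
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    labels = {e["label"] for e in examples}
    if len(labels) == 1:               # pure sample: return a leaf label
        return labels.pop()
    if not attributes:                 # nothing left to split on: majority label
        return Counter(e["label"] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))  # highest gain wins
    return {best: {
        value: id3([e for e in examples if e[best] == value],
                   [a for a in attributes if a != best])
        for value in {e[best] for e in examples}
    }}
```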
Slide 3: Entropy
The complete formula for entropy is:

E(S) = -(p+)*log2(p+) - (p-)*log2(p-)

where p+ is the proportion of positive examples in S, p- is the proportion of negative examples in S, and S is the sample of training examples.
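As a quick numerical check of the formula, here is a small sketch (the helper name `H` is mine; the convention 0*log2(0) = 0 handles pure samples):

```python
import math

def H(p_pos, p_neg):
    # Entropy from class proportions; 0 * log2(0) is treated as 0.
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(H(0.5, 0.5))    # 1.0 -- maximally mixed sample
print(H(1.0, 0.0))    # 0.0 (Python shows -0.0) -- pure sample
print(H(9/14, 5/14))  # ~0.940 -- the tennis sample used later in the deck
```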
Slide 4: Example
The entropy of the sample split on attribute A1 is computed as follows. The whole sample is [29+, 35-]; A1 = True covers [21+, 5-] and A1 = False covers [8+, 30-].

The entropy of the whole sample:
E(S) = -29/(29+35)*log2(29/(29+35)) - 35/(29+35)*log2(35/(29+35)) = 0.994

The entropy of True:
E(TRUE) = -21/(21+5)*log2(21/(21+5)) - 5/(21+5)*log2(5/(21+5)) = 0.706

The entropy of False:
E(FALSE) = -8/(8+30)*log2(8/(8+30)) - 30/(8+30)*log2(30/(8+30)) = 0.742
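These three values can be checked in a few lines of Python (a sketch; `E` is a helper I'm introducing for the count form of the formula):

```python
import math

def E(pos, neg):
    # Entropy from raw counts, matching the slide's E(...) notation.
    p, n = pos / (pos + neg), neg / (pos + neg)
    return -sum(q * math.log2(q) for q in (p, n) if q > 0)

print(round(E(29, 35), 3))  # 0.994 -- the whole sample
print(round(E(21, 5), 3))   # 0.706 -- the True branch
print(round(E(8, 30), 3))   # 0.742 -- the False branch
```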
Slide 5: Information Gain
Gain(Sample, Attribute), or Gain(S, A), is the expected reduction in entropy due to sorting S on attribute A:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) * Entropy(S_v)

So, for the previous example, the information gain is calculated:

Gain(A1) = E(S) - (21+5)/(29+35) * E(TRUE) - (8+30)/(29+35) * E(FALSE)
         = 0.994 - 26/64 * 0.706 - 38/64 * 0.742
         = 0.266
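Plugging the entropies into the gain formula confirms the result (an illustrative sketch reusing the same count-based entropy helper):

```python
import math

def E(pos, neg):
    p, n = pos / (pos + neg), neg / (pos + neg)
    return -sum(q * math.log2(q) for q in (p, n) if q > 0)

# Gain(S, A1) = E(S) - (26/64) * E(TRUE) - (38/64) * E(FALSE)
gain_a1 = E(29, 35) - (26 / 64) * E(21, 5) - (38 / 64) * E(8, 30)
print(round(gain_a1, 3))  # 0.266
```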
Slide 6: The Complete Example
Consider the following table:

Day  Outlook   Temp  Humidity  Wind    Play Tennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No
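To make the later calculations reproducible, here is the same table as Python data, in the encoding assumed by the ID3 sketch above (the `DATA` name and `label` key are mine, not the deck's):

```python
# The PlayTennis table, one dict per day.
DATA = [
    {"Day": "D1",  "Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "label": "No"},
    {"Day": "D2",  "Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "label": "No"},
    {"Day": "D3",  "Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "label": "Yes"},
    {"Day": "D4",  "Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "label": "Yes"},
    {"Day": "D5",  "Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "label": "Yes"},
    {"Day": "D6",  "Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "label": "No"},
    {"Day": "D7",  "Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "label": "Yes"},
    {"Day": "D8",  "Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "label": "No"},
    {"Day": "D9",  "Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "label": "Yes"},
    {"Day": "D10", "Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "label": "Yes"},
    {"Day": "D11", "Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "label": "Yes"},
    {"Day": "D12", "Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "label": "Yes"},
    {"Day": "D13", "Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "label": "Yes"},
    {"Day": "D14", "Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "label": "No"},
]
```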
Slide 7: Decision Tree
We want to build a decision tree for the tennis matches. Whether a match is played depends on the weather (Outlook, Temperature, Humidity, and Wind), so we apply what we know to build a decision tree from this table.
Slide 8: Example
Calculate the information gain for each of the weather attributes (worked out on the next three slides, and reproduced together in the code sketch after the Outlook slide):
- For the Wind
- For the Humidity
- For the Outlook
Slide 9: For the Wind
S = [9+, 5-], E(S) = 0.940
Wind = Weak:   [6+, 2-], E(Weak) = 0.811
Wind = Strong: [3+, 3-], E(Strong) = 1.000

Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Slide 10: For the Humidity
S = [9+, 5-], E(S) = 0.940
Humidity = High:   [3+, 4-], E(High) = 0.985
Humidity = Normal: [6+, 1-], E(Normal) = 0.592

Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Slide 11: For the Outlook
S = [9+, 5-], E(S) = 0.940
Outlook = Sunny:    [2+, 3-], E(Sunny) = 0.971
Outlook = Overcast: [4+, 0-], E(Overcast) = 0.0
Outlook = Rain:     [3+, 2-], E(Rain) = 0.971

Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
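All three gains (plus Temperature, which the deck does not work out) can be reproduced by applying the `gain` helper from the ID3 sketch near the start of the deck to the `DATA` list defined after the table — a sketch under those assumptions:

```python
# Assumes DATA (table slide) and entropy/gain (ID3 sketch) from above.
for attr in ("Outlook", "Temp", "Humidity", "Wind"):
    print(attr, round(gain(DATA, attr), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.151 for Humidity comes from rounding the branch
#  entropies to three digits before combining them)
```

Outlook has the highest gain, so it becomes the root of the tree.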
Slide 12: The Complete Tree
Then here is the complete tree:

Outlook
├── Sunny → Humidity
│   ├── High → No     [D1, D2, D8]
│   └── Normal → Yes  [D9, D11]
├── Overcast → Yes    [D3, D7, D12, D13]
└── Rain → Wind
    ├── Strong → No   [D6, D14]
    └── Weak → Yes    [D4, D5, D10]
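Written as data, the finished tree is small enough to use directly. Below is a sketch with an assumed nested-dict encoding (not from the deck) plus a classifier that walks it; calling the earlier `id3` sketch as `id3(DATA, ["Outlook", "Temp", "Humidity", "Wind"])` reproduces this same structure:

```python
TREE = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, example):
    # Walk from the root until a leaf label (a plain string) is reached.
    while isinstance(tree, dict):
        attr = next(iter(tree))           # the attribute tested at this node
        tree = tree[attr][example[attr]]  # follow the matching branch
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(classify(TREE, {"Outlook": "Rain",  "Wind": "Strong"}))      # No
```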
Slide 13: References
- Dr. Lee's slides, San Jose State University, Spring 2007.
- Andrew Colin, "Building Decision Trees with the ID3 Algorithm", Dr. Dobb's Journal, June 1996.
- Paul E. Utgoff, "Incremental Induction of Decision Trees", Kluwer Academic Publishers, 1989.