
1 By: Phuong H. Nguyen Professor: Lee, Sin-Min Course: CS 157B Section: 2 Date: 05/08/07 Spring 2007

2 Overview  Introduction  Entropy  Information Gain  Detailed Example Walkthrough  Conclusion  References

3 Introduction
- The ID3 algorithm is a greedy algorithm for decision tree construction, developed by Ross Quinlan in 1986.
- ID3 uses information gain to select the best attribute for splitting: the attribute with the maximum gain, i.e. the one carrying the most useful information for separating the examples.

4 Entropy
- Entropy measures the impurity or randomness of a collection of examples.
- It is a quantitative measure of the homogeneity of a set of examples.
- In other words, it tells us how well an attribute separates the given examples according to the target classification class.

5 Entropy (cont.)
- Entropy(S) = -P_positive * log2(P_positive) - P_negative * log2(P_negative)
  Where: P_positive = proportion of positive examples, P_negative = proportion of negative examples.
- Example: if S is a collection of 14 examples with 9 YES and 5 NO, then:
  Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
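As a quick arithmetic check of the example above, a few lines of Python (not part of the original slides) reproduce the 0.940 figure:

import math

# Entropy of a set with 9 positive and 5 negative examples (14 total)
p_pos, p_neg = 9 / 14, 5 / 14
entropy_s = -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)
print(round(entropy_s, 3))  # 0.94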

6 Entropy (cont.)
- More than two values: Entropy(S) = ∑ -p(i) log2 p(i), where p(i) is the proportion of examples in class i.
- For two classes the result lies between 0 and 1; with more classes the maximum is log2 of the number of classes.
- Special cases:
  If Entropy(S) = 1 (the two-class maximum), the members are split equally between the two classes (minimum uniformity, maximum randomness); e.g. a table with columns Age / Income / Buys Computer and the rows (<15, Low, No) and (>=25, High, Yes) is split 1-1 between the classes.
  If Entropy(S) = 0, all members of S belong to strictly one class (maximum uniformity, minimum randomness); e.g. the rows (<=20, Low, Yes), (21…40, High, Yes), (>40, Medium, Yes) all belong to the class Yes.
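A small general-purpose entropy helper (a Python sketch with hypothetical names, not code from the slides) covers any number of classes and illustrates both special cases:

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels: sum over classes of -p(i) * log2 p(i)
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

print(entropy(["Yes", "Yes", "Yes"]))     # 0.0  -> all members in one class
print(entropy(["No", "Yes"]))             # 1.0  -> two classes, evenly split
print(entropy(["Yes"] * 9 + ["No"] * 5))  # ~0.940, the 9-YES / 5-NO example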

7 Information Gain
- A statistical property that measures how well a given attribute separates the example collection into the target classes.
- The ID3 algorithm selects as the best attribute the one with the highest information gain (the most useful for classification).

8 Information Gain (cont.)
- Gain(S, A) = Entropy(S) - ∑ ((|S_v| / |S|) * Entropy(S_v)), summed over the values v of attribute A
  Where:
  - A is an attribute of collection S
  - S_v = subset of S for which attribute A has value v
  - |S_v| = number of elements in S_v
  - |S| = number of elements in S
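A direct translation of this formula into Python (a sketch; the row representation, column names, and entropy helper are assumptions, not from the slides):

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v)
    # Each row is a dict mapping column names to values.
    gain = entropy([row[target] for row in rows])
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain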

9 Information Gain (cont.)
Example: Collection S = 14 examples (9 YES, 5 NO). Wind speed is one attribute of S, with values {Weak, Strong}:
- Weak = 8 occurrences (6 YES, 2 NO)
- Strong = 6 occurrences (3 YES, 3 NO)
Calculation:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Entropy(S_Weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(S_Strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
Gain(S, Wind) = Entropy(S) - (8/14)*Entropy(S_Weak) - (6/14)*Entropy(S_Strong) = 0.940 - (8/14)*0.811 - (6/14)*1.00 = 0.048
- For each attribute of S, the gain is calculated in this way, and the attribute with the highest gain is used at the root node or decision node.
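The wind-speed numbers can be verified directly with a short, standalone Python check (the helper name is my own):

import math

def H(pos, neg):
    # Binary entropy of a pos/neg split; a count of 0 contributes nothing.
    total = pos + neg
    return sum(-(n / total) * math.log2(n / total) for n in (pos, neg) if n)

e_s      = H(9, 5)   # ~0.940
e_weak   = H(6, 2)   # ~0.811
e_strong = H(3, 3)   # 1.0
gain_wind = e_s - (8 / 14) * e_weak - (6 / 14) * e_strong
print(round(gain_wind, 3))  # 0.048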

10 Example Walkthrough
- Example: a company sends out a promotion to various houses and records a few facts about each house, together with whether the occupants responded:

District | House Type    | Income | Previous Customer | Outcome
Suburban | Detached      | High   | No                | Nothing
Suburban | Detached      | High   | Responded         | Nothing
Rural    | Detached      | High   | No                | Responded
Urban    | Semi-detached | High   | No                | Responded
Urban    | Semi-detached | Low    | No                | Responded
Urban    | Semi-detached | Low    | Responded         | Nothing
Rural    | Semi-detached | Low    | Responded         | Responded
Suburban | Terrace       | High   | No                | Nothing
Suburban | Semi-detached | Low    | No                | Responded
Urban    | Terrace       | Low    | No                | Responded
Suburban | Terrace       | Low    | Responded         | Responded
Rural    | Terrace       | High   | Responded         | Responded
Rural    | Detached      | Low    | No                | Responded
Urban    | Terrace       | High   | Responded         | Nothing

11 Example Walkthrough (cont.)
(The data table from the previous slide is repeated on this slide.)
The target classification is “Outcome”, which can be “Responded” or “Nothing”. The attributes in the collection are District, House Type, Income, and Previous Customer. They have the following values:
- District = {Suburban, Rural, Urban}
- House Type = {Detached, Semi-detached, Terrace}
- Income = {High, Low}
- Previous Customer = {No, Responded}
- Outcome = {Nothing, Responded}

12 Example Walkthrough (cont.)
(The data table from slide 10 is repeated on this slide.)
Detailed calculation for Gain(S, District):
Entropy(S = [9/14 responses, 5/14 no responses]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.40978 + 0.5305 = 0.9403
Entropy(S_District=Suburban = [2/5 responses, 3/5 no responses]) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.5288 + 0.4422 = 0.9709
Entropy(S_District=Rural = [4/4 responses, 0/4 no responses]) = -(4/4) log2(4/4) = 0
Entropy(S_District=Urban = [3/5 responses, 2/5 no responses]) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.4422 + 0.5288 = 0.9709
Gain(S, District) = Entropy(S) - ((5/14) * Entropy(S_District=Suburban) + (4/14) * Entropy(S_District=Rural) + (5/14) * Entropy(S_District=Urban))
= 0.9403 - ((5/14)*0.9709 + (4/14)*0 + (5/14)*0.9709) = 0.9403 - 0.3468 - 0 - 0.3468 ≈ 0.2468
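The same calculation can be reproduced with a short, standalone Python check of the class counts per district (5 Suburban with a 2/3 split, 4 Rural all responded, 5 Urban with a 3/2 split); the helper name is my own:

import math

def H(*counts):
    # Entropy from raw class counts; zero counts contribute nothing.
    total = sum(counts)
    return sum(-(n / total) * math.log2(n / total) for n in counts if n)

e_s        = H(9, 5)   # whole collection, ~0.9403
e_suburban = H(2, 3)   # ~0.9709
e_rural    = H(4, 0)   # 0.0
e_urban    = H(3, 2)   # ~0.9709
gain_district = e_s - (5/14) * e_suburban - (4/14) * e_rural - (5/14) * e_urban
print(f"{gain_district:.3f}")  # 0.247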

13 Example Walkthrough (cont.)
So we now have: Gain(S, District) = 0.2468
Applying the same process to the remaining three attributes of S, we get:
- Gain(S, House Type) = 0.049
- Gain(S, Income) = 0.151
- Gain(S, Previous Customer) = 0.048
Comparing the information gain of the four attributes, we see that “District” has the highest value, so District will be the root node of the decision tree. So far the tree has District at the root with three branches (Suburban, Rural, Urban) still to be determined.

14 Example Walkthrough (cont.)
Applying the same process to the left branch of the root node (Suburban), we get:
- Entropy(S_Suburban) = 0.970
- Gain(S_Suburban, House Type) = 0.570
- Gain(S_Suburban, Income) = 0.970
- Gain(S_Suburban, Previous Customer) = 0.019
The information gain of “Income” is highest, so Income will be the decision node under the Suburban branch. The tree now has District at the root, Income under Suburban, and the Rural and Urban branches still to be determined.
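These Suburban-branch figures can be re-derived from the five Suburban rows with a standalone Python sketch (the data encoding and helper names are my own; the script rounds where the slides truncate, so it prints 0.571 / 0.971 / 0.020):

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, col):
    g = entropy([r[-1] for r in rows])
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        g -= (len(subset) / len(rows)) * entropy(subset)
    return g

# The five Suburban rows: (House Type, Income, Previous Customer, Outcome)
SUBURBAN = [
    ("Detached",      "High", "No",        "Nothing"),
    ("Detached",      "High", "Responded", "Nothing"),
    ("Terrace",       "High", "No",        "Nothing"),
    ("Semi-detached", "Low",  "No",        "Responded"),
    ("Terrace",       "Low",  "Responded", "Responded"),
]

for i, name in enumerate(["House Type", "Income", "Previous Customer"]):
    print(name, f"{gain(SUBURBAN, i):.3f}")
# House Type 0.571 / Income 0.971 / Previous Customer 0.020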

15 Example Walkthrough (cont.)
For the center branch of the root node (Rural), there is a special case:
- Entropy(S_Rural) = 0, so all members of S_Rural belong to strictly one target classification class (Responded).
Thus, we skip the gain calculations and add the corresponding target classification value to the tree: the Rural branch leads directly to the leaf “Responded”. The tree now has District at the root, Income under Suburban, “Responded” under Rural, and the Urban branch still to be determined.

16 Example Walkthrough (cont.)
Applying the same process to the right branch of the root node (Urban), we get:
- Entropy(S_Urban) = 0.970
- Gain(S_Urban, House Type) = 0.019
- Gain(S_Urban, Income) = 0.019
- Gain(S_Urban, Previous Customer) = 0.970
The information gain of “Previous Customer” is highest, so Previous Customer will be the decision node under the Urban branch. The tree now has District at the root, Income under Suburban, “Responded” under Rural, and Previous Customer under Urban.

17 Now, with “Income” and “Previous Customer” as decision nodes, we can no longer split the tree on any attribute, because every branch has reached a pure target classification class. On the “Income” side we have High -> Nothing and Low -> Responded; on the “Previous Customer” side we have No -> Responded and Responded -> Nothing. The final decision tree therefore has District at the root with branches Suburban, Rural, and Urban; Suburban leads to Income (High -> Nothing, Low -> Responded); Rural leads directly to Responded; Urban leads to Previous Customer (No -> Responded, Responded -> Nothing).
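As a capstone, the whole walkthrough can be reproduced with a compact recursive ID3 sketch in Python. This is not the presenter's code: the table encoding, function names, and tie-breaking are my own assumptions, and the tree is printed as a nested dictionary rather than drawn.

import math
import pprint
from collections import Counter

COLUMNS = ("District", "House Type", "Income", "Previous Customer")
# One tuple per house: the four attribute values followed by the Outcome.
DATA = [
    ("Suburban", "Detached",      "High", "No",        "Nothing"),
    ("Suburban", "Detached",      "High", "Responded", "Nothing"),
    ("Rural",    "Detached",      "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "Responded", "Nothing"),
    ("Rural",    "Semi-detached", "Low",  "Responded", "Responded"),
    ("Suburban", "Terrace",       "High", "No",        "Nothing"),
    ("Suburban", "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "Low",  "No",        "Responded"),
    ("Suburban", "Terrace",       "Low",  "Responded", "Responded"),
    ("Rural",    "Terrace",       "High", "Responded", "Responded"),
    ("Rural",    "Detached",      "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "High", "Responded", "Nothing"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, col):
    g = entropy([r[-1] for r in rows])
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        g -= (len(subset) / len(rows)) * entropy(subset)
    return g

def id3(rows, attributes):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure node -> leaf with that class
        return labels[0]
    if not attributes:                 # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: gain(rows, COLUMNS.index(a)))
    i = COLUMNS.index(best)
    subtree = {}
    for value in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == value]
        remaining = [a for a in attributes if a != best]
        subtree[value] = id3(subset, remaining)
    return {best: subtree}

pprint.pprint(id3(DATA, list(COLUMNS)))
# Prints a nested dict with District at the root: Rural -> 'Responded';
# Suburban -> Income (High -> 'Nothing', Low -> 'Responded');
# Urban -> Previous Customer (No -> 'Responded', Responded -> 'Nothing').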

18 Conclusion
- The ID3 algorithm is easy to use once we understand how it works.
- Industry experience has shown that ID3 is effective for data mining.
- The ID3 algorithm is one of the most important techniques in data mining.

19 References
- Dr. Lee’s slides, San Jose State University, Spring 2007
- “Building Decision Trees with the ID3 Algorithm”, Andrew Colin, Dr. Dobb’s Journal, June 1996
- “Incremental Induction of Decision Trees”, Paul E. Utgoff, Kluwer Academic Publishers, 1989
- http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
- http://decisiontrees.net/node/27

