Iterative Dichotomiser 3 (ID3) Algorithm
Medha Pradhan
CS 157B, Spring 2007
Agenda
- Basics of decision trees
- Introduction to ID3
- Entropy and information gain
- Two examples
Basics
What is a decision tree? A tree where each branching (decision) node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node.
- Decision node: specifies a test of some attribute
- Leaf node: indicates the classification of an example
ID3
Invented by J. Ross Quinlan. ID3 employs a top-down greedy search through the space of possible decision trees: at each node it selects the attribute that is most useful for classifying the examples (the attribute with the highest information gain). The search is greedy because there is no backtracking; earlier attribute choices are never reconsidered.
Entropy
Entropy measures the impurity of an arbitrary collection of examples. For a collection S containing positive and negative examples,

Entropy(S) = −p₊ log₂(p₊) − p₋ log₂(p₋)

where p₊ is the proportion of positive examples, p₋ is the proportion of negative examples, and 0·log₂(0) is taken as 0. In general, Entropy(S) = 0 if all members of S belong to the same class, and Entropy(S) = 1 (the maximum) when the members are split equally between the two classes.
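A minimal Python sketch of this two-class entropy (the helper name is ours, and the usual 0·log₂(0) = 0 convention is assumed):

```python
import math

def entropy(pos, neg):
    """Two-class entropy from positive/negative example counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken as 0, so empty classes are skipped
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(3, 3))  # 1.0 -> maximum: members split equally
print(entropy(6, 0))  # 0.0 -> all members in the same class
```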
Information Gain
Information gain measures the expected reduction in entropy caused by partitioning the examples on an attribute; the higher the gain, the greater the expected reduction in entropy:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where Values(A) is the set of all possible values for attribute A and S_v is the subset of S for which attribute A has value v.
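The same formula as a Python sketch; it generalizes to attributes and classes with any number of values. Representing examples as dicts mapping attribute names to values is an illustrative choice, not something the slides prescribe:

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels (any number of classes)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected entropy reduction from splitting `examples` on `attribute`."""
    gain = entropy_of([ex[target] for ex in examples])
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy_of(subset)
    return gain
```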
Example 1
Sample training data to determine whether an animal lays eggs. The first four attributes are the independent (condition) attributes; Lays Eggs is the dependent (decision) attribute.

| Animal    | Warm-blooded | Feathers | Fur | Swims | Lays Eggs |
|-----------|--------------|----------|-----|-------|-----------|
| Ostrich   | Yes          | Yes      | No  | No    | Yes       |
| Crocodile | No           | No       | No  | Yes   | Yes       |
| Raven     | Yes          | Yes      | No  | No    | Yes       |
| Albatross | Yes          | Yes      | No  | No    | Yes       |
| Dolphin   | Yes          | No       | No  | Yes   | No        |
| Koala     | Yes          | No       | Yes | No    | No        |
S = [4Y, 2N]. Entropy(S) = −(4/6)log₂(4/6) − (2/6)log₂(2/6) = 0.9183. Now we find the information gain for all four attributes: Warm-blooded, Feathers, Fur, and Swims.
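A quick check of that arithmetic:

```python
import math
print(-(4/6) * math.log2(4/6) - (2/6) * math.log2(2/6))  # 0.9182958... ≈ 0.9183
```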
For attribute 'Warm-blooded': Values(Warm-blooded) = [Yes, No]. S = [4Y, 2N]. S_Yes = [3Y, 2N], so E(S_Yes) = 0.9710. S_No = [1Y, 0N], so E(S_No) = 0 (all members belong to the same class). Gain(S, Warm-blooded) = 0.9183 − [(5/6)·0.9710 + (1/6)·0] = 0.1092.
For attribute 'Feathers': Values(Feathers) = [Yes, No]. S = [4Y, 2N]. S_Yes = [3Y, 0N], so E(S_Yes) = 0. S_No = [1Y, 2N], so E(S_No) = 0.9183. Gain(S, Feathers) = 0.9183 − [(3/6)·0 + (3/6)·0.9183] = 0.4591.
For attribute 'Fur': Values(Fur) = [Yes, No]. S = [4Y, 2N]. S_Yes = [0Y, 1N], so E(S_Yes) = 0. S_No = [4Y, 1N], so E(S_No) = 0.7219. Gain(S, Fur) = 0.9183 − [(1/6)·0 + (5/6)·0.7219] = 0.3167.
For attribute 'Swims': Values(Swims) = [Yes, No]. S = [4Y, 2N]. S_Yes = [1Y, 1N], so E(S_Yes) = 1 (equal members in both classes). S_No = [3Y, 1N], so E(S_No) = 0.8113. Gain(S, Swims) = 0.9183 − [(2/6)·1 + (4/6)·0.8113] = 0.0441.
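All four gains can be verified with a short script; the count-based helper H is a compact restatement of the entropy sketch above, so the snippet runs standalone:

```python
import math

def H(*counts):  # entropy from class counts, 0 * log2(0) taken as 0
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

e_s = H(4, 2)  # 0.9183
gains = {
    "Warm-blooded": e_s - (5/6) * H(3, 2) - (1/6) * H(1, 0),
    "Feathers":     e_s - (3/6) * H(3, 0) - (3/6) * H(1, 2),
    "Fur":          e_s - (1/6) * H(0, 1) - (5/6) * H(4, 1),
    "Swims":        e_s - (2/6) * H(1, 1) - (4/6) * H(3, 1),
}
for attr, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attr:12} {g:.4f}")
# Feathers     0.4591   <- maximum
# Fur          0.3167
# Warm-blooded 0.1092
# Swims        0.0441
```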
Gain(S, Warm-blooded) = 0.1092, Gain(S, Feathers) = 0.4591, Gain(S, Fur) = 0.3167, Gain(S, Swims) = 0.0441. Gain(S, Feathers) is the maximum, so Feathers becomes the root node:

Feathers
  Yes → [Ostrich, Raven, Albatross]: Lays Eggs
  No  → [Crocodile, Dolphin, Koala]: ?

The 'Yes' descendant has only positive examples, so it becomes a leaf node with classification 'Lays Eggs'.
We now repeat the procedure on S = [Crocodile, Dolphin, Koala], i.e. S = [1+, 2−]. Entropy(S) = −(1/3)log₂(1/3) − (2/3)log₂(2/3) = 0.9183.

| Animal    | Warm-blooded | Feathers | Fur | Swims | Lays Eggs |
|-----------|--------------|----------|-----|-------|-----------|
| Crocodile | No           | No       | No  | Yes   | Yes       |
| Dolphin   | Yes          | No       | No  | Yes   | No        |
| Koala     | Yes          | No       | Yes | No    | No        |
For attribute 'Warm-blooded': Values(Warm-blooded) = [Yes, No]. S = [1Y, 2N]. S_Yes = [0Y, 2N], so E(S_Yes) = 0. S_No = [1Y, 0N], so E(S_No) = 0. Gain(S, Warm-blooded) = 0.9183 − [(2/3)·0 + (1/3)·0] = 0.9183.
For attribute 'Fur': Values(Fur) = [Yes, No]. S = [1Y, 2N]. S_Yes = [0Y, 1N], so E(S_Yes) = 0. S_No = [1Y, 1N], so E(S_No) = 1. Gain(S, Fur) = 0.9183 − [(1/3)·0 + (2/3)·1] = 0.2516.
For attribute 'Swims': Values(Swims) = [Yes, No]. S = [1Y, 2N]. S_Yes = [1Y, 1N], so E(S_Yes) = 1. S_No = [0Y, 1N], so E(S_No) = 0. Gain(S, Swims) = 0.9183 − [(2/3)·1 + (1/3)·0] = 0.2516.
Gain(S, Warm-blooded) is the maximum, so Warm-blooded becomes the next decision node.
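The same count-based check for this second-level split:

```python
import math

def H(*counts):  # entropy from class counts, 0 * log2(0) taken as 0
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

e_s = H(1, 2)                                   # 0.9183
print(e_s - (2/3) * H(0, 2) - (1/3) * H(1, 0))  # Warm-blooded: 0.9183
print(e_s - (1/3) * H(0, 1) - (2/3) * H(1, 1))  # Fur:          0.2516
print(e_s - (2/3) * H(1, 1) - (1/3) * H(0, 1))  # Swims:        0.2516
```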
The final decision tree will be:

Feathers
  Yes → Lays Eggs
  No  → Warm-blooded
          Yes → Does Not Lay Eggs
          No  → Lays Eggs
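Putting the pieces together, here is a compact recursive ID3 sketch that reproduces this tree on the Example 1 data. The dict-of-dicts tree representation and the majority-vote fallback are our illustrative choices, not prescribed by the slides:

```python
import math
from collections import Counter

def entropy_of(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr, target):
    gain = entropy_of([r[target] for r in rows])
    for v in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= (len(subset) / len(rows)) * entropy_of(subset)
    return gain

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not attrs:                                   # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in set(r[best] for r in rows)}}

cols = ["Animal", "Warm-blooded", "Feathers", "Fur", "Swims", "Lays Eggs"]
data = [("Ostrich",   "Yes", "Yes", "No",  "No",  "Yes"),
        ("Crocodile", "No",  "No",  "No",  "Yes", "Yes"),
        ("Raven",     "Yes", "Yes", "No",  "No",  "Yes"),
        ("Albatross", "Yes", "Yes", "No",  "No",  "Yes"),
        ("Dolphin",   "Yes", "No",  "No",  "Yes", "No"),
        ("Koala",     "Yes", "No",  "Yes", "No",  "No")]
rows = [dict(zip(cols, t)) for t in data]
print(id3(rows, ["Warm-blooded", "Feathers", "Fur", "Swims"], "Lays Eggs"))
# {'Feathers': {'Yes': 'Yes',
#               'No': {'Warm-blooded': {'Yes': 'No', 'No': 'Yes'}}}}
# (key order may vary)
```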
Example 2
Factors affecting sunburn.

| Name  | Hair   | Height  | Weight  | Lotion | Sunburned |
|-------|--------|---------|---------|--------|-----------|
| Sarah | Blonde | Average | Light   | No     | Yes       |
| Dana  | Blonde | Tall    | Average | Yes    | No        |
| Alex  | Brown  | Short   | Average | Yes    | No        |
| Annie | Blonde | Short   | Average | No     | Yes       |
| Emily | Red    | Average | Heavy   | No     | Yes       |
| Pete  | Brown  | Tall    | Heavy   | No     | No        |
| John  | Brown  | Average | Heavy   | No     | No        |
| Katie | Blonde | Short   | Light   | Yes    | No        |
S = [3+, 5−]. Entropy(S) = −(3/8)log₂(3/8) − (5/8)log₂(5/8) = 0.9544. We find the information gain for all four attributes: Hair, Height, Weight, and Lotion.
For attribute 'Hair': Values(Hair) = [Blonde, Brown, Red]. S = [3+, 5−]. S_Blonde = [2+, 2−], so E(S_Blonde) = 1. S_Brown = [0+, 3−], so E(S_Brown) = 0. S_Red = [1+, 0−], so E(S_Red) = 0. Gain(S, Hair) = 0.9544 − [(4/8)·1 + (3/8)·0 + (1/8)·0] = 0.4544.
For attribute 'Height': Values(Height) = [Average, Tall, Short]. S_Average = [2+, 1−], so E(S_Average) = 0.9183. S_Tall = [0+, 2−], so E(S_Tall) = 0. S_Short = [1+, 2−], so E(S_Short) = 0.9183. Gain(S, Height) = 0.9544 − [(3/8)·0.9183 + (2/8)·0 + (3/8)·0.9183] = 0.2657.
For attribute 'Weight': Values(Weight) = [Light, Average, Heavy]. S_Light = [1+, 1−], so E(S_Light) = 1. S_Average = [1+, 2−], so E(S_Average) = 0.9183. S_Heavy = [1+, 2−], so E(S_Heavy) = 0.9183. Gain(S, Weight) = 0.9544 − [(2/8)·1 + (3/8)·0.9183 + (3/8)·0.9183] = 0.0157.
For attribute 'Lotion': Values(Lotion) = [Yes, No]. S_Yes = [0+, 3−], so E(S_Yes) = 0. S_No = [3+, 2−], so E(S_No) = 0.9710. Gain(S, Lotion) = 0.9544 − [(3/8)·0 + (5/8)·0.9710] = 0.3476.
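These gains (including Hair from the previous step) can be checked with the same count-based helper; taking class counts as arguments handles the three-valued attributes uniformly:

```python
import math

def H(*counts):  # entropy from class counts, 0 * log2(0) taken as 0
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

e_s = H(3, 5)  # 0.9544
gains = {
    "Hair":   e_s - (4/8) * H(2, 2) - (3/8) * H(0, 3) - (1/8) * H(1, 0),
    "Height": e_s - (3/8) * H(2, 1) - (2/8) * H(0, 2) - (3/8) * H(1, 2),
    "Weight": e_s - (2/8) * H(1, 1) - (3/8) * H(1, 2) - (3/8) * H(1, 2),
    "Lotion": e_s - (3/8) * H(0, 3) - (5/8) * H(3, 2),
}
for attr, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attr:7} {g:.4f}")
# Hair    0.4544   <- maximum
# Lotion  0.3476
# Height  0.2657
# Weight  0.0157
```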
Gain(S, Hair) = 0.4544, Gain(S, Height) = 0.2657, Gain(S, Weight) = 0.0157, Gain(S, Lotion) = 0.3476. Gain(S, Hair) is the maximum, so Hair becomes the root node:

Hair
  Blonde → [Sarah, Dana, Annie, Katie]: ?
  Red    → [Emily]: Sunburned
  Brown  → [Alex, Pete, John]: Not Sunburned
Repeating the procedure on S = [Sarah, Dana, Annie, Katie]: S = [2+, 2−], so Entropy(S) = 1. We find the information gain for the remaining three attributes: Height, Weight, and Lotion.

| Name  | Hair   | Height  | Weight  | Lotion | Sunburned |
|-------|--------|---------|---------|--------|-----------|
| Sarah | Blonde | Average | Light   | No     | Yes       |
| Dana  | Blonde | Tall    | Average | Yes    | No        |
| Annie | Blonde | Short   | Average | No     | Yes       |
| Katie | Blonde | Short   | Light   | Yes    | No        |

For attribute 'Height': Values(Height) = [Average, Tall, Short]. S = [2+, 2−]. S_Average = [1+, 0−], so E(S_Average) = 0. S_Tall = [0+, 1−], so E(S_Tall) = 0. S_Short = [1+, 1−], so E(S_Short) = 1. Gain(S, Height) = 1 − [(1/4)·0 + (1/4)·0 + (2/4)·1] = 0.5.
For attribute 'Weight': Values(Weight) = [Average, Light]. S = [2+, 2−]. S_Average = [1+, 1−], so E(S_Average) = 1. S_Light = [1+, 1−], so E(S_Light) = 1. Gain(S, Weight) = 1 − [(2/4)·1 + (2/4)·1] = 0.
For attribute 'Lotion': Values(Lotion) = [Yes, No]. S = [2+, 2−]. S_Yes = [0+, 2−], so E(S_Yes) = 0. S_No = [2+, 0−], so E(S_No) = 0. Gain(S, Lotion) = 1 − [(2/4)·0 + (2/4)·0] = 1.
Therefore, Gain(S, Lotion) is the maximum, so Lotion becomes the next decision node.
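Verifying the three subtree gains:

```python
import math

def H(*counts):  # entropy from class counts, 0 * log2(0) taken as 0
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Blonde subset is [2+, 2-], so Entropy(S) = 1
print(1 - (1/4) * H(1, 0) - (1/4) * H(0, 1) - (2/4) * H(1, 1))  # Height: 0.5
print(1 - (2/4) * H(1, 1) - (2/4) * H(1, 1))                    # Weight: 0.0
print(1 - (2/4) * H(0, 2) - (2/4) * H(2, 0))                    # Lotion: 1.0
```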
In this case, the final decision tree will be:

Hair
  Blonde → Lotion
             Yes → Not Sunburned
             No  → Sunburned
  Red    → Sunburned
  Brown  → Not Sunburned
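Running the recursive id3 sketch from Example 1 on the sunburn table (helpers repeated so the snippet runs standalone) reproduces this tree:

```python
import math
from collections import Counter

def entropy_of(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    gain = entropy_of([r[target] for r in rows])
    for v in set(r[attr] for r in rows):
        sub = [r[target] for r in rows if r[attr] == v]
        gain -= (len(sub) / len(rows)) * entropy_of(sub)
    return gain

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not attrs:                                   # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in set(r[best] for r in rows)}}

cols = ["Name", "Hair", "Height", "Weight", "Lotion", "Sunburned"]
data = [("Sarah", "Blonde", "Average", "Light",   "No",  "Yes"),
        ("Dana",  "Blonde", "Tall",    "Average", "Yes", "No"),
        ("Alex",  "Brown",  "Short",   "Average", "Yes", "No"),
        ("Annie", "Blonde", "Short",   "Average", "No",  "Yes"),
        ("Emily", "Red",    "Average", "Heavy",   "No",  "Yes"),
        ("Pete",  "Brown",  "Tall",    "Heavy",   "No",  "No"),
        ("John",  "Brown",  "Average", "Heavy",   "No",  "No"),
        ("Katie", "Blonde", "Short",   "Light",   "Yes", "No")]
rows = [dict(zip(cols, t)) for t in data]
print(id3(rows, ["Hair", "Height", "Weight", "Lotion"], "Sunburned"))
# {'Hair': {'Blonde': {'Lotion': {'No': 'Yes', 'Yes': 'No'}},
#           'Red': 'Yes', 'Brown': 'No'}}
# (key order may vary)
```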
References "Machine Learning", by Tom Mitchell, McGraw-Hill, 1997 "Building Decision Trees with the ID3 Algorithm", by: Andrew Colin, Dr. Dobbs Journal, June html 1.html Professor Sin-Min Lee, SJSU.