Mehdi Ghayoumi, MSB rm 132, Office hours: Thur.
Machine Learning
Decision tree structure:
- Each branch corresponds to an attribute value.
- Each internal node has a splitting predicate.
- Each leaf node assigns a classification.
Entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is:

Entropy(S) = -p1 log2(p1) - p0 log2(p0)

where p1 is the fraction of positive examples in S and p0 is the fraction of negatives.
If all examples are in one category, entropy is zero (we define 0 log(0) = 0). If examples are equally mixed (p1 = p0 = 0.5), entropy is at its maximum of 1. Entropy can be viewed as the number of bits required on average to encode the class of an example in S, where data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases. For multi-class problems with c categories, entropy generalizes to:

Entropy(S) = -Σ_{i=1..c} p_i log2(p_i)
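The definition above can be sketched in a few lines of Python (a minimal illustration, not part of the slides; `entropy` takes a list of class counts):

```python
import math

def entropy(counts):
    """Entropy of a class distribution, given as a list of class counts."""
    total = sum(counts)
    # 0 log(0) is defined as 0, so classes with zero count are skipped
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([7, 0]))   # pure set -> 0.0
print(entropy([5, 5]))   # evenly mixed -> 1.0
print(entropy([9, 5]))   # the 14-example set used below, ~0.940
```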
Gain(S, A) = expected reduction in entropy due to sorting on A:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v. Gain(S, A) is the expected reduction in entropy caused by knowing the value of attribute A.
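The gain formula translates directly into code (a sketch; the function and argument names are illustrative, not from the slides — the parent set and each subset Sv are given as class-count lists):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent_counts, subset_counts):
    """Gain(S, A): entropy(S) minus the weighted entropies of the subsets Sv,
    one count-list per value v of attribute A."""
    total = sum(parent_counts)
    remainder = sum(sum(sv) / total * entropy(sv) for sv in subset_counts)
    return entropy(parent_counts) - remainder

# Wind split from the example below: Weak 6+,2- and Strong 3+,3-
print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))  # ~0.048
```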
Splitting on Humidity and Wind (S: 9+, 5-; E = .940):

Humidity = High: 3+, 4-    E = .985
Humidity = Normal: 6+, 1-  E = .592
Wind = Weak: 6+, 2-        E = .811
Wind = Strong: 3+, 3-      E = 1.0

Gain(S, Humidity) = .940 - (7/14)(.985) - (7/14)(.592) = .151
Gain(S, Wind) = .940 - (8/14)(.811) - (6/14)(1.0) = .048
Splitting on Outlook:

Outlook = Sunny: days 1,2,8,9,11 (2+, 3-)
Outlook = Overcast: days 3,7,12,13 (4+, 0-)
Outlook = Rain: days 4,5,6,10,14 (3+, 2-)

Gain(S, Outlook) = .940 - (5/14)(.971) - (4/14)(0) - (5/14)(.971) = .246
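The Outlook gain can be checked numerically with the entropy/gain definitions from earlier (a quick verification sketch, not from the slides):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, subsets):
    total = sum(parent)
    return entropy(parent) - sum(sum(s) / total * entropy(s) for s in subsets)

# (positive, negative) counts per Outlook value: Sunny, Overcast, Rain
g = gain([9, 5], [[2, 3], [4, 0], [3, 2]])
print(round(g, 3))  # ~0.247 (the slides round this to .246)
```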
Comparing all four attributes:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

Outlook has the highest gain, so it becomes the root of the tree.
Partial tree after splitting on Outlook:

Outlook = Sunny: days 1,2,8,9,11 (2+, 3-) → ?
Outlook = Overcast: days 3,7,12,13 (4+, 0-) → Yes
Outlook = Rain: days 4,5,6,10,14 (3+, 2-) → ?
Choosing the test for the Sunny branch (2+, 3-; E = .97):

Gain(S_sunny, Humidity) = .97 - (3/5)(0) - (2/5)(0) = .97
Gain(S_sunny, Wind) = .97 - (2/5)(1) - (3/5)(.92) = .02
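These branch gains can be reproduced with the same gain function (a sketch; the Wind subsets on the Sunny days — Weak 1+,2- and Strong 1+,1- — are taken from the standard PlayTennis table this lecture follows):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, subsets):
    total = sum(parent)
    return entropy(parent) - sum(sum(s) / total * entropy(s) for s in subsets)

sunny = [2, 3]  # 2 Yes, 3 No among the five Sunny days
print(round(gain(sunny, [[0, 3], [2, 0]]), 2))  # Humidity: High, Normal -> 0.97
print(round(gain(sunny, [[1, 2], [1, 1]]), 2))  # Wind: Weak, Strong -> 0.02
```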
Humidity has the highest gain on the Sunny branch, so it is chosen there:

Outlook = Sunny → Humidity? (High → No, Normal → Yes)
Outlook = Overcast → Yes
Outlook = Rain → ?
The completed tree:

Outlook = Sunny → Humidity? (High → No, Normal → Yes)
Outlook = Overcast → Yes
Outlook = Rain → Wind? (Strong → No, Weak → Yes)
Person    Hair Length  Weight  Age  Class
Homer     0"           250     36   M
Vardhan   10"          150     34   F
Kumar     2"           90      10   M
Lisa      6"           78      8    F
Maggie    4"           20      1    F
Abe       1"           170     70   M
Selma     8"           160     41   F
Sai       10"          180     38   M
Krusty    6"           200     45   M
Let us try splitting on Hair Length:

Hair Length <= 5?
Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes: Entropy(1F, 3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
no:  Entropy(3F, 2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710

Gain(Hair Length <= 5) = 0.9911 - (4/9)(0.8113) - (5/9)(0.9710) = 0.0911
Let us try splitting on Weight:

Weight <= 160?
Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes: Entropy(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
no:  Entropy(0F, 4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0

Gain(Weight <= 160) = 0.9911 - (5/9)(0.7219) - (4/9)(0) = 0.5900
Let us try splitting on Age:

Age <= 40?
Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes: Entropy(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
no:  Entropy(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183

Gain(Age <= 40) = 0.9911 - (6/9)(1) - (3/9)(0.9183) = 0.0183
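All three candidate splits can be evaluated straight from the table above (a self-contained sketch; `split_gain` and the tuple layout are illustrative choices, not from the slides):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (hair_length_inches, weight, age, class) per person, from the table above
people = [(0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
          (6, 78, 8, "F"), (4, 20, 1, "F"), (1, 170, 70, "M"),
          (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M")]

def split_gain(test):
    """Information gain of a boolean test applied to the people list."""
    def counts(rows):
        return [sum(r[3] == "F" for r in rows), sum(r[3] == "M" for r in rows)]
    yes = [r for r in people if test(r)]
    no = [r for r in people if not test(r)]
    total = len(people)
    return entropy(counts(people)) - sum(
        len(g) / total * entropy(counts(g)) for g in (yes, no))

print(round(split_gain(lambda r: r[0] <= 5), 4))    # hair length -> ~0.0911
print(round(split_gain(lambda r: r[1] <= 160), 4))  # weight -> ~0.59
print(round(split_gain(lambda r: r[2] <= 40), 4))   # age -> ~0.0183
```

Weight wins by a wide margin, which is why the tree below tests it first.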
The resulting tree:

Weight <= 160?
  yes → Hair Length <= 2?
          yes → Male
          no  → Female
  no  → Male
Rules to classify Males/Females:

If Weight greater than 160, classify as Male
Else if Hair Length less than or equal to 2, classify as Male
Else classify as Female
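The rules read off the tree are a plain if/elif chain (a minimal sketch; the function name is illustrative):

```python
def classify(weight, hair_length):
    """Male/Female rules read off the decision tree above."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    return "Female"

print(classify(250, 0))  # Homer -> Male
print(classify(78, 6))   # Lisa -> Female
```

Checking it against the training table, the chain classifies all nine people correctly.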
Thank you!