Will I play tennis today?
Features
– Outlook: {Sun, Overcast, Rain}
– Temperature: {Hot, Mild, Cool}
– Humidity: {High, Normal, Low}
– Wind: {Strong, Weak}
Labels
– Binary classification task: Y = {+, -}
Will I play tennis today?

#    O  T  H  W  Play?
1    S  H  H  W  -
2    S  H  H  S  -
3    O  H  H  W  +
4    R  M  H  W  +
5    R  C  N  W  +
6    R  C  N  S  -
7    O  C  N  S  +
8    S  M  H  W  -
9    S  C  N  W  +
10   R  M  N  W  +
11   S  M  N  S  +
12   O  M  H  S  +
13   O  H  N  W  +
14   R  M  H  S  -

Outlook (O): S(unny), O(vercast), R(ainy)
Temperature (T): H(ot), M(edium), C(ool)
Humidity (H): H(igh), N(ormal), L(ow)
Wind (W): S(trong), W(eak)
Consider data with two Boolean attributes (A,B). : 50 examples : 0 examples : 100 examples
Consider data with two Boolean attributes (A,B). : 50 examples : 0 examples 3 examples : 100 examples
Information Gain: Outlook
(p = fraction of + examples, n = fraction of - examples, H = entropy of the subset)
– Outlook = sunny: p = 2/5, n = 3/5, H_S = 0.971
– Outlook = overcast: p = 4/4, n = 0, H_O = 0
– Outlook = rainy: p = 3/5, n = 2/5, H_R = 0.971
Expected entropy: (5/14)×0.971 + (4/14)×0 + (5/14)×0.971 = 0.694
Information gain: 0.940 - 0.694 = 0.246, where 0.940 is the entropy of the full data set (9 +, 5 -)
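The Outlook calculation can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names `entropy` and `information_gain` are illustrative assumptions, and each subset is passed as a (positive, negative) count pair read off the table.

```python
import math

def entropy(pos, neg):
    """Binary entropy, in bits, of a subset with pos '+' and neg '-' examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # by convention 0 * log2(0) contributes 0
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets.

    parent and each child are (pos, neg) count pairs (hypothetical encoding).
    """
    n = sum(parent)
    expected = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - expected

# Outlook splits the 14 examples (9 +, 5 -) into
# sunny (2 +, 3 -), overcast (4 +, 0 -), rainy (3 +, 2 -).
outlook_gain = information_gain((9, 5), [(2, 3), (4, 0), (3, 2)])
print(f"H(S) = {entropy(9, 5):.3f}")
print(f"Gain(Outlook) = {outlook_gain:.3f}")
```

Without intermediate rounding the gain comes out near 0.2467; the 0.246 above results from subtracting the already rounded values 0.940 and 0.694.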
Information Gain: Humidity
– Humidity = high: p = 3/7, n = 4/7, H_H = 0.985
– Humidity = normal: p = 6/7, n = 1/7, H_N = 0.592
Expected entropy: (7/14)×0.985 + (7/14)×0.592 = 0.789
Information gain: 0.940 - 0.789 = 0.151
Which feature to split on?
Information gain:
– Outlook: 0.246
– Humidity: 0.151
– Wind: 0.048
– Temperature: 0.029
→ Split on Outlook
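The comparison across all four features can be reproduced from the table directly. The sketch below is illustrative only: it encodes each example as a five-character string (O, T, H, W, label) in the same letter codes as the table, and the names `ROWS`, `FEATURES`, `entropy`, and `gain` are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")

# The 14 examples, one string per row: Outlook, Temperature, Humidity, Wind, label.
ROWS = ("SHHW- SHHS- OHHW+ RMHW+ RCNW+ RCNS- OCNS+ "
        "SMHW- SCNW+ RMNW+ SMNS+ OMHS+ OHNW+ RMHS-").split()

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, index):
    """Information gain from splitting rows on the feature at position index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row[-1])  # collect labels per feature value
    expected = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - expected

gains = {name: gain(ROWS, i) for i, name in enumerate(FEATURES)}
best = max(gains, key=gains.get)

for name, value in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
print("Split on", best)
```

Because this version does not round intermediate entropies, the Outlook and Humidity gains differ from the hand-computed values in the third decimal place; the ranking of the features is unchanged.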