Machine Learning II: Decision Tree Induction (CSE 473)
Logistics
- PS 2 due
- Review session tonight
- Exam on Wednesday: closed book; questions like the non-programming parts of the PSets
- PS 3 out in a week
Machine Learning Outline
- Machine learning: ✓ function approximation, ✓ bias
- Supervised learning: classifiers
- A supervised learning technique in depth: induction of decision trees
- Ensembles of classifiers
- Overfitting
Terminology
- The hypothesis space is defined by the restriction bias
Past Insight
Any ML problem may be cast as the problem of FUNCTION APPROXIMATION.
Types of Training Experience
- Credit assignment problem
- Direct training examples: e.g. individual checker boards + the correct move for each (supervised learning)
- Indirect training examples: e.g. a complete sequence of moves and the final result (reinforcement learning)
- Which examples? Random, teacher chooses, learner chooses
Supervised Learning (aka Classification)
FUNCTION APPROXIMATION. The simplest case has experience: …
Issues: Learning Algorithms
Insight 2: Bias
Learning occurs when PREJUDICE meets DATA! (The nice word for prejudice is “bias”.)
- What kind of hypotheses will you consider? What is the allowable range of functions used when approximating?
- What kind of hypotheses do you prefer?
Two Strategies for ML
- Restriction bias: use prior knowledge to specify a restricted hypothesis space (version space algorithm over conjunctions)
- Preference bias: use a broad hypothesis space, but impose an ordering on the hypotheses (decision trees)
Example: “Good day for sports”
- Attributes of instances: Wind, Temperature, Humidity, Outlook
- Feature = attribute with one value, e.g. outlook = sunny
- Sample instance: wind = weak, temp = hot, humidity = high, outlook = sunny
Experience: “Good day for sports”

Day  Outlook   Temp  Humid   Wind    PlayTennis?
d1   Sunny     Hot   High    Weak    No
d2   Sunny     Hot   High    Strong  No
d3   Overcast  Hot   High    Weak    Yes
d4   Rain      Mild  High    Weak    Yes
d5   Rain      Cool  Normal  Weak    Yes
d6   Rain      Cool  Normal  Strong  Yes
d7   Overcast  Cool  Normal  Strong  Yes
d8   Sunny     Mild  High    Weak    No
d9   Sunny     Cool  Normal  Weak    Yes
d10  Rain      Mild  Normal  Weak    Yes
d11  Sunny     Mild  Normal  Strong  Yes
d12  Overcast  Mild  High    Strong  Yes
d13  Overcast  Hot   Normal  Weak    Yes
d14  Rain      Mild  High    Strong  No
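To make the later examples concrete, here is one way to encode this training set in Python; the name DATASET and the lowercase keys are illustrative choices, not from the slides.

```python
# The 14 training examples from the table above, one dict per day.
DATASET = [
    {'day': 'd1',  'outlook': 'sunny',    'temp': 'hot',  'humidity': 'high',   'wind': 'weak',   'play': 'no'},
    {'day': 'd2',  'outlook': 'sunny',    'temp': 'hot',  'humidity': 'high',   'wind': 'strong', 'play': 'no'},
    {'day': 'd3',  'outlook': 'overcast', 'temp': 'hot',  'humidity': 'high',   'wind': 'weak',   'play': 'yes'},
    {'day': 'd4',  'outlook': 'rain',     'temp': 'mild', 'humidity': 'high',   'wind': 'weak',   'play': 'yes'},
    {'day': 'd5',  'outlook': 'rain',     'temp': 'cool', 'humidity': 'normal', 'wind': 'weak',   'play': 'yes'},
    {'day': 'd6',  'outlook': 'rain',     'temp': 'cool', 'humidity': 'normal', 'wind': 'strong', 'play': 'yes'},
    {'day': 'd7',  'outlook': 'overcast', 'temp': 'cool', 'humidity': 'normal', 'wind': 'strong', 'play': 'yes'},
    {'day': 'd8',  'outlook': 'sunny',    'temp': 'mild', 'humidity': 'high',   'wind': 'weak',   'play': 'no'},
    {'day': 'd9',  'outlook': 'sunny',    'temp': 'cool', 'humidity': 'normal', 'wind': 'weak',   'play': 'yes'},
    {'day': 'd10', 'outlook': 'rain',     'temp': 'mild', 'humidity': 'normal', 'wind': 'weak',   'play': 'yes'},
    {'day': 'd11', 'outlook': 'sunny',    'temp': 'mild', 'humidity': 'normal', 'wind': 'strong', 'play': 'yes'},
    {'day': 'd12', 'outlook': 'overcast', 'temp': 'mild', 'humidity': 'high',   'wind': 'strong', 'play': 'yes'},
    {'day': 'd13', 'outlook': 'overcast', 'temp': 'hot',  'humidity': 'normal', 'wind': 'weak',   'play': 'yes'},
    {'day': 'd14', 'outlook': 'rain',     'temp': 'mild', 'humidity': 'high',   'wind': 'strong', 'play': 'no'},
]
```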
Restricted Hypothesis Space: conjunctions
Resulting Learning Problem
Consistency
- Say an “example is consistent with a hypothesis” when they agree
- Hypothesis: a tuple over ⟨Sky, Temp, Humidity, Wind⟩
- Examples:
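In code, the consistency check is a one-liner once a hypothesis representation is fixed. A minimal sketch, assuming the conjunctive hypotheses of the restricted space above, with '?' meaning any value is acceptable (the dict representation and names are mine):

```python
def matches(hypothesis, example):
    # True iff the example satisfies every constraint in the
    # conjunctive hypothesis; '?' accepts any value.
    return all(want == '?' or example[attr] == want
               for attr, want in hypothesis.items())

def consistent(hypothesis, example, label):
    # "Example is consistent with a hypothesis when they agree":
    # the hypothesis predicts positive exactly when the label is yes.
    return matches(hypothesis, example) == (label == 'yes')

# E.g. the hypothesis "any sunny day with high humidity":
h = {'outlook': 'sunny', 'temp': '?', 'humidity': 'high', 'wind': '?'}
# Days agreeing with h:
# [x['day'] for x in DATASET if consistent(h, x, x['play'])]
```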
Naïve Algorithm
More-General-Than Order
{?, cold, high, ?, ?} is more-general-than {sun, cold, high, ?, low}
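For conjunctive hypotheses the ordering is mechanical: h1 is more-general-than-or-equal-to h2 iff each of h1's constraints is '?' or identical to h2's, since then every instance matching h2 also matches h1. A sketch (the tuple encoding is mine):

```python
def more_general_or_equal(h1, h2):
    # h1 >=g h2: every instance that satisfies h2 also satisfies h1.
    return all(a == '?' or a == b for a, b in zip(h1, h2))

# The pair from the slide:
h1 = ('?',   'cold', 'high', '?', '?')
h2 = ('sun', 'cold', 'high', '?', 'low')
print(more_general_or_equal(h1, h2))  # True
print(more_general_or_equal(h2, h1))  # False: the order is not symmetric
```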
Ordering on Hypothesis Space
Search Space: Nodes? Operators?
Two Strategies for ML
- Restriction bias: use prior knowledge to specify a restricted hypothesis space (version space algorithm over conjunctions)
- Preference bias: use a broad hypothesis space, but impose an ordering on the hypotheses (decision trees)
Decision Trees
- Convenient representation: developed with learning in mind; deterministic
- Expressive: equivalent to propositional DNF; handles discrete and continuous parameters
- Simple learning algorithm: handles noise well
- The learning algorithm is: constructive (builds the DT by adding nodes), eager, batch (but incremental versions exist)
Classification
- E.g. learn the concept “edible mushroom”; the target function has two values: T or F
- Represent concepts as decision trees
- Use hill-climbing search through the space of decision trees: start with a simple concept and refine it into a complex concept as needed
Experience: “Good day for tennis”
(the same 14-example training set shown above)
Decision Tree Representation

Good day for tennis?

Outlook?
  Sunny    → Humidity?  (High → No, Normal → Yes)
  Overcast → Yes
  Rain     → Wind?      (Strong → No, Weak → Yes)

- Leaves = classification
- Arcs = choice of value for parent attribute
- A decision tree is equivalent to logic in disjunctive normal form:
  GoodDay ≡ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
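The same tree written as nested conditionals makes the DNF equivalence concrete. A minimal sketch using the dict encoding from earlier (function names are mine):

```python
def good_day_tree(x):
    # Classify an instance by walking the tree above.
    if x['outlook'] == 'sunny':
        return x['humidity'] == 'normal'   # sunny branch tests Humidity
    if x['outlook'] == 'overcast':
        return True                        # overcast days are always good
    return x['wind'] == 'weak'             # rain branch tests Wind

def good_day_dnf(x):
    # One disjunct per root-to-Yes path:
    # (Sunny and Normal) or Overcast or (Rain and Weak).
    return ((x['outlook'] == 'sunny' and x['humidity'] == 'normal')
            or x['outlook'] == 'overcast'
            or (x['outlook'] == 'rain' and x['wind'] == 'weak'))

# d4 from the table (rain, mild, high, weak): both return True.
print(good_day_tree({'outlook': 'rain', 'temp': 'mild',
                     'humidity': 'high', 'wind': 'weak'}))
```

The two functions agree on every instance, which is exactly the sense in which a decision tree is equivalent to propositional DNF: each root-to-Yes path contributes one conjunct.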
DT Learning as Search
- Nodes: decision trees
- Operators: tree refinement (sprouting the tree)
- Initial node: the smallest tree possible, a single leaf
- Heuristic: information gain
- Goal: the best tree possible (???)
What is the Simplest Tree?
(same training set as above)
A single leaf that always predicts Yes.
How good? [10+, 4-] means: correct on 10 examples, incorrect on 4 examples.
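The [10+, 4-] figure can be checked directly against the DATASET encoding from earlier:

```python
from collections import Counter

# A lone Yes leaf is right whenever the true label is yes, wrong otherwise.
print(Counter(x['play'] for x in DATASET))   # Counter({'yes': 10, 'no': 4})
```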
Successors
Sprouting the single Yes leaf on each attribute gives four candidate trees: split on Outlook, Temp, Humid, or Wind. Which attribute should we use to split?
To Be Decided
- How to choose the best attribute? Information gain, entropy (disorder); see the sketch below
- When to stop growing the tree?
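A minimal ID3-style sketch covering both decisions, reusing the DATASET encoding from earlier. Entropy and gain are the standard definitions, Entropy(S) = -sum_i p_i log2 p_i and Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) Entropy(S_v); the stopping rules shown (pure node, no attributes left) are the simplest possible choices, and handling noise and overfitting, per the outline, needs more care. Function names are mine.

```python
from collections import Counter
from math import log2

def entropy(examples):
    # Disorder of the labels: -sum_i p_i * log2(p_i).
    total = len(examples)
    counts = Counter(x['play'] for x in examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    # Expected drop in entropy from splitting on attr.
    total = len(examples)
    remainder = 0.0
    for value in {x[attr] for x in examples}:
        subset = [x for x in examples if x[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

def build_tree(examples, attrs):
    # Greedy top-down induction: start from a single leaf and
    # keep sprouting on the highest-gain attribute.
    labels = [x['play'] for x in examples]
    if len(set(labels)) == 1:        # stop: node is pure
        return labels[0]
    if not attrs:                    # stop: no attributes left; majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(examples, a))
    branches = {}
    for value in {x[best] for x in examples}:
        subset = [x for x in examples if x[best] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return (best, branches)

# Gains at the root, then the induced tree:
for a in ('outlook', 'temp', 'humidity', 'wind'):
    print(a, round(information_gain(DATASET, a), 3))
print(build_tree(DATASET, ['outlook', 'temp', 'humidity', 'wind']))
```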