Decision Trees
Paula Matuszek
CSC 8520, Spring 2013
Decision Tree Induction
A very common machine learning and data mining technique. Given:
–Examples
–Attributes
–A goal (typically a classification)
Pick an "important" attribute: one which divides the set cleanly. Recur on the subsets that are not yet cleanly classified.
A Training Set
Expressiveness
Decision trees can express any function of the input attributes. E.g., for Boolean functions, each truth table row maps to a path from root to leaf.
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
We prefer to find more compact decision trees.
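As a concrete illustration (a sketch not taken from the slides; the nested-dict tree representation and the XOR example are assumptions), the "trivial" consistent tree can be built by testing every attribute in a fixed order, so each truth-table row gets its own root-to-leaf path:

    # Build the trivially consistent tree: test every attribute in order,
    # one root-to-leaf path per example. It fits the data exactly but
    # memorizes rather than compresses, so it generalizes poorly.
    def trivial_tree(examples, attributes):
        """examples: list of (attribute_dict, label); returns a nested-dict tree."""
        if not attributes:                 # every attribute tested: emit a leaf
            return examples[0][1]
        attr, rest = attributes[0], attributes[1:]
        node = {attr: {}}
        for value in {ex[0][attr] for ex in examples}:
            subset = [ex for ex in examples if ex[0][attr] == value]
            node[attr][value] = trivial_tree(subset, rest)
        return node

    # XOR as a four-row truth table: the tree ends up with four leaves.
    xor = [({"A": a, "B": b}, a != b) for a in (0, 1) for b in (0, 1)]
    print(trivial_tree(xor, ["A", "B"]))
    # e.g. {'A': {0: {'B': {0: False, 1: True}}, 1: {'B': {0: True, 1: False}}}}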
ID3
A greedy algorithm for decision tree construction, originally developed by Ross Quinlan (1986).
Top-down construction of the decision tree by recursively selecting the "best attribute" to use at the current node in the tree:
–Once an attribute is selected, generate child nodes, one for each possible value of the selected attribute.
–Partition the examples using the possible values of the attribute, and assign the subsets of examples to the appropriate child nodes.
–Repeat for each child node until all examples associated with a node are either all positive or all negative.
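A minimal Python sketch of this recursion (not the course's or Quinlan's code; the nested-dict tree representation and the (attribute_dict, label) example format are assumptions), for discrete attributes and a Boolean goal, using the information-gain criterion discussed on the next slide:

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(examples, attr):
        """Entropy reduction from splitting `examples` on attribute `attr`."""
        labels = [label for _, label in examples]
        remainder = 0.0
        for value in {ex[attr] for ex, _ in examples}:
            subset = [label for ex, label in examples if ex[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    def id3(examples, attributes):
        """examples: list of (attribute_dict, label); returns a nested-dict tree."""
        labels = [label for _, label in examples]
        if len(set(labels)) == 1:          # all positive or all negative: a leaf
            return labels[0]
        if not attributes:                 # no attributes left: majority-vote leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: information_gain(examples, a))
        node = {best: {}}
        remaining = [a for a in attributes if a != best]
        for value in {ex[best] for ex, _ in examples}:
            subset = [(ex, label) for ex, label in examples if ex[best] == value]
            node[best][value] = id3(subset, remaining)
        return node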
Best Attribute
What's the best attribute to choose? The one with the best information gain.
–If we choose Bar, we have: no: 3-, 3+; yes: 3-, 3+
–If we choose Hungry, we have: no: 4-, 1+; yes: 1-, 5+
–Hungry has given us more information about the correct classification.
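A quick way to quantify "more information" (a sketch, not from the slides) is to compute the information gain implied by each split, using the positive/negative counts listed above:

    import math

    def entropy(pos, neg):
        """Shannon entropy (in bits) of a pos/neg class distribution."""
        total = pos + neg
        return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

    def gain(branches):
        """Information gain of a split, given (positive, negative) counts per branch."""
        pos = sum(p for p, _ in branches)
        neg = sum(n for _, n in branches)
        total = pos + neg
        remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
        return entropy(pos, neg) - remainder

    # (positive, negative) counts per attribute value, taken from the slide.
    print(gain([(3, 3), (3, 3)]))   # Bar (no / yes):    0.0 bits, tells us nothing
    print(gain([(1, 4), (5, 1)]))   # Hungry (no / yes): about 0.31 bits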
Textbook Restaurant Domain
Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant.
Two classes: wait, leave.
Ten attributes:
–Alternative available?
–Bar in restaurant?
–Is it Friday?
–Are we hungry?
–How full is the restaurant?
–How expensive?
–Is it raining?
–Do we have a reservation?
–What type of restaurant is it?
–What's the purported waiting time?
Training set of 12 examples; ~7,000 possible cases.
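Purely to show how one case in this domain might be encoded, here is a hypothetical record (the attribute names and every value below are illustrative assumptions, not a row from the textbook's actual training table):

    # One hypothetical training case: ten attribute values plus the goal label.
    example = {
        "Alternate":    True,       # alternative restaurant available?
        "Bar":          False,      # bar in the restaurant to wait in?
        "Friday":       False,      # is it Friday?
        "Hungry":       True,       # are we hungry?
        "Patrons":      "Full",     # how full is the restaurant?
        "Price":        "$$",       # how expensive?
        "Raining":      False,      # is it raining?
        "Reservation":  True,       # do we have a reservation?
        "Type":         "Thai",     # what type of restaurant is it?
        "WaitEstimate": "10-30",    # purported waiting time, in minutes
    }
    label = "wait"                  # the goal: wait or leave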
Thinking About It
What might you expect a decision tree to have as the first question? The second?
A Decision Tree from Introspection
Choosing an Attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Patrons? is a better choice.
Learned Tree
How Well Does It Work?
Many case studies have shown that decision trees are at least as accurate as human experts.
–A study on diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly.
–British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system.
–Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.
Pruning
With enough levels in a decision tree we can always get the leaves to be 100% positive or negative.
But if we are down to one or two cases in each leaf, we are probably overfitting.
It is useful to prune leaves; stop when:
–we reach a certain level (tree depth)
–we reach a small enough leaf size
–our information gain is increasing too slowly
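A sketch of how these stopping rules might look as a pre-pruning check inside the tree builder (the names and threshold values are illustrative assumptions, not from the course):

    # Illustrative pre-pruning: stop splitting when any of the three
    # conditions from the slide holds (threshold values are made up).
    MAX_DEPTH = 5          # "we reach a certain level"
    MIN_LEAF_SIZE = 5      # "we reach a small enough size leaf"
    MIN_GAIN = 0.01        # "our information gain is increasing too slowly"

    def should_stop(depth, examples, best_gain):
        """Return True if this node should become a leaf instead of splitting further."""
        return (depth >= MAX_DEPTH
                or len(examples) <= MIN_LEAF_SIZE
                or best_gain < MIN_GAIN)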
Strengths of Decision Trees
Strengths include:
–Fast to learn and to use
–Simple to implement
–You can look at the tree and see what is going on; relatively "white box"
–Empirically validated in many commercial products
–Handles noisy data (with pruning)
C4.5 and C5.0 are extensions of ID3 that account for unavailable values, continuous attribute value ranges, pruning of decision trees, and rule derivation.
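As one illustration of rule derivation (a sketch using the nested-dict tree representation assumed in the earlier ID3 sketch, not C4.5's actual mechanism), every root-to-leaf path can be read off as an if-then rule:

    def derive_rules(tree, conditions=()):
        """Turn each root-to-leaf path of a nested-dict tree into an if-then rule."""
        if not isinstance(tree, dict):             # a leaf: emit one rule
            return [(list(conditions), tree)]
        attribute = next(iter(tree))
        rules = []
        for value, subtree in tree[attribute].items():
            rules.extend(derive_rules(subtree, conditions + ((attribute, value),)))
        return rules

    # Illustrative use with a tiny hand-written tree (made-up attributes and labels).
    tree = {"Hungry": {False: "leave", True: {"Raining": {True: "leave", False: "wait"}}}}
    for conds, outcome in derive_rules(tree):
        print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN {outcome}")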
Weaknesses and Issues
Weaknesses include:
–Univariate splits/partitioning (one attribute at a time) limits the types of possible trees
–Large decision trees may be hard to understand
–Requires fixed-length feature vectors
–Non-incremental (i.e., a batch method)
–Prone to overfitting
Decision Tree Architecture
Knowledge base: the decision tree itself.
Performer: a tree walker.
Critic: the actual outcome of the training case.
Learner: ID3 or one of its variants.
–This is an example of a large class of learners that need all of the examples at once in order to learn: batch, not incremental.
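A minimal sketch of the performer (a tree walker), assuming the nested-dict tree representation used in the earlier sketches; it is illustrative, not the course's code:

    def tree_walk(tree, case):
        """Performer: answer one attribute test per node until a leaf gives the class."""
        while isinstance(tree, dict):
            attribute = next(iter(tree))   # the attribute tested at this node
            tree = tree[attribute][case[attribute]]
        return tree

    # Illustrative use with a tiny hand-written tree (made-up attributes and labels).
    tree = {"Hungry": {False: "leave", True: {"Raining": {True: "leave", False: "wait"}}}}
    print(tree_walk(tree, {"Hungry": True, "Raining": False}))   # -> wait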
Summary: Decision Tree Learning
One of the most widely used learning methods in practice.
Can out-perform human experts on many problems.