Slide 1: Lecture 4 – Decision Trees, continued
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
Slide 2: Tiny Example of Decision Tree Learning
Instance attributes: size, color, shape; categories C = {positive, negative}
Training data D:

Example  Size   Color  Shape     Category
1        small  red    circle    positive
2        large  red    circle    positive
3        small  red    triangle  negative
4        large  blue   circle    negative

[Figure: a decision tree that tests color (red / blue / green); the red branch tests shape (circle / square / triangle), e.g. red ∧ circle → positive, red ∧ triangle → negative, blue → negative.]
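For concreteness, the four training examples might be encoded in Python as follows; the attribute names and this representation are an illustrative sketch, not part of the original slides.

```python
# The four training examples from the table above, as (attributes, category)
# pairs. This encoding is illustrative; it is not from the original slides.
examples = [
    ({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
    ({"size": "large", "color": "blue", "shape": "circle"},   "negative"),
]
```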
Slide 3: Decision Trees
Tree-based classifiers for instances represented as feature vectors. Nodes test features, there is one branch for each value of the feature, and leaves specify the category.
– Can represent arbitrary conjunction and disjunction.
– Can represent any classification function over discrete feature vectors.
– Can be rewritten as a set of rules, i.e. disjunctive normal form (DNF):
  red ∧ circle → pos
  red ∧ circle → A; blue → B; red ∧ square → B; green → C; red ∧ triangle → C
[Figures: the positive/negative tree from the previous slide, and a second tree over categories A, B, C corresponding to the multi-class rules above.]
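As a sketch of the tree-as-classifier idea, the positive/negative tree above could be written and evaluated like this; the nested-dictionary encoding and function names are illustrative, not from the slides.

```python
# A sketch of the color/shape tree above as nested Python data.
# Internal nodes are (feature, {value: subtree}) pairs; leaves are category labels.
# The label for "square" (a value unseen in the training data) is an arbitrary choice.
tree = ("color", {
    "red": ("shape", {"circle": "positive",
                      "square": "negative",
                      "triangle": "negative"}),
    "blue": "negative",
    "green": "negative",
})

def classify(node, instance):
    """Follow the branch matching the instance's value for each tested feature."""
    if isinstance(node, str):            # leaf: the category label
        return node
    feature, branches = node
    return classify(branches[instance[feature]], instance)

# Equivalent to the DNF rule "red and circle -> positive":
print(classify(tree, {"size": "small", "color": "red", "shape": "circle"}))  # positive
```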
Slide 4: Top-Down Decision Tree Creation
[Figure: the four training examples (two positive, two negative) are split on color; the red branch receives two positives and one negative, the blue branch one negative, and the green branch no examples.]
Slide 5: Top-Down Decision Tree Creation (continued)
[Figure: the red branch is split further on shape; circle leads to a positive leaf and triangle to a negative leaf, while the blue branch is a negative leaf.]
Slide 6: First step in detail
Start (all 4 examples): positive: count 2, probability 0.5;  negative: count 2, probability 0.5
Split on color (red / blue / green):
Color = red:   positive: count 2, probability 0.667;  negative: count 1, probability 0.333
Color = blue:  positive: count 0, probability 0;       negative: count 1, probability 1
Color = green: positive: count 0, probability 0;       negative: count 0, probability 0
Slide 7: First step in detail
The same tables as the previous slide, now annotated with entropies:
  Start: Entropy = 1
  Color = red: Entropy = 0.92
  Color = blue: Entropy = 0
Slide 8: First step in detail
Entropies as on the previous slide, with Color = green also having Entropy = 0 (no examples).

Gain(Color) = Entropy(start) – [Entropy(red)·(3/4) + Entropy(blue)·(1/4)]
            = 1 – 0.69 – 0 = 0.31
Slide 9: First step in detail – split on Size?
Start (all 4 examples): positive: count 2, probability 0.5;  negative: count 2, probability 0.5
Split on size (BIG / small):
Size = BIG:   positive: count 1, probability 0.5;  negative: count 1, probability 0.5
Size = small: positive: count 1, probability 0.5;  negative: count 1, probability 0.5
Slide 10: First step in detail – split on Size
The same tables as the previous slide, annotated with entropies: each of these example sets has Entropy = 1, since each is evenly split between positive and negative.
Slide 11: First step in detail – split on Size
Entropies: Start = 1, Size = BIG = 1, Size = small = 1.

Gain(Size) = Entropy(start) – [Entropy(BIG)·(2/4) + Entropy(small)·(2/4)]
           = 1 – 1/2 – 1/2 = 0
Slide 12: Top-Down Decision Tree Induction
Build a tree top-down by divide and conquer.
[Figure: the completed tree from the worked example – split on color, then split the red branch on shape, yielding positive/negative leaves.]
Slide 13: Another view (AIspace)
[Figure: the example shown in the AIspace decision-tree learning tool.]
Slide 14: Picking a Good Split Feature
– Goal is to have the resulting tree be as small as possible, per Occam's razor.
– Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem.
– The top-down divide-and-conquer method does a greedy search for a simple tree but is not guaranteed to find the smallest.
  – General lesson in ML: "Greed is good."
– Want to pick a feature that creates subsets of examples that are relatively "pure" in a single class, so they are "closer" to being leaf nodes.
– There are a variety of heuristics for picking a good test; a popular one is based on information gain, which originated with Quinlan's ID3 system (1979). A sketch of this greedy procedure follows below.
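A minimal runnable sketch of this greedy top-down procedure, using the information-gain heuristic defined on the next slides; all function and variable names here are illustrative, not from the lecture.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, feature):
    """Expected entropy reduction from splitting (attrs, label) examples on feature."""
    labels = [label for _, label in examples]
    by_value = {}
    for attrs, label in examples:
        by_value.setdefault(attrs[feature], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

def build_tree(examples, features):
    """Greedy divide and conquer: split on the highest-gain feature, recurse on each subset."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not features:       # pure node, or nothing left to test
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(examples, f))
    branches = {}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(a, l) for a, l in examples if a[best] == value]
        branches[value] = build_tree(subset, [f for f in features if f != best])
    return (best, branches)

examples = [
    ({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
    ({"size": "large", "color": "blue", "shape": "circle"},   "negative"),
]
print(build_tree(examples, ["size", "color", "shape"]))
# e.g. ('color', {'red': ('shape', {'circle': 'positive', 'triangle': 'negative'}), 'blue': 'negative'})
```

On this tiny dataset the greedy search reproduces the tree built by hand in the earlier slides: color is chosen at the root (highest gain), then shape on the red branch.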
Slide 15: Entropy
Entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is:

  Entropy(S) = –p1·log2(p1) – p0·log2(p0)

where p1 is the fraction of positive examples in S and p0 is the fraction of negatives.
– If all examples are in one category, entropy is zero (we define 0·log(0) = 0).
– If examples are equally mixed (p1 = p0 = 0.5), entropy is a maximum of 1.
– Entropy can be viewed as the number of bits required on average to encode the class of an example in S, where data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases.
For multi-class problems with c categories, entropy generalizes to:

  Entropy(S) = Σi –pi·log2(pi), summed over the categories i = 1, …, c

where pi is the fraction of examples in category i.
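A small Python rendering of this definition, checking the properties stated above (the helper name is illustrative):

```python
from math import log2

def entropy(fractions):
    """Entropy of a class distribution given as fractions; 0*log2(0) is treated as 0."""
    return sum(-p * log2(p) for p in fractions if p > 0)

print(entropy([1.0, 0.0]))                 # 0.0   - all examples in one category
print(entropy([0.5, 0.5]))                 # 1.0   - equally mixed binary classes
print(entropy([2/3, 1/3]))                 # ~0.918 - the Color=red subset from earlier
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0   - four equally likely categories
```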
Slide 16: Entropy Plot for Binary Classification
[Figure: entropy plotted against the fraction of positive examples, rising from 0 at p1 = 0 to a maximum of 1 at p1 = 0.5 and falling back to 0 at p1 = 1.]
Slide 17: Information Gain
The information gain of a feature F is the expected reduction in entropy resulting from splitting on this feature:

  Gain(S, F) = Entropy(S) – Σ (|Sv| / |S|)·Entropy(Sv), summed over v ∈ Values(F)

where Sv is the subset of S having value v for feature F. The entropy of each resulting subset is weighted by its relative size.

Example summary (2 positive, 2 negative examples; E = entropy):
– Split on size: BIG = 1+, 1– (E = 1); small = 1+, 1– (E = 1)
  Gain = 1 – (0.5·1 + 0.5·1) = 0
– Split on color: red = 2+, 1– (E = 0.918); blue = 0+, 1– (E = 0)
  Gain = 1 – (0.75·0.918 + 0.25·0) = 0.311
– Split on shape: circle = 2+, 1– (E = 0.918); triangle = 0+, 1– (E = 0)
  Gain = 1 – (0.75·0.918 + 0.25·0) = 0.311
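The example summary can be checked directly in Python; this is a sketch that computes entropy from class counts, mirroring the arithmetic on the slide.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy from counts of positive and negative examples (0*log2(0) = 0)."""
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

e_start = entropy(2, 2)                                                  # 1.0
gain_size  = e_start - (2/4) * entropy(1, 1) - (2/4) * entropy(1, 1)     # 0.0
gain_color = e_start - (3/4) * entropy(2, 1) - (1/4) * entropy(0, 1)     # ~0.311
gain_shape = e_start - (3/4) * entropy(2, 1) - (1/4) * entropy(0, 1)     # ~0.311
print(round(gain_size, 3), round(gain_color, 3), round(gain_shape, 3))   # 0.0 0.311 0.311
```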
Slide 18: Bias in Decision-Tree Induction
Information gain gives a bias for trees with minimal depth.
Recall: Bias – any criterion other than consistency with the training data that is used to select a hypothesis.
Slide 19: History of Decision-Tree Research
– Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960s.
– In the late 1970s, Quinlan developed ID3 with the information-gain heuristic to learn expert systems from examples.
– Simultaneously, Breiman, Friedman, and colleagues developed CART (Classification and Regression Trees), similar to ID3.
– In the 1980s, a variety of improvements were introduced to handle noise, continuous features, missing features, and improved splitting criteria; various expert-system development tools resulted.
– Quinlan's updated decision-tree package (C4.5) was released in 1993.
– Weka includes a Java version of C4.5 called J48.
Slide 20: Additional Decision Tree Issues
– Better splitting criteria: information gain prefers features with many values.
– Continuous features
– Predicting a real-valued function (regression trees)
– Missing feature values
– Features with costs
– Misclassification costs
– Incremental learning (ID4, ID5)
– Mining large databases that do not fit in main memory
Slide 21: Class Exercise
Practice using decision tree learning on some of the sample datasets available in AIspace.

Some of the slides in this presentation are adapted from:
– Prof. Frank Klassner's ML class at Villanova
– The University of Manchester ML course: http://www.cs.manchester.ac.uk/ugt/COMP24111/
– The Stanford online ML course: http://www.ml-class.org/
Slide 22: Resources: Datasets
– UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
– UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
– Statlib: http://lib.stat.cmu.edu/
– Delve: http://www.cs.utoronto.ca/~delve/