
Slide 1: CSC 4510 – Machine Learning
Lecture 4: Decision Trees, continued
Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/

Slide 2: Tiny Example of Decision Tree Learning
Instance attributes: Size, Color, Shape. Categories: C = {positive, negative}. Training data D:
  Example | Size  | Color | Shape    | Category
  1       | small | red   | circle   | positive
  2       | large | red   | circle   | positive
  3       | small | red   | triangle | negative
  4       | large | blue  | circle   | negative
[Tree diagram: the root tests color (red, blue, green); the red branch tests shape (circle, square, triangle); the leaves are pos/neg.]
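
As a concrete aside, the tiny training set could be encoded in Python roughly as follows (the examples variable and the dictionary layout are illustrative choices, not part of the slides):

    # One possible encoding of the four training examples: each instance maps
    # attribute names to values, plus its category label.
    examples = [
        {"size": "small", "color": "red",  "shape": "circle",   "category": "positive"},
        {"size": "large", "color": "red",  "shape": "circle",   "category": "positive"},
        {"size": "small", "color": "red",  "shape": "triangle", "category": "negative"},
        {"size": "large", "color": "blue", "shape": "circle",   "category": "negative"},
    ]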

Slide 3: Decision Trees
Tree-based classifiers for instances represented as feature vectors. Nodes test features, there is one branch for each value of the feature, and leaves specify the category.
Can represent arbitrary conjunction and disjunction. Can represent any classification function over discrete feature vectors.
Can be rewritten as a set of rules, i.e. disjunctive normal form (DNF):
  – red ∧ circle → pos
  – red ∧ circle → A; blue → B; red ∧ square → B; green → C; red ∧ triangle → C
[Tree diagrams: one tree with pos/neg leaves and one with A/B/C leaves, each testing color at the root and shape under the red branch.]

Slide 4: Top-Down Decision Tree Creation
[Figure: the four labelled training examples (two positive, two negative) are partitioned by a split on color (red, blue, green).]

Slide 5: Top-Down Decision Tree Creation (continued)
[Figure: the mixed red branch is split further on shape (circle, square, triangle), producing pos and neg leaves; the blue branch becomes a neg leaf.]

Slide 6: First step in detail
  Start:         positive: count 2, probability 0.5;   negative: count 2, probability 0.5
  Color = red:   positive: count 2, probability 0.667; negative: count 1, probability 0.333
  Color = blue:  positive: count 0, probability 0;     negative: count 1, probability 1
  Color = green: no examples (all counts and probabilities 0)

Slide 7: First step in detail
  Start:         positive: count 2, probability 0.5;   negative: count 2, probability 0.5;   Entropy = 1
  Color = red:   positive: count 2, probability 0.667; negative: count 1, probability 0.333; Entropy = 0.92
  Color = blue:  positive: count 0, probability 0;     negative: count 1, probability 1;     Entropy = 0
  Color = green: no examples

Slide 8: First step in detail
  Start:         positive: count 2, probability 0.5;   negative: count 2, probability 0.5;   Entropy = 1
  Color = red:   positive: count 2, probability 0.667; negative: count 1, probability 0.333; Entropy = 0.92
  Color = blue:  positive: count 0, probability 0;     negative: count 1, probability 1;     Entropy = 0
  Color = green: no examples; Entropy = 0
  Gain(Color) = Entropy(start) – [Entropy(red)*(3/4) + Entropy(blue)*(1/4)] = 1 – 0.69 – 0 = 0.31
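
The arithmetic above can be sanity-checked with a few lines of Python; this is only a sketch, and the entropy2 helper name is made up for illustration:

    import math

    def entropy2(p_pos, p_neg):
        # Binary entropy; a zero probability contributes 0 (0 * log2 0 is taken as 0).
        return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

    e_start = entropy2(2/4, 2/4)   # 1.0
    e_red   = entropy2(2/3, 1/3)   # about 0.92
    e_blue  = entropy2(0/1, 1/1)   # 0
    gain_color = e_start - (3/4) * e_red - (1/4) * e_blue
    print(round(gain_color, 2))    # 0.31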

Slide 9: First step in detail – Split on Size?
  Start:        positive: count 2, probability 0.5; negative: count 2, probability 0.5
  Size = BIG:   positive: count 1, probability 0.5; negative: count 1, probability 0.5
  Size = small: positive: count 1, probability 0.5; negative: count 1, probability 0.5

Slide 10: First step in detail – Split on Size
  Start:        positive: count 2, probability 0.5; negative: count 2, probability 0.5
  Size = BIG:   positive: count 1, probability 0.5; negative: count 1, probability 0.5; Entropy = 1
  Size = small: positive: count 1, probability 0.5; negative: count 1, probability 0.5

Slide 11: First step in detail – Split on Size
  Start:        positive: count 2, probability 0.5; negative: count 2, probability 0.5; Entropy = 1
  Size = BIG:   positive: count 1, probability 0.5; negative: count 1, probability 0.5; Entropy = 1
  Size = small: positive: count 1, probability 0.5; negative: count 1, probability 0.5; Entropy = 1
  Gain(Size) = Entropy(start) – [Entropy(BIG)*(2/4) + Entropy(small)*(2/4)] = 1 – 1/2 – 1/2 = 0
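
The same check for the size split, reusing e_start and the illustrative entropy2 helper from the sketch above:

    e_big   = entropy2(1/2, 1/2)   # 1.0
    e_small = entropy2(1/2, 1/2)   # 1.0
    gain_size = e_start - (2/4) * e_big - (2/4) * e_small
    print(gain_size)               # 0.0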

Slide 12: Top-Down Decision Tree Induction
Build a tree top-down by divide and conquer: pick a feature to test at the root, partition the examples by its values, and recurse on each partition that is still mixed (a minimal code sketch follows).
[Figure: the color/shape tree grown from the four examples, as in slide 5.]
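
A minimal sketch of this divide-and-conquer procedure, assuming the examples encoding from slide 2; build_tree and choose are illustrative names, and a gain-based chooser is sketched after slide 17 below:

    from collections import Counter

    def build_tree(examples, features, choose):
        # Grow a tree top-down: stop at pure nodes (or when no features remain),
        # otherwise split on the chosen feature and recurse on each partition.
        labels = [e["category"] for e in examples]
        if len(set(labels)) == 1 or not features:
            return Counter(labels).most_common(1)[0][0]   # leaf: (majority) class
        best = choose(examples, features)                 # e.g. highest information gain
        branches = {}
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            remaining = [f for f in features if f != best]
            branches[value] = build_tree(subset, remaining, choose)
        return (best, branches)

With a gain-based chooser and the four examples from slide 2 (and depending on how ties are broken), this yields a tree that tests color at the root and shape under the red branch, matching the figure.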

Slide 13: Another view (AIspace)

Slide 14: Picking a Good Split Feature
The goal is for the resulting tree to be as small as possible, per Occam's razor. Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem. The top-down divide-and-conquer method does a greedy search for a simple tree but is not guaranteed to find the smallest.
  – General lesson in ML: "Greed is good."
We want to pick a feature that creates subsets of examples that are relatively "pure" in a single class, so they are "closer" to being leaf nodes. There are a variety of heuristics for picking a good test; a popular one is based on information gain and originated with Quinlan's ID3 system (1979).

Slide 15: Entropy
The entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is
  Entropy(S) = – p1 log2(p1) – p0 log2(p0)
where p1 is the fraction of positive examples in S and p0 is the fraction of negatives.
If all examples are in one category, the entropy is zero (we define 0·log(0) = 0). If the examples are equally mixed (p1 = p0 = 0.5), the entropy is at its maximum of 1.
Entropy can be viewed as the number of bits required on average to encode the class of an example in S when data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases.
For multi-class problems with c categories, entropy generalizes to
  Entropy(S) = – Σ (i = 1..c) pi log2(pi)
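
A sketch of this definition as a small Python function (the entropy name and the label-list interface are my own choices, not from the slides):

    import math
    from collections import Counter

    def entropy(labels):
        # Entropy of a collection of class labels; works for any number of classes.
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    print(entropy(["pos", "pos", "neg", "neg"]))   # 1.0  (evenly mixed, binary)
    print(entropy(["a", "a", "b", "c"]))           # 1.5  (multi-class example)
    # A set containing only one class has entropy 0; the 0*log(0) = 0 convention
    # never arises here because Counter only reports classes that actually occur.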

Slide 16: Entropy Plot for Binary Classification
[Plot: entropy as a function of the fraction of positive examples, rising from 0 at p1 = 0 to a maximum of 1 at p1 = 0.5 and falling back to 0 at p1 = 1.]
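
The plot can be regenerated with a few lines of matplotlib; this is a reconstruction of the figure, not the original image:

    import numpy as np
    import matplotlib.pyplot as plt

    p1 = np.linspace(0.001, 0.999, 500)                 # fraction of positive examples
    ent = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

    plt.plot(p1, ent)
    plt.xlabel("p1 (fraction of positive examples)")
    plt.ylabel("Entropy(S)")
    plt.title("Entropy for binary classification")
    plt.show()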

Slide 17: Information Gain
The information gain of a feature F is the expected reduction in entropy that results from splitting on this feature:
  Gain(S, F) = Entropy(S) – Σ (v in Values(F)) (|S_v| / |S|) · Entropy(S_v)
where S_v is the subset of S having value v for feature F, i.e. the entropy of each resulting subset is weighted by its relative size.
Example summary (starting set: 2+, 2–, E = 1):
  Split on size:  big → 1+, 1– (E = 1); small → 1+, 1– (E = 1); Gain = 1 – (0.5·1 + 0.5·1) = 0
  Split on color: red → 2+, 1– (E = 0.918); blue → 0+, 1– (E = 0); Gain = 1 – (0.75·0.918 + 0.25·0) = 0.311
  Split on shape: circle → 2+, 1– (E = 0.918); square → 0+, 1– (E = 0); Gain = 1 – (0.75·0.918 + 0.25·0) = 0.311
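
A dataset-level version of this formula, reusing the illustrative entropy helper from slide 15 and the examples encoding from slide 2, reproduces the summary numbers (the information_gain name is mine):

    def information_gain(examples, feature, label="category"):
        # Gain(S, F) = Entropy(S) minus the size-weighted entropy of each subset S_v.
        n = len(examples)
        g = entropy([e[label] for e in examples])
        for value in {e[feature] for e in examples}:
            subset = [e[label] for e in examples if e[feature] == value]
            g -= (len(subset) / n) * entropy(subset)
        return g

    for f in ("size", "color", "shape"):
        print(f, round(information_gain(examples, f), 3))   # size 0.0, color 0.311, shape 0.311

Plugged into the slide-12 sketch as choose = lambda exs, fs: max(fs, key=lambda f: information_gain(exs, f)), this is essentially ID3's splitting rule.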

Slide 18: Bias in Decision-Tree Induction
Information gain gives a bias toward trees with minimal depth.
Recall: bias – any criterion other than consistency with the training data that is used to select a hypothesis.

Slide 19: History of Decision-Tree Research
Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960s. In the late 1970s, Quinlan developed ID3 with the information-gain heuristic to learn expert systems from examples. Simultaneously, Breiman, Friedman, and colleagues developed CART (Classification and Regression Trees), similar to ID3. In the 1980s a variety of improvements were introduced to handle noise, continuous features, missing features, and improved splitting criteria; various expert-system development tools resulted. Quinlan's updated decision-tree package (C4.5) was released in 1993. Weka includes a Java version of C4.5 called J48.

Slide 20: Additional Decision Tree Issues
  – Better splitting criteria (information gain prefers features with many values)
  – Continuous features
  – Predicting a real-valued function (regression trees)
  – Missing feature values
  – Features with costs
  – Misclassification costs
  – Incremental learning (ID4, ID5)
  – Mining large databases that do not fit in main memory

Slide 21: Class Exercise
Practice using decision tree learning on some of the sample datasets available in AIspace.
Some of the slides in this presentation are adapted from:
  – Prof. Frank Klassner's ML class at Villanova
  – the University of Manchester ML course: http://www.cs.manchester.ac.uk/ugt/COMP24111/
  – the Stanford online ML course: http://www.ml-class.org/

Slide 22: Resources: Datasets
  – UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
  – UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
  – Statlib: http://lib.stat.cmu.edu/
  – Delve: http://www.cs.utoronto.ca/~delve/

