CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/

Slide 1: Lecture 4 – Decision Trees, continued
CSC 4510 Machine Learning, Dr. Mary-Angela Papalaskari, Villanova University

Slide 2: Tiny Example of Decision Tree Learning
Instance attributes: size, color, shape.  Categories: C = {positive, negative}.
Training data D:

  Example | Size  | Color | Shape    | Category
  --------|-------|-------|----------|---------
  1       | small | red   | circle   | positive
  2       | large | red   | circle   | positive
  3       | small | red   | triangle | negative
  4       | large | blue  | circle   | negative

[Figure: a decision tree for D – color is tested at the root (red / blue / green); the red branch tests shape (circle / square / triangle), with circle leading to a positive leaf and the remaining branches to negative leaves.]
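For use in the sketches that follow, this training set can be written as a small Python structure (an illustrative encoding of the table above; the name `examples` and the dictionary keys are my own choices, not from the slides):

```python
# The four training examples from the table above, as (attributes, category) pairs.
examples = [
    ({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
    ({"size": "large", "color": "blue", "shape": "circle"},   "negative"),
]
```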

Slide 3: Decision Trees
Tree-based classifiers for instances represented as feature vectors. Nodes test features; there is one branch for each value of the feature, and leaves specify the category.
– Can represent arbitrary conjunctions and disjunctions, and any classification function over discrete feature vectors.
– Can be rewritten as a set of rules, i.e. disjunctive normal form (DNF):
  – red ∧ circle → pos
  – red ∧ circle → A;  blue → B;  red ∧ square → B;  green → C;  red ∧ triangle → C
[Figures: two example trees, both testing color at the root and shape under the red branch – one with pos/neg leaves, one with leaves A, B, C.]
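To make "a tree is a nested set of feature tests" concrete, the first example tree can be sketched as nested conditionals (my own rendering; the leaf labels for branches not listed in the rules above are assumed to be negative, as suggested by the figure):

```python
def classify(color: str, shape: str) -> str:
    """Sketch of the pos/neg example tree: test color at the root, shape under red."""
    if color == "red":
        if shape == "circle":
            return "pos"    # rule from the slide: red AND circle -> pos
        return "neg"        # assumed: red AND square / red AND triangle -> neg
    return "neg"            # assumed: blue or green -> neg
```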

Slide 4: Top-Down Decision Tree Creation
[Figure: the four training examples (2 positive, 2 negative) are split on color – the red branch receives both positives and one negative, the blue branch a single negative, and the green branch no examples.]

Slide 5: Top-Down Decision Tree Creation (continued)
[Figure: the red branch is split further on shape – circle examples are positive, triangle examples negative – completing the tree; the blue branch becomes a negative leaf.]

Slide 6: First step in detail (candidate split on color)

  Node                   | positive (count, probability) | negative (count, probability)
  -----------------------|-------------------------------|------------------------------
  Start (all 4 examples) | 2, 0.5                        | 2, 0.5
  Color = red            | 2, 0.67                       | 1, 0.33
  Color = blue           | 0, 0                          | 1, 1
  Color = green          | 0, 0                          | 0, 0

Slide 7: First step in detail
Same counts as Slide 6, now annotated with the entropy of each node:
  Entropy(Start) = 1,  Entropy(Color = red) = 0.92,  Entropy(Color = blue) = 0,  Entropy(Color = green) = 0

Slide 8: First step in detail
Using the counts and entropies from Slides 6 and 7:
  Gain(Color) = Entropy(Start) − (3/4)·Entropy(red) − (1/4)·Entropy(blue) − (0/4)·Entropy(green)
              = 1 − 0.69 − 0 − 0
              = 0.31

Slide 9: First step in detail - Split on Size?

  Node                   | positive (count, probability) | negative (count, probability)
  -----------------------|-------------------------------|------------------------------
  Start (all 4 examples) | 2, 0.5                        | 2, 0.5
  Size = BIG             | 1, 0.5                        | 1, 0.5
  Size = small           | 1, 0.5                        | 1, 0.5

Slide 10: First step in detail - Split on Size
Same counts as Slide 9, now annotated with entropies: Entropy(Start) = 1, Entropy(BIG) = 1, Entropy(small) = 1.

Slide 11: First step in detail - Split on Size
  Gain(Size) = Entropy(Start) − (2/4)·Entropy(BIG) − (2/4)·Entropy(small)
             = 1 − 1/2 − 1/2
             = 0
Splitting on size gains nothing, so color (Gain = 0.31) is the better first split.
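The arithmetic on Slides 6–11 can be checked with a few lines of Python (a sketch; `entropy2` is a hypothetical helper, not something defined in the slides):

```python
from math import log2

def entropy2(pos: int, neg: int) -> float:
    """Entropy of a node with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

start = entropy2(2, 2)                                             # 1.0
gain_color = start - (3/4) * entropy2(2, 1) - (1/4) * entropy2(0, 1)
gain_size = start - (2/4) * entropy2(1, 1) - (2/4) * entropy2(1, 1)
print(round(entropy2(2, 1), 2), round(gain_color, 2), gain_size)   # 0.92 0.31 0.0
```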

Slide 12: Top-Down Decision Tree Induction
Build the tree top-down by divide and conquer.
[Figure: the completed tree for the tiny example – color at the root; the red branch splits on shape (circle → pos, triangle → neg); the blue branch is a negative leaf.]

Slide 13: Another view (AIspace)

Slide 14: Picking a Good Split Feature
– The goal is for the resulting tree to be as small as possible, per Occam's razor.
– Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem.
– The top-down divide-and-conquer method does a greedy search for a simple tree but is not guaranteed to find the smallest. General lesson in ML: "Greed is good."
– We want to pick a feature that creates subsets of examples that are relatively "pure" in a single class, so that they are "closer" to being leaf nodes.
– There are a variety of heuristics for picking a good test; a popular one is based on information gain, which originated with Quinlan's ID3 system (1979).
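A minimal, self-contained sketch of this greedy top-down procedure on the tiny data set (the function and variable names are mine; this is not the exact ID3 implementation, just the recursive divide-and-conquer idea with information gain as the test-selection heuristic):

```python
from collections import Counter
from math import log2

# The Slide 2 training set, repeated here so the sketch is self-contained.
examples = [
    ({"size": "small", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "large", "color": "red",  "shape": "circle"},   "positive"),
    ({"size": "small", "color": "red",  "shape": "triangle"}, "negative"),
    ({"size": "large", "color": "blue", "shape": "circle"},   "negative"),
]

def entropy(data):
    """Entropy of the class distribution in `data`."""
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(cat for _, cat in data).values())

def split(data, feature):
    """Partition `data` by the value of `feature`."""
    parts = {}
    for attrs, cat in data:
        parts.setdefault(attrs[feature], []).append((attrs, cat))
    return parts

def info_gain(data, feature):
    """Expected reduction in entropy from splitting `data` on `feature`."""
    parts = split(data, feature)
    return entropy(data) - sum(len(s) / len(data) * entropy(s) for s in parts.values())

def grow_tree(data, features):
    """Greedy top-down induction: best-gain feature at the root, then recurse."""
    classes = {cat for _, cat in data}
    if len(classes) == 1:                       # pure subset -> leaf
        return classes.pop()
    if not features:                            # no features left -> majority-class leaf
        return Counter(cat for _, cat in data).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(data, f))
    rest = [f for f in features if f != best]
    return (best, {value: grow_tree(subset, rest)
                   for value, subset in split(data, best).items()})

print(grow_tree(examples, ["size", "color", "shape"]))
# ('color', {'red': ('shape', {'circle': 'positive', 'triangle': 'negative'}),
#            'blue': 'negative'})
```

As the slide warns, this greedy search finds a small tree but does not guarantee the smallest one.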

Slide 15: Entropy
The entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is
  Entropy(S) = − p_1·log2(p_1) − p_0·log2(p_0)
where p_1 is the fraction of positive examples in S and p_0 is the fraction of negatives.
– If all examples are in one category, entropy is zero (we define 0·log(0) = 0).
– If examples are equally mixed (p_1 = p_0 = 0.5), entropy is at its maximum of 1.
– Entropy can be viewed as the average number of bits required to encode the class of an example in S when data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases.
For multi-class problems with c categories, entropy generalizes to
  Entropy(S) = − Σ_{i=1..c} p_i·log2(p_i)
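The boundary cases and the value for the red subset of the running example can be verified directly (a small sketch; `entropy` here is just the formula above, not library code):

```python
from math import log2

def entropy(probs):
    """sum of p_i * log2(1/p_i), i.e. -sum p_i*log2(p_i), taking 0*log2(0) = 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))            # 0.0  -- all examples in one category
print(entropy([0.5, 0.5]))            # 1.0  -- equally mixed binary classes
print(round(entropy([2/3, 1/3]), 2))  # 0.92 -- the Color = red subset (2+, 1-)
```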

Slide 16: Entropy Plot for Binary Classification
[Figure: entropy as a function of the positive-class proportion p_1 – zero at p_1 = 0 and p_1 = 1, peaking at 1 when p_1 = 0.5.]

Slide 17: Information Gain
The information gain of a feature F is the expected reduction in entropy resulting from splitting on that feature:
  Gain(S, F) = Entropy(S) − Σ_{v ∈ Values(F)} (|S_v| / |S|)·Entropy(S_v)
where S_v is the subset of S having value v for feature F. The entropy of each resulting subset is weighted by its relative size.
Example summary for the tiny data set (2+, 2−, so Entropy(S) = 1):
– size:  big → 1+, 1− (E = 1);  small → 1+, 1− (E = 1);  Gain = 1 − (0.5·1 + 0.5·1) = 0
– color: red → 2+, 1− (E = 0.918);  blue → 0+, 1− (E = 0);  Gain = 1 − (0.75·0.918 + 0.25·0) ≈ 0.311
– shape: circle → 2+, 1− (E = 0.918);  triangle → 0+, 1− (E = 0);  Gain = 1 − (0.75·0.918 + 0.25·0) ≈ 0.311
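The example summary can be reproduced from the per-value class counts alone (a sketch; `gain` and `entropy` are hypothetical helpers mirroring the formula above, not library functions):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a node with the given class counts."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, children):
    """Gain(S, F) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    n = sum(parent)
    weighted = sum((p + q) / n * entropy(p, q) for p, q in children if p + q > 0)
    return entropy(*parent) - weighted

# (positive, negative) counts per feature value, from the example summary:
print(round(gain((2, 2), [(1, 1), (1, 1)]), 3))  # size:  big, small       -> 0.0
print(round(gain((2, 2), [(2, 1), (0, 1)]), 3))  # color: red, blue        -> 0.311
print(round(gain((2, 2), [(2, 1), (0, 1)]), 3))  # shape: circle, triangle -> 0.311
```

Color and shape tie on this data, so either is an acceptable greedy first split; size is useless here.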

Slide 18: Bias in Decision-Tree Induction
Information gain biases the search toward trees of minimal depth.
Recall: bias – any criterion other than consistency with the training data that is used to select a hypothesis.

Slide 19: History of Decision-Tree Research
– In the 1960s, Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning.
– In the late 1970s, Quinlan developed ID3, with the information-gain heuristic, to learn expert systems from examples. At about the same time, Breiman, Friedman, and colleagues developed CART (Classification and Regression Trees), similar to ID3.
– In the 1980s, a variety of improvements were introduced to handle noise, continuous features, and missing features, along with improved splitting criteria. Various expert-system development tools resulted.
– Quinlan's updated decision-tree package, C4.5, was released in 1993. Weka includes a Java version of C4.5 called J48.

Slide 20: Additional Decision Tree Issues
– Better splitting criteria (information gain prefers features with many values)
– Continuous features
– Predicting a real-valued function (regression trees)
– Missing feature values
– Features with costs
– Misclassification costs
– Incremental learning (ID4, ID5)
– Mining large databases that do not fit in main memory

Slide 21: Class Exercise
Practice using decision tree learning on some of the sample datasets available in AIspace.
Some of the slides in this presentation are adapted from: Prof. Frank Klassner's ML class at Villanova, the University of Manchester ML course, and the Stanford online ML course.

Resources – Datasets
– UCI Repository
– UCI KDD Archive
– Statlib
– Delve