Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edition
ETHEM ALPAYDIN © The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

CHAPTER 9: Decision Trees

Decision Trees
A decision tree:
An efficient nonparametric method
A hierarchical model
A divide-and-conquer strategy
Supervised learning

Tree Uses Nodes and Leaves

Divide and Conquer
Internal decision nodes
Univariate: uses a single attribute, xi
Numeric xi: binary split: xi > wm
Discrete xi: n-way split for n possible values
Multivariate: uses all attributes, x
Leaves
Classification: class labels, or proportions
Regression: numeric; r average, or local fit
Learning is greedy; find the best split recursively (Breiman et al., 1984; Quinlan, 1986, 1993)
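
As a concrete illustration of the univariate case above, here is a minimal Python sketch (not part of the original slides) of a tree node and the recursive prediction that applies one test per internal node; the class Node and its fields are hypothetical names chosen for the example.

class Node:
    """A univariate decision-tree node (a sketch, not the book's implementation)."""
    def __init__(self, attr=None, threshold=None, left=None, right=None, output=None):
        self.attr = attr            # index of the single attribute xi tested at this node
        self.threshold = threshold  # wm: split point for a numeric attribute
        self.left = left            # branch taken when x[attr] > threshold
        self.right = right          # branch taken otherwise
        self.output = output        # class label (or numeric value) if this node is a leaf

def predict(node, x):
    """Follow univariate tests from the root down to a leaf and return its output."""
    if node.output is not None:     # leaf: return the stored label / value
        return node.output
    if x[node.attr] > node.threshold:
        return predict(node.left, x)
    return predict(node.right, x)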

Classification Trees
For node m, Nm instances reach m, and Nim of them belong to class Ci; the estimated probability of class Ci at node m is
pim = Nim / Nm
Node m is pure if pim is 0 or 1 for all i.
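
A minimal sketch (not from the slides; names are illustrative) of how these proportions can be estimated from the labels of the training instances that reach node m:

from collections import Counter

def class_proportions(labels_at_m):
    """pim = Nim / Nm for the instances that reach node m."""
    N_m = len(labels_at_m)
    counts = Counter(labels_at_m)                  # Nim for each class Ci
    return {c: n / N_m for c, n in counts.items()}

def is_pure(labels_at_m):
    """Node m is pure if all instances reaching it belong to one class."""
    return len(set(labels_at_m)) == 1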

Entropy
The measure of impurity is entropy:
Im = − Σi pim log2 pim
In information theory, entropy specifies the minimum number of bits needed to encode the class code of an instance.
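
A short Python sketch of this entropy computation (base-2 logarithm, with the usual convention that 0 log 0 = 0); an illustration, not code from the book:

import math

def entropy(proportions):
    """Im = -sum_i pim log2 pim over the class proportions at node m."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# entropy([1.0, 0.0]) -> 0.0 (pure node); entropy([0.5, 0.5]) -> 1.0 (maximally impure)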

Example of Entropy
In a two-class problem:
If p1 = 1 and p2 = 0, all examples are of C1; we do not need to send anything, so the entropy is 0.
If p1 = p2 = 0.5, we need to send a bit to signal one of the two cases, so the entropy is 1.
In between these two extremes, we can devise codes and use less than a bit per message by having shorter codes for the more likely class and longer codes for the less likely one.
Figure: entropy function for a two-class problem

The Properties of Measure Functions
The properties of a function φ(p, 1 − p) measuring the impurity of a two-class split:
1. φ(1/2, 1/2) ≥ φ(p, 1 − p) for any p in [0, 1]: impurity is largest when the classes are equally likely.
2. φ(0, 1) = φ(1, 0) = 0: impurity is zero when the node is pure.
3. φ(p, 1 − p) is increasing in p on [0, 1/2] and decreasing in p on [1/2, 1].

Examples
Examples of two-class measure functions are:
Entropy: φ(p, 1 − p) = −p log2 p − (1 − p) log2(1 − p)
Gini index: φ(p, 1 − p) = 2p(1 − p)
Misclassification error: φ(p, 1 − p) = 1 − max(p, 1 − p)
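
For the two-class case these three measures can be written directly in terms of p = p1; the sketch below reuses the entropy function from the earlier sketch and is otherwise illustrative:

def entropy_2class(p):
    """phi(p, 1 - p) = -p log2 p - (1 - p) log2 (1 - p)"""
    return entropy([p, 1.0 - p])

def gini_2class(p):
    """phi(p, 1 - p) = 2 p (1 - p)"""
    return 2.0 * p * (1.0 - p)

def misclassification_2class(p):
    """phi(p, 1 - p) = 1 - max(p, 1 - p)"""
    return 1.0 - max(p, 1.0 - p)

# All three vanish at p = 0 and p = 1 and are maximal at p = 0.5.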

Best Split
If node m is pure, generate a leaf and stop; otherwise split and continue recursively.
Impurity after split: Nmj of the Nm instances take branch j, and Nimj of them belong to Ci:
pimj = Nimj / Nmj
I′m = − Σj (Nmj / Nm) Σi pimj log2 pimj
Find the variable and split that minimize impurity (among all variables, and among all split positions for numeric variables).
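
A sketch of the greedy search over split positions for one numeric attribute, scoring each candidate threshold by the weighted impurity of the two branches; it reuses entropy and class_proportions from the earlier sketches, and the function names are hypothetical:

def split_impurity(labels_left, labels_right):
    """Weighted impurity after a binary split: sum_j (Nmj / Nm) * Imj."""
    N = len(labels_left) + len(labels_right)
    total = 0.0
    for branch in (labels_left, labels_right):
        if branch:
            total += len(branch) / N * entropy(class_proportions(branch).values())
    return total

def best_numeric_split(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, impurity)."""
    order = sorted(range(len(values)), key=lambda t: values[t])
    xs = [values[t] for t in order]
    ys = [labels[t] for t in order]
    best_w, best_imp = None, float("inf")
    for t in range(1, len(xs)):
        if xs[t] == xs[t - 1]:
            continue                                # no threshold between equal values
        w = (xs[t] + xs[t - 1]) / 2.0               # candidate split point wm
        imp = split_impurity(ys[:t], ys[t:])        # left: x <= w, right: x > w
        if imp < best_imp:
            best_w, best_imp = w, imp
    return best_w, best_imp

The same loop, repeated over all attributes (and over value subsets for discrete attributes), gives the greedy best split at a node.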

Feature type: Hu moments. Internal nodes: 16. Leaf nodes: 17. Height: 8.
Figure: decision tree with leaves labeled crying, yawning, laughing, dazing, vomiting
The decision tree is constructed from the correlation coefficients of Hu moments. Each internal node includes a decision rule, and each leaf node corresponds to one class of facial expression; only one image of the sequences in each class is shown. The height of this tree is 8; there are 16 internal nodes and 17 leaf nodes. Two leaf nodes may represent the same facial expression: for example, one node represents a turn-right dazing class and another node represents a similar class.

Feature type: R moments. Internal nodes: 15. Leaf nodes: 17. Height: 10.
Figure: decision tree with leaves labeled vomiting, yawning, dazing, crying, laughing
The decision tree is constructed from the correlation coefficients of R moments. The height of this tree is 10; there are 15 internal nodes and 17 leaf nodes. Two leaf nodes may represent the same facial expression: for example, one node represents a dazing class and another node represents a similar class.

Feature type: Zernike moments. Internal nodes: 19. Leaf nodes: 20. Height: 7.
Figure: decision tree with leaves labeled crying, vomiting, laughing, dazing, yawning
The decision tree is constructed from the correlation coefficients of Zernike moments. The height of this tree is 7; there are 19 internal nodes and 20 leaf nodes.

Regression Trees
Error at node m:
Em = (1/Nm) Σt (rt − gm)² bm(xt), where bm(xt) is 1 if xt reaches node m (and 0 otherwise) and gm is the average of the rt values reaching node m.
After splitting:
E′m = (1/Nm) Σj Σt (rt − gmj)² bmj(xt)
(the error should decrease)
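
A sketch of these two quantities for a regression node, with gm taken as the mean of the responses rt reaching node m (illustrative names, not code from the book):

def node_error(responses):
    """Em = (1/Nm) * sum_t (rt - gm)^2, with gm the mean response at node m."""
    g_m = sum(responses) / len(responses)
    return sum((r - g_m) ** 2 for r in responses) / len(responses)

def split_error(branch_responses):
    """E'm = (1/Nm) * sum_j sum_t (rt - gmj)^2 over the responses in each branch j."""
    N_m = sum(len(b) for b in branch_responses)
    return sum(len(b) * node_error(b) for b in branch_responses if b) / N_m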

Model Selection in Trees

Pruning Trees
Remove subtrees for better generalization (decrease variance)
Prepruning: early stopping
Postpruning: grow the whole tree, then prune the subtrees that overfit on the pruning set
Prepruning is faster; postpruning is more accurate (but requires a separate pruning set)
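
A sketch of postpruning in this setting: walk the grown tree bottom-up, tentatively replace each subtree by a leaf, and keep the replacement if accuracy on the separate pruning set does not drop. It reuses the Node and predict sketch from earlier and additionally assumes each node stores node.majority, the majority class of the training instances that reached it; all of this is illustrative, not the book's code.

def accuracy(root, X, y):
    return sum(predict(root, xt) == yt for xt, yt in zip(X, y)) / len(y)

def postprune(node, root, X_prune, y_prune):
    """Bottom-up reduced-error-style pruning against a separate pruning set."""
    if node.output is not None:                    # leaf: nothing to prune
        return
    postprune(node.left, root, X_prune, y_prune)
    postprune(node.right, root, X_prune, y_prune)
    before = accuracy(root, X_prune, y_prune)
    saved = (node.attr, node.left, node.right, node.output)
    node.output = node.majority                    # tentatively collapse the subtree into a leaf
    if accuracy(root, X_prune, y_prune) < before:  # pruning hurt: undo it
        node.attr, node.left, node.right, node.output = saved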

Rule Extraction from Trees
C4.5Rules (Quinlan, 1993)

Learning Rules
Rule induction is similar to tree induction, but tree induction is breadth-first while rule induction is depth-first: one rule at a time.
A rule set contains rules; each rule is a conjunction of terms.
A rule covers an example if all terms of the rule evaluate to true for that example.
Sequential covering: generate rules one at a time until all positive examples are covered.
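
A sketch of the sequential-covering loop described above: learn one rule, add it to the rule set, remove the positive examples it covers, and repeat. Here a rule is represented as a list of terms (boolean tests), and learn_one_rule is a hypothetical stand-in for the inner loop that grows a single rule.

def covers(rule, x):
    """A rule covers an example if every term of the conjunction is true for it."""
    return all(term(x) for term in rule)

def sequential_covering(positives, negatives, learn_one_rule):
    """Generate rules one at a time until all positive examples are covered."""
    rule_set = []
    remaining = list(positives)
    while remaining:
        rule = learn_one_rule(remaining, negatives)            # inner loop: grow one rule
        still_uncovered = [x for x in remaining if not covers(rule, x)]
        if len(still_uncovered) == len(remaining):             # guard: no progress, stop
            break
        rule_set.append(rule)
        remaining = still_uncovered
    return rule_set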

Ripper Algorithm
There are two kinds of loops in the Ripper algorithm (Cohen, 1995):
Outer loop: adding one rule at a time to the rule base
Inner loop: adding one condition at a time to the current rule
Conditions are added to the rule to maximize an information gain measure, until the rule covers no negative example.
The pseudo-code of the outer loop of Ripper is given in Figure 9.7.

Ripper scales as O(N log² N) with the number of training examples N.
DL: the description length of the rule base
The description length of a rule base = (the sum of the description lengths of all the rules in the rule base) + (the description of the instances not covered by the rule base)

Ripper Algorithm
In Ripper, conditions are added to the rule to maximize an information gain measure
Gain(R′, R) = s · (log2(N′+ / N′) − log2(N+ / N))
where
R: the original rule
R′: the candidate rule after adding a condition
N (N′): the number of instances that are covered by R (R′)
N+ (N′+): the number of true positives in R (R′)
s: the number of true positives in R and R′ (after adding the condition)
Conditions are added until the rule covers no negative example.
The grown rule is then pruned back using the rule value metric
rvm(R) = (p − n) / (p + n)
where p and n are the numbers of true and false positives, respectively.
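
A sketch of the two quantities defined on this slide, written out with the counts named above (illustrative code, not Ripper itself):

import math

def ripper_gain(s, N_plus, N, N_plus_new, N_new):
    """Gain(R', R) = s * (log2(N'+ / N') - log2(N+ / N))."""
    return s * (math.log2(N_plus_new / N_new) - math.log2(N_plus / N))

def rule_value_metric(p, n):
    """rvm(R) = (p - n) / (p + n), with p true positives and n false positives."""
    return (p - n) / (p + n)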

Multivariate Trees
At a multivariate internal node, all attributes are used together; with a linear multivariate node, the binary test is
fm(x): wm · x + wm0 > 0
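
A sketch of the linear multivariate test above: every attribute contributes to a single weighted sum, and the branch is chosen by its sign (weights and names are illustrative):

def multivariate_test(x, w, w0):
    """fm(x): w . x + w0 > 0 -- take one branch if true, the other otherwise."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0

# Example: the 2-d linear split x1 + 2*x2 - 1 > 0
# multivariate_test([0.5, 0.5], [1.0, 2.0], -1.0)  -> True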