Today’s Topics: Dealing with Noise; Overfitting (the key issue in all of ML); A ‘Greedy’ Algorithm for Pruning D-Trees; Generating IF-THEN Rules from D-Trees.

Presentation transcript:

Today’s Topics
– Dealing with Noise
– Overfitting (the key issue in all of ML)
– A ‘Greedy’ Algorithm for Pruning D-Trees
– Generating IF-THEN Rules from D-Trees
– Rule Pruning
9/22/15, CS Fall 2015 (© Jude Shavlik), Lecture 6, Week 3

Noise: Major Issue in ML
Worst case of noise: + and - examples at the same point in feature space
Causes of noise
1. Too few features (“hidden variables”) or too few possible values
2. Incorrectly reported/measured/judged feature values
3. Misclassified instances

Noise: Major Issue in ML (cont.)
Overfitting: producing an ‘awkward’ concept because of a few ‘noisy’ points
[Figure: two candidate concepts drawn around the same data, labeled “Bad performance on future ex’s?” and “Better performance?”]

Overfitting Viewed in Terms of Function-Fitting
(One can exactly fit N points with a polynomial of degree N-1)
Data = Red Line + Noise
[Figure: model f(x) plotted against x, with fitted curves labeled “Underfitting?” and “Overfitting?”]
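The parenthetical claim about exact fits can be checked numerically. What follows is a minimal sketch, assuming distinct x values; the function name `lagrange_fit`, the “red line” y = 2x + 1, and the hand-picked noise values are all illustrative assumptions, not taken from the lecture.

```python
def lagrange_fit(xs, ys):
    """Return the polynomial of degree <= N-1 through N points (xs must be distinct)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Hypothetical data: the "red line" y = 2x + 1 plus hand-picked noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5]
ys = [2 * x + 1 + n for x, n in zip(xs, noise)]

p = lagrange_fit(xs, ys)

# Training error: the degree-4 polynomial passes through all 5 noisy points.
train_err = max(abs(p(x) - y) for x, y in zip(xs, ys))

# Error against the underlying line at an unseen x.
x_new = 5.0
interp_err = abs(p(x_new) - (2 * x_new + 1))
print(train_err, interp_err)
```

The interpolant has zero training error, yet it is far from the underlying line just one step outside the training range, even though no noise value exceeds 0.5. That is the function-fitting picture of overfitting.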

Definition of Overfitting
Assuming a large enough test set so that it is representative, concept C overfits the training data if there exists a simpler concept S so that
  Training set accuracy of C > Training set accuracy of S
but
  Test set accuracy of C < Test set accuracy of S

Remember!
– It is easy to learn/fit the training data
– What’s hard is generalizing well to future (‘test set’) data!
– Overfitting avoidance (reduction, really) is the key issue in ML
– Easy to think ‘spurious correlations’ are meaningful signals

See a Pattern?
The first 10 digits of Pi: [Photo; the digits shown are not captured in the transcript]
– What comes next in Pi? 3 (already used)
– After that? 5
– “35” rounds to “4” (in fractional part of number)
– “4” has since been added!
Picture taken (by me) June 2015 in Lambeau Field Atrium, Green Bay, WI
Presumably a ‘spurious correlation’

Can One Underfit?
– Sure, if not fully fitting the training set
  Eg, just return the majority category (+ or -) in the trainset as the learned model
– But also if there is not enough data to illustrate important distinctions
  Eg, color may be important, but all examples seen are red, so there is no reason to include color and make a more complex model

Overfitting + Noise
Using the strict definition of overfitting presented earlier, is it possible to overfit noise-free data?
(Remember: overfitting is the key ML issue, not just a decision-tree topic)

Example of Overfitting Noise-Free Data
Let
– Correct concept = A ∧ B
– Feature C be true 50% of the time, for both + and - examples
– Prob(pos example) = 0.66
– Training set
  +: A B C D E,  A B C ¬D E,  A B C D ¬E
  -: A ¬B ¬C D ¬E,  ¬A B ¬C ¬D E

Example (concluded)
Tree                            Trainset Accuracy   TestSet Accuracy
Split on C (T: +, F: -)         100%                50%
Pruned (single leaf: +)         60%                 66%
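The four accuracies in this example can be reproduced directly. The sketch below evaluates both trees on the five training examples and computes expected test accuracy analytically from the stated probabilities; the helper names (`ex`, `tree_on_c`, `pruned`) are illustrative assumptions.

```python
# Training set from the example; each example maps feature name -> bool.
def ex(a, b, c, d, e):
    return {"A": a, "B": b, "C": c, "D": d, "E": e}

T, F = True, False
train = [
    (ex(T, T, T, T, T), "+"),
    (ex(T, T, T, F, T), "+"),
    (ex(T, T, T, T, F), "+"),
    (ex(T, F, F, T, F), "-"),
    (ex(F, T, F, F, T), "-"),
]

def tree_on_c(x):
    # Unpruned tree: a single split on the irrelevant feature C.
    return "+" if x["C"] else "-"

def pruned(x):
    # Pruned tree: one leaf that predicts the majority category.
    return "+"

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

P_POS = 0.66  # Prob(pos example), as stated in the example
P_C = 0.50    # Prob(C is true), the same for + and - examples

# Expected test accuracy: the C-tree is right when C happens to agree with the label.
test_tree = P_POS * P_C + (1 - P_POS) * (1 - P_C)
# The single + leaf is right on every positive example.
test_pruned = P_POS

train_tree = accuracy(tree_on_c, train)
train_pruned = accuracy(pruned, train)

# Definition of overfitting: better on train, worse on test, than a simpler concept.
overfit = train_tree > train_pruned and test_tree < test_pruned
print(train_tree, train_pruned, test_tree, test_pruned, overfit)
```

Because C is independent of the label in future data, the split on C is no better than a coin flip on the test set, while the simpler one-leaf tree gets every positive example right.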

ID3 & Noisy Data
To avoid overfitting, could allow splitting to stop before all ex’s are of one class
– Early stopping was Quinlan’s original idea
  Stop if further splitting is not justified by a statistical test (just skim the text’s material on the χ² test)
– But post-pruning is now seen as better
  More robust to weaknesses of greedy algo’s (eg, post-pruning benefits from seeing the full tree; a node may look bad when building the tree, but not in hindsight)

ID3 & Noisy Data (cont.)
Recap: build the complete tree, then use some ‘spare’ (tuning) examples to decide which parts of the tree can be pruned. This is called Reduced [tuneset] Error Pruning.

ID3 & Noisy Data (cont.)
– See which dropped subtree leads to the highest tune-set accuracy
– Repeat (ie, another greedy algo)
[Figure: tree with a candidate subtree marked “discard?” and the question “Better tuneset accuracy?”]

Greedily Pruning D-Trees
[Figure: sample (hill-climbing) search space, with the best child marked at each step]
– Stop here if node’s best child is not an improvement
– Note: in pruning we’re reversing the tree-building process

Greedily Pruning D-trees - Pseudocode
1. Run ID3 to fully fit the TRAIN’ set; measure accuracy on TUNE
2. Consider all subtrees where ONE interior node is removed and replaced by a leaf, labeled with the majority category in the pruned subtree
   IF progress on TUNE, choose the best subtree
   ELSE (ie, if no improvement) quit
3. Go to 2
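Steps 2 and 3 of the pseudocode can be sketched as follows. This is one possible reading, not the course’s reference code: it assumes step 1 (running ID3) has already produced the tree, that every node stores the majority training category in a `label` field, and that “progress” means a strict improvement on TUNE. The `Node` class, the hand-built tree, and the tune set are hypothetical.

```python
import copy

class Node:
    def __init__(self, label, feature=None, children=None):
        self.label = label              # majority training category at this node
        self.feature = feature          # None for a leaf
        self.children = children or {}  # feature value -> child Node

    def is_leaf(self):
        return self.feature is None

def predict(node, x):
    while not node.is_leaf():
        child = node.children.get(x[node.feature])
        if child is None:               # unseen value: fall back to the majority label
            break
        node = child
    return node.label

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def interior_paths(node, path=()):
    """Yield the path (tuple of branch values) to every interior node."""
    if not node.is_leaf():
        yield path
        for value, child in node.children.items():
            yield from interior_paths(child, path + (value,))

def pruned_copy(tree, path):
    """Copy the tree with the interior node at `path` replaced by a leaf."""
    new = copy.deepcopy(tree)
    node = new
    for value in path:
        node = node.children[value]
    node.feature, node.children = None, {}
    return new

def greedy_prune(tree, tune):
    best_acc = accuracy(tree, tune)
    while True:
        candidates = [pruned_copy(tree, p) for p in interior_paths(tree)]
        if not candidates:              # tree is already a single leaf
            return tree
        cand = max(candidates, key=lambda t: accuracy(t, tune))
        cand_acc = accuracy(cand, tune)
        if cand_acc <= best_acc:        # no improvement on TUNE: quit
            return tree
        tree, best_acc = cand, cand_acc  # go to step 2 with the smaller tree

# Hypothetical fully grown tree: the split on "noise" fits a noisy training point.
inner = Node("+", "noise", {True: Node("-"), False: Node("+")})
full = Node("+", "a", {True: inner, False: Node("-")})
tune = [
    ({"a": True, "noise": True}, "+"),
    ({"a": True, "noise": False}, "+"),
    ({"a": False, "noise": False}, "-"),
]
result = greedy_prune(full, tune)
print(accuracy(full, tune), accuracy(result, tune))
```

Some implementations also accept ties (`>=`) so that smaller trees win when tune-set accuracy is equal; the strict test here follows the “quit if no improvement” wording.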

Train/Tune/Test Accuracies (same sort of curves for other tuned param’s in other algo’s)
[Figure: accuracy (up to 100%) vs. amount of pruning, with Train, Tune, and Test curves; marks the “ideal tree to choose” and the “chosen pruned tree”]

The General Tradeoff in Greedy Algorithms (more later): Efficiency vs. Optimality
[Figure: initial tree with nodes R, A, B, C, D, E, F]
– Assume the true best cuts: discard C’s & F’s subtrees
– Single best cut: discard B’s subtrees (irrevocable)
Greedy search: a powerful, general-purpose trick of the trade

Generating IF-THEN Rules from Trees
– Antecedent: conjunction of all decisions leading to terminal node
– Consequent: label of terminal node
[Figure: decision tree that tests COLOR (Red, Blue, Green) at the root and SIZE (Big, Small) under the Red branch]

Generating Rules (cont)
Previous slide’s tree generates these rules
  If Color=Green → Output = -
  If Color=Blue → Output = +
  If Color=Red and Size=Big → +
  If Color=Red and Size=Small → -
Note
1. Can ‘clean up’ the rule set (next slide)
2. Decision trees learn disjunctive concepts
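The tree-to-rules conversion is a walk from the root to each leaf that collects the tests along the way. A minimal sketch, assuming the tree is encoded as nested tuples; the encoding and the names `tree_to_rules` and `show` are illustrative choices.

```python
# The Color/Size tree: an interior node is (feature, {value: subtree}); a leaf is a label.
tree = ("Color", {
    "Green": "-",
    "Blue": "+",
    "Red": ("Size", {"Big": "+", "Small": "-"}),
})

def tree_to_rules(node, conditions=()):
    """Return one (antecedent, consequent) pair per leaf."""
    if isinstance(node, str):          # leaf: the path so far is the antecedent
        return [(list(conditions), node)]
    feature, branches = node
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((feature, value),))
    return rules

def show(rule):
    antecedent, consequent = rule
    tests = " and ".join(f"{f}={v}" for f, v in antecedent)
    return f"If {tests} -> Output = {consequent}"

rules = tree_to_rules(tree)
for r in rules:
    print(show(r))
```

Each leaf yields exactly one rule, so the rule set is a disjunction of conjunctions, which is why decision trees are said to learn disjunctive concepts.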

Rule Post-Pruning (Another Greedy Algorithm)
1. Induce a decision tree
2. Convert to rules (see earlier slide)
3. Consider dropping any one rule antecedent
   – Delete the one that improves tuning set accuracy the most
   – Repeat as long as progress is being made
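Step 3 can be sketched as a greedy loop over single-antecedent deletions. This is a hedged illustration, not the lecture’s implementation: it assumes “first matching rule wins” as the way to resolve overlapping rules, a default category of “-” when no rule fires, and strict improvement as the definition of progress. The tune set, including the unseen color Yellow, is made up.

```python
def classify(rules, x, default="-"):
    # Assumed conflict resolution: the first matching rule wins.
    for antecedent, consequent in rules:
        if all(x.get(f) == v for f, v in antecedent):
            return consequent
    return default

def accuracy(rules, data):
    return sum(classify(rules, x) == y for x, y in data) / len(data)

def drop_one(rules):
    """Yield every rule set obtained by deleting a single antecedent test."""
    for i, (antecedent, consequent) in enumerate(rules):
        for k in range(len(antecedent)):
            shorter = antecedent[:k] + antecedent[k + 1:]
            yield rules[:i] + [(shorter, consequent)] + rules[i + 1:]

def post_prune(rules, tune):
    best = accuracy(rules, tune)
    while True:
        candidates = list(drop_one(rules))
        if not candidates:
            return rules
        cand = max(candidates, key=lambda r: accuracy(r, tune))
        acc = accuracy(cand, tune)
        if acc <= best:                 # no progress on the tuning set: stop
            return rules
        rules, best = cand, acc

# The four rules generated from the Color/Size tree.
rules = [
    ([("Color", "Green")], "-"),
    ([("Color", "Blue")], "+"),
    ([("Color", "Red"), ("Size", "Big")], "+"),
    ([("Color", "Red"), ("Size", "Small")], "-"),
]
# Hypothetical tuning set in which red examples are + regardless of size.
tune = [
    ({"Color": "Red", "Size": "Small"}, "+"),
    ({"Color": "Red", "Size": "Big"}, "+"),
    ({"Color": "Green", "Size": "Big"}, "-"),
    ({"Color": "Blue", "Size": "Small"}, "+"),
    ({"Color": "Yellow", "Size": "Big"}, "-"),
]
pruned = post_prune(rules, tune)
print(accuracy(rules, tune), accuracy(pruned, tune))
```

On this data the loop drops Size=Big from the third rule, leaving “If Color=Red → +”. That rule now overlaps the fourth one, which illustrates why pruned rule sets need some conflict-resolution scheme.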

Rule Post-Pruning (cont)
Advantages
– Allows an intermediate node to be pruned from some rules but retained in others
– Can correct poor early decisions in tree construction
– Final concept more understandable
Also applicable to ML algo’s that directly learn rules (eg, ILP, MLNs)
But note that the final rules will overlap one another, so a ‘conflict resolution’ scheme is needed

Training with Noisy Data
If we can clean up the training data, should we do so?
– No (assuming one can’t clean up the testing data when the learned concept will be used)
– Better to train with the same type of data as will be experienced when the result of learning is put into use
– Recall the story in which hadBankruptcy was the best indicator of “good candidate for credit card”!

Aside: A Rose by Any Other Name …
Tuning sets are also called
– Pruning sets (in d-tree algorithms)
– Validation sets (in general), but sometimes in the literature (eg, the stats community) AI’s test sets are called validation sets (and AI’s tuning sets are called test sets!)