INTRODUCTION TO MACHINE LEARNING David Kauchak CS 451 – Fall 2013

Admin
Assignment 1… how'd it go?
Assignment 2:
- out soon
- building decision trees
- Java with some starter code
- competition
- extra credit

Building decision trees
Base case: if all data belong to the same class, create a leaf node with that label.
Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively on each partition
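Below is a minimal Java sketch of this recursion, assuming a very simple representation in which each example is a map from feature names to values plus a label. The names (DecisionTreeSketch, buildTree, scoreSplit, majorityLabel) are illustrative only, not the assignment's starter code, and the "score" here is just the training accuracy of the split.

```java
// A minimal sketch of the recursive tree-building procedure on this slide.
import java.util.*;

public class DecisionTreeSketch {

    // One training example: feature name -> value, plus a class label.
    record Example(Map<String, String> features, String label) {}

    // A node is either a leaf (label != null) or an internal test on a feature.
    static class Node {
        String label;
        String feature;
        Map<String, Node> children = new HashMap<>();
    }

    static Node buildTree(List<Example> data, Set<String> features) {
        Node node = new Node();

        // Base case: all examples share the same label -> leaf with that label.
        Set<String> labels = new HashSet<>();
        for (Example e : data) labels.add(e.label());
        if (labels.size() == 1) {
            node.label = labels.iterator().next();
            return node;
        }
        // Guard (covered later in the slides): no features left -> majority label.
        if (features.isEmpty()) {
            node.label = majorityLabel(data);
            return node;
        }

        // Otherwise: score each feature, pick the best, partition, and recurse.
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String f : features) {
            double s = scoreSplit(data, f);
            if (s > bestScore) { bestScore = s; best = f; }
        }
        node.feature = best;
        Map<String, List<Example>> partitions = new HashMap<>();
        for (Example e : data) {
            partitions.computeIfAbsent(e.features().get(best), k -> new ArrayList<>()).add(e);
        }
        Set<String> remaining = new HashSet<>(features);
        remaining.remove(best);
        for (Map.Entry<String, List<Example>> entry : partitions.entrySet()) {
            node.children.put(entry.getKey(), buildTree(entry.getValue(), remaining));
        }
        return node;
    }

    // "Score" = training accuracy if we split on `feature` and predict the
    // majority label inside each partition (other scores appear later).
    static double scoreSplit(List<Example> data, String feature) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Example e : data) {
            counts.computeIfAbsent(e.features().get(feature), k -> new HashMap<>())
                  .merge(e.label(), 1, Integer::sum);
        }
        int correct = 0;
        for (Map<String, Integer> labelCounts : counts.values()) {
            correct += Collections.max(labelCounts.values());
        }
        return (double) correct / data.size();
    }

    static String majorityLabel(List<Example> data) {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : data) counts.merge(e.label(), 1, Integer::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```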

Partitioning the data

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Rainy    YES
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Road     Mountain       Snowy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO
Trail    Mountain       Snowy    YES

Candidate splits (label counts in each partition):
- Terrain:  Road → YES: 4, NO: 1;  Trail → YES: 2, NO: 3
- Unicycle: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4
- Weather:  Rainy → YES: 2, NO: 1;  Snowy → YES: 2, NO: 2;  Sunny → YES: 2, NO: 1
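A small illustrative sketch of these tallies: for each feature, count the YES/NO labels of the examples that take each value. The class name (PartitionCounts) and the hard-coded data array are just for this example.

```java
// Tally label counts per feature value for the training data above.
import java.util.*;

public class PartitionCounts {
    public static void main(String[] args) {
        // Each row: Terrain, Unicycle-type, Weather, Go-For-Ride?
        String[][] data = {
            {"Trail", "Normal",   "Rainy", "NO"},  {"Road",  "Normal",   "Sunny", "YES"},
            {"Trail", "Mountain", "Sunny", "YES"}, {"Road",  "Mountain", "Rainy", "YES"},
            {"Trail", "Normal",   "Snowy", "NO"},  {"Road",  "Normal",   "Rainy", "YES"},
            {"Road",  "Mountain", "Snowy", "YES"}, {"Trail", "Normal",   "Sunny", "NO"},
            {"Road",  "Normal",   "Snowy", "NO"},  {"Trail", "Mountain", "Snowy", "YES"},
        };
        String[] featureNames = {"Terrain", "Unicycle-type", "Weather"};

        for (int f = 0; f < featureNames.length; f++) {
            // value -> (label -> count)
            Map<String, Map<String, Integer>> counts = new TreeMap<>();
            for (String[] row : data) {
                counts.computeIfAbsent(row[f], k -> new TreeMap<>())
                      .merge(row[3], 1, Integer::sum);
            }
            System.out.println(featureNames[f] + ": " + counts);
            // e.g. Terrain: {Road={NO=1, YES=4}, Trail={NO=3, YES=2}}
        }
    }
}
```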

Decision trees

Training error: the average error over the training set.

- Terrain:  Road → YES: 4, NO: 1;  Trail → YES: 2, NO: 3   → training error 3/10
- Unicycle: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4   → training error 2/10
- Weather:  Rainy → YES: 2, NO: 1;  Snowy → YES: 2, NO: 2;  Sunny → YES: 2, NO: 1   → training error 4/10

Training error vs. accuracy

Training error: the average error over the training set.
Training accuracy: the average percent correct over the training set.
training error = 1 - training accuracy (and vice versa)

- Terrain:  Road → YES: 4, NO: 1;  Trail → YES: 2, NO: 3   → error 3/10, accuracy 7/10
- Unicycle: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4   → error 2/10, accuracy 8/10
- Weather:  Rainy → YES: 2, NO: 1;  Snowy → YES: 2, NO: 2;  Sunny → YES: 2, NO: 1   → error 4/10, accuracy 6/10
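A tiny sketch of how these numbers come about: each partition predicts its majority label, so the minority count in each partition is the number of training mistakes. The class and method names here are illustrative.

```java
// Training error of a single split from the YES/NO counts in each partition.
public class SplitError {
    // counts[i] = {YES count, NO count} for partition i of the split
    static double trainingError(int[][] counts) {
        int mistakes = 0, total = 0;
        for (int[] c : counts) {
            mistakes += Math.min(c[0], c[1]);   // minority label is misclassified
            total += c[0] + c[1];
        }
        return (double) mistakes / total;
    }

    public static void main(String[] args) {
        // The three candidate splits from the slides:
        int[][] terrain  = {{4, 1}, {2, 3}};            // Road, Trail
        int[][] unicycle = {{4, 0}, {2, 4}};            // Mountain, Normal
        int[][] weather  = {{2, 1}, {2, 2}, {2, 1}};    // Rainy, Snowy, Sunny
        System.out.println("Terrain:  error = " + trainingError(terrain));   // 0.3
        System.out.println("Unicycle: error = " + trainingError(unicycle));  // 0.2
        System.out.println("Weather:  error = " + trainingError(weather));   // 0.4
        // training accuracy = 1 - training error in each case
    }
}
```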

Recurse

Split on Unicycle-type: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4

Normal branch:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO

Mountain branch:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Sunny    YES
Road     Mountain       Rainy    YES
Road     Mountain       Snowy    YES
Trail    Mountain       Snowy    YES

Recurse

Root split on Unicycle-type: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4

Recursing on the Normal branch:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO

Candidate splits of the Normal branch:
- Terrain: Road → YES: 2, NO: 1;  Trail → YES: 0, NO: 3
- Weather: Rainy → YES: 1, NO: 1;  Snowy → YES: 0, NO: 2;  Sunny → YES: 1, NO: 1

Recurse

Root split on Unicycle-type: Mountain → YES: 4, NO: 0;  Normal → YES: 2, NO: 4

Recursing on the Normal branch:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO

Candidate splits of the Normal branch:
- Terrain: Road → YES: 2, NO: 1;  Trail → YES: 0, NO: 3   → training error 1/6
- Weather: Rainy → YES: 1, NO: 1;  Snowy → YES: 0, NO: 2;  Sunny → YES: 1, NO: 1   → training error 2/6

Recurse

Tree so far:
- Unicycle-type = Mountain → YES: 4, NO: 0
- Unicycle-type = Normal → split on Terrain: Road → YES: 2, NO: 1;  Trail → YES: 0, NO: 3

Recursing on the Normal, Road branch:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Road     Normal         Sunny    YES
Road     Normal         Rainy    YES
Road     Normal         Snowy    NO

Recurse

Tree so far:
- Unicycle-type = Mountain → YES: 4, NO: 0
- Unicycle-type = Normal, Terrain = Trail → YES: 0, NO: 3
- Unicycle-type = Normal, Terrain = Road → split on Weather: Rainy → YES: 1, NO: 0;  Snowy → YES: 0, NO: 1;  Sunny → YES: 1, NO: 0

Recurse

Final tree:
- Unicycle-type = Mountain → YES: 4, NO: 0
- Unicycle-type = Normal, Terrain = Trail → YES: 0, NO: 3
- Unicycle-type = Normal, Terrain = Road → split on Weather: Rainy → YES: 1, NO: 0;  Snowy → YES: 0, NO: 1;  Sunny → YES: 1, NO: 0

Full training data:
Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Rainy    YES
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Road     Mountain       Snowy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO
Trail    Mountain       Snowy    YES

Training error? Are we always guaranteed to get a training error of 0?

Problematic data

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Normal         Rainy    NO
Road     Normal         Sunny    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    NO
Trail    Normal         Snowy    NO
Road     Normal         Rainy    YES
Road     Mountain       Snowy    YES
Trail    Normal         Sunny    NO
Road     Normal         Snowy    NO
Trail    Mountain       Snowy    YES

When can this happen?

Recursive approach

Base case: if all data belong to the same class, create a leaf node with that label, OR if all the data have the same feature values.

Do we always want to go all the way to the bottom?

What would the tree look like for…

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Rainy    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    YES
Road     Mountain       Sunny    YES
Trail    Normal         Snowy    NO
Trail    Normal         Rainy    NO
Road     Normal         Snowy    YES
Road     Normal         Sunny    NO
Trail    Normal         Sunny    NO

What would the tree look like for…

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Rainy    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    YES
Road     Mountain       Sunny    YES
Trail    Normal         Snowy    NO
Trail    Normal         Rainy    NO
Road     Normal         Snowy    YES
Road     Normal         Sunny    NO
Trail    Normal         Sunny    NO

One possible tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO (no Rainy examples reach this node)

Is that what you would do?

What would the tree look like for…

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Rainy    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    YES
Road     Mountain       Sunny    YES
Trail    Normal         Snowy    NO
Trail    Normal         Rainy    NO
Road     Normal         Snowy    YES
Road     Normal         Sunny    NO
Trail    Normal         Sunny    NO

The simple tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal → NO

Or the full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

Maybe…

What would the tree look like for…

Terrain  Unicycle-type  Weather  Jacket  ML grade  Go-For-Ride?
Trail    Mountain       Rainy    Heavy   D         YES
Trail    Mountain       Sunny    Light   C-        YES
Road     Mountain       Snowy    Light   B         YES
Road     Mountain       Sunny    Heavy   A         YES
…        Mountain       …        …       …         YES
Trail    Normal         Snowy    Light   D+        NO
Trail    Normal         Rainy    Heavy   B-        NO
Road     Normal         Snowy    Heavy   C+        YES
Road     Normal         Sunny    Light   A-        NO
Trail    Normal         Sunny    Heavy   B+        NO
Trail    Normal         Snowy    Light   F         NO
…        Normal         …        …       …         NO
Trail    Normal         Rainy    Light   C         YES

Overfitting

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Rainy    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    YES
Road     Mountain       Sunny    YES
Trail    Normal         Snowy    NO
Trail    Normal         Rainy    NO
Road     Normal         Snowy    YES
Road     Normal         Sunny    NO
Trail    Normal         Sunny    NO

The simple tree: Unicycle-type = Mountain → YES;  Unicycle-type = Normal → NO

Overfitting occurs when we bias our model too much towards the training data. Our goal is to learn a general model that will work on the training data as well as on other data (i.e. test data).

Overfitting Our decision tree learning procedure always decreases training error Is that what we want?

Test set error!

"Machine learning is about predicting the future based on the past." -- Hal Daume III

[Diagram: past → Training Data → learn → model/predictor;  future → Testing Data → model/predictor → predict]

Overfitting Even though the training error is decreasing, the testing error can go up!

Overfitting

Terrain  Unicycle-type  Weather  Go-For-Ride?
Trail    Mountain       Rainy    YES
Trail    Mountain       Sunny    YES
Road     Mountain       Snowy    YES
Road     Mountain       Sunny    YES
Trail    Normal         Snowy    NO
Trail    Normal         Rainy    NO
Road     Normal         Snowy    YES
Road     Normal         Sunny    NO
Trail    Normal         Sunny    NO

The full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

How do we prevent overfitting?

Preventing overfitting

Base case: if all data belong to the same class, create a leaf node with that label, OR
- all the data have the same feature values, OR
- we've reached a particular depth in the tree
- ?

One idea: stop building the tree early.

Preventing overfitting

Base case: if all data belong to the same class, create a leaf node with that label, OR
- all the data have the same feature values, OR
- we've reached a particular depth in the tree
- we only have a certain number/fraction of examples remaining
- we've reached a particular training error
- use development data (more on this later)
- …
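A minimal sketch of what some of these early-stopping checks might look like; the threshold names (maxDepth, minExamples, maxTrainingError) are hypothetical knobs, typically tuned (e.g. on development data) rather than fixed in advance.

```java
// Early-stopping checks for tree building: return true if we should stop
// splitting at this node and just create a leaf.
public class EarlyStopping {
    static boolean stopEarly(int depth, int numExamples, double nodeTrainingError,
                             int maxDepth, int minExamples, double maxTrainingError) {
        return depth >= maxDepth                      // reached a particular depth
            || numExamples <= minExamples             // too few examples remaining
            || nodeTrainingError <= maxTrainingError; // training error already low enough
    }
}
```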

Preventing overfitting: pruning

The full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

Pruning: after the tree is built, go back and "prune" the tree, i.e. remove some lower parts of the tree. Similar to stopping early, but done after the entire tree is built.

Preventing overfitting: pruning

Build the full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

Preventing overfitting: pruning

Build the full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

Prune back leaves that are too specific:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal → NO

Preventing overfitting: pruning

The full tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal, Terrain = Trail → NO
- Unicycle-type = Normal, Terrain = Road → split on Weather: Snowy → YES;  Sunny → NO

The pruned tree:
- Unicycle-type = Mountain → YES
- Unicycle-type = Normal → NO

Pruning criterion?
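One possible answer, sketched below, is reduced-error pruning against development data: bottom-up, replace an internal node with a leaf predicting its majority training label whenever that does not increase the mistakes on the development examples reaching that node. This is an assumption for illustration, not necessarily the criterion used in the course, and the Node/Example types here are simplified.

```java
// A sketch of one possible pruning criterion: reduced-error pruning on dev data.
import java.util.*;

public class ReducedErrorPruning {
    record Example(Map<String, String> features, String label) {}

    static class Node {
        String label;                              // non-null for leaves
        String feature;                            // feature tested at internal nodes
        Map<String, Node> children = new HashMap<>();
        String majorityLabel;                      // majority training label at this node
    }

    static String predict(Node n, Example e) {
        while (n.label == null) {
            Node child = n.children.get(e.features().get(n.feature));
            if (child == null) return n.majorityLabel;   // unseen value: fall back
            n = child;
        }
        return n.label;
    }

    static int mistakes(Node n, List<Example> examples) {
        int m = 0;
        for (Example e : examples) if (!predict(n, e).equals(e.label())) m++;
        return m;
    }

    static void prune(Node n, List<Example> dev) {
        if (n.label != null) return;               // leaf: nothing to prune
        // Route the dev examples that reach this node down to the children.
        Map<String, List<Example>> routed = new HashMap<>();
        for (Example e : dev) {
            routed.computeIfAbsent(e.features().get(n.feature), k -> new ArrayList<>()).add(e);
        }
        for (Map.Entry<String, Node> entry : n.children.entrySet()) {
            prune(entry.getValue(), routed.getOrDefault(entry.getKey(), new ArrayList<>()));
        }
        // Would a single leaf do at least as well on these dev examples?
        int subtreeMistakes = mistakes(n, dev);
        int leafMistakes = 0;
        for (Example e : dev) if (!e.label().equals(n.majorityLabel)) leafMistakes++;
        if (leafMistakes <= subtreeMistakes) {      // prune: replace subtree with a leaf
            n.label = n.majorityLabel;
            n.children.clear();
        }
    }
}
```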

Handling non-binary attributes
What do we do with features that have multiple values? Real values?

Features with multiple values

Option 1: treat it as an n-ary split, e.g. Weather with one branch per value (Rainy, Snowy, Sunny).

Option 2: treat it as multiple binary splits, e.g. first test "Rainy?" (Rainy vs. not Rainy), then, on the not-Rainy branch, test "Snowy?" (Snowy vs. Sunny).
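A small sketch of the "multiple binary splits" idea on the Weather values from the earlier table: test one value at a time ("Rainy?" then "Snowy?"), each test splitting the examples into two groups. The class name and the hard-coded list are illustrative.

```java
// Turning a multi-valued feature into a sequence of binary "is it value X?" splits.
import java.util.*;

public class BinarySplits {
    // Split examples (here just their Weather values) on a single test value.
    static Map<Boolean, List<String>> splitOnValue(List<String> weatherValues, String value) {
        Map<Boolean, List<String>> parts = new HashMap<>();
        parts.put(true, new ArrayList<>());
        parts.put(false, new ArrayList<>());
        for (String w : weatherValues) parts.get(w.equals(value)).add(w);
        return parts;
    }

    public static void main(String[] args) {
        List<String> weather = List.of("Rainy", "Sunny", "Sunny", "Rainy", "Snowy",
                                       "Rainy", "Snowy", "Sunny", "Snowy", "Snowy");
        // First binary test: Rainy vs. not Rainy
        Map<Boolean, List<String>> first = splitOnValue(weather, "Rainy");
        // Second binary test, applied only to the "not Rainy" branch: Snowy vs. Sunny
        Map<Boolean, List<String>> second = splitOnValue(first.get(false), "Snowy");
        System.out.println("Rainy branch: " + first.get(true));
        System.out.println("Snowy branch: " + second.get(true));
        System.out.println("Sunny branch: " + second.get(false));
    }
}
```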

Real-valued features

Use any comparison test (>, <, ≤, ≥) to split the data into two parts, e.g. Fare < $20 (yes/no) or Fare > 50.

Or select a range filter, i.e. min < value < max.
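A small sketch of choosing such a comparison test: try candidate thresholds (midpoints between sorted values) and keep the one with the fewest training mistakes when each side predicts its majority label. The fare numbers below are made up for illustration.

```java
// Pick a threshold t for a real-valued feature using the test "value < t".
import java.util.*;

public class ThresholdSplit {
    static double bestThreshold(double[] values, boolean[] labels) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double bestT = sorted[0], bestErr = Double.MAX_VALUE;
        for (int i = 0; i + 1 < sorted.length; i++) {
            double t = (sorted[i] + sorted[i + 1]) / 2.0;   // candidate threshold
            int[] left = new int[2], right = new int[2];    // {negatives, positives}
            for (int j = 0; j < values.length; j++) {
                int[] side = values[j] < t ? left : right;
                side[labels[j] ? 1 : 0]++;
            }
            // Each side predicts its majority label; minority counts are errors.
            double err = Math.min(left[0], left[1]) + Math.min(right[0], right[1]);
            if (err < bestErr) { bestErr = err; bestT = t; }
        }
        return bestT;
    }

    public static void main(String[] args) {
        double[] fares  = {5, 12, 18, 25, 40, 80};
        boolean[] rides = {false, false, false, true, true, true};
        System.out.println("Best threshold: Fare < " + bestThreshold(fares, rides)); // 21.5
    }
}
```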

Other splitting criteria

Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively

We used training error for the score. Any other ideas?

Other splitting criteria

- Entropy: how much uncertainty there is in the distribution over labels after the split
- Gini: sum of the squares of the label proportions after the split
- Training error = misclassification error
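A small sketch of the three scores, computed from the label counts in a single partition. Entropy is in bits; gini here follows the slide's definition (sum of squared label proportions, so higher means purer), which is 1 minus the more commonly quoted Gini impurity.

```java
// Entropy, Gini (as defined on the slide), and misclassification error of a partition.
public class SplitScores {
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));    // -sum p log2 p
        }
        return h;
    }

    static double gini(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double sum = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sum += p * p;                             // sum of squared proportions
        }
        return sum;
    }

    static double misclassification(int[] counts) {
        int total = 0, max = 0;
        for (int c : counts) { total += c; max = Math.max(max, c); }
        return 1.0 - (double) max / total;            // error if we predict the majority
    }

    public static void main(String[] args) {
        int[] mountain = {4, 0};   // YES: 4, NO: 0 from the Unicycle split
        int[] normal   = {2, 4};   // YES: 2, NO: 4
        System.out.printf("Mountain: entropy=%.3f gini=%.3f error=%.3f%n",
                entropy(mountain), gini(mountain), misclassification(mountain));
        System.out.printf("Normal:   entropy=%.3f gini=%.3f error=%.3f%n",
                entropy(normal), gini(normal), misclassification(normal));
    }
}
```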

Decision trees Good? Bad?

Decision trees: the good

- Very intuitive and easy to interpret
- Fast to run and fairly easy to implement (Assignment 2)
- Historically, perform fairly well (especially with a few more tricks we'll see later on)
- No prior assumptions about the data

Decision trees: the bad

Be careful with features with lots of values:

ID  Terrain  Unicycle-type  Weather  Go-For-Ride?
1   Trail    Normal         Rainy    NO
2   Road     Normal         Sunny    YES
3   Trail    Mountain       Sunny    YES
4   Road     Mountain       Rainy    YES
5   Trail    Normal         Snowy    NO
6   Road     Normal         Rainy    YES
7   Road     Mountain       Snowy    YES
8   Trail    Normal         Sunny    NO
9   Road     Normal         Snowy    NO
10  Trail    Mountain       Snowy    YES

Which feature would be at the top here?

Decision trees: the bad

- Can be problematic (slow, bad performance) with large numbers of features
- Can't learn some very simple data sets (e.g. some types of linearly separable data)
- Pruning/tuning can be tricky to get right

Final DT algorithm

Base cases:
1. If all data belong to the same class, pick that label
2. If all the data have the same feature values, pick the majority label
3. If we're out of features to examine, pick the majority label
4. If we don't have any data left, pick the majority label of the parent
5. If some other stopping criterion is met (to avoid overfitting), pick the majority label

Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively
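A compact sketch of these base cases as a single check that returns the leaf label to use, or null if we should keep splitting. It is illustrative, not the assignment's starter code; maxDepth stands in for the "some other stopping criterion" of case 5.

```java
// The five base cases of the final decision-tree algorithm, as one check.
import java.util.*;

public class BaseCases {
    record Example(Map<String, String> features, String label) {}

    static String majorityLabel(List<Example> data) {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : data) counts.merge(e.label(), 1, Integer::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Returns the label for a leaf, or null if none of the base cases apply.
    static String checkBaseCases(List<Example> data, Set<String> remainingFeatures,
                                 String parentMajority, int depth, int maxDepth) {
        if (data.isEmpty()) return parentMajority;                    // case 4
        Set<String> labels = new HashSet<>();
        for (Example e : data) labels.add(e.label());
        if (labels.size() == 1) return labels.iterator().next();      // case 1
        Set<Map<String, String>> featureSets = new HashSet<>();
        for (Example e : data) featureSets.add(e.features());
        if (featureSets.size() == 1) return majorityLabel(data);      // case 2
        if (remainingFeatures.isEmpty()) return majorityLabel(data);  // case 3
        if (depth >= maxDepth) return majorityLabel(data);            // case 5 (one possible criterion)
        return null;                                                  // keep splitting
    }
}
```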