
1 Machine Learning in Practice Lecture 2 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

2 Plan for the Day Any questions? Announcements:  First homework assigned Machine Learning process overview Learn how to use Weka Introduce assignment Introduction to Cross-Validation

3 Overview of Machine Learning Process Skills

4 Naïve Approach: When all you have is a hammer… Target Representation Data

5 Naïve Approach: When all you have is a hammer… Target Representation Problem: there isn’t one universally best approach!!!!! Data

6 Slightly less naïve approach: Aimless wandering… Target Representation Data

7 Slightly less naïve approach: Aimless wandering… Target Representation Problem 1: It takes too long!!! Data

8 Slightly less naïve approach: Aimless wandering… Target Representation Problem 2: You might not realize all of the options that are available to you! Data

9 Expert Approach: Hypothesis driven Target Representation Data

10 Expert Approach: Hypothesis driven Target Representation You might end up with the same solution in the end, but you’ll get there faster. Data

11 Expert Approach: Hypothesis driven Target Representation Today we’ll start to learn how! Data

12 Warm Up Exercise

13 Every combination of feature values is represented. Warm Up Exercise

14 Every combination of feature values is represented. What will happen if you try to predict HairColor from the other features? Warm Up Exercise

15 If you don’t have good features, even the most powerful algorithm won’t be able to learn an accurate prediction rule.

16 Every combination of feature values is represented. What will happen if you try to predict HairColor from the other features? Warm Up Exercise If you don’t have good features, even the most powerful algorithm won’t be able to learn an accurate prediction rule. But that doesn’t mean this data set is a hopeless case! For example, maybe the people who like red and have brown hair like a different shade of red than the ones who have blond hair. So ask yourself: what information might be hidden or implicit that might allow me to learn a rule?
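The point of the warm-up can be sketched in code. Below is a hypothetical toy dataset (not the one on the slide) in which every combination of feature values occurs once with every hair color; a per-feature rule like 1R then can do no better than the majority baseline, because each feature value is equally compatible with every class.

```python
from itertools import product
from collections import Counter

# Hypothetical toy data: every combination of EyeColor x Height appears
# exactly once with each HairColor, so no feature is informative about it.
eye_colors = ["blue", "brown"]
heights = ["short", "tall"]
hair_colors = ["blond", "brown", "red"]

rows = [{"EyeColor": e, "Height": h, "HairColor": hair}
        for e, h, hair in product(eye_colors, heights, hair_colors)]

def one_r_accuracy(rows, feature, target):
    """1R: for each value of `feature`, predict the majority `target`."""
    by_value = {}
    for r in rows:
        by_value.setdefault(r[feature], Counter())[r[target]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

# Both features score 1/3 -- exactly the majority (0R) baseline,
# since each HairColor covers a third of the data.
for f in ["EyeColor", "Height"]:
    print(f, one_r_accuracy(rows, f, "HairColor"))
```

This is why the slide asks about hidden or implicit information: with these features as given, no algorithm can learn an accurate rule.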

17 Getting a bit more sophisticated…

18 Example Data Set

19 We’re going to consider a new algorithm

20 Example Data Set We’re going to consider a new algorithm We’re also going to consider data representation issues

21 More Complex Algorithm… Two simple algorithms last time  0R – Predict the majority class  1R – Use the most predictive single feature Today – Intro to Decision Trees  Today we will stay at a high level  We’ll investigate more details of the algorithm next time * Only makes 2 mistakes!

23 More Complex Algorithm… Two simple algorithms last time  0R – Predict the majority class  1R – Use the most predictive single feature Today – Intro to Decision Trees  Today we will stay at a high level  We’ll investigate more details of the algorithm next time * Only makes 2 mistakes! What will it do with this example?
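The two simple algorithms from last time can be sketched in a few lines. This illustration runs them on the standard Weka weather.nominal ("play tennis") data, which also appears later in the lecture; it is a sketch of the general idea, not the slide's exact example.

```python
from collections import Counter

# The classic weather.nominal data shipped with Weka:
# (outlook, temperature, humidity, windy, play)
data = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
features = ["outlook", "temperature", "humidity", "windy"]
labels = [row[-1] for row in data]

# 0R: always predict the majority class ("yes", 9 of 14).
zero_r = Counter(labels).most_common(1)[0][0]

# 1R: pick the single feature whose per-value majority rule errs least.
def one_r_errors(i):
    rules, errors = {}, 0
    for value in {row[i] for row in data}:
        counts = Counter(row[-1] for row in data if row[i] == value)
        rules[value] = counts.most_common(1)[0][0]
        errors += sum(counts.values()) - counts.most_common(1)[0][1]
    return rules, errors

best = min(range(len(features)), key=lambda i: one_r_errors(i)[1])
```

On this data 1R selects outlook (4 errors, tied with humidity), while a decision tree can go further by splitting again inside each outlook branch.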

27 Why is it better? Not because it is more complex  Sometimes more complexity makes performance worse What is different in what the three rule representations assume about your data?  0R  1R  Trees The best algorithm for your data will give you exactly the power you need

28 Why is it better? Not because it is more complex  Sometimes more complexity makes performance worse What is different in what the three rule representations assume about your data?  0R  1R  Trees The best algorithm for your data will give you exactly the power you need Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn?

30 Why is it better? Not because it is more complex  Sometimes more complexity makes performance worse What is different in what the three rule representations assume about your data?  0R  1R  Trees The best algorithm for your data will give you exactly the power you need Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn? Now let’s say you don’t know the shape – now what would you learn?

32 Why is it better? Not because it is more complex  Sometimes more complexity makes performance worse What is different in what the three rule representations assume about your data?  0R  1R  Trees The best algorithm for your data will give you exactly the power you need Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn? Now let’s say you don’t know the shape – now what would you learn? If you know the shape, you have fewer degrees of freedom – less room to make a mistake.

37 Back to the Opinion Poll Data Set From http://www.swivel.com/ Example of the kind of data set you could use for your course project  Better to find a larger data set

38–44 (The same slide, with each column annotated in turn:) Who ran the opinion poll; when the poll was conducted; who the Democratic candidate would be; who the Republican candidate would be; who is running against whom; which party will win – this last one is what we want to predict

45 Do you see any redundant information?

46 Do you see any missing or hidden information?

47 How could you expand on what’s here? Add features that describe the source

48 How could you expand on what’s here? Add features that describe things that were going on during the time when the poll was taken

49 How could you expand on what’s here? Add features that describe personal characteristics of the candidates

50 What do you think would be the best rule?

51 What would Weka do with this data?

52 Using Weka Start Weka Open up the Explorer interface

53 Using Weka Click on Open File  Open OpinionPoll.csv from the Lectures folder You can save it as a .arff file

54 Using Weka Click on Open File  Open OpinionPoll.csv from the Lectures folder You can save it as a .arff file Summary stats for selected attributes are displayed

55 Using Weka Observe interaction between attributes by selecting on interface Select one attribute here Select another attribute here

56 Using Weka Observe interaction between attributes by selecting on interface Select one attribute here Select another attribute here Based on what you see, do you think the sources of the opinion polls were biased?

57 Using Weka Go to Classify Panel Select a classifier

58 Using Weka Select a classifier Select the predicted value

59 Using Weka Select a classifier Select the predicted value Start the evaluation

60 Using Weka Select a classifier Select the predicted value Start the evaluation Observe the results

61 Looking at the Results Percent correct Percent correct, controlling for correct by chance Performance on individual categories Confusion matrix * Right click in Result list and select Save Result Buffer to save performance stats.

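The statistics Weka reports can be computed directly from the confusion matrix. The sketch below uses a hypothetical two-class matrix (not the opinion-poll run); "percent correct, controlling for correct by chance" is the kappa statistic, where chance agreement is estimated from the row and column marginals.

```python
# Hypothetical confusion matrix: rows = actual class, columns = predicted.
confusion = [[40, 10],
             [5, 45]]

total = sum(sum(row) for row in confusion)

# Percent correct: the diagonal cells are the correctly classified examples.
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total

# Expected chance agreement: for each class, the product of its
# row marginal (actual frequency) and column marginal (predicted frequency).
expected = sum(
    sum(confusion[i]) * sum(r[i] for r in confusion)
    for i in range(len(confusion))
) / total**2

# Kappa: how far above chance the observed agreement is.
kappa = (accuracy - expected) / (1 - expected)
print(accuracy, kappa)  # 0.85 and 0.7 for this matrix
```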

63 Notice the shape of the tree (although the text is too small to read!)

64 It’s making its decision based only on who the Republican candidate is.

65 Why did it do that?

66 Where will it make mistakes?

67 Notice the more complex rule we get if we force binary splits… and note that this more complex rule performs worse!

68 More representation issues… “Gyre” by Eric Rosé

69 Low resolution image gives some information

70 Higher resolution image gives more information

71 But not if the accuracy is bad

72 Question: When might that happen?

73 Low resolution gives more information if the accuracy is higher

74 Assignment 1

75 Make sure Weka is set up properly on your machine Know the basics of using Weka Information about you…

76 Information about You Learning goals Priority on learning activities Project goals Programming competence

77 Cross-Validation


80 If Outlook = sunny, no else if Outlook = overcast, yes else if Outlook = rainy and Windy = TRUE, no else yes Performance on training data?

81 If Outlook = sunny, no else if Outlook = overcast, yes else if Outlook = rainy and Windy = TRUE, no else yes Performance on training data? Performance on testing data?

82 If Outlook = sunny, no else if Outlook = overcast, yes else if Outlook = rainy and Windy = TRUE, no else yes IMPORTANT! If you evaluate the performance of your rule on the same data you trained on, you won’t get an accurate estimate of how well it will do on new data.
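The danger of evaluating on the training data can be illustrated with a deliberately overfit "classifier" that simply memorizes its training set. This is a synthetic sketch (random labels, not the weather data): training accuracy is perfect by construction, while held-out accuracy stays near chance.

```python
import random

random.seed(0)
# Hypothetical data: the label is pure noise, so no rule can truly beat 50%.
data = [(i, random.choice(["yes", "no"])) for i in range(200)]
train, test = data[:100], data[100:]

# A "classifier" that memorizes the training set and falls back to "yes".
memory = dict(train)
def predict(x):
    return memory.get(x, "yes")

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc = sum(predict(x) == y for x, y in test) / len(test)

# train_acc is 1.0 by construction; test_acc hovers around chance.
# Training-set performance is an inflated estimate of real performance.
print(train_acc, test_acc)
```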

83 What is cross-validation?

84 Notice that cross-validation is for testing only! Not for building the rule!

85 But then….. If we are satisfied with the performance estimate we get Then we build the model with the WHOLE SET Now let’s see how it works…

86 But then….. If we are satisfied with the performance estimate we get Then we build the model with the WHOLE SET Now let’s see how it works… If you are not satisfied with the performance you get, then you should try to determine what went wrong, and then evaluate a different model that compensates.

87 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D First train on 2, 3, 4, 5, 6, 7 and apply the trained model to 1 The result is Accuracy1 1 2 3 4 5 6 7 TEST TRAIN Fold: 1

88 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Next train on 1, 3, 4, 5, 6, 7 and apply the trained model to 2 The result is Accuracy2 1 2 3 4 5 6 7 TRAIN TEST Fold: 2

89 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Next train on 1, 2, 4, 5, 6, 7 and apply the trained model to 3 The result is Accuracy3 1 2 3 4 5 6 7 TRAIN TEST TRAIN Fold: 3

90 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Next train on 1, 2, 3, 5, 6, 7 and apply the trained model to 4 The result is Accuracy4 1 2 3 4 5 6 7 TRAIN TEST TRAIN Fold: 4

91 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Next train on 1, 2, 3, 4, 6, 7 and apply the trained model to 5 The result is Accuracy5 1 2 3 4 5 6 7 TRAIN TEST TRAIN Fold: 5

92 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Next train on 1, 2, 3, 4, 5, 7 and apply the trained model to 6 The result is Accuracy6 1 2 3 4 5 6 7 TRAIN TEST TRAIN Fold: 6

93 Simple Cross Validation Let’s say your data has attributes A, B, and C You want to train a rule to predict D Finally train on 1, 2, 3, 4, 5, 6 and apply the trained model to 7 The result is Accuracy7 Finally: Average Accuracy1 through Accuracy7 1 2 3 4 5 6 7 TRAIN TEST TRAIN Fold: 7
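The fold-by-fold procedure above can be sketched as a short function. The learner and helpers here are illustrative (a 0R-style majority learner on a tiny hypothetical dataset), and with seven examples and seven folds this reduces to leave-one-out.

```python
from collections import Counter

def cross_validate(examples, k, train_fn, test_fn):
    """Average accuracy across k folds. This is for estimating performance
    only -- the final model is built afterwards on the whole set."""
    folds = [examples[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        model = train_fn(train)
        accuracies.append(test_fn(model, test))
    return sum(accuracies) / k

# Toy use with a 0R-style learner on (feature, label) pairs:
examples = [("a", "yes"), ("b", "yes"), ("c", "no"), ("d", "yes"),
            ("e", "no"), ("f", "yes"), ("g", "yes")]
majority = lambda train: Counter(y for _, y in train).most_common(1)[0][0]
accuracy = lambda model, test: sum(model == y for _, y in test) / len(test)

estimate = cross_validate(examples, 7, majority, accuracy)
print(estimate)  # 5/7: the majority rule is right on "yes" folds only
```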

94 Remember! If we are satisfied with the performance estimate we get using cross-validation Then we build the model with the WHOLE SET We don’t use cross-validation to build the model

95 Why do we do cross validation? Use cross-validation when you do not have enough data to have completely independent train and test sets We are trying to estimate what performance you would get if you trained over your whole set and applied that model to an independent set of the same size We compute that estimate by averaging over folds

96 Do we have to do all of the folds? Yes! The test set on each fold is too small to give you an accurate estimate of performance alone Variation across folds Evaluation over part of the data is likely to be misleading

97 Why do we do cross validation? Makes the most of your data – large portion used for training Avoids testing on training data  Testing on training data will overestimate your performance!!! But if you do multiple iterations of cross-validation, in some ways you are using insights from your testing data in building your model

98 Questions about cross-validation from in-person students… How do you decide how many folds? How is data divided between folds? Don’t you need to have a hold-out set to be totally sure you have a good estimate of performance?

99 Other questions from in-person students… Do our class projects have to be classification problems per se?  Clustering of pen stroke data Will we learn to work with time series data in this course?

100 Questions?

