1
Machine Learning in Practice
Lecture 2
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
2
Plan for the Day
Any questions?
Announcements: First homework assigned
Machine Learning process overview
Learn how to use Weka
Introduce assignment
Introduction to Cross-Validation
3
Overview of Machine Learning Process Skills
4
Naïve Approach: When all you have is a hammer… (Diagram: Data → Target Representation)
Problem: there isn’t one universally best approach!!!!!
6
Slightly less naïve approach: Aimless wandering… (Diagram: Data → Target Representation)
Problem 1: It takes too long!!!
Problem 2: You might not realize all of the options that are available to you!
9
Expert Approach: Hypothesis driven (Diagram: Data → Target Representation)
You might end up with the same solution in the end, but you’ll get there faster.
Today we’ll start to learn how!
12
Warm Up Exercise
13
Every combination of feature values is represented.
What will happen if you try to predict HairColor from the other features?
If you don’t have good features, even the most powerful algorithm won’t be able to learn an accurate prediction rule.
But that doesn’t mean this data set is a hopeless case! For example, maybe the people who like red and have brown hair like a different shade of red than the ones who have blond hair. So ask yourself: what information might be hidden or implicit that might allow me to learn a rule?
17
Getting a bit more sophisticated…
18
Example Data Set
We’re going to consider a new algorithm
We’re also going to consider data representation issues
21
More Complex Algorithm…
Two simple algorithms last time:
0R – Predict the majority class
1R – Use the most predictive single feature
Today – Intro to Decision Trees
Today we will stay at a high level; we’ll investigate more details of the algorithm next time.
* Only makes 2 mistakes!
What will it do with this example?
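As a concrete counterpart to this comparison, here is a minimal sketch using Weka's Java API that trains all three learners (ZeroR, OneR, and J48, Weka's decision tree learner) on one data set and counts their mistakes on the training examples; the file name example.arff is only a placeholder.

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareBaselines {
    public static void main(String[] args) throws Exception {
        // Load a data set (example.arff is a placeholder file name)
        Instances data = DataSource.read("example.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 0R, 1R, and a decision tree learner
        Classifier[] learners = { new ZeroR(), new OneR(), new J48() };
        for (Classifier learner : learners) {
            learner.buildClassifier(data);

            // Count mistakes on the examples the learner has already seen
            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(learner, data);
            System.out.println(learner.getClass().getSimpleName()
                    + ": " + (int) eval.incorrect() + " mistakes on the training data");
        }
    }
}
```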
27
Why is it better?
Not because it is more complex – sometimes more complexity makes performance worse.
What is different in what the three rule representations (0R, 1R, trees) assume about your data?
The best algorithm for your data will give you exactly the power you need.
Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn?
Now let’s say you don’t know the shape. Now what would you learn?
If you know the shape, you have fewer degrees of freedom – less room to make a mistake.
37
Back to the Opinion Poll Data Set
From http://www.swivel.com/
Example of the kind of data set you could use for your course project (better to find a larger data set).
The columns include: who ran the opinion poll, when the poll was conducted, who the Democratic candidate would be, who the Republican candidate would be (i.e., who is running against whom), and which party will win – this last one is what we want to predict.
45
Do you see any redundant information?
46
Do you see any missing or hidden information?
47
How could you expand on what’s here?
Add features that describe the source
Add features that describe things that were going on during the time when the poll was taken
Add features that describe personal characteristics of the candidates
50
What do you think would be the best rule?
51
What would Weka do with this data?
52
Using Weka Start Weka Open up the Explorer interface
53
Using Weka
Click on Open File
Open OpinionPoll.csv from the Lectures folder
You can save it as a .arff file
Summary stats for selected attributes are displayed
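The same CSV-to-ARFF conversion can also be scripted with Weka's converter classes rather than done through the Explorer; a minimal sketch, assuming OpinionPoll.csv is in the working directory:

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Read the CSV file (the same data the Explorer's "Open file..." button loads)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("OpinionPoll.csv"));
        Instances data = loader.getDataSet();

        // Write it back out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("OpinionPoll.arff"));
        saver.writeBatch();
    }
}
```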
55
Using Weka
Observe interaction between attributes by selecting on the interface
Select one attribute here, and another attribute here
Based on what you see, do you think the sources of the opinion polls were biased?
57
Using Weka
Go to the Classify panel
Select a classifier
Select the predicted value
Start the evaluation
Observe the results
61
Looking at the Results
Percent correct
Percent correct, controlling for what would be correct by chance (the kappa statistic)
Performance on individual categories
Confusion matrix
* Right click in the Result list and select Save Result Buffer to save performance stats.
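The same statistics can be pulled out programmatically with Weka's Evaluation class. A rough sketch, assuming the data has already been saved as OpinionPoll.arff with the class attribute in the last column; like the Explorer's default, it uses 10-fold cross-validation, which the lecture returns to below:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectResults {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("OpinionPoll.arff");
        data.setClassIndex(data.numAttributes() - 1);  // assume the class is the last column

        // Evaluate a decision tree with 10-fold cross-validation (the Explorer's default)
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        System.out.println("Percent correct: " + eval.pctCorrect());
        System.out.println("Kappa (controls for correct by chance): " + eval.kappa());
        System.out.println(eval.toClassDetailsString()); // performance on individual categories
        System.out.println(eval.toMatrixString());       // confusion matrix
    }
}
```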
63
Notice the shape of the tree (although the text is too small to read!)
64
It’s making its decision based only on who the Republican candidate is.
65
Why did it do that?
66
Where will it make mistakes?
67
Notice the more complex rule if we force binary splits … Note that the more complex rule performs worse!!!
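Forcing binary splits corresponds to J48's binarySplits option (the -B flag). A sketch of how the two settings might be compared, with the file name and fold count as assumptions:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BinarySplitComparison {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("OpinionPoll.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Default J48 uses multi-way splits on nominal attributes
        J48 multiway = new J48();
        // The same learner, forced to use binary splits
        J48 binary = new J48();
        binary.setBinarySplits(true);

        for (J48 tree : new J48[] { multiway, binary }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println((tree.getBinarySplits() ? "binary" : "multi-way")
                    + " splits: " + eval.pctCorrect() + "% correct");
        }
    }
}
```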
68
More representation issues… “Gyre” by Eric Rosé
69
Low resolution image gives some information
70
Higher resolution image gives more information
71
But not if the accuracy is bad
72
Question: When might that happen?
73
Low resolution gives more information if the accuracy is higher
74
Assignment 1
75
Make sure Weka is set up properly on your machine
Know the basics of using Weka
Information about you…
76
Information about You
Learning goals
Priority on learning activities
Project goals
Programming competence
77
Cross-Validation
80
If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes

Performance on training data? Performance on testing data?
IMPORTANT! If you evaluate the performance of your rule on the same data you trained on, you won’t get an accurate estimate of how well it will do on new data.
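Purely for illustration, the hand-written rule above can be coded directly; predictPlay is a made-up helper, and scoring it on the very examples it was written from is exactly the kind of training-data evaluation this slide warns against:

```java
public class OutlookRule {
    // The hand-written rule from the slide, applied to a single example
    static String predictPlay(String outlook, boolean windy) {
        if (outlook.equals("sunny")) {
            return "no";
        } else if (outlook.equals("overcast")) {
            return "yes";
        } else if (outlook.equals("rainy") && windy) {
            return "no";
        } else {
            return "yes";
        }
    }

    public static void main(String[] args) {
        // Scoring the rule on the examples it was derived from gives training accuracy;
        // only examples the rule has never seen give an honest estimate of future performance.
        System.out.println(predictPlay("sunny", false));   // no
        System.out.println(predictPlay("overcast", true)); // yes
        System.out.println(predictPlay("rainy", true));    // no
        System.out.println(predictPlay("rainy", false));   // yes
    }
}
```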
83
What is cross validation?
84
Notice that Cross validation is for testing only! Not for building the rule!
85
But then…
If we are satisfied with the performance estimate we get, then we build the model with the WHOLE SET.
Now let’s see how it works…
If you are not satisfied with the performance you get, then you should try to determine what went wrong, and then evaluate a different model that compensates.
87
Simple Cross Validation
Let’s say your data has attributes A, B, and C, and you want to train a rule to predict D. Split the data into 7 folds, numbered 1 through 7.
Fold 1: train on 2, 3, 4, 5, 6, 7 and apply the trained model to 1. The result is Accuracy1.
Fold 2: train on 1, 3, 4, 5, 6, 7 and apply the trained model to 2. The result is Accuracy2.
Fold 3: train on 1, 2, 4, 5, 6, 7 and apply the trained model to 3. The result is Accuracy3.
Fold 4: train on 1, 2, 3, 5, 6, 7 and apply the trained model to 4. The result is Accuracy4.
Fold 5: train on 1, 2, 3, 4, 6, 7 and apply the trained model to 5. The result is Accuracy5.
Fold 6: train on 1, 2, 3, 4, 5, 7 and apply the trained model to 6. The result is Accuracy6.
Fold 7: train on 1, 2, 3, 4, 5, 6 and apply the trained model to 7. The result is Accuracy7.
Finally: average Accuracy1 through Accuracy7.
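This fold-by-fold procedure maps onto Weka's trainCV and testCV methods. A minimal sketch, assuming seven folds as in the example, a J48 learner, and a placeholder file name:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ManualCrossValidation {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("example.arff");
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));  // shuffle before splitting into folds

        int numFolds = 7;
        double sum = 0;
        for (int fold = 0; fold < numFolds; fold++) {
            Instances train = data.trainCV(numFolds, fold); // everything except this fold
            Instances test = data.testCV(numFolds, fold);   // only this fold

            J48 tree = new J48();
            tree.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println("Fold " + (fold + 1) + ": Accuracy" + (fold + 1)
                    + " = " + eval.pctCorrect());
            sum += eval.pctCorrect();
        }
        System.out.println("Average over " + numFolds + " folds: " + sum / numFolds);
    }
}
```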
94
Remember! If we are satisfied with the performance estimate we get using cross-validation, then we build the model with the WHOLE SET. We don’t use cross-validation to build the model.
95
Why do we do cross validation?
Use cross-validation when you do not have enough data to have completely independent train and test sets.
We are trying to estimate what performance you would get if you trained over your whole set and applied that model to an independent set of the same size.
We compute that estimate by averaging over folds.
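In code, the estimate-then-build workflow looks roughly like this: crossValidateModel only produces the performance estimate, and the final model comes from a separate buildClassifier call on the whole set (file name and classifier choice are assumptions):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EstimateThenBuild {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("example.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Step 1: cross-validation is used ONLY to estimate performance
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println("Estimated accuracy: " + eval.pctCorrect() + "%");

        // Step 2: if the estimate is acceptable, build the final model on the WHOLE set
        tree.buildClassifier(data);
        System.out.println(tree);  // prints the learned tree
    }
}
```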
96
Do we have to do all of the folds? Yes!
The test set on each fold is too small to give you an accurate estimate of performance alone.
There is variation across folds.
Evaluation over only part of the data is likely to be misleading.
97
Why do we do cross validation?
It makes the most of your data – a large portion is used for training.
It avoids testing on training data. Testing on training data will overestimate your performance!!!
But if you do multiple iterations of cross-validation, in some ways you are using insights from your testing data in building your model.
98
Questions about cross-validation from in-person students… How do you decide how many folds? How is data divided between folds? Don’t you need to have a hold-out set to be totally sure you have a good estimate of performance?
99
Other questions from in-person students… Do our class projects have to be classification problems per se? Clustering of pen stroke data Will we learn to work with time series data in this course?
100
Questions?