Published by Jeffery Hamilton. Modified over 8 years ago.
1
Machine Learning in Practice, Lecture 9
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
2
Plan for the Day
Announcements
Questions?
Assignment 4
Quiz
Today’s Data Set: Speaker Identification
Weka helpful hints
Visualizing Errors for Regression Problems
Alternative forms of cross-validation
Creating Train/Test Pairs
Intro to Evaluation
3
Assignment 3 Notes
Everyone did well.
Please compare your solution to the answer key.
Note the distinction between P(sunny|yes) and P(yes|sunny).
One student didn’t include prior probabilities in the likelihood computation.
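The distinction between P(sunny|yes) and P(yes|sunny) can be sketched with a few lines of arithmetic. The counts below are an assumption: they are taken from the classic 14-instance weather data set, which the slide does not reproduce, so treat them as illustrative rather than as the assignment's answer key.

```python
# Assumed counts from the classic weather data set (not stated on the slide).
n_yes, n_no = 9, 5                  # class counts for play = yes / no
sunny_and_yes, sunny_and_no = 2, 3  # outlook = sunny within each class

# Likelihoods: how often "sunny" occurs within each class.
p_sunny_given_yes = sunny_and_yes / n_yes
p_sunny_given_no = sunny_and_no / n_no

# Priors: how common each class is overall.
p_yes = n_yes / (n_yes + n_no)
p_no = 1 - p_yes

# Bayes' rule: the posterior needs the priors as well as the likelihoods.
p_yes_given_sunny = (p_sunny_given_yes * p_yes) / (
    p_sunny_given_yes * p_yes + p_sunny_given_no * p_no
)

print(round(p_sunny_given_yes, 3), round(p_yes_given_sunny, 3))
```

Leaving out `p_yes` and `p_no` in the last step is the mistake the slide mentions: the two conditional probabilities are different quantities and only the priors connect them.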
4
Speaker Identification
5
Today’s Data Set – Speaker Identification
6
Predictions? What previous data set does this remind you of?
7
Predictions? What previous data set does this remind you of?
Results: J48 .53 Kappa; SMO .37 Kappa; Naïve Bayes .16 Kappa
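For readers who want to see what a Kappa value summarizes, here is a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name and the example matrix are hypothetical; the numbers are made up and are not the speaker identification results.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    # Note: undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Made-up two-class example: 70% accuracy, 50% agreement expected by chance.
kappa = cohen_kappa([[40, 10], [20, 30]])
print(round(kappa, 2))
```

Kappa rescales accuracy so that 0 means chance-level agreement and 1 means perfect agreement, which is why it is more informative than raw accuracy when class distributions are skewed.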
8
Notice Ranges and Contingencies
9
Most Predictive Feature
10
Least Predictive Feature
11
What would 1R do?
12
.16 Kappa
13
Weka Helpful Hints
14
Evaluating Numeric Prediction: CPU data
15
Visualizing Classifier Errors for Numeric Prediction
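The quantities behind an error visualization for numeric prediction are the per-instance differences between predicted and actual values. The sketch below computes them along with two common summaries; the values are invented for illustration and are not the CPU data.

```python
import math

actual = [10.0, 20.0, 30.0, 40.0]     # made-up target values
predicted = [12.0, 18.0, 33.0, 37.0]  # made-up model outputs

# Signed error per instance: positive means the model over-predicted.
errors = [p - a for p, a in zip(predicted, actual)]

mae = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error

# Index of the instance with the largest absolute error, a starting point for error analysis.
worst = max(range(len(errors)), key=lambda i: abs(errors[i]))

print(errors, mae, round(rmse, 3), worst)
```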
16
Creating Train/Test Pairs First click here
17
Creating Train/Test Pairs If you pick unsupervised, you’ll get non-stratified folds, otherwise you’ll get stratified folds.
18
Stratified versus Non-Stratified
Weka’s standard cross-validation is stratified.
Data is randomized before dividing it into folds.
Stratification preserves the distribution of class values across folds.
This reduces variance in performance.
Unstratified cross-validation means there is no randomization.
Order is preserved.
This is an advantage for matching predictions with instances in Weka.
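The difference between the two kinds of folds can be sketched as follows. This is an illustration of the idea only, not Weka's implementation; the function names and the round-robin assignment scheme are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign each instance a fold index so class proportions are similar in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # randomize within each class
        for pos, idx in enumerate(members):
            fold_of[idx] = (offset + pos) % k  # deal instances out round-robin
        offset += len(members)
    return fold_of


def unstratified_folds(n, k=10):
    """Order-preserving folds: contiguous blocks, no randomization."""
    return [i * k // n for i in range(n)]


labels = ["a"] * 20 + ["b"] * 10
folds = stratified_folds(labels, k=5)
per_fold = [sorted(labels[i] for i in range(len(labels)) if folds[i] == f) for f in range(5)]
print(per_fold[0])
print(unstratified_folds(10, k=5))
```

With the unstratified version, instance order within the data file maps directly onto folds, which is what makes it easy to match predictions back to instances.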
19
Stratified versus Non-Stratified
Leave-one-out cross-validation: train on all but one instance, and iterate over all instances.
It is an extreme version of unstratified cross-validation.
If the test set has only one instance, the distribution of class values cannot be preserved.
It maximizes the amount of data used for training on each fold.
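Leave-one-out cross-validation amounts to a very small loop. A minimal sketch, with a hypothetical helper name and toy data:

```python
def leave_one_out(instances):
    """Yield (train, test) pairs where each test set is a single instance."""
    for i in range(len(instances)):
        train = instances[:i] + instances[i + 1:]  # everything except instance i
        test = [instances[i]]
        yield train, test


splits = list(leave_one_out(["x1", "x2", "x3"]))
for train, test in splits:
    print(train, test)
```

The number of folds equals the number of instances, so the cost of the evaluation grows with the size of the data set.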
20
Stratified versus Non-Stratified
Leave-one-subpopulation-out applies if you have several data points from the same subpopulation, e.g., speech data from the same speaker.
Otherwise you may have data from the same subpopulation in both train and test, which over-estimates the overlap between train and test.
When is this not a problem?
You can make sure that won’t happen, but you have to do it by hand.
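Doing the split "by hand" can be sketched as grouping instances by subpopulation and holding out one group at a time. The helper name, the speaker identifiers, and the utterance labels below are all hypothetical.

```python
def leave_one_group_out(instances, groups):
    """Yield (held_out_group, train, test) so that no group appears in both train and test."""
    for held_out in sorted(set(groups)):
        train = [x for x, g in zip(instances, groups) if g != held_out]
        test = [x for x, g in zip(instances, groups) if g == held_out]
        yield held_out, train, test


# Toy data: five utterances from three hypothetical speakers.
utterances = ["u1", "u2", "u3", "u4", "u5"]
speakers = ["spk_a", "spk_a", "spk_b", "spk_c", "spk_c"]

group_splits = list(leave_one_group_out(utterances, speakers))
for held_out, train, test in group_splits:
    print(held_out, train, test)
```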
21
Creating Train/Test Pairs If you pick unsupervised, you’ll get non-stratified folds, otherwise you’ll get stratified folds.
22
Creating Train/Test Pairs Now click here
23
Creating Train/Test Pairs
24
You’re going to run this filter 20 times altogether, twice for every fold.
25
Creating Train/Test Pairs True for Train, false for Test
26
Creating Train/Test Pairs If you’re doing Stratified, make sure you have the class attribute selected here.
27
Creating Train/Test Pairs 1. Click Apply
28
Creating Train/Test Pairs 2. Save the file
29
Creating Train/Test Pairs 3. Undo before you create the next file
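The apply, save, undo cycle above can be pictured as the following loop. It is a sketch of the bookkeeping only, using order-preserving (non-stratified) folds and a hypothetical function name; it does not call Weka.

```python
def train_test_pairs(instances, k=10):
    """Build k (train, test) pairs from contiguous, order-preserving folds."""
    n = len(instances)
    fold_of = [i * k // n for i in range(n)]
    pairs = []
    for fold in range(k):
        # The "True" setting on the slide: keep everything except this fold (train).
        train = [x for x, f in zip(instances, fold_of) if f != fold]
        # The "False" setting: keep only this fold (test).
        test = [x for x, f in zip(instances, fold_of) if f == fold]
        pairs.append((train, test))
    return pairs


pairs = train_test_pairs(list(range(50)), k=10)
files_written = 2 * len(pairs)  # one train file and one test file per fold
print(files_written, pairs[0][1])
```

Two outputs per fold over 10 folds is where the count of 20 filter runs comes from.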
30
Doing Manual Train/Test * First load the training data on the Preprocess tab
31
Doing Manual Train/Test * Now select Supplied Test Set as the Test Option
32
Doing Manual Train/Test Then Click Set
33
Doing Manual Train/Test * Next Load the Test set
34
Doing Manual Train/Test * Then you’re all set, so click on Start
35
Evaluation Methodology
36
Intro to Chapter 5
Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don’t show up in applied papers.
They are useful for showing trade-offs between properties of different algorithms.
You see them in theoretical machine learning papers.
37
Intro to Chapter 5
It is still important to understand what they represent.
The thinking behind the techniques will show up in your papers.
You need to know what your numbers do and don’t demonstrate.
They give you a unified framework for thinking about machine learning techniques.
There is no cookie cutter for a good evaluation.
38
Confidence Intervals
Mainly important if there is some question about whether your data set is big enough.
You average your performance over 10 folds, but how certain can you be that the number you got is correct?
We saw before that performance varies from fold to fold.
[Figure: an interval marked with parentheses on a number line from 0 to 40]
39
Confidence Intervals
We know that the distribution of categories found in the training set and in the testing set affects the performance.
Performance on two different sets will not be the same.
Confidence intervals allow us to say that the probability of the real performance value being within a certain range from the observed value is 90%.
[Figure: an interval marked with parentheses on a number line from 0 to 40]
40
Confidence Intervals
Confidence limits come from the normal distribution.
They are computed in terms of the number of standard deviations from the mean.
If the data is normally distributed, there is about a 16% chance of the real value being more than 1 standard deviation above the mean.
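The tail probabilities that confidence limits rely on can be checked directly with Python's standard library instead of a printed table. A minimal sketch (requires Python 3.8 or later for `statistics.NormalDist`):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of landing more than 1 standard deviation above the mean.
p_above_one_sd = 1 - std_normal.cdf(1.0)

# z score that leaves 5% in the upper tail, i.e. the edge of a two-sided 90% interval.
z_90 = std_normal.inv_cdf(0.95)

print(round(p_above_one_sd, 4), round(z_90, 3))
```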
41
What is a significance test?
How likely is it that the difference you see occurred by chance?
How could the difference occur by chance?
[Figure: two overlapping intervals marked with parentheses on a number line from 0 to 40]
If the mean of one distribution is within the confidence interval of another, the difference you observe could be by chance.
If you want p<.05, you need the 90% confidence intervals.
Find the corresponding Z scores from a standard normal distribution table.
42
Computing Confidence Intervals
A 90% confidence interval corresponds to z = 1.65.
There is a 5% chance that a data point will occur to the right of the rightmost edge of the interval.
f = percentage of successes
N = number of trials
$$p = \frac{f + \frac{z^2}{2N} \pm z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}}$$
f = 75%, N = 1000, c = 90% -> [0.727, 0.773]
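The interval formula on this slide translates into a few lines of code. The sketch below is a direct transcription under the slide's definitions of f, N, and z; the function name is hypothetical.

```python
import math


def confidence_interval(f, n, z):
    """Lower and upper bounds on the true success rate p, given observed rate f over n trials."""
    center = f + z * z / (2 * n)
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - spread) / denom, (center + spread) / denom


# The slide's example: f = 75%, N = 1000, z = 1.65 for 90% confidence.
low, high = confidence_interval(0.75, 1000, 1.65)
print(round(low, 3), round(high, 3))
```

Evaluated this way, the bounds come out at roughly 0.727 and 0.772, which agrees with the slide's example to within rounding in the third decimal place. Rerunning with a smaller `n` shows how quickly the interval widens when there is less data.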
43
Significance Tests
Use a significance test if you want to know whether the difference in performance between Approach A and Approach B is significant.
Get performance numbers for A and B on each fold of a 10-fold cross-validation.
You can use the Experimenter or you can do the computation in Excel.
If you use exactly the same “folds” across approaches, you can use a paired t-test rather than an unpaired t-test.
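For readers doing the computation outside the Experimenter or Excel, a paired t statistic over per-fold scores can be sketched as below. This is the textbook paired t-test, which may differ from whatever variant a particular tool applies; the function name and the per-fold scores are made up for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores of two approaches."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]  # pair by fold
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up scores on three shared folds, kept tiny so the arithmetic is easy to follow.
t, df = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 3), df)
```

With 10 folds there are 9 degrees of freedom, and the two-tailed critical value for p < .05 is about 2.26; a |t| above that would be reported as significant. Pairing works only because both approaches were scored on exactly the same folds.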
44
Significance Tests
Don’t forget that you can get a significant result by chance!
The Experimenter corrects for multiple comparisons.
Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large.
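How easily a significant result can arise by chance is a short calculation. The sketch below assumes the comparisons are independent, which is a simplification, and the function name is hypothetical.

```python
def chance_of_false_positive(alpha, comparisons):
    """Probability of at least one spurious 'significant' result across independent tests."""
    return 1 - (1 - alpha) ** comparisons


p_any = chance_of_false_positive(0.05, 10)
print(round(p_any, 3))
```

Under that assumption, running 10 tests at p < .05 gives roughly a 40% chance that at least one comes out significant purely by chance, which is the motivation for correcting for multiple comparisons.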
45
Using the Experimenter * First click New
46
Using the Experimenter Make sure Simple is selected
47
Using the Experimenter Select .csv as the output file format and click on Browse. Enter the file name. Click on Add New.
48
Using the Experimenter Load data set
49
Using the Experimenter 10 repetitions is better than 1, but 1 is faster.
50
Using the Experimenter Click on Add New to add algorithms
51
Using the Experimenter Click Choose to select algorithm
52
Using the Experimenter You should add Naïve Bayes, SMO, and J48
53
Using the Experimenter Then click on the Run tab
54
Using the Experimenter Click on Start
55
Using the Experimenter When it’s done, Click on Analyze
56
Using the Experimenter Click File to load the results file you saved
57
Using the Experimenter
58
Do Analysis * Explicitly select default settings here * Then select Kappa Here * Then select Perform Test
59
Do Analysis * Base case is what you are comparing with
60
Take Home Message
We focused on practical, methodological aspects of the topic of Evaluation.
We talked about the concept of a confidence interval and significance tests.
We learned how to create Train/Test pairs for manual cross-validation, which is useful for preparing for an error analysis.
We also learned how to use the Experimenter to do experiments and run significance tests.