Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.

Plan for the Day
Announcements
  Questions?
  Assignment 4
  Quiz
Today's Data Set: Speaker Identification
Weka helpful hints
  Visualizing Errors for Regression Problems
  Alternative forms of cross-validation
  Creating Train/Test Pairs
Intro to Evaluation

Assignment 3 Notes
Everyone did well
Please compare your solution to the answer key
  Note the distinction between P(sunny|yes) and P(yes|sunny)
  One student didn't include prior probabilities in the likelihood computation

Speaker Identification

Today’s Data Set – Speaker Identification

Predictions? What previous data set does this remind you of?

Predictions? What previous data set does this remind you of?
Results
  J48: .53 Kappa
  SMO: .37 Kappa
  Naïve Bayes: .16 Kappa

Notice Ranges and Contingencies

Most Predictive Feature

Least Predictive Feature

What would 1R do?

.16 Kappa

Weka Helpful Hints

Evaluating Numeric Prediction: CPU data

Visualizing Classifier Errors for Numeric Prediction

Creating Train/Test Pairs First click here

Creating Train/Test Pairs If you pick unsupervised, you’ll get non-stratified folds, otherwise you’ll get stratified folds.

Stratified versus Non-Stratified
Weka's standard cross-validation is stratified
  Data is randomized before being divided into folds
  Preserves the distribution of class values across folds
  Reduces variance in performance
Unstratified cross-validation means there is no randomization
  Order is preserved
  Advantage for matching predictions with instances in Weka
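For readers who prefer the Weka Java API to the Explorer, here is a minimal sketch of running stratified cross-validation by hand; the file name speakers.arff and the choice of J48 are placeholders, not the lecture's exact setup.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StratifiedCVDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder file name for the lecture's speaker identification data
        Instances data = DataSource.read("speakers.arff");
        data.setClassIndex(data.numAttributes() - 1);

        int folds = 10;
        Random rand = new Random(1);

        // Stratified: randomize, then let Weka preserve the class distribution per fold
        Instances randomized = new Instances(data);
        randomized.randomize(rand);
        randomized.stratify(folds);

        Evaluation eval = new Evaluation(randomized);
        for (int i = 0; i < folds; i++) {
            Instances train = randomized.trainCV(folds, i, rand);
            Instances test = randomized.testCV(folds, i);
            J48 tree = new J48();
            tree.buildClassifier(train);
            eval.evaluateModel(tree, test);   // accumulates statistics across folds
        }
        System.out.println("Stratified 10-fold: kappa = " + eval.kappa());

        // Skipping randomize()/stratify() gives the unstratified behavior described
        // above: the original instance order is preserved, which makes it easier to
        // line predictions back up with rows in the file.
    }
}
```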

Stratified versus Non-Stratified
Leave-one-out cross-validation
  Train on all but one instance; iterate over all instances
  Extreme version of unstratified cross-validation: if the test set has only one instance, the distribution of class values cannot be preserved
  Maximizes the amount of data used for training on each fold

Stratified versus Non-Stratified
Leave-one-subpopulation-out
  Use when several data points come from the same subpopulation, e.g., speech data from the same speaker
  Otherwise data from the same subpopulation can end up in both train and test, and that overlap inflates your estimate of how well the model generalizes
When is this not a problem?
  When you can manually make sure it won't happen, but you have to do that by hand
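A minimal sketch of the leave-one-subpopulation-out idea, assuming the data set has a nominal speaker attribute; both the attribute name and the file name are hypothetical.

```java
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LeaveOneSpeakerOut {
    public static void main(String[] args) throws Exception {
        // Hypothetical file and attribute names
        Instances data = DataSource.read("speakers.arff");
        data.setClassIndex(data.numAttributes() - 1);
        Attribute speaker = data.attribute("speaker");

        String heldOut = speaker.value(0);   // hold out all instances from the first speaker
        Instances train = new Instances(data, 0);
        Instances test = new Instances(data, 0);
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            if (inst.stringValue(speaker).equals(heldOut)) {
                test.add(inst);
            } else {
                train.add(inst);
            }
        }
        System.out.println("Train: " + train.numInstances()
                + "  Test (speaker " + heldOut + "): " + test.numInstances());
    }
}
```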

Creating Train/Test Pairs If you pick unsupervised, you’ll get non-stratified folds, otherwise you’ll get stratified folds.

Creating Train/Test Pairs Now click here

Creating Train/Test Pairs

You're going to run this filter 20 times altogether: twice for every fold (once for the training set and once for the test set) across 10 folds.

Creating Train/Test Pairs True for Train, false for Test

Creating Train/Test Pairs If you're doing Stratified, make sure you have the class attribute selected here.

Creating Train/Test Pairs 1. Click Apply

Creating Train/Test Pairs 2. Save the file

Creating Train/Test Pairs 3. Undo before you create the next file
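The same train/test pairs can also be generated outside the Explorer with the StratifiedRemoveFolds filter through the Weka Java API. This is a sketch only, with speakers.arff as a placeholder file name; it mirrors the GUI steps above (invert selection for the training file, normal selection for the test file).

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSink;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.StratifiedRemoveFolds;

public class MakeTrainTestPairs {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("speakers.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);         // class must be set for stratification

        int folds = 10;
        for (int fold = 1; fold <= folds; fold++) {
            // Run the filter twice per fold: inverted selection gives the training
            // set, normal selection gives the test set (20 runs for 10 folds).
            for (boolean invert : new boolean[] {true, false}) {
                StratifiedRemoveFolds filter = new StratifiedRemoveFolds();
                filter.setNumFolds(folds);
                filter.setFold(fold);
                filter.setInvertSelection(invert);   // true -> train, false -> test
                filter.setInputFormat(data);
                Instances subset = Filter.useFilter(data, filter);
                String name = (invert ? "train" : "test") + fold + ".arff";
                DataSink.write(name, subset);
            }
        }
    }
}
```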

Doing Manual Train/Test * First load the training data on the Preprocess tab

Doing Manual Train/Test * Now select Supplied Test Set as the Test Option

Doing Manual Train/Test Then Click Set

Doing Manual Train/Test * Next Load the Test set

Doing Manual Train/Test * Then you’re all set, so click on Start
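Programmatically, the same supplied-test-set evaluation looks roughly like this; train1.arff and test1.arff stand in for the files saved in the previous step, and SMO is just one of the classifiers used earlier in the lecture.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SuppliedTestSetDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder names for one train/test pair written earlier
        Instances train = DataSource.read("train1.arff");
        Instances test = DataSource.read("test1.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        SMO smo = new SMO();
        smo.buildClassifier(train);

        // Evaluate on the supplied test set only
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(smo, test);
        System.out.println(eval.toSummaryString());
        System.out.println("Kappa: " + eval.kappa());
    }
}
```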

Evaluation Methodology

Intro to Chapter 5
Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don't show up in applied papers
  They are useful for showing trade-offs between properties of different algorithms
  You see them in theoretical machine learning papers

Intro to Chapter 5
Still important to understand what they represent: the thinking behind the techniques will show up in your papers
  You need to know what your numbers do and don't demonstrate
  They give you a unified framework for thinking about machine learning techniques
  There is no cookie cutter for a good evaluation

Confidence Intervals
Mainly important if there is some question about whether your data set is big enough
You average your performance over 10 folds, but how certain can you be that the number you got is correct?
We saw before that performance varies from fold to fold

Confidence Intervals
We know that the distribution of categories found in the training set and in the testing set affects performance, so performance on two different sets will not be the same
Confidence intervals let us say, for example, that with 90% probability the true performance lies within a certain range of the observed value

Confidence Intervals
Confidence limits come from the normal distribution and are computed in terms of the number of standard deviations from the mean
If the data is normally distributed, there is roughly a 15% chance (more precisely, about 16%) of the real value being more than 1 standard deviation above the mean

What is a significance test?
How likely is it that the difference you see occurred by chance? How could the difference occur by chance?
  If the mean of one distribution is within the confidence interval of the other, the difference you observe could be due to chance
  If you want p < .05, you need the 90% confidence intervals
  Find the corresponding z scores in a standard normal distribution table

Computing Confidence Intervals
A 90% confidence interval corresponds to z = 1.65
  5% chance that a data point will occur to the right of the rightmost edge of the interval
f = percentage of successes, N = number of trials

  p = \frac{f + \frac{z^2}{2N} \pm z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}}

Example: f = 75%, N = 1000, c = 90% → approximately [0.727, 0.772]
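A small sketch that plugs the slide's numbers into this formula (plain Java, nothing Weka-specific):

```java
public class ConfidenceInterval {
    // Confidence interval for an observed success rate f over N trials,
    // using the formula on the slide (z = 1.65 for a 90% interval).
    static double[] interval(double f, int n, double z) {
        double z2 = z * z;
        double center = f + z2 / (2.0 * n);
        double margin = z * Math.sqrt(f / n - (f * f) / n + z2 / (4.0 * n * n));
        double denom = 1.0 + z2 / n;
        return new double[] {(center - margin) / denom, (center + margin) / denom};
    }

    public static void main(String[] args) {
        double[] ci = interval(0.75, 1000, 1.65);
        // Prints roughly [0.727, 0.772]
        System.out.printf("[%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```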

Significance Tests
If you want to know whether the difference in performance between Approach A and Approach B is significant:
  Get performance numbers for A and B on each fold of a 10-fold cross validation
  You can use the Experimenter or you can do the computation in Excel
  If you use exactly the same "folds" across approaches, you can use a paired t-test rather than an unpaired t-test
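A sketch of the paired t-test computation on per-fold kappa values; the two arrays below are made-up placeholder numbers, not results from this lecture.

```java
public class PairedTTest {
    public static void main(String[] args) {
        // Per-fold kappa for Approach A and Approach B on the SAME 10 folds
        // (placeholder values for illustration only).
        double[] a = {0.51, 0.55, 0.48, 0.53, 0.57, 0.50, 0.54, 0.52, 0.56, 0.49};
        double[] b = {0.44, 0.47, 0.41, 0.46, 0.50, 0.43, 0.45, 0.44, 0.48, 0.42};

        int n = a.length;
        double meanDiff = 0;
        for (int i = 0; i < n; i++) meanDiff += (a[i] - b[i]) / n;

        // Sample variance of the per-fold differences
        double var = 0;
        for (int i = 0; i < n; i++) {
            double d = (a[i] - b[i]) - meanDiff;
            var += d * d / (n - 1);
        }
        double t = meanDiff / Math.sqrt(var / n);

        // For df = 9, the two-tailed critical value at p < .05 is about 2.262.
        System.out.println("t = " + t + " (compare with 2.262 for df = 9, p < .05)");
    }
}
```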

Significance Tests
Don't forget that you can get a significant result by chance! The Experimenter corrects for multiple comparisons
Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large

Using the Experimenter * First click New

Using the Experimenter Make sure Simple is selected

Using the Experimenter Select.csv as the output file format and click on Browse Enter file name Click on Add New

Using the Experimenter Load data set

Using the Experimenter 10 repetitions is better than 1, but 1 is faster.

Using the Experimenter Click on Add New to add algorithms

Using the Experimenter Click Choose to select algorithm

Using the Experimenter You should add Naïve Bayes, SMO, and J48

Using the Experimenter Then click on the Run tab

Using the Experimenter Click on Start

Using the Experimenter When it’s done, Click on Analyze

Using the Experimenter Click File to load the results file you saved

Using the Experimenter

Do Analysis * Explicitly select the default settings here * Then select Kappa here * Then select Perform Test

Do Analysis * Base case is what you are comparing with

Take Home Message
We focused on practical, methodological aspects of the topic of evaluation
We talked about the concept of a confidence interval and about significance tests
We learned how to create train/test pairs for manual cross-validation, which is useful when preparing for an error analysis
We also learned how to use the Experimenter to run experiments and significance tests