CS639: Data Management for Data Science
Lecture 19: Evaluating Machine Learning Methods
Theodoros Rekatsinas

Announcements
- You can see your exams during office hours
- Homework will be announced later this week; we will have only two more projects, not three

Today
- Evaluating ML models

How can we get an unbiased estimate of the accuracy of a learned model?

Test sets
How can we get an unbiased estimate of the accuracy of a learned model?
- When learning a model, you should pretend that you don't have the test data yet
- If the test-set labels influence the learned model in any way, accuracy estimates will be biased
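A minimal sketch of the held-out test set protocol, assuming scikit-learn; the dataset, the decision-tree learner, and the 80/20 split are illustrative choices, not part of the lecture:

```python
# Hold out a test set once, train without it, and touch it only for the final estimate.
from sklearn.datasets import load_breast_cancer          # assumed example dataset
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier          # assumed example learner

X, y = load_breast_cancer(return_X_y=True)

# Split up front; the test portion plays no role in training or tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Only now do the test labels enter the picture, giving an unbiased accuracy estimate.
print("test accuracy:", model.score(X_test, y_test))
```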

Learning curves
How does the accuracy of a learning method change as a function of the training-set size?
This can be assessed by learning curves.

Learning curves
Given a training/test set partition:
- For each sample size s on the learning curve:
  - (optionally) repeat n times:
    - Randomly select s instances from the training set
    - Learn the model
    - Evaluate the model on the test set to determine accuracy a
  - Plot (s, a)
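A sketch of this procedure, assuming scikit-learn; the dataset, learner, sample sizes, and n = 10 repeats are made-up choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
curve = []                                   # (s, average accuracy a) points to plot
for s in [25, 50, 100, 200, len(X_train)]:   # sample sizes on the learning curve
    accs = []
    for _ in range(10):                      # optional repeats to smooth the estimate
        idx = rng.choice(len(X_train), size=s, replace=False)  # randomly select s instances
        model = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
        accs.append(model.score(X_test, y_test))               # evaluate on the fixed test set
    curve.append((s, float(np.mean(accs))))

print(curve)
```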

Validation (tuning) sets
What if we want unbiased estimates of accuracy during the learning process (e.g., to choose the best level of decision-tree pruning)?
- Partition the training data into separate training and validation sets
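A minimal sketch of tuning on a validation set, assuming scikit-learn; the dataset is illustrative and max_depth stands in for the pruning level mentioned above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Carve a validation set out of the training data; the test set stays untouched.
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

best_depth, best_acc = None, -1.0
for depth in [1, 2, 4, 8, None]:             # candidate tree complexities (assumed grid)
    acc = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# Retrain on the full training set with the chosen setting; report once on the test set.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("chosen max_depth:", best_depth, "test accuracy:", final.score(X_test, y_test))
```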

Limitations of using a single training/test partition
- We may not have enough data to make sufficiently large training and test sets
- A larger test set gives us a more reliable estimate of accuracy (i.e., a lower-variance estimate)
- But… a larger training set will be more representative of how much data we actually have for the learning process
- A single training set does not tell us how sensitive accuracy is to a particular training sample

Random resampling
We can address the second issue by repeatedly randomly partitioning the available data into training and test sets.
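A sketch of random resampling using scikit-learn's ShuffleSplit; the 20 repetitions, the 80/20 split, and the learner are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 20 independent random train/test partitions; accuracy is averaged over them.
splitter = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=splitter)
print("mean accuracy over 20 random splits:", scores.mean(), "+/-", scores.std())
```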

Stratified sampling
When randomly selecting training or validation sets, we may want to ensure that class proportions are maintained in each selected set.
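One way to sketch this with scikit-learn (an assumed example, not from the slides): the stratify argument keeps the class proportions equal on both sides of the split.

```python
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y preserves the class proportions of y in both the train and test sides.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

print("overall:", Counter(y))
print("train:  ", Counter(y_train))
print("test:   ", Counter(y_test))
```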

Cross validation

Cross-validation example
Suppose we have 100 instances, and we want to estimate accuracy with cross-validation.

Cross-validation example
- 10-fold cross-validation is common, but smaller values of n are often used when learning takes a lot of time
- In leave-one-out cross-validation, n = #instances
- In stratified cross-validation, stratified sampling is used when partitioning the data
- CV makes efficient use of the available data for testing
- Note that whenever we use multiple training sets, as in CV and random resampling, we are evaluating a learning method as opposed to an individual learned model
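A sketch of stratified 10-fold cross-validation, assuming scikit-learn; the dataset and decision-tree learner are illustrative. The resulting scores describe the learning method, not any single fitted model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each instance appears in exactly one test fold; class proportions are kept per fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print("per-fold accuracy:", np.round(scores, 3))
print("mean accuracy:", scores.mean())
```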

Internal cross-validation
Instead of a single validation set, we can use cross-validation within a training set to select a model (e.g., to choose the best level of decision-tree pruning).
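A sketch of internal cross-validation via grid search, assuming scikit-learn; the grid-search helper and the ccp_alpha pruning grid are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation runs inside the training data to choose the pruning strength.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"ccp_alpha": [0.0, 0.001, 0.01, 0.05]},
                      cv=5)
search.fit(X_train, y_train)

print("chosen pruning level:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))   # the test set is used only once, here
```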

Confusion matrices
How can we understand what types of mistakes a learned model makes?

Confusion matrix for 2-class problems
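A minimal sketch of the standard two-class layout (rows = actual class, columns = predicted class), with made-up labels and predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # actual classes (assumed toy data)
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]   # predicted classes

# For two classes, the matrix unpacks into the four familiar counts.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("true positives: ", tp, "  false negatives:", fn)
print("false positives:", fp, "  true negatives: ", tn)
```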

Is accuracy an adequate measure of predictive performance?

Other accuracy metrics
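Two metrics commonly introduced at this point are the true-positive and false-positive rates that the ROC analysis on the following slides builds on; the choice of metrics here is an assumption, and the counts below are made up.

```python
tp, fn, fp, tn = 3, 1, 2, 4   # counts from a 2-class confusion matrix (assumed)

tpr = tp / (tp + fn)   # true-positive rate (recall/sensitivity): positives correctly flagged
fpr = fp / (fp + tn)   # false-positive rate: negatives incorrectly flagged as positive
print("TPR:", tpr, "FPR:", fpr)
```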

ROC curves

ROC curve example

ROC curves and misclassification costs

Algorithm for creating an ROC curve
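A sketch of the usual threshold-sweep construction (an assumption about what the slide shows): sort instances by decreasing model confidence, lower the threshold one instance at a time, and record the resulting (FPR, TPR) points. The scores and labels are made up.

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                          # actual classes (toy data)
scores = np.array([0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10])  # model confidence for class 1

order = np.argsort(-scores)               # visit instances from highest to lowest score
y_sorted = y_true[order]
P, N = y_true.sum(), len(y_true) - y_true.sum()

tpr, fpr, tp, fp = [0.0], [0.0], 0, 0
for label in y_sorted:                    # lower the threshold past one instance at a time
    if label == 1:
        tp += 1
    else:
        fp += 1
    tpr.append(tp / P)
    fpr.append(fp / N)

print(list(zip(fpr, tpr)))                # ROC curve: (FPR, TPR) points from (0, 0) to (1, 1)
```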

Other accuracy metrics
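Precision, recall, and the F1 score are natural companions to the precision/recall curves on the next slide; a minimal sketch with assumed counts:

```python
tp, fp, fn = 3, 2, 1   # true positives, false positives, false negatives (assumed counts)

precision = tp / (tp + fp)   # of the instances predicted positive, the fraction that truly are
recall = tp / (tp + fn)      # of the truly positive instances, the fraction we found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print("precision:", precision, "recall:", recall, "F1:", round(f1, 3))
```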

Precision/recall curves

To Avoid Cross-Validation Pitfalls

To Avoid Cross-Validation Pitfalls

To Avoid Cross-Validation Pitfalls

Ablation Studies