Evaluating Classifiers


Evaluating Classifiers

Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)

Evaluating Classifiers
What we want: a classifier that best predicts unseen (“test”) data.
Common assumption: the data is “iid” (independent and identically distributed).

Topics
Cross-Validation
Precision and Recall
ROC Analysis
Bias, Variance, and Noise

Cross-Validation

Accuracy and Error Rate
Accuracy = fraction of correct classifications on unseen data (test set).
Error rate = 1 − Accuracy.
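As a quick sketch, both quantities are one line of NumPy each (the label arrays below are made-up illustration values, not data from the slides):

```python
# Accuracy and error rate on a held-out test set.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])    # example ground-truth labels (illustrative only)
y_pred = np.array([1, 0, 0, 1, 0])    # example classifier predictions (illustrative only)

accuracy = np.mean(y_pred == y_true)  # fraction of correct classifications
error_rate = 1.0 - accuracy
print(accuracy, error_rate)           # 0.8 0.2
```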

How to use available data to best measure accuracy?
Split the data into training and test sets. But how to split?
Too little training data: cannot learn a good model.
Too little test data: cannot evaluate the learned model.
Also, how to learn the hyper-parameters of the model?

One solution: “k-fold cross validation”
Used to better estimate the generalization accuracy of a model.
Used to learn the hyper-parameters of a model (“model selection”).

Using k-fold cross validation to estimate accuracy
Each example is used both as a training instance and as a test instance.
Instead of splitting the data into a “training set” and a “test set”, split the data into k disjoint parts: S1, S2, ..., Sk.
For i = 1 to k:
    Select Si to be the “test set”. Train on the remaining data, and test on Si to obtain accuracy Ai.
Report the average Ā = (1/k) Σi Ai as the final accuracy.
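A minimal sketch of this procedure, assuming NumPy arrays X and y and a scikit-learn-style classifier with fit/predict; make_model is a hypothetical factory function, not something defined on the slides:

```python
# k-fold cross-validation for estimating generalization accuracy.
import numpy as np

def kfold_accuracy(make_model, X, y, k=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))              # shuffle the example indices once
    folds = np.array_split(idx, k)             # k disjoint parts S1, ..., Sk
    accuracies = []
    for i in range(k):
        test_idx = folds[i]                    # Si is the test set on this iteration
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = make_model()                   # fresh model each fold
        model.fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))
    return np.mean(accuracies)                 # average accuracy over the k folds
```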

Using k-fold cross validation to learn hyper-parameters (e.g., learning rate, number of hidden units, SVM kernel, etc.)
Split the data into training and test sets. Put the test set aside.
Split the training data into k disjoint parts: S1, S2, ..., Sk.
Assume you are learning one hyper-parameter. Choose R possible values for this hyper-parameter.
For j = 1 to R:
    For i = 1 to k:
        Select Si to be the “validation set”.
        Train the classifier on the remaining data using the jth value of the hyper-parameter.
        Test the classifier on Si to obtain accuracy Ai,j.
    Compute the average of the accuracies: Āj = (1/k) Σi Ai,j.
Choose the value j of the hyper-parameter with the highest Āj.
Retrain the model on all the training data using this value of the hyper-parameter.
Test the resulting model on the test set.
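A hedged sketch of the same recipe; make_model(value) and values are hypothetical placeholders for your model factory and the R candidate hyper-parameter values:

```python
# Selecting one hyper-parameter with k-fold cross-validation, then retraining
# on all training data and evaluating once on the held-out test set.
import numpy as np

def select_hyperparameter(make_model, values, X_train, y_train, X_test, y_test,
                          k=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_train))
    folds = np.array_split(idx, k)             # S1, ..., Sk from the training data only
    mean_accs = []
    for v in values:                           # j = 1 to R
        accs = []
        for i in range(k):                     # i = 1 to k
            val_idx = folds[i]                 # Si is the validation set
            tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            model = make_model(v)
            model.fit(X_train[tr_idx], y_train[tr_idx])
            accs.append(np.mean(model.predict(X_train[val_idx]) == y_train[val_idx]))
        mean_accs.append(np.mean(accs))        # average accuracy for this value
    best = values[int(np.argmax(mean_accs))]   # value with the highest average accuracy
    final_model = make_model(best)
    final_model.fit(X_train, y_train)          # retrain on all the training data
    test_accuracy = np.mean(final_model.predict(X_test) == y_test)
    return best, test_accuracy
```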

Precision and Recall

Evaluating classification algorithms
“Confusion matrix” for a given class c:

                                     Predicted (or “classified”)
                                     Positive (in class c)            Negative (not in class c)
Actual  Positive (in class c)        True Positive                    False Negative (Type 2 error)
        Negative (not in class c)    False Positive (Type 1 error)    True Negative
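A small sketch of computing those four counts for a chosen positive class c, assuming y_true and y_pred are arrays of class labels:

```python
# Confusion-matrix counts for a single class c.
import numpy as np

def confusion_counts(y_true, y_pred, c):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == c) & (y_pred == c)))   # true positives
    fn = int(np.sum((y_true == c) & (y_pred != c)))   # false negatives (Type 2 errors)
    fp = int(np.sum((y_true != c) & (y_pred == c)))   # false positives (Type 1 errors)
    tn = int(np.sum((y_true != c) & (y_pred != c)))   # true negatives
    return tp, fn, fp, tn
```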

Recall and Precision
[Figure: the ten test-set instances x1 ... x10, split into those predicted “positive” and those predicted “negative”.]
Recall: fraction of positive examples predicted as “positive”. Here: 3/4.
Precision: fraction of examples predicted as “positive” that are actually positive. Here: 1/2.
Recall = Sensitivity = True Positive Rate
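Building on the confusion-matrix counts sketched earlier, recall and precision are simple ratios:

```python
# Recall and precision from confusion-matrix counts.
def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0   # fraction of actual positives found

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0   # fraction of predicted positives that are correct

# The example above corresponds to tp=3, fn=1, fp=3:
# recall(3, 1) == 0.75 and precision(3, 3) == 0.5
```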

In-class exercise 1

Example: “A” vs. “B”
Assume “A” is the positive class. Results from a perceptron:

Instance   Class   Perceptron Output
1          “A”     1
2          “A”     1
3          “A”     1
4          “A”     0
5          “B”     1
6          “B”     0
7          “B”     0
8          “B”     0

Creating a Precision vs. Recall Curve
From the results of the classifier, compute the accuracy, precision, and recall at each threshold:

Threshold   Accuracy   Precision   Recall
.9
.8
.7
.6
.5
.4
.3
.2
.1
−∞
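One way to fill in such a table is to sweep the thresholds over the classifier's scores, as in this sketch (scores is assumed to be an array of real-valued classifier outputs, y_true an array of 0/1 labels):

```python
# Accuracy, precision, and recall at each threshold.
import numpy as np

def precision_recall_table(scores, y_true, thresholds):
    scores, y_true = np.asarray(scores), np.asarray(y_true)
    rows = []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)       # classify as positive above the threshold
        tp = np.sum((y_true == 1) & (y_pred == 1))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        accuracy = (tp + tn) / len(y_true)
        prec = tp / (tp + fp) if (tp + fp) else 1.0   # convention when nothing is predicted positive
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((t, accuracy, prec, rec))
    return rows

# e.g. precision_recall_table(scores, y_true,
#                             [.9, .8, .7, .6, .5, .4, .3, .2, .1, -np.inf])
```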

Multiclass classification: Mean Average Precision
“Average precision” for a class: area under the precision/recall curve for that class.
Mean average precision: mean of the average precision over all classes.
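A sketch of both quantities, assuming each class already has its precision/recall points computed (for example with the threshold sweep above):

```python
# Average precision per class (trapezoidal area under the P/R curve)
# and mean average precision over all classes.
import numpy as np

def average_precision(precisions, recalls):
    p = np.asarray(precisions, dtype=float)
    r = np.asarray(recalls, dtype=float)
    order = np.argsort(r)                         # integrate over increasing recall
    p, r = p[order], r[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_average_precision(per_class_curves):
    # per_class_curves: list of (precisions, recalls) pairs, one pair per class
    return float(np.mean([average_precision(p, r) for p, r in per_class_curves]))
```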

ROC Analysis

[ROC curve axes: the y-axis is the true positive rate (“recall” or “sensitivity”); the x-axis is the false positive rate (1 − “specificity”).]

Creating a ROC Curve
From the results of the classifier, compute the TPR and FPR at each threshold:

Threshold   Accuracy   TPR   FPR
.9
.8
.7
.6
.5
.4
.3
.2
.1
−∞

How to create a ROC curve for a binary classifier
Run your classifier on each instance in the test data and calculate its score.
Get the range [min, max] of your scores.
Divide the range into about 200 thresholds, including −∞ and max.
For each threshold, calculate the TPR and FPR.
Plot TPR (y-axis) vs. FPR (x-axis).
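A sketch following those steps, assuming scores is an array of real-valued classifier outputs and y_true an array of 0/1 labels:

```python
# (FPR, TPR) points for a ROC curve, swept over ~200 thresholds.
import numpy as np

def roc_points(scores, y_true, n_thresholds=200):
    scores, y_true = np.asarray(scores), np.asarray(y_true)
    lo, hi = scores.min(), scores.max()                       # range of the scores
    thresholds = np.concatenate(([-np.inf], np.linspace(lo, hi, n_thresholds)))
    points = []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_true == 1) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points                                             # plot TPR (y) vs. FPR (x)
```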

In-Class Exercise 2

Precision/Recall vs. ROC Curves Precision/Recall curves typically used in “detection” or “retrieval” tasks, in which positive examples are relatively rare. ROC curves used in tasks where positive/negative examples are more balanced.

Bias, Variance, and Noise

Bias, Variance, and Noise A classifier’s error can be separated into three components: bias, variance, and noise

Bias: related to the ability of the model to learn the target function.
High bias: the model is not powerful enough to learn the function; it learns a less-complicated function (underfitting).
Low bias: the model is too powerful; it learns a too-complicated function (overfitting).
From http://eecs.oregonstate.edu/~tgd/talks/BV.ppt and https://www.quora.com/topic/Logistic-Regression

Variance: how much the learned function will vary if different training sets are used.
Same experiment, repeated with 50 samples of 20 points each.
From http://eecs.oregonstate.edu/~tgd/talks/BV.ppt
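An illustrative sketch of measuring this directly: repeat the same experiment with freshly drawn training sets and look at how much the learned predictions vary. sample_training_set and make_model are hypothetical placeholders for a data-generating process and a model factory:

```python
# Empirical variance of a learner's predictions across training sets.
import numpy as np

def prediction_variance(make_model, sample_training_set, X_eval,
                        n_repeats=50, n_points=20):
    all_preds = []
    for _ in range(n_repeats):                      # e.g., 50 repetitions
        X_tr, y_tr = sample_training_set(n_points)  # fresh training set of 20 points
        model = make_model()
        model.fit(X_tr, y_tr)
        all_preds.append(model.predict(X_eval))
    all_preds = np.array(all_preds)
    # variance of the predictions at each evaluation point, averaged over points
    return float(all_preds.var(axis=0).mean())
```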

Noise: the underlying process generating the data is stochastic, or the data has errors or outliers.
From http://eecs.oregonstate.edu/~tgd/talks/BV.ppt

Examples of causes of bias? Examples of causes of variance? Examples of causes of noise?

[Figure from http://lear.inrialpes.fr/~verbeek/MLCR.10.11.php]