Evaluating Classifiers

Presentation on theme: "Evaluating Classifiers"— Presentation transcript:

1 Evaluating Classifiers

2 Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)

3 Evaluating Classifiers
What we want: a classifier that best predicts unseen ("test") data.
Common assumption: the data is "iid" (independent and identically distributed).

4 Topics
Cross-Validation
Precision and Recall
ROC Analysis
Bias, Variance, and Noise

5 Cross-Validation

6 Accuracy and Error Rate
Accuracy = fraction of correct classifications on unseen data (test set)
Error rate = 1 − Accuracy

7 How to use available data to best measure accuracy?
Split data into training and test sets. But how to split?
Too little training data: cannot learn a good model.
Too little test data: cannot evaluate the learned model.
Also, how to learn the hyper-parameters of the model?
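A minimal sketch of such a split in Python (NumPy arrays assumed); the 80/20 ratio, the function name, and the random shuffling are illustrative choices, not something fixed by the slide:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split the data into a training set and a held-out test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                 # shuffle once, assuming iid data
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```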

8 One solution: “k-fold cross validation”
Used to better estimate the generalization accuracy of a model.
Used to learn the hyper-parameters of a model ("model selection").

9 Using k-fold cross validation to estimate accuracy
Each example is used both as a training instance and as a test instance.
Instead of splitting the data into a "training set" and a "test set", split the data into k disjoint parts: S1, S2, ..., Sk.
For i = 1 to k: select Si to be the "test set", train on the remaining data, and test on Si to obtain accuracy Ai.
Report the average accuracy, (1/k) Σi Ai, as the final accuracy.
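A minimal sketch of this procedure, assuming the classifier is exposed through two placeholder functions, train(X, y) -> model and predict(model, X) -> labels, which are not from the slides:

```python
import numpy as np

def kfold_accuracy(X, y, train, predict, k=5, seed=0):
    """Estimate generalization accuracy with k-fold cross validation.

    train(X_tr, y_tr) -> model and predict(model, X_te) -> labels are
    placeholders for whatever learning algorithm is being evaluated.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))              # shuffle once, assuming iid data
    folds = np.array_split(idx, k)             # k disjoint parts S_1 ... S_k
    accuracies = []
    for i in range(k):
        test_idx = folds[i]                                    # S_i is the test set
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # remaining data
        model = train(X[train_idx], y[train_idx])
        y_hat = predict(model, X[test_idx])
        accuracies.append(np.mean(y_hat == y[test_idx]))       # accuracy A_i
    return np.mean(accuracies)                 # average accuracy over the k folds
```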

10 Using k-fold cross validation to learn hyper-parameters (e.g., learning rate, number of hidden units, SVM kernel, etc.)
Split the data into training and test sets. Put the test set aside.
Split the training data into k disjoint parts: S1, S2, ..., Sk.
Assume you are learning one hyper-parameter. Choose R possible values for this hyper-parameter.
For j = 1 to R:
For i = 1 to k: select Si to be the "validation set", train the classifier on the remaining data using the jth value of the hyper-parameter, and test the classifier on Si to obtain accuracy Ai,j.
Compute the average of the accuracies for value j: Āj = (1/k) Σi Ai,j.
Choose the value j of the hyper-parameter with the highest Āj.
Retrain the model on all the training data using this value of the hyper-parameter.
Test the resulting model on the test set.
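A sketch of the same loop used for model selection, again with placeholder train/predict functions (here train also takes the candidate hyper-parameter value); everything except the loop structure described on the slide is an illustrative assumption:

```python
import numpy as np

def select_hyperparameter(X_train, y_train, train, predict, values, k=5, seed=0):
    """Pick the hyper-parameter value with the best k-fold validation accuracy.

    train(X, y, value) -> model and predict(model, X) -> labels are
    placeholders; `values` holds the R candidate hyper-parameter values.
    X_train and y_train are assumed to be NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_train))
    folds = np.array_split(idx, k)                     # S_1 ... S_k
    mean_acc = []
    for value in values:                               # j = 1 .. R
        accs = []
        for i in range(k):
            val_idx = folds[i]                         # S_i is the validation set
            tr_idx = np.concatenate(folds[:i] + folds[i + 1:])
            model = train(X_train[tr_idx], y_train[tr_idx], value)
            y_hat = predict(model, X_train[val_idx])
            accs.append(np.mean(y_hat == y_train[val_idx]))   # A_{i,j}
        mean_acc.append(np.mean(accs))                 # average accuracy for value j
    best = values[int(np.argmax(mean_acc))]            # value with highest average
    return train(X_train, y_train, best), best         # retrain on all training data
```

The returned model would then be evaluated once on the held-out test set, as the slide's final step describes.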

11 Precision and Recall

12 Evaluating classification algorithms
"Confusion matrix" for a given class c:

                                Predicted positive (in c)        Predicted negative (not in c)
Actual positive (in c)          True Positive                    False Negative (Type 2 error)
Actual negative (not in c)      False Positive (Type 1 error)    True Negative
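A small sketch of how the four cells of this matrix can be counted for a chosen positive class c; the function name and the list-of-labels interface are assumptions for illustration:

```python
def confusion_counts(y_true, y_pred, positive_class):
    """Return (TP, FP, FN, TN) for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive_class:
            if actual == positive_class:
                tp += 1          # predicted positive, actually positive
            else:
                fp += 1          # Type 1 error (false positive)
        else:
            if actual == positive_class:
                fn += 1          # Type 2 error (false negative)
            else:
                tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn
```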

13-18 Recall and Precision
[Figure: all instances in the test set (x1-x10), divided into those predicted "positive" and those predicted "negative".]
Recall: fraction of positive examples predicted as "positive". Here: 3/4.
Precision: fraction of examples predicted as "positive" that are actually positive. Here: 1/2.
Recall = Sensitivity = True Positive Rate

19 In-class exercise 1

20 Example: "A" vs. "B". Assume "A" is the positive class.
Results from the perceptron:

Instance   Class   Perceptron Output
1          "A"     1
2          "A"     1
3          "A"     1
4          "A"     0
5          "B"     1
6          "B"     0
7          "B"     0
8          "B"     0
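One way to check this example is to transcribe the table and compute recall and precision directly; treating output 1 as "classified as A" is an assumption consistent with "A" being the positive class:

```python
# Transcription of the table above; output 1 is taken to mean "classified as A".
classes = ["A", "A", "A", "A", "B", "B", "B", "B"]
outputs = [ 1,   1,   1,   0,   1,   0,   0,   0 ]

predicted = ["A" if o == 1 else "B" for o in outputs]

tp = sum(1 for c, p in zip(classes, predicted) if c == "A" and p == "A")
fp = sum(1 for c, p in zip(classes, predicted) if c == "B" and p == "A")
fn = sum(1 for c, p in zip(classes, predicted) if c == "A" and p == "B")

recall = tp / (tp + fn)        # fraction of actual "A"s predicted as "A"
precision = tp / (tp + fp)     # fraction of predicted "A"s that are actually "A"
print(recall, precision)
```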

21 Creating a Precision vs. Recall Curve
[Figure: results of the classifier (scores for each test instance).]

Threshold   Accuracy   Precision   Recall
.9
.8
.7
.6
.5
.4
.3
.2
.1
-∞
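A sketch of how such a table can be filled in programmatically, assuming the classifier produces a real-valued score per instance and that an instance is called "positive" when its score meets the threshold (both assumptions, since the slide's score figure is not reproduced above):

```python
import numpy as np

def precision_recall_curve(scores, labels, thresholds):
    """Compute (precision, recall) at each threshold.

    scores: real-valued classifier outputs; labels: 1 = positive, 0 = negative.
    Assumes at least one positive example in the data.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    points = []
    for t in thresholds:
        pred_pos = scores >= t
        tp = np.sum(pred_pos & (labels == 1))
        fp = np.sum(pred_pos & (labels == 0))
        fn = np.sum(~pred_pos & (labels == 1))
        # Common convention: precision is 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
        recall = tp / (tp + fn)
        points.append((precision, recall))
    return points

# Thresholds matching the slide's table, from .9 down to .1 and then -infinity.
thresholds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, -np.inf]
```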

22 Multiclass classification: Mean Average Precision
"Average precision" for a class: area under the precision/recall curve for that class.
Mean average precision: mean of the average precision over all classes.
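One way to approximate the area under a precision/recall curve is trapezoidal integration over recall; this sketch builds on the hypothetical precision_recall_curve above, and other common definitions of average precision (e.g., interpolated AP) differ in the details:

```python
import numpy as np

def average_precision(points):
    """Approximate area under a precision/recall curve.

    points: list of (precision, recall) pairs, e.g. from the
    precision_recall_curve sketch above.
    """
    pts = sorted(points, key=lambda pr: pr[1])      # sort by recall, left to right
    recalls = np.array([r for _, r in pts])
    precisions = np.array([p for p, _ in pts])
    return float(np.trapz(precisions, recalls))     # trapezoidal area estimate

def mean_average_precision(per_class_points):
    """Mean of average precision over all classes (mAP)."""
    return float(np.mean([average_precision(p) for p in per_class_points]))
```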

23 ROC Analysis

24 [ROC curve figure: y-axis is the true positive rate ("recall" or "sensitivity"); x-axis is the false positive rate (1 − "specificity").]

25

26 Creating a ROC Curve
[Figure: results of the classifier (scores for each test instance).]

Threshold   Accuracy   TPR   FPR
.9
.8
.7
.6
.5
.4
.3
.2
.1
-∞

27

28 How to create a ROC curve for a binary classifier
Run your classifier on each instance in the test data and calculate its score.
Get the range [min, max] of your scores.
Divide the range into about 200 thresholds, including −∞ and max.
For each threshold, calculate TPR and FPR.
Plot TPR (y-axis) vs. FPR (x-axis).
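A sketch of these steps with NumPy and matplotlib; the score convention (higher score = more positive) and the plotting details are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_roc_curve(scores, labels, n_thresholds=200):
    """Plot a ROC curve from classifier scores.

    scores: real-valued classifier outputs; labels: 1 = positive, 0 = negative.
    Assumes the data contains at least one positive and one negative example.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Thresholds spanning the score range, plus -inf so every instance is
    # eventually predicted positive (giving the (1, 1) corner of the curve).
    thresholds = np.concatenate(
        ([-np.inf], np.linspace(scores.min(), scores.max(), n_thresholds)))
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    tpr, fpr = [], []
    for t in thresholds:
        pred_pos = scores >= t
        tpr.append(np.sum(pred_pos & (labels == 1)) / n_pos)   # true positive rate
        fpr.append(np.sum(pred_pos & (labels == 0)) / n_neg)   # false positive rate
    plt.plot(fpr, tpr)
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.show()
```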

29 In-Class Exercise 2

30 Precision/Recall vs. ROC Curves
Precision/Recall curves typically used in “detection” or “retrieval” tasks, in which positive examples are relatively rare. ROC curves used in tasks where positive/negative examples are more balanced.

31 Bias, Variance, and Noise

32 Bias, Variance, and Noise
A classifier’s error can be separated into three components: bias, variance, and noise
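For squared-error loss this decomposition can be written explicitly; the following standard form is given as background and is not taken from the slides:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  \;+\; \underbrace{\sigma^2}_{\text{noise}}
```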

33 Bias: Related to ability of model to learn function
High bias: the model is not powerful enough to learn the function; it learns a less-complicated function (underfitting).
Low bias: the model is too powerful; it learns a too-complicated function (overfitting).
From eecs.oregonstate.edu/~tgd/talks/BV.ppt

34 Variance: How much a learned function will vary if different training sets are used
Same experiment, repeated with 50 samples of 20 points each. From eecs.oregonstate.edu/~tgd/talks/BV.ppt
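A small sketch of the kind of repeated experiment this slide describes: the same model is fit to 50 training sets of 20 points each, and the spread of the learned functions is measured. The target function (a noisy sine), the noise level, and the polynomial degree are illustrative assumptions, not the slide's data:

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 100)

def sample_training_set(n=20):
    """Draw n noisy points from an assumed target function sin(2*pi*x)."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)   # noise term
    return x, y

# Fit the same model (here a degree-5 polynomial) to 50 different training sets.
predictions = []
for _ in range(50):
    x, y = sample_training_set()
    coeffs = np.polyfit(x, y, deg=5)
    predictions.append(np.polyval(coeffs, x_grid))
predictions = np.array(predictions)

# Variance: how much the learned functions disagree with their own mean.
variance = np.mean(np.var(predictions, axis=0))
# Bias^2: how far the average learned function is from the true function.
bias_sq = np.mean((np.mean(predictions, axis=0) - np.sin(2 * np.pi * x_grid)) ** 2)
print("variance:", variance, "bias^2:", bias_sq)
```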

35 Noise: Underlying process generating data is stochastic, or data has errors or outliers
From eecs.oregonstate.edu/~tgd/talks/BV.ppt

36 Examples of causes of bias?
Examples of causes of variance?
Examples of causes of noise?

37 From http://lear.inrialpes.fr/~verbeek/MLCR.10.11.php

38

