7. Performance Measurement


7. Performance Measurement (Intrusion Detection Module). Stephen Huang, Department of Computer Science, University of Houston. Reference: Tom Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters 27 (2006) 861–874.

Overview: ROC curves, cross validation.

ROC Curves ROC: Receiver Operating Characteristics. ROC curves are insensitive to changes in class distribution. If the proportion of positive to negative instances changes in a test set, the ROC curves will not change. A discrete classifier produces only a single point in ROC space.

Terminologies The confusion matrix relates the true class of an instance to the class hypothesized by the classifier:

                          True class
Hypothesized class        Pos                       Neg
Yes                       True Positives (TP)       False Positives (FP)
No                        False Negatives (FN)      True Negatives (TN)
Column totals             P                         N

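As a minimal sketch (not from the slides; the labels and predictions below are made-up toy data), the following Python code counts TP, FP, FN, and TN and converts them to the single ROC point (FP rate, TP rate) that a discrete classifier occupies:

```python
# Minimal sketch: confusion-matrix counts and the single ROC point of a discrete classifier.
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count TP, FP, FN, TN for a binary classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

y_true = ["Yes", "Yes", "No", "No", "Yes", "No"]    # true classes (toy data)
y_pred = ["Yes", "No",  "No", "Yes", "Yes", "No"]   # hypothesized classes (toy data)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
P, N = tp + fn, fp + tn                              # column totals
tpr, fpr = tp / P, fp / N                            # true/false positive rates
print(f"TP={tp} FP={fp} FN={fn} TN={tn} -> ROC point (FPR={fpr:.2f}, TPR={tpr:.2f})")
```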

Interpreting ROC Curve An ROC graph plots the true positive rate (TP/P) on the y-axis against the false positive rate (FP/N) on the x-axis.

Interpreting ROC Curve The point (0, 1) is the perfect classifier: no false positives and every positive detected. Points on the diagonal correspond to random guessing.

Interpreting ROC Curve The region below the diagonal is expected to be empty, since points there perform worse than random guessing.

Interpreting ROC Curve A classifier below the diagonal can still be useful: negate the classifier! Inverting its decisions reflects its point across the diagonal.

Interpreting ROC Curve Classifiers toward the lower left are conservative (few false positives but few detections); classifiers toward the upper right are liberal (a high detection rate at the cost of more false positives).

Threshold Some classifiers yield an instance probability or score: a numeric value that represents the degree to which an instance is a member of a class. Such a ranking or scoring classifier can be used with a threshold to produce a discrete (binary) classifier: if the score exceeds the threshold, output Yes; otherwise output No. Each such thresholded classifier corresponds to a single point in ROC space, so varying the threshold from -∞ to +∞ traces a curve through ROC space.
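A minimal sketch of this rule; the scores and thresholds below are arbitrary examples, not values from the slides:

```python
# Minimal sketch: a scoring classifier plus a threshold gives a discrete (Yes/No) classifier.
def discretize(scores, threshold):
    """Return 'Yes' where the score exceeds the threshold, else 'No'."""
    return ["Yes" if s > threshold else "No" for s in scores]

scores = [0.90, 0.55, 0.51, 0.38, 0.10]   # arbitrary example scores
print(discretize(scores, 0.5))            # ['Yes', 'Yes', 'Yes', 'No', 'No']
print(discretize(scores, 0.8))            # a stricter threshold yields fewer Yes decisions
```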

Example The table below lists the scores a classifier assigned to 10 positive (p) and 10 negative (n) test instances, each column sorted in decreasing order:

p       n
0.900   0.700
0.800   0.530
0.600   0.520
0.550   0.505
0.540   0.390
0.510   0.370
0.400   0.360
0.380   0.350
0.340   0.330
0.300   0.100

Threshold The same score table is used to sweep a threshold: each distinct score, taken as a threshold, yields one (FP rate, TP rate) point.
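The sketch below uses the example scores from the table above, sweeps a threshold over every distinct score, and collects the resulting (FP rate, TP rate) points. The decision rule "Yes if score >= threshold" and the starting point (0, 0) are the usual conventions, assumed here rather than quoted from the slides:

```python
# Sketch: sweep a threshold over the example scores and collect the ROC points.
p_scores = [0.900, 0.800, 0.600, 0.550, 0.540, 0.510, 0.400, 0.380, 0.340, 0.300]
n_scores = [0.700, 0.530, 0.520, 0.505, 0.390, 0.370, 0.360, 0.350, 0.330, 0.100]
P, N = len(p_scores), len(n_scores)

thresholds = sorted(set(p_scores + n_scores), reverse=True)   # one threshold per distinct score
roc_points = [(0.0, 0.0)]                                     # the curve conventionally starts at (0, 0)
for thr in thresholds:
    tp = sum(1 for s in p_scores if s >= thr)                 # positives classified Yes at this threshold
    fp = sum(1 for s in n_scores if s >= thr)                 # negatives classified Yes at this threshold
    roc_points.append((fp / N, tp / P))                       # (FP rate, TP rate)

for fpr, tpr in roc_points:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```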

Threshold (Figure: the ROC curve traced as the threshold is lowered over the example scores.)

Problem

Precision Precision = TP / (TP + FP): the fraction of instances classified as positive that are truly positive.

Recall Recall = TP / (TP + FN) = TP / P: the fraction of truly positive instances that the classifier detects (the true positive rate).

F-Measure F = 2 · Precision · Recall / (Precision + Recall): the harmonic mean of precision and recall.

Accuracy Accuracy = (TP + TN) / (P + N): the fraction of all instances that are classified correctly.
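A small sketch computing these four metrics from confusion-matrix counts; the counts are illustrative numbers only, not results from the lecture:

```python
# Sketch: precision, recall, F-measure, and accuracy from confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                          # = TP / P, the true positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# Illustrative counts only (e.g. alerts from an intrusion detector on a skewed test set).
prec, rec, f1, acc = metrics(tp=80, fp=20, fn=10, tn=890)
print(f"precision={prec:.3f}  recall={rec:.3f}  F={f1:.3f}  accuracy={acc:.3f}")
```

In this made-up example the accuracy is high mainly because of the many true negatives, which is why precision and recall are often more informative than accuracy on skewed data.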

Produce a ROC (Figure: 4.16% FP.)

Cross Validation Cross-validation is a statistical method for evaluating and comparing learning algorithms by dividing the data into two segments: one used to learn or train a model and the other used to validate it. In typical cross-validation, the training and validation sets cross over in successive rounds so that every data point gets a chance to be validated. The basic form of cross-validation is k-fold cross-validation; other forms are special cases of k-fold cross-validation or involve repeated rounds of it.

Cross Validation There are two possible goals in cross-validation:
1. To estimate the performance of a model learned from the available data with one algorithm, that is, to gauge the generalizability of the algorithm.
2. To compare the performance of two or more algorithms and identify the best one for the available data, or alternatively to compare two or more variants of a parameterized model.
Types of cross-validation procedures:
- Resubstitution validation
- Hold-out validation
- k-fold cross-validation
- Leave-one-out cross-validation
- Repeated k-fold cross-validation

Resubstitution Validation In resubstitution validation, the model is learned from all the available data and then tested on the same set of data. This validation process uses all the available data but suffers seriously from over-fitting. That is, the algorithm might perform well on the available data yet poorly on future unseen test data.
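A minimal sketch of resubstitution validation, assuming scikit-learn is available and using a synthetic dataset (the classifier and numbers are illustrative only):

```python
# Sketch: resubstitution validation trains and tests on the same data (optimistically biased).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)   # synthetic data for illustration
model = DecisionTreeClassifier(random_state=0).fit(X, y)
print("resubstitution accuracy:", model.score(X, y))         # typically near 1.0: an over-fitted estimate
```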

Hold-Out Validation To avoid over-fitting, an independent test set is preferred. A natural approach is to split the available data into two non-overlapping parts: one for training and the other for testing. The test data is held out and not looked at during training. Hold-out validation avoids any overlap between training data and test data, yielding a more accurate estimate of the generalization performance of the algorithm. The downside is that this procedure does not use all the available data, and the results are highly dependent on the choice of training/test split: the instances chosen for the test set may be too easy or too difficult to classify, which can skew the results.

Hold-Out Validation Furthermore, the data in the test set may be valuable for training, and if it is held out, prediction performance may suffer, again leading to skewed results. These problems can be partially addressed by repeating hold-out validation multiple times and averaging the results, but unless the repetition is performed in a systematic manner, some data may appear in the test set multiple times while other data never does, or some data may always fall in the test set and never get a chance to contribute to the learning phase. To deal with these challenges and make full use of the available data, k-fold cross-validation is used.
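A minimal sketch of (repeated) hold-out validation, again assuming scikit-learn and synthetic data:

```python
# Sketch: hold-out validation, repeated over several random splits and averaged.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

scores = []
for seed in range(10):                                       # repeated hold-out with different splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print("mean hold-out accuracy:", sum(scores) / len(scores))
```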

k-fold Cross Validation In k-fold cross-validation the data is first partitioned into k equally (or nearly equally) sized segments, called folds. Then k iterations of training and validation are performed such that in each iteration a different fold is held out for validation while the remaining k - 1 folds are used for learning. The figure below illustrates an example with k = 3: the darker sections of the data are used for training while the lighter sections are used for validation. In data mining and machine learning, 10-fold cross-validation (k = 10) is the most common.

k-fold Cross Validation (Figure: the three training/validation splits for k = 3; in each iteration a different fold is held out for validation.)
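A minimal sketch of 10-fold cross-validation, assuming scikit-learn and synthetic data:

```python
# Sketch: 10-fold cross-validation; each fold is held out once for validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```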

Leave-One-Out Cross-Validation Leave-one-out cross-validation (LOOCV) is the special case of k-fold cross-validation in which k equals the number of instances in the data. In each iteration, all the data except a single observation are used for training, and the model is tested on that single observation. An accuracy estimate obtained with LOOCV is known to be almost unbiased, but it has high variance, leading to unreliable estimates. It is still widely used when the available data are very scarce, especially in bioinformatics, where often only dozens of samples are available.
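A minimal sketch of LOOCV, assuming scikit-learn and a small synthetic sample:

```python
# Sketch: leave-one-out cross-validation; k equals the number of instances.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=50, random_state=0)     # a small sample, where LOOCV is typical
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=LeaveOneOut())
print("LOOCV accuracy estimate:", scores.mean())              # each individual score is 0 or 1
```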

Repeated K-Fold Cross-Validation To obtain a reliable performance estimate or comparison, a large number of estimates is preferred. Plain k-fold cross-validation yields only k estimates. A commonly used way to increase the number of estimates is to run k-fold cross-validation multiple times, reshuffling (and re-stratifying) the data before each round.
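A minimal sketch of repeated, stratified k-fold cross-validation, assuming scikit-learn and synthetic data:

```python
# Sketch: repeated (stratified) k-fold cross-validation; the data is reshuffled before each round.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)   # 5 rounds x 10 folds = 50 estimates
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print("number of estimates:", len(scores), " mean accuracy:", round(scores.mean(), 3))
```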