Classification Performance Evaluation

How do you know that you have a good classifier? Is a feature contributing to overall performance? Is classifier A better than classifier B?
Internal Evaluation:
– Measure the performance of the classifier itself.
External Evaluation:
– Evaluation performed by an agency or individuals not directly involved in or responsible for the program.

Accuracy Easily the most common and intuitive measure of classification performance.
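A minimal sketch of the accuracy computation, i.e. the fraction of predictions that match the true labels (the toy label lists are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy illustration: 4 of the 5 predictions are correct.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))  # 0.8
```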

Significance testing Say I have two classifiers. A = 50% accuracy B = 75% accuracy B is better, right?

Significance Testing Say I have another two classifiers A = 50% accuracy B = 50.5% accuracy Is B better?
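The slide does not name a particular test, but one common option for this question is a paired bootstrap test over per-example correctness on the same test set; a sketch under that assumption:

```python
import random

def paired_bootstrap(correct_a, correct_b, n_boot=10000, seed=0):
    """Estimate how often B fails to beat A when the test set is resampled.

    correct_a, correct_b: parallel lists of 0/1 correctness indicators for
    classifiers A and B on the same test examples.
    Returns the fraction of resamples where B's accuracy is not higher;
    a small value suggests B's advantage is unlikely to be chance.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        if acc_b <= acc_a:
            not_better += 1
    return not_better / n_boot
```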

Basic Evaluation
Training data – used to identify model parameters
Testing data – used for evaluation
Optionally: Development / tuning data – used to identify model hyperparameters
A single train/test split makes it difficult to get significance or confidence values.
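A sketch of such a split; the 80/10/10 proportions are an assumption, not something the slide specifies:

```python
import random

def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve off development and test portions."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    n_dev = int(len(data) * dev_frac)
    test = data[:n_test]
    dev = data[n_test:n_test + n_dev]
    train = data[n_test + n_dev:]
    return train, dev, test
```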

Cross validation
Identify n “folds” of the available data. Train on n-1 folds and test on the remaining fold, repeating so that each fold is held out once. In the extreme (n = N, the number of examples) this is known as “leave-one-out” cross validation. n-fold cross validation (xval) gives n samples of the performance of the classifier.
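A sketch of n-fold cross validation as described; `train_fn` and `evaluate_fn` are hypothetical callbacks standing in for whatever learner and metric are in use:

```python
def cross_validate(examples, n, train_fn, evaluate_fn):
    """Train on n-1 folds, test on the held-out fold; returns n scores."""
    folds = [examples[i::n] for i in range(n)]  # simple round-robin folds
    scores = []
    for i in range(n):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(training)
        scores.append(evaluate_fn(model, held_out))
    return scores
```

Setting n equal to the number of examples gives leave-one-out cross validation.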

Types of Errors
False Positives – the system predicted TRUE but the value was FALSE – aka “False Alarms” or Type I errors
False Negatives – the system predicted FALSE but the value was TRUE – aka “Misses” or Type II errors

Problems with accuracy

Contingency table:

                  True: Positive        True: Negative
Hyp: Positive     True Positive (TP)    False Positive (FP)
Hyp: Negative     False Negative (FN)   True Negative (TN)
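A sketch that tallies the four cells of the table from boolean true and hypothesized labels:

```python
def contingency_counts(y_true, y_hyp):
    """Return (TP, FP, FN, TN) counts for boolean labels."""
    tp = sum(1 for t, h in zip(y_true, y_hyp) if h and t)
    fp = sum(1 for t, h in zip(y_true, y_hyp) if h and not t)
    fn = sum(1 for t, h in zip(y_true, y_hyp) if not h and t)
    tn = sum(1 for t, h in zip(y_true, y_hyp) if not h and not t)
    return tp, fp, fn, tn
```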

Problems with accuracy

Information Retrieval example: find the 10 documents related to a query in a set of 110 documents. A classifier that labels every document negative produces:

                  True: Positive   True: Negative
Hyp: Positive     0                0
Hyp: Negative     10               100

Its accuracy is 100/110 ≈ 91%, even though it retrieves none of the relevant documents.

Problems with accuracy

Precision: how many of the hypothesized positive events were true events. Precision = TP / (TP + FP)
Recall: how many of the true events were identified. Recall = TP / (TP + FN)
F1-Measure: the harmonic mean of precision and recall.

(Computed here on the same retrieval-example contingency table as above.)
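A sketch of these measures from the contingency counts; plugging in the retrieval example (TP=0, FP=0, FN=10) returns 0 for all three, in contrast with its roughly 91% accuracy:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean.
    Undefined ratios are treated as 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

print(precision_recall_f1(0, 0, 10))  # (0.0, 0.0, 0.0)
```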

F-Measure
F-measure can be weighted to favor precision or recall:
– beta > 1 favors recall
– beta < 1 favors precision
(Again evaluated on the retrieval-example contingency table above.)
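In code form, using the standard F-beta definition (assumed here, since the slide's formula did not survive the transcript):

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    beta > 1 weights recall more heavily, beta < 1 weights precision."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0
```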

F-Measure Accuracy is weighted towards majority class performance. F-measure is useful for measuring the performance on both classes.

ROC curves
It is common to plot classifier performance at a variety of settings or thresholds. Receiver Operating Characteristic (ROC) curves plot the true positive rate against the false positive rate as the decision threshold varies. The overall performance is summarized by the Area Under the Curve (AUC).
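A sketch that sweeps a decision threshold over real-valued classifier scores, collects (false positive rate, true positive rate) points, and integrates them with the trapezoid rule to estimate AUC; it assumes both classes appear in the labels:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points for each distinct score threshold."""
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and not t)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoid-rule area under the ROC curve."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```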

What about more than 2 classes
Convert the problem to a set of binary (one class vs. the rest) classification problems.

What about more than 2 classes

a as positive (b & c negative):

                  classified a (pos)   classified b&c (neg)
actual a (pos)    TP = 50              FN = 0
actual b&c (neg)  FP = 0               TN = 100

b as positive (a & c negative):

                  classified b (pos)   classified a&c (neg)
actual b (pos)    TP = 46              FN = 4
actual a&c (neg)  FP = 0               TN = 100

Try c as positive yourself.
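A sketch of the one-vs-rest tallying the slide walks through: each class in turn is treated as positive and every other class as negative:

```python
def one_vs_rest_counts(y_true, y_hyp, positive_class):
    """Binary (TP, FP, FN, TN) counts treating one class as positive."""
    t = [label == positive_class for label in y_true]
    h = [label == positive_class for label in y_hyp]
    tp = sum(1 for a, b in zip(t, h) if a and b)
    fp = sum(1 for a, b in zip(t, h) if not a and b)
    fn = sum(1 for a, b in zip(t, h) if a and not b)
    tn = sum(1 for a, b in zip(t, h) if not a and not b)
    return tp, fp, fn, tn

# e.g. one_vs_rest_counts(true_labels, hyp_labels, "c") for the exercise above
```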