Information Organization: Evaluation of Classification Performance.

Classification: Evaluation Measures

Confusion matrix (rows = Predicted, columns = Actual):

                 Actual T    Actual F
   Predicted T      A           B
   Predicted F      C           D

   A = True Positive, B = False Positive, C = False Negative, D = True Negative

- Sensitivity = A / (A+C)
  probability that the classifier result is correct given that the item actually belongs to T (similar to Recall)
- Specificity = D / (B+D)
  probability that the classifier result is correct given that the item actually belongs to F
- Positive Predictive Value (PPV) = A / (A+B)
  probability that the classifier result is correct given that the prediction is T (similar to Precision)
- Negative Predictive Value (NPV) = D / (C+D)
  probability that the classifier result is correct given that the prediction is F
- Accuracy = (A+D) / (A+B+C+D)
  probability that the classifier result is correct
- F-measure = 2A / (2A+B+C)
  the harmonic mean of Precision and Recall
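All of these measures follow directly from the four confusion-matrix cells. As an illustrative sketch (not part of the original slides), a small Python helper such as the hypothetical `classification_measures` below can compute them in one place; the argument names A, B, C, D mirror the table above.

```python
def classification_measures(A, B, C, D):
    """Evaluation measures from the confusion-matrix cells:
    A = true positives, B = false positives, C = false negatives, D = true negatives."""
    return {
        "sensitivity (recall)": A / (A + C),
        "specificity":          D / (B + D),
        "PPV (precision)":      A / (A + B),
        "NPV":                  D / (C + D),
        "accuracy":             (A + D) / (A + B + C + D),
        "F-measure":            2 * A / (2 * A + B + C),  # harmonic mean of precision and recall
    }
```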

Classifier Evaluation: Examples

Scenario 1
- A ball sorter (classifier) with 99% sensitivity and 98% specificity puts 1000 green and 1000 white tennis balls into a T-bin (with color) and an F-bin (without color).
  A = 990 green balls in the T-bin (true positives)
  C = 10 green balls in the F-bin (false negatives)
  D = 980 white balls in the F-bin (true negatives)
  B = 20 white balls in the T-bin (false positives)
- Positive Predictive Value = 990 / (990+20) = 98.02%
- Negative Predictive Value = 980 / (10+980) = 98.99%
- Accuracy = (990+980) / 2000 = 98.5%
- F-score = (2*990) / (990+10+990+20) = 98.51%

Scenario 2
- A ball sorter (classifier) with 99% sensitivity and 98% specificity puts 100 green and 1900 white tennis balls into a T-bin (with color) and an F-bin (without color).
  A = 99 green balls in the T-bin (true positives)
  C = 1 green ball in the F-bin (false negative)
  D = 1862 white balls in the F-bin (true negatives)
  B = 38 white balls in the T-bin (false positives)
- Positive Predictive Value = 99 / (99+38) = 72.26%
- Negative Predictive Value = 1862 / (1+1862) = 99.95%
- Accuracy = (99+1862) / 2000 = 98.05%
- F-score = (2*99) / (99+1+99+38) = 83.54%

Reference formulas (from the confusion matrix above):
  Sensitivity = A / (A+C)        Specificity = D / (B+D)
  PPV = A / (A+B)                NPV = D / (C+D)
  Accuracy = (A+D) / (A+B+C+D)   F-score = 2A / (A+C+A+B)
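As a short sketch (not from the original slides) of how the Scenario 2 figures arise, the script below applies the 99% sensitivity and 98% specificity rates to the two class counts and then evaluates the same formulas; `round` is only there to guard against floating-point representation.

```python
# Scenario 2: 100 green (T) and 1900 white (F) balls,
# sorted by a classifier with 99% sensitivity and 98% specificity.
A = round(0.99 * 100)      # true positives:    99 green balls in the T-bin
C = 100 - A                # false negatives:    1 green ball  in the F-bin
D = round(0.98 * 1900)     # true negatives:  1862 white balls in the F-bin
B = 1900 - D               # false positives:   38 white balls in the T-bin

ppv      = A / (A + B)                 # 99/137    ~ 72.26%
npv      = D / (C + D)                 # 1862/1863 ~ 99.95%
accuracy = (A + D) / (A + B + C + D)   # 1961/2000 = 98.05%
f_score  = 2 * A / (2 * A + B + C)     # 198/237   ~ 83.54%
print(ppv, npv, accuracy, f_score)
```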

Classifier Evaluation: Multiple Classes

- Macro-Averaging
  compute performance for each class, then average over all classes
  gives equal weight to each class – can be unduly influenced by small categories
  e.g., accuracy = (0.95 + 0.21) / 2 = 0.58
- Micro-Averaging
  build one confusion matrix for all classes (sum the per-class tables), then compute performance once
  gives equal weight to each item (e.g., document) – can be dominated by large categories
  e.g., accuracy = 200/600 = 0.33

Class 1 (100 items): Accuracy = 95/100 = 0.95
                 Actual T    Actual F
   Predicted T      35          3
   Predicted F       2         60

Class 2 (500 items): Accuracy = 105/500 = 0.21
                 Actual T    Actual F
   Predicted T      65        360
   Predicted F      35         40

Micro-average table (600 items): Accuracy = 200/600 = 0.33
                 Actual T    Actual F
   Predicted T     100        363
   Predicted F      37        100
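A minimal sketch (assumed Python, not from the slides) makes the macro vs. micro distinction concrete, representing each table above as an (A, B, C, D) tuple; the `accuracy` helper is an illustrative name.

```python
# Per-class confusion matrices as (A, B, C, D) = (TP, FP, FN, TN), taken from the slide.
class_1 = (35, 3, 2, 60)      # 100 items
class_2 = (65, 360, 35, 40)   # 500 items

def accuracy(A, B, C, D):
    return (A + D) / (A + B + C + D)

# Macro-average: average the per-class accuracies (equal weight per class).
macro = (accuracy(*class_1) + accuracy(*class_2)) / 2     # (0.95 + 0.21) / 2 = 0.58

# Micro-average: sum the matrices cell-wise, then score once (equal weight per item).
pooled = tuple(a + b for a, b in zip(class_1, class_2))   # (100, 363, 37, 100)
micro = accuracy(*pooled)                                 # 200/600 ~ 0.33
print(macro, micro)
```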

n-Fold Cross Validation

- Cross Validation
  a technique for estimating classifier performance
  reduces variability, giving a more accurate estimate of classifier performance
- n-fold cross validation
  construct n sets of training/test data, where
  – the training portion = (n-1)/n of the whole training data
  – the test portion = 1/n of the whole training data (e.g., 3-fold cross validation)
  compute the classification rate on each set for each classifier
  – assess the consistency of the n classification rates
    » if they are not consistent, there is a problem with the data or the classifier
  – compute the average of the n classification rates to compare with other classifiers
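The following is a dependency-free sketch of the n-fold procedure described above, not a definitive implementation. The `train_and_score(train, test)` callback is a hypothetical placeholder: it is assumed to fit a classifier on `train` and return its classification rate on `test`.

```python
import random

def n_fold_cross_validation(data, n, train_and_score, seed=0):
    """Split `data` into n folds; each fold is held out once as the test set
    while the remaining (n-1)/n of the data is used for training."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::n] for i in range(n)]      # n roughly equal folds
    rates = []
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        rates.append(train_and_score(train, test))
    # Inspect `rates` for consistency (inconsistent rates suggest a data or
    # classifier problem); the average is what gets compared across classifiers.
    return rates, sum(rates) / n
```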

Harmonic Mean

- Harmonic Mean
  a type of average, e.g., an average of rates
  the inverse of the mean of the inverses
  i.e., the count of numbers divided by the sum of their reciprocals
- F-measure
  the harmonic mean of Precision and Recall: F = 2PR / (P+R)

Example: round-trip speed
  A to B: x miles
  50 miles/hr from A to B: x/50 hrs
  20 miles/hr from B to A: x/20 hrs
  avg. speed = total distance / total time = 2x / (x/50 + x/20) = 2 / (1/50 + 1/20) ≈ 28.6 miles/hr
  (the harmonic mean of 50 and 20, not the arithmetic mean 35)
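To tie the pieces together, here is a short sketch (assumed Python; `harmonic_mean` is an illustrative helper, not a library function) that checks the round-trip example and the F-measure interpretation against the Scenario 2 figures from earlier.

```python
def harmonic_mean(values):
    """Count of numbers divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Round-trip average speed: the harmonic mean of 50 and 20 mph (~28.6), not 35.
print(harmonic_mean([50, 20]))

# F-measure as the harmonic mean of precision and recall (Scenario 2 figures).
precision, recall = 99 / 137, 99 / 100
print(harmonic_mean([precision, recall]))   # ~0.8354, matching the 83.54% F-score
```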