
1 A Projection-Based Framework for Classifier Performance Evaluation Nathalie Japkowicz *+ Pritika Sanghi + Peter Tischer + * SITE, University of Ottawa + CSSE, Monash University

2 Some Shortcomings of Traditional Classifier Evaluation Measures
Evaluation measures such as Accuracy, Precision/Recall, Sensitivity/Specificity, F-measure, etc. suffer from the following problems:
- They summarize the performance results into one or two numbers, thus losing a lot of important information.
- They do not always apply to multi-class domains.
- They do not aggregate well when the performance of a classifier is considered over a number of different domains.

3 A New Framework for Classifier Evaluation
Classifier evaluation can be viewed as a problem of analyzing high-dimensional data.
The performance measures currently used are but one class of projections that can be applied to these data.
Why not apply other projections (standard or not) to the data, with various kinds of distance measures (standard or not)?

4 Some Advantages of this New Framework
Projection approaches are typically intended for visualization. This yields two advantages:
- A quick and easy way for human beings to assess classifier performance results.
- The possibility of offering simultaneous multiple views of classifier performance evaluation.
The framework offers a solution to the problem of aggregating the results obtained by a classifier on several domains.
The framework offers a way to deal with multi-class domains.

5 The Framework and its Implementation
The framework is implemented as a sequence of the following steps:
1. All the classifiers in the study are run on all the domains of the study.
2. The performance matrices (e.g., confusion matrices) of a single classifier on every domain are aggregated into a single vector. This is repeated for each classifier.
3. A projection and a distance measure for that projection are chosen and applied to the vectors of Step 2.
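A minimal sketch of Step 2, assuming each confusion matrix is available as a NumPy array; the domain names and cell counts are hypothetical and only illustrate the aggregation.

```python
import numpy as np

def performance_vector(confusion_matrices):
    """Step 2: concatenate a classifier's confusion matrices
    (one per domain) into a single evaluation vector."""
    return np.concatenate([cm.ravel() for cm in confusion_matrices])

# Hypothetical confusion matrices for one classifier on two domains
# (rows: predicted Yes/No, columns: true Pos/Neg).
cm_domain1 = np.array([[82, 17],
                       [12, 114]])
cm_domain2 = np.array([[15, 5],
                       [25, 231]])

vec = performance_vector([cm_domain1, cm_domain2])
print(vec)  # one 8-dimensional point representing this classifier
```

Step 3 then treats each such vector as a point in a high-dimensional space and projects it, as described on the following slides.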

6 Illustration of the Framework
[Figure: confusion matrices obtained by a single classifier on three domains, shown alongside the confusion matrix of the ideal classifier.]

7 A Few Remarks about our Framework
It decomposes the evaluation problem neatly, separating the issue of the projection from that of the distance measure.
By going from a projection into a one-dimensional space (or two such projections) to a projection into a two-dimensional space, we allow two relationships, rather than one, to be established:
- The ranking of classifiers with respect to the ideal classifier.
- The comparison of each classifier to the others.

8 Specific Implementation Details
Our framework can be used with any projection technique and any distance function associated with the projection.
In this work, we experimented with:
- A Minimal Cost Spanning Tree (MCST) distance-preserving projection [Yang, 2004].
- Two distance functions: the Euclidean distance (L2-norm) and the Manhattan distance (L1-norm). In our experiments, the L2-norm is the one used unless otherwise specified.
We focus on the results obtained
- when combining the confusion matrices of various classifiers on several domains,
- when dealing with a multi-class domain.
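A short sketch of the two distance functions, applied to the evaluation vector from the earlier example and to the corresponding ideal vector (every instance on the diagonal of its confusion matrix); the numbers are hypothetical.

```python
import numpy as np

def l2_distance(u, v):
    """Euclidean (L2-norm) distance between two evaluation vectors."""
    return np.linalg.norm(u - v, ord=2)

def l1_distance(u, v):
    """Manhattan (L1-norm) distance between two evaluation vectors."""
    return np.linalg.norm(u - v, ord=1)

classifier = np.array([82, 17, 12, 114, 15, 5, 25, 231], dtype=float)
ideal      = np.array([94, 0, 0, 131, 40, 0, 0, 236], dtype=float)

print(l2_distance(classifier, ideal))  # sensitive to large individual errors
print(l1_distance(classifier, ideal))  # each misclassified instance counts twice:
                                       # once off the diagonal, once missing from it
```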

9 The MCST Projection Approach
Plot the ideal classifier and label it c_0.
For i = 1 to the number of classifiers:
- Select the unplotted classifier closest to c_{i-1} and label it c_i.
- Plot c_i, preserving the following constraints:
  - D(P(c_i), P(c_{i-1})) = D(c_i, c_{i-1})
  - D(P(c_i), P(c_0)) = D(c_i, c_0)
  - D(P(c_i), P(c_{i-2})) = D(c_i, c_{i-2}) (constraint used only if the first two constraints allow for two possibilities)
Here D(x, y) is the distance between x and y, P(c_i) is the projection of classifier c_i, and a classifier c_i is represented by the concatenation of the confusion matrices it obtained on one or several domains.
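The loop above lends itself to a direct implementation. Below is a minimal Python sketch, assuming the classifiers are given as NumPy evaluation vectors (concatenated confusion matrices); the candidate 2D positions are found by intersecting the circle around P(c_0) with the circle around P(c_{i-1}), and the third constraint is used only to choose between the two intersection points. This is an illustrative reconstruction under those assumptions, not the authors' original code.

```python
import numpy as np

def mcst_projection(vectors, ideal, dist=lambda u, v: np.linalg.norm(u - v)):
    """Sketch of an MCST-style distance-preserving projection into 2D.

    vectors : dict mapping classifier name -> evaluation vector
    ideal   : evaluation vector of the ideal classifier
    dist    : distance function (L2 by default; swap in an L1 version if desired)
    Returns a dict mapping name -> (x, y), with the ideal classifier at the origin.
    """
    coords = {"Ideal": np.zeros(2)}
    points = {"Ideal": np.asarray(ideal, dtype=float)}
    order = ["Ideal"]
    remaining = {k: np.asarray(v, dtype=float) for k, v in vectors.items()}

    while remaining:
        prev = order[-1]
        # Select the unplotted classifier closest to the last one plotted.
        name = min(remaining, key=lambda n: dist(remaining[n], points[prev]))
        v = remaining.pop(name)
        points[name] = v

        r0 = dist(v, points["Ideal"])   # distance to the ideal classifier
        r1 = dist(v, points[prev])      # distance to the previously plotted one
        if len(order) == 1:
            # First classifier: place it on the x-axis at its distance to Ideal.
            coords[name] = np.array([r0, 0.0])
        else:
            # Intersect the circle of radius r0 around P(Ideal) with the
            # circle of radius r1 around P(c_{i-1}); two candidate positions.
            A, B = coords["Ideal"], coords[prev]
            d = np.linalg.norm(B - A)
            a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)
            h = np.sqrt(max(r0 ** 2 - a ** 2, 0.0))  # clamp for numerical safety
            u = (B - A) / d
            mid = A + a * u
            perp = np.array([-u[1], u[0]])
            candidates = [mid + h * perp, mid - h * perp]
            # Break the tie with the distance to the classifier plotted two steps back.
            back = order[-2]
            target = dist(v, points[back])
            coords[name] = min(
                candidates,
                key=lambda p: abs(np.linalg.norm(p - coords[back]) - target))
        order.append(name)
    return coords
```

Plotting the returned coordinates, with the Ideal point highlighted, produces the kind of 2D maps shown on the following slides.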

10 Experimental Set-Up
We tested our approach on four UCI domains:
- 3 binary ones: Breast Cancer, Labour and Liver.
- 1 multi-class one: Anneal.
We compared the performance of eight WEKA classifiers on these domains: NB, J48, IBk, JRip, SMO, Bagging, AdaBoost, RandFor.
The focus of our study is not to discover which classifier wins or loses on our data sets. Rather, we use our experiments to illustrate the advantages and disadvantages of our framework over other evaluation methods. We therefore used WEKA's default parameters in all cases.
Also, though we test our approach with the MCST projection, other projections could have been used. This is also true of our distance functions.

11 Illustration on Multiple Domains: Breast Cancer, Labour and Liver
[Table: Accuracy, F-Measure and AUC of NB, SMO and AdaBoost on Breast Cancer (BC), Labour (La) and Liver (Li), together with their per-classifier averages. Projection plot labels: Ideal, NB, SVM (SMO).]
Abnormality detection with our new approach is much easier and more accurate than when relying on Accuracy, F-Measure, or AUC listings on each domain, or on their averages over all domains. Also, our new approach allows us to mix binary and multi-class domains; averaging does not!

12 Illustration on a Multi-Class Domain: Anneal (L2-Norm)
[Figure: confusion matrices of AdaBoost and NB on Anneal over the six classes a-f. Projection plot labels: 8: AdaBoost, 9: NB, 1: Ideal; the other classifiers are also shown.]
Accuracy does not tell us whether NB and AdaBoost make the same kind of errors!

13 Illustration on Anneal Using the L1-Norm
When using the L2-norm, NB and AdaBoost were at approximately the same distance to Ideal. When using the L1-norm, NB is significantly closer. This suggests that NB makes fewer errors than AdaBoost, but that the majority of its errors are concentrated on one or several large classes.
[Projection plot labels: 1: Ideal, 8: NB, 9: AdaBoost.]
Because our new evaluation framework is visual in nature, we can quickly compare the results obtained using various distance measures (evaluation measures), and thus interpret our results in a more informed manner. It is easier done this way than by staring at large tables of numbers!
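To see why the two norms can disagree, here is a small worked example with made-up off-diagonal error counts: the two error profiles have the same L1 distance to the ideal, but the concentrated one is much farther away under the L2-norm.

```python
import numpy as np

# Hypothetical off-diagonal error counts for two classifiers on a
# multi-class domain (deviations from the ideal confusion matrix).
spread       = np.array([2, 2, 2, 2, 2, 2])   # 12 errors spread over 6 cells
concentrated = np.array([12, 0, 0, 0, 0, 0])  # 12 errors in a single cell

for name, e in [("spread", spread), ("concentrated", concentrated)]:
    print(name,
          "L1 =", int(np.abs(e).sum()),                # 12 in both cases
          "L2 =", round(float(np.linalg.norm(e)), 2))  # ~4.9 vs 12.0
```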

14 Summary
We present a new framework for classifier evaluation that recognizes that classifier evaluation consists of projecting high-dimensional data into a low-dimensional space.
By using a projection into a two-dimensional space rather than a one-dimensional one, we propose a visualization approach to the problem. This allows for quick assessments of the classifiers' behaviour based on the results obtained using multiple performance measures.
Each entry of the evaluation vectors we project is compared in pair-wise fashion to its equivalent in the other vectors. Thus, our aggregation technique is more precise than that used with traditional performance measures. This is an advantage when considering results over various domains, or in the case of multi-class domains.

15 Future Work
As presented, our approach seems limited to the comparison of single classifier performances.
- How about threshold-insensitive classifiers?
- How about the computation of statistical guarantees on our results?
This can be addressed by plotting either the results obtained at various thresholds, or the results obtained at the various folds of a cross-validation regimen, thus plotting clouds of classifiers that can then be analyzed.
We also plan to experiment with other distance measures and projection methods.