Performance Metrics for Graph Mining Tasks


Outline
Introduction to Performance Metrics
Supervised Learning Performance Metrics
Unsupervised Learning Performance Metrics
Optimizing Metrics
Statistical Significance Techniques
Model Comparison


Introduction to Performance Metrics A performance metric measures how well a data mining algorithm performs on a given dataset. For example, if we apply a classification algorithm to a dataset, we first check how many of the data points were classified correctly; this proportion is a performance metric, and its formal name is "accuracy." Performance metrics also help us decide whether one algorithm is better or worse than another. For example, if classification algorithm A classifies 80% of the data points correctly and classification algorithm B classifies 90% correctly, we immediately see that B is doing better. There are some intricacies, however, which we discuss in this chapter.


Supervised Learning Performance Metrics
Metrics that are applied when the ground truth is known (e.g., classification tasks).
Outline: 2x2 Confusion Matrix, Multi-level Confusion Matrix, Visual Metrics, Cross-validation

2x2 Confusion Matrix
A 2x2 matrix used to tabulate the results of a 2-class supervised learning problem; entry (i,j) is the number of elements with actual class label i that were predicted to have class label j. Here + and - are the two class labels.

                          Predicted +               Predicted -
Actual +                  f++ (True Positive)       f+- (False Negative)      C = f++ + f+-
Actual -                  f-+ (False Positive)      f-- (True Negative)       D = f-+ + f--
                          A = f++ + f-+             B = f+- + f--             T = f++ + f+- + f-+ + f--

2x2 Confusion Matrix Example
Results from a classification algorithm on 8 vertices:
[Table of actual vs. predicted class labels for vertices 1-8]
Corresponding 2x2 matrix for this table:

                Predicted +   Predicted -
Actual +        4             1             C = 5
Actual -        2             1             D = 3
                A = 6         B = 2         T = 8

True Positives = 4, False Negatives = 1, False Positives = 2, True Negatives = 1
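As a quick illustration (not part of the original slides), the 2x2 matrix above can be reproduced in base R from two label vectors; the vectors below are hypothetical but consistent with the counts in the example (TP = 4, FN = 1, FP = 2, TN = 1):

# Hypothetical actual and predicted labels for 8 vertices,
# chosen only to match the counts in the example above
actual    <- factor(c("+","+","+","+","+","-","-","-"), levels = c("+","-"))
predicted <- factor(c("+","+","+","+","-","+","+","-"), levels = c("+","-"))

# Tabulate: rows = actual class, columns = predicted class
confusion <- table(Actual = actual, Predicted = predicted)
confusion   # reproduces the 4/1/2/1 counts shown above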

2x2 Confusion Matrix Performance Metrics
Walking through the different metrics using the example above:
1. Accuracy is the proportion of correct predictions: (f++ + f--) / T
2. Error rate is the proportion of incorrect predictions: (f+- + f-+) / T
3. Recall is the proportion of "+" data points that are predicted as "+": f++ / (f++ + f+-)
4. Precision is the proportion of data points predicted as "+" that are truly "+": f++ / (f++ + f-+)
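A minimal base-R sketch (not from the original slides) that computes these four metrics from the example matrix, with rows as actual class and columns as predicted class:

# Confusion matrix from the example: rows = actual (+,-), columns = predicted (+,-)
M <- matrix(c(4, 2, 1, 1), nrow = 2)      # f++ = 4, f+- = 1, f-+ = 2, f-- = 1
n_total <- sum(M)

accuracy  <- (M[1, 1] + M[2, 2]) / n_total   # (4 + 1) / 8 = 0.625
error     <- (M[1, 2] + M[2, 1]) / n_total   # (1 + 2) / 8 = 0.375
recall    <- M[1, 1] / sum(M[1, ])           # 4 / 5 = 0.8
precision <- M[1, 1] / sum(M[, 1])           # 4 / 6 = 0.667

c(accuracy = accuracy, error = error, recall = recall, precision = precision)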

Multi-level Confusion Matrix
An n x n matrix, where n is the number of classes and entry (i,j) is the number of elements with actual class label i that were predicted to have class label j.

Multi-level Confusion Matrix Example

                          Predicted                          Marginal Sum
                          Class 1   Class 2   Class 3        of Actual Values
Actual Class 1            2         1         1              4
Actual Class 2            1         2         1              4
Actual Class 3            1         2         3              6
Marginal Sum of
Predictions               4         5         5              T = 14

Multi-level Confusion Matrix: Conversion to 2x2
Treating Class 1 as "+" and all other classes as "-", the multi-level matrix collapses into a 2x2 matrix specific to Class 1 (f++ = 2, f+- = 2, f-+ = 2, f-- = 8), and all of the 2x2 metrics can then be applied:

                          Predicted Class 1 (+)   Predicted Not Class 1 (-)
Actual Class 1 (+)        2                       2                           C = 4
Actual Not Class 1 (-)    2                       8                           D = 10
                          A = 4                   B = 10                      T = 14

For Class 1: Accuracy = 10/14, Error = 4/14, Recall = 2/4, Precision = 2/4
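A small base-R helper (an illustrative sketch, not the PerformanceMetrics API) that collapses an n x n confusion matrix into the 2x2 matrix for one chosen class, matching the Class 1 example above:

# Collapse an n x n confusion matrix into a 2x2 matrix for class `cls`
# (rows = actual, columns = predicted)
collapseToTwoByTwo <- function(cm, cls) {
  fpp <- cm[cls, cls]                    # class cls predicted as cls
  fpm <- sum(cm[cls, ]) - fpp            # class cls predicted as something else
  fmp <- sum(cm[, cls]) - fpp            # other classes predicted as cls
  fmm <- sum(cm) - fpp - fpm - fmp       # other classes predicted as other classes
  matrix(c(fpp, fmp, fpm, fmm), nrow = 2,
         dimnames = list(c("Actual +", "Actual -"),
                         c("Predicted +", "Predicted -")))
}

MultiLevelM <- matrix(c(2, 1, 1, 1, 2, 2, 1, 1, 3), nrow = 3)
collapseToTwoByTwo(MultiLevelM, 1)   # yields the 2, 2, 2, 8 matrix for Class 1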

Multi-level Confusion Matrix Performance Metrics
Using the multi-level matrix above, two per-class metrics are:
1. Critical Success Index (or Threat Score) for class L is the ratio of correct predictions for L to the number of points that either belong to L or are predicted as L, i.e., f_LL / (actual L count + predicted L count - f_LL).
2. Bias for class L is the ratio of the total number of points with class label L to the number of points predicted as L. Bias helps us understand whether a model is over- or under-predicting a class.
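An illustrative base-R computation of the per-class Critical Success Index and bias as described above (assuming the layout rows = actual, columns = predicted; not from the original slides):

MultiLevelM <- matrix(c(2, 1, 1, 1, 2, 2, 1, 1, 3), nrow = 3)

actual_counts    <- rowSums(MultiLevelM)   # points with each actual label: 4, 4, 6
predicted_counts <- colSums(MultiLevelM)   # points predicted as each label: 4, 5, 5
correct          <- diag(MultiLevelM)      # correct predictions per class: 2, 2, 3

# Critical Success Index (Threat Score) per class:
# correct / (actual + predicted - correct)
csi <- correct / (actual_counts + predicted_counts - correct)

# Bias per class, as defined on the slide: actual count / predicted count
bias <- actual_counts / predicted_counts

rbind(csi = csi, bias = bias)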

Confusion Matrix Metrics R-code

library(PerformanceMetrics)
data(M)
M
     [,1] [,2]
[1,]    4    1
[2,]    2    1
twoCrossConfusionMatrixMetrics(M)

data(MultiLevelM)
MultiLevelM
     [,1] [,2] [,3]
[1,]    2    1    1
[2,]    1    2    1
[3,]    1    2    3
multilevelConfusionMatrixMetrics(MultiLevelM)

Visual Metrics: ROC Plot
Metrics that are plotted on a graph to give a visual picture of the performance of two-class classifiers. An ROC plot shows the True Positive Rate (y-axis) against the False Positive Rate (x-axis). Landmark points:
(0,1) - ideal model
(0,0) - predicts the -ve class all the time
(1,1) - predicts the +ve class all the time
The diagonal corresponds to random-guessing models (AUC = 0.5).
Plotting the performance of multiple models on the same ROC plot helps decide which one performs best.
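The slides describe the ROC plot conceptually; as an illustrative base-R sketch (not from the original slides), the (FPR, TPR) points of an ROC curve can be computed from hypothetical classifier scores and true labels by sweeping a threshold:

# Hypothetical scores (higher = more likely "+") and true labels
scores <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10)
labels <- c( "+",  "+",  "-",  "+",  "+",  "-",  "+",  "-")

roc_points <- t(sapply(sort(unique(scores), decreasing = TRUE), function(th) {
  pred <- ifelse(scores >= th, "+", "-")
  tp <- sum(pred == "+" & labels == "+")
  fp <- sum(pred == "+" & labels == "-")
  fn <- sum(pred == "-" & labels == "+")
  tn <- sum(pred == "-" & labels == "-")
  c(FPR = fp / (fp + tn), TPR = tp / (tp + fn))
}))
roc_points   # one (FPR, TPR) pair per threshold; plot(roc_points) draws the ROC points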

Understanding Model Performance Based on the ROC Plot
Upper left: models here have good performance (high true positive rate, low false positive rate). Note: this is where you aim to get the model.
Upper right: models here are liberal; they predict "+" with little evidence, giving high false positives.
Lower left: models here are conservative; they will not predict "+" unless the evidence is strong, giving low false positives but high false negatives.
The diagonal corresponds to random-guessing models (AUC = 0.5).
Below the diagonal: models here perform worse than random. Note: such models can be negated (their predictions flipped) to move them above the diagonal.

ROC Plot Example
Three models plotted as (False Positive Rate, True Positive Rate): M1 (0.1, 0.8), M2 (0.5, 0.5), M3 (0.3, 0.5). M1 lies furthest toward the upper-left corner, closest to the ideal point (0,1), and is hence considered the best model.

Cross-validation
Cross-validation, also called rotation estimation, is a way to analyze how a predictive data mining model will perform on an unseen dataset, i.e., how well the model generalizes.
Strategy:
1. Divide the dataset into two non-overlapping subsets: one is called the "test" set and the other the "training" set.
2. Build the model using the "training" set.
3. Obtain predictions for the "test" set.
4. Use the "test" set predictions to calculate all the performance metrics.
Typically, cross-validation is performed for multiple iterations, selecting a different non-overlapping test and training set each time.

Types of Cross-validation
Hold-out: a random 1/3rd of the data is used as the test set and the remaining 2/3rd as the training set.
k-fold: divide the data into k partitions; use one partition as the test set and the remaining k-1 partitions for training, rotating through all k partitions.
Leave-one-out: a special case of k-fold where k equals the number of data points, so each test set contains exactly one point.
Note: selection of data points is typically done in a stratified manner, i.e., the class distribution in the test set is kept similar to that of the training set.
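A minimal base-R sketch of a plain, non-stratified k-fold split (not from the original slides; the toy data frame and k = 5 are assumptions for illustration):

set.seed(1)
# Toy labeled dataset (hypothetical): 20 rows, one feature and a class label
dataset <- data.frame(x = rnorm(20), label = rep(c("+", "-"), 10))

k <- 5
n <- nrow(dataset)
fold <- sample(rep(1:k, length.out = n))  # randomly assign each row to one of k folds

for (i in 1:k) {
  test_idx  <- which(fold == i)
  train_set <- dataset[-test_idx, ]       # k-1 folds for training
  test_set  <- dataset[test_idx, ]        # held-out fold for testing
  # fit the model on train_set, predict on test_set,
  # and accumulate the chosen performance metrics here
}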


Unsupervised Learning Performance Metrics
Metrics that are applied when the ground truth is not always available (e.g., clustering tasks).
Outline: Evaluation Using Prior Knowledge, Evaluation Using Cluster Properties

Evaluation Using Prior Knowledge
One way to test the effectiveness of an unsupervised learning method is to take a dataset D with known class labels, strip the labels, and provide the data as input to an unsupervised learning algorithm U. The resulting clusters are then compared with the prior knowledge (the true labels) to judge the performance of U.
Two ways to evaluate performance: the Contingency Table and the Ideal and Observed Matrices.

Contingency Table

                      Same Cluster    Different Cluster
Same Class            u11             u10
Different Class       u01             u00

(A) To fill the table, initialize u11, u10, u01, u00 to 0.
(B) Then, for each pair of points (v,w):
    - if v and w belong to the same class and the same cluster, increment u11
    - if v and w belong to the same class but different clusters, increment u10
    - if v and w belong to different classes but the same cluster, increment u01
    - if v and w belong to different classes and different clusters, increment u00
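A direct base-R translation of steps (A)-(B) (an illustrative sketch, not the PerformanceMetrics implementation), given a vector of true class labels and a vector of cluster assignments; the two vectors below are hypothetical:

class_labels <- c(1, 1, 1, 2, 2, 2)   # known classes
cluster_ids  <- c(1, 1, 2, 2, 2, 1)   # clusters returned by the algorithm

u11 <- u10 <- u01 <- u00 <- 0
n <- length(class_labels)
for (v in 1:(n - 1)) {
  for (w in (v + 1):n) {
    same_class   <- class_labels[v] == class_labels[w]
    same_cluster <- cluster_ids[v]  == cluster_ids[w]
    if (same_class  &&  same_cluster) u11 <- u11 + 1
    if (same_class  && !same_cluster) u10 <- u10 + 1
    if (!same_class &&  same_cluster) u01 <- u01 + 1
    if (!same_class && !same_cluster) u00 <- u00 + 1
  }
}
c(u11 = u11, u10 = u10, u01 = u01, u00 = u00)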

Contingency Table Performance Metrics
Example matrix:

                      Same Cluster    Different Cluster
Same Class            9               4
Different Class       3               12

The Rand Statistic, also called the simple matching coefficient, gives equal importance to placing a pair of points with the same class label in the same cluster and to placing a pair of points with different class labels in different clusters, i.e., it accounts for both the specificity and the sensitivity of the clustering: Rand = (u11 + u00) / (u11 + u10 + u01 + u00).
The Jaccard Coefficient can be used when placing a pair of points with the same class label in the same cluster is what primarily matters: Jaccard = u11 / (u11 + u10 + u01).
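Applying these two formulas to the example matrix above in base R (an illustrative computation, not the package call shown on the R-code slide):

u11 <- 9; u10 <- 4; u01 <- 3; u00 <- 12

rand    <- (u11 + u00) / (u11 + u10 + u01 + u00)  # (9 + 12) / 28 = 0.75
jaccard <- u11 / (u11 + u10 + u01)                # 9 / 16 = 0.5625
c(rand = rand, jaccard = jaccard)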

Ideal and Observed Matrices
Given that the number of points is T, the ideal matrix is a T x T matrix where cell (i,j) is 1 if points i and j belong to the same class and 0 if they belong to different classes. The observed matrix is a T x T matrix where cell (i,j) is 1 if points i and j belong to the same cluster and 0 if they belong to different clusters.
The Mantel Test is a statistical test of the correlation between two matrices of the same dimensions. The two matrices in this case are symmetric, so it is sufficient to analyze the lower or upper triangle of each matrix.
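An illustrative base-R construction of the ideal and observed matrices from hypothetical class and cluster vectors; the correlation of the lower triangles gives the Mantel correlation statistic, while the permutation-based significance test (available in third-party packages) is not shown:

class_labels <- c(1, 1, 1, 2, 2, 2)   # known classes (prior knowledge)
cluster_ids  <- c(1, 1, 2, 2, 2, 1)   # clusters returned by the algorithm

# T x T matrices: 1 if the pair shares a class / cluster, 0 otherwise
ideal_matrix    <- outer(class_labels, class_labels, FUN = "==") * 1
observed_matrix <- outer(cluster_ids,  cluster_ids,  FUN = "==") * 1

# Both matrices are symmetric, so the lower triangles carry all the information
cor(ideal_matrix[lower.tri(ideal_matrix)],
    observed_matrix[lower.tri(observed_matrix)])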

Evaluation Using Prior Knowledge R-code

library(PerformanceMetrics)
data(ContingencyTable)
ContingencyTable
     [,1] [,2]
[1,]    9    4
[2,]    3   12
contingencyTableMetrics(ContingencyTable)

Evaluation Using Cluster Properties
In the absence of prior knowledge we have to rely on information from the clusters themselves to evaluate performance.
Cohesion measures how closely objects in the same cluster are related.
Separation measures how distinct or separated a cluster is from all the other clusters.
Here, gi refers to cluster i, W is the total number of clusters, x and y are data points, and proximity can be any similarity measure (e.g., cosine similarity). We want the cohesion to be close to 1 and the separation to be close to 0.
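The formulas referenced above did not survive in this transcript; one common formulation consistent with the description (an assumption, using pairwise-averaged similarity so that values fall between 0 and 1) is:

\mathrm{cohesion}(g_i) = \frac{1}{|g_i|^2} \sum_{x \in g_i} \sum_{y \in g_i} \mathrm{proximity}(x, y)

\mathrm{separation}(g_i, g_j) = \frac{1}{|g_i|\,|g_j|} \sum_{x \in g_i} \sum_{y \in g_j} \mathrm{proximity}(x, y)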


Optimizing Metrics
Performance metrics that act as optimization functions for a data mining algorithm.
Outline: Sum of Squared Errors, Preserved Variability

Sum of Squared Errors
The sum of squared errors (SSE) is typically used in clustering algorithms to measure the quality of the clusters obtained. It takes into consideration the distance from each point in a cluster to its cluster center (the centroid or some other chosen representative). For a point dj in cluster gi, where mi is the cluster center of gi and W is the total number of clusters, SSE is defined as:

\mathrm{SSE} = \sum_{i=1}^{W} \sum_{d_j \in g_i} \mathrm{dist}(m_i, d_j)^2

This value is small when points are close to their cluster centers, indicating a good clustering; a large SSE indicates a poor clustering. Thus, clustering algorithms aim to minimize SSE.
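A small base-R illustration (not from the original slides) that computes SSE directly from points, cluster assignments, and centroids; for k-means specifically, the same quantity is reported by kmeans() as tot.withinss:

# Toy 2-D data with a cluster assignment for each point
X       <- matrix(c(1, 1,  1.2, 0.8,  5, 5,  5.3, 4.9), ncol = 2, byrow = TRUE)
cluster <- c(1, 1, 2, 2)

# Centroid of each cluster
centroids <- rbind(colMeans(X[cluster == 1, , drop = FALSE]),
                   colMeans(X[cluster == 2, , drop = FALSE]))

# SSE: squared distance from every point to its own cluster center, summed
sse <- sum((X - centroids[cluster, ])^2)
sse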

Preserved Variability
Preserved variability is typically used in eigenvector-based dimension reduction techniques to quantify the variance preserved by the chosen dimensions. The objective of the dimension reduction technique is to maximize this parameter. Given that the points are originally represented in r dimensions and k dimensions are kept (k << r), with eigenvalues λ1 >= λ2 >= ... >= λr-1 >= λr, the preserved variability (PV) is calculated as:

\mathrm{PV} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{r} \lambda_i}

The value of this parameter depends on the number of dimensions chosen: the more dimensions included, the higher the value. Choosing all r dimensions results in the perfect score of 1.
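An illustrative base-R computation (not from the original slides) using principal components: the eigenvalues of the covariance matrix are the squared standard deviations reported by prcomp, so PV for the first k components is their share of the total:

X <- matrix(rnorm(200), ncol = 4)        # toy data: 50 points in r = 4 dimensions
pc <- prcomp(X)
lambda <- pc$sdev^2                      # eigenvalues, sorted in decreasing order

k <- 2
pv <- sum(lambda[1:k]) / sum(lambda)     # preserved variability for the top k dimensions
pv                                       # equals 1 when k = r (all dimensions kept)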


Statistical Significance Techniques
Methods used to assess a p-value for the different performance metrics.
Scenario: suppose we obtain a cohesion of 0.99 for clustering algorithm A. At first look, 0.99 feels like a very good score. However, it is possible that the underlying data is structured in such a way that you would get 0.99 no matter how you cluster the data; in that case 0.99 is not very significant. One way to decide is to use statistical significance estimation. We discuss the Monte Carlo procedure on the next slide.

Monte Carlo Procedure: Empirical p-value Estimation
The Monte Carlo procedure uses random sampling to assess whether a particular performance metric we obtained could have been attained at random. For example, if a cluster of size 5 has a cohesion score of 0.99, we would be inclined to think it is a very cohesive cluster. However, this value could be due to the nature of the data and not to the algorithm. To test the significance of the 0.99 value we:
1. Sample N (usually 1000) random sets of size 5 from the dataset
2. Recalculate the cohesion for each of the N random sets
3. Count R, the number of random sets with a score >= 0.99 (the original score of the cluster)
4. Compute the empirical p-value for the cluster as R/N
We then apply a cutoff, say 0.05, to decide whether 0.99 is significant. Steps 1-4 are the Monte Carlo method for empirical p-value estimation.
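A base-R sketch of steps 1-4 (an assumption-laden illustration: `score_fn` stands for whichever metric is being tested, e.g., a cohesion function, and `dataset` for the matrix of all points; neither comes from the original slides):

monte_carlo_pvalue <- function(dataset, score_fn, observed_score,
                               set_size, N = 1000) {
  random_scores <- replicate(N, {
    random_set <- dataset[sample(nrow(dataset), set_size), , drop = FALSE]
    score_fn(random_set)                 # recompute the metric on a random set
  })
  R <- sum(random_scores >= observed_score)
  R / N                                  # empirical p-value
}

# e.g. monte_carlo_pvalue(dataset, cohesion, observed_score = 0.99, set_size = 5)
# then compare the returned p-value against a cutoff such as 0.05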


Model Comparison
Metrics that compare the performance of different algorithms.
Scenario: Model 1 provides an accuracy of 70% and Model 2 provides an accuracy of 75%. At first look Model 2 seems better; however, it could be that Model 1 predicts Class 1 better, and Class 1 is more important than Class 2 for our problem. Model comparison methods let us take this notion of "importance" into consideration when we pick one model over another. Cost-based analysis is an important model comparison method, discussed in the next few slides.

Cost-based Analysis
In real-world applications, certain aspects of model performance are considered more important than others. For example, if a person with cancer is diagnosed as cancer-free (or vice versa), the prediction model should be especially penalized. This penalty is introduced in the form of a cost matrix, whose entry cij is the cost associated with the corresponding confusion matrix count fij (or uij):

                Predicted +            Predicted -
Actual +        c11 (for f11 / u11)    c10 (for f10 / u10)
Actual -        c01 (for f01 / u01)    c00 (for f00 / u00)

Cost-based Analysis: Cost of a Model
The confusion matrix and the cost matrix for a model M are given below. The cost of M weights each confusion matrix count by its corresponding cost:

Confusion Matrix:
                Predicted +   Predicted -
Actual +        f11           f10
Actual -        f01           f00

Cost Matrix:
                Predicted +   Predicted -
Actual +        c11           c10
Actual -        c01           c00

Cost of Model M:  C_M = c11*f11 + c10*f10 + c01*f01 + c00*f00

Cost-based Analysis: Comparing Two Models
This analysis is typically used to select one model when we have more than one choice, obtained by using different algorithms or different parameter settings for the learning algorithms.

Cost Matrix:
                Predicted +   Predicted -
Actual +        -20           100
Actual -        45            -10

Confusion Matrix of Mx:
                Predicted +   Predicted -
Actual +        4             1
Actual -        2             1

Confusion Matrix of My:
                Predicted +   Predicted -
Actual +        3             2
Actual -        1             …

Cost of Mx: 100; Cost of My: 200. Since C(Mx) < C(My), purely on the basis of the cost model, Mx is the better model.
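Because the cost is just the element-wise product of the confusion and cost matrices summed up, it can be checked in base R (an illustrative computation alongside the package call on the next slide; shown here only for Mx, whose counts are fully known):

CostMatrix <- matrix(c(-20, 45, 100, -10), nrow = 2)  # rows = actual, cols = predicted
Mx         <- matrix(c(  4,  2,   1,   1), nrow = 2)  # confusion matrix of model Mx

cost_Mx <- sum(Mx * CostMatrix)   # 4*(-20) + 1*100 + 2*45 + 1*(-10) = 100
cost_Mx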

Cost-based Analysis R-code

library(PerformanceMetrics)
data(Mx)
data(My)
data(CostMatrix)
Mx
     [,1] [,2]
[1,]    4    1
[2,]    2    1
My
     [,1] [,2]
[1,]    3    2
costAnalysis(Mx, CostMatrix)
costAnalysis(My, CostMatrix)