A Unified View of Multi-Label Performance Measures


A Unified View of Multi-Label Performance Measures Xi-Zhu Wu, Zhi-Hua Zhou LAMDA Group, Nanjing University Thanks for the introduction.

Multi-label classification Multi-label classification deals with the problem where each instance is associated with multiple relevant labels. For example, in a multi-class problem, the first photo can be labeled as bicycle and the second as cow. But in a multi-label problem, each photo can have multiple labels, such as car, dog, and so on.
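As a small illustration (with invented photos and labels, not the slide's), the two settings can be contrasted by how the targets are encoded: a single class index per instance in multi-class, versus a 0/1 row per instance in a label matrix Y for multi-label:

```python
# Hypothetical label set: 0 = car, 1 = dog, 2 = bicycle, 3 = cow.

# Multi-class: exactly one label per photo.
multi_class_y = [2, 3]          # photo 1 -> bicycle, photo 2 -> cow

# Multi-label: a binary indicator matrix, one row per photo,
# with a 1 for every relevant label.
multi_label_Y = [
    [1, 1, 1, 0],               # photo 1: car, dog, bicycle
    [1, 1, 0, 1],               # photo 2: car, dog, cow
]

# Each multi-label instance may have any number of relevant labels.
relevant_counts = [sum(row) for row in multi_label_Y]
print(relevant_counts)          # [3, 3]
```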

Evaluation Evaluation in multi-label classification is complicated. For example, consider the two pictures again. Suppose we now have two predictions, prediction A and prediction B. Which is better? In terms of correct predictions, A has four correct predictions, more than B, so prediction A seems to be better. But if we care about incorrect predictions, B has fewer, which means B is better.
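The ambiguity can be made concrete with a small hypothetical example (the numbers below are invented, not the slide's): prediction A gets more positive labels right, while prediction B makes fewer wrong entries overall, so neither dominates:

```python
# Invented ground truth and two 0/1 prediction matrices.
Y_true = [[1, 1, 0, 0],
          [1, 0, 1, 0]]
pred_A = [[1, 1, 1, 0],         # predicts many labels
          [1, 1, 1, 0]]
pred_B = [[1, 1, 0, 0],         # predicts few labels
          [1, 0, 0, 0]]

def counts(Y, P):
    """Return (# correctly predicted positive labels, # wrong entries)."""
    correct = sum(p == 1 and y == 1 for ry, rp in zip(Y, P)
                  for y, p in zip(ry, rp))
    wrong = sum(p != y for ry, rp in zip(Y, P)
                for y, p in zip(ry, rp))
    return correct, wrong

print(counts(Y_true, pred_A))   # (4, 2): more correct, but more wrong
print(counts(Y_true, pred_B))   # (3, 1): fewer correct, but fewer wrong
```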

Performance measures Multi-label performance measures: Hamming loss, ranking loss, one-error, coverage, average precision, macro-F1, micro-F1, instance-F1, macro-AUC, micro-AUC, instance-AUC, … Multi-label evaluation is complicated, which is why so many performance measures have been proposed. As you can see, there are many performance measures in multi-label learning.
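As a hedged sketch of what two of these measures compute, here are minimal pure-Python versions of Hamming loss (on 0/1 predictions) and ranking loss (on real-valued scores). Tie-handling conventions for ranking loss vary between papers; here a tie counts as a mis-ranked pair, and every instance is assumed to have at least one positive and one negative label:

```python
def hamming_loss(Y, P):
    """Fraction of label entries where prediction and truth disagree."""
    n = sum(len(row) for row in Y)
    wrong = sum(p != y for ry, rp in zip(Y, P) for y, p in zip(ry, rp))
    return wrong / n

def ranking_loss(Y, F):
    """Average fraction of (positive, negative) label pairs per instance
    that the scores F order incorrectly (ties count as errors)."""
    total = 0.0
    for ry, rf in zip(Y, F):
        pos = [f for y, f in zip(ry, rf) if y == 1]
        neg = [f for y, f in zip(ry, rf) if y == 0]
        bad = sum(fp <= fn for fp in pos for fn in neg)
        total += bad / (len(pos) * len(neg))
    return total / len(Y)

# Invented example: 2 instances, 3 labels.
Y = [[1, 0, 0], [1, 0, 1]]
P = [[1, 1, 0], [1, 0, 1]]
F = [[0.9, 0.3, 0.1], [0.8, 0.4, 0.7]]
print(hamming_loss(Y, P))   # 1 wrong entry out of 6
print(ranking_loss(Y, F))   # 0.0: every positive outscores every negative
```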

Multi-label algorithmic works "My algorithm is better!" See experimental results in these measures…

Algorithm           Performance measures used
LIFT                hloss, one-error, coverage, rloss, avgprec, macro-AUC
ML-kNN              hloss, one-error, coverage, rloss, avgprec
MAHR / LEAD / LMCF  hloss, one-error, rloss, avgprec
Rank-SVM            hloss, rloss, precision, one-error
ECC                 hloss, macro-F1, macro-AUC
COMB                micro-F1, macro-F1
LM-kNN              micro-F1, instance-F1
BSVM / RAkEL        hloss, micro-F1
CPLST               hloss

In previous works, when a new multi-label algorithm is proposed, the most common way to show its superiority is to report experimental results on several measures. However, there are many performance measures, and each paper can only report results on some of them.

Motivation The phenomenon: BSVM performs well on micro-F1/macro-F1 on datasets called scene and yeast [Fan and Lin, 2007]. BSVM performs poorly on coverage/ranking loss on these same datasets [Zhang and Wu, 2015]. The question: is there any relationship between measures? well on measure A ≈ well on measure B? well on measure A ≈ bad on measure B? To answer this, we need a unified view to analyze many measures at once and reveal the relationships between them: the label-wise margin and the instance-wise margin. We have observed that an algorithm that performs well on some performance measures may be poor on others. So how can we explain this phenomenon? Is it possible that if we do well on measure A, we also do well on measure B? Or is it possible that doing well on measure A makes performance on measure B worse? To answer these questions, we need a unified view to analyze many measures at once, and we propose two new concepts, the label-wise margin and the instance-wise margin, as tools to do so.

Notations The instance matrix is X, with m instances, and the label matrix is Y, with m instances and l labels. The real-valued predictor is F, which can be decomposed into f1 to fl. The slide shows an example label matrix Y and the corresponding prediction matrix F(X) for instances x1, x2, x3. A 1 in the label matrix marks a positive label and a 0 marks a negative label; we want the predictions to be higher on positive labels and lower on negative labels.

Label-wise/instance-wise margin A margin is the minimum difference between the predictions of a positive and a negative label. The label-wise margin is the margin defined on each instance, between its positive and negative labels. For example, on the slide's label matrix and corresponding prediction matrix, the minimum difference on the first instance is between the scores 0.9 and 0.3, so its label-wise margin is 0.6; a negative margin means some negative label is scored above a positive one.

Label-wise/instance-wise margin Similarly, we define the instance-wise margin on each label, between the instances for which that label is positive and those for which it is negative. In the slide's example, on the first label the minimum difference is between the scores 0.8 and 0.7, so the instance-wise margin on the first label is 0.1. These are the definitions of the label-wise and instance-wise margins.
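The two definitions can be sketched in code, assuming the usual convention that a margin is the minimum positive score minus the maximum negative score, so it is positive exactly when every positive outscores every negative (the Y and F below are invented, not the slide's example):

```python
def label_wise_margins(Y, F):
    """One margin per instance (per row), across that instance's labels."""
    margins = []
    for ry, rf in zip(Y, F):
        pos = [f for y, f in zip(ry, rf) if y == 1]
        neg = [f for y, f in zip(ry, rf) if y == 0]
        margins.append(min(pos) - max(neg))
    return margins

def instance_wise_margins(Y, F):
    """One margin per label (per column), across that label's instances."""
    margins = []
    for j in range(len(Y[0])):
        pos = [rf[j] for ry, rf in zip(Y, F) if ry[j] == 1]
        neg = [rf[j] for ry, rf in zip(Y, F) if ry[j] == 0]
        margins.append(min(pos) - max(neg))
    return margins

# Hypothetical 2-instance, 3-label example.
Y = [[1, 0, 0],
     [0, 1, 1]]
F = [[0.9, 0.3, 0.1],
     [0.2, 0.8, 0.7]]
print(label_wise_margins(Y, F))     # [0.9-0.3, min(0.8,0.7)-0.2]
print(instance_wise_margins(Y, F))  # [0.9-0.2, 0.8-0.3, 0.7-0.1]
```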

A prediction function F can: optimize the label-wise margin, optimize the instance-wise margin, or optimize the double margin, which means optimizing both margins at once.

The main results By optimizing different margins, the corresponding measures are optimized. '✔' means the F in this cell is proved to optimize this measure; '✘' means the F in this cell does not necessarily optimize the measure; '●'/'○' means the calculation is with/without thresholding. We prove that by optimizing different margins, the corresponding measures are optimized. From the table, we can see that some performance measures are closely related (similar) while others are not (different). A detailed discussion of thresholding can be found in our paper.

Validate the theoretical results Optimizing each margin, or both, is proved to lead to different results on different measures. Are these results also true in practice? Definition -> Formulation -> Algorithm -> Experiments. Formulation: a loss on label-wise margins and a loss on instance-wise margins.
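One natural way to turn the two margins into trainable losses is a pairwise hinge penalty. The sketch below is my reading of the formulation, not the paper's exact objective (which also includes a regularizer and trade-off parameters lambda_1 and lambda_2): a cost is paid whenever a positive score fails to beat a negative score by 1:

```python
def hinge(z):
    return max(0.0, 1.0 - z)

def label_wise_loss(Y, F):
    """Hinge loss summed over (positive, negative) label pairs within
    each instance; minimizing it pushes label-wise margins up."""
    loss = 0.0
    for ry, rf in zip(Y, F):
        for yu, fu in zip(ry, rf):
            for yv, fv in zip(ry, rf):
                if yu == 1 and yv == 0:
                    loss += hinge(fu - fv)
    return loss

def instance_wise_loss(Y, F):
    """Hinge loss summed over (positive, negative) instance pairs within
    each label; minimizing it pushes instance-wise margins up."""
    loss = 0.0
    for j in range(len(Y[0])):
        col_y = [ry[j] for ry in Y]
        col_f = [rf[j] for rf in F]
        for ya, fa in zip(col_y, col_f):
            for yb, fb in zip(col_y, col_f):
                if ya == 1 and yb == 0:
                    loss += hinge(fa - fb)
    return loss

# Invented example: every margin equals 2, so both losses vanish.
Y = [[1, 0], [0, 1]]
F = [[2.0, 0.0], [0.0, 2.0]]
print(label_wise_loss(Y, F), instance_wise_loss(Y, F))  # 0.0 0.0
```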

LIMO (Label-wise and Instance-wise Margin Optimization) LIMO is trained with SGD: in each iteration, the algorithm randomly picks some instances and labels to optimize the two margins. One part of the update optimizes instance-wise margins and the other optimizes label-wise margins.
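The training idea can be sketched as a self-contained toy loop. This is not the authors' code, and the data, learning rate, and step count are invented (see the LIMO repository for the real implementation): each SGD iteration samples one (positive, negative) label pair within a random instance and one (positive, negative) instance pair within a random label, and pushes each violated pair apart:

```python
import random

random.seed(0)

# Toy data: 4 instances, 3 features, 2 labels (invented for illustration).
X = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
Y = [[1, 0], [0, 1], [1, 0], [0, 1]]
n_feat, n_lab = 3, 2
W = [[0.0] * n_feat for _ in range(n_lab)]  # one linear model per label

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

lr = 0.1
for step in range(3000):
    # Label-wise step: sample an instance, then a (positive, negative)
    # label pair, and push the positive score above the negative one.
    i = random.randrange(len(X))
    pos = [j for j in range(n_lab) if Y[i][j] == 1]
    neg = [j for j in range(n_lab) if Y[i][j] == 0]
    if pos and neg:
        u, v = random.choice(pos), random.choice(neg)
        if score(W[u], X[i]) - score(W[v], X[i]) < 1.0:  # hinge active
            for k in range(n_feat):
                W[u][k] += lr * X[i][k]
                W[v][k] -= lr * X[i][k]
    # Instance-wise step: sample a label, then a (positive, negative)
    # instance pair for that label, and push their scores apart.
    j = random.randrange(n_lab)
    pos_i = [a for a in range(len(X)) if Y[a][j] == 1]
    neg_i = [b for b in range(len(X)) if Y[b][j] == 0]
    if pos_i and neg_i:
        a, b = random.choice(pos_i), random.choice(neg_i)
        if score(W[j], X[a]) - score(W[j], X[b]) < 1.0:
            for k in range(n_feat):
                W[j][k] += lr * (X[a][k] - X[b][k])

# After training on this separable toy data, every instance's relevant
# label should outscore its irrelevant one: all label-wise margins > 0.
margins_positive = all(
    min(score(W[j], X[i]) for j in range(n_lab) if Y[i][j] == 1) >
    max(score(W[j], X[i]) for j in range(n_lab) if Y[i][j] == 0)
    for i in range(len(X)))
print(margins_positive)
```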

Experiments - synthetic The synthetic data: 4 labels, on the [-1, 1] square. Comparison: LIMO-label, LIMO-inst, and LIMO. For LIMO-label, we set lambda_2 = 0 and keep only the second term (the loss on label-wise margins); LIMO-inst is the analogous variant for instance-wise margins.

Experimental results Rescaled to [0, 1]; higher is better.

Experimental results – cont’d

Experimental results on benchmark datasets Consistent! The experimental results are consistent with the theoretical results. Datasets: CAL500, enron, medical, corel5k, bibtex.

Conclusion Evaluation is complicated in multi-label classification (MLC), and there are too many performance measures. Contribution: a unified view of these measures, which provides a better understanding of MLC evaluation. Take-home table:

Thanks! Poster #139 tonight wuxz@lamda.nju.edu.cn Code: https://github.com/YuriWu/LIMO I’m happy to take questions