A k-Nearest Neighbor Based Algorithm for Multi-Label Classification Min-Ling Zhang

A k-Nearest Neighbor Based Algorithm for Multi-Label Classification. Min-Ling Zhang and Zhi-Hua Zhou, National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China. July 26, 2005

Outline: Multi-Label Learning (MLL); ML-kNN (Multi-Label k-Nearest Neighbor); Experiments; Conclusion


Multi-Label Objects
Multi-label learning deals with objects that belong to several classes at once, e.g. a natural scene image labeled Lake, Trees, and Mountains simultaneously. Such multi-label objects are ubiquitous: documents, Web pages, molecules, ...

Formal Definition
Settings: $\mathcal{X}$: the d-dimensional input space $\mathbb{R}^d$; $\mathcal{Y}$: the finite set of possible labels (classes); $H: \mathcal{X} \rightarrow 2^{\mathcal{Y}}$: the set of multi-label hypotheses.
Inputs: S: i.i.d. multi-labeled training examples $\{(x_i, Y_i)\}$ ($i = 1, 2, \ldots, m$) drawn from an unknown distribution D, where $x_i \in \mathcal{X}$ and $Y_i \subseteq \mathcal{Y}$.
Outputs: $h: \mathcal{X} \rightarrow 2^{\mathcal{Y}}$, a multi-label predictor; or $f: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$, a ranking predictor, where for a given instance x the labels in $\mathcal{Y}$ are ordered according to $f(x, \cdot)$.
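To make the setting concrete, here is a minimal Python sketch (not from the paper; the feature values, label names, and type aliases are invented for illustration) of how the training set S = {(x_i, Y_i)} and the two kinds of predictors can be represented:

```python
from typing import Callable, FrozenSet, List, Tuple

# The finite set of possible labels/classes (toy example)
LABELS: FrozenSet[str] = frozenset({"Lake", "Trees", "Mountains"})

# S: i.i.d. multi-labeled training examples {(x_i, Y_i)},
# each a d-dimensional feature vector paired with a label SET Y_i, a subset of LABELS
S: List[Tuple[List[float], FrozenSet[str]]] = [
    ([0.2, 0.9, 0.1], frozenset({"Lake", "Mountains"})),   # (x_1, Y_1)
    ([0.7, 0.3, 0.8], frozenset({"Trees"})),                # (x_2, Y_2)
]

# h: X -> 2^Y, a multi-label predictor returning a subset of LABELS
MultiLabelPredictor = Callable[[List[float]], FrozenSet[str]]

# f: X x Y -> R, a ranking predictor scoring every label for an instance
RankingPredictor = Callable[[List[float], str], float]
```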

Evaluation Metrics
Given: S, a set of multi-label examples $\{(x_i, Y_i)\}$ ($i = 1, 2, \ldots, m$), where $x_i \in \mathcal{X}$ and $Y_i \subseteq \mathcal{Y}$; $f: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$, a ranking predictor (h is the corresponding multi-label predictor, and $\mathrm{rank}_f(x_i, l)$ denotes the rank of label l when all labels are sorted by decreasing $f(x_i, \cdot)$).
Definitions:
Hamming Loss: $\mathrm{hloss}(h) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{|\mathcal{Y}|}\,|h(x_i)\,\Delta\, Y_i|$, where $\Delta$ denotes symmetric difference.
One-error: $\mathrm{one\text{-}error}(f) = \frac{1}{m}\sum_{i=1}^{m} [\![\,\arg\max_{l \in \mathcal{Y}} f(x_i, l) \notin Y_i\,]\!]$.
Coverage: $\mathrm{coverage}(f) = \frac{1}{m}\sum_{i=1}^{m} \max_{l \in Y_i} \mathrm{rank}_f(x_i, l) - 1$.
Ranking Loss: $\mathrm{rloss}(f) = \frac{1}{m}\sum_{i=1}^{m} \frac{|\{(l_1, l_2) \in Y_i \times \bar{Y}_i : f(x_i, l_1) \le f(x_i, l_2)\}|}{|Y_i|\,|\bar{Y}_i|}$, where $\bar{Y}_i$ is the complement of $Y_i$ in $\mathcal{Y}$.
Average Precision: $\mathrm{avgprec}(f) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{|Y_i|}\sum_{l \in Y_i} \frac{|\{l' \in Y_i : \mathrm{rank}_f(x_i, l') \le \mathrm{rank}_f(x_i, l)\}|}{\mathrm{rank}_f(x_i, l)}$.
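As a rough, hedged sketch (my own function and variable names; assuming predictions are given as label sets for Hamming loss and as per-label real-valued scores for ranking loss), two of these metrics could be computed as follows:

```python
from typing import Dict, FrozenSet, List

def hamming_loss(pred_sets: List[FrozenSet[str]],
                 true_sets: List[FrozenSet[str]],
                 labels: FrozenSet[str]) -> float:
    """Average size of the symmetric difference h(x_i) delta Y_i, normalized by |Y| and m."""
    m = len(true_sets)
    total = sum(len(p.symmetric_difference(t)) for p, t in zip(pred_sets, true_sets))
    return total / (m * len(labels))

def ranking_loss(scores: List[Dict[str, float]],
                 true_sets: List[FrozenSet[str]],
                 labels: FrozenSet[str]) -> float:
    """Average fraction of (relevant, irrelevant) label pairs that f(x_i, .) orders wrongly."""
    m = len(true_sets)
    loss = 0.0
    for f, Y in zip(scores, true_sets):          # f maps each label l to its score f(x_i, l)
        Y_bar = [l for l in labels if l not in Y]
        if not Y or not Y_bar:
            continue                             # the pairwise ratio is undefined in this case
        bad = sum(1 for y in Y for yb in Y_bar if f[y] <= f[yb])
        loss += bad / (len(Y) * len(Y_bar))
    return loss / m
```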

State-of-the-Art I: Text Categorization
BoosTexter [Schapire & Singer, MLJ00]: extensions of AdaBoost; converts each multi-labeled example into many binary-labeled examples.
Maximal Margin Labeling [Kazawa et al., NIPS04]: converts the MLL problem into a multi-class learning problem by embedding labels into a similarity-induced vector space; uses an approximation method in learning and an efficient classification algorithm in testing.
Probabilistic generative models: Mixture Model + EM [McCallum, AAAI99]; PMM [Ueda & Saito, NIPS03].

State-of-the-Art II: Extended Machine Learning Approaches
ADTBoost.MH [DeComité et al., MLDM03]: derived from AdaBoost.MH and ADT (Alternating Decision Tree) [Freund & Mason, ICML99]; uses ADT as a special weak hypothesis in AdaBoost.MH.
Rank-SVM [Elisseeff & Weston, NIPS02]: minimizes the ranking loss criterion while at the same time keeping a large margin.
Multi-Label C4.5 [Clare & King, LNCS2168]: modifies the definition of entropy; learns a set of accurate rules, not necessarily a complete set of classification rules.

State-of-the-Art III: Other Works
Another formalization [Jin & Ghahramani, NIPS03]: only one of the labels associated with an instance is correct (e.g. disagreement between several assessors); uses EM for maximum likelihood estimation.
Multi-label scene classification [Boutell et al., PR04]: a natural scene image may belong to several categories (e.g. Mountains + Trees); decomposes the multi-label learning problem into multiple independent two-class learning problems.

Outline: Multi-Label Learning (MLL); ML-kNN (Multi-Label k-Nearest Neighbor); Experiments; Conclusion

Motivation
Existing multi-label learning methods:
Multi-label text categorization algorithms: BoosTexter [Schapire & Singer, MLJ00]; Maximal Margin Labeling [Kazawa et al., NIPS04]; probabilistic generative models [McCallum, AAAI99] [Ueda & Saito, NIPS03].
Multi-label decision trees: ADTBoost.MH [DeComité et al., MLDM03]; Multi-Label C4.5 [Clare & King, LNCS2168].
Multi-label kernel methods: Rank-SVM [Elisseeff & Weston, NIPS02]; ML-SVM [Boutell et al., PR04].
However, no multi-label lazy learning approach is available.

ML-kNN
ML-kNN (Multi-Label k-Nearest Neighbor): derived from the traditional k-Nearest Neighbor algorithm, the first multi-label lazy learning approach.
Notations:
(x, Y): a multi-label d-dimensional example x with associated label set $Y \subseteq \mathcal{Y}$.
N(x): the set of k nearest neighbors of x identified in the training set.
$\vec{y}_x$: the category vector for x, where $\vec{y}_x(l)$ takes the value 1 if $l \in Y$ and 0 otherwise.
$\vec{C}_x$: the membership counting vector, where $\vec{C}_x(l)$ counts how many neighbors of x belong to the l-th category.
$H^l_1$: the event that x has label l; $H^l_0$: the event that x does not have label l.
$E^l_j$: the event that, among N(x), there are exactly j examples which have label l.

Algorithm
Given a test example t, the category vector $\vec{y}_t$ is obtained as follows:
1. Identify its k nearest neighbors N(t) in the training set.
2. Compute the membership counting vector $\vec{C}_t$.
3. Determine $\vec{y}_t$ with the following maximum a posteriori (MAP) principle: $\vec{y}_t(l) = \arg\max_{b \in \{0,1\}} P(H^l_b)\, P(E^l_{\vec{C}_t(l)} \mid H^l_b)$.
All the probabilities, i.e. the prior probabilities $P(H^l_b)$ and the posterior probabilities $P(E^l_j \mid H^l_b)$, can be estimated directly from the training set by frequency counting.
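Not part of the slides: a minimal NumPy sketch of this training/prediction procedure, assuming Euclidean distance for the neighbor search and a smoothing constant s = 1 in the frequency counts (both my illustrative choices):

```python
import numpy as np

def mlknn_train(X: np.ndarray, Y: np.ndarray, k: int = 7, s: float = 1.0):
    """X: (m, d) feature matrix; Y: (m, q) binary label matrix.
    Returns priors P(H_1^l) and posteriors P(E_j^l | H_b^l), smoothed by s."""
    m, q = Y.shape
    prior1 = (s + Y.sum(axis=0)) / (2 * s + m)          # P(H_1^l) by frequency counting
    c1 = np.zeros((k + 1, q))                           # counts for examples WITH label l
    c0 = np.zeros((k + 1, q))                           # counts for examples WITHOUT label l
    for i in range(m):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                   # leave-one-out: exclude x_i itself
        neigh = np.argsort(d)[:k]
        delta = Y[neigh].sum(axis=0).astype(int)        # C_{x_i}(l): neighbors having label l
        for l in range(q):
            if Y[i, l] == 1:
                c1[delta[l], l] += 1
            else:
                c0[delta[l], l] += 1
    post1 = (s + c1) / (s * (k + 1) + c1.sum(axis=0))   # P(E_j^l | H_1^l)
    post0 = (s + c0) / (s * (k + 1) + c0.sum(axis=0))   # P(E_j^l | H_0^l)
    return prior1, post1, post0

def mlknn_predict(X: np.ndarray, Y: np.ndarray, model, t: np.ndarray, k: int = 7):
    """Category vector for a test instance t via the MAP rule."""
    prior1, post1, post0 = model
    neigh = np.argsort(np.linalg.norm(X - t, axis=1))[:k]
    delta = Y[neigh].sum(axis=0).astype(int)            # membership counting vector C_t
    y_t = np.zeros(Y.shape[1], dtype=int)
    for l in range(Y.shape[1]):
        p1 = prior1[l] * post1[delta[l], l]             # P(H_1^l) P(E_{C_t(l)}^l | H_1^l)
        p0 = (1 - prior1[l]) * post0[delta[l], l]       # P(H_0^l) P(E_{C_t(l)}^l | H_0^l)
        y_t[l] = int(p1 > p0)
    return y_t
```

The comparison of p1 and p0 is the two-hypothesis MAP decision stated above; a smoothing constant s > 0 keeps every estimated probability strictly positive.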

Outline: Multi-Label Learning (MLL); ML-kNN (Multi-Label k-Nearest Neighbor); Experiments; Conclusion

Experimental Setup
Experimental data: the yeast gene functional data, previously studied in the literature [Elisseeff & Weston, NIPS02]. Each gene is described by a 103-dimensional feature vector (the concatenation of micro-array expression data and a phylogenetic profile) and is associated with a set of functional classes. There are 1,500 genes in the training set and 917 in the test set, with 14 possible classes; the average number of labels per gene in the training set is 4.2±1.6.
Comparison algorithms: ML-kNN (the number of neighbors varies from 6 to 9); Rank-SVM (polynomial kernel with degree 8); ADTBoost.MH (30 boosting rounds); BoosTexter (1000 boosting rounds).
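Purely as an illustration of how such a comparison over k might be scripted, reusing the mlknn_train/mlknn_predict sketch above (the arrays below are random stand-ins with the yeast data's shapes, not the real data set):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins with the same shapes as the yeast data: 1,500 training and
# 917 test genes, 103-dimensional features, 14 possible functional classes.
X_train, Y_train = rng.normal(size=(1500, 103)), (rng.random((1500, 14)) < 0.3).astype(int)
X_test, Y_test = rng.normal(size=(917, 103)), (rng.random((917, 14)) < 0.3).astype(int)

for k in range(6, 10):                       # the number of neighbors varies from 6 to 9
    model = mlknn_train(X_train, Y_train, k=k)
    preds = np.vstack([mlknn_predict(X_train, Y_train, model, t, k=k) for t in X_test])
    print(f"k={k}  Hamming loss = {np.mean(preds != Y_test):.4f}")
```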

Experimental Results
The value of k does not significantly affect ML-kNN's Hamming Loss, and ML-kNN achieves its best performance on the other four ranking-based criteria with k = 7. The performance of ML-kNN is comparable to that of Rank-SVM, and both ML-kNN and Rank-SVM perform significantly better than ADTBoost.MH and BoosTexter.

Outline: Multi-Label Learning (MLL); ML-kNN (Multi-Label k-Nearest Neighbor); Experiments; Conclusion

Conclusion
The problem of designing a multi-label lazy learning approach is addressed in this paper. Experiments on a multi-label bioinformatics data set show that ML-kNN is highly competitive with several existing multi-label learning algorithms.
Future work: conduct more experiments on other multi-label data sets to fully evaluate the effectiveness of ML-kNN; investigate whether other kinds of distance metrics could further improve the performance of ML-kNN.

Suggestions & Comments? Thanks!