Multi-label Classification Yusuke Miyao. N. Ghamrawi, A. McCallum. Collective multi-label classification. CIKM 2005. S. Godbole, S. Sarawagi. Discriminative.


Multi-label Classification Yusuke Miyao

N. Ghamrawi, A. McCallum. Collective multi-label classification. CIKM 2005.
S. Godbole, S. Sarawagi. Discriminative methods for multi-labeled classification. PAKDD.
G. Tsoumakas, I. Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. ECML.
G. Tsoumakas, I. Katakis. Multi-label classification: An overview. Journal of Data Warehousing and Mining.
A. Fujino, H. Isozaki. Multi-label text categorization with model combination based on F1-score maximization. IJCNLP 2008.

Machine Learning Template Library
Separate data structures from learning algorithms
Allow for any combinations of structures and algorithms
[Figure: an interface (decode, expectation, diff) connects data structures and learning algorithms. Labels shown in the figure: Perceptron, 1-best MIRA, n-best MIRA, Log-linear model, Max-margin, EM algorithm, Naïve Bayes, n-best, Classifier, Markov chain, Semi-Markov, Dep. tree, Feature forest, Reranking, Multi-label]

Target Problem
Choose multiple labels from a fixed set of labels
Ex. Keyword assignment (text categorization): select appropriate keywords for the text
[Figure: a text and a keyword set (Politics, Sports, Entertainment, Life, Food, Recipe, Comedy, Drama, Travel, Tech, Health, Video, Book, Animation, Music)]

Applications
Keyword assignment (text categorization)
  – Benchmark data: Reuters-21578, OHSUMED, etc.
Medical diagnosis
Protein function classification
  – Benchmark data: Yeast, Genbase, etc.
Music/scene categorization
Non-contiguous, overlapping segmentation [McDonald et al., 2005]

Formulation
x : object, L : label set, y ⊆ L : labels assigned to x
ŷ = argmax_{y ⊆ L} f(x, y)
[Figure: an object x, the keyword set L, and the selected subset y]
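The formulation can be illustrated with a minimal sketch that enumerates every subset of L and keeps the highest-scoring one. The scoring function `toy_score` and the label names are hypothetical stand-ins for f(x, y); exhaustive enumeration is only feasible for a very small label set.

```python
from itertools import chain, combinations


def all_subsets(labels):
    """Yield every subset of `labels` as a frozenset (2^|L| subsets in total)."""
    items = sorted(labels)
    return (
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    )


def predict(x, labels, score):
    """Return the subset y of `labels` that maximizes score(x, y)."""
    return max(all_subsets(labels), key=lambda y: score(x, y))


def toy_score(x, y):
    """Hypothetical f(x, y): +1 for each label that occurs in the text, -1 otherwise."""
    words = set(x.lower().split())
    return sum(1.0 if label in words else -1.0 for label in y)


LABELS = {"food", "recipe", "sports", "music"}
best = predict("A quick recipe for street food", LABELS, toy_score)
print(sorted(best))
```

With four labels the search visits 16 subsets; the count doubles with every added label, which is why the later slides focus on approximations.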

Popular Approaches
Subsets as atomic labels
  – Each subset is considered as an atomic label
  – Tractable only when |L| is small
A set of binary classifications
  – One-vs-all
  – Each label is independently assigned
Label ranking
  – A ranking function is induced from multi-labeled data (BoosTexter [Schapire et al., 2000], Rank-SVM [Elisseeff et al., 2002], large-margin [Crammer et al., 2003])
Probabilistic generative models [McCallum 1999; Ueda et al., 2003; Sato et al., 2007]
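The first two approaches are data transformations, and a small sketch may make the difference concrete. The function names and the toy dataset below are illustrative assumptions, not taken from any of the cited systems.

```python
def binary_relevance(dataset, labels):
    """One-vs-all: build one binary training set per label."""
    return {
        label: [(x, label in y) for x, y in dataset]
        for label in sorted(labels)
    }


def label_powerset(dataset):
    """Subsets as atomic labels: each distinct label set becomes one class."""
    return [(x, frozenset(y)) for x, y in dataset]


# Hypothetical toy data: (document id, set of assigned labels).
DATA = [
    ("doc1", {"food", "recipe"}),
    ("doc2", {"sports"}),
    ("doc3", {"food"}),
]
LABELS = {"food", "recipe", "sports"}

br = binary_relevance(DATA, LABELS)
lp = label_powerset(DATA)
classes = {y for _, y in lp}

print(br["food"])    # binary targets for the label "food"
print(len(classes))  # number of atomic classes seen in the data
```

The binary view ignores which labels co-occur, while the powerset view keeps co-occurrence but can only predict label sets it has seen as classes.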

Issues in Multi-Label Classification
How to reduce training/running cost
  – The number of targets (i.e. subsets) grows exponentially with the size of the label set
How to model correlation of labels
  – Binary classification cannot use features on multiple labels
Classification vs. ranking
Hierarchical multi-label classification (ex. MeSH terms) [Cesa-Bianchi et al. 2006; J. Rousu et al., 2006]

Collective Multi-Label Classification
A CRF is applied to multi-label classification
Features are defined on pairs of labels
Notation:
  – y_i = 1 if the i-th label ∈ y
  – y_i = 0 otherwise

Accounting for Multiple Labels
Binary model: f_b(x, y), features on y_i given x
Collective Multi-Label (CML) model: f_ml(x, y), features on y_i and y_j
Collective Multi-Label with Features (CMLF) model: f_mlf(x, y), features on y_i and y_j given x
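One way to read these three models is as feature sets plugged into a single log-linear (CRF-style) distribution over label sets. The form below is a sketch under that assumption; the weights λ_k and the index k are notation introduced here, and the exact parameterization in the original paper may differ.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{k} \lambda_k f_k(x, y) \Big),
\qquad
Z(x) = \sum_{y' \subseteq L} \exp\Big( \sum_{k} \lambda_k f_k(x, y') \Big)
```

Under this reading, the binary model uses only the features f_b, the CML model adds the label-pair features f_ml, and the CMLF model adds the features f_mlf that also look at x. The sum in Z(x) runs over all 2^{|L|} label sets.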

Parameter Estimation
Enumeration of y is intractable in general
Two approximations:
  – Supported combinations: consider only the label combinations that occur in the training data
  – Binary pruned inference: first apply the binary model, then consider only the labels having probabilities above a threshold t
No dynamic programming
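Both approximations can be viewed as ways to shrink the candidate set of label combinations. The sketch below shows one possible reading; the function names, the toy training sets, and the per-label probabilities are hypothetical, and whether the threshold comparison is strict is an assumption.

```python
from itertools import chain, combinations


def supported_combinations(training_label_sets):
    """Candidates are the distinct label sets observed in the training data."""
    return {frozenset(y) for y in training_label_sets}


def binary_pruned(label_probs, t):
    """Candidates are all subsets of the labels whose binary probability exceeds t."""
    kept = sorted(label for label, p in label_probs.items() if p > t)
    subsets = chain.from_iterable(
        combinations(kept, r) for r in range(len(kept) + 1)
    )
    return {frozenset(s) for s in subsets}


# Hypothetical training label sets and binary-model outputs for one document.
train = [{"food", "recipe"}, {"sports"}, {"food", "recipe"}, {"music"}]
probs = {"food": 0.9, "recipe": 0.6, "sports": 0.05, "music": 0.2}

sc = supported_combinations(train)
bp = binary_pruned(probs, t=0.5)

print(len(sc))  # distinct combinations seen in training
print(len(bp))  # subsets of the labels that survive pruning
```

The first candidate set is fixed after training, while the second is rebuilt for every input and grows as 2^m in the number m of labels that pass the threshold.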

Experiments
Reuters, Modified Apte (ModApte) split
  – 90 labels
  – Training: 9,603 docs, Test: 3,299 docs
  – 8.7% of the documents have multiple labels
OHSUMED “Heart Disease” documents
  – 40 labels assigned to training documents
  – 16 labels assigned to 75 or more training documents

Results: Reuters

Supported combinations
                      Binary    CML       CMLF
macro-F
micro-F
exact match
classification time   1.4 ms    48 ms     78 ms

Binary pruned
                      Binary    CML       CMLF
macro-F
micro-F
exact match
classification time   1.4 ms    4.6 ms    4.7 ms

Results: OHSUMED

Supported combinations
                      Binary    CML       CMLF
macro-F
micro-F
exact match

Binary pruned
                      Binary    CML       CMLF
macro-F
micro-F
exact match
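The three measures in these result tables can be computed as in the following sketch, which uses the usual definitions: macro-F averages per-label F1, micro-F pools the counts over all labels, and exact match requires the whole label set to be correct. Scoring a label with no gold or predicted instances as 0.0 is an assumption made here, and the gold/predicted sets are made-up toy data.

```python
def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when there is nothing to score (an assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(gold, pred, labels):
    """Return (macro-F, micro-F, exact match) for parallel lists of label sets."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1(*totals)
    exact = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, exact


gold = [{"a", "b"}, {"a"}, {"c"}]
pred = [{"a"}, {"a"}, {"b", "c"}]
macro_f, micro_f, exact_match = evaluate(gold, pred, ["a", "b", "c"])
print(macro_f, micro_f, exact_match)
```

Macro-F weights every label equally, so rare labels matter as much as frequent ones; micro-F is dominated by frequent labels.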

Similar Methods
H. Kazawa et al. Maximal margin labeling for multi-topic text categorization. NIPS
  – All subsets are considered as atomic labels
  – Approximation by only considering neighbor subsets (subsets that differ in a single label from the gold)
S. Zhu et al. Multi-labelled classification using maximum entropy method. SIGIR
  – Simply enumerate all subsets, and use f_ml
  – Only evaluated with small label sets (≤ 10)

Discriminative Methods for Multi-Labeled Classification
Cascade binary classifiers (SVM)
Another technique: remove negative instances that are close to the decision boundary
[Figure: the input text is passed to |L| classifiers, one for each label, whose outputs feed |L| ensemble classifiers]

Random k-Labelsets
Randomly select size-k subsets from 2^L
Train multi-class classifiers for the subsets
Label a new instance by majority voting
[Figure: the input text is passed to the classifiers Y_1, Y_2, Y_3, …, Y_m for size-k subsets; their outputs, e.g. (1,0,0,1,0,…,0,0), (0,1,0,1,0,…,0,1), (1,0,0,0,1,…,0,1), (0,0,0,1,0,…,1,1), are combined by majority voting into (0,0,0,1,0,…,0,1)]
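The sampling and voting steps can be sketched as follows. The trained classifiers are replaced by hand-written predictions, the function names are hypothetical, and requiring a strict majority among the classifiers whose labelset contains a label is an assumption about how ties are handled.

```python
import random
from itertools import combinations


def sample_labelsets(labels, k, m, seed=0):
    """Draw m distinct size-k labelsets (requires m <= number of k-subsets)."""
    all_k = [frozenset(c) for c in combinations(sorted(labels), k)]
    return random.Random(seed).sample(all_k, m)


def vote(labelsets, predictions, labels):
    """Combine per-labelset predictions by majority voting.

    predictions[i] is the subset of labelsets[i] chosen by the i-th classifier.
    A label is output when more than half of the classifiers that cover it
    predicted it.
    """
    result = set()
    for label in labels:
        votes = [
            label in pred
            for ls, pred in zip(labelsets, predictions)
            if label in ls
        ]
        if votes and sum(votes) > len(votes) / 2:
            result.add(label)
    return result


LABELS = ["a", "b", "c"]
labelsets = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
# Hypothetical outputs of the three multi-class classifiers for one input.
predictions = [frozenset("a"), frozenset("bc"), frozenset("ac")]
final = vote(labelsets, predictions, LABELS)
print(sorted(final))

sampled = sample_labelsets(LABELS, k=2, m=2)
print(len(sampled))
```

Each classifier only has to separate the at most 2^k label combinations within its own labelset, which keeps the subsets-as-atomic-labels approach tractable for large L.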

Other Approaches
Learn a latent model to account for label correlations
  – K. Yu et al. Multi-label informed latent semantic indexing. SIGIR
  – J. Zhang et al. Learning multiple related tasks using latent independent component analysis. NIPS
  – V. Roth et al. Improved functional prediction of proteins by learning kernel combinations in multilabel settings. PMSB
kNN-like algorithms
  – M-L Zhang et al. A k-nearest neighbor based algorithm for multi-label classification. IEEE Conference on Granular Computing
  – F. Kang et al. Correlated label propagation with application to multi-label learning. CVPR
  – K. Brinker et al. Case-based multilabel ranking. IJCAI 2007.

Summary
Multi-label classification is an important and interesting problem
Major issues:
  – Label correlation
  – Computational cost
A lot of methods have been proposed
  – Basically, enhancements of fundamental methods (subsets as atomic labels, set of binary classifications)
No existing method solves the problem completely

Future Directions
Algorithm for exact solution?
Other learning algorithms
  – Via machine learning template library
Structurization of label sets
  – IS-A hierarchy → hierarchical multi-label
  – Exclusive labels
Modeling of label distance
  – Redesign of objective functions

Possible Applications
Any tasks of keyword assignments
Substitute for n-best/ranking
Multi-label problems where label sets are not fixed
  – Keyword (key phrase) extraction: choose words/phrases from each document
  – Summarization by sentence extraction
cf. D. Xin et al. Extracting redundancy-aware top-k patterns. KDD 2006.