Published by Brooke Collins. Modified over 8 years ago.
1
Multi-label Classification
Yusuke Miyao
2
N. Ghamrawi, A. McCallum. Collective multi-label classification. CIKM 2005.
S. Godbole, S. Sarawagi. Discriminative methods for multi-labeled classification. PAKDD 2004.
G. Tsoumakas, I. Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. ECML 2007.
G. Tsoumakas, I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining. 2007.
A. Fujino, H. Isozaki. Multi-label text categorization with model combination based on F1-score maximization. IJCNLP 2008.
3
Machine Learning Template Library
Separate data structures from learning algorithms
Allow for any combinations of structures and algorithms
[Diagram: an interface layer (decode, expectation, diff) connects data structures to learning algorithms. Boxes shown: Perceptron, 1-best MIRA, n-best MIRA, Log-linear model, Max-margin, EM algorithm, Naïve Bayes, Classifier, n-best, Reranking, Markov chain, Semi-Markov, Dep. tree, Feature forest, Multi-label.]
4
Target Problem
Choose multiple labels from a fixed set of labels
Ex. Keyword assignment (text categorization): select appropriate keywords for the text
[Figure: a text and a keyword set (Politics, Sports, Entertainment, Life, Food, Recipe, Comedy, Drama, Travel, Tech, Health, Video, Book, Animation, Music). Food and Recipe are repeated in the extracted text, apparently as the keywords selected for the text.]
5
Applications
Keyword assignment (text categorization)
  – Benchmark data: Reuters-21578, OHSUMED, etc.
Medical diagnosis
Protein function classification
  – Benchmark data: Yeast, Genbase, etc.
Music/scene categorization
Non-contiguous, overlapping segmentation [McDonald et al., 2005]
6
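Formulation
x : object, L : label set, y ⊆ L : labels assigned to x
ŷ = argmax_{y ⊆ L} f(x, y)
[Figure: an object x, the label set L (Politics, Sports, Entertainment, Life, Food, Recipe, Comedy, Drama, Travel, Tech, Health, Video, Book, Animation, Music), and the assigned subset y; Food and Recipe are repeated in the extracted text, apparently as the members of y.]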
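The argmax in this formulation can be read literally as a search over all subsets of L. The sketch below does exactly that on a toy example; `toy_score` and the three-label set are made up for illustration and are not from the slides. The search visits 2^|L| subsets, so it is only usable for very small label sets.

```python
from itertools import combinations


def all_subsets(labels):
    """Yield every subset of `labels` as a frozenset (2^|L| subsets in total)."""
    for size in range(len(labels) + 1):
        for combo in combinations(labels, size):
            yield frozenset(combo)


def predict(x, labels, score):
    """Return the subset y of `labels` that maximizes score(x, y)."""
    return max(all_subsets(labels), key=lambda y: score(x, y))


def toy_score(x, y):
    """Hypothetical scoring function: +1 for each label named in the text, -1 otherwise."""
    return sum(1 if label.lower() in x.lower() else -1 for label in y)


LABELS = ["Food", "Recipe", "Sports"]
best = predict("A quick recipe for street food", LABELS, toy_score)
print(sorted(best))
```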
7
Popular Approaches
Subsets as atomic labels
  – Each subset is considered as an atomic label
  – Tractable only when |L| is small
A set of binary classifications
  – One-vs-all
  – Each label is independently assigned
Label ranking
  – A ranking function is induced from multi-labeled data (BoosTexter [Schapire et al., 2000], Rank-SVM [Elisseeff et al., 2002], large-margin [Crammer et al., 2003])
Probabilistic generative models [McCallum 1999; Ueda et al., 2003; Sato et al., 2007]
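The first two approaches amount to two ways of rewriting multi-labeled training data so that ordinary classifiers can be trained on it. The following is a minimal sketch of both transformations; the function names and the toy documents are illustrative assumptions, not taken from the slides.

```python
def to_atomic_labels(examples):
    """Subsets as atomic labels: map each distinct label subset to one class id."""
    class_of = {}
    transformed = []
    for x, y in examples:
        key = frozenset(y)
        class_of.setdefault(key, len(class_of))
        transformed.append((x, class_of[key]))
    return transformed, class_of


def to_binary_problems(examples, labels):
    """Set of binary classifications: one independent one-vs-all data set per label."""
    return {label: [(x, int(label in y)) for x, y in examples] for label in labels}


examples = [
    ("doc1", {"Food", "Recipe"}),
    ("doc2", {"Sports"}),
    ("doc3", {"Recipe", "Food"}),
]
atomic, class_of = to_atomic_labels(examples)
binary = to_binary_problems(examples, ["Food", "Recipe", "Sports"])
```

In the first transformation the number of classes can grow up to 2^|L|, which is why it is tractable only for small label sets. In the second, each label's data set ignores all other labels, so label correlations are lost.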
8
Issues in Multi-Label Classification
How to reduce training/running cost
  – The number of targets (i.e. subsets) grows exponentially with the size of the label set
How to model correlation of labels
  – Binary classification cannot use features on multiple labels
Classification vs. Ranking
Hierarchical multi-label classification (ex. MeSH term) [Cesa-Bianchi et al. 2006; J. Rousu et al., 2006]
9
Collective Multi-Label Classification
CRF is applied to multi-label classification
Features are defined on pairs of labels
Notation:
  – y_i = 1 if the i-th label ∈ y
  – y_i = 0 otherwise
10
Accounting for Multiple Labels
Binary Model: f_b(x, y) : y_i given x
Collective Multi-Label (CML) model: f_ml(x, y) : y_i and y_j
Collective Multi-Label with Features (CMLF) model: f_mlf(x, y) : y_i and y_j given x
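Since the slides describe these models as CRFs, one way to read the three feature types is as terms in a conditional log-linear model over label subsets. The form below is a sketch under that assumption; the weights λ_k and the index k are notation introduced here, and the slide does not spell out how the feature sets are combined.

```latex
P(y \mid x) =
  \frac{\exp\bigl(\sum_k \lambda_k f_k(x, y)\bigr)}
       {\sum_{y' \subseteq L} \exp\bigl(\sum_k \lambda_k f_k(x, y')\bigr)}
```

Here f_k ranges over whichever feature functions the model uses (f_b, f_ml, or f_mlf), and the denominator sums over all 2^|L| subsets of L.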
11
Parameter Estimation
Enumeration of y is intractable in general
Two approximations:
  – Supported combinations: consider only the label combinations that occur in training data
  – Binary pruned inference: first apply the binary model, then consider only the labels having probabilities above a threshold t
No dynamic programming
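Both approximations replace the full set of 2^|L| subsets with a smaller candidate set. A minimal sketch of how such candidate sets might be built follows. It assumes that binary pruning means enumerating all subsets of the labels that survive the threshold, which is one reading of the slide; the probabilities and the scoring function are made up.

```python
from itertools import combinations


def supported_combinations(training_label_sets):
    """Candidates = only the label combinations observed in the training data."""
    return {frozenset(y) for y in training_label_sets}


def binary_pruned_candidates(label_probs, t):
    """Candidates = all subsets of the labels whose binary-model probability exceeds t."""
    kept = sorted(label for label, p in label_probs.items() if p > t)
    return {
        frozenset(combo)
        for size in range(len(kept) + 1)
        for combo in combinations(kept, size)
    }


def best_candidate(candidates, score):
    """Pick the highest-scoring subset among the candidates only."""
    return max(candidates, key=score)


# Hypothetical binary-model probabilities for one document.
label_probs = {"a": 0.9, "b": 0.4, "c": 0.05}
supported = supported_combinations([{"a", "b"}, {"a"}, {"b", "a"}])
pruned = binary_pruned_candidates(label_probs, t=0.3)
# Hypothetical score: sum of (probability - 0.5) over the labels in the subset.
best = best_candidate(pruned, lambda y: sum(label_probs[l] - 0.5 for l in y))
```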
12
Experiments
Reuters-21578 Modified Apte (ModApte) split
  – 90 labels
  – Training: 9,603 docs, Test: 3,299 docs
  – 8.7% of the documents have multiple labels
OHSUMED “Heart Disease” documents
  – 40 labels assigned to 15-74 training documents
  – 16 labels assigned to 75 or more training documents
13
Results: Reuters-21578

Supported combinations
                      Binary    CML       CMLF
macro-F1              0.4380    0.4478    0.4477
micro-F1              0.8627    0.8659    0.8635
exact match           0.7999    0.8329    0.8316
classification time   1.4 ms    48 ms     78 ms

Binary pruned
                      Binary    CML       CMLF
macro-F1              0.4388    0.4792    0.4760
micro-F1              0.8634    0.8692    0.8701
exact match           0.8000    0.8119    0.8162
classification time   1.4 ms    4.6 ms    4.7 ms
14
Results: OHSUMED

Supported combinations
              Binary    CML       CMLF
macro-F1      0.6483    0.6795    0.6629
micro-F1      0.6849    0.7003    0.6983
exact match   0.4914    0.5925    0.6025

Binary pruned
              Binary    CML       CMLF
macro-F1      0.6482    0.6556    0.6658
micro-F1      0.6849    0.6751    0.6886
exact match   0.4918    0.5226    0.5190
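The three quality measures in these tables can be computed from gold and predicted label sets as sketched below. Macro-F1 averages per-label F1 scores, micro-F1 pools the counts over all labels first, and exact match requires the whole predicted set to equal the gold set. Scoring a label with no gold or predicted occurrences as 0.0 is a convention assumed here, and the toy data is invented.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(gold, pred, labels):
    """Return (macro-F1, micro-F1, exact match) for parallel lists of label sets."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1(*totals)
    exact = sum(set(g) == set(p) for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, exact


gold = [{"a"}, {"a", "b"}, {"b"}]
pred = [{"a"}, {"a"}, {"a", "b"}]
macro, micro, exact = evaluate(gold, pred, ["a", "b"])
```

Macro-F1 weights every label equally, so rare labels affect it as much as frequent ones, while micro-F1 is dominated by frequent labels.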
15
Similar Methods
H. Kazawa et al. Maximal margin labeling for multi-topic text categorization. NIPS 2004.
  – All subsets are considered as atomic labels
  – Approximation by only considering neighbor subsets (subsets that differ in a single label from the gold)
S. Zhu et al. Multi-labelled classification using maximum entropy method. SIGIR 2005.
  – Simply enumerate all subsets, and use f_ml
  – Only evaluated with small label sets (≤ 10)
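The neighbor-subset approximation can be illustrated with a short sketch: a neighbor of the gold label set is obtained by adding or removing exactly one label, so there are only |L| neighbors instead of 2^|L| subsets. The function name and the toy labels are assumptions for illustration.

```python
def neighbor_subsets(gold, labels):
    """Return the subsets that differ from `gold` in exactly one label."""
    gold = frozenset(gold)
    # Symmetric difference toggles one label: removes it if present, adds it if absent.
    return [gold ^ {label} for label in labels]


neighbors = neighbor_subsets({"a"}, ["a", "b", "c"])
```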
16
Discriminative Methods for Multi-Labeled Classification
Cascade binary classifiers (SVM)
Another technique: remove negative instances that are close to the decision boundary
[Diagram: the input text feeds one classifier per label (1, 2, 3, …, |L|); their outputs (+1, +1, +1) feed an ensemble layer that again has one classifier per label (1, 2, 3, …, |L|).]
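One way to read the cascade is as feature augmentation: the first-stage scores for all labels are appended to the input features, and a second classifier per label decides using both. The sketch below shows only that data flow. The hand-written linear scorers stand in for trained SVMs and are purely hypothetical.

```python
def cascade_features(x, first_stage):
    """Append the first-stage score of every label to the original features."""
    return list(x) + [scorer(x) for scorer in first_stage]


def cascade_predict(x, first_stage, second_stage):
    """Second-stage classifiers see the features plus all first-stage scores."""
    z = cascade_features(x, first_stage)
    return [int(clf(z) > 0) for clf in second_stage]


# Hypothetical stand-ins for trained binary classifiers (one per label).
first_stage = [lambda x: x[0] - 0.5, lambda x: x[1] - 0.5]
second_stage = [
    lambda z: z[2],               # label 0: trusts its own first-stage score
    lambda z: z[3] + 1.5 * z[2],  # label 1: also boosted by label 0's score
]

x = [1.0, 0.0]
independent = [int(scorer(x) > 0) for scorer in first_stage]
cascaded = cascade_predict(x, first_stage, second_stage)
```

In this toy case the independent classifiers assign only label 0, while the cascade also assigns label 1 because the second stage can exploit the correlation between the two labels.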
17
Random k-Labelsets
Randomly select size-k subsets from 2^L
Train multi-class classifiers for the subsets
Label a new instance by majority voting
[Diagram: the input text is passed to classifiers Y_1, Y_2, Y_3, …, Y_m, one per size-k subset; their label vectors, e.g. (1,0,0,1,0,…,0,0), (0,1,0,1,0,…,0,1), (1,0,0,0,1,…,0,1), (0,0,0,1,0,…,1,1), are combined by majority voting into (0,0,0,1,0,…,0,1).]
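The sampling and voting steps can be sketched as follows. Each classifier only votes on the labels in its own size-k labelset, and a label is assigned when more than half of the classifiers covering it vote for it. The 0.5 threshold, the function names, and the hard-coded predictions are assumptions for illustration; this simple sampler may also draw the same labelset twice.

```python
import random


def sample_labelsets(labels, k, m, seed=0):
    """Draw m random size-k label subsets (duplicates are possible in this sketch)."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(labels, k)) for _ in range(m)]


def vote(labelsets, predictions, labels, threshold=0.5):
    """Majority voting: each model votes only on the labels in its own labelset."""
    result = set()
    for label in labels:
        votes = [
            label in pred
            for labelset, pred in zip(labelsets, predictions)
            if label in labelset
        ]
        if votes and sum(votes) / len(votes) > threshold:
            result.add(label)
    return result


labels = ["a", "b", "c", "d"]
labelsets = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "d"})]
# Hypothetical outputs of the three multi-class classifiers for one input.
predictions = [{"a"}, {"b"}, {"a", "d"}]
assigned = vote(labelsets, predictions, labels)
```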
18
Other Approaches
Learn a latent model to account for label correlations
  – K. Yu et al. Multi-label informed latent semantic indexing. SIGIR 2005.
  – J. Zhang et al. Learning multiple related tasks using latent independent component analysis. NIPS 2005.
  – V. Roth et al. Improved functional prediction of proteins by learning kernel combinations in multilabel settings. PMSB 2006.
kNN-like algorithms
  – M-L Zhang et al. A k-nearest neighbor based algorithm for multi-label classification. IEEE Conference on Granular Computing. 2005.
  – F. Kang et al. Correlated label propagation with application to multi-label learning. CVPR 2006.
  – K. Brinker et al. Case-based multilabel ranking. IJCAI 2007.
19
Summary
Multi-label classification is an important and interesting problem
Major issues:
  – Label correlation
  – Computational cost
Many methods have been proposed
  – Basically, enhancements of the fundamental methods (subsets as atomic labels, set of binary classifications)
No existing method solves the problem completely
20
Future Directions
Algorithm for exact solution?
Other learning algorithms
  – Via machine learning template library
Structurization of label sets
  – IS-A hierarchy → hierarchical multi-label
  – Exclusive labels
Modeling of label distance
  – Redesign of objective functions
21
Possible Applications
Any keyword assignment task
Substitute for n-best/ranking
Multi-label problems where label sets are not fixed
  – Keyword (key phrase) extraction: choose words/phrases from each document
  – Summarization by sentence extraction
cf. D. Xin et al. Extracting redundancy-aware top-k patterns. KDD 2006.