Multi-label Classification without Multi-label Cost - Multi-label Random Decision Tree Classifier
Authors: Xiatian Zhang, Quan Yuan, Shiwan Zhao, Wei Fan, Wentao Zheng, Zhong Wang
Affiliations: 1. IBM Research – China; 2. IBM T.J. Watson Research Center
Presenter: Xiatian Zhang (xiatianz@cn.ibm.com)
Multi-label Classification
Classical classification (single-label classification)
–The classes are exclusive: if an example belongs to one class, it cannot belong to any other
Multi-label classification
–A picture, video, or article may belong to several compatible categories
–A piece of a gene can control several biological functions
Example labels for one picture: Tree, Lake, Ice, Winter, Park
Existing Multi-label Classification Methods
Grigorios Tsoumakas et al. [2007] summarize the existing methods for multi-label classification
Two strategies:
–Problem Transformation: transform the multi-label classification problem into single-label classification problems
–Algorithm Adaptation: adapt single-label classifiers to solve the multi-label classification problem directly, usually with high complexity
Problem Transformation Approaches
Label Powerset (LP)
–Label Powerset treats each unique subset of labels that occurs in the multi-label dataset as a single class and trains one multi-class classifier
Binary Relevance (BR)
–Binary Relevance learns one binary classifier for each label (e.g. Classifier1 for L1+/L1-, Classifier2 for L2+/L2-, Classifier3 for L3+/L3-), as sketched below
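The two transformation strategies can be illustrated with a small sketch in Python (the function names and the toy label matrix are illustrative, not from the paper):

```python
import numpy as np

def label_powerset_targets(Y):
    """Label Powerset: map each example's set of active labels to one class id."""
    keys = [tuple(row) for row in Y]                      # one key per label subset
    classes = {k: i for i, k in enumerate(sorted(set(keys)))}
    return np.array([classes[k] for k in keys]), classes

def binary_relevance_targets(Y):
    """Binary Relevance: one binary target vector (one classifier) per label."""
    return [Y[:, j] for j in range(Y.shape[1])]

# Toy label matrix: 4 examples, 3 labels L1, L2, L3
Y = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])

lp_y, lp_classes = label_powerset_targets(Y)   # 3 distinct label subsets -> 3 classes
br_ys = binary_relevance_targets(Y)            # 3 binary problems, one per label
```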
Large Number of Labels Problem
Hundreds of labels or more arise in:
–Text categorization
–Protein function classification
–Semantic annotation of multimedia
Impact on multi-label classification methods:
–Label Powerset: the number of training examples per label subset becomes much smaller
–Binary Relevance: the computational complexity grows linearly with the number of labels
–Algorithm Adaptation: even worse than Binary Relevance
HOMER for the Large Number of Labels Problem
HOMER (Hierarchy Of Multilabel classifERs) was developed by Grigorios Tsoumakas et al., 2008.
The HOMER algorithm constructs a hierarchy of multi-label classifiers, each one dealing with a much smaller set of labels.
Our Method – Without Label Cost
Without label cost
–Training time is almost independent of the number of labels |L|
But with reliable quality
–The classification quality is comparable to mainstream methods over different data sets
How do we achieve this?
Our Method – Without Label Cost cont.
Binary Relevance based on Random Decision Trees
Random Decision Tree [Fan et al., 2003]
–The training process is independent of label information
–Random construction with very low cost
–Stable quality on many applications
Random Decision Tree – Tree Construction
At each node, an unused feature is chosen randomly
–A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node
–A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen
Construction stops when one of the following happens:
–A node becomes too small (<= 4 examples)
–The total height of the tree exceeds some limit, such as the total number of features
The construction process does not use any label information (see the sketch below)
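A minimal sketch of this random construction, assuming all features are continuous (the Node class and build_random_tree are illustrative names, not the authors' code):

```python
import random

class Node:
    def __init__(self):
        self.feature = None        # feature index tested at this node
        self.threshold = None      # randomly chosen split threshold
        self.left = self.right = None

def build_random_tree(X, n_features, depth=0, min_size=4, max_depth=None):
    """Grow one random decision tree; no label information is consulted."""
    max_depth = n_features if max_depth is None else max_depth
    node = Node()
    # Stop when the node is too small or the tree is too deep
    if len(X) <= min_size or depth >= max_depth:
        return node
    # Choose a feature at random; a continuous feature may be reused on the
    # same path with a different randomly chosen threshold each time
    node.feature = random.randrange(n_features)
    values = [x[node.feature] for x in X]
    node.threshold = random.uniform(min(values), max(values))
    left  = [x for x in X if x[node.feature] <  node.threshold]
    right = [x for x in X if x[node.feature] >= node.threshold]
    if not left or not right:      # degenerate split: keep this node as a leaf
        node.feature = node.threshold = None
        return node
    node.left  = build_random_tree(left,  n_features, depth + 1, min_size, max_depth)
    node.right = build_random_tree(right, n_features, depth + 1, min_size, max_depth)
    return node
```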
Random Decision Tree - Node Statistics
Classification and probability estimation:
–Each node of the tree keeps the number of examples belonging to each class
Collecting the node statistics costs very little computation (a sketch follows below)
[Figure: example tree with split tests F1<0.5, F2>0.7, F3>0.3 and leaf class counts such as +:200/-:10 and +:30/-:70]
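Continuing the sketch above, the statistics pass only routes each labeled training example to its leaf and increments a counter there (descend, fill_statistics, and the counts attribute are illustrative names):

```python
from collections import Counter

def descend(node, x):
    """Follow the split tests from the root down to a leaf."""
    while node.left is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node

def fill_statistics(root, X, y):
    """Node-statistics pass: count, per leaf, how many examples of each class arrive.
    This is the only step that reads labels, and it is a single cheap scan of the data."""
    for x, label in zip(X, y):
        leaf = descend(root, x)
        if not hasattr(leaf, "counts"):
            leaf.counts = Counter()
        leaf.counts[label] += 1
```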
Random Decision Tree - Classification
During classification, each tree outputs a posterior probability from the leaf the example reaches, e.g. a leaf with counts +:30 / -:70 gives P(+|x) = 30/100 = 0.3
[Figure: the same example tree with split tests F1<0.5, F2>0.7, F3>0.3]
Random Decision Tree - Ensemble
For an instance x, average the probabilities estimated by the individual trees and take the average as the predicted probability for x
–Tree 1: P(+|x) = 30/100 = 0.3; Tree 2: P'(+|x) = 30/50 = 0.6; ensemble: (P(+|x) + P'(+|x))/2 = 0.45
[Figure: two example trees with split tests F1<0.5, F2>0.7, F3>0.3 and F3>0.3, F2<0.6, F1>0.7]
Multi-label Random Decision Tree
Each leaf keeps positive/negative counts for every label, so a single tree estimates a posterior for every label at once
–Tree 1: P(L1+|x) = 30/100 = 0.3, P(L2+|x) = 50/100 = 0.5; Tree 2: P'(L1+|x) = 30/50 = 0.6, P'(L2+|x) = 20/100 = 0.2
–Ensemble: (P(L1+|x) + P'(L1+|x))/2 = 0.45, (P(L2+|x) + P'(L2+|x))/2 = 0.35
[Figure: two example trees with per-leaf counts such as L1+:30/L1-:70 and L2+:50/L2-:50]
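In the multi-label variant, the only change to the sketch is in the leaf statistics and the prediction: each leaf keeps a positive/negative count per label, and the ensemble averages the per-label posteriors over the trees (again, these helper names are illustrative, not the authors' code; descend comes from the earlier sketch):

```python
def fill_multilabel_statistics(root, X, Y, labels):
    """One pass over the training data: per leaf, keep positive/negative counts
    for every label. The tree structure itself is unchanged and label-independent."""
    for x, row in zip(X, Y):                      # row[j] == 1 iff label j is present
        leaf = descend(root, x)
        if not hasattr(leaf, "label_counts"):
            leaf.label_counts = {l: [0, 0] for l in labels}   # [positives, negatives]
        for j, l in enumerate(labels):
            leaf.label_counts[l][0 if row[j] else 1] += 1

def predict_multilabel(trees, x, labels):
    """Average, over all trees, the per-label posterior read off each leaf's counts."""
    probs = {l: 0.0 for l in labels}
    for root in trees:
        leaf = descend(root, x)
        for l in labels:
            pos, neg = leaf.label_counts[l]
            if pos + neg:
                probs[l] += pos / (pos + neg)
    return {l: p / len(trees) for l, p in probs.items()}
```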
Why Does RDT Work?
Ensemble learning view
–Our analysis
–Other explanations
Non-parametric estimation
Complexity of Multi-label Random Decision Tree
Training complexity:
–m is the number of trees and n is the number of instances
–t is the average number of labels on each leaf node, with t << n and t << |L|
–The training cost is independent of the number of labels |L|
–Complexity of C4.5: V_i is the number of distinct values of the i-th attribute
–Complexity of HOMER
Test complexity:
–q is the average depth of the tree branches
–The test cost is also independent of the number of labels |L|
Experiment – Metrics and Datasets
Quality metrics: [table in the original slides]
Datasets: [table in the original slides]
Experiment – Quality
Experiment – Computational Cost
Experiment – Computational Cost cont.
Experiment – Computational Cost cont.
Future Work
–Leverage the relationships among labels
–Apply ML-RDT to recommendation
–Parallelization and streaming implementation