1 Scaling multi-class Support Vector Machines using inter-class confusion
Authors: Shantanu Godbole, Sunita Sarawagi, Soumen Chakrabarti
Advisor: Dr. Hsu
Graduate student: Ching-Wen Hong
2 Content
1. Motivation
2. Objective
3. Introduction: (1) SVM; (2) using SVM to solve multi-class problems; (3) the method presented in this paper
4. Our approach: (1) the hierarchical approach; (2) the GraphSVM algorithm
5. Experimental evaluation
6. Conclusion
7. Personal opinion
3 Motivation
Solve multi-class classification problems.
4 Objective
SVMs excel at two-class discriminative learning problems and achieve high accuracy.
Applying SVMs directly to multi-class problems is difficult because training time is long.
The naïve Bayes (NB) classifier is much faster than an SVM to train.
We propose a new technique for multi-way classification that exploits the accuracy of SVMs and the speed of NB classifiers.
5 Introduction
1. SVM:
Input: a training set S = {(x1, y1), …, (xN, yN)}, where each xi is a vector and yi ∈ {+1, −1}
Output: a classifier f(x) = W·X + b
Example: medical diagnosis
xi = (age, sex, blood, …, genome, …)
yi indicates the risk of cancer.
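The decision rule above can be sketched in a few lines. This is a minimal illustration of the linear form f(x) = W·X + b only; the weight vector and bias below are hypothetical, not learned from any data.

```python
# Minimal sketch of a linear SVM's decision rule f(x) = W.X + b.
# The weights and bias are illustrative placeholders, not trained values.

def decision(w, b, x):
    """Signed score: positive means class +1, negative means class -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

# Hypothetical separating hyperplane x1 - x2 = 0
w, b = [1.0, -1.0], 0.0
print(classify(w, b, [2.0, 1.0]))  # 1
print(classify(w, b, [1.0, 2.0]))  # -1
```

Training an actual SVM chooses W and b to maximize the margin between the two classes; only the resulting decision rule is shown here.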
6 Linear SVM (figure)
7 Linear SVM (figure)
8 (figure)
9 (figure)
10 Linear SVM (figure)
11 2. Using SVM to solve multi-class problems
1. The "one-vs-others" approach
For each of the N classes, we construct a one-vs-others (yes/no) SVM for that class alone.
The winning SVM is the one that says yes and whose margin is largest among all SVMs.
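The one-vs-others rule reduces to an argmax over per-class margins. In this sketch the per-class decision functions are hypothetical stand-ins for trained binary SVMs; the class names are made up for illustration.

```python
# Sketch of the "one-vs-others" decision: one binary scorer per class,
# predicted class = the one with the largest (most positive) margin.
# The scorers below are hypothetical stand-ins for trained SVMs.

def one_vs_others_predict(scorers, x):
    """scorers: {class_label: decision_function}. Winner = largest margin."""
    return max(scorers, key=lambda label: scorers[label](x))

scorers = {
    "sports":   lambda x: x[0] - x[1],  # "yes" when x[0] > x[1]
    "politics": lambda x: x[1] - x[0],
    "science":  lambda x: -1.0,         # always says "no"
}
print(one_vs_others_predict(scorers, [3.0, 1.0]))  # sports
```

Note that taking the argmax also resolves the case where no SVM says yes, or several do: the class with the largest (least negative) margin still wins.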
12 Using SVM to solve multi-class problems
2. The accumulated-votes approach
Construct SVMs between all possible pairs of classes.
The winning class is the one with the largest number of accumulated votes.
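The accumulated-votes (one-vs-one) scheme can be sketched as follows; each pairwise classifier casts one vote, and the class with the most votes wins. The pair classifiers here are hypothetical stand-ins for trained SVMs over a scalar input.

```python
from itertools import combinations
from collections import Counter

# Sketch of the accumulated-votes scheme: one classifier per pair of classes.
# Each pairwise classifier returns the label it votes for on input x.

def vote_predict(pair_clf, labels, x):
    votes = Counter()
    for i, j in combinations(labels, 2):
        votes[pair_clf[(i, j)](x)] += 1  # each pairwise SVM votes for i or j
    return votes.most_common(1)[0][0]

labels = ["a", "b", "c"]
pair_clf = {
    ("a", "b"): lambda x: "a" if x > 0 else "b",
    ("a", "c"): lambda x: "a" if x > 1 else "c",
    ("b", "c"): lambda x: "b" if x > 0 else "c",
}
print(vote_predict(pair_clf, labels, 2))   # "a" wins with 2 votes
```

With N classes this requires N(N−1)/2 pairwise SVMs, each trained on only two classes' data, which is why the scheme is considered for scaling.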
13 3. The method presented in this paper
1. Combine the scalability of NB classifiers w.r.t. the number of classes with the accuracy of SVMs.
First stage: use a multi-class NB classifier to obtain a confusion matrix.
Second stage: use SVMs with the "one-vs-others" approach.
14 Our approach
Confusion matrix: computed using NB on a held-out validation dataset.
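Building the confusion matrix from held-out predictions is straightforward: entry m[t][p] counts validation documents of true class t that the first-stage classifier predicted as class p. The labels and predictions below are a made-up toy example.

```python
# Sketch of building a confusion matrix from a held-out validation set.
# m[t][p] = number of true-class-t documents predicted as class p.

def confusion_matrix(labels, true_labels, predicted_labels):
    m = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(true_labels, predicted_labels):
        m[t][p] += 1
    return m

labels = ["atheism", "religion"]
true_lbls = ["atheism", "atheism", "religion", "religion"]
pred_lbls = ["atheism", "religion", "religion", "religion"]
m = confusion_matrix(labels, true_lbls, pred_lbls)
print(m["atheism"]["religion"])  # 1 atheism doc confused as religion
```

The off-diagonal entries are what matter later: they identify which pairs of classes the fast NB stage confuses.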
15 Our approach (figure)
16 Hierarchical approach
Top level (L1): a classifier (NB or SVM) discriminates amongst the top-level clusters of labels.
Second level (L2): we build multi-class SVMs within each cluster of classes.
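The two-level prediction path can be sketched as below: the L1 classifier routes a document to a cluster of labels, then that cluster's L2 classifier picks the final class. Both classifiers and all cluster/class names here are hypothetical stand-ins.

```python
# Sketch of the hierarchical prediction path: L1 picks a label cluster,
# L2 picks the final class within that cluster. Both are toy stand-ins.

def hierarchical_predict(l1, l2_by_cluster, x):
    cluster = l1(x)                     # L1: NB or SVM over label clusters
    return l2_by_cluster[cluster](x)    # L2: multi-class SVM within cluster

l1 = lambda x: "sci" if x >= 0 else "talk"
l2_by_cluster = {
    "sci":  lambda x: "sci.med" if x > 5 else "sci.space",
    "talk": lambda x: "talk.politics",
}
print(hierarchical_predict(l1, l2_by_cluster, 7))   # sci.med
```

The weakness evaluated in the next slides is visible in this structure: an L1 routing error can never be recovered by L2, so the two error rates compound.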
17 Evaluation of the hierarchical approach
We compare four methods:
MCNB (one-vs-others)
MCSVM (one-vs-others)
Hier-NB (L1: NB, L2: NB)
Hier-SVM (L1: NB, L2: SVM)
18 Evaluation of the hierarchical approach (figure)
19 Evaluation of the hierarchical approach (figure)
20 Evaluation of the hierarchical approach
NB-L2 (89.01%) combined with NB-L1 (93.56%) gives Hier-NB only 83.28%, versus 85.27% for flat MCNB.
SVM-L2 with NB-L1 (92.04%) gives Hier-SVM only 86.12%, versus 89.66% for flat MCSVM.
The main reason for the low accuracy of the hierarchical approaches is the compounding of errors at the two levels.
This led us to design a new algorithm, GraphSVM.
21 The GraphSVM algorithm
1. Obtain the confusion matrix from a fast multi-class NB classifier M1.
For each class i, F(i) = {classes mis-classified as class i beyond a threshold of t%}.
In Figure 1, with i = alt.atheism and t = 3%, F(alt.atheism) = {talk.religion.misc, soc.religion.christian}.
2. Train a multi-class classifier M2(i) to distinguish among the classes {i} ∪ F(i).
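Step 1 can be sketched as below: from the NB confusion matrix, collect for each class i the set F(i) of classes whose validation documents are mis-classified as i more than t% of the time. The matrix entries here are hypothetical fractions chosen to reproduce the slide's alt.atheism example, not the paper's actual numbers.

```python
# Sketch of GraphSVM step 1: build the confused set F(i) from the
# confusion matrix of the fast first-stage classifier M1.
# conf[j][i] = (hypothetical) fraction of class-j docs that M1 labels as i.

def confused_set(conf, i, t):
    """Classes j != i mis-classified as i more than fraction t of the time."""
    return {j for j in conf if j != i and conf[j].get(i, 0.0) > t}

conf = {
    "alt.atheism":            {"alt.atheism": 0.90},
    "talk.religion.misc":     {"alt.atheism": 0.08, "talk.religion.misc": 0.85},
    "soc.religion.christian": {"alt.atheism": 0.05, "soc.religion.christian": 0.90},
    "rec.sport.hockey":       {"alt.atheism": 0.01, "rec.sport.hockey": 0.97},
}
print(sorted(confused_set(conf, "alt.atheism", 0.03)))
# ['soc.religion.christian', 'talk.religion.misc']
```

Step 2 then trains one small multi-class classifier M2(i) per class over {i} ∪ F(i), so each SVM sees only a handful of mutually confusable classes instead of all of them.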
22 Experimental evaluation
1. Datasets
20-newsgroups: 18,828 articles from 20 Usenet groups. We randomly chose 70% of the documents for training and 30% for testing.
Reuters-21578: 135 classes, 8,819 training documents, and 1,887 test documents.
23 Overall comparison (figure)
24 Scalability with number of classes (figure)
25 Scalability with number of classes (figure)
26 Scalability with training set size (figure)
27 Effect of the threshold parameter (figure)
28 Conclusion
GraphSVM is accurate and efficient on multi-class problems.
GraphSVM outperforms SVMs w.r.t. training time and memory requirements.
GraphSVM is very simple to understand and requires negligible coding, yet it is useful for very large classifiers (tens of thousands of classes and millions of instances).
29 Personal opinion
GraphSVM may perform worse at high positive values of the threshold t.
It is nice that the accuracy of GraphSVM is not strongly affected by the threshold t.