Scaling multi-class Support Vector Machines using inter-class confusion. Authors: Shantanu Godbole, Sunita Sarawagi, Soumen Chakrabarti.


1 Scaling multi-class Support Vector Machines using inter-class confusion. Authors: Shantanu Godbole, Sunita Sarawagi, Soumen Chakrabarti. Advisor: Dr. Hsu. Graduate: Ching-Wen Hong.

2 Content
1. Motivation
2. Objective
3. Introduction: (1) SVM; (2) Using SVM to solve multi-class problems; (3) The method presented in this paper
4. OUR APPROACH: (1) Hierarchical Approach; (2) The GraphSVM algorithm
5. Experimental evaluation
6. Conclusion
7. Personal opinion

3 Motivation Solve multi-class classification problems.

4 Objective SVMs excel at two-class discriminative learning problems and achieve high accuracy. However, SVMs are hard to apply to multi-class problems because training time is long. The naïve Bayes (NB) classifier is much faster than SVMs to train. We propose a new technique for multi-way classification that exploits the accuracy of SVMs and the speed of NB classifiers.

5 Introduction 1. SVM: Input: a training set S = {(x1, y1), ..., (xN, yN)}, where each xi is a feature vector and yi ∈ {+1, -1}. Output: a classifier f(x) = W·X + b. Example: medical diagnosis, where xi = (age, sex, blood, ..., genome, ...) and yi indicates the risk of cancer.
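The Linear SVM slides that follow are figures not captured in the transcript; as a standard reference point (textbook formulation, not taken from the slides), the hard-margin linear SVM behind f(x) = W·X + b solves:

```latex
% Standard hard-margin linear SVM: maximise the margin by minimising the
% weight norm, subject to every training point lying on the correct side
% of the separating hyperplane.
\begin{aligned}
\min_{W,\; b} \quad & \tfrac{1}{2}\,\lVert W \rVert^2 \\
\text{s.t.} \quad   & y_i \left( W \cdot x_i + b \right) \ge 1, \qquad i = 1, \dots, N
\end{aligned}
% The classifier is f(x) = \operatorname{sign}(W \cdot x + b);
% the resulting margin width is 2 / \lVert W \rVert.
```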

6-10 Linear SVM (figure slides; content not captured in the transcript)

11 2. Using SVM to solve multi-class problems. 1. The "one-vs-others" approach: for each of the N classes, we construct a one-vs-others (yes/no) SVM for that class alone. The winning SVM is the one that says yes and whose margin is largest among all SVMs.
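A minimal sketch of the one-vs-others scheme using scikit-learn (an illustration, not the paper's implementation; the iris dataset stands in for any multi-class corpus):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)         # any multi-class dataset works here
ovr = OneVsRestClassifier(LinearSVC())    # one yes/no SVM per class
ovr.fit(X, y)

scores = ovr.decision_function(X)         # per-class margins, shape (n_samples, n_classes)
pred = scores.argmax(axis=1)              # winner: the SVM with the largest margin
```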

12 Using SVM to solve multi-class problems. 2. The accumulated-votes approach: construct SVMs between all possible pairs of classes. The winning class is the one with the largest number of accumulated votes.
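A matching sketch of the pairwise voting scheme, again with scikit-learn (illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(LinearSVC())     # one SVM per class pair: N(N-1)/2 in total
ovo.fit(X, y)
pred = ovo.predict(X)                     # each pairwise SVM votes; most votes wins
```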

13 3. The method presented in this paper. Combine the scalability of NB classifiers w.r.t. the number of classes with the accuracy of SVMs. First stage: use a multi-class NB classifier to obtain a confusion matrix. Second stage: use SVMs with the "one-vs-others" approach.

14 OUR APPROACH Confusion matrix: built by running the NB classifier on a held-out validation dataset.
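A hedged sketch of this step with scikit-learn: train a fast multinomial NB text classifier and read the confusion matrix off a held-out split (the 70/30 split ratio is taken from the experiments slide; everything else is illustrative):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

data = fetch_20newsgroups(subset="train")             # same corpus as the experiments
X = CountVectorizer().fit_transform(data.data)
X_tr, X_val, y_tr, y_val = train_test_split(X, data.target, test_size=0.3)

nb = MultinomialNB().fit(X_tr, y_tr)                  # fast multi-class NB classifier
cm = confusion_matrix(y_val, nb.predict(X_val))       # cm[j, i]: true class j predicted as i
```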

15 OUR APPROACH

16 Hierarchical Approach Top-level (L1): a classifier (NB or SVM) discriminates amongst the top-level clusters of labels. Second level (L2): we build multi-class SVMs within each cluster of classes.
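A hedged sketch of the two-level flow (the label partition `clusters` is assumed given, e.g. derived from the NB confusion matrix, and every cluster is assumed to hold at least two classes):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def train_hierarchy(X, y, clusters):
    """clusters: list of label sets partitioning the classes appearing in y."""
    cluster_of = {c: k for k, cl in enumerate(clusters) for c in cl}
    # L1: route each document to a cluster of labels
    l1 = MultinomialNB().fit(X, np.array([cluster_of[c] for c in y]))
    # L2: one multi-class SVM inside each cluster
    l2 = {}
    for k, cl in enumerate(clusters):
        mask = np.isin(y, list(cl))
        l2[k] = LinearSVC().fit(X[mask], y[mask])
    return l1, l2

def predict_hierarchy(l1, l2, X):
    ks = l1.predict(X)                       # which cluster each document falls in
    out = np.empty(len(ks), dtype=int)
    for k in np.unique(ks):
        idx = np.where(ks == k)[0]
        out[idx] = l2[k].predict(X[idx])     # final class from that cluster's SVM
    return out
```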

17 Evaluation of the hierarchical approach We compare four methods: MCNB (one-vs-others), MCSVM (one-vs-others), Hier-NB (L1: NB, L2: NB), Hier-SVM (L1: NB, L2: SVM).

18-19 Evaluation of the hierarchical approach (result tables; content not captured in the transcript)

20 Evaluation of the hierarchical approach NB-L2 (89.01%) combined with NB-L1 (93.56%) yields Hier-NB at 83.28%, below MCNB at 85.27%. SVM-L2 (92.04%) with NB-L1 yields Hier-SVM at 86.12%, below MCSVM at 89.66%. The main reason for the low accuracy of the hierarchical approaches is the compounding of errors at the two levels. This led us to design a new algorithm, GraphSVM.

21 The GraphSVM algorithm 1. Obtain the confusion matrix from a fast multi-class NB classifier M1. For each class i, let F(i) = {classes mis-classified as class i more than a threshold t% of the time}. In Figure 1, with i = alt.atheism and t = 3%, F(alt.atheism) = {talk.religion.misc, soc.religion.christian}. 2. Train a multi-class classifier M2(i) to distinguish among the classes {i} ∪ F(i).
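A hedged sketch of step 1: reading the confused-class neighbourhoods F(i) off the confusion matrix (the function name and layout are illustrative, not from the paper):

```python
import numpy as np

def confused_neighbours(cm, t=0.03):
    """cm[j, i] = #validation docs of true class j predicted as class i; t is the threshold."""
    rates = cm / cm.sum(axis=1, keepdims=True)     # row-normalise counts into rates
    n = cm.shape[0]
    return {i: {j for j in range(n)                # F(i): classes confused as i above t
                if j != i and rates[j, i] > t}
            for i in range(n)}

# Step 2 (not shown): for each class i, train a multi-class SVM M2(i) on the
# training documents of the classes {i} | F(i); at test time NB picks i first,
# then M2(i) makes the final call.
```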

22 Experimental evaluation 1. Datasets: 20-newsgroups: 18,828 articles from 20 Usenet groups; we randomly chose 70% of the documents for training and 30% for testing. Reuters-21578: 135 classes, 8,819 training documents and 1,887 test documents.

23 Overall comparison

24-25 Scalability with number of classes (plots; content not captured in the transcript)

26 Scalability with training set size

27 Effect of the threshold parameter

28 Conclusion GraphSVM is accurate and efficient on multi-class problems. GraphSVM outperforms SVMs w.r.t. training time and memory requirements. GraphSVM is very simple to understand and requires negligible coding, yet it is useful for very large classification tasks (tens of thousands of classes and millions of instances).

29 Personal opinion GraphSVM may get worse at high positive values of the threshold t. It is nice that, in the experiments, the accuracy of GraphSVM is not much affected by the threshold t.