Incremental Reduced Support Vector Machines Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica.

Presentation transcript:

Incremental Reduced Support Vector Machines Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica. 2003 International Conference on Informatics, Cybernetics, and Systems, ISU, Kaohsiung, Dec. 2003

Outline
- Support Vector Machines for classification problems
  - Linear and nonlinear SVMs
- Difficulties with nonlinear SVMs for large problems
  - Storage and computational complexity
- Reduced Support Vector Machines
- Incremental Reduced Support Vector Machines
- Numerical results
- Conclusions

Support Vector Machines (SVMs): Powerful Tools for Data Mining
- SVMs have a sound theoretical foundation
  - Based on statistical learning theory
- SVMs can be trained very efficiently and achieve high accuracy
- SVMs find an optimally defined separating surface
- SVMs have become one of the most promising learning algorithms for classification and regression
- SVMs extend from the linear to the nonlinear case
  - By using kernel functions

Support Vector Machines for Classification: Maximizing the Margin between the Bounding Planes
(figure: the two classes A+ and A-, each bounded by one of two parallel bounding planes; the separating plane lies between them, and the margin between the bounding planes is maximized)
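To spell out the geometry the figure conveys (a standard formulation; the symbols $w$ and $\gamma$ are assumed here rather than read off the slide), the two bounding planes and the margin between them are:

```latex
\begin{align*}
  x^{\top} w &= \gamma + 1 && \text{(bounding plane for } A^{+}\text{)}\\
  x^{\top} w &= \gamma - 1 && \text{(bounding plane for } A^{-}\text{)}\\
  \text{margin} &= \frac{2}{\lVert w \rVert_{2}} && \text{(distance between the two planes)}
\end{align*}
```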

Support Vector Machine Formulation
- Solve the quadratic program for some $\nu > 0$:
  (QP) $\min_{(w,\gamma,y)}\ \frac{\nu}{2}\,y^{\top}y + \frac{1}{2}\left(w^{\top}w + \gamma^{2}\right)$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
  where the diagonal matrix $D$ with $D_{ii} = \pm 1$ denotes the class membership of each training point $A_i$, $e$ is a vector of ones, and $y$ is the slack vector
- SSVM (Smooth Support Vector Machine), proposed by Yuh-Jye Lee, is an efficient algorithm for solving this problem

Nonlinear Support Vector Machine
- Extend to nonlinear cases by using kernel functions
- Nonlinear SVM formulation:
  $\min_{(u,\gamma,y)}\ \frac{\nu}{2}\,y^{\top}y + \frac{1}{2}\left(u^{\top}u + \gamma^{2}\right)$ subject to $D(K(A, A^{\top})u - e\gamma) + y \ge e$, $y \ge 0$
- The value of the kernel function, $K(A, A^{\top})_{ij}$, represents the inner product of the training points $A_i$ and $A_j$ in the feature space
- The kernel implicitly maps the data from the input space to a higher dimensional feature space, where the data can be separated linearly
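As a concrete illustration, here is a minimal numpy sketch of a Gaussian (RBF) kernel matrix computation; the kernel choice and the width parameter `gamma` are assumptions, since the slide does not fix a particular kernel:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Gaussian kernel: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).

    A is (m, n) and B is (k, n); the result is an (m, k) kernel matrix.
    With B = A this is the fully dense m x m kernel of the nonlinear SVM."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    sq = (A**2).sum(axis=1)[:, None] - 2.0 * A @ B.T + (B**2).sum(axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))

A = np.random.randn(500, 10)
K = gaussian_kernel(A, A)   # 500 x 500, i.e. m^2 = 250,000 entries
```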

Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel $K(A, A^{\top})$ is fully dense
  - Long CPU time to compute its $m^{2}$ entries
  - Runs out of memory while storing the $m \times m$ kernel matrix
- The separating surface depends on almost the entire dataset
  - Need to store the entire dataset after solving the problem
- The computational complexity of the nonlinear SSVM depends on $m$, the number of training points

Reduced Support Vector Machines: Overcoming Computational & Storage Difficulties by Using a Rectangular Kernel
- Choose a small random sample $\bar{A}$ of the dataset $A$
  - The small random sample is a representative sample of the entire dataset
  - Typically $\bar{A}$ is 1% to 10% of the rows of $A$
- Replace the full kernel $K(A, A^{\top})$ in the nonlinear SSVM by the rectangular kernel $K(A, \bar{A}^{\top})$, with the correspondingly reduced variable $\bar{u}$
- Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel (see the sketch below)
- The computational complexity is reduced accordingly, since it now depends on $\bar{m}$ rather than $m$
- The nonlinear separator only depends on $\bar{A}$
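A minimal sketch of the rectangular kernel, reusing `gaussian_kernel` from the earlier sketch (the 5% sampling rate is one choice within the 1%-10% range mentioned above):

```python
import numpy as np
# Uses gaussian_kernel as defined in the earlier sketch.

rng = np.random.default_rng(0)
m, n = 5000, 10
A = rng.standard_normal((m, n))

idx = rng.choice(m, size=m // 20, replace=False)   # 5% random sample of the rows
A_bar = A[idx]

K_reduced = gaussian_kernel(A, A_bar)
print(K_reduced.shape)   # (5000, 250): m * m_bar numbers instead of m^2
```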

The reduced set plays the most important role in RSVM
- It is natural to raise two questions:
  - Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance?
  - Is there a mechanism that determines the size of the reduced set automatically or dynamically?
- Incremental Reduced Support Vector Machines (IRSVM) are proposed to answer these questions

Our Observations (I)
- The nonlinear separating surface is a linear combination of a set of kernel functions: $K(x^{\top}, \bar{A}^{\top})\,\bar{u} = \gamma$
- If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited
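A quick numerical illustration of this observation (a hypothetical demo, not from the slides): kernel functions centered at nearly identical points produce nearly parallel kernel columns, so together they span little more than a single direction:

```python
import numpy as np
# Uses gaussian_kernel as defined in the earlier sketch.

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
# 20 near-duplicate kernel centers, all within ~1e-3 of the same point.
centers = rng.standard_normal((1, 5)) + 1e-3 * rng.standard_normal((20, 5))

K = gaussian_kernel(A, centers)        # 200 x 20 matrix of kernel columns
s = np.linalg.svd(K, compute_uv=False)
print(s[0], s[1])                      # s[1] << s[0]: the 20 columns span ~1 direction
```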

Our Observations (II)
- Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current set of kernel functions
- Such points contribute the most extra information

How to Measure Dissimilarity? Solving a Least Squares Problem
- The information criterion: the distance from the candidate kernel vector $K(A, a^{\top})$ to the column space of the current reduced kernel matrix $K(A, \bar{A}^{\top})$ is greater than a threshold
- This distance can be determined by solving a least squares problem

Dissimilarity Measurement: Solving a Least Squares Problem
- Solve $\min_{\beta}\ \lVert K(A, \bar{A}^{\top})\,\beta - K(A, a^{\top}) \rVert_{2}^{2}$
- When $K(A, \bar{A}^{\top})$ has full column rank, this problem has the unique solution $\beta^{*} = \big(K(A,\bar{A}^{\top})^{\top} K(A,\bar{A}^{\top})\big)^{-1} K(A,\bar{A}^{\top})^{\top} K(A, a^{\top})$, and the distance is $\lVert K(A, a^{\top}) - K(A, \bar{A}^{\top})\,\beta^{*} \rVert_{2}$
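In numpy this distance test is a short wrapper around `lstsq` (a minimal sketch; the function and argument names are hypothetical):

```python
import numpy as np

def dissimilarity(K_red, k_new):
    """Distance from a candidate kernel vector k_new = K(A, a') to the
    column space of the current reduced kernel matrix K_red = K(A, A_bar')."""
    beta, *_ = np.linalg.lstsq(K_red, k_new, rcond=None)   # least squares solution
    return np.linalg.norm(k_new - K_red @ beta)            # residual norm = distance
```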

IRSVM Algorithm Pseudo-code (sequential version)
1 Randomly choose two data points from the training data as the initial reduced set
2 Compute the reduced kernel matrix
3 For each data point not in the reduced set:
4   Compute its kernel vector
5   Compute the distance from the kernel vector
6   to the column space of the current reduced kernel matrix
7   If its distance exceeds a certain threshold:
8     Add this point to the reduced set and form the new reduced kernel matrix
9 Until several successive failures happen at line 7
10 Solve the QP problem of the nonlinear SVM with the obtained reduced kernel
11 A new data point is classified by the resulting separating surface
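A runnable sketch of the selection loop in lines 1-9 (the final QP solve of line 10 is omitted; `kernel`, `threshold`, and `max_failures` are assumed parameters, e.g. the `gaussian_kernel` sketched earlier):

```python
import numpy as np

def select_reduced_set(A, kernel, threshold, max_failures=20, seed=0):
    """Sequential IRSVM reduced-set selection (lines 1-9 of the pseudo-code)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = list(order[:2])                       # line 1: two random initial points
    K_red = kernel(A, A[reduced])                   # line 2: reduced kernel matrix
    failures = 0
    for i in order[2:]:                             # line 3
        k_i = kernel(A, A[[i]])[:, 0]               # line 4: candidate's kernel vector
        beta, *_ = np.linalg.lstsq(K_red, k_i, rcond=None)
        dist = np.linalg.norm(k_i - K_red @ beta)   # lines 5-6: distance to column space
        if dist > threshold:                        # line 7
            reduced.append(i)                       # line 8: grow the reduced kernel
            K_red = np.hstack([K_red, k_i[:, None]])
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:            # line 9: several successive failures
                break
    return reduced, K_red
```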

Speeding up IRSVM
- Note that we have to solve the least squares problem many times; for an $m \times \bar{m}$ reduced kernel matrix, its time complexity is $O(m\,\bar{m}^{2})$
- The main cost depends on the current reduced kernel matrix, not on the candidate kernel vector
- Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once

IRSVM Algorithm Pseudo-code (batch version)
1 Randomly choose two data points from the training data as the initial reduced set
2 Compute the reduced kernel matrix
3 For each batch of data points not in the reduced set:
4   Compute their kernel vectors
5   Compute the corresponding distances from these kernel vectors
6   to the column space of the current reduced kernel matrix
7   For those points whose distance exceeds a certain threshold:
8     Add those points to the reduced set and form the new reduced kernel matrix
9 Until no data point in a batch is added in lines 7-8
10 Solve the QP problem of the nonlinear SVM with the obtained reduced kernel
11 A new data point is classified by the resulting separating surface
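The batch distance test can share one least squares solve across all candidates by passing a matrix right-hand side (a sketch under the same assumptions as above; names are hypothetical):

```python
import numpy as np

def batch_distances(K_red, K_batch):
    """Distances from each column of K_batch (the kernel vectors of a batch of
    candidate points) to the column space of K_red, via a single lstsq call."""
    Beta, *_ = np.linalg.lstsq(K_red, K_batch, rcond=None)  # matrix right-hand side
    resid = K_batch - K_red @ Beta
    return np.linalg.norm(resid, axis=0)

# Example use inside the batch loop above:
# keep = np.where(batch_distances(K_red, K_batch) > threshold)[0]
```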

Numerical Results: IRSVM on four public data sets (results table not reproduced in the transcript)

Conclusions
- IRSVM: an improved algorithm built on RSVM
- Starts with an extremely small reduced set and sequentially expands it to include informative data points
- Determines the size of the reduced set automatically and dynamically; no pre-specified size is needed
- The reduced set generated by IRSVM is more representative than a random one
- All advantages of RSVM for dealing with large-scale nonlinear classification problems are retained
- Experimental tests show that IRSVM uses a smaller reduced set without sacrificing classification accuracy