1
RSVM: Reduced Support Vector Machines
Y.-J. Lee & O. L. Mangasarian
First SIAM International Conference on Data Mining
Chicago, April 6, 2001
University of Wisconsin-Madison
2
Outline of Talk
- What is a support vector machine (SVM) classifier?
- The smooth support vector machine (SSVM): a new SVM solvable without an optimization package
- Difficulties with nonlinear SVMs:
  - Computational: handling the massive kernel matrix
  - Storage: the separating surface depends on almost the entire dataset
- Reduced Support Vector Machines (RSVMs):
  - Reduced kernel: a much smaller rectangular matrix, 1% to 10% of the full kernel matrix
  - Speeds computation & reduces storage
- Numerical results: e.g., a 32,562-point dataset classified in 17 minutes, compared to 2.15 hours by a standard algorithm (SMO)
3
What is a Support Vector Machine?
- An optimally defined separating surface
- Typically nonlinear in the input space
- Linear in a higher-dimensional space
- Implicitly defined by a kernel function
4
What are Support Vector Machines Used For?
- Classification
- Regression & data fitting
- Supervised & unsupervised learning
(We will concentrate on classification.)
5
Geometry of the Classification Problem: The 2-Category Linearly Separable Case
[Figure: two linearly separable point sets, A+ and A-.]
6
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the point sets A+ and A- separated by two parallel bounding planes; the margin is the distance between them.]
7
Support Vector Machines: Formulation

Given the data matrix $A \in \mathbb{R}^{m \times n}$ and the diagonal matrix $D$ of $\pm 1$ entries denoting $A+$ or $A-$ membership, solve the quadratic program for some $\nu > 0$:

$$\min_{(w,\gamma,y)} \ \frac{\nu}{2}\,y'y + \frac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ y \ge 0 \qquad \text{(QP)}$$

The margin between the bounding planes $x'w = \gamma \pm 1$ is maximized by minimizing $\frac{1}{2}(w'w + \gamma^2)$.
8
SVM as an Unconstrained Minimization Problem

At the solution of (QP): $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+$ replaces negative components by zeros. Hence (QP) is equivalent to the nonsmooth SVM:

$$\min_{(w,\gamma)} \ \frac{\nu}{2}\,\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}(w'w + \gamma^2)$$
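A minimal numpy sketch of this unconstrained objective (the names `A`, `d`, and `nu` below are illustrative, not from the talk):

```python
import numpy as np

def nonsmooth_svm_objective(w, gamma, A, d, nu):
    """nu/2 * ||(e - D(Aw - e*gamma))_+||^2 + 1/2 * (w'w + gamma^2),
    where A is the (m, n) data matrix and d the (m,) vector of +/-1 labels."""
    residual = 1.0 - d * (A @ w - gamma)   # e - D(Aw - e*gamma)
    plus = np.maximum(residual, 0.0)       # the nonsmooth plus function
    return 0.5 * nu * plus @ plus + 0.5 * (w @ w + gamma**2)
```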
9
SSVM: The Smooth Support Vector Machine

Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(x, \alpha) = x + \frac{1}{\alpha}\log(1 + \exp(-\alpha x))$ gives our SSVM:

$$\min_{(w,\gamma)} \ \frac{\nu}{2}\,\|p(e - D(Aw - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(w'w + \gamma^2)$$

Here $p(x, \alpha)$ is an accurate smooth approximation of $x_+$, obtained by integrating the sigmoid function $\frac{1}{1 + \exp(-\alpha x)}$ of neural networks (the sigmoid is a smoothed step function). The solution of SSVM converges to the solution of the nonsmooth SVM as $\alpha$ goes to infinity (typically, $\alpha = 5$).
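A small numerical check of the approximation (the values of $\alpha$ are illustrative): the largest gap between $p(x, \alpha)$ and $x_+$ occurs at $x = 0$ and equals $\log(2)/\alpha$, so it shrinks as $\alpha$ grows.

```python
import numpy as np

def smooth_plus(x, alpha=5.0):
    """p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x)): the integral of the
    sigmoid 1/(1 + exp(-alpha*x)) and a smooth approximation of max(x, 0)."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha  # stable log(1 + exp(.))

x = np.linspace(-2.0, 2.0, 401)
for alpha in (1.0, 5.0, 50.0):
    gap = np.max(np.abs(smooth_plus(x, alpha) - np.maximum(x, 0.0)))
    print(f"alpha = {alpha:5.1f}   max |p - plus| = {gap:.4f}")  # = log(2)/alpha
```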
10
Nonlinear Smooth Support Vector Machine

Use a nonlinear kernel $K(A, A')$ in SSVM:

$$\min_{(u,\gamma)} \ \frac{\nu}{2}\,\|p(e - D(K(A, A')Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(u'u + \gamma^2)$$

- Nonlinear separating surface: $K(x', A')Du = \gamma$
- The kernel matrix $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
- Use the Newton algorithm to solve the problem; each iteration solves $m+1$ linear equations in $m+1$ variables
- The nonlinear separating surface depends on the entire dataset $A$
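A hedged numpy sketch of this objective in kernel form (it assumes a precomputed dense kernel matrix `K`; the talk's Newton solver is not reproduced here):

```python
import numpy as np

def smooth_plus(x, alpha=5.0):
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def nonlinear_ssvm_objective(u, gamma, K, d, nu=1.0, alpha=5.0):
    """nu/2 * ||p(e - D(K D u - e*gamma), alpha)||^2 + 1/2 * (u'u + gamma^2),
    with K the dense (m, m) kernel matrix K(A, A') and d the +/-1 labels.
    The m+1 unknowns (u, gamma) are what each Newton iteration updates."""
    r = smooth_plus(1.0 - d * (K @ (d * u) - gamma), alpha)
    return 0.5 * nu * r @ r + 0.5 * (u @ u + gamma**2)
```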
11
Examples of Kernels

- Linear kernel: $K(x, y) = x'y$
- Polynomial kernel: $K(x, y) = (x'y + 1)^d$, where $d$ is a positive integer
- Gaussian (radial basis) kernel: $K(x, y) = \exp(-\mu\|x - y\|_2^2)$, with $\mu > 0$
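Sketches of the three kernels in numpy, applied to whole data matrices at once (the default parameter values are illustrative):

```python
import numpy as np

def linear_kernel(A, B):
    """K(A, B') = A B'."""
    return A @ B.T

def polynomial_kernel(A, B, d=3):
    """(A B' + 1)^d taken elementwise, d a positive integer."""
    return (A @ B.T + 1.0) ** d

def gaussian_kernel(A, B, mu=1.0):
    """K_ij = exp(-mu * ||A_i - B_j||^2), computed without explicit loops."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))  # clamp tiny negatives from rounding
```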
12
Difficulties with Nonlinear SVM for Large Problems

- The nonlinear kernel $K(A, A')$ is fully dense
  - Long CPU time to compute the $m^2$ numbers of the kernel matrix
  - Runs out of memory while storing the $m \times m$ kernel matrix
  - Computational complexity of the nonlinear SSVM depends on $m$: each Newton iteration solves $m+1$ linear equations in $m+1$ variables
- The separating surface depends on almost the entire dataset
  - Need to store the entire dataset after solving the problem
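To make the storage difficulty concrete, a back-of-the-envelope calculation using the largest Adult training set size from the talk:

```python
m = 32562                      # largest UCI Adult training set size in the talk
bytes_full = m * m * 8         # one float64 per entry of the dense m x m kernel
print(f"{bytes_full / 1e9:.1f} GB")  # ~8.5 GB just to hold K(A, A')
```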
13
Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel

- Choose a small random sample $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the rows of $A$
- The small random sample $\bar{A}$ is a representative sample of the entire dataset; typically $\bar{m}$ is 1% to 10% of the rows of $A$
- Replace $K(A, A')$ by the rectangular kernel $K(A, \bar{A}')$, with corresponding $\bar{u} \in \mathbb{R}^{\bar{m}}$, in the nonlinear SSVM
- Only need to compute and store $m \times \bar{m}$ numbers for $K(A, \bar{A}')$
- Computational complexity reduces accordingly: each Newton iteration now solves $\bar{m}+1$ linear equations in $\bar{m}+1$ variables
- The nonlinear separator depends only on $\bar{A}$
- Using $K(\bar{A}, \bar{A}')$ alone (i.e., training only on the small sample) gives lousy results!
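A small sketch of forming the rectangular kernel from a 1% random row sample (the random data and Gaussian kernel are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10000, 20
A = rng.standard_normal((m, n))                    # stand-in data matrix

mbar = m // 100                                    # 1% random sample
Abar = A[rng.choice(m, size=mbar, replace=False)]

# Rectangular Gaussian kernel K(A, Abar'): m x mbar instead of m x m
sq = (A**2).sum(1)[:, None] + (Abar**2).sum(1)[None, :] - 2.0 * A @ Abar.T
K = np.exp(-np.maximum(sq, 0.0))
print(K.shape)  # (10000, 100): 100x fewer entries than the full kernel
```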
14
Reduced Support Vector Machine Algorithm

Nonlinear separating surface: $K(x', \bar{A}')\bar{D}\bar{u} = \gamma$

(i) Choose a random subset matrix $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the entire data matrix $A \in \mathbb{R}^{m \times n}$.

(ii) Solve the following problem by the Newton method, with $\bar{D}$ the corresponding $\bar{m} \times \bar{m}$ diagonal label matrix:

$$\min_{(\bar{u},\gamma)} \ \frac{\nu}{2}\,\|p(e - D(K(A, \bar{A}')\bar{D}\bar{u} - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(\bar{u}'\bar{u} + \gamma^2)$$

(iii) The separating surface is defined by the optimal solution $(\bar{u}, \gamma)$ of step (ii): $K(x', \bar{A}')\bar{D}\bar{u} = \gamma$.
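An end-to-end sketch of steps (i)-(iii), assuming a Gaussian kernel; scipy's general-purpose BFGS minimizer stands in for the paper's Newton method, and all parameter values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, mu=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

def smooth_plus(x, alpha=5.0):
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def rsvm_train(A, d, mbar, nu=1.0, alpha=5.0, mu=1.0, seed=0):
    """(i) Sample Abar; (ii) minimize the smooth RSVM objective over (ubar, gamma)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(A), size=mbar, replace=False)
    Abar, dbar = A[idx], d[idx]
    K = gaussian_kernel(A, Abar, mu)               # rectangular m x mbar kernel

    def objective(z):                              # z packs (ubar, gamma)
        ubar, gamma = z[:-1], z[-1]
        r = smooth_plus(1.0 - d * (K @ (dbar * ubar) - gamma), alpha)
        return 0.5 * nu * r @ r + 0.5 * (ubar @ ubar + gamma**2)

    z = minimize(objective, np.zeros(mbar + 1), method="BFGS").x
    return Abar, dbar, z[:-1], z[-1]

def rsvm_classify(X, Abar, dbar, ubar, gamma, mu=1.0):
    """(iii) Sign of K(x', Abar') Dbar ubar - gamma for each row x of X."""
    return np.sign(gaussian_kernel(X, Abar, mu) @ (dbar * ubar) - gamma)
```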
15
How to Choose $\bar{A}$ in RSVM?

- $\bar{A}$ is a representative sample of the entire dataset
- $\bar{A}$ need not be a subset of $A$
- A good selection of $\bar{A}$ may generate a classifier using a very small $\bar{m}$
- Possible ways to choose $\bar{A}$ (the last option is sketched below):
  - Choose $\bar{m}$ random rows from the entire dataset $A$
  - Choose $\bar{A}$ such that the distance between its rows exceeds a certain tolerance
  - Use $k$ cluster centers of $A+$ and $A-$ as $\bar{A}$
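A sketch of the clustering option using scikit-learn's KMeans (an assumed dependency; `k_per_class` is an illustrative parameter). The resulting rows are cluster centers, so this $\bar{A}$ is not a subset of $A$:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_centers_Abar(A, d, k_per_class=25, seed=0):
    """Build Abar from k cluster centers of A+ and of A-."""
    centers, labels = [], []
    for cls in (+1, -1):
        km = KMeans(n_clusters=k_per_class, n_init=10, random_state=seed)
        km.fit(A[d == cls])
        centers.append(km.cluster_centers_)
        labels.append(np.full(k_per_class, cls))
    return np.vstack(centers), np.concatenate(labels)
```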
16
A Nonlinear Kernel Application

Checkerboard training set: 1000 points in $\mathbb{R}^2$; separate 486 asterisks from 514 dots.
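A sketch of how such a checkerboard training set can be generated (an illustrative reconstruction, not the authors' exact data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(1000, 2))  # 1000 random points on a 4 x 4 board
# Label by cell parity: +1 ("asterisks") on one color of square, -1 ("dots") on the other
d = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)
```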
17
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
18
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
19
RSVM on Moderate-Sized Problems
(Best test set correctness %, CPU seconds)

Dataset (m x n), m̄               RSVM K(A,Ā')     Full SSVM K(A,A')      Random subset K(Ā,Ā')
                                  %      sec       %      sec             %      sec
Cleveland Heart 297 x 13, 30      86.47  3.04      85.92  32.42           76.88  1.58
BUPA Liver 345 x 6, 35            74.86  2.68      73.62  32.61           68.95  2.04
Ionosphere 351 x 34, 35           95.19  5.02      94.35  59.88           88.70  2.13
Pima Indians 768 x 8, 50          78.64  5.72      76.59  328.3           57.32  4.64
Tic-Tac-Toe 958 x 9, 96           98.75  14.56     98.43  1033.5          88.24  8.87
Mushroom 8124 x 22, 215           89.04  466.20    N/A (out of memory)    83.90  221.50
20
RSVM on the Large UCI Adult Dataset
Average test set correctness % and standard deviation over 50 runs

(Train, Test)        RSVM K(A,Ā')      Random subset K(Ā,Ā')   m̄      m̄/m
(6414, 26148)        84.47 ± 0.001     77.03 ± 0.014           210     3.2%
(11221, 21341)       84.71 ± 0.001     75.96 ± 0.016           225     2.0%
(16101, 16461)       84.90 ± 0.001     75.45 ± 0.017           242     1.5%
(22697, 9865)        85.31 ± 0.001     76.73 ± 0.018           284     1.2%
(32562, 16282)       85.07 ± 0.001     76.95 ± 0.013           326     1.0%
21
CPU Times on the UCI Adult Dataset
RSVM, SMO, and PCGC with a Gaussian kernel: training set size vs. CPU time in seconds

Size    3185    4781     6414     11221     16101    22697    32562
RSVM    44.2    83.6     123.4    227.8     342.5    587.4    980.2
SMO     66.2    146.6    258.8    781.4     1784.4   4126.4   7749.6
PCGC    380.5   1137.2   2530.6   11910.6   (ran out of memory)
22
[Figure: CPU time (CPU sec.) vs. training set size on the UCI Adult dataset, comparing RSVM, SMO, and PCGC with a Gaussian kernel.]
23
Conclusion

- RSVM: an effective classifier for large datasets
  - The classifier uses 10% or less of the dataset
  - Can handle massive datasets
  - Much faster than other algorithms
- Test set correctness:
  - Same as or better than that of the full dataset
  - Much better than that of a randomly chosen subset alone
- Applicable to all nonlinear kernel problems
- Rectangular kernel: a novel practical idea