University of Wisconsin - Madison
Reduced Data Classifiers via Support Vector Machines
SIAM International Conference on Data Mining, Chicago, April 5-7, 2001
O. L. Mangasarian & Y. J. Lee
Data Mining Institute, University of Wisconsin - Madison
Second Annual Review, June 1, 2001
Key Objective
Given a classification problem with m points in n-dimensional space:
- Storage required for the kernel matrix is of order m²
- Computational time is of order m³
How best to reduce m?
Outline of Talk
- What is a support vector machine (SVM)?
- What is a smooth support vector machine (SSVM)?
  An SVM solvable without optimization software (LP, QP)
- Difficulties with nonlinear SVM classifiers:
  Computational: handling the massive m × m kernel matrix K(A, A')
  Storage: classifier depends on almost the entire dataset
- Reduced Support Vector Machines (RSVMs)
  Reduced kernel K(A, Ā'): a much smaller rectangular matrix, with Ā only 1% to 10% of the rows of A
  Speeds computation & reduces storage
- Numerical results
  e.g. 32,562-point dataset: RSVM 8 times faster than SMO
What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
What are Support Vector Machines Used For?
- Classification
- Regression & data fitting
- Supervised & unsupervised learning
(This talk concentrates on classification.)
Geometry of the Classification Problem: 2-Category Linearly Separable Case
Support Vector Machines: Algebra of the 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m × n matrix A.
Membership of each point A_i in class +1 or -1 is specified by an m × m diagonal matrix D with +1 and -1 entries.
Separate the classes by two bounding planes x'w = γ + 1 and x'w = γ - 1:
  A_i w ≥ γ + 1 for D_ii = +1,  A_i w ≤ γ - 1 for D_ii = -1.
More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
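As a small hedged illustration (the toy data and the names `A`, `labels`, `w`, `gamma` are assumptions, not from the talk), the matrices above can be set up and the separability condition checked directly:

```python
import numpy as np

# Toy data (hypothetical): m = 4 points in n = 2 dimensions.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
labels = np.array([1, 1, -1, -1])    # class of each row of A
D = np.diag(labels)                  # m x m diagonal matrix with +1/-1 entries
e = np.ones(len(labels))             # vector of ones

# A candidate separating plane x'w = gamma, with bounding planes x'w = gamma +/- 1.
w = np.array([1.0, 1.0])
gamma = 0.0

# Linear separability condition D(Aw - e*gamma) >= e:
print(D @ (A @ w - e * gamma) >= e)  # all True for this separable toy set
```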
Support Vector Machines: Maximizing the Margin between Bounding Planes
The margin (distance) between the bounding planes x'w = γ + 1 and x'w = γ - 1 is 2/‖w‖₂.
Support Vector Machine Formulation
Solve the quadratic program for some ν > 0:
  min_{w,γ,y}  ν/2 y'y + 1/2 (w'w + γ²)
  s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0        (QP)
where D_ii = +1 or -1 denotes the class membership of A_i.
The margin, measured in the space of (w, γ), is maximized by minimizing 1/2 (w'w + γ²).
SVM as an Unconstrained Minimization Problem
At the solution of (QP): y = (e - D(Aw - eγ))₊, where (·)₊ replaces each negative component by zero.
Hence (QP) is equivalent to the nonsmooth SVM:
  min_{w,γ}  ν/2 ‖(e - D(Aw - eγ))₊‖₂² + 1/2 (w'w + γ²)
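A minimal numpy sketch of this unconstrained objective (the function names and toy values are assumptions, not the authors' code):

```python
import numpy as np

def plus(x):
    # (x)_+ : replace each negative component of x by zero
    return np.maximum(x, 0.0)

def nonsmooth_svm_objective(w, gamma, A, d, nu=1.0):
    # nu/2 * ||(e - D(Aw - e*gamma))_+||^2 + 1/2 * (w'w + gamma^2),
    # with d the +1/-1 label vector (the diagonal of D)
    e = np.ones(len(d))
    r = e - d * (A @ w - e * gamma)
    return 0.5 * nu * np.sum(plus(r) ** 2) + 0.5 * (w @ w + gamma ** 2)

# Two toy points, one per class; both satisfy the margin, so only the
# regularization term 1/2 * w'w = 0.25 remains.
A = np.array([[2.0, 2.0], [-2.0, -1.0]])
d = np.array([1.0, -1.0])
print(nonsmooth_svm_objective(np.array([0.5, 0.5]), 0.0, A, d))  # 0.25
```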
Smoothing the Plus Function: Integrate the Sigmoid Function
SSVM: The Smooth Support Vector Machine
Replacing the plus function (·)₊ in the nonsmooth SVM by the smooth approximation p(x, α) gives our SSVM:
  min_{w,γ}  ν/2 ‖p(e - D(Aw - eγ), α)‖₂² + 1/2 (w'w + γ²)
Here p(x, α) = x + (1/α) log(1 + e^{-αx}), obtained by integrating the sigmoid function 1/(1 + e^{-αx}) of neural networks (sigmoid = smoothed step), is an accurate smooth approximation of x₊.
The solution of SSVM converges to the solution of the nonsmooth SVM as α goes to infinity. (Typically, α = 5.)
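A quick numerical check of this convergence (a sketch; the numerically stable rewriting of p is an implementation choice, not from the slides). The maximum gap between p(x, α) and x₊ occurs at x = 0 and equals log(2)/α, so it shrinks as α grows:

```python
import numpy as np

def plus(x):
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha):
    # p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)),
    # the integral of the sigmoid 1/(1 + exp(-alpha * x));
    # rewritten in a numerically stable form for large |x|
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

x = np.linspace(-3.0, 3.0, 6001)
for alpha in (1.0, 5.0, 25.0):
    gap = np.max(np.abs(smooth_plus(x, alpha) - plus(x)))
    print(f"alpha = {alpha:5.1f}  max gap = {gap:.4f}  log(2)/alpha = {np.log(2)/alpha:.4f}")
```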
Nonlinear Smooth Support Vector Machine
Nonlinear separating surface: K(x', A') Du = γ (instead of the linear surface x'w = γ).
Use a nonlinear kernel K(A, A') in SSVM:
  min_{u,γ}  ν/2 ‖p(e - D(K(A, A') Du - eγ), α)‖₂² + 1/2 (u'u + γ²)
- The m × m kernel matrix K(A, A') is fully dense
- Use a Newton algorithm to solve the problem
- Each iteration solves m + 1 linear equations in m + 1 variables
- The nonlinear separating surface depends on the entire dataset A
Examples of Kernels
Polynomial kernel: (AA' + ee')^d applied elementwise, where d is a positive integer; dropping the ee' shift and taking d = 1 gives the linear kernel AA'.
Gaussian (radial basis) kernel: K(A, A')_{ij} = e^{-μ‖A_i - A_j‖₂²}, i, j = 1, …, m.
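The two kernels can be sketched as follows (the function names and the placement of the +1 shift are assumptions consistent with the formulas above):

```python
import numpy as np

def polynomial_kernel(A, B, d=2):
    # (A B' + e e')^d, applied elementwise; without the e e' shift and
    # with d = 1 this reduces to the linear kernel A B'
    return (A @ B.T + 1.0) ** d

def gaussian_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

A = np.random.default_rng(0).standard_normal((5, 3))
K = gaussian_kernel(A, A)
print(K.shape, np.allclose(np.diag(K), 1.0))   # (5, 5) True
```

The Gaussian kernel of a matrix with itself always has a unit diagonal, since each point is at distance zero from itself.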
Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel K(A, A') is fully dense
- Long CPU time to compute the m² kernel entries
- Runs out of memory while storing the m × m kernel matrix
- Computational complexity of nonlinear SSVM depends on m: of order m³
- Separating surface depends on almost the entire dataset
- Need to store the entire dataset even after solving the problem
Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel
- Choose a small random sample Ā of the rows of A
- The small random sample Ā is a representative sample of the entire dataset
- Typically Ā is 1% to 10% of the rows of A
- Replace the full kernel K(A, A') in nonlinear SSVM by the rectangular kernel K(A, Ā'), with corresponding D̄ ⊂ D
- Only need to compute and store m × m̄ numbers for K(A, Ā')
- Computational complexity reduces to order m̄³
- The nonlinear separator only depends on Ā
- Using K(Ā, Ā') alone (i.e., training on only the small sample) gives lousy results!
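The storage saving is easy to see in a sketch (the sizes and kernel parameter here are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

rng = np.random.default_rng(0)
m, n, mbar = 2000, 10, 20                 # mbar is 1% of m
A = rng.standard_normal((m, n))
Abar = A[rng.choice(m, size=mbar, replace=False)]   # small random sample of A

K_rect = gaussian_kernel(A, Abar)         # m x mbar rectangular kernel
print(K_rect.shape)                       # (2000, 20): 40,000 entries
print(f"full kernel would need {m * m:,} entries")  # 4,000,000: a 100x reduction
```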
Reduced Support Vector Machine Algorithm
Nonlinear separating surface: K(x', Ā') D̄ū = γ
(i) Choose a random subset matrix Ā ∈ R^{m̄ × n} of the entire data matrix A ∈ R^{m × n}
(ii) Solve the following problem by the Newton method, with D̄ the diagonal label matrix corresponding to Ā:
  min_{ū,γ}  ν/2 ‖p(e - D(K(A, Ā') D̄ū - eγ), α)‖₂² + 1/2 (ū'ū + γ²)
(iii) The separating surface is defined by the optimal solution (ū, γ) of step (ii): K(x', Ā') D̄ū = γ
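The slides solve step (ii) by a Newton method. As a hedged, self-contained sketch, the same smooth objective can be minimized by plain gradient descent instead (all names, the Gaussian kernel choice, the learning-rate settings, and the toy data are assumptions; this is not the authors' implementation):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

def smooth_plus(x, alpha=5.0):
    # p(x, alpha) = x + log(1 + exp(-alpha x)) / alpha, stable form
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

def rsvm_train(A, d, Abar, dbar, nu=1.0, alpha=5.0, mu=0.5, lr=1e-3, iters=20000):
    # Minimize nu/2 ||p(e - D(K(A, Abar') Dbar u - e gamma), alpha)||^2
    #          + 1/2 (u'u + gamma^2)  over (u, gamma) by gradient descent.
    KD = gaussian_kernel(A, Abar, mu) * dbar[None, :]   # K(A, Abar') Dbar
    u, gamma = np.zeros(len(dbar)), 0.0
    for _ in range(iters):
        r = 1.0 - d * (KD @ u - gamma)
        # p'(r) is the sigmoid 1 / (1 + exp(-alpha r)); clip to avoid overflow
        sig = 1.0 / (1.0 + np.exp(np.clip(-alpha * r, -50.0, 50.0)))
        g = nu * smooth_plus(r, alpha) * sig            # d(loss)/dr
        u -= lr * (-(KD.T @ (g * d)) + u)               # loss grad + regularizer
        gamma -= lr * (np.sum(g * d) + gamma)
    return u, gamma

def rsvm_predict(X, Abar, dbar, u, gamma, mu=0.5):
    # Separating surface: K(x', Abar') Dbar u = gamma
    return np.sign((gaussian_kernel(X, Abar, mu) * dbar[None, :]) @ u - gamma)

# Toy use: two well-separated clusters; Abar = 4 random rows of A (20% here,
# rather than the 1%-10% of the slides, because the toy set is tiny).
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 0.3, (10, 2)), rng.normal(-2.0, 0.3, (10, 2))])
d = np.array([1.0] * 10 + [-1.0] * 10)
idx = rng.choice(20, size=4, replace=False)
u, gamma = rsvm_train(A, d, A[idx], d[idx])
print(np.mean(rsvm_predict(A, A[idx], d[idx], u, gamma) == d))
```

Gradient descent is chosen here only for brevity; the Newton method of the slides converges in far fewer iterations by solving m̄ + 1 linear equations per step.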
How to Choose Ā in RSVM?
- Ā should be a representative sample of the entire dataset
- Ā need not be a subset of A
- A good selection of Ā may generate a classifier using a very small m̄
Possible ways to choose Ā:
- Choose m̄ random rows from the entire dataset
- Choose Ā such that the distance between its rows exceeds a certain tolerance
- Use the k cluster centers of each class as the rows of Ā
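The first two selection strategies can be sketched as follows (function names are assumptions; the cluster-center option would work similarly with any k-means routine):

```python
import numpy as np

def random_rows(A, mbar, seed=0):
    # Strategy 1: mbar random rows of the data matrix A
    rng = np.random.default_rng(seed)
    return A[rng.choice(len(A), size=mbar, replace=False)]

def spread_rows(A, tol):
    # Strategy 2: greedily keep rows whose distance to every
    # already-kept row exceeds the tolerance tol
    kept = [A[0]]
    for row in A[1:]:
        if min(np.linalg.norm(row - k) for k in kept) > tol:
            kept.append(row)
    return np.array(kept)

A = np.random.default_rng(1).standard_normal((200, 5))
print(random_rows(A, 10).shape)          # (10, 5)
Abar = spread_rows(A, tol=2.5)
print(len(Abar) < len(A))                # True: fewer, well-spread rows
```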
A Nonlinear Kernel Application
Checkerboard training set: 1000 points in R²; separate 486 asterisks from 514 dots.
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Moderately Sized Problems (Best Test Set Correctness %, CPU Seconds)

                                  RSVM K(A, Ā')    SSVM full K(A, A')    SSVM K(Ā, Ā') only
Dataset (m × n)                   %       sec      %        sec          %       sec
Cleveland Heart 297 × 13          86.47   3.04     85.92    32.42        76.88   1.58
BUPA Liver 345 × 6                74.86   2.68     73.62    32.61        68.95   2.04
Ionosphere 351 × 34 (m̄ = 35)      95.19   5.02     94.35    59.88        88.70   2.13
Pima Indians 768 × 8              78.64   5.72     76.59    328.3        57.32   4.64
Tic-Tac-Toe 958 × 9               98.75   14.56    98.43    1033.5       88.24   8.87
Mushroom 8124 × 22 (m̄ = 215)      89.04   466.20   N/A (out of memory)   83.90   221.50
RSVM on Large UCI Adult Dataset
Average Correctness % & Standard Deviation over 50 Runs
(RSVM standard deviation over 50 runs = 0 where not shown)

(Train, Test)       RSVM K(A, Ā')     K(Ā, Ā') only
(6414, 26148)       84.47 (0.001)     77.03 (0.014)
(11221, 21341)      84.71             75.96 (0.016)
(16101, 16461)      84.90             75.45 (0.017)
(22697, 9865)       85.31             76.73 (0.018)
(32562, 16282)      85.07             76.95 (0.013)
CPU Times on UCI Adult Dataset: RSVM, SMO and PCGC with a Gaussian Kernel

Adult Dataset: CPU Seconds for Various Training Set Sizes

Size            3185    4781     6414     11221    16101    22697    32562
RSVM            44.2    83.6     123.4    227.8    342.5    587.4    980.2
SMO (Platt)     66.2    146.6    258.8    781.4    1784.4   4126.4   7749.6
PCGC (Burges)   380.5   1137.2   2530.6   (ran out of memory for larger sizes)
CPU Time Comparison on UCI Dataset: RSVM, SMO and PCGC with a Gaussian Kernel
[Plot: CPU time (seconds) vs. training set size for RSVM, SMO, and PCGC]
Conclusion
RSVM: an effective classifier for large datasets
- Classifier uses 10% or less of the dataset
- Can handle massive datasets
- Much faster than other algorithms
- Test set correctness: same or better than using the full dataset, much better than a randomly chosen subset
- Rectangular kernel K(A, Ā'): a novel practical idea
- Applicable to all nonlinear kernel problems