University of Wisconsin - Madison
Reduced Data Classifiers via Support Vector Machines
SIAM International Conference on Data Mining, Chicago, April 5-7, 2001
O. L. Mangasarian & Y. J. Lee
Data Mining Institute, University of Wisconsin - Madison
Second Annual Review, June 1, 2001
Key Objective
Given a classification problem with m points in n-dimensional space:
- Storage required for the kernel matrix is of order m²
- Computational time is of order m³
How best to reduce m?
Outline of Talk
- What is a support vector machine (SVM)?
- What is a smooth support vector machine (SSVM)?
  An SVM solvable without optimization software (LP, QP)
- Difficulties with nonlinear SVM classifiers:
  Computational: handling the massive m × m kernel matrix K(A, A')
  Storage: classifier depends on almost the entire dataset
- Reduced Support Vector Machines (RSVMs)
  Reduced kernel K(A, Ā'): a much smaller rectangular matrix, with Ā only 1% to 10% of the rows of A
  Speeds computation & reduces storage
- Numerical results
  e.g. 32,562-point dataset: RSVM 8 times faster than SMO
What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
What are Support Vector Machines Used For?
- Classification
- Regression & data fitting
- Supervised & unsupervised learning
(This talk concentrates on classification.)
Geometry of the Classification Problem: 2-Category Linearly Separable Case
Support Vector Machines: Algebra of the 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m × n matrix A.
Membership of each point A_i in class +1 or -1 is specified by an m × m diagonal matrix D with +1 and -1 entries.
Separate the classes by two bounding planes x'w = γ + 1 and x'w = γ - 1:
  A_i w ≥ γ + 1 for D_ii = +1,  A_i w ≤ γ - 1 for D_ii = -1.
More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
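As a small hedged illustration (the toy data and the names `A`, `labels`, `w`, `gamma` are assumptions, not from the talk), the matrices above can be set up and the separability condition checked directly:

```python
import numpy as np

# Toy data (hypothetical): m = 4 points in n = 2 dimensions.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
labels = np.array([1, 1, -1, -1])    # class of each row of A
D = np.diag(labels)                  # m x m diagonal matrix with +1/-1 entries
e = np.ones(len(labels))             # vector of ones

# A candidate separating plane x'w = gamma, with bounding planes x'w = gamma +/- 1.
w = np.array([1.0, 1.0])
gamma = 0.0

# Linear separability condition D(Aw - e*gamma) >= e:
print(D @ (A @ w - e * gamma) >= e)  # all True for this separable toy set
```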
Support Vector Machines: Maximizing the Margin between Bounding Planes
The margin (distance) between the bounding planes x'w = γ + 1 and x'w = γ - 1 is 2/‖w‖₂.
Support Vector Machine Formulation
Solve the quadratic program for some ν > 0:
  min_{w,γ,y}  ν/2 y'y + 1/2 (w'w + γ²)
  s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0        (QP)
where D_ii = +1 or -1 denotes the class membership of A_i.
The margin, measured in the space of (w, γ), is maximized by minimizing 1/2 (w'w + γ²).
SVM as an Unconstrained Minimization Problem
At the solution of (QP): y = (e - D(Aw - eγ))₊, where (·)₊ replaces each negative component by zero.
Hence (QP) is equivalent to the nonsmooth SVM:
  min_{w,γ}  ν/2 ‖(e - D(Aw - eγ))₊‖₂² + 1/2 (w'w + γ²)
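A minimal numpy sketch of this unconstrained objective (the function names and toy values are assumptions, not the authors' code):

```python
import numpy as np

def plus(x):
    # (x)_+ : replace each negative component of x by zero
    return np.maximum(x, 0.0)

def nonsmooth_svm_objective(w, gamma, A, d, nu=1.0):
    # nu/2 * ||(e - D(Aw - e*gamma))_+||^2 + 1/2 * (w'w + gamma^2),
    # with d the +1/-1 label vector (the diagonal of D)
    e = np.ones(len(d))
    r = e - d * (A @ w - e * gamma)
    return 0.5 * nu * np.sum(plus(r) ** 2) + 0.5 * (w @ w + gamma ** 2)

# Two toy points, one per class; both satisfy the margin, so only the
# regularization term 1/2 * w'w = 0.25 remains.
A = np.array([[2.0, 2.0], [-2.0, -1.0]])
d = np.array([1.0, -1.0])
print(nonsmooth_svm_objective(np.array([0.5, 0.5]), 0.0, A, d))  # 0.25
```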
Smoothing the Plus Function: Integrate the Sigmoid Function
SSVM: The Smooth Support Vector Machine
Replacing the plus function (·)₊ in the nonsmooth SVM by the smooth approximation p(x, α) gives our SSVM:
  min_{w,γ}  ν/2 ‖p(e - D(Aw - eγ), α)‖₂² + 1/2 (w'w + γ²)
Here p(x, α) = x + (1/α) log(1 + e^{-αx}), obtained by integrating the sigmoid function 1/(1 + e^{-αx}) of neural networks (sigmoid = smoothed step), is an accurate smooth approximation of x₊.
The solution of SSVM converges to the solution of the nonsmooth SVM as α goes to infinity. (Typically, α = 5.)
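A quick numerical check of this convergence (a sketch; the numerically stable rewriting of p is an implementation choice, not from the slides). The maximum gap between p(x, α) and x₊ occurs at x = 0 and equals log(2)/α, so it shrinks as α grows:

```python
import numpy as np

def plus(x):
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha):
    # p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)),
    # the integral of the sigmoid 1/(1 + exp(-alpha * x));
    # rewritten in a numerically stable form for large |x|
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

x = np.linspace(-3.0, 3.0, 6001)
for alpha in (1.0, 5.0, 25.0):
    gap = np.max(np.abs(smooth_plus(x, alpha) - plus(x)))
    print(f"alpha = {alpha:5.1f}  max gap = {gap:.4f}  log(2)/alpha = {np.log(2)/alpha:.4f}")
```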
Nonlinear Smooth Support Vector Machine
Nonlinear separating surface: K(x', A') Du = γ (instead of the linear surface x'w = γ).
Use a nonlinear kernel K(A, A') in SSVM:
  min_{u,γ}  ν/2 ‖p(e - D(K(A, A') Du - eγ), α)‖₂² + 1/2 (u'u + γ²)
- The m × m kernel matrix K(A, A') is fully dense
- Use a Newton algorithm to solve the problem
- Each iteration solves m + 1 linear equations in m + 1 variables
- The nonlinear separating surface depends on the entire dataset A
Examples of Kernels
Polynomial kernel: (AA' + ee')^d applied elementwise, where d is a positive integer; dropping the ee' shift and taking d = 1 gives the linear kernel AA'.
Gaussian (radial basis) kernel: K(A, A')_{ij} = e^{-μ‖A_i - A_j‖₂²}, i, j = 1, …, m.
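The two kernels can be sketched as follows (the function names and the placement of the +1 shift are assumptions consistent with the formulas above):

```python
import numpy as np

def polynomial_kernel(A, B, d=2):
    # (A B' + e e')^d, applied elementwise; without the e e' shift and
    # with d = 1 this reduces to the linear kernel A B'
    return (A @ B.T + 1.0) ** d

def gaussian_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

A = np.random.default_rng(0).standard_normal((5, 3))
K = gaussian_kernel(A, A)
print(K.shape, np.allclose(np.diag(K), 1.0))   # (5, 5) True
```

The Gaussian kernel of a matrix with itself always has a unit diagonal, since each point is at distance zero from itself.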
Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel K(A, A') is fully dense
- Long CPU time to compute the m² kernel entries
- Runs out of memory while storing the m × m kernel matrix
- Computational complexity of nonlinear SSVM depends on m: of order m³
- Separating surface depends on almost the entire dataset
- Need to store the entire dataset even after solving the problem
Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel
- Choose a small random sample Ā of the rows of A
- The small random sample Ā is a representative sample of the entire dataset
- Typically Ā is 1% to 10% of the rows of A
- Replace the full kernel K(A, A') in nonlinear SSVM by the rectangular kernel K(A, Ā'), with corresponding D̄ ⊂ D
- Only need to compute and store m × m̄ numbers for K(A, Ā')
- Computational complexity reduces to order m̄³
- The nonlinear separator only depends on Ā
- Using K(Ā, Ā') alone (i.e., training on only the small sample) gives lousy results!
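The storage saving is easy to see in a sketch (the sizes and kernel parameter here are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

rng = np.random.default_rng(0)
m, n, mbar = 2000, 10, 20                 # mbar is 1% of m
A = rng.standard_normal((m, n))
Abar = A[rng.choice(m, size=mbar, replace=False)]   # small random sample of A

K_rect = gaussian_kernel(A, Abar)         # m x mbar rectangular kernel
print(K_rect.shape)                       # (2000, 20): 40,000 entries
print(f"full kernel would need {m * m:,} entries")  # 4,000,000: a 100x reduction
```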
Reduced Support Vector Machine Algorithm
Nonlinear separating surface: K(x', Ā') D̄ū = γ
(i) Choose a random subset matrix Ā ∈ R^{m̄ × n} of the entire data matrix A ∈ R^{m × n}
(ii) Solve the following problem by the Newton method, with D̄ the diagonal label matrix corresponding to Ā:
  min_{ū,γ}  ν/2 ‖p(e - D(K(A, Ā') D̄ū - eγ), α)‖₂² + 1/2 (ū'ū + γ²)
(iii) The separating surface is defined by the optimal solution (ū, γ) of step (ii): K(x', Ā') D̄ū = γ
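The slides solve step (ii) by a Newton method. As a hedged, self-contained sketch, the same smooth objective can be minimized by plain gradient descent instead (all names, the Gaussian kernel choice, the learning-rate settings, and the toy data are assumptions; this is not the authors' implementation):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

def smooth_plus(x, alpha=5.0):
    # p(x, alpha) = x + log(1 + exp(-alpha x)) / alpha, stable form
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

def rsvm_train(A, d, Abar, dbar, nu=1.0, alpha=5.0, mu=0.5, lr=1e-3, iters=20000):
    # Minimize nu/2 ||p(e - D(K(A, Abar') Dbar u - e gamma), alpha)||^2
    #          + 1/2 (u'u + gamma^2)  over (u, gamma) by gradient descent.
    KD = gaussian_kernel(A, Abar, mu) * dbar[None, :]   # K(A, Abar') Dbar
    u, gamma = np.zeros(len(dbar)), 0.0
    for _ in range(iters):
        r = 1.0 - d * (KD @ u - gamma)
        # p'(r) is the sigmoid 1 / (1 + exp(-alpha r)); clip to avoid overflow
        sig = 1.0 / (1.0 + np.exp(np.clip(-alpha * r, -50.0, 50.0)))
        g = nu * smooth_plus(r, alpha) * sig            # d(loss)/dr
        u -= lr * (-(KD.T @ (g * d)) + u)               # loss grad + regularizer
        gamma -= lr * (np.sum(g * d) + gamma)
    return u, gamma

def rsvm_predict(X, Abar, dbar, u, gamma, mu=0.5):
    # Separating surface: K(x', Abar') Dbar u = gamma
    return np.sign((gaussian_kernel(X, Abar, mu) * dbar[None, :]) @ u - gamma)

# Toy use: two well-separated clusters; Abar = 4 random rows of A (20% here,
# rather than the 1%-10% of the slides, because the toy set is tiny).
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 0.3, (10, 2)), rng.normal(-2.0, 0.3, (10, 2))])
d = np.array([1.0] * 10 + [-1.0] * 10)
idx = rng.choice(20, size=4, replace=False)
u, gamma = rsvm_train(A, d, A[idx], d[idx])
print(np.mean(rsvm_predict(A, A[idx], d[idx], u, gamma) == d))
```

Gradient descent is chosen here only for brevity; the Newton method of the slides converges in far fewer iterations by solving m̄ + 1 linear equations per step.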
How to Choose Ā in RSVM?
- Ā should be a representative sample of the entire dataset
- Ā need not be a subset of A
- A good selection of Ā may generate a classifier using a very small m̄
Possible ways to choose Ā:
- Choose m̄ random rows from the entire dataset
- Choose Ā such that the distance between its rows exceeds a certain tolerance
- Use the k cluster centers of each class as the rows of Ā
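The first two selection strategies can be sketched as follows (function names are assumptions; the cluster-center option would work similarly with any k-means routine):

```python
import numpy as np

def random_rows(A, mbar, seed=0):
    # Strategy 1: mbar random rows of the data matrix A
    rng = np.random.default_rng(seed)
    return A[rng.choice(len(A), size=mbar, replace=False)]

def spread_rows(A, tol):
    # Strategy 2: greedily keep rows whose distance to every
    # already-kept row exceeds the tolerance tol
    kept = [A[0]]
    for row in A[1:]:
        if min(np.linalg.norm(row - k) for k in kept) > tol:
            kept.append(row)
    return np.array(kept)

A = np.random.default_rng(1).standard_normal((200, 5))
print(random_rows(A, 10).shape)          # (10, 5)
Abar = spread_rows(A, tol=2.5)
print(len(Abar) < len(A))                # True: fewer, well-spread rows
```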
A Nonlinear Kernel Application
Checkerboard training set: 1000 points in R²; separate 486 asterisks from 514 dots.
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Moderately Sized Problems (Best Test Set Correctness %, CPU Seconds)

                                  RSVM K(A, Ā')    SSVM full K(A, A')    SSVM K(Ā, Ā') only
Dataset (m × n)                   %       sec      %        sec          %       sec
Cleveland Heart 297 × 13          86.47   3.04     85.92    32.42        76.88   1.58
BUPA Liver 345 × 6                74.86   2.68     73.62    32.61        68.95   2.04
Ionosphere 351 × 34 (m̄ = 35)      95.19   5.02     94.35    59.88        88.70   2.13
Pima Indians 768 × 8              78.64   5.72     76.59    328.3        57.32   4.64
Tic-Tac-Toe 958 × 9               98.75   14.56    98.43    1033.5       88.24   8.87
Mushroom 8124 × 22 (m̄ = 215)      89.04   466.20   N/A (out of memory)   83.90   221.50
RSVM on Large UCI Adult Dataset
Average Correctness % & Standard Deviation over 50 Runs
(RSVM standard deviation over 50 runs = 0 where not shown)

(Train, Test)       RSVM K(A, Ā')     K(Ā, Ā') only
(6414, 26148)       84.47 (0.001)     77.03 (0.014)
(11221, 21341)      84.71             75.96 (0.016)
(16101, 16461)      84.90             75.45 (0.017)
(22697, 9865)       85.31             76.73 (0.018)
(32562, 16282)      85.07             76.95 (0.013)
CPU Times on UCI Adult Dataset: RSVM, SMO and PCGC with a Gaussian Kernel

Adult Dataset: CPU Seconds for Various Training Set Sizes

Size            3185    4781     6414     11221    16101    22697    32562
RSVM            44.2    83.6     123.4    227.8    342.5    587.4    980.2
SMO (Platt)     66.2    146.6    258.8    781.4    1784.4   4126.4   7749.6
PCGC (Burges)   380.5   1137.2   2530.6   (ran out of memory for larger sizes)
CPU Time Comparison on UCI Dataset: RSVM, SMO and PCGC with a Gaussian Kernel
[Plot: CPU time (seconds) vs. training set size for RSVM, SMO, and PCGC]
Conclusion
RSVM: an effective classifier for large datasets
- Classifier uses 10% or less of the dataset
- Can handle massive datasets
- Much faster than other algorithms
- Test set correctness: same or better than using the full dataset, much better than a randomly chosen subset
- Rectangular kernel K(A, Ā'): a novel practical idea
- Applicable to all nonlinear kernel problems