Nonlinear Data Discrimination via Generalized Support Vector Machines
David R. Musicant and Olvi L. Mangasarian
University of Wisconsin - Madison
www.cs.wisc.edu/~musicant
Outline
- The linear support vector machine (SVM)
  - Linear kernel
- Generalized support vector machine (GSVM)
  - Nonlinear indefinite kernel
- Linear programming formulation of GSVM
  - MINOS
- Quadratic programming formulation of GSVM
  - Successive overrelaxation (SOR)
- Numerical comparisons
- Conclusions
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
[Figure: point sets A+ and A- separated by the surface $x'w = \gamma$]
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
- Given m points in the n-dimensional space $R^n$
- Represented by an m x n matrix A
- Membership of each point $A_i$ in the classes +1 or -1 is specified by:
  - An m x m diagonal matrix D with +1 or -1 along its diagonal
- Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, such that:
  - $A_i w \ge \gamma + 1$ for $D_{ii} = +1$
  - $A_i w \le \gamma - 1$ for $D_{ii} = -1$
- More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones
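To make the notation concrete, here is a minimal NumPy sketch of this setup; the four points, the labels, and the candidate plane $(w, \gamma)$ are made-up illustrative values, not data from the talk:

```python
import numpy as np

# m = 4 points in R^2 (rows of A); labels +1/-1 for each point
A = np.array([[2.0, 2.0],
              [3.0, 1.0],
              [-2.0, -1.0],
              [-1.0, -3.0]])
d = np.array([1, 1, -1, -1])    # class of each row A_i
D = np.diag(d)                  # m x m diagonal matrix D
e = np.ones(A.shape[0])         # vector of ones

# A candidate separating plane x'w = gamma (chosen by hand here)
w = np.array([1.0, 1.0])
gamma = 0.0

# Bounding-plane conditions, written succinctly as D(Aw - e*gamma) >= e
ok = D @ (A @ w - e * gamma) >= e
print(ok)   # all True => the planes x'w = gamma +/- 1 bound the two classes
```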
Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
  where y is a nonnegative error (slack) vector
- Note: y = 0 if the convex hulls of A+ and A- do not intersect
The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: A+ and A- bounded by the parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with the margin between them maximized]
The (Linear) Support Vector Machine Formulation
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
  where y is a nonnegative error (slack) vector; the regularization term maximizes the margin between the bounding planes
- Note: y = 0 if the convex hulls of A+ and A- do not intersect
GSVM: Generalized Support Vector Machine, Linear Programming Formulation
- Linear support vector machine (linear separating surface $x'w = \gamma$)
- By "duality", set $w = A'Du$ (linear separating surface $x'A'Du = \gamma$)
- Nonlinear support vector machine: replace $AA'$ by a nonlinear kernel $K(A, A')$
  - Nonlinear separating surface: $K(x', A')Du = \gamma$
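As a rough sketch of how such a kernelized 1-norm LP could be solved today (the talk used MINOS), the program below poses it for scipy.optimize.linprog with variables $(u, \gamma, y, s)$, where $s$ linearizes $\|u\|_1$. The toy data, the tradeoff $\nu$, the Gaussian kernel, and its width $\mu$ are all assumed values for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy 2-class data: m points in R^2, labels +/-1
m = 20
A = np.vstack([rng.normal(+1.5, 0.5, (m // 2, 2)),
               rng.normal(-1.5, 0.5, (m // 2, 2))])
d = np.r_[np.ones(m // 2), -np.ones(m // 2)]
D = np.diag(d)
e = np.ones(m)
nu, mu = 1.0, 0.5        # tradeoff and kernel width (assumed values)

# Gaussian kernel K(A, A')_ij = exp(-mu * ||A_i - A_j||^2)
sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
K = np.exp(-mu * sq)

# Variables z = [u (m), gamma (1), y (m), s (m)]
# LP: min nu*e'y + e's  s.t.  D(K(A,A')Du - e*gamma) + y >= e,  -s <= u <= s,  y >= 0
c = np.r_[np.zeros(m), 0.0, nu * np.ones(m), np.ones(m)]
I = np.eye(m)
Z = np.zeros((m, m))
row1 = np.hstack([-D @ K @ D, (D @ e)[:, None], -I, Z])   # -(DKD)u + De*g - y <= -e
row2 = np.hstack([I, np.zeros((m, 1)), Z, -I])            #  u - s <= 0
row3 = np.hstack([-I, np.zeros((m, 1)), Z, -I])           # -u - s <= 0
A_ub = np.vstack([row1, row2, row3])
b_ub = np.r_[-e, np.zeros(2 * m)]
bounds = [(None, None)] * m + [(None, None)] + [(0, None)] * (2 * m)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
u, gamma = res.x[:m], res.x[m]

# Classify a new point x by sign(K(x', A')Du - gamma)
x = np.array([1.0, 1.0])
kx = np.exp(-mu * ((A - x) ** 2).sum(-1))
print(np.sign(kx @ (d * u) - gamma))   # expected +1: x lies near the positive class
```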
Examples of Kernels
- Polynomial kernel: $(AA' + \mu ee')^{d}_{\bullet}$, where $\bullet$ denotes componentwise exponentiation as in MATLAB's `.^`
- Radial basis kernel: $K(A, A')_{ij} = \exp(-\mu \|A_i - A_j\|^2)$
- Neural network kernel: $(AA' + \mu ee')_{*}$, where $*$ denotes the step function applied componentwise
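These kernels might be written in NumPy as follows; the function names and default parameter values are mine, and the polynomial and step-function forms follow my reading of the slide's notation:

```python
import numpy as np

def polynomial_kernel(A, B, mu=1.0, d=2):
    # (AB' + mu*ee')^d with the exponent taken componentwise (MATLAB's .^)
    return (A @ B.T + mu) ** d

def radial_basis_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * sq)

def neural_network_kernel(A, B, mu=1.0):
    # Step function applied componentwise to AB' + mu*ee'
    return (A @ B.T + mu >= 0).astype(float)

A = np.random.default_rng(0).normal(size=(5, 3))
print(polynomial_kernel(A, A).shape)   # (5, 5); symmetric, possibly indefinite
```

Note that nothing here requires the kernel matrix to be positive semidefinite, which is exactly the point of the GSVM's "indefinite kernel" generality.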
A Nonlinear Kernel Application
- Checkerboard training set: 1000 points in $R^2$
- Separate 486 asterisks from 514 dots
Previous Work
Polynomial Kernel:
[Figure: checkerboard separation obtained with the polynomial kernel]
Large Margin Classifier: (SOR) Reformulation in $(w, \gamma)$ Space
[Figure: A+ and A- bounded by parallel planes, with the margin measured in $(w, \gamma)$ space]
(SOR) Linear Support Vector Machine Quadratic Programming Formulation
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
- The quadratic term here maximizes the distance between the bounding planes in the $(w, \gamma)$ space $R^{n+1}$
Introducing a Nonlinear Kernel
- The Wolfe dual of the SOR linear SVM is:
  $$\min_{u} \; \tfrac{1}{2} u'D(AA' + ee')Du - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
  - Linear separating surface: $x'A'Du = \gamma$, with $\gamma = -e'Du$
- Substitute a kernel $K(A, A')$ for the $AA'$ term:
  $$\min_{u} \; \tfrac{1}{2} u'D(K(A, A') + ee')Du - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
  - Nonlinear separating surface: $K(x', A')Du = \gamma$
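In code, the kernel substitution changes only how the dual matrix is assembled. A small sketch, where the name M for $D(K(A, A') + ee')D$ anticipates the next slide's definition:

```python
import numpy as np

def dual_objective_pieces(K, d):
    """Build M = D(K + ee')D and f(u) = 0.5*u'Mu - e'u for labels d in {+1,-1}.

    Since D = diag(d), the (i,j) entry of D(K + ee')D is d_i*d_j*(K_ij + 1),
    so M can be formed without materializing D. Pass K = A @ A.T for the
    linear dual, or any kernel matrix K(A, A') for the nonlinear one.
    """
    M = np.outer(d, d) * (K + 1.0)
    f = lambda u: 0.5 * u @ M @ u - u.sum()
    return M, f
```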
SVM Optimality Conditions
- Define $M = D(K(A, A') + ee')D$
- Then the dual SVM becomes much simpler:
  $$\min_{u} \; \tfrac{1}{2} u'Mu - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
- Gradient projection necessary & sufficient optimality condition:
  $$u = \left(u - \omega(Mu - e)\right)_{\#}, \quad \omega > 0$$
- $(\cdot)_{\#}$ denotes projecting u onto the region $0 \le u \le \nu e$
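The projection and the resulting optimality test are only a few lines of NumPy (a sketch; $\omega = 1$ is an arbitrary choice, since the condition holds for any $\omega > 0$):

```python
import numpy as np

def project(u, nu):
    # (.)_# : componentwise projection onto the box 0 <= u <= nu*e
    return np.clip(u, 0.0, nu)

def optimality_residual(u, M, nu, omega=1.0):
    # Zero exactly when u = (u - omega*(Mu - e))_# holds, i.e. u solves the dual
    e = np.ones(len(u))
    return np.linalg.norm(u - project(u - omega * (M @ u - e), nu))
```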
SOR Algorithm & Convergence
- The above optimality conditions lead to the SOR algorithm: sweep through the components of u, updating each with the freshest available values and projecting back onto the box,
  $$u_i \leftarrow \left(u_i - \frac{\omega}{M_{ii}}\left(M_{i\cdot}\, u - 1\right)\right)_{\#}, \quad i = 1, \dots, m$$
  - Remember, the optimality condition is $u = (u - \omega(Mu - e))_{\#}$
- SOR linear convergence [Luo-Tseng 1993]:
  - The iterates of the SOR algorithm converge R-linearly to a solution of the dual problem
  - The objective function values converge Q-linearly to the minimum value of the dual problem
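A compact, serial, in-memory sketch of the SOR sweep follows. It is a plain componentwise Gauss-Seidel-with-relaxation reading of the update above, not the paper's tuned implementation, and the demo data and $\nu$ are made up:

```python
import numpy as np

def sor_svm_dual(M, nu, omega=1.0, max_iter=10000, tol=1e-6):
    """Successive overrelaxation for  min 0.5*u'Mu - e'u  s.t. 0 <= u <= nu*e.

    Componentwise Gauss-Seidel sweep with relaxation factor omega in (0, 2);
    each coordinate is updated using the freshest values of u and projected
    back onto the box, matching the condition u = (u - w(Mu - e))_#.
    """
    m = M.shape[0]
    u = np.zeros(m)
    for _ in range(max_iter):
        u_old = u.copy()
        for i in range(m):
            grad_i = M[i] @ u - 1.0                 # i-th component of Mu - e
            u[i] = min(max(u[i] - omega * grad_i / M[i, i], 0.0), nu)
        if np.linalg.norm(u - u_old) <= tol:
            break
    return u

# Tiny usage sketch (data, kernel, and nu are made up for illustration):
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(1, .4, (10, 2)), rng.normal(-1, .4, (10, 2))])
d = np.r_[np.ones(10), -np.ones(10)]
K = np.exp(-0.5 * ((A[:, None] - A[None, :]) ** 2).sum(-1))
M = np.outer(d, d) * (K + 1.0)          # D(K(A,A') + ee')D
u = sor_svm_dual(M, nu=1.0)
gamma = -d @ u                          # gamma = -e'Du
```

Each coordinate update touches only one row of M, which is why SOR needs essentially no algorithmic changes to scale to datasets far larger than memory-resident QP solvers can handle.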
Numerical Testing
- Comparison of linear & nonlinear kernels using:
  - Linear programming formulations
  - Quadratic programming (SOR) formulations
- Data sets:
  - UCI Liver Disorders: 345 points in $R^6$
  - Bell Labs Checkerboard: 1000 points in $R^2$
  - Gaussian Synthetic: 1000 points in $R^{32}$
  - SCDS Synthetic: 1 million points in $R^{32}$
  - Massive Synthetic: 10 million points in $R^{32}$
- Machines:
  - Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM
  - Total: 64 processors, 8 GB RAM
Comparison of Linear & Nonlinear SVMs, Linear Programming Generated
[Table: training and testing set correctness for linear vs. nonlinear kernels]
- Nonlinear kernels yield better training and testing set correctness
SOR Results
- Examples of training on massive data:
  - 1 million point dataset generated by the SCDS generator:
    - Trained completely in 9.7 hours
    - Tuning set reached 99.7% of final accuracy in 0.3 hours
  - 10 million point randomly generated dataset:
    - Tuning set reached 95% of final accuracy in 14.3 hours
    - Under 10,000 iterations
- Comparison of linear and nonlinear kernels
  [Table: SOR results for linear vs. nonlinear kernels]
Conclusions
- Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
- Nonlinear separating surfaces improve generalization over linear ones
- SOR can handle very large problems not (easily) solvable by other methods
- SOR scales up with virtually no changes
- Future directions:
  - Parallel SOR for very large problems not resident in memory
  - Massive multicategory discrimination via SOR
  - Support vector regression
Questions?