Nonlinear Data Discrimination via Generalized Support Vector Machines
David R. Musicant and Olvi L. Mangasarian
University of Wisconsin - Madison
www.cs.wisc.edu/~musicant
Outline
- The linear support vector machine (SVM)
  - Linear kernel
- Generalized support vector machine (GSVM)
  - Nonlinear indefinite kernel
- Linear programming formulation of GSVM
  - MINOS
- Quadratic programming formulation of GSVM
  - Successive overrelaxation (SOR)
- Numerical comparisons
- Conclusions
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
[Figure: point sets A+ and A- separated by the surface $x'w = \gamma$]
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
- Given m points in the n-dimensional space $R^n$
- Represented by an m x n matrix A
- Membership of each point $A_i$ in the classes +1 or -1 is specified by:
  - An m x m diagonal matrix D with +1 or -1 along its diagonal
- Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, such that:
  - $A_i w \ge \gamma + 1$ for $D_{ii} = +1$
  - $A_i w \le \gamma - 1$ for $D_{ii} = -1$
- More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones
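To make the notation concrete, here is a minimal NumPy sketch of this setup; the four points, the labels, and the candidate plane $(w, \gamma)$ are made-up illustrative values, not data from the talk:

```python
import numpy as np

# m = 4 points in R^2 (rows of A); labels +1/-1 for each point
A = np.array([[2.0, 2.0],
              [3.0, 1.0],
              [-2.0, -1.0],
              [-1.0, -3.0]])
d = np.array([1, 1, -1, -1])    # class of each row A_i
D = np.diag(d)                  # m x m diagonal matrix D
e = np.ones(A.shape[0])         # vector of ones

# A candidate separating plane x'w = gamma (chosen by hand here)
w = np.array([1.0, 1.0])
gamma = 0.0

# Bounding-plane conditions, written succinctly as D(Aw - e*gamma) >= e
ok = D @ (A @ w - e * gamma) >= e
print(ok)   # all True => the planes x'w = gamma +/- 1 bound the two classes
```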
Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
  where y is a nonnegative error (slack) vector
- Note: y = 0 if the convex hulls of A+ and A- do not intersect
The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: A+ and A- bounded by the parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with the margin between them maximized]
The (Linear) Support Vector Machine Formulation
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
  where y is a nonnegative error (slack) vector; the regularization term maximizes the margin between the bounding planes
- Note: y = 0 if the convex hulls of A+ and A- do not intersect
GSVM: Generalized Support Vector Machine, Linear Programming Formulation
- Linear support vector machine (linear separating surface $x'w = \gamma$)
- By "duality", set $w = A'Du$ (linear separating surface $x'A'Du = \gamma$)
- Nonlinear support vector machine: replace $AA'$ by a nonlinear kernel $K(A, A')$
  - Nonlinear separating surface: $K(x', A')Du = \gamma$
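As a rough sketch of how such a kernelized 1-norm LP could be solved today (the talk used MINOS), the program below poses it for scipy.optimize.linprog with variables $(u, \gamma, y, s)$, where $s$ linearizes $\|u\|_1$. The toy data, the tradeoff $\nu$, the Gaussian kernel, and its width $\mu$ are all assumed values for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy 2-class data: m points in R^2, labels +/-1
m = 20
A = np.vstack([rng.normal(+1.5, 0.5, (m // 2, 2)),
               rng.normal(-1.5, 0.5, (m // 2, 2))])
d = np.r_[np.ones(m // 2), -np.ones(m // 2)]
D = np.diag(d)
e = np.ones(m)
nu, mu = 1.0, 0.5        # tradeoff and kernel width (assumed values)

# Gaussian kernel K(A, A')_ij = exp(-mu * ||A_i - A_j||^2)
sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
K = np.exp(-mu * sq)

# Variables z = [u (m), gamma (1), y (m), s (m)]
# LP: min nu*e'y + e's  s.t.  D(K(A,A')Du - e*gamma) + y >= e,  -s <= u <= s,  y >= 0
c = np.r_[np.zeros(m), 0.0, nu * np.ones(m), np.ones(m)]
I = np.eye(m)
Z = np.zeros((m, m))
row1 = np.hstack([-D @ K @ D, (D @ e)[:, None], -I, Z])   # -(DKD)u + De*g - y <= -e
row2 = np.hstack([I, np.zeros((m, 1)), Z, -I])            #  u - s <= 0
row3 = np.hstack([-I, np.zeros((m, 1)), Z, -I])           # -u - s <= 0
A_ub = np.vstack([row1, row2, row3])
b_ub = np.r_[-e, np.zeros(2 * m)]
bounds = [(None, None)] * m + [(None, None)] + [(0, None)] * (2 * m)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
u, gamma = res.x[:m], res.x[m]

# Classify a new point x by sign(K(x', A')Du - gamma)
x = np.array([1.0, 1.0])
kx = np.exp(-mu * ((A - x) ** 2).sum(-1))
print(np.sign(kx @ (d * u) - gamma))   # expected +1: x lies near the positive class
```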
Examples of Kernels
- Polynomial kernel: $(AA' + \mu ee')^{d}_{\bullet}$, where $\bullet$ denotes componentwise exponentiation as in MATLAB's `.^`
- Radial basis kernel: $K(A, A')_{ij} = \exp(-\mu \|A_i - A_j\|^2)$
- Neural network kernel: $(AA' + \mu ee')_{*}$, where $*$ denotes the step function applied componentwise
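These kernels might be written in NumPy as follows; the function names and default parameter values are mine, and the polynomial and step-function forms follow my reading of the slide's notation:

```python
import numpy as np

def polynomial_kernel(A, B, mu=1.0, d=2):
    # (AB' + mu*ee')^d with the exponent taken componentwise (MATLAB's .^)
    return (A @ B.T + mu) ** d

def radial_basis_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * sq)

def neural_network_kernel(A, B, mu=1.0):
    # Step function applied componentwise to AB' + mu*ee'
    return (A @ B.T + mu >= 0).astype(float)

A = np.random.default_rng(0).normal(size=(5, 3))
print(polynomial_kernel(A, A).shape)   # (5, 5); symmetric, possibly indefinite
```

Note that nothing here requires the kernel matrix to be positive semidefinite, which is exactly the point of the GSVM's "indefinite kernel" generality.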
A Nonlinear Kernel Application
- Checkerboard training set: 1000 points in $R^2$
- Separate 486 asterisks from 514 dots
Previous Work
Polynomial Kernel:
[Figure: checkerboard separation obtained with the polynomial kernel]
Large Margin Classifier: (SOR) Reformulation in $(w, \gamma)$ Space
[Figure: A+ and A- bounded by parallel planes, with the margin measured in $(w, \gamma)$ space]
(SOR) Linear Support Vector Machine Quadratic Programming Formulation
- Solve the following mathematical program:
  $$\min_{w, \gamma, y} \; \nu e'y + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
- The quadratic term here maximizes the distance between the bounding planes in the $(w, \gamma)$ space $R^{n+1}$
Introducing a Nonlinear Kernel
- The Wolfe dual of the SOR linear SVM is:
  $$\min_{u} \; \tfrac{1}{2} u'D(AA' + ee')Du - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
  - Linear separating surface: $x'A'Du = \gamma$, with $\gamma = -e'Du$
- Substitute a kernel $K(A, A')$ for the $AA'$ term:
  $$\min_{u} \; \tfrac{1}{2} u'D(K(A, A') + ee')Du - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
  - Nonlinear separating surface: $K(x', A')Du = \gamma$
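In code, the kernel substitution changes only how the dual matrix is assembled. A small sketch, where the name M for $D(K(A, A') + ee')D$ anticipates the next slide's definition:

```python
import numpy as np

def dual_objective_pieces(K, d):
    """Build M = D(K + ee')D and f(u) = 0.5*u'Mu - e'u for labels d in {+1,-1}.

    Since D = diag(d), the (i,j) entry of D(K + ee')D is d_i*d_j*(K_ij + 1),
    so M can be formed without materializing D. Pass K = A @ A.T for the
    linear dual, or any kernel matrix K(A, A') for the nonlinear one.
    """
    M = np.outer(d, d) * (K + 1.0)
    f = lambda u: 0.5 * u @ M @ u - u.sum()
    return M, f
```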
SVM Optimality Conditions
- Define $M = D(K(A, A') + ee')D$
- Then the dual SVM becomes much simpler:
  $$\min_{u} \; \tfrac{1}{2} u'Mu - e'u \quad \text{s.t.} \quad 0 \le u \le \nu e$$
- Gradient projection necessary & sufficient optimality condition:
  $$u = \left(u - \omega(Mu - e)\right)_{\#}, \quad \omega > 0$$
- $(\cdot)_{\#}$ denotes projecting u onto the region $0 \le u \le \nu e$
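The projection and the resulting optimality test are only a few lines of NumPy (a sketch; $\omega = 1$ is an arbitrary choice, since the condition holds for any $\omega > 0$):

```python
import numpy as np

def project(u, nu):
    # (.)_# : componentwise projection onto the box 0 <= u <= nu*e
    return np.clip(u, 0.0, nu)

def optimality_residual(u, M, nu, omega=1.0):
    # Zero exactly when u = (u - omega*(Mu - e))_# holds, i.e. u solves the dual
    e = np.ones(len(u))
    return np.linalg.norm(u - project(u - omega * (M @ u - e), nu))
```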
SOR Algorithm & Convergence
- The above optimality conditions lead to the SOR algorithm: sweep through the components of u, updating each with the freshest available values and projecting back onto the box,
  $$u_i \leftarrow \left(u_i - \frac{\omega}{M_{ii}}\left(M_{i\cdot}\, u - 1\right)\right)_{\#}, \quad i = 1, \dots, m$$
  - Remember, the optimality condition is $u = (u - \omega(Mu - e))_{\#}$
- SOR linear convergence [Luo-Tseng 1993]:
  - The iterates of the SOR algorithm converge R-linearly to a solution of the dual problem
  - The objective function values converge Q-linearly to the minimum value of the dual problem
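A compact, serial, in-memory sketch of the SOR sweep follows. It is a plain componentwise Gauss-Seidel-with-relaxation reading of the update above, not the paper's tuned implementation, and the demo data and $\nu$ are made up:

```python
import numpy as np

def sor_svm_dual(M, nu, omega=1.0, max_iter=10000, tol=1e-6):
    """Successive overrelaxation for  min 0.5*u'Mu - e'u  s.t. 0 <= u <= nu*e.

    Componentwise Gauss-Seidel sweep with relaxation factor omega in (0, 2);
    each coordinate is updated using the freshest values of u and projected
    back onto the box, matching the condition u = (u - w(Mu - e))_#.
    """
    m = M.shape[0]
    u = np.zeros(m)
    for _ in range(max_iter):
        u_old = u.copy()
        for i in range(m):
            grad_i = M[i] @ u - 1.0                 # i-th component of Mu - e
            u[i] = min(max(u[i] - omega * grad_i / M[i, i], 0.0), nu)
        if np.linalg.norm(u - u_old) <= tol:
            break
    return u

# Tiny usage sketch (data, kernel, and nu are made up for illustration):
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(1, .4, (10, 2)), rng.normal(-1, .4, (10, 2))])
d = np.r_[np.ones(10), -np.ones(10)]
K = np.exp(-0.5 * ((A[:, None] - A[None, :]) ** 2).sum(-1))
M = np.outer(d, d) * (K + 1.0)          # D(K(A,A') + ee')D
u = sor_svm_dual(M, nu=1.0)
gamma = -d @ u                          # gamma = -e'Du
```

Each coordinate update touches only one row of M, which is why SOR needs essentially no algorithmic changes to scale to datasets far larger than memory-resident QP solvers can handle.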
Numerical Testing
- Comparison of linear & nonlinear kernels using:
  - Linear programming formulations
  - Quadratic programming (SOR) formulations
- Data sets:
  - UCI Liver Disorders: 345 points in $R^6$
  - Bell Labs Checkerboard: 1000 points in $R^2$
  - Gaussian Synthetic: 1000 points in $R^{32}$
  - SCDS Synthetic: 1 million points in $R^{32}$
  - Massive Synthetic: 10 million points in $R^{32}$
- Machines:
  - Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM
  - Total: 64 processors, 8 GB RAM
Comparison of Linear & Nonlinear SVMs, Linear Programming Generated
[Table: training and testing set correctness for linear vs. nonlinear kernels]
- Nonlinear kernels yield better training and testing set correctness
SOR Results
- Examples of training on massive data:
  - 1 million point dataset generated by the SCDS generator:
    - Trained completely in 9.7 hours
    - Tuning set reached 99.7% of final accuracy in 0.3 hours
  - 10 million point randomly generated dataset:
    - Tuning set reached 95% of final accuracy in 14.3 hours
    - Under 10,000 iterations
- Comparison of linear and nonlinear kernels
  [Table: SOR results for linear vs. nonlinear kernels]
Conclusions
- Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
- Nonlinear separating surfaces improve generalization over linear ones
- SOR can handle very large problems not (easily) solvable by other methods
- SOR scales up with virtually no changes
- Future directions:
  - Parallel SOR for very large problems not resident in memory
  - Massive multicategory discrimination via SOR
  - Support vector regression
Questions?