Nonlinear Data Discrimination via Generalized Support Vector Machines
David R. Musicant and Olvi L. Mangasarian
University of Wisconsin - Madison
Outline
• The linear support vector machine (SVM)
  – Linear kernel
• Generalized support vector machine (GSVM)
  – Nonlinear indefinite kernel
• Linear programming formulation of the GSVM
  – MINOS
• Quadratic programming formulation of the GSVM
  – Successive overrelaxation (SOR)
• Numerical comparisons
• Conclusions
The Discrimination Problem
The Fundamental 2-Category Linearly Separable Case
[Figure: point sets A+ and A- separated by a separating surface]
The Discrimination Problem
The Fundamental 2-Category Linearly Separable Case
• Given m points in the n-dimensional space $R^n$
• Represented by an m × n matrix A
• Membership of each point $A_i$ in the classes +1 or -1 is specified by:
  – An m × m diagonal matrix D with $\pm 1$ along its diagonal
• Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, such that:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
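As a concrete illustration of this notation, here is a minimal numpy sketch; the matrix A, the labels, and the plane (w, γ) are made-up values, not data from the talk:

    import numpy as np

    # Toy illustration of the slide's notation (A, labels, w, gamma are made up).
    A = np.array([[2.0, 3.0],   # two points of class +1
                  [3.0, 4.0],
                  [0.0, 0.0],   # two points of class -1
                  [1.0, 0.0]])
    d = np.array([1, 1, -1, -1])        # class of each row A_i
    D = np.diag(d)                      # m x m diagonal matrix D
    e = np.ones(A.shape[0])             # vector of ones

    w, gamma = np.array([1.0, 1.0]), 2.5   # a candidate plane x'w = gamma

    # The bounding planes x'w = gamma + 1 and x'w = gamma - 1 separate the
    # classes exactly when D(Aw - e*gamma) >= e holds componentwise:
    print(D @ (A @ w - e * gamma) >= e)    # all True => linearly separated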
Preliminary Attempt at the (Linear) Support Vector Machine:
Robust Linear Programming
• Solve the following mathematical program:
  $\min_{w,\gamma,y} \; \nu e'y$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
  where y = nonnegative error (slack) vector
• Note: y = 0 if the convex hulls of A+ and A- do not intersect.
The (Linear) Support Vector Machine
Maximize Margin Between Separating Planes
[Figure: point sets A+ and A- with two bounding planes and the margin between them]
The (Linear) Support Vector Machine Formulation
• Solve the following mathematical program:
  $\min_{w,\gamma,y} \; \nu e'y + \|w\|_1$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
  where y = nonnegative error (slack) vector and the norm term maximizes the margin
• Note: y = 0 if the convex hulls of A+ and A- do not intersect.
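With the 1-norm margin term this program is a linear program. The following is a sketch using scipy.optimize.linprog; the function name and variable stacking are illustrative assumptions (the talk itself used MINOS), and dropping the norm term from the objective recovers the robust LP of the previous slide:

    import numpy as np
    from scipy.optimize import linprog

    def svm_1norm_lp(A, d, nu=1.0):
        """Sketch of the 1-norm linear SVM as an LP (not the authors' code).

        min  nu*e'y + e's  s.t.  D(Aw - e*gamma) + y >= e, -s <= w <= s, y >= 0
        Variables stacked as x = [w (n), gamma (1), y (m), s (n)].
        """
        m, n = A.shape
        D = np.diag(d.astype(float))
        e = np.ones(m)

        c = np.concatenate([np.zeros(n), [0.0], nu * e, np.ones(n)])

        # D(Aw - e*gamma) + y >= e  rewritten as  -DA w + De gamma - y <= -e
        row1 = np.hstack([-D @ A, (D @ e)[:, None], -np.eye(m), np.zeros((m, n))])
        # w - s <= 0 and -w - s <= 0 make e's an upper bound on ||w||_1
        row2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
        row3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
        A_ub = np.vstack([row1, row2, row3])
        b_ub = np.concatenate([-e, np.zeros(2 * n)])

        bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[:n], res.x[n]    # w, gamma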
GSVM: Generalized Support Vector Machine
Linear Programming Formulation
• Linear support vector machine (linear separating surface $x'w = \gamma$)
• By "duality", set $w = A'Du$ (linear separating surface $x'A'Du = \gamma$)
• Nonlinear support vector machine: replace $AA'$ by a nonlinear kernel $K(A,A')$.
  Nonlinear separating surface: $K(x', A')Du = \gamma$
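Spelling out the substitution (a worked step in the slide's notation, not shown in the original):

    $$D(Aw - e\gamma) + y \ge e
      \;\xrightarrow{\;w = A'Du\;}\;
      D(AA'Du - e\gamma) + y \ge e
      \;\xrightarrow{\;AA' \to K(A,A')\;}\;
      D(K(A,A')Du - e\gamma) + y \ge e.$$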
Examples of Kernels
• Polynomial kernel: $(AA' + \mu ee')^{d}_{\bullet}$, where $\bullet$ denotes componentwise exponentiation as in MATLAB
• Radial basis kernel: $\exp(-\mu \|A_i - A_j\|^2)$, $i, j = 1, \ldots, m$
• Neural network kernel: $(AA' + \mu ee')_{*}$, where $(\cdot)_{*}$ denotes the step function applied componentwise
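The three kernel families above might be coded as follows in numpy; μ and d play the roles on the slide, though the exact scaling conventions here are assumptions:

    import numpy as np

    # Kernels K(A, B') between row-point matrices A (m x n) and B (k x n).

    def polynomial_kernel(A, B, mu=1.0, d=2):
        # (AB' + mu*ee')^d with componentwise ("MATLAB dot") exponentiation
        return (A @ B.T + mu) ** d

    def radial_basis_kernel(A, B, mu=1.0):
        # exp(-mu * ||A_i - B_j||^2) for every pair of rows
        sq = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-mu * np.maximum(sq, 0.0))

    def neural_network_kernel(A, B, mu=1.0):
        # componentwise step function of (AB' + mu*ee'); an indefinite
        # kernel, which the GSVM framework permits
        return (A @ B.T + mu >= 0).astype(float)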
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in $R^2$
Separate 486 Asterisks from 514 Dots
[Figure: checkerboard training set]
Previous Work
[Figure: checkerboard separation obtained by previous methods]
Polynomial Kernel
[Figure: checkerboard separation obtained with the polynomial kernel]
Large Margin Classifier
(SOR) Reformulation in $(w, \gamma)$ Space
[Figure: point sets A+ and A- with bounding planes and the margin between them]
(SOR) Linear Support Vector Machine
Quadratic Programming Formulation
• Solve the following mathematical program:
  $\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2}(w'w + \gamma^2)$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$
• The quadratic term here maximizes the distance between the bounding planes in the $(w, \gamma)$ space
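Why the quadratic term maximizes the margin (a one-line justification, consistent with the slide's claim): measured in $R^{n+1}$, the space of $(w, \gamma)$, the distance between the bounding planes is

    $$\frac{2}{\|(w,\gamma)\|_2} \;=\; \frac{2}{\sqrt{w'w + \gamma^2}},$$

so minimizing $\tfrac{1}{2}(w'w + \gamma^2)$ maximizes this distance.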
Introducing a Nonlinear Kernel
• The Wolfe dual of the SOR linear SVM is:
  $\min_{0 \le u \le \nu e} \; \tfrac{1}{2} u'D(AA' + ee')Du - e'u$
  – Linear separating surface: $x'A'Du + e'Du = 0$
• Substitute a kernel $K(A,A')$ for the $AA'$ term:
  $\min_{0 \le u \le \nu e} \; \tfrac{1}{2} u'D(K(A,A') + ee')Du - e'u$
  – Nonlinear separating surface: $K(x',A')Du + e'Du = 0$
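For reference, the primal-dual correspondence behind the surface formulas above is the standard Wolfe-dual step (a derivation sketch in the slide's notation, not shown in the original):

    $$w = A'Du, \qquad \gamma = -e'Du, \qquad \text{so} \quad x'w - \gamma = x'A'Du + e'Du,$$

and substituting $K(A,A')$ for $AA'$ turns $x'A'Du$ into $K(x',A')Du$.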
SVM Optimality Conditions
• Define $M = D(K(A,A') + ee')D$
• Then the dual SVM becomes much simpler: $\min_{0 \le u \le \nu e} \; \tfrac{1}{2} u'Mu - e'u$
• Gradient projection necessary & sufficient optimality condition:
  $u = \left(u - \omega(Mu - e)\right)_{\#}$ for any $\omega > 0$
• $(\cdot)_{\#}$ denotes projecting u onto the region $0 \le u \le \nu e$
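A minimal numpy sketch of the projection $(\cdot)_{\#}$ and the resulting optimality test; the function names are illustrative, not from the paper:

    import numpy as np

    def project(u, nu):
        # (u)_# : componentwise projection onto the box 0 <= u <= nu*e
        return np.clip(u, 0.0, nu)

    def is_dual_optimal(M, u, nu, omega=1.0, tol=1e-8):
        # Gradient projection condition: u = (u - omega*(Mu - e))_#
        e = np.ones(len(u))
        return np.linalg.norm(u - project(u - omega * (M @ u - e), nu)) <= tol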
SOR Algorithm & Convergence
• The above optimality condition leads to the SOR algorithm ($0 < \omega < 2$):
  $u^{i+1} = \left(u^i - \omega E^{-1}\left(Mu^i - e + L(u^{i+1} - u^i)\right)\right)_{\#}$,
  where E is the diagonal of M and L its strictly lower triangular part
  – Remember, the optimality condition is expressed as: $u = (u - \omega(Mu - e))_{\#}$
• SOR linear convergence [Luo & Tseng, 1993]:
  – The iterates $u^i$ of the SOR algorithm converge R-linearly to a solution $\bar{u}$ of the dual problem
  – The objective function values $f(u^i)$ converge Q-linearly to $f(\bar{u})$
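One way the SOR sweep might be implemented for the kernel dual $\min \tfrac{1}{2}u'Mu - e'u$, $0 \le u \le \nu e$: update one component at a time with the freshest values, which componentwise matches the matrix recurrence above. A sketch under those assumptions; all names are mine, not the authors' code:

    import numpy as np

    def sor_dual_svm(K, d, nu=1.0, omega=1.3, max_iter=10000, tol=1e-6):
        """SOR sketch for min 0.5 u'Mu - e'u, 0 <= u <= nu*e,
        with M = D(K + ee')D and relaxation factor 0 < omega < 2."""
        m = len(d)
        M = np.outer(d, d) * (K + 1.0)  # D(K + ee')D, elementwise form
        u = np.zeros(m)
        grad = -np.ones(m)              # gradient Mu - e at u = 0
        for _ in range(max_iter):
            u_old = u.copy()
            for j in range(m):
                # relaxed coordinate step, projected onto [0, nu]
                new_uj = min(max(u[j] - omega * grad[j] / M[j, j], 0.0), nu)
                delta = new_uj - u[j]
                if delta != 0.0:
                    grad += delta * M[:, j]  # keep gradient consistent with u
                    u[j] = new_uj
            if np.linalg.norm(u - u_old) <= tol:
                break
        return u

    # The resulting u defines the surface K(x', A')Du + e'Du = 0, so a new
    # point x with kernel row K_xA = K(x', A') is classified by:
    def classify(K_xA, d, u):
        return np.sign(K_xA @ (d * u) + np.sum(d * u))

Maintaining the gradient incrementally after each coordinate change, rather than recomputing $Mu - e$ from scratch, is what keeps each sweep affordable on large problems.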
Numerical Testing
• Comparison of linear & nonlinear kernels using:
  – Linear programming formulations
  – Quadratic programming (SOR) formulations
• Data sets:
  – UCI Liver Disorders: 345 points in $R^6$
  – Bell Labs Checkerboard: 1000 points in $R^2$
  – Gaussian Synthetic: 1000 points in $R^{32}$
  – SCDS Synthetic: 1 million points in $R^{32}$
  – Massive Synthetic: 10 million points in $R^{32}$
• Machines:
  – Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM
  – Total: 64 processors, 8 GB RAM
Comparison of Linear & Nonlinear SVMs
Generated by Linear Programming
[Table: training and testing set correctness for linear vs. nonlinear kernels]
• Nonlinear kernels yield better training and testing set correctness
SOR Results
• Examples of training on massive data:
  – 1 million point dataset generated by the SCDS generator:
    Trained completely in 9.7 hours
    Tuning set reached 99.7% of final accuracy in 0.3 hours
  – 10 million point randomly generated dataset:
    Tuning set reached 95% of final accuracy in 14.3 hours
    Under 10,000 iterations
• Comparison of linear and nonlinear kernels
  [Table: training and testing set correctness for linear vs. nonlinear kernels under SOR]
Conclusions
• Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
• Nonlinear separating surfaces improve generalization over linear ones
• SOR can handle very large problems not (easily) solvable by other methods
• SOR scales up with virtually no changes
• Future directions:
  – Parallel SOR for very large problems not resident in memory
  – Massive multicategory discrimination via SOR
  – Support vector regression
Questions?