Proximal Plane Classification
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Second Annual Review, June 1, 2001
Data Mining Institute, University of Wisconsin - Madison

Key Contributions
 Fast new support vector machine classifier: an order of magnitude faster than standard classifiers
 Extremely simple to implement: 4 lines of MATLAB code; NO optimization packages (LP, QP) needed

Outline of Talk
 (Standard) support vector machine (SVM) classifiers
 Proximal support vector machine (PSVM) classifiers
 Geometric motivation
 Linear PSVM classifier
 Nonlinear PSVM classifier: full and reduced kernels
 Numerical results: correctness comparable to standard SVM, but much faster classification (2 million points in 10-space in 21 seconds, compared to over 10 minutes for a standard SVM)

Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the point sets A+ and A- separated by two parallel bounding planes, with the margin between the planes maximized.]

Proximal Support Vector Machines: Fitting the Data Using Two Parallel Planes
[Figure: the point sets A+ and A- clustered around two parallel planes that the points of each class are fitted to.]

SVM as an Unconstrained Minimization Problem. For a data matrix $A \in \mathbb{R}^{m \times n}$, a diagonal label matrix $D$ with entries $\pm 1$, and $e$ a vector of ones, the standard SVM is:
$$\text{(QP)} \qquad \min_{w,\gamma,y} \ \nu e^T y + \tfrac{1}{2} w^T w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \ y \ge 0.$$
At the solution of (QP): $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+$ replaces negative components by zeros. Hence (QP) is equivalent to:
$$\min_{w,\gamma} \ \nu\, e^T (e - D(Aw - e\gamma))_+ + \tfrac{1}{2} w^T w.$$
Changing to the 2-norm and measuring the margin in $(w, \gamma)$ space:
$$\min_{w,\gamma} \ \tfrac{\nu}{2}\, \|(e - D(Aw - e\gamma))_+\|^2 + \tfrac{1}{2}\,(w^T w + \gamma^2).$$

PSVM Formulation. We have from the 2-norm SVM formulation above:
$$\min_{w,\gamma,y} \ \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w^T w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,$$
and PSVM replaces the inequality constraint by the equality $D(Aw - e\gamma) + y = e$. This simple but critical modification changes the nature of the optimization problem tremendously! Solving for $y$ in terms of $w$ and $\gamma$ gives the unconstrained problem:
$$\min_{w,\gamma} \ \tfrac{\nu}{2}\,\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}\,(w^T w + \gamma^2).$$

Advantages of New Formulation
 Objective function remains strongly convex
 An explicit exact solution can be written in terms of the problem data
 The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space
 Exact leave-one-out correctness can be obtained in terms of the problem data

Linear PSVM. We want to solve:
$$\min_{w,\gamma} \ \tfrac{\nu}{2}\,\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}\,\|[w;\gamma]\|^2.$$
 Setting the gradient equal to zero gives a nonsingular system of linear equations.
 Solution of the system gives the desired PSVM classifier.
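As a sketch of the gradient step (consistent with the MATLAB code later in the talk), write $H = [A \ \ {-e}]$ and stack the unknowns as $z = [w; \gamma]$, so the objective becomes $f(z) = \tfrac{\nu}{2}\|e - DHz\|^2 + \tfrac{1}{2}\|z\|^2$. Then:
$$\nabla f(z) = -\nu H^T D (e - DHz) + z = 0,$$
and since $D^T D = I$, this rearranges to the linear system stated on the next slide:
$$\Big(\tfrac{I}{\nu} + H^T H\Big) z = H^T D e.$$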

Linear PSVM Solution:
$$[w; \gamma] = \Big(\tfrac{I}{\nu} + H^T H\Big)^{-1} H^T D e, \qquad \text{where } H = [A \ \ {-e}].$$
 The linear system to solve depends on $H^T H$, which is of size $(n+1) \times (n+1)$.
 The input dimension $n$ is usually much smaller than the number of points $m$.

Linear Proximal SVM Algorithm
 Input: $A$, $D$, $\nu$.
 Define: $H = [A \ \ {-e}]$.
 Solve: $(I/\nu + H^T H)\,[w; \gamma] = H^T D e$.
 Calculate the classifier: $\operatorname{sign}(x^T w - \gamma)$.

Nonlinear PSVM Formulation. Linear PSVM (linear separating surface $x^T w = \gamma$):
$$\text{(QP)} \qquad \min_{w,\gamma,y} \ \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w^T w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e.$$
By QP "duality", $w = A^T D u$. Substituting, and maximizing the margin in the "dual space" (i.e., replacing $w^T w = u^T D A A^T D u$ by $u^T u$), gives:
$$\min_{u,\gamma} \ \tfrac{\nu}{2}\,\|e - D(AA^T D u - e\gamma)\|^2 + \tfrac{1}{2}\,(u^T u + \gamma^2).$$
Replace $AA^T$ by a nonlinear kernel $K(A, A^T)$:
$$\min_{u,\gamma} \ \tfrac{\nu}{2}\,\|e - D(K(A, A^T) D u - e\gamma)\|^2 + \tfrac{1}{2}\,(u^T u + \gamma^2).$$

The Nonlinear Classifier
 The nonlinear classifier: $\operatorname{sign}\big(K(x^T, A^T)\, D u - \gamma\big)$
 where $K$ is a nonlinear kernel, e.g.:
 Gaussian (radial basis) kernel: $K(x, y) = \exp(-\mu \|x - y\|^2)$, applied componentwise to pairs of data points
 Polynomial kernel: $K(x, y) = (x^T y + 1)^d$
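As a concrete illustration, here is a small MATLAB sketch of the Gaussian kernel matrix (the function name and the points-as-rows convention are ours, not from the talk):

function K = gaussian_kernel(A, B, mu)
% GAUSSIAN_KERNEL  K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2)
% A: m-by-n, B: l-by-n (data points as rows); K: m-by-l.
sqA = sum(A.^2, 2);     % m-by-1 squared row norms
sqB = sum(B.^2, 2)';    % 1-by-l squared row norms
K = exp(-mu*(repmat(sqA, 1, size(B,1)) + repmat(sqB, size(A,1), 1) - 2*(A*B')));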

Nonlinear PSVM. Defining $H$ slightly differently: $H = [K(A, A^T) \ \ {-e}]$. Similar to the linear case, setting the gradient equal to zero, we obtain:
$$[u; \gamma] = \Big(\tfrac{I}{\nu} + H^T H\Big)^{-1} H^T D e.$$
 Here, the linear system to solve is of size $(m+1) \times (m+1)$.
 However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.

Nonlinear Proximal SVM Algorithm
 Input: $A$, $D$, $\nu$.
 Define: $K = K(A, A^T)$, $H = [K \ \ {-e}]$.
 Solve: $(I/\nu + H^T H)\,[u; \gamma] = H^T D e$.
 Calculate the classifier: $\operatorname{sign}(K(x^T, A^T) D u - \gamma)$.
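A minimal MATLAB sketch of this algorithm, assuming the gaussian_kernel helper above (the function name psvm_nonlinear is ours):

function [u, gamma] = psvm_nonlinear(A, d, nu, mu)
% Nonlinear PSVM sketch: solve (I/nu + H'*H)[u; gamma] = H'*D*e
% with H = [K -e], K the m-by-m Gaussian kernel and d = diag(D).
m = size(A,1); e = ones(m,1);
K = gaussian_kernel(A, A, mu);  % kernel matrix K(A, A')
H = [K -e];
v = (d'*H)';                    % equals H'*D*e without forming D
r = (speye(m+1)/nu + H'*H)\v;   % (m+1)-by-(m+1) system
u = r(1:m); gamma = r(m+1);

A test matrix T (points as rows) would then be classified by sign(gaussian_kernel(T, A, mu)*(d.*u) - gamma).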

PSVM MATLAB Code

function [w, gamma] = psvm(A, d, nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
% Usage: [w, gamma] = psvm(A, d, nu);
[m, n] = size(A); e = ones(m, 1); H = [A -e];
v = (d'*H)';                   % v = H'*D*e
r = (speye(n+1)/nu + H'*H)\v;  % solve (I/nu + H'*H) r = v
w = r(1:n); gamma = r(n+1);    % extract w, gamma from r
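A hypothetical usage sketch (the toy data and the choice nu = 1 are ours, not from the talk):

% Toy example: two Gaussian clouds in R^2 with labels +1 / -1
rng(0);                          % reproducible random data
Apos = randn(50, 2) + 2;         % class +1 centered at (2, 2)
Aneg = randn(50, 2) - 2;         % class -1 centered at (-2, -2)
A = [Apos; Aneg]; d = [ones(50,1); -ones(50,1)];
[w, gamma] = psvm(A, d, 1);      % nu = 1 (illustrative choice)
pred = sign(A*w - gamma);        % linear classifier sign(x'*w - gamma)
fprintf('Training correctness: %.1f%%\n', 100*mean(pred == d));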

Linear PSVM Comparisons with Other SVMs: Much Faster, Comparable Correctness
[Table: ten-fold test correctness (%) and time (sec.) of PSVM, SSVM and SVM on six data sets: WPBC (60 mo., 110 points), Ionosphere (351), Cleveland Heart (297), Pima Indians (768), BUPA Liver (345) and Galaxy Dim (4192); numeric entries not preserved.]

Linear PSVM Comparisons on the Larger Adult Dataset: Much Faster & Comparable Correctness
[Table: testing correctness (%) and running time (sec.) of PSVM, LSVM, SSVM, SOR, SMO and SVM on Adult subsets with 123 attributes and (train, test) splits (11221, 21341), (16101, 16461), (22697, 9865) and (32562, 16282); numeric entries not preserved.]

Linear PSVM vs. LSVM on a 2-Million-Point Dataset: Over 30 Times Faster
[Table: training correctness (%), testing correctness (%) and time (sec.) of LSVM and PSVM on the NDC "Easy" and NDC "Hard" 2-million-point data sets; numeric entries not preserved.]

Nonlinear PSVM: Spiral Dataset 94 Red Dots & 94 White Dots
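For readers who want to reproduce a figure like this, here is a hypothetical MATLAB generator for a two-spiral dataset (the parameters are ours; the talk's exact generator is not given), fed to the psvm_nonlinear sketch above:

% Two interleaved spirals, 94 points per class (illustrative parameters)
n = 94;
t = linspace(pi/16, 4*pi, n)';              % spiral angle
A = [ t.*cos(t),  t.*sin(t);                % class +1 spiral
     -t.*cos(t), -t.*sin(t)];               % class -1: rotated copy
d = [ones(n,1); -ones(n,1)];
[u, gamma] = psvm_nonlinear(A, d, 1, 0.5);  % Gaussian-kernel PSVM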

Nonlinear PSVM Comparisons
[Table: ten-fold test correctness (%) and time (sec.) of PSVM, SSVM and LSVM on the Ionosphere (351 points), BUPA Liver (345), Tic-Tac-Toe (958) and Mushroom (8124) data sets; numeric entries not preserved.]
* For Mushroom, a rectangular kernel of size 8124 x 215 was used.

Conclusion
 PSVM is an extremely simple procedure for generating linear and nonlinear classifiers
 The PSVM classifier is obtained by solving a single system of linear equations, in the usually small-dimensional input space for a linear classifier
 Test set correctness comparable to standard SVM
 Much faster than standard SVMs: typically by an order of magnitude

Future Work
 Extension of PSVM to multicategory classification
 Massive data classification using an incremental PSVM
 Parallel extension and implementation of PSVM