
1 Proximal Plane Classification
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Second Annual Review, June 1, 2001
Data Mining Institute, University of Wisconsin - Madison

2 Key Contributions
- Fast new support vector machine classifier
  - An order of magnitude faster than standard classifiers
- Extremely simple to implement
  - 4 lines of MATLAB code
  - NO optimization packages (LP, QP) needed

3 Outline of Talk
- (Standard) support vector machine (SVM) classifiers
- Proximal support vector machine (PSVM) classifiers
  - Geometric motivation
  - Linear PSVM classifier
  - Nonlinear PSVM classifier
  - Full and reduced kernels
- Numerical results
  - Correctness comparable to standard SVM
  - Much faster classification!
    - 2 million points in 10-space in 21 seconds
    - Compared to over 10 minutes for standard SVM

4 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the two point sets A+ and A- separated by the bounding planes x'w = \gamma + 1 and x'w = \gamma - 1; the standard SVM maximizes the margin between these planes.]

5 Proximal Support Vector Machines: Fitting the Data Using Two Parallel Planes
[Figure: the same planes x'w = \gamma \pm 1 now act as proximal planes, each clustering the points of its class A+ or A- around it while being pushed as far apart as possible.]

6 SVM as an Unconstrained Minimization Problem
(Notation: A is the m x n matrix of data points, D the m x m diagonal matrix of +/-1 class labels, e a vector of ones, y the slack vector, and \nu > 0 a fixed weight.)

The standard linear SVM is the quadratic program:
\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0. \qquad (QP)

At the solution of (QP): y = (e - D(Aw - e\gamma))_+, where (\cdot)_+ replaces negative components by zeros. Hence (QP) is equivalent to:
\min_{w,\gamma} \; \nu \| (e - D(Aw - e\gamma))_+ \|_1 + \tfrac{1}{2} w'w.

Changing to the 2-norm and measuring the margin in (w, \gamma) space:
\min_{w,\gamma} \; \tfrac{\nu}{2} \| (e - D(Aw - e\gamma))_+ \|^2 + \tfrac{1}{2}(w'w + \gamma^2).

7 PSVM Formulation
We have from the SVM formulation of the previous slide (2-norm error, margin measured in (w, \gamma) space):
\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e.

PSVM replaces the inequality constraint by an equality:
\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e.

This simple but critical modification changes the nature of the optimization problem tremendously!! Solving for y in terms of w and \gamma gives the unconstrained problem:
\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2).

8 Advantages of New Formulation
- Objective function remains strongly convex
- An explicit exact solution can be written in terms of the problem data
- PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
- Exact leave-one-out correctness can be obtained in terms of the problem data

9 Linear PSVM
We want to solve:
\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)
- Setting the gradient equal to zero gives a nonsingular system of linear equations.
- Solution of the system gives the desired PSVM classifier.

10 Linear PSVM Solution
\begin{bmatrix} w \\ \gamma \end{bmatrix} = \Big(\tfrac{I}{\nu} + H'H\Big)^{-1} H'De, \qquad \text{where } H = [A \;\; -e].
- The linear system to solve depends on H'H, which is of size (n+1) x (n+1).
- n+1 is usually much smaller than m.
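
Not on the original slide: a short worked sketch of how setting the gradient to zero yields that system, writing z = [w; \gamma], H = [A \;\; -e], and using D^2 = I:
f(z) = \tfrac{\nu}{2}\|e - DHz\|^2 + \tfrac{1}{2}\|z\|^2
\nabla f(z) = -\nu H'D(e - DHz) + z = 0
\;\Rightarrow\; (I + \nu H'H)\, z = \nu H'De
\;\Rightarrow\; \Big(\tfrac{I}{\nu} + H'H\Big) z = H'De.
Since \tfrac{I}{\nu} + H'H is positive definite, the system is nonsingular, as claimed on slide 9.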

11 Linear Proximal SVM Algorithm
Input: A, d = diag(D), \nu
Define: H = [A \;\; -e], \; v = H'De
Solve: (\tfrac{I}{\nu} + H'H)\, r = v
Calculate: w = r_{1:n}, \; \gamma = r_{n+1}
Classifier: \text{sign}(x'w - \gamma)

12 Nonlinear PSVM Formulation
Linear PSVM (linear separating surface x'w = \gamma):
\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e.

By QP "duality", w = A'Du. Maximizing the margin in the "dual space" gives:
\min_{u,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(u'u + \gamma^2) \quad \text{s.t.} \quad D(AA'Du - e\gamma) + y = e.

Replace AA' by a nonlinear kernel K(A, A'):
\min_{u,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(u'u + \gamma^2) \quad \text{s.t.} \quad D(K(A,A')Du - e\gamma) + y = e.
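
An added note (our paraphrase of the standard argument, not from the slides) on where w = A'Du comes from: it is the stationarity condition of the Lagrangian of the equality-constrained problem above.
L(w,\gamma,y,u) = \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) - u'\big(D(Aw - e\gamma) + y - e\big)
\nabla_w L = w - A'Du = 0 \;\Rightarrow\; w = A'Du.
Substituting w = A'Du and measuring the margin by u'u instead of w'w = u'DAA'Du produces the u-space problem, in which the data A enters only through the inner products AA', so AA' can be replaced by a general kernel K(A, A').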

13 The Nonlinear Classifier
- The nonlinear classifier: K(x', A')Du - \gamma = 0, i.e. classify x by \text{sign}(K(x', A')Du - \gamma)
- where K is a nonlinear kernel, e.g.:
  - Gaussian (radial basis) kernel: K(A,B)_{ij} = \exp(-\mu\|A_i' - B_{\cdot j}\|^2)
  - Polynomial kernel: K(A,B)_{ij} = (A_i B_{\cdot j} + 1)^d
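
Not part of the original slides: a minimal MATLAB sketch of the Gaussian kernel above, usable for full (square) or reduced (rectangular) kernel matrices. The function name gaussian_kernel and the width parameter mu are our labels, not the talk's.

function K = gaussian_kernel(A,B,mu)
% K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2); B may have fewer rows than A (rectangular kernel)
m = size(A,1); k = size(B,1);
sq = sum(A.^2,2)*ones(1,k) + ones(m,1)*sum(B.^2,2)' - 2*A*B';  % m x k matrix of squared distances
K = exp(-mu*sq);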

14 Nonlinear PSVM
Defining H slightly differently: H = [K(A,A') \;\; -e].
Similar to the linear case, setting the gradient equal to zero, we obtain:
\begin{bmatrix} u \\ \gamma \end{bmatrix} = \Big(\tfrac{I}{\nu} + H'H\Big)^{-1} H'De.
- Here the linear system to solve is of size (m+1) x (m+1).
- However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.

15 Nonlinear Proximal SVM Algorithm
Input: A, d = diag(D), \nu
Define: K = K(A, A'), \; H = [K \;\; -e], \; v = H'De
Solve: (\tfrac{I}{\nu} + H'H)\, r = v
Calculate: u = r_{1:m}, \; \gamma = r_{m+1}
Classifier: \text{sign}(K(x', A')Du - \gamma)
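
A hedged MATLAB sketch of the nonlinear algorithm above, in the spirit of the linear code on the next slide; the function name psvm_nl is ours, and it relies on the gaussian_kernel sketch given earlier.

function [u, gamma] = psvm_nl(A,d,nu,mu)
% Nonlinear PSVM sketch: H = [K(A,A') -e], solve (I/nu + H'*H) r = H'*D*e
m = size(A,1); e = ones(m,1);
K = gaussian_kernel(A,A,mu);      % m x m Gaussian kernel (see earlier sketch)
H = [K -e];
v = (d'*H)';                      % v = H'*D*e, since d = diag(D)
r = (speye(m+1)/nu + H'*H)\v;     % the (m+1) x (m+1) linear system
u = r(1:m); gamma = r(m+1);
% classify a new row vector x by: sign(gaussian_kernel(x,A,mu)*(d.*u) - gamma)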

16 PSVM MATLAB Code
function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A);e=ones(m,1);H=[A -e];
v=(d'*H)';                 % v=H'*D*e
r=(speye(n+1)/nu+H'*H)\v;  % solve (I/nu+H'*H)r=v
w=r(1:n);gamma=r(n+1);     % getting w,gamma from r
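
A small usage sketch (ours, not from the slides): two Gaussian point clouds in 2-space with +/-1 labels, classified with the 4-line psvm code above.

% toy usage sketch: two classes in R^2
m = 200;
A = [randn(m/2,2)+2; randn(m/2,2)-2];   % class +1 around (2,2), class -1 around (-2,-2)
d = [ones(m/2,1); -ones(m/2,1)];        % d = diag(D), the +/-1 labels
[w,gamma] = psvm(A,d,1);                % nu = 1
pred = sign(A*w - gamma);               % linear classifier sign(x'w - gamma)
training_correctness = mean(pred == d)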

17 Linear PSVM Comparisons with Other SVMs
Much Faster, Comparable Correctness
(Each cell: ten-fold test correctness %, time in seconds)

Data Set (m x n)           | PSVM         | SSVM         | SVM
WPBC (60 mo.), 110 x 32    | 68.5   0.02  | 68.5   0.17  | 62.7   3.85
Ionosphere, 351 x 34       | 87.3   0.17  | 88.7   1.23  | 88.0   2.19
Cleveland Heart, 297 x 13  | 85.9   0.01  | 86.2   0.70  | 86.5   1.44
Pima Indians, 768 x 8      | 77.5   0.02  | 77.6   0.78  | 76.4  37.00
BUPA Liver, 345 x 6        | 69.4   0.02  | 70.0   0.78  | 69.5   6.65
Galaxy Dim, 4192 x 14      | 93.5   0.34  | 95.0   5.21  | 94.1  28.33

18 Linear PSVM Comparisons on Larger Adult Dataset
Much Faster & Comparable Correctness
(Each cell: testing correctness %, running time in seconds; 123 attributes; best time marked in red on the original slide)

Dataset size (Train, Test) | PSVM        | LSVM         | SSVM        | SOR         | SMO       | SVM
(11221, 21341)             | 84.48   2.5 | 84.84   38.9 | 84.79  14.1 | 84.37  18.8 | -    17.0 | 84.68   306.6
(16101, 16461)             | 84.78   3.7 | 85.01   60.5 | 84.96  21.5 | 84.62  24.8 | -    35.3 | 84.83   667.2
(22697, 9865)              | 85.16   5.2 | 85.35   92.0 | 85.35  29.0 | 85.06  31.3 | -    85.7 | 85.17  1425.6
(32562, 16282)             | 84.56   7.4 | 85.05  140.9 | 85.02  44.5 | 84.96  83.9 | -   163.6 | 85.05  2184.0

19 Linear PSVM vs LSVM: 2-Million-Point Dataset
Over 30 Times Faster

Dataset    | Method | Training Correctness % | Testing Correctness % | Time (sec.)
NDC "Easy" | LSVM   | 90.86                  | 91.23                 | 658.5
NDC "Easy" | PSVM   | 90.80                  | 91.13                 | 20.8
NDC "Hard" | LSVM   | 69.80                  | 69.44                 | 655.6
NDC "Hard" | PSVM   | 69.84                  | 69.52                 | 20.6

20 Nonlinear PSVM: Spiral Dataset (94 Red Dots & 94 White Dots)
[Figure: the two intertwined spirals of points and the nonlinear separating surface obtained by PSVM.]

21 Nonlinear PSVM Comparisons
(Each cell: ten-fold test correctness %, time in seconds)

Data Set (m x n)        | PSVM          | SSVM           | LSVM
Ionosphere, 351 x 34    | 95.2    4.60  | 95.8    25.25  | 95.8    14.58
BUPA Liver, 345 x 6     | 73.6    4.34  | 73.7    20.65  | 73.7    30.75
Tic-Tac-Toe, 958 x 9    | 98.4   74.95  | 98.4   395.30  | 94.7   350.64
Mushroom *, 8124 x 22   | 88.0   35.50  | 88.8   307.66  | 87.8   503.74

* A rectangular kernel of size 8124 x 215 was used.

22 Conclusion
- PSVM is an extremely simple procedure for generating linear and nonlinear classifiers
- The PSVM classifier is obtained by solving a single system of linear equations, in the usually small dimensional input space for a linear classifier
- Test set correctness comparable to standard SVM
- Much faster than standard SVMs: typically an order of magnitude faster

23 Future Work
- Extension of PSVM to multicategory classification
- Massive data classification using an incremental PSVM
- Parallel extension and implementation of PSVM

