
1 Minimal Kernel Classifiers Glenn Fung Olvi Mangasarian Alexander Smola Data Mining Institute University of Wisconsin - Madison Informs 2002 San Jose, California, Nov 17-20, 2002

2 Outline of Talk
- Linear Support Vector Machines (SVM)
  - Linear separating surface
  - Quadratic programming (QP) formulation
  - Linear programming (LP) formulation
- Nonlinear Support Vector Machines
  - Nonlinear kernel separating surface
  - LP formulation
- The Minimal Kernel Classifier (MKC)
  - The pound loss function (#)
  - MKC Algorithm
- Numerical experiments
- Conclusion

3 What is a Support Vector Machine?
- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function

4 What are Support Vector Machines Used For?
- Classification
- Regression & data fitting
- Supervised & unsupervised learning

5 Generalized Support Vector Machines: 2-Category Linearly Separable Case [figure: separable point sets A+ and A-]

6 Support Vector Machines: Maximizing the Margin between Bounding Planes [figure: bounding planes for A+ and A-, with the support vectors marked]

7 Support Vector Machine Formulation: Algebra of the 2-Category Linearly Separable Case
- Given m points in n-dimensional space, represented by an m-by-n matrix A
- Membership of each point in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 and -1 entries
- Separate by two bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
- More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones
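To make the notation concrete, here is a minimal numpy sketch (the toy matrix A, labels d, and plane (w, gamma) are invented for illustration, not taken from the talk) that checks the succinct separability condition directly:

```python
import numpy as np

# Toy 2-class data: m = 4 points in n = 2 dimensions (rows of A),
# with +1 / -1 class membership on the diagonal of D.
A = np.array([[2.0, 2.0],
              [3.0, 3.0],
              [-1.0, -1.0],
              [-2.0, -2.0]])
d = np.array([1, 1, -1, -1])          # class labels
D = np.diag(d)                         # m-by-m diagonal matrix D
e = np.ones(A.shape[0])                # vector of ones

# A candidate separating plane x'w = gamma with bounding planes x'w = gamma +/- 1.
w = np.array([0.5, 0.5])
gamma = 0.0

# The succinct separability condition: D(Aw - e*gamma) >= e.
print(D @ (A @ w - e * gamma) >= e)   # -> [ True  True  True  True]
```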

8 QP Support Vector Machine Formulation
- The margin between the bounding planes is $2/\|w\|_2$, so it is maximized by minimizing $\tfrac{1}{2}\|w\|_2^2$
- Solve the quadratic program for some $\nu > 0$:
  (QP)  $\min_{w,\gamma,y}\; \nu e'y + \tfrac{1}{2}w'w$  s.t.  $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
- where $y$ denotes the nonnegative slack (error) variables and $D_{ii} = \pm 1$ denotes class membership
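A hedged sketch of this quadratic program using cvxpy (the choice of cvxpy and the function name linear_svm_qp are mine; the variables w, gamma, y mirror the formulation above):

```python
import numpy as np
import cvxpy as cp

def linear_svm_qp(A, d, nu=1.0):
    """Soft-margin SVM QP:
         min  nu * e'y + (1/2) w'w
         s.t. D(Aw - e*gamma) + y >= e,  y >= 0,
       where D = diag(d) and d holds the +/-1 labels."""
    m, n = A.shape
    w = cp.Variable(n)
    gamma = cp.Variable()
    y = cp.Variable(m, nonneg=True)
    # D(Aw - e*gamma) is just the elementwise product of d with (Aw - gamma*e).
    margin = cp.multiply(d, A @ w - gamma * np.ones(m))
    prob = cp.Problem(cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w)),
                      [margin + y >= 1])
    prob.solve()
    return w.value, gamma.value
```

With the toy A and d from the previous sketch, linear_svm_qp(A, d) returns a plane (w, gamma) of maximal 2-norm margin.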

9 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm:
  $\min_{w,\gamma,y}\; \nu e'y + \|w\|_1$  s.t.  $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
- This is equivalent to the following linear program:
  $\min_{w,\gamma,y,s}\; \nu e'y + e's$  s.t.  $D(Aw - e\gamma) + y \ge e,\; -s \le w \le s,\; y \ge 0$
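Because the 1-norm formulation is an LP, it can be handed to any LP solver. The sketch below uses scipy.optimize.linprog with the variable stacking x = [w, gamma, y, s] (the solver choice and the stacking are implementation details assumed here, not specified in the talk):

```python
import numpy as np
from scipy.optimize import linprog

def linear_svm_lp(A, d, nu=1.0):
    """1-norm SVM as an LP:
         min  nu*e'y + e's
         s.t. D(Aw - e*gamma) + y >= e,  -s <= w <= s,  y >= 0.
       Variable ordering: x = [w (n), gamma (1), y (m), s (n)]."""
    m, n = A.shape
    D = np.diag(d)
    # Objective: zero weight on w and gamma, nu on y, one on s.
    c = np.concatenate([np.zeros(n), [0.0], nu * np.ones(m), np.ones(n)])
    # Classification constraints rewritten as: -(DA)w + (De)gamma - y <= -e.
    block1 = np.hstack([-D @ A, D @ np.ones((m, 1)), -np.eye(m), np.zeros((m, n))])
    # Epigraph of |w|:  w - s <= 0  and  -w - s <= 0.
    block2 = np.hstack([np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)])
    block3 = np.hstack([-np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)])
    A_ub = np.vstack([block1, block2, block3])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = ([(None, None)] * n + [(None, None)]      # w and gamma free
              + [(0, None)] * m + [(0, None)] * n)      # y >= 0, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[n]
    return w, gamma
```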

10 Nonlinear Kernel: LP Formulation
- Linear SVM (LP):  $\min_{w,\gamma,y}\; \nu e'y + \|w\|_1$  s.t.  $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
  (Linear separating surface: $x'w = \gamma$)
- By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:
  $\min_{u,\gamma,y}\; \nu e'y + \|u\|_1$  s.t.  $D(AA'Du - e\gamma) + y \ge e,\; y \ge 0$
- Replace $AA'$ by a nonlinear kernel $K(A, A')$:
  $\min_{u,\gamma,y}\; \nu e'y + \|u\|_1$  s.t.  $D(K(A,A')Du - e\gamma) + y \ge e,\; y \ge 0$
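The kernelized LP has the same shape as the linear one, with K(A, A')D playing the role of the data matrix and u the role of w, so a quick sketch can simply reuse linear_svm_lp from the snippet above (an illustrative shortcut; kernel is any function returning the m-by-m kernel matrix):

```python
import numpy as np

def kernel_svm_lp(A, d, kernel, nu=1.0):
    """Kernel LP-SVM: min nu*e'y + ||u||_1 s.t. D(K(A,A')Du - e*gamma) + y >= e.
    Since K(A,A')D simply replaces the data matrix, reuse linear_svm_lp
    (defined above) with G = K(A, A') D in place of A; u plays the role of w."""
    K = kernel(A, A)               # m-by-m kernel matrix K(A, A')
    G = K * d[np.newaxis, :]       # right-multiplying by D = diag(d) scales columns
    u, gamma = linear_svm_lp(G, d, nu=nu)
    return u, gamma
```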

11 The Nonlinear Classifier
- The nonlinear classifier:  $\operatorname{sign}\big(K(x', A')Du - \gamma\big)$
- Where K is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:
  $\big(K(A, B')\big)_{ij} = \exp\!\big(-\mu\|A_i - B_j\|^2\big)$, with $A_i$, $B_j$ the i-th and j-th rows of A and B
- The $ij$-entry of $K(A, A')$ represents the "similarity" between the data points $A_i$ and $A_j$
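A small numpy sketch of the Gaussian kernel and the resulting classifier, assuming the usual parameterization exp(-mu * ||A_i - B_j||^2) (the function names and the mu default are mine):

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """Gaussian (radial basis) kernel: entry (i, j) is exp(-mu * ||A_i - B_j||^2),
    where A_i and B_j are rows of A and B."""
    sq = ((A**2).sum(axis=1)[:, None]
          + (B**2).sum(axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * np.maximum(sq, 0.0))   # clip tiny negatives from roundoff

def nonlinear_classifier(X, A, d, u, gamma, mu=1.0):
    """Evaluate sign(K(x', A') D u - gamma) for each row x of X."""
    return np.sign(gaussian_kernel(X, A, mu) @ (d * u) - gamma)
```

For example, nonlinear_classifier(X_test, A, d, u, gamma) labels the rows of X_test using the (u, gamma) returned by kernel_svm_lp above.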

12 Nonlinear PSVM: Spiral Dataset [figure: 94 red dots & 94 white dots]

13 Model Simplification
- Goal #1: Generate a very sparse solution vector.
  - Simplifies the separating surface.
  - Why? Minimizes the number of kernel functions used.
- Goal #2: Minimize the number of active constraints.
  - Why? Reduces data dependence.
  - Useful for massive incremental classification.

14 Model Simplification, Goal #1: Simplifying the Separating Surface
- The nonlinear separating surface:  $K(x', A')Du = \gamma$
- If $u_i = 0$, the separating surface does not depend explicitly on the data point $A_i$
- Minimize the number of nonzero components $u_i$
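Since the surface ignores every data point whose multiplier is zero, the model can be shrunk before test time; a minimal sketch (the tolerance and the names are illustrative assumptions):

```python
import numpy as np

def reduce_model(A, d, u, tol=1e-8):
    """Drop the rows A_i (and the matching d_i, u_i) whose multiplier u_i is
    numerically zero; the surface K(x',A')Du = gamma is unchanged."""
    keep = np.abs(u) > tol
    return A[keep], d[keep], u[keep]
```

The reduced (A, d, u) can then be fed to the classifier sketch above, so only the retained rows incur kernel evaluations on unseen data.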

15 Model Simplification, Goal #2: Minimize Data Dependence
- By the KKT conditions, a multiplier $u_i$ can be nonzero only when the corresponding classification constraint is active
- Hence: the classifier depends only on the data points whose constraints are active
- Minimize the number of active constraints

16 Achieving Model Simplification: Minimal Kernel Classifier Formulation
- Minimize, subject to the kernel classification constraints, an objective built around the new loss function #
- The new loss function # is applied componentwise to the classification constraint residuals

17 The (Pound) Loss Function #

18 Approximating the Pound Loss Function #

19 Minimal Kernel Classifier as a Concave Minimization Problem
- For an appropriate value of the smoothing parameter, the MKC problem becomes a concave minimization over a polyhedral set
- This can be effectively solved using the finite Successive Linearization Algorithm (SLA) (Mangasarian 1996)
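The transcript does not reproduce the concave approximation itself. As background only, a standard smooth concave surrogate for the step function t_* in this line of work is 1 - exp(-alpha t) on t >= 0; the sketch below shows how it tightens as alpha grows (its exact relation to the pound-loss approximation used in the talk is an assumption):

```python
import numpy as np

def step(t):
    """Step function t_*: 1 for t > 0, 0 otherwise (counts positive entries)."""
    return (t > 0).astype(float)

def smooth_step(t, alpha=5.0):
    """Smooth concave approximation of the step function on t >= 0:
    1 - exp(-alpha * t); it approaches step(t) as alpha grows."""
    return 1.0 - np.exp(-alpha * np.maximum(t, 0.0))

t = np.linspace(0.0, 2.0, 5)
for alpha in (1.0, 5.0, 25.0):
    # The maximum gap over the grid shrinks as alpha increases.
    print(alpha, np.abs(smooth_step(t, alpha) - step(t)).max())
```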

20 Minimal Kernel Algorithm (SLA)
- Start with an initial point $(u^0, \gamma^0, y^0)$
- Having $(u^i, \gamma^i, y^i)$, determine $(u^{i+1}, \gamma^{i+1}, y^{i+1})$ by solving the LP obtained by linearizing the concave objective around the current iterate
- Stop when successive iterates produce no further decrease in the linearized objective
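A generic sketch of the SLA pattern for minimizing a differentiable concave function over a bounded polyhedron (the actual MKC objective and constraint data are not reproduced in the transcript, so grad_f, A_ub, b_ub, and bounds are placeholders supplied by the caller):

```python
import numpy as np
from scipy.optimize import linprog

def successive_linearization(grad_f, A_ub, b_ub, bounds, x0, tol=1e-6, max_iter=50):
    """Generic SLA: minimize a differentiable concave f over the bounded
    polyhedron {x : A_ub @ x <= b_ub, bounds}.  Each step solves the LP that
    minimizes the linearization grad_f(x_k)' x and stops when the linearized
    objective no longer decreases (a minimum-principle stopping rule)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        res = linprog(g, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        x_new = res.x
        if g @ (x_new - x) >= -tol:     # no further linearized decrease: stop
            return x_new
        x = x_new
    return x
```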

21 Minimal Kernel Algorithm (SLA)
- Each iteration of the algorithm solves a linear program.
- The algorithm terminates in a finite number of iterations (typically 5 to 7).
- The solution obtained satisfies the Minimum Principle necessary optimality condition.

22 [figure]

23 Checkerboard Separating Surface [figure: # of kernel functions = 27 (*), # of active constraints = 30 (o)]

24 Numerical Experiments: Results for Six Public Datasets

Data Set (m x n)           | Reduced rectangular kernel nnz(t) x nnz(u) | MKC test % (time sec.) | SVM test % (# SV) | Kernel SV reduction % | Testing time reduction % (SVM - MKC time sec.)
Ionosphere (351 x 34)      | 30.2 x 15.7  | 94.9 (172.3)  | 92.9 (288.2) | 94.6 | 94.9 (3.05 - 0.16)
Cleveland Heart (297 x 13) | 64.6 x 7.6   | 85.8 (147.2)  | 85.5 (241.0) | 96.9 | 96.3 (0.84 - 0.03)
Pima Indians (768 x 8)     | 263.1 x 7.8  | 77.7 (303.3)  | 76.6 (637.3) | 98.8 | (3.95 - 0.05)
BUPA Liver (345 x 6)       | 144.4 x 10.5 | 75.0 (285.9)  | 72.7 (310.5) | 96.6 | 97.5 (0.59 - 0.02)
Tic-Tac-Toe (958 x 9)      | 31.3 x 14.3  | 98.4 (150.4)  | 98.3 (861.4) | 98.3 | 98.2 (6.97 - 0.13)
Mushroom (8124 x 22)       | 933.8 x 47.9 | 89.3 (2763.5) | oom (NA)     | NA   | NA

25 Conclusion
- A finite algorithm that generates a classifier depending on only a fraction of the input data.
- Important for fast online testing of unseen data, e.g. fraud or intrusion detection.
- Useful for incremental training on massive data.
- The overall algorithm consists of solving 5 to 7 LPs.
- Kernel data dependence reduced by up to 98.8% relative to a standard SVM.
- Testing time reduced by up to 98.2%.
- MKC testing set correctness is comparable to that of the more complex standard SVM.

