Support Vector Machines (S.V.M.): special session
Bernhard Schölkopf (GMD-FIRST) & Stéphane Canu (I.N.S.A. - P.S.I.)
ESANN'99: Special session 7 on Support Vector Machines, Thursday 22nd April 1999
radial SVM
Road map
linear discrimination: the separable case
linear discrimination: the non-separable case
quadratic discrimination
radial SVM
–principle
–3 regularization hyperparameters
–some benchmark results (glass data)
SVM for regression
What's new with SVM
Artificial Neural Networks, Support Vector Machine
From biology to machine learning
–It works! Some reason
–formalization of learning: statistical learning theory, learning from data
From maths to machine learning = minimization
–universality, learn everything: kernel trick
–complexity control, but not anything: margin
minimization + constraints
Functional space: the kernel trick
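The kernel trick replaces an explicit map into a feature space by a kernel evaluated directly in the input space. A minimal sketch, assuming a homogeneous quadratic kernel k(x, y) = (x·y)² on 2-D inputs; the helper names `phi`, `dot` and `quadratic_kernel` are illustrative and do not come from the slides:

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, y):
    """k(x, y) = (x . y)^2, computed without ever building phi(x)."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))     # dot product in the 3-D feature space
implicit = quadratic_kernel(x, y)  # same value from the 2-D input space
print(explicit, implicit)
```

Both numbers agree up to rounding, which is the point: any algorithm that only needs dot products can work in the feature space at the cost of a kernel evaluation.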
Minimization with constraints
L(x, λ): the Lagrangian (Lagrange, 1788)
Minimization with constraints: dual formulation
Phase 1, Phase 2
Linear discrimination: the separable case
wx + b = 0
Classify all examples correctly
Linear discrimination: the separable case
wx + b = 0
Classify all examples correctly, with the largest MARGIN
Linear discrimination: the separable case (figure: data in the plane, axes x and y)
Linear discrimination: the separable case (figure: the line y = wx and its MARGIN, axes x and y)
Linear discrimination: the separable case
wx + b = 0
Classify all examples correctly, with the largest MARGIN
Linear classification: the separable case
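For reference, the separable case can be written out as follows. This is a sketch of the standard hard-margin formulation, not a transcription of the slide: the training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, the multipliers $\alpha_i$, and the identification of $H$ and $c$ with the quantities named on the later slides are assumptions.

```latex
% Primal: the largest margin 2/||w|| subject to classifying every example correctly
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w\cdot x_i + b) \ge 1,\qquad i = 1,\dots,n

% Lagrangian with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\,(w\cdot x_i + b) - 1 \,\bigr]

% Stationarity in w and b
w = \sum_{i=1}^{n} \alpha_i\, y_i\, x_i,
\qquad
\sum_{i=1}^{n} \alpha_i\, y_i = 0

% Dual: a QP in \alpha with H_{ij} = y_i y_j\,(x_i\cdot x_j) and c = (1,\dots,1)^{\top}
\min_{\alpha}\ \tfrac{1}{2}\,\alpha^{\top} H \alpha - c^{\top}\alpha
\quad\text{s.t.}\quad y^{\top}\alpha = 0,\qquad \alpha_i \ge 0
```

Only the examples with $\alpha_i > 0$ (the support vectors) contribute to $w$.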
Equality constraint integration: linear system with block matrix [H y; yᵀ 0] and right-hand side [c; 0]
Inequality constraint integration (QP)
While (α, μ) do not verify the optimality conditions:
–α = M⁻¹ b and μ = −Hα + c + λy
–if α < 0, a constraint is blocked (α_i = 0) (an active variable is eliminated)
–else if μ < 0, a constraint is relaxed
Linear classification: the non-separable case
Error variables
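The error variables measure by how much each example violates the margin. A minimal sketch, assuming the standard soft-margin objective ½‖w‖² + C Σ ξ_i with ξ_i = max(0, 1 − y_i (w·x_i + b)); the function names and the toy data are illustrative and not taken from the slides:

```python
def slacks(w, b, X, y):
    """Error variable xi_i = max(0, 1 - y_i (w . x_i + b)) for each example."""
    out = []
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        out.append(max(0.0, 1.0 - yi * score))
    return out

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of the error variables."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))

# Toy data: two examples outside the margin, one inside it, one misclassified.
X = [(2.0, 0.0), (0.5, 1.0), (-1.0, 0.0), (0.5, -1.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

xi = slacks(w, b, X, y)
cost = soft_margin_objective(w, b, X, y, C=10.0)
print(xi, cost)
```

An error variable is 0 for examples on or beyond the margin, lies between 0 and 1 for examples inside the margin but on the correct side, and exceeds 1 for misclassified examples.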
quadratic SVM
Polynomial classification
Rank(H) = 5: regularization needed
Gaussian kernel based S.V.M.
2D example
Class 1: mixture of 2 Gaussians
Class 2: Gaussian
(figure: training set; output of the SVM for the test set, with margin and support vectors)
3 regularization parameters
–C: the upper bound
–σ: the kernel bandwidth, K_σ(x, y)
–δ: the linear system regularization, Hα = b => (H + δI)α = b
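The effect of the bandwidth and of the linear system regularization can be seen on a tiny Gram matrix. A minimal sketch, assuming the common Gaussian kernel form exp(−‖x − y‖² / (2σ²)); the exact kernel normalization used in the talk is not recoverable from the slide, and the names `gaussian_kernel`, `gram_matrix`, `add_ridge` and `ridge` are illustrative:

```python
import math

def gaussian_kernel(x, y, sigma):
    """Assumed form: exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, sigma):
    """Kernel matrix K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]

def add_ridge(H, ridge):
    """Return H + ridge * I, the regularized linear system matrix."""
    n = len(H)
    return [[H[i][j] + (ridge if i == j else 0.0) for j in range(n)]
            for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K_small = gram_matrix(points, sigma=0.1)   # close to the identity matrix
K_large = gram_matrix(points, sigma=10.0)  # close to the all-ones matrix
H_reg = add_ridge(K_large, ridge=1e-3)     # only the diagonal is shifted
```

With a small bandwidth every point resembles only itself, so the matrix is nearly the identity; with a large bandwidth all entries approach 1 and the matrix becomes nearly rank one, which is the situation where adding a small multiple of the identity helps the linear solve.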
Small bandwidth and large C
Large bandwidth and large C
Large bandwidth and small C
SVM for regression
Example...
small and also
Geostatistics
Another way to see things (Girosi, 97)
SVM history and trends
The pioneers
–Vapnik, V.; Lerner, A.: statistical learning theory
–Mangasarian, O., 1965, 1968: optimization
–Kimeldorf, G.; Wahba, G., 1971: non-parametric regression, splines
The 2nd start: ANN, learning & computers...
–Boser, B.; Guyon, I.; Vapnik, V.
–Bennett, K.; Mangasarian, O.
Trends...
–Learning theory: Cortes, C.; soft margin classifier, effective VC-dimensions, other formalisms,...
–Applications: on-line handwritten character recognition, face recognition, text mining...
–Optimization: Vapnik; Osuna, E. & Girosi; John C. Platt; Linda Kaufman; Thorsten Joachims
Optimization issues
QP with constraints
–box constraints
–H is positive semidefinite (beware of commercial solvers)
Size of H! But a lot of the α are 0 or C
–active constraint set, starting with α = 0
–do not compute (store) the whole H
–chunking
Multiclass issue!
Optimization issues
Solve the whole problem
–commercial: LOQO (primal-dual approach), MINOS, Matlab!!!
–Vapnik: Moré and Toraldo (1991)
Decompose the problem
–chunking (Vapnik, 82, 92), Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98)
–Sequential Minimal Optimization (SMO), John C. Platt, 98
No H: start from 0, active set technique (Linda Kaufman, 98)
Minimize the cost function
–2nd order: Newton
–conjugate gradient, projected conjugate gradient (PCG), Burges, 98
Select the relevant constraints
Interior point methods: Moré, 91; Z. Dostal, 97; and others...
Some benchmark considerations (Platt 98)
Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
Using two-variable QP subproblems (SMO) does not require a QP library
SMO trades off QP time for kernel evaluation time
Optimizations can dramatically reduce kernel time
–linear SVMs (useful for text categorization)
–sparse dot products
–kernel caching (good for smaller problems, Thorsten Joachims, 98)
SMO can be much faster than other techniques for some problems
What about active set and interior point techniques?
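The reason a two-variable subproblem needs no QP library is that it has a closed-form solution. A minimal sketch of a single analytic pair update in the style of SMO, assuming the usual dual with constraints 0 ≤ α_i ≤ C and Σ α_i y_i = 0; the pair-selection heuristics, the bias update and the kernel cache of a full solver are left out, and all function names are illustrative:

```python
def dot(u, v):
    """Linear kernel."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y, kernel=dot):
    """W(alpha) = sum(alpha) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def smo_step(alpha, X, y, C, i, j, kernel=dot):
    """Return a copy of alpha after optimizing the pair (i, j) analytically."""
    n = len(X)

    def output(k):
        # Decision value without the bias b; b cancels in E_i - E_j.
        return sum(alpha[m] * y[m] * kernel(X[m], X[k]) for m in range(n))

    e_i, e_j = output(i) - y[i], output(j) - y[j]
    eta = kernel(X[i], X[i]) + kernel(X[j], X[j]) - 2.0 * kernel(X[i], X[j])
    if eta <= 0.0:
        return list(alpha)  # degenerate pair, skipped in this sketch
    # Box bounds that keep both multipliers in [0, C] on the constraint line.
    if y[i] != y[j]:
        low, high = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        low, high = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    a_j = min(high, max(low, alpha[j] + y[j] * (e_i - e_j) / eta))
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)  # keeps sum(alpha * y) fixed
    new_alpha = list(alpha)
    new_alpha[i], new_alpha[j] = a_i, a_j
    return new_alpha

X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
alpha0 = [0.0, 0.0]
alpha1 = smo_step(alpha0, X, y, C=10.0, i=0, j=1)
print(alpha1, dual_objective(alpha1, X, y))
```

On this two-point problem one step already reaches the optimum; on real data the step is repeated over many pairs, and almost all of the time goes into kernel evaluations, which is the trade-off noted above.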
Open issues
VC entropy for margin classifiers: learning bounds
Other margin classifiers: boosting
Non "L2" (quadratic) cost function: sparse coding (Drezet & Harrison)
Curse of dimensionality: local vs global kernel influence (Tsuda)
Applications:
–classification (Weston & Watkins)
–…to regression (Pontil & al.)
–face detection (Fernandez & Viennet)
Algorithms (Cristianini & Campbell)
Making bridges, other formalisms:
–Bayesian (Kwok)
–statistical mechanics (Buhot & Gordon)
–logic (Sebag), …
Books in Support Vector Research
V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
V. Vapnik, Statistical Learning Theory. Wiley.
SVM introductory chapter in: S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley.
C.J.C. Burges, A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
B. Schölkopf, Support Vector Learning. PhD Thesis. Published by: R. Oldenbourg Verlag, Munich.
A. J. Smola, Learning with Kernels. PhD Thesis. Published by: GMD, Birlinghoven, 1999.
NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
NIPS'98 workshop book on large margin classifiers… is coming.
Events in Support Vector Research
ACAI'99 workshop: Support Vector Machine Theory and Applications
Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
Conclusion
SVMs select relevant patterns in a robust way
svm.cs.rhbnc.ac.uk
Matlab code available on request
Multi-class problems
Small error