MMLD1 Support Vector Machines: Hype or Hallelujah? Kristin Bennett Math Sciences Dept Rensselaer Polytechnic Inst.


2 MMLD1 Support Vector Machines: Hype or Hallelujah? Kristin Bennett Math Sciences Dept Rensselaer Polytechnic Inst. http://www.rpi.edu/~bennek

3 MMLD2 Outline
- Support Vector Machines for Classification
  - Linear Discrimination via geometry
  - Nonlinear Discrimination
- Nitty Gritty Details
- Results from Cortes and Vapnik
- Hallelujah
- Hype

4 MMLD3 Binary Classification
- Example: Medical Diagnosis. Is it benign or malignant?

5 MMLD4 Linear Classification Model
- Given training data
- Linear model: find
- Such that
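The formulas on this slide were images and did not survive in the transcript. A sketch of one common way to write the model follows. The symbols $x_i$, $y_i$, $w$, $b$, $m$, $n$, the label set $\{-1,+1\}$, and the $+b$ sign convention are assumptions, not taken from the slides, which may use the opposite sign for $b$.

```latex
\begin{align*}
&\text{Training data: } (x_1, y_1), \dots, (x_m, y_m), \quad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\} \\
&\text{Linear model: find } w \in \mathbb{R}^n,\; b \in \mathbb{R} \\
&\text{such that } \operatorname{sign}(w \cdot x_i + b) = y_i \quad \text{for } i = 1, \dots, m
\end{align*}
```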

6 MMLD5 Best Linear Separator?

7 MMLD6 Best Linear Separator?

8 MMLD7 Best Linear Separator?

9 MMLD8 Best Linear Separator?

10 MMLD9 Best Linear Separator?

11 MMLD10 Find Closest Points in Convex Hulls (figure: closest points c and d)

12 MMLD11 Plane Bisects Closest Points (figure: plane bisecting the segment between c and d)

13 MMLD12 Find closest points using a quadratic program. Many existing and new solvers are available.
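The closest-points idea from the preceding slides can be sketched in a few lines of plain Python. This is a minimal illustration, not the solver used in the talk: it swaps in a Frank-Wolfe (Gilbert-style) iteration for a general QP solver, and the function names `closest_points` and `bisecting_plane`, the toy point sets, and the `w . x + b = 0` plane convention are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_points(A, B, iters=1000, tol=1e-12):
    """Approximate the closest points c in conv(A) and d in conv(B)."""
    c, d = list(A[0]), list(B[0])               # start at one vertex of each hull
    for _ in range(iters):
        r = [ci - di for ci, di in zip(c, d)]   # current difference c - d
        # Vertices that most decrease ||c - d||^2 to first order.
        a = min(A, key=lambda p: dot(p, r))
        b = max(B, key=lambda p: dot(p, r))
        r_new = [ai - bi for ai, bi in zip(a, b)]
        step = [ri - rn for ri, rn in zip(r, r_new)]
        gap = dot(r, step)                      # Frank-Wolfe optimality gap
        denom = dot(step, step)
        if gap <= tol or denom == 0.0:
            break
        t = min(1.0, gap / denom)               # exact line search on [0, 1]
        c = [(1 - t) * ci + t * ai for ci, ai in zip(c, a)]
        d = [(1 - t) * di + t * bi for di, bi in zip(d, b)]
    return c, d

def bisecting_plane(c, d):
    """Plane w . x + b = 0 through the midpoint of c and d, normal to c - d."""
    w = [ci - di for ci, di in zip(c, d)]
    mid = [(ci + di) / 2 for ci, di in zip(c, d)]
    return w, -dot(w, mid)

# Toy data (hypothetical): two small, linearly separable point sets.
A = [(0.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
B = [(2.0, 1.0), (3.0, 0.0), (3.0, 2.0)]
c, d = closest_points(A, B)
w, b = bisecting_plane(c, d)
```

With this convention, points of `A` give `w . x + b > 0` and points of `B` give `w . x + b < 0`. Convergence of this iteration is only sublinear in general, so a dedicated QP solver is the better choice beyond toy problems.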

14 MMLD13 Best Linear Separator: Supporting Plane Method. Maximize the distance between two parallel supporting planes. Distance = “Margin” = 2/||w||

15 MMLD14 Maximize margin using quadratic program
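The quadratic program itself is not preserved in the transcript. A sketch of the standard hard-margin form is given below, assuming labels $y_i \in \{-1,+1\}$, a normal vector $w$, an offset $b$ with a $+b$ sign convention, and supporting planes normalized to $\pm 1$. These symbols are assumptions; the original slide may use a different sign for $b$.

```latex
\begin{align*}
&\text{Supporting planes: } w \cdot x + b = +1 \quad \text{and} \quad w \cdot x + b = -1,
\qquad \text{margin} = \frac{2}{\lVert w \rVert_2} \\[4pt]
&\min_{w,\,b} \; \tfrac{1}{2} \lVert w \rVert_2^2
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, m
\end{align*}
```

Minimizing $\tfrac{1}{2}\lVert w \rVert_2^2$ is equivalent to maximizing the margin $2 / \lVert w \rVert_2$, which is why the problem becomes a quadratic program with linear constraints.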

16 MMLD15 Dual of Closest Points Method is Support Plane Method Solution only depends on support vectors:

17 MMLD16 Support Vector Machines (SVM)
Key Ideas:
- “Maximize Margins”
- “Do the Dual”
- “Construct Kernels”
A methodology for inference based on Vapnik’s Statistical Learning Theory.

18 MMLD17 Statistical Learning Theory
- Misclassification error and the function complexity bound generalization error.
- Maximizing margins minimizes complexity.
- “Eliminates” overfitting.
- Solution depends only on Support Vectors, not number of attributes.

19 MMLD18 Margins and Complexity Skinny margin is more flexible thus more complex.

20 MMLD19 Margins and Complexity Fat margin is less complex.

21 MMLD20 Linearly Inseparable Case Convex Hulls Intersect! Same argument won’t work.

22 MMLD21 Reduced Convex Hulls Don’t Intersect Reduce by adding upper bound D

23 MMLD22 Find Closest Points Then Bisect No change except for D. D determines number of Support Vectors.
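A sketch of what "adding upper bound D" can look like in the closest-points problem follows. The multiplier symbol $\alpha_i$ and the exact form of the constraints are assumptions, since the slide's formula is not preserved.

```latex
\begin{align*}
\min_{\alpha} \quad & \tfrac{1}{2} \lVert c - d \rVert_2^2 \\
\text{s.t.} \quad & c = \sum_{i:\, y_i = +1} \alpha_i x_i, \qquad d = \sum_{i:\, y_i = -1} \alpha_i x_i, \\
& \sum_{i:\, y_i = +1} \alpha_i = 1, \qquad \sum_{i:\, y_i = -1} \alpha_i = 1, \\
& 0 \le \alpha_i \le D, \quad i = 1, \dots, m
\end{align*}
```

With $D = 1$ the bound is inactive and each hull is the full convex hull. With $D < 1$ no single point can carry all the weight, so each hull shrinks toward the interior of its class. For the problem to stay feasible, $D$ must be at least one over the size of the smaller class.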

24 MMLD23 Linearly Inseparable Case: Supporting Plane Method Just add non-negative error vector z.
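A sketch of the supporting-plane problem with the error vector $z$ added is shown below. The trade-off parameter $C$ appears later in the talk; the $+b$ sign convention and the 1-norm penalty on $z$ are assumptions, as the slide's formula is not preserved.

```latex
\begin{align*}
\min_{w,\,b,\,z} \quad & \tfrac{1}{2} \lVert w \rVert_2^2 + C \sum_{i=1}^{m} z_i \\
\text{s.t.} \quad & y_i \,(w \cdot x_i + b) + z_i \ge 1, \qquad z_i \ge 0, \quad i = 1, \dots, m
\end{align*}
```

A point with $z_i = 0$ lies on or outside its supporting plane; $z_i > 0$ measures how far it falls on the wrong side.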

25 MMLD24 Closest Points equivalent to Support Plane Method Solution only depends on support vectors:

26 MMLD25 Nonlinear Classification

27 MMLD26 Nonlinear Classification: Map to higher dimensional space. IDEA: Map each point to a higher dimensional feature space and construct a linear discriminant in that space. Dual SVM becomes:

28 MMLD27 Generalized Inner Product By Hilbert-Schmidt Kernels (Courant and Hilbert 1953) for certain Φ and K, e.g.
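The claim that a kernel computes an inner product in a feature space can be checked numerically on a small case. The sketch below uses a degree 2 polynomial kernel on 2-D inputs together with one explicit feature map for it; the names `phi`, `poly2_kernel`, and `gaussian_kernel`, and the Gaussian parameterization with `sigma`, are illustrative assumptions rather than the slide's own examples.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the homogeneous degree-2 kernel on 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(u, v):
    """K(u, v) = (u . v)^2, computed without ever forming phi."""
    return dot(u, v) ** 2

def gaussian_kernel(u, v, sigma=1.0):
    """K(u, v) = exp(-||u - v||^2 / (2 sigma^2)); its feature space is infinite dimensional."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

u, v = (1.0, 2.0), (3.0, 4.0)
lhs = poly2_kernel(u, v)      # kernel evaluated in the input space
rhs = dot(phi(u), phi(v))     # inner product in the feature space
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is the point of the kernel substitution: the dual only needs inner products, so the map itself never has to be evaluated.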

29 MMLD28 Final Classification via Kernels The Dual SVM becomes:
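The kernelized dual is not preserved in the transcript. A sketch in minimization form follows, assuming multipliers $\alpha_i$, the upper bound $C$, and a $+b$ sign convention in the decision function; the slide's own notation may differ.

```latex
\begin{align*}
\min_{\alpha} \quad & \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{i=1}^{m} \alpha_i \\
\text{s.t.} \quad & \sum_{i=1}^{m} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, m \\[4pt]
f(x) = \; & \operatorname{sign}\Big( \sum_{i=1}^{m} y_i \alpha_i K(x_i, x) + b \Big)
\end{align*}
```

Only points with $\alpha_i > 0$ contribute to $f(x)$; these are the support vectors.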

31 MMLD30 Final SVM Algorithm
- Solve Dual SVM QP
- Recover primal variable b
- Classify new x
Solution only depends on support vectors:
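The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the algorithm from the talk: the dual QP is solved with simplified SMO-style pairwise updates (swapped in for a general QP solver, and not guaranteed to converge on every dataset), the decision function uses a `+ b` convention, and all function names and the toy data are hypothetical.

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_dual(X, y, kernel, C=10.0, tol=1e-8, max_sweeps=1000):
    """Step 1: approximately solve the dual QP by pairwise (SMO-style) updates."""
    n = len(X)                                   # assumes n >= 2
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def err(i):                                  # current output minus label
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            Ei = err(i)
            violates = ((y[i] * Ei < -tol and alpha[i] < C) or
                        (y[i] * Ei > tol and alpha[i] > 0))
            if not violates:
                continue
            # Partner with the largest error difference (a simple heuristic).
            j = max((k for k in range(n) if k != i), key=lambda k: abs(Ei - err(k)))
            Ej = err(j)
            ai, aj = alpha[i], alpha[j]
            if y[i] != y[j]:
                lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
            else:
                lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
            eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if lo >= hi or eta >= 0.0:
                continue
            new_aj = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
            if abs(new_aj - aj) < 1e-12:
                continue
            new_ai = ai + y[i] * y[j] * (aj - new_aj)   # keeps sum(alpha * y) fixed
            b1 = b - Ei - y[i] * (new_ai - ai) * K[i][i] - y[j] * (new_aj - aj) * K[i][j]
            b2 = b - Ej - y[i] * (new_ai - ai) * K[i][j] - y[j] * (new_aj - aj) * K[j][j]
            b = b1 if 0 < new_ai < C else b2 if 0 < new_aj < C else (b1 + b2) / 2.0
            alpha[i], alpha[j] = new_ai, new_aj
            changed = True
        if not changed:
            break
    return alpha, b, K

def recover_b(alpha, y, K, C, fallback):
    """Step 2: average y_i - sum_k alpha_k y_k K(x_k, x_i) over free support vectors."""
    free = [i for i, a in enumerate(alpha) if 1e-8 < a < C - 1e-8]
    vals = [y[i] - sum(alpha[k] * y[k] * K[k][i] for k in range(len(y))) for i in free]
    return sum(vals) / len(vals) if vals else fallback

def classify(x, X, y, alpha, b, kernel):
    """Step 3: sign of the kernel expansion; only alpha_i > 0 contribute."""
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 1e-8) + b
    return 1 if s >= 0 else -1

# Toy data (hypothetical): two classes on the diagonal of the plane.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]
alpha, b_track, K = solve_dual(X, y, linear_kernel, C=10.0)
b = recover_b(alpha, y, K, 10.0, b_track)
label = classify((3.0, 1.0), X, y, alpha, b, linear_kernel)
```

On this toy set only the two points nearest the boundary end up with nonzero multipliers, which illustrates the slide's remark that the solution depends only on the support vectors. Passing a different `kernel` function changes the classifier without touching the solver.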

32 MMLD31 Support Vector Machines (SVM)
- Key Formulation Ideas:
  - “Maximize Margins”
  - “Do the Dual”
  - “Construct Kernels”
- Generalization Error Bounds
- Practical Algorithms

33 MMLD32 Nitty Gritty
- Need Dual of

34 MMLD33 Wolfe Dual Problem with Inequalities
- Primal
- Dual

35 MMLD34 Lagrangian Function
- Primal
- Lagrangian

36 MMLD35 Wolfe Dual Eliminate

37 MMLD36 Wolfe Dual Use grad b to simplify objective

38 MMLD37 Wolfe Dual Eliminate w

39 MMLD38 Wolfe Dual Simplify inner products

40 MMLD39 Final Wolfe Dual Usually convert to minimization at this point
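The formulas for the derivation on the preceding slides are not preserved. A sketch of the same sequence of steps is given below for the soft-margin primal, assuming the $+b$ sign convention, error variables $z_i$, and multipliers $\alpha_i$ and $r_i$; all of these symbols are assumptions and may differ from the slides.

```latex
\begin{align*}
\text{Primal:} \quad & \min_{w,\,b,\,z} \; \tfrac{1}{2} \lVert w \rVert_2^2 + C \sum_i z_i
\quad \text{s.t.} \quad y_i (w \cdot x_i + b) + z_i \ge 1, \;\; z_i \ge 0 \\[4pt]
\text{Lagrangian:} \quad & L = \tfrac{1}{2} \lVert w \rVert_2^2 + C \sum_i z_i
- \sum_i \alpha_i \big[ y_i (w \cdot x_i + b) + z_i - 1 \big] - \sum_i r_i z_i,
\qquad \alpha_i \ge 0, \; r_i \ge 0 \\[4pt]
\nabla_{z} L = 0: \quad & C - \alpha_i - r_i = 0 \;\;\Rightarrow\;\; 0 \le \alpha_i \le C
\quad \text{(all terms in } z \text{ cancel)} \\
\nabla_{b} L = 0: \quad & \sum_i y_i \alpha_i = 0
\quad \text{(the term in } b \text{ vanishes)} \\
\nabla_{w} L = 0: \quad & w = \sum_i y_i \alpha_i x_i \\[4pt]
\text{Wolfe dual:} \quad & \max_{\alpha} \; \sum_i \alpha_i
- \tfrac{1}{2} \sum_i \sum_j y_i y_j \alpha_i \alpha_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_i y_i \alpha_i = 0, \;\; 0 \le \alpha_i \le C
\end{align*}
```

Negating the objective turns this maximization into the minimization form that QP solvers usually expect, and replacing $x_i \cdot x_j$ with $K(x_i, x_j)$ gives the kernel version.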

41 MMLD40 Cortes and Vapnik, Figure 1: degree 2 polynomials. Support vectors = circles, errors = crosses.

42 MMLD41 Fig 6: US postal data, 7.3K train, 2K test (16 by 16 resolution)

43 MMLD42 Results on US postal service: Gaussian Kernel

44 MMLD43 Errors on US postal data

45 MMLD44 NIST data: 60K train, 10K test, 28×28 resolution, degree 4 polynomial. Misclassified examples below; 1 false negatives, others false positives.

46 MMLD45 NIST results

47 MMLD46 Hallelujah!
- Generalization theory and practice meet
- General methodology for many types of problems
- Same Program + New Kernel = New method
- No problems with local minima
- Few model parameters. Selects capacity.
- Robust optimization methods.
- Successful Applications
BUT…

48 MMLD47 HYPE?
- Will SVMs beat my best hand-tuned method Z for X?
- Do SVMs scale to massive datasets?
- How to choose C and Kernel?
- What is the effect of attribute scaling?
- How to handle categorical variables?
- How to incorporate domain knowledge?
- How to interpret results?

49 MMLD48 Support Vector Machine Resources
- http://www.support-vector.net/
- http://www.kernel-machines.org/
- Links off my web page: http://www.rpi.edu/~bennek

