Machine Learning, Week 4, Lecture 2
Hand-In Data. It is online. Only around 6000 images! The deadline is one week from now. Next Thursday's lecture will be only one hour and only about the hand-in. If you nailed it, you can stay home.
Support Vector Machines: Last Time and Today
Functional Margins. For each point we define a functional margin, and we define the functional margin of the hyperplane, i.e. of the parameters w, b, as the smallest of these: a kind of "functional distance".
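In the usual notation, with labels y_i in {-1, +1}, these are:

```latex
\hat{\gamma}_i = y_i\,(w^T x_i + b), \qquad \hat{\gamma} = \min_i \hat{\gamma}_i
```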
Geometric Margin. How far is x_i from the hyperplane? Equivalently, how long is the segment from x_i to L, its projection onto the hyperplane? Use the definition of L, use the fact that L lies on the hyperplane, multiply in, and solve; the steps are written out below.
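The derivation, with L the projection of x_i onto the hyperplane and gamma_i the geometric margin (multiplying by y_i makes it positive for correctly classified points):

```latex
L = x_i - \gamma_i \frac{w}{\|w\|}, \qquad w^T L + b = 0
\;\Rightarrow\; w^T x_i - \gamma_i \frac{w^T w}{\|w\|} + b = 0
\;\Rightarrow\; \gamma_i = \frac{y_i\,(w^T x_i + b)}{\|w\|}
```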
Functional and Geometric Margins. They are related by a factor of ||w||: the geometric margin is the functional margin divided by ||w||.
Optimizing Margins. Maximize the geometric margin subject to every point having margin at least that large; to make the problem well posed we also add a scale constraint, since the functional margin can be rescaled freely.
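In standard form, the problem being described is:

```latex
\max_{w,b,\gamma}\; \gamma
\quad \text{s.t.} \quad \frac{y_i\,(w^T x_i + b)}{\|w\|} \ge \gamma \ \ \forall i
```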
Optimization. Fix the scale so that the functional margin is 1 (a point with functional margin exactly 1 sits right on the margin). The problem then becomes a convex quadratic program in w and b.
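With the functional margin fixed to 1, maximizing 1/||w|| is equivalent to the convex quadratic program:

```latex
\min_{w,b}\; \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1 \ \ \forall i
```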
Linearly Separable SVM. This constrained minimization problem defines the linearly separable SVM. To understand the SVM we need to study the theory of Lagrange multipliers.
Lagrange Multipliers. Define the Lagrangian of a constrained problem with objective f, inequality constraints g_i ≤ 0, and equality constraints h_i = 0. We only consider convex f and g_i and affine h_i (the method is more general). α and β are called Lagrange multipliers.
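For the problem min f(x) subject to g_i(x) ≤ 0 and h_i(x) = 0, the Lagrangian is:

```latex
\mathcal{L}(x,\alpha,\beta) = f(x) + \sum_i \alpha_i\, g_i(x) + \sum_i \beta_i\, h_i(x)
```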
Primal Problem. Maximizing the Lagrangian over the multipliers and then minimizing over x recovers the original constrained problem, which is what we are looking for! We denote the solution x* and the optimal value p*.
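The primal problem built from the Lagrangian:

```latex
\theta_P(x) = \max_{\alpha \ge 0,\, \beta} \mathcal{L}(x,\alpha,\beta) =
\begin{cases} f(x) & x \text{ feasible} \\ \infty & \text{otherwise} \end{cases},
\qquad p^* = \min_x \theta_P(x)
```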
Dual Problem. α, β are dual feasible if α_i ≥ 0 for all i. For dual feasible α, β, minimizing the Lagrangian over x gives a lower bound on the primal optimum.
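The dual problem, and the weak-duality bound it gives:

```latex
\theta_D(\alpha,\beta) = \min_x \mathcal{L}(x,\alpha,\beta), \qquad
d^* = \max_{\alpha \ge 0,\, \beta} \theta_D(\alpha,\beta) \;\le\; p^*
```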
Weak and Strong Duality. Weak duality, d* ≤ p*, always holds. Question: when are they equal? Under technical conditions (which we assume here), strong duality holds: d* = p*.
Complementary Slackness. Let x* be primal optimal and α*, β* dual optimal with p* = d*. The Lagrangian at the optimum is squeezed between p* and p*, so the sum of the terms α_i* g_i(x*) must be zero; since every term is non-positive, each one must be zero for all i. This is complementary slackness.
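The squeeze argument written out:

```latex
p^* = f(x^*) \;\ge\; \mathcal{L}(x^*,\alpha^*,\beta^*) \;\ge\; \theta_D(\alpha^*,\beta^*) = d^* = p^*
\;\Rightarrow\; \sum_i \alpha_i^*\, g_i(x^*) = 0
\;\Rightarrow\; \alpha_i^*\, g_i(x^*) = 0 \ \ \forall i
```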
Karush-Kuhn-Tucker (KKT) Conditions. Let x* be primal optimal and α*, β* dual optimal with p* = d*. Then:
stationarity: the gradient of the Lagrangian in x vanishes at x*;
primal feasibility: g_i(x*) ≤ 0 and h_i(x*) = 0 for all i;
dual feasibility: α_i* ≥ 0 for all i;
complementary slackness: α_i* g_i(x*) = 0 for all i.
In our convex setting the KKT conditions are necessary and sufficient for optimality.
Finally Back to the SVM. Apply this machinery to the separable SVM problem: minimize ½||w||² subject to y_i(w^T x_i + b) ≥ 1. Define the Lagrangian (no β required, since there are no equality constraints).
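With constraints written as g_i(w, b) = 1 - y_i(w^T x_i + b) ≤ 0, the Lagrangian is:

```latex
\mathcal{L}(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i\,(w^T x_i + b) - 1 \right]
```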
SVM Summary. The dual is a quadratic program in α alone; the training points with α_i > 0 are the support vectors, and w is a combination of them.
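Setting the gradients in w and b to zero and substituting back gives the dual; the points with nonzero α_i are the support vectors:

```latex
\max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^T x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \ \ \sum_i \alpha_i y_i = 0,
\qquad w = \sum_i \alpha_i y_i x_i
```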
SVM Generalization. Theoretically speaking, why bother finding large-margin hyperplanes? The VC dimension of hyperplanes is the number of parameters, but there are other, margin-based bounds as well: a rich theory.
Kernel Support Vector Machines
Kernels. Apply a nonlinear feature transform Φ, define the kernel K(x, z) = Φ(x)^T Φ(z), and replace every inner product in the dual by K. The two optimization problems are identical! The kernel is an inner product in another space.
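Replacing every inner product with K gives the kernelized dual and classifier:

```latex
\max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j),
\qquad
f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i\, K(x_i, x) + b \Big)
```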
Kernels. K is an inner product in Φ-space!
Polynomial Kernel. The degree-d polynomial kernel corresponds to a feature space with on the order of n^d dimensions! Computing the feature transform explicitly would take on the order of n^d time, while computing the kernel takes O(n) time.
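A minimal numerical check of this claim, using the homogeneous degree-2 polynomial kernel (NumPy assumed; illustrative, not the lecture's code):

```python
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel (x^T z)^d: O(n) per evaluation."""
    return (x @ z) ** d

def phi_deg2(x):
    """Explicit degree-2 monomial feature map: O(n^2) features x_i * x_j."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)

# Same number, very different cost as n and d grow.
print(poly_kernel(x, z))            # kernel trick, O(n)
print(phi_deg2(x) @ phi_deg2(z))    # explicit features, O(n^2)
```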
Gaussian Kernel. K(x, z) = exp(-||x - z||² / (2σ²)). Think of this as a similarity measure: it is essentially 0 if x and z are not close.
Gaussian Kernel as a Nonlinear Transform. Simplest case: x and z are 1D, i.e. plain numbers. Expanding the exponential shows that the kernel is an inner product between infinitely long feature-mapped versions of x and z.
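Writing out the 1D case with σ = 1, the standard expansion is:

```latex
e^{-\frac{1}{2}(x-z)^2}
= e^{-\frac{x^2}{2}}\, e^{-\frac{z^2}{2}}\, e^{xz}
= \sum_{k=0}^{\infty} \Big( e^{-\frac{x^2}{2}} \tfrac{x^k}{\sqrt{k!}} \Big)\Big( e^{-\frac{z^2}{2}} \tfrac{z^k}{\sqrt{k!}} \Big)
= \Phi(x)^T \Phi(z)
```

so the implicit feature map Φ has infinitely many coordinates.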
Let's Apply It
Kernel Matrix. Given points x_1, …, x_m and a kernel K(x, z) = Φ(x)^T Φ(z), the kernel matrix (same name) has entries K_ij = K(x_i, x_j). If K is a valid kernel, i.e. K(x, z) = Φ(x)^T Φ(z) for some Φ, then the kernel matrix is symmetric positive semidefinite (x^T K x ≥ 0 for every x). Mercer kernels: positive semidefiniteness is a necessary and sufficient condition.
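A small sketch of this check for the Gaussian kernel (NumPy assumed; illustrative, not the lecture's code):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * sigma**2))

X = np.random.default_rng(0).standard_normal((50, 3))
K = rbf_kernel_matrix(X)

# Mercer check: symmetric, and all eigenvalues >= 0 up to numerical noise.
print(np.allclose(K, K.T))
print(np.linalg.eigvalsh(K).min() >= -1e-10)
```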
Kernels add nonlinearity to our SVM and give efficient computation in high- and even infinite-dimensional spaces. Few support vectors (the points on the margin) help us with generalization (in theory) and with runtime. Kernels are not limited to SVMs: kernel perceptrons, kernel logistic regression, …, any method that depends on the data only through inner products.
SVMs for Non-Separable Data
Violating the Margin. A point on the wrong side of the track overshoots its margin constraint by a slack amount ξ.
If a point is on the wrong side of the margin at distance ξ, we penalize it by Cξ. The hyperparameter C controls the competing goals of a large margin and of points being on the right side of it. How to find C? Validation (model selection). Does this look like regularization to you? The resulting problem is below.
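The soft-margin problem in its standard form:

```latex
\min_{w,b,\xi}\; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \ \ \forall i
```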
Effect of C: the same data fit with C = 1 and with C = 100.
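A minimal sketch of this comparison, assuming scikit-learn and synthetic blob data rather than the lecture's dataset and figures:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C tolerates margin violations (wider margin, more support vectors);
    # large C penalizes them heavily (narrower margin, fewer support vectors).
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```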
Soft-Margin Lagrangian. Minimize the penalized objective subject to the constraints. The primal variables are w, b, and ξ; we introduce Lagrange multipliers for both families of inequality constraints.
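The Lagrangian with multipliers α for the margin constraints and β for ξ_i ≥ 0:

```latex
\mathcal{L}(w,b,\xi,\alpha,\beta) = \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
- \sum_i \alpha_i \left[ y_i\,(w^T x_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i,
\qquad \alpha_i, \beta_i \ge 0
```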
Defining the Problems. As before we have a primal problem (minimize over the primal variables the maximum of the Lagrangian over the multipliers), a dual problem (max and min swapped), a primal optimum, and a dual optimum; we work with the dual.
Find the minimizing w, b, ξ by setting the gradients of the Lagrangian to zero; the derivative with respect to ξ gives a new constraint tying α, β, and C together.
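Setting the gradients to zero:

```latex
\nabla_w \mathcal{L} = 0 \Rightarrow w = \sum_i \alpha_i y_i x_i, \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_i \alpha_i y_i = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - \beta_i = 0
```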
Constraints. Since β_i = C - α_i must be non-negative and α_i ≥ 0, the multipliers satisfy the box constraints 0 ≤ α_i ≤ C, together with Σ_i α_i y_i = 0.
Look familiar? Substituting the gradient conditions back in gives the same dual objective as in the separable case.
Subject to the box constraints, this is again a convex quadratic program. When done optimizing, set β = C - α.
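The resulting dual, identical to before except for the upper bound C on each α_i:

```latex
\max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ \ \sum_i \alpha_i y_i = 0
```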
KKT Complementary Slackness. The optimal solution must satisfy complementary slackness for all the inequality constraints, and we know β_i = C - α_i.
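The two slackness conditions at the optimum:

```latex
\alpha_i^* \left[ y_i\,(w^T x_i + b) - 1 + \xi_i \right] = 0, \qquad
\beta_i^* \xi_i = (C - \alpha_i^*)\, \xi_i = 0 \quad \forall i
```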
Reading off the cases: α_i = 0 means x_i is on the right side of the margin, 0 < α_i < C means x_i sits exactly on the margin, and α_i = C allows x_i to be on the wrong side. Find b*: use a point on the margin. The practical way: average over all margin points.
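For any margin point i (0 < α_i < C, hence ξ_i = 0 and y_i(w^T x_i + b) = 1):

```latex
b^* = y_i - \sum_j \alpha_j y_j\, x_j^T x_i
\qquad \text{(in practice, averaged over all such } i\text{)}
```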
The soft-margin problem and its dual, side by side: minimize the penalized primal subject to its constraints, or maximize the dual objective W(α) subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
Coordinate Ascent. Pick one variable, fix all the others, solve for the picked one, and repeat until done.
Sequential Minimal Optimization (SMO) Algorithm. Plain coordinate ascent cannot work here: because of the constraint Σ_i α_i y_i = 0 we cannot change only one variable. So take two at a time.
Algorithm Outline. Pick two indexes, fix the non-picked α's, optimize W over the two selected α's subject to the additional constraint, and repeat until done.
Because of the equality constraint, α_1 and α_2 satisfy a linear equation; together with 0 ≤ α_1, α_2 ≤ C this confines the pair to a segment of a line inside the box, with endpoints L and H.
y_1 is either 1 or -1, which determines the slope of that line; we optimize W along the segment, subject to the box and the linear constraint, as written out below.
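With all other α's fixed, the equality constraint pins α_1 and α_2 to a line segment inside the box [0, C]²; in the standard SMO bookkeeping:

```latex
\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i \ge 3} \alpha_i y_i = \zeta \ (\text{constant}),
\qquad \alpha_2 \in [L, H] \subseteq [0, C]
```

where L = max(0, α_2 - α_1), H = min(C, C + α_2 - α_1) if y_1 ≠ y_2, and L = max(0, α_1 + α_2 - C), H = min(C, α_1 + α_2) if y_1 = y_2.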
Split the double sum in W by whether each index is 1, 2, or larger: the terms i = j = 1; i = 1, j = 2; i = 2, j = 1; i = j = 2; i = 1 or 2 paired with j > 2; i > 2 paired with j = 1 or 2; and the rest, which is constant. The point is that, after eliminating α_1 with the linear constraint, W is a second-degree polynomial in α_2.
A second-degree polynomial is something we can maximize in closed form: take the unconstrained maximizer of the quadratic in α_2 and clip it back into the segment [L, H] inside the box.
What remains: how to pick the α's. Pick one that violates the KKT conditions, or use a heuristic; pick another one; and optimize the pair. Stopping criterion: close enough to the KKT conditions, or tired of waiting. A sketch of the whole algorithm follows.
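A compact sketch in the spirit of the simplified SMO algorithm, with a random second index instead of a heuristic (NumPy assumed; illustrative, not the lecture's code):

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=5):
    """Simplified SMO for the linear soft-margin dual; y must be in {-1, +1}."""
    m = X.shape[0]
    K = X @ X.T                      # linear kernel matrix
    alpha, b, passes = np.zeros(m), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            Ei = (alpha * y) @ K[:, i] + b - y[i]          # prediction error on x_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([k for k in range(m) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Endpoints of the feasible segment for alpha_j.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]      # second derivative of W
                if eta >= 0:
                    continue
                # Maximize the 1-D quadratic in alpha_j, then clip into [L, H].
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the threshold b from the two candidate values.
                b1 = b - Ei - y[i]*(alpha[i]-ai_old)*K[i, i] - y[j]*(alpha[j]-aj_old)*K[i, j]
                b2 = b - Ej - y[i]*(alpha[i]-ai_old)*K[i, j] - y[j]*(alpha[j]-aj_old)*K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```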
The End of SVMs. Except you will use them in hand-in 2…