1
Recap
- Finds the boundary with “maximum margin”
- Uses “slack variables” to deal with outliers
- Uses “kernels”, and the “kernel trick”, to solve nonlinear problems.
2
SVM error function = hinge loss + 1/margin
…where the hinge loss is summed over the datapoints, and the second term is proportional to the inverse of the margin, 1/m.
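Written out, the objective takes the standard soft-margin form sketched below (the slide's own equation is an image, so the symbols and the trade-off constant λ are assumptions). Minimising ‖w‖² is what maximises the margin, since the margin is inversely proportional to ‖w‖:

    E(w, b) = \sum_i \max\big(0,\; 1 - y_i(w^\top x_i + b)\big) \;+\; \lambda\,\|w\|^2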
3
Slack variables (aka soft margins): each datapoint is allowed some “slack” to sit inside or beyond the margin, and the error function adds a penalty for using slack.
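In symbols, the usual soft-margin formulation is sketched below (C is the slack-penalty constant and ξ_i the slack for datapoint i; the slides may use different symbol names):

    \min_{w,\, b,\, \xi}\;\; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
    \text{subject to}\;\; y_i(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0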
4
What about much more complicated data?
- project into high dimensional space, and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear
[figure: 2D data mapped into 3D]
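A classic illustration of this idea (an assumed example, not necessarily the dataset pictured on the slide): points separable only by a circle in 2D become linearly separable after the mapping

    \phi(x_1, x_2) = (x_1,\; x_2,\; x_1^2 + x_2^2)

because the circular boundary x_1^2 + x_2^2 = r^2 becomes the flat plane x_3 = r^2 in the new space.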
5
The Kernel Trick
6
Slight rearrangement of our model – still equivalent though.
Remember matrix notation – this is a “dot product”
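The model in question is the linear discriminant; a sketch of the two equivalent ways of writing it (the slide's own equation is an image, so the symbols here are assumed):

    f(x) = \sum_j w_j x_j + b \;=\; w^\top x + b

The w^\top x form is the dot product referred to above.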
7
Project into higher dimensional space… …our new feature space.
[figure: 2D points on axes (x1, x2) mapped to a 3D space with axes (x1, x2, x3)]
BUT WHERE DO WE GET THIS FROM!?
8
The Representer Theorem
(Kimeldorf and Wahba, 1978) For a linear model, the optimal parameter vector is always a linear combination of the training examples…
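In symbols (a sketch; α_i denotes the combination coefficient for training example x_i and N the number of examples, since the slide's own notation is in an image):

    w \;=\; \sum_{i=1}^{N} \alpha_i\, x_i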
9
The Kernel Trick, PART 1 Substitute this into our model….
Or, equivalently, we can write the same thing with our hypothetical high dimensional feature space.
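Substituting w = Σ_i α_i x_i into the linear model gives the first line below; replacing each x with its high-dimensional projection φ(x) gives the second (a sketch using the symbols assumed earlier):

    f(x) = w^\top x + b = \sum_i \alpha_i\, x_i^\top x + b
    f(x) = \sum_i \alpha_i\, \phi(x_i)^\top \phi(x) + b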
10
The Kernel Trick, PART 2
Wouldn’t it be nice if we didn’t have to think up the projection φ at all? And could just skip straight to the scalar value we need directly…? …If we had this, …our model would look like this.
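That function is the kernel; a sketch of the definition and of the resulting model (symbols as before, with K for the kernel function):

    K(x_i, x) \;=\; \phi(x_i)^\top \phi(x)
    f(x) \;=\; \sum_i \alpha_i\, K(x_i, x) + b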
11
Kernels
For example… the polynomial kernel. When d=2, the implicit feature space is shown below. But we never actually calculate it!
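For reference, a sketch of the homogeneous polynomial kernel and its implicit d=2 feature map for two input dimensions (the slide's version may include an added constant term):

    K(x, z) = (x^\top z)^d
    d = 2,\; x = (x_1, x_2):\qquad \phi(x) = (x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2)
    \phi(x)^\top \phi(z) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (x^\top z)^2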
12
- project into high dimensional space, and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear
[figure: 2D data mapped into 3D]
13
Polynomial kernel, with d=2
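A quick numerical check of the d=2 identity above — a minimal sketch, with x and z being arbitrary made-up points rather than anything from the slides:

    import numpy as np

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])

    # Explicit d=2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)
    def phi(v):
        return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

    explicit = phi(x) @ phi(z)   # dot product in the 3D feature space
    via_kernel = (x @ z) ** 2    # polynomial kernel, computed entirely in 2D
    print(explicit, via_kernel)  # same scalar value, 16 (up to floating-point rounding)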
14
The Polynomial Kernel
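The general form usually meant by this name, as a sketch (c ≥ 0 is a constant offset and d the degree; the special case c = 0 is the homogeneous kernel used above):

    K(x, z) = (x^\top z + c)^d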
15
The RBF (Gaussian) Kernel
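A sketch of the usual definition (σ is the width/bandwidth; libraries often parameterise it instead as γ = 1/(2σ²)):

    K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)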
16
Varying two things at once!
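A minimal sketch of this kind of experiment, assuming the two things varied are the slack penalty C and the RBF width; the dataset, the grid of values and the use of scikit-learn are illustrative assumptions, not taken from the slides:

    # Sweep the slack penalty C and the RBF width gamma together,
    # reporting test accuracy for each (C, gamma) pair.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for C in (0.1, 1, 10):            # slack penalty
        for gamma in (0.1, 1, 10):    # RBF width parameter, gamma = 1 / (2 * sigma^2)
            acc = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr).score(X_te, y_te)
            print(f"C={C:<4} gamma={gamma:<4} test accuracy={acc:.3f}")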
17
Summary of things…
18
SVMs versus Neural Networks
SVMs
- Started from solid theory
- Theory led to many extensions (SVMs for text, images, graphs)
- Almost no parameter tuning
- Highly efficient to train
- Single optimum
- Highly resistant to overfitting

Neural Nets
- Started from bio-inspired heuristics
- Ended up at theory equivalent to statistics ideas
- Good performance = lots of parameter tuning
- Computationally intensive to train
- Suffers from local optima
- Prone to overfitting
19
SVMs, done. Tonight… read chapter 4 while it’s still fresh.
Remember, by next week – read chapter 5.
20
Examples of Parameters obeying the Representer Theorem
[figure: a 1D worked example with datapoints at 10 and 5]
21
[figure: the same example, with the point at 10 labelled p (positive) and the point at 5 labelled n (negative)]
22
[figure: the same p/n example, with the points at 10 and 5]
23
We had before, for an x on the boundary, the condition that the model output equals the threshold t.
And we just worked out w as a linear combination of the examples p and n, which gives us the expression for t. The w and t are both linear functions of the examples.
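Spelled out, under the assumption that w is built from the two examples with coefficients α_p and α_n (these symbols are not in the transcript):

    w = \alpha_p\, p + \alpha_n\, n
    t = w^\top x = \alpha_p\,(p^\top x) + \alpha_n\,(n^\top x)

So, for a fixed boundary point x, both w and t are built linearly from the training examples p and n.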