SVMs, cont’d
Intro to Bayesian learning

Quadratic programming
- Problems of the form
  Minimize: $\frac{1}{2} z^T Q z + c^T z$
  Subject to: $A z \le d$
  are called “quadratic programming” (QP) problems.
- There are off-the-shelf methods to solve them.
- Actually solving this is way, way beyond the scope of this class; consider it a black box.
- If a solution exists, it will be found and be unique (strictly speaking, uniqueness requires $Q$ to be positive definite).
- Expensive, but not intractably so.
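As one concrete instance, the hard-margin maximum-margin problem fits this template. The sketch below assumes the usual formulation (minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w^T X_i + b) \ge 1$) and a generic QP template whose symbols $z$, $Q$, $c$, $A$, $d$ are chosen here for illustration; the slides' own notation did not survive extraction.

```latex
\begin{aligned}
\text{Generic QP:}\quad & \min_{z}\ \tfrac{1}{2} z^T Q z + c^T z
  \quad \text{s.t.}\quad A z \le d \\
\text{Hard-margin SVM:}\quad & z = \begin{pmatrix} w \\ b \end{pmatrix},\quad
  Q = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},\quad c = 0 \\
& \text{row } i \text{ of } A:\ -y_i \begin{pmatrix} X_i^T & 1 \end{pmatrix},
  \quad d_i = -1 \\
\Rightarrow\quad & \tfrac{1}{2} z^T Q z = \tfrac{1}{2}\|w\|^2, \qquad
  A z \le d \iff y_i (w^T X_i + b) \ge 1 \ \text{for all } i
\end{aligned}
```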

Nonseparable data
- What if the data isn’t linearly separable?
- Project into a higher-dimensional space (we’ll get there)
- Allow some “slop” in the system
  - Allow margins to be violated “a little”

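The new “slackful” QP
- The $\xi_i$ are “slack variables”: they allow margins to be violated a little.
- Still want to minimize margin violations, so add them to the QP instance:
  Minimize: $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
  Subject to: $y_i (w^T X_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$
- The constant $C > 0$ sets the trade-off between a wide margin and small violations.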
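To make the slack variables concrete, here is a minimal sketch that evaluates the soft-margin objective for a fixed hyperplane. It assumes the standard formulation (objective $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$, with the smallest feasible slack $\xi_i = \max(0, 1 - y_i(w^T X_i + b))$); the function name, data, and parameter values are made up for illustration.

```python
def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def slack_objective(w, b, X, y, C):
    """Return (objective, slacks) for the soft-margin primal at a fixed (w, b).

    For a fixed hyperplane, the smallest slack that satisfies
    y_i (w . x_i + b) >= 1 - xi_i with xi_i >= 0 is the hinge value below.
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    objective = 0.5 * dot(w, w) + C * sum(slacks)
    return objective, slacks


# Hypothetical toy data: the second point sits inside the margin,
# the fourth is on the wrong side of the hyperplane.
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (0.5, 1.0)]
y = [1, 1, -1, -1]
w, b, C = (1.0, 0.0), 0.0, 10.0

objective, slacks = slack_objective(w, b, X, y, C)
print(slacks)
print(objective)
```

A point with zero slack respects the margin, a slack between 0 and 1 means it is inside the margin but still correctly classified, and a slack above 1 means it is misclassified.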

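You promised nonlinearity!
- Where did the nonlinear transform go in all this?
- Another clever trick: with a little algebra (& help from Lagrange multipliers), can rewrite our QP in the form:
  Maximize: $\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j X_i^T X_j$
  Subject to: $0 \le \alpha_i \le C$ for all $i$, and $\sum_i \alpha_i y_i = 0$
- Here the $\alpha_i$ are the Lagrange multipliers (one per training point) and $C$ is the weight on the slack variables.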
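The "little algebra" can be sketched as follows, assuming the standard soft-margin primal (minimize $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i(w^T X_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$). The multipliers $\mu_i$ for the constraints $\xi_i \ge 0$ are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
L(w, b, \xi, \alpha, \mu) &= \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w^T X_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i, \qquad \alpha_i \ge 0,\ \mu_i \ge 0 \\
\frac{\partial L}{\partial w} = 0 &\ \Rightarrow\ w = \sum_i \alpha_i y_i X_i \\
\frac{\partial L}{\partial b} = 0 &\ \Rightarrow\ \sum_i \alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} = 0 &\ \Rightarrow\ C - \alpha_i - \mu_i = 0
  \ \Rightarrow\ 0 \le \alpha_i \le C \\
\text{Substituting back:}\quad
L &= \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j X_i^T X_j
\end{aligned}
```

After substitution, $w$, $b$, and the $\xi_i$ all drop out, and the data enter only through the dot products $X_i^T X_j$.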

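Kernel functions
- So??? It’s still the same linear system.
- Note, though, that $X$ appears in the system only as a dot product: $X_i^T X_j$
- Can replace each $X_i$ with its projection $\Phi(X_i)$, so the dot product becomes: $\Phi(X_i)^T \Phi(X_j)$
- The inner product $K(X_i, X_j) = \Phi(X_i)^T \Phi(X_j)$ is called a “kernel function”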

Why are kernel fns cool?
- The cool trick is that many useful projections can be written as kernel functions in closed form.
- I.e., can work with $K()$ rather than $\Phi()$.
- If you know $K(X_i, X_j)$ for every $(i,j)$ pair, then you can construct the maximum margin hyperplane between the projected data without ever explicitly doing the projection!
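A small numerical check of this claim, as a sketch: for 2-D inputs, the homogeneous degree-2 polynomial kernel $(u^T v)^2$ equals the dot product of the explicit projection $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names and the two sample points are illustrative assumptions.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit projection of a 2-D point into 3-D for the degree-2 kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(u, v):
    """Homogeneous degree-2 polynomial kernel, computed in the original space."""
    return dot(u, v) ** 2


a, b = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(a), phi(b))   # project first, then take the dot product
implicit = poly2_kernel(a, b)    # never leaves the 2-D space
print(explicit, implicit)
```

The two values agree up to floating-point rounding, but the kernel version never builds the projected vectors, which matters when the projected space is very high dimensional.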

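Example kernels
- Homogeneous degree-$k$ polynomial: $K(X_i, X_j) = (X_i^T X_j)^k$
- Inhomogeneous degree-$k$ polynomial: $K(X_i, X_j) = (X_i^T X_j + 1)^k$
- Gaussian radial basis function: $K(X_i, X_j) = \exp\left(-\|X_i - X_j\|^2 / (2\sigma^2)\right)$
- Sigmoidal (neural network): $K(X_i, X_j) = \tanh(\kappa X_i^T X_j - \delta)$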
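These four kernels can be sketched in a few lines of Python. The parameterizations (the `+ 1` offset, the bandwidth `sigma`, and the sigmoid's `kappa` and `delta`) are common textbook choices assumed here; the slides' exact forms were lost in extraction.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_homogeneous(u, v, k=2):
    """Homogeneous degree-k polynomial kernel: (u . v)^k."""
    return dot(u, v) ** k


def poly_inhomogeneous(u, v, k=2):
    """Inhomogeneous degree-k polynomial kernel: (u . v + 1)^k."""
    return (dot(u, v) + 1.0) ** k


def gaussian_rbf(u, v, sigma=1.0):
    """Gaussian radial basis function: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, kappa=1.0, delta=0.0):
    """Sigmoidal kernel: tanh(kappa * (u . v) - delta)."""
    return math.tanh(kappa * dot(u, v) - delta)


u, v = (1.0, 2.0), (3.0, -1.0)
for kernel in (poly_homogeneous, poly_inhomogeneous, gaussian_rbf, sigmoid_kernel):
    print(kernel.__name__, kernel(u, v))
```

One caveat worth knowing: the sigmoidal form is a valid (positive semidefinite) kernel only for some parameter settings, whereas the polynomial and Gaussian forms are valid for any $k \ge 1$ and $\sigma > 0$.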

Side note on kernels
- What precisely do kernel functions mean?
- Metric functions take two points and return a (generalized) distance between them.
- What is the equivalent interpretation for kernels?
- Hint: think about what the kernel function replaces in the max margin QP formulation.

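Side note on kernels
- Kernel functions are generalized inner products.
- Essentially, they give you the cosine of the angle between vectors (scaled by the vectors’ lengths).
- Recall the law of cosines: $\|X_i - X_j\|^2 = \|X_i\|^2 + \|X_j\|^2 - 2 \|X_i\| \|X_j\| \cos\theta$, or equivalently $X_i^T X_j = \|X_i\| \|X_j\| \cos\theta$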

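Side note on kernels
- Replace the traditional dot product with the “generalized inner product” and get:
  $K(X_i, X_j) = \sqrt{K(X_i, X_i)} \sqrt{K(X_j, X_j)} \cos\theta_\Phi$
  where $\theta_\Phi$ is the angle between $\Phi(X_i)$ and $\Phi(X_j)$ in the projected space.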
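A sketch of this interpretation: given any kernel `K`, the angle and the distance between two projected points can be computed from kernel evaluations alone, using $\|\Phi(u)\|^2 = K(u, u)$ and the law of cosines. The helper names `kernel_cosine` and `kernel_distance` are hypothetical.

```python
import math


def linear_kernel(u, v):
    """Plain dot product, i.e. the kernel with no projection."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_rbf(u, v, sigma=1.0):
    """Gaussian radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_cosine(K, u, v):
    """Cosine of the angle between the projections of u and v."""
    return K(u, v) / math.sqrt(K(u, u) * K(v, v))


def kernel_distance(K, u, v):
    """Distance between the projections: sqrt(K(u,u) - 2 K(u,v) + K(v,v))."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, K(u, u) - 2.0 * K(u, v) + K(v, v)))


u, v = (1.0, 0.0), (1.0, 1.0)
print(kernel_cosine(linear_kernel, u, v))    # ordinary cosine (45-degree angle)
print(kernel_distance(linear_kernel, u, v))  # ordinary Euclidean distance
print(kernel_cosine(gaussian_rbf, u, v))     # angle in the RBF-projected space
```

With the linear kernel these reduce to the familiar cosine and Euclidean distance. With the Gaussian kernel, $K(u, u) = 1$ for every point, so the kernel value itself is the cosine.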

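Using the classifier
- Solution of the QP gives back a set of $\alpha_i$.
- Data points $X_i$ for which $\alpha_i > 0$ are called “support vectors”.
- Turns out that we can write $w$ as: $w = \sum_i \alpha_i y_i \Phi(X_i)$
- And our classification rule for query point $X$ was: $f(X) = \mathrm{sign}(w^T \Phi(X) + b)$
- So: $f(X) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(X_i, X) + b\right)$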
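The resulting rule, $\mathrm{sign}(\sum_i \alpha_i y_i K(X_i, X) + b)$ summed over the support vectors, can be sketched as below. The support vectors, weights, and bias are hand-picked hypothetical values for a two-point linear problem (they satisfy $\sum_i \alpha_i y_i = 0$ and put both points exactly on the margin), not the output of a real solver.

```python
def linear_kernel(u, v):
    """Plain dot product; any other kernel function could be passed instead."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Compute sum_i alpha_i * y_i * K(X_i, x) + b over the support vectors."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + b


def classify(x, support_vectors, labels, alphas, b, kernel):
    """Return +1 or -1 according to the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, b, kernel)
    return 1 if value >= 0 else -1


# Hypothetical solved problem: one support vector per class.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]
b = 0.0

print(classify((2.0, 3.0), support_vectors, labels, alphas, b, linear_kernel))
print(classify((-0.5, 1.0), support_vectors, labels, alphas, b, linear_kernel))
```

Note that $w$ is never formed explicitly: the query point only ever meets the training data through kernel evaluations.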

Using the classifier
[Figure: SVM example with the support vectors labeled. SVM images from lecture notes by S. Dreiseitl.]

Putting it all together
Original (low-dimensional) data

Putting it all together
Original data matrix → kernel function $K$ → kernel matrix, with entries $K_{ij} = K(X_i, X_j)$
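This step can be sketched as follows: apply the kernel function to every pair of rows of the data matrix. The XOR-style data and the choice of an inhomogeneous degree-2 polynomial kernel are assumptions made for illustration.

```python
def poly_kernel(u, v, k=2):
    """Inhomogeneous degree-k polynomial kernel: (u . v + 1)^k."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** k


def kernel_matrix(X, kernel):
    """Return the n x n matrix with entries K[i][j] = kernel(X[i], X[j])."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


# Hypothetical data matrix: four 2-D points in an XOR arrangement.
X = [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
K = kernel_matrix(X, poly_kernel)
for row in K:
    print(row)
```

The kernel matrix is symmetric and has one row and column per training point, so its size depends on the number of points, not on the dimension of the projected space.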

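Putting it all together
Kernel matrix + original labels → quadratic program instance:
Maximize: $\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(X_i, X_j)$
Subject to: $0 \le \alpha_i \le C$ for all $i$, and $\sum_i \alpha_i y_i = 0$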

Putting it all together
Quadratic program instance → QP solver subroutine → support vector weights $\alpha_i$

Putting it all together
Support vector weights $\alpha_i$ → hyperplane in the projected (high-dimensional) space: $w = \sum_i \alpha_i y_i \Phi(X_i)$

Putting it all together
Support vector weights $\alpha_i$ → final classifier: $f(X) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(X_i, X) + b\right)$

Putting it all together
Final classifier → nonlinear classifier in the original (low-dimensional) space
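The whole pipeline (data → kernel matrix → QP → support vector weights → classifier) can be sketched end to end in plain Python. Two loud assumptions: the "QP solver subroutine" here is a simple coordinate-ascent loop standing in for a real off-the-shelf solver, and the bias is folded into the kernel by using $K + 1$, which removes the equality constraint $\sum_i \alpha_i y_i = 0$ at the cost of also penalizing the bias, so this solves a close variant of the QP rather than the exact one. All names and the XOR data are illustrative.

```python
def poly_kernel(u, v, k=2):
    """Inhomogeneous degree-k polynomial kernel: (u . v + 1)^k."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** k


def train_dual(X, y, kernel, C=10.0, sweeps=200):
    """Maximize the dual objective by coordinate ascent (a stand-in QP solver).

    Assumption: the bias is absorbed by using kernel + 1, so only the box
    constraint 0 <= alpha_i <= C remains and each alpha_i can be updated alone.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Exact maximizer along coordinate i, then clip to the box [0, C].
            step = (1.0 - y[i] * f_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return alpha


def predict(x, X, y, alpha, kernel):
    """Classify x using only the points with alpha_i > 0 (support vectors)."""
    score = sum(a * yi * (kernel(xi, x) + 1.0)
                for xi, yi, a in zip(X, y, alpha) if a > 0.0)
    return 1 if score >= 0 else -1


# XOR-style data: not linearly separable in the original 2-D space.
X = [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
y = [1, 1, -1, -1]

alpha = train_dual(X, y, poly_kernel)
predictions = [predict(x, X, y, alpha, poly_kernel) for x in X]
print(alpha)
print(predictions)
```

On this data all four points end up as support vectors with equal weight, and the degree-2 kernel separates the two classes even though no straight line in the original space can.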

Final notes on SVMs
- Note that only the $X_i$ for which $\alpha_i > 0$ actually contribute to the final classifier.
  - This is why they are called support vectors.
  - All the rest of the training data can be discarded.
- Complexity of training (& ability to generalize) is based only on the amount of training data,
  - not on the dimension of the hyperplane space.
- Good classification performance
  - In practice, SVMs are among the strongest classifiers we have.
- Closely related to neural nets, boosting, etc.
