Introduction to Machine Learning Prof. Nir Ailon Lecture 5: Support Vector Machines (SVM)
Linear Separators
Which Is Better?
Margin
The margin of a linear separator is defined as the distance from the closest instance point to the separating hyperplane. Large margins are intuitively more stable: if noise is added to the data, it is more likely to still be separated.
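In symbols (a standard way to write this; the notation is an assumption and may differ from the slide): for a separator defined by \langle w, x\rangle + b = 0 and instances x_1, \dots, x_m,

\text{margin} = \min_i \frac{|\langle w, x_i\rangle + b|}{\|w\|}.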
The Margin
Hard-SVM
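In its usual textbook form (the exact notation on the slide may differ), Hard-SVM picks, among all separating hyperplanes, the one that maximizes the margin:

\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_i\ y_i\big(\langle w, x_i\rangle + b\big).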
Hard-SVM Equivalent Formulation
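The standard equivalent quadratic program: minimize the norm subject to margin-1 constraints, then rescale the solution,

\min_{w,b}\ \|w\|^2 \quad \text{s.t.}\quad y_i\big(\langle w, x_i\rangle + b\big) \ge 1 \ \ \forall i, \qquad \text{output } \hat w = w/\|w\|,\ \hat b = b/\|w\|.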
Sample Complexity With Margin
No! The margin must be relative to the data scale (we could take any data set with a tiny margin and simply rescale it to blow the margin up for free).
Sample Complexity With Margin
Shattering With a Margin
[Figure: one point set separated with a large margin; another separated, but not with a large margin.]
Sample Complexity of Hard-SVM with Margin
What does this replace?
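Roughly (ignoring constants and the confidence term), if the instances lie in a ball of radius \rho and the distribution is separable with margin \gamma, then on the order of

m = O\!\left(\frac{(\rho/\gamma)^2}{\epsilon^2}\right)

examples suffice; the quantity (\rho/\gamma)^2 takes the place of the dimension in the generic halfspace bound.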
Soft-SVM
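The standard Soft-SVM optimization problem (textbook form; \lambda > 0 is the regularization parameter and the \xi_i are slack variables):

\min_{w,b,\xi}\ \lambda\|w\|^2 + \frac{1}{m}\sum_{i=1}^m \xi_i \quad \text{s.t.}\quad y_i\big(\langle w, x_i\rangle + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0.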
Soft-SVM: Equivalent Definition
SRM (structural risk minimization): the hypothesis is penalized by its norm.
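Eliminating the slack variables gives the equivalent regularized-loss form: average hinge loss plus a norm penalty,

\min_{w,b}\ \lambda\|w\|^2 + \frac{1}{m}\sum_{i=1}^m \max\big\{0,\ 1 - y_i(\langle w, x_i\rangle + b)\big\}.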
Sample Complexity for Soft-SVM
No dimensionality dependence.
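A hedged version of the kind of bound meant here: if the instances satisfy \|x\| \le \rho and we compare against all predictors with \|w\| \le B, the gap between the expected and empirical hinge loss is on the order of \rho B / \sqrt{m}, which depends on the norms but not on the dimension.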
What About Computational Complexity in High Dimension?
The Representer Theorem
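Statement (standard form): an optimum of the norm-regularized objective can always be written as a linear combination of the training points,

w^* = \sum_{i=1}^m \alpha_i x_i \quad \text{for some } \alpha \in \mathbb{R}^m,

so the problem can be rewritten purely in terms of the inner products \langle x_i, x_j\rangle.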
Gram
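A minimal NumPy sketch of the Gram matrix G with G[i, j] = <x_i, x_j> (the names and shapes here are assumptions for illustration):

import numpy as np

def gram_matrix(X):
    # X holds one training point per row; G[i, j] = <x_i, x_j>
    return X @ X.T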
The Kernel
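The standard definition: for a feature map \psi, the kernel is the inner product taken in feature space,

K(x, x') = \langle \psi(x), \psi(x')\rangle,

so by the Representer Theorem the SVM can be solved using only the m \times m matrix of kernel values, without ever computing \psi(x) explicitly.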
Polynomial Kernels
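The degree-k polynomial kernel in its usual form:

K(x, x') = \big(1 + \langle x, x'\rangle\big)^k.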
Gaussian Kernels (RBF: Radial Basis Functions)
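The Gaussian (RBF) kernel in its usual form, with width parameter \sigma:

K(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right).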
Kernels As Prior Knowledge
If we think that the positive examples can (almost) be separated by some ellipse, then we should use polynomials of degree 2. What should we do if we believe that we can classify a text message using the words in a dictionary? A kernel encodes a measure of similarity between objects; it must be a valid inner product function.
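For the text-message question, one natural choice (a hedged illustration, not necessarily the answer intended on the slide) is a bag-of-words kernel: represent each message by its word counts over the dictionary and take the inner product of the two count vectors.

from collections import Counter

def bag_of_words_kernel(msg1, msg2, dictionary):
    # Hypothetical illustration: similarity = inner product of the two
    # word-count vectors restricted to the dictionary, which is a valid
    # inner product.
    c1 = Counter(w for w in msg1.lower().split() if w in dictionary)
    c2 = Counter(w for w in msg2.lower().split() if w in dictionary)
    return sum(c1[w] * c2[w] for w in c1)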
Solving SVMs Efficiently
SGD for SVM
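A minimal sketch of SGD on the (homogeneous) Soft-SVM objective, Pegasos-style; the step-size schedule, parameter names, and data format are assumptions for illustration and not necessarily the exact variant given in the lecture.

import numpy as np

def sgd_svm(X, y, lam=0.1, T=1000, seed=0):
    # Minimizes (lam/2) * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i <w, x_i>)
    # by stochastic subgradient steps on single random examples.
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)            # pick a random training example
        eta = 1.0 / (lam * t)          # decreasing step size
        if y[i] * (X[i] @ w) < 1:      # hinge loss active: data term in subgradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                          # hinge loss inactive: only the regularizer
            w = (1 - eta * lam) * w
    return w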