
1 Kernel Methods and SVMs

2 Predictive Modeling
Goal: learn a mapping $y = f(x; \theta)$.
Need:
1. A model structure
2. A score function
3. An optimization strategy
Categorical $y \in \{c_1, \dots, c_m\}$: classification. Real-valued $y$: regression.
Note: we usually assume $\{c_1, \dots, c_m\}$ are mutually exclusive and exhaustive.
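To make the three ingredients concrete, here is a minimal Python sketch (not from the slides; the data, learning rate, and step count are illustrative): a linear model structure, a squared-error score function, and gradient descent as the optimization strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # predictors
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

theta = np.zeros(3)                                 # 1. model structure: f(x; theta) = theta @ x
for _ in range(500):                                # 3. optimization strategy: gradient descent
    resid = X @ theta - y                           # 2. score function: squared error
    theta -= 0.01 * X.T @ resid / len(y)

print(theta)                                        # should be close to [1, -2, 0.5]
```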

3 Simple Two-Class Perceptron
Initialize the weight vector: $w \leftarrow 0$
Repeat one or more times (indexed by $k$):
  For each training data point $x_i$:
    If $y_i \langle w, x_i \rangle \le 0$ then $w \leftarrow w + \eta\, y_i x_i$ endIf
Each mistake-driven update is a step of "gradient descent".
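A short Python sketch of the algorithm as reconstructed above, assuming labels $y_i \in \{-1, +1\}$; the learning rate and epoch count are illustrative choices.

```python
import numpy as np

def perceptron(X, y, epochs=10, eta=1.0):
    """Primal perceptron: X has shape (n, d), y entries are -1 or +1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                  # repeat, indexed by k
        for x_i, y_i in zip(X, y):           # for each training point
            if y_i * (w @ x_i) <= 0:         # mistake (or on the boundary)
                w += eta * y_i * x_i         # move w toward the correct side
    return w
```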

4 Perceptron Dual Form
Notice that $w$ ends up as a linear combination of the $y_j x_j$:
$w = \sum_j \alpha_j y_j x_j$, with each $\alpha_j$ +ve (positive), and bigger for "harder" examples.
Thus: $f(x) = \langle w, x \rangle = \sum_j \alpha_j y_j \langle x_j, x \rangle$
This leads to a dual form of the learning algorithm: on a mistake at $x_i$, increment $\alpha_i$ instead of updating $w$ directly.

5 Perceptron Dual Form
Note: the training data only enter the algorithm via the inner products $\langle x_i, x_j \rangle$ (the Gram matrix). This is generally true for linear models (e.g. linear regression, ridge regression).
Initialize $\alpha \leftarrow 0$
Repeat until no more mistakes:
  For each training data point $x_i$:
    If $y_i \sum_j \alpha_j y_j \langle x_j, x_i \rangle \le 0$ then $\alpha_i \leftarrow \alpha_i + 1$ endIf
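A Python sketch of the dual algorithm; note the data appear only through the Gram matrix, which is the point of this slide. The epoch cap is an illustrative safeguard for non-separable data.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual perceptron: data enter only via G[i, j] = <x_i, x_j>."""
    G = X @ X.T                               # all inner products, computed once
    alpha = np.zeros(len(y))
    for _ in range(max_epochs):               # repeat until no more mistakes
        mistakes = 0
        for i in range(len(y)):
            if y[i] * np.sum(alpha * y * G[:, i]) <= 0:
                alpha[i] += 1                 # "harder" examples accumulate bigger alpha_i
                mistakes += 1
        if mistakes == 0:
            break
    return alpha                              # w = sum_j alpha_j y_j x_j
```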

6 Learning in Feature Space
We have already seen the idea of changing the representation of the predictors:
$x = (x_1, \dots, x_d) \mapsto \phi(x) = (\phi_1(x), \dots, \phi_N(x))$
$F = \{\phi(x) : x \in X\}$ is called the feature space.
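A concrete instance (an illustrative choice, not from the slide): the quadratic feature map on 2-D inputs, whose feature-space inner product equals the squared dot product in the original space.

```python
import numpy as np

def phi(x):
    """Quadratic feature map for 2-D x: <phi(x), phi(z)> = (<x, z>)^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z), (x @ z) ** 2)   # both print 1.0
```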

7 Linear Feature Space Models
Now consider models of the form: $f(x) = \sum_{i=1}^{N} w_i \phi_i(x)$
equivalently (the dual form again): $f(x) = \sum_j \alpha_j y_j \langle \phi(x_j), \phi(x) \rangle$
A kernel is a function $K$ such that for all $x, z \in X$: $K(x, z) = \langle \phi(x), \phi(z) \rangle$, where $\phi$ is a mapping from $X$ to an inner product feature space $F$.
We just need to know $K$, not $\phi$!
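Two standard kernels, sketched in Python. Swapping either into the dual perceptron's Gram matrix (G = rbf_kernel(X, X) instead of X @ X.T) kernelizes it; $\phi$ is never computed explicitly, and for the RBF kernel it could not be, since its feature space is infinite-dimensional.

```python
import numpy as np

def poly_kernel(X, Z, degree=2):
    """K(x, z) = (<x, z> + 1)^degree: phi maps to all monomials up to `degree`."""
    return (X @ Z.T + 1.0) ** degree

def rbf_kernel(X, Z, gamma=1.0):
    """K(x, z) = exp(-gamma ||x - z||^2): infinite-dimensional feature space."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)
```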

8 Making Kernels
What properties must $K$ satisfy to be a kernel?
1. Symmetry: $K(x, z) = K(z, x)$
2. Cauchy-Schwarz: $K(x, z)^2 \le K(x, x)\, K(z, z)$
+ other conditions (positive semi-definiteness; see Mercer's Theorem, next slide)
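A numerical sketch of these checks on a finite Gram matrix. This is a necessary-condition test only: it can falsify a candidate kernel on a sample, not prove it valid everywhere.

```python
import numpy as np

def looks_like_a_kernel(gram, tol=1e-9):
    """Check symmetry and (numerical) positive semi-definiteness of a Gram matrix."""
    symmetric = np.allclose(gram, gram.T)
    psd = np.linalg.eigvalsh(gram).min() >= -tol   # eigenvalues >= 0, up to rounding
    return symmetric and psd

X = np.random.default_rng(1).normal(size=(20, 4))
G = (X @ X.T + 1.0) ** 2              # Gram matrix of the quadratic kernel
print(looks_like_a_kernel(G))         # True
# Cauchy-Schwarz then holds entrywise: K(x, z)^2 <= K(x, x) K(z, z).
```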

9 Mercer's Theorem
Mercer's Theorem gives necessary and sufficient conditions for a continuous symmetric function $K$ to admit the representation
$K(x, z) = \sum_{i=1}^{\infty} \gamma_i \phi_i(x) \phi_i(z)$, with $\gamma_i \ge 0$
("Mercer Kernels"): $K$ must be positive semi-definite, i.e. $\int\!\!\int K(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0$ for all square-integrable $g$.
Such a kernel defines a set of functions $H_K$, elements of which have an expansion as $f(x) = \sum_{i=1}^{\infty} c_i \phi_i(x)$.
So, some kernels correspond to an infinite number of transformed predictor variables.
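A finite-sample analogue of this expansion (a sketch, not Mercer's theorem itself): eigendecomposing a Gram matrix $G = U \Lambda U^{\top}$ gives empirical "features" $\sqrt{\lambda_k}\, u_k$ whose inner products reproduce the kernel values on the sample.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-0.5 * sq)                        # RBF Gram matrix (PSD)

lam, U = np.linalg.eigh(G)                   # G = U diag(lam) U^T, lam >= 0
Phi = U * np.sqrt(np.clip(lam, 0, None))     # row i is an empirical phi(x_i)
print(np.allclose(G, Phi @ Phi.T))           # True: K(x_i, x_j) = <phi(x_i), phi(x_j)>
```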

10 Reproducing Kernel Hilbert Space
Define an inner product in this function space as: for $f = \sum_i c_i \phi_i$ and $g = \sum_i d_i \phi_i$,
$\langle f, g \rangle_{H_K} = \sum_{i=1}^{\infty} c_i d_i / \gamma_i$
Note then that $K(\cdot, x) = \sum_i \gamma_i \phi_i(x)\, \phi_i(\cdot)$, so
$\langle K(\cdot, x), f \rangle_{H_K} = \sum_i \gamma_i \phi_i(x)\, c_i / \gamma_i = f(x)$
This is the reproducing property of $H_K$.
Also note, a Mercer kernel implies: $\|f\|^2_{H_K} = \sum_{i=1}^{\infty} c_i^2 / \gamma_i < \infty$

11 Regularization and RKHS
A general class of regularization problems has the form:
$\min_{f} \; \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda J(f)$
where $L$ is some loss function (e.g. squared loss) and $J(f)$ penalizes complex $f$.
Suppose $f$ lives in an RKHS with $J(f) = \|f\|^2_{H_K}$, and let $f(x) = \sum_{j=1}^{n} \alpha_j K(x, x_j)$.
Then, since the reproducing property gives $\|f\|^2_{H_K} = \alpha^{\top} K \alpha$, we need only solve this "easy" finite-dimensional problem:
$\min_{\alpha} \; \sum_{i=1}^{n} L\big(y_i, (K\alpha)_i\big) + \lambda\, \alpha^{\top} K \alpha$, with $K_{ij} = K(x_i, x_j)$

12 RKHS Examples
For regression with squared error loss, we have
$\min_{\alpha} \; \|y - K\alpha\|^2 + \lambda\, \alpha^{\top} K \alpha$
so that: $\hat{\alpha} = (K + \lambda I)^{-1} y$ and $\hat{f}(x) = \sum_j \hat{\alpha}_j K(x, x_j)$
This generalizes smoothing splines…
Choosing a radial kernel such as $K(x, z) = \|x - z\|^2 \log \|x - z\|$ leads to the thin-plate spline models.
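A self-contained Python sketch of this closed form, i.e. kernel ridge regression; the kernel, data, and $\lambda$ are illustrative choices.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge(X, y, K_fn, lam):
    """Solve min_alpha ||y - K alpha||^2 + lam * alpha^T K alpha.
    Setting the gradient to zero gives (K + lam I) alpha = y."""
    K = K_fn(X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = kernel_ridge(X, y, rbf, lam=0.1)
X_new = np.array([[0.5]])
print(rbf(X_new, X) @ alpha)     # f_hat(0.5), roughly sin(0.5) ~ 0.48
```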

13 Support Vector Machine
Two-class classifier with the form: $G(x) = \operatorname{sign} f(x)$, where $f(x) = \sum_{j=1}^{n} \alpha_j K(x, x_j) + \alpha_0$
parameters chosen to minimize the hinge loss plus RKHS penalty:
$\sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \frac{\lambda}{2} \|f\|^2_{H_K}$
Many of the fitted $\alpha$'s are usually zero; the $x$'s corresponding to the non-zero $\alpha$'s are the support vectors.
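For a practical fit, scikit-learn's SVC implements the kernelized SVM (under the equivalent cost-parameter formulation with $C$ in place of $\lambda$) and exposes the support vectors directly. A sketch, assuming scikit-learn is installed; the data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # not linearly separable

clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print(len(clf.support_vectors_), "of", len(X), "points are support vectors")
# dual_coef_ holds the nonzero alpha_i * y_i; all other alphas are exactly zero.
print(clf.dual_coef_.shape)
```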

