
1 Kernel Methods and SVMs

2 Predictive Modeling
Goal: learn a mapping $y = f(x; \theta)$.
Need:
1. A model structure
2. A score function
3. An optimization strategy
Categorical $y \in \{c_1, \dots, c_m\}$: classification. Real-valued $y$: regression.
Note: we usually assume $\{c_1, \dots, c_m\}$ are mutually exclusive and exhaustive.
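To make the three ingredients concrete, here is a minimal Python sketch (not from the slides; the data, learning rate, and step count are illustrative): a linear model structure, a squared-error score function, and gradient descent as the optimization strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # predictors
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

theta = np.zeros(3)                                 # 1. model structure: f(x; theta) = theta @ x
for _ in range(500):                                # 3. optimization strategy: gradient descent
    resid = X @ theta - y                           # 2. score function: squared error
    theta -= 0.01 * X.T @ resid / len(y)

print(theta)                                        # should be close to [1, -2, 0.5]
```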

3 Simple Two-Class Perceptron
Initialize the weight vector: $w \leftarrow 0$
Repeat one or more times (indexed by $k$):
  For each training data point $x_i$:
    If $y_i \langle w, x_i \rangle \le 0$ then $w \leftarrow w + \eta\, y_i x_i$ endIf
Each mistake-driven update is a step of "gradient descent".
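A short Python sketch of the algorithm as reconstructed above, assuming labels $y_i \in \{-1, +1\}$; the learning rate and epoch count are illustrative choices.

```python
import numpy as np

def perceptron(X, y, epochs=10, eta=1.0):
    """Primal perceptron: X has shape (n, d), y entries are -1 or +1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                  # repeat, indexed by k
        for x_i, y_i in zip(X, y):           # for each training point
            if y_i * (w @ x_i) <= 0:         # mistake (or on the boundary)
                w += eta * y_i * x_i         # move w toward the correct side
    return w
```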

4 Perceptron Dual Form
Notice that $w$ ends up as a linear combination of the $y_j x_j$:
$w = \sum_j \alpha_j y_j x_j$, with each $\alpha_j$ +ve (positive), and bigger for "harder" examples.
Thus: $f(x) = \langle w, x \rangle = \sum_j \alpha_j y_j \langle x_j, x \rangle$
This leads to a dual form of the learning algorithm: on a mistake at $x_i$, increment $\alpha_i$ instead of updating $w$ directly.

5 Perceptron Dual Form
Note: the training data only enter the algorithm via the inner products $\langle x_i, x_j \rangle$ (the Gram matrix). This is generally true for linear models (e.g. linear regression, ridge regression).
Initialize $\alpha \leftarrow 0$
Repeat until no more mistakes:
  For each training data point $x_i$:
    If $y_i \sum_j \alpha_j y_j \langle x_j, x_i \rangle \le 0$ then $\alpha_i \leftarrow \alpha_i + 1$ endIf
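A Python sketch of the dual algorithm; note the data appear only through the Gram matrix, which is the point of this slide. The epoch cap is an illustrative safeguard for non-separable data.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual perceptron: data enter only via G[i, j] = <x_i, x_j>."""
    G = X @ X.T                               # all inner products, computed once
    alpha = np.zeros(len(y))
    for _ in range(max_epochs):               # repeat until no more mistakes
        mistakes = 0
        for i in range(len(y)):
            if y[i] * np.sum(alpha * y * G[:, i]) <= 0:
                alpha[i] += 1                 # "harder" examples accumulate bigger alpha_i
                mistakes += 1
        if mistakes == 0:
            break
    return alpha                              # w = sum_j alpha_j y_j x_j
```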

6 Learning in Feature Space
We have already seen the idea of changing the representation of the predictors:
$x = (x_1, \dots, x_d) \mapsto \phi(x) = (\phi_1(x), \dots, \phi_N(x))$
$F = \{\phi(x) : x \in X\}$ is called the feature space.
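A concrete instance (an illustrative choice, not from the slide): the quadratic feature map on 2-D inputs, whose feature-space inner product equals the squared dot product in the original space.

```python
import numpy as np

def phi(x):
    """Quadratic feature map for 2-D x: <phi(x), phi(z)> = (<x, z>)^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z), (x @ z) ** 2)   # both print 1.0
```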

7 Linear Feature Space Models
Now consider models of the form: $f(x) = \sum_{i=1}^{N} w_i \phi_i(x)$
equivalently (the dual form again): $f(x) = \sum_j \alpha_j y_j \langle \phi(x_j), \phi(x) \rangle$
A kernel is a function $K$ such that for all $x, z \in X$: $K(x, z) = \langle \phi(x), \phi(z) \rangle$, where $\phi$ is a mapping from $X$ to an inner product feature space $F$.
We just need to know $K$, not $\phi$!
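Two standard kernels, sketched in Python. Swapping either into the dual perceptron's Gram matrix (G = rbf_kernel(X, X) instead of X @ X.T) kernelizes it; $\phi$ is never computed explicitly, and for the RBF kernel it could not be, since its feature space is infinite-dimensional.

```python
import numpy as np

def poly_kernel(X, Z, degree=2):
    """K(x, z) = (<x, z> + 1)^degree: phi maps to all monomials up to `degree`."""
    return (X @ Z.T + 1.0) ** degree

def rbf_kernel(X, Z, gamma=1.0):
    """K(x, z) = exp(-gamma ||x - z||^2): infinite-dimensional feature space."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)
```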

8 Making Kernels
What properties must $K$ satisfy to be a kernel?
1. Symmetry: $K(x, z) = K(z, x)$
2. Cauchy-Schwarz: $K(x, z)^2 \le K(x, x)\, K(z, z)$
+ other conditions (positive semi-definiteness; see Mercer's Theorem, next slide)
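A numerical sketch of these checks on a finite Gram matrix. This is a necessary-condition test only: it can falsify a candidate kernel on a sample, not prove it valid everywhere.

```python
import numpy as np

def looks_like_a_kernel(gram, tol=1e-9):
    """Check symmetry and (numerical) positive semi-definiteness of a Gram matrix."""
    symmetric = np.allclose(gram, gram.T)
    psd = np.linalg.eigvalsh(gram).min() >= -tol   # eigenvalues >= 0, up to rounding
    return symmetric and psd

X = np.random.default_rng(1).normal(size=(20, 4))
G = (X @ X.T + 1.0) ** 2              # Gram matrix of the quadratic kernel
print(looks_like_a_kernel(G))         # True
# Cauchy-Schwarz then holds entrywise: K(x, z)^2 <= K(x, x) K(z, z).
```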

9 Mercer's Theorem
Mercer's Theorem gives necessary and sufficient conditions for a continuous symmetric function $K$ to admit the representation
$K(x, z) = \sum_{i=1}^{\infty} \gamma_i \phi_i(x) \phi_i(z)$, with $\gamma_i \ge 0$
("Mercer Kernels"): $K$ must be positive semi-definite, i.e. $\int\!\!\int K(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0$ for all square-integrable $g$.
Such a kernel defines a set of functions $H_K$, elements of which have an expansion as $f(x) = \sum_{i=1}^{\infty} c_i \phi_i(x)$.
So, some kernels correspond to an infinite number of transformed predictor variables.
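A finite-sample analogue of this expansion (a sketch, not Mercer's theorem itself): eigendecomposing a Gram matrix $G = U \Lambda U^{\top}$ gives empirical "features" $\sqrt{\lambda_k}\, u_k$ whose inner products reproduce the kernel values on the sample.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-0.5 * sq)                        # RBF Gram matrix (PSD)

lam, U = np.linalg.eigh(G)                   # G = U diag(lam) U^T, lam >= 0
Phi = U * np.sqrt(np.clip(lam, 0, None))     # row i is an empirical phi(x_i)
print(np.allclose(G, Phi @ Phi.T))           # True: K(x_i, x_j) = <phi(x_i), phi(x_j)>
```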

10 Reproducing Kernel Hilbert Space
Define an inner product in this function space as: for $f = \sum_i c_i \phi_i$ and $g = \sum_i d_i \phi_i$,
$\langle f, g \rangle_{H_K} = \sum_{i=1}^{\infty} c_i d_i / \gamma_i$
Note then that $K(\cdot, x) = \sum_i \gamma_i \phi_i(x)\, \phi_i(\cdot)$, so
$\langle K(\cdot, x), f \rangle_{H_K} = \sum_i \gamma_i \phi_i(x)\, c_i / \gamma_i = f(x)$
This is the reproducing property of $H_K$.
Also note, a Mercer kernel implies: $\|f\|^2_{H_K} = \sum_{i=1}^{\infty} c_i^2 / \gamma_i < \infty$

11 Regularization and RKHS
A general class of regularization problems has the form:
$\min_{f} \; \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda J(f)$
where $L$ is some loss function (e.g. squared loss) and $J(f)$ penalizes complex $f$.
Suppose $f$ lives in an RKHS with $J(f) = \|f\|^2_{H_K}$, and let $f(x) = \sum_{j=1}^{n} \alpha_j K(x, x_j)$.
Then, since the reproducing property gives $\|f\|^2_{H_K} = \alpha^{\top} K \alpha$, we need only solve this "easy" finite-dimensional problem:
$\min_{\alpha} \; \sum_{i=1}^{n} L\big(y_i, (K\alpha)_i\big) + \lambda\, \alpha^{\top} K \alpha$, with $K_{ij} = K(x_i, x_j)$

12 RKHS Examples
For regression with squared error loss, we have
$\min_{\alpha} \; \|y - K\alpha\|^2 + \lambda\, \alpha^{\top} K \alpha$
so that: $\hat{\alpha} = (K + \lambda I)^{-1} y$ and $\hat{f}(x) = \sum_j \hat{\alpha}_j K(x, x_j)$
This generalizes smoothing splines…
Choosing a radial kernel such as $K(x, z) = \|x - z\|^2 \log \|x - z\|$ leads to the thin-plate spline models.
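A self-contained Python sketch of this closed form, i.e. kernel ridge regression; the kernel, data, and $\lambda$ are illustrative choices.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge(X, y, K_fn, lam):
    """Solve min_alpha ||y - K alpha||^2 + lam * alpha^T K alpha.
    Setting the gradient to zero gives (K + lam I) alpha = y."""
    K = K_fn(X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = kernel_ridge(X, y, rbf, lam=0.1)
X_new = np.array([[0.5]])
print(rbf(X_new, X) @ alpha)     # f_hat(0.5), roughly sin(0.5) ~ 0.48
```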

13 Support Vector Machine
Two-class classifier with the form: $G(x) = \operatorname{sign} f(x)$, where $f(x) = \sum_{j=1}^{n} \alpha_j K(x, x_j) + \alpha_0$
parameters chosen to minimize the hinge loss plus RKHS penalty:
$\sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \frac{\lambda}{2} \|f\|^2_{H_K}$
Many of the fitted $\alpha$'s are usually zero; the $x$'s corresponding to the non-zero $\alpha$'s are the support vectors.
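For a practical fit, scikit-learn's SVC implements the kernelized SVM (under the equivalent cost-parameter formulation with $C$ in place of $\lambda$) and exposes the support vectors directly. A sketch, assuming scikit-learn is installed; the data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # not linearly separable

clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print(len(clf.support_vectors_), "of", len(X), "points are support vectors")
# dual_coef_ holds the nonzero alpha_i * y_i; all other alphas are exactly zero.
print(clf.dual_coef_.shape)
```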

