1 Kernel Regression Prof. Bennett
Math Model of Learning and Discovery, 2/24/03. Based on Chapter 2 of Shawe-Taylor and Cristianini.

2 Outline
Review Ridge Regression
LS-SVM = KRR
Dual Derivation
Bias Issue
Summary

3 Ridge Regression Review
Use the least-norm solution for fixed λ. Regularized problem: min over w of λw'w + ||Xw − y||². Optimality condition: (X'X + λI)w = X'y, so w = (X'X + λI)⁻¹X'y. Solving this n×n system requires O(n³) operations.
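A minimal NumPy sketch of this primal computation; the data matrix X, targets y, and penalty lam are made-up illustrative values, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 50, 5                           # l points in n dimensions (illustrative)
X = rng.standard_normal((l, n))
y = X @ rng.standard_normal(n) + 0.1 * rng.standard_normal(l)
lam = 1.0

# Optimality condition (X'X + lam*I) w = X'y: one n-by-n solve, O(n^3).
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
print(w)
```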

4 Dual Representation
The inverse (X'X + λI)⁻¹ always exists for any λ > 0. Alternative (dual) representation: w = X'α with α = (G + λI)⁻¹y, where G = XX'. Solving this l×l system is O(l³) for l training points.

5 Dual Ridge Regression
To predict a new point x: f(x) = w'x = Σᵢ αᵢ⟨xᵢ, x⟩. Note we need only compute G, the Gram matrix with entries Gᵢⱼ = ⟨xᵢ, xⱼ⟩. Ridge regression requires only inner products between data points.
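A matching sketch of the dual computation: everything goes through the Gram matrix and inner products with the new point. Data and lam are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 50, 5
X = rng.standard_normal((l, n))
y = X @ rng.standard_normal(n) + 0.1 * rng.standard_normal(l)
lam = 1.0

G = X @ X.T                                     # Gram matrix of inner products
alpha = np.linalg.solve(G + lam * np.eye(l), y) # one l-by-l solve, O(l^3)

x_new = rng.standard_normal(n)
k = X @ x_new                                   # k_i = <x_i, x_new>
print(alpha @ k)                                # f(x_new) = sum_i alpha_i <x_i, x_new>
```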

6 Linear Regression in Feature Space
Key idea: map the data to a higher-dimensional space (feature space) and perform linear regression in the embedded space. Embedding map: φ: x ↦ φ(x), so X becomes φ(X).

7 Kernel Function
A kernel is a function K such that K(x, z) = ⟨φ(x), φ(z)⟩ for all x, z. There are many possible kernels. The simplest is the linear kernel, K(x, z) = ⟨x, z⟩.
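Illustrative kernel implementations: the linear kernel from the slide, plus the standard polynomial and Gaussian (RBF) kernels as assumed examples of other common choices.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                       # K(x,z) = <x, z>

def poly_kernel(x, z, d=3, c=1.0):
    return (x @ z + c) ** d            # K(x,z) = (<x,z> + c)^d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x, z = np.arange(3.0), np.ones(3)
print(linear_kernel(x, z), poly_kernel(x, z), rbf_kernel(x, z))
```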

8 Ridge Regression in Feature Space
To predict a new point x: f(x) = Σᵢ αᵢK(xᵢ, x) with α = (G + λI)⁻¹y. To compute the Gram matrix, use the kernel to compute the inner products: Gᵢⱼ = K(xᵢ, xⱼ) = ⟨φ(xᵢ), φ(xⱼ)⟩.
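A minimal kernel ridge regression sketch: the same dual solve as before, but with the Gram matrix built from a kernel. The RBF kernel, its bandwidth, and the sine-curve data are my illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
l = 40
X = rng.uniform(-3, 3, size=(l, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(l)
lam, sigma = 0.1, 1.0

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

G = rbf(X, X, sigma)                              # G_ij = K(x_i, x_j)
alpha = np.linalg.solve(G + lam * np.eye(l), y)

X_new = np.array([[0.5]])
k = rbf(X, X_new, sigma)[:, 0]                    # k_i = K(x_i, x_new)
print(alpha @ k)                                  # roughly sin(0.5)
```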

9 Alternative Dual Derivation
Original math model: min over w of λw'w + ||Xw − y||². Equivalent math model: min over (w, z) of λw'w + z'z subject to z = y − Xw. Construct the dual using Wolfe duality.

10 Lagrangian Function
Consider the problem min over w of f(w) subject to h(w) = 0. The Lagrangian function is L(w, α) = f(w) + α'h(w), where α is the vector of Lagrange multipliers.

11 Wolfe Dual Problem
Primal: min over w of f(w) subject to h(w) = 0. Wolfe dual: max over (w, α) of L(w, α) subject to ∇w L(w, α) = 0.

12 Lagrangian Function
Primal: min over (w, z) of λw'w + z'z subject to z = y − Xw. Lagrangian: L(w, z, α) = λw'w + z'z + α'(y − Xw − z).

13 Wolfe Dual Problem
Construct the Wolfe dual: max over (w, z, α) of L(w, z, α) subject to ∇w L = 0 and ∇z L = 0. From ∇z L = 2z − α = 0, simplify by eliminating z = α/2.

14 Simplified Problem
Getting rid of z leaves max over (w, α) of λw'w + α'y − α'Xw − (1/4)α'α. From ∇w L = 2λw − X'α = 0, simplify by eliminating w = X'α/(2λ).

15 Simplified Problem
Getting rid of w: at w = X'α/(2λ) we have λw'w − α'Xw = −(1/(4λ))α'XX'α, leaving the dual in α alone: max over α of α'y − (1/4)α'α − (1/(4λ))α'XX'α.

16 Optimal Solution
Problem in matrix notation with G = XX': max over α of α'y − (1/4)α'α − (1/(4λ))α'Gα. Setting the gradient to zero, the solution satisfies (G + λI)α = 2λy, i.e. α = 2λ(G + λI)⁻¹y; with w = X'α/(2λ) this recovers the earlier dual solution w = X'(G + λI)⁻¹y.
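A quick numerical check, on made-up data, that this dual solution agrees with the primal one from the ridge regression review:

```python
import numpy as np

rng = np.random.default_rng(2)
l, n = 30, 4
X = rng.standard_normal((l, n))
y = rng.standard_normal(l)
lam = 0.5

# Primal: (X'X + lam*I) w = X'y.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual: (G + lam*I) alpha = 2*lam*y, then w = X' alpha / (2*lam).
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(l), 2 * lam * y)
w_dual = X.T @ alpha / (2 * lam)

print(np.allclose(w_primal, w_dual))   # True
```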

17 What about Bias
Limiting the regression function to f(x) = w'x means the solution must pass through the origin. Many models require a bias or constant term: f(x) = w'x + b.

18 Eliminate Bias
One way to eliminate the bias is to "center" the data: make the data have mean 0.

19 Center y
Center y: yc = y − ȳe, where ȳ = (1/l)e'y is the sample mean and e is the all-ones vector. yc now has sample mean 0. It is frequently good to also give y standard (unit) length: yc ← yc/||yc||.

20 Center X
Mean of X: μ = (1/l)X'e, the vector of column means. Center X: Xc = X − eμ' = (I − (1/l)ee')X.

21 You Try
Consider a data matrix X with 3 points in 4 dimensions. Compute the centered matrix by hand and with the formula Xc = (I − (1/l)ee')X, as in the sketch below.
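A possible version of this exercise in NumPy, with a made-up 3×4 matrix; it checks the formula against directly subtracting the column means.

```python
import numpy as np

X = np.array([[1., 2., 3., 4.],
              [2., 0., 1., 3.],
              [0., 4., 2., 2.]])
l = X.shape[0]
e = np.ones((l, 1))

Xc_formula = (np.eye(l) - e @ e.T / l) @ X   # (I - (1/l) e e') X
Xc_byhand  = X - X.mean(axis=0)              # subtract the column means

print(np.allclose(Xc_formula, Xc_byhand))    # True
print(Xc_formula.mean(axis=0))               # all (numerically) zero
```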

22 Center (X) in Feature Space
We cannot center (X) directly in feature space. Center G = XX’ Works in feature space too for G in kernel space

23 Centering Kernel
Practical computation: Gc = G − (1/l)ee'G − (1/l)Gee' + (1/l²)(e'Ge)ee'.
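A sketch of this computation in NumPy on illustrative data; for the linear kernel, the centered Gram matrix should match the Gram matrix of explicitly centered data.

```python
import numpy as np

rng = np.random.default_rng(3)
l = 6
X = rng.standard_normal((l, 3))
G = X @ X.T                                  # linear kernel, for checking

ee = np.ones((l, l)) / l                     # (1/l) e e'
Gc = G - ee @ G - G @ ee + ee @ G @ ee       # centered Gram matrix

Xc = X - X.mean(axis=0)                      # center the data directly
print(np.allclose(Gc, Xc @ Xc.T))            # True
```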

24 Ridge Regression in Feature Space
Original way: α = (G + λI)⁻¹y, f(x) = Σᵢ αᵢK(xᵢ, x). Predicted normalized y: use the centered kernel and targets, αc = (Gc + λI)⁻¹yc, fc(x) = Σᵢ αc,ᵢ Kc(xᵢ, x). Predicted original y: invert the normalization of y.

25 Worksheet
Normalized y: yc = (y − ȳe)/||y − ȳe||. Invert to get the unnormalized y: y = ||y − ȳe||·yc + ȳe.

26 Centering Test Data
Center test kernel values just like the training data: Kc(x, xᵢ) = K(x, xᵢ) − (1/l)Σⱼ K(x, xⱼ) − (1/l)Σⱼ K(xⱼ, xᵢ) + (1/l²)Σⱼₖ K(xⱼ, xₖ). Prediction of test data becomes fc(x) = Σᵢ αc,ᵢ Kc(x, xᵢ), un-normalized as on the worksheet.
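A sketch of the whole centered pipeline on illustrative data: center and normalize y, center the training kernel, solve for α, center the test kernel with the training means, then invert the normalization. The RBF kernel and the sine-curve data are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
l, m = 30, 5                                 # training / test points
X  = rng.uniform(-2, 2, (l, 1))
Xt = rng.uniform(-2, 2, (m, 1))
y  = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(l)
lam = 0.1

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Center and normalize y.
ybar  = y.mean()
scale = np.linalg.norm(y - ybar)
yc = (y - ybar) / scale

# Center the training kernel.
G  = rbf(X, X)
ee = np.ones((l, l)) / l
Gc = G - ee @ G - G @ ee + ee @ G @ ee

alpha = np.linalg.solve(Gc + lam * np.eye(l), yc)

# Center the test kernel with the *training* means.
K  = rbf(Xt, X)                              # K_ij = K(x_test_i, x_j)
Kc = (K - K.mean(axis=1, keepdims=True)      # -(1/l) sum_j K(x, x_j)
        - G.mean(axis=0)                     # -(1/l) sum_j K(x_j, x_i)
        + G.mean())                          # +(1/l^2) sum_jk K(x_j, x_k)

# Predict normalized y, then undo the normalization.
pred = scale * (Kc @ alpha) + ybar
print(np.c_[pred, np.sin(Xt[:, 0])])         # predictions vs. true curve
```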

27 Alternate Approach
Directly add a bias to the model: f(x) = w'x + b. The optimization problem becomes: min over (w, b, z) of λw'w + z'z subject to z = y − Xw − be.

28 Lagrangian Function
Consider the problem above. The Lagrangian function is L(w, b, z, α) = λw'w + z'z + α'(y − Xw − be − z).

29 Lagrangian Function
Primal optimality conditions: ∇w L = 2λw − X'α = 0, ∇z L = 2z − α = 0, and ∂L/∂b = −e'α = 0.

30 Wolfe Dual Problem
Simplify by eliminating z = α/2 and using the new constraint e'α = 0 (from ∂L/∂b = 0), which makes the bias term α'be = b·e'α drop out.

31 Simplified Problem
Simplify by eliminating w = X'α/(2λ).

32 Simplified Problem
Getting rid of w leaves max over α of α'y − (1/4)α'α − (1/(4λ))α'XX'α subject to e'α = 0.

33 New Problem to be solved
Problem in matrix notation with G = XX': max over α of α'y − (1/4)α'α − (1/(4λ))α'Gα subject to e'α = 0. This is a constrained optimization problem. The solution is again a system of linear equations (the optimality conditions plus the constraint, with a multiplier for e'α = 0), but not as simple; see the sketch below.
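One standard way to solve this, in a rescaled but equivalent form (the LS-SVM-style system), is a single saddle-point linear solve with the bias b playing the role of the multiplier for e'α = 0. A sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(5)
l, n = 30, 3
X = rng.standard_normal((l, n))
y = X @ rng.standard_normal(n) + 2.0 + 0.05 * rng.standard_normal(l)
lam = 0.5

G = X @ X.T
e = np.ones(l)

# Saddle-point system:  [G + lam*I  e] [beta]   [y]
#                       [e'         0] [b   ] = [0]
A = np.block([[G + lam * np.eye(l), e[:, None]],
              [e[None, :],          np.zeros((1, 1))]])
sol = np.linalg.solve(A, np.append(y, 0.0))
beta, b = sol[:l], sol[l]

x_new = rng.standard_normal(n)
print(beta @ (X @ x_new) + b)      # prediction f(x) = sum_i beta_i <x_i, x> + b
```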

34 Kernel Ridge Regression
The centered algorithm just requires centering the kernel and solving one linear system. Can also add the bias directly.
+ Lots of fast equation solvers.
+ Theory supports generalization.
- Requires the full training kernel to compute.
- Requires the full training kernel to predict future points.

