LMS Algorithm in a Reproducing Kernel Hilbert Space
Weifeng Liu, P. P. Pokharel, J. C. Principe
Computational NeuroEngineering Laboratory, University of Florida
Acknowledgment: This work was partially supported by NSF grant ECS and ECS

Outline
- Introduction
- Least Mean Square algorithm (easy)
- Reproducing kernel Hilbert space (tricky)
- The convergence and regularization analysis (important)
- Learning from error models (interesting)

Introduction
- Puskal (2006): Kernel LMS
- Kivinen, Smola (2004): Online learning with kernels (more like leaky LMS)
- Moody, Platt (1990s): Resource-allocating networks (growing and pruning)

LMS (1960, Widrow and Hoff)
Given a sequence of examples {(u_i, d_i)} from U×R, where U is a compact subset of R^L.
The model is assumed to be d_i = w^T u_i + v_i, with v_i the noise.
The cost function: J(w) = Σ_i (d_i - w^T u_i)^2.

LMS
The LMS algorithm:
  w_0 = 0,  e_i = d_i - w_{i-1}^T u_i,  w_i = w_{i-1} + η e_i u_i    (1)
The weight after n iterations:
  w_n = η Σ_{i=1}^{n} e_i u_i    (2)
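A minimal sketch of this recursion in NumPy; the step size, the toy linear system, and the noise level are assumptions for illustration, not values from the talk:

```python
import numpy as np

def lms(U, d, eta=0.05):
    """LMS: w_i = w_{i-1} + eta * e_i * u_i, starting from w_0 = 0."""
    N, L = U.shape
    w = np.zeros(L)
    for i in range(N):
        e = d[i] - w @ U[i]        # a priori error e_i = d_i - w_{i-1}^T u_i
        w = w + eta * e * U[i]     # weight update, eq. (1)
    return w                       # equals eta * sum_i e_i * u_i, eq. (2)

# Toy usage: identify d = w_true^T u + noise
rng = np.random.default_rng(0)
U = rng.standard_normal((500, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
d = U @ w_true + 0.1 * rng.standard_normal(500)
print(np.round(lms(U, d), 2))
```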

Reproducing kernel Hilbert space
A continuous, symmetric, positive-definite kernel κ(u, v), a mapping Φ, and an inner product <·,·>_H.
H is the closure of the span of all Φ(u).
Reproducing property: <f, Φ(u)>_H = f(u).
Kernel trick: <Φ(u), Φ(v)>_H = κ(u, v).
The induced norm: ||f||_H = sqrt(<f, f>_H).

RKHS
Kernel trick:
- an inner product in the feature space
- the similarity measure you need
Mercer's theorem: κ(u, v) = Σ_i λ_i φ_i(u) φ_i(v), with λ_i ≥ 0.
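The kernel trick can be checked numerically. For the homogeneous degree-2 polynomial kernel κ(u, v) = (u^T v)^2 on 2-D inputs, an explicit feature map is Φ(u) = (u_1^2, u_2^2, √2 u_1 u_2); this particular kernel and map are an illustrative example, not part of the talk:

```python
import numpy as np

def phi(u):
    """Explicit feature map for kappa(u, v) = (u^T v)^2 with 2-D inputs."""
    return np.array([u[0] ** 2, u[1] ** 2, np.sqrt(2) * u[0] * u[1]])

u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
print(phi(u) @ phi(v))   # inner product in feature space: 2.25
print((u @ v) ** 2)      # kernel evaluated in input space: 2.25
```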

Common kernels
Gaussian kernel: κ(u, v) = exp(-a ||u - v||^2)
Polynomial kernel: κ(u, v) = (u^T v + 1)^p
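A sketch of both kernels and of the Gram matrix G(i, j) = κ(u_i, u_j) used later by the RBF network; the parameter values a and p are placeholders:

```python
import numpy as np

def gaussian_kernel(u, v, a=1.0):
    """kappa(u, v) = exp(-a * ||u - v||^2)"""
    return np.exp(-a * np.sum((u - v) ** 2))

def polynomial_kernel(u, v, p=3):
    """kappa(u, v) = (u^T v + 1)^p"""
    return (u @ v + 1.0) ** p

def gram_matrix(U, kernel):
    """G(i, j) = kappa(u_i, u_j) over all training pairs."""
    N = len(U)
    return np.array([[kernel(U[i], U[j]) for j in range(N)] for i in range(N)])

U = np.random.default_rng(1).standard_normal((4, 2))
print(gram_matrix(U, gaussian_kernel))
```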

Kernel LMS
Transform the input u_i to Φ(u_i); assume Φ(u_i) ∈ R^M.
The model is assumed to be d_i = Ω^T Φ(u_i) + v_i.
The cost function: J(Ω) = Σ_i (d_i - Ω^T Φ(u_i))^2.

Kernel LMS
The KLMS algorithm:
  Ω_0 = 0,  e_i = d_i - Ω_{i-1}^T Φ(u_i),  Ω_i = Ω_{i-1} + η e_i Φ(u_i)    (3)
The weight after n iterations:
  Ω_n = η Σ_{i=1}^{n} e_i Φ(u_i)    (4)

Kernel LMS
By the kernel trick, the prediction needed for the error never requires Ω explicitly:
  Ω_{n-1}^T Φ(u_n) = η Σ_{i=1}^{n-1} e_i κ(u_i, u_n)    (5)

Kernel LMS
After learning, the input-output relation is
  f(u) = Ω_N^T Φ(u) = η Σ_{i=1}^{N} e_i κ(u_i, u)    (6)
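A minimal KLMS sketch following eqs. (3)-(6): the weight Ω is never formed; only the training inputs and the scaled a priori errors η e_i are stored, and every prediction is the kernel expansion of eq. (6). The Gaussian kernel width, step size, and toy data below are assumptions:

```python
import numpy as np

def kappa(u, v, a=1.0):
    """Gaussian kernel kappa(u, v) = exp(-a * ||u - v||^2)."""
    return np.exp(-a * np.sum((u - v) ** 2))

class KLMS:
    """Kernel LMS: stores (u_i, eta * e_i) pairs instead of an explicit weight."""
    def __init__(self, eta=0.2, a=1.0):
        self.eta, self.a = eta, a
        self.centers, self.coeffs = [], []

    def predict(self, u):
        # f(u) = eta * sum_i e_i * kappa(u_i, u), eqs. (5) and (6)
        return sum(c * kappa(x, u, self.a) for x, c in zip(self.centers, self.coeffs))

    def update(self, u, d):
        e = d - self.predict(u)            # a priori error e_i
        self.centers.append(u)             # one new center per sample
        self.coeffs.append(self.eta * e)   # coefficient eta * e_i, eq. (4)
        return e

# Toy usage: learn d = sin(2u) + noise online
rng = np.random.default_rng(0)
U = rng.uniform(-2, 2, size=(200, 1))
d = np.sin(2 * U[:, 0]) + 0.1 * rng.standard_normal(200)
f = KLMS(eta=0.2, a=2.0)
for u_i, d_i in zip(U, d):
    f.update(u_i, d_i)
print(f.predict(np.array([0.5])), np.sin(1.0))  # prediction vs. the true value
```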

KLMS vs. RBF
KLMS: f(u) = η Σ_i e_i κ(u_i, u)    (7)
RBF: f(u) = Σ_i α_i κ(u_i, u), where α satisfies Gα = d    (8)
G is the Gram matrix: G(i, j) = κ(u_i, u_j).
RBF needs regularization. Does KLMS need regularization?

KLMS vs. LMS
Kernel LMS is nothing but LMS in the feature space, a very high-dimensional reproducing kernel Hilbert space (M > N).
The eigenvalue spread is awful. Does it converge?

Example: MG signal prediction
- Time embedding: 10
- Learning rate:
- Training data: ; 100 test data points
- Gaussian noise, noise variance: 0.04

Example: MG signal prediction
Training and test MSE are compared for: linear LMS, KLMS, RBF (λ = 0), RBF (λ = 0.1), RBF (λ = 1), RBF (λ = 10).

Complexity comparison
              RBF             KLMS        LMS
Computation   O(N^3)          O(N^2)      O(L)
Memory        O(N^2 + N·L)    O(N·L)      O(L)

The asymptotic analysis of convergence (small-step-size theory)
Denote the correlation matrix of the transformed data R_Φ = (1/N) Σ_i Φ(u_i) Φ(u_i)^T.
The correlation matrix is singular, since the feature-space dimension exceeds the number of data (M > N).
Assume a small step size η and the eigendecomposition R_Φ = Σ_k ζ_k P_k P_k^T, with eigenvalues ζ_k ≥ 0 and orthonormal eigenvectors P_k.

The asymptotic analysis of convergence (small-step-size theory)
Denote the weight error ε_n = Ω_n - Ω_o (Ω_o the optimal weight) and its projections ε_n(k) = P_k^T ε_n. We have
  E[ε_n(k)] = (1 - η ζ_k)^n ε_0(k).

The weight stays at the initial place in the zero-eigenvalue directions
If ζ_k = 0, we have ε_n(k) = ε_0(k) for all n: the weight never moves along P_k.

The zero-eigenvalue directions do not affect the MSE
Denote the excess MSE J_n = Σ_k ζ_k E[ε_n(k)^2]; directions with ζ_k = 0 contribute nothing.
The MSE does not care about the null space! It only focuses on the data space!

The minimum-norm initialization
The initialization Ω_0 = 0 gives the minimum-norm possible solution.

Minimum-norm solution
From eq. (4), Ω_n = η Σ_i e_i Φ(u_i) always lies in the span of the transformed training data, so its null-space component stays zero and its norm is the minimum among all weights with the same data-space component.

Learning is Ill-posed

Over-learning

Regularization technique
Learning from finite data is ill-posed; a priori information (smoothness) is needed.
The norm of the function, which indicates the 'slope' of the linear operator, is constrained.
In statistical learning theory, the norm is associated with the confidence of uniform convergence!

Regularized RBF
The cost function: J(f) = Σ_i (d_i - f(u_i))^2 + λ ||f||_H^2,
or equivalently: minimize Σ_i (d_i - f(u_i))^2 subject to ||f||_H^2 ≤ C.

KLMS as a learning algorithm
The model d_i = Ω^T Φ(u_i) + v_i, with a suitably small step size η.
The following inequalities hold: the norm of the KLMS solution is upper bounded.
The proof: H∞ robustness + triangle inequality + matrix transformation + derivative + …

The solution of regularized RBF is α = (G + λI)^{-1} d.
The source of ill-posedness is the inversion of the matrix (G + λI).
The numerical analysis: the smallest eigenvalue of (G + λI) is ς_min + λ, so ||α|| ≤ ||d|| / (ς_min + λ); with λ = 0 and a nearly singular G, the solution norm can blow up.
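A short sketch of that solve; the data, kernel width, and λ below are placeholder choices, with a comment marking where the ill-conditioning enters:

```python
import numpy as np

def regularized_rbf(U, d, lam=0.1, a=1.0):
    """Solve alpha = (G + lam*I)^{-1} d for a Gaussian-kernel RBF network."""
    sq = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    G = np.exp(-a * sq)                      # Gram matrix G(i, j) = kappa(u_i, u_j)
    # lam = 0 recovers the plain RBF network; G alone is often badly conditioned,
    # which is exactly the ill-posedness discussed on this slide.
    return np.linalg.solve(G + lam * np.eye(len(d)), d)

# Toy usage
rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, size=(30, 1))
d = np.sin(3 * U[:, 0]) + 0.2 * rng.standard_normal(30)
print(np.linalg.norm(regularized_rbf(U, d, lam=0.1)))
```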

The solution of KLMS is α_i = η e_i, i.e. f = η Σ_i e_i κ(u_i, ·).
By the inequality above, the norm of the KLMS solution is always upper bounded: no matrix inversion is involved, so KLMS is well-posed without explicit regularization.

Example: MG signal prediction
The solution (weight) norm is compared for KLMS, RBF (λ = 0), RBF (λ = 0.1), RBF (λ = 1), and RBF (λ = 10).
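As a stand-in for the table whose numbers did not survive the transcript, here is a small self-contained comparison of coefficient norms on synthetic data; the data, step size η, and kernel width are assumptions, so only the trend matters (the KLMS norm stays modest, while the unregularized RBF norm typically blows up):

```python
import numpy as np

def kappa(u, v, a=1.0):
    return np.exp(-a * np.sum((u - v) ** 2))

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, size=(30, 1))
d = np.sin(3 * U[:, 0]) + 0.2 * rng.standard_normal(30)

# KLMS coefficients alpha_i = eta * e_i, accumulated online
eta, coeffs = 0.2, []
for i in range(len(d)):
    f_u = sum(c * kappa(U[j], U[i]) for j, c in enumerate(coeffs))
    coeffs.append(eta * (d[i] - f_u))
print(f"KLMS            ||alpha|| = {np.linalg.norm(coeffs):.3e}")

# RBF coefficients alpha = (G + lam*I)^{-1} d
G = np.array([[kappa(ui, uj) for uj in U] for ui in U])
for lam in (0.0, 0.1, 1.0, 10.0):
    alpha = np.linalg.solve(G + lam * np.eye(len(d)), d)
    print(f"RBF (lam={lam:4.1f})  ||alpha|| = {np.linalg.norm(alpha):.3e}")
```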

Conclusion
The LMS algorithm can be readily used in an RKHS to derive nonlinear algorithms.
From the machine learning point of view, the LMS method is a simple way to obtain a regularized solution.

Demo

LMS learning model
An event happens and a decision is made. If the decision is correct, nothing happens. If an error is incurred, a correction is made to the original model.
If we do things right, everything is fine and life goes on. If we do something wrong, lessons are drawn and our abilities are honed.

Would we over-learn?
If we attempt to model the real world mathematically, what dimension is appropriate? Are we likely to over-learn? Are we using the LMS algorithm?
Is it good to remember the past? Is it bad to be a perfectionist?

"If you shut your door to all errors, truth will be shut out."---Rabindranath Tagore