ENEE698A Graduate Seminar
Reproducing Kernel Hilbert Space (RKHS), Regularization Theory, and Kernel Methods
Shaohua (Kevin) Zhou
Center for Automation Research, Department of Electrical and Computer Engineering, University of Maryland, College Park

Overview
Reproducing Kernel Hilbert Space (RKHS)
– From $R^N$ to RKHS
Regularization Theory with RKHS
– Regularization Network (RN)
– Support Vector Regression (SVR)
– Support Vector Classification (SVC)
Kernel Methods
– Kernel Principal Component Analysis (KPCA)
– More examples

Vector Space $R^N$
Positive definite matrix $S = [s_i(j)]$
– $S = [s_1, s_2, \ldots, s_N]$
– Eigensystem: $S = \sum_{n=1}^{N} \lambda_n \phi_n \phi_n^T$
Inner product $\langle f, g \rangle = f^T S^{-1} g$
– $\langle f, g \rangle = \sum_n \lambda_n^{-1} f^T \phi_n \phi_n^T g = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$
– $(u, v) = u^T v$, the regular inner product
Two properties:
– $\langle s_i, s_j \rangle = s_i^T S^{-1} s_j = s_i^T e_j = s_i(j)$
– $\langle s_i, f \rangle = s_i^T S^{-1} f = e_i^T f = f(i)$, with $f = [f(1), f(2), \ldots, f(N)]^T$
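As a quick numerical illustration (not from the slides), the following NumPy sketch builds an arbitrary positive definite $S$ and checks both properties above; the helper name `inner` and the test indices are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
S = A @ A.T + N * np.eye(N)     # positive definite by construction
S_inv = np.linalg.inv(S)

def inner(f, g):
    """<f, g> = f^T S^{-1} g, the inner product induced by S."""
    return f @ S_inv @ g

f = rng.standard_normal(N)
i, j = 1, 3
s_i, s_j = S[:, i], S[:, j]

# Property 1: <s_i, s_j> = s_i(j), i.e. the (i, j) entry of S (S is symmetric).
assert np.isclose(inner(s_i, s_j), S[i, j])
# Property 2: <s_i, f> reproduces the i-th coordinate of f.
assert np.isclose(inner(s_i, f), f[i])
print("both R^N reproducing properties verified")
```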

Reproducing Kernel Hilbert Space (RKHS)
Positive kernel function $k_x(\cdot) = k(x, \cdot)$
– Mercer's theorem
– Eigensystem: $k(x, y) = \sum_{n=1}^{\infty} \lambda_n \phi_n(x) \phi_n(y)$ with $\sum_{n=1}^{\infty} \lambda_n^2 < \infty$
Inner product $\langle \cdot, \cdot \rangle_H$
– $\langle f, g \rangle_H = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$
– $(u, v) = \int u(y) v(y)\,dy$, the regular inner product
Two properties:
– $\langle k_x, k_y \rangle_H = k(x, y)$
– $\langle k_x, f \rangle_H = f(x)$ (the reproducing property)

More on RKHS
Let $f(y)$ be an element of the RKHS
– $f(y) = \sum_{n=1}^{\infty} a_n \phi_n(y)$
– $(f, \phi_n) = a_n$
– $\langle f, f \rangle_H = \sum_{n=1}^{\infty} \lambda_n^{-1} a_n^2$
One particular function $f(y)$
– $f(y) = \sum_{i=1}^{n} c_i k(y, x_i)$
– Is $f(y)$ in the RKHS?
– $\langle f, f \rangle_H = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j k(x_i, x_j) = c^T K c$, with $c = [c_1, c_2, \ldots, c_n]^T$ and $K = [k(x_i, x_j)]$ the Gram matrix

More on RKHS
Nonlinear mapping $\Phi: R^N \to R^{\infty}$
– $\Phi(x) = [\lambda_1^{1/2} \phi_1(x), \ldots, \lambda_n^{1/2} \phi_n(x), \ldots]^T$
Regular inner product in the feature space $R^{\infty}$
– $(\Phi(x), \Phi(y)) = \Phi(x)^T \Phi(y) = \sum_{n=1}^{\infty} \lambda_n^{1/2} \phi_n(x)\, \lambda_n^{1/2} \phi_n(y) = k(x, y) = \langle k_x, k_y \rangle_H$
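The map $\Phi$ is generally infinite dimensional, but for a degree-2 homogeneous polynomial kernel in $R^2$ it is finite and can be written out explicitly. The sketch below is an added example, not part of the slides; the kernel choice $k(x, y) = (x^T y)^2$ and the function name `phi` are assumptions. It checks that the ordinary dot product in feature space reproduces the kernel value.

```python
import numpy as np

def phi(x):
    """Explicit feature map for k(x, y) = (x^T y)^2 with x in R^2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([-0.5, 3.0])

kernel_value = (x @ y) ** 2        # k(x, y) evaluated directly
feature_dot = phi(x) @ phi(y)      # (Phi(x), Phi(y)) in feature space
assert np.isclose(kernel_value, feature_dot)
```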

Kernel Choices
Gaussian kernel or RBF kernel
– $k(x, y) = \exp(-\sigma^{-2} \|x - y\|^2)$
Polynomial kernel
– $k(x, y) = ((x, y) + d)^p$
Construction rules
– Covariance function of a Gaussian process
– $k(x, y) = \int g(x, z) g(z, y)\,dz$
– $k(x, y) = c$, $c > 0$
– $k(x, y) = k_1(x, y) + k_2(x, y)$
– $k(x, y) = k_1(x, y) \cdot k_2(x, y)$
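For concreteness, here is a small NumPy sketch (added, not from the slides) that evaluates the Gaussian and polynomial kernels as Gram matrices on a toy data set; the parameter names `sigma`, `d`, `p` follow the slide's notation, and the data are arbitrary.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma**2)

def polynomial_gram(X, d=1.0, p=2):
    """K[i, j] = ((x_i, x_j) + d)^p."""
    return (X @ X.T + d) ** p

X = np.random.default_rng(1).standard_normal((6, 3))   # 6 samples in R^3
K_rbf = gaussian_gram(X, sigma=2.0)
K_poly = polynomial_gram(X, d=1.0, p=3)
```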

Regularization Theory
Regularization task
– $\min_{f \in H} J(f) = \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \langle f, f \rangle_H$, where $L$ is the loss function and $\langle f, f \rangle_H$ is a stabilizer
Optimal solution
– $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) = [k(x, x_1), \ldots, k(x, x_n)]\, c$
– $\{h_i(x) = k(x, x_i);\ i = 1, \ldots, n\}$ are basis functions
– The optimal coefficients $\{c_i;\ i = 1, \ldots, n\}$ depend on the loss function $L$ and on $\lambda$

Regularization Network (RN)
RN assumes a quadratic loss function
– $\min_{f \in H} J(f) = \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \langle f, f \rangle_H$
Finding $\{c_i\}$
– $[f(x_1), f(x_2), \ldots, f(x_n)]^T = Kc$
– $J(f) = (y - Kc)^T (y - Kc) + \lambda c^T K c$
– $c = (K + \lambda I)^{-1} y$
Practical considerations
– An intercept term: $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) + b$
– Too many coefficients → support vector regression (SVR)
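A minimal sketch of the RN solution $c = (K + \lambda I)^{-1} y$, assuming a Gaussian kernel and a toy 1-D regression problem; the function names `fit_rn`, `predict_rn`, `rbf` and all parameter values are illustrative, not from the slides.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

def fit_rn(X, y, kernel, lam):
    """Return the coefficient vector c of f(x) = sum_i c_i k(x, x_i)."""
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def predict_rn(X_train, c, kernel, X_new):
    """Evaluate f at new points: f(x) = [k(x, x_1), ..., k(x, x_n)] c."""
    return kernel(X_new, X_train) @ c

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

c = fit_rn(X, y, rbf, lam=0.1)
y_hat = predict_rn(X, c, rbf, X[:5])   # fitted values at the first 5 points
```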

Support Vector Regression (SVR)
SVR assumes an $\epsilon$-insensitive loss function
– $\min_{f \in H} J(f) = \sum_{i=1}^{n} |y_i - f(x_i)|_\epsilon + \lambda \langle f, f \rangle_H$, with $|x|_\epsilon = \max(0, |x| - \epsilon)$
Primal problem
– $\min J(f, \xi, \xi^*) = \sum_{i=1}^{n} (\xi_i + \xi_i^*) + \lambda \langle f, f \rangle_H$
– s.t. (1) $f(x_i) - y_i \le \epsilon + \xi_i$; (2) $y_i - f(x_i) \le \epsilon + \xi_i^*$; (3) $\xi_i \ge 0$; (4) $\xi_i^* \ge 0$
– Quadratic programming (QP) → dual problem
– $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
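In practice the QP is rarely coded by hand. As one hedged illustration (added, not from the slides), scikit-learn's SVR solves the same $\epsilon$-insensitive problem, though it weights the trade-off with $C$ (roughly an inverse of $\lambda$) rather than $\lambda$; the data and parameter values below are arbitrary.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

# Only points outside the epsilon tube get nonzero Lagrange multipliers
# and therefore become support vectors.
print("number of support vectors:", len(model.support_))
print("prediction at x=0:", model.predict([[0.0]]))
```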

Support Vector Classification (SVC)
SVC assumes a soft-margin (hinge) loss function
– $\min_{f \in H} J(f) = \sum_{i=1}^{n} |1 - y_i f(x_i)|_+ + \lambda \langle f, f \rangle_H$, with $|x|_+ = \max(0, x)$
– Determine the label of $x$ as $\mathrm{sgn}(\sum_i c_i y_i k(x, x_i) + b)$
Primal problem
– $\min J(f, \xi) = \sum_{i=1}^{n} \xi_i + \lambda \langle f, f \rangle_H$
– s.t. (1) $1 - y_i f(x_i) \le \xi_i$; (2) $\xi_i \ge 0$
– Quadratic programming (QP) → dual problem
– $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
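The decision rule $\mathrm{sgn}(\sum_i c_i y_i k(x, x_i) + b)$ can be checked numerically. The sketch below is an added example using scikit-learn, not the slides' own code: it reads the nonzero dual coefficients $c_i y_i$ from `dual_coef_` and $b$ from `intercept_`, then reproduces the classifier's decision value for a new point. The toy data and the `gamma` value are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

x_new = np.array([0.3, -0.8])
# k(x_new, x_i) for every support vector x_i (sklearn's RBF: exp(-gamma ||.||^2))
k_vec = np.exp(-gamma * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
decision = clf.dual_coef_[0] @ k_vec + clf.intercept_[0]

assert np.isclose(decision, clf.decision_function(x_new[None, :])[0])
print("predicted label:", clf.predict(x_new[None, :])[0])
```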

Kernel Methods
General strategy of kernel methods
– Nonlinear mapping $\Phi: R^N \to R^{\infty}$ embedded in the kernel function
– Linear learning methods employing geometry / linear algebra
– Kernel trick: cast all computations in terms of dot products

Gram Matrix
Gram matrix (dot-product matrix, kernel matrix)
– Covariance matrix of a Gaussian process on any finite sample
– Combines the information of the data and the kernel
– Contains all the information the learning algorithm needs
– $K = [k(x_i, x_j)] = [\Phi(x_i)^T \Phi(x_j)] = \Phi^T \Phi$, where $\Phi = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)]$
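A quick numerical sanity check (added, not from the slides): a Gaussian Gram matrix on arbitrary data is symmetric and positive semidefinite, as a covariance matrix must be.

```python
import numpy as np

X = np.random.default_rng(8).standard_normal((10, 4))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq)                                   # Gaussian Gram matrix
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-10     # eigenvalues >= 0 up to rounding
```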

Geometry in the RKHS
Squared distance in the RKHS
– $(\Phi(x) - \Phi(y))^T (\Phi(x) - \Phi(y)) = \Phi(x)^T \Phi(x) + \Phi(y)^T \Phi(y) - 2\Phi(x)^T \Phi(y) = k(x, x) + k(y, y) - 2k(x, y)$
Distance to the center
– $\mu_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi \mathbf{1}/n$
– $(\Phi(x) - \mu_0)^T (\Phi(x) - \mu_0) = \Phi(x)^T \Phi(x) + \mu_0^T \mu_0 - 2\Phi(x)^T \mu_0 = k(x, x) + \mathbf{1}^T \Phi^T \Phi \mathbf{1}/n^2 - 2\Phi(x)^T \Phi \mathbf{1}/n = k(x, x) + \mathbf{1}^T K \mathbf{1}/n^2 - 2 g_\Phi(x)^T \mathbf{1}/n$
– $g_\Phi(x) = \Phi^T \Phi(x) = [k(x, x_1), \ldots, k(x, x_n)]^T$
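Both distances can be evaluated purely through kernel calls. A small sketch, assuming a Gaussian kernel and arbitrary data (the helper names `dist2` and `dist2_to_center` are invented for the example):

```python
import numpy as np

def k(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)

def dist2(x, y):
    """Squared RKHS distance ||Phi(x) - Phi(y)||^2."""
    return k(x, x) + k(y, y) - 2 * k(x, y)

def dist2_to_center(x, X):
    """Squared distance from Phi(x) to the centroid of {Phi(x_i)}."""
    n = len(X)
    g = np.array([k(x, xi) for xi in X])              # g_Phi(x)
    K = np.array([[k(a, b) for b in X] for a in X])   # Gram matrix
    return k(x, x) + K.sum() / n**2 - 2 * g.sum() / n

X = np.random.default_rng(5).standard_normal((8, 2))
print(dist2(X[0], X[1]), dist2_to_center(X[0], X))
```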

Geometry in the RKHS
Centered inner product in the RKHS
– $(\Phi(x) - \mu_0)^T (\Phi(y) - \mu_0) = \Phi(x)^T \Phi(y) + \mu_0^T \mu_0 - \Phi(x)^T \mu_0 - \Phi(y)^T \mu_0 = k(x, y) + \mathbf{1}^T K \mathbf{1}/n^2 - g_\Phi(x)^T \mathbf{1}/n - g_\Phi(y)^T \mathbf{1}/n$
Centered Gram matrix
– $\hat{K} = [\Phi(x_1) - \mu_0, \ldots, \Phi(x_n) - \mu_0]^T [\Phi(x_1) - \mu_0, \ldots, \Phi(x_n) - \mu_0] = [\Phi - \Phi \mathbf{1}\mathbf{1}^T/n]^T [\Phi - \Phi \mathbf{1}\mathbf{1}^T/n] = [\Phi Q]^T [\Phi Q] = Q^T K Q$, with $Q = I_n - \mathbf{1}\mathbf{1}^T/n$
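Centering therefore never touches the feature vectors themselves; it is a matrix operation on $K$ alone. A one-function sketch (added example; the sanity check at the end uses an arbitrary linear-kernel Gram matrix):

```python
import numpy as np

def center_gram(K):
    """Return Q K Q, the Gram matrix of the mean-centered feature vectors."""
    n = K.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    return Q @ K @ Q

# Sanity check: every row/column of Q K Q sums to ~0, as expected for centered data.
A = np.random.default_rng(6).standard_normal((7, 3))
K = A @ A.T                      # a valid (linear-kernel) Gram matrix
K_hat = center_gram(K)
assert np.allclose(K_hat.sum(axis=0), 0)
```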

Kernel Principal Component Analysis (KPCA)
Kernel PCA
– Mean $\mu_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi \mathbf{1}/n$
– Covariance matrix $C = n^{-1} [\Phi(x_1) - \mu_0, \ldots, \Phi(x_n) - \mu_0][\Phi(x_1) - \mu_0, \ldots, \Phi(x_n) - \mu_0]^T = n^{-1} [\Phi Q][\Phi Q]^T = n^{-1} \Psi \Psi^T$, with $\Psi = \Phi Q$
Eigensystem of $C$
– The 'reciprocal' matrix: $\Psi^T \Psi u = \hat{K} u = \gamma u$
– $n^{-1} \Psi \Psi^T \Psi u = n^{-1} \gamma \Psi u$; $Cv = n^{-1} \gamma v$; $v = \Psi u$
– Normalization: $v^T v = u^T \hat{K} u = \gamma u^T u = \gamma$, so $\tilde{v} = \Psi u \gamma^{-1/2}$

Kernel Principal Component Analysis (KPCA)
Eigen-projection
– $(\Phi(x) - \mu_0)^T \tilde{v} = (\Phi(x) - \mu_0)^T \Phi Q u\, \gamma^{-1/2} = [\Phi(x)^T \Phi Q u - \mathbf{1}^T \Phi^T \Phi Q u / n]\,\gamma^{-1/2} = [g_\Phi(x)^T Q u - \mathbf{1}^T K Q u / n]\,\gamma^{-1/2}$
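Putting the last two slides together: eigendecompose the centered Gram matrix, normalize each eigenvector by $\gamma^{-1/2}$, and project new points with the formula above. The compact sketch below is an added illustration, not the authors' code; the Gaussian kernel, its width, and the function names are assumptions.

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

def kpca_fit(X, sigma=1.0, n_components=2):
    n = X.shape[0]
    K = rbf_gram(X, X, sigma)
    Q = np.eye(n) - np.ones((n, n)) / n
    K_hat = Q @ K @ Q                         # centered Gram matrix
    gamma, U = np.linalg.eigh(K_hat)          # ascending eigenvalues
    gamma, U = gamma[::-1][:n_components], U[:, ::-1][:, :n_components]
    return {"X": X, "sigma": sigma, "Q": Q, "K": K, "U": U, "gamma": gamma}

def kpca_transform(model, X_new):
    """Eigen-projection: [g_Phi(x)^T Q u - 1^T K Q u / n] * gamma^{-1/2}."""
    X, Q, K, U, gamma = (model[key] for key in ("X", "Q", "K", "U", "gamma"))
    n = X.shape[0]
    G = rbf_gram(X_new, X, model["sigma"])    # each row is g_Phi(x)^T
    proj = G @ Q @ U - np.ones((1, n)) @ K @ Q @ U / n
    return proj / np.sqrt(gamma)

X = np.random.default_rng(7).standard_normal((30, 2))
model = kpca_fit(X, sigma=2.0, n_components=2)
Z = kpca_transform(model, X)                  # 30 x 2 kernel PCA feature matrix
```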

Kernel Principal Component Analysis (KPCA)
Contour plots of kernel PCA features (figure not reproduced in this transcript)

More Examples of Kernel Methods
Examples
– Kernel Fisher Discriminant Analysis (KFDA)
– Kernel K-Means Clustering
– Spectral Clustering and Graph Cutting
– Kernel …
– Kernel Independent Component Analysis (KICA)?

Summary of Kernel Methods
Pros and cons
– Nonlinear embedding (pro)
– Linear algorithm (pro)
– Large storage requirement (con)
– Computational inefficiency (con)
Important issues
– Kernel selection and design