Kernels CMPUT 466/551 Nilanjan Ray

Agenda
- Kernel functions in SVM: a quick recapitulation
- Kernels in regression
- Kernels in k-nearest neighbor classifier
- Kernel function: a deeper understanding
- A case study

Kernel Functions: SVM

The dual cost function:
$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

The non-linear classifier in dual variables:
$$f(x) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\Big)$$

The kernel function K is symmetric and positive (semi)definite by definition.
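As a quick numerical check of that last property, here is a minimal Python sketch (my own illustration, not from the slides; the RBF kernel, its width gamma, and the random data are assumptions) that builds a Gram matrix and verifies it is symmetric and positive semidefinite:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gram matrix with entries K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))   # 20 points in a 2-D input space
K = rbf_kernel(X, X)

print(np.allclose(K, K.T))                    # symmetric: True
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # PSD up to round-off: True
```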

Input Space to Feature Space

Picture taken from: Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini

Input Space to Feature Space: Example
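Since the slide's picture is not reproduced here, a standard concrete instance (assumed, not necessarily the one on the slide) is the degree-2 polynomial kernel $k(x, y) = (x^T y)^2$ on 2-D inputs, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below verifies that the kernel computed in the input space equals the inner product in the feature space:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k_poly2(x, y):
    """The same kernel evaluated directly in the input space."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(y)))  # inner product in feature space: 1.0
print(k_poly2(x, y))           # same value via the kernel:      1.0
```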

Kernel Ridge Regression

Consider the regression problem: fit the function
$$f(x) = \sum_{m=1}^{M} \beta_m h_m(x) = h(x)^T \beta$$
to N data points $(x_i, y_i)$. The basis functions $h_m$ are non-linear in x.

Form the cost function:
$$J(\beta) = (y - H\beta)^T (y - H\beta) + \lambda \beta^T \beta,$$
where H is the N-by-M matrix with entries $H_{im} = h_m(x_i)$.

The solution is given by:
$$\hat{\beta} = (H^T H + \lambda I)^{-1} H^T y.$$

Using the identity
$$(H^T H + \lambda I)^{-1} H^T = H^T (H H^T + \lambda I)^{-1}$$
(Ex. prove this identity), we have
$$\hat{\beta} = H^T (H H^T + \lambda I)^{-1} y = H^T \alpha, \qquad \alpha = (K + \lambda I)^{-1} y.$$

We have defined $K = H H^T$; note that K is the kernel (Gram) matrix, with entries $K_{ij} = h(x_i)^T h(x_j) = K(x_i, x_j)$.

Finally, the solution is given by
$$\hat{f}(x) = h(x)^T \hat{\beta} = \sum_{i=1}^{N} \alpha_i K(x, x_i).$$
The basis functions h have disappeared!
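To make the kernelized solution concrete, here is a minimal numpy sketch of kernel ridge regression (an illustration under assumptions: the RBF kernel, its width, the penalty lambda, and the toy sine data are all mine, not the slides'):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))             # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)  # noisy targets

lam = 0.1
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # alpha = (K + lambda I)^{-1} y

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = rbf_kernel(X_test, X) @ alpha                # f(x) = sum_i alpha_i K(x, x_i)
print(f_test)
```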

Kernel k-Nearest Neighbor Classifier

The basis functions h are typically non-linear in x. Consider the k-NN classification problem in the feature space. The squared Euclidean distance in the feature space can be written as follows:
$$\|h(x) - h(y)\|^2 = h(x)^T h(x) - 2\, h(x)^T h(y) + h(y)^T h(y) = K(x, x) - 2 K(x, y) + K(y, y).$$
Once again, the basis functions h have disappeared! Note also that a kernel function essentially provides a similarity between two points in the input space (the opposite of a distance measure!).
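A minimal sketch of a kernel k-NN classifier built on that feature-space distance (again my illustration; the RBF kernel and the XOR-style toy data are assumptions):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_knn_predict(X_train, y_train, X_test, k=3, gamma=1.0):
    # Squared feature-space distance: K(x,x) - 2 K(x,y) + K(y,y)
    K_tt = np.diag(rbf_kernel(X_test, X_test, gamma))[:, None]
    K_tr = rbf_kernel(X_test, X_train, gamma)
    K_rr = np.diag(rbf_kernel(X_train, X_train, gamma))[None, :]
    d2 = K_tt - 2.0 * K_tr + K_rr
    # Majority vote among the k nearest training points
    nn = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nn]
    return np.array([np.bincount(row).argmax() for row in votes])

rng = np.random.default_rng(2)
X_train = rng.normal(size=(40, 2))
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)  # XOR-style labels
X_test = rng.normal(size=(5, 2))
print(kernel_knn_predict(X_train, y_train, X_test, k=5))
```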

The Kernel Architecture

Picture taken from: Learning with Kernels by Schölkopf and Smola

Inside Kernels

Picture taken from: Learning with Kernels

Inside Kernels…

Given a point x in the input space, the function k(·, x) is essentially a function of one free argument. So, x is mapped into a function space, known as a reproducing kernel Hilbert space (RKHS). When we measure the similarity of two points x and y in the input space, we are actually measuring the similarity between the two functions k(·, x) and k(·, y) in the RKHS. How is this similarity defined in the RKHS? By a (defined) inner product in the RKHS, which satisfies the reproducing property:
$$\langle k(\cdot, x), k(\cdot, y) \rangle = k(x, y).$$

All the solutions we have obtained so far have the form
$$f(\cdot) = \sum_{i=1}^{N} \alpha_i \, k(\cdot, x_i),$$
which means these solutions are functions in the RKHS. Functions in the RKHS are nice: they are smooth, and they have a finite-dimensional representation, which is good for computations and practical solutions. See Learning with Kernels for more; read G. Wahba's work to learn more about RKHS vis-à-vis machine learning.
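A small numpy sketch of these last points (my illustration, with assumed data): a function in the RKHS is represented by finitely many coefficients alpha_i and centers x_i, evaluation at x is the inner product with k(·, x), and the squared RKHS norm has the finite-dimensional form alpha^T K alpha:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 2))   # centers x_i
alpha = rng.normal(size=10)    # coefficients alpha_i

# f lives in the RKHS: f(.) = sum_i alpha_i k(., x_i)
def f(x):
    return rbf_kernel(np.atleast_2d(x), X)[0] @ alpha

# Evaluation is an inner product with k(., x) (reproducing property):
# f(x) = <f, k(., x)> = sum_i alpha_i k(x_i, x)
x = np.array([0.5, -0.2])
print(f(x))

# The squared RKHS norm of f reduces to the finite-dimensional form alpha^T K alpha
K = rbf_kernel(X, X)
print(alpha @ K @ alpha)
```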