Lecture Slides for
INTRODUCTION TO MACHINE LEARNING, 3rd Edition
ETHEM ALPAYDIN © The MIT Press, 2014

CHAPTER 13: KERNEL MACHINES

Kernel Machines
- Discriminant-based: no need to estimate densities first
- Define the discriminant in terms of support vectors
- The use of kernel functions: application-specific measures of similarity
- No need to represent instances as vectors
- Convex optimization problems with a unique solution

Optimal Separating Hyperplane (Cortes and Vapnik, 1995; Vapnik, 1995)

Margin
- Distance from the discriminant to the closest instances on either side
- Distance of x^t to the hyperplane is |w^T x^t + w_0| / ||w|| (see the sketch below)
- We require r^t (w^T x^t + w_0) / ||w|| >= rho for all t
- For a unique solution, fix rho ||w|| = 1; then, to maximize the margin: min (1/2)||w||^2 subject to r^t (w^T x^t + w_0) >= +1 for all t
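A minimal numpy sketch of the distance computation above; the weight vector w, bias w_0, and instances are hypothetical values chosen only to illustrate the formula |g(x)| / ||w||.

```python
import numpy as np

# Hypothetical hyperplane parameters (not from any trained model)
w = np.array([2.0, -1.0])
w0 = 0.5
X = np.array([[1.0, 1.0], [0.0, 2.0], [3.0, -1.0]])

g = X @ w + w0                        # discriminant values g(x) = w^T x + w_0
dist = np.abs(g) / np.linalg.norm(w)  # geometric distance |g(x)| / ||w||
print(dist)
```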

Margin (figure)

- To find the optimal hyperplane, write the Lagrangian L_p = (1/2)||w||^2 - Sum_t alpha^t [r^t (w^T x^t + w_0) - 1], with alpha^t >= 0
- Setting dL_p/dw = 0 gives w = Sum_t alpha^t r^t x^t; setting dL_p/dw_0 = 0 gives Sum_t alpha^t r^t = 0
- The dual is to maximize L_d = Sum_t alpha^t - (1/2) Sum_t Sum_s alpha^t alpha^s r^t r^s (x^t)^T x^s subject to Sum_t alpha^t r^t = 0 and alpha^t >= 0

- Most alpha^t are 0; only a small number have alpha^t > 0, and the corresponding x^t are the support vectors
- The weight vector is w = Sum_t alpha^t r^t x^t, and w_0 can be computed from any support vector as w_0 = r^t - w^T x^t
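A short scikit-learn sketch of this sparsity, on synthetic data: after fitting, only the instances with alpha^t > 0 are stored, and dual_coef_ holds the products alpha^t r^t.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated data; a large C approximates a hard margin
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1000.0).fit(X, y)

print(svm.support_.size, "support vectors out of", X.shape[0], "instances")
print(svm.support_vectors_[:3])   # the x^t with alpha^t > 0
print(svm.dual_coef_[:, :3])      # the corresponding alpha^t r^t values
```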

Soft Margin Hyperplane
- Not linearly separable: introduce slack variables xi^t >= 0 and relax the constraints to r^t (w^T x^t + w_0) >= 1 - xi^t
- Soft error: Sum_t xi^t
- New primal is L_p = (1/2)||w||^2 + C Sum_t xi^t - Sum_t alpha^t [r^t (w^T x^t + w_0) - 1 + xi^t] - Sum_t mu^t xi^t
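A brief sketch of how the penalty C trades the margin against the soft error; the data set and the C values below are illustrative only.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Small C tolerates more slack (more support vectors); large C penalizes
# violations heavily and fits the boundary more tightly.
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C:7.2f}  support vectors: {svm.support_.size}")
```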


Hinge Loss
- For label r^t in {-1, +1} and output y^t = g(x^t): L_hinge(y^t, r^t) = 0 if y^t r^t >= 1, and 1 - y^t r^t otherwise
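A direct numpy transcription of the definition above, with illustrative values: correct and outside the margin (loss 0), correct but inside the margin (fractional loss), and misclassified (loss above 1).

```python
import numpy as np

def hinge_loss(y, r):
    """Hinge loss: 0 when y*r >= 1, else 1 - y*r.
    y: discriminant outputs g(x^t); r: labels in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * r)

y = np.array([2.0, 0.5, -0.3])   # outputs for three instances
r = np.array([1, 1, 1])          # all truly positive
print(hinge_loss(y, r))          # [0.  0.5  1.3]
```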

ν-SVM
- An equivalent formulation in which the parameter ν in (0, 1] controls the fraction of support vectors: min (1/2)||w||^2 - ν rho + (1/N) Sum_t xi^t subject to r^t (w^T x^t + w_0) >= rho - xi^t, xi^t >= 0, rho >= 0
- ν is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors
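scikit-learn implements this formulation as NuSVC; a small sketch on synthetic data showing how the fraction of support vectors grows with ν (data and ν values are illustrative).

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
for nu in (0.1, 0.3, 0.6):
    svm = NuSVC(nu=nu, kernel="rbf").fit(X, y)
    print(f"nu={nu}  fraction of support vectors: {svm.support_.size / len(X):.2f}")
```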

Kernel Trick
- Preprocess input x by basis functions: z = φ(x), g(z) = w^T z, so g(x) = w^T φ(x)
- The SVM solution is w = Sum_t alpha^t r^t φ(x^t), so that g(x) = Sum_t alpha^t r^t φ(x^t)^T φ(x) = Sum_t alpha^t r^t K(x^t, x)
- The kernel K(x^t, x) = φ(x^t)^T φ(x) is computed directly, without ever forming φ(x)
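A minimal numpy sketch of the kernelized discriminant g(x) = Sum_t alpha^t r^t K(x^t, x) + w_0; the support vectors, multipliers, and bias are hypothetical stand-ins for what a dual solver would return.

```python
import numpy as np

def poly_kernel(X, Z, q=2):
    # K(x, z) = (x^T z + 1)^q, for all pairs of rows
    return (X @ Z.T + 1.0) ** q

# Hypothetical trained quantities (in practice, from the dual solution)
X_sv = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # support vectors
r_sv = np.array([1.0, -1.0, 1.0])                       # their labels
alpha = np.array([0.5, 0.8, 0.3])                       # alpha^t > 0
w0 = 0.1

def g(X_new):
    # g(x) = sum_t alpha^t r^t K(x^t, x) + w0 -- phi(x) is never formed
    return poly_kernel(X_new, X_sv) @ (alpha * r_sv) + w0

print(g(np.array([[0.5, 0.5], [2.0, 2.0]])))
```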

Vectorial Kernels
- Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q

Vectorial Kernels
- Radial-basis functions: K(x^t, x) = exp[-||x - x^t||^2 / (2 s^2)]
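Both kernels are a few lines of numpy; a sketch with illustrative parameters q and s.

```python
import numpy as np

def polynomial_kernel(X, Z, q=2):
    # K(x, z) = (x^T z + 1)^q
    return (X @ Z.T + 1.0) ** q

def rbf_kernel(X, Z, s=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 s^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

X = np.random.RandomState(0).randn(5, 3)
print(polynomial_kernel(X, X).shape)   # (5, 5) Gram matrix
print(rbf_kernel(X, X)[0, 0])          # 1.0 on the diagonal
```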

Defining Kernels
- Kernel "engineering": defining good measures of similarity
- String kernels, graph kernels, image kernels, ...
- Empirical kernel map: define a set of templates m_i and a score function s(x, m_i); then φ(x^t) = [s(x^t, m_1), s(x^t, m_2), ..., s(x^t, m_M)] and K(x, x^t) = φ(x)^T φ(x^t)
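A sketch of the empirical kernel map; the templates and the score function below are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.RandomState(0)
templates = rng.randn(4, 2)   # m_1, ..., m_M (hypothetical templates)

def score(x, m):
    # One possible score function s(x, m); any similarity measure works
    return np.exp(-np.sum((x - m) ** 2))

def phi(x):
    # phi(x) = [s(x, m_1), ..., s(x, m_M)]
    return np.array([score(x, m) for m in templates])

x1, x2 = rng.randn(2), rng.randn(2)
K = phi(x1) @ phi(x2)          # K(x1, x2) = phi(x1)^T phi(x2)
print(phi(x1), K)
```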

Multiple Kernel Learning
- Fixed kernel combination: K(x, x^t) = K_1(x, x^t) + K_2(x, x^t), or a product K_1(x, x^t) K_2(x, x^t) (see the sketch below)
- Adaptive kernel combination: K(x, x^t) = Sum_i eta_i K_i(x, x^t), learning the weights eta_i together with the discriminant
- Localized kernel combination: the weights eta_i(x) are functions of the input, gating the kernels locally
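A sketch of the fixed combination: a weighted sum of Gram matrices is again a valid kernel and can be passed to an SVM as a precomputed kernel. The weights eta are fixed here; adaptive MKL would learn them.

```python
import numpy as np

def linear_kernel(X, Z):
    return X @ Z.T

def rbf_kernel(X, Z, s=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

X = np.random.RandomState(0).randn(6, 3)
eta = [0.7, 0.3]   # fixed, illustrative weights
K = eta[0] * linear_kernel(X, X) + eta[1] * rbf_kernel(X, X)
print(K.shape)     # combined Gram matrix, usable with SVC(kernel="precomputed")
```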

Multiclass Kernel Machines
- 1-vs-all
- Pairwise separation
- Error-Correcting Output Codes (Section 17.5)
- Single multiclass optimization
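A short sketch of the 1-vs-all approach with scikit-learn (SVC itself implements pairwise, one-vs-one, separation internally for multiclass problems); the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(len(ova.estimators_), "binary SVMs, one per class")
```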

SVM for Regression
- Use a linear model (possibly kernelized): f(x) = w^T x + w_0
- Use the ε-sensitive error function: e_ε(r^t, f(x^t)) = 0 if |r^t - f(x^t)| <= ε, and |r^t - f(x^t)| - ε otherwise
- Errors smaller than ε are ignored, defining a tube of width 2ε around the regression estimate
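A sketch with scikit-learn's SVR, whose epsilon parameter is the half-width of the insensitive tube: instances inside the tube incur no loss and do not become support vectors. Data and parameter values are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]
r = np.sin(X).ravel() + 0.1 * rng.randn(80)   # noisy 1-D regression data

for eps in (0.01, 0.1, 0.5):
    f = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, r)
    print(f"epsilon={eps}  support vectors: {f.support_.size}")
```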


Kernel Regression
- Polynomial kernel
- Gaussian kernel
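The same regression machinery with the two vectorial kernels above; a sketch on synthetic data with illustrative kernel parameters.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(-3, 3, 100))[:, None]
r = X.ravel() ** 3 + rng.randn(100)   # cubic trend plus noise

poly = SVR(kernel="poly", degree=3, C=10.0).fit(X, r)
gauss = SVR(kernel="rbf", gamma=0.5, C=10.0).fit(X, r)
print(poly.predict(X[:3]))
print(gauss.predict(X[:3]))
```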

Kernel Machines for Ranking
- We require not only that the scores be in the correct order, but that they differ by at least a unit margin
- Linear case: if x^u should be ranked before x^v, we require w^T x^u - w^T x^v >= 1, with a slack when this is violated
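A sketch of the standard reduction to a binary problem on pairwise differences: if x^u should score above x^v, the difference x^u - x^v is a positive example for a linear SVM. Everything below (the data, the number of sampled pairs) is illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
true_w = np.array([1.0, -2.0, 0.5, 0.0])
scores = X @ true_w                      # ground-truth ranking scores

pairs, labels = [], []
for _ in range(300):
    u, v = rng.choice(50, 2, replace=False)
    sign = 1.0 if scores[u] > scores[v] else -1.0
    pairs.append(sign * (X[u] - X[v]))   # oriented so the pair is "positive"
    labels.append(1)

pairs, labels = np.array(pairs), np.array(labels)
half = len(pairs) // 2                   # LinearSVC needs two classes:
pairs[half:] *= -1.0                     # mirror half of the pairs
labels[half:] = -1

w = LinearSVC(fit_intercept=False, C=1.0).fit(pairs, labels).coef_.ravel()
print(np.corrcoef(X @ w, scores)[0, 1])  # learned order matches the truth
```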

One-Class Kernel Machines
- Consider a sphere with center a and radius R that should enclose the data: min R^2 + C Sum_t xi^t subject to ||x^t - a||^2 <= R^2 + xi^t, xi^t >= 0
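scikit-learn's OneClassSVM solves a kernelized one-class problem of this kind; a sketch where nu bounds the fraction of training instances left outside the description. The data is synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = 0.5 * rng.randn(200, 2)              # "normal" data around the origin
X_out = rng.uniform(-4, 4, (20, 2))      # scattered outliers

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X)
print((oc.predict(X) == 1).mean())       # most normal points accepted
print((oc.predict(X_out) == -1).mean())  # most outliers rejected
```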


Large Margin Nearest Neighbor
- Learns the matrix M of a Mahalanobis metric: D(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)
- For three instances i, j, and l, where i and j are of the same class and l is of a different class, we require D(x_i, x_l) >= D(x_i, x_j) + 1
- If this is not satisfied, we incur a slack for the difference, and M is learned to minimize the sum of such slacks over all (i, j, l) triples, with j and l among the k neighbors of i
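A numpy sketch of the triplet constraint and its slack; the matrix M and the three instances are illustrative (M = I gives plain squared Euclidean distance).

```python
import numpy as np

def mahalanobis(xi, xj, M):
    d = xi - xj
    return d @ M @ d                  # D(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)

def triplet_slack(xi, xj, xl, M):
    # Slack for D(x_i, x_l) >= D(x_i, x_j) + 1: positive when the
    # differently-labeled x_l is not far enough from x_i
    return max(0.0, 1.0 + mahalanobis(xi, xj, M) - mahalanobis(xi, xl, M))

M = np.eye(2)                         # illustrative metric
xi, xj, xl = np.zeros(2), np.array([0.5, 0.0]), np.array([0.6, 0.0])
print(triplet_slack(xi, xj, xl, M))   # > 0: constraint violated, incurs slack
```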

Learning a Distance Measure
- LMNN algorithm (Weinberger and Saul, 2009)
- LMCA algorithm (Torresani and Lee, 2007) uses a similar approach, writing M = L^T L and learning L

Kernel Dimensionality Reduction
- Kernel PCA does PCA on the kernel matrix (equivalent to canonical PCA with a linear kernel)
- Kernel LDA, kernel CCA
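A sketch with scikit-learn's KernelPCA: with a linear kernel the embedding coincides (up to sign) with canonical PCA, while an RBF kernel gives a nonlinear embedding. Parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

Z_lin = KernelPCA(n_components=2, kernel="linear").fit_transform(X)
Z_pca = PCA(n_components=2).fit_transform(X)
print(np.abs(Z_lin[0]), np.abs(Z_pca[0]))   # equal up to sign

Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X)
print(Z_rbf.shape)                           # nonlinear 2-D embedding
```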