CHAPTER 13: Alpaydin: Kernel Machines


CHAPTER 13: Alpaydin: Kernel Machines
Significantly edited and extended by Ch. Eick
COSC 6342: Support Vectors and using SVMs/Kernels for Regression, PCA, and Outlier Detection
Coverage in Spring 2011: transparencies that are not marked "cover" will be skipped!

cover Kernel Machines
- Discriminant-based: no need to estimate densities first; the focus is on learning the decision boundary, not on learning a large number of parameters of a density function.
- Define the discriminant in terms of support vectors, a subset of the training examples.
- Use of kernel functions: application-specific measures of similarity. Many kernels map the data to a higher-dimensional space in which linear discrimination is simpler!
- No need to represent instances as vectors; can deal with other types, e.g. graphs, or sequences in bioinformatics for which similarity measures such as edit distance exist.
- Convex optimization problems with a unique solution.
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
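To make the "map to a higher-dimensional space" idea concrete, here is a small illustrative sketch (not from the slides; the helper names phi and poly_kernel are made up for this example) showing that the degree-2 polynomial kernel evaluated in the original 2-D space equals an ordinary dot product after an explicit feature mapping:

```python
import numpy as np

def phi(x):
    # Explicit mapping for the degree-2 polynomial kernel (x.y + 1)^2 in 2-D
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    # Kernel evaluated directly in the original 2-D space
    return (np.dot(x, y) + 1) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(x, y))        # 4.0
print(np.dot(phi(x), phi(y)))   # 4.0 as well: a dot product in the 6-D mapped space
```

The kernel never constructs the 6-D vectors explicitly, which is why kernel machines can afford very high-dimensional (even infinite-dimensional) mappings.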

Optimal Separating Hyperplane (Cortes and Vapnik, 1995; Vapnik, 1995)

Margin
Distance from the discriminant to the closest instances on either side.
Distance of xt to the hyperplane is |wT xt + w0| / ||w||.
We require rt (wT xt + w0) / ||w|| ≥ ρ for all t.
For a unique solution, fix ρ||w|| = 1; then, to maximize the margin, minimize ||w||.
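For reference, a standard statement of the resulting optimization problem (the usual textbook formulation, restated here rather than quoted from the slide):

```latex
\min_{\mathbf{w},\,w_0}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^{2}
\quad \text{subject to} \quad
r^{t}\,(\mathbf{w}^{\top}\mathbf{x}^{t} + w_{0}) \ge 1, \qquad t = 1,\dots,N .
```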

cover Margin
[Figure: optimal separating hyperplane; the margin between the two class boundaries is 2/||w||. Remark: circled points are support vectors.]

cover Alternative formulation of the optimization problem: αt > 0
Only the support vectors (examples with αt > 0) are relevant for determining the hyperplane; this can be used to reduce the complexity of the SVM optimization procedure by using only the support vectors instead of the whole dataset.
If you are interested in understanding the mathematical details: read the paper on PCA regression.
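For completeness, a standard statement of the corresponding dual problem (maximize over the Lagrange multipliers αt; this follows the usual textbook derivation and is restated here, not copied from the slide):

```latex
\max_{\boldsymbol{\alpha}}\ \sum_{t}\alpha^{t}
\;-\;\frac{1}{2}\sum_{t}\sum_{s}\alpha^{t}\alpha^{s}\, r^{t} r^{s}\,(\mathbf{x}^{t})^{\top}\mathbf{x}^{s}
\quad \text{subject to} \quad
\sum_{t}\alpha^{t} r^{t} = 0,\qquad \alpha^{t}\ge 0,
```

with the weight vector recovered as w = Σt αt rt xt, so only the terms with αt > 0 (the support vectors) contribute.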

Most αt are 0; only a small number have αt > 0, and these examples are the support vectors. Idea: if we remove all examples that are not support vectors from the dataset, we still obtain the same hyperplane, but we can compute it more quickly!
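A small sketch of this idea (using scikit-learn, which is an assumption of this example and not something prescribed by the slides): train a linear SVM, keep only the support vectors, retrain, and observe that essentially the same hyperplane comes out.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Linearly separable toy data
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

svm_full = SVC(kernel="linear", C=1e3).fit(X, y)
sv_idx = svm_full.support_            # indices of the support vectors (alpha_t > 0)
print("support vectors:", len(sv_idx), "out of", len(X))

# Retrain using only the support vectors
svm_sv = SVC(kernel="linear", C=1e3).fit(X[sv_idx], y[sv_idx])

print("w (full data):   ", svm_full.coef_, svm_full.intercept_)
print("w (support only):", svm_sv.coef_, svm_sv.intercept_)  # essentially identical
```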

Soft Margin Hyperplane
Not linearly separable: introduce slack variables ξt ≥ 0 and relax the constraints to rt (wT xt + w0) ≥ 1 − ξt.
Soft error: Σt ξt.
New primal: minimize ½||w||² + C Σt ξt subject to the relaxed constraints.

cover Soft Margin SVM
[Figure: blue-class hyperplane, SVM decision boundary, and red-class hyperplane; the slack errors ξt are indicated.]
Note that points which lie on the correct side of their class's hyperplane have an error of 0.

cover Multiclass Kernel Machines
1-vs-all (the "popular" choice)
Pairwise separation
Error-Correcting Output Codes (Section 17.5)
Single multiclass optimization
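A brief sketch (again scikit-learn-based, an assumption for illustration) contrasting the first two strategies: 1-vs-all trains one binary SVM per class, pairwise separation trains one per pair of classes, i.e. K(K−1)/2 in total.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

# 1-vs-all: one binary SVM per class
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

# Pairwise separation: one binary SVM per pair of classes
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 for K=3 classes
```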

COSC 6342: Using SVMs for Regression, PCA, and Outlier Detection

cover Example: Flatness in Prediction
For example, let us assume we predict the price of a house based on the number of rooms, and we have 2 functions:
f1: #rooms*10000 + 10000
f2: #rooms*20000 − 10000
Both agree in their prediction for a two-room house, costing 30000.
f1 is flatter than f2; f1 is less sensitive to noise.
Typically, flatness is measured using ||w||, which is 10000 for f1 and 20000 for f2; the lower ||w|| is, the flatter f is.
Consequently, ||w|| is minimized in support vector regression; however, in most cases ||w||² is minimized instead, to get rid of the sqrt function.
Reminder: ||w|| = sqrt(wTw).
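A tiny numerical sketch of the flatness argument (illustrative only, using the two functions from the slide): perturbing the input by the same amount changes the flatter function's output less.

```python
# f1 and f2 from the slide: both predict 30000 for a 2-room house
f1 = lambda rooms: rooms * 10000 + 10000   # ||w|| = 10000 (flatter)
f2 = lambda rooms: rooms * 20000 - 10000   # ||w|| = 20000

rooms, noise = 2.0, 0.1                    # e.g. a noisy room count of 2.1
print(f1(rooms), f2(rooms))                # 30000.0 30000.0
print(f1(rooms + noise) - f1(rooms))       # 1000.0  -> small change
print(f2(rooms + noise) - f2(rooms))       # 2000.0  -> twice as sensitive
```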

cover SVM for Regression
"Flatness" of f: the smaller ||w||, the smoother f is / the less sensitive f is to noise; also sometimes called regularization.
Remember: Dataset = {(x1,r1), ..., (xn,rn)}
Use a linear model (possibly kernelized): f(x) = wT x + w0
Use the ε-insensitive error function: deviations smaller than ε are not penalized at all, and larger deviations are penalized linearly via slack variables ξt+, ξt−.
Minimize ½||w||² + C Σt (ξt+ + ξt−) subject to
rt − (wT xt + w0) ≤ ε + ξt+,  (wT xt + w0) − rt ≤ ε + ξt−,  ξt+, ξt− ≥ 0,  for t = 1,..,n.
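A minimal usage sketch with scikit-learn's SVR (an assumption; the slides do not prescribe a library), where epsilon is the half-width of the insensitive tube and C trades flatness against errors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
r = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)   # noisy targets

# epsilon-insensitive tube of width 0.1; larger C = less flat, fewer errors tolerated
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, r)

print("support vectors:", len(svr.support_))   # only points on or outside the tube
print("prediction at x=2.5:", svr.predict([[2.5]]))
```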

cover SVMs for Regression
For a more thorough discussion see: http://www2.cs.uh.edu/~ceick/ML/SVM-Regression.pdf
or, for a more high-level discussion, see: http://kernelsvm.tripod.com/

cover Kernel Regression
[Figure: regression fits obtained with a polynomial kernel and with a Gaussian kernel.]
Again we can employ mappings Φ to a higher-dimensional space and kernel functions K, because the regression coefficients can be computed using just the Gram matrix of the Φ(xi)·Φ(xj). In this case we obtain regression functions which are linear in the mapped space, but not linear in the original space, as depicted above!
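The two kernels named on the slide have the standard forms (restated here for reference; q is the polynomial degree and s the spread parameter, both chosen by the user):

```latex
K(\mathbf{x},\mathbf{y}) = (\mathbf{x}^{\top}\mathbf{y} + 1)^{q}
\qquad \text{(polynomial kernel)}
\qquad\qquad
K(\mathbf{x},\mathbf{y}) = \exp\!\left(-\,\frac{\lVert \mathbf{x}-\mathbf{y}\rVert^{2}}{2 s^{2}}\right)
\qquad \text{(Gaussian kernel)}
```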

cover One-Class Kernel Machines for Outlier Detection
Consider a sphere with center a and radius R that should enclose the normal data; instances falling outside the sphere are treated as outliers.
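The corresponding optimization problem (the standard one-class / support vector data description formulation, restated here rather than quoted from the slide) trades the size of the sphere against slack for points falling outside it:

```latex
\min_{R,\ \mathbf{a},\ \boldsymbol{\xi}}\ \ R^{2} + C\sum_{t}\xi^{t}
\quad \text{subject to} \quad
\lVert \mathbf{x}^{t}-\mathbf{a}\rVert^{2} \le R^{2} + \xi^{t},
\qquad \xi^{t}\ge 0 .
```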

cover Again, kernel functions / a mapping to a higher-dimensional space can be employed, in which case the shapes of the class boundaries change as depicted.
[Figure: resulting one-class boundaries.]

cover Motivation: Kernel PCA
Example: we want to cluster the following dataset using K-means, which will be difficult; idea: change the coordinate system using a few new, non-linear features.
Remark: This approach uses kernels, but is unrelated to SVMs!

cover Kernel PCA
Kernel PCA does PCA on the kernel matrix (equivalent to doing PCA in the mapped space, selecting some orthogonal eigenvectors in the mapped space as the new coordinate system).
It is a kind of PCA using non-linear transformations of the original space; moreover, the vectors of the chosen new coordinate system are usually not orthogonal in the original space.
Then, ML/DM algorithms are used in the Reduced Feature Space.
[Diagram: Original Space → Feature Space → Reduced Feature Space (fewer dimensions); the new features are a few linear combinations of the features in the Feature Space, obtained by PCA.]
Illustration: http://en.wikipedia.org/wiki/Kernel_principal_component_analysis
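A minimal sketch of the motivating pipeline (scikit-learn-based, an assumption for illustration; the gamma value is just one that tends to work for this toy dataset): map concentric-circle data with RBF-kernel PCA, then run K-means in the reduced feature space.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

# Two concentric rings: hard for K-means in the original 2-D coordinates
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel: keep 2 non-linear features as the new coordinate system
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)

# K-means in the reduced feature space now separates the two rings well
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print("agreement with true rings:", (labels == y).mean())  # near 1.0 (or 0.0 if cluster labels are flipped)
```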