Kernels Usman Roshan CS 675 Machine Learning

Feature space representation. Consider the two classes shown in the slide's figure: the data cannot be separated by a hyperplane.

Feature space representation. Suppose we square each coordinate, in other words $(x_1, x_2) \mapsto (x_1^2, x_2^2)$. Now the data are well separated.
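A minimal numerical sketch (synthetic data, not the slide's figure; assumes numpy) of why squaring the coordinates helps: points separated by a circle become separable by a hyperplane in the squared coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data: class 0 inside the unit circle, class 1 outside radius 2.
# No hyperplane in (x1, x2) separates them.
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([0] * 100 + [1] * 100)

Z = X ** 2                        # (x1, x2) -> (x1^2, x2^2)
# In the new space the hyperplane z1 + z2 = 2 separates the classes perfectly,
# because z1 + z2 = r^2 is at most 1 for class 0 and at least 4 for class 1.
pred = (Z.sum(axis=1) > 2.0).astype(int)
print((pred == y).mean())         # 1.0
```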

Feature spaces/Kernel trick. Using a linear classifier (nearest means or SVM), we solve a non-linear problem simply by working in a different feature space. With kernels – we don't have to make the new feature space explicit – we can implicitly work in a different space and efficiently compute dot products there.
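A small check (a sketch, not from the slides; assumes numpy) that a kernel is a dot product in an implicit feature space: the homogeneous degree-2 polynomial kernel $(x \cdot z)^2$ in 2-D equals the dot product under the explicit map $(x_1, x_2) \mapsto (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, but never builds that space.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D.
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def k(x, z):
    # Kernel trick: (x . z)^2 gives the same value without constructing phi.
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)), k(x, z))   # both 1.0
```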

Support vector machine. Consider the hard margin SVM optimization and solve it by applying KKT; think of KKT as a tool for constrained convex optimization. Form the Lagrangian, as written out below.
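The transcript omits the slide's equations; in the notation used on these slides ($w$, $w_0$, multipliers $\alpha_i$), the standard hard-margin primal and its Lagrangian are:

\[
\min_{w,\,w_0} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^T x_i + w_0) \ge 1, \quad i = 1,\dots,n,
\]
\[
L(w, w_0, \alpha) \;=\; \tfrac{1}{2}\|w\|^2 \;-\; \sum_{i=1}^{n} \alpha_i \big( y_i (w^T x_i + w_0) - 1 \big), \qquad \alpha_i \ge 0 .
\]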

Support vector machine. KKT says the optimal $w$ and $w_0$ are given by the saddle point of the Lagrangian, and the KKT conditions imply the two relations below.
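The omitted conditions are the standard stationarity equations obtained by setting the gradients of the Lagrangian to zero:

\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial w_0} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0 .
\]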

Support vector machine. After applying the Lagrange multipliers we obtain the dual by substituting $w$ back into the primal; the dual is maximized.
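Substituting $w = \sum_i \alpha_i y_i x_i$ into the Lagrangian gives the standard dual (the slide's equation is not in the transcript):

\[
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i \;-\; \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 .
\]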

SVM and kernels. We can rewrite the dual in a compact form:
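One common compact form (the slide's exact notation is not in the transcript): with $K_{ij} = x_i^T x_j$, or $K_{ij} = K(x_i, x_j)$ for a general kernel,

\[
\max_{\alpha} \ \alpha^T \mathbf{1} \;-\; \tfrac{1}{2}\, \alpha^T \big( \mathrm{diag}(y)\, K\, \mathrm{diag}(y) \big)\, \alpha
\quad \text{s.t.} \quad \alpha \ge 0, \quad \alpha^T y = 0 .
\]

The data enter the problem only through $K$, which is what lets us swap in any kernel.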

Optimization. The SVM dual is thus a quadratic program that can be solved by any quadratic-program solver. Platt's Sequential Minimal Optimization (SMO) algorithm offers a simple, SVM-specific solution to the dual; the idea is to perform coordinate ascent by selecting two variables at a time to optimize. A generic-solver sketch follows, and then we look at some kernels.
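Platt's SMO itself is more involved; the sketch below (assuming the cvxopt package and a separable dataset) simply hands the dual to a generic QP solver, the "any quadratic program solver" route mentioned above.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, kernel):
    """Hard-margin SVM dual as a QP:  min 1/2 a^T P a - 1^T a
       s.t. a >= 0, y^T a = 0, with P_ij = y_i y_j K(x_i, x_j)."""
    n = len(X)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))                      # encodes a >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))  # encodes y^T a = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol["x"])                   # support vectors have alpha > 0
```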

Example kernels. Polynomial kernels of degree $d$ give a feature space with higher-order non-linear terms. The radial basis (RBF) kernel gives an infinite-dimensional feature space (via the Taylor series of the exponential).
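Sketches of both kernels (the degree, c, and gamma values below are illustrative, not from the slides):

```python
import numpy as np

def polynomial_kernel(x, z, degree=3, c=1.0):
    # K(x, z) = (x . z + c)^degree: the feature space contains all monomials
    # of the input coordinates up to the given degree.
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma ||x - z||^2): an infinite-dimensional feature space,
    # as seen by Taylor-expanding the exponential.
    return np.exp(-gamma * np.sum((x - z) ** 2))
```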

Example kernels. Empirical kernel map – define a set of reference vectors $m_1, \dots, m_p$ – define a score $s(x_i, m_j)$ between $x_i$ and $m_j$ – then map each $x_i$ to the vector of its scores against the references – and take dot products of these vectors as the kernel (sketched below).
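A sketch of the empirical kernel map (assumes numpy; the score function and reference vectors are supplied by the user):

```python
import numpy as np

def empirical_kernel_map(X, M, score):
    # phi(x) = (s(x, m_1), ..., s(x, m_p)) for each row x of X,
    # where M holds the reference vectors m_1, ..., m_p.
    return np.array([[score(x, m) for m in M] for x in X])

# The kernel matrix is the Gram matrix of the mapped points:
# Phi = empirical_kernel_map(X, M, score);  K = Phi @ Phi.T
```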

Example kernels. Bag of words – given two documents $D_1$ and $D_2$, we define the kernel $K(D_1, D_2)$ as the number of words they have in common. – To prove this is a kernel, first create a large set of words $W_i$ and define the mapping $\Phi(D_1)$ as a high-dimensional vector where $\Phi(D_1)[i]$ is 1 if the word $W_i$ is present in the document; then $K(D_1, D_2) = \Phi(D_1)^T \Phi(D_2)$, a dot product, which counts the shared words.
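A minimal sketch of this kernel on whitespace-tokenized documents:

```python
def bag_of_words_kernel(d1: str, d2: str) -> int:
    # Number of distinct words the two documents share, i.e. the dot product
    # of their binary word-indicator vectors Phi(D1) and Phi(D2).
    return len(set(d1.split()) & set(d2.split()))

print(bag_of_words_kernel("the quick brown fox", "the lazy brown dog"))  # 2
```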

SVM and kernels. What if we make the kernel matrix $K$ a variable and optimize the dual over it as well? But now there is no way to tie the kernel matrix to the training data points.

SVM and kernels. To tie the kernel matrix to the training data we assume that the kernel to be determined is a linear combination of some existing base kernels (see below). Now the problem is no longer a quadratic program; instead we have a semi-definite program (Lanckriet et al. 2002).
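One common parameterization, roughly following Lanckriet et al. (the slide's exact constraints are not in the transcript): the learned kernel is

\[
K \;=\; \sum_{k=1}^{m} \mu_k K_k , \qquad K \succeq 0, \qquad \mathrm{trace}(K) \le c ,
\]

and the dual is then optimized jointly over $\alpha$ and the base-kernel weights $\mu_k$.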

Theoretical foundation. Recall the margin error theorem (Theorem 7.3 from Learning with Kernels).

Theoretical foundation. The kernel analogue of Theorem 7.3 is given in Lanckriet et al. 2002.

How does MKL work in practice? Gönen and Alpaydın (JMLR, 2011) compare kernel learning algorithms on several datasets: – digit recognition – internet advertisements – protein folding. They form kernels from different sets of features and apply SVMs with various kernel learning algorithms.

How does MKL work in practice? Results tables from Gönen and Alpaydın, JMLR, 2011 (not reproduced here).

How does MKL work in practice? – MKL is better than a single kernel – the mean kernel is hard to beat – non-linear MKL looks promising.