Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods
Kenan Gençol
Department of Electrical and Electronics Engineering, Anadolu University
Submitted in the course MAT592 Seminar
Advisor: Prof. Dr. Yalçın Küçük, Department of Mathematics

Agenda
- Linear Discriminant Functions and Decision Hyperplanes
- Introduction to SVM
- Support Vector Machines
- Introduction to Kernels
- Nonlinear SVM
- Kernel Methods

Linear Discriminant Functions and Decision Hyperplanes
Figure 1. Two classes of patterns and a linear decision function

Linear Discriminant Functions and Decision Hyperplanes
Each pattern is represented by a feature vector x = [x1 x2]. The linear decision function has the equation
g(x) = w1 x1 + w2 x2 + w0 = 0,
where w1, w2 are the weights and w0 is the bias term.

Linear Discriminant Functions and Decision Hyperplanes
The general decision hyperplane equation in d-dimensional space has the form
g(x) = w^T x + w0 = 0,
where w = [w1 w2 ... wd] is the weight vector and w0 is the bias term.
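As a quick illustration (a minimal sketch with hypothetical weights and points of my own, not taken from the slides), a point is classified by the sign of the decision function g(x) = w^T x + w0:

import numpy as np

# Hypothetical weight vector and bias for a 2-dimensional example.
w = np.array([1.0, -2.0])   # weight vector [w1, w2]
w0 = 0.5                    # bias term

def decide(x):
    # Assign class +1 if g(x) = w.x + w0 >= 0, otherwise class -1.
    return 1 if np.dot(w, x) + w0 >= 0 else -1

points = [np.array([2.0, 0.5]), np.array([-1.0, 1.0])]
print([decide(x) for x in points])   # the sign of g(x) decides the class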

Introduction to SVM
There are many hyperplanes that separate the two classes.
Figure 2. An example of two possible classifiers

Introduction to SVM
THE GOAL: Search for the direction w and bias w0 that give the maximum possible margin; in other words, orient the hyperplane so that it lies as far as possible from the closest members of both classes.

SVM: Linearly Separable Case
Figure 3. Hyperplane through two linearly separable classes

SVM: Linearly Separable Case
Our training data is of the form {(xi, yi)}, i = 1, ..., L, with xi ∈ R^d and labels yi ∈ {-1, +1}. The hyperplane can be described by
x · w + b = 0
and is called the separating hyperplane.

SVM: Linearly Separable Case
Select the variables w and b so that:
xi · w + b >= +1 for yi = +1
xi · w + b <= -1 for yi = -1
These equations can be combined into:
yi(xi · w + b) - 1 >= 0 for all i.

SVM: Linearly Separable Case
The points that lie closest to the separating hyperplane are called support vectors (the circled points in the diagram), and the two hyperplanes H1: xi · w + b = +1 and H2: xi · w + b = -1 on which they lie are called supporting hyperplanes.

SVM: Linearly Separable Case
Figure 3. Hyperplane through two linearly separable classes (repeated)

SVM: Linearly Separable Case
The separating hyperplane is equidistant from H1 and H2, so d1 = d2 = 1/||w||, and the quantity d1 + d2 = 2/||w|| is known as the SVM margin.
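The margin value follows from the standard point-to-hyperplane distance formula; a short derivation, added here for completeness in the slide's notation:

d_1 = \frac{|x_i \cdot w + b|}{\lVert w \rVert} = \frac{1}{\lVert w \rVert}
\quad \text{for any } x_i \text{ on } H_1,
\qquad
d_2 = \frac{1}{\lVert w \rVert} \text{ similarly for } H_2,
\qquad \text{so} \quad
d_1 + d_2 = \frac{2}{\lVert w \rVert}.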

SVM: Linearly Separable Case
Maximizing the margin 2/||w|| is equivalent to minimizing ||w|| such that yi(xi · w + b) - 1 >= 0. Minimizing ||w|| is in turn equivalent to minimizing (1/2)||w||^2, which allows the problem to be solved by Quadratic Programming (QP) optimization.

SVM: Linearly Separable Case
Optimization problem: Minimize (1/2)||w||^2 subject to yi(xi · w + b) - 1 >= 0, i = 1, ..., L.
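For concreteness, a minimal sketch of solving this problem in practice with scikit-learn (my own illustration; the slides do not mention any particular software). A very large C approximates the hard-margin formulation above:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical, for illustration only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel='linear', C=1e6)   # large C: (almost) hard-margin SVM
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # the learned w and b
print(clf.support_vectors_)         # the points that define the margin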

SVM: Linearly Separable Case
This is an inequality-constrained optimization problem with Lagrangian function
L_P = (1/2)||w||^2 - Σ_i αi [ yi(xi · w + b) - 1 ],   (1)
where αi >= 0, i = 1, 2, ..., L, are the Lagrange multipliers.

SVM
The corresponding KKT conditions give:
∂L_P/∂w = 0  =>  w = Σ_i αi yi xi   (2)
∂L_P/∂b = 0  =>  Σ_i αi yi = 0      (3)
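For completeness (a standard statement, written out here rather than on the slide), the full set of KKT conditions for this problem is:

\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{L} \alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{L} \alpha_i y_i = 0,

y_i (x_i \cdot w + b) - 1 \ge 0,
\qquad
\alpha_i \ge 0,
\qquad
\alpha_i \bigl[ y_i (x_i \cdot w + b) - 1 \bigr] = 0,
\quad i = 1, \dots, L.

The last (complementary slackness) condition forces αi = 0 for every point not lying on a supporting hyperplane, which is why only the support vectors appear in the final solution.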

SVM
This is a convex optimization problem: the cost function is convex, and the constraints are linear and define a convex set of feasible solutions. Such problems can be solved by considering the so-called Lagrangian duality.

SVM
Substituting (2) and (3) into (1) gives a new formulation which, being dependent only on α, we need to maximize.

SVM
This is called the dual form (Lagrangian dual) of the primal form. The dual form only requires the dot products of the input vectors to be calculated. This is important for the kernel trick, which will be described later.

SVM
So the problem becomes the dual problem:
Maximize  L_D = Σ_i αi - (1/2) Σ_i Σ_j αi αj yi yj (xi · xj)
subject to  αi >= 0, i = 1, ..., L, and Σ_i αi yi = 0.

SVM
Differentiating with respect to the αi's and using the constraint equation, a system of equations is obtained. Solving the system, the Lagrange multipliers are found and the optimum hyperplane is given according to the formula:
w = Σ_i αi yi xi,   b = ys - w · xs  for any support vector xs.
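As a concrete numerical sketch of this step (using the cvxopt QP solver and a tiny hypothetical data set of my own; the slides do not prescribe any solver), the dual is rewritten in the solver's standard minimization form and the multipliers, w and b are then recovered:

import numpy as np
from cvxopt import matrix, solvers

# Tiny linearly separable toy data set (hypothetical).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
L = len(y)

K = X @ X.T                         # dot products x_i . x_j (linear kernel)
P = matrix(np.outer(y, y) * K)      # P_ij = y_i y_j (x_i . x_j)
q = matrix(-np.ones(L))             # maximizing sum(alpha) = minimizing -sum(alpha)
G = matrix(-np.eye(L))              # encodes alpha_i >= 0 as -alpha_i <= 0
h = matrix(np.zeros(L))
A = matrix(y.reshape(1, -1))        # encodes the constraint sum_i alpha_i y_i = 0
b_eq = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b_eq)['x'])

sv = alpha > 1e-6                               # support vectors: alpha_i > 0
w = ((alpha * y)[:, None] * X).sum(axis=0)      # w = sum_i alpha_i y_i x_i
b = np.mean(y[sv] - X[sv] @ w)                  # b from y_s (x_s . w + b) = 1
print(alpha, w, b)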

SVM
Some notes:
SUPPORT VECTORS are the feature vectors with αi > 0, i = 1, 2, ..., L.
The cost function is strictly convex and the Hessian matrix is positive definite, so any local minimum is also global and unique: the optimal hyperplane classifier of an SVM is UNIQUE.
Although the solution is unique, the resulting Lagrange multipliers need not be unique.

Kernels: Introduction
When applying our SVM to linearly separable data we started by creating a matrix H from the dot products of our input variables:
Hij = yi yj k(xi, xj) = yi yj (xi · xj),
with k(xi, xj) = xi · xj being known as the linear kernel, an example of a family of functions called kernel functions.

Kernels: Introduction
Kernel functions are all based on calculating inner products of two vectors. This means that if the inputs are mapped into a higher-dimensional space by a nonlinear mapping Φ, only the inner products of the mapped inputs, k(x, z) = Φ(x) · Φ(z), need to be determined, without explicitly calculating Φ. This is called the "kernel trick".
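A small check of this idea (my own example): for 2-D inputs, the explicit degree-2 feature map Φ(x) = (x1², √2·x1·x2, x2²) gives exactly the same inner product as the kernel k(x, z) = (x · z)², computed without ever forming Φ:

import numpy as np

def phi(x):
    # Explicit degree-2 feature map: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly2_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))   # inner product in the mapped space
print(poly2_kernel(x, z))       # same value, without computing phi explicitly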

Kernels: Introduction
The kernel trick is useful because there are many classification/regression problems that are not separable/regressable in the input space, but which are in a higher-dimensional feature space.

Kernels: Introduction
Popular kernel families:
Radial Basis Function (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2σ^2))
Polynomial kernel: k(x, z) = (x · z + c)^d
Sigmoidal (hyperbolic tangent) kernel: k(x, z) = tanh(κ (x · z) + θ)
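The three families written as plain functions (a minimal sketch; the parameter values are arbitrary choices of mine, not taken from the slides):

import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def polynomial_kernel(x, z, degree=3, c=1.0):
    # k(x, z) = (x . z + c)^degree
    return (np.dot(x, z) + c) ** degree

def sigmoid_kernel(x, z, kappa=0.01, theta=0.0):
    # k(x, z) = tanh(kappa * x . z + theta); a valid kernel only for some parameter values
    return np.tanh(kappa * np.dot(x, z) + theta)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(rbf_kernel(x, z), polynomial_kernel(x, z), sigmoid_kernel(x, z))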

Nonlinear Support Vector Machines
With kernel functions, the dual problem of the support vector machine becomes:
Maximize  Σ_i αi - (1/2) Σ_i Σ_j αi αj yi yj k(xi, xj)
subject to αi >= 0 and Σ_i αi yi = 0,
and the resulting classifier is:
f(x) = sign( Σ_i αi yi k(xi, x) + b ).
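A minimal end-to-end sketch of a nonlinear SVM (again my own illustration, using scikit-learn's RBF-kernel SVC on data that is not linearly separable in the input space):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the input space (hypothetical example).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel='rbf', gamma=2.0, C=1.0)   # k(x, z) = exp(-gamma ||x - z||^2)
clf.fit(X, y)

print(clf.score(X, y))       # training accuracy of the kernelized classifier
print(len(clf.support_))     # number of support vectors in the kernel expansion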

Nonlinear Support Vector Machines
Figure 4. The SVM architecture employing kernel functions.

Kernel Methods
Recall that a kernel function computes the inner product of the images of two data points under an embedding φ: k(x, y) = <φ(x), φ(y)>. A function k is a kernel if
1. k is symmetric: k(x, y) = k(y, x), and
2. k is positive semi-definite, i.e., the "Gram matrix" Kij = k(xi, xj) is positive semi-definite for any finite set of points.
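Both properties can be checked numerically on a finite sample (a small sketch with randomly generated points of my own, using the linear kernel as the example):

import numpy as np

def gram_matrix(X, k):
    # K_ij = k(x_i, x_j)
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

k_linear = lambda x, z: float(np.dot(x, z))

X = np.random.default_rng(0).normal(size=(5, 3))   # hypothetical data points
K = gram_matrix(X, k_linear)

print(np.allclose(K, K.T))                      # symmetry
print(np.linalg.eigvalsh(K).min() >= -1e-10)    # all eigenvalues (numerically) nonnegative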

Kernel Methods
The question of for which kernels there exists a pair {H, φ} with the properties described above, and for which there does not, is answered by Mercer's condition.

Mercer's condition
Let C be a compact subset of R^n, let x ∈ C, and let φ: x → φ(x) ∈ H be a mapping, where H is a Euclidean space. Then the inner product operation has an equivalent representation
<φ(x), φ(z)> = k(x, z),
where k(x, z) is a symmetric function satisfying the following condition:
∫∫ k(x, z) g(x) g(z) dx dz >= 0
for any g(x), x ∈ C, such that ∫ g(x)^2 dx < +∞.

Mercer's Theorem
Theorem. Suppose K is a continuous symmetric non-negative definite kernel. Then there is an orthonormal basis {ei}i of L2[a, b] consisting of eigenfunctions of TK (the integral operator associated with K) such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b], and K has the representation
K(s, t) = Σ_i λi ei(s) ei(t),
where the convergence is absolute and uniform.

Kernel Methods
Suppose k1 and k2 are valid (symmetric, positive definite) kernels on X. Then the following are also valid kernels:
1. k(x, y) = k1(x, y) + k2(x, y)
2. k(x, y) = a k1(x, y), for a > 0
3. k(x, y) = k1(x, y) k2(x, y)

Kernel Methods
4. k(x, y) = f(x) k1(x, y) f(y), for any function f
5. k(x, y) = q(k1(x, y)), for a polynomial q with nonnegative coefficients
6. k(x, y) = exp(k1(x, y))
7. k(x, y) = x^T B y, for a symmetric positive semi-definite matrix B
(A quick numerical check of properties 1 and 3 is sketched below.)
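A numerical sanity check of the sum and product properties on random data (illustration only; a finite check is of course not a proof):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))

K1 = X @ X.T                                        # linear-kernel Gram matrix (PSD)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K2 = np.exp(-sq / 2.0)                              # RBF-kernel Gram matrix (PSD)

def is_psd(K, tol=1e-10):
    return np.linalg.eigvalsh(K).min() >= -tol

print(is_psd(K1 + K2))   # sum of valid kernels is a valid kernel
print(is_psd(K1 * K2))   # elementwise (Schur) product is also PSD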

References
[1] C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery 2, 121-167, 1998.
[2] J. P. Marques de Sá, "Pattern Recognition: Concepts, Methods and Applications", Springer, 2001.
[3] S. Theodoridis, "Pattern Recognition", Elsevier Academic Press, 2003.

References
[4] T. Fletcher, "Support Vector Machines Explained", UCL, March 2005.
[5] N. Cristianini and J. Shawe-Taylor, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
[6] "Mercer's Theorem", Wikipedia: http://en.wikipedia.org/wiki/Mercer's_theorem

Thank You