An Introduction to Support Vector Machines

Presentation transcript:

An Introduction to Support Vector Machines

Outline
- What is a good decision boundary for a binary classification problem?
- From minimizing the misclassification error to maximizing the margin
- Two classes that are not linearly separable: how to deal with noisy data
- How to make SVM non-linear: kernels
- Conclusion

Two-Class Problem: Linearly Separable Case
- Minimizing the misclassification error alone is not enough: many decision boundaries can separate the two classes without any misclassification. Which one should we choose?
- The perceptron learning rule can be used to find some decision boundary between class 1 and class 2, but it makes no promise about which one
[Figure: two linearly separable classes with several candidate decision boundaries]

Maximizing the Margin
- The decision boundary should be as far away from the data of both classes as possible
- We should maximize the margin, m
[Figure: the maximum-margin boundary between class 1 and class 2, with margin width m]

The Optimization Problem
- Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class label of x_i
- The decision boundary w^T x + b = 0 should classify all points correctly: y_i (w^T x_i + b) ≥ 1 for all i
- This gives a constrained optimization problem: minimize (1/2) ||w||^2 subject to y_i (w^T x_i + b) ≥ 1, i = 1, ..., n
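To make the primal concrete, here is a minimal sketch that solves the hard-margin problem on four toy 2-D points with SciPy's general-purpose SLSQP solver; the data and variable names are illustrative, and real SVM packages use specialized QP solvers instead.

```python
# Minimal sketch: hard-margin primal on toy data via SciPy's SLSQP.
# (Illustrative only -- production SVMs use dedicated QP solvers.)
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, two points per class.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(z):
    w = z[:2]                   # z packs [w_1, w_2, b]
    return 0.5 * np.dot(w, w)   # minimize (1/2)||w||^2

# SLSQP 'ineq' constraints mean fun(z) >= 0, i.e. y_i (w^T x_i + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda z, i=i: y[i] * (X[i] @ z[:2] + z[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```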

The Dual Problem
- We can transform the problem to its dual: maximize Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j subject to α_i ≥ 0 and Σ_i α_i y_i = 0
- This is a quadratic programming (QP) problem, so a global maximum over the α_i can always be found
- w can be recovered as w = Σ_i α_i y_i x_i
- Let x(1) and x(-1) be two support vectors, one from each class. Then b = -1/2 (w^T x(1) + w^T x(-1))
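As a sketch of how the dual could be handed to an off-the-shelf QP solver, here is the same toy problem with CVXOPT (an assumed third-party dependency); CVXOPT minimizes (1/2) α^T P α + q^T α, so the dual maximization is negated.

```python
# Sketch: the hard-margin dual as a QP for CVXOPT (assumed installed).
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

P = matrix(np.outer(y, y) * (X @ X.T))  # P_ij = y_i y_j x_i^T x_j
q = matrix(-np.ones(n))                 # maximizing sum(a) = minimizing -sum(a)
G = matrix(-np.eye(n))                  # -a_i <= 0, i.e. a_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))            # equality constraint sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()

w = (alpha * y) @ X                       # w = sum_i a_i y_i x_i
sv_pos = X[(alpha > 1e-6) & (y > 0)][0]   # a support vector of each class
sv_neg = X[(alpha > 1e-6) & (y < 0)][0]
b_val = -0.5 * (w @ sv_pos + w @ sv_neg)  # b as on the slide
print("alpha =", alpha.round(3), "w =", w, "b =", b_val)
```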

A Geometrical Interpretation
- In the example, only the three points lying on the margin have non-zero multipliers (α_1 = 0.8, α_6 = 1.4, α_8 = 0.6); all the other points have α_i = 0
- So if we move the interior points (those with α_i = 0), there is no effect on the decision boundary
[Figure: class 1 and class 2 with the decision boundary; each point is labeled with its α_i, and only the points on the margin carry non-zero values]
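The claim that interior points do not matter is easy to check empirically. Below is a small sketch using scikit-learn; the toy data and the large-C approximation of a hard margin are my choices, not part of the original slides.

```python
# Sketch: moving an interior (non-support) point leaves the boundary unchanged.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [4.0, 4.0],
              [-1.0, -1.0], [-2.0, -2.5], [-4.0, -4.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates hard margin
print("support vector indices:", clf.support_)

X2 = X.copy()
X2[2] = [6.0, 6.0]                  # move an interior point even farther away
clf2 = SVC(kernel="linear", C=1e6).fit(X2, y)
print("same w?", np.allclose(clf.coef_, clf2.coef_))   # expected: True
```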

Characteristics of the Solution
- Many of the α_i are zero: w is a linear combination of a small number of data points, i.e. a sparse representation
- The x_i with non-zero α_i are called support vectors (SV); the decision boundary is determined only by the SV
- Let t_j (j = 1, ..., s) be the indices of the s support vectors. We can write w = Σ_{j=1}^{s} α_{t_j} y_{t_j} x_{t_j}
- For testing with a new datum z: compute f(z) = Σ_{j=1}^{s} α_{t_j} y_{t_j} x_{t_j}^T z + b and classify z as class 1 if the sum is positive, class 2 otherwise
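A minimal sketch of this test-time rule, assuming X, y, alpha, and b come from a dual solver such as the CVXOPT example above:

```python
import numpy as np

def decision(z, X, y, alpha, b, tol=1e-6):
    # f(z) = sum over support vectors of a_i y_i x_i^T z + b;
    # only points with alpha_i > tol (the SVs) contribute.
    sv = alpha > tol
    return float((alpha[sv] * y[sv] * (X[sv] @ z)).sum() + b)

# Classify z as class 1 if decision(z, ...) > 0, and class 2 otherwise.
```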

Some Notes
- There are theoretical upper bounds on the error of an SVM on unseen data: the larger the margin, the smaller the bound, and the smaller the number of SV, the smaller the bound
- Note that in both training and testing, the data are referenced only through inner products x^T y
- This is important for generalizing to the non-linear case

What About Data That Are Not Linearly Separable?
- We allow an "error" ξ_i in classification to tolerate noisy data
[Figure: class 1 and class 2 overlap slightly; misclassified or in-margin points incur slack ξ_i]

Soft Margin Hyperplane
- Define ξ_i = 0 if there is no error for x_i; the ξ_i are just "slack variables" in optimization theory
- We want to minimize (1/2) ||w||^2 + C Σ_i ξ_i, where C is the tradeoff parameter between error and margin
- The optimization problem becomes: minimize (1/2) ||w||^2 + C Σ_i ξ_i subject to y_i (w^T x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0
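As a rough illustration of the tradeoff that C controls, the sketch below fits soft-margin SVMs with different C values on noisy synthetic data; scikit-learn and the data-generation parameters are my choices for illustration.

```python
# Sketch: small C tolerates errors for a wider margin; large C does the opposite.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2, 2], size=(50, 2)),
               rng.normal(loc=[-2, -2], size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C={C:>6}: margin={margin:.3f}, #SV={len(clf.support_)}")
```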

The Optimization Problem
- The dual of this problem has the same objective as before, and w is again recovered as w = Σ_i α_i y_i x_i
- The only difference from the linearly separable case is that there is now an upper bound C on the α_i: 0 ≤ α_i ≤ C
- Once again, a QP solver can be used to find the α_i
- Note also that everything is still done through inner products
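In the CVXOPT sketch from the hard-margin dual, this upper bound only changes the inequality constraints, along these lines:

```python
# Sketch: the soft-margin dual only adds the box constraint 0 <= a_i <= C,
# i.e. the two inequality blocks stacked together.
import numpy as np
from cvxopt import matrix

C = 1.0
n = 4  # number of training points, as in the earlier example
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))        # -a_i <= 0 and a_i <= C
h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
# P, q, A, b and the solvers.qp call are unchanged.
```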

Extension to Non-linear Decision Boundaries
- In most situations, the decision boundary we are looking for should NOT be a straight line
[Figure: a transformation φ(·) maps points from the input space to a feature space where the classes become linearly separable]

Extension to Non-linear Decision Boundaries
- Key idea: use a function φ(x) to transform the x_i to a higher-dimensional space, to "make life easier"
- Input space: the space the x_i live in; feature space: the space of the φ(x_i) after the transformation
- We search for a hyperplane in the feature space that maximizes the margin; that hyperplane corresponds to a curve in the input space
- Why transform? We still like the idea of maximizing the margin, and the transformed classifier is more powerful and flexible
- Example (XOR): from inputs x_1, x_2, we transform to x_1^2, x_2^2, x_1 x_2, which makes the XOR problem linearly separable (see the sketch below)
- The transformation can also be viewed as feature extraction from the feature vector x, except that we extract more features than x originally has
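Here is a minimal sketch of the XOR example, using scikit-learn's LinearSVC as the linear classifier in the transformed space; the library choice and the large C are assumptions for illustration.

```python
# Sketch: XOR becomes linearly separable after the quadratic feature map
# (x1, x2) -> (x1^2, x2^2, x1*x2).
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])          # XOR labels: not linearly separable in 2D

Phi = np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])
clf = LinearSVC(C=1e6).fit(Phi, y)    # a linear classifier in feature space
print("training accuracy:", clf.score(Phi, y))   # expected: 1.0
```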

Transformation and Kernel

Kernel: Efficient Computation
- Define the kernel function K(x, y) as the inner product of two transformed vectors: K(x, y) = φ(x)^T φ(y)
- Consider, for example, 2-D inputs with K(x, y) = (x^T y + 1)^2: this equals φ(x)^T φ(y) for the explicit transformation φ(x) = (x_1^2, x_2^2, √2 x_1 x_2, √2 x_1, √2 x_2, 1), yet it can be evaluated without ever forming φ(x)
- In practice we don't need to worry about the transformation φ(x); all we have to do is select a good kernel for our problem
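A quick numerical check of this identity for the degree-2 polynomial kernel; the test vectors are arbitrary.

```python
# Sketch: verify K(x, y) = (x^T y + 1)^2 equals phi(x)^T phi(y)
# for the explicit degree-2 feature map on 2-D inputs.
import numpy as np

def phi(x):
    # phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s*x1*x2, s*x1, s*x2, 1.0])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z + 1.0) ** 2)   # kernel, computed in the 2-D input space
print(phi(x) @ phi(z))      # same value, via the 6-D feature space
```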

Examples of Kernel Functions
- Polynomial kernel with degree d: K(x, y) = (x^T y + 1)^d
- Radial basis function (RBF) kernel with width σ: K(x, y) = exp(-||x - y||^2 / (2σ^2)); closely related to radial basis function neural networks
- Research on different kernel functions for different applications is very active
- Despite violating Mercer's condition, the sigmoid kernel K(x, y) = tanh(κ x^T y + θ) can still work in practice
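For concreteness, the two main kernels above could be written as plain NumPy functions like this; the parameter defaults are arbitrary.

```python
# Sketch: the polynomial and RBF kernels from the slide as NumPy functions.
import numpy as np

def polynomial_kernel(x, y, d=3):
    return (x @ y + 1.0) ** d                 # (x^T y + 1)^d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
```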

Summary: Steps for Classification
- Prepare the data matrix
- Select the kernel function to use
- Select the parameters of the kernel function and the value of C; you can use the values suggested by the SVM software, or set apart a validation set to determine them
- Run the training algorithm to obtain the α_i
- Unseen data can then be classified using the α_i and the support vectors
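Assuming scikit-learn, the whole recipe might look like the following sketch, with the dataset and parameter grid invented for illustration.

```python
# Sketch: pick a kernel, tune C and the kernel parameter on held-out data,
# train, then classify unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_tr, y_tr)
print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_te, y_te))
```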

Classification Results of SVM
[Figure: example classification results from the original slides]

Conclusion
- SVM is among the most popular tools for binary classification of numeric data
- Key ideas of SVM: maximizing the margin leads to a "good" classifier; transforming to a higher-dimensional space makes the classifier more flexible; kernel tricks make the computation efficient
- Weakness of SVM: it needs a "good" kernel function

Resources
- http://www.kernel-machines.org/
- http://www.support-vector.net/
- http://www.support-vector.net/icml-tutorial.pdf
- http://www.kernel-machines.org/papers/tutorial-nips.ps.gz
- http://www.clopinet.com/isabelle/Projects/SVM/applist.html