Support Vector Machines

RBF networks, Support Vector Machines, Good Decision Boundary, Optimization Problem, Soft Margin Hyperplane, Non-linear Decision Boundary, Kernel Trick, Approximation Accuracy, Overtraining

Gaussian response function Each hidden layer unit i computes a Gaussian of the distance between the input and its center, φ_i(x) = exp(−||x − u_i||^2 / 2σ_i^2), where x is the input vector and u_i is the weight (center) vector of hidden layer neuron i.
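A minimal numpy sketch of this Gaussian response (the function and variable names U, sigma are illustrative, not from the slides):

```python
import numpy as np

def rbf_activations(x, U, sigma):
    """Gaussian response of each hidden unit for one input vector x.

    x     : input vector, shape (d,)
    U     : matrix whose i-th row is the center u_i, shape (k, d)
    sigma : per-unit widths sigma_i, shape (k,)
    """
    sq_dist = np.sum((U - x) ** 2, axis=1)          # ||x - u_i||^2 for every unit
    return np.exp(-sq_dist / (2.0 * sigma ** 2))    # exp(-||x - u_i||^2 / (2 sigma_i^2))
```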

Location of centers u The location of each receptive field is critical. Apply clustering to the training set; each resulting cluster center becomes the center u_i of the receptive field of one hidden neuron.
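One way to realize this step, sketched here with k-means from scikit-learn; the slide does not prescribe a specific clustering algorithm:

```python
from sklearn.cluster import KMeans

def find_centers(X, k):
    """Cluster the training inputs X (shape (n, d)) and use the
    k cluster centers as the receptive-field centers u_i."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.cluster_centers_          # shape (k, d): one center per hidden neuron
```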

Determining σ The following heuristic performs well in practice: for each hidden layer neuron, find the RMS distance between u_i and the centers c_j of its N nearest neighbors, and assign this value to σ_i.
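A sketch of that heuristic, reusing the center matrix U from the snippet above (N and the helper name are illustrative):

```python
import numpy as np

def find_widths(U, N=2):
    """For each center u_i, set sigma_i to the RMS distance to the
    centers of its N nearest neighboring units (requires N < len(U))."""
    k = len(U)
    sigma = np.empty(k)
    for i in range(k):
        d2 = np.sum((U - U[i]) ** 2, axis=1)   # squared distances to all centers
        d2[i] = np.inf                         # exclude the center itself
        nearest = np.sort(d2)[:N]              # N smallest squared distances
        sigma[i] = np.sqrt(np.mean(nearest))   # RMS distance to the N nearest centers
    return sigma
```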

The output neuron produces the linear weighted sum of the hidden-unit activations, o = Σ_i w_i φ_i(x). The weights have to be adapted (LMS).
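The LMS adaptation can be approximated in batch form by a linear least-squares fit of the output weights on the hidden activations; a sketch, reusing rbf_activations from above:

```python
import numpy as np

def fit_output_weights(X, t, U, sigma):
    """Fit the linear output weights (plus a bias) by least squares,
    which minimizes the same squared error that LMS descends on."""
    Phi = np.array([rbf_activations(x, U, sigma) for x in X])   # (n, k) hidden activations
    Phi = np.hstack([Phi, np.ones((len(X), 1))])                # append a bias column
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)                 # solve min ||Phi w - t||^2
    return w                                                    # last entry is the bias
```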

Why does an RBF network work? The hidden layer applies a nonlinear transformation from the input space to the hidden space; in the hidden space a linear discrimination can be performed. (Figure: input points mapped through the Gaussian units φ(·) into the hidden space.)

Support Vector Machines A linear machine: it constructs a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized, which gives good generalization performance. The support vector learning algorithm can be used to construct three types of learning machines: polynomial learning machines, radial-basis function networks, and two-layer perceptrons.

Two-Class Problem: Linearly Separable Case Many decision boundaries can separate these two classes; which one should we choose? The perceptron learning rule can be used to find any one of these decision boundaries between class 1 and class 2.

Example of Bad Decision Boundaries (figure: two separating boundaries for class 1 and class 2 that are poor choices)

Good Decision Boundary: Margin Should Be Large The decision boundary should be as far away from the data of both classes as possible, i.e. we should maximize the margin m. For points x_1 and x_2 on the two margin hyperplanes, m = (w/||w||)^T (x_1 − x_2) = 2/||w||.
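A short derivation of that margin expression, assuming the usual canonical scaling in which the two margin hyperplanes satisfy w^T x + b = ±1:

```latex
% For x_1 on the positive margin hyperplane and x_2 on the negative one:
%   w^T x_1 + b = +1,   w^T x_2 + b = -1.
% Subtracting gives w^T (x_1 - x_2) = 2, so the separation measured along
% the unit normal w/||w|| is
\[
  m \;=\; \frac{w^{\top}}{\lVert w\rVert}\,(x_1 - x_2) \;=\; \frac{2}{\lVert w\rVert}.
\]
```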

The Optimization Problem Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, −1} be the class label of x_i. The decision boundary should classify all points correctly, which gives a constrained optimization problem.
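The slide's formulas are not in the transcript; the standard hard-margin program these constraints describe is:

```latex
\[
  \min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad\text{subject to}\quad
  y_i\,(w^{\top}x_i + b) \ge 1,\qquad i = 1,\dots,n.
\]
```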

The Optimization Problem Introduce Lagrange multipliers α_i and form the Lagrange function, which is minimized with respect to w and b.
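In its standard form (the slide's own rendering is not in the transcript), the Lagrange function is:

```latex
\[
  \mathcal{L}(w, b, \alpha)
  \;=\; \tfrac{1}{2}\lVert w\rVert^{2}
  \;-\; \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\,(w^{\top}x_i + b) - 1 \,\bigr],
  \qquad \alpha_i \ge 0 .
\]
```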

The Optimization Problem We can transform the problem to its dual. This is a quadratic programming (QP) problem, so the global maximum over the α_i can always be found. w can be recovered as w = Σ_i α_i y_i x_i. Let x^(1) and x^(−1) be two support vectors, one from each class; then b = −1/2 ( w^T x^(1) + w^T x^(−1) ).
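The standard form of that dual QP, together with the recovery of w:

```latex
\[
  \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
        \alpha_i\,\alpha_j\,y_i\,y_j\,x_i^{\top}x_j
  \quad\text{s.t.}\quad
  \alpha_i \ge 0,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0,
  \qquad
  w = \sum_{i=1}^{n}\alpha_i\,y_i\,x_i .
\]
```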

A Geometrical Interpretation (figure: nine training points with their multipliers, e.g. α_1 = 0.8, α_6 = 1.4, α_8 = 0.6 for points on the margin and α_i = 0 for all others.) Only the support vectors receive nonzero multipliers, so changing the interior points has no effect on the decision boundary.

How About Data That Are Not Linearly Separable? We allow an “error” ξ_i in the classification of each point.

Soft Margin Hyperplane Define ξ_i = 0 if there is no error for x_i; the ξ_i are just “slack variables” in optimization theory. We want to minimize 1/2 ||w||^2 + C Σ_i ξ_i, where C is a tradeoff parameter between error and margin. The optimization problem becomes the soft-margin program shown below.
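A standard rendering of that soft-margin program (the slide's formula images are not in the transcript):

```latex
\[
  \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
  \quad\text{subject to}\quad
  y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0 .
\]
```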

The Optimization Problem The dual of the problem has the same form as before, and w is again recovered as w = Σ_i α_i y_i x_i. The only difference from the linearly separable case is that there is now an upper bound C on the α_i. Once again, a QP solver can be used to find the α_i. Note also that everything is expressed through inner products.
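In standard form, the dual changes only through the box constraint on the multipliers:

```latex
\[
  \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i\,\alpha_j\,y_i\,y_j\,x_i^{\top}x_j
  \quad\text{s.t.}\quad
  0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0 .
\]
```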

Extension to Non-linear Decision Boundary Key idea: transform x_i to a higher-dimensional space to “make life easier”. Input space: the space the x_i live in. Feature space: the space of the φ(x_i) after the transformation. Why transform? A linear operation in the feature space is equivalent to a non-linear operation in the input space, and the classification task can be “easier” with a proper transformation. Example: XOR with inputs x_1, x_2, which we transform to x_1^2, x_2^2, x_1 x_2. The transformation can also be viewed as feature extraction from the feature vector x, but now we extract more features than the number of features in x.
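To make the XOR example concrete, here is the usual ±1 encoding worked out (the encoding is an assumption; the slide does not spell it out):

```latex
With inputs encoded as $x_1, x_2 \in \{-1,+1\}$, the map
\[
  \phi(x) = \bigl(x_1^2,\; x_2^2,\; x_1 x_2\bigr)
\]
sends every point to $(1,\,1,\,x_1 x_2)$: the third coordinate is $+1$ for
$(+1,+1)$ and $(-1,-1)$ (XOR false) and $-1$ for $(+1,-1)$ and $(-1,+1)$
(XOR true). A single linear threshold on $x_1 x_2$ therefore separates the
two classes in feature space, although no linear boundary does so in input space.
```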

Extension to Non-linear Decision Boundary Possible problems of the transformation: a high computational burden, and it is hard to get a good estimate in a very high-dimensional space. SVM solves these two issues simultaneously: the kernel trick gives efficient computation, and minimizing ||w||^2 can lead to a “good” classifier. (Figure: the map φ(·) taking points from the input space to the feature space.)

Example Transformation Define the kernel function K(x, y) and consider a corresponding transformation φ. The inner product φ(x)^T φ(y) can then be computed by K without going through the map φ(·) explicitly; a concrete example follows.
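A common concrete instance of such a kernel/transformation pair for two-dimensional inputs, given here as an illustration; it may not be the exact example on the original slide:

```latex
\[
  K(x, y) = \bigl(x^{\top}y + 1\bigr)^{2},
  \qquad
  \phi(x) = \bigl(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2\bigr),
\]
\[
  \phi(x)^{\top}\phi(y)
  = 1 + 2x_1y_1 + 2x_2y_2 + x_1^2y_1^2 + x_2^2y_2^2 + 2x_1x_2y_1y_2
  = \bigl(x^{\top}y + 1\bigr)^{2} = K(x, y).
\]
```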

Kernel Trick The relationship between the kernel function K and the mapping φ(·) is K(x, y) = φ(x)^T φ(y). This is known as the kernel trick: in practice we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) directly. Intuitively, K(x, y) represents our desired notion of similarity between the data points x and y, and it comes from our prior knowledge. K(x, y) needs to satisfy a technical condition (Mercer's condition) in order for φ(·) to exist.

Examples of Kernel Functions Polynomial kernel with degree d, e.g. K(x, y) = (x^T y + 1)^d. Radial basis function kernel with width σ, e.g. K(x, y) = exp(−||x − y||^2 / 2σ^2), closely related to radial basis function neural networks. Sigmoid kernel with parameters κ and θ, e.g. K(x, y) = tanh(κ x^T y + θ); it does not satisfy Mercer's condition for all κ and θ, yet despite that it can still work in practice. Research on different kernel functions for different applications is very active.
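A small scikit-learn sketch that tries these kernel families on the same data; the dataset and parameter values are placeholders, not from the slides:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# A simple non-linearly-separable toy problem.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

kernels = {
    "polynomial (d=3)": SVC(kernel="poly", degree=3, coef0=1.0, C=1.0),
    "RBF":              SVC(kernel="rbf", gamma=1.0, C=1.0),   # gamma = 1 / (2 sigma^2)
    "sigmoid":          SVC(kernel="sigmoid", coef0=0.0, C=1.0),
}
for name, clf in kernels.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:18s} mean CV accuracy = {acc:.3f}")
```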

Multi-class Classification SVM is basically a two-class classifier. One can change the QP formulation to allow multi-class classification. More commonly, the data set is divided into two parts “intelligently” in different ways and a separate SVM is trained for each way of division; multi-class classification is then done by combining the outputs of all the SVM classifiers, e.g. by majority rule, error-correcting codes, or a directed acyclic graph.
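A sketch of the one-vs-rest route using scikit-learn (which also provides one-vs-one; the error-correcting-code and DAG schemes mentioned above would need additional code):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)              # 3 classes

# Train one binary SVM per class ("class k" vs. "all other classes") and
# predict with the classifier that produces the largest decision value.
ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")).fit(X, y)
print(ovr.predict(X[:5]))
```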

Conclusion SVM is a useful alternative to neural networks. Two key concepts of SVM: maximize the margin and the kernel trick. Much active research is taking place in areas related to SVMs. Many SVM implementations are available on the web for you to try on your data set!

Measuring Approximation Accuracy Compare the network's output with the correct (target) values: the mean squared error F(w) of the network over the training set D = {(x_1, t_1), (x_2, t_2), ..., (x_d, t_d), ..., (x_m, t_m)}.
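One common way to write this error; the exact normalization used on the original slide is not in the transcript:

```latex
\[
  F(w) \;=\; \frac{1}{m}\sum_{d=1}^{m}\bigl(t_d - o_d\bigr)^{2},
\]
where $o_d$ is the network output for input $x_d$ and $t_d$ the corresponding target.
```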

RBF networks, Support Vector Machines, Good Decision Boundary, Optimization Problem, Soft Margin Hyperplane, Non-linear Decision Boundary, Kernel Trick, Approximation Accuracy, Overtraining

Bibliography Simon Haykin, Neural Networks, Second Edition, Prentice Hall, 1999.