CS 8751 ML & KDD: Support Vector Machines (presentation transcript)


Slide 1: Support Vector Machines (SVMs)
Learning mechanism based on linear programming.
Chooses a separating plane based on maximizing the notion of a margin.
– Based on PAC learning
Has mechanisms for:
– Noise
– Non-linear separating surfaces (kernel functions)
Notes based on those of Prof. Jude Shavlik.

Slide 2
[Figure: positive (A+) and negative (A-) examples in feature space.]
Find the best separating plane in feature space - many possibilities to choose from; which is the best choice?

Slide 3: SVMs – The General Idea
How to pick the best separating plane?
Idea:
– Define a set of inequalities we want to satisfy
– Use advanced optimization methods (e.g., linear programming) to find satisfying solutions
Key issues:
– Dealing with noise
– What if there is no good linear separating surface?

Slide 4: Linear Programming
A subset of math programming.
The problem has the following form: a function f(x1, x2, x3, ..., xn) to be maximized, subject to a set of constraints of the form g(x1, x2, x3, ..., xn) > b.
Math programming: find a set of values for the variables x1, x2, x3, ..., xn that meets all of the constraints and maximizes the function f.
Linear programming: solving math programs where the constraint functions and the function to be maximized use linear combinations of the variables.
– Generally easier than the general math programming problem
– A well-studied problem
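As a concrete illustration (not from the slide itself), the general linear program described above can be written as:

    \begin{aligned}
    \text{maximize}\quad & c_1 x_1 + c_2 x_2 + \dots + c_n x_n \\
    \text{subject to}\quad & a_{j1} x_1 + a_{j2} x_2 + \dots + a_{jn} x_n \ge b_j, \qquad j = 1, \dots, m
    \end{aligned}

Both the objective and every constraint are linear combinations of the variables, which is what makes these programs easier to solve than general math programs.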

Slide 5: Maximizing the Margin
[Figure: A+ and A- examples on either side of the decision boundary, with the margin between the categories marked.]
The margin between the categories - we want this distance to be maximal (we'll assume the data are linearly separable for now).

Slide 6: PAC Learning
PAC – Probably Approximately Correct learning.
Theorems that can be used to define bounds for the risk (error) of a family of learning functions.
Basic formula, which holds with probability (1 - δ): R is the risk function, α is the set of parameters chosen by the learner, N is the number of data points, and h is the VC dimension (something like an estimate of the complexity of the class of functions).
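The formula itself appeared as an image and is not in the transcript; the standard Vapnik bound that uses exactly the quantities named above (a hedged reconstruction, not necessarily the slide's exact form) is, with probability at least 1 - δ:

    R(\alpha) \;\le\; R_{\text{emp}}(\alpha) \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\delta}{4}}{N}}

where R_emp(α) is the error on the training data.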

Slide 7: Margins and PAC Learning
Theorems connect PAC theory to the size of the margin.
Basically, the larger the margin, the better the expected accuracy.
See, for example, Chapter 4 of Support Vector Machines by Cristianini and Shawe-Taylor, Cambridge University Press, 2002.

Slide 8: Some Equations
The 1s result from dividing through by a constant for convenience.
||w||₂ is the Euclidean length ("2-norm") of the weight vector.
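The equations themselves appeared as images and are not in the transcript; a standard set consistent with the annotations above and with the notation on the next slide (a reconstruction, not the slide's exact content) is:

    \begin{aligned}
    x_i' w &\ge \gamma + 1 && \text{for positive examples } (y_i = +1) \\
    x_i' w &\le \gamma - 1 && \text{for negative examples } (y_i = -1) \\
    \text{margin between the two planes} &= \frac{2}{\|w\|_2}
    \end{aligned}

The +1 and -1 are the "1s" that come from dividing through by a constant.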

Slide 9: What the Equations Mean
[Figure: A+ and A- examples, the support vectors, and the margin between the two planes x'w = γ + 1 and x'w = γ - 1; the margin width is 2 / ||w||₂.]

Slide 10: Choosing a Separating Plane
[Figure: A+ and A- examples with a candidate separating plane marked "?".]

Slide 11: Our "Mathematical Program" (so far)
For technical reasons it is easier to optimize this "quadratic program".
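The program itself appeared as an image; a standard hard-margin form matching the constraints sketched on slide 8 (a reconstruction, with the squared norm making it a quadratic rather than linear program) is:

    \min_{w,\,\gamma} \;\; \tfrac{1}{2}\,\|w\|_2^2
    \quad \text{subject to} \quad
    y_i \,(x_i' w - \gamma) \,\ge\, 1 \;\; \text{for every training example } i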

Slide 12: Dealing with Non-Separable Data
We can add what is called a "slack" variable to each example.
This variable can be viewed as:
– 0 if the example is correctly separated
– otherwise, the "distance" we need to move the example to make it correct (i.e., the distance from its surface)

Slide 13: "Slack" Variables
[Figure: A+ and A- examples with the support vectors marked; the slack "distance" of a misclassified example is labeled y.]

Slide 14: The Math Program with Slack Variables
This is the "traditional" Support Vector Machine.
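The program appeared as an image; the standard soft-margin form, written in the same notation as above with one slack variable per example and a user-chosen penalty C (a reconstruction, not the slide's exact content), is:

    \min_{w,\,\gamma,\,\xi} \;\; \tfrac{1}{2}\,\|w\|_2^2 \;+\; C \sum_i \xi_i
    \quad \text{subject to} \quad
    y_i \,(x_i' w - \gamma) \,\ge\, 1 - \xi_i, \qquad \xi_i \ge 0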

Slide 15: Why the Word "Support"?
All the examples on, or on the wrong side of, the two separating planes are the support vectors.
– We'd get the same answer if we deleted all of the non-support vectors!
– i.e., the "support vectors [examples]" support the solution

Slide 16: PAC and the Number of Support Vectors
The fewer the support vectors, the better the generalization will be.
Recall, non-support vectors are:
– Correctly classified
– Don't change the learned model if left out of the training set
So the expected generalization error can be bounded in terms of the number of support vectors.
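The slide's concluding bound is not in the transcript; the leave-one-out bound usually quoted at this point (a standard result, not necessarily the slide's exact statement) is:

    E[\text{test error}] \;\le\; \frac{E[\#\,\text{support vectors}]}{N}

where N is the number of training examples and the expectations are over random training sets.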

Slide 17: Finding Non-Linear Separating Surfaces
Map the inputs into a new space.
– Example: original features x1, x2
– Example: derived features x1, x2, x1², x2², x1*x2
Solve the SVM program in this new space.
– Computationally complex if there are many features
– But a clever trick exists
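A minimal sketch of this explicit mapping (illustrative only; the function name and the 2-feature assumption are ours, not the slide's):

    import numpy as np

    def quadratic_feature_map(X):
        """Map 2-D inputs (x1, x2) to the derived features
        (x1, x2, x1^2, x2^2, x1*x2) listed on the slide."""
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

    # A plane learned in the derived 5-D space corresponds to a quadratic
    # (non-linear) separating surface in the original 2-D space.
    X_demo = np.array([[1.0, 2.0], [0.5, -1.0]])
    print(quadratic_feature_map(X_demo))

With many original features this explicit construction blows up combinatorially, which is what motivates the "clever trick" on the next slides.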

Slide 18: The Kernel Trick
Optimization problems often/always have a "primal" and a "dual" representation.
– So far we've looked at the primal formulation
– The dual formulation is better for the case of a non-linear separating surface

Slide 19: Perceptrons Re-Visited
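The slide's equations are not in the transcript; the standard (primal) perceptron rule it presumably revisits predicts ŷ = sign(w·x - γ) and, on each mistake on example (x_i, y_i), updates

    w \;\leftarrow\; w + \eta\, y_i\, x_i

so the final weight vector is a sum of (scaled) misclassified training examples.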

Slide 20: Dual Form of the Perceptron Learning Rule
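Again the equations are missing; the standard dual form (a reconstruction) keeps one count α_i per training example instead of an explicit weight vector:

    w = \sum_i \alpha_i\, y_i\, x_i, \qquad
    \hat{y}(x) = \text{sign}\Big(\sum_j \alpha_j\, y_j\, (x_j \cdot x) \;-\; \gamma\Big)

and on a mistake on example i we simply set α_i ← α_i + 1. The training data now enter only through dot products x_j·x, which is exactly the property the kernel trick exploits.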

Slide 21: Primal versus Dual Space
Primal – "weight space"
– Weight the features to make the output decision
Dual – "training-examples space"
– Weight the distance (which is based on the features) to the training examples

Slide 22: The Dual SVM
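The slide's program is not in the transcript; the standard dual of the soft-margin SVM from slide 14 (a reconstruction) is:

    \max_{\alpha} \;\; \sum_i \alpha_i \;-\; \tfrac{1}{2} \sum_i \sum_j \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i \cdot x_j)
    \quad \text{subject to} \quad
    0 \le \alpha_i \le C, \qquad \sum_i \alpha_i\, y_i = 0

As with the dual perceptron, the training examples appear only through the dot products x_i·x_j.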

Slide 23: Non-Zero αi's
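The details are missing from the transcript; the standard point made under this heading is that only the support vectors receive non-zero dual weights:

    \alpha_i > 0 \;\;\text{only for the support vectors}, \qquad
    w = \sum_{i:\,\alpha_i > 0} \alpha_i\, y_i\, x_i

so the dual solution, and the final classifier, is built from the support vectors alone, matching slide 15.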

Slide 24: Generalizing the Dot Product
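The slide's content is not in the transcript; the usual step taken here is to replace every dot product in the dual program with a kernel function

    K(x_i, x_j) = g(x_i) \cdot g(x_j)

i.e., a function that computes the dot product in some derived feature space g() without ever constructing g(x) explicitly.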

Slide 25: The New Space for a Sample Kernel
Our new feature space (with 4 dimensions) - we're doing a dot product in it.
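The sample kernel itself is not in the transcript; one common choice that yields exactly a 4-dimensional derived space for 2-D inputs (an assumption on our part) is K(x, z) = (x·z)²:

    K(x, z) = (x_1 z_1 + x_2 z_2)^2 = g(x) \cdot g(z), \qquad
    g(x) = (x_1^2,\; x_1 x_2,\; x_2 x_1,\; x_2^2)

so computing K directly gives the dot product in the 4-dimensional space without ever forming g(x).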

Slide 26: Visualizing the Kernel
[Figure: the original input space, where the separating plane is non-linear, mapped by g() into a derived feature space ("new space"), where the same surface is linear; positive and negative examples appear as g(+) and g(-).]
g() is the feature transformation function.
The process is similar to what hidden units do in ANNs, but the kernel is user-chosen.

Slide 27: More Sample Kernels
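The slide's examples are not in the transcript; commonly listed kernels (standard examples, not necessarily the slide's own) include:

    \text{polynomial:}\;\; K(x, z) = (x \cdot z + 1)^d \qquad
    \text{Gaussian (RBF):}\;\; K(x, z) = \exp\!\left(-\frac{\|x - z\|^2}{2\sigma^2}\right) \qquad
    \text{sigmoid:}\;\; K(x, z) = \tanh(\kappa\, x \cdot z + c)

The sigmoid form is a legitimate kernel only for some parameter settings.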

Slide 28: What Makes a Kernel
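The slide's content is not in the transcript; the standard answer (Mercer's condition) is that K is a legitimate kernel, i.e., corresponds to a dot product g(x)·g(z) in some derived space, exactly when for every finite set of examples the Gram matrix

    G_{ij} = K(x_i, x_j)

is symmetric and positive semi-definite.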

Slide 29: Key SVM Ideas
Maximize the margin between positive and negative examples (connects to PAC theory).
Penalize errors in the non-separable case.
Only the support vectors contribute to the solution.
Kernels map examples into a new, usually non-linear space.
– We implicitly do dot products in this new space (in the "dual" form of the SVM program)
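To tie these ideas together in practice (not part of the original slides), here is a minimal sketch using scikit-learn's SVC, which solves the dual soft-margin program with a user-chosen kernel; the toy data are made up for illustration:

    import numpy as np
    from sklearn.svm import SVC

    # Tiny 2-class toy problem (assumed data, for illustration only).
    X = np.array([[0, 0], [1, 1], [1, 0], [0, 1],
                  [2, 2], [2, 3], [3, 2], [3, 3]], dtype=float)
    y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

    # C penalizes slack (errors in the non-separable case); the RBF kernel
    # does dot products implicitly in a non-linear derived space.
    clf = SVC(C=1.0, kernel="rbf", gamma="scale")
    clf.fit(X, y)

    print("support vectors per class:", clf.n_support_)
    print("prediction for [2.5, 2.5]:", clf.predict([[2.5, 2.5]]))

Only the support vectors are retained by the fitted model; deleting the other training examples and refitting would give the same decision surface.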