Slide 1: Machine learning, pattern recognition and statistical data modelling. Lecture 9: Support vector machines. Coryn Bailer-Jones.

Slide 2: Separating hyperplanes

Slide 3: Separable problem

Slide 4: Separating hyperplanes

Slide 5: Lagrangian formulation

Slide 6: Lagrangian dual

Slide 7: Applying the classifier

Slide 8: Non-separable case. The Lagrangian for the separable case now has no solution: the dual objective L_D grows arbitrarily large.
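For reference, a minimal sketch of the standard soft-margin formulation that handles the non-separable case (the notation is assumed here, since the equation slides are not reproduced in this transcript): slack variables ξ_i ≥ 0 allow points to violate the margin, and the primal problem becomes

\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\|w\|^{2} + C \sum_{i} \xi_{i}
\quad \text{subject to} \quad y_{i}\,(w^{\top}x_{i} + b) \ge 1 - \xi_{i}, \;\; \xi_{i} \ge 0,

so the cost parameter C trades off margin width against training errors.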

Slide 9: Non-separable case

Slide 10: Non-separable case

Slide 11: Non-separable case

Slide 12: Nonlinear functions via kernels

Slide 13: Nonlinear functions via kernels
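The kernel slides themselves are not captured in this transcript; as an assumed reference point for the γ parameter mentioned on the next slide, the radial basis function (RBF) kernel, which is the libsvm/e1071 default, is

K(x, x') = \exp\!\left(-\gamma \,\|x - x'\|^{2}\right),

so a larger γ corresponds to a shorter-range, more local kernel.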

Slide 14: Support Vector Machines
● for nonlinear, multidimensional two-class problems
● implicit projection into a higher (possibly infinite) dimensional space
  – where we hope to get closer to linear separation
● free parameters
  – fitting them is a convex quadratic programming problem
  – so the solution is unique (it still needs numerical methods)
● the solution is a combination of the support vectors only
  – the solution is sparse in these, so it is suitable for high-dimensional problems with limited training data (it does not have the scaling problem of an MLP)
● two control parameters (see the sketch below)
  – C controls the regularization; generally, a larger C means the error term dominates, i.e. less regularization
  – 1/γ is a correlation/relevance length scale in the data space
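A minimal sketch of how the two control parameters map onto the svm{e1071} interface to libsvm; the two-class data here are made up for illustration and are not the lecture's example:

# Minimal sketch: RBF-kernel SVM with the two control parameters,
# cost (= C) and gamma; synthetic two-class data for illustration only.
library(e1071)
set.seed(1)
n <- 100
x <- rbind(matrix(rnorm(2 * n), ncol = 2),            # class -1
           matrix(rnorm(2 * n, mean = 2), ncol = 2))  # class +1
y <- factor(c(rep(-1, n), rep(1, n)))
# larger cost  => margin violations penalised more heavily (less regularization)
# larger gamma => shorter kernel length scale
fit  <- svm(x, y, kernel = "radial", cost = 1, gamma = 0.5)
pred <- predict(fit, x)
table(pred, y)    # training-set confusion matrix
fit$tot.nSV       # number of support vectors in the solution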

Slide 15: The margin, cost and number of SVs in the non-separable case

We are still trying to maximize the margin width, even though the margin is no longer defined as the region void of data points (as in the separable case). Rather, the margin is defined as the region which contains all points relevant to (i.e. included in) the solution, that is, the support vectors, even if they are on the correct side of the separating hyperplane. However, support vectors only contribute to the errors (i.e. have slack ξ > 1) if they are on the wrong side of the separating hyperplane.

We are maximizing the margin width with a cost penalty. If we increase the cost, C, then errors incur a higher penalty and so tend to make the margin smaller. If the separating hyperplane were fixed, this would generally reduce the number of support vectors. But of course it is not fixed (we change C in order to find a different solution!), so the support vectors, and their number, will change, and the new margin is not necessarily narrower. Thus the number of support vectors and the size of the margin in the solution do not have a monotonic dependence on C (although they might for some problems).

Generally, a larger cost produces a more flexible solution (as we are encouraging fewer errors) and thus a smaller error on the training data, and we would normally associate a more complex solution with more support vectors. But this depends very much on the specific problem, i.e. how well the data can be separated. Moreover, there is no guarantee that a larger cost produces a better solution.
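As an illustration of the non-monotonic behaviour described above, a small sketch (reusing the synthetic x and y from the previous sketch, so not the lecture's data) that refits the machine for several values of the cost and reports the number of support vectors:

# Vary the cost C and record how many support vectors each solution uses;
# as the slide notes, the dependence is problem-specific and need not be
# monotonic.
costs <- c(0.01, 0.1, 1, 10, 100)
nsv <- sapply(costs, function(C) {
  svm(x, y, kernel = "radial", cost = C, gamma = 0.5)$tot.nSV
})
data.frame(cost = costs, n_support_vectors = nsv)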

Slide 16: SVM application: star-galaxy separation with PS1
[Figure: blue = stellar library, red = galaxy library]
● 4 noise-free simulated colours
● package libsvm
  – C++, Java
  – R interface via svm{e1071}
● see the R scripts for example and syntax (a hedged sketch of the syntax follows below)
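The lecture's own R scripts are not included in this transcript; the following is a hedged sketch of the svm{e1071} syntax it refers to, with made-up colour columns standing in for the four simulated PS1 colours:

# Hypothetical stand-in data: four 'colour' features and a star/galaxy label.
# The column names and distributions are invented, not the lecture's simulation.
library(e1071)
set.seed(2)
n <- 200
colours <- data.frame(
  g.r = c(rnorm(n, 0.6, 0.2), rnorm(n, 1.0, 0.3)),
  r.i = c(rnorm(n, 0.2, 0.1), rnorm(n, 0.5, 0.2)),
  i.z = c(rnorm(n, 0.1, 0.1), rnorm(n, 0.3, 0.2)),
  z.y = c(rnorm(n, 0.0, 0.1), rnorm(n, 0.2, 0.2)),
  class = factor(rep(c("star", "galaxy"), each = n))
)
# formula interface: fit an RBF-kernel SVM and inspect the predictions
model <- svm(class ~ g.r + r.i + i.z + z.y, data = colours,
             kernel = "radial", cost = 10, gamma = 1)
table(predicted = predict(model, colours), truth = colours$class)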

Slide 17: SVM application. [Figure shown for fixed colours i-z = 0.2 and z-y = 0.2]

Slide 18: Multiclass problems
● with K classes, either solve
  – K models, each of type 1 class vs. the combined K-1 classes, or
  – K(K-1)/2 models, each with 1 class vs. 1 class (“pairwise coupling”); see the sketch below
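As a minimal sketch of the pairwise-coupling route in svm{e1071} (libsvm fits the K(K-1)/2 one-vs-one machines internally), using the standard iris data rather than anything from the lecture:

# 3 classes => libsvm internally fits 3 pairwise binary SVMs and combines
# their votes into a single multiclass prediction.
library(e1071)
data(iris)
m <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
table(predicted = predict(m, iris), truth = iris$Species)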

Slide 19: Summary
● SVMs operate on the principle of separating hyperplanes
  – maximize the margin
  – only data points near the margin are relevant (the support vectors)
● nonlinearity via kernels
  – possible because the training data appear only as dot products
  – project the data into a higher-dimensional space where more separation is possible
  – the projection is only implicit (a big computational saving)
● unique solution (convex objective function)
  – optimization via the Lagrangian dual
● various extensions
  – multiclass case: no natural probabilistic interpretation, but we can remap the outputs
  – regression
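As a compact reference for the “dot products” and “Lagrangian dual” points above (notation assumed, since the lecture's equation slides are not reproduced): the soft-margin dual objective is

L_D(\alpha) = \sum_{i} \alpha_{i} - \tfrac{1}{2}\sum_{i,j} \alpha_{i}\alpha_{j}\, y_{i} y_{j}\, x_{i}^{\top} x_{j},
\quad \text{maximized subject to} \quad 0 \le \alpha_{i} \le C, \;\; \sum_{i} \alpha_{i} y_{i} = 0.

The data enter only through the dot products x_i^{\top} x_j, so replacing them with a kernel K(x_i, x_j) gives the nonlinear machine; the resulting classifier is sign(\sum_i \alpha_i y_i K(x_i, x) + b), where only the support vectors have \alpha_i > 0.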