Support Vector Machine (SVM) Classification

Support Vector Machine (SVM) Classification (Greg Grudic, Intro AI)

Last Class
- Linear separating hyperplanes for binary classification
- Rosenblatt's Perceptron Algorithm: based on gradient descent; convergence is theoretically guaranteed if the data are linearly separable, but there are infinitely many solutions
- For nonlinear data: map the data into a nonlinear space where it is linearly separable (or almost); however, convergence is still not guaranteed…
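As a quick refresher on the last class, here is a minimal perceptron sketch (not part of the original slides); the update rule is the standard Rosenblatt one, while the data layout, epoch limit, and stopping rule are illustrative assumptions.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Rosenblatt's perceptron for labels y in {-1, +1}. Returns (w, b).
    Converges only if the data are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # gradient-descent-style update
                b += yi
                errors += 1
        if errors == 0:                  # a separating hyperplane was found
            break
    return w, b
```

Note that any hyperplane that separates the data stops the updates, which is why the perceptron returns one of infinitely many solutions rather than an "optimal" one.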

Questions?

Why Classification? (Diagram: an agent acting in an uncertain world. Signals from the world are sensed; computation turns them into state symbols (the grounding problem, not typically addressed in CS), which feed decisions/planning and, in turn, actions back on the world.)

The Problem Domain for Project Test 1: Identifying (and Navigating) Paths (Figure: path and non-path image data are used to construct a classifier, which produces a labeled image.)

Today’s Lecture Goals
- Support Vector Machine (SVM) Classification: another algorithm for linear separating hyperplanes
- A good text on SVMs: Bernhard Schölkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002

Support Vector Machine (SVM) Classification
- Classification as a problem of finding optimal (canonical) linear hyperplanes
- Optimal linear separating hyperplanes: in input space, or in kernel space (where the resulting boundary in input space can be nonlinear)

Linear Separating Hyper-Planes: How many lines can separate these points? Which line should we use? (Figure: two classes of points with several candidate separating lines; one candidate is marked "NO!")

Initial Assumption: Linearly Separable Data

Linear Separating Hyper-Planes (Figure: a hyperplane separating the two classes.)

Linear Separating Hyper-Planes
Given data (x_1, y_1), …, (x_N, y_N) with labels y_i in {-1, +1}, finding a separating hyperplane can be posed as a constraint satisfaction problem (CSP): w^T x_i + b > 0 when y_i = +1, and w^T x_i + b < 0 when y_i = -1. Or, equivalently: y_i (w^T x_i + b) > 0 for all i. If the data are linearly separable, there are infinitely many hyperplanes that satisfy this CSP.
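To make the CSP concrete, here is a small sketch that checks whether a candidate hyperplane (w, b) satisfies y_i (w^T x_i + b) > 0 for every training point; the toy data and the particular (w, b) values are assumptions for illustration, and they show that more than one hyperplane satisfies the constraints.

```python
import numpy as np

def separates(w, b, X, y):
    """CSP check: y_i * (w^T x_i + b) > 0 for every training point."""
    return bool(np.all(y * (X @ w + b) > 0))

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])

# Many different (w, b) pairs satisfy the constraints:
print(separates(np.array([1.0, 1.0]), 0.0, X, y))   # True
print(separates(np.array([2.0, 0.5]), 0.1, X, y))   # True
```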

The Margin of a Classifier
- Take any hyper-plane (P0) that separates the data
- Put a parallel hyper-plane (P1) on a point in class +1 closest to P0
- Put a second parallel hyper-plane (P2) on a point in class -1 closest to P0
- The margin (M) is the perpendicular distance between P1 and P2

Calculating the Margin of a Classifier (Figure: P0 is any separating hyperplane; P1 is parallel to P0, passing through the closest point in one class; P2 is parallel to P0, passing through the point closest to the opposite class; the margin M is the distance measured along a line perpendicular to P1 and P2.)

SVM Constraints on the Model Parameters
Model parameters (w, b) must be chosen such that w^T x + b = +1 for points x on P1 and w^T x + b = -1 for points x on P2. For any P0, these constraints are always attainable (by rescaling w and b). Given the above, the linear separating boundary lies halfway between P1 and P2 and is given by w^T x + b = 0. Resulting classifier: y = sign(w^T x + b).

Remember the signed distance from a point to a hyperplane: for a hyperplane defined by w^T x + b = 0, the signed distance from a point x is d(x) = (w^T x + b) / ||w||.

Calculating the Margin (1)

Calculating the Margin (2)
Signed distance from a point on P1 to the boundary: (w^T x + b) / ||w|| = +1 / ||w||; from a point on P2: -1 / ||w||. Take absolute values to get the unsigned margin: M = 2 / ||w||.
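A short numeric sketch of these two steps (the toy values of w, b, and x are assumptions): the signed-distance formula and the canonical margin M = 2/||w||.

```python
import numpy as np

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w^T x + b = 0."""
    return (w @ x + b) / np.linalg.norm(w)

def margin(w):
    """Margin of a canonical hyperplane (closest points satisfy w^T x + b = +/-1)."""
    return 2.0 / np.linalg.norm(w)

w = np.array([3.0, 4.0])                                # ||w|| = 5
print(margin(w))                                        # 0.4
print(signed_distance(w, -1.0, np.array([1.0, 0.5])))   # (3 + 2 - 1) / 5 = 0.8
```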

Different P0’s have Different Margins (Figures: several choices of separating hyperplane P0, each with its parallel planes P1 and P2 through the closest point of each class; each P0 yields a different margin M, measured along a line perpendicular to P1 and P2.)

How Do SVMs Choose the Optimal Separating Hyperplane (boundary)? Find the hyperplane (w, b) that maximizes the margin M! (Figure: the maximum-margin hyperplane, with margin M measured along a line perpendicular to P1 and P2.)

SVM: Constraint Optimization Problem
Given data (x_1, y_1), …, (x_N, y_N): minimize (1/2) ||w||^2 subject to y_i (w^T x_i + b) >= 1 for i = 1, …, N. (Maximizing the margin M = 2/||w|| is equivalent to minimizing (1/2) ||w||^2.) The Lagrange function formulation is used to solve this minimization problem.
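As an illustration, the sketch below approximately solves this hard-margin problem with scikit-learn's SVC using a very large C; the library choice, the toy data, and the value of C are assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin problem:
#   minimize (1/2)||w||^2  subject to  y_i (w^T x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # optimal w
b = clf.intercept_[0]     # optimal b
print("w =", w, "b =", b)
print("margin M =", 2.0 / np.linalg.norm(w))
print("constraints:", y * (X @ w + b))   # all >= 1 (up to numerical tolerance)
```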

The Lagrange Function Formulation
For every constraint we introduce a Lagrange multiplier alpha_i >= 0. The Lagrangian is then defined by L(w, b, alpha) = (1/2) ||w||^2 - sum_i alpha_i [ y_i (w^T x_i + b) - 1 ], where the primal variables are (w, b) and the dual variables are the alpha_i. Goal: minimize the Lagrangian w.r.t. the primal variables, and maximize it w.r.t. the dual variables.

Derivation of the Dual Problem
At the saddle point (extremum w.r.t. the primal variables), dL/dw = 0 and dL/db = 0. This gives the conditions w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0. Substitute these into L(w, b, alpha) to get the dual problem.

Using the Lagrange Function Formulation, we get the Dual Problem
Maximize W(alpha) = sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j (x_i^T x_j), subject to alpha_i >= 0 for all i and sum_i alpha_i y_i = 0.
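Here is one way to solve this dual quadratic program numerically, sketched with scipy's SLSQP solver; the toy data and the solver choice are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

# Q_ij = y_i y_j (x_i . x_j)
Q = (y[:, None] * X) @ (y[:, None] * X).T

# Dual objective, negated because scipy minimizes:
#   W(alpha) = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j x_i.x_j
def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * N                               # alpha_i >= 0

res = minimize(neg_dual, np.zeros(N), bounds=bounds,
               constraints=constraints, method="SLSQP")
alpha = res.x
print("alpha =", np.round(alpha, 4))   # nonzero only for the support vectors
```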

Properties of the Dual Problem
- Solving the dual gives a solution to the original constrained optimization problem
- For SVMs, the dual problem is a quadratic optimization problem, which has a globally optimal solution
- It gives insights into the nonlinear formulation of SVMs

Support Vector Expansion (1)
The optimal weight vector is expanded in terms of the training points: w = sum_i alpha_i y_i x_i, where only the support vectors have alpha_i > 0. The offset b is also computed from the optimal dual variables (e.g., from any support vector x_i: b = y_i - w^T x_i).

Support Vector Expansion (2)
Substitute the expansion of w into the classifier: f(x) = sign(w^T x + b), or, equivalently, f(x) = sign( sum_i alpha_i y_i (x_i^T x) + b ), so predictions depend on the training data only through inner products with the support vectors.
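A sketch of the expansion in code (scikit-learn's SVC exposes alpha_i * y_i for the support vectors via dual_coef_; the toy data are assumptions): it recovers w from the support vectors and classifies a new point using only inner products with them.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only.
alpha_times_y = clf.dual_coef_[0]
sv = clf.support_vectors_

# Support vector expansion: w = sum_i alpha_i y_i x_i
w = alpha_times_y @ sv
b = clf.intercept_[0]
print(np.allclose(w, clf.coef_[0]))          # True: the expansion recovers w

# Classify a new point using only inner products with the support vectors.
x_new = np.array([1.0, 1.5])
f = np.sign(alpha_times_y @ (sv @ x_new) + b)
print(int(f))
```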

What are the Support Vectors? (Figure: the maximized margin; the support vectors are the training points lying exactly on P1 and P2, i.e., those with alpha_i > 0.)

Why do we want a model with only a few SVs? Leaving out an example that does not become an SV gives the same solution! Theorem (Vapnik and Chervonenkis, 1974): let #SV(N) be the number of SVs obtained by training on N examples randomly drawn from P(X, Y), and let E denote expectation. Then E[probability of test error] <= E[#SV(N)] / N.
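As a rough illustration of the bound, the sketch below trains a linear SVM on synthetic data and reports #SV / N; the dataset, the soft-margin C value, and the use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Synthetic two-class data (illustrative only).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

n_sv = len(clf.support_)     # number of support vectors
bound = n_sv / len(X)        # #SV / N: estimate of the leave-one-out error bound
print(f"{n_sv} support vectors out of {len(X)} examples; bound = {bound:.3f}")
```

Fewer support vectors means a smaller value of this bound, which is one reason a sparse SVM solution is desirable.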

Next Class: nonlinear data. Now a quiz!