Linear hyperplanes as classifiers Usman Roshan

Hyperplane separators

Hyperplane separators (figure: a separating hyperplane with normal vector w)

Hyperplane separators (figure: a point x, its projection x_p onto the plane, the normal vector w, and the distance r)

Nearest mean as hyperplane separator (figure: class means m1 and m2)

Nearest mean as hyperplane separator (figure: class means m1 and m2; the separating plane passes through the midpoint m1 + (m2 - m1)/2)

Nearest mean as hyperplane separator (figure: the hyperplane between the class means m1 and m2)
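The slide figures are not transcribed, but the nearest-mean rule itself can be written as a linear classifier. A minimal NumPy sketch (function and variable names are my own): classifying x by its nearer class mean is equivalent to thresholding w·x + b with w = m2 - m1 and b = (||m1||^2 - ||m2||^2)/2.

```python
import numpy as np

def nearest_mean_hyperplane(X1, X2):
    """Return (w, b) such that w.x + b > 0 iff x is closer to mean(X2) than to mean(X1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m2 - m1                            # normal vector points from m1 toward m2
    b = (m1 @ m1 - m2 @ m2) / 2.0          # the plane passes through the midpoint (m1 + m2) / 2
    return w, b

# toy data: two Gaussian blobs
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))
w, b = nearest_mean_hyperplane(X1, X2)
x = np.array([2.5, 2.8])
print("class 2" if x @ w + b > 0 else "class 1")   # x is nearer to m2, so: class 2
```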

Separating hyperplanes

Perceptron

Gradient descent

Perceptron training

Perceptron training

Perceptron training by gradient descent
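The update rules on this and the previous slides are not transcribed. As a minimal sketch (function and variable names are my own), the classic perceptron update can be read as stochastic gradient descent on the perceptron loss max(0, -y(w·x + b)):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Perceptron training by stochastic gradient descent on max(0, -y (w.x + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:     # misclassified: the loss gradient is (-yi * xi, -yi)
                w += lr * yi * xi          # step in the negative gradient direction
                b += lr * yi
    return w, b

# linearly separable toy data with labels in {-1, +1}
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))                  # matches y: [1, 1, -1, -1]
```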

Obtaining probability from hyperplane distances
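The slide body is not transcribed; one standard recipe (an assumption on my part, in the spirit of logistic regression and Platt scaling) is to pass the signed distance to the hyperplane through a sigmoid:

```python
import numpy as np

def hyperplane_probability(x, w, b, a=1.0, c=0.0):
    """Map the signed distance to the plane to a probability in (0, 1) via a sigmoid.
    The scale a and offset c would normally be fit on held-out data (Platt scaling)."""
    d = (x @ w + b) / np.linalg.norm(w)    # signed distance to the hyperplane
    return 1.0 / (1.0 + np.exp(-(a * d + c)))

w, b = np.array([1.0, -1.0]), 0.0
print(hyperplane_probability(np.array([3.0, 0.0]), w, b))   # well above 0.5: confidently on the + side
```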

Multilayer perceptrons: many perceptrons with a hidden layer. Can solve XOR and model non-linear functions. Leads to a non-convex optimization problem, solved by back propagation.

Back propagation. Illustration of back propagation: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html. Many local minima.
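Since the slide only links to an animation, here is a minimal back-propagation sketch (architecture, learning rate, and iteration count are my own choices) for a one-hidden-layer network on XOR:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])           # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)         # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)         # 4 hidden units -> 1 output
lr = 1.0

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # backward pass: propagate the squared-error gradient layer by layer
    dy = (y - t) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ dy); b2 -= lr * dy.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(y.round(2))   # usually close to [0, 1, 1, 0]; back propagation can get stuck in local minima
```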

Training issues for multilayer perceptrons: convergence rate (momentum), adaptive learning, and overtraining (early stopping).
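A small sketch of the momentum idea from the list above, on a toy quadratic (the problem and step sizes are my own choices):

```python
import numpy as np

# gradient descent with momentum on f(p) = 0.5 * p.T A p, a badly conditioned quadratic
A = np.diag([1.0, 50.0])
p = np.array([1.0, 1.0])
velocity = np.zeros(2)
momentum, lr = 0.9, 0.01
for _ in range(200):
    g = A @ p                               # gradient of f at p
    velocity = momentum * velocity - lr * g # velocity accumulates past gradients, damping oscillation
    p = p + velocity
print(p)                                    # close to the minimum at the origin
```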

Separating hyperplanes (figure: two classes of points in the plane). For two sets of points there are many hyperplane separators. Which one should we choose for classification? In other words, which one is most likely to produce the least error?

Separating hyperplanes. The best hyperplane is the one that maximizes the minimum distance of all training points to the plane (Learning with Kernels, Schölkopf and Smola, 2002). Its expected error is at most the fraction of misclassified points plus a complexity term (Learning with Kernels, Schölkopf and Smola, 2002).

Margin of a plane. We define the margin as the minimum distance of the training points to the plane (the distance to the closest point). The optimally separating plane is the one with the maximum margin.

Optimally separating hyperplane (figure: the maximum-margin hyperplane with normal vector w)

How do we find the optimally separating hyperplane? Recall the distance of a point to the plane defined earlier.

Hyperplane separators (figure, repeated from earlier: a point x, its projection x_p onto the plane, the normal vector w, and the distance r)

Distance of a point to the separating plane. The distance to the plane is r = (w·x + b)/||w||, or equivalently r = y(w·x + b)/||w||, where y is -1 if the point is on the left side of the plane and +1 otherwise.
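A small sketch of the two distance expressions above (the function name is my own):

```python
import numpy as np

def distance_to_plane(x, w, b, y=None):
    """Distance of point x to the hyperplane w.x + b = 0.
    With a label y in {-1, +1}, the signed version y (w.x + b) / ||w|| is positive
    exactly when x lies on the correct side of the plane for its label."""
    d = (x @ w + b) / np.linalg.norm(w)
    return d if y is None else y * d

w, b = np.array([1.0, -1.0]), 0.5
print(distance_to_plane(np.array([2.0, 0.0]), w, b))         # about 1.77: x is on the w side
print(distance_to_plane(np.array([2.0, 0.0]), w, b, y=-1))   # negative: wrong side for label -1
```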

Support vector machine: optimally separating hyperplane. The distance of a point x (with label y) to the hyperplane is y(w·x + b)/||w||. We want this to be at least some value for every training point. By scaling w we can obtain infinitely many equivalent solutions, so we require y_i(w·x_i + b) ≥ 1 for all i. Minimizing ||w|| then maximizes the distance, which gives us the SVM optimization problem.

Support vector machine: optimally separating hyperplane. The SVM optimization criterion: minimize (1/2)||w||^2 subject to y_i(w·x_i + b) ≥ 1 for all i. We can solve this with Lagrange multipliers, which tells us that w = Σ_i α_i y_i x_i. The x_i for which α_i is non-zero are called support vectors.
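A quick way to see the support-vector expansion numerically: scikit-learn's SVC (built on LIBSVM) exposes α_i y_i for the support vectors as dual_coef_, and for a linear kernel the recovered w matches coef_. The toy data below is my own.

```python
import numpy as np
from sklearn.svm import SVC

# toy separable data, labels in {-1, +1}
X = np.array([[2.0, 2.0], [1.5, 2.5], [3.0, 3.0], [-1.0, -1.5], [-2.0, -0.5], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

svc = SVC(kernel="linear", C=1e6).fit(X, y)       # a large C approximates the hard margin
# dual_coef_ holds alpha_i * y_i for the support vectors only
w_from_duals = svc.dual_coef_ @ svc.support_vectors_
print(np.allclose(w_from_duals, svc.coef_))       # True: w is a combination of support vectors
print(svc.support_)                               # indices of the support vectors (non-zero alpha_i)
```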

Support vector machine: optimally separating hyperplane

Inseparable case. What if there is no separating hyperplane? For example the XOR function. One solution: consider all hyperplanes and select the one with the minimal number of misclassified points. Unfortunately this is NP-complete (see the paper by Ben-David, Eiron, and Long on the course website), and even NP-complete to polynomially approximate (Learning with Kernels, Schölkopf and Smola, and the paper on the website).

Inseparable case. But if we measure error as the sum of the distances of misclassified points to the plane, then we can solve for a support vector machine in polynomial time. Roughly speaking, the margin error bound theorem applies (Theorem 7.3, Schölkopf and Smola). Note that the total distance error can be considerably larger than the number of misclassified points.

Optimally separating hyperplane with errors (figure: hyperplane with normal vector w)

Support vector machine: optimally separating hyperplane. In practice we allow for error terms in case there is no separating hyperplane.
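The modified problem is not transcribed; the standard soft-margin formulation it refers to, with slack variables ξ_i and a trade-off parameter C, is:

```latex
\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i (w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
```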

SVM software. There is plenty of SVM software out there. Two popular packages: SVM-light and LIBSVM.
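Both packages read the same sparse text format (one example per line: the label followed by index:value pairs with 1-based indices). A small sketch of writing such a file from NumPy arrays (the file name is my own); the file can then be passed to svm_learn (SVM-light) or svm-train (LIBSVM):

```python
import numpy as np

X = np.array([[0.5, 0.0, 1.2], [0.0, 2.0, 0.0]])
y = np.array([1, -1])
with open("train.dat", "w") as f:
    for label, row in zip(y, X):
        # keep only non-zero features, with 1-based indices
        feats = " ".join(f"{j + 1}:{v}" for j, v in enumerate(row) if v != 0)
        f.write(f"{label} {feats}\n")
# train.dat now contains, e.g., "1 1:0.5 3:1.2" on its first line
```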

Kernels. What if no separating hyperplane exists? Consider the XOR function. In a higher dimensional space we can find a separating hyperplane. Example with SVM-light.
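A tiny illustration of the point above: XOR is not linearly separable in two dimensions, but adding the product feature x1·x2 as a third coordinate (one simple choice of mapping) makes it separable:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                     # XOR labels

# lift to 3 dimensions with the product feature x1 * x2
Z = np.column_stack([X, X[:, 0] * X[:, 1]])

# in the lifted space the plane w.z + b = 0 with w = (1, 1, -2), b = -0.5 separates the classes
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(Z @ w + b))                        # matches y: [-1, 1, 1, -1]
```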

Kernels. The solution to the SVM is obtained by applying the KKT conditions (a generalization of Lagrange multipliers). The problem to solve becomes the dual problem.
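The problem itself is not transcribed; the standard soft-margin dual it refers to is shown below (the hard-margin case drops the upper bound C):

```latex
\max_{\alpha} \ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0 .
```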

Kernels. The previous problem can in turn be solved, again using the KKT conditions. The dot product can be replaced by the matrix K(i, j) = x_i^T x_j, or more generally by any positive definite matrix K.

Kernels. With the kernel approach we can avoid explicit calculation of features in high dimensions. How do we find the best kernel? Multiple Kernel Learning (MKL) addresses this by learning K as a linear combination of base kernels.
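A short sketch of the kernel trick described above: the kernel matrix is computed directly from kernel evaluations, never from explicit high-dimensional features. For instance, the quadratic kernel (x_i·x_j)^2 equals the dot product of the explicit degree-2 monomial feature maps:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# kernel trick: K computed from dot products in the original space
K = (X @ X.T) ** 2

# the same matrix via the explicit degree-2 feature map phi(x) = outer(x, x).flatten()
Phi = np.array([np.outer(x, x).ravel() for x in X])
print(np.allclose(K, Phi @ Phi.T))               # True: no explicit features needed
```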