Fun with Hyperplanes: Perceptrons, SVMs, and Friends

Adapted from slides by Ryan Gabbard, CIS 391 – Introduction to Artificial Intelligence.

Universal Machine Learning Diagram
Naïve Bayes classifiers are one example of this general picture.

Generative v. Discriminative Models
Generative question: "How can we model the joint distribution of the classes and the features?"
But why waste energy on stuff we don't care about? Let's optimize the job we're trying to do directly!
Discriminative question: "What features distinguish the classes from one another?"

Example
Modeling what sort of bizarre distribution produced these training points is hard, but distinguishing the classes is a piece of cake! (Chart from MIT tech report #507, Tony Jebara.)

Linear Classification

Representing Lines
How do we represent a line? In general, a hyperplane is defined by a weight vector w and an offset b: it is the set of points x with w · x + b = 0.
Why bother with this weird representation?
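As a compact restatement (a standard formulation, added here; it is not spelled out in the transcript), the hyperplane and the decision rule it induces are:

    \[
      \mathbf{w}\cdot\mathbf{x} + b = 0,
      \qquad
      \hat{y} = \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b).
    \]

The payoff of this representation is that deciding which side of the hyperplane a point falls on costs only a single dot product.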

Projections
Alternate intuition: recall that the dot product of two vectors is simply the product of their lengths and the cosine of the angle between them.
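In symbols (the standard identity, added here for reference):

    \[ \mathbf{w}\cdot\mathbf{x} = \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\theta, \]

so the sign of w · x tells us which side of the hyperplane x projects onto.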

Now classification is easy! But... how do we learn this mysterious model vector?
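A minimal sketch of the classification step in Python (assuming a learned model vector w and a feature vector x; the names and the bias default are illustrative, not from the slides):

    import numpy as np

    def classify(w, x, b=0.0):
        # Which side of the hyperplane w.x + b = 0 does x fall on?
        return 1 if np.dot(w, x) + b >= 0 else -1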

Perceptron Learning Algorithm
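The slide's algorithm box does not survive in this transcript; here is a minimal sketch of the standard perceptron update rule in Python (labels assumed to be in {-1, +1}; variable names are my own):

    import numpy as np

    def perceptron_train(X, y, epochs=10):
        # X: (n_samples, n_features) array of instances; y: labels in {-1, +1}
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:   # mistake: wrong side or exactly on the hyperplane
                    w = w + y_i * x_i           # move the hyperplane toward the example
        return w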

Perceptron Update Example I
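The example itself is a figure that does not survive in the transcript; a made-up update of the same kind: suppose w = (1, 0) and we see a negative example x = (0, 1) that the current model does not score as negative. The update w ← w + y·x gives

    \[ \mathbf{w} \leftarrow (1,0) + (-1)\cdot(0,1) = (1,-1), \]

which rotates the hyperplane so that this point is now classified correctly.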

Perceptron Update Example II

Properties of the Simple Perceptron
You can prove that if it's possible to separate the data with a hyperplane (i.e. if the data are linearly separable), then the algorithm will converge to a separating hyperplane.
But what if it isn't separable? Then the perceptron is very unstable and bounces all over the place.

Voted Perceptron
Works just like a regular perceptron, except you keep track of all the intermediate models you created.
When you want to classify something, you let each of the (many, many) models vote on the answer and take the majority (see the sketch below).
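A minimal sketch of the idea, using a simple majority vote as described on the slide (the classic Freund and Schapire formulation additionally weights each model by how many examples it survived; variable names are my own):

    import numpy as np

    def voted_perceptron_train(X, y, epochs=10):
        w = np.zeros(X.shape[1])
        models = []
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:
                    w = w + y_i * x_i
                    models.append(w.copy())    # remember every intermediate model
        return models

    def voted_predict(models, x):
        # every intermediate model gets one vote; take the majority
        vote = sum(np.sign(np.dot(w, x)) for w in models)
        return 1 if vote >= 0 else -1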

Properties of Voted Perceptron
Simple! Much better generalization performance than the regular perceptron (for later: almost as good as SVMs).
Also for later: it can use the 'kernel trick'.
Training is as fast as the regular perceptron, but run-time is slower.

Averaged Perceptron
Extremely simple! Return as your final model the average of all your intermediate models.
It's an approximation to the voted perceptron, and it's nearly as fast to train and exactly as fast to run as the regular perceptron (sketch below).
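A minimal sketch, assuming the same setup as the perceptron code above (accumulating the current model after every example is one common way to compute "the average of all intermediate models"):

    import numpy as np

    def averaged_perceptron_train(X, y, epochs=10):
        w = np.zeros(X.shape[1])
        w_sum = np.zeros(X.shape[1])
        n_seen = 0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:
                    w = w + y_i * x_i
                w_sum += w          # accumulate the current model after every example
                n_seen += 1
        return w_sum / n_seen       # one averaged model: as fast to run as a plain perceptron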

What's wrong with these hyperplanes?

They're unjustifiably biased!

A less biased choice

Margin
The margin is the distance from the hyperplane to the closest point in the training data.
We tend to get better generalization to unseen data if we choose the separating hyperplane which maximizes the margin.
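In symbols (a standard definition, added here for reference): for a separating hyperplane w · x + b = 0, the geometric margin is

    \[ \gamma = \min_i \frac{y_i(\mathbf{w}\cdot\mathbf{x}_i + b)}{\|\mathbf{w}\|}, \]

and the maximum-margin hyperplane is the one that makes γ as large as possible.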

Support Vector Machines
Another learning method, one which explicitly calculates the maximum-margin hyperplane by solving a gigantic quadratic programming problem.
Generally considered the highest-performing current machine learning technique, but it's relatively slow and very complicated.
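The quadratic program in question (the standard hard-margin formulation; not spelled out on the slide) is:

    \[
      \min_{\mathbf{w},\,b} \ \tfrac{1}{2}\|\mathbf{w}\|^2
      \quad \text{subject to} \quad
      y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \text{ for all } i.
    \]

Maximizing the margin is equivalent to minimizing the length of w under these constraints.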

Margin-Infused Relaxed Algorithm (MIRA)
Multiclass: each class has a prototype vector.
Classify an instance by choosing the class whose prototype vector has the greatest dot product with the instance.
During training, when updating, make the 'smallest' (in a sense) change to the prototype vectors which guarantees correct classification by a minimum margin.
So MIRA pays attention to the margin directly.
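A sketch of the "smallest change" update for the prototype-vector formulation (this is one standard way MIRA is presented; treat the details as illustrative rather than as the exact formulation on the slide):

    import numpy as np

    def mira_update(prototypes, x, true_class):
        # prototypes: dict mapping class label -> prototype vector
        scores = {c: np.dot(w, x) for c, w in prototypes.items()}
        guess = max(scores, key=scores.get)
        if guess != true_class:
            # smallest change (in squared norm) to both prototypes that makes
            # the true class beat the guessed class by a margin of 1
            tau = (np.dot(prototypes[guess] - prototypes[true_class], x) + 1.0) \
                  / (2.0 * np.dot(x, x))
            prototypes[true_class] = prototypes[true_class] + tau * x
            prototypes[guess] = prototypes[guess] - tau * x
        return prototypes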

What if it isn't separable?

Project it to someplace where it is!

Kernel Trick
If our data isn't linearly separable, we can define a projection φ that maps it into a much higher-dimensional feature space where it is.
For algorithms in which everything can be expressed in terms of dot products of instances (SVM, voted perceptron, MIRA), this can be done efficiently using something called the 'kernel trick': replace every dot product x · z with a kernel function K(x, z) = φ(x) · φ(z) that can be computed without ever constructing φ(x) explicitly (sketch below).
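A minimal sketch of a kernelized perceptron (my own naming; the RBF kernel is just one common choice), showing how the learner only ever touches the data through K:

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        # K(x, z) = exp(-gamma * ||x - z||^2): a dot product in an implicit feature space
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def kernel_perceptron_train(X, y, kernel=rbf_kernel, epochs=10):
        # The model vector lives in feature space, so we represent it implicitly
        # as a weighted sum of training instances: w = sum_i alpha_i * y_i * phi(x_i).
        alpha = np.zeros(len(X))
        for _ in range(epochs):
            for j, (x_j, y_j) in enumerate(zip(X, y)):
                score = sum(alpha[i] * y[i] * kernel(X[i], x_j) for i in range(len(X)))
                if y_j * score <= 0:
                    alpha[j] += 1.0     # mistake: give x_j more weight in the implicit model
        return alpha

    def kernel_predict(alpha, X, y, x_new, kernel=rbf_kernel):
        score = sum(alpha[i] * y[i] * kernel(X[i], x_new) for i in range(len(X)))
        return 1 if score >= 0 else -1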