Support Vector Machines

Support Vector Machines
Mihir Patel and Nikhil Sardana
A simple binary classification method

Introduction Consider some 2-dimensional data. How do we classify (separate) it? With a line. How do we separate the data better? By maximizing the margin between the line and the data: the best line is the one that maximizes the distance between the data points and the line (H3 in the image above). Generalizing to any dimension n: if we have data points with n features (x, y, z, a, b, c, ...), we can find an (n-1)-dimensional plane that separates the data. This plane is called a hyperplane; in the 2-dimensional case, the (n-1) = 1-dimensional hyperplane is just a line.
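To make this concrete, here is a minimal sketch (assuming scikit-learn and its make_blobs toy-data helper, which are not part of the original slides): fit a linear SVM on 2-dimensional data and read off the separating line w·x + b = 0.

```python
# Minimal sketch: a maximum-margin linear SVM on toy 2-D data.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of 2-D points, one per class.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel, coef_ and intercept_ describe the hyperplane (a line in 2-D).
w = clf.coef_[0]
b = clf.intercept_[0]
print("decision boundary: %.2f*x + %.2f*y + %.2f = 0" % (w[0], w[1], b))
```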

Linearly Separable Data Data that can be separated by an (n-1)-dimensional hyperplane in this way is called "linearly separable data".

The Hyperplane Computing the hyperplane: the underlying math is involved and unimportant for our purposes. What matters is that finding the maximum-margin hyperplane is a quadratic programming problem that can be solved in polynomial time. Picking the hyperplane: SVMs rely only on the "difficult" points close to the decision boundary, called support vectors, which helps prevent overfitting.
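As a rough illustration (again assuming scikit-learn, with toy data that is not from the slides), a trained SVC exposes exactly those "difficult" points through its support_vectors_ attribute:

```python
# Minimal sketch: only points near the boundary become support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=1)

clf = SVC(kernel="linear").fit(X, y)
print("training points:", len(X))
print("support vectors:", len(clf.support_vectors_))  # typically far fewer
```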

Non-Linearly Separable Data That is all great for linearly separable data, but what about non-linearly separable data? Do SVMs simply not work? Running what we have learned so far gives terrible results. How do we solve this?

The Kernel Trick Move the data into higher dimensions to create linear separation. Example transformation: (x, y) → (x, y, x^2 + y^2), where x^2 + y^2 is the squared radius. The kernel trick is a method of creating separability by transforming the data into higher dimensions. Consider the data in the image: because the classes form concentric circles, the radius of the blue points is generally much smaller than the radius of the red points. If we compute x^2 + y^2 for each point and plot it on the z-axis, the blue points cluster around z ≈ 0.2 (smaller radius = lower on the z-axis) and the red points around z ≈ 1.0. Now we can easily find an (n-1)-dimensional plane (here a 2-dimensional plane) that separates the data.
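Here is a minimal sketch of that exact transformation (assuming scikit-learn's make_circles toy data as a stand-in for the circular data in the image): lifting the points with (x, y) → (x, y, x^2 + y^2) lets a plain linear SVM separate them.

```python
# Minimal sketch: the explicit feature map behind the picture above.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# In the original 2-D space the classes are concentric circles, so a line fails.
linear_2d = SVC(kernel="linear").fit(X, y)
print("accuracy in 2-D:", linear_2d.score(X, y))

# Add a third feature, the squared radius x^2 + y^2, and try again.
X_lifted = np.c_[X, (X ** 2).sum(axis=1)]
linear_3d = SVC(kernel="linear").fit(X_lifted, y)
print("accuracy in 3-D:", linear_3d.score(X_lifted, y))  # close to 1.0
```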

Common Transformations That is great for circular data, but what do we do when we don't know what shape the data has? We can use some common kernels, including the polynomial and RBF kernels.

More Kernels Linear, Polynomial, Gaussian, Exponential, Laplacian, ANOVA, Hyperbolic Tangent (Sigmoid), Rational Quadratic, Multiquadric, Inverse Multiquadric, Circular, Spherical, Wave, Power, Log, Spline, B-Spline, Bessel, Cauchy, Chi-Square, Histogram Intersection, Generalized Histogram Intersection, Generalized T-Student, Bayesian, and Wavelet kernels. Basically, there are a ton of different kernels. The default options in scikit-learn (the library we are using) are sigmoid, RBF, linear, and polynomial. Just know that all these kernels do essentially the same thing: they transform the data into higher dimensions to create linear separability.
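As a quick sketch of trying those scikit-learn defaults (the toy data and split below are illustrative assumptions, not contest data), you can loop over the built-in kernel names and compare accuracy:

```python
# Minimal sketch: compare the four built-in scikit-learn kernels on toy data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))
```

Which kernel wins depends entirely on the shape of the data; on this circular toy set, the RBF kernel in particular should do much better than the linear one.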

Summary Advantages: always finds the global minimum (no local minima issues); minimal hyperparameter tuning; works with minimal data (you can have more input features than training examples); supports custom kernels; easy to build with little work; good for small datasets; adaptable; always reaches the same accuracy on a given problem, which you can treat as a reliable baseline (unlike, say, neural networks, whose results vary between runs). Cons: may not give as accurate results as other methods (it's just not as good); difficult to figure out which kernel is ideal; the data is hard to visualize because of the high-dimensional transformations.

Applications Text categorization, image categorization, medical diagnostic tests, sentiment analysis, and "is it X?" questions (do you have something or not). Why use SVMs? They work well when the input has lots of features but the dataset is not very large, they need minimal human hyperparameter tuning, and they fit binary or few-category problems. In short: good for detailed objects with minimal data points, easy to set up, and a natural fit when the outcome is a simple state.

Code: Scikit-Learn Each x data point is an array of features, and each y output is its classification. Scikit-learn handles splitting the data into training and test sets. You can choose a kernel through the kernel = ... argument. clf.fit trains the model, clf.predict runs new points through it, and the accuracy score tells you how well it does.
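Below is a minimal sketch of that workflow. The built-in breast cancer dataset from scikit-learn is used only as a placeholder; the contest provides its own data and input shell.

```python
# Minimal sketch of the workflow: load X/y, split, train an SVC, score it.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

data = load_breast_cancer()                  # each X row is an array of features
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf")                      # choose a kernel with kernel=...
clf.fit(X_train, y_train)                    # clf.fit trains
y_pred = clf.predict(X_test)                 # clf.predict classifies new points
print("accuracy:", accuracy_score(y_test, y_pred))
```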

Contest: Breast Cancer Tumors Key things: the default kernels are cool, but they won't give great results. If you want to win, graph the data (Plot.ly) to find better kernels, and look at prior research. Also try the liver data again (you should be able to reach 100% accuracy): http://tjmachinelearning.com/schedule.html Actually do the contest (the first part takes 15 minutes). You MUST edit the data input shell to work with the data. Using the default kernels (polynomial, RBF, etc.) you most likely won't get above roughly 70%. If you get the defaults working, go ahead and submit that code (this should take about 15 minutes). Then, if you want to win the contest (or do well), graph some of the data and try to come up with your own kernel function, as sketched below. Submit again once you've got the custom kernel working. Submit everything by next Wednesday's meeting.
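For the custom-kernel step, scikit-learn's SVC accepts any callable that returns the kernel (Gram) matrix between two sets of points. The sketch below is only an illustration on toy data; the kernel function itself is a made-up example, not a recommendation for the contest data.

```python
# Minimal sketch: plugging a custom kernel callable into SVC.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def my_kernel(A, B):
    """Illustrative custom kernel: a linear kernel on squared features."""
    return (A ** 2) @ (B ** 2).T

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel=my_kernel).fit(X_train, y_train)
print("custom-kernel accuracy:", clf.score(X_test, y_test))
```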