Machine Learning Queens College Lecture 13: SVM Again

Today
– Completion of Support Vector Machines
– Project Description and Topics

Support Vectors
Support vectors are the input points (vectors) closest to the decision boundary.
1. They are vectors.
2. They "support" the decision hyperplane.

Support Vectors
Define this as a decision problem. The decision hyperplane: w^T x + b = 0. No fancy math, just the equation of a hyperplane.

Support Vectors
The decision hyperplane is scale invariant: for any c > 0, c(w^T x + b) = 0 defines the same hyperplane. This scaling changes neither the decision hyperplane nor the support vector hyperplanes, but it lets us eliminate a variable from the optimization, as sketched below.
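The normalization this scaling enables was shown as an equation image on the original slide; a standard reconstruction (my notation, not the original graphic) is:

```latex
% Choose the scale of (w, b) so that the support vectors lie on the
% canonical hyperplanes w^T x + b = +1 and w^T x + b = -1, i.e.
\min_{i}\; y_i \left( w^\top x_i + b \right) = 1
```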

What Are We Optimizing?
We will represent the size of the margin in terms of w (derivation sketched below). This will allow us to simultaneously:
– Identify a decision boundary
– Maximize the margin
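The margin formula itself did not survive extraction; assuming the canonical scaling above, the standard derivation is:

```latex
% A support vector x_s satisfies y_s (w^T x_s + b) = 1, so its distance
% to the hyperplane w^T x + b = 0 is 1 / ||w||.  The full margin is
\gamma = \frac{2}{\lVert w \rVert}
% so maximizing the margin is equivalent to minimizing ||w||^2.
```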

Max Margin Loss Function
This is a constrained optimization, so we use Lagrange multipliers and optimize the "primal" problem.
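The primal was an image on the slide; the standard hard-margin primal and its Lagrangian (my reconstruction) are:

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge 1,
\quad i = 1, \dots, n

% Lagrangian of the primal, one multiplier per constraint:
L_P(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \left[ y_i \left( w^\top x_i + b \right) - 1 \right],
\qquad \alpha_i \ge 0
```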

Visualization of Support Vectors
[Slide figure not included in the transcript.]

Interpretability of SVM Parameters
What else can we tell from the alphas?
– If α_i is large, the associated data point is quite important to the solution: it is either an outlier or a genuinely critical example.
But this only gives us the best solution for linearly separable data sets…
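The slide does not include the supporting formulas; the role of the alphas follows from the standard dual solution (my addition):

```latex
% Only support vectors have nonzero alpha_i; the learned boundary is
w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \, x_i^\top x + b \right)
% so a large alpha_i means x_i contributes heavily to the boundary.
```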

Basis of Kernel Methods
The decision process doesn't depend on the dimensionality of the data, so we can map the data into a higher-dimensional space. Note: data points only appear within a dot product. The error is based on the dot products of data points, not the data points themselves.

Basis of Kernel Methods
Since data points only appear within a dot product, we can map to another space by replacing that dot product. The error is still based on the dot products of the data points, not the data points themselves.
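The replacement itself was shown as an image; the standard kernel substitution it refers to is:

```latex
x_i^\top x_j \;\longrightarrow\; \phi(x_i)^\top \phi(x_j) = K(x_i, x_j)
% The mapping phi is never computed explicitly; only the kernel K is.
```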

Learning Theory Bases of SVMs
Theoretical bounds on testing error:
– The upper bound doesn't depend on the dimensionality of the space.
– The bound is minimized by maximizing the margin, γ, associated with the decision boundary.

Why We Like SVMs
– They work: good generalization.
– Easily interpreted: the decision boundary is based on the data, in the form of the support vectors. Not so in multilayer perceptron networks.
– Principled bounds on testing error from learning theory (VC dimension).

SVM vs. MLP
SVMs have many fewer parameters:
– SVM: maybe just a kernel parameter.
– MLP: the number and arrangement of nodes, plus the learning rate η.
SVM training is a convex optimization task; the MLP likelihood is non-convex, so training can get stuck in local minima. A small illustration follows.
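As a concrete, hedged illustration of this contrast, here is a minimal scikit-learn sketch; the synthetic dataset and every parameter value are arbitrary assumptions for demonstration, not from the lecture:

```python
# Contrast: an SVM exposes a kernel, its parameter, and C; an MLP also
# needs an architecture (layer sizes) and a learning rate, and its loss
# surface is non-convex.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM: kernel choice plus a couple of scalars; training is a convex QP.
svm = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)

# MLP: layer sizes, learning rate, iteration budget; non-convex training.
mlp = MLPClassifier(hidden_layer_sizes=(10, 10), learning_rate_init=0.01,
                    max_iter=1000, random_state=0).fit(X_train, y_train)

print('SVM accuracy:', svm.score(X_test, y_test))
print('MLP accuracy:', mlp.score(X_test, y_test))
```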

Soft Margin Classification
There can be outliers on the other side of the decision boundary, or points leading to a small margin. Solution: introduce a penalty term into the constrained optimization.
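The penalized formulation was an equation image; the standard soft-margin primal (my reconstruction), with slack variables ξ_i and penalty weight C, is:

```latex
\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i,
\quad \xi_i \ge 0
```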

Soft Margin Dual
Still quadratic programming!
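The dual itself was not extracted; its standard form (my reconstruction) shows why it is still a QP, now with box constraints on the alphas:

```latex
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,
\quad \sum_{i=1}^{n} \alpha_i y_i = 0
```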

Soft Margin Example
Points are allowed within the margin, but a cost is introduced: the hinge loss.
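The hinge loss was shown graphically on the slide; its standard formula is:

```latex
% Zero loss for points outside the margin; linear loss for points
% inside the margin or misclassified.
\ell\big(y, f(x)\big) = \max\big(0,\; 1 - y\,f(x)\big)
```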

Probabilities from SVMs
Support vector machines are discriminant functions:
– Discriminant function: f(x) = c
– Discriminative model: f(x) = argmax_c p(c|x)
– Generative model: f(x) = argmax_c p(x|c)p(c)/p(x)
No (principled) probabilities come from SVMs; SVMs are not based on probability distribution functions of class instances.

Efficiency of SVMs
Not especially fast.
– Training: O(n³), the cost of quadratic programming.
– Evaluation: O(n), since we must evaluate against each support vector (potentially n of them); see the sketch below.
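A minimal NumPy sketch of the evaluation cost; the RBF kernel and all toy values (support vectors, alphas, labels, bias) are hypothetical placeholders, not from the lecture:

```python
# The decision function sums one kernel term per support vector, so
# evaluation cost grows linearly with the number of support vectors.
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def svm_decision(x, support_vectors, alphas, labels, b, gamma=0.5):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# Toy values purely for illustration.
svs = np.array([[0.0, 1.0], [1.0, 0.0]])
alphas = np.array([0.7, 0.7])
labels = np.array([1, -1])
print(np.sign(svm_decision(np.array([0.2, 0.9]), svs, alphas, labels, b=0.0)))
```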

Research Projects
Run a machine learning experiment:
– Identify a problem/task.
– Find appropriate data.
– Implement one or more ML algorithms.
– Evaluate the performance.
Write a report of the experiment (4 pages including references):
– Abstract: one paragraph describing the experiment
– Introduction: describe the problem/task
– Data: describe the data set, features extracted, and cleaning processes
– Method: describe the algorithm/approach
– Results: present and discuss results
– Conclusion: summarize the experiment and results
Teams of two people are acceptable. This requires a report from each participant (written independently) describing who was responsible for the components of the work.

Sample Problems/Tasks
Vision/Graphics:
– Object classification
– Facial recognition
– Fingerprint identification
– Handwriting recognition (non-English languages?)
Language:
– Topic classification
– Sentiment analysis
– Speech recognition
– Speaker identification
– Punctuation restoration
– Semantic segmentation
– Recognition of emotion, sarcasm, etc.
– SMS text normalization
– Chat participant identification
– Twitter classification
– Twitter threading

Sample Problems/Tasks
Games:
– Chess
– Checkers
– Poker
– Blackjack
– Go
Recommenders (collaborative filtering):
– Netflix
– Courses
– Jokes
– Books
– Facebook
Video classification:
– Motion classification
– Segmentation

ML Topics to Explore in the Project
– L1-regularization
– Non-linear kernels
– Loopy belief propagation
– Non-parametric belief propagation
– Soft decision trees
– Analysis of neural network hidden layers
– Structured learning
– Generalized expectation
– One-class learning
– Evaluation measures: cluster evaluation, semi-supervised evaluation, skewed data
– Graph embedding
– Dimensionality reduction
– Feature selection
– Graphical model construction
– Non-parametric Bayesian methods
– Latent Dirichlet Allocation
– Deep learning: Boltzmann machines
– SVM regression

Data
– UCI Machine Learning Repository
– Ask me
– Collect some of your own

Next Time
Kernel Methods