SOFT LARGE MARGIN CLASSIFIERS David Kauchak CS 451 – Fall 2013

Admin
Assignment 5
Midterm
Friday’s class will be in MBH 632
CS lunch talk Thursday

Java tips for the data
The default JVM heap size may be too small for the assignment data. Use the -Xmx flag to raise the maximum heap size, e.g. -Xmx2g for a 2 GB heap.
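For example, with a placeholder main class and data file (the names here are illustrative, not the assignment’s actual ones):

    java -Xmx2g Classify train.data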

Large margin classifiers
The margin of a classifier is the distance from the decision boundary to the closest points of either class.
Large margin classifiers attempt to maximize this margin.
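In symbols (one standard way to write it, consistent with the constraints below): for a hyperplane wx + b = 0, the margin is min_i y_i(wx_i + b) / ||w||; if we scale w and b so that the closest points satisfy y_i(wx_i + b) = 1, the margin is 1 / ||w||.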

Support vector machine problem
minimize ||w||^2 subject to: y_i(wx_i + b) ≥ 1 for all i
This is a quadratic optimization problem:
maximize/minimize a quadratic function
subject to a set of linear constraints
There are many, many variants for solving this problem (we’ll see one in a bit).

Soft Margin Classification
What about this problem, where the data are not linearly separable? No choice of w, b can satisfy every constraint y_i(wx_i + b) ≥ 1.

Soft Margin Classification
We’d like to learn something like this (a large margin separator that tolerates a few points on the wrong side), but our constraints won’t allow it.

Slack variables
minimize ||w||^2 + C Σ_i ξ_i subject to: y_i(wx_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i
The ξ_i are slack variables (one for each example). What effect does this have?

Slack variables
The added term C Σ_i ξ_i in the objective is the slack penalty: using slack is not free.

Slack variables
Each example is allowed to make a mistake, but it is penalized by how far it is from “correct,” i.e. from the margin. The objective trades off margin maximization against this penalization.

Soft margin SVM
minimize ||w||^2 + C Σ_i ξ_i subject to: y_i(wx_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0 for all i
Still a quadratic optimization problem!

Demo

Solving the SVM problem

Understanding the Soft Margin SVM
Given the optimal solution w, b: can we figure out what the slack penalties are for each point?

Understanding the Soft Margin SVM
What do the margin lines represent wrt w, b?

Understanding the Soft Margin SVM
The margin lines are wx + b = 1 and wx + b = −1. Or: points exactly on the margin satisfy y_i(wx_i + b) = 1.
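Quick check: the distance from a point x to the hyperplane wx + b = 0 is |wx + b| / ||w||, so each margin line lies at distance 1 / ||w|| from the hyperplane, giving a total margin width of 2 / ||w||.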

Understanding the Soft Margin SVM
What are the slack values for points outside (or on) the margin AND correctly classified?

Understanding the Soft Margin SVM
0! The slack variables have to be greater than or equal to zero, and if a point is on or beyond the margin then y_i(wx_i + b) ≥ 1 already, so no slack is needed.

Understanding the Soft Margin SVM
What are the slack values for points inside the margin AND classified correctly?

Understanding the Soft Margin SVM
The difference from the point to the margin. Which is? ξ_i = 1 − y_i(wx_i + b).
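For example (made-up numbers): a correctly classified point with y_i(wx_i + b) = 0.25 lies inside the margin and needs ξ_i = 1 − 0.25 = 0.75 to satisfy y_i(wx_i + b) ≥ 1 − ξ_i.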

Understanding the Soft Margin SVM
What are the slack values for points that are incorrectly classified?

Understanding the Soft Margin SVM
For an incorrectly classified point, the slack is the “distance” to the hyperplane plus the “distance” from the hyperplane to the margin.
The “distance” to the hyperplane is −y_i(wx_i + b). Why the minus? Because y_i(wx_i + b) is negative for a misclassified point, so negating it gives a positive quantity (these are scaled “distances,” hence the quotes).
The “distance” from the hyperplane to the margin is 1.
Putting the two together: ξ_i = 1 − y_i(wx_i + b).
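For example (made-up numbers): a misclassified point with y_i(wx_i + b) = −0.5 needs ξ_i = 1 − (−0.5) = 1.5, i.e. 0.5 to get back to the hyperplane plus 1 more to reach the margin.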

Understanding the Soft Margin SVM
Combining the cases: the optimal slack for every point is ξ_i = max(0, 1 − y_i(wx_i + b)).

Understanding the Soft Margin SVM Does this look familiar?

Hinge loss! Writing y′ = wx + b for the model’s prediction:
0/1 loss: l(y, y′) = 1 if yy′ ≤ 0, else 0
Hinge: l(y, y′) = max(0, 1 − yy′)
Exponential: l(y, y′) = exp(−yy′)
Squared loss: l(y, y′) = (y − y′)^2
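A quick sketch of these losses in Java, for reference (the class and method names are mine, not from the course code):

    class LossFunctions {
        // y is the true label in {-1, +1}; yp = wx + b is the model's raw prediction
        static double zeroOne(double y, double yp)     { return y * yp <= 0 ? 1.0 : 0.0; }
        static double hinge(double y, double yp)       { return Math.max(0.0, 1.0 - y * yp); }
        static double exponential(double y, double yp) { return Math.exp(-y * yp); }
        static double squared(double y, double yp)     { double d = y - yp; return d * d; }

        public static void main(String[] args) {
            // A misclassified point: y = +1 but the prediction is negative
            System.out.println(hinge(1.0, -0.5));  // prints 1.5, matching the slack computed earlier
        }
    }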

Understanding the Soft Margin SVM
If we substitute ξ_i = max(0, 1 − y_i(wx_i + b)) directly into the objective, do we still need the constraints?

Understanding the Soft Margin SVM
No. We get minimize ||w||^2 + C Σ_i max(0, 1 − y_i(wx_i + b)): an unconstrained problem!

Understanding the Soft Margin SVM Does this look like something we’ve seen before? Gradient descent problem!

Soft margin SVM as gradient descent
Multiply through by 1/C and rearrange, letting λ = 1/C:
minimize Σ_i max(0, 1 − y_i(wx_i + b)) + λ ||w||^2
What type of gradient descent problem is this?

Soft margin SVM as gradient descent
One way to solve the soft margin SVM problem is using gradient descent on the hinge loss with L2 regularization.

Gradient descent SVM solver
pick a starting point (w)
repeat until loss doesn’t decrease in all dimensions:
  pick a dimension
  move a small amount in that dimension towards decreasing loss (using the derivative)
Here the loss is the hinge loss plus L2 regularization. This finds the largest margin hyperplane while allowing for a soft margin; a sketch of such a solver follows below.
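A minimal Java sketch of this solver, assuming examples are double[] feature vectors with labels in {-1, +1}. It takes a stochastic subgradient step over a whole example rather than the per-dimension loop above; all names and parameters are illustrative, not the course’s reference implementation.

    import java.util.Random;

    class SoftMarginSvmSolver {
        double[] w;   // weight vector
        double b;     // bias

        // xs: training vectors; ys: labels in {-1, +1};
        // lambda: L2 regularization strength; eta: step size; steps: iterations
        void train(double[][] xs, double[] ys, double lambda, double eta, int steps) {
            w = new double[xs[0].length];   // starting point: w = 0
            b = 0.0;
            Random rng = new Random(42);
            for (int t = 0; t < steps; t++) {
                int i = rng.nextInt(xs.length);            // pick a training example
                double yp = b;
                for (int j = 0; j < w.length; j++) yp += w[j] * xs[i][j];
                boolean hingeActive = ys[i] * yp < 1;      // inside margin or misclassified?
                for (int j = 0; j < w.length; j++) {
                    // subgradient of max(0, 1 - y(wx+b)) + lambda*||w||^2 in dimension j
                    double g = 2 * lambda * w[j] - (hingeActive ? ys[i] * xs[i][j] : 0.0);
                    w[j] -= eta * g;                       // small step toward lower loss
                }
                if (hingeActive) b += eta * ys[i];         // bias term is not regularized
            }
        }
    }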

Support vector machines
Of the classifiers we’ve seen so far (perceptron algorithm, k nearest neighbor, decision tree, support vector machine), the support vector machine is one of the most successful (if not the most successful) classification approaches.

Trends over time