ECE 5424: Introduction to Machine Learning


ECE 5424: Introduction to Machine Learning
Topics: SVM Lagrangian duality; (maybe) the SVM dual and kernels
Readings: Barber 17.5
Stefan Lee, Virginia Tech

Recap of Last Time (C) Dhruv Batra

[Figure slides: illustrations of margin-based classification. Image courtesy: Arthur Gretton]

Generative vs. Discriminative
Generative approach (Naïve Bayes): estimate p(x|y) and p(y), then use Bayes' rule to predict y.
Discriminative approach: estimate p(y|x) directly (logistic regression), or learn a "discriminant" function f(x) (support vector machine).

SVMs are Max-Margin Classifiers
Three parallel hyperplanes: w·x + b = +1, w·x + b = 0, w·x + b = −1, with support vectors x+ and x− lying on the outer two.
Maximize the margin while getting the examples correct.

Last Time

Hard-Margin SVM Formulation
    min_{w,b} (1/2) wᵀw
    s.t. (wᵀx_j + b) y_j ≥ 1, ∀j

Soft-Margin SVM Formulation
    min_{w,b} (1/2) wᵀw + C Σ_j ξ_j
    s.t. (wᵀx_j + b) y_j ≥ 1 − ξ_j, ∀j
         ξ_j ≥ 0, ∀j
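As a concrete illustration, feasibility for the hard-margin problem can be checked directly: a point (w, b) is feasible iff every functional margin y_j (w·x_j + b) is at least 1, and the geometric margin is then 1/‖w‖. The toy data and the candidate (w, b) below are hypothetical choices for the sketch, not values from the lecture.

```python
import numpy as np

# Toy linearly separable data (hypothetical, chosen for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A hand-picked candidate separator (not the optimal max-margin solution).
w = np.array([0.5, 0.5])
b = 0.0

margins = y * (X @ w + b)            # functional margins y_j (w.x_j + b)
feasible = bool(np.all(margins >= 1))  # hard-margin constraints satisfied?
geometric_margin = 1.0 / np.linalg.norm(w)
print(feasible, round(geometric_margin, 3))  # → True 1.414
```

The QP in the slide then searches over all feasible (w, b) for the one maximizing that geometric margin.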

Last Time

The soft-margin problem can be rewritten as a hinge-loss minimization:
    min_{w,b} (1/2) wᵀw + C Σ_i h(y_i (wᵀx_i + b)),
where h(z) = max(0, 1 − z) is the hinge loss.

SVM: hinge loss.  LR: logistic loss.
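A minimal sketch of minimizing this hinge-loss objective by (sub)gradient descent in NumPy. The dataset, step size, and iteration count are illustrative assumptions, not values from the course; in practice one would use a dedicated QP or SGD solver.

```python
import numpy as np

# Soft-margin SVM primal via full-batch subgradient descent on
# 0.5 w.w + C * sum_i max(0, 1 - y_i (w.x_i + b)).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.05
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1                 # points with nonzero hinge loss
    # Subgradient: w from the regularizer, minus C * y_i x_i per active point.
    grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

preds = np.sign(X @ w + b)
print(bool((preds == y).all()))  # all training points classified correctly
```

On separable toy data like this, every point ends up on the correct side; the hinge loss keeps pushing until the margin constraints are (approximately) met.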

Today
I want to show you how useful SVMs really are by explaining the kernel trick, but to do that we need to talk about Lagrangian duality.

Constrained Optimization

How do we solve problems of this form?
    min_w f(w)
    s.t. h_1(w) = 0, …, h_k(w) = 0
         g_1(w) ≤ 0, …, g_m(w) ≤ 0

Introducing the Lagrangian

Let's assume we have this problem:
    min_w f(w)
    s.t. h(w) = 0
         g(w) ≤ 0

Define the Lagrangian function
    L(w, β, α) = f(w) + β h(w) + α g(w),  with α ≥ 0,
where β and α are called Lagrange multipliers (or dual variables).

Gradient of the Lagrangian

    L(w, β, α) = f(w) + β h(w) + α g(w)
    ∂L/∂α = g(w) = 0
    ∂L/∂β = h(w) = 0
    ∂L/∂w = ∂f/∂w + β ∂h/∂w + α ∂g/∂w = 0

Setting these derivatives to zero finds the critical points of the constrained problem.
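A quick numeric sanity check, using an example of my own choosing rather than one from the slides: for f(w) = w² with the single equality constraint h(w) = w − 1 = 0, the stationarity conditions ∂L/∂w = 2w + β = 0 and ∂L/∂β = w − 1 = 0 give w* = 1, β* = −2. Finite differences confirm the gradient of L vanishes there.

```python
# Illustrative example (not from the slides): minimize f(w) = w^2
# subject to h(w) = w - 1 = 0.
def f(w): return w ** 2
def h(w): return w - 1.0

def L(w, beta):  # Lagrangian: f(w) + beta * h(w)
    return f(w) + beta * h(w)

# Solving the stationarity system by hand gives w* = 1, beta* = -2.
w_star, beta_star = 1.0, -2.0

eps = 1e-6  # central finite differences around the critical point
dL_dw = (L(w_star + eps, beta_star) - L(w_star - eps, beta_star)) / (2 * eps)
dL_dbeta = (L(w_star, beta_star + eps) - L(w_star, beta_star - eps)) / (2 * eps)
print(abs(dL_dw) < 1e-6, abs(dL_dbeta) < 1e-6)  # → True True
```

Note that ∂L/∂β = 0 simply re-imposes the constraint h(w) = 0, matching the slide above.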

Building Intuition

With only the equality constraint, L(w, β) = f(w) + β h(w), and
    ∂L/∂w = ∂f/∂w + β ∂h/∂w = 0   ⟹   ∂f/∂w = −β ∂h/∂w.
At a critical point, the gradient of f is parallel to the gradient of the constraint h.

Geometric Intuition
[Figure slides: contours of f tangent to the constraint curve h. Image credit: Wikipedia]

A simple example

Minimize f(x, y) = x + y subject to x² + y² = 1.
    L(x, y, β) = x + y + β(x² + y² − 1)
    ∂L/∂x = 1 + 2βx = 0,   ∂L/∂y = 1 + 2βy = 0
    ∂L/∂β = x² + y² − 1 = 0
The first two equations give x = y = −1/(2β); substituting into the constraint gives 2/(4β²) = 1, so β = ±√2/2, which makes x = y = ∓√2/2. The minimum, x = y = −√2/2, corresponds to β = +√2/2.
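The arithmetic above can be verified in a few lines of Python (plain standard library, nothing assumed beyond the example itself):

```python
import math

# Check the worked example: minimize f(x, y) = x + y s.t. x^2 + y^2 = 1.
beta = math.sqrt(2) / 2          # the root of beta = ±sqrt(2)/2 giving the minimum
x = y = -1.0 / (2 * beta)        # stationarity: 1 + 2*beta*x = 0

assert abs(x ** 2 + y ** 2 - 1) < 1e-12   # constraint holds
assert abs(1 + 2 * beta * x) < 1e-12      # dL/dx = 0
assert abs(1 + 2 * beta * y) < 1e-12      # dL/dy = 0
print(round(x + y, 6))  # minimum value of f on the circle: → -1.414214
```

The other root, β = −√2/2, gives x = y = +√2/2, which is the constrained maximum of x + y.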

Why go through this trouble?
Solutions derived from the Lagrangian form are often easier to compute.
The multipliers sometimes offer useful intuition about the problem.
It builds character.

Lagrangian Duality
More formally, on the overhead: starring a game-theoretic interpretation, the duality gap, and the KKT conditions.