ECE 5424: Introduction to Machine Learning

Presentation transcript:

ECE 5424: Introduction to Machine Learning
Topics:
- SVM
- SVM dual & kernels
Readings: Barber 17.5
Stefan Lee, Virginia Tech

Lagrangian Duality
- On paper
(C) Dhruv Batra

Dual SVM derivation (1) – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM derivation (1) – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin
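The formulas on the derivation slides did not survive extraction. As a hedged reminder, the standard textbook derivation for the linearly separable case is sketched below; the symbols (training pairs x_j, y_j and multipliers α_j) are assumed notation, not recovered from the slides.

```latex
% Primal (hard-margin) problem
\min_{w,\,b}\ \tfrac{1}{2}\, w \cdot w
\quad \text{s.t.} \quad y_j\,(w \cdot x_j + b) \ge 1 \quad \forall j

% Lagrangian with multipliers \alpha_j \ge 0
L(w, b, \alpha) = \tfrac{1}{2}\, w \cdot w
  - \sum_j \alpha_j \left[\, y_j\,(w \cdot x_j + b) - 1 \,\right]

% Setting the gradients with respect to w and b to zero
\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_j \alpha_j\, y_j\, x_j
\qquad
\frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_j \alpha_j\, y_j = 0
```

Substituting these two conditions back into the Lagrangian eliminates w and b and leaves a problem in α alone.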

Dual SVM formulation – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM formulation – the non-separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM formulation – the non-separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin
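For reference, the dual in its standard textbook form is sketched below. The slide's own formulas were lost in extraction, so the notation (multipliers α_i, slack penalty C) is an assumption.

```latex
\max_{\alpha}\ \sum_i \alpha_i
  - \tfrac{1}{2} \sum_i \sum_j \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i \cdot x_j)

\text{s.t.} \quad \sum_i \alpha_i\, y_i = 0, \qquad 0 \le \alpha_i \le C \quad \forall i
```

In the linearly separable (hard-margin) case the upper bound disappears and the constraint is only α_i ≥ 0. In both cases the weight vector is recovered as w = Σ_i α_i y_i x_i.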

Why did we learn about the dual SVM?
- Builds character!
- Exposes structure about the problem
- There are some quadratic programming algorithms that can solve the dual faster than the primal
- The “kernel trick”!!!
(C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM interpretation: Sparsity
[Figure: decision boundary w.x + b = 0 with margin boundaries w.x + b = +1 and w.x + b = -1; margin 2γ]
(C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual formulation only depends on dot-products, not on w! (C) Dhruv Batra
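A minimal sketch of this point, assuming the standard dual objective and NumPy: the data enter only through the Gram matrix of pairwise dot products. The function name `dual_objective` and the toy data are illustrative, not from the slides.

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Standard dual SVM objective (assumed form):
    sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>."""
    G = X @ X.T          # Gram matrix: the only place the inputs appear
    ay = alpha * y       # elementwise alpha_i * y_i
    return alpha.sum() - 0.5 * ay @ G @ ay

# Toy data: one point per class on the x-axis (illustrative only).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
value = dual_objective(alpha, X, y)
```

Replacing `X @ X.T` with any other Gram matrix changes the feature space without touching the rest of the computation.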

Dot-product of polynomials
- Vector of monomials of degree m
(C) Dhruv Batra Slide Credit: Carlos Guestrin
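The slide's formula was lost, so the following is a hedged numerical check of the usual identity: the dot product of two degree-m monomial vectors equals (u.v)^m. The helper `monomials` is hypothetical and enumerates ordered monomials (d^m entries, with repeats), which is one common way to state the identity.

```python
import itertools
import numpy as np

def monomials(u, m):
    """All ordered degree-m monomials of u: one entry per index tuple (d**m entries)."""
    d = len(u)
    return np.array([np.prod([u[i] for i in idx])
                     for idx in itertools.product(range(d), repeat=m)])

u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])
m = 3

explicit = monomials(u, m) @ monomials(v, m)   # dot product in feature space
closed_form = (u @ v) ** m                     # same value, computed in input space
```

The explicit route builds 27 features per vector here; the closed form needs one 3-dimensional dot product and a power.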

Higher order polynomials
- d – input features
- m – degree of polynomial
- Number of monomial terms grows fast! For m = 6, d = 100: D = about 1.6 billion terms
[Figure: number of monomial terms versus number of input dimensions (d), with curves for m = 2, m = 3, and m = 4]
(C) Dhruv Batra Slide Credit: Carlos Guestrin
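The "about 1.6 billion" figure can be reproduced under the assumption that D counts distinct monomials of exactly degree m in d variables, which is the binomial coefficient C(d + m - 1, m). The function name is illustrative.

```python
from math import comb

def num_monomials(d, m):
    """Number of distinct monomials of exactly degree m in d variables (assumed count)."""
    return comb(d + m - 1, m)

count = num_monomials(100, 6)   # 1609344100, i.e. about 1.6 billion
small = [num_monomials(10, m) for m in (2, 3, 4)]
```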

Common kernels
- Polynomials of degree d
- Polynomials of degree up to d
- Gaussian kernel / Radial Basis Function
- Sigmoid
(C) Dhruv Batra Slide Credit: Carlos Guestrin
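The kernel formulas on this slide did not survive extraction. The sketch below uses the commonly quoted forms; the parameter names (`sigma`, `eta`, `nu`) and the exact parameterizations are assumptions, and other references scale them differently.

```python
import numpy as np

def poly_kernel(u, v, d):
    """Polynomial of degree d: (u.v)^d."""
    return (u @ v) ** d

def poly_up_to_kernel(u, v, d):
    """Polynomial of degree up to d: (u.v + 1)^d."""
    return (u @ v + 1.0) ** d

def gaussian_kernel(u, v, sigma):
    """Gaussian / RBF: exp(-||u - v||^2 / (2 sigma^2))."""
    diff = u - v
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, eta, nu):
    """Sigmoid: tanh(eta * u.v + nu). Not positive semi-definite for all eta, nu."""
    return np.tanh(eta * (u @ v) + nu)

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
```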

Kernel Demo
- Demo: http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html
(C) Dhruv Batra

What is a kernel?
- k: X × X → R
- Any measure of “similarity” between two inputs
- Mercer Kernel / Positive Semi-Definite Kernel
- Often just called “kernel”
(C) Dhruv Batra
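A rough numerical illustration of the positive semi-definite condition, assuming NumPy: a Mercer kernel must give a symmetric Gram matrix with no negative eigenvalues on every finite sample. The helper `is_psd_gram` and the random data are illustrative, and passing the check on one sample does not prove a function is a valid kernel.

```python
import numpy as np

def is_psd_gram(K, tol=1e-9):
    """True if K is symmetric and its smallest eigenvalue is >= -tol."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Pairwise squared Euclidean distances between all rows of X.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

K_rbf = np.exp(-sq_dist / 2.0)   # Gaussian kernel: a valid Mercer kernel
K_bad = -sq_dist                 # a "similarity" that is not PSD (zero trace, nonzero entries)
```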

(C) Dhruv Batra Slide Credit: Blaschko & Lampert

Finally: the “kernel trick”!
- Never represent features explicitly
- Compute dot products in closed form
- Constant-time high-dimensional dot-products for many classes of features
- Very interesting theory: Reproducing Kernel Hilbert Spaces
(C) Dhruv Batra Slide Credit: Carlos Guestrin

Kernels in Computer Vision
- Features x = histogram (of color, texture, etc)
- Common Kernels
  - Intersection Kernel
  - Chi-square Kernel
(C) Dhruv Batra
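A hedged sketch of the two histogram kernels, assuming NumPy and non-negative histogram features. The slide's formulas were lost; the intersection kernel below is the usual sum of bin-wise minima, and the chi-square kernel is shown in one common exponentiated variant (`gamma` and `eps` are assumed parameters, and other variants exist).

```python
import numpy as np

def intersection_kernel(x, y):
    """Histogram intersection: sum over bins of min(x_i, y_i)."""
    return np.minimum(x, y).sum()

def chi_square_kernel(x, y, gamma=1.0, eps=1e-12):
    """Exponentiated chi-square variant: exp(-gamma * sum (x_i - y_i)^2 / (x_i + y_i))."""
    dist = ((x - y) ** 2 / (x + y + eps)).sum()   # eps guards against empty bins
    return np.exp(-gamma * dist)

# Two illustrative normalized histograms.
h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.1, 0.6, 0.3])
```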

What about at classification time?
- For a new input x, if we need to represent Φ(x), we are in trouble!
- Recall classifier: sign(w.Φ(x)+b)
- Using kernels we are fine!
(C) Dhruv Batra Slide Credit: Carlos Guestrin
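A minimal sketch of why kernels suffice at test time, assuming the standard expansion w = Σ_i α_i y_i Φ(x_i): the score w.Φ(x) + b becomes Σ_i α_i y_i K(x_i, x) + b, so Φ(x) is never built. The function names are illustrative, and the multipliers below are hand-picked rather than the output of a trained SVM.

```python
import numpy as np

def rbf(u, v, sigma=1.0):
    """Gaussian kernel (assumed parameterization)."""
    diff = u - v
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

def predict(x, support_x, support_y, alpha, b, kernel=rbf):
    """Sign of sum_i alpha_i y_i K(x_i, x) + b, using only kernel evaluations."""
    score = sum(a * y * kernel(sx, x)
                for a, y, sx in zip(alpha, support_y, support_x))
    return 1 if score + b >= 0 else -1

# Hand-picked support vectors and multipliers (illustrative only).
support_x = np.array([[1.0, 0.0], [-1.0, 0.0]])
support_y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0

label_right = predict(np.array([2.0, 0.0]), support_x, support_y, alpha, b)
label_left = predict(np.array([-2.0, 0.0]), support_x, support_y, alpha, b)
```

Only points with nonzero α_i contribute to the sum, which is where the sparsity of the dual solution pays off.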

Kernels in logistic regression
- Define weights in terms of support vectors
- Derive simple gradient descent rule on α_i
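The slide's formulas are missing, so this is only a sketch under stated assumptions: weights are written as w = Σ_i α_i Φ(x_i), labels are in {0, 1}, the loss is the unregularized mean negative log-likelihood, and plain gradient descent is run on α and a bias b. The function names, learning rate, and toy data are all illustrative.

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    """Gram matrix of the Gaussian kernel between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_kernel_logreg(K, y, lr=0.5, steps=500):
    """Gradient descent on alpha for P(y=1|x_j) = sigmoid(sum_i alpha_i K(x_i, x_j) + b)."""
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha + b)))   # predicted probabilities
        err = p - y                                   # gradient of the loss w.r.t. the scores
        alpha -= lr * (K @ err) / n                   # chain rule through K (symmetric)
        b -= lr * err.mean()
    return alpha, b

# XOR-like data: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

K = rbf_gram(X, X, sigma=0.5)
alpha, b = fit_kernel_logreg(K, y)
pred = (K @ alpha + b > 0).astype(float)
```

Unlike the SVM dual, nothing here pushes the α_i to zero, so in practice a regularizer is usually added.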

Kernels
- Kernel Logistic Regression
- Kernel Least Squares
- Kernel PCA
- …
(C) Dhruv Batra