CS489/698: Intro to ML Lecture 08: Kernels 10/12/17 Yao-Liang Yu.

Announcement. Final exam: Friday, Dec 15, 2017, 12:30 - 3:00 pm; room TBA. Project proposal due tonight. ICLR challenge.

Outline: Feature map; Kernels; The kernel trick; Advanced.

XOR revisited. The XOR pattern is not linearly separable in the original input space, which motivates mapping the data into a richer feature space.

Quadratic classifier. Predict with $\hat{y} = \mathrm{sign}(x^\top W x + w^\top x + b)$, where the weights $W$, $w$, and $b$ are to be learned.

The power of lifting. The feature map $\phi: \mathbb{R}^d \to \mathbb{R}^{d^2+d+1}$, $\phi(x) = (x_i x_j \text{ for all } i,j;\ x_1,\dots,x_d;\ 1)$, lifts $x$ to all monomials of degree at most two, so a linear classifier in the lifted space realizes a quadratic classifier in the original space.

Example.

Does it work?

Curse of dimensionality? Computation in the lifted space now costs $O(d^2)$ per inner product. But all we need is the dot product: $\phi(x)^\top \phi(z) = (x^\top z)^2 + x^\top z + 1$, which is still computable in $O(d)$!
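To make the point concrete, here is a small sketch (not part of the slides) that builds the quadratic feature map explicitly and checks that its dot product matches the $O(d)$ kernel formula above.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map: all pairwise products, the raw coordinates, and a constant 1."""
    return np.concatenate([np.outer(x, x).ravel(), x, [1.0]])

def k_quad(x, z):
    """Kernel formula (x^T z)^2 + x^T z + 1: computable in O(d), no lifting needed."""
    return (x @ z) ** 2 + x @ z + 1.0

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)
print(phi(x) @ phi(z))   # dot product in the (d^2 + d + 1)-dimensional space
print(k_quad(x, z))      # same value, computed in O(d)
```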

Feature transform. A neural network learns ϕ simultaneously with w. Here we instead choose a fixed nonlinear ϕ such that $\phi(x)^\top \phi(z) = f(x^\top z)$ for some $f: \mathbb{R} \to \mathbb{R}$, which saves computation.

Outline: Feature map; Kernels; The kernel trick; Advanced.

Reverse engineering. Start with some function $k(x, z)$ such that there exists a feature transform ϕ with $k(x, z) = \phi(x)^\top \phi(z)$. As long as k is efficiently computable, we do not care about the dimension of ϕ (it could even be infinite!). Such a k is called a (reproducing) kernel.

Examples. Polynomial kernel $k(x,z) = (x^\top z + c)^m$; Gaussian kernel $k(x,z) = \exp(-\|x - z\|^2 / (2\sigma^2))$; Laplace kernel; Matérn kernel.
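As a small illustration (not from the slides), these kernels can be evaluated with scikit-learn; the bandwidth and degree values below are arbitrary choices.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, laplacian_kernel
from sklearn.gaussian_process.kernels import Matern

X = np.random.default_rng(0).standard_normal((4, 3))   # 4 points in R^3

K_poly = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)   # (x^T z + 1)^2
K_gauss = rbf_kernel(X, gamma=0.5)                              # exp(-0.5 * ||x - z||^2)
K_laplace = laplacian_kernel(X, gamma=0.5)                      # exp(-0.5 * ||x - z||_1)
K_matern = Matern(nu=1.5)(X)                                    # Matern kernel, smoothness nu = 1.5

print(K_poly.shape, K_gauss.shape, K_laplace.shape, K_matern.shape)  # all (4, 4) Gram matrices
```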

Verifying a kernel. For any n and any $x_1, x_2, \dots, x_n$, the kernel matrix K with $K_{ij} = k(x_i, x_j)$ is symmetric and positive semidefinite ($K \succeq 0$). Symmetric: $K_{ij} = K_{ji}$. Positive semidefinite (PSD): $\alpha^\top K \alpha \ge 0$ for all $\alpha \in \mathbb{R}^n$.
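A quick numerical sketch of this check (not from the slides): form the Gram matrix of the Gaussian kernel on random points and verify symmetry and nonnegative eigenvalues.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.default_rng(1).standard_normal((50, 4))
K = rbf_kernel(X, gamma=1.0)                      # Gram matrix K_ij = k(x_i, x_j)

print(np.allclose(K, K.T))                        # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)      # PSD up to numerical round-off
```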

Kernel calculus. If k is a kernel, so is λk for any λ ≥ 0. If k1 and k2 are kernels, so is k1 + k2. If k1 and k2 are kernels, so is the product k1k2.
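A small sanity check of these rules (my own sketch, not from the slides): scale, add, and elementwise-multiply two Gram matrices and confirm each result is still PSD.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

X = np.random.default_rng(2).standard_normal((30, 3))
K1, K2 = rbf_kernel(X, gamma=0.5), polynomial_kernel(X, degree=2)

for K in (2.0 * K1, K1 + K2, K1 * K2):            # scaling, sum, elementwise (Hadamard) product
    print(np.linalg.eigvalsh(K).min() >= -1e-10)  # each is still PSD, hence a valid kernel matrix
```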

Outline: Feature map; Kernels; The kernel trick; Advanced.

Kernel SVM (dual). $\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j)$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. Everything is expressed through α and the kernel $k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$; ϕ itself stays implicit.
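A minimal sketch (not from the lecture) of solving this kernelized dual with scikit-learn and a precomputed Gram matrix; the toy XOR-like data and the gamma value are made-up choices.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))
y = np.sign(X[:, 0] * X[:, 1])                       # an XOR-like, non-linearly-separable labeling

K = rbf_kernel(X, gamma=1.0)                         # n x n Gram matrix
clf = SVC(C=1.0, kernel="precomputed").fit(K, y)     # solves the kernelized dual
print(clf.support_.size, clf.dual_coef_.shape)       # support vectors and their dual coefficients alpha_i * y_i
```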

Does it work?

Testing. Given a test sample x', how do we predict? We have no explicit access to ϕ, again. But the decision rule needs only kernel evaluations against the training set: $\hat{y}(x') = \mathrm{sign}\big(\sum_i \alpha_i y_i\, k(x_i, x') + b\big)$, where the $\alpha_i$ are the dual variables and the $x_i$ are the training points.
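A hand-written version of this prediction rule, as a minimal sketch (the helper name and the Gaussian kernel choice are my own assumptions, not from the slides):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_svm_predict(x_new, X_train, y_train, alpha, b, gamma=1.0):
    """Predict sign(sum_i alpha_i * y_i * k(x_i, x_new) + b).
    alpha and b come from a dual solver (e.g., the SVC fit sketched above);
    only training points with alpha_i > 0 (the support vectors) actually contribute."""
    k = rbf_kernel(X_train, x_new.reshape(1, -1), gamma=gamma).ravel()  # k(x_i, x') for every training point
    return np.sign(np.sum(alpha * y_train * k) + b)
```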

Tradeoff. Previously: training O(nd), testing O(d). Kernelized: training O(n²d), testing O(nd). It is nice to avoid explicit dependence on the feature dimension h (which could be infinite), but what if n is also large… (maybe later).

Learning the kernel (Lanckriet et al. '04). Use a nonnegative combination $k = \sum_t \zeta_t k_t$ of t pre-selected kernels, with the coefficients ζ ≥ 0 learned simultaneously with the classifier.
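As a highly simplified sketch (the coefficients here are fixed by hand, whereas Lanckriet et al. learn ζ jointly with the classifier), one can combine Gram matrices and train on the result:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

rng = np.random.default_rng(4)
X = rng.standard_normal((80, 2))
y = np.sign(X[:, 0] * X[:, 1])

# Two pre-selected base kernels and hand-picked nonnegative coefficients zeta.
zeta = np.array([0.7, 0.3])
K = zeta[0] * rbf_kernel(X, gamma=1.0) + zeta[1] * polynomial_kernel(X, degree=2)

clf = SVC(C=1.0, kernel="precomputed").fit(K, y)
print(clf.score(K, y))    # training accuracy with the combined kernel
```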

Logistic regression revisited: kernelize it. Representer Theorem (Wahba, Schölkopf, Herbrich, Smola, Dinuzzo, …): the optimal w has the form $w = \sum_i \alpha_i \phi(x_i)$, i.e., it lies in the span of the training features.
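A minimal sketch of kernel logistic regression built directly on this representation (the data, bandwidth, and step size are made up; plain gradient descent on the regularized logistic loss over α):

```python
import numpy as np
from scipy.special import expit
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 2))
y = np.sign(X[:, 0] * X[:, 1])                     # labels in {-1, +1}

# Representer theorem: w = sum_i alpha_i phi(x_i), so the score is f(x) = sum_i alpha_i k(x_i, x).
K = rbf_kernel(X, gamma=1.0)
alpha, lam, lr = np.zeros(len(X)), 1e-2, 0.5

for _ in range(500):
    margins = y * (K @ alpha)
    weights = expit(-margins)                      # "uncertain predictions get bigger weight"
    grad = -K @ (y * weights) / len(X) + lam * (K @ alpha)
    alpha -= lr * grad

print(np.mean(np.sign(K @ alpha) == y))            # training accuracy
```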

Kernel logistic regression (primal). Plugging $w = \sum_i \alpha_i \phi(x_i)$ into the logistic loss gives an objective in α and the kernel matrix K only; in the gradient, uncertain predictions get a bigger weight.

Outline: Feature map; Kernels; The kernel trick; Advanced.

Universal approximation (Micchelli, Xu, Zhang '06). Universal kernel: for any compact set Z, any continuous function f: Z → R, and any ε > 0, there exist $x_1, \dots, x_n$ in Z and $\alpha_1, \dots, \alpha_n$ in R such that $\sup_{x \in Z} \big| f(x) - \sum_{i=1}^n \alpha_i k(x_i, x) \big| \le \varepsilon$. In other words, kernel methods can represent essentially any decision boundary. Example: the Gaussian kernel is universal.
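An illustrative sketch (not from the slides): kernel ridge regression with the Gaussian kernel learns exactly an expansion $\sum_i \alpha_i k(x_i, \cdot)$, and it approximates a smooth target closely; the target function and hyperparameters are made up.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

x = np.linspace(0, 1, 50).reshape(-1, 1)
f = np.sin(2 * np.pi * x).ravel()                 # a continuous target on a compact set

# KernelRidge with an RBF kernel fits an expansion sum_i alpha_i k(x_i, .).
model = KernelRidge(kernel="rbf", gamma=20.0, alpha=1e-6).fit(x, f)

x_test = np.linspace(0, 1, 200).reshape(-1, 1)
err = np.max(np.abs(model.predict(x_test) - np.sin(2 * np.pi * x_test).ravel()))
print(err)                                        # sup error over the grid; should be small
```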

Kernel mean embedding (Smola, Song, Gretton, Schölkopf, …). Map a distribution P to the mean of the feature map of some kernel: $P \mapsto \mu_P = \mathbb{E}_{X \sim P}[\phi(X)]$. Characteristic kernel: the above mapping is 1-1, so it completely preserves the information in the distribution P. Lots of applications.
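One concrete application, sketched with synthetic data (not from the slides): compare two samples through the distance between their empirical kernel mean embeddings, i.e., the (biased) MMD estimate.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2(X, Y, gamma=1.0):
    """Squared distance between empirical kernel mean embeddings (biased MMD^2 estimate)."""
    return (rbf_kernel(X, X, gamma=gamma).mean()
            + rbf_kernel(Y, Y, gamma=gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma=gamma).mean())

rng = np.random.default_rng(6)
same = mmd2(rng.standard_normal((200, 1)), rng.standard_normal((200, 1)))
diff = mmd2(rng.standard_normal((200, 1)), rng.standard_normal((200, 1)) + 2.0)
print(same, diff)   # embeddings are close for the same distribution, far apart for different ones
```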

Questions?