CS489/698: Intro to ML. Lecture 08: Kernels. 10/12/17, Yao-Liang Yu.
Announcement. Final exam: Dec 15, 2017 (Friday), 12:30 – 3:00 pm; room TBA. Project proposal due tonight. ICLR challenge.
Outline: Feature map, Kernels, The Kernel Trick, Advanced.
XOR revisited
Quadratic classifier: ŷ = sign(xᵀWx + wᵀx + b), with the weights W, w, b to be learned.
The power of lifting: a feature map ϕ lifts Rᵈ to R^(d·d+d+1).
Example
Does it work?
Curse of dimensionality? Computation now happens in the lifted space. But all we need is the dot product ⟨ϕ(x), ϕ(x')⟩, and that is still computable in O(d)!
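To make the O(d) claim concrete, here is a minimal numpy sketch (the explicit `lift` map and its √2 scaling are illustrative choices, not from the slides) showing that the explicit quadratic lift and the kernel (1 + ⟨x, z⟩)² compute the same dot product:

```python
import numpy as np

def lift(x):
    # explicit quadratic feature map phi : R^d -> R^(d*d + d + 1)
    # (constant term, scaled linear terms, all pairwise products)
    return np.concatenate(([1.0], np.sqrt(2.0) * x, np.outer(x, x).ravel()))

rng = np.random.default_rng(0)
x, z = rng.standard_normal(3), rng.standard_normal(3)

explicit = lift(x) @ lift(z)          # O(d^2) work in the lifted space
kernel = (1.0 + x @ z) ** 2           # O(d) work: the dot product, then square
print(np.isclose(explicit, kernel))   # True
```

Expanding (1 + x·z)² = 1 + 2 x·z + (x·z)² shows term-by-term why the two agree.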
Feature transform. NN: learn ϕ simultaneously with w. Here: choose a nonlinear ϕ so that ⟨ϕ(x), ϕ(x')⟩ = f(⟨x, x'⟩) for some f : R → R, to save computation.
Outline: Feature map, Kernels, The Kernel Trick, Advanced.
Reverse engineering. Start with some function k : Rᵈ × Rᵈ → R such that there exists a feature transform ϕ with k(x, x') = ⟨ϕ(x), ϕ(x')⟩. As long as k is efficiently computable, we don't care about the dimension of ϕ (which could even be infinite!). Such a k is called a (reproducing) kernel.
Examples. Polynomial kernel: k(x, x') = (1 + ⟨x, x'⟩)ᵖ. Gaussian kernel: k(x, x') = exp(−‖x − x'‖² / (2σ²)). Laplace kernel: k(x, x') = exp(−‖x − x'‖ / σ). Matérn kernel.
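A direct transcription of the first three kernels (function names and the σ parameterization are mine; the Matérn kernel is omitted since its exact parameterization was not on the slide):

```python
import numpy as np

def poly_kernel(x, z, p=2):
    # polynomial kernel: (1 + <x, z>)^p
    return (1.0 + x @ z) ** p

def gaussian_kernel(x, z, sigma=1.0):
    # Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def laplace_kernel(x, z, sigma=1.0):
    # Laplace kernel: exp(-||x - z|| / sigma)
    return np.exp(-np.linalg.norm(x - z) / sigma)
```

Note that the Gaussian and Laplace kernels both evaluate to 1 when x = z and decay with distance; only the norm (squared vs. plain) differs.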
Verifying a kernel. For any n and any x1, x2, …, xn, the kernel matrix K with Kij = k(xi, xj) is symmetric and positive semidefinite (K ⪰ 0). Symmetric: Kij = Kji. Positive semidefinite (PSD): αᵀKα ≥ 0 for all α ∈ Rⁿ.
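Both conditions can be spot-checked numerically on sampled points; a sketch (the helper name `kernel_matrix` is mine), using the fact that a symmetric matrix is PSD iff its smallest eigenvalue is nonnegative:

```python
import numpy as np

def kernel_matrix(X, k):
    # K_ij = k(x_i, x_j) over all pairs of rows of X
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

gauss = lambda x, z: np.exp(-np.sum((x - z) ** 2) / 2.0)

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))
K = kernel_matrix(X, gauss)

print(np.allclose(K, K.T))                    # True: symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-9)   # True: PSD, up to rounding
```

Passing such a check on samples is necessary but not a proof; the definition quantifies over all n and all point sets.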
Kernel calculus. If k is a kernel, so is λk for any λ ≥ 0. If k1 and k2 are kernels, so is k1 + k2. If k1 and k2 are kernels, so is k1·k2.
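These closure rules can be sanity-checked on random data (a numerical illustration, not a proof; all choices below are mine). On kernel matrices, the pointwise product k1·k2 becomes the elementwise (Schur) product:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))
n = len(X)

# Gaussian and polynomial kernel matrices on the same points
K1 = np.array([[np.exp(-np.sum((X[i] - X[j]) ** 2)) for j in range(n)] for i in range(n)])
K2 = np.array([[(1.0 + X[i] @ X[j]) ** 2 for j in range(n)] for i in range(n)])

for K in (3.0 * K1, K1 + K2, K1 * K2):           # lambda*k, k1 + k2, k1*k2 (Schur product)
    print(np.linalg.eigvalsh(K).min() >= -1e-8)  # True: each combination stays PSD
```

The product case is the least obvious of the three; it is the Schur product theorem in disguise.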
Outline: Feature map, Kernels, The Kernel Trick, Advanced.
Kernel SVM (dual): maximize over α the objective Σᵢ αᵢ − ½ Σᵢⱼ αᵢαⱼ yᵢyⱼ k(xᵢ, xⱼ), subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0. Everything is written with α and k; ϕ is implicit…
Does it work?
Testing. Given a test sample x', how do we perform testing? No explicit access to ϕ, again! Predict ŷ(x') = sign(Σᵢ αᵢ yᵢ k(xᵢ, x') + b): the kernel, the dual variables, and the training set all appear.
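Testing therefore touches every training point. A sketch of the predictor (assuming α and b have already been obtained from the dual; all names are mine):

```python
import numpy as np

def gaussian_k(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def predict(x_new, X_train, y_train, alpha, b=0.0, k=gaussian_k):
    # f(x') = sum_i alpha_i y_i k(x_i, x') + b -- O(nd) per test point
    f = sum(a * yi * k(xi, x_new) for a, yi, xi in zip(alpha, y_train, X_train))
    return np.sign(f + b)
```

In practice only the support vectors (those with αᵢ > 0) need to be kept, which shrinks the sum.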
Tradeoff. Previously: training O(nd), testing O(d). Kernel: training O(n²d), testing O(nd). Nice to avoid explicit dependence on h, the dimension of the feature space (which could be infinite). But if n is also large… (maybe later)
Learning the kernel (Lanckriet et al.'04). Use a nonnegative combination k = Σⱼ ζⱼ kⱼ of t pre-selected kernels, with the coefficients ζⱼ ≥ 0 learned simultaneously.
Logistic regression revisited: kernelize. Representer Theorem (Wahba, Schölkopf, Herbrich, Smola, Dinuzzo, …). The optimal w has the following form: w = Σᵢ αᵢ ϕ(xᵢ).
Kernel Logistic Regression (primal): minimize over α the objective Σᵢ log(1 + exp(−yᵢ(Kα)ᵢ)) + (λ/2) αᵀKα. In the gradient, uncertain predictions get bigger weight.
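A minimal gradient-descent sketch of this kernelized objective on the XOR data from earlier (the step size, kernel width, and iteration count are my choices): the factor σ(−yᵢ(Kα)ᵢ) multiplying each point in the gradient is exactly the "uncertain predictions get bigger weight" term.

```python
import numpy as np

def gaussian_gram(X, sigma=0.5):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kernel_logreg(K, y, lam=1e-2, lr=0.5, steps=2000):
    alpha = np.zeros(len(y))
    for _ in range(steps):
        s = 1.0 / (1.0 + np.exp(y * (K @ alpha)))  # sigma(-y_i (K alpha)_i): big when uncertain
        grad = -K @ (y * s) + lam * (K @ alpha)    # gradient of loss + (lam/2) a'Ka
        alpha -= lr * grad / len(y)
    return alpha

# XOR: not linearly separable, but easy for a kernel method
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
K = gaussian_gram(X)
alpha = fit_kernel_logreg(K, y)
print((np.sign(K @ alpha) == y).all())  # True
```

Confidently classified points (s ≈ 0) contribute almost nothing to the gradient, so training effort concentrates on the uncertain ones.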
Outline: Feature map, Kernels, The Kernel Trick, Advanced.
Universal approximation (Micchelli, Xu, Zhang'06). Universal kernel: for any compact set Z, any continuous function f : Z → R (e.g., a decision boundary), and any ε > 0, there exist x1, x2, …, xn in Z and α1, α2, …, αn in R such that sup_{z∈Z} |f(z) − Σᵢ αᵢ k(xᵢ, z)| ≤ ε; the sum is exactly the form kernel methods produce. Example: the Gaussian kernel.
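A numerical illustration with the Gaussian kernel (every choice here is mine: the target sin, the interval, σ, the number of centers, the ridge jitter): solve for α at n centers and check the uniform error over a fine grid is small.

```python
import numpy as np

# target continuous function on the compact set Z = [0, 2*pi]
f = np.sin
centers = np.linspace(0.0, 2.0 * np.pi, 20)
sigma = 0.3

def k(a, b):
    # Gaussian kernel evaluated between all pairs of two 1-D point sets
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

# solve for alpha so that sum_i alpha_i k(x_i, .) matches f at the centers
K = k(centers, centers) + 1e-8 * np.eye(len(centers))  # tiny jitter for conditioning
alpha = np.linalg.solve(K, f(centers))

grid = np.linspace(0.0, 2.0 * np.pi, 500)
approx = k(grid, centers) @ alpha
err = np.max(np.abs(f(grid) - approx))
print(err)  # a small uniform error, playing the role of epsilon
```

More centers (larger n) drive the uniform error toward any prescribed ε, which is the content of the theorem.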
Kernel mean embedding (Smola, Song, Gretton, Schölkopf, …). Map a distribution P to the mean μ_P = E_{X∼P}[ϕ(X)] of the feature map ϕ of some kernel. Characteristic kernel: the mapping P ↦ μ_P is 1-1, so μ_P completely preserves the information in the distribution P. Lots of applications.
Questions?