CS480/680: Intro to ML Lecture 09: Kernels 23/02/2019 Yao-Liang Yu

Announcement
Final exam: Wednesday, December 19, 4:00-6:30PM, PAC 8
Project proposal due tonight; use the provided LaTeX template!
A3 will be available tonight

Outline
Feature map
Kernels
The Kernel Trick
Advanced

XOR revisited (recall: the XOR dataset cannot be separated by any linear classifier in $\mathbb{R}^2$)

Quadratic classifier: $\hat{y} = \mathrm{sign}(x^\top W x + w^\top x + b)$, with the weights $W$, $w$, $b$ to be learned.

The power of lifting: the feature map $\varphi: \mathbb{R}^d \to \mathbb{R}^{d^2+d+1}$, $\varphi(x) = (xx^\top, x, 1)$, turns the quadratic classifier into a linear one in the lifted space.
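To make the lift concrete, here is a minimal Python sketch of this quadratic feature map (my own illustration; the function name quadratic_feature_map is not from the slides):

```python
import numpy as np

def quadratic_feature_map(x):
    """Lift x in R^d to R^(d*d + d + 1): all pairwise products, x itself, and a constant 1."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.outer(x, x).ravel(), x, [1.0]])

print(quadratic_feature_map([1.0, 2.0]))  # 7 = 2*2 + 2 + 1 entries
```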

Example

Does it work?

Curse of dimensionality? Computation now happens in the $O(d^2)$-dimensional lifted space. But all we need is the dot product $\varphi(x)^\top \varphi(x') = (x^\top x')^2 + x^\top x' + 1$, which is still computable in $O(d)$!
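A quick numeric sanity check of this identity (a sketch assuming the quadratic lift above):

```python
import numpy as np

def phi(x):  # quadratic lift: pairwise products, x itself, and a constant 1
    return np.concatenate([np.outer(x, x).ravel(), x, [1.0]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
explicit = phi(x) @ phi(z)            # dot product in the O(d^2)-dim lifted space
shortcut = (x @ z) ** 2 + x @ z + 1   # same value, computed in O(d)
assert np.isclose(explicit, shortcut)
```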

Feature transform. Neural networks learn ϕ simultaneously with w. Here we instead choose a nonlinear ϕ such that $\varphi(x)^\top \varphi(x') = f(x^\top x')$ for some $f: \mathbb{R} \to \mathbb{R}$, to save computation.

Outline
Feature map
Kernels
The Kernel Trick
Advanced

Reverse engineering. Start with some function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ such that there exists a feature transform ϕ with $k(x, x') = \langle \varphi(x), \varphi(x') \rangle$. As long as k is efficiently computable, we don't care about the dimension of ϕ (it could even be infinite!). Such a k is called a (reproducing) kernel.

Examples (standard forms):
Polynomial kernel: $k(x, x') = (x^\top x' + c)^p$
Gaussian kernel: $k(x, x') = \exp\big(-\|x - x'\|_2^2 / (2\sigma^2)\big)$
Laplace kernel: $k(x, x') = \exp\big(-\|x - x'\|_2 / \sigma\big)$
Matérn kernel (a family interpolating between the Laplace and Gaussian kernels)
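A sketch of these kernels in Python (parameterizations vary across references, e.g. the choice of norm in the Laplace kernel; these are common choices, and the Matérn kernel is omitted since its form depends on a smoothness parameter):

```python
import numpy as np

def polynomial(x, z, c=1.0, p=2):
    return (x @ z + c) ** p

def gaussian(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def laplace(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) / sigma)
```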

Verifying a kernel. k is a kernel if and only if for any n and any $x_1, x_2, \dots, x_n$, the kernel matrix K with $K_{ij} = k(x_i, x_j)$ is symmetric and positive semidefinite.
Symmetric: $K_{ij} = K_{ji}$
Positive semidefinite (PSD): $\alpha^\top K \alpha \ge 0$ for all $\alpha \in \mathbb{R}^n$
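This condition is easy to check empirically on a sample; a minimal sketch for the Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))

# Gram matrix of the Gaussian kernel on 50 random points
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)

assert np.allclose(K, K.T)                    # symmetric
assert np.linalg.eigvalsh(K).min() >= -1e-8   # PSD, up to rounding error
```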

Kernel calculus
If k is a kernel, so is λk for any λ ≥ 0
If k1 and k2 are kernels, so is k1 + k2 (if k1 has feature map 𝜑1 and k2 has 𝜑2, then k1 + k2 has the concatenated feature map (𝜑1, 𝜑2))
If k1 and k2 are kernels, so is k1k2
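These closure rules can also be checked numerically; a sketch verifying that the sum and the (elementwise) product of two Gram matrices stay PSD:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))

lin = X @ X.T                                        # linear-kernel Gram matrix
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
gauss = np.exp(-sq / 2.0)                            # Gaussian-kernel Gram matrix

for K in (lin + gauss, lin * gauss):                 # k1 + k2 and k1 * k2
    print(np.linalg.eigvalsh(K).min() >= -1e-8)      # True: both remain PSD
```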

Outline
Feature map
Kernels
The Kernel Trick
Advanced

Kernel SVM (dual). The dual is written entirely in terms of α and the kernel matrix; ϕ appears only implicitly through $k(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j)$.
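For reference, the standard soft-margin dual in kernelized form (a standard statement, not copied from the slide image):

$$\max_{\alpha} \;\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \quad \text{s.t.}\;\; 0 \le \alpha_i \le C,\;\; \sum_{i=1}^n \alpha_i y_i = 0.$$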

Does it work? $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^\top \mathbf{x}' + 1)^2$

Testing. Given a test sample x', how do we predict? Again without explicit access to ϕ: $\hat{y}(x') = \mathrm{sign}\big(\sum_{i=1}^n \alpha_i y_i\, k(x_i, x') + b\big)$, so the prediction uses only the kernel, the dual variables, and the training set.
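An end-to-end sketch using scikit-learn (with gamma=1 and coef0=1, SVC's polynomial kernel matches $(x^\top x' + 1)^2$; the XOR data and parameter values are my own choices):

```python
import numpy as np
from sklearn.svm import SVC

# XOR data: not linearly separable, but separable under the quadratic kernel
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=10.0)
clf.fit(X, y)
print(clf.predict(X))  # expected: [-1  1  1 -1]
```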

Tradeoff. Previously: training $O(nd)$, testing $O(d)$. Kernelized: training $O(n^2 d)$, testing $O(nd)$. It is nice to avoid explicit dependence on the feature dimension h (which could be infinite), but if n is also large… (maybe later)

Learning the kernel (Lanckriet et al. '04). Use a nonnegative combination $k = \sum_{s=1}^t \zeta_s k_s$, $\zeta_s \ge 0$, of t pre-selected kernels, with the coefficients ζ learned simultaneously with the classifier.

Logistic regression revisited: kernelize it. Representer Theorem (Wahba, Schölkopf, Herbrich, Smola, Dinuzzo, …). The optimal w has the form $w = \sum_{i=1}^n \alpha_i \varphi(x_i)$, so predictions depend on the data only through the kernel.
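A sketch of what kernelizing looks like here: plugging $w = \sum_i \alpha_i \varphi(x_i)$ into regularized logistic regression and running gradient descent on α (the data, kernel width, step size, and iteration count are all illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)        # XOR-like labels

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                                  # Gaussian kernel matrix

alpha, lam = np.zeros(len(X)), 1e-2
for _ in range(500):
    margins = y * (K @ alpha)                          # y_i * f(x_i) with f = K @ alpha
    grad = K @ (-y / (1 + np.exp(margins))) + lam * (K @ alpha)
    alpha -= 0.1 * grad / len(X)

print(np.mean(np.sign(K @ alpha) == y))                # training accuracy
```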

Outline
Feature map
Kernels
The Kernel Trick
Advanced

A closer look. What does it mean to use a kernel k? Testing computes $f(x) = w^\top \varphi(x) = \sum_{i=1}^n \beta_i\, k(x_i, x)$. Take $k(x, x') = (x^\top x')^2$ and write $z_i$ for the training points:
$$f(x) = \sum_{i=1}^n \beta_i (x^\top z_i)^2 = \sum_{i=1}^n \beta_i \sum_{j=1}^d \sum_{k=1}^d x_j x_k z_{ji} z_{ki} = \sum_{j=1}^d \sum_{k=1}^d x_j x_k \Big( \sum_{i=1}^n \beta_i z_{ji} z_{ki} \Big) = \sum_{j=1}^d \sum_{k=1}^d \mu_{jk}\, x_j x_k.$$
So the kernelized predictor is exactly a quadratic function of x, with coefficients $\mu_{jk}$ determined by the data.

Reproducing Kernel Hilbert Space
Fix x: $k(\cdot, x): \mathcal{X} \to \mathbb{R}$, $z \mapsto k(z, x)$
Vary x in $\mathcal{X}$: $\{k(\cdot, x) : x \in \mathcal{X}\}$ is a set of functions from $\mathcal{X}$ to $\mathbb{R}$
Take linear combinations: $\big\{\sum_{i=1}^n \beta_i\, k(\cdot, x_i) : x_i \in \mathcal{X}\big\}$
Define the dot product: $\big\langle \sum_{i=1}^n \beta_i\, k(\cdot, x_i), \sum_{j=1}^m \gamma_j\, k(\cdot, z_j) \big\rangle = \sum_{i=1}^n \sum_{j=1}^m \beta_i \gamma_j\, k(x_i, z_j)$
Complete the space (include limits of Cauchy sequences)
Reproducing property: $\langle f, k(\cdot, x) \rangle = f(x)$

Universal approximation (Micchelli, Xu, Zhang '06). Universal kernel: for any compact set Z, any continuous function $f: Z \to \mathbb{R}$, and any ε > 0, there exist $x_1, x_2, \dots, x_n$ in Z and $\alpha_1, \alpha_2, \dots, \alpha_n$ in $\mathbb{R}$ such that $\sup_{z \in Z} \big| f(z) - \sum_{i=1}^n \alpha_i k(x_i, z) \big| \le \varepsilon$. In particular, any continuous decision boundary can be approximated by kernel methods. Example: the Gaussian kernel is universal.
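A sketch of this in action: fitting $\sum_i \alpha_i k(x_i, \cdot)$ to a continuous function by ridge-regularized least squares with a Gaussian kernel (the grid, bandwidth, and ridge are illustrative choices):

```python
import numpy as np

x = np.linspace(0, 3, 30)[:, None]        # centers x_1, ..., x_n
f = np.cos(2 * x).ravel()                 # continuous target on [0, 3]

K = np.exp(-(x - x.T) ** 2 / 0.1)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(x)), f)

z = np.linspace(0, 3, 300)[:, None]       # dense evaluation grid
approx = np.exp(-(z - x.T) ** 2 / 0.1) @ alpha
print(np.max(np.abs(approx - np.cos(2 * z).ravel())))  # small sup-norm error
```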

Kernel mean embedding (Smola, Song, Gretton, Schölkopf, …). Map a distribution P to $\mu_P = \mathbb{E}_{X \sim P}[\varphi(X)]$, where ϕ is the feature map of some kernel. Characteristic kernel: this mapping is 1-1, so $\mu_P$ completely preserves the information in the distribution P. Lots of applications.
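One such application (my own sketch, not from the slides): the maximum mean discrepancy compares two samples through the distance between their empirical mean embeddings:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X and Y, Gaussian kernel."""
    def gram(A, B):
        sq = np.sum((A[:, None] - B[None, :]) ** 2, axis=-1)
        return np.exp(-sq / (2 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((200, 1)), rng.standard_normal((200, 1)))
diff = mmd2(rng.standard_normal((200, 1)), rng.standard_normal((200, 1)) + 2.0)
print(same, diff)  # near zero vs. clearly larger
```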

Questions?