1 Welcome to the Kernel-Class My name: Max (Welling). Book: there will be class notes/slides. Homework: reading material, some exercises, some MATLAB implementations. I like an active attitude in class: ask questions, and respond to my questions. Enjoy.

2 Primary Goal What is the primary goal of machine learning, data mining, pattern recognition, data analysis, and statistics? The automatic detection of non-coincidental structure in data.

3 Desiderata Robust algorithms are insensitive to outliers and wrong model assumptions. Stable algorithms generalize well to unseen data. Computationally efficient algorithms are necessary to handle large datasets.

4 Supervised & Unsupervised Learning Supervised: classification, regression. Unsupervised: clustering, dimensionality reduction, ranking, outlier detection. Primal vs. dual problems: generalized linear models solve the primal problem (weights on features), while kernel classification solves the dual problem (weights on training points), which makes it resemble nearest-neighbor classification.

5 Feature Spaces A non-linear mapping $\phi: \mathcal{X} \to \mathcal{F}$ takes the data into a feature space $\mathcal{F}$, which can be: 1. a high-dimensional space $\mathbb{R}^D$; 2. an infinite-dimensional countable space; 3. a function space (Hilbert space). Example: the quadratic feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$, for which $\langle \phi(x), \phi(y) \rangle = \langle x, y \rangle^2$.
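To make the example concrete, here is a minimal NumPy sketch (the course homework uses MATLAB; this Python fragment is only illustrative) checking that the explicit quadratic feature map and the squared dot product agree:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^3."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Inner product in feature space equals the squared inner product in input space.
print(phi(x) @ phi(y))   # 16.0
print((x @ y) ** 2)      # 16.0
```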

6 Kernel Trick Note: in the dual representation we used the Gram matrix $G_{ij} = \langle x_i, x_j \rangle$ to express the solution. Kernel trick: replace the inner product $\langle x_i, x_j \rangle$ by a kernel $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. If we use algorithms that only depend on the Gram matrix $G$, then we never have to know (compute) the actual features $\phi(x)$. This is the crucial point of kernel methods.
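A sketch of this point, assuming the RBF kernel that reappears on the Learning Kernels slide: the Gram matrix is computed directly from the inputs, and the (here infinite-dimensional) features are never formed.

```python
import numpy as np

def rbf_gram(X, Y, c=1.0):
    """G[i, j] = exp(-||x_i - y_j||^2 / c), with no explicit feature vectors."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / c)

X = np.random.randn(5, 3)           # 5 points in R^3
G = rbf_gram(X, X)                  # 5x5 Gram matrix
print(G.shape, np.allclose(G, G.T))
```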

7 Properties of a Kernel Definition: a finitely positive semi-definite function $k(x,y)$ is a symmetric function of its arguments for which every matrix formed by restriction to a finite subset of points is positive semi-definite. Theorem: a function $k(x,y)$ can be written as $k(x,y) = \langle \phi(x), \phi(y) \rangle$, where $\phi$ is a feature map, iff $k(x,y)$ satisfies this semi-definiteness property. Relevance: we can now check whether $k(x,y)$ is a proper kernel using only properties of $k(x,y)$ itself, i.e. without needing to know the feature map!
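A hedged sketch of the finite check the definition suggests (passing on one sample is necessary but not sufficient evidence; the helper name is my own):

```python
import numpy as np

def psd_on_sample(k, X, tol=1e-10):
    """Is kernel k, restricted to the sample X, symmetric and positive semi-definite?"""
    n = len(X)
    K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

X = list(np.random.randn(20, 2))
print(psd_on_sample(lambda x, y: (x @ y) ** 2, X))             # True: quadratic kernel
print(psd_on_sample(lambda x, y: -np.linalg.norm(x - y), X))   # False: not a valid kernel
```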

8 Modularity Kernel methods consist of two modules: 1) the choice of kernel (this is non-trivial), and 2) the algorithm which takes kernels as input. Modularity: any kernel can be used with any kernel algorithm. Some kernels: linear $k(x,y) = \langle x, y \rangle$, polynomial $k(x,y) = (\langle x, y \rangle + \theta)^d$, RBF $k(x,y) = \exp(-\|x-y\|^2 / c)$. Some kernel algorithms: support vector machine, Fisher discriminant analysis, kernel regression, kernel PCA, kernel CCA.
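A minimal sketch of this modularity (kernel ridge regression stands in as the kernel algorithm; the function names are my own, not from the course):

```python
import numpy as np

def gram(k, X, Z):
    return np.array([[k(x, z) for z in Z] for x in X])

def kernel_ridge_fit(K, y, lam=0.1):
    """Dual solution alpha = (K + lam I)^{-1} y: the algorithm sees data only through K."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

X = np.random.randn(50, 2)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(50)

# Any kernel plugs into the same algorithm unchanged:
for k in (lambda x, z: (x @ z + 1) ** 2,               # polynomial
          lambda x, z: np.exp(-np.sum((x - z) ** 2))):  # RBF
    K = gram(k, X, X)
    alpha = kernel_ridge_fit(K, y)
    print("training MSE:", np.mean((K @ alpha - y) ** 2))
```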

9 Goodies and Baddies Goodies: kernel algorithms are typically constrained convex optimization problems, solved with either spectral methods or convex optimization tools; efficient algorithms exist in most cases; the similarity to linear methods facilitates analysis; there are strong generalization bounds on test error. Baddies: you need to choose the appropriate kernel; kernel learning is prone to over-fitting; all information must go through the kernel bottleneck.

10 Regularization Demo (Trevor Hastie). Regularization is very important! Regularization parameters are determined by out-of-sample measures (cross-validation, leave-one-out).
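A sketch of that out-of-sample selection, here as plain K-fold cross-validation over a small grid of regularization strengths (kernel ridge regression and the helper names are my own choices for illustration):

```python
import numpy as np

def cv_mse(K, y, lam, folds=5):
    """Mean squared validation error of kernel ridge regression at strength lam."""
    n = len(y)
    errs = []
    for val in np.array_split(np.random.permutation(n), folds):
        tr = np.setdiff1d(np.arange(n), val)
        alpha = np.linalg.solve(K[np.ix_(tr, tr)] + lam * np.eye(len(tr)), y[tr])
        errs.append(np.mean((K[np.ix_(val, tr)] @ alpha - y[val]) ** 2))
    return np.mean(errs)

X = np.random.randn(60, 1)
y = np.sin(2 * X[:, 0]) + 0.1 * np.random.randn(60)
K = np.exp(-(X[:, 0:1] - X[:, 0:1].T) ** 2)     # RBF Gram matrix
best = min([1e-3, 1e-2, 1e-1, 1.0], key=lambda lam: cv_mse(K, y, lam))
print("chosen regularization:", best)
```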

11 Learning Kernels All information is tunneled through the Gram-matrix information bottleneck. The real art is to pick an appropriate kernel. Take, for example, the RBF kernel $k(x,y) = \exp(-\|x-y\|^2 / c)$: if $c$ is very small, $G = I$ (all data are dissimilar): over-fitting; if $c$ is very large, $G = \mathbf{1}$ (all data are very similar): under-fitting. We need to learn the kernel. Here are some ways to combine kernels $k_1, k_2$ to improve them: conic combinations $k(x,y) = \alpha_1 k_1(x,y) + \alpha_2 k_2(x,y)$ with $\alpha_1, \alpha_2 \geq 0$ (kernels form a cone), and $k(x,y) = p(k_1(x,y))$ for any polynomial $p$ with positive coefficients.
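A small numerical sketch of the bandwidth behavior above (RBF form as written on the slide):

```python
import numpy as np

def rbf_gram(X, c):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / c)

X = np.random.randn(4, 2)
print(np.round(rbf_gram(X, c=1e-4), 2))  # ~ identity: each point similar only to itself (over-fit regime)
print(np.round(rbf_gram(X, c=1e4), 2))   # ~ all ones: every point similar to every other (under-fit regime)
```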