Sparse Kernel Methods Steve Gunn

Overview
Part I: Introduction to Kernel Methods
Part II: Sparse Kernel Methods

Part I: Introduction to Kernel Methods

Classification
Consider a two-class problem: given training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)$ with labels $y_i \in \{-1, +1\}$, find a decision function that separates the two classes.

Optimal Separating Hyperplane

Separate the data with a hyperplane such that the data is separated without error and the distance from the hyperplane to the closest vector is maximal.

Solution
The optimal hyperplane minimises $\frac{1}{2}\|\mathbf{w}\|^2$, subject to the constraints $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$, $i = 1, \ldots, l$, and is obtained by finding the saddle point of the Lagrange functional
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{l} \alpha_i \bigl[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \bigr], \qquad \alpha_i \ge 0.$$
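The transcript drops the remaining formulas here; as a worked supplement (standard SVM algebra, not recovered from the slide images), setting $\partial L / \partial \mathbf{w} = 0$ and $\partial L / \partial b = 0$ gives $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $\sum_i \alpha_i y_i = 0$; substituting back yields the dual problem:

```latex
\max_{\boldsymbol{\alpha}}\; W(\boldsymbol{\alpha})
  = \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
      \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0 .
```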

Finding the OSH
Quadratic Programming Problem
Size is dependent upon training set size
Unique global minimum
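To make the QP concrete, here is a minimal sketch that solves the hard-margin dual above with the cvxopt solver (cvxopt and the helper name fit_hard_margin_svm are my choices, not part of the slides; assumes linearly separable data):

```python
# Minimal sketch: the hard-margin SVM dual as a quadratic program.
import numpy as np
from cvxopt import matrix, solvers

def fit_hard_margin_svm(X, y):
    """X: (l, d) array of inputs; y: (l,) array of labels in {-1, +1}."""
    l = X.shape[0]
    K = (X @ X.T).astype(float)                  # linear-kernel Gram matrix
    P = matrix((np.outer(y, y) * K).astype(float))  # P_ij = y_i y_j x_i . x_j
    q = matrix(-np.ones(l))                      # maximise sum(alpha)
    G = matrix(-np.eye(l))                       # encodes alpha_i >= 0
    h = matrix(np.zeros(l))
    A = matrix(y.reshape(1, -1).astype(float))   # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                            # support vectors only
    b0 = float(np.mean(y[sv] - X[sv] @ w))       # bias recovered from the SVs
    return w, b0, alpha
```

Note that the problem size is l-by-l, which is exactly the "size depends on training set size" point above.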

Support Vectors
Information is contained in the support vectors
Can throw away the rest of the training data
SVs have non-zero Lagrange multipliers

Generalised Separating Hyperplane

Non-Separable Case
Introduce slack variables $\xi_i \ge 0$ and minimise $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{l} \xi_i$, subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$. C is chosen a priori and determines the trade-off between margin maximisation and training error in the non-separable case.
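A small illustration of the role of C, using scikit-learn's SVC as an assumed stand-in (the toy data and library choice are mine, not the slides'):

```python
# Sketch: C trades margin width against training violations.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(+1, 1, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

soft = SVC(kernel="linear", C=0.01).fit(X, y)  # wide margin, more violations
hard = SVC(kernel="linear", C=1e4).fit(X, y)   # approaches the separable OSH
print(len(soft.support_), len(hard.support_))  # small C keeps more SVs
```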

Finding the GSH
Quadratic Programming Problem
Size is dependent upon training set size
Unique global minimum

Non-Linear SVM
Map the input space to a high-dimensional feature space
Find the OSH or GSH in the feature space

Kernel Functions
Hilbert-Schmidt theory: the kernel $K(\mathbf{x}, \mathbf{x}')$ is a symmetric function acting as an inner product in the feature space. Mercer's conditions require
$$\iint K(\mathbf{x}, \mathbf{x}')\, g(\mathbf{x})\, g(\mathbf{x}')\, d\mathbf{x}\, d\mathbf{x}' \ge 0 \quad \text{for all } g \in L_2.$$
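Mercer's condition is hard to verify directly; a practical necessary check (my addition, not from the slides) is that the Gram matrix on any finite sample must be positive semi-definite:

```python
# Sketch: a Mercer kernel must yield a positive semi-definite Gram matrix
# on every finite sample; checking eigenvalues gives a necessary condition.
import numpy as np

def gram_is_psd(kernel, X, tol=1e-10):
    G = np.array([[kernel(x, z) for z in X] for x in X])
    return bool(np.all(np.linalg.eigvalsh(G) >= -tol))
```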

Polynomial Degree 2 (e.g. the kernel $K(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + 1)^2$)

Acceptable Kernel Functions
Polynomial: $K(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + 1)^d$
Multi-Layer Perceptron: $K(\mathbf{x}, \mathbf{x}') = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{x}' - \delta)$, for suitable $\kappa$ and $\delta$
Radial Basis Function: $K(\mathbf{x}, \mathbf{x}') = \exp\bigl(-\|\mathbf{x} - \mathbf{x}'\|^2 / (2\sigma^2)\bigr)$
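A minimal sketch of these three kernel families in Python (the parameter names d, kappa, delta, and sigma are my own; the slides give only the families):

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    return (np.dot(x, z) + 1.0) ** d

def mlp_kernel(x, z, kappa=1.0, delta=1.0):
    # Satisfies Mercer's conditions only for certain kappa, delta
    return np.tanh(kappa * np.dot(x, z) - delta)

def rbf_kernel(x, z, sigma=1.0):
    diff = np.asarray(x, float) - np.asarray(z, float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))
```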

Iris Data Set

Regression
Generalisation error decomposes into approximation error and estimation error; the balance between the two is governed by model size.

Regression
Approximate the data with a hyperplane, applying the SRM principle and using a loss function, e.g. Vapnik's $\varepsilon$-insensitive loss $|y - f(\mathbf{x})|_\varepsilon = \max(0,\, |y - f(\mathbf{x})| - \varepsilon)$.

Solution
Introduce slack variables $\xi_i, \xi_i^* \ge 0$ and minimise $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$, subject to the constraints $y_i - \mathbf{w} \cdot \mathbf{x}_i - b \le \varepsilon + \xi_i$ and $\mathbf{w} \cdot \mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^*$.

Finding the Solution
Quadratic Programming Problem
Size is dependent upon training set size
Unique global minimum
The solution has the form $f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, K(\mathbf{x}_i, \mathbf{x}) + b$, where $\alpha_i, \alpha_i^*$ are the Lagrange multipliers of the two constraint sets.
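As a usage illustration (scikit-learn's SVR is an assumed stand-in for the QP machinery described here, not the slides' own code):

```python
# Sketch: epsilon-insensitive support vector regression on toy data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, (100, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
# Only points outside the epsilon-tube become support vectors,
# so widening epsilon yields a sparser model.
print(len(model.support_), "support vectors out of", len(X))
```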

Part I: Summary
Unique global minimum
Addresses the curse of dimensionality
Complexity dependent upon data set size
Information contained in support vectors

Part II: Sparse Kernel Methods

Cyclic Nature of Empirical Modelling
Induce → Validate → Interpret → Design → (and back to Induce)

Induction
SVMs have strong theory and good empirical performance, with a solution of the form $f(\mathbf{x}) = \sum_i \alpha_i\, K(\mathbf{x}_i, \mathbf{x}) + b$.
Interpretation
Input selection and transparency.

Additive Representation
Additive structure: $f(\mathbf{x}) = f_0 + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots$
Transparent
Rejection of redundant inputs
Unique decomposition

Sparse Kernel Regression
Previously: a single fixed kernel, $f(\mathbf{x}) = \sum_i \alpha_i\, K(\mathbf{x}, \mathbf{x}_i) + b$.
Now: the kernel is replaced by a weighted linear sum of sub-kernels, as described below.

The Priors
“Different priors for different parameters”
Smoothness: controls overfitting
Sparseness: enables input selection and controls overfitting

Sparse Kernel Model
Replace the kernel with a weighted linear sum of sub-kernels, $K(\mathbf{x}, \mathbf{x}') = \sum_k c_k\, K_k(\mathbf{x}, \mathbf{x}')$, and penalise the multipliers alongside the standard support vector optimisation:
$l_0$ penalty (minimise the number of non-zero multipliers): optimisation hard, solution sparse
$l_1$ penalty: optimisation easier, solution sparse
$l_2$ penalty: optimisation easier, solution NOT sparse
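A hedged illustration of the sparsity contrast: penalising kernel-expansion coefficients with the $l_1$ (Lasso) versus the $l_2$ (Ridge) norm, where the $l_1$ solution is sparse and the $l_2$ one is not. This uses scikit-learn as a stand-in and is not the slides' algorithm:

```python
# Sketch: l1 vs l2 penalties on kernel-expansion coefficients.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, (80, 1))
y = np.sinc(X).ravel()

G = rbf_kernel(X, X, gamma=1.0)   # each column is one candidate kernel unit
l1 = Lasso(alpha=1e-3, max_iter=50_000).fit(G, y)
l2 = Ridge(alpha=1e-3).fit(G, y)

print("non-zero coefficients, l1:", int(np.sum(np.abs(l1.coef_) > 1e-8)))
print("non-zero coefficients, l2:", int(np.sum(np.abs(l2.coef_) > 1e-8)))
```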

Choosing the Sub-Kernels
Avoid additional parameters if possible
Sub-models should be flexible

Spline Kernel
The (infinite-knot) linear spline kernel has the form $K(x, x') = 1 + x x' + x x' \min(x, x') - \frac{x + x'}{2} \min(x, x')^2 + \frac{1}{3} \min(x, x')^3$.

Tensor Product Splines
The univariate spline which passes through the origin has a kernel of the form
$$k(x, x') = x x' + x x' \min(x, x') - \frac{x + x'}{2} \min(x, x')^2 + \frac{1}{3} \min(x, x')^3.$$
E.g. for a two-input problem the ANOVA kernel is given by $K(\mathbf{x}, \mathbf{x}') = 1 + k(x_1, x_1') + k(x_2, x_2') + k(x_1, x_1')\, k(x_2, x_2')$, and the multivariate ANOVA kernel is given by $K(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{n} \bigl(1 + k(x_i, x_i')\bigr)$.
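A small sketch of these kernels in code (my own transcription of the formulas above; function names are mine):

```python
# Sketch: univariate through-origin spline kernel and the ANOVA
# tensor-product kernel built from it.
import numpy as np

def spline_kernel_1d(x, z):
    m = np.minimum(x, z)
    return x * z + x * z * m - 0.5 * (x + z) * m**2 + m**3 / 3.0

def anova_kernel(x, z):
    """x, z: 1-D arrays of inputs; product over (1 + k(x_i, z_i))."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    return float(np.prod(1.0 + spline_kernel_1d(x, z)))
```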

Sparse ANOVA Kernel
Introduce multipliers for each ANOVA term, $K(\mathbf{x}, \mathbf{x}') = c_0 + \sum_i c_i\, k(x_i, x_i') + \sum_{i<j} c_{ij}\, k(x_i, x_i')\, k(x_j, x_j') + \cdots$, and minimise the number of non-zero multipliers along with the standard support vector optimisation.

Optimisation

Quadratic Loss: $L(y, f(\mathbf{x})) = (y - f(\mathbf{x}))^2$, leading to a quadratic program.

Epsilon-Insensitive Loss: $L(y, f(\mathbf{x})) = \max(0,\, |y - f(\mathbf{x})| - \varepsilon)$, leading to a linear program.

Algorithm
A 3+ stage technique: Data → ANOVA Basis Selection (Stage I) → Sparse ANOVA Selection → Parameter Selection → Model.
Each stage consists of solving a convex, constrained optimisation problem (QP or LP).
Auto-selection of parameters: the capacity control parameter by cross-validation; the sparseness parameter by validation error.

Sparse Basis Solution
Quadratic loss function → quadratic program
$\varepsilon$-insensitive loss function → linear program
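To see why the $\varepsilon$-insensitive case is a linear program, here is a sketch under the assumption of an $l_1$ penalty on the coefficients (splitting each $c_k$ into non-negative parts $c_k^+ - c_k^-$ removes the absolute values, and $f$ is linear in $c$, so every term is linear):

```latex
\min_{c,\,\xi,\,\xi^*} \;
  \sum_{k} \lvert c_k \rvert \;+\; C \sum_{i=1}^{l} (\xi_i + \xi_i^*)
\quad \text{subject to} \quad
\begin{aligned}
  y_i - f(\mathbf{x}_i) &\le \varepsilon + \xi_i,\\
  f(\mathbf{x}_i) - y_i &\le \varepsilon + \xi_i^*,\\
  \xi_i,\; \xi_i^* &\ge 0 .
\end{aligned}
```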

AMPG Problem
Predict automobile MPG (392 samples)
Inputs: number of cylinders, displacement, horsepower, weight, acceleration, year
Output: MPG

Horse Power (figure: estimated additive term for horsepower)
Network transparency through the ANOVA representation.

SUPANOVA AMPG Results ($\varepsilon = 2.5$)
[Table: estimated generalisation error, mean and variance on training and testing data, for each pairing of Stage I and Stage III loss functions (quadratic, $\varepsilon$-insensitive) and for a linear model; the numeric entries are not recoverable from the transcript.]

AMPG Additive Terms

Summary
SUPANOVA is a global approach
Strong basis (kernel methods)
Can control loss function and sparseness
Can impose a limit on the maximum order of variate terms
Generalisation + transparency

Further Information
isystems/kernel/
SVM technical report
MATLAB SVM toolbox
Sparse kernel paper
These slides