1
Sparse Kernel Methods, Steve Gunn (http://www.isis.ecs.soton.ac.uk)
2
Overview. Part I: Introduction to Kernel Methods. Part II: Sparse Kernel Methods.
3
Part I Introduction to Kernel Methods
4
Classification: consider a two-class problem.
5
Optimal Separating Hyperplane
6
Separate the data with a hyperplane such that the points are separated without error and the distance from the closest vector to the hyperplane is maximal.
7
Solution: the optimal hyperplane minimises 1/2 ||w||^2, subject to the constraints y_i (w · x_i + b) >= 1, and is obtained by finding the saddle point of the Lagrange functional.
8
Finding the OSH: a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
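The QP itself is not reproduced in this transcript, so as an illustrative sketch the standard dual of the optimal-separating-hyperplane problem can be solved numerically. The toy data and the choice of SciPy's SLSQP solver are my own assumptions, not anything from the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative, not from the slides)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual objective: maximise sum(a) - 0.5 a'Qa with Q_ij = y_i y_j x_i.x_j,
# written here as a minimisation of its negative
Q = np.outer(y, y) * (X @ X.T)

def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(y),                        # a_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum a_i y_i = 0
)
alpha = res.x
w = (alpha * y) @ X                # primal weights recovered from the dual
sv = alpha > 1e-4                  # support vectors: non-zero multipliers
b = np.mean(y[sv] - X[sv] @ w)     # bias from the support vectors

print(np.round(y * (X @ w + b), 3))  # functional margins; >= 1 at the optimum
```

The support vectors come out as exactly the points closest to the separating hyperplane, matching the slide's claim that only they carry information.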
9
Support Vectors: all the information is contained in the support vectors, so the rest of the training data can be thrown away. Support vectors are the points with non-zero Lagrange multipliers.
11
Generalised Separating Hyperplane
12
Non-Separable Case: introduce slack variables ξ_i >= 0 and minimise 1/2 ||w||^2 + C Σ_i ξ_i, subject to y_i (w · x_i + b) >= 1 − ξ_i. C is chosen a priori and determines the trade-off with the non-separable case.
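The slack-variable trade-off can be illustrated with a plain subgradient descent on the soft-margin (hinge-loss) objective. This is a sketch on made-up overlapping data, not the QP formulation the slides describe; the data, learning rate, and iteration count are all my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters (illustrative data)
X = np.vstack([rng.normal(1.5, 1.0, (40, 2)), rng.normal(-1.5, 1.0, (40, 2))])
y = np.array([1.0] * 40 + [-1.0] * 40)

C, lr = 1.0, 0.005
w, b = np.zeros(2), 0.0
for _ in range(2000):
    # Points violating the margin are exactly those that need slack
    viol = y * (X @ w + b) < 1
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

Because the clusters overlap, no hyperplane separates them without error; C controls how heavily the remaining violations are penalised relative to the margin term.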
13
Finding the GSH: like the OSH, a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
15
Non-Linear SVM: map the input space to a high-dimensional feature space and find the OSH or GSH in that feature space.
16
Kernel Functions: by Hilbert-Schmidt theory, K(x, x′) is a symmetric function; Mercer's conditions determine when it corresponds to an inner product in some feature space.
17
Polynomial Degree 2
18
Acceptable Kernel Functions: polynomial, multi-layer perceptron, radial basis function.
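Two of the kernels named above can be written in a few lines, and Mercer's condition checked empirically: the Gram matrix of an admissible kernel must be symmetric positive semi-definite. The parameter values and test data below are illustrative choices, not the slides'.

```python
import numpy as np

def poly_kernel(X, Z, degree=2, c=1.0):
    # Polynomial kernel (x.z + c)^d
    return (X @ Z.T + c) ** degree

def rbf_kernel(X, Z, gamma=0.5):
    # Radial basis function kernel exp(-gamma ||x - z||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Mercer's condition implies the Gram matrix is positive semi-definite
X = np.random.default_rng(1).normal(size=(25, 3))
for K in (poly_kernel(X, X), rbf_kernel(X, X)):
    assert np.allclose(K, K.T)                  # symmetric
    assert np.linalg.eigvalsh(K).min() > -1e-8  # PSD up to round-off
```

The multi-layer perceptron (sigmoid) kernel is omitted because it satisfies Mercer's conditions only for some parameter values.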
19
Iris Data Set
24
Regression: the generalisation error is made up of the approximation error and the estimation error, which are traded off against each other through the model size.
25
Regression: approximate the data with a hyperplane, using the SRM principle and a loss function, e.g. the ε-insensitive loss.
26
Solution: introduce slack variables ξ_i, ξ_i* >= 0 and minimise 1/2 ||w||^2 + C Σ_i (ξ_i + ξ_i*), subject to the constraints y_i − (w · x_i + b) <= ε + ξ_i and (w · x_i + b) − y_i <= ε + ξ_i*.
27
Finding the Solution: again a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
28
Part I Summary: unique global minimum; addresses the curse of dimensionality; complexity depends on the data set size; the information is contained in the support vectors.
29
Part II Sparse Kernel Methods
30
Cyclic Nature of Empirical Modelling: Induce, Validate, Interpret, Design, and back to Induce.
31
Induction: SVMs have strong theory and good empirical performance, with a solution of the form f(x) = Σ_i α_i K(x, x_i) + b. Interpretation: input selection and transparency.
32
Additive Representation: an additive structure is transparent, allows rejection of redundant inputs, and gives a unique decomposition.
33
Sparse Kernel Regression Previously …. Now
34
The Priors: "different priors for different parameters". Smoothness controls overfitting; sparseness enables input selection and also controls overfitting.
35
Sparse Kernel Model: replace the kernel with a weighted linear sum of kernels, and minimise the number of non-zero multipliers along with the standard support vector optimisation. The choice of sparseness penalty matters: a 0-norm (counting) penalty makes the optimisation hard but the solution sparse; a 1-norm penalty makes the optimisation easier and the solution sparse; a 2-norm penalty makes the optimisation easier but the solution NOT sparse.
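The 1-norm route (easier optimisation, still sparse) can be sketched with proximal gradient descent (ISTA) on a kernel expansion. This is a simplified stand-in for the slides' formulation, which couples the sparseness penalty with the full support vector optimisation; the data, RBF kernel, and penalty weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=40)

# Gram matrix of an RBF kernel centred on the training points
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)

# minimise 0.5 ||K c - y||^2 + lam ||c||_1 by ISTA (soft-thresholding)
lam = 0.5
c = np.zeros(40)
L = np.linalg.norm(K, 2) ** 2           # Lipschitz constant of the gradient
for _ in range(3000):
    c = c - (K.T @ (K @ c - y)) / L                         # gradient step
    c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)   # prox of lam||.||_1

n_active = np.count_nonzero(c)          # only a few kernels survive
mse = np.mean((K @ c - y) ** 2)
print(f"{n_active} of 40 multipliers non-zero, mse {mse:.4f}")
```

The soft-thresholding step sets small multipliers to exactly zero, which is why the 1-norm yields a sparse model where the 2-norm would merely shrink all the weights.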
36
Choosing the Sub-Kernels Avoid additional parameters if possible Sub-models should be flexible
37
Spline Kernel
38
Tensor Product Splines: the univariate linear spline which passes through the origin has a kernel of the form k(u, v) = uv + uv·min(u, v) − ((u + v)/2)·min(u, v)^2 + min(u, v)^3 / 3. E.g. for a two-input problem the ANOVA kernel is given by K(x, z) = 1 + k(x_1, z_1) + k(x_2, z_2) + k(x_1, z_1) k(x_2, z_2), and the multivariate ANOVA kernel is given by the product K(x, z) = Π_i (1 + k(x_i, z_i)).
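The transcript dropped the slides' formulas, so as a sketch, here is the standard spline-through-the-origin kernel from the SVM literature together with the ANOVA product construction; it should be checked against the accompanying technical report.

```python
import numpy as np

def spline_k(u, v):
    # Kernel of the linear spline through the origin:
    # k(u, v) = uv + uv*m - (u+v)/2 * m^2 + m^3/3, with m = min(u, v)
    m = np.minimum(u, v)
    return u * v + u * v * m - (u + v) * m**2 / 2 + m**3 / 3

def anova_kernel(x, z):
    # Multivariate ANOVA kernel: product over inputs of (1 + univariate kernel);
    # expanding the product generates the univariate, bivariate, ... ANOVA terms
    return np.prod(1.0 + spline_k(np.asarray(x), np.asarray(z)))

# Gram matrix on random points in [0, 1]^2: symmetric positive semi-definite
pts = np.random.default_rng(3).uniform(size=(15, 2))
G = np.array([[anova_kernel(p, q) for q in pts] for p in pts])
assert np.allclose(G, G.T)
assert np.linalg.eigvalsh(G).min() > -1e-8
```

Note that k(0, v) = 0 for any v, so every univariate sub-model does pass through the origin, which is what makes the ANOVA decomposition unique.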
39
Sparse ANOVA Kernel: introduce a multiplier for each ANOVA term, and minimise the number of non-zero multipliers along with the standard support vector optimisation.
40
Optimisation
41
Quadratic Loss
42
Epsilon-Insensitive Loss
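The two loss functions named on these slides can be written out directly; the residual values below are illustrative. The key contrast is that the ε-insensitive loss is exactly zero inside the ε-tube, which is what produces sparse support vector solutions, while the quadratic loss penalises every deviation.

```python
import numpy as np

def quadratic_loss(residual):
    # Quadratic loss: every deviation is penalised
    return residual ** 2

def eps_insensitive_loss(residual, eps=0.1):
    # Epsilon-insensitive loss: deviations inside the eps-tube cost nothing
    return np.maximum(np.abs(residual) - eps, 0.0)

r = np.array([-0.3, -0.05, 0.0, 0.08, 0.4])
print(quadratic_loss(r))          # all residuals penalised
print(eps_insensitive_loss(r))    # small residuals map to exactly zero
```

As the slide on the sparse basis solution notes, the choice also changes the optimisation class: quadratic loss leads to a quadratic program, ε-insensitive loss to a linear program.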
43
Algorithm: a 3+ stage technique, from data to model: Stage I, sparse ANOVA basis selection; then ANOVA term selection; then parameter selection. Each stage consists of solving a convex, constrained optimisation problem (QP or LP). Parameters are auto-selected: the capacity-control parameter by cross-validation, the sparseness parameter by validation error.
44
Sparse Basis Solution: a quadratic loss function gives a quadratic program; an ε-insensitive loss function gives a linear program.
45
AMPG Problem: predict automobile MPG (392 samples). Inputs: no. of cylinders, displacement, horsepower, weight, acceleration, year. Output: MPG.
46
[Figure: additive term plotted against horsepower, 50 to 230 hp.] Network transparency through the ANOVA representation.
47
SUPANOVA AMPG Results (ε = 2.5). Estimated generalisation error, shown as mean (variance); rows give the loss function(s) used in training/testing:

Loss Function             Stage I        Stage III      Linear Model
Quadratic                 6.97 (7.39)    7.08 (6.19)    11.4 (11.0)
Insensitive               0.48 (0.04)    0.49 (0.03)    1.80 (0.11)
Quadratic, Insensitive    1.10 (0.07)    1.37 (0.10)
Insensitive, Quadratic    7.07 (6.52)    7.13 (6.04)    11.72 (10.94)
48
AMPG Additive Terms
49
Summary: SUPANOVA is a global approach with a strong basis (kernel methods); the loss function and sparseness can be controlled; a limit can be imposed on the maximum-variate terms; it offers generalisation plus transparency.
50
Further Information: http://www.isis.ecs.soton.ac.uk/isystems/kernel/ (SVM Technical Report, MATLAB SVM Toolbox, Sparse Kernel Paper, These Slides).