Sparse Kernel Methods
Steve Gunn
Overview Part I : Introduction to Kernel Methods Part II : Sparse Kernel Methods
Part I Introduction to Kernel Methods
Classification Consider the two-class problem.
Optimal Separating Hyperplane
Separate the data with a hyperplane such that the data are separated without error and the distance from the hyperplane to the closest vector is maximal.
Solution The optimal hyperplane minimises $\frac{1}{2}\|w\|^2$, subject to the constraints $y_i(\langle w, x_i\rangle + b) \ge 1$, and is obtained by finding the saddle point of the Lagrange functional $L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\left[y_i(\langle w, x_i\rangle + b) - 1\right]$.
Finding the OSH The dual is a quadratic programming problem with a unique global minimum: maximise $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. Its size is dependent upon the training set size.
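This dual can be handed to any QP solver; a minimal sketch, assuming cvxopt (the slides do not name a solver) and a linearly separable sample:

```python
# Minimal OSH dual QP sketch; cvxopt is an assumption, any QP solver works.
import numpy as np
from cvxopt import matrix, solvers

def train_osh(X, y):
    n = len(y)
    Yx = y[:, None] * X
    P = matrix(Yx @ Yx.T)                        # Q_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))                       # alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))   # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(np.array(solvers.qp(P, q, G, h, A, b)['x']))
    w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                            # support vectors
    b0 = np.mean(y[sv] - X[sv] @ w)              # bias averaged over the SVs
    return w, b0, alpha
```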
Support Vectors The information is contained in the support vectors, which are the points with non-zero Lagrange multipliers; the rest of the training data can be thrown away.
Generalised Separating Hyperplane
Non-Separable Case Introduce slack variables $\xi_i \ge 0$ and minimise $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$. C is chosen a priori and determines the trade-off between the margin and the training errors in the non-separable case.
Finding the GSH As for the OSH, this is a quadratic programming problem with a unique global minimum whose size is dependent upon the training set size; the only change is the box constraint $0 \le \alpha_i \le C$.
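In the QP sketch above, only the inequality constraints change; a hedged fragment showing the modification:

```python
# Soft-margin (GSH) variant of the OSH QP sketch above: the only change is
# the box constraint 0 <= alpha_i <= C (C is chosen a priori).
import numpy as np
from cvxopt import matrix

def soft_margin_constraints(n, C):
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # -alpha <= 0, alpha <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    return G, h
```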
Non-Linear SVM Map the input space to a high-dimensional feature space and find the OSH or GSH in that feature space.
Kernel Functions By Hilbert-Schmidt theory, $K(x, x') = \sum_i \lambda_i \phi_i(x)\phi_i(x')$. Mercer's conditions: $K(x, x')$ is a symmetric function satisfying $\iint K(x, x')\,g(x)\,g(x')\,dx\,dx' \ge 0$ for all $g \in L_2$.
Polynomial Degree 2 $K(x, x') = (\langle x, x'\rangle + 1)^2$; for two inputs this corresponds to the feature map $\phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, \sqrt{2}x_1 x_2, x_2^2)$.
Acceptable Kernel Functions Polynomial: $K(x, x') = (\langle x, x'\rangle + 1)^d$. Multi-Layer Perceptron: $K(x, x') = \tanh(\kappa\langle x, x'\rangle + \theta)$, satisfying Mercer's conditions only for some $\kappa$, $\theta$. Radial Basis Function: $K(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$.
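Minimal sketches of the three kernels just listed; the parameter defaults (d, kappa, theta, sigma) are illustrative, not values from the slides:

```python
import numpy as np

def poly_kernel(x, z, d=2):
    # Polynomial kernel of degree d
    return (np.dot(x, z) + 1.0) ** d

def mlp_kernel(x, z, kappa=1.0, theta=-1.0):
    # Multi-layer perceptron kernel; a Mercer kernel only for some kappa, theta
    return np.tanh(kappa * np.dot(x, z) + theta)

def rbf_kernel(x, z, sigma=1.0):
    # Radial basis function kernel
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))
```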
Iris Data Set
Regression Generalisation error decomposes into approximation error and estimation error; both depend on the model size.
Regression Approximate the data with a hyperplane, applying the SRM principle and using a loss function, e.g. the $\varepsilon$-insensitive loss $L_\varepsilon(y, f(x)) = \max(0, |y - f(x)| - \varepsilon)$.
Solution Introduce slack variables $\xi_i, \xi_i^* \ge 0$ and minimise $\frac{1}{2}\|w\|^2 + C\sum_i(\xi_i + \xi_i^*)$ subject to the constraints $y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i$ and $\langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*$.
Finding the Solution Again a quadratic programming problem with a unique global minimum whose size is dependent upon the training set size, where the solution takes the form $f(x) = \sum_i(\alpha_i - \alpha_i^*)K(x_i, x) + b$ with $0 \le \alpha_i, \alpha_i^* \le C$ and $\sum_i(\alpha_i - \alpha_i^*) = 0$.
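The same $\varepsilon$-insensitive QP is implemented in standard libraries; an illustrative example with scikit-learn's SVR (an assumption: the slides reference a MATLAB toolbox instead):

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0.0, 1.0, 50)[:, None]
y = np.sin(2.0 * np.pi * X).ravel() + 0.1 * np.random.randn(50)

# epsilon sets the width of the insensitive tube, C the capacity control
model = SVR(kernel='rbf', C=10.0, epsilon=0.1).fit(X, y)
print(model.support_.size, "support vectors out of", len(y))
```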
Part I : Summary Unique Global Minimum Addresses Curse of Dimensionality Complexity dependent upon data set size Information contained in Support Vectors
Part II Sparse Kernel Methods
Cyclic Nature of Empirical Modelling Induce → Validate → Interpret → Design → …
Induction SVMs have a strong theory and good empirical performance; the solution is of the form $f(x) = \sum_i \alpha_i K(x_i, x) + b$. Interpretation Input selection and transparency.
Additive Representation An additive structure $f(x) = b + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \dots$ is transparent, rejects redundant inputs, and has a unique decomposition.
Sparse Kernel Regression Previously, $f(x) = \sum_i \alpha_i K(x, x_i) + b$ with a single fixed kernel; now the kernel itself is composed of a weighted sum of sub-kernels.
The Priors “Different priors for different parameters”: smoothness controls overfitting; sparseness enables input selection and also controls overfitting.
Sparse Kernel Model Replace the kernel with a weighted linear sum of kernels, $K(x, x') = \sum_j c_j K_j(x, x')$, and minimise the number of non-zero multipliers along with the standard support vector optimisation. Penalising the $\ell_0$ norm makes the optimisation hard but the solution sparse; the $\ell_1$ norm makes the optimisation easier and the solution still sparse; the $\ell_2$ norm makes the optimisation easier but the solution NOT sparse.
Choosing the Sub-Kernels Avoid additional parameters if possible; sub-models should be flexible.
Spline Kernel $K(u, v) = 1 + uv + uv\min(u,v) - \frac{u+v}{2}\min(u,v)^2 + \frac{1}{3}\min(u,v)^3$
Tensor Product Splines The univariate spline which passes through the origin has a kernel of the form $k_s(u, v) = uv + uv\min(u,v) - \frac{u+v}{2}\min(u,v)^2 + \frac{1}{3}\min(u,v)^3$. E.g. for a two-input problem the ANOVA kernel is given by $K(x, x') = (1 + k_s(x_1, x_1'))(1 + k_s(x_2, x_2')) = 1 + k_s(x_1, x_1') + k_s(x_2, x_2') + k_s(x_1, x_1')\,k_s(x_2, x_2')$, and the multivariate ANOVA kernel is given by $K(x, x') = \prod_{k=1}^{n}(1 + k_s(x_k, x_k'))$.
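A sketch of the spline sub-kernel and the tensor-product ANOVA expansion reconstructed above (function names are illustrative):

```python
import numpy as np

def spline_kernel(u, v):
    # Univariate spline kernel passing through the origin (k_s above)
    m = np.minimum(u, v)
    return u * v + u * v * m - (u + v) / 2.0 * m ** 2 + m ** 3 / 3.0

def anova_kernel(x, z):
    # Product over inputs of (1 + k_s) expands into all ANOVA terms
    return np.prod([1.0 + spline_kernel(xk, zk) for xk, zk in zip(x, z)])
```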
Sparse ANOVA Kernel Introduce a multiplier $c_i \ge 0$ for each ANOVA term, $K(x, x') = \sum_i c_i K_i(x, x')$, and minimise the number of non-zero multipliers along with the standard support vector optimisation.
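One way to realise the per-term multipliers is to precompute a kernel matrix per ANOVA term and combine them as $K = \sum_i c_i K_i$; a sketch for the first-order terms only (the helper name is hypothetical):

```python
import numpy as np

def anova_term_matrices(X, k_s):
    # One n x n sub-kernel matrix per input dimension, using a univariate
    # sub-kernel k_s (e.g. the spline_kernel sketched earlier); higher-order
    # ANOVA terms are elementwise products of these first-order matrices.
    return [k_s(X[:, j][:, None], X[:, j][None, :]) for j in range(X.shape[1])]
```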
Optimisation
Quadratic Loss
Epsilon-Insensitive Loss
Algorithm A 3+ stage technique: Stage I, sparse ANOVA selection; Stage II, ANOVA basis selection; Stage III, parameter selection. Each stage consists of solving a convex, constrained optimisation problem (QP or LP). Auto-selection of parameters: the capacity control parameter by cross-validation, the sparseness parameter by validation error.
Sparse Basis Solution With the quadratic loss function the problem is a quadratic program; with the $\varepsilon$-insensitive loss function it is a linear program.
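A hedged sketch of the $\varepsilon$-insensitive stage as a linear program using scipy's linprog; the $\ell_1$ weight lam and the exact constraint layout are illustrative assumptions, not the paper's formulation:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_basis_lp(K, y, eps=0.1, lam=1.0):
    """min lam*sum(c) + sum(xi)  s.t.  |y - K c| <= eps + xi,  c, xi >= 0."""
    n, m = K.shape
    cost = np.hstack([lam * np.ones(m), np.ones(n)])   # variables: [c, xi]
    A_ub = np.vstack([np.hstack([ K, -np.eye(n)]),     #  K c - xi <= y + eps
                      np.hstack([-K, -np.eye(n)])])    # -K c - xi <= eps - y
    b_ub = np.hstack([y + eps, eps - y])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m + n))
    return res.x[:m]                                   # sparse multipliers c
```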
AMPG Problem Predict automobile MPG (392 samples). Inputs: number of cylinders, displacement, horsepower, weight, acceleration, year. Output: MPG.
Horse Power [Figure: additive term for horsepower.] Network transparency through the ANOVA representation.
SUPANOVA AMPG Results (ε = 2.5) [Table: estimated generalisation error (mean and variance, over training and testing) for each Stage I / Stage III combination of quadratic and ε-insensitive loss functions, compared against a linear model.]
AMPG Additive Terms
Summary SUPANOVA is a global approach with a strong basis (kernel methods). The loss function and sparseness can be controlled, and a limit can be imposed on the maximum order of the variate terms. Generalisation + transparency.
Further Information isystems/kernel/ : SVM Technical Report; MATLAB SVM Toolbox; Sparse Kernel Paper; These Slides