1
Sparse Kernel Methods, Steve Gunn (http://www.isis.ecs.soton.ac.uk)
2
Overview. Part I: Introduction to Kernel Methods. Part II: Sparse Kernel Methods.
3
Part I Introduction to Kernel Methods
4
Classification: consider a two-class problem.
5
Optimal Separating Hyperplane
6
Separate the data with a hyperplane such that the points are separated without error and the distance from the closest vector to the hyperplane is maximal.
7
Solution: the optimal hyperplane minimises 1/2 ||w||^2, subject to the constraints y_i (w · x_i + b) >= 1, and is obtained by finding the saddle point of the Lagrange functional.
8
Finding the OSH: a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
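The QP itself is not reproduced in this transcript, so as an illustrative sketch the standard dual of the optimal-separating-hyperplane problem can be solved numerically. The toy data and the choice of SciPy's SLSQP solver are my own assumptions, not anything from the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative, not from the slides)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual objective: maximise sum(a) - 0.5 a'Qa with Q_ij = y_i y_j x_i.x_j,
# written here as a minimisation of its negative
Q = np.outer(y, y) * (X @ X.T)

def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(y),                        # a_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum a_i y_i = 0
)
alpha = res.x
w = (alpha * y) @ X                # primal weights recovered from the dual
sv = alpha > 1e-4                  # support vectors: non-zero multipliers
b = np.mean(y[sv] - X[sv] @ w)     # bias from the support vectors

print(np.round(y * (X @ w + b), 3))  # functional margins; >= 1 at the optimum
```

The support vectors come out as exactly the points closest to the separating hyperplane, matching the slide's claim that only they carry information.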
9
Support Vectors: all the information is contained in the support vectors, so the rest of the training data can be thrown away. Support vectors are the points with non-zero Lagrange multipliers.
11
Generalised Separating Hyperplane
12
Non-Separable Case: introduce slack variables ξ_i >= 0 and minimise 1/2 ||w||^2 + C Σ_i ξ_i, subject to y_i (w · x_i + b) >= 1 − ξ_i. C is chosen a priori and determines the trade-off with the non-separable case.
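The slack-variable trade-off can be illustrated with a plain subgradient descent on the soft-margin (hinge-loss) objective. This is a sketch on made-up overlapping data, not the QP formulation the slides describe; the data, learning rate, and iteration count are all my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters (illustrative data)
X = np.vstack([rng.normal(1.5, 1.0, (40, 2)), rng.normal(-1.5, 1.0, (40, 2))])
y = np.array([1.0] * 40 + [-1.0] * 40)

C, lr = 1.0, 0.005
w, b = np.zeros(2), 0.0
for _ in range(2000):
    # Points violating the margin are exactly those that need slack
    viol = y * (X @ w + b) < 1
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

Because the clusters overlap, no hyperplane separates them without error; C controls how heavily the remaining violations are penalised relative to the margin term.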
13
Finding the GSH: like the OSH, a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
15
Non-Linear SVM: map the input space to a high-dimensional feature space and find the OSH or GSH in that feature space.
16
Kernel Functions: by Hilbert-Schmidt theory, K(x, x′) is a symmetric function; Mercer's conditions determine when it corresponds to an inner product in some feature space.
17
Polynomial Degree 2
18
Acceptable Kernel Functions: polynomial, multi-layer perceptron, radial basis function.
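Two of the kernels named above can be written in a few lines, and Mercer's condition checked empirically: the Gram matrix of an admissible kernel must be symmetric positive semi-definite. The parameter values and test data below are illustrative choices, not the slides'.

```python
import numpy as np

def poly_kernel(X, Z, degree=2, c=1.0):
    # Polynomial kernel (x.z + c)^d
    return (X @ Z.T + c) ** degree

def rbf_kernel(X, Z, gamma=0.5):
    # Radial basis function kernel exp(-gamma ||x - z||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Mercer's condition implies the Gram matrix is positive semi-definite
X = np.random.default_rng(1).normal(size=(25, 3))
for K in (poly_kernel(X, X), rbf_kernel(X, X)):
    assert np.allclose(K, K.T)                  # symmetric
    assert np.linalg.eigvalsh(K).min() > -1e-8  # PSD up to round-off
```

The multi-layer perceptron (sigmoid) kernel is omitted because it satisfies Mercer's conditions only for some parameter values.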
19
Iris Data Set
24
Regression: the generalisation error is made up of the approximation error and the estimation error, which are traded off against each other through the model size.
25
Regression: approximate the data with a hyperplane, using the SRM principle and a loss function, e.g. the ε-insensitive loss.
26
Solution: introduce slack variables ξ_i, ξ_i* >= 0 and minimise 1/2 ||w||^2 + C Σ_i (ξ_i + ξ_i*), subject to the constraints y_i − (w · x_i + b) <= ε + ξ_i and (w · x_i + b) − y_i <= ε + ξ_i*.
27
Finding the Solution: again a quadratic programming problem whose size depends on the training set size, with a unique global minimum.
28
Part I Summary: unique global minimum; addresses the curse of dimensionality; complexity depends on the data set size; the information is contained in the support vectors.
29
Part II Sparse Kernel Methods
30
Cyclic Nature of Empirical Modelling: Induce, Validate, Interpret, Design, and back to Induce.
31
Induction: SVMs have strong theory and good empirical performance, with a solution of the form f(x) = Σ_i α_i K(x, x_i) + b. Interpretation: input selection and transparency.
32
Additive Representation: an additive structure is transparent, allows rejection of redundant inputs, and gives a unique decomposition.
33
Sparse Kernel Regression Previously …. Now
34
The Priors: "different priors for different parameters". Smoothness controls overfitting; sparseness enables input selection and also controls overfitting.
35
Sparse Kernel Model: replace the kernel with a weighted linear sum of kernels, and minimise the number of non-zero multipliers along with the standard support vector optimisation. The choice of sparseness penalty matters: a 0-norm (counting) penalty makes the optimisation hard but the solution sparse; a 1-norm penalty makes the optimisation easier and the solution sparse; a 2-norm penalty makes the optimisation easier but the solution NOT sparse.
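The 1-norm route (easier optimisation, still sparse) can be sketched with proximal gradient descent (ISTA) on a kernel expansion. This is a simplified stand-in for the slides' formulation, which couples the sparseness penalty with the full support vector optimisation; the data, RBF kernel, and penalty weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=40)

# Gram matrix of an RBF kernel centred on the training points
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)

# minimise 0.5 ||K c - y||^2 + lam ||c||_1 by ISTA (soft-thresholding)
lam = 0.5
c = np.zeros(40)
L = np.linalg.norm(K, 2) ** 2           # Lipschitz constant of the gradient
for _ in range(3000):
    c = c - (K.T @ (K @ c - y)) / L                         # gradient step
    c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)   # prox of lam||.||_1

n_active = np.count_nonzero(c)          # only a few kernels survive
mse = np.mean((K @ c - y) ** 2)
print(f"{n_active} of 40 multipliers non-zero, mse {mse:.4f}")
```

The soft-thresholding step sets small multipliers to exactly zero, which is why the 1-norm yields a sparse model where the 2-norm would merely shrink all the weights.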
36
Choosing the Sub-Kernels Avoid additional parameters if possible Sub-models should be flexible
37
Spline Kernel
38
Tensor Product Splines: the univariate linear spline which passes through the origin has a kernel of the form k(u, v) = uv + uv·min(u, v) − ((u + v)/2)·min(u, v)^2 + min(u, v)^3 / 3. E.g. for a two-input problem the ANOVA kernel is given by K(x, z) = 1 + k(x_1, z_1) + k(x_2, z_2) + k(x_1, z_1) k(x_2, z_2), and the multivariate ANOVA kernel is given by the product K(x, z) = Π_i (1 + k(x_i, z_i)).
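The transcript dropped the slides' formulas, so as a sketch, here is the standard spline-through-the-origin kernel from the SVM literature together with the ANOVA product construction; it should be checked against the accompanying technical report.

```python
import numpy as np

def spline_k(u, v):
    # Kernel of the linear spline through the origin:
    # k(u, v) = uv + uv*m - (u+v)/2 * m^2 + m^3/3, with m = min(u, v)
    m = np.minimum(u, v)
    return u * v + u * v * m - (u + v) * m**2 / 2 + m**3 / 3

def anova_kernel(x, z):
    # Multivariate ANOVA kernel: product over inputs of (1 + univariate kernel);
    # expanding the product generates the univariate, bivariate, ... ANOVA terms
    return np.prod(1.0 + spline_k(np.asarray(x), np.asarray(z)))

# Gram matrix on random points in [0, 1]^2: symmetric positive semi-definite
pts = np.random.default_rng(3).uniform(size=(15, 2))
G = np.array([[anova_kernel(p, q) for q in pts] for p in pts])
assert np.allclose(G, G.T)
assert np.linalg.eigvalsh(G).min() > -1e-8
```

Note that k(0, v) = 0 for any v, so every univariate sub-model does pass through the origin, which is what makes the ANOVA decomposition unique.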
39
Sparse ANOVA Kernel: introduce a multiplier for each ANOVA term, and minimise the number of non-zero multipliers along with the standard support vector optimisation.
40
Optimisation
41
Quadratic Loss
42
Epsilon-Insensitive Loss
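The two loss functions named on these slides can be written out directly; the residual values below are illustrative. The key contrast is that the ε-insensitive loss is exactly zero inside the ε-tube, which is what produces sparse support vector solutions, while the quadratic loss penalises every deviation.

```python
import numpy as np

def quadratic_loss(residual):
    # Quadratic loss: every deviation is penalised
    return residual ** 2

def eps_insensitive_loss(residual, eps=0.1):
    # Epsilon-insensitive loss: deviations inside the eps-tube cost nothing
    return np.maximum(np.abs(residual) - eps, 0.0)

r = np.array([-0.3, -0.05, 0.0, 0.08, 0.4])
print(quadratic_loss(r))          # all residuals penalised
print(eps_insensitive_loss(r))    # small residuals map to exactly zero
```

As the slide on the sparse basis solution notes, the choice also changes the optimisation class: quadratic loss leads to a quadratic program, ε-insensitive loss to a linear program.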
43
Algorithm: a 3+ stage technique, from data to model: Stage I, sparse ANOVA basis selection; then ANOVA term selection; then parameter selection. Each stage consists of solving a convex, constrained optimisation problem (QP or LP). Parameters are auto-selected: the capacity-control parameter by cross-validation, the sparseness parameter by validation error.
44
Sparse Basis Solution: a quadratic loss function gives a quadratic program; an ε-insensitive loss function gives a linear program.
45
AMPG Problem: predict automobile MPG (392 samples). Inputs: no. of cylinders, displacement, horsepower, weight, acceleration, year. Output: MPG.
46
[Figure: additive term plotted against horsepower, 50 to 230 hp.] Network transparency through the ANOVA representation.
47
SUPANOVA AMPG Results (ε = 2.5). Estimated generalisation error, shown as mean (variance); rows give the loss function(s) used in training/testing:

Loss Function             Stage I        Stage III      Linear Model
Quadratic                 6.97 (7.39)    7.08 (6.19)    11.4 (11.0)
Insensitive               0.48 (0.04)    0.49 (0.03)    1.80 (0.11)
Quadratic, Insensitive    1.10 (0.07)    1.37 (0.10)
Insensitive, Quadratic    7.07 (6.52)    7.13 (6.04)    11.72 (10.94)
48
AMPG Additive Terms
49
Summary: SUPANOVA is a global approach with a strong basis (kernel methods); the loss function and sparseness can be controlled; a limit can be imposed on the maximum-variate terms; it offers generalisation plus transparency.
50
Further Information: http://www.isis.ecs.soton.ac.uk/isystems/kernel/ (SVM Technical Report, MATLAB SVM Toolbox, Sparse Kernel Paper, These Slides).