Presentation transcript: "Sparse Kernel Methods", Steve Gunn

1 http://www.isis.ecs.soton.ac.uk Sparse Kernel Methods Steve Gunn

2 Overview Part I : Introduction to Kernel Methods Part II : Sparse Kernel Methods

3 Part I Introduction to Kernel Methods

4 Classification Consider a two-class problem.

5 Optimal Separating Hyperplane

6 Separate the data with a hyperplane such that the points are classified without error and the distance from the hyperplane to the closest vector is maximal.

7 Solution The optimal hyperplane minimises the norm of the weight vector, subject to the classification constraints, and is obtained by finding the saddle point of the Lagrange functional:
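In standard notation (labels y_i ∈ {-1, +1}):

\min_{w,b}\ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \; i = 1, \dots, \ell

L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{\ell} \alpha_i \bigl[ y_i(\langle w, x_i \rangle + b) - 1 \bigr], \quad \alpha_i \ge 0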

8 Finding the OSH Quadratic programming problem: size is dependent upon the training set size; unique global minimum.
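The dual QP has one multiplier per training point (hence the size dependence), and its objective is concave, so the optimum is global:

\max_{\alpha} \sum_{i=1}^{\ell} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to} \quad \alpha_i \ge 0, \; \sum_{i=1}^{\ell} \alpha_i y_i = 0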

9 Support Vectors The information is contained in the support vectors, so the rest of the training data can be thrown away. SVs are the points with non-zero Lagrange multipliers.
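Concretely, the decision function needs only the support vectors:

f(x) = \mathrm{sign}\Bigl( \sum_{i \in \mathrm{SV}} \alpha_i y_i \langle x_i, x \rangle + b \Bigr)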


11 Generalised Separating Hyperplane

12 Non-Separable Case Introduce slack variables and minimise the combined objective; C is chosen a priori and determines the trade-off between the margin and the non-separable (error) terms.
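The standard soft-margin formulation being referred to:

\min_{w,b,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{\ell} \xi_i \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0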

13 Finding the GSH Quadratic programming problem: size is dependent upon the training set size; unique global minimum.


15 Non-Linear SVM Map the input space to a high-dimensional feature space and find the OSH or GSH in that feature space.

16 Kernel Functions Hilbert-Schmidt theory: the kernel K(x, y) is a symmetric function satisfying Mercer's conditions.
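Mercer's conditions state that a symmetric K admits an expansion K(x, y) = \sum_k \lambda_k \psi_k(x) \psi_k(y) with \lambda_k \ge 0, and hence K(x, y) = \langle \phi(x), \phi(y) \rangle for some feature map \phi, if and only if

\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0 \quad \text{for all } g \text{ with } \int g(x)^2\, dx < \infty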

17 Polynomial Degree 2
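Worked example: in two dimensions the degree-2 polynomial kernel corresponds to an explicit feature map,

K(x, y) = (\langle x, y \rangle + 1)^2 = \langle \phi(x), \phi(y) \rangle, \quad \phi(x) = \bigl(1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2\bigr)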

18 Acceptable Kernel Functions Polynomials, multi-layer perceptrons, radial basis functions. A sketch of these three families follows.
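A minimal Python sketch of the three kernel families; the parameter names and default values here are illustrative assumptions, not the slides' settings:

import numpy as np

def polynomial_kernel(x, y, degree=2):
    # K(x, y) = (<x, y> + 1)^d
    return (np.dot(x, y) + 1.0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def mlp_kernel(x, y, scale=1.0, offset=-1.0):
    # K(x, y) = tanh(scale * <x, y> + offset); satisfies Mercer's
    # conditions only for certain (scale, offset) pairs
    return np.tanh(scale * np.dot(x, y) + offset)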

19 Iris Data Set
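The original demonstrations used the author's MATLAB SVM Toolbox (see slide 50); a minimal modern equivalent, assuming scikit-learn, would be:

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, random_state=0)
for kernel in ("linear", "poly", "rbf"):
    # Fit an SVM with each kernel and report accuracy and support vector counts
    clf = svm.SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, "accuracy:", clf.score(X_te, y_te), "SVs per class:", clf.n_support_)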

24 Regression (figure: generalisation error decomposed into approximation error and estimation error, plotted against model size)

25 Regression Approximate the data with a hyperplane, using a loss function together with the SRM principle.
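For example, Vapnik's ε-insensitive loss, which charges nothing for errors inside a tube of width ε:

L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - f(x)| - \varepsilon\bigr)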

26 Solution Introduce slack variables and minimise the regularised objective, subject to the constraints:
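The standard support vector regression formulation this refers to:

\min_{w,b,\xi,\xi^*}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*)

\text{subject to} \quad y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \quad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0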

27 Finding the Solution Quadratic programming problem: size is dependent upon the training set size; unique global minimum.

28 Part I : Summary Unique global minimum. Addresses the curse of dimensionality. Complexity is dependent upon the data set size. The information is contained in the support vectors.

29 Part II Sparse Kernel Methods

30 Cyclic Nature of Empirical Modelling Induce → Validate → Interpret → Design → (repeat)

31 Induction SVMs have a strong theory and good empirical performance, with a solution of the form f(x) = \sum_i \alpha_i K(x_i, x) + b. Interpretation: input selection and transparency.

32 Additive Representation Additive structure: transparent, rejects redundant inputs, unique decomposition.
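The additive (ANOVA) structure decomposes the model into terms of increasing interaction order:

f(x) = b + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots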

33 Sparse Kernel Regression Previously, the solution was a single-kernel expansion; now the kernel is replaced by a weighted sum of sub-kernels (slide 35).

34 The Priors “Different priors for different parameters” Smoothness – controls “overfitting” Sparseness – enables input selection and controls overfitting

35 Sparse Kernel Model Replace the kernel with a weighted linear sum of sub-kernels and minimise the number of non-zero multipliers along with the standard support vector optimisation. The penalty determines the trade-off: with the zero-norm the optimisation is hard and the solution sparse; with the one-norm the optimisation is easier and the solution sparse; with the two-norm the optimisation is easier but the solution is NOT sparse.
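In symbols, the replacement and the relaxed penalty:

K_c(x, x') = \sum_k c_k K_k(x, x'), \quad c_k \ge 0

\lVert c \rVert_0 \ \text{(hard, sparse)} \ \to\ \lVert c \rVert_1 = \sum_k c_k \ \text{(easier, sparse)} \ \text{vs.}\ \lVert c \rVert_2^2 \ \text{(easier, not sparse)}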

36 Choosing the Sub-Kernels Avoid additional parameters if possible; sub-models should be flexible.

37 Spline Kernel
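One commonly quoted form (e.g. in the SVM technical report cited on slide 50) of the univariate spline kernel with an infinite number of knots on [0, 1] is:

K(x, y) = 1 + xy + xy \min(x, y) - \frac{x + y}{2} \min(x, y)^2 + \frac{1}{3} \min(x, y)^3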

38 Tensor Product Splines The univariate spline which passes through the origin has a kernel of the form given below; for a two-input problem the ANOVA kernel follows, and the multivariate ANOVA kernel generalises it.
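With k a univariate spline kernel vanishing at the origin (plausibly the slide-37 form without its constant term, so that the decomposition is unique), the two-input and multivariate ANOVA kernels are:

K_{\text{ANOVA}}(x, y) = \bigl(1 + k(x_1, y_1)\bigr)\bigl(1 + k(x_2, y_2)\bigr) = 1 + k(x_1, y_1) + k(x_2, y_2) + k(x_1, y_1)\,k(x_2, y_2)

K_{\text{ANOVA}}(x, y) = \prod_{d=1}^{n} \bigl(1 + k(x_d, y_d)\bigr)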

39 Sparse ANOVA Kernel Introduce multipliers for each ANOVA term and minimise the number of non-zero multipliers along with the standard support vector optimisation.
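A sketch of the construction, attaching one multiplier per ANOVA term (c is the multiplier vector):

K_c(x, y) = c_0 + \sum_d c_d\, k(x_d, y_d) + \sum_{d<e} c_{de}\, k(x_d, y_d)\, k(x_e, y_e) + \cdots, \quad c \ge 0

with \lVert c \rVert_0 minimised alongside the standard support vector objective.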

40 Optimisation

41 Quadratic Loss

42 Epsilon-Insensitive Loss

43 Algorithm A 3+ stage technique: Data → ANOVA basis selection (Stage I) → sparse ANOVA selection → parameter selection → Model. Each stage consists of solving a convex, constrained optimisation problem (QP or LP). Auto-selection of parameters: the capacity control parameter via cross-validation, the sparseness parameter via validation error.

44 Sparse Basis Solution Quadratic loss function: quadratic program. ε-insensitive loss function: linear program.
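A minimal sketch of the ε-insensitive case, assuming an ℓ1 penalty on the kernel weights so that the whole problem becomes a linear program; the RBF sub-kernels and the λ, ε values are illustrative assumptions, not the slides' actual choices:

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 40))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(40)

# One RBF sub-kernel centred on each training point (illustrative choice)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.1 ** 2))
n, lam, eps = len(x), 0.1, 0.05

# Variables z = [beta+, beta-, xi]; beta = beta+ - beta-, so |beta| = beta+ + beta-
c = np.concatenate([lam * np.ones(2 * n), np.ones(n)])
I = np.eye(n)
A_ub = np.block([[-K, K, -I],    # y - K beta <= eps + xi
                 [K, -K, -I]])   # K beta - y <= eps + xi
b_ub = np.concatenate([eps - y, eps + y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (3 * n))
beta = res.x[:n] - res.x[n:2 * n]
print("non-zero coefficients:", int(np.sum(np.abs(beta) > 1e-6)), "of", n)

The ℓ1 penalty drives most coefficients exactly to zero, which is the sparseness mechanism the slides describe.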

45 AMPG Problem Predict automobile MPG (392 samples). Inputs: no. of cylinders, displacement, horsepower, weight, acceleration, year. Output: MPG.

46 (figures: ANOVA terms plotted against Horse Power, axis range 50-230) Network transparency through the ANOVA representation.

47 SUPANOVA AMPG Results (ε = 2.5) Estimated generalisation error (mean, variance):

Stage I loss    Stage III loss   Training      Testing       Linear model
Quadratic       Quadratic        6.97, 7.39    7.08, 6.19    11.4, 11.0
ε-Insensitive   ε-Insensitive    0.48, 0.04    0.49, 0.03    1.80, 0.11
Quadratic       ε-Insensitive    1.10, 0.07    1.37, 0.10    -
ε-Insensitive   Quadratic        7.07, 6.52    7.13, 6.04    11.72, 10.94

48 AMPG Additive Terms

49 Summary SUPANOVA is a global approach with a strong basis (kernel methods). It can control the loss function and sparseness, can impose a limit on the maximum order of the variate terms, and offers generalisation plus transparency.

50 Further Information http://www.isis.ecs.soton.ac.uk/isystems/kernel/ SVM Technical Report, MATLAB SVM Toolbox, Sparse Kernel Paper, These Slides

