Efficient Model Selection for Support Vector Machines


Efficient Model Selection for Support Vector Machines Shibdas Bandyopadhyay

Outline Brief Introduction to SVM; Cross-Validation; Methods for Parameter Tuning: Grid Search, Genetic Algorithm, Auto-tuning for Classification; Results; Conclusion; Pattern Search for Regression

Support Vector Machines Classification - given a set (x1, y1), (x2, y2), …, (xm, ym) ⊆ X × Y, where X is the set of input vectors and Y the set of classes, we are to predict the class y for an unseen x ∈ X. Regression - given a set (x1, y1), (x2, y2), …, (xm, ym) ⊆ X × Y, where X is the set of input vectors and Y the set of values, we are to predict the value y for an unseen x ∈ X. [Figure: a separating hyperplane between classes A+ and A-, and the ε-tube for regression]

Support Vector Machines Kernels - a kernel maps linearly non-separable data into a higher-dimensional feature space where it may become linearly separable. [Figure: the feature map from input space to feature space]

Support Vector Classification Soft Margin Classifier optimization problem: minimize $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i$ subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1, \dots, m$, where C (> 0) is the trade-off between margin maximization and training error minimization.

10-fold Cross-Validation [Figure: the training set split into 10 folds; each fold in turn serves as the test set while the remaining folds are used for training]

Cross-Validation Widely regarded as the best method to estimate the generalization (test) error. The training set is divided into p folds; training runs are done using all possible combinations of (p − 1) folds, and testing is done on the remaining fold for each run. We seek the parameter values for which the average cross-validation error is minimal.
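
To make the procedure concrete, here is a minimal sketch of measuring the cross-validation error of an RBF SVM. It assumes scikit-learn (the talk names no implementation) and uses scikit-learn's bundled breast cancer data as a stand-in for the benchmark set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Estimate the generalization error of one (C, gamma) setting as
# 1 - mean accuracy over p = 10 folds.
clf = SVC(kernel="rbf", C=1.0, gamma=0.1)
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold cross-validation error: {1.0 - scores.mean():.4f}")
```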

Model Parameter Selection Consider the RBF kernel and SVM classification (soft margin case). The RBF kernel is given by $K(x, x') = \exp(-\gamma\|x - x'\|^2)$, so there are two parameters: C (the soft margin trade-off) and γ (of the kernel). Benchmark dataset - Breast Cancer (100 realizations). Changing the parameters changes the test error, so they should be chosen such that the test error is minimal.
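
For reference, the kernel written out directly; `rbf` is a hypothetical helper for illustration, not code from the talk.

```python
import numpy as np

def rbf(x, z, gamma):
    # K(x, z) = exp(-gamma * ||x - z||^2), the RBF kernel from the slide.
    return np.exp(-gamma * np.sum((x - z) ** 2))
```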

Approach [Flow: raw data and a range of (C, γ) values enter parameter selection, which repeatedly trains an SVM classifier and measures the misclassification error; the resulting optimal C and γ are used to train the final SVM classifier and produce the final results]

Methods for parameter tuning: Grid Search, Genetic Algorithm, Auto-tuning for Classification

Grid Search [Figure: a two-dimensional (C, γ) parameter space sampled at a grid of candidate points]

Grid Search A simple technique resembling exhaustive search: take exponentially increasing values in a particular range, find the pair with minimum cross-validation error, narrow the range to the neighborhood of that pair, and repeat the process until a satisfactory cross-validation error is obtained.
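
A sketch of this two-stage search. The exponential ranges, the refinement rule, and the `cv_error` helper (10-fold cross-validation via scikit-learn) are illustrative assumptions, not the talk's exact settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def cv_error(X, y, C, gamma, folds=10):
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return 1.0 - cross_val_score(clf, X, y, cv=folds).mean()

def grid_search(X, y, stages=2):
    # Coarse grid: exponentially increasing values, as the slide suggests.
    C_range = np.logspace(-2, 4, 7)
    g_range = np.logspace(-4, 2, 7)
    best = None
    for _ in range(stages):
        for C in C_range:
            for g in g_range:
                err = cv_error(X, y, C, g)
                if best is None or err < best[0]:
                    best = (err, C, g)
        # Refine: a narrower grid around the current best pair.
        _, C0, g0 = best
        C_range = np.linspace(C0 / 2, C0 * 2, 5)
        g_range = np.linspace(g0 / 2, g0 * 2, 5)
    return best  # (cv error, C, gamma)
```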

Genetic Algorithm The genetic algorithm is a subclass of evolutionary computing, based on Darwin's theory of evolution. It is widely accepted for parameter search and optimization and has a high probability of finding the global optimum.

Genetic Algorithm - Steps Selection - "survival of the fittest": choose the parameter values for which the objective function is best. Cross-over - combine the chosen values. Mutation - modify the combined values to produce the next generation.

Genetic Algorithm - Selection Set a criterion for choosing the parents that will cross over. For example, two individuals (binary strings) are selected, with strings containing more 1s preferred over those with more 0s. [Figure: example binary strings]

Genetic Algorithm - Cross-Over Combine the chosen parents to produce offspring. For example, two parents represented as binary strings exchange substrings at a crossover point. [Figure: two binary strings performing cross-over]

Genetic Algorithm - Mutation The structure of the produced offspring is changed, which prevents the algorithm from being trapped in a local minimum. For example, the produced offspring is mutated by flipping one bit position. [Figure: a single bit flip in a binary string]
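
The three operators on binary strings might look like the toy sketch below; tournament selection, single-point crossover, and the per-bit mutation rate are assumptions chosen for illustration.

```python
import random

def select(pop, fitness):
    # Tournament selection: "survival of the fittest" between two picks.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover of two parent bit strings.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(child, rate=0.05):
    # Flip each bit with a small probability, as on the mutation slide.
    return [1 - bit if random.random() < rate else bit for bit in child]
```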

Genetic Algorithm - Coding Parameters must be coded into strings before applying the GA. A real-coded GA operates on real numbers directly, simulating cross-over and mutation through various operators; here, simulated binary cross-over and polynomial mutation operators are used.

Auto-tuning Consider a bound on the expected generalization error, try to minimize it by varying the parameters, and apply well-known minimization procedures to make this "automatic".

Generalization Error Estimates Validation error - keep a part of the training data as a validation set, measure the error on that set, and try to minimize it. Leave-one-out error - remove one element of the training data, train on the remaining elements, test on the removed element, and repeat for all training elements; this provides an (almost) unbiased estimate of the expected generalization error.
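
A brute-force version of the leave-one-out estimate, assuming scikit-learn, an RBF SVM, and NumPy arrays X, y. It needs m training runs, which is exactly what the bounds on the next slide avoid.

```python
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loo_error(X, y, C=1.0, gamma=0.1):
    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Train on all elements except one, test on the held-out element.
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X[train_idx], y[train_idx])
        errors += clf.predict(X[test_idx])[0] != y[test_idx][0]
    return errors / len(y)
```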

Leave-One-Out Bounds Span bound: $T \le \frac{1}{m}\sum_{p=1}^{m}\psi(\alpha_p S_p^2 - 1)$, where $S_p$ is the distance between the point $\Phi(x_p)$ and the span of the other support vectors. Radius-margin bound: $T \le \frac{1}{m}\frac{R^2}{M^2}$, where R is the radius of the smallest sphere enclosing all data points and M is the margin obtained from the SVM optimization solution.

Why the Radius-Margin Bound? It can be thought of as an upper bound on the span bound, which is an accurate estimate of the test error, but minimization of the span bound is more difficult to implement and to control (more local minima). The margin can be obtained from the solution of the SVM optimization problem, and the radius can be calculated by solving a quadratic optimization problem. The soft-margin SVM can easily be incorporated by modifying the kernel of the hard-margin version, so that C is treated as just another parameter of the kernel function.

Auto-tuning - Steps M = 1 / ||w||, where ||w|| can be obtained by solving the dual problem: maximize $W(\alpha) = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{m}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$ subject to $\sum_{i=1}^{m}\alpha_i y_i = 0$, $\alpha_i \ge 0$. R is obtained by solving the quadratic optimization problem: maximize $\sum_{i=1}^{m}\beta_i K(x_i, x_i) - \sum_{i,j=1}^{m}\beta_i\beta_j K(x_i, x_j)$ subject to $\sum_{i=1}^{m}\beta_i = 1$, $\beta_i \ge 0$.
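
A sketch of evaluating R²/M² for a trained RBF SVM. ||w||² comes from the dual solution as above; the radius is approximated by the largest feature-space distance to the kernel centroid rather than by solving the β-problem exactly, an assumption made to keep the sketch short.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def radius_margin(X, y, C=1.0, gamma=0.1):
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    K = rbf_kernel(X, X, gamma=gamma)

    # ||w||^2 = sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j); sklearn's
    # dual_coef_ already stores alpha_i * y_i for the support vectors.
    a = clf.dual_coef_.ravel()
    sv = clf.support_
    w_sq = a @ K[np.ix_(sv, sv)] @ a          # = 1 / M^2

    # Squared feature-space distance of each point to the centroid;
    # the max approximates the enclosing-sphere radius R^2.
    d_sq = np.diag(K) - 2 * K.mean(axis=1) + K.mean()
    return d_sq.max() * w_sq                   # ~ R^2 / M^2
```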

Auto-tuning - Steps Let θ be the set of parameters. The steps are as follows:
1. Initialize θ to some value.
2. Train the SVM, i.e., find the maximum of W.
3. Update θ by a minimization method so that T decreases.
4. Go to step 2, stopping when the minimum of T is reached.
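
The loop sketched with SciPy's Nelder-Mead as the minimization method (an assumption; the talk only says "a minimization method"), reusing the `radius_margin` sketch above as the bound T.

```python
import numpy as np
from scipy.optimize import minimize

def auto_tune(X, y, bound=radius_margin):
    def T(theta):
        C, gamma = np.exp(theta)   # optimize in log space to stay positive
        return bound(X, y, C=C, gamma=gamma)

    theta0 = np.log([1.0, 0.1])                      # step 1: initialize
    res = minimize(T, theta0, method="Nelder-Mead")  # steps 2-4
    return tuple(np.exp(res.x))                      # optimal (C, gamma)
```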

Results Methods are tested on five benchmark datasets: Breast Cancer, Thyroid, Titanic, Heart, and Diabetes. Mean error, minimum error among 100 realizations, maximum error among 100 realizations, and standard deviation are reported.

Classification Results – Breast Cancer Dataset
Number of train patterns: 200, number of test patterns: 77, input dimension: 9, output dimension: 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           26.04        –            –            4.74
Grid Search         27.22        14.58        36.36        4.75
Auto-tuning         27.47        16.88        –            3.97
Genetic Algorithm   25.40        15.58        33.77        4.39

Classification Results – Thyroid Dataset
Number of train patterns: 140, number of test patterns: 75, input dimension: 5, output dimension: 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           4.80         –            –            2.19
Grid Search         4.32         –            8.00         1.74
Auto-tuning         4.56         –            9.333        2.02
Genetic Algorithm   4.44         –            10.667       2.43

Classification Results – Titanic Dataset
Number of train patterns: 150, number of test patterns: 2051, input dimension: 3, output dimension: 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           22.42        –            –            1.02
Grid Search         23.08        21.55        33.21        1.18
Auto-tuning         23.01        20.87        –            1.33
Genetic Algorithm   22.66        21.69        –            1.11

Classification Results – Heart Dataset
Number of train patterns: 170, number of test patterns: 100, input dimension: 13, output dimension: 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           15.95        –            –            3.26
Grid Search         15.49        8.00         23.00        3.29
Auto-tuning         15.65        –            –            3.21
Genetic Algorithm   15.87        10.00        25.00        3.27

Classification Results – Diabetes Dataset
Number of train patterns: 468, number of test patterns: 300, input dimension: 8, output dimension: 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           23.53        –            –            1.73
Grid Search         23.14        19.33        26.67        1.17
Auto-tuning         23.68        –            27.33        1.68
Genetic Algorithm   23.69        19.00        28.33        1.71

Conclusion Grid search is the best technique when the number of parameters is low, as it does an exhaustive search of the parameter space. Auto-tuning performs far fewer training runs in all cases. The genetic algorithm is quite steady and gives near-optimal solutions. Future work is to test these techniques for regression and to analyze the pattern search method for regression.

Support Vector Regression Regression estimate: $f(x) = \sum_{i=1}^{m}(\alpha_i - \alpha_i^*) K(x_i, x) + b$. Optimization problem: maximize $-\varepsilon\sum_{i=1}^{m}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{m} y_i(\alpha_i - \alpha_i^*) - \frac{1}{2}\sum_{i,j=1}^{m}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)$ subject to $\sum_{i=1}^{m}(\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$.
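
For concreteness, a minimal ε-SVR run on synthetic data, assuming scikit-learn's SVR; the parameter values are illustrative only.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Fit an RBF epsilon-SVR; points outside the epsilon-tube become
# support vectors.
reg = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1)
reg.fit(X, y)
print("Support vectors used:", len(reg.support_))
```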

Pattern Search A simple and efficient optimization technique: no derivatives, only direct function evaluations are needed. It rarely gets trapped in a bad local minimum and converges rapidly to an optimum.

Pattern Search [Figure: successive pattern search steps over the parameter space]

Pattern Search Patterns determine which points in the parameter space are searched. A pattern is usually specified by a matrix; we consider the matrix whose columns give the pattern consisting of the center (x, y) and the four points (x+d, y), (x−d, y), (x, y+d), (x, y−d) at step length d.

Pattern Search - Algorithm The cross-validation error is the function f to be minimized.
1. Fix a pattern matrix Pk, set sk = 0.
2. Given an initial step length and a tolerance, randomly pick an initial pattern center qk.
3. Compute the function value f(qk) and set min = f(qk).
4. If the step length is smaller than the tolerance, stop.
5. For i = 1 … (p − 1), where p is the number of columns of Pk, compute f at the i-th pattern point; if it is smaller than min, move the center there and go to step 2; otherwise shrink the step length.
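
A sketch of the algorithm for the 2-D (C, γ) plane. The shrink factor and tolerance fill in symbols the transcript lost, so treat them as assumptions; f would be the cross-validation error (e.g., the `cv_error` helper sketched earlier).

```python
import numpy as np

def pattern_search(f, q0, d=1.0, tol=1e-3, shrink=0.5):
    # Columns of P are the four compass directions from the slide:
    # (x+d, y), (x-d, y), (x, y+d), (x, y-d).
    P = np.array([[1.0, -1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, -1.0]])
    q, fq = np.asarray(q0, dtype=float), f(q0)
    while d >= tol:                    # stop once the step length is tiny
        for i in range(P.shape[1]):
            trial = q + d * P[:, i]
            ft = f(trial)
            if ft < fq:                # improvement: move the center
                q, fq = trial, ft
                break
        else:
            d *= shrink                # no improvement: shrink the pattern
    return q, fq
```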

Thank You Mail your questions/suggestions to: shibdas@gmail.com

Genetic Algorithm - Implementation Simulated binary cross-over: $u_i$ is chosen randomly between 0 and 1; $\beta_i$ follows the distribution $p(\beta) = 0.5(\eta_c + 1)\beta^{\eta_c}$ for $\beta \le 1$ and $p(\beta) = 0.5(\eta_c + 1)/\beta^{\eta_c + 2}$ for $\beta > 1$; find $\beta_i$ such that the cumulative probability density up to $\beta_i$ equals $u_i$.

Genetic Algorithm – Implementation (cont.) Generate the offspring $x_i^{(1,t+1)}$ and $x_i^{(2,t+1)}$ from parents $x_i^{(1,t)}$ and $x_i^{(2,t)}$ as $x_i^{(1,t+1)} = 0.5[(1+\beta_i)x_i^{(1,t)} + (1-\beta_i)x_i^{(2,t)}]$ and $x_i^{(2,t+1)} = 0.5[(1-\beta_i)x_i^{(1,t)} + (1+\beta_i)x_i^{(2,t)}]$. Polynomial mutation - a random number $r_i$ is selected between 0 and 1, and $\bar\delta_i$ is found such that the cumulative probability of the polynomial distribution up to $\bar\delta_i$ equals $r_i$. The polynomial distribution can be written as $p(\delta) = 0.5(\eta_m + 1)(1 - |\delta|)^{\eta_m}$. Mutated offspring are obtained using the rule $y_i^{(1,t+1)} = x_i^{(1,t+1)} + (x_i^{U} - x_i^{L})\bar\delta_i$, where $x_i^{U}$ and $x_i^{L}$ are respectively the upper and lower bounds on $x_i$.
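
The closed forms below invert the two distributions above, following the standard simulated-binary-crossover and polynomial-mutation formulas; the distribution indices η_c and η_m are illustrative defaults.

```python
import random

def sbx(p1, p2, eta_c=2.0):
    # Simulated binary cross-over on one real-valued variable.
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2

def poly_mutate(x, x_low, x_up, eta_m=20.0):
    # Polynomial mutation of x within its bounds [x_low, x_up].
    r = random.random()
    if r < 0.5:
        delta = (2.0 * r) ** (1.0 / (eta_m + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - r)) ** (1.0 / (eta_m + 1.0))
    return x + delta * (x_up - x_low)
```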

LOO Bounds Jaakkola-Haussler bound: $T \le \frac{1}{m}\sum_{p=1}^{m}\psi(\alpha_p K(x_p, x_p) - y_p f(x_p))$, where the $\alpha_p$ are obtained from the solution of the SVM optimization problem, $\psi$ is the step function ($\psi(x) = 1$ when $x > 0$ and $0$ otherwise), and m is the number of elements in the training set. Opper-Winther bound: $T \le \frac{1}{m}\sum_{p=1}^{m}\psi\!\left(\frac{\alpha_p}{(K_{SV}^{-1})_{pp}} - y_p f(x_p)\right)$, where $K_{SV}$ is the matrix of dot products between support vectors.

Support Vector Classification Finds the optimal hyperplane which separates the two classes in feature space. Decision function: $f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{m}\alpha_i y_i K(x_i, x) + b\right)$. Quadratic optimization problem: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i + b) \ge 1$ for all $i = 1 \dots m$.