Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel $K(A, A') \in R^{m \times m}$ is fully dense.
- Computational complexity depends on $m$, the number of training points.
- The separating surface depends on almost the entire dataset.
- Complexity of nonlinear SSVM:
  - Runs out of memory while storing the $m \times m$ kernel matrix.
  - Long CPU time to compute the dense kernel matrix.
  - Need to generate and store $m^2$ entries (see the sketch after this list).
  - Need to store the entire dataset even after solving the problem.
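As a rough illustration of the memory bottleneck, here is a minimal NumPy sketch; the Gaussian kernel choice, the helper name `gaussian_kernel`, and the sample size are our assumptions, not from the slides:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Dense Gaussian kernel: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A**2).sum(axis=1)[:, None]
          + (B**2).sum(axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * sq)

m, n = 20_000, 10
A = np.random.randn(m, n)
# Storing K(A, A') needs m^2 float64 entries:
# 20,000^2 * 8 bytes = 3.2 GB for this still-modest m.
# K = gaussian_kernel(A, A)   # commented out: this line alone can exhaust memory
```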
Reduced Support Vector Machine (RSVM)
(i) Choose a random subset matrix $\bar{A} \in R^{\bar{m} \times n}$ of the entire data matrix $A \in R^{m \times n}$, with $\bar{m} \ll m$.
(ii) Solve the following problem by the Newton method, with the corresponding diagonal label matrix $\bar{D}$:
$\min_{(\bar{u}, \gamma)} \frac{\nu}{2} \left\| p\left(e - D\left(K(A, \bar{A}')\bar{D}\bar{u} - e\gamma\right), \alpha\right) \right\|_2^2 + \frac{1}{2}\left(\bar{u}'\bar{u} + \gamma^2\right)$
(iii) The nonlinear classifier is defined by the optimal solution $(\bar{u}, \gamma)$ of step (ii):
Nonlinear classifier: $K(x', \bar{A}')\bar{D}\bar{u} = \gamma$
- Using $K(\bar{A}, \bar{A}')$ (the small square kernel alone) gives lousy results! (See the sketch below for the rectangular reduced kernel.)
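A minimal sketch of the reduced-kernel idea under stated assumptions: the Gaussian kernel, the synthetic data and labels, and the regularized least-squares fit are illustrative stand-ins (the slides' step (ii) is a smooth Newton solve, not this fit):

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
m, n, m_bar = 5000, 10, 100                  # m_bar << m
A = rng.standard_normal((m, n))
d = np.sign(rng.standard_normal(m))          # placeholder +/-1 labels

# (i) Random subset matrix A_bar of the data matrix A.
A_bar = A[rng.choice(m, m_bar, replace=False)]

# Rectangular reduced kernel: m x m_bar instead of m x m.
K_red = gaussian_kernel(A, A_bar)

# Crude regularized least-squares stand-in for the Newton solve in (ii).
Z = np.hstack([K_red, -np.ones((m, 1))])     # [K(A, A_bar') | -e]
theta = np.linalg.solve(Z.T @ Z + 1e-2 * np.eye(m_bar + 1), Z.T @ d)
u_bar, gamma_ = theta[:-1], theta[-1]

# (iii) Classify a new point x: sign(K(x', A_bar') @ u_bar - gamma_).
```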
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 points in $R^2$; separate 486 asterisks from 514 dots.
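For concreteness, a checkerboard training set of this kind can be generated as follows; the 4 x 4 board and uniform sampling are assumptions, since the slides do not specify the construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(1000, 2))    # 1000 points in the plane
# Label by cell parity on a 4 x 4 board: +1 = "asterisk", -1 = "dot".
y = np.where((X[:, 0].astype(int) + X[:, 1].astype(int)) % 2 == 0, 1, -1)
```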
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU seconds)
- Cleveland Heart: 297 x 13
- BUPA Liver: 345 x 6
- Ionosphere: 351 x 34
- Pima Indians: 768 x 8
- Tic-Tac-Toe: 958 x 9
- Mushroom: 8124 x 22 (N/A)
RSVM on the Large UCI Adult Dataset
Average test set correctness % and standard deviation over 50 runs, for (training size, testing size) pairs:
(6414, 26148), (11221, 21341), (16101, 16461), (22697, 9865), (32562, 16282)
[Figure: CPU time (sec.) versus training set size for RSVM, SMO, and PCGC]
Support Vector Regression (Linear Case)
Given the training set $S = \{(x_i, y_i) \mid x_i \in R^n,\ y_i \in R,\ i = 1, \dots, m\}$:
- Find a linear function $f(x) = x'w + b$, where $(w, b)$ is determined by solving a minimization problem that guarantees the smallest overall prediction error made by $f$.
- Motivated by SVM: $\|w\|$ should be as small as possible.
- Some tiny errors should be discarded.
ε-Insensitive Loss Function
The ε-insensitive loss made by the estimation function $f$ at the data point $(x_i, y_i)$ is $|y_i - f(x_i)|_\varepsilon$. If $\xi = y - f(x)$, then $|\xi|_\varepsilon$ is defined as:
$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0, & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon, & \text{otherwise.} \end{cases}$
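A one-line implementation of this loss, as a sketch (the function name `eps_insensitive` and the default ε are ours):

```python
import numpy as np

def eps_insensitive(xi, eps=0.1):
    """|xi|_eps = max(0, |xi| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(xi) - eps)

print(eps_insensitive(np.array([-0.3, 0.05, 0.2]), eps=0.1))  # [0.2  0.   0.1]
```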
ε-Insensitive Linear Regression
Find $f(x) = x'w + b$ with the smallest overall ε-insensitive error:
$\min_{w, b} \sum_{i=1}^{m} |x_i'w + b - y_i|_\varepsilon$
ε-Insensitive Support Vector Regression Model
- Motivated by SVM: $\|w\|$ should be as small as possible.
- Some tiny errors should be discarded.
$\min_{w, b} \frac{1}{2} w'w + C \sum_{i=1}^{m} |x_i'w + b - y_i|_\varepsilon$
where $|\cdot|_\varepsilon$ is the ε-insensitive loss and $C > 0$ balances flatness against errors larger than ε.
Reformulated ε-SVR as a Constrained Minimization Problem
$\min_{w, b, \xi, \xi^*} \; C\, e'(\xi + \xi^*) + \frac{1}{2} w'w$
subject to
$Aw + eb - y \le e\varepsilon + \xi$
$y - Aw - eb \le e\varepsilon + \xi^*$
$\xi, \xi^* \ge 0$
- A minimization problem with $n + 1 + 2m$ variables and $2m$ constraints.
- This enlarges the problem size and the computational complexity of solving the problem.
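The constrained program can be transcribed almost verbatim with a modeling tool. The sketch below uses CVXPY and synthetic data purely as assumptions, to make the $n + 1 + 2m$ variables and $2m$ inequality constraints explicit:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 200, 5
A = rng.standard_normal((m, n))
y = A @ np.ones(n) + 0.05 * rng.standard_normal(m)
eps, C = 0.1, 10.0

w = cp.Variable(n)                     # n variables
b = cp.Variable()                      # + 1
xi = cp.Variable(m, nonneg=True)       # + m
xi_star = cp.Variable(m, nonneg=True)  # + m  -> n + 1 + 2m variables total
constraints = [A @ w + b - y <= eps + xi,        # m constraints
               y - A @ w - b <= eps + xi_star]   # + m -> 2m constraints
objective = cp.Minimize(C * cp.sum(xi + xi_star) + 0.5 * cp.sum_squares(w))
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```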
SV Regression by Minimizing the Quadratic ε-Insensitive Loss
- We minimize $\frac{1}{2}(w'w + b^2)$ and the quadratic ε-insensitive loss at the same time.
- Occam's razor: the simplest is the best.
- We have the following (nonsmooth) problem:
$\min_{w, b} \frac{1}{2}(w'w + b^2) + \frac{C}{2} \sum_{i=1}^{m} |x_i'w + b - y_i|_\varepsilon^2$
where $|\cdot|_\varepsilon$ is the ε-insensitive loss.
- We gain the strong convexity of the problem.
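A direct transcription of this nonsmooth objective, as a sketch (the function name and default parameter values are ours):

```python
import numpy as np

def quad_eps_objective(w, b, A, y, C=10.0, eps=0.1):
    """0.5*(w'w + b^2) + (C/2) * sum of squared eps-insensitive residuals."""
    r = A @ w + b - y
    loss = np.maximum(0.0, np.abs(r) - eps)
    return 0.5 * (w @ w + b**2) + 0.5 * C * np.sum(loss**2)
```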
ε-Insensitive Loss Function
[Figure: graph of $|\xi|_\varepsilon$]
Quadratic ε-Insensitive Loss Function
[Figure: graph of $|\xi|_\varepsilon^2$]
Replace the Quadratic ε-Insensitive Function by a p-Function
Replace $|\xi|_\varepsilon^2$ with the smooth quadratic ε-insensitive function
$p_\varepsilon^2(\xi, \alpha) = \left(p(\xi - \varepsilon, \alpha)\right)^2 + \left(p(-\xi - \varepsilon, \alpha)\right)^2$,
which is defined by the p-function with smoothing parameter $\alpha > 0$:
$p(\xi, \alpha) = \xi + \frac{1}{\alpha} \log\left(1 + e^{-\alpha \xi}\right)$
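A sketch of both functions in NumPy; the numerically stable `logaddexp` formulation and the default α are our choices:

```python
import numpy as np

def p(x, alpha=5.0):
    """Smooth approximation of the plus function (x)_+ = max(0, x)."""
    # np.logaddexp(0, -alpha*x) computes log(1 + exp(-alpha*x)) stably.
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def p_eps_sq(xi, eps=0.1, alpha=5.0):
    """Smooth stand-in for the quadratic eps-insensitive loss |xi|_eps^2."""
    return p(xi - eps, alpha)**2 + p(-xi - eps, alpha)**2
```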
ε-Insensitive Smooth Support Vector Regression (ε-SSVR)
$\min_{(w, b)} \frac{1}{2}(w'w + b^2) + \frac{C}{2} \sum_{i=1}^{m} p_\varepsilon^2(x_i'w + b - y_i, \alpha)$
- This is a strongly convex minimization problem without any constraints.
- The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it (a quasi-Newton stand-in is sketched below).
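A minimal sketch of solving the smooth unconstrained problem. Since the slides' Newton-Armijo code is not shown here, SciPy's off-the-shelf BFGS solver is used as a stand-in, on synthetic data:

```python
import numpy as np
from scipy.optimize import minimize

def p(x, alpha=5.0):
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def ssvr_objective(theta, A, y, C=10.0, eps=0.1, alpha=5.0):
    """Smooth, strongly convex, unconstrained eps-SSVR objective."""
    w, b = theta[:-1], theta[-1]
    r = A @ w + b - y
    smooth_loss = np.sum(p(r - eps, alpha)**2 + p(-r - eps, alpha)**2)
    return 0.5 * (w @ w + b**2) + 0.5 * C * smooth_loss

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
y = A @ np.ones(5) + 0.05 * rng.standard_normal(200)
res = minimize(ssvr_objective, np.zeros(6), args=(A, y), method='BFGS')
w_opt, b_opt = res.x[:-1], res.x[-1]
```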
Nonlinear ε-SVR
By the duality theorem and the KKT optimality conditions, the optimal $w$ is a linear combination of the training points: $w = A'\bar{u}$. In the nonlinear case, replace the inner-product matrix by a kernel:
$Aw = AA'\bar{u} \;\longrightarrow\; K(A, A')\bar{u}$
Nonlinear SVR
Let $K = K(A, A')$ and replace $Aw$ by $K\bar{u}$ and $x'w$ by $K(x', A')\bar{u}$.
Nonlinear regression function: $f(x) = K(x', A')\bar{u} + b$
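A sketch of the resulting prediction rule, assuming a Gaussian kernel and a fitted $(\bar{u}, b)$ (helper names are ours):

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def predict(X_new, A, u_bar, b, gamma=1.0):
    """Nonlinear regression function f(x) = K(x', A') u_bar + b."""
    return gaussian_kernel(X_new, A, gamma) @ u_bar + b
```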
Nonlinear Smooth Support Vector ε-Insensitive Regression
$\min_{(\bar{u}, b)} \frac{1}{2}(\bar{u}'\bar{u} + b^2) + \frac{C}{2} \sum_{i=1}^{m} p_\varepsilon^2\left(K(x_i', A')\bar{u} + b - y_i, \alpha\right)$
Numerical Results: Experimental Setup
- Training and testing sets are split by the slice method.
- A Gaussian kernel is used to generate the nonlinear ε-SVR in all experiments.
- The reduced kernel technique is utilized when the training dataset is bigger than 1000 points.
- Error measure: 2-norm relative error $\|y - \hat{y}\|_2 / \|y\|_2$, where $y$ are the observations and $\hat{y}$ the predicted values (implemented below).
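The 2-norm relative error as a one-liner (the function name is ours):

```python
import numpy as np

def rel_error_2norm(y, y_hat):
    """2-norm relative error ||y - y_hat||_2 / ||y||_2."""
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)
```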
Nonlinear SSVR with a Gaussian Kernel: 101 Data Points
[Figure: 101 data points generated as the target function plus noise (noise mean = 0); training time: 0.3 sec.]
First Artificial Dataset
- Random noise with mean = 0, standard deviation 0.04.
[Table: training time (sec.) and error for SSVR versus LIBSVM]
Original Function vs. Estimated Function
- 481 data points; noise: mean = 0.
- Training time: 9.61 sec.
[Figure: original and estimated functions; mean absolute error (MAE) over the 49 x 49 mesh points]
Estimated Function Using the Reduced Kernel
- Noise: mean = 0.
[Figure: original and estimated functions; training time (sec.) and MAE over the 49 x 49 mesh points]
Real Datasets
Linear ε-SSVR Tenfold Numerical Results
Nonlinear ε-SSVR Tenfold Numerical Results (1/2)
Nonlinear ε-SSVR Tenfold Numerical Results (2/2)