Difficulties with Nonlinear SVM for Large Problems


Difficulties with Nonlinear SVM for Large Problems  The nonlinear kernel K(A, A') is fully dense  Computational complexity depends on the number of training points m  The separating surface depends on almost the entire dataset  Complexity of nonlinear SSVM:  Runs out of memory while storing the m × m kernel matrix  Long CPU time to compute the dense kernel matrix  Need to generate and store m² entries  Need to store the entire dataset even after solving the problem
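The cost is easy to see in code. Below is a minimal sketch (assuming NumPy and a Gaussian kernel; the problem sizes are hypothetical) of why the full kernel matrix becomes impractical: it is a dense m × m array, so both filling it and storing it scale as O(m²).

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Dense Gaussian kernel block K(A, B'): entry (i, j) = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

m, n = 20000, 10                      # hypothetical problem size
A = np.random.randn(m, n)
# K = gaussian_kernel(A, A)           # m x m = 4e8 entries, ~3.2 GB in float64
print(f"full kernel would need {m * m * 8 / 1e9:.1f} GB")
```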

Reduced Support Vector Machine (i) Choose a small random subset matrix Ā of the entire data matrix A (m̄ rows out of m, with m̄ much smaller than m) (ii) Solve the following problem by the Newton method, with ȳ the corresponding labels: minimize the smooth SSVM objective built on the rectangular kernel K(A, Ā') over (ū, γ) (iii) The nonlinear classifier is defined by the optimal solution (ū, γ) of step (ii): K(x', Ā') ū − γ = 0  Note: using the small square kernel K(Ā, Ā') in place of the rectangular kernel K(A, Ā') gives lousy results!
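As a rough illustration of the data flow in steps (i)-(iii), here is a hedged sketch that reuses the gaussian_kernel helper from the previous sketch. The Newton solve of step (ii) is replaced by a plain regularized least-squares fit, an assumption made only to keep the example short; it is not the authors' solver.

```python
import numpy as np

def rsvm_fit(A, d, m_bar=50, gamma=0.1, nu=1.0, seed=0):
    """A: m x n data, d: labels in {-1, +1}, m_bar: size of the random subset."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=m_bar, replace=False)
    A_bar = A[idx]                                   # step (i): random subset of rows
    K = gaussian_kernel(A, A_bar, gamma)             # rectangular m x m_bar kernel
    Z = np.hstack([K, -np.ones((A.shape[0], 1))])    # append bias column
    # step (ii) stand-in: ridge-regularized least squares instead of the smooth Newton solve
    w = np.linalg.solve(Z.T @ Z + np.eye(Z.shape[1]) / nu, Z.T @ d)
    u_bar, gamma0 = w[:-1], w[-1]                    # gamma0 plays the role of the offset in step (iii)
    return A_bar, u_bar, gamma0

def rsvm_predict(x, A_bar, u_bar, gamma0, gamma=0.1):
    # step (iii): classifier sign(K(x', A_bar') u_bar - gamma0)
    return np.sign(gaussian_kernel(np.atleast_2d(x), A_bar, gamma) @ u_bar - gamma0)
```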

A Nonlinear Kernel Application  Checkerboard training set: 1000 points in R², separating 486 asterisks from 514 dots
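For reference, a checkerboard set of this kind can be generated along these lines; the 4 × 4 cell layout and the unit square are assumptions, since the transcript does not show the original construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))        # 1000 uniform points in the unit square
cells = np.floor(X * 4).astype(int)              # which cell of a 4 x 4 checkerboard each point lands in
y = np.where((cells[:, 0] + cells[:, 1]) % 2 == 0, 1, -1)   # label by cell parity
```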

Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000

RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

RSVM on Moderate Sized Problems [Table: best test set correctness % and CPU seconds on Cleveland Heart (297 × 13), BUPA Liver (345 × 6), Ionosphere (351 × 34), Pima Indians (768 × 8), Tic-Tac-Toe (958 × 9), and Mushroom (8124 × 22); one entry N/A]

RSVM on Large UCI Adult Dataset [Table: average correctness % and standard deviation over 50 runs for (training size, testing size) splits (6414, 26148), (11221, 21341), (16101, 16461), (22697, 9865), (32562, 16282)]

[Figure: training time (CPU sec.) versus training set size for RSVM, SMO, and PCGC]

Support Vector Regression (Linear Case)  Given the training set of points (x_i, y_i)  Find a linear function f(x) = x'w + b, where (w, b) is determined by solving a minimization problem that guarantees the smallest overall error made by f  Motivated by SVM: ||w|| should be as small as possible  Small errors (within a tolerance ε) should be discarded

ε-Insensitive Loss Function  The ε-insensitive loss function is |ξ|_ε = max{0, |ξ| − ε}  The loss made by the estimation function f at the data point (x_i, y_i) is |f(x_i) − y_i|_ε  If |f(x_i) − y_i| ≤ ε the loss is defined as zero; otherwise it is |f(x_i) − y_i| − ε
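A minimal sketch of this loss in code (the function name and the eps value are chosen here for illustration):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    # |f(x) - y|_eps = max(0, |f(x) - y| - eps): residuals within eps cost nothing
    return np.maximum(0.0, np.abs(y_pred - y_true) - eps)

# example: a residual of 0.05 is free, a residual of 0.3 costs 0.2 when eps = 0.1
print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3]), eps=0.1))
```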

ε-Insensitive Linear Regression  Find f(x) = x'w + b with the smallest overall ε-insensitive error over the training set, i.e. minimize the sum of |x_i'w + b − y_i|_ε

ε-Insensitive Support Vector Regression Model  Motivated by SVM: ||w|| should be as small as possible, and small errors (those within ε) should be discarded  This gives the model: minimize (1/2) w'w + C Σ_i |ξ_i|_ε, where ξ_i = x_i'w + b − y_i

Reformulated ε-SVR as a Constrained Minimization Problem  Introducing slack variables turns the model above into a minimization problem in n + 1 + 2m variables subject to 2m constraints  This enlarges the problem size and the computational complexity of solving it

SV Regression by Minimizing Quadratic ε-Insensitive Loss  We minimize the regularization term and the loss at the same time  Occam's razor: the simplest is the best  We obtain the following (nonsmooth) unconstrained problem: minimize over (w, b) the quantity (1/2)(w'w + b²) + (C/2) Σ_i |x_i'w + b − y_i|²_ε  The regularization term gives strong convexity of the problem
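As a sketch, the unconstrained objective described above might be written as follows (the names w, b, C, eps are illustrative; the (1/2)(w'w + b²) term supplies strong convexity, while the plus function inside the loss makes the objective nonsmooth):

```python
import numpy as np

def quad_eps_objective(w, b, A, y, C=1.0, eps=0.1):
    residual = A @ w + b - y
    loss = np.maximum(0.0, np.abs(residual) - eps) ** 2   # quadratic eps-insensitive loss
    return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(loss)
```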

ε-Insensitive Loss Function

Quadratic ε-Insensitive Loss Function

Use the p-function to replace the plus function: the quadratic ε-insensitive loss |x|²_ε is approximated by the smooth function p²_ε(x, α) = p(x − ε, α)² + p(−x − ε, α)², where the p-function is defined by p(x, α) = x + (1/α) log(1 + exp(−αx))
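In code, the smoothing step might look like the sketch below; alpha is the smoothing parameter, and the exact combination follows my reading of the construction above.

```python
import numpy as np

def p_func(x, alpha=5.0):
    # numerically stable form of x + log(1 + exp(-alpha*x)) / alpha,
    # a smooth approximation of the plus function (x)_+
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

def smooth_quad_eps(x, eps=0.1, alpha=5.0):
    # smooth stand-in for the quadratic eps-insensitive loss |x|^2_eps
    return p_func(x - eps, alpha) ** 2 + p_func(-x - eps, alpha) ** 2
```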

ε-Insensitive Smooth Support Vector Regression  This is a strongly convex minimization problem without any constraints  The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it
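For concreteness, a generic Newton-Armijo loop looks roughly like the sketch below; this is a textbook version, not the authors' implementation. f, grad, and hess are callables for the smooth objective, its gradient, and its Hessian.

```python
import numpy as np

def newton_armijo(f, grad, hess, x0, tol=1e-6, max_iter=50, sigma=1e-4):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)          # Newton direction
        t = 1.0
        # Armijo backtracking: shrink the step until sufficient decrease holds
        while t > 1e-12 and f(x + t * d) > f(x) + sigma * t * (g @ d):
            t *= 0.5
        x = x + t * d
    return x
```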

Nonlinear ε-SVR  Based on the duality theorem and the KKT optimality conditions, w can be written as a linear combination of the training points, w = A'ū  In the nonlinear case, the inner products x'A' are therefore replaced by kernel entries K(x', A')

Nonlinear SVR  Let K(A, A') denote the kernel matrix whose (i, j) entry is k(x_i, x_j)  The nonlinear regression function is f(x) = K(x', A') ū + b
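Evaluating the nonlinear regression function then amounts to one rectangular kernel product, as in this sketch (reusing gaussian_kernel from the earlier sketch; u and b stand for whatever the training problem returned):

```python
import numpy as np

def nonlinear_svr_predict(X_new, A_train, u, b, gamma=0.1):
    # f(x) = K(x', A') u + b evaluated for each row of X_new
    return gaussian_kernel(np.atleast_2d(X_new), A_train, gamma) @ u + b
```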

Nonlinear Smooth Support Vector ε-Insensitive Regression

Numerical Results  The training set and testing set are generated by the slice method  A Gaussian kernel is used to generate the nonlinear ε-SVR in all experiments  The reduced kernel technique is utilized when the training dataset is bigger than 1000 points  Error measure: 2-norm relative error ||y − ŷ||₂ / ||y||₂, where y denotes the observations and ŷ the predicted values
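The error measure can be written directly, as in this small sketch:

```python
import numpy as np

def relative_error(y_obs, y_pred):
    # 2-norm relative error: ||y - y_hat||_2 / ||y||_2
    return np.linalg.norm(y_obs - y_pred) / np.linalg.norm(y_obs)
```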

[Figure: 101 data points of a test function plus noise (noise mean = 0); nonlinear SSVR with Gaussian kernel; training time 0.3 sec.]

First Artificial Dataset [Table: SSVR vs. LIBSVM training time (sec.) and error on data corrupted with random noise of mean 0 and standard deviation 0.04]

[Figure: original function vs. estimated function, 481 data points, noise mean = 0; training time 9.61 sec.; mean absolute error (MAE) computed on a 49 × 49 mesh of points]

[Figure: original function vs. estimated function using the reduced kernel, noise mean = 0; training time and MAE on the 49 × 49 mesh of points reported]

Real Datasets

Linear ε-SSVR: Tenfold Numerical Results

Nonlinear ε-SSVR: Tenfold Numerical Results (1/2)

Nonlinear ε-SSVR: Tenfold Numerical Results (2/2)
