1
Massive Support Vector Regression (via Row and Column Chunking)
David R. Musicant and O.L. Mangasarian
NIPS 99 Workshop on Learning With Support Vectors
December 3, 1999
http://www.cs.wisc.edu/~musicant
2
Chunking with 1 billion nonzero elements
3
Outline
• Problem Formulation
  – New formulation of Support Vector Regression (SVR)
  – Theoretically close to the LP formulation of Smola, Schölkopf, Rätsch
  – Interpretation of the perturbation parameter μ
• Numerical Comparisons
  – Speed comparisons of our method and prior formulations
• Massive Regression
  – Chunking methods for solving large problems: row chunking, row-column chunking
• Conclusions & Future Work
4
Support Vector Tolerant Regression
• ε-insensitive interval within which errors are tolerated
• Can improve performance on testing sets by avoiding overfitting
5
Deriving the SVR Problem
• m points in Rⁿ, represented by an m × n matrix A; y is the vector in Rᵐ to be approximated.
• We wish to solve Aw + be ≈ y, where e is a vector of ones.
• Let w be represented by the dual formulation w = A'α. This suggests replacing AA' by a general nonlinear kernel K(A,A') (sketched below), giving K(A,A')α + be ≈ y.
• Measure the error by s, with a tolerance ε:
  – bound errors: −s ≤ K(A,A')α + be − y ≤ s
  – tolerance: s ≥ εe
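A minimal sketch of the kernel substitution above, assuming NumPy; the Gaussian kernel and its `gamma` value are illustrative choices, not from the slides:

```python
import numpy as np

def linear_kernel(A, B):
    # K(A, B') = AB', the choice that recovers the original AA' term
    return A @ B.T

def gaussian_kernel(A, B, gamma=0.1):
    # K_ij = exp(-gamma * ||A_i - B_j||^2), a general nonlinear kernel
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T)
    return np.exp(-gamma * sq_dists)

# m = 100 points in R^5 and the vector y to approximate
A = np.random.randn(100, 5)
y = A @ np.random.randn(5) + 0.1 * np.random.randn(100)

K = gaussian_kernel(A, A)  # replaces AA' in the dual formulation
# the fitted surface is then K(A, A') alpha + b e for an LP solution (alpha, b)
```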
6
Deriving the SVR Problem (continued)
• Add a regularization term and minimize the error with weight C > 0:
    min (α, b, s)   ‖α‖₁ + C e's
    s.t.  −s ≤ K(A,A')α + be − y ≤ s   (bound errors)
          s ≥ εe                       (tolerance)
• Parametrically maximize the tolerance ε via a parameter μ ∈ [0, 1]:
    min (α, b, s, ε)   ‖α‖₁ + C e's − με   (regularization + error − interval size)
    subject to the same bound-error and tolerance constraints, ε ≥ 0.
• This maximizes the minimum error component, thereby resulting in error uniformity. (A solver sketch for this LP follows.)
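A minimal solver sketch for the LP as reconstructed above, using scipy.optimize.linprog; the split of α into positive and negative parts, the variable ordering, and the toy data are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def tolerant_svr_lp(K, y, C=1.0, mu=0.1):
    """Solve: min ||alpha||_1 + C e's - mu*eps
       s.t.  -s <= K alpha + b e - y <= s,  s >= eps*e."""
    m = len(y)
    e, I, Z = np.ones(m), np.eye(m), np.zeros((m, m))
    # x = [alpha+, alpha-, s, b, eps]  ->  3m + 2 variables
    c = np.concatenate([e, e, C * e, [0.0, -mu]])
    A_ub = np.block([
        [ K, -K, -I,  e[:, None], np.zeros((m, 1))],  #  K a + b e - s <= y
        [-K,  K, -I, -e[:, None], np.zeros((m, 1))],  # -K a - b e - s <= -y
        [ Z,  Z, -I, np.zeros((m, 1)), e[:, None]],   #  eps*e - s <= 0
    ])
    b_ub = np.concatenate([y, -y, np.zeros(m)])
    bounds = [(0, None)] * (3 * m) + [(None, None), (0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    alpha = res.x[:m] - res.x[m:2*m]
    return alpha, res.x[3*m], res.x[3*m + 1]   # alpha, b, eps

# toy usage with a linear kernel
A = np.random.randn(30, 2)
y = A @ np.array([1.0, -2.0]) + 0.05 * np.random.randn(30)
alpha, b, eps = tolerant_svr_lp(A @ A.T, y)
```

The variable vector [α⁺, α⁻, s, b, ε] has 3m + 2 components, matching the variable count quoted for this formulation on the comparison slide below.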
10
Equivalent to Smola, Schölkopf, Rätsch (SSR) Formulation
• Our formulation: a single error bound s, with the tolerance ε as a constraint.
• Smola, Schölkopf, Rätsch: multiple error bounds.
• Reduction in:
  – Variables: 4m+2 → 3m+2
  – Solution time
11
Natural Interpretation for μ
• Our linear program is equivalent to the classical stabilized least 1-norm approximation problem.
• Perturbation theory results show there exists a fixed μ̄ > 0 such that for all μ ∈ (0, μ̄]:
  – we solve the above stabilized least 1-norm problem;
  – additionally, we maximize the least error component.
• As μ goes from 0 to 1, the least error component is a monotonically nondecreasing function of μ.
12
Numerical Testing
• Two sets of tests:
  – Compare computational times of our method (MM) and the SSR method
  – Row-column chunking for massive datasets
• Datasets:
  – US Census Bureau Adult Dataset: 300,000 points in R¹¹
  – Delve Comp-Activ Dataset: 8,192 points in R¹³
  – UCI Boston Housing Dataset: 506 points in R¹³
  – Gaussian noise was added to each of these datasets.
• Hardware: Locop2, a Dell PowerEdge 6300 server with:
  – Four gigabytes of memory, 36 gigabytes of disk space
  – Windows NT Server 4.0
  – CPLEX 6.5 solver
13
Experimental Process
• μ is a parameter which needs to be determined experimentally.
• Use a hold-out tuning set to determine the optimal value for μ.
• Algorithm (sketched in code below):
    μ = 0
    while (tuning set accuracy continues to improve) {
        Solve LP
        μ = μ + 0.1
    }
• Run for both our method and the SSR method and compare times.
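A sketch of this tuning loop, assuming the `tolerant_svr_lp` solver sketched earlier and a hypothetical `tuning_error` helper for the hold-out set:

```python
import numpy as np

def tuning_error(alpha, b, K_tune, y_tune):
    # mean absolute error of the fitted surface on the tuning set;
    # K_tune is the kernel between tuning points and training points
    return np.mean(np.abs(K_tune @ alpha + b - y_tune))

def tune_mu(K, y, K_tune, y_tune, C=1.0):
    best_err, best_model = np.inf, None
    mu = 0.0
    while mu <= 1.0:                     # mu is interpreted on [0, 1]
        alpha, b, eps = tolerant_svr_lp(K, y, C=C, mu=mu)  # solve LP
        err = tuning_error(alpha, b, K_tune, y_tune)
        if err >= best_err:              # tuning accuracy stopped improving
            break
        best_err, best_model = err, (mu, alpha, b, eps)
        mu += 0.1                        # step mu as on the slide
    return best_model
```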
14
Comparison Results
15
Linear Programming Row Chunking
• Basic approach: (PSB/OLM) for classification problems; a schematic sketch follows this list.
• The classification problem is solved for a subset, or chunk, of constraints (data points).
• Constraints with positive multipliers (the support vectors) are preserved and integrated into the next chunk.
• The objective function is monotonically nondecreasing.
• The dataset is repeatedly scanned until the objective function stops increasing.
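A schematic sketch of row chunking; `solve_lp` is an assumed helper that solves the LP restricted to the given constraint rows and returns the objective value and those rows' dual multipliers:

```python
import numpy as np

def row_chunking(n_rows, chunk_size, solve_lp, tol=1e-8):
    active = np.array([], dtype=int)       # support constraints carried over
    prev_obj = -np.inf
    while True:
        start_obj = prev_obj
        for lo in range(0, n_rows, chunk_size):
            fresh = np.arange(lo, min(lo + chunk_size, n_rows))
            rows = np.union1d(active, fresh)
            obj, duals = solve_lp(rows)    # solve LP on this chunk
            active = rows[duals > 0]       # keep rows with positive multipliers
            prev_obj = obj                 # objective is monotonically nondecreasing
        if prev_obj - start_obj <= tol:    # a full scan gave no improvement
            return active, prev_obj
```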
16
Innovation: Simultaneous Row-Column Chunking
• Mapping of data points to constraints:
  – Classification: each data point yields one constraint.
  – Regression: each data point yields two constraints. Row-column chunking manages which constraints to maintain for the next chunk.
• Fixing dual variables at upper bounds for efficiency:
  – Classification: simple to do, since the problem is coded in its dual formulation. Any support vectors with dual variables at their upper bound are held constant in successive chunks.
  – Regression: the primal formulation was used for efficiency purposes. We therefore aggregated all constraints with fixed multipliers to yield a single constraint (sketched below).
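A hedged sketch of that aggregation step in LaTeX; the notation (constraints aᵢ'x ≤ bᵢ with multipliers fixed at a common upper bound u) is assumed for illustration, not from the slides:

```latex
% Constraints a_i^T x <= b_i, i in F, whose dual multipliers are all
% fixed at the upper bound u > 0, are active at the solution, so they
% can be replaced by their single u-weighted aggregate:
\[
  u \sum_{i \in F} a_i^{\top} x \;\le\; u \sum_{i \in F} b_i
  \quad\Longleftrightarrow\quad
  \Big( \sum_{i \in F} a_i \Big)^{\!\top} x \;\le\; \sum_{i \in F} b_i ,
\]
% which contributes the same term to the dual while costing one LP row.
```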
17
Innovation: Simultaneous Row-Column Chunking (continued)
• Handling a large number of columns:
  – Row chunking: implemented for a linear kernel only; it cannot handle problems with large numbers of variables, and hence is limited practically to linear kernels.
  – Row-column chunking: implemented for a general nonlinear kernel. New data increase the dimensionality of K(A,A') by adding both rows and columns (variables) to the problem; we handle this with row-column chunking.
18
Row-Column Chunking Algorithm

    while (problem termination criteria not satisfied) {
        choose a set of rows from the problem as a row chunk
        while (row chunk termination criteria not satisfied) {
            from this row chunk, select a set of columns
            solve the LP allowing only these columns as variables
            add those columns with nonzero values to the next column chunk
        }
        add those rows with nonzero dual multipliers to the next row chunk
    }
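A schematic Python rendering of the loop above; `solve_restricted_lp` is an assumed helper that solves the LP over only the given rows and columns and returns the objective value, those columns' primal values, and those rows' dual multipliers:

```python
import numpy as np

def row_column_chunking(n_rows, n_cols, row_step, col_step,
                        solve_restricted_lp, tol=1e-8):
    rows = np.array([], dtype=int)
    prev_obj = -np.inf
    for lo in range(0, n_rows, row_step):          # outer loop over row chunks
        rows = np.union1d(rows, np.arange(lo, min(lo + row_step, n_rows)))
        cols = np.array([], dtype=int)
        for co in range(0, n_cols, col_step):      # inner loop over column chunks
            cand = np.union1d(cols, np.arange(co, min(co + col_step, n_cols)))
            obj, primal, duals = solve_restricted_lp(rows, cand)
            cols = cand[primal > 0]                # keep columns with nonzero values
        rows = rows[duals > 0]                     # keep rows with nonzero multipliers
        if obj - prev_obj <= tol:                  # problem termination criterion
            return rows, cols, obj
        prev_obj = obj
    return rows, cols, prev_obj
```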
19
Row-Column Chunking Diagram: steps 1a–1c, 2a–2c, and 3a–3c, repeated in a loop.
20
Chunking Experimental Results
21
Objective Value & Tuning Set Error for Billion-Element Matrix
22
Conclusions and Future Work
• Conclusions:
  – Support Vector Regression can be handled more efficiently using improvements on previous formulations.
  – Row-column chunking is a new approach which can handle massive regression problems.
• Future work:
  – Generalizing to other loss functions, such as the Huber M-estimator.
  – Extension to larger problems using parallel processing for both linear and quadratic programming formulations.
23
Questions?
24
LP Perturbation Regime #1
• Our LP is given by:
    min (α, b, s, ε)   ‖α‖₁ + C e's − με
    s.t.  −s ≤ K(A,A')α + be − y ≤ s,   s ≥ εe
• When μ = 0, the solution is the stabilized least 1-norm solution.
• Therefore, by LP perturbation theory, there exists a μ̄ > 0 such that:
  – the solution to the LP with μ ∈ (0, μ̄] is a solution to the least 1-norm problem that also maximizes ε.
25
LP Perturbation Regime #2
• Our LP can be rewritten with the tolerated error as the leading objective term.
• Similarly, by LP perturbation theory, there exists a fixed value of μ below 1 such that:
  – the solution to the LP for all larger μ < 1 is the solution that minimizes the least error among all minimizers of the average tolerated error.
26
Motivation for Dual Variable Substitution: primal and dual formulations.