Exact Differentiable Exterior Penalty for Linear Programming
Olvi Mangasarian, UW Madison & UCSD La Jolla
Edward Wild, UW Madison
December 20, 2015

Preliminaries
Exterior penalty functions in linear and nonlinear programming:
– Exact (penalty parameter remains finite) but nondifferentiable
– Asymptotic (penalty parameter approaches infinity) but differentiable
Are there exact exterior penalty functions that are also differentiable?
– Yes, for linear programs, which is the topic of this talk

Outline
– Sufficient exactness condition for the dual exterior penalty function
– Exact primal solution computation from an inexact dual exterior penalty function
– Independence of the dual penalty function from the penalty parameter
– Generalized Newton algorithm and its convergence
– DLE: Direct Linear Equation algorithm and its convergence
– Computational results
– Conclusion and outlook

The Primal & Dual Linear Programs
Primal linear program: min_y d'y subject to By ≥ b, with B ∈ R^{m×ℓ}, b ∈ R^m, d ∈ R^ℓ
Dual linear program: max_u b'u subject to B'u = d, u ≥ 0

The Dual Exterior Penalty Problem
Classical exterior penalty for the dual: min_u −εb'u + ½(||B'u − d||² + ||(−u)_+||²), with penalty parameter ε > 0
Divide by ε² and let u := u/ε, α := 1/ε
The penalty problem becomes: min_u f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
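In MATLAB (the implementation language mentioned later in the talk), the scaled penalty objective is one line; a minimal sketch, assuming B (m-by-ℓ), b, d, and alpha are already defined:

  % Scaled dual exterior penalty objective; (-u)_+ is max(-u, 0) elementwise.
  f = @(u) -b'*u + 0.5*(norm(B'*u - alpha*d)^2 + norm(max(-u, 0))^2);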

Exact Primal Solution Computation
Any solution u of the dual penalty problem
  min_u f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
generates an exact solution of the primal LP for sufficiently large but finite α as follows:
  y = B'u − αd
In addition, this solution minimizes ||y||² over the solution set of the primal LP.
Ref: Journal of Machine Learning Research 2006
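The recovery itself is a single matrix-vector operation; a sketch, assuming u solves the penalty problem for a sufficiently large alpha:

  % Exact primal solution recovered from a dual penalty solution u.
  y = B'*u - alpha*d;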

Optimality Condition for Dual Exterior Penalty Problem & Exact Primal LP Solution
A necessary and sufficient condition for solving the dual penalty problem
  min_u f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
is:
  ∇f(u) = −b + B(B'u − αd) + Pu = 0
where P ∈ R^{m×m} is the diagonal matrix of ones and zeros P = diag(sign((−u)_+)), i.e., P_ii = 1 if u_i < 0 and P_ii = 0 otherwise.
Solving for u gives:
  u = (BB' + P)⁻¹(αBd + b)
which gives the following exact primal solution:
  y = B'u − αd
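A minimal MATLAB sketch of this closed-form solve for a fixed active set (u on entry is any current iterate; the variable names are illustrative):

  P = diag(double(u < 0));           % P = diag(sign((-u)_+)): ones where u_i < 0
  u = (B*B' + P) \ (alpha*B*d + b);  % solves grad f(u) = 0 for this fixed P
  y = B'*u - alpha*d;                % candidate exact primal solution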

Sufficient Condition for Penalty Parameter α
Note that in
  y = B'((BB' + P)\b) + α(B'((BB' + P)\(Bd)) − d)
y depends on α only through
– The implicit dependence of P on u
– The explicit dependence on α above
Thus, if α is sufficiently large to ensure y is an exact solution of the linear program, then
– P (i.e., the active constraint set) does not change with increasing α (assumed to hold)
– B'((BB' + P)\(Bd)) − d = 0 (ensured computationally)
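The second bullet is the residual test both algorithms below use for termination; a sketch, with tol an illustrative small tolerance:

  res = B'*((B*B' + P) \ (B*d)) - d;  % vanishes once alpha is large enough
  exact = (norm(res) <= tol);         % e.g. tol = 1e-8 (illustrative value)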

Generalized Newton Algorithm
Solve the unconstrained problem
  min_u f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
using a generalized Newton method.
The ordinary Newton method requires the gradient and Hessian to compute the Newton direction −(∇²f(u))⁻¹∇f(u), but f is not twice differentiable.
Instead of the ordinary Hessian, we use the generalized Hessian ∂²f(u) and the generalized Newton direction −(∂²f(u))⁻¹∇f(u), where
– ∇f(u) = −b + B(B'u − αd) − (−u)_+
– ∂²f(u) = BB' + diag(sign((−u)_+))

Generalized Newton Algorithm (JMLR 2006)
minimize f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
1) u^{i+1} = u^i + λ_i t^i, where
   t^i = −(∂²f(u^i))⁻¹∇f(u^i) (generalized Newton direction)
   λ_i = max{1, ½, ¼, …} such that f(u^i) − f(u^i + λ_i t^i) ≥ −(λ_i/4)∇f(u^i)'t^i (Armijo stepsize)
2) Stop if ||∇f(u^i)|| ≤ tol and ||B'((BB' + P_i)\(Bd)) − d|| ≤ tol, where P_i = diag(sign((−u^i)_+))
3) If i = imax, then α ← 10α and imax ← 2·imax
4) i ← i + 1 and go to (1)
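A compact MATLAB sketch of this algorithm. It is an illustration, not the authors' reference code: the zero starting point, the regularization delta (anticipating the Computational Details slide), and the stepsize floor are all assumptions.

  function [y, u] = newton_lp(B, b, d, alpha, tol, imax)
  % Generalized Newton method for the scaled dual exterior penalty problem.
  m = size(B, 1);
  u = zeros(m, 1); delta = 1e-6; i = 0;
  f = @(v, a) -b'*v + 0.5*(norm(B'*v - a*d)^2 + norm(max(-v, 0))^2);
  while true
      g = -b + B*(B'*u - alpha*d) - max(-u, 0);           % gradient of f at u
      P = diag(double(u < 0));                            % diag(sign((-u)_+))
      if norm(g) <= tol && norm(B'*((B*B' + P) \ (B*d)) - d) <= tol
          break;                                          % exactness condition met
      end
      H = B*B' + P + delta*eye(m);                        % regularized generalized Hessian
      t = -(H \ g);                                       % generalized Newton direction
      lam = 1;                                            % backtracking Armijo rule, factor 1/4
      while f(u + lam*t, alpha) > f(u, alpha) + (lam/4)*(g'*t) && lam > 1e-12
          lam = lam/2;
      end
      u = u + lam*t; i = i + 1;
      if i == imax, alpha = 10*alpha; imax = 2*imax; end  % escalate penalty parameter
  end
  y = B'*u - alpha*d;                                     % exact primal solution
  end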

Generalized Newton Algorithm Convergence
Assume tol = 0.
Assume that B'((BB' + P)\(Bd)) − d = 0 implies α is large enough that an exact solution to the primal is obtained.
Then either
– The Generalized Newton Algorithm terminates at u^i such that y = B'u^i − αd is an exact solution to the primal, or
– For any accumulation point ū of the sequence of iterates {u^i}, y = B'ū − αd is an exact solution to the primal.
The exactness condition is incorporated as a termination criterion.

Direct Linear Equation Algorithm
f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
∇f(u) = −b + B(B'u − αd) − (−u)_+ = −b + B(B'u − αd) + Pu
∇f(u) = 0 ⟺ u = (BB' + P)⁻¹(αBd + b)
Successively solve ∇f(u) = 0 for updated values of the diagonal matrix P = diag(sign((−u)_+)).

Direct Linear Equation Algorithm
minimize f(u) = −b'u + ½(||B'u − αd||² + ||(−u)_+||²)
1) P_i = diag(sign((−u^i)_+))
2) u^{i+1} = (BB' + P_i)\(b + αBd)
3) u^{i+1} ← u^i + λ_i(u^{i+1} − u^i), where λ_i is the Armijo stepsize
4) Stop if ||u^{i+1} − u^i|| ≤ tol and ||B'((BB' + P_i)\(Bd)) − d|| ≤ tol
5) If i = imax, then α ← 10α and imax ← 2·imax
6) i ← i + 1 and go to (1)
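A matching MATLAB sketch of the DLE iteration, under the same illustrative assumptions as the Newton sketch (zero start, regularization delta, and simple backtracking in place of the full Armijo rule):

  function [y, u] = dle_lp(B, b, d, alpha, tol, imax)
  % Direct Linear Equation method: repeatedly solve grad f(u) = 0 for fixed P.
  m = size(B, 1);
  u = zeros(m, 1); delta = 1e-6; i = 0;
  f = @(v, a) -b'*v + 0.5*(norm(B'*v - a*d)^2 + norm(max(-v, 0))^2);
  while true
      P = diag(double(u < 0));                             % diag(sign((-u)_+))
      unext = (B*B' + P + delta*eye(m)) \ (b + alpha*B*d); % direct linear solve
      t = unext - u; lam = 1;
      while f(u + lam*t, alpha) > f(u, alpha) && lam > 1e-12
          lam = lam/2;                                     % damp the step if needed
      end
      unext = u + lam*t;
      done = norm(unext - u) <= tol && ...
             norm(B'*((B*B' + P) \ (B*d)) - d) <= tol;     % exactness test
      u = unext; i = i + 1;
      if done, break; end
      if i == imax, alpha = 10*alpha; imax = 2*imax; end   % escalate penalty parameter
  end
  y = B'*u - alpha*d;                                      % exact primal solution
  end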

Direct Linear Equation Algorithm Convergence
Assume tol = 0.
Assume that B'((BB' + P)\(Bd)) − d = 0 implies α is large enough that an exact solution to the primal is obtained, and that each matrix in the sequence {BB' + P_i} is nonsingular.
Then either
– The Direct Linear Equation Algorithm terminates at u^i such that y = B'u^i − αd is an exact solution to the primal, or
– For any accumulation point ū of the sequence of iterates {u^i}, y = B'ū − αd is an exact solution to the primal.
The exactness condition is incorporated as a termination criterion.

Solving Primal LPs with More Constraints than Variables
Difficulty: factoring the m×m matrix BB'
Solution: obtain an exact solution to the dual instead, which requires factoring only a smaller matrix of the size of B'B
Given an exact solution u of the dual, find the exact solution of the primal by solving the active-constraint system B1 y = b1, where B1 and B2 are the row blocks of B corresponding to u1 > 0 and u2 = 0
– Requires factoring matrices only of the size of B'B (see the sketch below)
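A sketch of one way to carry out this recovery, assuming (via complementary slackness) that the primal solution is obtained from the active constraints B1*y = b1, with B1 of full column rank:

  act = u > 0;                  % rows with u_i > 0 define B1; the rest form B2
  B1 = B(act, :); b1 = b(act);
  y = (B1'*B1) \ (B1'*b1);      % normal equations for B1*y = b1; factors only an l-by-l matrix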

Primal Exterior Penalty Problem
The primal & dual linear programs:
– Primal linear program: min_y d'y subject to By ≥ b
– Dual linear program: max_u b'u subject to B'u = d, u ≥ 0
Primal exterior penalty problem (scaled as before): min_y f(y) = d'y + ½||(−By + αb)_+||²
For sufficiently large α, u = (−By + αb)_+ is an exact solution of the dual linear program.
Furthermore, this solution minimizes ||u||² over the solution set of the dual linear program.
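The primal-side objects in MATLAB, mirroring the dual-side sketch earlier (illustrative names; y is a current primal iterate):

  % Scaled primal exterior penalty objective and the dual recovery formula.
  g = @(y) d'*y + 0.5*norm(max(alpha*b - B*y, 0))^2;
  u = max(-B*y + alpha*b, 0);   % exact dual solution once alpha is large enough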

Optimality Condition for the Primal Exterior Penalty Problem & Exact Dual LP Solution
A necessary and sufficient condition for solving the primal penalty problem
  min_y f(y) = d'y + ½||(−By + αb)_+||²
is:
  ∇f(y) = d − B'(−By + αb)_+ = d − B'Q(αb − By) = 0
where Q ∈ R^{m×m} is the diagonal matrix of ones and zeros Q = diag(sign((−By + αb)_+)).
Solving for y gives:
  y = (B'QB)\(αB'Qb − d)
which gives the following exact dual solution:
  u = (−By + αb)_+ = (B((B'QB)\d) − α(B((B'QB)\(B'Qb)) − b))_+
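A sketch of the closed-form solve for a fixed Q; note that B'*Q*B is only ℓ×ℓ, which is what makes this attractive when m is much larger than ℓ:

  Q = diag(double(alpha*b - B*y > 0));  % ones exactly where constraint i is penalized
  y = (B'*Q*B) \ (alpha*B'*(Q*b) - d);  % solves grad f(y) = 0 for this fixed Q
  u = max(-B*y + alpha*b, 0);           % exact dual solution once alpha is large enough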

Sufficient Condition for Penalty Parameter α
Note that in
  u = (B((B'QB)\d) − α(B((B'QB)\(B'Qb)) − b))_+
u depends on α only through
– The implicit dependence of Q on α (through y)
– The explicit dependence on α above
Thus, α is sufficiently large to ensure u is an exact solution of the linear program if
– Q does not change with increasing α
– diag(sign(u))(B((B'QB)\(B'Qb)) − b) = 0 (the subgradient with respect to α)
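The corresponding computational test on the primal side; a sketch with tol an illustrative tolerance:

  res = diag(sign(u)) * (B*((B'*Q*B) \ (B'*(Q*b))) - b);  % alpha-subgradient term
  exact = (norm(res) <= tol);                             % with Q fixed, res = 0 certifies exactness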

Computational Details
Cholesky factorization is used for both methods.
– Ensure factorizability by adding a small multiple of the identity matrix
– For example, BB' + P + δI for some small δ > 0
– Other approaches are left to future work
Start with α = 100 for both methods.
– Newton method: occasionally increased to 1000
– Direct method: α not increased in our examples
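A sketch of the regularized Cholesky solve described above; the value of delta is illustrative, as the slides say only "a small multiple of the identity":

  delta = 1e-8; m = size(B, 1);
  R = chol(B*B' + P + delta*eye(m));  % upper-triangular factor: R'*R = BB' + P + delta*I
  u = R \ (R' \ (alpha*B*d + b));     % two triangular solves replace the dense backslash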

Computational Results
When B'((BB' + P)\(Bd)) − d = 0, the optimal solution is obtained.
– Tested on randomly generated linear programs with known optimal objective values
– This condition is used as a stopping criterion
– The relative difference from the true objective value and the maximum constraint violation are less than 1e-3, and often smaller than 1e-6
The condition B'((BB' + P)\(Bd)) − d = 0 is satisfied efficiently.
– Our algorithms are compared against the commercial LP package CPLEX 9.0 (simplex and barrier methods)
– Our algorithms are implemented in MATLAB 7.3
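The slides do not describe the random LP generator, so the following is one standard construction (an assumption, not the authors' generator) that yields a random primal LP min d'y s.t. By ≥ b with a known optimal point, giving the true objective value used in the comparisons:

  m = 1000; l = 100; k = 50;                    % constraints, variables, active set (illustrative sizes)
  B = randn(m, l); ystar = randn(l, 1);
  s = [zeros(k, 1); rand(m - k, 1)];            % slacks: first k constraints active at ystar
  b = B*ystar - s;                              % B*ystar >= b, with equality on the first k rows
  ustar = [0.1 + rand(k, 1); zeros(m - k, 1)];  % nonnegative multipliers, positive on active rows
  d = B'*ustar;                                 % stationarity d = B'*ustar makes ystar optimal
  trueobj = d'*ystar;                           % known optimal objective value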

[Figure: Running time versus linear program size, for problems with the same number of variables and constraints. x-axis: number of variables (= number of constraints); y-axis: average seconds to solution.]

[Table: Average seconds to solve 10 random linear programs with 100 variables and increasing numbers of constraints. Columns: Constraints, CPLEX, Newton LP, DLE; numerical entries not preserved in the transcript.]

[Table: Average seconds to solve 10 random linear programs with 100 constraints and increasing numbers of variables. Columns: Variables, CPLEX, Newton LP, DLE; numerical entries not preserved in the transcript.]

Conclusion
Presented sufficient conditions for obtaining an exact solution to a primal linear program from a classical dual exterior penalty function.
A precise termination condition is given for
– The Newton algorithm for linear programming (JMLR 2006)
– A direct method based on solving the optimality condition of the convex penalty function
Both algorithms efficiently obtain optimal solutions using the precise termination condition.

Future Work
– Deal with larger linear programs
– Application to real-world linear programs
– Direct methods for other optimization problems, e.g., linear complementarity problems
– Further improvements to performance and robustness

Links to Talk & Papers