On parallelizing dual decomposition in stochastic integer programming

Presentation transcript:

On parallelizing dual decomposition in stochastic integer programming Cosmin Petra Mathematics and Computer Science Division Argonne National Laboratory, USA Joint work with Miles Lubin (MIT), Burhaneddin Sandikçi, Kipp Martin (U Chicago) INFORMS Annual Meeting Oct 2012

Overview Block-angular structure Motivation: stochastic optimization of the power grid Revisiting dual decomposition algorithm of Carøe and Schultz Parallelizing the solution to Lagrangian dual Parallel numerical experiments

Stochastic Formulation Discrete distribution leads to block-angular (MI)LP

Large-scale (dual) block-angular LPs Extensive form In terminology of stochastic LPs: First-stage variables (decision now): x0 Second-stage variables (recourse decision): x1, …, xN Each diagonal block is a realization of a random variable (scenario)
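The extensive form referred to on this slide did not survive in the transcript. A sketch of the usual two-stage layout is given below. The symbols $c_i$, $A$, $T_i$ and $b_i$ are assumed notation, not taken from the slides; $W_i$ matches the scenario blocks mentioned on a later slide, and scenario probabilities are assumed to be folded into $c_i$.

```latex
\begin{aligned}
\min_{x_0, x_1, \dots, x_N} \quad & c_0^T x_0 + \sum_{i=1}^{N} c_i^T x_i \\
\text{s.t.} \quad & A x_0 = b_0, \\
& T_i x_0 + W_i x_i = b_i, \quad i = 1, \dots, N, \\
& x_0 \ge 0, \quad x_i \ge 0, \quad i = 1, \dots, N.
\end{aligned}
```

Only the first-stage columns $x_0$ couple the scenario blocks $W_1, \dots, W_N$, which is the (dual) block-angular structure.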

Stochastic Optimization and the Power Grid. Unit Commitment: determine optimal on/off schedule of thermal (coal, natural gas, nuclear) generators; day-ahead market prices (hourly); mixed-integer. Economic Dispatch: set real-time market prices (every 5-10 min.); continuous linear/quadratic. Challenge: integrate energy produced by highly variable renewable sources into these control systems. Minimize operating costs, subject to: physical generation and transmission constraints, reserve levels, demand, …

Stochastic unit commitment with wind power Scenarios obtained using numerical weather prediction codes Real-time grid-nested 24h parallel simulation using WRF Thermal generator Wind farm Slide courtesy of V. Zavala & E. Constantinescu

Computational challenges and difficulties in power grid May require many scenarios (100s, 1,000s, 10,000s …) to accurately model uncertainty “Large” scenarios (Wi up to 100,000 x 100,000) “Large” 1st stage (1,000s, 10,000s of variables) Easy to build a practical instance that requires 100+ GB of RAM to solve, which requires distributed memory Integer constraints Real-time solution needed in our applications

Dual decomposition - formulation Feasibility sets Extensive form in this notation is Split-variable formulation Non-anticipativity constraints are explicitly enforced

Dual decomposition (Carøe and Schultz, 1999) Apply Lagrangian relaxation (LR) to non-anticipativity constraints For mixed-integer problems, LR provides a lower bound on the optimal value Branch and bound on non-anticipativity variables

Dual decomposition – computational appeal Relaxing non-anticipativity decouples the scenarios Typically better lower bound than from LP relaxation Lagrangian dual has similar computational pattern to Benders At each iteration, solve a continuous master problem and an independent MILP for each scenario Trivially parallel? Not quite, there are interesting algorithmic and computational questions, as we'll see. To our knowledge, no previously published parallel implementations, although scope for parallelism has been observed

“State variable” formulation of non-anticipativity Carøe and Schultz consider r-1 constraints in the form We use “state variable” formulation with r constraints (Sen, 2005) Lagrangian dual problem can be stated as Will see later why this formulation is useful for parallel computations
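The formulas on this slide were lost in extraction. The block below is one plausible reading, not the slide's own notation: $x_j$ is the first-stage copy for scenario $j$, $y_j$ the recourse variables, $S_j$ the scenario feasibility set, $p_j$ the scenario probability, and $z$ the shared state variable. The chain form for the $r-1$ constraints is also an assumption.

```latex
% r-1 non-anticipativity constraints (one possible chain form):
x_1 = x_2, \quad x_2 = x_3, \quad \dots, \quad x_{r-1} = x_r
% "State variable" form with r constraints:
x_j - z = 0, \quad j = 1, \dots, r
% Relaxing x_j - z = 0 with multipliers \lambda_j; z is free, so the
% Lagrangian is bounded below only if \sum_j \lambda_j = 0:
\max_{\lambda} \; \Big\{ \sum_{j=1}^{r} D_j(\lambda_j) \;:\; \sum_{j=1}^{r} \lambda_j = 0 \Big\},
\qquad
D_j(\lambda_j) = \min_{(x_j, y_j) \in S_j} \; p_j \big( c^T x_j + q_j^T y_j \big) + \lambda_j^T x_j
```

Under this reading each $D_j$ depends only on its own multiplier block $\lambda_j$, and the scenarios are tied together by a single linking equality.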

Characterization of the solution Proposition (Carøe and Schultz) Optimal objective of Lagrangian dual is that of a partially convexified LP relaxation Theoretical equivalence with Lulli and Sen’s (2004) branch-and-price (column generation) However, solving the Lagrangian dual also gives a solution to (1) (useful in branch-and-bound) The optimal value equals the optimal value of the linear program

Optimization of the Lagrangian Dual Each is concave and non-differentiable For a fixed , is a subgradient of , where By solving/evaluating (mixed-integer LP), we get at least one subgradient for free Carøe and Schultz suggest using proximal bundle black-box solvers. Alternatives are other variants of cutting-plane algorithms (boxstep or level regularization) - Use subgradients to form a piecewise linear model of the objective function. - Find optimum of model to determine next trial point. (Master problem) - Evaluate trial point (MILP subproblems) to check objective value for convergence. If not converged, update model with new subgradient(s) and repeat.
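A toy sketch of the evaluate-and-update loop described above. It deliberately swaps in plain projected subgradient ascent for the proximal bundle or cutting-plane master, and solves each scenario subproblem by brute-force enumeration rather than a MILP solver. All data, names and the step-size rule are made up for illustration; none of it comes from the talk.

```python
# Hypothetical toy instance: one integer first-stage variable x in {0,...,3},
# three scenarios given as (probability, demand). Not from the slides.
X = range(4)
SCENARIOS = [(0.5, 1), (0.3, 2), (0.2, 3)]
UNIT_COST, SHORTFALL_PENALTY = 1.0, 3.0


def scenario_cost(x, demand):
    """First-stage cost plus recourse penalty for unmet demand."""
    return UNIT_COST * x + SHORTFALL_PENALTY * max(demand - x, 0)


def solve_subproblem(prob, demand, lam):
    """Return (D_j(lam), minimizer x) by enumerating all x in X."""
    return min((prob * scenario_cost(x, demand) + lam * x, x) for x in X)


def dual_value(lams):
    """Evaluate D(lams) = sum_j D_j(lam_j); the minimizers give a subgradient."""
    results = [solve_subproblem(p, d, lam)
               for (p, d), lam in zip(SCENARIOS, lams)]
    return sum(v for v, _ in results), [x for _, x in results]


def subgradient_ascent(iterations=50):
    """Maximize D over {sum(lams) == 0}; return the best lower bound found."""
    lams = [0.0] * len(SCENARIOS)
    best = float("-inf")
    for k in range(1, iterations + 1):
        value, xs = dual_value(lams)
        best = max(best, value)
        mean_x = sum(xs) / len(xs)
        step = 0.5 / k  # diminishing step size (an arbitrary choice)
        # Subtracting the mean projects the step so that sum(lams) stays 0.
        lams = [lam + step * (x - mean_x) for lam, x in zip(lams, xs)]
    return best


def extensive_form_optimum():
    """Optimal value of the original problem, by enumeration."""
    return min(sum(p * scenario_cost(x, d) for p, d in SCENARIOS) for x in X)


if __name__ == "__main__":
    print("Lagrangian lower bound:", subgradient_ascent())
    print("Extensive-form optimum:", extensive_form_optimum())
```

Each call to `solve_subproblem` is independent of the others, which is the part of the loop that parallelizes directly; the multiplier update plays the role of the master problem.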

Let’s open the proximal bundle black box Proximal bundle QP master - is the trial point, the regularization parameter The dual, below, is typically advantageous for computation The proximal bundle master QP has a (dual) block-angular structure

Dual of Proximal Bundle QP Master Block-angular structure Direct result of equality-constrained formulation, doesn't hold for other formulations of non-anticipativity. But also applies to other forms of regularization.

Computational significance The master QP is overall sparse (but may be block dense). Dense QP solvers, which are typically used in black-box proximal bundle codes, can't efficiently solve it. Master QP’s structure can be exploited: PIPS (Petra) or OOPS (Gondzio) With the ability to solve the master in parallel, we address a serial bottleneck of execution. Now, both MILP subproblems and QP master can be solved in parallel. Greater potential for parallel speedup (Amdahl's law)
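A small illustration of the Amdahl's law argument: the fraction of work that stays serial (here, the master) caps the achievable speedup. The fractions and process count in the usage example are made-up numbers, not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction, processes):
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processes)


if __name__ == "__main__":
    # Hypothetical: master takes 10% of serial run time and is not parallelized.
    print(amdahl_speedup(0.90, 64))
    # Hypothetical: master is parallelized too, leaving 1% serial work.
    print(amdahl_speedup(0.99, 64))
```

With a 10% serial master the speedup can never exceed 10 regardless of process count, which is why parallelizing the master QP matters.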

Numerical experiments Implementation in C++ using MPI. Looking at solving Lagrangian dual, no branching implemented SCIP used for MILP subproblems. No “compression of bundle” (removing cuts/subgradients) Serial experiments Cutting plane vs. black-box proximal bundle vs. our implementation Parallel experiments Scalability of full proximal bundle algorithm More detailed look at scalability of parallel QP solver

Test instances Stochastic mixed-integer instances dcap and sslp from SIPLIB by Shabbir Ahmed Stochastic LP product instance from Huseyin Topaloglu

Test architecture “Fusion” high-performance cluster at Argonne 320 nodes InfiniBand QDR interconnect Two 2.6 GHz Xeon processors per node (total 8 cores) Most nodes have 36 GB of RAM, some have 96 GB We use 1 MPI process (= parallel process) per core

Serial experiments Time limit 7200 seconds OOQP - General sparsity-exploiting QP IPM solver PIPS - Specialized IPM solver for block-angular structure ConicBundle - Open-source off-the-shelf proximal bundle code

Parallel experiments – serial master, parallel subproblems

Parallel experiments – parallel master, parallel subproblems - Overall 10x speedup on dcap332 by solving master in parallel, vs. 2x speedup with the serial master - Master has little effect for sslp 10 15. Also little speedup by solving subproblems in parallel. Imbalance in time to solve MILP subproblems.
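A sketch of why imbalance in MILP solve times limits speedup: in a synchronous iteration, wall-clock time is set by the most heavily loaded process. The static round-robin assignment and the solve times below are assumptions for illustration; the slides do not say how subproblems were distributed.

```python
def parallel_makespan(solve_times, processes):
    """Wall-clock time when subproblems are assigned round-robin (assumed)."""
    loads = [0.0] * processes
    for i, t in enumerate(solve_times):
        loads[i % processes] += t
    return max(loads)


def subproblem_speedup(solve_times, processes):
    """Serial time divided by parallel wall-clock time for one iteration."""
    return sum(solve_times) / parallel_makespan(solve_times, processes)


if __name__ == "__main__":
    balanced = [3.0, 3.0, 3.0, 3.0]    # hypothetical solve times (seconds)
    imbalanced = [1.0, 1.0, 1.0, 9.0]  # same total work, one hard scenario
    print(subproblem_speedup(balanced, 4))
    print(subproblem_speedup(imbalanced, 4))
```

Both lists contain the same total work, but the single hard scenario in the second list keeps three of the four processes idle for most of the iteration.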

Scalability of master QP Scope for parallelism in solving master QP depends on number of linking variables (= number of first-stage variables) being small relative to diagonal blocks (= subgradient cuts per scenario). Stochastic integer test problems have very small first stage (order of 10). Important to consider performance on larger first stage for practical problems. Stochastic LP product instances used for this (1,000 scenarios) We look only at QP solves, not proximal bundle convergence.

Small first stage

Medium first stage

Large first stage

Energy application – stochastic unit commitment. State of Illinois power grid; 12-hour horizon; 64 scenarios; 3,621,180 vars.; 3,744,468 cons.; 3,132 binary. LP relaxation objective: 939,208. LP relaxation + CglProbing cuts: 939,626; feasible solution (rounding): 942,237; optimality gap: 0.27% (0.5% is acceptable in industry practice). Lagrangian relaxation: 941,176; feasible solution (rounding): 943,351; optimality gap: 0.23% (combined: 0.11%).
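The reported gaps can be checked from the bounds on this slide. The sketch below assumes the gap is measured relative to the upper bound, and that "combined" pairs the best feasible solution with the best lower bound; neither convention is stated in the slides. Under these assumptions the three gaps come out at roughly 0.277%, 0.231% and 0.113%, in line with the reported 0.27%, 0.23% and 0.11%.

```python
def relative_gap(upper_bound, lower_bound):
    """Optimality gap relative to the upper bound (an assumed convention)."""
    return (upper_bound - lower_bound) / upper_bound


# Values taken from the slide.
LP_WITH_CUTS = 939_626        # LP relaxation + CglProbing cuts
LP_ROUNDED = 942_237          # feasible solution from rounding the LP
LAGRANGIAN = 941_176          # Lagrangian relaxation bound
LAGRANGIAN_ROUNDED = 943_351  # feasible solution from the Lagrangian run

gap_lp = relative_gap(LP_ROUNDED, LP_WITH_CUTS)
gap_lagrangian = relative_gap(LAGRANGIAN_ROUNDED, LAGRANGIAN)
# Combined: best (lowest) feasible value against best (highest) lower bound.
gap_combined = relative_gap(min(LP_ROUNDED, LAGRANGIAN_ROUNDED),
                            max(LP_WITH_CUTS, LAGRANGIAN))

if __name__ == "__main__":
    for name, gap in [("LP + cuts", gap_lp),
                      ("Lagrangian", gap_lagrangian),
                      ("combined", gap_combined)]:
        print(f"{name}: {100 * gap:.3f}%")
```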

Conclusions and future work Revisited dual decomposition from the perspective of parallel computation, addressed bottleneck of solving master Dual decomposition promising approach for parallel solution of stochastic mixed-integer programs More work needed to address load imbalance, perhaps asynchronism as in Linderoth and Wright Branch and bound implementation Large scale computational study for stochastic unit commitment

[Diagram labels: Weather, System Operator, ON/OFF, Generation Levels, Demand, Distribution, Supplier, Consumer]