On parallelizing dual decomposition in stochastic integer programming
Cosmin Petra, Mathematics and Computer Science Division, Argonne National Laboratory, USA
Joint work with Miles Lubin (MIT), Burhaneddin Sandıkçı and Kipp Martin (U. of Chicago)
INFORMS Annual Meeting, Oct 2012
Overview
- Block-angular structure
- Motivation: stochastic optimization of the power grid
- Revisiting the dual decomposition algorithm of Carøe and Schultz
- Parallelizing the solution of the Lagrangian dual
- Parallel numerical experiments
Stochastic Formulation: a discrete distribution leads to a block-angular (MI)LP
Large-scale (dual) block-angular LPs: the extensive form (see the sketch below). In the terminology of stochastic LPs:
- First-stage variables (decision now): x0
- Second-stage variables (recourse decisions): x1, …, xN
- Each diagonal block corresponds to one realization of the random variable (a scenario)
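The extensive form itself is not reproduced in the extracted slide text. A standard way to write this dual block-angular LP, using generic data (c_0, c_i, A, b_0, T_i, W_i, b_i and scenario probabilities p_i are assumed notation; only W_i appears elsewhere in these slides):

\min\; c_0^\top x_0 + \sum_{i=1}^{N} p_i\, c_i^\top x_i
\quad \text{s.t.} \quad A x_0 = b_0, \qquad T_i x_0 + W_i x_i = b_i, \qquad x_0 \ge 0,\; x_i \ge 0, \qquad i = 1,\dots,N.

The first-stage columns x_0 link all scenario blocks, which is the (dual) block-angular pattern.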
Stochastic Optimization and the Power Grid
- Unit Commitment: determine the optimal on/off schedule of thermal (coal, natural gas, nuclear) generators; sets day-ahead market prices; run hourly; mixed-integer.
- Economic Dispatch: sets real-time market prices; run every 5-10 min.; continuous linear/quadratic.
- Both minimize operating costs, subject to: physical generation and transmission constraints, reserve levels, demand, …
- Challenge: integrate energy produced by highly variable renewable sources into these control systems.
Stochastic unit commitment with wind power
- Scenarios obtained using numerical weather prediction codes
- Real-time grid-nested 24h parallel simulation using WRF
[Figure: map of thermal generators and wind farms]
(Slide courtesy of V. Zavala & E. Constantinescu)
Computational challenges and difficulties in the power grid
- May require many scenarios (100s, 1,000s, 10,000s, …) to accurately model uncertainty
- “Large” scenarios (Wi up to 100,000 x 100,000)
- “Large” first stage (1,000s to 10,000s of variables)
- Easy to build a practical instance that requires 100+ GB of RAM to solve, which requires distributed memory
- Integer constraints
- Real-time solution needed in our applications
Dual decomposition - formulation
- Scenario feasibility sets (see the sketch below)
- Extensive form restated in this notation
- Split-variable formulation: each scenario receives its own copy of the first-stage variables
- Non-anticipativity constraints are explicitly enforced
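The feasibility sets and the split-variable formulation appear only as images in the original slides. A generic reconstruction in the spirit of Carøe and Schultz (S_i, p_i, c, q_i, T_i, W_i, h_i, X, Y_i are assumed symbols):

S_i = \{ (x_i, y_i) \in X \times Y_i \;:\; T_i x_i + W_i y_i = h_i \}, \qquad i = 1,\dots,r,

z^* = \min \Big\{ \sum_{i=1}^{r} p_i \left( c^\top x_i + q_i^\top y_i \right) \;:\; (x_i, y_i) \in S_i,\; i=1,\dots,r,\;\; x_1 = x_2 = \dots = x_r \Big\},

where each scenario carries its own first-stage copy x_i and the last constraints are the explicitly enforced non-anticipativity conditions.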
Dual decomposition (Carøe and Schultz, 1999)
- Apply Lagrangian relaxation (LR) to the non-anticipativity constraints (see the sketch below)
- For mixed-integer problems, LR provides a lower bound on the optimal value
- Branch and bound on the non-anticipativity (first-stage) variables
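Writing the non-anticipativity constraints in a generic linear form \sum_{i=1}^{r} H_i x_i = 0 (H_i is an assumed symbol), the Lagrangian relaxation separates by scenario:

D(\lambda) = \sum_{i=1}^{r} \min_{(x_i, y_i) \in S_i} \big\{ p_i \left( c^\top x_i + q_i^\top y_i \right) + \lambda^\top H_i x_i \big\},

and weak duality gives D(\lambda) \le z^* for every \lambda, so the Lagrangian dual \max_\lambda D(\lambda) yields the lower bound used during branch and bound.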
Dual decomposition – computational appeal
- Relaxing non-anticipativity decouples the scenarios
- Typically a better lower bound than the LP relaxation
- The Lagrangian dual has a computational pattern similar to Benders: at each iteration, solve a continuous master problem and an independent MILP for each scenario
- Trivially parallel? Not quite; there are interesting algorithmic and computational questions, as we'll see
- To our knowledge, no previously published parallel implementations, although the scope for parallelism has been observed
“State variable” formulation of non-anticipativity
- Carøe and Schultz consider r-1 constraints (see the sketch below)
- We use the “state variable” formulation with r constraints (Sen, 2005)
- The Lagrangian dual problem can then be stated scenario-wise
- We will see later why this formulation is useful for parallel computations
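The constraints referenced above are not shown in the extracted text. One common reading, with z denoting an extra “state” copy of the first-stage variables (an assumed symbol):

Carøe and Schultz (r-1 constraints):  x_1 = x_2,\; x_2 = x_3,\; \dots,\; x_{r-1} = x_r.
State-variable formulation (r constraints):  x_i = z, \qquad i = 1,\dots,r.

Dualizing x_i - z = 0 with multipliers \lambda_1,\dots,\lambda_r and minimizing over the free variable z forces \sum_{i=1}^{r} \lambda_i = 0, so the Lagrangian dual decouples completely by scenario:

\max_{\lambda:\, \sum_i \lambda_i = 0} \; \sum_{i=1}^{r} D_i(\lambda_i), \qquad
D_i(\lambda_i) = \min_{(x_i, y_i) \in S_i} \big\{ p_i \left( c^\top x_i + q_i^\top y_i \right) + \lambda_i^\top x_i \big\}.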
Characterization of the solution
- Proposition (Carøe and Schultz): the optimal objective of the Lagrangian dual is that of a partially convexified LP relaxation
- Theoretical equivalence with Lulli and Sen's (2004) branch-and-price (column generation) approach
- However, solving the Lagrangian dual also gives a solution to (1) (useful in branch and bound)
- The optimal value equals the optimal value of the linear program sketched below
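A hedged restatement of the proposition (the exact LP shown on the slide is not in the extracted text): the optimal value z_{LD} of the Lagrangian dual satisfies

z_{LD} = \min \Big\{ \sum_{i=1}^{r} p_i \left( c^\top x_i + q_i^\top y_i \right) \;:\; (x_i, y_i) \in \operatorname{conv}(S_i),\; i = 1,\dots,r,\;\; x_1 = \dots = x_r \Big\},

i.e. the relaxation in which each scenario's mixed-integer set S_i is replaced by its convex hull while non-anticipativity is kept, the “partially convexified” LP above.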
Optimization of the Lagrangian Dual
- Each D_i is concave and non-differentiable
- For a fixed multiplier vector, an optimal solution of the scenario subproblem yields a subgradient
- By solving/evaluating the subproblems (mixed-integer LPs), we get at least one subgradient for free
- Carøe and Schultz suggest using a proximal bundle black-box solver; alternatives are other variants of cutting-plane algorithms (boxstep or level regularization)
The cutting-plane/bundle loop (see the C++ sketch below):
- Use subgradients to form a piecewise-linear model of the objective function.
- Find the optimum of the model to determine the next trial point (master problem).
- Evaluate the trial point (MILP subproblems) to check the objective value for convergence. If not converged, update the model with new subgradient(s) and repeat.
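A minimal structural sketch of this loop in C++ (the language of our implementation), with hypothetical stubs solveMasterQP and evaluateScenarioMILP standing in for the master QP solve and the per-scenario SCIP solves; it shows the control flow only and omits serious/null-step logic and cut management, so it is not the authors' actual code:

#include <cstdio>
#include <vector>

struct Cut {
    double intercept;              // model value at the point where the cut was generated
    std::vector<double> subgrad;   // subgradient at that point
};

// Hypothetical stubs: in a real implementation these would call a structured QP
// solver (e.g. PIPS) for the master and a MILP solver (e.g. SCIP) per scenario.
std::vector<double> solveMasterQP(const std::vector<std::vector<Cut> >& bundle,
                                  const std::vector<double>& center, double proxWeight) {
    (void)bundle; (void)proxWeight;
    return center;                 // placeholder: returns the prox center unchanged
}

double evaluateScenarioMILP(int scen, const std::vector<double>& trial, Cut& cutOut) {
    (void)scen;
    cutOut.intercept = 0.0;
    cutOut.subgrad.assign(trial.size(), 0.0);
    return 0.0;                    // placeholder value of the scenario dual function
}

int main() {
    const int nScen = 4, nFirstStage = 3, maxIter = 50;
    const double tol = 1e-6, proxWeight = 1.0;

    std::vector<double> center(nFirstStage, 0.0);         // current prox center (multipliers)
    std::vector<std::vector<Cut> > bundle(nScen);         // cuts collected per scenario
    double bestLowerBound = -1e300;

    for (int iter = 0; iter < maxIter; ++iter) {
        // 1. Master problem: optimize the piecewise-linear model + prox term -> next trial point.
        std::vector<double> trial = solveMasterQP(bundle, center, proxWeight);

        // 2. Evaluate the trial point: one (MI)LP per scenario, each returns a value and a cut.
        double dualValue = 0.0;
        for (int i = 0; i < nScen; ++i) {
            Cut cut;
            dualValue += evaluateScenarioMILP(i, trial, cut);
            bundle[i].push_back(cut);                      // 3. update the model with the new cut
        }

        // 4. Simplified convergence test: stop when the dual value no longer improves.
        if (dualValue <= bestLowerBound + tol) break;
        bestLowerBound = dualValue;
        center = trial;                                    // "serious step"; null-step logic omitted
    }
    std::printf("best lower bound found: %g\n", bestLowerBound);
    return 0;
}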
Let’s open the proximal bundle black box
- Proximal bundle QP master (sketched below): optimize the cutting-plane model plus a quadratic proximal term; the center is the current trial point and the proximal weight is the regularization parameter
- The dual of this QP is typically advantageous for computation
- The proximal bundle master QP has a (dual) block-angular structure
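A generic form of the proximal bundle master (the slide's own notation is not in the extracted text): given, for each scenario i, a bundle B_i of previously evaluated multipliers \lambda^j with values D_i(\lambda_i^j) and subgradients g_i^j, the current trial (prox) center \hat\lambda, and a regularization parameter u > 0 (all assumed symbols), the master problem is

\max_{\lambda} \; \sum_{i=1}^{r} \min_{j \in B_i} \big\{ D_i(\lambda_i^j) + (g_i^j)^\top (\lambda_i - \lambda_i^j) \big\} \;-\; \frac{u}{2}\, \| \lambda - \hat\lambda \|^2,

whose optimizer is the next trial point. Introducing one auxiliary variable per scenario turns it into a QP, and it is the dual of that QP that exhibits the block-angular structure discussed on the next slide.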
Dual of the Proximal Bundle QP Master
- Block-angular structure
- A direct result of the equality-constrained (“state variable”) formulation; it does not hold for other formulations of non-anticipativity
- It does, however, carry over to other forms of regularization
Computational significance
- The master QP is sparse overall (but may have dense blocks)
- Dense QP solvers, which are typically used in black-box proximal bundle codes, cannot solve it efficiently
- The master QP's structure can be exploited: PIPS (Petra) or OOPS (Gondzio)
- With the ability to solve the master in parallel, we address a serial bottleneck of execution: now both the MILP subproblems and the QP master can be solved in parallel
- Greater potential for parallel speedup (Amdahl's law; see below)
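For reference (a standard formula, not from the slide): if a fraction s of each iteration's work remains serial, Amdahl's law bounds the speedup on N processes by

S(N) = \frac{1}{s + (1-s)/N} \le \frac{1}{s},

so solving the master QP in parallel shrinks s and raises the attainable speedup.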
Numerical experiments
- Implementation in C++ using MPI (see the sketch of the parallel evaluation pattern below)
- We look only at solving the Lagrangian dual; no branching implemented
- SCIP used for the MILP subproblems
- No “compression of the bundle” (removing cuts/subgradients)
- Serial experiments: cutting plane vs. black-box proximal bundle vs. our implementation
- Parallel experiments: scalability of the full proximal bundle algorithm; a more detailed look at the scalability of the parallel QP solver
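A minimal sketch, in C++/MPI, of the scenario-parallel evaluation pattern described above, with a hypothetical solveScenario stub in place of the SCIP calls and illustrative problem sizes; it is meant to show the communication pattern (scenarios partitioned across ranks, results combined with MPI_Allreduce), not the authors' implementation:

#include <mpi.h>
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a SCIP solve of scenario 'scen' at the given multiplier block;
// returns the scenario dual value and fills that scenario's subgradient block.
double solveScenario(int scen, const std::vector<double>& lambdaBlock,
                     std::vector<double>& subgradBlock) {
    (void)scen;
    subgradBlock.assign(lambdaBlock.size(), 0.0);
    return 0.0;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int nScen = 64, nFirstStage = 10;                // illustrative sizes only
    std::vector<double> lambda(nScen * nFirstStage, 0.0);  // multipliers, one block per scenario

    // Each MPI process owns the scenarios i with i % nprocs == rank and solves their MILPs.
    double localValue = 0.0;
    std::vector<double> localSubgrad(nScen * nFirstStage, 0.0);
    for (int i = rank; i < nScen; i += nprocs) {
        std::vector<double> block(lambda.begin() + i * nFirstStage,
                                  lambda.begin() + (i + 1) * nFirstStage);
        std::vector<double> g;
        localValue += solveScenario(i, block, g);
        std::copy(g.begin(), g.end(), localSubgrad.begin() + i * nFirstStage);
    }

    // Sum the per-process contributions so every process sees the full dual value and subgradient.
    double dualValue = 0.0;
    std::vector<double> subgrad(nScen * nFirstStage, 0.0);
    MPI_Allreduce(&localValue, &dualValue, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(localSubgrad.data(), subgrad.data(), static_cast<int>(subgrad.size()),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("dual value at current multipliers: %g\n", dualValue);
    MPI_Finalize();
    return 0;
}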
Test instances
- Stochastic mixed-integer instances dcap and sslp from SIPLIB, by Shabbir Ahmed
- Stochastic LP product instance from Huseyin Topaloglu
Test architecture
- “Fusion” high-performance cluster at Argonne: 320 nodes, InfiniBand QDR interconnect
- Two 2.6 GHz Xeon processors per node (8 cores total)
- Most nodes have 36 GB of RAM, some have 96 GB
- We use 1 MPI process (= parallel process) per core
Serial experiments (time limit 7200 seconds)
- OOQP - general sparsity-exploiting interior-point QP solver
- PIPS - specialized interior-point solver for block-angular structure
- ConicBundle - open-source, off-the-shelf proximal bundle code
Parallel experiments – serial master, parallel subproblems
Parallel experiments – parallel master, parallel subproblems
- Overall 10x speedup (vs. 2x speedup) on dcap332 by solving the master in parallel
- The master has little effect for sslp 10 15; there is also little speedup from solving the subproblems in parallel, due to imbalance in the time to solve the MILP subproblems
Scalability of the master QP
- The scope for parallelism in solving the master QP depends on the number of linking variables (= first-stage variables) being small relative to the size of the diagonal blocks (= subgradient cuts per scenario)
- The stochastic integer test problems have a very small first stage (on the order of 10 variables), so it is important to also consider performance on a larger first stage, as in practical problems
- The stochastic LP product instances are used for this (1,000 scenarios)
- We look only at the QP solves, not proximal bundle convergence
Small first stage
Medium first stage
Large first stage
Energy application – stochastic unit commitment
- State of Illinois power grid, 12-hour horizon, 64 scenarios
- 3,621,180 variables, 3,744,468 constraints, 3,132 binary
- LP relaxation objective: 939,208; with CglProbing cuts: 939,626; feasible solution (rounding): 942,237; optimality gap: 0.27% (0.5% is acceptable in industry practice)
- Lagrangian relaxation: 941,176; feasible solution (rounding): 943,351; optimality gap: 0.23% (combined: 0.11%)
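As an arithmetic check (assuming the gaps are computed as (upper bound - lower bound) / upper bound, which is an assumption about the convention used):

(942,237 - 939,626) / 942,237 ≈ 0.277%,  (943,351 - 941,176) / 943,351 ≈ 0.231%,

and combining the best bound with the best feasible solution, (942,237 - 941,176) / 942,237 ≈ 0.113%,

consistent (up to rounding) with the reported 0.27%, 0.23%, and combined 0.11% gaps.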
Conclusions and future work
- Revisited dual decomposition from the perspective of parallel computation; addressed the bottleneck of solving the master
- Dual decomposition is a promising approach for the parallel solution of stochastic mixed-integer programs
- More work is needed to address load imbalance, perhaps via asynchronism as in Linderoth and Wright
- Branch-and-bound implementation
- Large-scale computational study for stochastic unit commitment
[Diagram: power system overview – weather, system operator (ON/OFF decisions, generation levels), suppliers, distribution, demand, consumers]