Stochastic Optimization of Complex Energy Systems on High-Performance Computers
Cosmin G. Petra
Mathematics and Computer Science Division, Argonne National Laboratory
petra@mcs.anl.gov
SIAM CSE 2013
Joint work with Olaf Schenk (USI Lugano), Miles Lubin (MIT), Klaus Gaertner (WIAS Berlin)
Outline
- Application of HPC to power-grid optimization under uncertainty
- Parallel interior-point solver (PIPS-IPM) – structure exploiting
- Revisiting linear algebra
- Experiments on BG/P with the new features
Stochastic unit commitment with wind power
- Wind forecast – WRF (Weather Research and Forecasting) model
  – Real-time grid-nested 24h simulation
  – 30 samples require 1h on 500 CPUs (Jazz@Argonne)
(Figure labels: wind farm, thermal generator)
Slide courtesy of V. Zavala & E. Constantinescu
Stochastic Formulation
A discrete distribution leads to a block-angular LP.
Large-scale (dual) block-angular LPs
In the terminology of stochastic LPs:
- First-stage variables (decision now): x_0
- Second-stage variables (recourse decisions): x_1, ..., x_N
- Each diagonal block is a realization of a random variable (a scenario)
- Writing out all scenarios gives the extensive form
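For reference, a minimal LaTeX sketch of the extensive form under these conventions; the recourse matrices W_i match the scenario blocks mentioned on the next slide, while the coupling matrices T_i, the probabilities p_i, and the first-stage matrix A are notational assumptions:

```latex
% Dual block-angular (two-stage) LP in extensive form; notation partly assumed.
\begin{aligned}
\min_{x_0, x_1, \dots, x_N}\quad & c_0^{\top} x_0 \;+\; \sum_{i=1}^{N} p_i\, c_i^{\top} x_i \\
\text{s.t.}\quad & A\, x_0 = b_0, \\
& T_i\, x_0 + W_i\, x_i = b_i, \qquad i = 1, \dots, N, \\
& x_0 \ge 0, \quad x_i \ge 0, \qquad\;\; i = 1, \dots, N.
\end{aligned}
```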
Computational challenges and difficulties
- May require many scenarios (100s, 1,000s, 10,000s, ...) to accurately model uncertainty
- "Large" scenarios (W_i up to 250,000 × 250,000)
- "Large" first stage (1,000s, 10,000s of variables)
- Easy to build a practical instance that requires 100+ GB of RAM to solve
- Requires distributed memory
- Real-time solution needed in our applications
Linear algebra of primal-dual interior-point methods (IPM)
- The convex quadratic problem (minimize a convex quadratic objective subject to linear constraints) is solved by the IPM, which requires the solution of a linear system at each iteration.
- Two-stage SP: the linear system is arrow-shaped (modulo a permutation).
- Multi-stage SP: the structure is nested; N is the number of scenarios.
- 2 solves per IPM iteration: predictor directions and corrector directions.
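As a textbook sketch (the exact signs, scaling, and residual definitions in PIPS-IPM may differ): for a convex QP with equality constraints and bounds, each interior-point iteration factorizes one augmented KKT matrix and reuses the factors for both the predictor and the corrector solve:

```latex
% Convex QP:  min (1/2) x^T Q x + c^T x   s.t.  A x = b,  x >= 0.
% Augmented (KKT) system solved at every IPM iteration, with
% X = diag(x), Z = diag(z) built from the current primal/dual iterates:
\begin{bmatrix} Q + X^{-1} Z & A^{\top} \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
% The same factorization serves the two solves per iteration;
% only the predictor and corrector right-hand sides differ.
```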
Special Structure of the KKT System (Arrow-shaped)
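A sketch of the arrow shape referred to here, with assumed block names: K_i for the scenario (second-stage) KKT blocks, B_i for the border blocks coupling each scenario to the first stage, and K_0 for the first-stage block:

```latex
K \;=\;
\begin{bmatrix}
K_1    &        &        & B_1^{\top} \\
       & \ddots &        & \vdots     \\
       &        & K_N    & B_N^{\top} \\
B_1    & \cdots & B_N    & K_0
\end{bmatrix},
\qquad
C \;=\; K_0 - \sum_{i=1}^{N} B_i\, K_i^{-1} B_i^{\top}
% C is the (first-stage) Schur complement that the solution procedure
% on the next slide assembles and factorizes.
```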
Parallel Solution Procedure for the KKT System
- Steps 1 and 5 are trivially parallel ("scenario-based decomposition").
- Steps 1, 2, and 3 account for >95% of the total execution time.
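A hedged, single-process NumPy sketch of the five-step procedure. The step labels follow the later slides (Step 2 assembles the global Schur complement C, Step 3 factorizes it, Steps 1 and 5 are scenario-local); the dense matrices and the block names K_i, B_i, K_0 are stand-ins for the sparse, MPI-distributed objects in PIPS-IPM.

```python
# Hedged sketch of the Schur-complement (scenario-based) decomposition.
# Real PIPS-IPM uses sparse symmetric-indefinite factorizations (PARDISO)
# and MPI; here everything is dense NumPy for illustration only.
import numpy as np

def schur_solve(K0, Ks, Bs, r0, rs):
    """Solve the arrow-shaped system
         K_i x_i + B_i^T x_0 = r_i,        i = 1..N
         sum_i B_i x_i + K_0 x_0 = r_0
    via the Schur complement of the scenario blocks."""
    C = K0.copy()
    r = r0.copy()
    for Ki, Bi, ri in zip(Ks, Bs, rs):
        # Step 1 (per scenario, embarrassingly parallel): local factorization
        # and local Schur-complement contribution B_i K_i^{-1} B_i^T.
        contrib = Bi @ np.linalg.solve(Ki, Bi.T)
        # Step 2: assemble the global Schur complement C and the reduced
        # right-hand side (an MPI reduction in the parallel code).
        C -= contrib
        r -= Bi @ np.linalg.solve(Ki, ri)
    # Step 3: factorize C; Step 4: first-stage solve.
    x0 = np.linalg.solve(C, r)
    # Step 5 (per scenario, embarrassingly parallel): recover each x_i.
    xs = [np.linalg.solve(Ki, ri - Bi.T @ x0) for Ki, Bi, ri in zip(Ks, Bs, rs)]
    return x0, xs
```

Steps 1 and 5 (and the local part of Step 2) are what each MPI process performs for its own scenarios; only the first-stage system in Steps 3–4 is shared.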
Components of Execution Time (note the break in the y-axis scale)
Scenario Calculations – Steps 1 and 5
- Each scenario is assigned to an MPI process, which locally performs Steps 1 and 5.
- The scenario matrices are sparse and symmetric indefinite (symmetric, with both positive and negative eigenvalues).
- Computing a scenario's Schur-complement contribution is very expensive: it requires solving with the factors of the scenario KKT block against the non-zero columns of the border block and multiplying the result from the left by the border block.
- 4 hours 10 minutes of wall time to solve a 4h-horizon problem with 8k scenarios on 8k nodes.
- Need to run under strict time requirements – for example, solve the 24h-horizon problem in less than 1h.
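In symbols (block names as in the earlier sketches, an assumed notation): with a sparse LDL^T factorization of the scenario block, forming its Schur-complement contribution costs a pair of triangular solves for every non-zero column of the border block:

```latex
K_i = L_i D_i L_i^{\top}
\quad\Longrightarrow\quad
B_i K_i^{-1} B_i^{\top}
  = \bigl(L_i^{-1} B_i^{\top}\bigr)^{\top} D_i^{-1} \bigl(L_i^{-1} B_i^{\top}\bigr)
% i.e. one sparse triangular solve per non-zero column of B_i^T, followed by
% the multiplication from the left -- the dominant cost in Steps 1-2.
```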
Revisiting scenario computations for shared memory
- Multiple sparse right-hand sides.
- The triangular-solve phase is hard to parallelize in shared memory (multi-core).
- The factorization phase speeds up very well and achieves considerable peak performance.
- Our approach: an incomplete factorization of the augmented scenario matrix – stop the factorization after the elimination of the (1,1) block; the Schur complement will then sit in the (2,2) block.
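The identity behind the trick, in the assumed notation: append the border block to the scenario system and stop the elimination after the (1,1) block; what remains in the (2,2) position is exactly the scenario's Schur-complement contribution, computed inside the highly optimized factorization kernels instead of by triangular solves:

```latex
\begin{bmatrix} K_i & B_i^{\top} \\ B_i & 0 \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ B_i K_i^{-1} & I \end{bmatrix}
\begin{bmatrix} K_i & 0 \\ 0 & -\,B_i K_i^{-1} B_i^{\top} \end{bmatrix}
\begin{bmatrix} I & K_i^{-1} B_i^{\top} \\ 0 & I \end{bmatrix}
% Eliminating only the (1,1) block leaves  -B_i K_i^{-1} B_i^T  in the
% (2,2) block, i.e. the quantity needed for the global Schur complement.
```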
Implementation
- Requires modification of the linear solver PARDISO (Schenk) → PARDISO-SC.
- Pivot perturbations during factorization are needed to maintain numerical stability.
- Ordinarily, the errors due to perturbations are absorbed by iterative refinement; this would be extremely expensive in our case (many right-hand sides).
- Instead, we let the errors propagate into the "global" Schur complement C (Step 2) and factorize the perturbed C (Step 3).
- After Steps 1, 2, and 3, we therefore have the factorization of an approximation of the KKT matrix.
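A sketch of why Steps 1–3 yield only an approximate factorization (a standard pivot-perturbation argument, not taken verbatim from the talk):

```latex
% The solver replaces tiny pivots during factorization, so the computed
% factors correspond to slightly perturbed scenario blocks
\tilde K_i = K_i + \Delta K_i,
\qquad \|\Delta K_i\| \;\text{small (on the order of the perturbation threshold)},
% hence the assembled and factorized Schur complement is the perturbed
\tilde C = K_0 - \sum_{i=1}^{N} B_i\, \tilde K_i^{-1} B_i^{\top},
% and Steps 1-3 together provide an exact factorization of an approximation
% of the true KKT matrix K -- the preconditioner used on the next slide.
```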
Pivot-error absorption by preconditioned BiCGStab
- We still have to solve with the exact KKT matrix K, but we only have factors of an approximation of it.
- "Absorb the errors" by solving Kz = r using preconditioned BiCGStab; numerical experiments showed this is more robust than iterative refinement.
- The preconditioner is the approximate factorization obtained in Steps 1–3.
- Each BiCGStab iteration requires:
  – 2 matrix-vector products with K
  – 2 applications of the preconditioner
- One application of the preconditioner amounts to performing the "solve" Steps 4 and 5 with the approximate factors.
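A hedged SciPy sketch of this error-absorption step. PIPS-IPM runs its own BiCGStab on top of the distributed factorization; the dense matrices, the `apply_Ktilde_inv` callback, and the tolerances below are illustrative stand-ins only.

```python
# Hedged sketch: absorb pivot-perturbation errors by solving K z = r with
# BiCGStab, preconditioned by the approximate factorization (whose
# application corresponds to the "solve" Steps 4 and 5).
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab

def absorb_errors(K_matvec, apply_Ktilde_inv, r):
    n = r.shape[0]
    K = LinearOperator((n, n), matvec=K_matvec, dtype=float)           # exact K
    M = LinearOperator((n, n), matvec=apply_Ktilde_inv, dtype=float)   # ~K^{-1}
    z, info = bicgstab(K, r, M=M, atol=0.0)
    if info != 0:
        raise RuntimeError(f"BiCGStab did not converge (info={info})")
    return z

# Tiny self-contained usage example: K is exact, Ktilde mimics a factorization
# obtained after small diagonal (pivot) perturbations.
rng = np.random.default_rng(0)
n = 50
G = rng.standard_normal((n, n))
K = G @ G.T + n * np.eye(n)
Ktilde = K + 1e-6 * np.diag(rng.random(n))
Ktilde_inv = np.linalg.inv(Ktilde)          # stand-in for the stored factors
r = rng.standard_normal(n)
z = absorb_errors(lambda v: K @ v, lambda v: Ktilde_inv @ v, r)
print("residual:", np.linalg.norm(K @ z - r))
```

With a good preconditioner the Krylov iteration count stays tiny, which is consistent with the 0 to 1.5 BiCGStab iterations reported in the experiments later in the talk.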
Summary of the new approach
Test architecture
- "Intrepid" Blue Gene/P supercomputer
  – 40,960 nodes
  – Custom interconnect
  – Each node has a quad-core 850 MHz PowerPC processor and 2 GB of RAM
- DOE INCITE Award 2012-2013 – 24 million core-hours
Numerical experiments
- 4h (UC4), 12h (UC12), and 24h (UC24) horizon problems
- 1 scenario per node (4 cores per scenario)
- Large-scale: 12h horizon, up to 32k scenarios and 128k cores (k = 1,024)
  – 16k scenarios: 2.08 billion variables, 1.81 billion constraints, KKT system size = 3.89 billion
- LAPACK + SMP ESSL BLAS for first-stage linear systems
- PARDISO-SC for second-stage linear systems
Compute SC Times
Time per IPM iteration
- UC12, 32k scenarios, 32k nodes (128k cores)
- BiCGStab iteration count ranges from 0 to 1.5
- The cost of absorbing the factorization perturbation errors is between 10 and 30% of the total iteration cost
Solve to completion – UC12

Nodes/scenarios   Wall time (sec)   IPM iterations   Time per IPM iteration (sec)
4096              3548.5            103              33.57
8192              3883.7            112              34.67
16384             4208.8            123              34.80
32768             4781.7            133              35.95

Before: 4 hours 10 minutes of wall time to solve the UC4 problem with 8k scenarios on 8k nodes.
Now: the UC12 problem at the same scale (8k scenarios on 8k nodes) solves to completion in 3883.7 seconds, about 65 minutes.
Weak scaling
Strong scaling
Conclusions and Future Considerations
- The multicore-friendly reformulation of the sparse linear algebra computations leads to execution times that are one order of magnitude faster:
  – fast factorization-based computation of the SC
  – robust and cheap pivot-error absorption via Krylov iterative methods
- The parallel efficiency of PIPS remains good.
- Performance evaluation on today's supercomputers:
  – IBM BG/Q
  – Cray XK7, XC30