1
Scalable Stochastic Programming
Cosmin G. Petra
Mathematics and Computer Science Division, Argonne National Laboratory
petra@mcs.anl.gov
Joint work with Mihai Anitescu, Miles Lubin, and Victor Zavala
2
Motivation
Sources of uncertainty in complex energy systems
– Weather
– Consumer demand
– Market prices
Applications @ Argonne – Anitescu, Constantinescu, Zavala
– Stochastic unit commitment with wind power generation
– Energy management of co-generation
– Economic optimization of a building energy system
3
Stochastic Unit Commitment with Wind Power
Wind forecast – WRF (Weather Research and Forecasting) model
– Real-time grid-nested 24h simulation
– 30 samples require 1h on 500 CPUs (Jazz @ Argonne)
Thermal unit schedule?
– Minimize cost
– Satisfy demand / adopt wind power
– Have a reserve
– Technological constraints
[Figure: wind farms and thermal generators feeding the grid]
Slide courtesy of V. Zavala & E. Constantinescu
4
Optimization under Uncertainty
Two-stage stochastic programming with recourse ("here-and-now"):
  min_x  c^T x + E_ξ[ Q(x, ξ) ]    subj. to  Ax = b,  x >= 0
  Q(x, ξ) = min_y  q(ξ)^T y        subj. to  W(ξ) y = h(ξ) - T(ξ) x,  y >= 0
The decision variables may be continuous or discrete.
Sample average approximation (SAA) – sampling and statistical inference over M batches of N scenarios:
  min_x  c^T x + (1/N) Σ_{i=1..N} Q(x, ξ_i)    subj. to  Ax = b,  x >= 0
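To make the SAA construction above concrete, here is a minimal numpy sketch on a toy newsvendor-style recourse problem (the costs and demand distribution are invented for the illustration and have nothing to do with the unit-commitment model):

```python
import numpy as np

# Toy two-stage problem (hypothetical data, for illustration only):
# first stage: order quantity x at unit cost c
# recourse:    shortage penalty b per unit of unmet demand,
#              holding cost h per unit of excess
c, b, h = 1.0, 4.0, 0.5

def recourse(x, demand):
    """Second-stage cost Q(x, xi) for a realized demand."""
    return b * np.maximum(demand - x, 0.0) + h * np.maximum(x - demand, 0.0)

def saa_objective(x, samples):
    """SAA objective: first-stage cost + sample average of the recourse."""
    return c * x + recourse(x, samples).mean()

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=3.0, sigma=0.4, size=1000)  # N sampled demand scenarios

# Minimize the SAA objective over a grid of candidate first-stage decisions.
grid = np.linspace(0.0, samples.max(), 2000)
values = np.array([saa_objective(x, samples) for x in grid])
x_saa = grid[values.argmin()]

# For this piecewise-linear recourse the SAA minimizer is a sample quantile.
x_quantile = np.quantile(samples, (b - c) / (b + h))
print(f"SAA solution (grid search): {x_saa:.2f}, quantile formula: {x_quantile:.2f}")
```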
5
Solving the SAA problem – the PIPS solver
Interior-point methods (IPMs)
– Polynomial iteration complexity (in theory); IPMs perform better in practice (infeasible primal-dual path-following)
– No more than 30-50 iterations have been observed for n less than 10 million
– We can confirm that this is still true for n a hundred times larger
– Two linear systems are solved at each iteration
– Direct solvers need to be used because the IPM linear systems are ill-conditioned and need to be solved accurately
We solve the SAA problems with a standard IPM (Mehrotra's predictor-corrector) and specialized linear algebra – the PIPS solver.
6
Linear Algebra of Primal-Dual Interior-Point Methods
Convex quadratic problem:
  min  (1/2) x^T Q x + c^T x    subj. to  Ax = b,  x >= 0
IPM linear system solved at each iteration (D is a diagonal IPM scaling matrix):
  [ Q + D   A^T ] [ Δx ]   [ r_1 ]
  [   A      0  ] [ Δy ] = [ r_2 ]
For two-stage SP this system is arrow-shaped (modulo a permutation): scenario blocks H_1, ..., H_S on the diagonal, coupled to the first-stage block H_0 through border blocks B_1, ..., B_S; S is the number of scenarios. For multi-stage SP the structure is nested.
7
7 The Direct Schur Complement Method (DSC)
Uses the arrow shape of H.
1. Implicit factorization: factor each scenario block H_i and form and factor the first-stage Schur complement
   C = H_0 - Σ_i B_i H_i^{-1} B_i^T
2. Solving Hz = r:
   2.1. Backward substitution (eliminate the scenario blocks from the right-hand side)
   2.2. Diagonal solve (solve with the Schur complement C for the first-stage unknowns)
   2.3. Forward substitution (recover the scenario unknowns)
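The block elimination above can be checked on a small dense example. The numpy sketch below builds a random arrow-shaped system (sizes and data invented for the illustration), forms the Schur complement C = H_0 - Σ_i B_i H_i^{-1} B_i^T, and verifies the three-step solve against a dense factorization; a real implementation would use sparse per-scenario factorizations such as MA57:

```python
import numpy as np

rng = np.random.default_rng(0)
S, n1, n0 = 4, 6, 3        # scenarios, per-scenario size, first-stage size

def spd(n):                # random symmetric positive definite block
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

H = [spd(n1) for _ in range(S)]                        # scenario diagonal blocks H_i
B = [rng.standard_normal((n0, n1)) for _ in range(S)]  # border blocks B_i
H0 = spd(n0)                                           # first-stage block

# Assemble the full arrow-shaped matrix (for verification only).
K = np.zeros((S * n1 + n0, S * n1 + n0))
for i in range(S):
    sl = slice(i * n1, (i + 1) * n1)
    K[sl, sl] = H[i]
    K[-n0:, sl] = B[i]
    K[sl, -n0:] = B[i].T
K[-n0:, -n0:] = H0

r = rng.standard_normal(S * n1 + n0)
r_i = [r[i * n1:(i + 1) * n1] for i in range(S)]
r_0 = r[-n0:]

# 1. Implicit factorization: Schur complement C = H0 - sum_i B_i H_i^{-1} B_i^T
C = H0 - sum(Bi @ np.linalg.solve(Hi, Bi.T) for Hi, Bi in zip(H, B))

# 2.1 Backward substitution: fold scenario parts of the rhs into the first stage
rhs0 = r_0 - sum(Bi @ np.linalg.solve(Hi, ri) for Hi, Bi, ri in zip(H, B, r_i))
# 2.2 Diagonal solve with the Schur complement
z0 = np.linalg.solve(C, rhs0)
# 2.3 Forward substitution: recover the scenario unknowns
z_i = [np.linalg.solve(Hi, ri - Bi.T @ z0) for Hi, Bi, ri in zip(H, B, r_i)]

z = np.concatenate(z_i + [z0])
print("max error vs. dense solve:", np.abs(z - np.linalg.solve(K, r)).max())
```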
8
Parallelizing DSC – 1. Factorization phase
Processes 1, 2, ..., p factor their scenario blocks in parallel (sparse linear algebra – MA57).
Factorization of the 1st-stage Schur complement matrix (dense linear algebra – LAPACK) = BOTTLENECK.
9
Parallelizing DSC – 2. Triangular solves
Processes 1, 2, ..., p perform the scenario backsolves in parallel (sparse linear algebra).
The 1st-stage backsolve (dense linear algebra) = BOTTLENECK.
10
Implementation of DSC
Factorization phase (processes 1, ..., p): scenario factorizations and backsolves, MPI_Allreduce of the Schur complement contributions, then a dense factorization and backsolve of the first-stage system.
Triangular-solve phase: forward substitutions for the local scenarios and a dense first-stage solve.
The first-stage computations are replicated on each process.
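A schematic mpi4py sketch of this pattern, heavily simplified relative to the C++ PIPS code (block sizes, data, and the round-robin scenario assignment are invented): each rank processes its scenarios, the Schur complement contributions are summed with MPI_Allreduce, and every rank then redundantly factors the first-stage matrix, which is exactly the replicated bottleneck discussed above:

```python
# Run with e.g.: mpiexec -n 4 python dsc_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

S, n1, n0 = 16, 50, 20                     # total scenarios, scenario size, first-stage size
my_scenarios = range(rank, S, nprocs)      # round-robin scenario distribution

def scenario_blocks(i):
    """Hypothetical scenario data: SPD block H_i and border block B_i."""
    r = np.random.default_rng(i)
    M = r.standard_normal((n1, n1))
    return M @ M.T + n1 * np.eye(n1), r.standard_normal((n0, n1))

H0 = np.eye(n0) * (10.0 * S)               # first-stage block (placeholder)

# Each rank accumulates its contribution to the Schur complement.
local_C = np.zeros((n0, n0))
for i in my_scenarios:
    Hi, Bi = scenario_blocks(i)
    local_C -= Bi @ np.linalg.solve(Hi, Bi.T)   # scenario factorization + backsolves

# Sum the contributions across ranks; afterwards every rank holds the full C.
C = np.empty_like(local_C)
comm.Allreduce(local_C, C, op=MPI.SUM)
C += H0

# Replicated dense factorization of C on every rank -- the DSC bottleneck.
L = np.linalg.cholesky(C)
if rank == 0:
    print("Schur complement factored redundantly on", nprocs, "ranks")
```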
11
Scalability of DSC
Unit commitment problem: 76.7% parallel efficiency, but this is not always the case.
With a large number of 1st-stage variables: only 38.6% efficiency on Fusion @ Argonne.
12
BOTTLENECK SOLUTION 1: STOCHASTIC PRECONDITIONER
13
The Stochastic Preconditioner
The Schur complement C is a sum of scenario contributions and therefore resembles a sample average over the scenarios.
The stochastic preconditioner (P. & Anitescu, COAP 2011): approximate C using an IID subset of n << N scenarios.
For C, use the constraint preconditioner (Keller et al., 2000).
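A toy illustration of the idea (not the actual PSC implementation; the matrix construction and sizes are invented): the "exact" matrix is an average over N scenario contributions, the preconditioner is built from a small IID subsample, and BiCGStab iteration counts are compared with and without it:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, bicgstab

rng = np.random.default_rng(0)
N, n_sub, dim = 200, 20, 100   # scenarios, subsample size, system size
base = np.diag(np.logspace(0, 4, dim))   # shared ill-conditioned part of every scenario

def contribution(i):
    """Hypothetical per-scenario SPD contribution: shared part + scenario noise."""
    G = np.random.default_rng(i).standard_normal((dim, dim)) / np.sqrt(dim)
    return base + G @ G.T

# "Exact" Schur-complement-like matrix: average over all scenarios.
C = sum(contribution(i) for i in range(N)) / N

# Stochastic preconditioner: average over a small IID subset of scenarios.
subset = rng.choice(N, size=n_sub, replace=False)
C_prec = sum(contribution(int(i)) for i in subset) / n_sub
prec = cho_factor(C_prec)
M = LinearOperator((dim, dim), matvec=lambda r: cho_solve(prec, r))

b = rng.standard_normal(dim)

def solve(Mop):
    its = [0]
    def cb(_): its[0] += 1
    _, info = bicgstab(C, b, M=Mop, callback=cb)
    return its[0], info

print("BiCGStab (iterations, info) without preconditioner:", solve(None))
print("BiCGStab (iterations, info) with stochastic preconditioner:", solve(M))
```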
14
Implementation of PSC
Processes 1, ..., p: scenario factorizations and backsolves; the preconditioner contributions are combined with an MPI_Reduce to process p+1, which computes the dense factorization of the preconditioner.
The Krylov solve runs on process 1, with the preconditioner triangular solves performed on process p+1 (MPI_Allreduce, MPI_Reduce, and MPI_Bcast move the intermediate vectors); processes 2, ..., p are partly idle, then perform backsolves and forward substitutions.
REMOVES the factorization bottleneck; leaves a slightly larger solve bottleneck.
15
The "Ugly" Unit Commitment Problem
DSC on P processes vs. PSC on P+1 processes (120 scenarios).
Optimal use of PSC – linear scaling.
The factorization of the preconditioner cannot be hidden anymore.
16
Quality of the Stochastic Preconditioner
"Exponentially" better preconditioning (P. & Anitescu, 2011).
Proof: Hoeffding's inequality.
Assumptions on the problem's random data (not restrictive):
1. Boundedness
2. Uniform full rank of the scenario constraint matrices
17
Quality of the Constraint Preconditioner
The preconditioned matrix has the eigenvalue 1 with high order of multiplicity; the remaining eigenvalues satisfy explicit bounds.
Proof based on Bergamaschi et al., 2004.
18
The Krylov Methods Used
BiCGStab with the constraint preconditioner M.
Preconditioned Projected CG (PPCG) (Gould et al., 2001)
– Preconditioned projection onto the null space of the constraints
– Does not compute a basis for the null space; instead, the projection is applied implicitly through solves with the constraint preconditioner.
19
Performance of the preconditioner
Eigenvalue clustering & Krylov iterations.
Affected by the well-known ill-conditioning of IPMs.
20
SOLUTION 2: PARALLELIZATION OF STAGE 1 LINEAR ALGEBRA
21
Parallelizing the 1st-stage linear algebra
We distribute the 1st-stage Schur complement system; C is treated as dense.
An alternative to PSC for problems with a large number of 1st-stage variables; removes the memory bottleneck of PSC and DSC.
The saddle-point system has a dense symmetric positive definite (1,1) block and a sparse full-rank constraint block.
We investigated ScaLAPACK and Elemental (the successor of PLAPACK):
– Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman);
– LU or Cholesky only;
– so we had to think of modifying either.
22
Cholesky-based LDL^T-like factorization
Can be viewed as an "implicit" normal equations approach.
In-place implementation inside Elemental: no extra memory needed.
Idea: modify the Cholesky factorization by changing the sign after processing p columns. This is much easier to do in Elemental, since it distributes elements, not blocks.
Twice as fast as LU.
Works for more general saddle-point linear systems, i.e., with a positive semi-definite (2,2) block.
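An unblocked, dense numpy sketch of the sign-flipping idea (the actual Elemental implementation is blocked, in-place, and distributed): a right-looking Cholesky-style loop whose diagonal sign switches from +1 to -1 once the first p columns of the saddle-point matrix have been processed:

```python
import numpy as np

def signed_cholesky(K, p):
    """Factor K = L @ D @ L.T with D = diag(+1 x p, -1 x (n-p)), L lower triangular.
    Unblocked right-looking sketch of the sign-flipping Cholesky idea."""
    W = K.astype(float).copy()
    n = W.shape[0]
    L = np.zeros_like(W)
    signs = np.where(np.arange(n) < p, 1.0, -1.0)
    for k in range(n):
        s = signs[k]
        L[k, k] = np.sqrt(s * W[k, k])      # s*W[k,k] > 0 for a valid saddle-point matrix
        L[k + 1:, k] = W[k + 1:, k] / (s * L[k, k])
        W[k + 1:, k + 1:] -= s * np.outer(L[k + 1:, k], L[k + 1:, k])
    return L, np.diag(signs)

# Toy saddle-point system: SPD (1,1) block Q (p x p), full-rank constraint block J (m x p).
rng = np.random.default_rng(0)
p, m = 8, 3
Q = rng.standard_normal((p, p)); Q = Q @ Q.T + p * np.eye(p)
J = rng.standard_normal((m, p))
K = np.block([[Q, J.T], [J, np.zeros((m, m))]])

L, D = signed_cholesky(K, p)
print("reconstruction error:", np.abs(L @ D @ L.T - K).max())
```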
23
Distributing the 1st-stage Schur complement matrix
All processes contribute to all of the elements of the (1,1) dense block, so a large amount of inter-process communication occurs; each term is too big to fit in a node's memory, and the communication can be more costly than the factorization itself.
Solution: collective MPI_Reduce_scatter calls
– Reduce (sum) the terms, then partition and send the pieces to their destinations (scatter)
– Elements need to be reordered (packed) to match the matrix distribution
– Columns of the Schur complement matrix are distributed as they are calculated
24
DSC with distributed first stage
The Schur complement matrix is computed and reduced block-wise (B blocks of columns).
For each block b = 1:B, the processes perform their scenario factorizations and backsolves and the contributions are combined with MPI_Reduce_scatter; the distributed first-stage factorization and dense solves are handled by Elemental, with MPI_Allreduce and forward substitutions completing the triangular solves.
25
Reduce operations
Streamlined copying procedure – Lubin and Petra (2010)
– Loop over contiguous memory and copy elements into the send buffer
– Avoids the division and modulus operations needed to compute the positions
"Symmetric" reduce for C
– Only the lower triangle is reduced
– Fixed buffer size, with a variable number of columns reduced per call
– Effectively halves the communication (both data volume and number of MPI calls)
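A minimal mpi4py sketch of the packed, symmetry-aware reduce (the buffer split is invented; PIPS packs to match Elemental's element-wise distribution): only the lower triangle of one block of columns is packed into a contiguous buffer and combined with a single MPI_Reduce_scatter call:

```python
# Run with e.g.: mpiexec -n 4 python reduce_scatter_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n0 = 12                     # first-stage (Schur complement) dimension, made up
j0, j1 = 0, 6               # one block of columns being reduced in this round

# Each rank's local contribution to this block of columns (hypothetical data).
rng = np.random.default_rng(rank)
local_cols = rng.standard_normal((n0, j1 - j0))

# Pack only the lower-triangle part of the block: for column j keep rows j..n0-1.
send = np.concatenate([local_cols[j0 + j:, j] for j in range(j1 - j0)])

# Split the packed buffer among ranks (in PIPS the split follows Elemental's
# element-wise distribution; here it is just split evenly).
counts = [len(send) // nprocs + (r < len(send) % nprocs) for r in range(nprocs)]
recv = np.empty(counts[rank])

# Sum the packed lower triangles across ranks and scatter the owned pieces.
comm.Reduce_scatter(send, recv, recvcounts=counts, op=MPI.SUM)

print(f"rank {rank} owns {recv.size} of the {send.size} packed lower-triangle entries")
```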
26
Large-scale performance
First-stage linear algebra: ScaLAPACK (LU), Elemental (LU), and the Cholesky-based LDL^T-like factorization.
Strong scaling of PIPS on Fusion, with more than 4,000 scenarios:
– 90.1% efficiency from 64 to 1024 cores
– 75.4% efficiency from 64 to 2048 cores
SAA problem: 82,000 1st-stage variables, 189 million variables in total, 1,000 thermal units, 1,200 wind farms.
Lubin, P., Anitescu, in OMS 2011.
27
Towards real-life models – Economic dispatch with transmission constraints
Current status: ISOs (independent system operators) use
– deterministic wind profiles, market prices, and demand
– network (transmission) constraints
– an outer simulation with a 1-hour timestep and a 24-hour horizon
– inner corrections with a 5-minute timestep and a 1-hour horizon
Stochastic ED with transmission constraints (V. Zavala et al., 2010)
– stochastic wind profiles & transmission constraints
– deterministic market prices and demand
– 24-hour horizon with a 1-hour timestep
– Kirchhoff's laws are part of the constraints
– The problem is huge: the KKT systems are 1.8 billion x 1.8 billion
[Figure: transmission network of generators and load nodes (buses)]
28
Solving ED with transmission constraints on Intrepid BG/P
32k wind scenarios (k = 1024) on 32k nodes (131,072 cores) of Intrepid BG/P.
Hybrid programming model: SMP inside MPI
– Sparse 2nd-stage linear algebra: WSMP (IBM)
– Dense 1st-stage linear algebra: Elemental with SMP BLAS, plus OpenMP for packing/unpacking buffers
Very good strong scaling for a 4-hour-horizon problem.
Lubin, P., Anitescu, Zavala – in proceedings of SC11.
29
Stochastic programming – a scalable computation pattern
Scenario parallelization in a hybrid MPI+SMP programming model
– DSC, PSC (1st stage < 10,000 variables)
Hybrid MPI/SMP running on Blue Gene/P
– 131k cores (96% strong scaling) for the Illinois ED problem with grid constraints; 2 billion variables, maybe the largest ever solved?
– Close to real-time solutions (24-hour horizon in 1 hour of wallclock time)
Further development is needed, since users aim for
– more uncertainty, more detail (x 10)
– faster dynamics, shorter decision windows (x 10)
– longer horizons (California == 72 hours) (x 3)
30
Thank you for your attention! Questions?