Presentation is loading. Please wait.

Presentation is loading. Please wait.

Predictive Load Balancing Using Mesh Adjacencies for Mesh Adaptation  Cameron Smith, Onkar Sahni, Mark S. Shephard  Scientific Computation Research Center.

Similar presentations


Presentation on theme: "Predictive Load Balancing Using Mesh Adjacencies for Mesh Adaptation  Cameron Smith, Onkar Sahni, Mark S. Shephard  Scientific Computation Research Center."— Presentation transcript:

1 Predictive Load Balancing Using Mesh Adjacencies for Mesh Adaptation  Cameron Smith, Onkar Sahni, Mark S. Shephard  Scientific Computation Research Center Rensselaer Polytechnic Institute SIAM CSE 2013 MS105 Frameworks, Algorithms and Scalable Technologies for Mathematics on Next-generation Computers February 26, 2013

2 Outline 2 Simulation Based Engineering Workflow Parallel Unstructured Mesh Infrastructure (PUMI) ParMA Predictive Load Balancing Estimating change in mesh size Part merging Part splitting Multi-criteria Partition Improvement Closing Remarks

3 Simulation Based Engineering Workflow 3 Problem Definition Mesh Generation PDE Analysis Adaptation Post Processing

4 Partition model Geometric model Parallel Unstructured Mesh Infrastructure (PUMI) 4 Fields distribution of solution over mesh Mesh analysis domain 0-3D topological entities and adjacencies distribution of mesh across computing resources Parallel Control communication utilities

5 PUMI Mesh Representation 5 Mesh entities: vertex (0D), edge (1D), face (2D), or region (3D) Adjacencies: How the mesh entities connect to each other Complete representation: store sufficient entities and adjacencies to get any adjacency in O(1) time Partition Model: Topological partition information Relationship to mesh entities – classification Regions Faces Vertices Edges

6 Partitioning using Mesh Adjacencies (ParMA) 6 Mesh and partition model adjacencies represent application data more completely then standard graph-partitioning models. All mesh entities can be considered, while graph-partitioning models use only a subset of mesh adjacency information. Any adjacency can be obtained in O(1) time (assuming use of a complete mesh adjacency structure). Advantages Avoid graph construction (assuming you have complete representation) Directly account for multiple entity types – important for the solve process - typically the most computationally expensive step Easy to use with diffusive procedures Disadvantage Lack of well developed algorithms for more global parallel partitioning operations directly from mesh adjacencies Regions Faces Vertices Edges

7 Parallel unstructured mesh adaptation typically generates parts with 400% or more imbalance on non-trivial geometries due to local coarsening & refinement. Refining then repartitioning can exceed available memory in some processes, even if the system’s total memory is sufficient. Goal: Fast and scalable redistribution procedure to prevent memory exhaustion 120 parts with ~30% of the average load ~20 parts with > 200% imbalance, peak imbalance is ~430% Predictive Load Balancing for Mesh Adaptation 7 Histogram of element imbalance in 1024 part adapted mesh on Onera M6 wing if no load balancing is applied prior to adaptation.

8 Simulation Based Engineering Workflow 8 PDE Analysis Redistribute Adapt Mesh Application Specific Load Balancing

9 Graph-based procedure Estimate change in mesh size - set element weights quantifying change Run Zoltan graph/hyper-graph based partitioning on weighted mesh Adapt the mesh Run Zoltan graph/hyper-graph based partitioning and/or a partition improvement procedure to attain optimal partition Observations Parallel construction and balancing of graph with small cuts takes reasonable time Re-partitioning s.t. adaptation does not exceed per-process memory limits should be cheaper then a full re-partition Graph-Based Predictive Load Balancing 9

10 ParMA: Predictive Load Balancing (PLB) Mesh adjacency-based procedure Estimate change in mesh size Merge parts that will be coarsened to create some empty parts. Split parts with substantial refinement into the empty parts to remove imbalance spikes of refined mesh. Refine/coarsen the mesh. Apply ParMA’s Multi-criteria partition improvement 10 120 parts with ~30% of the average load ~20 parts with > 200% imbalance, peak imbalance is ~430% Histogram of element imbalance in 1024 part adapted mesh on Onera M6 wing if no load balancing is applied prior to adaptation.

11 ParMA:PLB – Estimating Mesh Size 11

12 ParMA:PLB – Part Merging 12 Add example figures!!

13 ParMA:PLB – Part Splitting 13 Add example figures!!

14 ParMA:PLB – Status Initial Testing and Ongoing Developments Predictive load balancing tests run on 1024 parts on RPI BlueGene Q (64 nodes, 16 ranks/node) Analytic mesh adaptation size field - 17e6 regions to 214e6 regions Prevents exhaustion of process memory – reduces peak predicted imbalance from 200% to 150% 0.4 seconds to make merging and splitting decisions Optimized migration procedure for part merging – simpler logic then current ‘generalized’ migration procedure 14

15 ParMA: Multi-Criteria Partition Improvement 15 Parallel simulation requires that the mesh be distributed with equal work-load and minimum inter-part communications Observations on graph-based dynamic balancing Parallel construction and balancing of graph with small cuts takes reasonable time Graph/hyper-graph partitions are powerful for unstructured meshes, however they use one order (as in 0,1,2,3) of mesh entity as the graph nodes, hence the balance of other mesh entities may not be optimal Accounting for multiple criteria and or multiple interactions is not obvious Hypergraphs allows edges to connect more that two vertices – has been used to help account for migration communication costs Schloegel and Karypis (2002) discuss an effective optimization method for three, or fewer, constraints

16 ParMA: Multi-Criteria Partition Improvement 16 Improve scaling of applications by reducing imbalances through exchange of mesh regions between neighboring parts Current algorithm focused on improved scalability of the solve by accounting for balance of multiple entity types Imbalance is limited to a small number of heavily loaded parts, referred to as spikes, which limit the scalability of applications Application defined priority list of entity types such that imbalance of high priority types is not increased when balancing lower priority types

17 ParMA: Multi-Criteria Partition Improvement 17 Input: Priority list of mesh entity types to be balanced (region, face, edge, vertex) Partitioned mesh with complete representation and communication, computation and migration weights for each entity Algorithm: From high to low priority if separated by ‘>’ (different groups) From low to high dimension entity types if separated by ‘=’ (same group)  Compute migration schedule (Collective)  Select regions for migration (Embarrassingly Parallel)  Migrate selected regions (Collective) Ex) “Rgn>Face=Edge>Vtx” is the user’s input Step 1: improve balance for mesh regions Step 2.1: improve balance for mesh edges Step 2.2: improve balance for mesh faces Step 3: improve balance for mesh vertices Mesh element selection

18 ParMA: Multi-Criteria Partition Improvement (Zhou) 133M region mesh on 16k parts Entity imbalances Run time (Jaguar: Cray XT5) 18 PHASTA CFD on 288k cores of JuGene BG/P, 1B rgn mesh (Rasquin) Improves strong scaling from 0.88 to 0.95 with ParMA Vtx>Rgn Application inputs

19 ParMA: Multi-Criteria Partition Improvement – Status Testing weight based imbalance for predictive load balancing Partition improvement for PHASTA meshes with billions of elements and more then half a million parts Apply ParMA merging and splitting techniques to reduce vertex imbalance spikes – careful consideration of part surface area changes Effect of disconnected parts – sacrifice communication for improved computation balance Diffusion across partition model edges Account for load and communication balance changes when entities are classified on partition model edges and selected for migration 19

20 Closing Remarks Successfully leveraging mesh adjacency and partition model information Making quick decisions Demonstrated partition improvement Future ParMA Developments Supporting Two-level partitioning Finding efficient combination of procedures to rebalance to compute nodes and then within each node See Dan Ibanez’s MS120 talk at 10 am in Hancock (Lobby Level) for details on the MPI + Pthreads approach for PUMI More Information Online PUMI: http://www.scorec.rpi.edu/FMDB/ ParMA: http://redmine.scorec.rpi.edu/projects/parma SCOREC: http://www.scorec.rpi.edu FASTMath: http://www.fastmath-scidac.org 20

21 Thank You 21 For More Information Contact: smithc11@rpi.edu


Download ppt "Predictive Load Balancing Using Mesh Adjacencies for Mesh Adaptation  Cameron Smith, Onkar Sahni, Mark S. Shephard  Scientific Computation Research Center."

Similar presentations


Ads by Google