AstroBEAR Parallelization Options
Areas With Room For Improvement
Ghost Zone Resolution
MPI Load-Balancing
Re-Gridding Algorithm
Upgrading MPI Library
Ghost Zone Resolution
Can exceed 30% of total program execution time.
Affects fixed-grid runs as well as AMR.
For runs using more than 2 processors, 98-99% of ghost zone execution time is MPI processing.
Ghost Zone Resolution Options: Duplex Transmission
The old version swaps ghost zone data serially between two processors; duplex transmission would have the two processors handle sending, receiving, and copying concurrently.
Pros: Reduces the amount of duplicated overhead. Makes more efficient use of worker processors.
Cons: Little reduction in the amount of MPI overhead. Still has a high computation cost relative to the number of nodes.
Status: In progress
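A minimal sketch of what a concurrent (duplex) pairwise exchange could look like using nonblocking MPI calls; the buffer names, count, and tag are placeholders for illustration, not AstroBEAR's actual ghost-zone data structures.

/* Hedged sketch: pairwise duplex ghost-zone exchange with nonblocking MPI.
 * send_buf/recv_buf and nghost are illustrative stand-ins. */
#include <mpi.h>

void exchange_ghost_zones(double *send_buf, double *recv_buf,
                          int nghost, int neighbor_rank, MPI_Comm comm)
{
    MPI_Request reqs[2];

    /* Post the receive and the send together, so both directions of the
     * exchange are in flight at once instead of one rank waiting on the other. */
    MPI_Irecv(recv_buf, nghost, MPI_DOUBLE, neighbor_rank, 0, comm, &reqs[0]);
    MPI_Isend(send_buf, nghost, MPI_DOUBLE, neighbor_rank, 0, comm, &reqs[1]);

    /* Both ranks can overlap local packing/copying here before waiting. */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}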
Alternate Option: Ghost Zone Broadcast
Use the MPI broadcast routines to have a grid send all of its ghost zones to its neighbors at once; the neighbors process that data and broadcast their own ghost zones when their turn comes.
Pros: Eliminates the need for pairwise iteration over the level (i.e., the transfer would only be done once per grid).
Cons: Potential congestion if all of a grid's neighbors are on the same processor. No guarantee that it is an improvement over pairwise duplex transmission.
Status: Speculative
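A sketch of the broadcast idea, assuming a per-grid communicator built from the ranks that own the grid and its neighbors; make_neighbor_comm, the rank list, and the packed ghost buffer are hypothetical names, not existing AstroBEAR routines.

#include <mpi.h>

/* Build a communicator containing this grid's owner plus its neighbors.
 * neighbor_ranks (including the owner) is an assumed input list.
 * Note: MPI_Comm_create is collective over the parent communicator. */
MPI_Comm make_neighbor_comm(int *neighbor_ranks, int n, MPI_Comm parent)
{
    MPI_Group parent_group, nbr_group;
    MPI_Comm nbr_comm;

    MPI_Comm_group(parent, &parent_group);
    MPI_Group_incl(parent_group, n, neighbor_ranks, &nbr_group);
    MPI_Comm_create(parent, nbr_group, &nbr_comm);

    MPI_Group_free(&nbr_group);
    MPI_Group_free(&parent_group);
    return nbr_comm;   /* MPI_COMM_NULL on ranks not in the group */
}

/* One collective call replaces a pairwise exchange per neighbor:
 * the owner's packed ghost zones land on every neighbor at once. */
void broadcast_ghost_zones(double *ghost_buf, int nghost,
                           int owner_rank_in_comm, MPI_Comm nbr_comm)
{
    MPI_Bcast(ghost_buf, nghost, MPI_DOUBLE, owner_rank_in_comm, nbr_comm);
}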
Load Balancing
Does it need to be done as often? The Ramses code only rebalances every ten frames: re-gridding still happens locally as usual, but it is assumed that the AMR structure does not change enough between two iterations to warrant a load rebalance.
Pros: Significant reduction in MPI overhead (BalanceLoads() gets called a lot). Non-MPI overhead will likely be reduced as well, since the current load-balancing scheme recalculates the load across the entire Forest.
Cons: "Patch-based AMR" vs. "tree-based AMR": can the approach be adapted to AstroBEAR? It also requires implementing a Hilbert space-filling-curve ordering; how complex and computationally intensive would that be?
Status: Speculative
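For reference, the Ramses-style scheme orders grids along a Hilbert space-filling curve and assigns each processor a contiguous segment of that ordering. Below is the standard 2D Hilbert-index routine (a 3D analogue would be needed in practice), included only to give a feel for the complexity involved; it is illustrative, not code from Ramses or AstroBEAR.

/* Standard xy-to-Hilbert-index mapping on an n x n mesh (n a power of two).
 * Grids sorted by this key and split into contiguous chunks give a
 * locality-preserving partition across processors. */
static void rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {          /* reflect the quadrant */
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x;             /* swap x and y */
        *x = *y;
        *y = t;
    }
}

long hilbert_index(int n, int x, int y)
{
    long d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += (long)s * s * ((3 * rx) ^ ry);
        rot(n, &x, &y, rx, ry);
    }
    return d;   /* 1D position of cell (x, y) along the Hilbert curve */
}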
Re-Gridding Parallelization
Parallelization of re-gridding is handled using MPI and OpenMP.
Problem: MPI-1 limits thread usage.
Only one thread for the worker processors and two for the master processor.
Only one thread on each processor is MPI-capable.
Performance bottlenecks happen if one processor gets tied up.
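A sketch of the funneled hybrid pattern this restriction forces: OpenMP threads share the local re-gridding work, while a single thread per process performs all MPI communication. The function name, loop body, and reduction are illustrative stand-ins, not AstroBEAR's actual re-gridding code.

#include <mpi.h>
#include <omp.h>

void regrid_step(double *local_work, int n, MPI_Comm comm)
{
    #pragma omp parallel
    {
        /* All threads share the local (non-MPI) re-gridding work. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            local_work[i] *= 2.0;           /* stand-in for real computation */

        /* Only the master thread touches MPI, matching the
         * "one MPI-capable thread per processor" restriction. */
        #pragma omp master
        {
            double local_sum = 0.0, global_sum = 0.0;
            for (int i = 0; i < n; i++)
                local_sum += local_work[i];
            MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, comm);
        }
        #pragma omp barrier   /* other threads idle here; a stalled process stalls them all */
    }
}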
Advantage of Multiple Threads
[Diagrams: "MPI with OpenMP, multi-thread" vs. "MPI with OpenMP, single thread"]
Unfortunately... LAM MPI Is Not Thread-Safe
You can write multi-threaded applications using LAM MPI, but it is explicitly not thread-safe, so we would be responsible for maintaining MPI exclusion ourselves. In a collaborative development environment like AstroBEAR, this is a bad idea. LAM is making noise about supporting this eventually, but they're not there yet.
Alternatives: Improve the efficiency of pairwise message passing. Offload more re-gridding computation to worker processors.
Status: We're looking at it.
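For illustration, "maintaining MPI exclusion" ourselves would mean something like the wrapper below: every MPI call made from threaded code has to go through one shared critical section, and a single call site that bypasses the wrapper silently breaks the scheme. The wrapper name is hypothetical.

#include <mpi.h>
#include <omp.h>

/* Wrapper every developer would have to remember to use instead of calling
 * MPI_Send directly, since LAM MPI provides no internal locking of its own. */
static int guarded_send(void *buf, int count, MPI_Datatype type,
                        int dest, int tag, MPI_Comm comm)
{
    int err;
    /* All MPI traffic from any thread serializes on this one named section. */
    #pragma omp critical(mpi_exclusion)
    {
        err = MPI_Send(buf, count, type, dest, tag, comm);
    }
    return err;
}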