CASC

Introducing Cooperative Parallelism

John May, David Jefferson, Nathan Barton, Rich Becker, Jarek Knap, Gary Kumfert, James Leek, John Tannahill
Lawrence Livermore National Laboratory

Presented to the CCA Forum, 25 Jan 2007

This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-PRES-XXXXXX.

Outline
- Challenges for massively parallel programming
- Cooperative parallel programming model
- Applications for cooperative parallelism
- Cooperative parallelism and Babel
- Ongoing work

Massive parallelism strains SPMD
New techniques needed to fill the gap
- Increasingly difficult to make all processors work in lock-step
  - Lack of inherent parallelism
  - Load balance
- New techniques need a richer programming model than pure SPMD with MPI
  - Adaptive sampling
  - Multi-model simulation (e.g., components)
- Fault tolerance requires better process management
  - Need a smaller unit of granularity for failure recovery and checkpoint/restart

Introducing Cooperative Parallelism
- A computational job consists of multiple interacting “symponents”
  - Large parallel (MPI) jobs or single processes
  - Created and destroyed dynamically
  - Appear as objects to each other
  - Communicate through remote method invocation (RMI)
- Apps can add symponents incrementally
- Designed to complement MPI, not replace it!
[Figure labels: Parallel symponent using MPI internally; Ad hoc symponent creation and communication; Runtime system]

Cooperative parallelism features
- Three synchronization styles for RMI
  - Blocking (caller waits for return)
  - Nonblocking (caller checks later for result)
  - One-way (caller dispatches request and has no further interaction)
- Target of RMI can be a single process or a parallel job, with parameters distributed to all tasks
- Closely integrated with Babel framework
  - Symponents written in C, C++, Fortran, F90, Java, and Python interact seamlessly
  - Developer writes interface description files to specify RMI interfaces
  - Exceptions propagated from remote methods
  - Object-oriented structure lets symponents inherit capabilities and interfaces from other symponents

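The three synchronization styles can be illustrated with a minimal local sketch. This is not the Babel or Co-op API: `SymponentStub` and its method names are hypothetical, and a single worker thread stands in for the remote symponent.

```python
from concurrent.futures import ThreadPoolExecutor


class SymponentStub:
    """Hypothetical local stand-in for a remote symponent (not the Co-op API)."""

    def __init__(self):
        # A single worker thread plays the role of the remote process,
        # so requests are served in the order they were submitted.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self.log = []

    def _square(self, x):
        # The "remote method": record the request and return a value.
        self.log.append(x)
        return x * x

    def call_blocking(self, x):
        # Blocking: the caller waits for the return value.
        return self._executor.submit(self._square, x).result()

    def call_nonblocking(self, x):
        # Nonblocking: the caller receives a handle and checks it later.
        return self._executor.submit(self._square, x)

    def call_oneway(self, x):
        # One-way: the request is dispatched and nothing is handed back.
        self._executor.submit(self._square, x)

    def shutdown(self):
        # Wait for all outstanding requests to finish.
        self._executor.shutdown(wait=True)


stub = SymponentStub()
blocking_result = stub.call_blocking(3)
handle = stub.call_nonblocking(4)
stub.call_oneway(5)
nonblocking_result = handle.result()
stub.shutdown()
```

In this sketch the only difference between the styles is what the caller keeps: a value, a handle, or nothing.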
Benefits of cooperative parallelism
- Easy subletting of work improves load balance
- Simple model for expressing task-based parallelism (rather than data parallelism)
- Nodes can be suballocated dynamically
- Dynamic management of symponent jobs supports fault tolerance
  - Caller is notified of failing symponents and can relaunch them
- Existing stand-alone applications can be modified and combined as discrete modules

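The notify-and-relaunch idea can be sketched as a simple retry loop. Everything here is an assumption made for illustration: `SymponentFailure`, `run_with_relaunch`, and `flaky_symponent` are hypothetical names, and a thread stands in for a symponent job.

```python
from concurrent.futures import ThreadPoolExecutor


class SymponentFailure(Exception):
    """Hypothetical error that tells the caller a symponent has failed."""


def run_with_relaunch(launch, max_attempts=3):
    """Run launch(attempt), relaunching on failure up to max_attempts times."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(1, max_attempts + 1):
            future = pool.submit(launch, attempt)
            try:
                # The failure surfaces in the caller as an exception.
                return future.result(), attempt
            except SymponentFailure:
                # Caller was notified of the failure; try a fresh launch.
                continue
    raise SymponentFailure(f"gave up after {max_attempts} attempts")


def flaky_symponent(attempt):
    # Stand-in job that fails on its first two launches.
    if attempt < 3:
        raise SymponentFailure(f"attempt {attempt} failed")
    return "done"


result, attempts_used = run_with_relaunch(flaky_symponent)
```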
But what about MPI?
- Cooperative parallelism
  - Dynamic management of symponents
  - Components are opaque to each other
  - Communication is connectionless, ad hoc, interrupting, and point-to-point
- MPI and MPI-2
  - Mostly static process management (MPI-2 can spawn processes but not explicitly terminate them)
  - Tasks are typically highly coordinated
  - Communication is connection-oriented and either point-to-point or collective; MPI-2 supports remote memory access

Applications: Load balancing
- Divide work into well-balanced and unbalanced parts
- Run balanced work as a regular MPI job
- Set up a pool of servers to handle unbalanced work
  - Server proxy assigns work to available servers
- Tasks with extra work can sublet it in parallel so they can catch up to less-busy tasks
[Figure labels: Well-balanced work; Server proxy; Servers for unbalanced work]

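A minimal sketch of the server-proxy pattern follows, with threads standing in for server symponents. The `ServerProxy` class, its methods, and the `extra_work` function are hypothetical; in this sketch the proxy is just a shared queue, so whichever server is idle takes the next item.

```python
import queue
import threading


def extra_work(n):
    # Stand-in for an expensive, unbalanced computation.
    return sum(i * i for i in range(n))


def server_loop(server_id, work_queue, results):
    # Each server repeatedly takes the next available work item.
    while True:
        item = work_queue.get()
        if item is None:  # Sentinel: no more work for this server.
            break
        key, n = item
        results[key] = (server_id, extra_work(n))


class ServerProxy:
    """Hypothetical proxy that hands sublet work to whichever server is free."""

    def __init__(self, num_servers):
        self.work_queue = queue.Queue()
        self.results = {}
        self.threads = [
            threading.Thread(target=server_loop,
                             args=(sid, self.work_queue, self.results))
            for sid in range(num_servers)
        ]
        for t in self.threads:
            t.start()

    def sublet(self, key, n):
        # A busy task passes its extra work to the pool and moves on.
        self.work_queue.put((key, n))

    def finish(self):
        # Send one sentinel per server, then wait for all of them to exit.
        for _ in self.threads:
            self.work_queue.put(None)
        for t in self.threads:
            t.join()
        return self.results


proxy = ServerProxy(num_servers=3)
for key, n in [("a", 10), ("b", 1000), ("c", 5)]:
    proxy.sublet(key, n)
sublet_results = proxy.finish()
```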
Applications: Adaptive sampling
- Multiscale model, similar to AMR
  - BUT: Can use different models at different scales
- Fine-scale computations requested from remote servers to improve load balance
- Initial results cached in a database
- Later computations check cached results and interpolate if accuracy is acceptable
[Figure labels: Coarse scale model; Server proxy; Fine scale servers; Unknown function; Interpolated values; Newly computed value; Previously computed values]

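The cache-then-interpolate step can be sketched in one dimension. This is an illustrative assumption, not the method used in the application: the `AdaptiveSampler` class is hypothetical, the "database" is a pair of sorted lists, and the accuracy test is simply that the two neighbouring cached points lie no further apart than `max_gap`.

```python
import bisect


class AdaptiveSampler:
    """Hypothetical 1-D cache of fine-scale results with linear interpolation."""

    def __init__(self, fine_model, max_gap):
        self.fine_model = fine_model  # Expensive fine-scale computation.
        self.max_gap = max_gap        # Assumed accuracy criterion.
        self.xs = []                  # Cached sample points, kept sorted.
        self.ys = []                  # Cached values at those points.
        self.fine_calls = 0

    def evaluate(self, x):
        i = bisect.bisect_left(self.xs, x)
        # Exact hit: reuse the previously computed value.
        if i < len(self.xs) and self.xs[i] == x:
            return self.ys[i]
        # Bracketed by two cached points that are close enough: interpolate.
        if 0 < i < len(self.xs):
            x0, x1 = self.xs[i - 1], self.xs[i]
            if x1 - x0 <= self.max_gap:
                t = (x - x0) / (x1 - x0)
                return self.ys[i - 1] + t * (self.ys[i] - self.ys[i - 1])
        # Otherwise run the fine-scale model and cache the new value.
        y = self.fine_model(x)
        self.fine_calls += 1
        self.xs.insert(i, x)
        self.ys.insert(i, y)
        return y


sampler = AdaptiveSampler(lambda x: 2.0 * x + 1.0, max_gap=1.0)
first = sampler.evaluate(0.0)    # Computed by the fine-scale model.
second = sampler.evaluate(1.0)   # Computed by the fine-scale model.
middle = sampler.evaluate(0.5)   # Interpolated from the two cached values.
```

A real accuracy test would likely use an error estimate rather than a fixed spacing; the fixed `max_gap` only keeps the sketch short.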
Applications: Parameter studies
- Master process launches multiple parallel components, each completing a simulation with different parameters
- Existing simulation codes can be wrapped to form components
- Master can direct the study based on accumulated results and launch new components as others complete
[Figure labels: Master; Completed simulation (x2); Active simulation (x3)]

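A master that steers a study from accumulated results might look like the sketch below. The names `simulate` and `run_study`, the toy objective, and the "explore neighbours of the current best" rule are all assumptions made for illustration; threads stand in for parallel simulation components.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulate(p):
    # Stand-in for a wrapped simulation code; lower is better.
    return (p - 3.0) ** 2


def run_study(initial_params, budget, max_active=2, step=1.0):
    """Hypothetical master loop: launch, collect, and steer simulations."""
    pending = list(initial_params)  # Parameters waiting to be launched.
    active = {}                     # Running simulations: future -> parameter.
    results = {}                    # Completed simulations: parameter -> value.
    launched = 0
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        while pending or active:
            # Launch new simulations while slots and budget remain.
            while pending and len(active) < max_active and launched < budget:
                p = pending.pop(0)
                if p in results or p in active.values():
                    continue  # Skip parameters already run or running.
                active[pool.submit(simulate, p)] = p
                launched += 1
            if not active:
                break  # Budget exhausted and nothing left running.
            # Block until at least one active simulation completes.
            done, _ = wait(list(active), return_when=FIRST_COMPLETED)
            for fut in done:
                p = active.pop(fut)
                results[p] = fut.result()
                # Steering rule: explore around the best result so far.
                if results[p] == min(results.values()):
                    pending.extend([p - step, p + step])
    return results


study = run_study([0.0, 10.0], budget=8)
best_param = min(study, key=study.get)
```

Because completion order depends on thread scheduling, the set of parameters explored can differ between runs; only the budget and the initial parameters are fixed.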
Applications: Federated simulations
- Components modeling separate physical entities interact
- Potential example: Ocean, atmosphere, sea ice
  - Each modeled in a separate job
  - Interactions communicated through RMI (N-by-M parallel RMI is future work)

Cooperative parallelism and Babel
- Babel gives Co-op
  - Language interoperability, SIDL, object-oriented model
  - RMI, including exception handling
- Co-op adds
  - Symponent launch, monitoring, and termination
  - Motivation and resources for extending RMI
  - Patterns for developing task-parallel applications
- Babel and Cooperative Parallelism teams are colocated and share some staff

Status: Runtime software
- Prototype runtime software working on Xeon, Opteron, and Itanium systems
- Completed a 1360-CPU demonstration in September
- Beginning to do similar-scale runs on new platforms
- Planning to port to IBM systems this year
- Ongoing work to enhance software robustness and documentation

Status: Applications
- Ongoing experiments with material modeling application
  - Demonstrated X speedups on >1000 processors using adaptive sampling
- Investigating use in parameter study application
- In discussions with several other apps groups at LLNL
- Also looking for possible collaborations with groups outside LLNL
- Contacts: John May, David Jefferson