1 Portable MPI and Related Parallel Development Tools
Rusty Lusk
Mathematics and Computer Science Division, Argonne National Laboratory
(The rest of our group: Bill Gropp, Rob Ross, David Ashton, Brian Toonen, Anthony Chan)
2 Outline
MPI
–What is it?
–Where did it come from?
–One implementation
–Why has it “succeeded”?
Case study: an MPI application
–Portability
–Libraries
–Tools
Future developments in parallel programming
–MPI development
–Languages
–Speculative approaches
3 What is MPI?
A message-passing library specification
–extended message-passing model
–not a language or compiler specification
–not a specific implementation or product
For parallel computers, clusters, and heterogeneous networks
Full-featured
Designed to provide access to advanced parallel hardware for
–end users
–library writers
–tool developers
4 Where Did MPI Come From?
Early vendor systems (NX, EUI, CMMD) were not portable.
Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts.
–Did not address the full spectrum of message-passing issues
–Lacked vendor support
–Were not implemented at the most efficient level
The MPI Forum organized in 1992 with broad participation by vendors, library writers, and end users.
MPI Standard (1.0) released June 1994; many implementation efforts followed.
MPI-2 Standard (1.2 and 2.0) released July 1997.
5 Informal Status Assessment
All MPP vendors now have MPI-1 (1.0, 1.1, or 1.2).
Public implementations (MPICH, LAM, CHIMP) support heterogeneous workstation networks.
MPI-2 implementations are now being undertaken by all vendors.
MPI-2 is harder to implement than MPI-1 was.
MPI-2 implementations will appear piecemeal, with I/O first.
6 MPI Sources
The Standard itself:
–at http://www.mpi-forum.org
–All MPI official releases, in both PostScript and HTML
Books on MPI and MPI-2:
–Using MPI: Portable Parallel Programming with the Message-Passing Interface (2nd edition), by Gropp, Lusk, and Skjellum, MIT Press, 1999.
–Using MPI-2: Extending the Message-Passing Interface, by Gropp, Lusk, and Thakur, MIT Press, 1999.
–MPI: The Complete Reference, volumes 1 and 2, MIT Press, 1999.
Other information on the Web:
–at http://www.mcs.anl.gov/mpi
–Pointers to lots of material, including other talks and tutorials, a FAQ, and other MPI pages
7 The MPI Standard Documentation
8 Tutorial Material on MPI, MPI-2
9 The MPICH Implementation of MPI
As a research project: exploring tradeoffs between performance and portability; conducting research in implementation issues.
As a software project: providing a free MPI implementation on most machines; enabling vendors and others to build complete MPI implementations on their own communication services.
MPICH 1.2.2 just released, with complete MPI-1, parts of MPI-2 (I/O and C++), and a port to Windows 2000.
Available at http://www.mcs.anl.gov/mpi/mpich
10 Lessons From MPI: Why Has It Succeeded?
The MPI Process
Portability
Performance
Simplicity
Modularity
Composability
Completeness
11 The MPI Process
Started with an open invitation to all those interested in standardizing the message-passing model
Participation from
–Parallel computing vendors
–Computer scientists
–Application scientists
Open process
–All invited, but hard work required
–All deliberations available at all times
Reference implementation developed during the design process
–Helped debug the design
–Immediately available when the design was completed
12 Portability
The most important property of a programming model for high-performance computing
–Application lifetimes are 5 to 20 years
–Hardware lifetimes are much shorter
–(not to mention corporate lifetimes!)
Need not lead to a lowest-common-denominator approach
Example: MPI semantics allow direct copy of data from a user-space send buffer to a user-space receive buffer
–Might be implemented by a hardware data mover
–Might be implemented by network hardware
–Might be implemented by a socket
The hard part: portability with performance
13 Performance
MPI can help manage the crucial memory hierarchy
–Local vs. remote memory is explicit
–A received message is likely to be in cache
MPI provides collective operations, for both communication and computation, that hide the complexity or non-portability of scalable algorithms from the programmer (see the sketch after this slide).
Can interoperate with optimising compilers
Promotes use of high-performance libraries
Doesn’t provide performance portability
–This problem is still too hard, even for the best compilers
–E.g., BLAS
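As an illustration (not from the talk), a global sum via MPI_Allreduce shows how a single collective call hides whatever scalable reduction algorithm the implementation uses; the variable names and loop are invented for the sketch.

```c
/* Minimal sketch: each process computes a partial sum, then one collective
 * call produces the global sum on every process. The implementation is free
 * to use a tree, a ring, or hardware support underneath. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    double local_sum = 0.0, global_sum;
    int i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 1000; i++)          /* per-process partial sum */
        local_sum += (double)(rank + i);

    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}
```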
14 Simplicity
Simplicity is in the eye of the beholder
–MPI-1 has about 125 functions (Too big! Too small!)
–MPI-2 adds about 150 more
–Even this is not very many by comparison
Few applications use all of MPI
–But few MPI functions go unused
One can write serious MPI programs with as few as six functions (a sketch follows this slide)
–Other programs use a different six…
Economy of concepts
–Communicators encapsulate both process groups and contexts
–Datatypes both enable heterogeneous communication and allow non-contiguous message buffers
Symmetry helps make MPI easy to understand.
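For illustration (not taken from the slides), here is a minimal complete program that uses only those six routines; the payload and ranks are made up, and it assumes at least two processes.

```c
/* A complete MPI program using only the six basic functions:
 * MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize.
 * Run with at least two processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        token = 42;                              /* arbitrary payload */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d (of %d processes)\n", token, size);
    }

    MPI_Finalize();
    return 0;
}
```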
15 Modularity
Modern applications often combine multiple parallel components.
MPI supports component-oriented software through its use of communicators (a sketch of the idiom follows this slide).
Support for libraries means applications may contain no MPI calls at all.
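A hypothetical sketch of that idiom (the library name and handle are invented): a library duplicates the communicator it is handed, so its internal messages can never be confused with the application's.

```c
/* Sketch of the communicator idiom that makes MPI libraries composable.
 * "mylib" is a made-up library; the point is the MPI_Comm_dup. */
#include <mpi.h>

typedef struct {
    MPI_Comm comm;      /* private communication context for the library */
} mylib_handle;

int mylib_init(MPI_Comm user_comm, mylib_handle *h)
{
    /* The duplicate has the same process group but a distinct context,
     * so library traffic cannot match application sends and receives. */
    return MPI_Comm_dup(user_comm, &h->comm);
}

int mylib_finalize(mylib_handle *h)
{
    return MPI_Comm_free(&h->comm);
}
```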
16 Composability
MPI works with other tools
–Compilers: since MPI is a library
–Debuggers: the debugging interface is used by MPICH, TotalView, and others
–Profiling tools: the MPI “profiling interface” is part of the standard (a wrapper sketch follows this slide)
MPI-2 provides precise interaction with multithreaded programs
–MPI_THREAD_SINGLE
–MPI_THREAD_FUNNELED (OpenMP loops)
–MPI_THREAD_SERIALIZED (OpenMP single)
–MPI_THREAD_MULTIPLE
The interface provides for both portability and performance.
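A minimal sketch (not from the talk) of how the profiling interface is typically used: a tool supplies its own MPI_Send, does its bookkeeping, and forwards to the real routine through the PMPI_ name. The counter and printed message are invented.

```c
/* Profiling-interface sketch: linking this object before the MPI library
 * intercepts MPI_Send and MPI_Finalize without changing the application. */
#include <mpi.h>
#include <stdio.h>

static int send_count = 0;          /* per-process tally */

/* Signature matches MPI-1/MPI-2 (later standards add const to buf). */
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    send_count++;                   /* a real tool would also record MPI_Wtime() */
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d issued %d MPI_Send calls\n", rank, send_count);
    return PMPI_Finalize();
}
```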
17 Completeness
MPI provides a complete programming model.
–Any parallel algorithm can be expressed.
–Collective operations can operate on subsets of processes (a sketch follows this slide).
Easy things are not always easy, but hard things are possible.
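An illustrative sketch (not from the slides): MPI_Comm_split divides the processes into subgroups, and a collective then runs only within each subgroup; the even/odd split here is arbitrary.

```c
/* Sketch: a collective restricted to a subset of processes. Even and odd
 * ranks form separate communicators, and each group computes its own sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, group_sum;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* color selects the subgroup; key orders the ranks within it */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);

    MPI_Allreduce(&rank, &group_sum, 1, MPI_INT, MPI_SUM, subcomm);
    printf("rank %d: sum over my subgroup = %d\n", rank, group_sum);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```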
18 The Center for the Study of Astrophysical Thermonuclear Flashes
To simulate matter accumulation on the surface of compact stars, nuclear ignition of the accreted (and possibly underlying stellar) material, and the subsequent evolution of the star’s interior, surface, and exterior
–X-ray bursts (on neutron star surfaces)
–Novae (on white dwarf surfaces)
–Type Ia supernovae (in white dwarf interiors)
19 FLASH Scientific Results
–Wide range of compressibility
–Wide range of length and time scales
–Many interacting physical processes
–Only indirect validation possible
–Rapidly evolving computing environment
–Many people in collaboration
Applications include cellular detonations, compressible turbulence, helium burning on neutron stars, Rayleigh-Taylor instability, Richtmyer-Meshkov instability, flame-vortex interactions, laser-driven shock instabilities, and nova outbursts on white dwarfs.
Gordon Bell prize at SC2000
20 The FLASH Code: MPI in Action
Solves complex systems of equations for hydrodynamics and nuclear burning
Written primarily in Fortran 90
Uses the Paramesh library for adaptive mesh refinement; Paramesh is implemented with MPI
I/O (for checkpointing, visualization, and other purposes) is done with the HDF5 library, which is implemented with MPI-IO (a sketch of the underlying MPI-IO idiom follows this slide)
Debugged with TotalView, using the standard debugger interface
Tuned with Jumpshot and Vampir, using the MPI profiling interface
Gordon Bell prize winner in 2000
Portable to all parallel computing environments (since it uses MPI)
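Not FLASH code: a rough C sketch of the kind of MPI-IO call sequence that sits underneath a parallel HDF5 checkpoint write. The file name, data layout, and function name are invented for the sketch.

```c
/* Illustrative only: each process writes its block of doubles at a disjoint
 * offset in one shared checkpoint file, using a collective MPI-IO write so
 * the I/O layer can aggregate and optimize the requests. */
#include <mpi.h>

int write_checkpoint(double *local_block, int nlocal, MPI_Comm comm)
{
    int rank;
    MPI_File fh;
    MPI_Offset offset;

    MPI_Comm_rank(comm, &rank);
    offset = (MPI_Offset)rank * nlocal * sizeof(double);

    MPI_File_open(comm, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset, local_block, nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    return MPI_File_close(&fh);
}
```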
21 FLASH Scaling Runs
22 X-Ray Burst on the Surface of a Neutron Star
23 Showing the AMR Grid
24 MPI Performance Visualization with Jumpshot
For detailed analysis of parallel program behavior, timestamped events are collected into a log file during the run.
A separate display program (Jumpshot) aids the user in conducting a post-mortem analysis of program behavior.
Log files can become large, making it impossible to inspect the entire program at once.
The FLASH project motivated an indexed file format (SLOG) that uses a preview to select a time of interest and quickly display an interval.
[Figure: log file feeding the Jumpshot display of process timelines]
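One common way to produce such a log is the MPE logging library distributed with MPICH, the usual producer of Jumpshot input. The following is a rough sketch only: the routine names follow the MPE logging API, but the exact signatures should be checked against the MPE documentation, and the state name, color, and file prefix are invented.

```c
/* Rough sketch of user-level event logging with MPE (ships with MPICH).
 * Treat the exact signatures as approximate; "compute", "red", and the
 * log-file prefix are made up for this example. */
#include <mpi.h>
#include "mpe.h"

int main(int argc, char *argv[])
{
    int ev_begin, ev_end;

    MPI_Init(&argc, &argv);
    MPE_Init_log();

    ev_begin = MPE_Log_get_event_number();
    ev_end   = MPE_Log_get_event_number();
    /* Declare a user-defined state so Jumpshot can draw it as a colored bar. */
    MPE_Describe_state(ev_begin, ev_end, "compute", "red");

    MPE_Log_event(ev_begin, 0, "start compute");
    /* ... application work to be visualized goes here ... */
    MPE_Log_event(ev_end, 0, "end compute");

    MPE_Finish_log("flash_demo");   /* writes a log file for Jumpshot */
    MPI_Finalize();
    return 0;
}
```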
25 Removing Barriers From Paramesh
26 Using Jumpshot
MPI functions and messages are automatically logged
User-defined states
Nested states
Zooming and scrolling
Spotting opportunities for optimization
27 Future Developments in Parallel Programming: MPI and Beyond
MPI is not perfect.
Any widely used replacement will have to share the properties that made MPI a success.
Some directions (in decreasing order of speculativeness):
–Improvements to MPI implementations
–Improvements to the MPI definition
–Continued evolution of libraries
–Research and development for parallel languages
–Further out: radically different programming models for radically different architectures
28 MPI Implementations
Implementations beget implementation research
–Datatypes, I/O, memory-motion elimination
Better collective operations are needed on most platforms (a generic sketch follows this slide)
–Most MPI implementations build collectives on point-to-point, which is too high-level
–Need stream-oriented methods that understand MPI datatypes
Optimize for new hardware
–In progress for VIA, InfiniBand
–Need more emphasis on collective operations
–Off-loading message processing onto the NIC
Scaling beyond 10,000 processes
Parallel I/O
–Clusters
–Remote
Fault tolerance
–Intercommunicators provide an approach
Working with multithreading approaches
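For illustration (not from the talk), this is the generic style of collective the slide refers to: a binomial-tree broadcast layered on point-to-point calls. Production implementations specialize or replace this to exploit datatypes, topology, message size, and hardware support.

```c
/* Illustrative only: a binomial-tree broadcast written on top of MPI
 * point-to-point operations. */
#include <mpi.h>

void tree_bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
{
    int rank, size, relrank, mask = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    relrank = (rank - root + size) % size;      /* renumber so root is 0 */

    /* Receive from the parent (differs in the lowest set bit of relrank). */
    while (mask < size) {
        if (relrank & mask) {
            int src = (relrank - mask + root) % size;
            MPI_Recv(buf, count, type, src, 0, comm, MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }
    /* Forward to children at the bits below the one where we received. */
    mask >>= 1;
    while (mask > 0) {
        if (relrank + mask < size) {
            int dst = (relrank + mask + root) % size;
            MPI_Send(buf, count, type, dst, 0, comm);
        }
        mask >>= 1;
    }
}
```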
29 Improvements to MPI Itself
A better remote-memory-access interface
–Simpler for some simple operations
–Atomic fetch-and-increment (see the MPI-2 sketch after this slide)
Some minor fixup already in progress
–MPI 2.1
Building on experience with MPI-2
Interactions with compilers
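An illustrative sketch (not from the slides) of what MPI-2 one-sided operations do offer: MPI_Accumulate can atomically add to a remote counter, but no single MPI-2 call also returns the old value, which is why an atomic fetch-and-increment is on the wish list above. The window layout and counter are invented here.

```c
/* Illustrative only: every process adds 1 to a counter exposed by rank 0
 * through an MPI-2 window; fences delimit the access epoch. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, counter = 0, one = 1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(&counter, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    MPI_Accumulate(&one, 1, MPI_INT, 0 /* target rank */, 0 /* displacement */,
                   1, MPI_INT, MPI_SUM, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("counter = %d\n", counter);  /* equals the number of processes */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```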
30 Libraries and Languages
General libraries
–Global Arrays
–PETSc
–ScaLAPACK
Application-specific libraries
Most built on MPI, at least for the portable version.
31 More Speculative Approaches
–HTMT for Petaflops
–Blue Gene
–PIMS
–MTA
All will need a programming model that explicitly manages a deep memory hierarchy.
Exotic + small benefit = dead
32 Summary
MPI is a successful example of a community defining, implementing, and adopting a standard programming methodology.
It happened because of the open MPI process, the MPI design itself, and early implementation.
MPI research continues to refine implementations on modern platforms, and this is the “main road” ahead.
Tools that work with MPI programs are thus a good investment.
MPI provides portability and performance for complex applications on a variety of architectures.