August 12, 2004 UCRL-PRES
Outline
- Motivation
- About the Applications
- Statistics Gathered
- Inferences
- Future Work
Motivation
- Information for application developers
  - Information on the expense of basic MPI functions (recode?); a measurement sketch follows this list
  - Set expectations
- Many tradeoffs available in MPI design
  - Memory allocation decisions
  - Protocol cutoff point decisions
  - Where is additional code complexity worth it?
- Information on MPI usage is scarce
- New tools (e.g. mpiP) make profiling reasonable
  - Easy to incorporate (no source code changes)
  - Easy to interpret
  - Unobtrusive observation (little performance impact)
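As a concrete illustration of "the expense of basic MPI functions", here is a minimal microbenchmark sketch in C that times one such call (MPI_Allreduce on a single double). The iteration count and the choice of operation are illustrative assumptions, not values taken from this study.

```c
/* Minimal sketch: time one basic MPI call (MPI_Allreduce on a single
 * double) and report the average cost per call. The repetition count
 * is an arbitrary illustrative choice. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 1000;            /* arbitrary repetition count */
    double in = 1.0, out = 0.0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);       /* start all ranks together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        printf("MPI_Allreduce of 1 double: %.2f us per call\n",
               1e6 * elapsed / iters);

    MPI_Finalize();
    return 0;
}
```

Sweeping the message size in such a loop is also one way to locate a library's protocol cutoff point (eager vs. rendezvous), one of the design tradeoffs listed above.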
About the applications…
- Amtran: discrete coordinate neutron transport
- Ares: 3-D simulation of instabilities in massive-star supernova envelopes
- Ardra: neutron transport/radiation diffusion code exploring new numerical algorithms and methods for the solution of the Boltzmann transport equation (e.g. nuclear imaging)
- Geodyne: Eulerian adaptive mesh refinement (e.g. comet-Earth impacts)
- IRS: solves the radiation transport equation by the flux-limiting diffusion approximation using an implicit matrix solution
- Mdcask: molecular dynamics code for the study of radiation damage in metals
- Linpack/HPL: solves a random dense linear system
- Miranda: hydrodynamics code simulating instability growth
- Smg: a parallel semicoarsening multigrid solver for the linear systems arising from finite difference, finite volume, or finite element discretizations
- Spheral: provides a steerable parallel environment for performing coupled hydrodynamical and gravitational numerical simulations
- Sweep3d: solves a 1-group neutron transport problem
- Umt2k: photon transport code for unstructured meshes
Percent of Time in MPI
- Overall for the sampled applications: 60% MPI, 40% remaining application
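The MPI/application split above is the kind of figure mpiP reports. As a rough illustration of how such a fraction can be measured, here is a minimal sketch using the standard PMPI profiling interface (the same interposition mechanism mpiP builds on). Only MPI_Send is wrapped for brevity; the wrapper is an illustrative assumption, not mpiP's actual implementation.

```c
/* Sketch: measure the fraction of wall-clock time spent inside MPI by
 * interposing on MPI calls through the standard PMPI profiling
 * interface. Compile into a small library and link it ahead of the MPI
 * library; no application source changes are needed. Only MPI_Send is
 * wrapped here for brevity. */
#include <mpi.h>
#include <stdio.h>

static double t_init = 0.0;   /* wall-clock time at end of MPI_Init   */
static double t_mpi  = 0.0;   /* seconds accumulated inside MPI calls */

int MPI_Init(int *argc, char ***argv)
{
    int rc = PMPI_Init(argc, argv);
    t_init = MPI_Wtime();
    return rc;
}

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    t_mpi += MPI_Wtime() - t0;
    return rc;
}

int MPI_Finalize(void)
{
    double t_app = MPI_Wtime() - t_init;   /* time since MPI_Init */
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0 && t_app > 0.0)
        printf("MPI time: %.1f%% of application time\n",
               100.0 * t_mpi / t_app);
    return PMPI_Finalize();
}
```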
Top MPI Point-to-Point Calls
Top MPI Collective Calls
Comparing Collective and Point-to-Point
Average Number of Calls for Most Common MPI Functions ("Large" Runs)
Communication Patterns: most dominant message size
Communication Patterns (continued)
Frequency of Callsites by MPI Function
Scalability
Observations Summary
- General
  - People seem to scale codes to ~60% MPI/communication
  - Isend/Irecv/Wait are many times more prevalent than Sendrecv and blocking send/recv (see the sketch after this list)
  - Time spent in collectives is predominantly divided among barrier, allreduce, broadcast, gather, and alltoall
  - The most common message size is typically between 1 KB and 1 MB
- Surprises
  - Waitany is the most prevalent call
  - Almost all point-to-point messages are the same size within a run
  - Often, message size decreases with large runs
  - Some codes are driven by alltoall performance
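To make the first two observations concrete, here is a minimal sketch of the non-blocking Isend/Irecv/Waitany exchange pattern that dominates the measured point-to-point traffic. The neighbor count, message size, and the `exchange` function itself are illustrative assumptions, not taken from any of the profiled applications.

```c
/* Sketch: non-blocking neighbor exchange. Post all receives, then all
 * sends, then service completions in arrival order with MPI_Waitany;
 * a loop of this form is one way Waitany can end up as the most
 * frequently called MPI function in a run. */
#include <mpi.h>

#define NNBR   4        /* illustrative number of neighbors        */
#define NWORDS 4096     /* illustrative message length, in doubles */

void exchange(MPI_Comm comm, const int nbr[NNBR],
              double sendbuf[NNBR][NWORDS], double recvbuf[NNBR][NWORDS])
{
    MPI_Request rreq[NNBR], sreq[NNBR];

    /* post all non-blocking receives first */
    for (int i = 0; i < NNBR; i++)
        MPI_Irecv(recvbuf[i], NWORDS, MPI_DOUBLE, nbr[i], 0, comm, &rreq[i]);

    /* then the matching non-blocking sends */
    for (int i = 0; i < NNBR; i++)
        MPI_Isend(sendbuf[i], NWORDS, MPI_DOUBLE, nbr[i], 0, comm, &sreq[i]);

    /* process each incoming message as soon as it completes */
    for (int done = 0; done < NNBR; done++) {
        int idx;
        MPI_Waitany(NNBR, rreq, &idx, MPI_STATUS_IGNORE);
        /* ... consume recvbuf[idx] here ... */
    }

    /* make sure all sends have completed before reusing the buffers */
    MPI_Waitall(NNBR, sreq, MPI_STATUSES_IGNORE);
}
```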
Future Work & Concluding Remarks
- Further understanding of the applications is needed
  - Results for other test configurations
  - When can applications make better use of collectives?
  - MPI-IO usage information is needed
  - Classified applications
- Acknowledgements
  - mpiP is due to Jeffrey Vetter and Chris Chambreau
  - This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.