
1 Why not MPI-only Applications? A Case for Investigating Hybrid Parallelism
H. Carter Edwards, Sandia National Laboratories
Sandia CSRI Workshop on Next generation scalable applications: When MPI-only is not enough, June 3-5, 2008
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2 Expected HPC Environment
Distributed memory parallelism will always be with us
–Network of processing nodes
–Well understood programming model, e.g. MPI
Processing nodes will have multiple cores
–Cores-per-node = cores-per-socket * sockets-per-node
–Apps must scale with #cores = #nodes * cores-per-node
My concerns
–Cores contending for memory resources
–Application’s per-socket or per-core memory overhead
–Application’s non-deterministic parallel behavior (not discussed)
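The core-count relations on this slide can be written out as a short calculation. This is a minimal sketch; the function name and the example machine size are illustrative assumptions, not figures from the talk.

```python
def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Total number of cores an application must scale across."""
    # cores-per-node = cores-per-socket * sockets-per-node
    cores_per_node = cores_per_socket * sockets_per_node
    # #cores = #nodes * cores-per-node
    return nodes * cores_per_node


# Hypothetical machine (an assumption): 1024 nodes, 2 sockets per node, 4 cores per socket.
print(total_cores(nodes=1024, sockets_per_node=2, cores_per_socket=4))
```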

3 Expected HPC Environment: My Concerns for MPI-Only
Unmanaged contention for node’s memory resources
–Cores contend for access to memory hierarchy
–Cores contend for socket’s cache memory
–MPI-only has no provision for coordinating access
–How much will this limit on-node scalability?
    At the mercy of the memory subsystem performance
–Can intentional coordinated access improve scalability?
    Could lead to increased complexity in the application
Per-node memory overhead
–Will OS require each core to have its own executable image?
–Intra-node distributed memory parallelism requires shared / ghosted data and message passing buffers
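To give a feel for the ghosted-data overhead, the sketch below counts stored cells when one node's block of a 3D grid is split among several on-node processes. The cubic block, the even split along each axis, and the one-cell ghost layer on every face are assumptions chosen for illustration; the slides do not specify a decomposition.

```python
def stored_cells(n: int, parts_per_axis: int, ghost: int = 1) -> int:
    """Cells stored on one node for an n*n*n block split into parts_per_axis**3 subdomains.

    Each subdomain keeps its own cells plus a ghost layer of thickness `ghost`
    on all six faces (hypothetical layout, for illustration only).
    """
    if n % parts_per_axis != 0:
        raise ValueError("n must be divisible by parts_per_axis")
    local = n // parts_per_axis                  # owned cells per axis in one subdomain
    per_process = (local + 2 * ghost) ** 3       # owned plus ghost cells
    return parts_per_axis ** 3 * per_process


# One process per node versus eight processes per node (2 per axis) on a 64^3 block.
single = stored_cells(64, 1)
eight = stored_cells(64, 2)
print(single, eight, eight / single)
```

Under these assumptions the extra storage grows as the subdomains shrink, because the ghost layer becomes a larger fraction of each subdomain.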

4 Conclusion: We Need to Investigate Hybrid Parallel Programming Model(s)
Two level programming model; orthogonal parallelism
–Outer inter-socket: distributed memory / MPI parallelism
–Inner intra-socket: shared memory / thread parallelism
–Impact to scalability and performance?
–Impact to complexity and robustness?
My investigation
–Application programmer interface
    Simple and minimalistic
    Flexible for non-uniform parallel work
–Performance parameters
    thread “flight control”
    memory access patterns
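The two-level structure can be sketched without an MPI installation. In the sketch below a plain loop stands in for the outer MPI ranks and the final sum stands in for MPI_Allreduce; both are stand-ins chosen for illustration, and the helper names are hypothetical. Python threads show only the structure here, not the performance of a Pthreads implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (begin, end) blocks."""
    base, extra = divmod(n_items, n_parts)
    bounds = []
    begin = 0
    for part in range(n_parts):
        end = begin + base + (1 if part < extra else 0)
        bounds.append((begin, end))
        begin = end
    return bounds


def inner_parallel_sum(data, begin, end, n_threads):
    """Inner level: threads share one process's slice of the data."""
    chunks = [(begin + b, begin + e) for b, e in partition(end - begin, n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(lambda be: sum(data[be[0]:be[1]]), chunks)
        return sum(partials)


def hybrid_sum(data, n_processes, n_threads):
    """Outer level: each loop iteration plays the role of one MPI rank."""
    per_process = [
        inner_parallel_sum(data, begin, end, n_threads)
        for begin, end in partition(len(data), n_processes)
    ]
    return sum(per_process)  # stand-in for MPI_Allreduce


print(hybrid_sum(list(range(1000)), n_processes=2, n_threads=4))
```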

5 Simple Programming Model for Inner Level Parallelism
Task pool / work queue strategy (old paradigm)
–Sequential operations performed by a single thread
–Inner level parallel operations performed by all threads
–Inner level parallel operations have a local and temporary scope
–Conceptually compatible with OpenMP and TBB model
[Figure label: root thread]
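A minimal sketch of the task pool / work queue strategy follows, assuming a hypothetical `parallel_for` helper that is not taken from the talk. The root thread fills a queue, helper threads exist only for the duration of the call (the "local and temporary scope"), and the root thread drains the queue alongside them so that all threads participate.

```python
import queue
import threading


def parallel_for(work_items, task, n_threads):
    """Apply task(item) to every item through a shared work queue.

    Threads pull the next item when they finish the previous one, so
    non-uniform work is balanced automatically.
    """
    items = list(work_items)
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))
    results = [None] * len(items)

    def drain():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return                      # queue exhausted, leave the parallel region
            results[index] = task(item)     # each index is written by exactly one thread

    helpers = [threading.Thread(target=drain) for _ in range(n_threads - 1)]
    for helper in helpers:
        helper.start()
    drain()                                 # the root thread works too
    for helper in helpers:
        helper.join()                       # end of the parallel operation
    return results


# Sequential code runs on the root thread before and after the parallel call.
squares = parallel_for(range(10), lambda v: v * v, n_threads=4)
print(squares)
```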

6 Scaling of double-double ‘dot(x,y)’
Barcelona (AMD 2x4core) with OpenMPI
Hybrid parallel: #Processes = MPI*Pthreads
Threads always in flight – no blocking
MPI_Allreduce overhead is minor; scaling is great
[Plot legend: 8, 4, 2, 1]
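The slides do not show the kernel itself. One common reading of a double-double dot product is a dot product of ordinary doubles accumulated in a (high, low) pair of doubles using error-free transformations; the serial sketch below follows that reading, which is an assumption, and all function names are illustrative.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly (Knuth)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into high and low halves."""
    c = 134217729.0 * a        # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and a * b = p + e exactly (Dekker)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_dot(x, y):
    """Dot product of two double vectors with a double-double accumulator."""
    hi, lo = 0.0, 0.0
    for a, b in zip(x, y):
        p, p_err = two_prod(a, b)
        hi, s_err = two_sum(hi, p)
        lo += s_err + p_err            # collect rounding errors in the low word
    return two_sum(hi, lo)             # renormalize into (high, low)


# Cancellation example: the small term survives in the double-double result.
print(dd_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]))
```

In a hybrid run each thread would accumulate its own (high, low) pair over its part of the vectors, with the per-process pairs then combined across MPI ranks.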

7 Scaling of double-double ‘dot(x,y)’
Clovertown (Intel 2x4core) with MPICH
Hybrid parallel: #Processes = MPI*Pthreads
Threads always in flight – no blocking
MPI_Allreduce overhead is awful. Memory bandwidth saturates.
[Plot legend: 1, 2, 4, 8]

8 Scaling of double-double ‘dot(x,y)’
Barcelona (AMD 2x4core) with OpenMPI
Hybrid parallel: #Processes = MPI*Pthreads
Threads blocked between flights
[Plot legend: 8, 4, 2, 1]

9 Thread “Flight Control”
MPI-Only
–One thread in flight for each MPI process
–A thread blocks only when waiting for a receive (common usage), i.e. when needed by the algorithm
–Components / libraries share a single thread resource: MPI_Comm
Non-MPI / Hybrid inner loop parallelism
–Thread start / stop blocking is a performance concern
    Required when number of active threads > number of cores
    Mixed threading mechanisms, e.g. Pthreads+TBB+OpenMP+…
–Rather not block threads unless needed by the algorithm
    Need a shared thread resource analogous to MPI_Comm
    Choose a single low-level mechanism ~ MPI
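The "blocked between flights" policy can be sketched with persistent helper threads that wait on a barrier until the root thread starts the next parallel operation. The class below is a hypothetical illustration, not the interface from the talk; the "always in flight" alternative would have idle threads poll a shared flag instead of blocking, which is not shown because spinning Python threads would only compete for the interpreter lock.

```python
import threading


class BlockingThreadPool:
    """Persistent helper threads that block on a barrier between flights."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self._barrier = threading.Barrier(n_threads)
        self._work = None                      # callable(rank) for the current flight
        self._helpers = [
            threading.Thread(target=self._helper_loop, args=(rank,), daemon=True)
            for rank in range(1, n_threads)
        ]
        for helper in self._helpers:
            helper.start()

    def _helper_loop(self, rank):
        while True:
            self._barrier.wait()               # blocked here between flights
            work = self._work
            if work is None:                   # shutdown signal
                return
            work(rank)
            self._barrier.wait()               # end of flight

    def run(self, work):
        """Run work(rank) on every thread; the caller acts as rank 0."""
        self._work = work
        self._barrier.wait()                   # release the helpers
        work(0)
        self._barrier.wait()                   # wait until every rank has finished

    def shutdown(self):
        self._work = None
        self._barrier.wait()                   # helpers wake, see None, and exit
        for helper in self._helpers:
            helper.join()


pool = BlockingThreadPool(4)
partial = [0] * pool.n_threads
pool.run(lambda rank: partial.__setitem__(rank, rank + 1))
pool.shutdown()
print(partial)
```

A single pool object shared by all components plays a role loosely analogous to the shared MPI_Comm mentioned on the slide. The sketch does not handle exceptions raised inside `work`, which would leave the other threads waiting at the barrier.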

10 Summary
Expect networks of manycore nodes
–Apps required to scale with respect to core count
–If MPI-only, critical to have an MPI implementation that leverages multicore
MPI-only or Hybrid Parallelism?
–Hybrid may be necessary to address memory access contention
–Hybrid can reduce inter-core communication overhead
–Hybrid provides opportunities for intra-node load balancing via work-queue
Thread “flight control”
–Performance (and portability) may require choosing a single low-level mechanism analogous to MPI but for intra-socket parallelism

