Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.

Presentation transcript:

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL

Why not MPI-only Applications? A Case for Investigating Hybrid Parallelism
H. Carter Edwards, Sandia National Laboratories
Sandia CSRI Workshop on Next generation scalable applications: When MPI-only is not enough
June 3-5, 2008

Expected HPC Environment
- Distributed memory parallelism will always be with us
  - Network of processing nodes
  - Well understood programming model, e.g. MPI
- Processing nodes will have multiple cores
  - Cores-per-node = cores-per-socket * sockets-per-node
  - Apps must scale with #cores = #nodes * cores-per-node
- My concerns
  - Cores contending for memory resources
  - Application’s per-socket or per-core memory overhead
  - Application’s non-deterministic parallel behavior (not discussed)
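The core-count relation on this slide can be checked with a few lines of arithmetic. This is only an illustration: the function name and the machine size (1024 nodes) are assumptions, while the 2 sockets x 4 cores node matches the benchmark nodes named later in the talk.

```python
def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Return #cores = #nodes * cores-per-node."""
    # cores-per-node = cores-per-socket * sockets-per-node
    cores_per_node = cores_per_socket * sockets_per_node
    return nodes * cores_per_node


# Hypothetical machine: 1024 nodes, each with 2 sockets of 4 cores.
print(total_cores(nodes=1024, sockets_per_node=2, cores_per_socket=4))  # 8192
```

An MPI-only application on such a machine would run one process per core, so every added core per socket multiplies the process count.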

Expected HPC Environment: My Concerns for MPI-Only
- Unmanaged contention for node’s memory resources
  - Cores contend for access to memory hierarchy
  - Cores contend for socket’s cache memory
  - MPI-only has no provision for coordinating access
  - How much will this limit on-node scalability?
    - At the mercy of the memory subsystem performance
  - Can intentional coordinated access improve scalability?
    - Could lead to increased complexity in the application
- Per-node memory overhead
  - Will OS require each core to have its own executable image?
  - Intra-node distributed memory parallelism requires shared / ghosted data and message passing buffers

Conclusion: We Need to Investigate Hybrid Parallel Programming Model(s)
- Two level programming model; orthogonal parallelism
  - Outer inter-socket: distributed memory / MPI parallelism
  - Inner intra-socket: shared memory / thread parallelism
  - Impact to scalability and performance?
  - Impact to complexity and robustness?
- My investigation
  - Application programmer interface
    - Simple and minimalistic
    - Flexible for non-uniform parallel work
  - Performance parameters
    - Thread “flight control”
    - Memory access patterns
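A minimal sketch of the two-level decomposition, under loud assumptions: the outer level is only simulated here by looping over blocks in one Python process (a real code would run one MPI process per block and combine the results with MPI_Allreduce), and the inner level uses a standard-library thread pool. All function names are hypothetical and do not come from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) blocks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_dot(x, y, start, stop):
    """Sequential dot product over one thread's index range."""
    acc = 0.0
    for i in range(start, stop):
        acc += x[i] * y[i]
    return acc


def process_dot(x_local, y_local, num_threads):
    """Inner level: threads share one process's block of the vectors."""
    blocks = split_range(len(x_local), num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(partial_dot, x_local, y_local, a, b)
                   for a, b in blocks]
        return sum(f.result() for f in futures)


def hybrid_dot(x, y, num_processes, num_threads):
    """Outer level: each simulated process owns one block of the vectors.

    The final sum stands in for the reduction across processes.
    """
    local_sums = [process_dot(x[a:b], y[a:b], num_threads)
                  for a, b in split_range(len(x), num_processes)]
    return sum(local_sums)


x = [float(i) for i in range(1000)]
y = [2.0] * 1000
print(hybrid_dot(x, y, num_processes=2, num_threads=4))  # 999000.0
```

The two levels are orthogonal in the sense that `process_dot` knows nothing about how the vectors were distributed, and `hybrid_dot` knows nothing about how each block is threaded.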

Simple Programming Model for Inner Level Parallelism
- Task pool / work queue strategy (old paradigm)
  - Sequential operations performed by a single thread
  - Inner level parallel operations performed by all threads
  - Inner level parallel operations have a local and temporary scope
  - Conceptually compatible with OpenMP and TBB model

[Figure label: root thread]
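The task pool / work queue strategy can be sketched as follows. This is an illustration built on Python's standard `queue` and `threading` modules, not the interface investigated in the talk; the class name `TaskPool` and its methods are hypothetical. As a simplification, the root thread only posts work and waits, whereas the slide has all threads taking part in a parallel operation, and `func` is assumed not to raise.

```python
import queue
import threading


class TaskPool:
    """Persistent worker threads serving parallel operations posted by a root thread."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()  # workers block here between parallel operations
            if task is None:          # sentinel: shut this worker down
                self._tasks.task_done()
                return
            func, arg, results, index = task
            try:
                results[index] = func(arg)
            finally:
                self._tasks.task_done()

    def parallel_map(self, func, args):
        """One parallel operation with local, temporary scope."""
        results = [None] * len(args)
        for i, arg in enumerate(args):
            self._tasks.put((func, arg, results, i))
        self._tasks.join()  # root thread resumes once every task is done
        return results

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


pool = TaskPool(num_workers=4)
data = list(range(8))                               # sequential step, root thread only
squares = pool.parallel_map(lambda v: v * v, data)  # inner level parallel operation
total = sum(squares)                                # sequential again
pool.shutdown()
print(total)  # 140
```

Because workers pull tasks from a shared queue, tasks of uneven cost are balanced dynamically, which is one reading of "flexible for non-uniform parallel work".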

Scaling of double-double ‘dot(x,y)’
Barcelona (AMD 2x4core) with OpenMPI
- Hybrid parallel: #Processes = MPI*Pthreads
- Threads always in flight, no blocking
- MPI_Allreduce overhead is minor; scaling is great
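The transcript does not show how the benchmarked double-double ‘dot(x,y)’ is implemented. As a hedged illustration of the general idea, the sketch below accumulates a dot product as a (high, low) pair of doubles using the classic error-free transformations TwoSum and TwoProduct, then rounds back to one double. The function names are assumptions, and this compensated scheme is one common choice, not necessarily the kernel measured in the talk.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into high and low halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def dot_dd(x, y):
    """Dot product accumulated in a (hi, lo) pair, rounded to one double."""
    hi, lo = 0.0, 0.0
    for a, b in zip(x, y):
        p, p_err = two_prod(a, b)
        hi, s_err = two_sum(hi, p)
        lo += s_err + p_err  # collect the rounding errors separately
    return hi + lo


def dot_naive(x, y):
    """Plain double-precision accumulation, for comparison."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(dot_naive(x, y))  # 0.0: the 1.0 is lost to rounding
print(dot_dd(x, y))     # 1.0
```

Each element costs many more floating-point operations than a plain multiply-add, so a kernel of this kind performs more arithmetic per byte loaded than an ordinary dot product.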

Scaling of double-double ‘dot(x,y)’
Clovertown (Intel 2x4core) with MPICH
- Hybrid parallel: #Processes = MPI*Pthreads
- Threads always in flight, no blocking
- MPI_Allreduce overhead is awful
- Memory bandwidth saturates

Scaling of double-double ‘dot(x,y)’
Barcelona (AMD 2x4core) with OpenMPI
- Hybrid parallel: #Processes = MPI*Pthreads
- Threads blocked between flights

Thread “Flight Control”
- MPI-Only
  - One thread in flight for each MPI process
  - A thread only blocks when waiting for receive (common usage), i.e. when needed by the algorithm
  - Components / libraries share a single thread resource: MPI_Comm
- Non-MPI / Hybrid inner loop parallelism
  - Thread start / stop blocking is a performance concern
    - Required when number of active threads > number of cores
    - Mixed threading mechanisms, e.g. Pthreads+TBB+OpenMP+…
  - Rather not block threads unless needed by the algorithm
    - Need a shared thread resource analogous to MPI_Comm
    - Choose a single low-level mechanism ~ MPI
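One way to picture a "shared thread resource analogous to MPI_Comm" is a single handle that owns the threads and is passed to every component, so that no library starts threads of its own. The sketch below is purely illustrative: `ThreadResource`, `parallel_for`, and the two "library" functions are hypothetical names. It shows only the sharing aspect; the standard-library pool used here lets idle threads block, so it does not model the no-blocking policy discussed on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadResource:
    """Hypothetical handle to one shared set of threads, passed around like an MPI_Comm."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def parallel_for(self, func, items):
        """Apply func to every item on the shared threads, preserving order."""
        return list(self._pool.map(func, items))

    def close(self):
        self._pool.shutdown(wait=True)


def library_a_scale(threads, values, factor):
    """A 'library' routine that borrows the caller's threads instead of creating its own."""
    return threads.parallel_for(lambda v: v * factor, values)


def library_b_norm1(threads, values):
    """A second 'library' routine using the same shared resource."""
    return sum(threads.parallel_for(abs, values))


threads = ThreadResource(num_threads=4)
scaled = library_a_scale(threads, [1.0, -2.0, 3.0], 2.0)
norm = library_b_norm1(threads, scaled)
threads.close()
print(norm)  # 12.0
```

Because both routines take the resource as an argument, the number of active threads stays bounded by the one pool, which avoids the oversubscription case (active threads > cores) that forces blocking.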

Summary
- Expect networks of manycore nodes
  - Apps required to scale with respect to core count
  - If MPI-only, critical to have multicore leveraging implementation
- MPI-only or Hybrid Parallelism?
  - Hybrid may be necessary to address memory access contention
  - Hybrid can reduce inter-core communication overhead
  - Hybrid provides opportunities for intra-node load balancing via work-queue
- Thread “flight control”
  - Performance (and portability) may require choosing a single low-level mechanism analogous to MPI but for intra-socket parallelism