Effects of Contention on Message Latencies in Large Supercomputers
IS TOPOLOGY IMPORTANT AGAIN?
Abhinav S Bhatele and Laxmikant V Kale
Parallel Programming Laboratory, UIUC
bhatele@illinois.edu

Outline
– Why should we consider topology aware mapping for optimizing performance?
– Demonstrate the effects of contention on message latencies through simple MPI benchmarks
– Case Study: OpenAtom

The Mapping Problem
Given a set of communicating parallel “entities”, map them onto physical processors
Entities
– COMM_WORLD ranks in the case of an MPI program
– Objects in the case of a Charm++ program
Aim
– Balance load
– Minimize communication traffic

Target Machines
3D torus/mesh interconnects
– Blue Gene/P at ANL: 40,960 nodes, torus 32 x 32 x 40
– XT4 (Jaguar) at ORNL: 8,064 nodes, torus 21 x 16 x 24
Other interconnects
– Fat-tree

Motivation
Consider a 3D mesh/torus interconnect. Message latencies can be modeled by

    (L_f / B) × D + L / B

where L_f = length of a flit, B = bandwidth, D = hops, and L = message size. When L_f × D << L, the first term is negligible. But in the presence of contention …
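To see why the distance term drops out, here is a small worked example of the model in C; the flit size, per-link bandwidth, and message size below are illustrative placeholder values, not measurements from this study.

```c
#include <stdio.h>

/* Model from the slide: latency = (L_f / B) * D + L / B, with
 * L_f = flit length, B = link bandwidth, D = hops, L = message size. */
static double model_latency_us(double flit_bytes, double bw_bytes_per_us,
                               int hops, double msg_bytes) {
    return (flit_bytes / bw_bytes_per_us) * hops + msg_bytes / bw_bytes_per_us;
}

int main(void) {
    /* Placeholder numbers: 32-byte flits, 425 bytes/us (~425 MB/s) per link. */
    const double flit = 32.0, bw = 425.0;
    const double msg = 1048576.0;          /* 1 MB message */
    for (int hops = 1; hops <= 16; hops *= 2) {
        printf("1 MB over %2d hops: %8.1f us\n", hops,
               model_latency_us(flit, bw, hops, msg));
    }
    /* With 1 MB messages, L_f * D is a few hundred bytes at most, so the
     * distance term changes the total by well under 1 percent -- unless
     * contention lowers the effective bandwidth B. */
    return 0;
}
```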

MPI Benchmarks
Quantification of message latencies and their dependence on hops
– No sharing of links (no contention)
– Sharing of links (with contention)

WOCON: No Contention
A master rank sends messages to all other ranks, one at a time (with replies).
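A minimal MPI sketch of this one-at-a-time ping-pong pattern is shown below; it measures round-trip time per destination, and the message size and single-iteration timing are illustrative assumptions rather than the exact settings of the WOCON benchmark.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (1 << 20)   /* placeholder message size */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *buf = malloc(MSG_BYTES);

    if (rank == 0) {
        /* The master pings each rank and waits for the reply before moving on,
         * so no two messages are ever in the network (no link sharing). */
        for (int dest = 1; dest < size; dest++) {
            double t0 = MPI_Wtime();
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, dest, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, dest, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d: round-trip %.1f us\n", dest,
                   (MPI_Wtime() - t0) * 1e6);
        }
    } else {
        MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```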

WOCON: Results
[Results plots: ANL Blue Gene/P and PSC XT3, with the model (L_f / B) × D + L / B]

WICON: With Contention
Divide all ranks into pairs and everyone sends to their respective partner simultaneously.
Two variants: Near Neighbor (NN) and Random (RND).
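The sketch below illustrates the paired, simultaneous exchange; the even/odd pairing stands in for the Near Neighbor variant and is only an assumption about how partners might be chosen (the Random variant would instead draw partners from a shuffled permutation).

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (1 << 20)   /* placeholder message size */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Near-neighbor style pairing (an assumption): even rank i exchanges with i+1. */
    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;

    char *sendbuf = malloc(MSG_BYTES);
    char *recvbuf = malloc(MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);          /* start all pairs at the same time */
    if (partner >= 0 && partner < size) {
        double t0 = MPI_Wtime();
        /* Every pair communicates simultaneously, so messages now compete for links. */
        MPI_Sendrecv(sendbuf, MSG_BYTES, MPI_CHAR, partner, 0,
                     recvbuf, MSG_BYTES, MPI_CHAR, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d <-> %d: %.1f us\n", rank, partner,
               (MPI_Wtime() - t0) * 1e6);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```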

WICON: Results
[Results plots: ANL Blue Gene/P and PSC XT3]

WICON2: Controlled Contention
Pair each rank with a partner which is ‘n’ hops away.
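The slide does not describe how an ‘n hops away’ partner is computed, so the sketch below makes several assumptions: ranks are laid out row-major over an 8 x 8 x 8 torus (DX, DY, DZ are placeholders), the job uses exactly DX*DY*DZ ranks, and consecutive X coordinates are physically adjacent, so pairing blocks of n consecutive X values yields messages that each travel exactly n links.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder torus dimensions; the sketch assumes a row-major rank layout. */
#define DX 8
#define DY 8
#define DZ 8
#define MSG_BYTES (1 << 20)

static int coords_to_rank(int x, int y, int z) {
    return (x * DY + y) * DZ + z;
}

int main(int argc, char **argv) {
    int rank, size, n = 4;   /* n = desired hop distance; DX must be a multiple of 2n */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != DX * DY * DZ) {
        if (rank == 0) fprintf(stderr, "run with exactly %d ranks\n", DX * DY * DZ);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Recover (x, y, z) from the rank under the assumed row-major layout. */
    int z = rank % DZ;
    int y = (rank / DZ) % DY;
    int x = rank / (DY * DZ);

    /* Symmetric pairing along X: blocks of n consecutive X values are paired
     * with the adjacent block, so each message travels exactly n links. */
    int px = ((x / n) % 2 == 0) ? x + n : x - n;
    int partner = coords_to_rank(px, y, z);

    char *sendbuf = malloc(MSG_BYTES);
    char *recvbuf = malloc(MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);   /* all pairs exchange at the same time */
    double t0 = MPI_Wtime();
    MPI_Sendrecv(sendbuf, MSG_BYTES, MPI_CHAR, partner, 0,
                 recvbuf, MSG_BYTES, MPI_CHAR, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d <-> %d (%d hops): %.1f us\n",
           rank, partner, n, (MPI_Wtime() - t0) * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```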


WICON2: Blue Gene/P
[Results plot]

WICON2: Cray XT3
[Results plot]

More tests …
[Results plots: ANL Blue Gene/P, PSC XT3]

OpenAtom
Ab initio molecular dynamics code
Communication is static and structured
Challenge: Multiple groups of objects with conflicting communication patterns

Parallelization using Charm++
[10] Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM Journal of Research and Development: Applications of Massively Parallel Systems, 52(1/2).

Topology Mapping of Chare Arrays
State-wise communication vs. plane-wise communication
Joint work with Eric J. Bohm
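As a rough illustration of the mapping idea (not the actual OpenAtom mapping), the sketch below places all planes of a state along one Z-column of a hypothetical torus, so plane-wise messages stay within a few hops while different states occupy different (x, y) columns; the sizes and the layout itself are assumptions made for the example.

```c
#include <stdio.h>

/* Illustrative sizes only: a hypothetical torus and 2D chare array. */
#define TX 8
#define TY 8
#define TZ 8
#define NSTATES 64   /* first dimension of the chare array */
#define NPLANES 8    /* second dimension of the chare array */

/* Place all planes of one state in a single Z-column of the torus:
 * plane-wise messages then stay within that column (few hops), while
 * state-wise messages cross between (x, y) columns. */
static void map_chare(int state, int plane, int *x, int *y, int *z) {
    *x = state % TX;
    *y = (state / TX) % TY;
    *z = plane % TZ;
}

int main(void) {
    int x, y, z;
    map_chare(10, 3, &x, &y, &z);
    printf("chare (state 10, plane 3) -> torus node (%d, %d, %d)\n", x, y, z);
    return 0;
}
```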

Results on Blue Gene/P (ANL)
[Results plot]

Results on XT3
[Results plot]

Summary
1. Topology is important again
2. Even on fast interconnects such as Cray's
3. In the presence of contention, bandwidth occupancy affects message latencies significantly
4. This effect increases with the number of hops each message travels
5. Topology Manager API: a uniform API for IBM and Cray machines
6. Case studies: OpenAtom, NAMD, Stencil
7. Eventually, an automatic mapping framework

Acknowledgements
1. Argonne National Laboratory: Pete Beckman, Tisha Stacey
2. Pittsburgh Supercomputing Center: Chad Vizino, Shawn Brown
3. Oak Ridge National Laboratory: Patrick Worley, Donald Frederick
4. IBM: Robert Walkup, Sameer Kumar
5. Cray: Larry Kaplan
6. Matt Reilly

References
1. Abhinav Bhatele, Laxmikant V. Kale. Dynamic Topology Aware Load Balancing Algorithms for MD Applications. To appear in Proceedings of the International Conference on Supercomputing (ICS).
2. Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale. A Case Study of Communication Optimizations on 3D Mesh Interconnects. To appear in Proceedings of Euro-Par.
3. Abhinav Bhatele, Laxmikant V. Kale. Benefits of Topology-aware Mapping for Mesh Topologies. Parallel Processing Letters (Special issue on Large-Scale Parallel Processing), Vol. 18, Issue 4.

bhatele@illinois.edu