Abhinav Bhatele, Laxmikant V. Kale (University of Illinois at Urbana-Champaign) and Sameer Kumar (IBM T. J. Watson Research Center)
September 9th, 2008

Molecular Dynamics
- A system of [charged] atoms with bonds
- Newtonian mechanics is used to find the positions and velocities of the atoms
- A time step is typically on the order of a femtosecond
- At each time step: calculate the forces on all atoms, then calculate the velocities and move the atoms
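The per-step loop can be made concrete with a small sketch. This is illustrative C++, not NAMD source: a velocity-Verlet integrator is assumed, and computeForces() stands in for the real bonded plus non-bonded force evaluation.

```cpp
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };

struct Atom {
    Vec3 pos, vel, force;
    double mass = 1.0;
};

// Placeholder for the real force evaluation (bonded + non-bonded terms).
void computeForces(std::vector<Atom>& atoms) {
    for (Atom& a : atoms) a.force = Vec3{};  // zeroed in this sketch
}

// One velocity-Verlet time step; dt is on the order of a femtosecond.
void timeStep(std::vector<Atom>& atoms, double dt) {
    for (Atom& a : atoms) {                  // half-kick, then drift
        a.vel.x += 0.5 * dt * a.force.x / a.mass;
        a.vel.y += 0.5 * dt * a.force.y / a.mass;
        a.vel.z += 0.5 * dt * a.force.z / a.mass;
        a.pos.x += dt * a.vel.x;
        a.pos.y += dt * a.vel.y;
        a.pos.z += dt * a.vel.z;
    }
    computeForces(atoms);                    // forces at the new positions
    for (Atom& a : atoms) {                  // second half-kick
        a.vel.x += 0.5 * dt * a.force.x / a.mass;
        a.vel.y += 0.5 * dt * a.force.y / a.mass;
        a.vel.z += 0.5 * dt * a.force.z / a.mass;
    }
}
```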

NAMD: NAnoscale Molecular Dynamics
- Naive force calculation is O(N^2); NAMD reduces this to O(N log N) by splitting the work:
  - Bonded forces
  - Non-bonded forces, using a cutoff radius:
    - Short-range: calculated every time step
    - Long-range: calculated every fourth time step (PME)
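A minimal sketch of what the cutoff buys, written as a plain O(N^2) pair loop for clarity; real codes, NAMD included, avoid even enumerating distant pairs by binning atoms into cells or patches about one cutoff wide. The pair potential here is a placeholder, not NAMD's force field.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };
struct Atom { Vec3 pos, force; };

// Placeholder pair interaction (a real code uses LJ + electrostatics).
double pairForceMag(double r2) { return 1.0 / (r2 * r2); }

// Short-range non-bonded forces with a cutoff radius rc. Pairs beyond rc
// are skipped entirely; spatial binning makes that skip cheap in practice.
void shortRangeForces(std::vector<Atom>& atoms, double rc) {
    const double rc2 = rc * rc;
    for (size_t i = 0; i < atoms.size(); ++i) {
        for (size_t j = i + 1; j < atoms.size(); ++j) {
            Vec3 d{atoms[i].pos.x - atoms[j].pos.x,
                   atoms[i].pos.y - atoms[j].pos.y,
                   atoms[i].pos.z - atoms[j].pos.z};
            double r2 = d.x * d.x + d.y * d.y + d.z * d.z;
            if (r2 > rc2) continue;            // outside cutoff: no work
            double f = pairForceMag(r2);
            atoms[i].force.x += f * d.x; atoms[j].force.x -= f * d.x;
            atoms[i].force.y += f * d.y; atoms[j].force.y -= f * d.y;
            atoms[i].force.z += f * d.z; atoms[j].force.z -= f * d.z;
        }
    }
}
```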

NAMD's Parallel Design
- Hybrid of spatial and force decomposition: atoms are grouped into spatial "patches", and "compute" objects do the force work within and between patches
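A sketch of the hybrid decomposition under the usual NAMD picture: space is cut into a 3D grid of patches roughly one cutoff wide, and a compute is created for each patch with itself and with each of its "upstream" neighbors, so every neighboring pair gets exactly one compute. The grid enumeration below is illustrative (non-periodic boundaries, hypothetical naming).

```cpp
#include <tuple>
#include <vector>

struct Compute { int patchA, patchB; };  // force work between two patches

// Enumerate computes for an X x Y x Z patch grid: one self-compute per
// patch, plus one compute per neighboring patch pair.
std::vector<Compute> makeComputes(int X, int Y, int Z) {
    auto id = [&](int x, int y, int z) { return (x * Y + y) * Z + z; };
    std::vector<Compute> computes;
    for (int x = 0; x < X; ++x)
        for (int y = 0; y < Y; ++y)
            for (int z = 0; z < Z; ++z) {
                computes.push_back({id(x, y, z), id(x, y, z)});  // self
                for (int dx = -1; dx <= 1; ++dx)
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dz = -1; dz <= 1; ++dz) {
                            // Count each pair once: keep only "upstream"
                            // offsets (lexicographically > (0,0,0)).
                            if (std::make_tuple(dx, dy, dz) <=
                                std::make_tuple(0, 0, 0)) continue;
                            int nx = x + dx, ny = y + dy, nz = z + dz;
                            if (nx < 0 || nx >= X || ny < 0 || ny >= Y ||
                                nz < 0 || nz >= Z) continue;
                            computes.push_back({id(x, y, z), id(nx, ny, nz)});
                        }
            }
    return computes;
}
```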

Parallelization using Charm++
- Patches are placed by a static mapping; computes are placed and migrated by load balancing

Reference: Bhatele, A., Kumar, S., Mei, C., Phillips, J. C., Zheng, G. and Kale, L. V., Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms. In Proceedings of IEEE International Parallel and Distributed Processing Symposium, Miami, FL, USA, April 2008.
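One way to picture the static-mapping half is a simple placement function. This is a hypothetical blocked mapping, not NAMD's actual mapper: patches are linearized and spread evenly so that neighboring patches tend to land on nearby processors; computes start near their patches and are later migrated by the Charm++ load balancer.

```cpp
// Hypothetical static mapping of patch (x, y, z) in an X x Y x Z grid
// onto numPes processors via a blocked distribution.
int mapPatch(int x, int y, int z, int X, int Y, int Z, int numPes) {
    long linear = ((long)x * Y + y) * Z + z;   // linearize the patch grid
    long numPatches = (long)X * Y * Z;
    return (int)(linear * numPes / numPatches);
}
```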

Communication in NAMD
- Each patch multicasts its information to many computes
- Each compute is the target of only two multicasts (one from each of its patches)
- "Proxies" are used to send data to the different computes on the same processor
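A sketch of what proxies buy, with hypothetical names: instead of one message per compute, the patch sends one message per destination processor, and a local proxy object fans the data out to every compute hosted there.

```cpp
#include <map>
#include <vector>

// computeHome[c] = processor currently hosting compute c.
// Given the computes a patch must feed, group them by processor so the
// patch sends one multicast message per processor, not one per compute;
// the proxy on each processor delivers to its local computes.
std::map<int, std::vector<int>> multicastPlan(
        const std::vector<int>& targetComputes,
        const std::vector<int>& computeHome) {
    std::map<int, std::vector<int>> byPe;
    for (int c : targetComputes)
        byPe[computeHome[c]].push_back(c);
    return byPe;
}
```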

Topology Aware Techniques
- Static placement of patches

Topology Aware Techniques (contd.)
- Placement of computes
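A minimal sketch of topology-aware compute placement on a 3D torus, assuming processor coordinates are known; this illustrates the idea rather than NAMD's exact heuristic. Since a compute receives multicasts from two patches, a processor close to both is preferred.

```cpp
#include <cstdlib>
#include <vector>

struct Coord { int x, y, z; };

// Per-dimension torus distance: links can wrap around.
int hops1D(int a, int b, int size) {
    int d = std::abs(a - b);
    return d < size - d ? d : size - d;
}

int hops(Coord a, Coord b, Coord dims) {
    return hops1D(a.x, b.x, dims.x) + hops1D(a.y, b.y, dims.y) +
           hops1D(a.z, b.z, dims.z);
}

// Among candidate processors, pick the one minimizing the combined
// distance to the processors holding the compute's two patches.
int placeCompute(Coord patch1Pe, Coord patch2Pe,
                 const std::vector<Coord>& candidates, Coord dims) {
    int best = 0, bestCost = 1 << 30;
    for (int i = 0; i < (int)candidates.size(); ++i) {
        int cost = hops(candidates[i], patch1Pe, dims) +
                   hops(candidates[i], patch2Pe, dims);
        if (cost < bestCost) { bestCost = cost; best = i; }
    }
    return best;
}
```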

Load Balancing in Charm++
- Principle of persistence: object communication patterns and computational loads tend to persist over time
- Measurement-based load balancing: instrument computation time and communication volume at runtime, and use this database to make new load-balancing decisions
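The measurement database can be pictured as one record per migratable object; the names below are hypothetical, not the Charm++ API. By the principle of persistence, last phase's measurements predict next phase's load, so the balancer can map objects from data rather than from a model.

```cpp
#include <vector>

// Hypothetical per-object record collected by the instrumented runtime.
struct ObjectStats {
    int objId;           // which migratable object (e.g., a compute)
    int currentPe;       // where it ran during the measured phase
    double busyTime;     // measured CPU time in that phase (seconds)
    long bytesSent;      // measured communication volume
    long msgsSent;
};

// The "database" handed to a load-balancing strategy is essentially:
using LoadDatabase = std::vector<ObjectStats>;
```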

NAMD's Load Balancing Strategy
- NAMD uses a dynamic, centralized, greedy strategy
- Two schemes are in play: a comprehensive strategy (called once) and a refinement scheme (called several times during a run)
- Algorithm: pick a compute and find a "suitable" processor to place it on
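A minimal sketch of the greedy refinement idea, not NAMD's exact algorithm: take computes heaviest-first off overloaded processors and move each to a "suitable" underloaded processor. pickSuitablePe() is assumed here and sketched after the next slide.

```cpp
#include <algorithm>
#include <vector>

struct ComputeInfo { int id; double load; int pe; int patch1, patch2; };

// Assumed helper implementing the priority order on the next slide.
int pickSuitablePe(const ComputeInfo& c, const std::vector<double>& peLoad,
                   double avgLoad);

void greedyRefine(std::vector<ComputeInfo>& computes,
                  std::vector<double>& peLoad, double avgLoad) {
    std::sort(computes.begin(), computes.end(),
              [](const ComputeInfo& a, const ComputeInfo& b) {
                  return a.load > b.load;            // heaviest first
              });
    for (ComputeInfo& c : computes) {
        if (peLoad[c.pe] <= 1.05 * avgLoad) continue;  // source not overloaded
        int dest = pickSuitablePe(c, peLoad, avgLoad);
        if (dest < 0) break;                           // nowhere to move
        peLoad[c.pe] -= c.load;                        // migrate the compute
        peLoad[dest] += c.load;
        c.pe = dest;
    }
}
```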

Choice of a suitable processor
Among underloaded processors, try in decreasing order of priority to:
1. Find a processor with both patches or their proxies (highest priority)
2. Find a processor with one patch or a proxy
3. Pick any underloaded processor (lowest priority)
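Continuing the previous sketch, the three-tier preference on this slide might look like the following; hasPatchOrProxy() is an assumed lookup into the patch and proxy tables.

```cpp
#include <vector>

struct ComputeInfo { int id; double load; int pe; int patch1, patch2; };

bool hasPatchOrProxy(int pe, int patch);   // assumed lookup

int pickSuitablePe(const ComputeInfo& c, const std::vector<double>& peLoad,
                   double avgLoad) {
    int best = -1, bestTier = 3;
    for (int pe = 0; pe < (int)peLoad.size(); ++pe) {
        if (peLoad[pe] >= avgLoad) continue;        // only underloaded PEs
        bool p1 = hasPatchOrProxy(pe, c.patch1);
        bool p2 = hasPatchOrProxy(pe, c.patch2);
        int tier = (p1 && p2) ? 0                   // both patches/proxies
                 : (p1 || p2) ? 1                   // one patch or proxy
                 : 2;                               // any underloaded PE
        if (tier < bestTier) { bestTier = tier; best = pe; }
    }
    return best;   // -1 if no processor is underloaded
}
```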

Load Balancing Metrics
- Load balance: bring the max-to-average load ratio close to 1
- Communication volume: minimize the number of proxies
- Communication traffic: minimize hop-bytes, where hop-bytes = message size × distance traveled by the message

Reference: Agarwal, T., Sharma, A. and Kale, L. V., Topology-aware task mapping for reducing communication contention on large parallel machines. In Proceedings of IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, April 2006.
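A worked instance of the hop-bytes metric on a 3D torus (illustrative): a 4 KB message that travels 3 hops contributes 12 KB of hop-bytes. Minimizing the total reduces how many links each byte occupies, and hence contention.

```cpp
#include <cstdlib>

// Hops between two processors on a 3D torus with the given dimensions.
int torusHops(const int a[3], const int b[3], const int dims[3]) {
    int hops = 0;
    for (int d = 0; d < 3; ++d) {
        int diff = std::abs(a[d] - b[d]);
        int wrap = dims[d] - diff;                 // going the other way
        hops += diff < wrap ? diff : wrap;
    }
    return hops;
}

// Hop-bytes = message size x distance traveled by the message.
long hopBytes(long messageBytes, int hops) { return messageBytes * hops; }
```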

Results: Hop-bytes (figure)

Results: Performance (figure)

Simulation of WW Domain
- WW: a 30,591-atom simulation on NCSA's Abe cluster

Reference: Freddolino, P. L., Liu, F., Gruebele, M. and Schulten, K., Ten-microsecond MD simulation of a fast-folding WW domain. Biophysical Journal 94: L75-L77, 2008.

Future Work
- A scalable distributed load-balancing strategy
- A generalized multicast scenario: each object is the target of multiple multicasts; use topological information to minimize communication
- Understanding in detail the effect of various factors on load balancing

NAMD Development Team:
- Parallel Programming Lab, UIUC: Abhinav Bhatele, Sameer Kumar, David Kunzman, Chee Wai Lee, Chao Mei, Gengbin Zheng, Laxmikant V. Kale
- Theoretical and Computational Biophysics Group: Jim Phillips, Klaus Schulten

Acknowledgments: Argonne National Laboratory; Pittsburgh Supercomputing Center (Shawn Brown, Chad Vizino, Brian Johanson); TeraGrid