Welcome to the 2016 Charm++ Workshop!

Presentation transcript:

Welcome to the 2016 Charm++ Workshop!
Laxmikant (Sanjay) Kale
http://charm.cs.illinois.edu
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

A couple of forks
Two forks distinguish MPI + X and "task models": overdecomposition + migratability, and asynchrony. Overdecomposition and migratability give the most adaptivity.

Overdecomposition
Decompose the work units and data units into many more pieces than execution units (cores, nodes, ...). Not so hard: we do decomposition anyway.
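As a concrete illustration (not from the slides), a minimal Charm++ sketch of overdecomposition might look like the following; the module name jacobi, the class names Main and Block, and the 8x factor are all illustrative assumptions.

// A minimal sketch of overdecomposition (module and class names are
// illustrative). Assumed interface file, e.g. jacobi.ci:
//   mainmodule jacobi {
//     mainchare Main   { entry Main(CkArgMsg *m); };
//     array [1D] Block { entry Block(); entry void compute(); };
//   };

#include "jacobi.decl.h"   // generated from the .ci file above

class Main : public CBase_Main {
public:
  Main(CkArgMsg *m) {
    // Create many more work/data units (chares) than processors;
    // the runtime maps them onto PEs and may remap them later.
    int numBlocks = 8 * CkNumPes();
    CProxy_Block blocks = CProxy_Block::ckNew(numBlocks);
    blocks.compute();            // broadcast to every element
    CkExitAfterQuiescence();     // exit once all work has drained
    delete m;
  }
};

class Block : public CBase_Block {
public:
  Block() {}
  Block(CkMigrateMessage *m) {}  // used when an element migrates in
  void compute() { /* work on this block's piece of the data */ }
};

#include "jacobi.def.h"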

Migratability
Allow these work and data units to be migratable at runtime, i.e. the programmer or the runtime can move them.
Consequences for the app developer: communication must now be addressed to logical units with global names, not to physical processors. But this is a good thing.
Consequences for the RTS: it must keep track of where each unit is; naming and location management.
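One way this shows up in Charm++ is the PUP framework: an object that can serialize its state can be moved by the runtime. A minimal sketch follows, continuing the illustrative Block element from the earlier sketch; the members data and iteration are assumptions, and depending on the Charm++ version the superclass pup may also need to be called.

// Continuing the illustrative Block element: it becomes migratable by
// serializing its state with the PUP framework, so the runtime (or the
// programmer) can move it between PEs at runtime.
#include "pup_stl.h"   // PUP operators for STL containers
#include <vector>

class Block : public CBase_Block {
  std::vector<double> data;   // this unit's piece of the problem
  int iteration;

public:
  Block() : iteration(0) {}
  Block(CkMigrateMessage *m) {}   // constructor run on the destination PE

  // Called by the runtime to size, pack (before migration), and unpack
  // (after arrival) the element; one routine serves all three purposes.
  void pup(PUP::er &p) {
    p | data;
    p | iteration;
  }
};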

Asynchrony: Message-Driven Execution
Now you have multiple units on each processor, and they address each other via logical names. Need for scheduling: in what sequence should the work units execute? One answer: let the programmer sequence them (seen in current codes, e.g. some AMR frameworks). Message-driven execution: let the work unit that happens to have data (a "message") available for it execute next, and let the RTS select among the ready work units. The programmer should not specify what executes next, but can influence it via priorities.

Charm++
Charm++ began as an adaptive runtime system for dealing with application variability: dynamic load imbalances. Task parallelism came first (state-space search), then iterative (but irregular/dynamic) apps in the mid-1990s. But it turns out to be useful for future hardware, which is also characterized by variability.

Message-driven Execution
A[..].foo(…)
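A hedged sketch of what the slide's A[..].foo(…) invocation looks like in Charm++-flavored C++; the file names and the payload (step, value) are illustrative assumptions.

// Assumed interface file declaration:
//   array [1D] A { entry A(); entry void foo(int step, double value); };

#include "a.decl.h"   // generated from the assumed a.ci above

class A : public CBase_A {
public:
  A() {}
  A(CkMigrateMessage *m) {}

  // Receiver side: the scheduler runs foo when its message is picked
  // from the queue on whichever PE currently hosts this element.
  void foo(int step, double value) {
    // ... process the incoming data ...
  }
};

// Sender side, from any chare on any PE: asynchronous and
// location-transparent; returns immediately after queuing a message.
void sendWork(CProxy_A aProxy, int i, int step, double value) {
  aProxy[i].foo(step, value);
}

#include "a.def.h"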

Empowering the RTS
You can have asynchrony without overdecomposition or vice versa, and you can have migratability without asynchrony, but you need all three (overdecomposition, asynchrony, migratability) to empower the RTS. You then need to add introspection and adaptivity to make a powerful Adaptive Runtime System. The Adaptive RTS can: dynamically balance loads; optimize communication (spread it over time, asynchronous collectives); provide automatic latency tolerance; prefetch data with almost perfect predictability.
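The "dynamically balance loads" point corresponds to Charm++'s measurement-based load balancing. A minimal sketch, again continuing the illustrative Block element (the step entry method, the member iteration, and the rebalancing period are assumptions): the element pauses at an iteration boundary with AtSync(), the runtime may migrate elements based on measured load, and work resumes in ResumeFromSync().

// Sketch of periodic, measurement-based load balancing
// (assumes step() is declared as an entry method in the .ci file).
class Block : public CBase_Block {
  int iteration;
public:
  Block() : iteration(0) {
    usesAtSync = true;               // opt in to AtSync-style load balancing
  }
  Block(CkMigrateMessage *m) {}

  void step() {
    // ... do this iteration's work ...
    ++iteration;
    if (iteration % 20 == 0) {
      AtSync();                      // let the runtime rebalance if it wants to
    } else {
      thisProxy[thisIndex].step();   // continue asynchronously
    }
  }

  void ResumeFromSync() {            // invoked after (possible) migration
    thisProxy[thisIndex].step();
  }

  void pup(PUP::er &p) { p | iteration; }   // state needed for migration
};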

What Do RTSs Look Like: Charm++

PPL Highlights of the Last Year
Petascale applications made excellent progress: ChaNGa, NAMD, EpiSimdemics, OpenAtom. They are all current, past, or upcoming PRAC applications, selected by NSF for large allocations for science on Blue Waters!

External Evaluation of Charm++
Sandia@Livermore evaluated Charm++: Robert Clay, Janine Bennett, David Hollman, Jeremiah Wilke, and the Sandia team. They selected Charm++ along with Legion and Uintah. A week-long exploration by a team, with Eric Mikida and Nikhil Jain from PPL: mini-aero was implemented, with load balancing, resilience, etc.! See the Sandia report.
Intel exploration continues: Tim Mattson, Robert Wijngaart, [Jeff Hammond]. A summer intern implemented the PRK benchmarks.

EpiSimdemics
Simulation of epidemics: a collaboration with Madhav Marathe et al. at Virginia Tech, and Livermore. Converted from the original MPI to Charm++. Recent results scale to most of Blue Waters, with many optimizations that exploit the asynchrony of Charm++.

Charmworks, Inc.
A path to long-term sustainability of Charm++: a commercially supported version, with Charmworks focusing on 10-1000 nodes. Existing collaborative apps (NAMD, OpenAtom) continue with the same licensing as before. The university version continues to be distributed freely, in source code form, for non-profits. Code base: we are committed to avoiding divergence for a few years, and the Charmworks codebase will be streamlined. We will be happy to take your feedback.

Charmworks Contributions
Past or ongoing relevant work: Eclipse plugin; Charmdebug improvements; significantly improved, robust parsing of .ci files; packaging scripts (spack, …); GPU manager with shared-memory nodes; Accel framework; default parameter choices; automation of checkpoint/restart scheduling; Metabalancer integration; performance report.

Graduating Doctoral Students!
In the first half of 2016, mostly: Nikhil Jain (LLNL), Jonathan Lifflander (Sandia @ Livermore), Xiang Ni (IBM Research), Phil Miller (Charmworks), Harshitha Menon.

Workshop Overview
Keynotes: Barbara Chapman (today), Thomas Sterling (tomorrow morning). Invited talks: applications. Charm++ features and capabilities: within-node parallelism, AMPI, … Panel: Higher Level Abstractions.