Component Frameworks
Laxmikant (Sanjay) Kale
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu

Motivation
Parallel computing in science and engineering:
- Competitive advantage
- Pain in the neck
- Necessary evil
It is not so difficult, but it is tedious and error-prone, and it raises new issues: race conditions, load imbalances, modularity in the presence of concurrency, ...
Just have to bite the bullet, right?

But wait…
Parallel computation structures:
- The set of parallel applications is diverse and complex.
- Yet the underlying parallel data structures and communication structures are small in number: structured and unstructured grids, trees (AMR, ...), particles, interactions between these, space-time.
- One should be able to reuse those structures, and avoid doing the same parallel programming again and again.

A second idea
- Many problems require dynamic load balancing, and we should be able to reuse load rebalancing strategies.
- It should be possible to separate load balancing code from application code.
- This strategy is embodied in Charm++: express the program as a collection of interacting entities (objects), and let the system control their mapping to processors.

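In Charm++ this separation is concrete: an array element opts into load balancing and supplies a serializer, while the balancing strategy itself lives in the runtime and can be swapped without touching application code. A minimal sketch of those hooks, in current Charm++ syntax (the surrounding module and interface-file declarations are omitted; Piece, state, and step() are illustrative names, not from the talk):

  #include "pup_stl.h"     // PUP support for std::vector
  #include <vector>

  // Sketch: load-balancing hooks inside a Charm++ array element.
  class Piece : public CBase_Piece {
    std::vector<double> state;        // this piece's share of the data
  public:
    Piece() { usesAtSync = true; }    // opt in to AtSync load balancing
    Piece(CkMigrateMessage*) {}       // migration constructor
    void pup(PUP::er& p) {            // serialize state so the runtime can move us
      CBase_Piece::pup(p);
      p | state;
    }
    void step() {
      // ... one iteration of application physics/numerics ...
      AtSync();                       // hand control to the load balancer
    }
    void ResumeFromSync() {           // runtime resumes us here, possibly on another PE
      thisProxy[thisIndex].step();    // continue the time loop
    }
  };

The application never names a load-balancing strategy; it only marks safe points (AtSync) and describes its state (pup), so strategies can be reused across applications.
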
Charm Component Frameworks
Built on Charm++ capabilities:
- Object-based decomposition
- Reuse of specialized parallel structures
- Load balancing
- Automatic checkpointing
- Flexible use of clusters
- Out-of-core execution

Current Set of Component Frameworks
- FEM / unstructured meshes: “mature”, with several applications already
- Multiblock: multiple structured grids; new, but very promising
- AMR: octrees and quadtrees

Multiblock Constituents

Terminology

Multi-partition decomposition
- Idea: divide the computation into a large number of pieces, independent of (and typically larger than) the number of processors.
- Let the system map entities to processors.

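A minimal sketch of this over-decomposition pattern in Charm++ (current syntax; the module and class names and the 8x factor are illustrative, not from the talk). The interface file declares a chare array; the main chare creates many more array elements than processors, and the runtime decides where each element lives:

  // pieces.ci — Charm++ interface file
  mainmodule pieces {
    readonly CProxy_Main mainProxy;
    mainchare Main {
      entry Main(CkArgMsg* m);
      entry [reductiontarget] void done();
    };
    array [1D] Piece {
      entry Piece();
      entry void start();
    };
  };

  // pieces.C — create many more pieces than processors; the runtime maps them
  #include "pieces.decl.h"

  CProxy_Main mainProxy;

  class Main : public CBase_Main {
  public:
    Main(CkArgMsg* m) {
      delete m;
      mainProxy = thisProxy;
      int numPieces = 8 * CkNumPes();       // e.g., 8 pieces per processor
      CProxy_Piece pieces = CProxy_Piece::ckNew(numPieces);
      pieces.start();                       // broadcast to all pieces
    }
    void done() { CkExit(); }
  };

  class Piece : public CBase_Piece {
  public:
    Piece() {}
    Piece(CkMigrateMessage*) {}
    void start() {
      CkPrintf("Piece %d running on PE %d\n", thisIndex, CkMyPe());
      // ... compute on this piece's portion of the data ...
      contribute(CkCallback(CkReductionTarget(Main, done), mainProxy));
    }
  };

  #include "pieces.def.h"

Because the pieces are first-class objects rather than processor ranks, the runtime is free to place them, and later migrate them, for load balance.
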
Component Frameworks: Using the Load Balancing Framework
[Diagram: Structured FEM and MPI-on-Charm (Irecv+; automatic conversion from MPI; cross-module interpolation) connect through a framework path and a migration path to the load database + balancer, layered on Charm++ and Converse.]

Finite Element Framework Goals
- Hide the parallel implementation in the runtime system.
- Allow adaptive parallel computation and dynamic, automatic load balancing.
- Leave physics and numerics to the user.
- Present a clean, “almost serial” interface:

Serial code for the entire mesh:
  begin time loop
    compute forces
    update node positions
  end time loop

Framework code for one mesh partition:
  begin time loop
    compute forces
    communicate shared nodes
    update node positions
  end time loop

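Concretely, the user's driver() for one partition looks almost like the serial loop, with a single framework call where shared nodes are communicated. A sketch in the style of the FEM framework's C interface from this era; the exact call signatures, the field parameters, and the computeForces/updatePositions kernels are assumptions for illustration:

  #include "fem.h"       // Charm++ FEM framework (legacy C interface)
  #include <vector>

  // Illustrative user kernels; physics and numerics stay with the user.
  static void computeForces(std::vector<double>& f, const std::vector<double>& x) { /* ... */ }
  static void updatePositions(std::vector<double>& x, const std::vector<double>& f) { /* ... */ }

  extern "C" void driver(void) {
    const int nSteps = 100;       // assumed iteration count
    const int nNodes = 1000;      // in practice, queried from the framework's mesh calls
    std::vector<double> pos(nNodes), force(nNodes);

    // Describe the per-node force array so the framework can combine
    // contributions to nodes shared between partitions.
    int fid = FEM_Create_field(FEM_DOUBLE, 1, 0, sizeof(double));

    for (int t = 0; t < nSteps; ++t) {        // the "almost serial" time loop
      computeForces(force, pos);
      FEM_Update_field(fid, force.data());    // communicate shared nodes
      updatePositions(pos, force);
    }
  }
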
FEM Framework: Responsibilities
- FEM application: initialization, registration of nodal attributes, loops over elements, finalization
- FEM framework: update of nodal properties, reductions over nodes or partitions
- Partitioner / combiner (using METIS), I/O
- Charm++: dynamic load balancing, communication

Structure of an FEM Application
[Diagram: init() runs once; driver() runs on each mesh partition, alternating local updates with communication of shared nodes; finalize() runs at the end.]

Dendritic Growth
- Studies the evolution of solidification microstructures using a phase-field model computed on an adaptive finite element grid.
- Adaptive refinement and coarsening of the grid involves re-partitioning.

Crack Propagation
Decomposition into 16 chunks (left) and 128 chunks, 8 per PE (right). The middle area contains cohesive elements. Both decompositions were obtained using METIS.
Pictures: S. Breitenfeld and P. Geubelle

“Overhead” of Multipartitioning

Load Balancer in Action
Automatic load balancing in crack propagation:
1. Elements added
2. Load balancer invoked
3. Chunks migrated

Parallel Collision Detection
- Detect collisions (intersections) between objects scattered across processors.
- Approach, based on Charm++ arrays (see the sketch below):
  - Overlay a regular, sparse 3D grid of voxels (boxes).
  - Send objects to all voxels they touch.
  - Collide voxels independently and collect results.
  - Leave collision response to user code.

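A self-contained C++ sketch of the voxel-overlay idea (the AABB representation, voxel size, and all names are illustrative; in the real framework the voxels are Charm++ array elements spread across processors and collided concurrently):

  #include <cmath>
  #include <cstdio>
  #include <map>
  #include <tuple>
  #include <vector>

  struct AABB { double lo[3], hi[3]; int id; };
  using VoxelKey = std::tuple<int,int,int>;

  int main() {
    const double voxel = 1.0;                       // voxel edge length (assumed)
    std::vector<AABB> objs = {
      {{0.2,0.2,0.2},{0.8,0.8,0.8},0},
      {{0.6,0.6,0.6},{1.4,1.4,1.4},1},              // straddles voxel boundaries
    };

    // Scatter: register each object with every voxel its box overlaps.
    std::map<VoxelKey, std::vector<int>> voxels;
    for (const AABB& b : objs)
      for (int i = (int)std::floor(b.lo[0]/voxel); i <= (int)std::floor(b.hi[0]/voxel); ++i)
        for (int j = (int)std::floor(b.lo[1]/voxel); j <= (int)std::floor(b.hi[1]/voxel); ++j)
          for (int k = (int)std::floor(b.lo[2]/voxel); k <= (int)std::floor(b.hi[2]/voxel); ++k)
            voxels[{i,j,k}].push_back(b.id);

    // Collide each voxel independently; only objects sharing a voxel are tested.
    for (const auto& [key, ids] : voxels)
      for (size_t a = 0; a + 1 < ids.size(); ++a)
        for (size_t c = a + 1; c < ids.size(); ++c) {
          const AABB &A = objs[ids[a]], &B = objs[ids[c]];
          bool hit = true;
          for (int d = 0; d < 3; ++d)
            hit = hit && A.lo[d] <= B.hi[d] && B.lo[d] <= A.hi[d];
          if (hit) std::printf("objects %d and %d may collide\n", A.id, B.id);
        }
  }

A pair of objects that shares several voxels is reported once per voxel here; a full implementation would deduplicate such pairs when collecting results.
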
Collision Detection Speed
- O(n) serial performance: 2 µs per polygon on a single Linux PC.
- Good speedups to 1000s of processors: on ASCI Red, a scaled problem with 65,000 polygons per processor (up to 100 million polygons).

Rocket Simulation
Our approach:
- Multi-partition decomposition
- Data-driven objects (Charm++)
- Automatic load balancing framework
- AMPI: migration path for existing MPI + Fortran90 codes (ROCFLO, ROCSOLID, and ROCFACE); see the sketch below.

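From the application side, the AMPI migration path leaves the MPI code essentially unchanged: it is compiled with AMPI's compiler wrappers and run with more virtual processors than physical ones, with a periodic chance to migrate. A hedged sketch; the call shown follows current AMPI (older versions spelled it MPI_Migrate()), and the loop body and interval are illustrative:

  // Sketch: an ordinary MPI time loop made migratable under AMPI.
  // Build with AMPI's wrapper and run with over-decomposition,
  // e.g. 32 virtual ranks on 4 PEs: ./a.out +p4 +vp32
  #include <mpi.h>

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // each virtual rank owns a partition

    for (int step = 0; step < 1000; ++step) {
      // ... ROCFLO/ROCSOLID-style physics on this rank's partition ...
      if (step % 50 == 0)                   // illustrative interval
        AMPI_Migrate(AMPI_INFO_LB_SYNC);    // let the runtime move ranks for balance
    }

    MPI_Finalize();
    return 0;
  }
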