An Orchestration Language for Parallel Objects

An Orchestration Language for Parallel Objects
Laxmikant Kalé, Mark Hills, Chao Huang
Parallel Programming Lab, University of Illinois

This talk presents an orchestration language for parallel objects. Charm++/AMPI and the concept of processor virtualization with migratable objects have proven themselves in a range of successful applications. In very complicated parallel programs, however, the overall flow of control becomes obscured by the sheer number of parallel objects and their asynchronous method invocations. To address this, we designed and developed an orchestration language that expresses the global view of a parallel program's control flow.

Outline
- Motivation: Charm++ and virtualization
- Language design: program structure, orchestration statements, communication patterns, code example
- Implementation: Jade and MSA
- Future work

Motivation
- Charm++/AMPI and migratable parallel objects (virtual processors)
  - The user partitions the work into parallel objects, typically many more than the number of physical processors
  - The runtime system (RTS) maps the objects onto processors: the user view is decoupled from the system implementation
- Asynchronous method invocation on Chares and ChareArray elements
- Decomposition is done by the programmer; everything else, such as mapping and scheduling, is automated
- High productivity and performance come from seeking the optimal division of labor between the programmer and the system
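As a minimal sketch of this model (illustrative code, not from the talk; the module name worker, the class Worker, and the method doWork are all made-up examples), a Charm++ program declares a chare array in an interface (.ci) file and then invokes entry methods on its elements asynchronously:

  /* worker.ci -- hypothetical Charm++ interface file
     mainmodule worker {
       mainchare Main { entry Main(CkArgMsg*); };
       array [1D] Worker {
         entry Worker();
         entry void doWork(int lo, int hi);
       };
     };
  */

  // worker.C -- corresponding C++ code
  #include "worker.decl.h"

  class Worker : public CBase_Worker {
  public:
    Worker() {}
    void doWork(int lo, int hi) {
      // Sequential user code for the range [lo, hi) would go here.
    }
  };

  class Main : public CBase_Main {
  public:
    Main(CkArgMsg* m) {
      delete m;
      // Create many more objects than processors; the RTS places them.
      CProxy_Worker workers = CProxy_Worker::ckNew(1000);
      // Asynchronous method invocation: each call returns immediately and is
      // delivered to element i wherever the RTS has placed (or migrated) it.
      for (int i = 0; i < 1000; i++) workers[i].doWork(1, 100);
      // Omitting the index broadcasts doWork() to every element.
      workers.doWork(1, 100);
      CkExit();  // a real program would exit only after detecting completion
    }
  };

  #include "worker.def.h"

Because each invocation is delivered to the target element wherever it currently resides, the runtime is free to migrate objects and balance load transparently.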

Motivation (cont.)
- Rocket simulation example under traditional MPI vs. the Charm++/AMPI framework
  (figure: under MPI, Solid and Fluid partitions 1..P are paired on processors 1..P; under Charm++/AMPI, independent object arrays Solid[1..n] and Fluid[1..m])
- Benefits: load balancing, communication optimizations, modularity
- Problem: the flow of control is buried in asynchronous method invocations

In the traditional MPI paradigm, the number of partitions of each module is typically equal to the number of processors P, and although the i-th elements of the fluid and solid modules are not geometrically connected in the simulation, they are glued together on the i-th processor. Under the Charm++/AMPI framework, the two modules each get their own set of parallel objects, and the sizes of the two arrays are neither restricted nor related. This yields performance optimizations and better modularity; the drawback is that, because of asynchronous method invocation, the flow of control is buried deep inside the object code.

Motivation (cont.)
- Car-Parrinello Ab Initio Molecular Dynamics (CPAIMD)
- The overall flow of control is complicated, with concurrent operations among different sets of parallel objects (e.g., a 100x100x100 grid with 128 electronic states)
- It would be ideal to have a higher-level specification of the control flow

Language Design
A program consists of:
- Orchestration (.or) code: chare array declarations, orchestration with parallel constructs, and the global flow of control
- User code: user variables and sequential methods

The user code contains as little parallel control flow as possible; it holds the physics and the sequential computation.

Language Design (cont.)
Array creation:
  classes
    MyArrayType : ChareArray1D;
    Pairs : ChareArray2D;
  end-classes
  vars
    myWorkers : MyArrayType[10];
    myPairs : Pairs[8][8];
    otherPairs : Pairs[2][2];
  end-vars
Invoking a method on an array:
  myWorkers[i].foo();
  myWorkers.foo();
Omitting the index invokes the method on all elements.

Language Design (cont.)
Orchestration statements: forall
  forall i in myWorkers
    myWorkers[i].doWork(1,100);
  end-forall
Whole set of elements (abbreviated form):
  forall in myWorkers doWork(1,100);
Subset of elements (strided index ranges):
  forall i:0:10:2 in myWorkers
  forall <i,j:0:8:2> in myPairs
Similar to, but distinct from, HPF's FORALL: it ranges over an object array rather than a data array.

Language Design (cont.)
Orchestration statements: overlap
  overlap
    forall i in worker1
      ...
    end-forall
    forall i in worker2
      ...
    end-forall
  end-overlap
Used when two or more foralls follow one another and no barrier is needed between them, so they may proceed concurrently. Useful in multiple-time-stepping algorithms.

Language Design (cont.)
Communication patterns: input and output of method invocations
  forall i in workers
    <.., q[i], ..> := workers[i].f(.., p[(i+1)%N], ..);
  end-forall
The method workers::f produces the value q and consumes the value p (p and q may overlap). Producer-consumer model: values of p and q can be used as soon as they are made available during method execution. A method produces the value with its own index i; the index expression e(i) of a consumed value must be an affine expression.

Language Design (cont.)
Communication patterns
Point-to-point:
  <p[i]> := A[i].f(..);
  <..> := B[i].g(p[i]);
Multicast:
  <p[i]> := A[i].f(...);
  <...> := B[i].g(p[i-1], p[i], p[i+1]);

Language Design (cont.)
Communication patterns
Reduction:
  <.., +e, ..> := B[i,j].g(..);
All-to-all:
  forall i in A
    <rows[i, j:0:N-1]> := A[i].1Dforward(...);
  end-forall
  forall k in B
    ... := B[k].2Dforward(rows[l:0:N-1, k]);
  end-forall

Language Design (cont.)
Code example: 1D Jacobi
  begin
    forall i in J
      <lb[i], rb[i]> := J[i].init();
    end-forall
    while (e > threshold)
      <+e, lb[i], rb[i]> := J[i].compute(rb[i-1], lb[i+1]);
    end-while
  end
The user code specifies how to produce the output values via publish calls. Compare this with what the same program would look like written directly in Charm++.
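For contrast, the following is a rough, hypothetical sketch of how the same 1D Jacobi might be structured when written directly in Charm++. Everything here is assumed for illustration (the class names Jacobi and Main, the entry methods startIteration/recvGhost/checkError, periodic boundaries, and the sizes NUM_CHARES and N_LOCAL); the point is that the while loop of the orchestration code dissolves into asynchronous entry methods and a reduction callback:

  /* jacobi.ci -- hypothetical interface file
     mainmodule jacobi {
       readonly CProxy_Main mainProxy;
       mainchare Main {
         entry Main(CkArgMsg*);
         entry [reductiontarget] void checkError(double maxErr);
       };
       array [1D] Jacobi {
         entry Jacobi();
         entry void startIteration();
         entry void recvGhost(int fromSide, double value);
       };
     };
  */

  // jacobi.C
  #include <vector>
  #include <cmath>
  #include <algorithm>
  #include "jacobi.decl.h"

  static const int NUM_CHARES = 16;   // number of array elements (assumed)
  static const int N_LOCAL    = 64;   // points per element (assumed)
  enum { LEFT = 0, RIGHT = 1 };
  CProxy_Main mainProxy;              // readonly, set by Main

  class Jacobi : public CBase_Jacobi {
    std::vector<double> u, unew;      // local strip plus two ghost cells
    int ghostsArrived;
  public:
    Jacobi() : u(N_LOCAL + 2, 0.0), unew(N_LOCAL + 2, 0.0), ghostsArrived(0) {
      u[1] = thisIndex;               // arbitrary nonuniform initial data
    }
    void startIteration() {
      // Send boundary values to both neighbors (periodic for simplicity).
      thisProxy[(thisIndex + NUM_CHARES - 1) % NUM_CHARES].recvGhost(RIGHT, u[1]);
      thisProxy[(thisIndex + 1) % NUM_CHARES].recvGhost(LEFT, u[N_LOCAL]);
    }
    void recvGhost(int fromSide, double value) {
      u[fromSide == LEFT ? 0 : N_LOCAL + 1] = value;
      if (++ghostsArrived < 2) return;          // wait for both ghosts
      ghostsArrived = 0;
      double err = 0.0;
      for (int i = 1; i <= N_LOCAL; i++) {
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        err = std::max(err, std::fabs(unew[i] - u[i]));
      }
      u.swap(unew);
      // The "while (e > threshold)" of the orchestration code is no longer
      // visible here: the local error goes into a max-reduction, and a
      // callback on Main decides whether to run another iteration.
      contribute(sizeof(double), &err, CkReduction::max_double,
                 CkCallback(CkReductionTarget(Main, checkError), mainProxy));
    }
  };

  class Main : public CBase_Main {
    CProxy_Jacobi workers;
  public:
    Main(CkArgMsg* m) {
      delete m;
      mainProxy = thisProxy;
      workers = CProxy_Jacobi::ckNew(NUM_CHARES);
      workers.startIteration();                 // broadcast: begin iteration 1
    }
    void checkError(double maxErr) {
      if (maxErr > 1e-6) workers.startIteration();   // next iteration
      else CkExit();
    }
  };

  #include "jacobi.def.h"

The global iteration structure is now split across recvGhost() and checkError(); this is exactly the "flow of control buried in asynchronous method invocations" that the orchestration language is designed to make explicit again.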

Language Design (cont.)
CPAIMD revisited
- The overall flow of control is complicated, with concurrent operations among different sets of parallel objects
- It would be ideal to have a higher-level specification of the control flow

Implementation
- Jade
  - A Java-like parallel language supporting Chares and ChareArrays
  - Simple interface, everything in one file
  - Translated to Charm++ and compiled
- Multi-phase Shared Array (MSA)
  - A restricted shared-memory abstraction
  - Provides a global view of data to the parallel objects
  - Accesses are divided into phases: read, write, accumulate
  - Reduces synchronization traffic

Here, MSA is used as an initial implementation vehicle for the orchestration language.

Implementation (cont.)
Current implementation:
- The orchestration (.or) file is translated into Jade (.java)
- Chare array declarations, control-flow code, etc. are generated
- Sequential method definitions and additional user variables are integrated into the target file
- The result is translated as Jade, then compiled and run as a Charm++ program

Future Work
- Design details
  - MSA vs. message-based communication
  - Implicit method: inlining user code in the orchestration code
  - Support for sparse chare arrays
- Implementation
  - Dependence analysis
  - Producer-consumer communication
- Productivity
  - Interoperability with Charm++/AMPI
  - Integration with Charm++ modules and user libraries

Thank You
Parallel Programming Lab, University of Illinois
http://charm.cs.uiuc.edu