An Orchestration Language for Parallel Objects
Laxmikant Kalé, Mark Hills, Chao Huang
Parallel Programming Lab, University of Illinois

I’ll talk about our orchestration language project for parallel objects. Charm++/AMPI and the concept of processor virtualization with migratable objects have proven themselves in a range of successful applications. However, in some very complicated parallel programs written in this framework, we observed that the overall flow of control becomes obscure, because of the large number of parallel objects and asynchronous method invocations. To solve this problem, we are designing and developing an orchestration language that lets the programmer express a global view of the control flow of a parallel program.
Outline
Motivation
  Charm++ and Virtualization
Language Design
  Program Structure
  Orchestration Statements
  Communication Patterns
  Code Example
Implementation
  Jade and MSA
Future Work
5/26/2019 charm.cs.uiuc.edu
Motivation
Charm++/AMPI and migratable parallel objects (virtual processors, VPs)
  User partitions work into parallel objects
  RTS maps objects onto physical processors
  Asynchronous method invocation on Chares and ChareArray elements
[Figure: user view vs. system implementation]

The programmer does the decomposition; everything else, such as mapping and scheduling, is automated. The goal is high productivity and performance through an optimal division of labor between the programmer and the system. The number of parallel objects is typically larger than the number of processors.
Motivation (cont.)
Rocket simulation example under traditional MPI vs. the Charm++/AMPI framework
  Benefit: load balance, communication optimizations, modularity
  Problem: flow of control buried in asynchronous method invocations
[Figure: under MPI, Solid and Fluid partitions paired on processors 1, 2, ..., P; under Charm++/AMPI, independent objects Solid1 ... Solidn and Fluid1 ... Fluidm]

In the traditional MPI paradigm, the number of partitions of both modules is typically equal to the number of processors P. Although the i-th elements of the fluid module and the solid module are not connected geometrically in the simulation, they are glued together on the i-th processor. Under the Charm++/AMPI framework, the two modules each get their own set of parallel objects, and the sizes of the two arrays are neither restricted nor related to each other. The benefit is performance optimization and better modularity. The problem is that, because of asynchronous method invocation, the flow of control is buried deep in the object code.
Motivation (cont.)
Car-Parrinello Ab Initio Molecular Dynamics (CPAIMD)
[Figure: CPAIMD parallel structure, labeled 100*100*100 * 128 states]

The overall flow of control is complicated, with concurrent operations among different sets of parallel objects. It would be ideal to have a higher-level control flow specification.
Language Design
Program consists of
  Orchestration (.or) code
    Chare arrays declaration
    Orchestration with parallel constructs
    Global flow of control
  User code
    User variables
    Sequential methods

User code contains as little parallel control flow as possible; it holds, e.g., the physics and the computation.
Language Design (cont.)
Array creation

  classes
    MyArrayType : ChareArray1D;
    Pairs : ChareArray2D;
  end-classes
  vars
    myWorkers : MyArrayType[10];
    myPairs : Pairs[8][8];
    otherPairs : Pairs[2][2];
  end-vars

Invoking a method on an array

  myWorkers[i].foo();
  myWorkers.foo();

Omitting the index invokes the method on all elements.
Language Design (cont.)
Orchestration Statements
forall

  forall i in myWorkers
    myWorkers[i].doWork(1,100);
  end-forall

Whole set of elements (abbreviated)

  forall in myWorkers
    doWork(1,100);

Subset of elements

  forall i:0:10:2 in myWorkers
  forall <i,j:0:8:2> in myPairs

This is similar to the HPF FORALL, with one distinction: HPF FORALL iterates over a data array, whereas this forall iterates over an object array.
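The index-range form of forall can be emulated in a few lines of Python. This is only a sketch for illustration: `Worker`, `do_work`, and `forall` are hypothetical names, not part of the orchestration language or Charm++. It assumes `i:0:10:2` means start 0, end 10, stride 2, and uses Python's half-open range; the slide does not say whether the end is inclusive, but for a 10-element array both readings select the same elements because index 10 does not exist.

```python
class Worker:
    """Stand-in for one element of a chare array (hypothetical)."""

    def __init__(self, index):
        self.index = index
        self.result = None

    def do_work(self, lo, hi):
        # Placeholder sequential work: sum the integers lo..hi.
        self.result = sum(range(lo, hi + 1))


def forall(array, body, start=0, stop=None, step=1):
    """Apply body(i, element) to the selected elements of array.

    Assumption: the range is half-open, like Python's range().
    A real runtime would invoke the elements asynchronously; this
    sketch simply visits them one after another.
    """
    stop = len(array) if stop is None else stop
    visited = []
    for i in range(start, stop, step):
        body(i, array[i])
        visited.append(i)
    return visited


my_workers = [Worker(i) for i in range(10)]

# forall i:0:10:2 in myWorkers  ->  myWorkers[i].doWork(1,100)
subset = forall(my_workers, lambda i, w: w.do_work(1, 100), 0, 10, 2)
print(subset)
```

Only the even-indexed workers receive the call; the odd-indexed ones keep `result` set to `None`.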
Language Design (cont.)
Orchestration Statements
overlap

  overlap
    forall i in worker1
      ...
    end-forall
    forall i in worker2
      ...
    end-forall
  end-overlap

Normally, two or more foralls written one after the other execute in sequence. overlap is for the case where no barrier is needed between them. It is useful in multiple time stepping algorithms.
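One way to picture overlap is as two foralls submitted concurrently, with a single join at end-overlap. The Python sketch below uses threads purely as an analogy; `Counter`, `step`, and `forall` are hypothetical names, and the actual runtime is message-driven rather than thread-based.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Counter:
    """Stand-in for a chare-array element (hypothetical)."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def forall(array, method_name):
    # Invoke the named method on every element of the array.
    for element in array:
        getattr(element, method_name)()


worker1 = [Counter() for _ in range(4)]
worker2 = [Counter() for _ in range(6)]

# overlap ... end-overlap: both foralls may run at the same time.
# They touch disjoint object sets, so no barrier is needed between them.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(forall, worker1, "step"),
        pool.submit(forall, worker2, "step"),
    ]
    wait(futures)              # end-overlap: join both foralls
    for future in futures:
        future.result()        # re-raise any exception from a forall

print([c.steps for c in worker1], [c.steps for c in worker2])
```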
Language Design (cont.)
Communication Patterns
Input and output of method invocations

  forall i in workers
    <..,q[i],..> := workers[i].f(..,p[(i+1)%N],..);
  end-forall

Method workers::f produces the value q and consumes the value p (p and q can overlap).
Producer-consumer model: values of p and q can be used as soon as they are made available during the method execution.

A method produces the value with its own index i; the index expression e(i) of a consumed value must be an affine expression.
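A sequential Python emulation of the statement on this slide may make the indexing concrete. It is a sketch under stated assumptions: all p values already exist before the forall starts, and the method `f`, the size `N`, and the data are made up for illustration. A real implementation could let each q[i] be consumed as soon as it is published, instead of after the loop finishes.

```python
N = 4
p = [10, 20, 30, 40]   # values published earlier (made-up data)


def f(p_value):
    """Hypothetical sequential method: consumes one p, produces one q."""
    return 2 * p_value


# <q[i]> := workers[i].f(p[(i+1) % N])
# Element i consumes the value with index e(i) = (i + 1) % N and
# produces the value with its own index i.
q = {}
for i in range(N):
    q[i] = f(p[(i + 1) % N])

print(q)
```

Each element reads from its right-hand neighbour, and the last element wraps around to index 0.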
Language Design (cont.)
Communication Patterns
Point-to-point

  <p[i]> := A[i].f(..);
  <..> := B[i].g(p[i]);

Multicast

  <p[i]> := A[i].f(...);
  <...> := B[i].g(p[i-1], p[i], p[i+1]);
Language Design (cont.)
Communication Patterns
Reduction

  <..,+e,..> := B[i,j].g(..);

All-to-All

  forall i in A
    <rows[i,j:0:N-1]> := A[i].1Dforward(...);
  end-forall
  forall k in B
    ... := B[k].2Dforward(rows[l:0:N-1, k]);
  end-forall
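The two patterns on this slide can be emulated sequentially in Python. This is a sketch with hypothetical stand-ins: `forward_1d` and `forward_2d` play the roles of A[i].1Dforward and B[k].2Dforward, the data is made up, and the range `0:N-1` is read as including N-1, which is an assumption.

```python
N = 3


def forward_1d(i):
    """Hypothetical A[i].1Dforward: produces row i, entries rows[i, j]."""
    return [10 * i + j for j in range(N)]


def forward_2d(column):
    """Hypothetical B[k].2Dforward: consumes column k, returns its sum."""
    return sum(column)


# forall i in A:  <rows[i, j:0:N-1]> := A[i].1Dforward(...)
rows = {}
for i in range(N):
    for j, value in enumerate(forward_1d(i)):
        rows[i, j] = value

# forall k in B:  ... := B[k].2Dforward(rows[l:0:N-1, k])
# Every B[k] needs one entry from every A[i]: an all-to-all exchange.
partial = [forward_2d([rows[l, k] for l in range(N)]) for k in range(N)]

# <+e> := ...   the '+' marks a sum reduction over all contributions.
e = sum(partial)

print(partial, e)
```

A[i] publishes along rows and B[k] consumes along columns, so the index pattern alone tells the system that a transpose-style exchange is required.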
Language Design (cont.)
Code Example
Jacobi 1D

  begin
    forall i in J
      <lb[i],rb[i]> := J[i].init();
    end-forall
    while (e > threshold)
      forall i in J
        <+e, lb[i], rb[i]> := J[i].compute(rb[i-1],lb[i+1]);
      end-forall
    end-while
  end

User code specifies how to produce the output values via publish calls.

Compare this with what the same program would look like in Charm++.
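The control flow of the Jacobi 1D example can be mirrored in plain Python to show what each statement contributes. This is a sequential sketch, not the generated code: `JacobiChunk` and `run_jacobi` are hypothetical names, the fixed boundary values for the first and last chunk and the initial value of e are assumptions the slide does not specify, and the error is taken to be the summed absolute change.

```python
class JacobiChunk:
    """Hypothetical chare-array element holding one slice of the 1D grid."""

    def __init__(self, values):
        self.u = list(values)

    def init(self):
        # Publish the left and right boundary values of this chunk.
        return self.u[0], self.u[-1]

    def compute(self, left_ghost, right_ghost):
        # One Jacobi sweep: each point becomes the mean of its neighbours.
        padded = [left_ghost] + self.u + [right_ghost]
        new_u = [(padded[k - 1] + padded[k + 1]) / 2.0
                 for k in range(1, len(padded) - 1)]
        error = sum(abs(a - b) for a, b in zip(new_u, self.u))
        self.u = new_u
        return error, self.u[0], self.u[-1]


def run_jacobi(chunks, left_bc, right_bc, threshold, max_iters=10000):
    n = len(chunks)
    lb, rb = [None] * n, [None] * n
    # forall i in J:  <lb[i], rb[i]> := J[i].init()
    for i, chunk in enumerate(chunks):
        lb[i], rb[i] = chunk.init()
    e = float("inf")   # assumption: e starts above the threshold
    iters = 0
    while e > threshold and iters < max_iters:
        # Consume the values published in the previous iteration.
        old_lb, old_rb = list(lb), list(rb)
        e = 0.0
        for i, chunk in enumerate(chunks):
            left = old_rb[i - 1] if i > 0 else left_bc
            right = old_lb[i + 1] if i < n - 1 else right_bc
            # <+e, lb[i], rb[i]> := J[i].compute(rb[i-1], lb[i+1])
            err, lb[i], rb[i] = chunk.compute(left, right)
            e += err   # '+e' is a sum reduction over all chunks
        iters += 1
    return iters, e


chunks = [JacobiChunk([0.0] * 4) for _ in range(3)]
iters, e = run_jacobi(chunks, left_bc=0.0, right_bc=1.0, threshold=1e-6)
solution = [x for chunk in chunks for x in chunk.u]
print(iters, e)
```

With boundary values 0 and 1, the 12 interior points converge toward the straight line between them, which gives a simple correctness check.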
Language Design (cont.)
CPAIMD Revisited

The overall flow of control is complicated, with concurrent operations among different sets of parallel objects. It would be ideal to have a higher-level control flow specification.
Implementation
Jade
  Java-like parallel language supporting Chares and ChareArrays
  Simple interface, everything in one file
  Translated to Charm++ and compiled
Multi-phase Shared Array (MSA)
  Restricted shared-memory abstraction
  Provides global view of data to parallel objects
  Accesses are divided into phases: read, write, accumulate
  Reduced synchronization traffic

We are using MSA as an initial implementation vehicle for our orchestration language.
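The phase discipline of MSA can be illustrated with a toy Python class. This is not the MSA API: `PhasedArray` and its methods are hypothetical, and the sketch only shows the idea that every access must match the current phase and that the phase changes at an explicit synchronization point.

```python
class PhasedArray:
    """Toy array whose accesses are restricted to one phase at a time."""

    PHASES = ("read", "write", "accumulate")

    def __init__(self, size, phase="write"):
        self._data = [0] * size
        self._phase = self._check(phase)

    def _check(self, phase):
        if phase not in self.PHASES:
            raise ValueError("unknown phase: " + phase)
        return phase

    def _require(self, phase):
        if self._phase != phase:
            raise RuntimeError(f"{phase} access during {self._phase} phase")

    def sync(self, next_phase):
        # In a real system all client objects would synchronize here.
        self._phase = self._check(next_phase)

    def get(self, i):
        self._require("read")
        return self._data[i]

    def set(self, i, value):
        self._require("write")
        self._data[i] = value

    def accumulate(self, i, value):
        self._require("accumulate")
        self._data[i] += value


arr = PhasedArray(4)            # starts in the write phase
arr.set(0, 5)
arr.sync("accumulate")
arr.accumulate(0, 2)
arr.accumulate(0, 3)
arr.sync("read")
print(arr.get(0))
```

Because accesses within one phase are all of the same kind, synchronization is only needed at phase boundaries, which is where the reduced synchronization traffic comes from.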
Implementation (cont.)
Current implementation
  Orchestration code (.or) file is translated into Jade (.java)
  Chare array declarations, control flow code, etc. are generated
  Sequential method definitions and additional variables are integrated into the target file
  The result is translated as Jade, then compiled and run as a Charm++ program
Future Work
Design details
  MSA vs. message-based communication
  Implicit method: inlining user code in orchestration
  Support for sparse chare arrays
Implementation
  Dependence analysis
  Producer-consumer communication
Productivity
  Interoperability with Charm++/AMPI
  Integrate libraries

Integration with Charm modules and user libraries.
Thank You
Parallel Programming Lab at University of Illinois
http://charm.cs.uiuc.edu