Charisma: Orchestrating Migratable Parallel Objects


1 Charisma: Orchestrating Migratable Parallel Objects
Chao Huang, Laxmikant Kale
Parallel Programming Lab, University of Illinois at Urbana-Champaign
Thanks to our shepherd, Prof. Barton Miller, for his useful advice on improving the paper.

2 Motivation
Complex structures in modern applications
- Large number of components
- Complicated interactions
Parallel programming productivity
- Traditional SPMD paradigms break modularity
- Object-based paradigms obscure the global control flow
Goal
- A language for expressing a global view of control
- Target user: novice parallel programmers with an engineering background
2/28/2019 HPDC 2007

3 Outline
- Motivation
- Expressing Flow of Control
- Language Design and Implementation
- Code Examples
- Results: Performance and Productivity Studies
- Related Work
- Future Work

4 Example: MD
Structure of a simple MD simulation

Receiving messages in a fixed order:

  MPI_Recv(angle_buf, ..., ANGLE_SRC, ANGLE_TAG, ...);
  /* calculate angle forces */
  MPI_Recv(pair_left_buf, ..., PAIR_LEFT_SRC, PAIR_LEFT_TAG, ...);
  MPI_Recv(pair_right_buf, ..., PAIR_RIGHT_SRC, PAIR_RIGHT_TAG, ...);
  /* calculate pairwise forces */

Receiving messages with wildcards:

  MPI_Recv(buf, ..., MPI_ANY_SOURCE, MPI_ANY_TAG, ...);
  switch (GET_TYPE(buf)) {
    case FOR_ANGLE:      /* calculate angle forces */
    case FOR_PAIR_LEFT:  /* calculate pairwise forces */
    case FOR_PAIR_RIGHT: ...
  }

- A few hundred patches (cells), several hundred computes (cellpairs), thousands of PME objects
- Different numbers of different objects on one PE: manual allocation and load balancing are hard
- Wildcards break down when the flow of control runs deeper (e.g. PME)
- Charm++ provides clear abstraction and separation of entities for load balancing

5 Expressing Flow of Control
Charm++: the flow of control is fragmented across object code

  MainChare::MainChare() {
    cells.sendCoords();
  }
  MainChare::reduceEnergy(energy) {
    totalEnergy += energy;
    if (iter++ < MAX_ITER)
      cells.sendCoords();
    else
      CkExit();
  }
  Cell::sendCoords() {
    for index in 26 neighbor cellpairs
      cellpairs(index).recvCoords(coords);
  }
  Cellpair::recvCoords(coords) {
    if not coords from both cells received
      buffer(coords); return;
    else  // all coords ready
      force = calcForces();
      for index in 2 cells
        cells(index).recvForces(forces);
  }
  Cell::recvForces(forces) {
    totalforces += forces;
    if not all forces from all cellpairs received
      return;
    else  // neighborhood reduction completed
      integrate();
      mainProxy.reduceEnergy(energy);
  }

Different communication patterns: broadcast, multicast, point-to-point, reduction

6 Charisma
Expressing a global view of control
- Parallel constructs in the orchestration code
- Sequential code kept separately in user C++ code
Features
- High-level abstraction of control
- Separation of parallel constructs and sequential code
- Generates Charm++ code: automatic load balancing, adaptive overlap, etc.
- User code has minimal parallel involvement

7 Language Design
foreach statement
- Invokes a method on all elements: object-level parallelism

  foreach i in workers
    workers[i].doWork();
  end-foreach

Producer-consumer model
- Sequential code is unaware of the source of its input values and the destination of its output values
- Data is sent out as soon as it becomes available

  foreach i in workers
    (p[i]) <- workers[i].foo();
    workers[i].bar(p[i+1]);
  end-foreach

Various communication patterns
- Control constructs: loop, if-then-else, overlap
- Communication patterns arise from combinations of object indices and produced/consumed parameter indices; example: the all-to-all transpose in the FFT example

10 Code Example: MD with Charisma
Orchestration code: global view of control

  foreach i,j,k in cells
    (coords[i,j,k]) <- cells[i,j,k].produceCoords();
  end-foreach
  for iter = 1 to MAX_ITER
    foreach i1,j1,k1,i2,j2,k2 in cellpairs
      (+forces[i1,j1,k1], +forces[i2,j2,k2]) <-
        cellpairs[i1,j1,k1,i2,j2,k2].calcForces(coords[i1,j1,k1], coords[i2,j2,k2]);
    end-foreach
    foreach i,j,k in cells
      (coords[i,j,k], +energy) <- cells[i,j,k].integrate(forces[i,j,k]);
    end-foreach
    MDMain.updateEnergy(energy);
  end-for

Comment: Charisma orchestrates parallel objects and expresses the global view of control clearly.

11 Code Example: MD with Charisma
Sequential code

  void Cell::integrate(Force forces[], outport coords, outport energy) {
    for (int i = 0; i < mySize; i++) {
      myAtoms[i].applyForces(forces[i]);
      myAtoms[i].update();
    }
    produce(coords, myAtoms, mySize);
    double myEnergy = this->calculateEnergy();
    reduce(energy, myEnergy, "+");
  }

Benefit: the sequential algorithm can be developed independently, exchanged, or kept secret. Data is sent out as soon as it becomes available.

12 Language Implementation
Dependence analysis
- Identify inports and outports
- Organize the dependence graph
- Generate communications
Control transfer
- Centralized control vs. distributed control
Generated-code optimization
- Eliminating unnecessary memory copies
- Migrating live variables
Library module support
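To make the control-transfer step concrete: the generated code has to buffer values that arrive early on a method's inports and hand control to the sequential body only once every dependence is satisfied. A rough illustration of that idea (this is a hypothetical sketch, not Charisma's actual generated code; the class and member names are invented):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical join counter over inport slots. Each produced value is
// deposited into its slot; when the last expected value arrives, the
// buffered sequential body is invoked with the complete input set.
class InportJoin {
  int expected_;
  int received_ = 0;
  std::vector<double> values_;
  std::function<void(const std::vector<double>&)> body_;

 public:
  InportJoin(int expected, std::function<void(const std::vector<double>&)> body)
      : expected_(expected), values_(expected), body_(std::move(body)) {}

  // Deposit one produced value; fire the body once all inports are filled.
  void deposit(int slot, double value) {
    values_[slot] = value;
    if (++received_ == expected_) body_(values_);
  }
};
```

The real implementation tracks dependences per object and per iteration, but the buffering-until-complete pattern is the same.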

13 Expressing Flow of Control (Cont.)
Example: parallel 3D FFT

  foreach x in planes1
    (pencils[x,*]) <- planes1[x].fft1d();
  end-foreach
  foreach y in planes2
    planes2[y].fft2d(pencils[*,y]);
  end-foreach
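The two foreach phases transform along one dimension, exchange pencils (an all-to-all transpose), and then transform along the other. A self-contained sketch of the same row-then-column decomposition in 2D, using a naive O(n^2) DFT in place of the fft1d()/fft2d() library calls (all names here are illustrative, not the Charisma API):

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Naive 1D DFT; a real implementation would call an FFT library.
std::vector<cd> dft1d(const std::vector<cd>& a) {
  const int n = static_cast<int>(a.size());
  const double pi = std::acos(-1.0);
  std::vector<cd> out(n);
  for (int k = 0; k < n; ++k)
    for (int j = 0; j < n; ++j)
      out[k] += a[j] * std::polar(1.0, -2.0 * pi * k * j / n);
  return out;
}

// 2D transform as row transforms, a transpose, then column transforms --
// the same decomposition the orchestration code expresses with planes
// (whole rows) and pencils (the transposed columns exchanged in between).
std::vector<std::vector<cd>> dft2d(std::vector<std::vector<cd>> m) {
  const int n = static_cast<int>(m.size());
  for (auto& row : m) row = dft1d(row);  // phase 1: transform each row
  for (int c = 0; c < n; ++c) {          // phase 2: transform each column
    std::vector<cd> col(n);
    for (int r = 0; r < n; ++r) col[r] = m[r][c];
    col = dft1d(col);
    for (int r = 0; r < n; ++r) m[r][c] = col[r];
  }
  return m;
}
```

In the parallel version, the transpose between the two phases is exactly the all-to-all communication that Charisma derives from the pencils[x,*] / pencils[*,y] index patterns.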

14 Results: Scalability
- 2D Jacobi (size: on 4096 objects)
- 3D FFT (size: 512^3 on 256 objects)
- Overhead: Jacobi 2-6%, FFT up to 5%
- Overheads come from synchronization and buffer copying, which could be further eliminated in Charm++
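For reference, the kernel underlying the 2D Jacobi benchmark is the standard 5-point averaging stencil. A minimal sequential sketch, assuming the usual formulation (this is not the benchmark's actual source):

```cpp
#include <vector>

// One sweep of 5-point 2D Jacobi on an n x n grid stored row-major.
// Boundary values are carried over unchanged. In the parallel versions,
// the grid is decomposed into chunks (objects) that exchange ghost
// rows/columns before each sweep.
std::vector<double> jacobi_sweep(const std::vector<double>& grid, int n) {
  std::vector<double> next(grid);
  for (int i = 1; i < n - 1; ++i)
    for (int j = 1; j < n - 1; ++j)
      next[i * n + j] =
          0.25 * (grid[(i - 1) * n + j] + grid[(i + 1) * n + j] +
                  grid[i * n + j - 1] + grid[i * n + j + 1]);
  return next;
}
```

The per-iteration ghost exchange is the communication whose synchronization and buffer copying account for the measured overhead.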

15 Results: Productivity Study
- 2D Jacobi: 47% reduction
- Wator: 30% reduction
Study setup:
- 36 students in CS420 (Introduction to Parallel Programming), Spring 2007
- Taught MPI, OpenMP, Charisma, Charm++, etc.
- Assigned Jacobi2D (with Charisma) and Wator (with Charm++)
- Students were asked to fill out questionnaires
- 25 questionnaires collected: 4 unfinished, 2 with implausible numbers
- 19 valid samples: CS undergrads, CS and non-CS grads
- Mean programming experience: 6.16 years
- Mean parallel programming experience: 0.58 years

16 Related Work
Producer-consumer model
- Fortran M shares the concept of ports; its ports are connected to create channels between processes. Charisma analyzes and generates data dependences among migratable objects, giving better modularity.
Composing from components
- P-COM2 composes independently developed components into a parallel program. Charisma's orchestration language provides a global view of control.
Visual parallel dataflow languages
- HeNCE, CODE, and VPE are visual parallel languages that treat sequential subroutines as primitive components. In CODE an arc expresses dataflow; in HeNCE, control. Charisma combines the benefits of dataflow and control expression: it uses data dependences to drive progress, and inserts control constructs where needed to keep the program structure clear.

17 Future Work
Use dependence information to optimize
- Critical path analysis
- Prefetching objects (out-of-core execution)
- Assisting communication optimizations
Orchestration for other fields
- Stream-based applications (cf. Professor Franklin's talk on stream processing)

18 Thank You
Questions? More details at http://charm.cs.uiuc.edu

