Creating Simulations with POSE Terry L. Wilmarth, Nilesh Choudhury, David Kunzman, Eric Bohm Parallel Programming Laboratory University of Illinois at Urbana-Champaign
Inspiration Parallel Discrete Event Simulation of fine-grained tasks is notoriously difficult to parallelize and scale Classes of applications parallelize well with conservative synchronization, but most do not scale well Optimistic Synchronization coupled with Charm++'s virtualization, load balancing, communication optimization, etc. enable scalable general-purpose PDES (fine-grained or not)
POSE 0.5a Prerequisites Parallel Discrete Event Simulation C++, Charm++ Minimal understanding of parallel programming Some understanding of optimistic synchronization mechanism for PDES
Designing a POSE Simulation Decompose system to be modeled into its concurrent entities: posers Determine how the entities interact with discrete events: event methods Determine initial placement of entities Some parallel computing savvy required to maximize achievable parallelism
Example: POSE to Charm++
POSE to Charm++ Posers: special Charm++ chares that communicate via timestamped messages executed in order Event methods: Charm++ entry methods that receive timestamped event messages Strategy: method for synchronizing posers in parallel
Optimistic Synchronization Events received by posers are queued by timestamp in the poser's event queue Events are executed in timestamp order according to the synchronization strategy The poser's state is periodically checkpointed When a straggler (an event with a timestamp earlier than events already executed) arrives, the later events are rolled back, the events they spawned are cancelled, and the poser's state is recovered
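The rollback cycle on this slide can be sketched in plain C++, outside of POSE. This is a toy model under stated assumptions: `ToyEvent` and `ToyPoser` are hypothetical names, the state is checkpointed before every event rather than periodically, and cancellation of spawned events is left out. The state update is deliberately order-sensitive so that executing events out of timestamp order would give a visibly different result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: ToyEvent and ToyPoser are hypothetical, not POSE classes.
struct ToyEvent {
    int timestamp;  // virtual time at which the event should execute
    int digit;      // payload: appended to the poser's state as a decimal digit
};

struct ToyPoser {
    int state = 0;                   // simulated state; order-sensitive on purpose
    int rollbacks = 0;               // number of stragglers handled
    std::vector<ToyEvent> executed;  // executed events, kept in timestamp order
    std::vector<int> checkpoints;    // checkpoints[i] = state just before executed[i]

    // Execute one event speculatively, checkpointing the state first.
    void execute(const ToyEvent& e) {
        checkpoints.push_back(state);
        executed.push_back(e);
        state = state * 10 + e.digit;
    }

    // Receive an event; a straggler triggers rollback and re-execution.
    void receive(const ToyEvent& e) {
        // Index of the first executed event with a later timestamp.
        std::size_t pos = 0;
        while (pos < executed.size() && executed[pos].timestamp <= e.timestamp) ++pos;

        if (pos == executed.size()) {  // arrived in order: just execute
            execute(e);
            return;
        }

        // Straggler: recover the checkpoint taken before the first later event.
        ++rollbacks;
        const std::ptrdiff_t cut = static_cast<std::ptrdiff_t>(pos);
        std::vector<ToyEvent> redo(executed.begin() + cut, executed.end());
        state = checkpoints[pos];
        executed.erase(executed.begin() + cut, executed.end());
        checkpoints.erase(checkpoints.begin() + cut, checkpoints.end());

        execute(e);                                 // straggler runs in its proper place
        for (const ToyEvent& r : redo) execute(r);  // then the rolled-back events
    }
};
```

Feeding this poser events stamped 10, 30, and then 20 makes the third one a straggler: the event at 30 is undone, the event at 20 runs, and the event at 30 is re-executed.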
Optimistic Synchronization in POSE After handling a straggler, forward execution proceeds as before Fossil collection: checkpoint memory is recovered when no longer needed, i.e. when the checkpoint is older than the GVT GVT (global virtual time): the minimum virtual time in the entire simulation
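A minimal sketch of the GVT and fossil collection ideas, again in plain C++ rather than POSE. The helper names `toyGvt` and `toyFossilCollect` are hypothetical, and so is the choice to take the minimum over both the posers' local times and the timestamps of events still in flight; how POSE actually estimates GVT is not described on this slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: GVT as the minimum virtual time anywhere in the
// simulation, i.e. over every poser's local time and every in-flight event.
int toyGvt(const std::vector<int>& poserTimes, const std::vector<int>& inFlight) {
    assert(!poserTimes.empty());
    int gvt = *std::min_element(poserTimes.begin(), poserTimes.end());
    for (int t : inFlight) gvt = std::min(gvt, t);
    return gvt;
}

// Hypothetical helper: reclaim checkpoints strictly older than gvt and
// return how many were freed. Nothing can roll back past GVT, so these
// checkpoints are no longer needed. (A production system may still need to
// keep the newest checkpoint before GVT; this toy ignores that detail.)
std::size_t toyFossilCollect(std::vector<int>& checkpointTimes, int gvt) {
    auto firstOld = std::remove_if(checkpointTimes.begin(), checkpointTimes.end(),
                                   [gvt](int t) { return t < gvt; });
    std::size_t freed =
        static_cast<std::size_t>(std::distance(firstOld, checkpointTimes.end()));
    checkpointTimes.erase(firstOld, checkpointTimes.end());
    return freed;
}
```

With posers at virtual times 40, 25, and 60 and one event in flight stamped 30, the estimate is 25, and checkpoints taken at times 5 and 20 can be reclaimed.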
Using POSE POSE code translated to Charm++ Same program structure: .ci, .h, .C Same program code can be used to run sequential or parallel simulations Highly configurable
Code Sample: Poser & Event Messages, .ci file message WorkerData; message WorkMsg; poser worker : sim adapt4 chpt { entry worker(WorkerData *); // Event methods entry [event] void work(WorkMsg *); };
The .ci File Declare event message types: message WorkMsg; Declare posers with synchronization strategy and representation type: poser worker : sim adapt4 chpt {... Declare event methods for posers which take an event message as parameter: entry [event] void work(WorkMsg *);
Code Sample: Poser Declaration & Event Methods, .h file class worker { int someIntData; public: worker(); worker(WorkerData *m); ~worker(); worker& operator=(const worker& obj); void pup(PUP::er &p); // Event methods void work(WorkMsg *m); void work_anti(WorkMsg *m); void work_commit(WorkMsg *m); };
The .h File Define posers and their state (a portion of the global state): class worker { int someIntData; ... Declare any local helper methods required Declare constructors and required methods: Basic constructor, destructor, assignment operator, pup method
The .h File, cont'd Declare event methods and corresponding anti-methods and commit methods: void work(WorkMsg *m); void work_anti(WorkMsg *m); void work_commit(WorkMsg *m); Anti-methods provide an alternative mechanism to checkpointing that allows the user to undo the state changes of an event method
The .h File, cont'd Commit methods are executed when fossil collection is about to free the memory of a checkpointed state and “commit” to the execution of an event, i.e. a committed event can't be rolled back Useful for statistics collection, I/O, or any other activity that should only happen once
Code Sample: Poser Constructor & Event Method Invocation, .C file worker::worker(WorkerData *m) { someIntData = m->someData; delete m; POSE_srand(myHandle); WorkMsg *wm; if (myHandle == 0) { wm = new WorkMsg; wm->someIntData = someIntData; POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10); } }
The .C File Constructor receives message, uses data, deletes message (not true for event methods!) Every poser has a handle: myHandle if (myHandle == 0) { ... worker::worker(WorkerData *m) { someIntData = m->someData; delete m; ...
The .C File, cont'd User decides what handle each poser has at construction time Any poser can invoke events on another poser as long as it knows the destination poser's handle Random number generation in POSE repeats same sequence in case of rollback and re-execution
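One way to get random numbers that repeat after a rollback is to treat the generator's state as part of the checkpointed poser state. The sketch below shows that idea in standard C++. It is an assumption for illustration, not a description of how `POSE_rand()` is implemented; `ToyRandomPoser` and `drawThree` are hypothetical names, and seeding by handle only mirrors the `POSE_srand(myHandle)` call in the constructor sample.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical poser whose random generator is part of its state, so a
// checkpoint (a plain copy here) captures the generator as well.
struct ToyRandomPoser {
    std::mt19937 rng;

    // Seed from the poser's handle.
    explicit ToyRandomPoser(unsigned handle) : rng(handle) {}

    // Pick a destination handle in [0, numPosers).
    int nextDest(unsigned numPosers) {
        return static_cast<int>(rng() % numPosers);
    }
};

// Draw three destinations among 42 posers, as an event method might.
std::vector<int> drawThree(ToyRandomPoser& p) {
    std::vector<int> out;
    for (int i = 0; i < 3; ++i) out.push_back(p.nextDest(42u));
    return out;
}
```

Copying the poser before an event and assigning the copy back on rollback means the re-executed event draws exactly the same destinations, so re-execution reproduces the original behaviour.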
The .C File, cont'd Event method invocation: POSE_invoke(event_method(event_msg), poser_type, dest_handle, transit_time); event_msg is timestamped with OVT + transit_time when it arrives on poser dest_handle wm = new WorkMsg; wm->someIntData = someIntData; POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10); Each poser has its own virtual time: OVT; posers' OVTs can be out-of-sync
Code Sample: Event Methods, .C file void worker::work(WorkMsg *m) { WorkMsg *wm = new WorkMsg; wm->someIntData = m->someIntData + someIntData; /* fake computation */ POSE_busy_wait(1000); elapse(27); POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10); } void worker::work_anti(WorkMsg *m) { restore(this); } void worker::work_commit(WorkMsg *m) { }
Passing Virtual Time We've seen how to make an event happen in the future via the transit_time parameter to POSE_invoke Elapse time on a poser: elapse(27); Increments poser's OVT by 27 Auto-elapse: a poser receives an event at time t > OVT; advance poser's OVT to t
Passing Virtual Time, cont'd What if t < OVT? If the received event is inserted in the event queue before other executed events, it causes a rollback If not, the event is handled at time OVT, not at time t (events earlier than t kept the poser busy until time OVT)
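The virtual-time rules from these two slides can be collected into a small stand-alone model. This is a sketch under assumptions: `ToyClock` and its members are hypothetical names, the event queue is reduced to the timestamp of the latest executed event, equal timestamps are treated as in order, and a rollback is only reported, not performed.

```cpp
#include <cassert>

// Hypothetical toy clock for one poser; not part of the POSE API.
struct ToyClock {
    int ovt = 0;                 // this poser's object virtual time
    int lastExecutedStamp = -1;  // timestamp of the latest executed event

    enum class Outcome { AutoElapsed, HandledAtOvt, Rollback };

    // Send side: an outgoing event is stamped OVT + transit_time.
    int stampOutgoing(int transitTime) const { return ovt + transitTime; }

    // Explicitly pass virtual time, as elapse(27) does in the slides.
    void elapse(int dt) { ovt += dt; }

    // Receive side: classify an incoming event with timestamp t.
    Outcome receive(int t) {
        if (t < lastExecutedStamp) {
            // It sorts before an executed event: straggler, so roll back.
            // (A real implementation would restore state and re-execute here.)
            return Outcome::Rollback;
        }
        lastExecutedStamp = t;
        if (t > ovt) {  // auto-elapse: jump forward to the event's time
            ovt = t;
            return Outcome::AutoElapsed;
        }
        // t <= OVT but still in order: earlier events kept the poser busy
        // until OVT, so the event is handled at OVT rather than at t.
        return Outcome::HandledAtOvt;
    }
};
```

Starting from OVT 0, an event at 10 auto-elapses the clock to 10, and `elapse(27)` moves it to 37. A later event stamped 20 is then handled at 37, while one stamped 15 would sort before the executed event at 20 and cause a rollback.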
Event Methods and Event Messages When an event message arrives, it is queued on the destination poser as an event to be executed The actual message is stored in the queue along with any additional information associated with the event Because the event may be rolled back and re-executed, the message must not be deleted in the event method
Anti-methods Typically, anti-methods only restore the checkpointed state: void worker::work_anti(WorkMsg *m) { restore(this); } But they can be used instead of checkpointing to undo simple state changes: void myClass::toggleFlag_anti(eventMsg *m) { flag = flag ? 0 : 1; }
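To make the trade-off concrete, here is a stand-alone comparison of the two undo styles in plain C++. `ToyFlagPoser`, `addWork`, and `restoreFrom` are hypothetical names for illustration; in POSE the anti-method would receive the event message and the checkpoint restore would go through `restore(this)`.

```cpp
#include <cassert>

// Hypothetical poser state; names are illustrative, not from POSE.
struct ToyFlagPoser {
    int flag = 0;
    int count = 0;

    // Forward event methods.
    void toggleFlag() { flag = flag ? 0 : 1; }
    void addWork(int n) { count += n; }

    // Anti-methods: apply the inverse change, so no checkpoint is needed.
    void toggleFlag_anti() { flag = flag ? 0 : 1; }  // a toggle is its own inverse
    void addWork_anti(int n) { count -= n; }         // needs the event's own argument
};

// Checkpoint-style undo for comparison: copy the saved state back wholesale.
void restoreFrom(ToyFlagPoser& poser, const ToyFlagPoser& checkpoint) {
    poser = checkpoint;
}
```

Anti-methods avoid copying the state but must be applied in reverse order of the events they undo, and they only work when the change is invertible. The checkpoint copy works for any change at the cost of storing the full state.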
Output Printing information about progress, statistics or debugging data in PDES can be confusing in the face of rollbacks CommitPrintf(...): buffers event execution output until the event is committed CommitError(...): buffers error statements and aborts if an event results in an error that is committed
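The buffering idea behind `CommitPrintf` can be sketched as follows. This is not POSE's implementation: `ToyCommitLog` and its methods are hypothetical, output is keyed by the timestamp of the event that produced it, and the commit point is assumed to be a GVT value passed in by the caller.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical commit-buffered log; not the POSE implementation.
struct ToyCommitLog {
    std::multimap<int, std::string> pending;  // speculative output by event timestamp
    std::vector<std::string> committed;       // output that is safe to show

    // Called from an event method instead of printing directly.
    void print(int eventTime, const std::string& text) {
        pending.emplace(eventTime, text);
    }

    // Rollback: discard output of every event at or after the given time.
    void rollbackFrom(int eventTime) {
        pending.erase(pending.lower_bound(eventTime), pending.end());
    }

    // Commit: release output of events older than gvt, in timestamp order.
    void commitBefore(int gvt) {
        auto end = pending.lower_bound(gvt);
        for (auto it = pending.begin(); it != end; ++it) {
            committed.push_back(it->second);
        }
        pending.erase(pending.begin(), end);
    }
};
```

Output from an event that is later rolled back never reaches `committed`, so each line appears once even if the event was executed several times.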
A Main Program Programs that use posers are pure Charm++ --- they are not translated In main, just before creating posers, call POSE_init() to start simulation Then inject posers into the system: WorkerData *wd; for (int i=0; i<42; i++) { wd = new WorkerData; wd->Timestamp(0); int dest = rand() % CkNumPes(); (*(CProxy_worker *) &POSE_Objects)[i].insert (wd, dest); // i is this poser's handle }
A Main Program User must timestamp constructor message and create object with Charm++ syntax Under the hood, constructs a worker in a Chare Array at index i on processor dest WorkerData *wd; for (int i=0; i<42; i++) { wd = new WorkerData; wd->Timestamp(0); int dest = rand() % CkNumPes(); (*(CProxy_worker *) &POSE_Objects)[i].insert (wd, dest); // i is this poser's handle }
POSE_init() Initialization and Simulation start-up Can optionally specify endTime, a virtual time at which to halt the simulation Can optionally specify whether or not to use inactivity detection: terminates simulation if no events are handled for some period of time void POSE_init(); void POSE_init(int ET); void POSE_init(int IDflag, int ET);
Choosing a Synchronization Strategy poser worker : sim adapt4 chpt { ... POSE offers a wide variety of synchronization strategies ranging from conservative to aggressively optimistic Each type of poser can use the strategy best suited to its behavior opt*, spec, adapt*
Choosing a Synchronization Strategy opt*: basic optimistic synchronization, throttled and unthrottled spec: throttled optimism, aggressive speculation adapt*: optimism and speculation adapt to recent behavior of poser
Getting and Using POSE Got Charm++? You've got POSE. build pose ... etrans.pl [-s] Worker Translates Worker.* to Worker_sim.* charmc ... -module pose -language charm++ charmc ... -module seqpose -language charm++
Applications VHDL Simulation: David Kunzman Big Network Simulation: Nilesh Choudhury