What Mum Never Told Me about Parallel Simulation
Karim Djemame, Informatics Research Lab. & School of Computing, University of Leeds

Plan of the Lecture
Goals: learn about issues in the design and execution of Parallel Discrete Event Simulation (PDES)
Overview:
- Discrete Event Simulation - a Review
- Parallel Simulation - a Definition
- Applications
- Synchronisation Algorithms: Conservative, Optimistic, Synchronous
- Parallel Simulation Languages
- Performance Issues
- Conclusion

Why Simulation?
- Mathematical models are too abstract for complex systems
- Building real systems with multiple configurations is too expensive
- Simulation is a good compromise!

Discrete Event Simulation (DES)
- A DES system can be viewed as a collection of simulated objects and a sequence of event computations
- Changes in the state of the model occur at discrete points in time
- The passage of time is modelled using a simulation clock
- Event scheduling is the most widely used approach; it provides locality in time: each event describes related actions that may all occur in a single instant
- The model maintains a list of events (Event List) that have been scheduled but have not occurred yet

Processing the Event List on a Uni-processor Computer
An event contains two fields of information:
- the event it represents (e.g. arrival in a queue)
- its time of occurrence: the time when the event should happen, also called its timestamp
The event list (EVL):
- contains the events
- is always ordered by increasing time of occurrence
The events are processed sequentially by a single processor.
[Figure: the EVL holding events e1, e2, ..., en, each with an event type and a timestamp]

Event-Driven Simulation Engine
(1) Remove the first event (lowest time of occurrence) from the EVL
(2) Execute the corresponding event routine; modify the state (S) accordingly
(3) Based on the new S, schedule new future events
[Figure: the EVL before and after one iteration - the head event is removed and a newly scheduled event is inserted in timestamp order]
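The three steps above can be sketched as a small sequential engine. This is a minimal illustration, not the lecture's own code: the `Engine` class, `schedule` and `run` names are assumptions, and a heap stands in for the ordered EVL.

```python
import heapq

class Engine:
    def __init__(self):
        self.evl = []        # event list, kept as a min-heap on timestamp
        self.clock = 0       # simulation clock
        self.seq = 0         # tie-breaker for events with equal timestamps

    def schedule(self, time, handler):
        heapq.heappush(self.evl, (time, self.seq, handler))
        self.seq += 1

    def run(self, until):
        while self.evl and self.evl[0][0] <= until:
            time, _, handler = heapq.heappop(self.evl)  # step (1)
            self.clock = time
            handler(self)                               # steps (2) and (3)

# Example: a source that schedules its own next arrival every 5 time units
arrivals = []
def arrival(engine):
    arrivals.append(engine.clock)
    engine.schedule(engine.clock + 5, arrival)

eng = Engine()
eng.schedule(0, arrival)
eng.run(until=20)
print(arrivals)  # [0, 5, 10, 15, 20]
```

Note how step (3) happens inside the event routine itself: processing one arrival schedules the next, which is how the EVL keeps refilling as the clock advances.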

Why change? It's so simple!
- Models become larger and larger
- The simulation time is overwhelming, or the simulation is simply intractable
- Examples:
  - parallel programs with millions of lines of code
  - mobile networks with millions of mobile hosts
  - networks with hundreds of complex switches and routers
  - multicast models with thousands of sources
  - the ever-growing Internet
  - and much more...

Some Figures to Convince...
ATM network model:
- simulation at the cell level
- 200 switches
- 1000 traffic sources at 50 Mbit/s
- 155 Mbit/s links
- 1 simulation event per cell arrival
Observations:
- simulation time increases as link speed increases
- there is usually more than 1 event per cell arrival
- how scalable is traditional simulation?
More than 26 billion events to simulate 1 second! About 30 hours if 1 event is processed in 1 µs.

Motivation for Parallel Simulation
- Sequential simulation is very slow
- Sequential simulation does not exploit the parallelism inherent in models
So why not use multiple processors?
- a variety of parallel simulation protocols exist
- parallel simulation tools are available to achieve a speedup over the sequential simulator

Processing the Event List on a Multi-Processor Computer
The events are processed by many processors.
Example: processor 1 generates event 3 at time 9, to be processed by processor 2 - but processor 2 has already processed event 2 at time 14.
Problem:
- the future can affect the past!
- this is the causality problem
[Figure: per-processor timelines showing event 3 (timestamp 9) arriving after processor 2 has executed event 2 (timestamp 14)]

Causal Dependencies
Events are scheduled in timestamp order in the EVL: (e1, 7), (e2, 9), (e3, 14), (e4, 20), (e5, 27), (e6, 40).
The same events can also be ordered by their causal dependencies; causal dependencies mean restrictions on execution order.
Example: the sequence of events (e1, e2, e4, e6) can be executed in parallel with (e3, e5).
If an event causally dependent on e1 were simulated together with e1, the causal dependencies would be violated.

Parallel Simulation - Principles
- Execution of a discrete event simulation on a parallel or distributed system with several physical processors
- The simulation model is decomposed into several sub-models (Logical Processes, LPs) that can be executed in parallel
  - spatial partitioning
  - LPs communicate by sending timestamped messages
- Fundamental concepts:
  - each LP can be at a different simulation time
  - local causality constraint: events in each LP must be executed in timestamp order

Parallel Simulation – example 1
[Figure: a network model partitioned into logical processes (LPs); packets moving between nodes become timestamped events exchanged between LPs and processed in parallel]

Parallel Simulation – example 2
- Logical processes (LPs) model airports, air traffic sectors, aircraft, etc.
- LPs interact by exchanging messages (events modelling aircraft departures, landings, etc.)
[Figure: several LPs connected by message channels]

Synchronisation Mechanisms
Synchronisation algorithms:
- Conservative: avoids local causality violations by waiting until it is safe to process a message or event
- Optimistic: allows local causality violations, but provisions are made to recover from them at runtime
- Synchronous: all LPs process messages/events with the same timestamp in parallel

PDES Applications
- VLSI circuit simulation
- Parallel computing
- Communication networks
- Combat scenarios
- Health care systems
- Road traffic
- Simulation of models: queueing networks, Petri nets, finite state machines

Conservative Protocols
- Architecture of a conservative LP
- The Chandy-Misra-Bryant protocol
- The lookahead ability

Architecture of a Conservative LP
- LPs communicate by sending messages with non-decreasing timestamps
- each LP keeps a static FIFO channel for each LP with incoming communication
- each FIFO channel (input channel, IC) has a clock c_i that ticks according to the timestamp of its topmost message, if any; otherwise it keeps the timestamp of the last message received
[Figure: LP A with input channels from LPs B, C and D; each channel clock c_i equals the timestamp of that channel's head message]

A Simple Conservative Algorithm
Each LP has to process events in timestamp order to avoid local causality violations.
The Chandy-Misra-Bryant algorithm:

while (simulation is not over) {
    determine the IC i with the smallest clock c_i
    if (IC i is empty)
        wait for a message on IC i
    else {
        remove the topmost event from IC i
        process the event
    }
}
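The loop above can be sketched as follows. This is a self-contained toy, not a real PDES kernel: the channel contents are invented, and so that the demo terminates, a drained channel gets clock +infinity instead of making the LP block and "wait for a message" as the real algorithm does.

```python
from collections import deque

def cmb_lp(channels):
    """channels: name -> deque of non-decreasing timestamps (one IC per sender).
    Returns the events processed, which come out in global timestamp order."""
    processed = []
    while any(channels.values()):            # "simulation is not over"
        def clock(name):
            q = channels[name]
            # channel clock = head timestamp; +inf once drained (demo only:
            # a real LP would block here waiting for the next message)
            return q[0] if q else float("inf")
        # determine the IC with the smallest clock
        name = min(channels, key=clock)
        # remove the topmost event and "process" it
        processed.append(channels[name].popleft())
    return processed

channels = {"B": deque([2, 9, 14]), "C": deque([5, 11]), "D": deque([7])}
print(cmb_lp(channels))  # [2, 5, 7, 9, 11, 14]
```

Because each channel delivers timestamps in non-decreasing order, always consuming from the smallest-clock channel is exactly what guarantees the merged stream respects the local causality constraint.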

Safe but Has to Block
[Figure: LP A blocks when the input channel with the smallest clock is empty, even though events are waiting on its other channels]

Blocks and Even Deadlocks!
[Figure: a source S feeding LPs A and B, which join at a merge point M. If S sends all its messages to B, the channel from A to M stays empty forever and M remains BLOCKED - a deadlock]

How to Solve Deadlock: Null-Messages
Use null-messages for artificial propagation of simulation time: an LP that has nothing to send still sends a null-message carrying a lower bound on its future timestamps, so a blocked LP can advance (UNBLOCKED).
Open question: at what frequency should null-messages be sent?

How to Solve Deadlock: Null-Messages
A null-message indicates a Lower Bound Time Stamp (LBTS) on future messages.
Example: LPs A, B and C form a ring (C → A → B → C); the minimum delay on every link is 4; LP C is initially at simulation time 0 and holds an event with timestamp 7.
- LP C sends a null-message with timestamp 4
- LP A sends a null-message with timestamp 8
- LP B sends a null-message with timestamp 12
- LP C can now process its event with timestamp 7, since no message earlier than 12 can arrive
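The null-message exchange on this slide can be replayed in a few lines. A minimal sketch under the slide's assumptions (ring C → A → B → C, lookahead 4, one input channel per LP); the `LP` class and method names are illustrative.

```python
LOOKAHEAD = 4  # minimum delay on every link (from the slide)

class LP:
    def __init__(self, name, events=()):
        self.name = name
        self.channel_clock = 0        # clock of this LP's single input channel
        self.events = sorted(events)  # pending local event timestamps

    def receive_null(self, timestamp):
        self.channel_clock = max(self.channel_clock, timestamp)

    def null_timestamp(self):
        # LBTS promise to the successor: nothing earlier will ever be sent
        return self.channel_clock + LOOKAHEAD

    def safe_events(self):
        # events at or below the channel clock can no longer be preceded
        # by any incoming message, so they are safe to process
        safe = [t for t in self.events if t <= self.channel_clock]
        self.events = [t for t in self.events if t > self.channel_clock]
        return safe

# Ring C -> A -> B -> C, with C holding one event at timestamp 7
a, b, c = LP("A"), LP("B"), LP("C", events=[7])

a.receive_null(c.null_timestamp())  # C promises nothing earlier than 0 + 4 = 4
b.receive_null(a.null_timestamp())  # A promises 4 + 4 = 8
c.receive_null(b.null_timestamp())  # B promises 8 + 4 = 12

safe = c.safe_events()
print(c.channel_clock, safe)  # 12 [7] -- the event at 7 is now safe
```

Each hop adds the lookahead to the promise, which is why the simulation time creeps forward in steps of 4 - and why a tiny lookahead makes this "time creep" painfully slow.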

The Lookahead Ability
- Null-messages are sent by an LP to indicate a lower bound on the timestamps of the future messages it will send
- null-messages rely on the "lookahead" ability, e.g.:
  - communication link delays
  - server processing time (FIFO)
- lookahead is very dependent on the application model and needs to be explicitly identified

Conservative: Pros & Cons
Pros:
- simple, easy to implement
- good performance when lookahead is large (communication networks, FIFO queues)
Cons:
- pessimistic in many cases
- large lookahead is essential for performance
- no transparent exploitation of parallelism
- performance may drop even with small changes in the model (adding preemption, adding one small-lookahead link...)

Optimistic Protocols
- Architecture of an optimistic LP
- Time Warp

Architecture of an Optimistic LP
- LPs send timestamped messages, not necessarily in non-decreasing timestamp order
- no static communication channels between LPs; dynamic creation of LPs is easy
- each LP processes events as they are received; no need to wait for safe events
- local causality violations are detected and corrected at runtime
- the most well-known optimistic mechanism: Time Warp
[Figure: LP A receiving messages from LPs B, C and D in arbitrary timestamp order]

Processing Events as They Arrive
[Figure: LP A has already processed events with timestamps 11, 13, 18, 22, 25, 28 and 36 from LPs B, C and D when a message with timestamp 32 arrives from LP D]
Problem: what to do with late messages?

Time Warp: Do, Undo, Redo

Time Warp Rollback - How?
Late messages (stragglers) are handled with a rollback mechanism:
- undo false/incorrect local computations, via:
  - state saving: save the state variables of the LP
  - or reverse computation
- undo false/incorrect remote computations, via:
  - anti-messages: an anti-message and its (real) message annihilate each other
- process the late message
- re-process the previously processed messages: processed events are NOT discarded!
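The undo/redo cycle for a straggler can be sketched with copy state saving. This is a deliberately tiny toy under stated assumptions: one LP, the "event routine" just adds the message value to a counter, and anti-messages, in-transit cancellation and GVT are left out.

```python
import copy

class TimeWarpLP:
    def __init__(self):
        self.state = {"count": 0}
        self.clock = 0
        self.processed = []  # (timestamp, message) already executed
        self.snapshots = []  # (clock, state) saved BEFORE each event

    def execute(self, timestamp, message):
        # copy state saving: snapshot before optimistically executing
        self.snapshots.append((self.clock, copy.deepcopy(self.state)))
        self.clock = timestamp
        self.state["count"] += message          # the "event routine"
        self.processed.append((timestamp, message))

    def receive(self, timestamp, message):
        rolled_back = []
        # straggler? undo every event with a larger timestamp
        while self.processed and self.processed[-1][0] > timestamp:
            rolled_back.append(self.processed.pop())
            self.clock, self.state = self.snapshots.pop()  # restore
        self.execute(timestamp, message)        # process the late message
        for ts, msg in reversed(rolled_back):   # redo: events are NOT discarded
            self.execute(ts, msg)

lp = TimeWarpLP()
lp.receive(10, 1)
lp.receive(20, 2)
lp.receive(15, 5)  # straggler: rolls back the event at 20, then redoes it
print(lp.state["count"], [t for t, _ in lp.processed])  # 8 [10, 15, 20]
```

The final state is exactly what a sequential simulator would have produced by executing 10, 15, 20 in order - rollback makes the optimistic execution externally indistinguishable from the conservative one.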

Need for a Global Virtual Time
Motivations:
- an indicator that the simulation time advances
- reclaim memory (fossil collection)
Basically, GVT is the minimum of:
- all LPs' logical simulation times
- the timestamps of messages in transit
GVT guarantees that:
- events below GVT are definitive events
- no rollback can occur before the GVT
- state save points before GVT can be reclaimed
- anti-messages before GVT can be reclaimed
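The definition on this slide is literally a minimum, which is easy to make concrete. The function and variable names below are illustrative, and the clock/message values are invented; real GVT algorithms (e.g. Mattern's) do this asynchronously without freezing the LPs.

```python
def compute_gvt(lp_clocks, in_transit):
    """GVT = min(all LPs' local virtual times,
                 timestamps of messages still in transit)."""
    return min(list(lp_clocks) + list(in_transit))

lp_clocks = [42, 17, 93]  # local virtual times of three LPs
in_transit = [25, 12]     # timestamps of unacknowledged messages
gvt = compute_gvt(lp_clocks, in_transit)
print(gvt)  # 12 -- no rollback can ever reach below this time

def fossil_collect(snapshots, gvt):
    # reclaim state save points strictly before GVT, as the slide states
    # (real kernels keep at least one snapshot at or before GVT so that a
    # rollback landing just above GVT can still restore a state)
    return [(t, s) for (t, s) in snapshots if t >= gvt]

print(fossil_collect([(5, "s0"), (12, "s1"), (30, "s2")], gvt))
# [(12, 's1'), (30, 's2')]
```

The in-transit messages are the subtle part: a message sent at time 12 but not yet received could still cause a rollback to 12, so it must participate in the minimum.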

Time Warp - Overheads
- Periodic state saving:
  - states may be large, very large!
  - copies are very costly
- Periodic GVT computations:
  - costly in a distributed architecture
  - may block computations
- Rollback thrashing:
  - cascaded rollbacks, no advancement!
- Memory!
  - memory is THE limitation

Optimistic Mechanisms: Pros & Cons
Pros:
- exploits all the parallelism in the model; lookahead is less important
- transparent to the end-user
- can be general-purpose
Cons:
- very complex, needs lots of memory
- large overheads (state saving, GVT, rollbacks...)

Mixed/Adaptive Approaches
- A general framework that (automatically) switches between conservative and optimistic execution
- Adaptive approaches may determine at runtime the amount of conservatism or optimism
[Figure: a spectrum from conservative to optimistic, with mixed approaches trading message overhead against performance in between]

Synchronous Protocols
- Architecture of a synchronous LP

Synchronous Protocols
"TOUS pour UN et UN pour TOUS!" ("All for one and one for all!")
The Three Musketeers, Alexandre Dumas (1802 - 1870)

A Simple Synchronous Algorithm
- avoids local causality violations
- each LP uses the same data structures as a single sequential simulator
- a global clock is shared among all LPs - all see the same value
- some data structures are private
[Figure: LPs report minimum timestamps 5, 8, 10 and 12; the global clock becomes their minimum, 5]

A Simple Synchronous Algorithm

clock = 0;
while (simulation is not over) {
    t = minimum_timestamp();    /* local minimum of this LP */
    clock = global_minimum(t);  /* reduction over all LPs */
    simulate_events(clock);     /* process events at time == clock */
    synchronise();              /* barrier before the next round */
}

Basic operations:
1. Computation of the minimum timestamp - a reduction operation
2. Event consumption
3. Message distribution
4. Message reception - a barrier operation
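The round structure above can be sketched single-threaded, with one loop iteration standing in for one (reduction, consume, barrier) round. An assumption-laden toy: in a real implementation the reduction and barrier would be collective operations (e.g. MPI's Allreduce and Barrier), whereas here all LPs live in one process, so the barrier is implicit.

```python
INF = float("inf")

class SyncLP:
    def __init__(self, events):
        self.events = sorted(events)  # pending local event timestamps

    def min_timestamp(self):
        # local minimum contributed to the reduction
        return self.events[0] if self.events else INF

    def simulate_events(self, clock):
        # consume exactly the events scheduled at the shared clock value
        done = [t for t in self.events if t == clock]
        self.events = [t for t in self.events if t != clock]
        return done

lps = [SyncLP([5, 12]), SyncLP([8, 12]), SyncLP([10])]
trace = []
while True:
    clock = min(lp.min_timestamp() for lp in lps)  # the reduction
    if clock == INF:
        break                                      # simulation is over
    for lp in lps:                                 # every LP, same round
        trace.extend(lp.simulate_events(clock))
    # barrier: implicit here, since the loop is sequential

print(trace)  # [5, 8, 10, 12, 12]
```

Note the worst case visible even in this toy: when only one LP has an event at the current clock (rounds 5, 8, 10), the other LPs idle through the round, which is exactly why a synchronous simulator can degenerate to sequential speed.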

Synchronous Mechanisms: Pros & Cons
Pros:
- simple, easy to implement
- good performance if the parallelism is exploited with a moderate synchronisation cost
Cons:
- pessimistic in many cases
- worst case: the simulator behaves like the sequential one
- performance may drop if the cost of LP synchronisation (reduction, barrier) is high

PDES Simulation Languages
A number of PDES languages have been developed in recent years:
- PARSEC
- Compose
- ModSim
- etc.
Most of these languages are general-purpose languages.
PARSEC:
- developed at the UCLA Parallel Computing Lab.
- availability, simplicity
- efficient event scheduling mechanism

Georgia Tech Time Warp (GTW)
- Optimistic discrete event simulator developed by the PADS group at the Georgia Institute of Technology
- Supports small-granularity simulation
- GTW runs on shared-memory multiprocessor machines (Sun Enterprise, SGI Origin)
TeD: Telecommunications Description Language
- a language developed mainly for modelling telecommunication network elements and protocols
Jane:
- a simulator-independent, client/server-based graphical interface and scripting tool for interactive parallel simulations
- TeD/GTW simulations can be executed using the Jane system

BYOwS! (Build Your Own Simulator)
- Choose a programming language: C, C++, Java
- Learn basic MPI (Message Passing Interface):
  - point-to-point communication
  - available on the school Linux machines
- Implement a simple PDES protocol
- Case study: a simple queueing network

Parallel Simulation Today
- Lots of algorithms have been proposed:
  - variations on conservative and optimistic protocols
  - adaptive approaches
- Few end-users:
  - must compete with sequential simulators in terms of user interface, generality, ease of use, etc.
- Research mainly focuses on:
  - applications, ultra-large-scale simulations
  - tools and execution environments (clusters)
  - federated simulations: different simulators interoperate with each other in executing a single simulation (battlefield simulation, distributed multi-user games)

Parallel Simulation - Conclusion
Pros:
- reduction of the simulation time
- increase of the model size
Cons:
- causality constraints are difficult to maintain
- special mechanisms are needed to synchronise the different processors
- increased complexity of both the model and the simulation kernel
Challenges:
- ease of use, transparency

References
Parallel simulation:
- R. Fujimoto, Parallel and Distributed Simulation Systems, John Wiley & Sons, 2000
- R. Fujimoto, Parallel Discrete Event Simulation, Communications of the ACM, Vol. 33(10), Oct. 1990, pp. 31-53
- Parallel Simulation – Links