Parallel Discrete Event Simulation
Richard Fujimoto, Communications of the ACM, Oct. 1990

Introduction
Execution of a single discrete event simulation program on a parallel computer, to speed up the execution of large simulation programs.
Such problems usually contain a substantial amount of "parallelism".
The system being simulated has state that changes only at discrete instants of time, upon the occurrence of an "event", e.g. the arrival of a message at some node in a network.
PDES concerns itself primarily with the simulation of asynchronous systems, where events are not synchronized by a global clock, i.e. they are truly distributed in nature.

Approaches
Use dedicated functional units to implement specific sequential simulation functions, a la vector processing.
Use hierarchical decomposition of the simulation model to allow an event consisting of several sub-events to be processed concurrently.
Execute independent, sequential simulation programs on different processors (replication), which is useful if the simulation is largely stochastic.
Replication is useful only for reducing variance, or for running a specific simulation problem with different input parameters; it requires each processor to have enough memory to hold an entire simulation run, and is of no use when one sequential run depends on the output of another.

PDES
PDES is inherently difficult because of the typical way in which simulation is done, using state variables, an event list, and a global clock variable.
In a sequential simulator, the main loop repeatedly removes the smallest time-stamped event from the event list and processes it.
Processing an event means effecting a change in the system state and scheduling zero or more new events in the simulated future, in order to maintain causality relationships.
The challenging aspect of PDES is to maintain these causality relationships while exploiting the inherent parallelism to execute events faster.
"Maintaining" causality means preserving the required sequencing order between events executing in separate processes.
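The sequential event loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; `handler` is a hypothetical callback that applies an event to the state and returns any newly scheduled events.

```python
import heapq
import itertools

def run_sequential_simulation(initial_events, handler, end_time):
    """Minimal sketch of a sequential discrete event simulator: a global
    clock, an event list, and a loop that always processes the smallest
    time-stamped event next."""
    clock = 0.0
    counter = itertools.count()          # tie-breaker for equal timestamps
    event_list = [(ts, next(counter), ev) for ts, ev in initial_events]
    heapq.heapify(event_list)            # event list ordered by timestamp

    while event_list:
        timestamp, _, event = heapq.heappop(event_list)
        if timestamp > end_time:
            break
        clock = timestamp                # advance the global clock
        # Processing an event changes the state and may schedule zero or
        # more new events, which must lie in the simulated future.
        for new_ts, new_event in handler(event, clock):
            assert new_ts >= clock, "events may only be scheduled in the future"
            heapq.heappush(event_list, (new_ts, next(counter), new_event))
    return clock
```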

Strategies
Model the system as a collection of logical processes (LPs) with no direct access to shared state variables.
All interactions between processes are modeled as time stamped event messages exchanged between LPs.
Causality errors can be avoided if each LP obeys its local causality constraint, processing events in non-decreasing timestamp order, and interacts with other LPs exclusively by exchanging time stamped messages.
The "cause and effect" relationship between events must be maintained.
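A sketch of how this LP model might be represented; `EventMessage` and `LogicalProcess` are illustrative names, not structures from the paper.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class EventMessage:
    """A time stamped event message exchanged between logical processes;
    ordering is by timestamp only."""
    timestamp: float
    sender: str = field(compare=False)
    receiver: str = field(compare=False)
    payload: Any = field(compare=False, default=None)

class LogicalProcess:
    """A logical process: private state, a local clock, and per-link
    message queues; no variables are shared with other LPs."""
    def __init__(self, name):
        self.name = name
        self.clock = 0.0        # local simulation time
        self.state = {}         # state variables private to this LP
        self.input_queues = {}  # one queue of incoming messages per link
```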

Mechanisms for PDES
Conservative approach
–Avoid the possibility of any causality error ever occurring by determining when it is safe to process an event.
–Uses pessimistic estimates for decision making.
Optimistic approach
–Use a detection and recovery approach: allow causality errors to occur, then invoke "rollback" to recover.

Conservative Approach
If a process P contains an unprocessed event E1 with time stamp T1, where T1 is the smallest timestamp P has, then P must ensure that it cannot receive another event with a lower time stamp before executing E1.
Algorithm
–Statically specify the links that indicate which processes communicate with one another.
–Each process ensures that the sequence of time stamps sent over each link is non-decreasing.
–Each link has a clock associated with it, equal to the timestamp of the message at the head of that link's queue, or to the timestamp of the last received message if the queue is empty.
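A sketch of the safety test these rules imply, assuming each incoming link keeps a FIFO list of messages with a `.timestamp` attribute (illustrative names, not the paper's code):

```python
def conservative_step(link_queues, process_event):
    """Process one provably safe event, or block if none exists.

    `link_queues` maps each incoming link to its FIFO list of pending
    messages. Because timestamps on every link are non-decreasing, the
    message at the head of the link with the smallest link clock cannot
    be preceded by anything with a lower timestamp, so it is safe."""
    if any(len(q) == 0 for q in link_queues.values()):
        return False    # an empty queue means the LP must block (risk of deadlock)

    link = min(link_queues, key=lambda k: link_queues[k][0].timestamp)
    msg = link_queues[link].pop(0)
    process_event(msg)  # safe: no smaller-timestamped message can arrive later
    return True
```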

Deadlocks in the Conservative Approach
Deadlock occurs when a cycle of empty queues exists, so every process in the cycle is blocked.
One remedy is to periodically send "null" messages, which are an assurance from the sending LP that the next message sent on that link will have a timestamp at least as large as the null message's timestamp.
A variation is to request null messages only when all input queues of a process become empty.
Null messages can be eliminated entirely by allowing deadlocks to occur and then breaking them, by allowing the smallest time stamped event in the deadlocked system to proceed.
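A sketch of the null message rule under stated assumptions; the `send` callback, `output_links`, and `lookahead` value are illustrative.

```python
def send_null_messages(lp_clock, lookahead, output_links, send):
    """After processing, promise each neighbour a lower bound on future
    timestamps so its link clocks can advance even when no real event is
    sent, preventing (or breaking) deadlocks of empty queues."""
    for link in output_links:
        # A null message carries no event; its timestamp lp_clock + lookahead
        # guarantees that no later message on this link will be earlier.
        send(link, timestamp=lp_clock + lookahead, payload=None)
```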

Improvements
Maintain a simulated time window, which bounds how far ahead events may be examined for possible parallelism.
Lookahead: the ability of a process to predict with certainty what it will (or will not) do some distance into the simulated future; e.g. a server with a minimum service time S knows it will send no event earlier than its current time plus S.
Conditional knowledge: predicates are associated with events; when a predicate is shown to hold, the conditional event becomes definite. The goal is to make these events definite.

Performance/Shortcomings of the Conservative Approach
The degree of lookahead largely determines the performance benefits.
"Avalanche effect": efficiency is poor for a small message population but increases dramatically as the message population grows.
Performance is only modestly affected by the amount of computation per event.
Drawbacks
–Does not schedule aggressively: if EA might affect EB, the two are executed sequentially, even if EA rarely or never actually affects EB.
–Unsuitable in the context of preemptive processes.
–Requires a static communication configuration between processes.
–Requires the programmer to have an intricate understanding of the system.

Optimistic Mechanisms
Principle: detect and recover from causality errors; execute greedily.
Time Warp
–A causality error is detected whenever an event message received by a process carries a time stamp smaller than the process's clock.
–The event causing the rollback is called the straggler; the state is restored to the last saved state whose time stamp is less than the straggler's timestamp.
–Rollback is easy because states are saved periodically in a state vector.
–An anti-message is sent for each message the process sent during the rolled-back computation, so that the receiving processes can roll back too if they were affected by the straggler.
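The core Time Warp machinery on this slide can be sketched as follows. This is an illustrative reconstruction, not Jefferson's or the paper's code; the `handler` and `send` callbacks are assumptions.

```python
import copy

class TimeWarpLP:
    """A minimal Time Warp LP sketch: state is saved before each event so
    the LP can roll back when a straggler arrives."""

    def __init__(self, state):
        self.clock = 0.0
        self.state = state
        self.saved_states = []     # (event ts, prior clock, prior state) history
        self.sent_messages = []    # (send time, message) kept for anti-messages

    def receive(self, msg, handler, send):
        if msg.timestamp < self.clock:      # straggler: causality error detected
            self.rollback(msg.timestamp, send)
        # Save the state before processing so this event can be undone later.
        self.saved_states.append((msg.timestamp, self.clock,
                                  copy.deepcopy(self.state)))
        self.clock = msg.timestamp
        for out in handler(self.state, msg):   # handler returns new messages
            self.sent_messages.append((self.clock, out))
            send(out, anti=False)

    def rollback(self, straggler_ts, send):
        # Undo every event processed at or after the straggler's timestamp and
        # restore the state saved just before the earliest of them.
        restored = None
        while self.saved_states and self.saved_states[-1][0] >= straggler_ts:
            restored = self.saved_states.pop()
        if restored is not None:
            _, self.clock, self.state = restored
        # Send anti-messages for messages sent in the rolled-back interval,
        # so LPs that received them roll back as well.
        keep = []
        for ts, out in self.sent_messages:
            if ts >= straggler_ts:
                send(out, anti=True)
            else:
                keep.append((ts, out))
        self.sent_messages = keep
```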

Further Optimizations
Lazy Cancellation
–Processes do not send out anti-messages immediately on rollback. They wait to see whether the re-executed computation regenerates the same messages; if it does, no anti-messages need to be sent.
Lazy Reevaluation
–The states at the start and end of the rolled-back computation are compared. If processing the straggler leaves the state unchanged and no intermediate messages were sent out by the process, the process jumps directly back to its pre-rollback state instead of re-executing the rolled-back events.
Optimistic Time Windows
–The same idea as the simulated time window in conservative methods; in practice it does not offer much performance improvement.
Wolf Calls
–A special call is sent out by a process as soon as a straggler is received, to stop the spread of erroneous computation.
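A sketch of the lazy cancellation comparison; all names here are illustrative.

```python
def lazily_cancel(previously_sent, regenerated, send_anti_message):
    """After a rollback, the LP re-executes and compares the messages it
    now produces with those it sent before the rollback; anti-messages are
    sent only for messages the new computation did not regenerate."""
    pending = list(regenerated)
    for old in previously_sent:
        if old in pending:
            pending.remove(old)       # same message produced again: leave it alone
        else:
            send_anti_message(old)    # not reproduced: cancel it now
```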

Further Optimizations...
Direct Cancellation
–Maintain links between events that share a causal relationship (an event points to the events it scheduled). This allows easier and faster cancellation.
Space-Time Simulation
–Views the simulation as a two-dimensional space-time graph, where one dimension enumerates all the state variables and the other dimension is simulated time. The graph is partitioned into disjoint regions of state variables, and one process is assigned to each region.
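One way direct cancellation might be realized, as a sketch; the event class and the `unschedule` callback are hypothetical.

```python
class CausalEvent:
    """An event that remembers which events it scheduled, so erroneous
    descendants can be cancelled by following pointers rather than by
    routing anti-messages through input queues."""
    def __init__(self, timestamp):
        self.timestamp = timestamp
        self.caused = []              # events scheduled while processing this one

def cancel_descendants(event, unschedule):
    """Cancel everything an erroneous event caused, directly and recursively."""
    for child in event.caused:
        cancel_descendants(child, unschedule)
        unschedule(child)
    event.caused.clear()
```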

Hybrid Approaches
Filtered Rollback
–Uses the concept of a "minimum distance" between events to decide which events are safe to process. A distance of zero yields the conservative approach, and a distance of infinity yields the optimistic approach. Causality errors are allowed to occur within this distance, and rollback is used to correct them.
SRADS Protocol
–In the conservative approach, a process with no safe events simply blocks. Here it optimistically processes other events, but does not transmit the results of those events to other processes, so any rollback is purely local.

Performance/Shortcomings of the Optimistic Approach
Speed-ups as high as 37 have been reported on a 100-processor BBN configuration.
Adding direct cancellation resulted in a speedup of approximately 57 on a 64-node network.
Time Warp achieves speed-up proportional to the amount of parallelism available in the workload.
Roll-back costs have been shown to be very small in a variety of studies; in fact they can be neglected for large workloads.
Drawbacks
–Thrashing is theoretically possible, where all the work done is in rollbacks.
–Requires a large amount of memory.
–Must be able to recover from arbitrary errors and infinite loops in erroneous computation.
–Much more complex to implement.

Conclusion
Optimistic methods such as Time Warp are the best way to handle large simulation problems, while conservative methods offer good potential for certain classes of problems.
Simulation is fun. Parallel Discrete Event Simulation is even more so!