PDES Introduction The Time Warp Mechanism

Slides:



Advertisements
Similar presentations
Parallel Discrete Event Simulation Richard Fujimoto Communications of the ACM, Oct
Advertisements

Parallel and Distributed Simulation Global Virtual Time - Part 2.
Time Warp: Global Control Distributed Snapshots and Fossil Collection.
Parallel and Distributed Simulation
Parallel and Distributed Simulation Time Warp: Basic Algorithm.
Parallel and Distributed Simulation Lookahead Deadlock Detection & Recovery.
Lookahead. Outline Null message algorithm: The Time Creep Problem Lookahead –What is it and why is it important? –Writing simulations to maximize lookahead.
Other Optimistic Mechanism, Memory Management. Outline Dynamic Memory Allocation Error Handling Event Retraction Lazy Cancellation Lazy Re-Evaluation.
Parallel and Distributed Simulation Time Warp: Other Mechanisms.
EECB 473 Data Network Architecture and Electronics Lecture 3 Packet Processing Functions.
Database Systems, 8 th Edition Concurrency Control with Time Stamping Methods Assigns global unique time stamp to each transaction Produces explicit.
Parallel and Distributed Simulation Time Warp: State Saving.
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
What Mum Never Told Me about Parallel Simulation K arim Djemame Informatics Research Lab. & School of Computing University of Leeds.
CS-550 (M.Soneru): Recovery [SaS] 1 Recovery. CS-550 (M.Soneru): Recovery [SaS] 2 Recovery Computer system recovery: –Restore the system to a normal operational.
Discrete Event Simulation Event-Oriented Simulations By Syed S. Rizvi.
Parallel and Distributed Simulation Object-Oriented Simulation.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
A Survey of Rollback-Recovery Protocols in Message-Passing Systems M. Elnozahy, L. Alvisi, Y. Wang, D. Johnson Carnegie Mellon University Presented by:
Time Warp OS1 Time Warp Operating System Presenter: Munehiro Fukuda.
Synchronous Algorithms I Barrier Synchronizations and Computing LBTS.
Hardware Supported Time Synchronization in Multi-Core Architectures 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan,
Chapter 3: Processes. 3.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts - 7 th Edition, Feb 7, 2006 Process Concept Process – a program.
EEC 688/788 Secure and Dependable Computing Lecture 7 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Time Warp State Saving and Simultaneous Events. Outline State Saving Techniques –Copy State Saving –Infrequent State Saving –Incremental State Saving.
Parallel and Distributed Simulation Memory Management & Other Optimistic Protocols.
Parallel and Distributed Simulation Process Oriented Simulation.
Winter, 2004CSS490 Synchronization1 Textbook Ch6 Instructor: Munehiro Fukuda These slides were compiled from the textbook, the reference books, and the.
Dr. Anis Koubâa CS433 Modeling and Simulation
CS433 Modeling and Simulation Lecture 09 – Part 02 Discrete Events Simulation Dr. Anis Koubâa 27 Dec 2008 Al-Imam.
Parallel and Distributed Simulation Process Oriented Simulation.
Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation Author: Friedermann Mattern Presented By: Shruthi Koundinya.
Timewarp Rigid Body Simulation Brian Mirtich. Simulation Discontinuities “Events that change the dynamic states or the equations of motion of some subset.
Clock Synchronization (Time Management) Deadlock Avoidance Using Null Messages.
Parallel and Distributed Simulation Deadlock Detection & Recovery: Performance Barrier Mechanisms.
Parallel and Distributed Simulation Deadlock Detection & Recovery.
1 Fault Tolerance and Recovery Mostly taken from
Operating System Reliability Andy Wang COP 5611 Advanced Operating Systems.
Introduction to Simulation What is a simulation? A system that represents or emulates the behavior of another system over time; a computer simulation is.
Database Recovery Techniques
Creating Simulations with POSE
Recovery Control (Chapter 17)
Parallel and Distributed Simulation
ADVANTAGES OF SIMULATION
Parallel and Distributed Simulation Techniques
EEC 688/788 Secure and Dependable Computing
Vemund Reggestad DASIA 20/06/2011 Paper authors:
PDES: Time Warp Mechanism Computing Global Virtual Time
Operating System Reliability
Operating System Reliability
Chapter 12: Concurrency, Deadlock and Starvation
Applied Operating System Concepts
CPSC 531: System Modeling and Simulation
Parallel and Distributed Simulation
Timewarp Elias Muche.
EECS 498 Introduction to Distributed Systems Fall 2017
Operating System Reliability
Operating System Reliability
EEC 688/788 Secure and Dependable Computing
Process Description and Control
Parallel and Distributed Simulation
EEC 688/788 Secure and Dependable Computing
Operating System Reliability
Parallel Discrete-Event Simulations
Parallel Exact Stochastic Simulation in Biochemical Systems
Operating System Reliability
Operating System Reliability
Presentation transcript:

PDES Introduction The Time Warp Mechanism Advanced Simulation PDES Introduction The Time Warp Mechanism

Optimistic Synchronization Time Warp Local Control Mechanism Rollback Event cancellation Global Control Mechanism Global Virtual Time Fossil Collection

The Synchronization Problem Golden rule Local causality constraint: Events within each logical process must be processed in time stamp order Observation: Adherence to the local causality constraint is sufficient to ensure that the parallel simulation will produce exactly the same results as the corresponding sequential simulation* Synchronization Algorithms Conservative synchronization: avoid violating the local causality constraint (wait until it’s safe) 1st generation: null messages (Chandy/Misra/Bryant) 2nd generation (later): time stamp of next event Optimistic synchronization: allow violations of local causality to occur, but detect them at runtime and recover using a rollback mechanism Time Warp (Jefferson & Sowizral) Chandy/Misra/Bryant – device a lower bound on the time stamp (LBTS) that it may later receive. Armed with this information the LP can then determine which events can be safely processed. - Only utilizing the current simulation time of each LP and lookahead information in computing LBTS values. could only guarantee that the smallest TS message produced by a LP was its current time + lookahead, this information was conveyed to other LPs in the form of NULL messages. (0 lookahead – infinite loop – in cycles of NULL messages could degrade performance). - Newer algorithms – also include information of the time stamp of the next unprocessed event – enables a SNAP shot of the entire simulation – will talk about these algorithms later (this information were first used in deadlock recover algorithms). * provided events with the same time stamp are processed in the same order as in the sequential execution

Time Warp Algorithm JFK logical process Assumptions: logical processes (LPs) exchanging time stamped events (messages) dynamic network topology, dynamic creation of LPs messages sent on each link need not be sent in time stamp order network provides reliable delivery, but need not preserve order when received Basic idea: process events w/o worrying about messages that will arrive later detect out of order execution, recover using rollback JFK logical process JFK LAX ORD 9 8 2 4 5 process all available events (2, 4, 5, 8, 9) in time stamp order

Time Warp Algorithms Many have been proposed, will cover fundamental concepts: rollback, anti-messages, Global Virtual Time (GVT). Initially assume ‘non-zero’ look-ahead Time Warp Structure: local control mechanism: implemented within each processor, mostly independent of other processors global control mechanism: used to reclaim memory and used to commit operations such as I/O that cannot be rolled back: requires a distributed computation involving all processors in the system.

Time Warp: Local Control Mechanism processed events unprocessed events 41 12 21 35 Each LP: process events in time stamp order, like a sequential simulator, except: (1) do NOT discard processed events (backs up a history) and (2) add a rollback mechanism Input Queue (event list) 18 straggler message arrives in the past, causing rollback snapshot LP state 12 19 42 anti-messages State Queue Output Queue (anti-messages) Problem: Need to account for messages received in the LP’s past. Approach: Rollback and then re-compute Sub Problem : Rollback changes to state variables performed by events Solution: checkpoint state or use incremental state saving (state queue) Sub Problem: Rollback previously sent messages Solution: Anti-messages and message annihilations (output queue)

Anti-Messages Undo message sends by ‘unsending’ a previously sent message 42 positive message anti-messages 42 Each positive (regular) message sent by an LP has a corresponding anti-message An anti-message is an identical (copy) to its positive message, except for a sign bit. Rule of cancellation: When an anti-message and its matching positive message meet in the same queue, the two annihilate each other (analogous to matter and anti-matter). Mechanism: To undo the effects of a previously sent (positive) message, the LP need only send the corresponding anti-message Message send: in addition to sending the message, leave a copy of the corresponding anti-message in a data structure in the sending LP called the output queue.

Rollback: Receiving a Straggler Message snapshot LP state anti-messages processed events unprocessed events 41 12 21 35 Input Queue (event list) State Queue Output Queue (anti-messages) 19 2. roll back events at times 35 and 21 2(a) restore state of LP to that prior to processing time stamp 21 event 18 42 2(b) send anti-message BEFORE 1. straggler message arrives in the past, causing rollback 12 41 21 35 Input Queue (event list) State Queue Output Queue (anti-messages) 19 AFTER 18 5. resume execution by processing event at time 18

Processing Incoming Anti-Messages Case I: Corresponding message has not yet been processed annihilate message/anti-message pair Case II: Corresponding message has already been processed rollback to time prior to processing message (secondary rollback) annihilate message/anti-message pair Case III: Corresponding message has not yet been received queue anti-message annihilate message/anti-message pair when message is received 2. roll back events (42 and 45) 42 55 1. anti-message arrive 2(a) restore state processed events 27 42 45 45 55 unprocessed events snapshot LP state 33 57 anti-messages 2(b) send anti-message

Processing Incoming Anti-Messages Case I: Corresponding message has not yet been processed annihilate message/anti-message pair Case II: Corresponding message has already been processed rollback to time prior to processing message (secondary rollback) annihilate message/anti-message pair Case III: Corresponding message has not yet been received queue anti-message annihilate message/anti-message pair when message is received 42 1. anti-message arrive processed events 27 45 55 unprocessed events snapshot LP state 33 57 anti-messages

LP Simulation Example Now: current simulation time InTheAir: number of aircraft landing or waiting to land OnTheGround: number of landed aircraft RunwayFree: Boolean, true if runway available Arrival Event: InTheAir := InTheAir+1; if( RunwayFree ) RunwayFree:=FALSE; Schedule Landed event(local) @ Now + R; Landed Event: InTheAir := InTheAir-1; OnTheGround := OnTheGround + 1; Schedule Departure event(local) @ Now + G; if( InTheAir > 0 ) Schedule Landed event(local) @ Now + R; else RunwayFree := True; Departure Event: (D = Delay to reach another airport) OnTheGround := OnTheGround - 1; Schedule Arrival Event (remote) @ (Now+D) @ another airport

Global Virtual Time and Fossil Collection A mechanism is needed to: reclaim memory resources (e.g., old state and events) perform irrevocable operations (e.g., I/O) Observation: A lower bound on the time stamp of any rollback that can occur in the future is needed. Global Virtual Time (GVT) is defined as the minimum time stamp of any unprocessed (or partially processed) message or anti-message in the system. GVT provides a lower bound on the time stamp of any future rollback. storage for events and state vectors older than GVT (except one state vector) can be reclaimed I/O operations with time stamp at GVT can be performed. Observation: The computation corresponding to GVT will not be rolled back, guaranteeing forward progress.

Time Warp and Chandy/Misra Performance eight processors closed queuing network, hypercube topology high priority jobs preempt service from low priority jobs (1% high priority) exponential service time (poor lookahead)

Summary Optimistic synchronization: detect and recover from synchronization errors rather than prevent them Time Warp Local control mechanism Rollback State saving Anti-messages Cascaded rollbacks Global control mechanism Global Virtual Time (GVT) Fossil collection to reclaim memory Commit irrevocable operations (e.g., I/O)