Parallel and Distributed Simulation Time Warp: Other Mechanisms.

Slides:



Advertisements
Similar presentations
Parallel Discrete Event Simulation Richard Fujimoto Communications of the ACM, Oct
Advertisements

Parallel and Distributed Simulation Global Virtual Time - Part 2.
Time Warp: Global Control Distributed Snapshots and Fossil Collection.
Parallel and Distributed Simulation Time Warp: Basic Algorithm.
Parallel and Distributed Simulation Lookahead Deadlock Detection & Recovery.
Lookahead. Outline Null message algorithm: The Time Creep Problem Lookahead –What is it and why is it important? –Writing simulations to maximize lookahead.
Object Oriented Programming Elhanan Borenstein Lecture #12 copyrights © Elhanan Borenstein.
Other Optimistic Mechanism, Memory Management. Outline Dynamic Memory Allocation Error Handling Event Retraction Lazy Cancellation Lazy Re-Evaluation.
Parallel and Distributed Simulation Time Warp: State Saving.
Architectural Support for OS March 29, 2000 Instructor: Gary Kimura Slides courtesy of Hank Levy.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Distributed Snapshots –Termination detection Election algorithms –Bully –Ring.
Figure 2.8 Compiler phases Compiling. Figure 2.9 Object module Linking.
CSCI 4550/8556 Computer Networks Comer, Chapter 7: Packets, Frames, And Error Detection.
Concurrency: Deadlock and Starvation Chapter 6. Revision Describe three necessary conditions for deadlock Which condition is the result of the three necessary.
Exceptions in Java Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
1 Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.
CS-550 (M.Soneru): Recovery [SaS] 1 Recovery. CS-550 (M.Soneru): Recovery [SaS] 2 Recovery Computer system recovery: –Restore the system to a normal operational.
3.5 Interprocess Communication
Parallel and Distributed Simulation Object-Oriented Simulation.
7. Fault Tolerance Through Dynamic or Standby Redundancy 7.5 Forward Recovery Systems Upon the detection of a failure, the system discards the current.
Java Review 2 – Errors, Exceptions, Debugging Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
CHAPTER 9: Input / Output
1 I/O Management in Representative Operating Systems.
1 Lightweight Remote Procedure Call Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska and Henry M. Levy Presented by: Karthika Kothapally.
Time Warp OS1 Time Warp Operating System Presenter: Munehiro Fukuda.
Concurrency: Deadlock and Starvation Chapter 6. Goal and approach Deadlock and starvation Underlying principles Solutions? –Prevention –Detection –Avoidance.
By Dominik Seifert B Overview Data Alignment Hashtable MurMurHash Function “Stupid Parallel Hashing” Lookup The Complete Algorithm The Little.
COMP201 Computer Systems Exceptions and Interrupts.
Parallel and Distributed Simulation FDK Software.
Synchronous Algorithms I Barrier Synchronizations and Computing LBTS.
TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel.
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
Presentation of Failure- Oblivious Computing vs. Rx OS Seminar, winter 2005 by Lauge Wullf and Jacob Munk-Stander January 4 th, 2006.
Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.
ICS 145B -- L. Bic1 Project: Main Memory Management Textbook: pages ICS 145B L. Bic.
Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,
Time Warp State Saving and Simultaneous Events. Outline State Saving Techniques –Copy State Saving –Infrequent State Saving –Incremental State Saving.
Parallel and Distributed Simulation Memory Management & Other Optimistic Protocols.
Lecture 14 Today’s topics MARIE Architecture Registers Buses
Interrupt driven I/O. MIPS RISC Exception Mechanism The processor operates in The processor operates in user mode user mode kernel mode kernel mode Access.
CE Operating Systems Lecture 2 Low level hardware support for operating systems.
Operating System Chapter 6. Concurrency: Deadlock and Starvation Lynn Choi School of Electrical Engineering.
Parallel and Distributed Simulation Time Parallel Simulation.
Marlon Dumas University of Tartu
CE Operating Systems Lecture 2 Low level hardware support for operating systems.
4P13 Week 12 Talking Points Device Drivers 1.Auto-configuration and initialization routines 2.Routines for servicing I/O requests (the top half)
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
Interrupt driven I/O Computer Organization and Assembly Language: Module 12.
1 Lecture 12: Advanced Static ILP Topics: parallel loops, software speculation (Sections )
CMSC 202 Computer Science II for Majors. CMSC 202UMBC Topics Exceptions Exception handling.
Operating Systems Unit 2: – Process Context switch Interrupt Interprocess communication – Thread Thread models Operating Systems.
1 Lecture 19: Unix signals and Terminal management n what is a signal n signal handling u kernel u user n signal generation n signal example usage n terminal.
Timewarp Rigid Body Simulation Brian Mirtich. Simulation Discontinuities “Events that change the dynamic states or the equations of motion of some subset.
Clock Synchronization (Time Management) Deadlock Avoidance Using Null Messages.
Parallel and Distributed Simulation Deadlock Detection & Recovery: Performance Barrier Mechanisms.
CS3771 Today: Distributed Coordination  Previous class: Distributed File Systems Issues: Naming Strategies: Absolute Names, Mount Points (logical connection.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
Embedded Real-Time Systems Processing interrupts Lecturer Department University.
Chapter 6 Limited Direct Execution Chien-Chung Shen CIS/UD
Heterogeneous Computing using openCL lecture 2 F21DP Distributed and Parallel Technology Sven-Bodo Scholz.
PDES Introduction The Time Warp Mechanism
Presented by: Daniel Taylor
Parallel and Distributed Simulation
Parallel and Distributed Simulation Techniques
PDES: Time Warp Mechanism Computing Global Virtual Time
Chapter 9: Virtual-Memory Management
Timewarp Elias Muche.
Parallel and Distributed Simulation
Interrupt handling Explain how interrupts are used to obtain processor time and how processing of interrupted jobs may later be resumed, (typical.
Parallel Exact Stochastic Simulation in Biochemical Systems
Presentation transcript:

Parallel and Distributed Simulation Time Warp: Other Mechanisms

Outline Dynamic Memory Allocation Error Handling Event Retraction Lazy Cancellation Lazy Re-Evaluation

Dynamic Memory Allocation Issues Roll back of memory allocation (e.g., malloc) –Memory leak –Solution: release memory if malloc rolled back Roll back of memory release (e.g., free) –Reuse memory that has already been released –Solution: Treat memory release like an I/O operation Only release memory when GVT has advanced past the simulation time when the memory was released

Error Handling What if an execution error is rolled back? –Solution: do not abort program until the error is committed (GVT advances past the simulation time when the error occurred) –Requires Time Warp executive to “catch” errors when they occur Types of error –Program detected Treat “abort” procedure like an I/O operation –Infinite loops Interrupt mechanism to receive incoming messages Poll for messages in loop –Benign errors (e.g., divide by zero) Trap mechanism to catch runtime execution errors –Destructive errors (e.g., overwrite state of Time Warp executive) Runtime checks (e.g., array bounds) Strong type checking, avoid pointer arithmetic, etc.

Event Retraction Unschedule a previously scheduled event Approach 1: Application Level Approach Schedule a retraction event with time stamp < the event being retracted Process retraction event: Set flag in LP state to indicate the event has been retracted Process event: Check if it has been retracted before processing any event

LP 2 LP 1 Retraction handled within the application Example: Application Approach E1E1 E schedule original event E invoke retract primitive process R, set flag begin to process E, notice flag is set, ignore event R schedule retract event R

Event Retraction (cont.) Approach 2: Implement in Time Warp executive Retraction: send anti-message to cancel the retracted event –Retraction: invoked by application program –Cancellation: invoked by Time Warp executive (transparent to the application) Rollback retraction request –Reschedule the original event –Retraction: place positive copy of message being retracted in output queue –Rollback: Send messages in output queue (same as before)

LP 2 LP 1 Retraction handled within Time Warp executive Example: Kernel Approach E1E1 E+ schedule original event E invoke retract primitive annihilate E E- send anti-message for E E+ leave E+ in output queue roll back LP 1 E+ reschedule E

Lazy Cancellation Motivation: re-execution after rollback may generate the same messages as the original execution in this case, need not cancel original message Mechanism: rollback: do not immediately send anti-messages after rollback, recompute forward only send anti-message if recomputation does NOT produce the same message again

LP 2 LP 1 Lazy cancellation avoids unnecessary rollback Example: Lazy Cancellation E1+E1+ anti-message in output queue E1-E1- E2+E2+ E2-E2- roll back don’t send anti-messages E2-E2- send anti-message Annihilate E 2 + and E 2 - E1+E1+ execute forward E 1 resent don’t send anti-message execute forward E 2 not resent

Lazy Cancellation: Evaluation Benefit: avoid unnecessary message cancellations Liabilities: extra overhead (message comparisons) delay in canceling wrong computations more memory required Conventional wisdom Lazy cancellation typically improves performance Empirical data indicate 10% improvement typical

Lazy Re-evaluation Motivation: re-execution of event after rollback may be produce same result (LP state) as the original execution in this case, original rollback was unnecessary Mechanism: rollback: do not discard state vectors of rolled back computations process straggler event, recompute forward during recomputation, if the state vector and input queue match that of the original execution, immediately “jump forward” to state prior to rollback.

Lazy Re-evaluation Benefit: avoid unnecessary recomputation on rollback works well if straggler does not affect LP state (query events) Liabilities: extra overhead (state comparisons) more memory required Conventional wisdom Typically does not improve overall performance Useful in certain special cases (e.g., query events)

Summary Other Mechanisms –Simple operations in conservative systems (dynamic memory allocation, error handling) present non-trivial issues in Time Warp systems –Solutions exist for most, but at the cost of increased complexity in the Time Warp executive Event retraction –Not to be confused with cancellation –Application & kernel level solutions exist Optimizations –Lazy cancellation often provides some benefit –Conventional wisdom is lazy re-evaluation costs outweigh the benefits