Parallel and Distributed Simulation Deadlock Detection & Recovery.

Slides:



Advertisements
Similar presentations
Distributed Deadlocks
Advertisements

Parallel Discrete Event Simulation Richard Fujimoto Communications of the ACM, Oct
Global States.
Last Class: Clock Synchronization
CS3771 Today: deadlock detection and election algorithms  Previous class Event ordering in distributed systems Various approaches for Mutual Exclusion.
Handling Deadlocks n definition, wait-for graphs n fundamental causes of deadlocks n resource allocation graphs and conditions for deadlock existence n.
Lecture 8: Asynchronous Network Algorithms
SES Algorithm SES: Schiper-Eggli-Sandoz Algorithm. No need for broadcast messages. Each process maintains a vector V_P of size N - 1, N the number of processes.
Parallel and Distributed Simulation Global Virtual Time - Part 2.
Time Warp: Global Control Distributed Snapshots and Fossil Collection.
Token-Dased DMX Algorithms n LeLann’s token ring n Suzuki-Kasami’s broadcast n Raymond’s tree.
Parallel and Distributed Simulation Time Warp: Basic Algorithm.
Parallel and Distributed Simulation Lookahead Deadlock Detection & Recovery.
Lookahead. Outline Null message algorithm: The Time Creep Problem Lookahead –What is it and why is it important? –Writing simulations to maximize lookahead.
6.852: Distributed Algorithms Spring, 2008 Class 12.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Termination Detection of Diffusing Computations Chapter 19 Distributed Algorithms by Nancy Lynch Presented by Jamie Payton Oct. 3, 2003.
1 Causality. 2 The “happens before” relation happens before (causes)
Termination Detection. Goal Study the development of a protocol for termination detection with the help of invariants.
Termination Detection Part 1. Goal Study the development of a protocol for termination detection with the help of invariants.
Global State Collection. Global state collection Some applications - computing network topology - termination detection - deadlock detection Chandy-Lamport.
Distributed Snapshot (continued)
LEADER ELECTION CS Election Algorithms Many distributed algorithms need one process to act as coordinator – Doesn’t matter which process does the.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Chapter 11 Detecting Termination and Deadlocks. Motivation – Diffusing computation Started by a special process, the environment environment sends messages.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport (1978) Presented by: Yoav Kantor.
Distributed Mutex EE324 Lecture 11.
Synchronous Algorithms I Barrier Synchronizations and Computing LBTS.
Hardware Supported Time Synchronization in Multi-Core Architectures 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan,
1 A Mutual Exclusion Algorithm for Ad Hoc Mobile networks Presentation by Sanjeev Verma For COEN th Nov, 2003 J. E. Walter, J. L. Welch and N. Vaidya.
Parallel and Distributed Simulation Memory Management & Other Optimistic Protocols.
Presenter: Long Ma Advisor: Dr. Zhang 4.5 DISTRIBUTED MUTUAL EXCLUSION.
Termination Detection
Chapter 7 –System Model – typical assumptions underlying the study of distributed deadlock detection Only reusable resources, only exclusive access, single.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Distributed and hierarchical deadlock detection, deadlock resolution
Building Dependable Distributed Systems, Copyright Wenbing Zhao
Global State Collection
Reliable Communication in the Presence of Failures Kenneth P. Birman and Thomas A. Joseph Presented by Gloria Chang.
Hwajung Lee. Some applications - computing network topology - termination detection - deadlock detection Chandy Lamport algorithm does a partial job.
1 Chapter 11 Global Properties (Distributed Termination)
By Purva Gawde For Advanced Operating Systems Instructor: Mikhail Nesterenko.
Clock Synchronization (Time Management) Deadlock Avoidance Using Null Messages.
Parallel and Distributed Simulation Deadlock Detection & Recovery: Performance Barrier Mechanisms.
CSC 8420 Advanced Operating Systems Georgia State University Yi Pan Transactions are communications with ACID property: Atomicity: all or nothing Consistency:
PDES Introduction The Time Warp Mechanism
Termination detection
Parallel and Distributed Simulation
Global State Recording
The Echo Algorithm The echo algorithm can be used to collect and disperse information in a distributed system It was originally designed for learning network.
Parallel and Distributed Simulation Techniques
EEC 688/788 Secure and Dependable Computing
PDES: Time Warp Mechanism Computing Global Virtual Time
Distributed Mutex EE324 Lecture 11.
Lecture 9: Asynchronous Network Algorithms
ITEC452 Distributed Computing Lecture 9 Global State Collection
Global State Recording
EEC 688/788 Secure and Dependable Computing
Outline Theoretical Foundations - continued Lab 1
EEC 688/788 Secure and Dependable Computing
Global State Collection
Parallel and Distributed Simulation
Distributed Snapshot Distributed Systems.
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Outline Theoretical Foundations - continued
Presentation transcript:

Parallel and Distributed Simulation Deadlock Detection & Recovery

Outline Deadlock Detection and Recovery Algorithm (Chandy and Misra) –Basic Approach –Deadlock Detection Diffusing distributed computations Dijkstra/Scholten algorithm –Deadlock Recovery

Deadlock Detection & Recovery Algorithm A (executed by each LP): Goal: Ensure events are processed in time stamp order: WHILE (simulation is not over) wait until each FIFO contains at least one message remove smallest time stamped event from its FIFO process that event END-LOOP No null messages Allow simulation to execute until deadlock occurs Provide a mechanism to detect deadlock Provide a mechanism to recover from deadlocks

Deadlock Detection Diffusing computations (Dijkstra/Scholten) Computation consists of a collection of processes that communicate by exchanging messages Receiving a message triggers computation; may result in sending/receiving more messages Processes do not spontaneously start new computations (must first receive a message) One process identified as the “controller” that is used for deadlock detection and recovery Goal: determine when all of the processes are blocked (global deadlock)

Basic Idea Initially, all processes blocked except controller Controller sends messages to one or more processes to break deadlock Computation spreads as processes send messages Construct a tree of processes that expands as the computation spreads, contracts as processes become idle –Processes in tree are said to be engaged –Processes not in tree are said to be disengaged A disengaged process becomes engaged (added to tree) when it receives a message An engage process becomes disengaged (removed from the tree) when it is a leaf node and it is idle (blocked) If the tree only includes the controller, the processes are deadlocked

Cntl Controller sends start message to 3 3 Cntl 3 added to tree, sends messages to 1, 2, , 2, and 4 added to engagement tree 1 and 3 become idle disengaged, 3 still engaged; 2 sends message to 4 2 and 4 become idle 2 2, 4 become disengaged, dropped from tree 4 3 Cntl 3 becomes disengaged, controller begins recovery Example Disengaged Engaged, busy Engaged, blocked Tree arc Message (event)

Implementation: Signaling Protocol Add signaling protocol to simulation execution When an engaged process receives a message –Immediately return a signal to sender indicating message did not spawn a new node in the tree When a disengaged process receives a message: –Receiving process becomes engaged –Do not return a signal until it becomes disengaged An engaged process becomes disengaged (and sends a signal to its parent in the tree) when –It is idle, and –It is a leaf node in the tree Process is a leaf it it has received signals for all messages it has sent

Implementation (cont.) Each process maintains two variables C = # messages received for which process hasn’t returned a signal D = # messages sent for which a signal has not yet been received When are C and D updated? Send a message: Increment D in sender, increment C in receiver Return a signal: Decrement C in sender, decrement D in receiver When is a process disengaged? A process is disengaged if C = D = 0 When is a process a leaf node of tree? A process is a leaf node if C>0, D=0 When does a process send a signal? If C=1, D=0, and the process is idle (becomes disengaged), or If it receives a message and C>0 When is deadlock detected? System deadlocked if C = D = 0 in the controller

Example Disengaged Engaged, busy Engaged, blocked Tree arc Message (event) Signal Cntl Controller sends start message to 3 C=0 D=1 C=1 D=0 C=0 D=0 C=0 D=0 C=0 D=0 3 Cntl 3 added to tree, sends messages to 1, 2, 4 C=1 D= , 2, and 4 added to engagement tree 1 and 3 become blocked sends signal to 3, disengages, 3 still engaged C=0 1 D=2 3 2 sends message to 4; 4 immediately returns a signal 2 and 4 become idle , 4 become disengaged, send signals to 3 4 C=0 D=0 3 3 becomes disengaged, sends signal to controller C=0 Cntl controller detects deadlock, begins recovery D=0

Deadlock Recovery Deadlock recovery: identify “safe” events (events that can be processed w/o violating local causality), Which events are safe? Time stamp 7: smallest time stamped event in system Time stamp 8, 9: safe because of lookahead constraint Time stamp 10: OK if events with the same time stamp can be processed in any order No time creep! deadlock state Assume minimum delay between airports is 3 98 JFK (waiting on ORD) 7 ORD (waiting on SFO) 10 SFO (waiting on JFK)

Summary Deadlock Detection –Diffusing computation: Dijkstra/Scholten algorithm –Simple signaling protocol detects deadlock –Does not detect partial (local) deadlocks Deadlock Recovery –Smallest time stamp event safe to process –Others may also be safe (requires additional work to determine this)