UNIT-II Distributed Synchronization


Mutual exclusion
Mutual exclusion ensures that concurrent processes access shared resources or data in a serialized way. If a process, say Pi, is executing in its critical section, then no other process can be executing in its critical section.
Examples: updating a database, directory management, sending control signals to an I/O device.

Mutual Exclusion Algorithms
Non-token based: A site/process can enter a critical section when an assertion (condition) becomes true. Algorithm should ensure that the assertion will be true in only one site/process. Token based: A unique token (a known, unique message) is shared among cooperating sites/processes. Possessor of the token has access to critical section. Need to take care of conditions such as loss of token, crash of token holder, possibility of multiple tokens, etc.

General System Model
At any instant, a site may have several requests for the critical section (CS), queued up and serviced one at a time.
Site states: requesting CS, executing CS, idle (neither requesting nor executing CS).
Requesting CS: blocked until granted access; cannot make additional requests for the CS.
Executing CS: using the CS.
Idle: in token-based approaches, an idle site can have the token.

Mutual Exclusion: Requirements
Freedom from deadlocks: two or more sites should not endlessly wait on conditions/messages that never become true/arrive. Freedom from starvation: No indefinite waiting. Fairness: Order of execution of CS follows the order of the requests for CS. (equal priority). Fault tolerance: recognize “faults”, reorganize, continue. (e.g., loss of token).

Performance Number of messages per CS invocation: should be minimized.
Synchronization delay, i.e., time between the leaving of CS by a site and the entry of CS by the next one: should be minimized. Response time: time interval between request messages transmissions and exit of CS. System throughput, i.e., rate at which system executes requests for CS: should be maximized. If sd is synchronization delay, E the average CS execution time: system throughput = 1 / (sd + E).

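Performance metrics
[Figure: two timelines. Synchronization delay: the time from "last site exits CS" to "next site enters CS". Response time: from "CS request arrives" through "messages sent" and "enter CS" to "exit CS", where E is the CS execution time.]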

Performance ... Low and High Load: Best and Worst Case:
Low load: No more than one request at a given point in time. High load: Always a pending mutual exclusion request at a site. Best and Worst Case: Best Case (low loads): Round-trip message delay + Execution time. 2T + E. Worst case (high loads). Message traffic: low at low loads, high at high loads. Average performance: when load conditions fluctuate widely.

Simple Solution Control site: grants permission for CS execution.
A site sends a REQUEST message to the control site. The controller grants access one site at a time. Synchronization delay: 2T, since a site releases the CS by sending a message to the controller, and the controller then sends permission to another site. System throughput: 1/(2T + E). If the synchronization delay were reduced to T, throughput would rise to 1/(T + E), which approaches double only when E is small compared with T. The controller becomes a bottleneck, and congestion can occur.
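As a quick check of the throughput formula, the sketch below compares a synchronization delay of 2T with a delay of T for a few CS execution times. The function name and the sample values of T and E are illustrative assumptions, not figures from the slides.

```python
def throughput(sync_delay: float, exec_time: float) -> float:
    """System throughput = 1 / (sd + E), in CS executions per time unit."""
    return 1.0 / (sync_delay + exec_time)


T = 1.0  # one message delay, in an arbitrary time unit (assumed value)
ratios = {}
for E in (0.0, 1.0, 10.0):
    central = throughput(2 * T, E)  # control-site scheme: sd = 2T
    faster = throughput(T, E)       # a scheme whose sd is T
    ratios[E] = faster / central
    print(f"E = {E}: throughput ratio = {ratios[E]:.2f}")
```

The ratio is exactly 2 only when E is 0 and shrinks towards 1 as E grows, so halving the synchronization delay matters most when critical sections are short.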

Non-token Based Algorithms
Notations: Si: site i. Ri: request set of Si, containing the ids of all sites from which Si must receive permission before accessing the CS. Non-token based approaches use time stamps to order requests for the CS. Smaller time stamps get priority over larger ones.
Lamport’s Algorithm: Ri = {S1, S2, …, Sn}, i.e., all sites. Request queue: maintained at each Si, ordered by time stamps. Assumption: messages are delivered in FIFO order.

Lamport’s Algorithm
Requesting CS: Si sends REQUEST(tsi, i) to all sites in its request set, where (tsi, i) is the request time stamp, and places the REQUEST in request_queuei. On receiving the message, Sj places Si’s request in request_queuej and sends a time-stamped REPLY message to Si.
Executing CS: Si enters the CS when both conditions hold: Si has received a message with a time stamp larger than (tsi, i) from all other sites, and Si’s request is at the top of request_queuei.
Releasing CS: On exiting the CS, Si removes its request from request_queuei and sends a time-stamped RELEASE message to all sites in its request set. On receiving the RELEASE message, Sj removes Si’s request from its queue.

Lamport’s Algorithm… Performance. 3(N-1) messages per CS invocation.
(N - 1) REQUEST, (N - 1) REPLY, (N - 1) RELEASE messages. Synchronization delay: T
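The entry condition of Lamport’s algorithm can be sketched as a small single-process simulation. The class name LamportSite, its method names, and the explicitly supplied reply time stamps are assumptions made for illustration; a real implementation would draw time stamps from Lamport clocks and exchange messages over FIFO channels.

```python
import heapq


class LamportSite:
    """One site in a simulation of Lamport's mutual exclusion algorithm."""

    def __init__(self, site_id, all_ids):
        self.id = site_id
        self.others = [s for s in all_ids if s != site_id]
        self.queue = []          # request queue: min-heap of (ts, site)
        self.my_request = None   # (ts, id) while requesting or executing
        # Largest time stamp heard so far from each other site.
        self.last_seen = {s: (0, s) for s in self.others}

    def request(self, ts):
        self.my_request = (ts, self.id)
        heapq.heappush(self.queue, self.my_request)
        return self.my_request            # broadcast as REQUEST(ts, i)

    def on_request(self, req, reply_ts):
        heapq.heappush(self.queue, req)
        self._heard(req)
        return (reply_ts, self.id)        # time-stamped REPLY

    def on_reply(self, stamp):
        self._heard(stamp)

    def on_release(self, site):
        # Drop the releasing site's request from the local queue.
        self.queue = [r for r in self.queue if r[1] != site]
        heapq.heapify(self.queue)

    def release(self):
        self.on_release(self.id)
        self.my_request = None

    def can_enter(self):
        if self.my_request is None:
            return False
        at_head = self.queue[0] == self.my_request
        heard_later = all(self.last_seen[s] > self.my_request
                          for s in self.others)
        return at_head and heard_later

    def _heard(self, stamp):
        sender = stamp[1]
        self.last_seen[sender] = max(self.last_seen[sender], stamp)


# Three sites; S1 requests with (2,1) and S2 with (1,2).
s1, s2, s3 = (LamportSite(i, [1, 2, 3]) for i in (1, 2, 3))
r1, r2 = s1.request(2), s2.request(1)
s1.on_reply(s2.on_request(r1, reply_ts=3))
s1.on_reply(s3.on_request(r1, reply_ts=3))
s2.on_reply(s1.on_request(r2, reply_ts=3))
s2.on_reply(s3.on_request(r2, reply_ts=4))
first = (s1.can_enter(), s2.can_enter())  # only S2 may enter

s2.release()                              # S2 leaves the CS
s1.on_release(2)
s3.on_release(2)
second = s1.can_enter()                   # now S1 may enter
```

S2 enters first because (1,2) is smaller than (2,1); S1 enters only after the RELEASE from S2 removes (1,2) from its queue.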

Lamport’s Algorithm: Example-1
[Figure: three sites S1, S2, S3. Step 1: S1 issues request (2,1) and S2 issues request (1,2). Step 2: the request queues at S1, S2, and S3 each hold (1,2), (2,1); S2 enters CS. Step 3: S2 leaves CS. Step 4: the queues hold (2,1); S1 enters CS.]


Ricart-Agrawala Algorithm
Requesting CS: Si sends a time-stamped REQUEST message to all sites in its request set. Sj sends a REPLY to Si if Sj is neither requesting nor executing the CS, or if Sj is requesting the CS and Si’s time stamp is smaller than that of Sj’s own request. Otherwise the request is deferred (postponed).
Executing CS: Si enters the CS after it has received a REPLY from all sites in its request set.
Releasing CS: Si sends a REPLY to all deferred requests, i.e., a site’s REPLY messages are blocked only by sites with smaller time stamps.

Ricart-Agrawala: Performance
2(N-1) messages per CS execution. (N-1) REQUEST + (N-1) REPLY. Synchronization delay: T.
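The reply-or-defer rule of the Ricart-Agrawala algorithm can be sketched as follows. The class RASite and its method names are hypothetical, and message passing is replaced by direct method calls.

```python
class RASite:
    """Reply/defer logic of one site in the Ricart-Agrawala algorithm."""

    def __init__(self, site_id):
        self.id = site_id
        self.state = "idle"      # "idle", "requesting", or "executing"
        self.my_request = None   # (ts, id) while requesting
        self.deferred = []       # sites whose REPLY is postponed

    def request(self, ts):
        self.state = "requesting"
        self.my_request = (ts, self.id)
        return self.my_request

    def on_request(self, req):
        """Return True if a REPLY is sent now, False if it is deferred."""
        busy = self.state == "executing"
        # Our own request wins if its (ts, id) pair is smaller.
        ours_first = self.state == "requesting" and self.my_request < req
        if busy or ours_first:
            self.deferred.append(req[1])
            return False
        return True

    def release(self):
        """Leave the CS and return the sites that now receive a REPLY."""
        self.state = "idle"
        self.my_request = None
        to_reply, self.deferred = self.deferred, []
        return to_reply


s1, s2, s3 = RASite(1), RASite(2), RASite(3)
r1, r2 = s1.request(2), s2.request(1)     # requests (2,1) and (1,2)
s2_replies_to_s1 = s2.on_request(r1)      # deferred: (1,2) < (2,1)
s1_replies_to_s2 = s1.on_request(r2)      # replied at once
s3_replies = (s3.on_request(r1), s3.on_request(r2))
s2.state = "executing"                    # S2 has all REPLYs
released_to = s2.release()                # S1 finally gets its REPLY
```

Because the deferred REPLY doubles as the release notification, no separate RELEASE message is needed, which is where the 2(N-1) message count comes from.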

Ricart-Agrawala: Example
[Figure: three sites S1, S2, S3. Step 1: S1 issues request (2,1) and S2 issues request (1,2). Step 2: S2 enters CS while holding S1’s request (2,1). Step 3: S2 leaves CS and S1 enters CS.]

Maekawa’s Algorithm
A site requests permission only from a subset of sites. Request sets of sites Si and Sj: Ri and Rj, chosen so that Ri and Rj have at least one common site (Sk). Sk mediates conflicts between Ri and Rj. A site can send only one REPLY message at a time, i.e., a site can send a REPLY message only after receiving a RELEASE message for the previous REPLY message.
Request set rules: Any two sets Ri and Rj have at least one common site. Si is always in Ri. The cardinality of Ri, i.e., the number of sites in Ri, is K. Any site Si is in K of the sets Ri. N = K(K - 1) + 1, so K is approximately the square root of N.
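The request set rules can be checked mechanically. The sketch below uses one set assignment for N = 7 and K = 3 that is often used as an illustration; the concrete sets and the function name check_request_sets are assumptions of this example, not taken from the slides.

```python
from itertools import combinations

# Candidate request sets for N = 7 sites, K = 3 (assumed example).
request_sets = {
    1: {1, 2, 3},
    2: {2, 4, 6},
    3: {3, 5, 6},
    4: {1, 4, 5},
    5: {2, 5, 7},
    6: {1, 6, 7},
    7: {3, 4, 7},
}


def check_request_sets(sets):
    """Return True if the sets satisfy the request set rules listed above."""
    n = len(sets)
    k = len(next(iter(sets.values())))
    # Any two request sets share at least one site.
    pairwise = all(sets[i] & sets[j] for i, j in combinations(sets, 2))
    # Every site belongs to its own request set.
    self_in = all(i in sets[i] for i in sets)
    # Every request set has exactly K members.
    equal_size = all(len(r) == k for r in sets.values())
    # Every site appears in exactly K request sets.
    equal_load = all(sum(s in r for r in sets.values()) == k for s in sets)
    return (pairwise and self_in and equal_size and equal_load
            and n == k * (k - 1) + 1)


valid = check_request_sets(request_sets)
```

With these sets each site contacts only 3 of the 7 sites, yet any two requesters always share an arbitrating site.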

Maekawa’s Algorithm ... Requesting CS
Requesting CS: Si sends REQUEST(i) to all sites in Ri. Sj sends a REPLY to Si if Sj has not sent a REPLY message to any site since it received the last RELEASE message. Otherwise, it queues Si’s request.
Executing CS: Si enters the CS after getting a REPLY from all sites in Ri.
Releasing CS: Si sends RELEASE(i) to all sites in Ri. Any Sj, after receiving the RELEASE message, sends a REPLY message to the next request in its queue. If the queue is empty, Sj updates its status to indicate receipt of the RELEASE.

Maekawa’s Algorithm ... Performance, Deadlocks
Synchronization delay: 2T. Messages: 3 times the square root of N (the square root of N each for REQUEST, REPLY, and RELEASE messages).
Deadlocks: Message deliveries are not ordered. Assume Si, Sj, and Sk concurrently request the CS, with Ri intersection Rj = {Sij}, Rj intersection Rk = {Sjk}, Rk intersection Ri = {Ski}. It is possible that Sij is locked by Si (forcing Sj to wait at Sij), Sjk by Sj (forcing Sk to wait at Sjk), and Ski by Sk (forcing Si to wait at Ski), which deadlocks Si, Sj, and Sk.

Token-based Algorithms
Unique token circulates among the participating sites. A site can enter CS if it has the token. Token-based approaches use sequence numbers instead of time stamps. Request for a token contains a sequence number. Sequence number of sites advance independently. Correctness issue is trivial since only one token is present -> only one site can enter CS. Deadlock and starvation issues to be addressed.

Suzuki-Kasami Algorithm
If a site without the token needs to enter the CS, it broadcasts a REQUEST for the token to all other sites. Token: (a) a queue Q of requesting sites; (b) an array LN[1..N], where LN[j] is the sequence number of the most recent CS execution by site j. The token holder sends the token to the requestor if it is not inside the CS; otherwise, it sends the token after exiting the CS. The token holder can make multiple CS accesses.
Design issues:
Distinguishing outdated REQUEST messages. Format: REQUEST(j, n), meaning site j is making its nth request. Each site Si keeps RNi[1..N], where RNi[j] is the largest sequence number received in a request from site j.
Determining which sites have an outstanding token request. If LN[j] = RNi[j] - 1, then Sj has an outstanding request.

Suzuki-Kasami Algorithm ...
Passing the token: After finishing the CS (assuming Si has the token), Si sets LN[i] := RNi[i]. The token consists of Q and LN, where Q is a queue of requesting sites. For every site j not already in Q, the token holder checks whether RNi[j] = LN[j] + 1; if so, it places j in Q. It then sends the token to the site at the head of Q.
Performance: 0 to N messages per CS invocation. Synchronization delay is 0 (if the token holder repeats the CS) or T.
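The token bookkeeping of the Suzuki-Kasami algorithm can be sketched as below. The class SKSite, the dictionary layout of the token, and the 0-based site numbering are assumptions for illustration; broadcasts are replaced by direct method calls.

```python
from collections import deque


class SKSite:
    """Bookkeeping of one site in the Suzuki-Kasami algorithm."""

    def __init__(self, site_id, n):
        self.id = site_id
        self.RN = [0] * n     # RN[j]: largest request number seen from j
        self.token = None     # {"Q": deque, "LN": list} while held
        self.in_cs = False

    def make_request(self):
        self.RN[self.id] += 1
        return (self.id, self.RN[self.id])        # REQUEST(j, n)

    def on_request(self, j, n):
        """Record REQUEST(j, n); return the token if it is handed to j now."""
        self.RN[j] = max(self.RN[j], n)           # ignore outdated requests
        idle_holder = self.token is not None and not self.in_cs
        if idle_holder and self.RN[j] == self.token["LN"][j] + 1:
            tok, self.token = self.token, None
            return tok
        return None

    def exit_cs(self):
        """Leave the CS; return (next_site, token) or (None, None)."""
        self.in_cs = False
        tok = self.token
        tok["LN"][self.id] = self.RN[self.id]
        for j in range(len(self.RN)):
            outstanding = self.RN[j] == tok["LN"][j] + 1
            if j != self.id and j not in tok["Q"] and outstanding:
                tok["Q"].append(j)
        if tok["Q"]:
            nxt = tok["Q"].popleft()
            self.token = None
            return nxt, tok
        return None, None


# Five sites, 0 to 4. Site 0 holds the token and is in the CS after its
# first request.
sites = [SKSite(i, 5) for i in range(5)]
sites[0].token = {"Q": deque(), "LN": [0] * 5}
j, n = sites[0].make_request()
for s in sites[1:]:
    s.on_request(j, n)
sites[0].in_cs = True

# Sites 1 and 2 broadcast requests while site 0 is still in the CS.
for r in (1, 2):
    j, n = sites[r].make_request()
    for s in sites:
        if s.id != r:
            s.on_request(j, n)

nxt, tok = sites[0].exit_cs()   # site 0 exits and passes the token on
```

After the exit, the token goes to site 1 with site 2 still queued, and the token array records that site 0 has completed its first request.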

Example
[Figure sequence: five sites, 0 to 4. Each site keeps a req array; the token carries the last array and the queue Q.]
Initial state: req=[1,0,0,0,0] at every site; last=[0,0,0,0,0].
1 & 2 send requests: req=[1,1,1,0,0] at every site; last=[0,0,0,0,0].
0 prepares to exit CS: req=[1,1,1,0,0] at every site; last=[1,0,0,0,0], Q=(1,2).
0 passes token to 1: req=[1,1,1,0,0] at every site; last=[1,0,0,0,0].
0 and 3 send requests: req=[2,1,1,1,0] at every site; last=[1,0,0,0,0].
1 sends token to 2: req=[2,1,1,1,0] at every site; last=[1,1,0,0,0], Q=(0,3).

Raymond’s Algorithm
Sites are arranged in a logical directed tree. Root: token holder. Edges: directed towards the root. Every site has a variable holder that points to an immediate neighbor on the directed path towards the root. (The root’s holder points to itself.)
Requesting CS:
If Si does not hold the token and requests the CS, it sends a REQUEST upwards provided its request_q is empty. It then adds its request to request_q. A non-empty request_q implies that a REQUEST message for the top entry is sent (if not done before).
A site on the path to the root that receives a REQUEST propagates it up if its request_q is empty, and adds the request to request_q.
The root, on receiving a REQUEST, sends the token to the site that forwarded the message and sets holder to that forwarding site.
Any Si receiving the token deletes the top entry from request_q, sends the token to that site, and sets holder to point to it. If request_q is still non-empty, it sends a REQUEST message to the holder site.

Raymond’s Algorithm …
Executing CS: Si enters the CS when it gets the token and its own entry is at the top of request_q. It deletes the top of request_q and enters the CS.
Releasing CS: If request_q is non-empty, delete the top entry from request_q, send the token to that site, and set holder to that site. If request_q is still non-empty, send a REQUEST message to the holder site.
Performance: Average messages: O(log N), as the average distance between 2 nodes in the tree is O(log N). Synchronization delay: (T log N) / 2, as the average distance between 2 sites that successively execute the CS is (log N) / 2.
Greedy approach: An intermediate site getting the token may enter the CS instead of forwarding it down. This affects fairness and may cause starvation.

Raymond’s Algorithm: Example
[Figure: tree of sites S1 to S7. Step 1: S1 is the token holder and a token request is issued. Step 2: the token is passed along the tree. Step 3: a different site is the token holder.]

Example-
[Figure sequence: sites 1, 4, and 7 want to enter their CS; 2 sends the token to 6; 6 forwards the token to 1. The per-site request queue contents are not recoverable from the extracted text.]

Singhal’s Heuristic Algorithm
Instead of broadcasting, each site maintains information on other sites and uses it to guess which sites are likely to have the token. Data structures: Si maintains SVi[1..N] and SNi[1..N] for storing information on other sites: state and highest sequence number. The token contains 2 arrays: TSV[1..N] and TSN[1..N].
States of a site: R: requesting CS. E: executing CS. H: holding token, idle. N: none of the above.
Initialization: SVi[j] := N, for j = N .. i; SVi[j] := R, for j = i - 1 .. 1; SNi[j] := 0, j = 1..N. S1 (site 1) is in state H. Token: TSV[j] := N and TSN[j] := 0, j = 1 .. N.

Singhal’s Heuristic Algorithm …
Requesting CS: If Si has no token and requests the CS: SVi[i] := R; SNi[i] := SNi[i] + 1. Send REQUEST(i, sn) to all sites Sj for which SVi[j] = R (sn: sequence number, the updated value of SNi[i]).
Receiving REQUEST(i, sn) at Sj: if sn <= SNj[i], ignore it as outdated. Otherwise, update SNj[i] := sn and act on the state of Sj:
SVj[j] = N -> SVj[i] := R.
SVj[j] = R -> if SVj[i] != R, set it to R and send REQUEST(j, SNj[j]) to Si; else do nothing.
SVj[j] = E -> SVj[i] := R.
SVj[j] = H -> SVj[i] := R, TSV[i] := R, TSN[i] := sn, SVj[j] := N. Send the token to Si.
Executing CS: after getting the token, set SVi[i] := E.

Singhal’s Heuristic Algorithm …
Releasing CS: SVi[i] := N, TSV[i] := N. Then, for every other site Sj: if (SNi[j] > TSN[j]), then {TSV[j] := SVi[j]; TSN[j] := SNi[j]} else {SVi[j] := TSV[j]; SNi[j] := TSN[j]}. If SVi[j] = N for all j, then set SVi[i] := H. Else send the token to a site Sj for which SVi[j] = R.
The fairness of the algorithm depends on the choice of Sj, since no queue is maintained in the token. Arbitration rules are used to ensure fairness.
Performance: Low to moderate loads: average of N/2 messages. High loads: N messages (all sites request the CS). Synchronization delay: T.

Singhal: Example
Assume there are 3 sites in the system. Initially:
Site 1: SV1[1] = H, SV1[2] = N, SV1[3] = N. SN1[1], SN1[2], SN1[3] are 0.
Site 2: SV2[1] = R, SV2[2] = N, SV2[3] = N. SNs are 0.
Site 3: SV3[1] = R, SV3[2] = R, SV3[3] = N. SNs are 0.
Token: TSVs are N. TSNs are 0.
Assume site 2 is requesting the token.
S2 sets SV2[2] = R, SN2[2] = 1.
S2 sends REQUEST(2,1) to S1 (since only S1 is set to R in SV2).
S1 receives the REQUEST and accepts it, since SN1[2] is smaller than the message sequence number. Since SV1[1] is H: SV1[2] = R, TSV[2] = R, TSN[2] = 1, SV1[1] = N. S1 sends the token to S2.
S2 receives the token. SV2[2] = E. After exiting the CS, SV2[2] = TSV[2] = N. S2 updates SN, SV, TSN, TSV. Since nobody is requesting, SV2[2] = H.
Assume S3 makes a REQUEST now. It will be sent to both S1 and S2. Only S2 responds, since only SV2[2] is H (SV1[1] is N now).

Comparison (ll: low load, hl: high load)
Non-token        Resp. time (ll)  Sync. delay  Messages (ll)  Messages (hl)
Lamport          2T+E             T            3(N-1)         3(N-1)
Ricart-Agrawala  2T+E             T            2(N-1)         2(N-1)
Maekawa          2T+E             2T           3*sqrt(N)      5*sqrt(N)
Token            Resp. time (ll)  Sync. delay  Messages (ll)  Messages (hl)
Suzuki-Kasami    2T+E             T            N              N
Singhal          2T+E             T            N/2            N
Raymond          T(log N)+E       T log(N)/2   log(N)         4

Deadlock
A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. Put differently, a deadlock occurs when a process enters a waiting state because a resource it requested is being held by another waiting process, which in turn is waiting for another resource.

Cont.. If a process is unable to change its state indefinitely because the resources requested by it are being used by another waiting process, then the system is said to be in a deadlock. Deadlock is a common problem in multiprocessing systems, parallel computing and distributed systems.


Example- Suppose a computer has three CD drives and three processes. Each of the three processes holds one of the drives. If each process now requests another drive, the three processes will be in a deadlock. Each process will be waiting for the "CD drive released" event, which can be only caused by one of the other waiting processes. Thus, it results in a circular chain.

Necessary conditions
A deadlock situation can arise only if all of the following conditions hold simultaneously in a system: mutual exclusion; hold and wait (resource holding); no preemption; circular wait.

DISTRIBUTED DEADLOCK DETECTION

System Model
The system has only reusable resources. Processes are allowed only exclusive access to resources. There is only one copy of each resource. A process can be in the running state or the blocked state.

Deadlocks in Distributed Systems
Deadlocks in distributed systems are similar to deadlocks in single-processor systems, but they are harder to avoid, prevent, or even detect. They are hard to cure when tracked down because all relevant information is scattered over many machines. Tulika Ringan (AL_IT)

Types of Deadlocks
Deadlocks are sometimes classified into the following types:
Communication deadlocks: competing for buffers for send/receive.
Resource deadlocks: exclusive access to I/O devices, files, locks, and other resources.
We treat everything as a resource, so we only have resource deadlocks.

Strategies to Handle Deadlock
Four best-known strategies to handle deadlocks: Ignoring the problem (the ostrich algorithm). Detection (let deadlocks occur, detect them, and try to recover). Prevention (statically make deadlocks structurally impossible). Avoidance (avoid deadlocks by allocating resources carefully).

Prevention: too expensive in time and network traffic in a distributed system.
Avoidance: determining safe and unsafe states would require a huge number of messages in a DS.
Detection: may be practical, and is the primary chapter focus.
Resolution: more complex than in non-distributed systems.

DS Deadlock Detection
The bipartite graph strategy is modified: use a Wait-For Graph (WFG or TWF). All nodes are processes (threads). Resource allocation is done by a process (thread) sending a request message to another process (thread) which manages the resource (client-server communication model, RPC paradigm). A system is deadlocked if and only if there is a directed cycle (or knot) in the global WFG.
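Detecting a directed cycle in a WFG is a depth-first search. The sketch below assumes the global WFG is available as a dictionary that maps each process to the processes it waits for; the function name find_cycle and the sample graphs are illustrative.

```python
def find_cycle(wfg):
    """Return a list of nodes forming a directed cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the stack, finished
    color = {node: WHITE for node in wfg}
    stack = []

    def dfs(u):
        color[u] = GREY
        stack.append(u)
        for v in wfg.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY:             # back edge: v is on the current path
                return stack[stack.index(v):] + [v]
            if state == WHITE:
                found = dfs(v)
                if found:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for node in list(wfg):
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


deadlocked = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}
healthy = {"P1": ["P2"], "P2": ["P3"], "P3": []}
cycle = find_cycle(deadlocked)
no_cycle = find_cycle(healthy)
```

In a distributed system no single node holds this whole dictionary, which is why the detection algorithms that follow assemble or traverse the WFG with messages.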

DS Deadlock Detection, Cycle vs. Knot
The AND model of requests requires all resources currently being requested to be granted to un-block a computation. A cycle is sufficient to declare a deadlock with this model.
The OR model of requests allows a computation making multiple different resource requests to un-block as soon as any are granted. A cycle is a necessary condition. A knot is a sufficient condition.
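The difference between a cycle and a knot can be made concrete with a reachability check. The sketch assumes the usual definition that a node is in a knot when every node reachable from it can reach it back; the function names and sample graphs are illustrative.

```python
def reachable(wfg, start):
    """Return the set of nodes reachable from start via one or more edges."""
    seen, todo = set(), list(wfg.get(start, ()))
    while todo:
        u = todo.pop()
        if u not in seen:
            seen.add(u)
            todo.extend(wfg.get(u, ()))
    return seen


def in_cycle(wfg, v):
    """v lies on a directed cycle if it can reach itself."""
    return v in reachable(wfg, v)


def in_knot(wfg, v):
    """v is in a knot if every node reachable from v can reach v back."""
    r = reachable(wfg, v)
    return bool(r) and all(v in reachable(wfg, u) for u in r)


# C waits for A or D; D is active, so in the OR model C may be unblocked.
with_exit = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
# No edge leaves the cycle, so nothing can unblock A, B, or C.
closed = {"A": ["B"], "B": ["C"], "C": ["A"]}

exit_result = (in_cycle(with_exit, "A"), in_knot(with_exit, "A"))
closed_result = (in_cycle(closed, "A"), in_knot(closed, "A"))
```

The first graph has a cycle but no knot, so it is a deadlock only under the AND model; the second graph has both, so it is a deadlock under either model.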

[Figure: WFG over processes P1 to P10 at sites S1, S2, S3. Deadlock in the AND model; there is a cycle but no knot. No deadlock in the OR model.]

[Figure: WFG over processes P1 to P10 at sites S1, S2, S3. Deadlock in both the AND model and the OR model; there are cycles and a knot.]

DS Detection Requirements
Progress: no undetected deadlocks. All deadlocks are found, and they are found in finite time.
Safety: no false deadlock detection. Phantom (false) deadlocks are caused by network latencies, and they are the principal problem in building correct DS deadlock detection algorithms.

Resolution
Resolution means breaking existing wait-for dependencies in the system, by rolling back one or more deadlocked processes and assigning their resources to blocked processes in the deadlock. When a wait-for dependency is broken, the corresponding information should be cleaned up immediately; otherwise stale information can lead to detection of phantom deadlocks.

Control Framework
Approaches to DS deadlock detection fall in three domains:
Centralized control: one node is responsible for building and analyzing a real WFG for cycles.
Distributed control: each node participates equally in detecting deadlocks; the WFG is abstracted.
Hierarchical control: nodes are organized in a tree which tends to look like a business organizational chart.

Total Centralized Control
Simple conceptually: each node reports to the master detection node; the master detection node builds and analyzes the WFG; the master detection node manages resolution when a deadlock is detected.
Some serious problems: single point of failure; network congestion issues; false deadlock detection.

Total Centralized Control (cont)
The Ho-Ramamoorthy Algorithms
Two phase (can be for AND or OR model): Each site has a status table of locked and waited resources. The control site periodically asks for this table from each node. The control site searches for cycles and, if one is found, requests the table again from each node. Only the information common to both reports is analyzed for confirmation of a cycle.

Total Centralized Control (cont)
The Ho-Ramamoorthy Algorithms (cont)
One phase (can be for AND or OR model): Each site keeps 2 tables: process status and resource status. The control site periodically asks for these tables (both together in a single message) from each node. The control site then builds and analyzes the WFG, looking for cycles and resolving them when found.

Distributed Control
Each node has the same responsibility for, and will expend the same amount of effort in, detecting deadlock. The WFG becomes an abstraction, with any single node knowing just some small part of it. Generally, detection is launched from a site when some thread at that site has been waiting for a “long” time on a resource request message.

Distributed Control Models
Four common models are used in building distributed deadlock control algorithms:
Path-pushing: path info is sent from the waiting node to the blocking node.
Edge-chasing: probe messages are sent along graph edges.
Diffusion computation: echo messages are sent along graph edges.
Global state detection: sweep-out, sweep-in WFG construction and reduction.

Path-pushing
Obermarck’s algorithm for path propagation (an AND model):
It is based on a database model using transaction processing.
Sites which detect a cycle in their partial WFG views convey the paths discovered to members of the (totally ordered) transaction.
The highest priority transaction detects the deadlock, e.g. “Ex => T1 => T2 => Ex”.
The algorithm can detect phantoms due to its asynchronous snapshot method.

Edge Chasing Algorithms
Chandy-Misra-Haas Algorithm (an AND model): A probe message M(i, j, k) is initiated on behalf of Pi and sent by Pj to Pk. Probe messages work their way through the WFG, and if one returns to its initiator, a deadlock is detected. Make sure you can follow the example in Figure 7.1 of the book.
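Edge chasing can be sketched by pushing probe triples (i, j, k) through a wait-for dictionary. This is a single-process simplification under stated assumptions: every edge is treated as if it crossed sites, the function name cmh_detect and the data layout are hypothetical, and a queue stands in for the network.

```python
from collections import deque


def cmh_detect(wait_for, initiator):
    """Return True if a probe launched for `initiator` returns to it."""
    # Processes that have already forwarded a probe for this initiator.
    dependent = set()
    probes = deque((initiator, initiator, k)
                   for k in wait_for.get(initiator, ()))
    while probes:
        i, j, k = probes.popleft()        # probe (i, j, k) arrives at Pk
        if k == i:
            return True                   # probe came back: deadlock
        if k in dependent or not wait_for.get(k):
            continue                      # duplicate, or Pk is not blocked
        dependent.add(k)
        probes.extend((i, k, m) for m in wait_for[k])
    return False


ring = {1: [2], 2: [3], 3: [1]}           # P1 -> P2 -> P3 -> P1
chain = {1: [2], 2: [3], 3: []}           # P3 is active
ring_deadlocked = cmh_detect(ring, 1)
chain_deadlocked = cmh_detect(chain, 1)
```

Each blocked process forwards at most one probe per initiator, which keeps the number of messages bounded by the number of WFG edges.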

Chandy-Misra-Haas Algorithm
[Figure: WFG over processes P1 to P10 at sites S1, S2, S3. P1 launches the detection; the probes shown are Probe (1, 3, 4), Probe (1, 6, 8), Probe (1, 7, 10), and Probe (1, 9, 1).]

Edge Chasing Algorithms (cont)
Mitchell-Merritt Algorithm (an AND model): Propagates messages in the reverse direction. Uses public-private labeling of messages. Messages may replace their labels at each site. When a message arrives at a site with a matching public label, a deadlock is detected (by only the process with the largest public label in the cycle), which normally does resolution by self-destruct.

Mitchell-Merritt Algorithm
1. P6 initially asks P8 for its Public label and changes its own from 2 to 3.
2. P3 asks P4 and changes its Public label from 1 to 3.
3. P9 asks P1, finds its own Public label 3, and thus detects the deadlock P1=>P2=>P3=>P4=>P5=>P6=>P8=>P9=>P1.
[Figure: WFG over processes P1 to P10 at sites S1, S2, S3, annotated with labels: Public 1 => 3, Private 1; Public 2 => 3, Private 2; Public 3, Private 3.]

Diffusion Computation
Deadlock detection computations are diffused through the WFG of the system. Queries are sent from a computation (process or thread) on a node and diffused across the edges of the WFG. When a query reaches an active (non-blocked) computation the query is discarded, but when a query reaches a blocked computation the query is echoed back to the originator when (and if) all outstanding queries of the blocked computation are returned to it. If all queries sent are echoed back to an initiator, there is deadlock.

Diffusion Computation of Chandy et al (an OR model)
A waiting computation on node x periodically sends queries to all computations it is waiting for (the dependent set), marked with the originator ID and target ID. Each of these computations in turn will query its dependent set members (only if it is blocked itself), marking each query with the originator ID, its own ID, and a new target ID it is waiting on. A computation cannot echo a reply to its requestor until it has received replies from its entire dependent set, at which time it sends a reply marked with the originator ID, its own ID, and the most distant dependent ID. When (and if) the original requestor receives echo replies from all members of its dependent set, it can declare a deadlock when an echo reply’s originator ID and most distant ID are its own.

Diffusion Computation of Chandy et al
Queries:
P1 => P2: message at P2 from P1 (P1, P1, P2)
P2 => P3: message at P3 from P2 (P1, P2, P3)
P3 => P4: message at P4 from P3 (P1, P3, P4)
P4 => P ETC.
P5 => P6
P5 => P7
P6 => P8
P7 => P10
P8 => P9 (P1, P8, P9), now reply (P1, P9, P1)
P10 => P9 (P1, P10, P9), now reply (P1, P9, P1)
Replies:
P8 <= P9 reply (P1, P9, P8)
P10 <= P9 reply (P1, P9, P10)
P6 <= P8 reply (P1, P8, P6)
P7 <= P10 reply (P1, P10, P7)
P5 <= P ETC.
P5 <= P7
P4 <= P5
P3 <= P4
P2 <= P3
P1 <= P2 reply (P1, P2, P1): end condition
P5 cannot reply until both P6 and P7 replies arrive. Deadlock condition.

Global State Detection
Based on 2 facts of distributed systems: A consistent snapshot of a distributed system can be obtained without freezing the underlying computation A consistent snapshot may not represent the system state at any moment in time, but if a stable property holds in the system before the snapshot collection is initiated, this property will still hold in the snapshot

Global State Detection (the P-out-of-Q request model)
The Kshemkalyani-Singhal algorithm is demonstrated in the text. An initiator computation snapshots the system by sending FLOOD messages along all its outbound edges in an outward sweep. A computation receiving a FLOOD message either returns an ECHO message (if it has no dependencies itself) or propagates the FLOOD message to its dependencies. An ECHO message is analogous to dropping a request edge in a resource allocation graph (RAG). As ECHOs arrive in response to FLOODs, the region of the WFG the initiator is involved with becomes reduced. If a dependency does not return an ECHO by termination, such a node represents part (or all) of a deadlock with the initiator. Termination is achieved by summing weighted ECHO and SHORT messages (returning initial FLOOD weights).

Hierarchical Deadlock Detection
These algorithms represent a middle ground between fully centralized and fully distributed Sets of nodes are required to report periodically to a control site node (as with centralized algorithms) but control sites are organized in a tree The master control site forms the root of the tree, with leaf nodes having no control responsibility, and interior nodes serving as controllers for their branches

[Figure: tree of control nodes with the Master Control Node at the root and Level 1, Level 2, and Level 3 Control Nodes below it.]

The Menasce-Muntz Algorithm
Leaf controllers allocate resources. Branch controllers are responsible for finding deadlock among the resources that their children span in the tree. Network congestion can be managed. Node failure is less critical than in fully centralized detection. Detection can be done in several ways: continuous allocation reporting or periodic allocation reporting.

Hierarchical Deadlock Detection (cont’d)
The Ho-Ramamoorthy Algorithm Uses only 2 levels Master control node Cluster control nodes Cluster control nodes are responsible for detecting deadlock among their members and reporting dependencies outside their cluster to the Master control node (they use the one phase version of the Ho-Ramamoorthy algorithm discussed earlier for centralized detection) The Master control node is responsible for detecting intercluster deadlocks Node assignment to clusters is dynamic

Agreement Protocols

When distributed systems engage in cooperative efforts like enforcing distributed mutual exclusion algorithms, processor failure can become a critical factor. Processors may fail in various ways, and their failure modes and communication interfaces are central to the ability of healthy processors to detect and respond to such failures.

The System Model The are n processors in the system and at most m of them can be faulty The processors can directly communicate with others processors via messages (fully connected system) A receiver computation always knows the identity of a sending computation The communication system is pipelined and reliable

Faulty Processors May fail in various ways
Drop out of sight completely Start sending spurious messages Start to lie in its messages (behave maliciously) Send only occasional messages (fail to reply when expected to) May believe themselves to be healthy Are not known to be faulty initially by non-faulty processors

Communication Requirements
Synchronous model communication is assumed in this section: Healthy processors receive, process and reply to messages in a lockstep manner The receive, process, reply sequence is called a round In the synchronous-communication model, processes know what messages they expect to receive in a round The synchronous model is critical to agreement protocols, and the agreement problem is not solvable in an asynchronous system

Processor Failures
Crash fault: abrupt halt, never resumes operation.
Omission fault: processor “omits” to send required messages to some other processors.
Malicious fault: processor behaves randomly and arbitrarily. Known as Byzantine faults.

Authenticated vs. Non-Authenticated Messages
Authenticated messages (also called signed messages) assure the receiver of correct identification of the sender assure the receiver that the message content was not modified in transit Non-authenticated messages (also called oral messages) are subject to intermediate manipulation may lie about their origin

Authenticated vs. Non-Authenticated Messages (cont’d)
To be generally useful, agreement protocols must be able to handle non-authenticated messages. The classification of agreement problems includes: the Byzantine agreement problem, the consensus problem, and the interactive consistency problem.

Agreement Problems
Problem                  Who initiates value  Final agreement
Byzantine agreement      One processor        Single value
Consensus                All processors       Single value
Interactive consistency  All processors       A vector of values

Agreement Problems (cont’d)
Byzantine Agreement One processor broadcasts a value to all other processors All non-faulty processors agree on this value, faulty processors may agree on any (or no) value Consensus Each processor broadcasts a value to all other processors All non-faulty processors agree on one common value from among those sent out. Faulty processors may agree on any (or no) value

Interactive Consistency
Each processor broadcasts a value to all other processors. All non-faulty processors agree on the same vector of values such that vi is the initial broadcast value of non-faulty processor i. Faulty processors may agree on any (or no) value.

Agreement Problems (cont’d)
The Byzantine Agreement problem is a primitive to the other 2 problems The focus here is thus the Byzantine Agreement problem Lamport showed the first solutions to the problem An initial broadcast of a value to all processors A following set of messages exchanged among all (healthy) processors within a set of message rounds

The Byzantine Agreement problem
The upper bound on the number of faulty processors: it is impossible to reach a consensus (in a fully connected network) if the number of faulty processors m exceeds (n - 1) / 3 (from Pease et al). Lamport et al were the first to provide a protocol to reach Byzantine agreement, which requires m + 1 rounds of message exchanges. Fischer et al showed that m + 1 rounds is the lower bound to reach agreement in a fully connected network where only processors are faulty. Thus, in a three-processor system with one faulty processor, agreement cannot be reached.

Lamport - Shostak - Pease Algorithm
The Oral Message (OM(m)) algorithm with m > 0 (some faulty processor(s)) solves the Byzantine agreement problem for 3m + 1 or more processors with at most m faulty processors. The initiator sends its value to the n - 1 other processors to start the algorithm. Each of them begins OM(m - 1) activity, sending messages to the n - 2 remaining processors. Each of these messages causes OM(m - 2) activity, etc., until OM(0) is reached, when the algorithm stops. When the algorithm stops, each processor has input from all others and chooses the majority value as its value.
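The recursion of OM(m) can be sketched as a direct simulation. The names om, send, and majority are hypothetical, as is the lying rule for faulty processors (a faulty sender tells each receiver the parity of the receiver’s id) and the default value 0 on a tie.

```python
from collections import Counter


def majority(values, default=0):
    """Most common value; fall back to `default` on a tie."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]


def send(sender, receiver, value, faulty):
    """Assumed fault model: a faulty sender sends receiver % 2 instead."""
    return receiver % 2 if sender in faulty else value


def om(m, commander, lieutenants, value, faulty):
    """Return {lieutenant: value it settles on} for OM(m)."""
    received = {p: send(commander, p, value, faulty) for p in lieutenants}
    if m == 0:
        return received
    # Each lieutenant j relays what it received, acting as the commander
    # of OM(m - 1) towards the other lieutenants.
    relayed = {
        j: om(m - 1, j, [p for p in lieutenants if p != j],
              received[j], faulty)
        for j in lieutenants
    }
    return {
        i: majority([received[i]] +
                    [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# n = 4, m = 1: processor 0 is the initiator with value 1.
faulty_lieutenant = om(1, 0, [1, 2, 3], 1, faulty={3})
faulty_initiator = om(1, 0, [1, 2, 3], 1, faulty={0})
```

With a faulty lieutenant the healthy processors settle on the initiator’s value; with a faulty initiator they still settle on one common value, which is all the Byzantine conditions require in that case.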

Lamport - Shostak - Pease Algorithm (cont’d)
The algorithm has O(n^m) message complexity and uses m + 1 rounds of message exchange, where n ≥ 3m + 1. See the examples in the book: with 4 nodes, m can only be 1, so the OM(1) and OM(0) rounds must be exchanged. The algorithm meets the Byzantine conditions: a single value is agreed upon by the healthy processors, and that value is the initiator's value if the initiator is non-faulty.
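The growth in messages follows from the recursion: the initiator sends n - 1 messages, each of which triggers n - 2 more, and so on for m + 1 levels. A short sketch that adds up these products (the function name is illustrative):

```python
def om_message_count(n: int, m: int) -> int:
    """Total messages sent by OM(m) among n processors:
    (n-1) + (n-1)(n-2) + ... + (n-1)(n-2)...(n-m-1)."""
    total = 0
    term = 1
    for k in range(m + 1):
        term *= n - 1 - k   # messages added at recursion level k
        total += term
    return total
```

For the four-node example, OM(1) costs 3 + 3·2 = 9 messages.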

Dolev et al Algorithm
Since the message complexity of the Oral Message algorithm is exponential in m, polynomial solutions were sought. Dolev et al. found an algorithm with polynomial message complexity that requires 2m + 3 rounds to reach agreement. The algorithm is a trade-off between message complexity and time delay (rounds); see the description of the algorithm on page 87.

Additional Considerations to Dolev
Consider the case where n > 3m + 1: more messages are sent than needed. A subset of 3m + 1 processors (called active processors) can be selected, and message exchange can be largely confined to these processors. Using Dolev's algorithm this way, all active and passive processors reach Byzantine agreement in 2m + 3 rounds of these limited messages.

Applications Atomic Commit in Distributed Database system
In a distributed database, each site executes its part of a transaction independently and decides individually whether to commit or abort. Once it has decided, each site sends its decision to all the others, and the final decision is taken according to the common agreement. In this way atomic commit follows the Byzantine agreement approach to the problem.

Atomic Commit Protocol
Two-phase commit protocol: the most commonly used atomic commit protocol. It is implemented as an exchange of messages between the coordinator and the cohorts. It guarantees global atomicity of the transaction even if failures occur while the protocol is executing.
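The message exchange can be reduced to its decision rule. The sketch below is a simplification under assumptions: votes are given as a dictionary, a missing reply (`None`) is treated as a timeout that forces an abort, and logging and recovery, which a real protocol needs, are left out. The function name is hypothetical.

```python
def two_phase_commit(cohort_votes):
    """cohort_votes: {cohort: True (commit) | False (abort) | None (no reply)}.
    Returns the decision each cohort receives in phase 2."""
    # Phase 1: the coordinator collects one vote per cohort.
    unanimous = bool(cohort_votes) and all(
        vote is True for vote in cohort_votes.values()
    )
    decision = "COMMIT" if unanimous else "ABORT"
    # Phase 2: the same decision is sent to every cohort.
    return {cohort: decision for cohort in cohort_votes}
```

A single abort vote or missing reply is enough to abort the transaction at every site, which is what keeps the outcome atomic.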

DEADLOCK EXTRA NOTES

DEADLOCK DETECTION ALGORITHMS IN DISTRIBUTED SYSTEMS
Advanced Operating System

Overview
Deadlocks: An Introduction. Deadlocks in Distributed Systems. Deadlock Handling Techniques. Algorithms for Deadlock Detection in Distributed Systems. Summary.
I start with a brief introduction to deadlocks: what they are and what causes them. I then discuss the types of deadlock that occur in distributed systems. Under Deadlock Handling Techniques I cover the ways in which a deadlock can be prevented or worked around. Finally, the presentation covers the various deadlock detection algorithms for distributed systems.

Deadlocks – An Introduction
What are deadlocks? A deadlock is a situation in which a blocked process can never proceed unless there is some outside intervention. For example, resource R1 is requested by process P1 but is held by process P2. I keep these introductory slides brief, as the concepts are familiar to most of you. Deadlocks arise among processes that hold some resources while waiting for further resources that are held by other blocked processes. A simple real-world example is two trains halted next to each other, neither moving until the other one moves. The allocation of resources to running processes is depicted by a Resource Allocation Graph (RAG), which represents the current state of the system.

Illustrating A Deadlock
Wait-For Graph (WFG). Nodes: processes in the system. Directed edges: the wait-for (blocking) relation.
[Figure: Process 1, Process 2, Resource 1 and Resource 2, linked by "waits for" and "held by" edges.]
A Wait-For Graph is used to represent a deadlock situation. Its nodes represent the processes of the system, and a directed edge from one process to another means that the first is waiting for the second. The WFG is derived from the Resource Allocation Graph and is used specifically to identify deadlocks: a cycle in the WFG means that a deadlock exists in the system. Starvation is a different situation, in which a process waiting for a resource is indefinitely unable to obtain it, so that its execution is halted permanently.
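Checking a WFG for a cycle is a standard depth-first search. The sketch below assumes the graph is given as a dictionary from each process name to the processes it waits for; the representation and the function name are illustrative choices, not part of the slides.

```python
def find_cycle(wfg):
    """wfg: {process: iterable of processes it waits for}.
    Returns a list of processes forming a cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    nodes = set(wfg) | {q for qs in wfg.values() for q in qs}
    colour = {p: WHITE for p in nodes}
    path = []

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in wfg.get(p, ()):
            if colour[q] == GREY:          # back edge: q is on the current path
                return path[path.index(q):]
            if colour[q] == WHITE:
                found = visit(q)
                if found:
                    return found
        colour[p] = BLACK
        path.pop()
        return None

    for p in sorted(nodes):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

Calling `find_cycle({"P1": ["P2"], "P2": ["P1"]})` reports the two-process cycle, while a simple chain of waits yields `None`.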

Causes Of Deadlocks
A deadlock can occur only when four necessary conditions hold at the same time.
Mutual Exclusion: the resources being held must be in non-shareable mode. At any given time only one process has access to the resource, and no other process can use it until the holder releases it.
Hold and Wait: a process is holding one resource and is waiting for another that is currently held by another process.
No Preemption: a resource cannot be taken away from the process holding it, even if another process is requesting it.
Circular Wait: the processes waiting for resources form a cycle, i.e. Process 1 waits for a resource held by Process 2, Process 2 waits for a resource held by Process 3, and so on, until Process N waits for a resource held by Process 1.

Deadlocks in Distributed Systems
Two situations cause deadlocks in distributed systems.
Resource Deadlock: the most common kind. It occurs when requested resources are not available.
Communication Deadlock: a process waits for certain messages before it can proceed.

Handling Deadlocks Deadlock Avoidance
Only fulfill those resource requests that cannot cause a deadlock in the future. Deadlock avoidance is possible when the system has prior knowledge of all the resource requirements of the running processes. When a resource is requested, the system simulates the allocation, determines whether the resulting state is safe, and grants the request only if it is.
Drawbacks: Inefficient. Requires prior resource requirement information for all processes. Scales at a high cost.
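One common way to decide whether a state is safe is the safety check from the Banker's algorithm, which the slides do not name; it is used here only as an illustration. The sketch assumes resources are described by per-process `allocation` and remaining `need` vectors plus an `available` vector, all hypothetical names.

```python
def is_safe(available, allocation, need):
    """True if every process can finish in some order.
    available: free units per resource type
    allocation[i]: units currently held by process i
    need[i]: units process i may still request"""
    work = list(available)
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, (held, wanted) in enumerate(zip(allocation, need)):
            if not finished[i] and all(w <= free for w, free in zip(wanted, work)):
                # Process i can run to completion and return what it holds.
                work = [free + h for free, h in zip(work, held)]
                finished[i] = True
                progressed = True
    return all(finished)
```

A request would be granted only if the state that results from the simulated allocation still passes this check.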

Handling Deadlocks Deadlock Prevention
The first handling approach is deadlock prevention. Processes are prioritized and resources are assigned accordingly. All required resources are provided from the start. Rules are fixed in advance, for example: process P1 cannot request resource R1 unless it releases resource R2.
Drawbacks: The approach is inefficient, does not scale, and reduces the concurrency of the system. Future resource requirements are unpredictable. Starvation is possible.

Handling Deadlocks Deadlock Detection
Besides the two previous methods there is deadlock detection. Resources are allocated with an optimistic outlook. The process state is examined periodically to detect deadlocks, and any deadlock found is then broken. Resolution: roll back one or more of the processes involved in the deadlock to break the dependency.

Deadlock Detection CONTROL ORGANIZATIONS
Deadlock detection in distributed systems comes in three control organizations.
Centralized Deadlock Detection: one control node (the coordinator) maintains the global WFG and searches it for cycles.
Distributed Deadlock Detection: every node is equally responsible for maintaining the global WFG and detecting deadlocks.
Hierarchical Deadlock Detection: the nodes are organized in a tree, and each site detects deadlocks involving only its descendants.

Deadlock Detection Algorithms
The following algorithms fall under these control organizations.
Centralized Deadlock Detection: Ho-Ramamoorthy's one-phase and two-phase algorithms.
Distributed Deadlock Detection: Obermarck's Path-Pushing algorithm and the Chandy-Misra-Haas Edge-Chasing algorithm.
Hierarchical Deadlock Detection: the Menasce-Muntz algorithm and Ho-Ramamoorthy's algorithm.

Centralized Deadlock Detection
Ho-Ramamoorthy's 1-Phase Algorithm
We start with the centralized deadlock detection algorithms. In this algorithm each site of the network maintains two status tables: a process table and a resource table. One of the sites becomes the central control site, which periodically asks the other sites for their status tables. Contd…

Ho-Ramamoorthy's 1-Phase Algorithm Contd…
After collecting the status tables, the control site builds the WFG for the system. It then analyzes the WFG and resolves any cycles it finds.
Shortcomings: The algorithm may report phantom deadlocks. A phantom deadlock is a false detection: a deadlock that does not really exist but is reported anyway. The algorithm also incurs high storage and communication costs.

Phantom Deadlocks
[Figure: System A and System B, with processes P0, P1, P2 and resources R, S, T.]
This illustrates how phantom deadlocks can occur in distributed systems. The diagram shows two systems, System A and System B. In System A, process P1 holds resource S; it releases S and asks for resource T, which is in System B and is presently held by process P2. P1 therefore sends two messages to the control site: 1. Releasing S. 2. Waiting for T. If message 2 arrives at the control site first, the control site builds a WFG that contains a cycle and concludes that the system is deadlocked. This false deadlock is called a phantom deadlock.
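The effect of message reordering can be reproduced with a toy control site. The sketch below makes one assumption that the slide text does not state: before the release, P2 is waiting for S, which gives the initial wait-for edge P2 → P1. The function names and the message encoding are illustrative.

```python
def has_cycle(edges):
    """edges: set of (waiter, holder) pairs. True if any cycle exists."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(p, p, set()) for p in graph)


def control_site_view(initial_edges, messages):
    """Apply messages in arrival order; return the verdict after each one."""
    edges = set(initial_edges)
    verdicts = []
    for kind, edge in messages:
        if kind == "add":
            edges.add(edge)
        else:                      # "remove"
            edges.discard(edge)
        verdicts.append(has_cycle(edges))
    return verdicts


initial = {("P2", "P1")}                    # assumed: P2 waits for S held by P1
release_s = ("remove", ("P2", "P1"))        # message 1: P1 releases S
wait_for_t = ("add", ("P1", "P2"))          # message 2: P1 waits for T held by P2
```

Delivering `[release_s, wait_for_t]` never shows a cycle, whereas delivering `[wait_for_t, release_s]` shows one after the first message even though no deadlock ever existed.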

Ho-Ramamoorthy's 2-Phase Algorithm
The other centralized algorithm proposed by Ho and Ramamoorthy is the two-phase algorithm. Each site maintains a status table for its processes, recording the resources they have locked and the resources they are waiting for.
Phase 1: the control site periodically asks the other sites for these locked and waited tables and searches them for cycles. Contd…

Ho-Ramamoorthy's 2-Phase Algorithm Contd…
Phase 2: if cycles are found in the phase 1 search, the control site makes a second request for the tables. Only the details common to both sets of tables are analyzed to confirm the cycle.
Shortcomings: This algorithm can still detect phantom deadlocks, so it is not completely accurate.

Distributed Deadlock Detection
Obermarck's Path-Pushing Algorithm
We now look at deadlock detection in the distributed setting, where any site can play the role of the control site. In Obermarck's path-pushing algorithm each site maintains a local WFG. A virtual node 'x' exists at each site and represents external processes.
Detection process: when deadlock detection begins, site Sn collects the WFG information from the other sites and looks for cycles.
Case 1: if site Sn finds a cycle not involving 'x', a deadlock exists, and it lies within site Sn itself.
Case 2: if site Sn finds a cycle involving 'x', a deadlock is possible. Contd…

Obermarck’s Path-Pushing Algorithm
In Case 2, site Sn sends a message containing its detected cycles to the other sites. Every site that receives the message updates its WFG and re-evaluates the graph. Consider a site Sj that receives the message: Sj checks for local cycles. If it finds a cycle not involving 'x' (of Sj), a deadlock exists. If it finds a cycle involving 'x', it forwards the message to other sites. The process continues until a deadlock is found. Detection of phantom deadlocks is a limitation of this technique too, because the different sites take asynchronous snapshots of the WFG status.

Distributed Deadlock Detection
Chandy-Misra-Haas Edge-Chasing algorithm. This algorithm uses a 'probe' message to detect cycles. A blocked process sends a probe to the process holding the resource it needs. The probe contains three fields: the ID of the blocked process that initiated it, the ID of the process sending the message, and the ID of the process to which the message is sent. When a blocked process receives a probe, it forwards it to the processes holding the resources it has requested. If the blocked process receives its own probe, a cycle exists and hence a deadlock exists.
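Probe forwarding can be sketched as a simulation that runs on one machine, with a queue standing in for the network. The data layout (`waits_for` as a dictionary of sets) and the rule that each process forwards a given initiator's probe at most once are simplifying assumptions of this sketch; the function name is hypothetical.

```python
from collections import deque


def cmh_detects_deadlock(initiator, waits_for):
    """waits_for: {process: set of processes holding resources it is blocked on}.
    A process that is absent or maps to an empty set is not blocked."""
    if not waits_for.get(initiator):
        return False                       # only a blocked process starts a probe
    # Each probe is (blocked initiator, sender, receiver).
    probes = deque((initiator, initiator, h) for h in waits_for[initiator])
    forwarded = set()                      # processes that already forwarded it
    while probes:
        blocked, sender, receiver = probes.popleft()
        if receiver == blocked:
            return True                    # the probe came back: a cycle exists
        if receiver in forwarded or not waits_for.get(receiver):
            continue                       # duplicate, or receiver is not blocked
        forwarded.add(receiver)
        for holder in waits_for[receiver]:
            probes.append((blocked, receiver, holder))
    return False
```

With `{"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}` a probe started by P1 returns to P1, so the deadlock is reported; if P3 is not blocked, the probe is dropped there.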

Menasce-Muntz Algorithm
We now move on to the hierarchical deadlock detection algorithms. In the Menasce-Muntz algorithm the sites (controllers) are organized in a tree structure. Leaf controllers manage the local WFGs, while the upper controllers handle deadlock detection. Each parent node maintains a global WFG that is the union of the WFGs of its children, and detects deadlocks among its children. Changes to the WFGs are propagated upwards in the tree.

Ho-Ramamoorthy's Algorithm
In this algorithm the sites are grouped into clusters. Periodically, one site is chosen as the central control site, and it in turn chooses a control site for each of the other clusters. The control site of each cluster collects the status graph for its cluster, and Ho-Ramamoorthy's 1-phase centralized deadlock detection algorithm is applied within the cluster. All control sites then forward their status reports to the central control site, which combines the WFGs and performs the cycle search.

Summary Centralized Deadlock Detection Algorithms
To summarize, here is a brief analysis of the different architecture types of deadlock detection algorithms.
Centralized Deadlock Detection Algorithms: Large communication overhead. The coordinator is a performance bottleneck, since any degradation in its performance slows the whole system. It is also a single point of failure: if the control site fails, the whole system is jeopardized. For these reasons the centralized algorithms are neither very common nor very effective.
Distributed Deadlock Detection Algorithms: They are better than the centralized ones to an extent, as they remove the single point of control. However, they are highly complex to implement, and detection of phantom deadlocks is still possible.
Hierarchical Deadlock Detection Algorithms: The most common model in use today. They are also the most efficient and solve the problems faced by the other approaches.

“Choose the least general technique - which is still general enough to solve the problem”.
Edgar Knapp.
Finally, one should choose the least general technique that is still general enough to solve the problem. The choice of deadlock detection architecture therefore depends on the type of distributed system at hand. THANK YOU
