Practical Session 13 Distributed synchronization

Operating Systems, 162

Motivation
Interest in distributed computation models is growing rapidly (grids, cloud computing, Internet relay, etc.).
A good model of computation is difficult to come up with:
  Concurrent computation
  No global time and no global state
  Hard to capture the effects of possible failures

The model
Each instance is executed on a different processor.
Assume there is no shared memory; communication is handled with messages of the following format: <destination, action; parameters>
Sending is non-blocking and reliable.
A processor waits for events (messages). A timeout mechanism is possible, but we will not discuss it here.

Global states and causality
It is impossible to determine the global state of a distributed system:
  Communication is not instantaneous (delays, lost messages, etc.)
  Processors cannot be synchronized with a timer mechanism (drift, initial synchronization)
  Local interruptions (simultaneous reactions cannot be trusted)
Thus, we must find global system properties that we can depend on: the causal order of events.

Happened before
We would like to define some order over system events: a “happened before” relation (denoted <H):
  If e1 <p e2 then e1 <H e2 (same-processor events)
  If e1 <m e2 then e1 <H e2 (send and receive events)
  If e1 <H e2 and e2 <H e3 then e1 <H e3 (transitivity of <H)
A <p B means that A happened before B on the same processor.
A <m B means that A is the sending of a message and B is the receipt of that message (send before receive).
<H defines a partial order and can be represented as a DAG.
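Since <H is the transitive closure of the <p and <m edges, checking e1 <H e2 amounts to a reachability test in the DAG. The following is a minimal sketch of that idea; the event names and the edge list are hypothetical and serve only as an illustration.

```python
def happened_before(edges, a, b):
    """Return True if b is reachable from a by following <p and <m edges."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical events: a1 <p a2 on one processor, b1 <p b2 on another,
# and a2 <m b2 (a2 sends a message that is received at b2).
edges = {"a1": ["a2"], "a2": ["b2"], "b1": ["b2"]}

a1_before_b2 = happened_before(edges, "a1", "b2")   # holds by transitivity
a1_before_b1 = happened_before(edges, "a1", "b1")   # no path in either
b1_before_a1 = happened_before(edges, "b1", "a1")   # direction: concurrent
```

Events that are unrelated in both directions, such as a1 and b1 here, are exactly what makes the order partial rather than total.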

Global time – Lamport’s timestamps
Defines a global (and total) order on events.
The order is consistent with <H.
Created on the fly.
We assume that each event has a timestamp attached to it.
A processor ID is appended to the timestamp and allows for tie breaking.
Lamport’s algorithm:
  Initially my_TS = 0
  Upon event e:
    if e is the receipt of message m
      my_TS = max(m.TS, my_TS)
    my_TS++
    e.TS = my_TS
    if e is the sending of message m
      m.TS = my_TS
Lamport’s timestamp mechanism attempts to find an ordering of all events that is consistent with <H. It guarantees: if e1 <H e2 then e1.TS < e2.TS.
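The pseudocode above can be sketched in Python as follows. The class and method names are illustrative choices rather than part of the session material; the update rule itself follows the steps listed above.

```python
class LamportClock:
    """Logical clock of one processor (names are illustrative)."""

    def __init__(self, pid):
        self.pid = pid    # processor ID, used only for tie breaking
        self.my_ts = 0    # "Initially my_TS = 0"

    def event(self, received_ts=None):
        """Advance the clock for one event and return the event's timestamp.

        Pass the timestamp carried by the message when the event is a receipt.
        For a send event, the returned value is also the message's timestamp.
        """
        if received_ts is not None:
            self.my_ts = max(received_ts, self.my_ts)
        self.my_ts += 1
        return self.my_ts

    def stamp(self):
        """(timestamp, pid) pair; tuple comparison gives the total order."""
        return (self.my_ts, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
send_ts = p1.event()                      # P1 sends m, so m.TS = send_ts
p2.event()                                # unrelated local event on P2
tie = (p1.stamp(), p2.stamp())            # equal timestamps, different IDs
recv_ts = p2.event(received_ts=send_ts)   # P2 receives m
```

The send happens before the receipt, and the receipt indeed gets the larger timestamp. The two events captured in `tie` carry the same timestamp, so only the appended processor ID orders them.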

Causality violation and vector timestamps
Lamport’s algorithm does not guarantee that if e.TS < e’.TS then e <H e’.
This makes it difficult to detect causality violations.
A causality violation occurs if a message m is sent to a remote processor p before another message m’ is sent, but p receives m’ before m.
Written as: m <c m’ and r(m’) <p r(m)
  m <c m’ means that m was sent before m’
  r(m’) <p r(m) means that m’ was received before m on the same processor (hence <p)
We will use a vector of timestamps to overcome this problem.

Global time – vector timestamps
Vector timestamps are compared entry by entry (M is the number of processors):
  e.VT ≤v e’.VT iff e.VT[i] ≤ e’.VT[i] for all 1 ≤ i ≤ M
  e.VT <v e’.VT iff e.VT ≤v e’.VT and e.VT ≠ e’.VT
They can be used to detect causality violations.
VT algorithm:
  Initially my_VT = [0,…,0]
  Upon event e:
    if e is the receipt of message m
      for i = 1 to M
        my_VT[i] = max(m.VT[i], my_VT[i])
    my_VT[self]++
    e.VT = my_VT
    if e is the sending of message m
      m.VT = my_VT
It guarantees: e1 <H e2 iff e1.VT <v e2.VT
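A minimal Python sketch of the comparison rules and of the VT update follows. Function and class names are illustrative, and processors are indexed from 0 here (the slides index from 1). The sample vectors are taken from the Question 1 solution.

```python
def vt_leq(a, b):
    """a <=v b: every entry of a is at most the matching entry of b."""
    return all(x <= y for x, y in zip(a, b))


def vt_less(a, b):
    """a <v b: a <=v b and the two vectors differ."""
    return vt_leq(a, b) and a != b


def concurrent(a, b):
    """Neither vector is smaller than the other: no causal relation."""
    return a != b and not vt_less(a, b) and not vt_less(b, a)


class VectorClock:
    def __init__(self, self_index, m):
        self.self_index = self_index   # position of this processor (0-based)
        self.my_vt = [0] * m           # "Initially my_VT = [0,...,0]"

    def event(self, received_vt=None):
        """Advance the clock for one event and return the event's VT."""
        if received_vt is not None:
            self.my_vt = [max(x, y) for x, y in zip(received_vt, self.my_vt)]
        self.my_vt[self.self_index] += 1
        return tuple(self.my_vt)


e3, e23 = (3, 1, 0, 0), (4, 1, 0, 4)    # e3 <H e23 in Question 1
e8, e14 = (0, 1, 0, 0), (0, 0, 1, 0)    # first events of P2 and P3

p1, p4 = VectorClock(0, 4), VectorClock(3, 4)
e1 = p1.event()                  # first event on P1
e20 = p4.event(received_vt=e1)   # P4 receives a message carrying e1's VT
```

Unlike Lamport timestamps, the comparison works in both directions: `vt_less(e3, e23)` holds because e3 happened before e23, while e8 and e14 are reported as concurrent.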

Question 1
Consider the following interaction between four processors:
[Figure: time diagram of events e1 to e24 on four processors]

Question 1
1. What is the largest Lamport timestamp value? (Hint: you can answer without calculating all timestamps; look for the longest track.)
2. List the Lamport timestamp of each event.
3. List the vector timestamp of each event.
4. Is there a potential causality violation? What can indicate this violation?

Question 1
The Lamport timestamp mechanism effectively computes the longest chain of events occurring within the system. Thus, the largest timestamp value is the number of vertices on the longest path of events in the underlying DAG. In this case the answer is 12.

Question 1
Lamport timestamp (TS) of each event, one column pair per processor (P1 to P4):
Event  TS    Event  TS    Event  TS    Event  TS
e1     1     e8     1     e14    1     e20    2
e2     2     e9     2     e15    4     e21    3
e3     3     e10    3     e16    7     e22    5
e4     4     e11    4     e17    8     e23    6
e5     6     e12    5     e18    9     e24    7
e6     10    e13    8     e19    12
e7     11

Question 1
Vector timestamp (VT) of each event, one column pair per processor (P1 to P4):
Event  VT           Event  VT           Event  VT           Event  VT
e1     (1,0,0,0)    e8     (0,1,0,0)    e14    (0,0,1,0)    e20    (1,0,0,1)
e2     (2,1,0,0)    e9     (0,2,1,0)    e15    (0,3,2,0)    e21    (1,0,0,2)
e3     (3,1,0,0)    e10    (0,3,1,0)    e16    (4,3,3,4)    e22    (4,1,0,3)
e4     (4,1,0,0)    e11    (1,4,1,2)    e17    (4,3,4,4)    e23    (4,1,0,4)
e5     (5,5,1,2)    e12    (1,5,1,2)    e18    (4,3,5,4)    e24    (4,1,0,5)
e6     (6,5,5,4)    e13    (4,6,1,5)    e19    (7,5,6,4)
e7     (7,5,5,4)
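The timestamps of Question 1 can be cross-checked by replaying both algorithms over the events. The sketch below does this under one loud assumption: the send/receive pairs in `RECEIVED_FROM` are not given explicitly in the transcript (the figure is lost) and were inferred from the vector timestamps of the solution; only the pairs involving e3, e17 and e23 are confirmed by the text. Function and variable names are illustrative.

```python
# Events of each processor in local order (index 0 is P1, index 3 is P4).
PROCS = {
    0: [1, 2, 3, 4, 5, 6, 7],
    1: [8, 9, 10, 11, 12, 13],
    2: [14, 15, 16, 17, 18, 19],
    3: [20, 21, 22, 23, 24],
}

# ASSUMPTION: receive event -> matching send event, inferred from the
# vector timestamps of the solution rather than read from the figure.
RECEIVED_FROM = {2: 8, 20: 1, 9: 14, 15: 10, 11: 21, 5: 12,
                 22: 4, 16: 23, 13: 24, 17: 3, 6: 18, 19: 7}


def replay(procs, received_from):
    """Compute Lamport and vector timestamps for every event."""
    m = len(procs)
    ts, vt = {}, {}
    pos = {p: 0 for p in procs}
    clock_ts = {p: 0 for p in procs}
    clock_vt = {p: [0] * m for p in procs}
    remaining = sum(len(events) for events in procs.values())
    while remaining:
        progressed = False
        for p, events in procs.items():
            while pos[p] < len(events):
                e = events[pos[p]]
                src = received_from.get(e)
                if src is not None and src not in ts:
                    break   # the message has not been sent yet
                if src is not None:   # receipt: merge the message's clocks
                    clock_ts[p] = max(clock_ts[p], ts[src])
                    clock_vt[p] = [max(a, b)
                                   for a, b in zip(clock_vt[p], vt[src])]
                clock_ts[p] += 1
                clock_vt[p][p] += 1
                ts[e], vt[e] = clock_ts[p], tuple(clock_vt[p])
                pos[p] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("message pattern contains a cycle")
    return ts, vt


ts, vt = replay(PROCS, RECEIVED_FROM)
largest_ts = max(ts.values())
```

Under the assumed message pattern, the replay reproduces the vector timestamps listed in the solution, and the largest Lamport timestamp is the one of e19.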

Question 1
A possible causality violation exists. Note that the send event e3 (a message from P1 to P3) happens before e23 (a message from P4 to P3), but its message is received afterward. When using VT, e3.VT=(3,1,0,0), but right before the reception of this message (e17) the clock of P3 is e16.VT=(4,3,3,4). The P1 entry of the local clock (4) is already larger than the P1 entry carried by the message (3), so P3 has already learned of events that follow e3. Thus, when P3 receives this message it knows that the message of e23 arrived “too soon”.
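One way to turn this observation into a check is to compare, at the moment of receipt, the sender's entry in the message's vector with the same entry in the receiver's local clock. The sketch below is one possible formulation of that test, not necessarily the one intended in the session; the function name is hypothetical and processors are indexed from 0.

```python
def arrived_late(message_vt, local_vt, sender):
    """Return True if the receiver already knows of the send event.

    message_vt[sender] counts the sender's events up to and including the
    send. If the receiver's clock has already reached that count before the
    message is delivered, the knowledge came through some other message
    that was sent causally later but delivered earlier.
    """
    return local_vt[sender] >= message_vt[sender]


# Message of e3 (from P1, index 0) arriving at P3, whose clock is e16.VT.
late = arrived_late((3, 1, 0, 0), (4, 3, 3, 4), sender=0)

# Message of e23 (from P4, index 3) arriving at P3, whose clock is e15.VT.
on_time = arrived_late((4, 1, 0, 4), (0, 3, 2, 0), sender=3)
```

With the values from the solution, the message of e3 is flagged as late, while the message of e23 is not.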

The Ricart - Agrawala algorithm
An algorithm for handling distributed mutual exclusion.
Uses Lamport’s timestamps.
Each process only uses the following set of variables / data structures:
  Timestamp current_time: the current time, taken from Lamport’s algorithm
  Timestamp my_timestamp: the timestamp saved for my own request
  integer reply_count: number of REPLY messages still expected from the other processors
  boolean isRequesting: am I interested in entering the CS?
  boolean reply_deferred[M]: reply_deferred[j] reminds me to send a REPLY to j, who requested the CS, after I exit the CS

The following code is used to enter the critical section:
Request_CS:
  my_timestamp := current_time
  isRequesting := TRUE
  reply_count := M-1
  for every processor j≠i
    send(j, REMOTE_REQUEST; my_timestamp)
  wait until reply_count = 0

A listener thread is used so that the node responds to requests from others:
CS_monitoring:
  wait until a REMOTE_REQUEST or REPLY message is received
  REMOTE_REQUEST(sender; request_time):
    let j be the sender of the REMOTE_REQUEST message
    if (not isRequesting or my_timestamp > request_time)
      send(j, REPLY)
    else
      reply_deferred[j] := TRUE
  REPLY:
    reply_count := reply_count-1
Ties between equal timestamps are broken with processor IDs.

Releasing the CS:
Release_CS_monitoring:
  isRequesting := FALSE
  for j=1 through M (other than this processor's ID)
    if (reply_deferred[j]=TRUE)
      send(j, REPLY)
      reply_deferred[j] := FALSE
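The three procedures can be exercised together in a small single-threaded simulation. This is a sketch under stated assumptions, not the session's implementation: the "network" is one FIFO queue, every requester asks before any message is delivered, each processor leaves the CS immediately after entering, and the `Node` and `simulate` names are hypothetical. Timestamps are (time, ID) pairs so that ties are broken by processor ID.

```python
from collections import deque


class Node:
    def __init__(self, pid, m, network):
        self.pid, self.m, self.network = pid, m, network
        self.current_time = 0
        self.my_timestamp = None
        self.reply_count = 0
        self.is_requesting = False
        self.reply_deferred = [False] * m

    def request_cs(self):
        self.current_time += 1
        self.my_timestamp = (self.current_time, self.pid)   # ID breaks ties
        self.is_requesting = True
        self.reply_count = self.m - 1
        for j in range(self.m):
            if j != self.pid:
                self.network.append(
                    (j, "REMOTE_REQUEST", self.pid, self.my_timestamp))

    def on_message(self, action, sender, request_time):
        if action == "REMOTE_REQUEST":
            # Lamport update of the local clock on receipt.
            self.current_time = max(self.current_time, request_time[0]) + 1
            if not self.is_requesting or self.my_timestamp > request_time:
                self.network.append((sender, "REPLY", self.pid, None))
            else:
                self.reply_deferred[sender] = True
        else:   # REPLY
            self.reply_count -= 1

    def release_cs(self):
        self.is_requesting = False
        for j in range(self.m):
            if j != self.pid and self.reply_deferred[j]:
                self.network.append((j, "REPLY", self.pid, None))
                self.reply_deferred[j] = False


def simulate(m, requesters):
    """Return (order of CS entries, number of messages delivered)."""
    network = deque()
    nodes = [Node(i, m, network) for i in range(m)]
    for i in requesters:
        nodes[i].request_cs()
    pending, order, delivered = set(requesters), [], 0
    while pending:
        while network:
            dest, action, sender, ts = network.popleft()
            nodes[dest].on_message(action, sender, ts)
            delivered += 1
        ready = [i for i in pending if nodes[i].reply_count == 0]
        assert len(ready) == 1, "mutual exclusion or progress violated"
        order.append(ready[0])
        pending.remove(ready[0])
        nodes[ready[0]].release_cs()
    return order, delivered + len(network)


single = simulate(4, [2])            # one requester among N = 4
contended = simulate(3, [0, 1, 2])   # all three request at the same time
```

In both runs each CS entry costs 2(N-1) messages, and under contention the processors enter in the order of their (timestamp, ID) pairs.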

Question 2
Assume that N processors are handling mutual exclusion with the aid of the Ricart-Agrawala mutual exclusion algorithm.
1. How many messages will be passed in the system whenever a processor wishes to enter the critical section? Are there scenarios where this number is lower/greater?
2. Why is this algorithm deadlock free?
3. What can happen if a single message is lost?
Notes:
  1. A silly question: 2(N-1), of course. Want to save messages? Do not send the replies to the others and simply enter several times.
  2. Each one has its own unique ID (timestamp + processor ID), so a REPLY will always be sent.
  3. Deadlock...

Question 2
1. To enter the CS a processor must request permission from all other processors, i.e. N-1 REMOTE_REQUEST messages are sent. Only after the processor has received a REPLY from all other processors may it enter the CS. Note that these replies may be deferred for a while. That is, entering the CS requires a total of 2(N-1) messages.
One means of reducing this network load is to keep several requests deferred for a while. This allows a processor to enter the CS more than once without having to send messages to all N-1 other nodes in the system. This is known as the Roucairol & Carvalho optimization [Roucairol & Carvalho].

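Question 2
2. The algorithm relies on the fact that each request timestamp is unique (it is based on the Lamport time and the processor ID). Thus, a total order over requests can be easily deduced, and CS access is granted in this order: the outstanding request with the smallest timestamp is never deferred by any processor, so its owner always collects all N-1 replies and no cycle of waiting processors can form.
3. The algorithm assumes that the system is failure free, and its correctness relies heavily on this condition. If a single message is lost, a deadlock can easily occur: a requester whose REMOTE_REQUEST or REPLY is lost never sees reply_count reach 0.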
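The total order over requests can be illustrated with (timestamp, processor ID) pairs. The request values below are hypothetical, and `defers_reply` is an illustrative name for the test performed by a requesting processor in CS_monitoring.

```python
# Hypothetical outstanding requests as (Lamport timestamp, processor ID).
requests = [(3, 2), (3, 1), (2, 4)]

# Tuples compare lexicographically: timestamp first, then processor ID,
# so sorting yields the order in which the CS is granted.
cs_order = sorted(requests)


def defers_reply(my_request, other_request):
    """A requesting processor defers its REPLY only to later requests."""
    return my_request < other_request
```

Because no two pairs are equal, exactly one of two competing requesters defers to the other, which is why the smallest outstanding request always makes progress.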

Raymond’s algorithm
Solves the mutual exclusion problem via a token (only the token holder may enter the CS).
Communication is based on an underlying tree structure of all nodes.
The tree is always oriented towards the token holder.
Uses a FIFO queue to prevent starvation.
Good performance: the number of messages per CS entry decreases as the load increases.
Uses only O(log n) messages per CS entry (on average, when the tree is balanced).

Raymond’s algorithm
Each process only uses the following set of variables / data structures:
  Boolean token_holder: am I the token holder?
  Boolean inCS: am I in the critical section?
  Processor current_dir: the neighbor (parent in the tree) in the direction of the token holder
  Queue requests_queue: the processors that still want the token

Raymond’s algorithm
Request_CS:
  if not token_holder
    if requests_queue.isEmpty( )
      send(current_dir, REQUEST)
    requests_queue.enqueue(self)
    wait until token_holder is true
  inCS := true
Release_CS:
  inCS := false
  if not requests_queue.isEmpty( )
    current_dir := requests_queue.dequeue( )
    send(current_dir, TOKEN)
    token_holder := false
    if not requests_queue.isEmpty( )
      send(current_dir, REQUEST)
Request: if the queue is empty, send a REQUEST towards the token holder to say that I want the token; in either case add myself to the queue.
Release: if the queue of waiting processors is not empty, point current_dir at the next target (taken from the queue) and send it the token; if the queue is still not empty, also send a REQUEST so that the token is returned later.

Raymond’s algorithm
Monitor_CS:
  while (true)
    wait for a REQUEST or a TOKEN message
    REQUEST:
      if token_holder
        if inCS
          requests_queue.enqueue(sender)
        else
          current_dir := sender
          send(current_dir, TOKEN)
          token_holder := false
      else
        if requests_queue.isEmpty()
          send(current_dir, REQUEST)
        requests_queue.enqueue(sender)
Handling a REQUEST: if I hold the token and am in the CS, add the sender to the queue; if I hold the token but am not in the CS, point current_dir at the requester and send it the token. If I do not hold the token, send a REQUEST towards the holder (unless one is already pending) and add the sender to the queue.

Raymond’s algorithm (cont.)
    TOKEN:
      current_dir := requests_queue.dequeue( )
      if current_dir = self
        token_holder := true
      else
        send(current_dir, TOKEN)
        if not requests_queue.isEmpty( )
          send(current_dir, REQUEST)
Handling a TOKEN: change current_dir to the next requester of the token. If that is me, mark that I hold the token; otherwise keep passing the token on. If I still have requests to serve (others have asked for the token), send a REQUEST onward so that the token comes back.
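The procedures of Raymond’s algorithm can be tried out in a small single-threaded simulation. This is a sketch under stated assumptions: messages travel through one FIFO queue, a processor that enters the CS leaves it as soon as the network is quiet, the root's current_dir points at itself, and the four-node tree (A is the root, B is its child, C and D are children of B) is hypothetical. Setting `in_cs` when the token arrives stands in for the blocked Request_CS resuming.

```python
from collections import deque


class Node:
    def __init__(self, name, parent, net):
        self.name, self.net = name, net
        self.token_holder = parent is None          # the root starts with the token
        self.in_cs = False
        self.current_dir = parent if parent is not None else name
        self.requests_queue = deque()

    def send(self, dest, action):
        self.net.append((dest, action, self.name))

    def request_cs(self):
        if self.token_holder:
            self.in_cs = True
            return
        if not self.requests_queue:
            self.send(self.current_dir, "REQUEST")
        self.requests_queue.append(self.name)

    def release_cs(self):
        self.in_cs = False
        if self.requests_queue:
            self.current_dir = self.requests_queue.popleft()
            self.send(self.current_dir, "TOKEN")
            self.token_holder = False
            if self.requests_queue:
                self.send(self.current_dir, "REQUEST")

    def on_request(self, sender):
        if self.token_holder:
            if self.in_cs:
                self.requests_queue.append(sender)
            else:
                self.current_dir = sender
                self.send(sender, "TOKEN")
                self.token_holder = False
        else:
            if not self.requests_queue:
                self.send(self.current_dir, "REQUEST")
            self.requests_queue.append(sender)

    def on_token(self):
        self.current_dir = self.requests_queue.popleft()
        if self.current_dir == self.name:
            self.token_holder = True
            self.in_cs = True      # the waiting Request_CS resumes here
        else:
            self.send(self.current_dir, "TOKEN")
            if self.requests_queue:
                self.send(self.current_dir, "REQUEST")


def simulate(parents, requesters):
    """Return (order of CS entries, final nodes by name)."""
    net = deque()
    nodes = {name: Node(name, parent, net) for name, parent in parents.items()}
    for name in requesters:
        nodes[name].request_cs()
    order = []
    while True:
        while net:
            dest, action, sender = net.popleft()
            if action == "REQUEST":
                nodes[dest].on_request(sender)
            else:
                nodes[dest].on_token()
        in_cs = [n for n in nodes.values() if n.in_cs]
        assert len(in_cs) <= 1, "mutual exclusion violated"
        if not in_cs:
            return order, nodes
        order.append(in_cs[0].name)
        in_cs[0].release_cs()


# Hypothetical tree: child -> parent (None marks the initial token holder).
parents = {"A": None, "B": "A", "C": "B", "D": "B"}
order, nodes = simulate(parents, ["A", "C", "D"])
```

In this run A enters first, then the token travels A, B, C, back to B, and finally to D, where it stays because nobody else is waiting. The current_dir edges along that path end up pointing towards D.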

Question 3, Moed B 2006 The following 8-processor network is using Raymond’s algorithm to solve the mutual exclusion problem. In the initial state, the token is with processor A at the root of the tree (A also wants to enter the critical section), and no requests for the CS are recorded.

Question 3, Moed B 2006
[Figure: tree of the eight processors A to H, with A at the root] Directed edges correspond to the current_dir variable.

Question 3, Moed B 2006 To allow for a convenient representation we define agent steps as the invocation of a procedure or an action. Use the sketch above to describe the result of applying Raymond’s algorithm if nodes C,D,F and G request the token. Provide a detailed description of all concurrent steps (in which a single step is taken by all relevant nodes) by sketching the system’s state after each one and up until three of the four agents receive the token. Note: assume that ties are broken based on ID.

Question 3, Moed B 2006
[Figures: the system state after each concurrent step, showing the tree, the current_dir edges, the request queue of each node and the REQUEST messages in flight. Annotations on the sketches: “Assume that A is still in CS”; “Note that the edge has switched its direction”.]
