Presentation is loading. Please wait.

Presentation is loading. Please wait.

Slides for Chapter 12: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson.

Similar presentations


Presentation on theme: "Slides for Chapter 12: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson."— Presentation transcript:

1 Slides for Chapter 12: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson Education 2005

2 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Failure Assumptions and Failure Detectors zreliable communication channels zprocess failures: crashes zfailure detector: object/code in a process that detects failures of other processes zunreliable failure detector yunsuspected or suspected (evidence of possible failures) yeach process sends ``alive'' message to everyone else ynot receiving ``alive'' message after timeout ymost practical systems zreliable failure detector yunsuspected or failure ysynchronous system yfew practical systems

3 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Distributed Mutual Exclusion zprovide critical region in a distributed environment zmessage passing zfor example, locking files, lockd daemon in UNIX (NFS is stateless, no file-locking at the NFS level)

4 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Algorithms for mutual exclusion zN processes zprocesses don't fail zmessage delivery is reliable zcritical region: enter(), resourceAccesses(), exit() zProperties: y[ME1] safety: only one process at a time y[ME2] liveness: eventually enter or exit y[ME3] happened-before ordering: ordering of enter() is the same as HB ordering

5 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Algorithms for mutual exclusion zPerformance evaluation: yoverhead and bandwidth consumption: # of messages sent yclient delay incurred by a process at entry and exit ythroughput measured by synchronization delay: delay between one's exit and next's entry

6 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A central server algorithm zserver keeps track of a token---permission to enter critical region za process requests the server for the token zthe server grants the token if it has the token za process can enter if it gets the token, otherwise waits zwhen done, a process sends release and exits

7 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Server managing a mutual exclusion token for a set of processes Server 1. Request token Queue of requests 2. Release token 3. Grant token 4 2 p 4 p 3 p 2 p 1

8 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A central server algorithm zProperties: ysafety, why? yliveness, why? yHB ordering not guaranteed, why? [VC in processes vs. server] zPerformance: yenter overhead: two messages (request and grant) yenter delay: time between request and grant yexit overhead: one message (release) yexit delay: none ysynchronization delay: between release and grant ycentralized server is the bottle neck

9 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm zlogical ring, could be unrelated to the physical configuration  p i sends messages to p (i+1) mod N zwhen a process holds a token, it can enter, otherwise waits zwhen a process releases a token (exit), it sends to its neighbor

10 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring of processes transferring a mutual exclusion token

11 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm zproperties: ysafety, why? yliveness, why? yHB ordering not guaranteed, why? zPerformance: ybandwidth consumption: token keeps circulating yenter overhead: 0 to N messages yenter delay: delay for 0 to N messages yexit overhead: one message yexit delay: none ysynchronization delay: delay for 1 to N messages

12 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 An algorithm using multicast and logical clocks zmulticast a request message for the token zenter only if all the other processes reply  totally-ordered timestamps: zeach process keeps a state: RELEASED, HELD, WANTED zif all have state = RELEASED, all reply, a process can hold the token and enter zif a process has state = HELD, doesn't reply until it exits zif more than one process has state = WANTED, process with the lowest timestamp will get all N-1 replies first.

13 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ricart and Agrawala’s algorithm On initialization state := RELEASED; To enter the section state := WANTED; Multicast request to all processes;request processing deferred here T := request’s timestamp; Wait until (number of replies received = (N – 1)); state := HELD; On receipt of a request at p j (i ≠ j) if (state = HELD or (state = WANTED and (T, p j ) < (T i, p i ))) then queue request from p i without replying; else reply immediately to p i ; end if To exit the critical section state := RELEASED; reply to any queued requests;

14 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Multicast synchronization p 3 34 Reply 34 41 34 p 1 p 2 Reply

15 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 An algorithm using multicast and logical locks zProperties ysafety, why? yliveness, why? yHB ordering, why?

16 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 An algorithm using multicast and logical locks zperformance: ybandwidth consumption: no token keeps circulating yentry overhead: 2(N-1), why? [with multicast support: 1 + (N -1) = N] yentry delay: delay between request and getting all replies yexit overhead: 0 to N-1 messages yexit delay: none ysynchronization delay: delay for 1 message (one last reply from the previous holder)

17 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Maekawa’s Voting Algorithm – Main Idea zWe actually don’t need all N – 1 replies zConsider processes A, B, X yA needs replies from “A” and X yB needs replies from “B” and X yIf X can only reply to (vote) *one process at a time* xA and B cannot have a reply from X at the same time xMutex between A and B--hinges on X zProcesses in overlapping groups ymembers in the overlap “control” mutex

18 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Mutual exclusion for all processes zA group for every process zA pair of groups for A and B overlaps y=> mutex(A, B) zEvery possible pair of groups overlaps y=> mutex(all possible pairs of processes) y=> mutex(all processes)

19 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Groups and members zA group for each process yN processes yN groups zEach group has K members ynumbers of processes to request for permission zEach process is in M groups (M > 1) yAllows overlapping => mutex xIf the groups are disjoint, M = 1, no overlapping, no mutex ynumber of processes to grant permission

20 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Parameters and group membership zN = number of processes zK = voting group size zM = number of voting groups each process is in zOptimal (smallest K, why?) yK = M ~= sqrt(N) yNon-trivial to construct the groups zApproximation yK = 2 * sqrt(N) - 1 xPut process id’s in a sqrt(N) by sqrt(N) table xUnion the rows and columns where p i is yN groups yM = 2 * sqrt(N) - 1

21 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Group membership ABC DEF GHI Group for A: {A, B, C, D, G} Group for B: {A, B, C, E, H} … Group for I: {G, H, I, C, F} Every pair of groups overlap N groups Each group has K = 2 * sqrt(N) – 1 members M = 2 * sqrt(N) – 1 [# of groups each process are in] ABC DEF GHI

22 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Group membership (alternative?) ABC DEF GHI Groups for A, B, C: {A, B, C, D, G} Groups for D, E, F: {D, E, F, B, H} Groups for G, H, I: {G, H, I, C, F} Every pair of groups overlap N groups [Sqrt(N) unique groups] Each group has K = 2 * sqrt(N) – 1 members M = 3 [sqrt(N)] for A, E, I, but M = 6 [2*sqrt(N)] for the rest Undesirable, why? ABC DEF GHI

23 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Sketch of the Voting Algorithm zA group of processes V i for each process p i yEach pair of groups overlap xX in groups: [A,X] and [B,X] y=> the groups for any pair of processes overlap zTo enter the critical region, p i ySends REQUEST’s to all processes in V i yWaits for REPLY’s (VOTE’s) from all processes in V i zTo exit the critical region, p i ySends RELEASE’s to all processes in V i zThree types of messages, not two as in the multicast alg.

24 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Figure 12.6 Maekawa’s voting algorithm On initialization state := RELEASED; voted := FALSE; For p i to enter the critical section state := WANTED; Multicast request to all processes in V i ; Wait until (number of replies received = K); state := HELD; On receipt of a request from p i at p j if (state = HELD or voted = TRUE) then queue request from p i without replying; else send reply to p i ; voted := TRUE; end if For p i to exit the critical section state := RELEASED; Multicast release to all processes in V i ; On receipt of a release from p i at p j if (queue of requests is non-empty) then remove head of queue – from p k, say; send reply to p k ; voted := TRUE; else voted := FALSE; end if

25 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Deadlock? zProcesses: A, B, C yGroup A: A, B yGroup B: B, C yGroup C: C, A zDeadlock yA has A’s reply, waiting for B’s reply yB has B’s reply, waiting for C’s reply yC has C’s reply, waiting for A’s reply zTimestamp the requests in HB ordering y holding according to the timestamp

26 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Properties zSafety: No process can reply/vote more than once at any time. zLiveness: timestamp (HB ordering) zHB ordering: above

27 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Performance zEntry overhead [assuming k = sqrt(N)] y Sqrt(N) requests + Sqrt(N) replies y 2* Sqrt(N) x 4] zExit overhead y Sqrt(N) releases

28 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Elections zchoosing a unique process for a particular role zfor example, server in dist. mutex zeach process can call only one election zmultiple concurrent elections can be called by different processes zparticipant: engages in an election zprocess with the largest id wins  each process p i has variable elected i = ? (don't know) initially

29 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Elections zProperties:  [E1] elected i of a ``participant'' process must be P max (elected process---largest id) or ?  [E2] liveness: all processes participate and eventually set elected i != ? (or crash) zPerformance: yoverhead (bandwidth consumption): # of messages yturnaround time: # of messages to complete an election

30 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm zlogical ring, could be unrelated to the physical configuration  p i sends messages to p (i+1) mod N zno failures zelect the coordinator with the largest id zinitially, every process is a non-participant zany process can call an election: ymarks itself as participant yplaces its id in an election message ysends the message to its neighbor

31 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm zreceiving an election message: yif id > myid, forward the msg, mark participant yif id < myid xnon-participant: replace id with myid: forward the msg, mark participant xparticipant: stop forwarding (why? Later, multiple elections)  if id = myid, coordinator found, mark non-participant, elected i := id, send elected message with myid zreceiving an elected message: yid != myid, mark non-participant, elected i := id forward the msg yif id = myid, stop forwarding

32 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based election in progress Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened

33 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm zProperties: ysafety: only the process with the largest id can send an elected message yliveness: every process in the ring eventually participates in the election; extra elections are stopped

34 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm zPerformance: yone election, best case, when? xN election messages xN elected messages xturnaround: 2N messages yone election, worst case, when? x2N - 1 election messages xN elected messages xturnaround: 3N - 1 messages ycan't tolerate failures, not very practical

35 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm zprocesses can crash and can be detected by other processes ztimeout T = 2T transmitting + T processing zeach process knows all the other processes and can communicate with them zMessages: election, answer, coordinator

36 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm zstart an election ydetects the coordinator has failed ysends an election message to all processes with higher id's and waits for answers (except the failed coordinator/process) yif no answers in time T, xit is the coordinator xsends coordinator message (with its id) to all processes with lower id's yelse xwaits for a coordinator message xstarts an election if timeout

37 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm The election of coordinator p 2, after the failure of p 4 and then p 3

38 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm zreceiving an election message ysends an answer message back ystarts an election if it hasn't started one—send election messages to all higher-id processes (including the “failed” coordinator—the coordinator might be up by now) zreceiving a coordinator message yset elected i to the new coordinator zto be a coordinator, it has to start an election zwhen a crashed process is replaced ythe new process starts an election and ycan replace the current coordinator (hence ``bully'')

39 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm zproperties: ysafety: xa lower-id process always yields to a higher-id process xHowever, during an election, if a failed process is replaced the low-id processes might have two different coordinators: the newly elected coordinator and the new process, why? xfailure detection might be unreliable yliveness: all processes participate and know the coordinator at the end

40 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm zPerformance ybest case: when? xoverhead: N-2 coordinator messages xturnaround delay: no election/answer messages yworst case: when? xoverhead: x1+ 2 +...+ (N-2) + (N-2)= (N-1)(N-2)/2 + (N-2) election messages, x1+...+ (N-2) answer messages, xN-2 coordinator messages, xtotal: (N-1)(N-2) + 2(N-2) = (N+1)(N-2) = O(N 2 ) yturnaround delay: delay of election and answer messages


Download ppt "Slides for Chapter 12: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson."

Similar presentations


Ads by Google