Coordination and Agreement


Coordination and Agreement Đàm Khánh Quốc Minh (00707172) Phạm Hồng Thái (00707706) Lê Anh Vũ (00707195)

Outline Introduction Distributed Mutual Exclusion Election Algorithms Group Communication Consensus and Related Problems

Main Assumptions Each pair of processes is connected by reliable channels Processes are independent of each other The network does not disconnect Processes fail only by crashing A local failure detector is available

Distributed Mutual Exclusion (1) [Figure: processes 1, 2, 3, …, n competing for a shared resource] Mutual exclusion is very important: it prevents interference and ensures consistency when accessing shared resources

Distributed Mutual Exclusion (2) Mutual exclusion is useful when the server managing the resources doesn't use locks Critical section: Enter() enters the critical section, blocking if necessary Access shared resources inside the critical section Exit() leaves the critical section

Distributed Mutual Exclusion (3) Distributed mutual exclusion: no shared variables, only message passing Properties: Safety: At most one process may execute in the critical section at a time Liveness: Requests to enter and exit the critical section eventually succeed No deadlock and no starvation Ordering: If one request to enter the CS happened-before another, then entry to the CS is granted in that order

Mutual Exclusion Algorithms Basic hypotheses: System: asynchronous Processes: don’t fail Message transmission: reliable Central Server Algorithm Ring-Based Algorithm Mutual Exclusion using Multicast and Logical Clocks Maekawa’s Voting Algorithm Mutual Exclusion Algorithms Comparison

Central Server Algorithm [Figure: a central server holds the token and a queue of requests; a process sends 1) Request token, the server 3) grants the token to one process at a time, and the holder sends 2) Release token when it leaves the critical section; other requesters (here P2 and P4) wait in the queue.]
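
Not part of the original slides: a minimal single-process Python sketch of the central-server idea, assuming messages are modelled as direct method calls; the names CentralServer, request_token and release_token are illustrative only.

```python
from collections import deque

class CentralServer:
    """Single-threaded simulation of central-server mutual exclusion:
    the server owns the token, queues requests, and grants the token
    to one process at a time."""

    def __init__(self):
        self.queue = deque()   # processes waiting for the token
        self.holder = None     # process currently holding the token

    def request_token(self, pid):
        if self.holder is None:
            self.holder = pid            # token free: grant immediately
            print(f"grant token to {pid}")
        else:
            self.queue.append(pid)       # otherwise queue the request
            print(f"{pid} waits")

    def release_token(self, pid):
        assert self.holder == pid, "only the holder may release the token"
        self.holder = self.queue.popleft() if self.queue else None
        if self.holder is not None:
            print(f"grant token to {self.holder}")

server = CentralServer()
server.request_token("P2")   # granted at once
server.request_token("P4")   # queued behind P2
server.release_token("P2")   # token passes to P4
```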

Ring-Based Algorithm (1) A group of unordered processes in a network [Figure: processes P1, P2, P3, P4, …, Pn connected over an Ethernet, to be organised into a logical ring]

Ring-Based Algorithm (2) [Figure: the token navigates around the logical ring P1 → P2 → P3 → … → Pn; the process holding the token may Enter() the critical section and passes the token on after Exit().]

Mutual Exclusion using Multicast and Logical Clocks (1) [Figure: P1 and P2 request entering the critical section simultaneously, with timestamps 19 and 23; P3 is not requesting and replies OK to both; the request with the smaller timestamp is granted first, and the other reply is sent only after Exit().]

Mutual Exclusion using Multicast and Logical Clocks (2) Main steps of the algorithm:
Initialization: state := RELEASED;
Process pi requests entry to the critical section:
  state := WANTED; T := request's timestamp;
  Multicast request <T, pi> to all processes;
  Wait until (number of replies received = N – 1);
  state := HELD;

Mutual Exclusion using Multicast and Logical Clocks (3) Main steps of the algorithm (cont'd):
On receipt of a request <Ti, pi> at pj (i ≠ j):
  If (state = HELD) OR (state = WANTED AND (T, pj) < (Ti, pi))
  Then queue the request from pi without replying;
  Else reply immediately to pi;
To quit the critical section:
  state := RELEASED;
  Reply to any queued requests;
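
As an illustration of the receive-side rule above, a minimal Python sketch (not from the slides), assuming replies are modelled as callables; Process, request, on_request and on_exit are hypothetical names.

```python
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

class Process:
    """Receive-side rule of the multicast/logical-clock algorithm: defer a
    reply while we hold the CS, or while our own pending request has a smaller
    (timestamp, pid) pair; reply immediately otherwise."""

    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.my_request = None   # (timestamp, pid) of our pending request
        self.deferred = []       # replies queued until we exit the CS

    def request(self, ts):
        """Called when this process wants the CS (after multicasting <ts, pid>)."""
        self.state = WANTED
        self.my_request = (ts, self.pid)

    def on_request(self, ts, sender, reply):
        mine_wins = self.state == WANTED and self.my_request < (ts, sender)
        if self.state == HELD or mine_wins:
            self.deferred.append(reply)   # queue the request without replying
        else:
            reply()                       # reply immediately

    def on_exit(self):
        self.state = RELEASED
        for reply in self.deferred:       # answer all queued requests
            reply()
        self.deferred.clear()
```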

Maekawa’s Voting Algorithm (1) Candidate process: must collect sufficient votes to enter the critical section Each process pi maintains a voting set Vi (i = 1, …, N), where Vi ⊆ {p1, …, pN} Sets Vi are chosen such that:
  pi ∈ Vi
  ∀ i, j: Vi ∩ Vj ≠ ∅ (at least one common member of any two voting sets)
  |Vi| = K (fairness)
  Each process pj is contained in M of the voting sets Vi

Maekawa’s Voting Algorithm (2) Main steps of the algorithm:
Initialization: state := RELEASED; voted := FALSE;
For pi to enter the critical section:
  state := WANTED;
  Multicast request to all processes in Vi – {pi};
  Wait until (number of replies received = K – 1);   (pi enters the critical section only after collecting K – 1 votes)
  state := HELD;

Maekawa’s Voting Algorithm (3) Main steps of the algorithm (cont'd):
On receipt of a request from pi at pj (i ≠ j):
  If (state = HELD OR voted = TRUE)
  Then queue the request from pi without replying;
  Else reply immediately to pi; voted := TRUE;
For pi to exit the critical section:
  state := RELEASED;
  Multicast release to all processes in Vi – {pi};

Maekawa’s Voting Algorithm (4) Main steps of the algorithm (cont'd):
On receipt of a release from pi at pj (i ≠ j):
  If (queue of requests is non-empty)
  Then remove the head of the queue, say pk; send reply to pk; voted := TRUE;
  Else voted := FALSE;
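
A small Python sketch of the voter role described above, assuming a single vote per voting-set member and folding the HELD check into the voted flag for brevity; Voter, on_request and on_release are illustrative names.

```python
class Voter:
    """A member of a voting set casts at most one vote at a time and queues
    further requests until the process it voted for releases the CS."""

    def __init__(self):
        self.voted = False
        self.queue = []      # candidates waiting for this member's vote

    def on_request(self, candidate, send_reply):
        if self.voted:
            self.queue.append((candidate, send_reply))   # hold the vote
        else:
            self.voted = True
            send_reply(candidate)                        # cast our single vote

    def on_release(self):
        if self.queue:
            candidate, send_reply = self.queue.pop(0)    # head of queue gets the vote
            self.voted = True
            send_reply(candidate)
        else:
            self.voted = False
```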

M. E. Algorithms Comparison

Algorithm      | Messages per Enter()/Exit() | Messages before Enter() | Problems
Centralized    | 3                           | 2                       | Crash of the server
Virtual ring   | 1 to ∞                      | 0 to N–1                | Crash of a process; token lost; ordering not satisfied
Logical clocks | 2(N–1)                      | 2(N–1)                  | Crash of a process
Maekawa's Alg. | 3√N                         | 2√N                     | Crash of a process that votes

Outline Introduction Distributed Mutual Exclusion Election Algorithms Group Communication Consensus and Related Problems

Election Algorithms (1) Objective: elect one process pi from a group of processes p1…pN, even if multiple elections have been started simultaneously Utility: elect a primary manager, a master process, a coordinator or a central server Each process pi maintains the identity of the elected process in the variable Electedi (NIL if it isn't defined yet) Properties to satisfy, for every pi:
  Safety: Electedi = NIL or Electedi = P, where P is the non-crashed process with the largest identifier
  Liveness: pi participates and eventually sets Electedi ≠ NIL, or crashes

Election Algorithms (2) Ring-Based Election Algorithm Bully Algorithm Election Algorithms Comparison

Ring-Based Election Algorithm (1) [Figure: a ring of processes with identifiers 5, 16, 9, 25, 3; process 5 starts the election, and the election message carries the largest identifier seen so far (25) around the ring.]

Ring-Based Election Algorithm (2)
Initialization: Participanti := FALSE; Electedi := NIL;
pi starts an election:
  Participanti := TRUE;
  Send the message <election, pi> to its neighbor;
Receipt of a message <elected, pj> at pi:
  Electedi := pj; Participanti := FALSE;
  If pi ≠ pj Then send the message <elected, pj> to its neighbor;

Ring-Based Election Algorithm (3)
Receipt of an election message <election, pi> at pj:
  If pi > pj Then send <election, pi> to its neighbor; Participantj := TRUE;
  Else If pi < pj AND Participantj = FALSE Then send <election, pj> to its neighbor; Participantj := TRUE;
  Else If pi = pj Then Electedj := pj; Participantj := FALSE; send <elected, pj> to its neighbor;
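
A toy Python simulation of the ring election, assuming no failures and representing the ring as a list of identifiers; ring_election is a hypothetical helper, not code from the slides.

```python
def ring_election(ids, starter):
    """Simulate the ring election on a list of identifiers, where ids[i]'s
    neighbor is ids[(i + 1) % n].  The election message carries the largest
    identifier seen so far; when it returns to that process, it is elected."""
    n = len(ids)
    pos = ids.index(starter)
    candidate = starter
    # Phase 1: the election message circulates, keeping the maximum identifier.
    while True:
        pos = (pos + 1) % n
        if ids[pos] == candidate:       # the message is back at the maximum
            break
        candidate = max(candidate, ids[pos])
    # Phase 2: an elected message circulates so every process learns the result.
    return candidate, {pid: candidate for pid in ids}

winner, elected = ring_election([5, 16, 9, 25, 3], starter=5)
print(winner)    # 25
```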

Bully Algorithm (1) Characteristic: allows processes to crash during an election Hypotheses: reliable transmission; synchronous system Timeout used to detect a failed process: T = 2 × Ttransmission + Tprocessing (a round trip plus the time to handle the message)

Bully Algorithm (2) Hypotheses (cont’d): Each process knows which processes have higher identifiers, and it can communicate with all such processes Three types of messages: Election: starts an election OK: sent in response to an election message Coordinator: announces the new coordinator Election started by a process when it notices, through timeouts, that the coordinator has failed

Bully Algorithm (3) [Figure: processes 1–8 with process 8 as coordinator; the coordinator fails, process 5 detects it first and sends Election to the higher-numbered processes; processes 6 and 7 answer OK and continue the election; process 7, the highest non-crashed process, announces itself as the new Coordinator.]

Bully Algorithm (4)
Initialization: Electedi := NIL;
pi starts the election:
  Send the message (Election, pi) to every pj with pj > pi;
  Wait for messages (OK, pj);
  If no message (OK, pj) arrives within T Then Electedi := pi; send the message (Coordinator, pi) to every pj with pj < pi;
  Else wait until receipt of a (Coordinator) message (if it doesn't arrive within another timeout T', begin another election);

Bully Algorithm (5)
Receipt of the message (Coordinator, pj) at pi: Electedi := pj;
Receipt of the message (Election, pj) at pi: send the message (OK, pi) to pj; start an election unless it has begun one already;
When a process is started to replace a crashed process, it begins an election.
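
A simplified Python simulation of the bully election under the stated hypotheses (reliable transmission, crash failures only). It collapses the concurrent take-over by all higher processes into a recursion; the outcome, the highest live identifier wins, is the same. All names are illustrative.

```python
def bully_election(all_ids, alive, starter):
    """The starter sends Election to every higher identifier; any live higher
    process answers OK and takes over; the highest live process eventually
    announces itself Coordinator (no OK arrives before its timeout)."""
    higher_alive = [p for p in all_ids if p > starter and p in alive]
    if not higher_alive:
        return starter                  # timeout with no OK: starter is coordinator
    # A live higher process takes over; the recursion bottoms out at the
    # highest live identifier, which broadcasts the Coordinator message.
    return bully_election(all_ids, alive, min(higher_alive))

coordinator = bully_election(all_ids=[1, 2, 3, 4, 5, 6, 7, 8],
                             alive={1, 2, 3, 4, 5, 6, 7},   # process 8 has crashed
                             starter=5)
print(coordinator)   # 7
```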

Election Algorithms Comparison

Algorithm    | Number of messages | Problems
Virtual ring | 2N to 3N–1         | Doesn't tolerate faults
Bully        | N–2 to O(N²)       | System must be synchronous

Outline Introduction Distributed Mutual Exclusion Election Algorithms Group Communication Consensus and Related Problems

Group Communication (1) Objective: each of a group of processes must receive copies of the messages sent to the group Group communication requires: Coordination Agreement: on the set of messages that is received and on the delivery ordering We study multicast communication of processes whose membership is known (static groups)

Group Communication (2) System: contains a collection of processes, which can communicate reliably over one-to-one channels Processes: members of groups, may fail only by crashing Groups: Closed group Open group

Group Communication (3) Primitives: multicast(g, m): sends the message m to all members of group g deliver(m) : delivers the message m to the calling process sender(m) : unique identifier of the process that sent the message m group(m): unique identifier of the group to which the message m was sent

Group Communication (4) Basic Multicast Reliable Multicast Ordered Multicast

Basic Multicast Objective: guarantee that a correct process will eventually deliver the message as long as the multicaster does not crash Primitives: B_multicast, B_deliver Implementation: use reliable one-to-one communication
  To B_multicast(g, m): for each process p ∈ g, send(p, m);
  On receive(m) at p: B_deliver(m) to p;
Threads may be used to perform the send operations concurrently; the primitive is then unreliable because acknowledgments may be dropped

Reliable Multicast (1) Primitives: R_multicast, R_deliver Properties to satisfy:
  Integrity: a correct process p delivers the message m at most once
  Validity: if a correct process multicasts a message m, then it will eventually deliver m
  Agreement: if a correct process delivers the message m, then all other correct processes in group(m) will eventually deliver m

Reliable Multicast (2) Implementation using B-multicast (correct, but inefficient: each message is sent |g| times to each process):
Initialization: msgReceived := {};
R-multicast(g, m) by p: B-multicast(g, m);   // p ∈ g
B-deliver(m) by q, with g = group(m) and p = sender(m):
  If (m ∉ msgReceived) Then
    msgReceived := msgReceived ∪ {m};
    If (q ≠ p) Then B-multicast(g, m);
    R-deliver(m);
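
A compact Python sketch of the R-multicast construction above, assuming a b_multicast(group, msg, origin) helper that synchronously calls b_deliver at every group member; RMulticastNode and the helper are hypothetical names.

```python
class RMulticastNode:
    """R-multicast over B-multicast: every receiver re-multicasts a message the
    first time it sees it, so a message delivered anywhere is eventually
    delivered everywhere (at the cost of |g| sends per message)."""

    def __init__(self, pid, b_multicast):
        self.pid = pid
        self.b_multicast = b_multicast
        self.received = set()        # msgReceived in the slide pseudocode
        self.delivered = []

    def r_multicast(self, group, msg):
        self.b_multicast(group, msg, origin=self.pid)

    def b_deliver(self, group, msg, origin):
        if msg not in self.received:
            self.received.add(msg)
            if origin != self.pid:                         # q != p: echo once
                self.b_multicast(group, msg, origin=origin)
            self.delivered.append(msg)                     # R-deliver(m)

def b_multicast(group, msg, origin):
    """Basic-multicast stand-in: synchronously B-deliver to every member."""
    for pid in group:
        nodes[pid].b_deliver(group, msg, origin)

nodes = {pid: RMulticastNode(pid, b_multicast) for pid in ("p", "q", "r")}
nodes["p"].r_multicast(("p", "q", "r"), "hello")
print([node.delivered for node in nodes.values()])   # each node delivers once
```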

Ordered Multicast Ordering categories: FIFO Ordering Total Ordering Causal Ordering Hybrid Ordering: Total-Causal, Total-FIFO

FIFO Ordering (1) If a correct process issues multicast(g, m1) and then multicast(g, m2), then every correct process that delivers m2 will deliver m1 before m2 [Figure: timeline of processes 1–3 delivering m1 before m2 at every process]

FIFO Ordering (2) Primitives: FO_multicast, FO_deliver Implementation: use of sequence numbers Variables maintained by each process p:
  S_g^p: number of messages sent by p to group g
  R_g^q: sequence number of the latest message p has delivered from process q that was sent to group g
FIFO ordering is reached only under the assumption that groups are non-overlapping
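
A small Python sketch of FIFO ordering with per-sender sequence numbers and a hold-back queue, under the non-overlapping-groups assumption stated above; FifoNode and its methods are illustrative names, and the actual sending is left abstract.

```python
class FifoNode:
    """FIFO-ordered delivery with per-sender sequence numbers: the sender tags
    each message with S_g^p; a receiver delivers messages from each sender in
    sequence-number order, holding back any that arrive early."""

    def __init__(self):
        self.next_send = 0    # S_g^p: number of messages we have multicast to g
        self.expected = {}    # R_g^q: next sequence number expected per sender
        self.holdback = {}    # early messages, keyed by (sender, seq)
        self.delivered = []

    def fo_multicast(self, msg):
        tagged = (self.next_send, msg)   # would be B-multicast to the group
        self.next_send += 1
        return tagged

    def fo_receive(self, sender, seq, msg):
        self.holdback[(sender, seq)] = msg
        nxt = self.expected.get(sender, 0)
        while (sender, nxt) in self.holdback:       # deliver consecutive messages
            self.delivered.append(self.holdback.pop((sender, nxt)))
            nxt += 1
        self.expected[sender] = nxt

node = FifoNode()
node.fo_receive("p", 1, "second")   # held back: message 0 from p not seen yet
node.fo_receive("p", 0, "first")    # releases both, in FIFO order
print(node.delivered)               # ['first', 'second']
```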

Total Ordering (1) If a correct process delivers message m2 before it delivers m1, then any correct process that delivers m1 will deliver m2 before m1 [Figure: timeline of processes 1–3 delivering m2 before m1 at every process] Primitives: TO_multicast, TO_deliver

Total Ordering (2) Implementation: Assign totally ordered identifiers to multicast messages Each process makes the same ordering decision based upon these identifiers Methods for assigning identifiers to messages: Sequencer process Processes collectively agree on the assignment of sequence numbers to messages in a distributed fashion

Total Ordering (3) Sequencer process: maintains a group-specific sequence number Sg
  Initialization: Sg := 0;
  B-deliver(<m, Ident.>) with g = group(m): B-multicast(g, <"order", Ident., Sg>); Sg := Sg + 1;
Algorithm for group member p ∈ g:
  Initialization: Rg := 0;

Total Ordering (4)
TO-multicast(g, m) by p: B-multicast(g ∪ {Sequencer(g)}, <m, Ident.>);   (Ident. is a unique identifier of m)
B-deliver(<m, Ident.>) by p, with g = group(m): place <m, Ident.> in the hold-back queue;
B-deliver(m_order = <"order", Ident., S>) by p, with g = group(m_order):
  Wait until (<m, Ident.> is in the hold-back queue AND S = Rg);
  TO-deliver(m); Rg := S + 1;
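
A minimal Python sketch of the sequencer scheme above, modelling the sequencer and a group member as plain objects and leaving the B-multicast transport abstract; Sequencer, Member and their methods are hypothetical names.

```python
class Sequencer:
    """Assigns a global sequence number to each message identifier and
    announces it (in the protocol, via B-multicast of an 'order' message)."""

    def __init__(self):
        self.next_seq = 0                 # S_g

    def order(self, ident):
        seq = self.next_seq
        self.next_seq += 1
        return ident, seq                 # contents of the 'order' announcement

class Member:
    """Holds messages back until the 'order' announcement carrying the next
    expected sequence number can be applied."""

    def __init__(self):
        self.next_deliver = 0             # R_g
        self.holdback = {}                # ident -> message body
        self.orders = {}                  # seq -> ident
        self.delivered = []

    def on_message(self, ident, msg):
        self.holdback[ident] = msg
        self._drain()

    def on_order(self, ident, seq):
        self.orders[seq] = ident
        self._drain()

    def _drain(self):
        while (self.next_deliver in self.orders
               and self.orders[self.next_deliver] in self.holdback):
            ident = self.orders.pop(self.next_deliver)
            self.delivered.append(self.holdback.pop(ident))   # TO-deliver(m)
            self.next_deliver += 1

seq, member = Sequencer(), Member()
member.on_message("id1", "hello")
member.on_order(*seq.order("id1"))
print(member.delivered)               # ['hello']
```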

Total Ordering (5) Processes collectively agree on the assignment of sequence numbers to messages in a distributed fashion Variables maintained by each process q:
  P_g^q: largest sequence number proposed by q to group g
  A_g^q: largest agreed sequence number q has observed so far for group g

Total Ordering (6) [Figure: p1 multicasts <m, Ident.> to p2…p5; each receiver pi proposes a sequence number P_g^pi := MAX(A_g^pi, P_g^pi) + 1 and returns <Ident., P_g^pi> to the sender; the sender picks SN = MAX_i(P_g^pi), multicasts <Ident., SN>, and each process records SN as the agreed sequence number of the message.]
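
A toy Python sketch of the distributed assignment of sequence numbers sketched in the figure, assuming the sender can synchronously collect all proposals; GroupMember, propose, learn_agreed and to_multicast are illustrative names.

```python
class GroupMember:
    """Each member proposes max(A_g, P_g) + 1 for a new message; the sender
    collects the proposals, picks the maximum as the agreed number SN, and
    announces it; members then record A_g := max(A_g, SN)."""

    def __init__(self):
        self.proposed = 0    # P_g: largest sequence number this member proposed
        self.agreed = 0      # A_g: largest agreed sequence number seen so far

    def propose(self, ident):
        self.proposed = max(self.agreed, self.proposed) + 1
        return self.proposed

    def learn_agreed(self, ident, sn):
        self.agreed = max(self.agreed, sn)   # message is reordered to position sn

def to_multicast(ident, members):
    """Sender side: gather proposals and announce the maximum as the final number."""
    sn = max(member.propose(ident) for member in members)
    for member in members:
        member.learn_agreed(ident, sn)
    return sn

group = [GroupMember() for _ in range(5)]
print(to_multicast("m1", group))   # 1
print(to_multicast("m2", group))   # 2
```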

Causal Ordering (1) If multicast(g, m1) → multicast(g, m3), then any correct process that delivers m3 will deliver m1 before m3 [Figure: timeline of processes 1–3 in which m1 causally precedes m3, so m1 is delivered before m3 at every process]

Causal Ordering (2) Primitives: CO_multicast, CO_deliver Each process pi of group g maintains a timestamp vector V_i^g V_i^g[j] = number of multicast messages received from pj that happened-before the next message to be sent Algorithm for group member pi:
  Initialization: V_i^g[j] := 0 (j = 1, …, N);

Causal Ordering (3)
CO-multicast(g, m): V_i^g[i] := V_i^g[i] + 1; B-multicast(g, <m, V_i^g>);
B-deliver(<m, V_j^g>) from pj, with g = group(m):
  Place <m, V_j^g> in a hold-back queue;
  Wait until (V_j^g[j] = V_i^g[j] + 1) AND (V_j^g[k] ≤ V_i^g[k] for all k ≠ j);
  CO-deliver(m); V_i^g[j] := V_i^g[j] + 1;
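
A small Python sketch of causally ordered delivery with vector timestamps, following the hold-back test above; CausalNode and its methods are hypothetical names, the transport is left abstract, and self-delivery at the sender is not modelled.

```python
class CausalNode:
    """Causally ordered delivery with vector timestamps: a message from p_j is
    delivered only when it is the next one expected from p_j and every message
    it causally depends on has already been delivered here."""

    def __init__(self, index, n):
        self.i = index
        self.vector = [0] * n       # V_i
        self.holdback = []          # (sender, timestamp, message)
        self.delivered = []

    def co_multicast(self, msg):
        self.vector[self.i] += 1
        return (self.i, list(self.vector), msg)   # would be B-multicast to g

    def co_receive(self, sender, ts, msg):
        self.holdback.append((sender, ts, msg))
        self._drain()

    def _deliverable(self, sender, ts):
        return (ts[sender] == self.vector[sender] + 1 and
                all(ts[k] <= self.vector[k] for k in range(len(ts)) if k != sender))

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.holdback):
                sender, ts, msg = entry
                if self._deliverable(sender, ts):
                    self.holdback.remove(entry)
                    self.delivered.append(msg)    # CO-deliver(m)
                    self.vector[sender] += 1
                    progress = True

p0, p1 = CausalNode(0, 3), CausalNode(1, 3)
msg1 = p0.co_multicast("m1")     # (sender, timestamp, body)
msg2 = p0.co_multicast("m2")     # causally after m1
p1.co_receive(*msg2)             # arrives early: held back
p1.co_receive(*msg1)             # releases m1, then m2
print(p1.delivered)              # ['m1', 'm2']
```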

Outline Introduction Distributed Mutual Exclusion Election Algorithms Group Communication Consensus and Related Problems

Consensus Introduction Reaching agreement in a distributed manner Mutual exclusion: who may enter the critical region Totally ordered multicast: the order of message delivery Byzantine generals: attack or retreat? Consensus problem: agree on a value after one or more of the processes has proposed what the value should be

Consensus (1) Objective: processes must agree on a value after one or more of the processes has proposed what that value should be Hypotheses: reliable communication, but processes may fail Consensus problem: every process Pi begins in the undecided state and proposes a value Vi ∈ D (i = 1, …, N) Processes communicate with one another, exchanging values Each process then sets the value of a decision variable di and enters the state decided, in which it may no longer change di (i = 1, …, N)

Consensus (2) [Figure: P1, P2 and P3 run the consensus algorithm with V1 := proceed, V2 := proceed, V3 := abort; P3 crashes, and the remaining correct processes decide d1 := proceed and d2 := proceed.]

Consensus (3) Properties to satisfy: Termination: eventually each correct process sets its decision variable Agreement: the decision value of all correct processes is the same: if Pi and Pj are correct then di = dj (i, j = 1, …, N) Integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value

Consensus (4) Consensus in a synchronous system: use of basic multicast; at most f processes may crash; f+1 rounds are necessary; the delay of one round is bounded by a timeout Values_i^r: set of proposed values known to process pi at the beginning of round r
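
A toy round-by-round Python simulation of this f+1-round exchange, assuming crashed processes simply stop sending and that each correct process decides on the minimum value it knows; synchronous_consensus and its parameters are illustrative only.

```python
def synchronous_consensus(proposals, crashed=(), crash_round=1):
    """For f + 1 rounds every live process multicasts the values it has not
    sent before; afterwards each correct process decides (here: the minimum
    value it knows).  `proposals` maps process id -> proposed value; processes
    in `crashed` send nothing from `crash_round` onwards."""
    f = len(crashed)
    values = {p: {v} for p, v in proposals.items()}   # Values_i^r
    sent = {p: set() for p in proposals}
    for rnd in range(1, f + 2):                       # f + 1 rounds
        broadcasts = []
        for p in proposals:
            if p in crashed and rnd >= crash_round:
                continue                              # crashed: stops sending
            new = values[p] - sent[p]
            sent[p] |= new
            broadcasts.append(new)
        for p in proposals:
            if p not in crashed:
                for new in broadcasts:
                    values[p] |= new
    return {p: min(values[p]) for p in proposals if p not in crashed}

print(synchronous_consensus({1: 5, 2: 3, 3: 7}, crashed=(3,)))   # {1: 3, 2: 3}
```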

Consensus (5) Interactive consistency problem: variant of the consensus problem Objective: correct processes must agree on a vector of values, one for each process Properties to satisfy: Termination: eventually each correct process sets its decision variable Agreement: the decision vector of all correct processes is the same Integrity: if Pi is correct, then all correct processes decide on Vi as the ith component of their vector

Consensus (6) Byzantine generals problem: variant of the consensus problem Objective: a distinguished process supplies a value that the others must agree upon Properties to satisfy: Termination: eventually each correct process sets its decision variable Agreement: the decision value of all correct processes is the same: if Pi and Pj are correct then di = dj (i, j = 1, …, N) Integrity: if the commander is correct, then all correct processes decide on the value that the commander proposed

Consensus (7) Byzantine agreement in a synchronous system Example: a system composed of three processes that must agree on a binary value (0 or 1) [Figure: Scenario 1: process j is faulty; Scenario 2: the commander is faulty. In both scenarios node i receives the same conflicting information and cannot tell which sender is lying.] The number of faulty processes must be bounded

Consensus (8) For m faulty processes, n ≥ 3m+1, where n denotes the total number of processes Interactive Consistency Algorithm: ICA(m), m > 0, where m denotes the maximal number of processes that may fail simultaneously Sender: all nodes must agree upon its value Receivers: all other processes If a process doesn't send a message, the receiving process uses a default value The ICA algorithm requires m+1 rounds to achieve consensus

Consensus (9) Interactive Consistency Algorithm:
Algorithm ICA(m), m > 0:
  1. The sender sends its value to all the other n–1 processes
  2. Let Vi be the value received by process i from the sender, or the default value if no message is received; process i then considers itself a sender in ICA(m–1) and sends the value Vi to the n–2 other processes
  3. For each i, let Vj be the value received by process i from process j (j ≠ i) in step 2; process i uses the value Choice(V1, …, Vn)
Algorithm ICA(0):
  1. The sender sends its value to all the other n–1 processes
  2. Each process uses the value received from the sender, or the default value if no message is received

References Dr. Mourad Elhadef's presentation Coulouris, G., et al. – Distributed Systems: Concepts and Design – Pearson, 2001 Other presentations Wikipedia: www.wikipedia.com

THANK YOU!