DC6: Chapter 12 Coordination Election Algorithms Distributed Mutual Exclusion Consensus Group Communication.

6-2 Leader Election In many distributed algorithms, one node acts as a coordinator or leader. It often doesn’t matter which node performs this function. After a network partition, the leaderless partition must elect a new leader.

6-3 Election Algorithms An election algorithm is the way the nodes in a DS choose and agree on a new coordinator when the old one has failed or been cut off from the network. In the following algorithms, each processor (node) has a unique ID. The highest (or lowest) surviving processor becomes the coordinator. Assumption: communications are reliable (messages are not dropped or corrupted).

6-4 Election Algorithms - Bully (1) (Garcia-Molina) “The node with the highest ID bullies his way into leadership.” Assumptions: a live processor will respond to a message within a predetermined amount of time, including delivery time [synchronous]. Each node can be in one of four states: –Down (just recovered, unaware of who the coordinator is); –Election (in the process of electing a leader); –Reorganization (a leader has been chosen, but the group is still being reorganized); –Normal.

6-5 The Bully Algorithm(2) Each node in Normal mode keeps the ID of the current coordinator. Each node knows the ID of all the other nodes in the DS. Any node, say P1, that notices that the current coordinator is not responding can initiate an election.

6-6 The Bully Algorithm (3) 1. P1 sends an election message with his ID to all higher numbered nodes. If a node gets an election message from a lower numbered node, it responds “OK”, which means “I will take over from here”, and starts his own election. 2. If no higher numbered node responds, P1 sends messages to all lower numbered nodes declaring himself the new coordinator. They respond with an ack, so that the new leader knows what the group membership is.
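The two steps above can be sketched in a few lines of Python. This is a global-view simulation, not a real message-passing implementation: the function name and the recursion are my own framing, and the "OK, I will take over" reply is modeled simply as each live higher-numbered node running its own election.

```python
def bully_election(initiator, alive):
    """Simulate one Bully election. `alive` is the set of live node IDs;
    the node with the highest live ID should end up as coordinator."""
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator          # nobody outranks us: declare victory
    # Every live higher node answers "OK" and starts its own election;
    # the recursion bottoms out at the highest live ID.
    return max(bully_election(p, alive) for p in higher)

# Node 1 notices the old coordinator (say, node 7) is down and starts an election.
print(bully_election(1, alive={1, 2, 4, 6}))  # 6
```

Note the invariant the sketch checks: whoever initiates, the highest surviving ID always wins.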

6-7 The Bully Algorithm (4) The bully election algorithm: a) Process 4 holds an election. b) Processes 5 and 6 respond, telling 4 to stop. c) Now 5 and 6 each hold an election.

6-8 The Bully Algorithm (5) d) Process 6 tells 5 to stop. e) Process 6 wins and tells everyone.

6-9 The Bully Algorithm (6) In a network of N nodes, if the leader fails and node N-1 starts an election, he will send one message to the old leader (to give him one last chance to respond) and N-2 messages declaring himself leader. He will get N-2 responses, so the cost is about 2N messages. If the lowest numbered node is the first to notice that the leader failed, he initiates an election (N-1 messages), everyone responds (N-1 messages), and they all begin elections and get responses (Σ i=1..N of 2i = N(N+1) messages). The highest numbered node then sends everyone a message declaring himself leader and gets acks (another 2N). So the number of messages can be as high as N(N+1) + 2N = N² + 3N, i.e. O(N²).

6-10 A Ring Election Algorithm Nodes are physically or logically organized in a ring. Nodes might not know the number of nodes in the ring or all of the IDs; each knows only his own ID and how to find his clockwise and counterclockwise neighbors. The ring can be unidirectional, in which case a node only needs to be able to send messages to his clockwise neighbor. Communication can be asynchronous, but failures are not tolerated.

6-11 A Ring Election Algorithm Node states are: Normal, Election, Leader. Any node that notices that the leader is not functioning changes his state to Election, starts an election message containing his ID, and sends it to his clockwise neighbor.

6-12 A Ring Election Algorithm When a node receives an election message: If his ID is higher than the ID in the message, he replaces the ID with his own, changes his state to Election, and sends it to his clockwise neighbor. If his ID is lower than the message ID, he relays it to his clockwise neighbor (changing his state to Election). If his ID is the same as the ID in the message, he changes his state to Leader and sends an “I am leader” message to his clockwise neighbor.

6-13 A Ring Election Algorithm When a node receives an “I am leader” message, he records the ID of the new leader, changes state to Normal, and forwards the message (unless he is the leader). A variation carries all of the IDs in the election token (e.g. a token reading “Election 0,3,4,5”).
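The rules on the last two slides can be condensed into a small sketch. It is a single-initiator simulation (my own framing, not from the slides): the token carries the largest ID seen so far, and the election ends when some node sees its own ID return.

```python
def ring_election(ring, starter):
    """Sketch of the basic ring election. `ring` lists node IDs in
    clockwise order; `starter` is the node that noticed the dead leader."""
    n = len(ring)
    i = ring.index(starter)
    token = starter
    while True:
        i = (i + 1) % n                 # pass token to clockwise neighbor
        node = ring[i]
        if token == node:
            return node                 # own ID came back: this node leads
        token = max(token, node)        # a higher ID replaces the token's ID

print(ring_election([3, 0, 5, 4], starter=3))  # 5
```

The "I am leader" lap that informs everyone is omitted here; it would add one more pass around the ring, giving the 2N best-case message count quoted below.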

6-14 A Ring Algorithm Election algorithm using a ring.

6-15 The Ring Algorithm: Complexity In the best case, only the node with the highest ID starts an election message, so the number of messages is 2N. –If messages go clockwise and the initiator’s counterclockwise neighbor has the highest ID, the number of messages is 3N-1. In the worst case, N nodes start election messages, resulting in O(N²).

6-16 Ring Algorithm Modifications can be made to the algorithm to eliminate duplicate election messages. I.e., if a node in Election mode sees another election message with the same info in less time than it takes to traverse the ring, it doesn’t forward it. Timing must be watched closely, since the highest-ID node could have died and this could be a new election.

6-17 The Ring Algorithm: Limitations We must assume: –The ring is reformed after the failure of the leader (which brings about the election). –There are no failures during the election process; or –If there is a failure during the election: it is quickly detected, the election is called off, the ring is reformed, and the election is restarted from the beginning.

6-18 The Ring Algorithm: Modifications LCR Ring Election Assumptions: the ring size can be unknown; communications are unidirectional. Each node sends a message with its ID around the ring. When a process receives an incoming message, it compares the ID with its own. If the incoming ID is greater than its own, it passes it to the next node; if it is less than its own, it discards it; if it is equal to its own, it declares itself leader.

6-19 LCR Ring Election What is the message complexity? Consider nodes arranged in order around the ring. If messages are passed clockwise (toward increasing IDs), only one survives after the first hop. If messages are passed counterclockwise, each message survives much longer. Best case O(N), worst case O(N²).
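A round-by-round simulation makes the best/worst-case gap concrete. This is a sketch under my own assumptions (synchronous rounds, unique IDs, clockwise message flow); the function and its message counter are not from the slides.

```python
def lcr_election(ring):
    """Sketch of LCR on a unidirectional ring. `ring` lists IDs in
    clockwise order; every node launches its own ID simultaneously.
    Returns (leader, total message count)."""
    n, messages = len(ring), 0
    # in_flight[i] = the ID about to be delivered to node ring[i]
    in_flight = {(i + 1) % n: ring[i] for i in range(n)}
    messages += n                          # every node sent its own ID once
    while True:
        nxt = {}
        for i, incoming in in_flight.items():
            if incoming == ring[i]:
                return ring[i], messages   # own ID came back: leader
            if incoming > ring[i]:
                nxt[(i + 1) % n] = incoming
                messages += 1              # forwarded one more hop
        in_flight = nxt                    # lower IDs were discarded

print(lcr_election([0, 1, 2, 3]))  # (3, 7)   near-best case, O(N)
print(lcr_election([3, 2, 1, 0]))  # (3, 10)  worst-case ordering, O(N^2)
```

With IDs ascending in the direction of travel, all but one message dies after one hop; with IDs descending, the message counts sum to N(N+1)/2.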

6-20 HS Ring Election (1) Hirschberg-Sinclair Algorithm (ring modification) O(N²) is a lot of messages; here is a modification that is O(N log N). Assumptions: the ring size can be unknown; communications must be bidirectional; all nodes start more or less at the same time. Each node operates in phases and sends out tokens. The tokens carry hop-counts and direction flags in addition to the ID of the sender (e.g. “ID=3, 2 hops clockwise”).

6-21 HS Ring Election (2) Phases are numbered 0, 1, 2, 3, … In each phase k, node j sends out tokens containing its ID in both directions. The tokens travel 2^k hops and then return to their origin j. If both tokens make it back, process j continues with the next phase (increments k). If both tokens do not make it back, process j simply waits to be told the result of the election.

6-22 HS Ring Election (3) If a process m receives a token u_j going in the outbound direction, it compares the token’s ID with its own. –If m has a larger ID, it simply discards the token. –If m has a smaller ID, it relays the token as requested. –If m’s ID is equal to the token ID, m has received its own token in the outbound direction, so the token has gone clear around the ring, and the process declares itself leader. All processes always relay inbound tokens.

6-23 HS Ring Election (4) Communications complexity: in the first phase, every process sends out 2 tokens, and they go one hop and return; this is a total of 4N messages. In phase k, where k > 0, a node sends out tokens only if it was not overruled in the previous phase, that is, by a process within a distance of 2^(k-1) in either direction. This implies that within any group of 2^(k-1)+1 consecutive nodes, at most one goes on to send out tokens in phase k. This limits the message complexity to O(N log N). (See Distributed Algorithms, by Nancy Lynch.)
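The phase structure can be sketched from a global viewpoint: a node's phase-k tokens both return exactly when its ID is the maximum within 2^k hops in each direction. This is my own condensation, not a message-passing implementation, and the message counter is only a rough upper bound (real HS tokens can be discarded mid-flight).

```python
def hs_survivors(ring):
    """Global-view sketch of Hirschberg-Sinclair: iterate phases, keeping
    the nodes whose ID beats everything within 2**k hops on each side.
    Returns (leader, rough upper bound on messages)."""
    n = len(ring)
    active = list(range(n))      # indices still sending tokens
    messages, k = 0, 0
    while len(active) > 1:
        d = 2 ** k
        nxt = []
        for i in active:
            messages += 4 * min(d, n)   # two tokens, out and back (bound)
            nbrs = [ring[(i + off) % n] for off in range(-d, d + 1) if off]
            if all(ring[i] > x for x in nbrs):
                nxt.append(i)           # both tokens returned: next phase
        active, k = nxt, k + 1
    return ring[active[0]], messages

print(hs_survivors([3, 1, 4, 0, 5, 2])[0])  # 5
```

At most one node per 2^(k-1)+1 consecutive nodes stays active in phase k, which is exactly why the bound works out to O(N log N).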

6-24 Election Summary Bully –Must know IDs of all other nodes –Message transit and response time bounded –Any node initiates election –Best: 2N Worst: N² Ring – basic –Logical ring, nodes only know neighbors –If failure, ring must reform, then restart election –Election msgs travel around ring to find highest ID.

6-25 Election Summary LCR Ring – like basic ring, but –Many nodes start election at the same time –Election msg with lower ID is discarded –Best: N Worst: N² HS Ring – improves worst case –Comm must be bidirectional –Works in phases – comm nearly synchronous –Worst case is N log N

6-26 Mutual Exclusion in DS Mutual exclusion is needed for restricting access to a shared resource. We use semaphores, monitors, and similar constructs to enforce mutual exclusion on a centralized system; we need the same capabilities in a DS. As in the one-processor case, we are interested in safety (mutual exclusion), progress, and bounded waiting (fairness).

6-27 Mutual Exclusion Centralized algorithm: The easiest way is to mimic the uniprocessor solution. Have one node be the coordinator. Any process that wants to access the resource has to get permission from the coordinator. The coordinator can use semaphores and other structures to ensure mutual exclusion, progress, and fairness. Easy solution, and we know how to implement it. Works well if the resource is closely associated with the coordinator. But the coordinator is a single point of failure – if the coordinator fails, processes may just wait forever for permission.

6-28 Mutual Exclusion: A Centralized Algorithm a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted. b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply. c) When process 1 exits the critical region, it tells the coordinator, who then replies to 2.
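The coordinator's bookkeeping is simple enough to sketch directly. Class and method names below are mine; "does not reply" is modeled by returning "queued" and holding the process on a FIFO wait queue.

```python
from collections import deque

class Coordinator:
    """Sketch of a centralized mutual-exclusion coordinator: one holder
    of the critical region at a time; later requests wait in FIFO order."""
    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "granted"
        self.waiting.append(pid)     # no reply yet; pid blocks
        return "queued"

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder           # next process granted, if any

c = Coordinator()
print(c.request(1))  # granted
print(c.request(2))  # queued
print(c.release(1))  # 2
```

The FIFO queue is what gives the centralized scheme its fairness; the single `Coordinator` object is also, visibly, the single point of failure.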

6-29 Ricart and Agrawala Timestamp Algorithm (1) Assumptions: there is a total ordering of all events in the system (Lamport’s timestamps will provide this); communications are reliable. Each process must maintain a wait queue for each critical region or resource, if there is more than one resource to be shared.

6-30 Ricart and Agrawala (2) When a process wants to enter the Critical Region or obtain a resource, it sends a message with its ID and a total order Lamport timestamp (t,pid) to all processes including itself (multicast). It can proceed to enter the CR when it gets an “OK” message from all other processes. When it is done with the CR, it sends an “OK” message to every process on its wait queue and removes them from the queue.

6-31 Ricart and Agrawala (3) When a process, P1, receives a request for the resource from process, P2: –If P1 is not in the CR and does not want the CR, it sends back an “OK” message. –If P1 is currently in the CR, it does not reply, but queues P2’s request. –If P1 wants to enter the CR but has not yet received all the permissions, it compares the timestamp in P2’s message with the one in the message that P1 sent out to request the CR. The lowest timestamp wins. If TS(P1) < TS(P2), then P2’s message is put on the queue. If TS(P1) > TS(P2), then P1 sends P2 an “OK” message.
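The receiver-side rules above reduce to one decision function. This is a sketch under my own naming (states "idle", "wanted", "held"); timestamps are (Lamport clock, pid) pairs compared lexicographically, which is what makes the ordering total.

```python
def ra_reply(my_state, my_ts, req_ts):
    """Sketch of a Ricart-Agrawala receiver's decision when a request
    with timestamp req_ts arrives. 'defer' means: queue it, reply later."""
    if my_state == "idle":
        return "ok"                      # not competing: grant immediately
    if my_state == "held":
        return "defer"                   # in the CR: queue the request
    # Both want the CR: the lower (clock, pid) timestamp wins.
    return "ok" if req_ts < my_ts else "defer"

# Processes 0 and 2 request concurrently (the scenario on the next slide);
# the clock values 8 and 12 are illustrative.
ts0, ts2 = (8, 0), (12, 2)
print(ra_reply("wanted", ts2, ts0))  # ok    (0 has the lower timestamp)
print(ra_reply("wanted", ts0, ts2))  # defer (2 must wait for 0's OK)
```

Including the pid in the timestamp breaks clock ties, so two requests can never both "win".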

6-32 Ricart and Agrawala (4) a) Two processes, 0 and 2, want to enter the same critical region at the same moment. b) Process 0 has the lowest timestamp, so it wins. c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.

6-33 Ricart and Agrawala (5) Message complexity: 2(N-1). This simple algorithm illustrates symmetric information, i.e., every process has the same information. The algorithm ensures: –mutual exclusion (no two requests share the lowest timestamp, since timestamps are totally ordered) –progress (some request has the lowest timestamp) –bounded waiting

6-34 Token Ring for Mutual Exclusion (1) Assumptions: processes are ordered in a ring; communications are reliable and can be limited to one direction. The size of the ring can be unknown, and each process is only required to know his immediate neighbor. A single token circulates around the ring (in one direction only).

6-35 Token Ring (2) a) An unordered group of processes on a network. b) A logical ring constructed in software.

6-36 Token Ring (3) When a process has the token, he can enter the CR at most once; then he must pass the token on. Only the process with the token can enter the CR, thus mutual exclusion is ensured. Bounded waiting, since the token circulates. Liveness: as long as the process with the token doesn’t fail, progress is ensured. Global snapshots can be used if a lost token is suspected.
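Because the rule is just "enter at most once per token visit, then pass it on", the whole scheme can be sketched as a loop. Function and parameter names are mine; failures and lost tokens are not modeled.

```python
def token_ring(ring, wants, laps=1):
    """Sketch of token-ring mutual exclusion: circulate the token around
    `ring` (a list of process IDs in ring order) for `laps` full turns.
    A process in `wants` enters the CR once per visit, then passes the
    token to its neighbor. Returns the order of CR entries."""
    entries = []
    for _ in range(laps):
        for pid in ring:              # the token visits each process in turn
            if pid in wants:
                entries.append(pid)   # enter CR, exit, then pass token on
    return entries

print(token_ring([0, 2, 4, 5, 7], wants={5, 2}))  # [2, 5]
```

The order of entries is the ring order starting from the token's position, never the request order; and the token keeps circulating (costing messages) even when `wants` is empty, which is the "1 to ∞ messages per entry" row in the comparison table below.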

6-37 Comparison A comparison of three mutual exclusion algorithms:

Algorithm    | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized  | 3                       | 2                                     | Coordinator crash
Distributed  | 2(n – 1)                | 2(n – 1)                              | Crash of any process
Token ring   | 1 to ∞                  | 0 to n – 1                            | Lost token, process crash

6-38 Voting for Mutual Exclusion In the Ricart-Agrawala algorithm, a process has to get permission from all other processes. This is overkill: it may take a long time to contact every other node, and one slow or faulty process can hold everything up. A different approach is to let processes that want to enter the CR compete for votes. If you know you have received more votes than any other process, you can enter the CR. If you don’t have enough votes, you wait until the process in the CR is done and releases his votes.

6-39 Voting for Mutual Exclusion Potential problems: you must be sure you have more votes than any other process before entering the CR. In a system of 9 nodes, if P1 has 4 votes, P2 has 3, and P3 has 2, P1 has the most votes, but how does he know without (costly) communication with the other contenders? Just having 4 votes is not enough: what if P1 has 4 and P2 has 5? Potential solution: require a simple majority to win. But 4 is not a majority of 9, so in this example no one can go. Problem: the processes are deadlocked; there must be a way to resolve this kind of deadlock.

6-40 Voting: Quorums Do we need to get a majority of votes, or is there some smaller set of votes that will do? Different nodes could have different voting districts, as long as any two districts have a non-empty intersection. Quorums have the property that any two have a non-empty intersection. Simple majorities are quorums: any two sets whose sizes are simple majorities must have at least one element in common.

6-41 Voting: Majority Quorum Any two majority quorums have at least one node in common. There are 12 nodes, so a majority is 7.

6-42 Quorums Defined by Physical or Logical Arrangement Grid quorum: arrange the nodes in a logical grid (square). A quorum is all of a row plus all of a column. Quorum size is 2*sqrt(N) – 1. Finite projective plane (Maekawa): if N=7, form coteries of 3.
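The grid construction is easy to verify in code. This sketch (function name and row-major numbering are my own choices) builds a node's row-plus-column quorum; any two such quorums intersect because one quorum's row always crosses the other quorum's column.

```python
def grid_quorum(n_side, node):
    """Sketch of a grid quorum: nodes 0..n_side**2-1 laid out row-major
    in an n_side x n_side grid. A node's quorum is its whole row plus
    its whole column, so its size is 2*n_side - 1."""
    r, c = divmod(node, n_side)
    row = {r * n_side + j for j in range(n_side)}
    col = {i * n_side + c for i in range(n_side)}
    return row | col

q0, q8 = grid_quorum(3, 0), grid_quorum(3, 8)
print(len(q0))     # 5  (= 2*3 - 1, much smaller than a majority as N grows)
print(q0 & q8)     # {2, 6} -- the guaranteed non-empty intersection
```

For N = 9 a majority quorum needs 5 nodes either way, but at N = 100 the grid quorum needs only 19 versus 51, which is the point of the construction.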

6-43 Voting Deadlock: Timestamp Resolution When a process makes a request, it attaches a Lamport timestamp. Voters prefer candidates with the smaller timestamp. If voter V has voted for P1 and then receives a request for his vote from P2 with an earlier timestamp, V will try to retrieve its vote. V retrieves his vote by sending an INQUIRE message to P1. If P1 has not yet received all the needed votes, he must relinquish V’s vote, in which case V now gives his vote to P2. This avoids deadlock. When P1 is finished with the CR, he sends release messages to all his voters, so they can give their votes to new candidates.

6-44 Voting Deadlock: Anti-quorum Resolution An anti-quorum is any set of nodes that has a non-empty intersection with all quorums. A voter votes YES to one process and NO to the other processes seeking the same resource. When a process gets a quorum of YES votes, he proceeds to the CR. When he gets an anti-quorum of NO votes, he knows he will not get enough YES votes, so he “withdraws his candidacy” and releases his votes. After waiting a specified time, he tries again to gain enough votes.

6-45 Mutual Exclusion Summary Ricart and Agrawala –Pro: all nodes participate –Con: all nodes must participate Token Ring –Pro: nodes need to know only their neighbors –Con: token can take a long time to circulate, even if no node uses it Voting –Majority quorum or other –Deadlock resolution: timestamp or “yes/no” vote and anti-quorum