1
Chapter 11: Coordination
Election Algorithms
Distributed Mutual Exclusion
Consensus
Group Communication
2
Leader Election In many distributed algorithms, one node acts as a coordinator or leader. It often doesn’t matter which node performs this function. After a network partition, the leader-less partition must elect a leader.
3
Election Algorithms
Election algorithms are the way nodes in a distributed system (DS) elect a new coordinator when the old one has failed or been cut off from the network. In the following algorithms, each processor (node) has a unique ID, and the surviving processor with the highest (or, by convention, the lowest) ID ends up as the coordinator. Assumption: communications are reliable (messages are not dropped or corrupted).
4
Election Algorithms - Bully(1)
(Garcia-Molina) “Node with highest ID bullies his way into leadership.”
Assumption: a live processor will respond to a message within a predetermined amount of time, including delivery time [synchronous].
Each node can be in one of four states:
- Down: just recovered, unaware of who the coordinator is
- Election: in the process of electing a leader
- Reorganization: election complete, tasks being assigned
- Normal
5
The Bully Algorithm(2) Each node in Reorganization or Normal mode keeps the ID of the current coordinator. Each node knows the ID of all the other nodes in the DS. Any node, say P1, that notices that the current coordinator is not responding can initiate an election.
6
The Bully Algorithm(3)
P1 sends an election message with his ID to all higher-numbered nodes. If a node gets an election message from a lower-numbered node, it responds “OK”, which means “I will take over from here”, and starts his own election. If no higher-numbered node responds within the predetermined time, P1 sends messages to all lower-numbered nodes declaring himself the new coordinator. They respond with an ack so that the new leader knows what the group membership is.
7
The Bully Algorithm (4)
The bully election algorithm (figure): Process 4 holds an election. Processes 5 and 6 respond, telling 4 to stop. Now 5 and 6 each hold an election.
8
The Bully Algorithm (5)
(Figure, continued): Process 6 tells 5 to stop. Process 6 wins and tells everyone.
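The run in the figure can be replayed with a small, single-threaded simulation. This is a sketch under stated assumptions, not the protocol itself. The names `bully_election`, `ids`, and `alive` are hypothetical. A crashed node is modelled as one that never answers, which stands in for the timeout. Elections that would run concurrently are processed one at a time. This does not change the message count, because each node holds at most one election. The slides do not give the group size, so eight processes numbered 0 to 7 with process 7 crashed is also an assumption.

```python
from collections import deque

# Sketch only: hypothetical names; reliable delivery; a dead node never answers.
def bully_election(ids, alive, initiator):
    """Simulate one Bully election and return (leader, messages_sent)."""
    messages = 0
    held = set()                      # nodes that already ran their own election
    pending = deque([initiator])
    leader = None
    while pending:
        p = pending.popleft()
        if p in held:
            continue
        held.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)       # election message to every higher-numbered node
        answered = [q for q in higher if q in alive]
        messages += len(answered)     # each live higher node replies "OK" ...
        pending.extend(answered)      # ... and starts its own election
        if not answered:              # nobody higher is alive: p is the coordinator
            leader = p
            lower = [q for q in ids if q < p]
            messages += len(lower)    # "I am coordinator" to every lower node
            messages += len([q for q in lower if q in alive])  # acks from live nodes
    return leader, messages
```

With IDs 0 to 7 and process 7 down, `bully_election(list(range(8)), set(range(7)), 4)` returns `(6, 21)`. That total breaks down as follows:
- process 4's election: 3 messages plus 2 OKs;
- process 5's election: 2 messages plus 1 OK;
- process 6's election: 1 message;
- the result: 6 announcements and 6 acks.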
9
The Bully Algorithm(6)
In a network of N nodes, suppose the leader (node N) fails and node N-1 starts an election. He sends one message to the old leader (to give him one last chance to respond) and N-2 messages declaring himself leader. He gets N-2 acks back. So the cost is 1 + 2(N-2) = 2N-3, about 2N messages.
Now suppose the lowest-numbered node is the first to notice that the leader failed. He initiates an election (N-1 messages), with the result that everyone responds (N-1 messages). They all begin elections and get responses, at most $\sum_{i=1}^{N} 2i = N(N+1)$ messages. The highest-numbered live node then sends everyone a message declaring himself leader and gets acks (about 2N more). So the number of messages could be as high as $N(N+1) + 2N = N^2 + 3N$, which is $O(N^2)$.
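The estimates above can be cross-checked with an exact count under a simplified model. The model is an assumption; the slides do not state it:
- nodes are numbered 1 to N, and node N has failed;
- every live node holds exactly one election;
- messages sent to the failed node are counted;
- every live node replies OK to each election message it receives from a lower-numbered node.

```latex
\begin{aligned}
\text{Best case (node } N-1 \text{ starts):}\quad
  & 1 + (N-2) + (N-2) = 2N - 3 \\
\text{Worst case (node } 1 \text{ starts):}\quad
  & \underbrace{\sum_{i=1}^{N-1} (N-i)}_{\text{election msgs}}
  + \underbrace{\sum_{i=1}^{N-1} (N-1-i)}_{\text{OK replies}}
  + \underbrace{2(N-2)}_{\text{announce + ack}} \\
  & = \frac{N(N-1)}{2} + \frac{(N-1)(N-2)}{2} + 2N - 4
    = (N-1)^2 + 2N - 4 = N^2 - 3
\end{aligned}
```

Both results stay within the slide's bounds: $2N - 3 \approx 2N$ in the best case, and $O(N^2)$ in the worst case.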
10
A Ring Election Algorithm
Nodes are physically or logically organized in a ring. Nodes might not know the number of nodes in the ring and what all of the IDs are; they only know their own ID and how to find their clockwise and counterclockwise neighbors. The ring can be unidirectional, in which case, they only need to be able to send messages to their clockwise neighbor. Communication can be asynchronous, but failures are not tolerated.
11
A Ring Election Algorithm
Node states are: Normal, Election, Leader. Any node that notices that the leader is not functioning changes his state to Election, creates an election message containing his ID, and sends it to his clockwise neighbor.
12
A Ring Election Algorithm
When a node receives an election message:
- If his ID is higher than the ID in the message, he replaces the ID with his own, changes his state to Election, and sends the message to his clockwise neighbor.
- If his ID is lower than the message ID, he changes his state to Election and relays the message to his clockwise neighbor.
- If his ID is the same as the ID in the message, he changes his state to Leader and sends an “I am leader” message to his clockwise neighbor.
13
A Ring Election Algorithm
When a node receives an “I am leader” message, he records the ID of the new leader, changes his state to Normal, and forwards the message (unless he is the leader). A variation carries all IDs in the election token.
(Figure: election token carrying IDs 0, 3, 4, 5.)
14
A Ring Algorithm
(Figure: Election algorithm using a ring.)
15
The Ring Algorithm: Complexity
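In the best case, only one node starts an election message and that originator has the highest ID. The election message makes one trip around the ring (N messages) and the “I am leader” message makes another (N messages), for 2N in total.
If messages go clockwise and the originator's counterclockwise neighbor has the highest ID, the number of messages is 3N-1. The election message needs N-1 hops to reach that neighbor, then N more to return to it, then N messages for the announcement.
In the worst case, all N nodes start an election message, resulting in $O(N^2)$ messages.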
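The basic ring rules fit in a short sketch. It assumes a single initiator, distinct IDs listed in clockwise order, and no failures during the election. `ring_election` is a hypothetical name. The returned count includes both the election message and the “I am leader” message.

```python
# Sketch only: hypothetical name; one initiator; no failures during the election.
def ring_election(ring, initiator):
    """ring: distinct IDs in clockwise order; initiator: index of the node that notices.

    Returns (leader, messages), counting election and "I am leader" messages.
    """
    n = len(ring)
    messages = 0
    pos, token = initiator, ring[initiator]   # initiator puts his own ID in the message
    while True:
        pos = (pos + 1) % n                   # send to the clockwise neighbor
        messages += 1
        if ring[pos] == token:                # own ID came back: this node is Leader
            leader = ring[pos]
            break
        if ring[pos] > token:                 # higher ID replaces the one in the message
            token = ring[pos]
        # lower ID: relay the message unchanged
    messages += n                             # "I am leader" goes once around the ring
    return leader, messages
```

`ring_election([5, 1, 4, 0, 3, 2], 0)` gives `(5, 12)`, the 2N best case. `ring_election([1, 4, 0, 3, 2, 5], 0)` gives `(5, 17)`: the originator's counterclockwise neighbor holds the highest ID, so the count is 3N - 1.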
16
Ring Algorithm
The algorithm can be modified to eliminate duplicate election messages. For example, suppose a node in Election state sees another election message with the same information in less time than it takes a message to traverse the ring. Then it does not forward it. Timing must be watched closely, since the highest-ID node could have died in the meantime, in which case this is a new election.
17
The Ring Algorithm: Limitations
We must assume:
- The ring is reformed after the failure of the leader (which brings about the election), and
- There are no failures during the election process, or, if there is a failure during the election:
  - it is quickly detected,
  - the election is called off,
  - the ring is reformed, and
  - the election is restarted from the beginning.
18
The Ring Algorithm: Modifications
LCR Ring Election (Le Lann, Chang and Roberts)
Assumptions: the ring size can be unknown, and communication is unidirectional.
Each node sends a message with its ID around the ring. When a process receives an incoming message, it compares the ID with its own:
- if the incoming ID is greater than its own, it passes the message to the next node;
- if it is less than its own, it discards it;
- if it is equal to its own, it declares itself leader.
19
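LCR Ring Election: What is the message complexity?
It depends on how the IDs are arranged relative to the direction of travel. In the figure's ring, if messages are passed clockwise, only one survives after the first round. If messages are passed counterclockwise, many survive for several hops.
In general, the best case is O(N): every message except the one carrying the highest ID is discarded after one hop. The worst case is $O(N^2)$: IDs decrease along the direction of travel, so the message carrying the k-th smallest ID travels k hops before it is discarded.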
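Here is a sketch of LCR that counts election messages; the leader announcement is left out. It assumes all nodes start at once, with distinct IDs listed in the direction messages travel. A message's fate depends only on the IDs it meets, so each message can be traced on its own. `lcr_election` is a hypothetical name.

```python
# Sketch only: hypothetical name; all nodes start at once; announcement not counted.
def lcr_election(ring):
    """ring: distinct IDs in the direction messages travel. Returns (leader, messages)."""
    n = len(ring)
    messages = 0
    leader = None
    for start, uid in enumerate(ring):
        pos = start
        while True:
            pos = (pos + 1) % n
            messages += 1            # uid is sent to the next node
            if ring[pos] == uid:     # came all the way round: uid is the maximum
                leader = uid
                break
            if ring[pos] > uid:      # a larger ID discards the message
                break
            # a smaller ID relays it onward
    return leader, messages
```

With IDs increasing along the direction of travel, `lcr_election(list(range(8)))` returns `(7, 15)`, that is, 2N - 1 messages. With IDs decreasing, `lcr_election(list(range(7, -1, -1)))` returns `(7, 36)`, that is, N(N+1)/2.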
20
HS Ring Election (1): Hirschberg-Sinclair Algorithm (ring modification)
$O(N^2)$ is a lot of messages. Here is a modification that uses $O(N \log N)$ messages.
Assumptions:
- the ring size can be unknown;
- communication must be bidirectional;
- all nodes start more or less at the same time.
Each node operates in phases and sends out tokens. In addition to the ID of the sender, the tokens carry hop counts and direction flags.
(Figure: node 3 sends one token carrying “ID=3, 2 hops, clockwise” and another carrying “ID=3, 2 hops, counterclockwise”.)
21
HS Ring Election (2)
Phases are numbered 0, 1, 2, 3, … In each phase k, node j sends out tokens $u_j$ containing its ID in both directions. The tokens travel up to $2^k$ hops outward and then return to their origin j. If both tokens make it back, process j continues with the next phase (increments k). If either token does not make it back, process j simply waits to be told the result of the election.
22
HS Ring Election (3)
Suppose a process m receives a token $u_j$ going in the outbound direction. It compares the token's ID with its own:
- If it has a larger ID, it simply discards the token.
- If it has a smaller ID, it relays the token as requested, turning it around once the token has traveled its $2^k$ hops.
- If its ID equals the token's ID, it has received its own token in the outbound direction. The token has gone clear around the ring, so the process declares itself leader.
All processes always relay inbound tokens.
23
HS Ring Election (4)
Communication complexity: In the first phase (k = 0), every process sends out 2 tokens, and each goes one hop and returns. This is a total of at most 4N messages for the tokens to go out and return.
In phase k, where k > 0, a node sends out tokens only if it was not overruled in the previous phase by a process within a distance of $2^{k-1}$ in either direction. This implies that within any group of $2^{k-1}+1$ consecutive nodes, at most one goes on to send out tokens in phase k.
Each such node sends at most $4 \cdot 2^k$ messages: two tokens, each going at most $2^k$ hops out and $2^k$ back. So phase k costs at most $4 \cdot 2^k \cdot N/(2^{k-1}+1) < 8N$ messages. A leader is elected by phase $\lceil \log_2 N \rceil$, which limits the message complexity to $O(N \log N)$. (See Distributed Algorithms, by Nancy Lynch.)
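The phase structure can be simulated in the same style. This sketch makes several assumptions:
- IDs are distinct and the ring is bidirectional;
- all nodes start together and move through phases in lockstep;
- only token hops are counted, not the leader announcement.

`hs_election`, `send_token`, and `hs_message_bound` are hypothetical names. The bound function encodes the phase argument above: at most 4N messages in phase 0, fewer than 8N in each later phase, and a leader by phase ⌈log2 N⌉.

```python
# Sketch only: hypothetical names; lockstep phases; announcement not counted.
def send_token(ring, start, step, hops):
    """Trace one token from position `start` in direction `step` (+1 or -1).

    Returns (outcome, messages): "leader" if the token reaches its sender while
    still outbound, "lost" if a larger ID discards it, and "back" if it
    completes `hops` outbound hops and is relayed home.
    """
    n, uid, pos = len(ring), ring[start], start
    for hop in range(1, hops + 1):
        pos = (pos + step) % n
        if ring[pos] == uid:        # own token seen outbound: it went all the way round
            return "leader", hop
        if ring[pos] > uid:         # a larger ID discards the outbound token
            return "lost", hop
    return "back", 2 * hops         # relayed out for `hops` hops, then relayed back


def hs_election(ring):
    """Phase-by-phase Hirschberg-Sinclair on distinct IDs; returns (leader, messages)."""
    active, messages, phase = list(range(len(ring))), 0, 0
    while True:
        survivors, leader = [], None
        for p in active:
            outcomes = []
            for step in (+1, -1):                   # clockwise and counterclockwise
                outcome, cost = send_token(ring, p, step, 2 ** phase)
                messages += cost
                outcomes.append(outcome)
            if "leader" in outcomes:
                leader = ring[p]
            elif outcomes == ["back", "back"]:      # both tokens made it back
                survivors.append(p)
        if leader is not None:
            return leader, messages
        active, phase = survivors, phase + 1


def hs_message_bound(n):
    """8N(1 + ceil(log2 N)): 4N for phase 0 plus fewer than 8N per later phase."""
    return 8 * n * (1 + (n - 1).bit_length())
```

Take 256 nodes in decreasing order. The simulated count stays below the bound 8 · 256 · 9 = 18432. By contrast, a scheme that costs N(N+1)/2 messages on that ring would need 32896.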
24
Election Summary
Bully:
- Must know the IDs of all other nodes
- Message transit and response times are bounded
- Any node can initiate an election
- Best: 2N messages; worst: $N^2$
Ring (basic):
- Logical ring; nodes only know their neighbors
- If there is a failure, the ring must be reformed and then the election restarted
- Election messages travel around the ring to find the highest ID
25
Election Summary
LCR ring: like the basic ring, but
- Many nodes start the election at the same time
- An election message with a lower ID is discarded
- Best: $O(N)$; worst: $O(N^2)$
HS ring: improves the worst case
- Communication must be bidirectional
- Works in phases; communication is nearly synchronous
- Worst case is $O(N \log N)$
26
Mutual Exclusion in DS
Mutual exclusion is needed to restrict access to a shared resource. On a centralized system, we use semaphores, monitors, and similar constructs to enforce mutual exclusion. We need the same capabilities in a DS. As in the one-processor case, we are interested in safety (mutual exclusion), progress, and bounded waiting (fairness).
27
Mutual Exclusion
Centralized algorithm: The easiest way is to mimic the uniprocessor solution. Have one node be the coordinator. Any process that wants to access the resource has to get permission from the coordinator. The coordinator can use semaphores and other structures to ensure mutual exclusion, progress, and fairness.
This is an easy solution that we know how to implement, and it works well if the resource is closely associated with the coordinator. But the coordinator is a single point of failure: if the coordinator fails, processes may wait forever for permission.
28
Mutual Exclusion: A Centralized Algorithm
(Figure) Process 1 asks the coordinator for permission to enter a critical region, and permission is granted. Process 2 then asks permission to enter the same critical region, and the coordinator does not reply. When process 1 exits the critical region, it tells the coordinator, who then replies to 2.
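Here is a minimal coordinator for one critical region, assuming a FIFO grant policy and reliable messages. `Coordinator`, `on_request`, and `on_release` are hypothetical names. Returning `None` from `on_request` models the coordinator not replying, so the requester stays blocked.

```python
from collections import deque

# Sketch only: hypothetical names; one critical region; FIFO fairness.
class Coordinator:
    def __init__(self):
        self.holder = None            # process currently in the critical region
        self.waiting = deque()        # blocked requesters, oldest first

    def on_request(self, pid):
        """Return pid if permission is granted now, or None (no reply yet)."""
        if self.holder is None:
            self.holder = pid
            return pid
        self.waiting.append(pid)
        return None

    def on_release(self, pid):
        """Holder leaves the CR; return the next process to grant, or None."""
        if pid != self.holder:
            raise ValueError("only the process in the critical region may release it")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Replaying the figure: process 1's request is granted, process 2's request gets no reply, and process 1's release hands the grant to process 2. Each entry/exit costs three messages: request, grant, and release.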
29
Ricart and Agrawala Timestamp algorithm (1)
Assumptions: there is a total ordering of all events in the system (Lamport timestamps, with ties broken by process ID, provide this). Communications are reliable. If there is more than one resource to be shared, each process must maintain a queue for each critical region or resource.
30
Ricart and Agrawala (2)
When a process wants to enter the critical region (CR) or obtain a resource, it sends a message to all other processes. The message carries its ID and a totally ordered Lamport timestamp (t, pid). The process can enter the CR when it has received an “OK” message from every other process. When it is done with the CR, it sends an “OK” message to every process on its wait queue and removes them from the queue.
31
Ricart and Agrawala (3) When a process, P1, receives a request for the resource from process, P2: If P1 is not in the CR and does not want the CR, it sends back an “OK” message. If P1 is currently in the CR, it does not reply, but queues P2’s request. If P1 wants to enter the CR but has not yet received all the permissions, it compares the timestamp in P2’s message with the one in the message that P1 sent out to request the CR. The lowest timestamp wins. If TS(P1) < TS(P2), then P2’s message is put on the queue. If TS(P1) > TS(P2), then P1 sends P2 an “OK” message.
32
Ricart and Agrawala (4) Two processes, 0 and 2, want to enter the same critical region at the same moment. Process 0 has the lowest timestamp, so it wins. When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
33
Ricart and Agrawala (5)
Message complexity: 2(N-1) per entry, namely N-1 requests and N-1 “OK” replies.
This simple algorithm illustrates symmetric information, i.e., every process has the same information. The algorithm ensures:
- mutual exclusion (no two requests can both have the lowest timestamp)
- progress (some request always has the lowest timestamp)
- bounded waiting
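Here is a single-threaded sketch of Ricart and Agrawala, assuming reliable FIFO delivery and a fixed group of N processes. `RAProcess`, `deliver_all`, and the message-tuple layout are hypothetical. Requests are compared as (t, pid) tuples, so equal Lamport times are broken by process ID.

```python
from collections import deque

# Sketch only: hypothetical names; messages are (src, dst, kind, timestamp).
class RAProcess:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0                 # Lamport clock
        self.state = "RELEASED"        # RELEASED, WANTED or HELD
        self.request_ts = None         # (clock, pid) of our outstanding request
        self.oks = set()
        self.deferred = []             # requesters we answer only after leaving the CR

    def request(self):
        self.clock += 1
        self.state, self.request_ts, self.oks = "WANTED", (self.clock, self.pid), set()
        return [(self.pid, q, "REQUEST", self.request_ts)
                for q in range(self.n) if q != self.pid]

    def on_request(self, src, ts):
        self.clock = max(self.clock, ts[0]) + 1
        if self.state == "HELD" or (self.state == "WANTED" and self.request_ts < ts):
            self.deferred.append(src)            # we are in the CR or we win: queue it
            return []
        return [(self.pid, src, "OK", None)]     # not interested, or they win

    def on_ok(self, src):
        self.oks.add(src)
        if self.state == "WANTED" and len(self.oks) == self.n - 1:
            self.state = "HELD"                  # everyone agreed: enter the CR

    def release(self):
        self.state = "RELEASED"
        replies = [(self.pid, q, "OK", None) for q in self.deferred]
        self.deferred = []
        return replies


def deliver_all(procs, queue):
    """Deliver messages in FIFO order until none remain; return how many were delivered."""
    count = 0
    while queue:
        src, dst, kind, ts = queue.popleft()
        count += 1
        if kind == "REQUEST":
            queue.extend(procs[dst].on_request(src, ts))
        else:
            procs[dst].on_ok(src)
    return count
```

Now replay the two-process example with N = 3. Processes 0 and 2 request at the same logical time, and (1, 0) < (1, 2), so process 0 enters while 2 waits. After process 0 releases, process 2 enters. Eight messages are delivered in total, 2(N-1) for each of the two entries.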
34
Token Ring for Mutual Exclusion(1)
Assumptions: Processes are ordered in a ring. Communications are reliable and can be limited to one direction. The size of the ring can be unknown, and each process is only required to know his immediate neighbor. A single token circulates around the ring (in one direction only).
35
Token Ring (2)
(Figure: an unordered group of processes on a network; a logical ring constructed in software.)
36
Token Ring (3)
When a process has the token, he can enter the CR at most once; then he must pass the token on. Only the process with the token can enter the CR, so mutual exclusion is ensured. Waiting is bounded because the token circulates. Liveness: as long as the process holding the token does not fail, progress is ensured. Global snapshots can be used if a lost token is suspected.
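Here is a sketch of token circulation, assuming a fault-free ring of n processes indexed 0 to n - 1 and a token that starts at process `start`. `token_ring_entries` is a hypothetical name. For each waiting process, it reports how many token-passing messages were sent before that process entered the CR.

```python
# Sketch only: hypothetical name; no lost tokens or crashed processes.
def token_ring_entries(n, wanting, start=0):
    """Return [(pid, passes_before_entry), ...] in the order processes enter the CR."""
    if any(not 0 <= pid < n for pid in wanting):
        raise ValueError("every waiting process must be on the ring")
    pending = set(wanting)
    holder, passes, entries = start, 0, []
    while pending:
        if holder in pending:          # holder uses the CR once, then must pass the token
            entries.append((holder, passes))
            pending.discard(holder)
        holder = (holder + 1) % n      # pass the token to the next neighbor
        passes += 1
    return entries
```

Take n = 5 with the token at process 0. A lone request from process 4 waits 4 = n - 1 passes, while a request from process 0 itself waits none. This is the "0 to n - 1" delay range.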
37
Comparison
Algorithm | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized | 3 | 2 | Coordinator crash
Distributed | 2(n – 1) | 2(n – 1) | Crash of any process
Token ring | 1 to ∞ | 0 to n – 1 | Lost token, process crash
A comparison of three mutual exclusion algorithms.
38
Voting for Mutual Exclusion
In the Ricart-Agrawala algorithm, a process has to get permission from all other processes. This is overkill: it may take a long time to contact every other node to get permission, and one slow or faulty process can hold everything up. A different approach is to let processes that want to enter the CR compete for votes. If you know you have received more votes than any other process, you can enter the CR. If you don’t have enough votes, you wait until the process in the CR is done and releases his votes.
39
Voting for Mutual Exclusion
Potential problems:
- You must be sure you have more votes than any other process to enter the CR. In a system of 9 nodes, if P1 has 4, P2 has 3, and P3 has 2, P1 has the most votes. But how does he know that without (costly) communication with the other contenders?
- Just having 4 votes is not enough: what if P1 has 4 and P2 has 5?
Potential solution: require a simple majority to win. But 4 is not a majority of 9, so in this example no one can go.
Problem: the processes are deadlocked. There must be a way to resolve this kind of deadlock.
40
Voting: Quorums
Do we need to get a majority of votes, or is there some smaller set of votes that will do? Different nodes could have different voting districts, as long as any two districts have a non-empty intersection. Quorums have exactly this property: any two have a non-empty intersection. Simple majorities are quorums. Any two sets whose sizes are simple majorities must have at least one element in common, because their sizes add up to more than N.
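The intersection property can be checked mechanically. This sketch uses hypothetical helpers, `is_quorum_system` and `majority_quorums`. It enumerates only minimal majorities; larger sets each contain a minimal one, so they intersect as well.

```python
from itertools import combinations

# Sketch only: hypothetical helper names.
def is_quorum_system(quorums):
    """True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


def majority_quorums(n):
    """All minimal simple-majority subsets of nodes 0..n-1."""
    need = n // 2 + 1
    return [set(c) for c in combinations(range(n), need)]
```

Majorities of 5 or 6 nodes pass the check. Sets of 2 out of 5 nodes fail it, for example {0, 1} and {2, 3} share nobody.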
41
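Voting: Quorums
Grid quorum: arrange the nodes in a logical square grid. A quorum is all of one row plus all of one column. Any two such quorums intersect, since the row of one crosses the column of the other. Quorum size is $2\sqrt{N} - 1$.
Finite projective plane (Maekawa): if N = 7, form quorums of 3 nodes each.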
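This sketch builds grid quorums for N = k · k nodes, numbered row by row, and checks them. It also includes a 7-node example in which each quorum is a line of the Fano plane, the finite projective plane of order 2. The function names and this particular Fano labeling are illustrative assumptions. The code shows only the quorums, not how Maekawa's scheme assigns them to nodes.

```python
import math
from itertools import combinations

# Sketch only: hypothetical names; nodes numbered 0..n-1 row by row.
def grid_quorums(n):
    """Quorum for cell (r, c) = all of row r plus all of column c."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes n is a perfect square")
    quorums = []
    for r in range(k):
        for c in range(k):
            row = {r * k + j for j in range(k)}
            col = {i * k + c for i in range(k)}
            quorums.append(row | col)   # |row| + |col| - 1 shared cell = 2k - 1
    return quorums


def pairwise_intersect(quorums):
    return all(a & b for a, b in combinations(quorums, 2))


# N = 7: the seven lines of the Fano plane (one standard labeling); any two share one node.
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
```

For 16 nodes (a 4 × 4 grid), every quorum has 2 · 4 - 1 = 7 nodes. In the 7-node example, every quorum has only 3.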
42
Voting Deadlock: Timestamp Resolution
When a process makes a request, it attaches a Lamport timestamp. Voters prefer candidates with smaller timestamps. Suppose voter V has voted for P1 and then receives a vote request from P2 with an earlier timestamp. V tries to retrieve his vote by sending an INQUIRE message to P1. If P1 has not yet received all the votes he needs, he must relinquish V’s vote, and V then gives his vote to P2. This avoids deadlock. When P1 is finished with the CR, he sends release messages to all his voters, so they can give their votes to new candidates.
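The voter side of timestamp resolution can be written as a small state machine. This is a sketch, assuming requests are (Lamport time, pid) tuples and messages are returned as (kind, request) pairs. The class and method names and the RELINQUISH/KEEP labels are hypothetical; INQUIRE is the slide's message name.

```python
import heapq

# Sketch only: hypothetical names; requests are (lamport_time, pid) tuples.
class Voter:
    def __init__(self):
        self.voted_for = None    # request currently holding this voter's vote
        self.waiting = []        # min-heap of deferred requests, earliest first
        self.inquired = False    # INQUIRE already sent to the current holder?

    def on_request(self, req):
        if self.voted_for is None:
            self.voted_for = req
            return [("VOTE", req)]
        heapq.heappush(self.waiting, req)
        if req < self.voted_for and not self.inquired:
            self.inquired = True                  # try to retrieve the vote
            return [("INQUIRE", self.voted_for)]
        return []

    def on_relinquish(self):
        """Holder had not collected all votes and gave this vote back."""
        heapq.heappush(self.waiting, self.voted_for)
        return self._grant_next()

    def on_release(self):
        """Holder finished its critical region."""
        return self._grant_next()

    def _grant_next(self):
        self.inquired = False
        self.voted_for = heapq.heappop(self.waiting) if self.waiting else None
        return [("VOTE", self.voted_for)] if self.voted_for else []


def on_inquire(votes_held, votes_needed):
    """Candidate's answer to INQUIRE: give the vote back unless it already has them all."""
    return "RELINQUISH" if votes_held < votes_needed else "KEEP"
```

The test replays the slide's scenario:
1. V votes for P1 at time 5.
2. P2 asks at time 3, so V sends INQUIRE to P1.
3. P1 has only 1 of 3 votes, so it relinquishes, and V's vote goes to P2.
4. After P2 releases, the vote returns to P1.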
43
Voting Deadlock: Anti-quorum Resolution
An anti-quorum is any set of nodes that has a non-empty intersection with every quorum. A voter votes YES to one process and NO to the other processes seeking the same resource. When a process gets a quorum of YES votes, he proceeds to the CR. When he gets an anti-quorum of NO votes, he knows he will not get enough YES votes, since every quorum contains at least one voter who said NO. So he “withdraws his candidacy” and releases his votes. After waiting a specified time, he tries again to gain enough votes.
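With majority quorums, let q = ⌊N/2⌋ + 1 be the quorum size. The smallest anti-quorum then has N - q + 1 nodes: if that many voters say NO, at most q - 1 can still say YES. Here is a sketch of the candidate's decision under that assumption, with hypothetical function names.

```python
# Sketch only: hypothetical names; majority quorums over n voters.
def majority_quorum_size(n):
    return n // 2 + 1


def anti_quorum_size(n):
    """Smallest set that meets every majority quorum."""
    return n - majority_quorum_size(n) + 1


def candidate_decision(yes, no, n):
    if yes >= majority_quorum_size(n):
        return "ENTER"           # quorum of YES votes: proceed to the CR
    if no >= anti_quorum_size(n):
        return "WITHDRAW"        # cannot reach a quorum: release votes, retry later
    return "WAIT"
```

Apply this to the 9-node deadlock from earlier, where P1, P2 and P3 hold 4, 3 and 2 votes. P1 holds 4 YES and 5 NO votes, so it withdraws. P2 and P3 have even more NO votes and withdraw too, which breaks the deadlock before everyone retries.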
44
Mutual Exclusion Summary
Ricart and Agrawala:
- Pro: all nodes participate.
- Con: all nodes must participate.
Token Ring:
- Pro: nodes need to know only their neighbors.
- Con: the token can take a long time to circulate, and it keeps circulating even if no node uses it.
Voting:
- Majority quorum or another quorum system.
- Deadlock resolution: timestamps, or “yes/no” votes with an anti-quorum.