LEADER ELECTION
CS 271
Election Algorithms
Many distributed algorithms need one process to act as coordinator.
  – It does not matter which process does the job; we just need to pick one.
Election algorithms: techniques for picking a unique coordinator (also called leader election).
Types of election algorithms: Bully and Ring algorithms.
Bully Algorithm
Each process has a unique numerical ID.
Each process knows the IDs and addresses of all other processes.
Communication is assumed to be reliable.
Key idea: select the process with the highest ID.
A process initiates an election if it has just recovered from a failure or if the coordinator has failed.
Three message types: Election, OK, I won.
Several processes can initiate elections simultaneously.
  – The result must still be consistent.
Bully Algorithm Details
Any process P can initiate an election.
P sends Election messages to all processes with higher IDs and waits for OK messages.
If no OK message arrives (within a timeout), P becomes coordinator and sends I won to all processes with lower IDs.
If P receives an OK, it drops out and waits for an I won message.
If a process receives an Election message, it returns OK and starts its own election.
If a process receives an I won message, it treats the sender as the coordinator.
Bully Algorithm Example
a) Process 4 holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election.
Bully Algorithm Example (continued)
d) Process 6 tells 5 to stop.
e) Process 6 wins and tells everyone.
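The Details and Example slides can be traced with a small simulation. This is a sketch under stated assumptions, not a reference implementation: messages are reliable and instantaneous, an Election sent to a crashed process is counted but never answered (standing in for the timeout), and each live process starts at most one election. The function name `bully_election` is hypothetical, and so is the setup of IDs 0 to 7 with the old coordinator 7 crashed; it is chosen only because it is consistent with process 6 winning in the example.

```python
from collections import deque
from typing import Optional


def bully_election(all_ids: set[int], alive: set[int], initiator: int) -> tuple[Optional[int], int]:
    """Simulate one bully election; return (leader, total messages sent).

    Hypothetical simplifications: reliable delivery, a message to a crashed
    process is counted but never answered (detected by timeout), and every
    live process runs at most one election.
    """
    messages = 0
    leader: Optional[int] = None
    started: set[int] = set()       # processes that already ran an election
    pending = deque([initiator])    # processes about to start an election
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in all_ids if q > p]
        messages += len(higher)                  # Election to every higher ID
        responders = [q for q in higher if q in alive]
        messages += len(responders)              # one OK from each live higher process
        if responders:
            pending.extend(responders)           # p drops out; responders run elections
        else:
            leader = p                           # no OK before the timeout: p wins
            messages += sum(1 for q in all_ids if q < p)   # I won to all lower IDs
    return leader, messages


# The example above, assuming IDs 0..7 with the old coordinator 7 crashed
print(bully_election(set(range(8)), set(range(7)), initiator=4))   # (6, 15)
```

Calling it with IDs 1..n, only process n crashed and process 1 initiating reproduces the worst case discussed on the Comparison slide.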
Simple Ring-based Election
Processes have unique IDs and are arranged in a logical ring.
Each process knows its neighbors.
Select the process with the highest ID as leader.
A process begins an election if it has just recovered or the coordinator has failed.
It sends an Election message to the closest downstream node that is alive.
  – Sequentially poll each successor until a live node is found.
Each process tags its ID onto the message before forwarding it.
When the message returns, the initiator picks the node with the highest ID and sends a Coordinator message around the ring.
Multiple elections can be in progress at once; this does no harm, since each one finds the same highest ID.
Ring Algorithm Example
[Figure not reproduced]
Ring Algorithm Example (continued)
[Figure not reproduced]
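The ring election can be sketched the same way. Assumptions, all hypothetical: the ring is given as a list of IDs in downstream order, polling a crashed successor is free (only messages actually passed between live processes are counted), and the initiator tags its own ID too. The function `ring_election` and the example ring are illustrative, not taken from the figures.

```python
def ring_election(ring: list[int], alive: set[int], initiator: int) -> tuple[int, int]:
    """Simulate the simple ring election; return (leader, total messages).

    ring lists process IDs in ring order (downstream = next index).
    Hypothetical simplification: polling a crashed successor costs nothing,
    so only messages passed between live processes are counted.
    """
    start = ring.index(initiator)

    def next_alive(pos: int) -> int:
        # Sequentially poll successors until a live process is found
        for step in range(1, len(ring) + 1):
            candidate = (pos + step) % len(ring)
            if ring[candidate] in alive:
                return candidate
        raise RuntimeError("no live process in the ring")

    def circulate() -> tuple[list[int], int]:
        # Pass one message all the way round, back to the initiator
        visited, hops, pos = [], 0, start
        while True:
            pos = next_alive(pos)
            hops += 1
            if pos == start:
                return visited, hops
            visited.append(ring[pos])    # each live process tags its ID

    tagged, election_hops = circulate()        # Election round
    leader = max([initiator] + tagged)         # initiator picks the highest ID
    _, coordinator_hops = circulate()          # Coordinator round
    return leader, election_hops + coordinator_hops


# Eight processes 0..7 in order, old coordinator 7 crashed, process 2 initiates
print(ring_election(list(range(8)), set(range(7)), initiator=2))   # (6, 14)
```

Both rounds visit every live process once, so with n processes of which only the old coordinator has crashed, the count is 2(n-1) regardless of which process initiates.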
Comparison
Assume n processes and one election in progress.
Bully algorithm
  – Worst case: the initiator is the node with the lowest ID.
    It triggers n-2 elections at higher-ranked nodes: O(n^2) messages.
  – Best case: the node with the next-highest ID initiates and wins immediately: n-2 messages.
Ring algorithm
  – 2(n-1) messages, always.
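One way to see where the O(n^2) comes from is to count the bully worst case exactly. This count rests on assumptions: IDs are 1..n with process n being the crashed coordinator (so n-1 processes are alive, a reading that is consistent with the ring figure 2(n-1) and the bully best case n-2), process 1 initiates, each live process starts at most one election, and an Election sent to the crashed coordinator is counted even though it is never answered. Other counting conventions change the constants but not the quadratic order.

```latex
\begin{aligned}
\#\text{Election} &= \sum_{i=1}^{n-1} (n-i) = \frac{n(n-1)}{2} \\
\#\text{OK} &= \sum_{i=1}^{n-1} (n-1-i) = \frac{(n-1)(n-2)}{2} \\
\#\text{I won} &= n-2 \\
\text{Total} &= \frac{n(n-1)}{2} + \frac{(n-1)(n-2)}{2} + (n-2) = (n-1)^2 + (n-2) = n^2 - n - 1 = O(n^2)
\end{aligned}
```

For example, n = 8 gives 28 + 21 + 6 = 55 messages.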
Highlights of Leader Election
Basic idea: each process has a unique process ID. Once the leader is found to have died, elect the process with the highest (or lowest) process ID.
BROADCAST PROTOCOLS
Broadcast Protocols
Why broadcast protocols?
  – Data replication
  – Highly available servers
  – Cluster management
  – Distributed logging
  – ...
A message may be received by a process but delivered to the application only later, to satisfy an ordering requirement.
Ordering properties: FIFO (Cornell)
FIFO or sender-ordered multicast: fbcast
Messages are delivered in the order they were sent (by any single sender).
[Figure: timelines of processes p, q, r, s with messages a and e]
Ordering properties: FIFO (continued)
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
Delivery of c to p is delayed until after b is delivered.
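The delay of c at p is what a per-sender hold-back queue produces. A minimal sketch, assuming each sender numbers its own broadcasts 0, 1, 2, ... and attaches (sender, seq) to each message; the class `FifoReceiver` is hypothetical, and so is the sender name "q", since the figure's sender is not recoverable here.

```python
from collections import defaultdict


class FifoReceiver:
    """Hold-back queue for FIFO (sender-ordered) delivery at one process.

    Assumption: every sender numbers its own broadcasts 0, 1, 2, ... and
    attaches (sender, seq) to each message.
    """

    def __init__(self) -> None:
        self.next_seq: defaultdict[str, int] = defaultdict(int)          # sender -> next expected seq
        self.held: defaultdict[str, dict[int, str]] = defaultdict(dict)  # received, not yet delivered
        self.delivered: list[str] = []   # what the application sees, in order

    def receive(self, sender: str, seq: int, payload: str) -> None:
        self.held[sender][seq] = payload
        # Deliver every message from this sender that is now in sequence
        while self.next_seq[sender] in self.held[sender]:
            self.delivered.append(self.held[sender].pop(self.next_seq[sender]))
            self.next_seq[sender] += 1


p = FifoReceiver()
p.receive("q", 1, "c")   # c overtook b in the network: held back
p.receive("q", 0, "b")   # b arrives: b, then c, are delivered
print(p.delivered)       # ['b', 'c']
```

Each sender has its own counter, so messages from different senders are never held back for each other; that is exactly why FIFO alone cannot fix the mailing-list scenario.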
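Limitations of FIFO Broadcast
Scenario:
  – User A broadcasts a message to a mailing list.
  – B delivers that message.
  – B broadcasts a reply.
  – C delivers B's reply without A's original message and misinterprets it.
FIFO ordering cannot prevent this, because it only orders messages from the same sender, and A's message and B's reply come from different senders.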
Ordering properties: Causal
Causal or happens-before ordering: cbcast
If send(a) → send(b), then deliver(a) occurs before deliver(b) at all common destinations.
[Figure: timelines of processes p, q, r, s with messages a and b]
Ordering properties: Causal (continued)
[Figure: timelines of processes p, q, r, s with messages a, b, c]
Delivery of c to p is delayed until after b is delivered.
Ordering properties: Causal (continued)
[Figure: timelines of processes p, q, r, s with messages a, b, c, e]
Delivery of c to p is delayed until after b is delivered.
e is sent causally after b.
Ordering properties: Causal (continued)
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
Delivery of c to p is delayed until after b is delivered.
Delivery of e to r is delayed until after b and c are delivered.
Limitation of Causal Broadcast
Causal broadcast does not impose any order on unrelated (concurrent) messages.
Two replicas can therefore deliver operations/requests in different orders.
Ordering properties: Total
Total or locally total multicast: atomic bcast
Messages are delivered in the same order to all recipients (including the sender).
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
All processes deliver a, b, c, d, then e.
Simple Causal Broadcast Protocol
Each broadcast message carries all causally preceding messages.
Before delivering a message, ensure causality by first delivering any causally preceding messages that were missed.
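A sketch of this piggybacking scheme, replaying the mailing-list scenario from the FIFO slides. Assumptions: message IDs are globally unique, a process's causal past is taken to be everything it has delivered, and the names `Message`, `Process`, `broadcast` and `receive` are hypothetical. Because each message carries the sender's whole delivery log, the protocol is simple but its messages grow without bound.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    payload: str
    # Every causally preceding message, oldest first (the piggybacked history)
    history: list["Message"] = field(default_factory=list)


class Process:
    def __init__(self) -> None:
        self.log: list[Message] = []   # delivered messages, in delivery order

    def delivered_ids(self) -> list[str]:
        return [m.msg_id for m in self.log]

    def broadcast(self, msg_id: str, payload: str) -> Message:
        # Everything this process has delivered causally precedes the new message
        m = Message(msg_id, payload, history=list(self.log))
        self._deliver(m)               # the sender delivers its own message
        return m

    def receive(self, m: Message) -> None:
        for prev in m.history + [m]:   # missed predecessors first, then m itself
            if prev.msg_id not in self.delivered_ids():
                self._deliver(prev)

    def _deliver(self, m: Message) -> None:
        # Drop the nested history when logging; the log itself is the causal past
        self.log.append(Message(m.msg_id, m.payload))


a, b, c = Process(), Process(), Process()
question = a.broadcast("a1", "question")
b.receive(question)
reply = b.broadcast("b1", "reply")
c.receive(reply)           # reply reaches C first; A's message is delivered before it
c.receive(question)        # the late original is recognised as already delivered
print(c.delivered_ids())   # ['a1', 'b1']
```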
Isis Causal Broadcast
Each process p maintains a vector timestamp $VT_p$ of size n, initially $VT_p[i] = 0$ for all i.
When p sends a new message m, it increments its own entry: $VT_p[p] \leftarrow VT_p[p] + 1$.
Each message m is piggybacked with $VT_m$, the current vector timestamp of the sender.
When p delivers a message m, it updates its vector: for k = 1..n, $VT_p[k] \leftarrow \max(VT_p[k], VT_m[k])$.
Isis Causal Order
Requirement for delivery at node j, where $VT_{sender}$ is the timestamp $VT_m$ carried by the message:
  – $VT_{sender}[sender] = VT_{receiver}[sender] + 1$: this is the next message from the sender.
  – $VT_{sender}[k] \le VT_{receiver}[k]$ for all $k \ne sender$: the receiver has already received all causally preceding messages.
[Figure: sender and receiver timelines labelled VT_sender and VT_receiver]
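The two Isis slides combine into one short sketch: send increments the sender's own entry, the delivery test checks both conditions, and delivery takes the element-wise maximum. Assumptions: processes are numbered 0..n-1 (the slides count 1..n), messages are never duplicated, the sender delivers its own message immediately, and the names `CausalProcess`, `send` and `receive` are hypothetical.

```python
Msg = tuple[int, list[int], str]   # (sender id, vector timestamp VT_m, payload)


class CausalProcess:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vt = [0] * n              # VT_p, initially all zeros
        self.pending: list[Msg] = []   # received but not yet deliverable
        self.delivered: list[str] = []

    def send(self, payload: str) -> Msg:
        self.vt[self.pid] += 1                  # VT_p[p]++
        self.delivered.append(payload)          # deliver own message locally
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender: int, vt_m: list[int]) -> bool:
        # Next message from sender, and all its causal predecessors delivered
        return vt_m[sender] == self.vt[sender] + 1 and all(
            vt_m[k] <= self.vt[k] for k in range(len(self.vt)) if k != sender
        )

    def receive(self, msg: Msg) -> None:
        self.pending.append(msg)
        progress = True
        while progress:   # one delivery can unblock other held messages
            progress = False
            for m in list(self.pending):
                sender, vt_m, payload = m
                if self._deliverable(sender, vt_m):
                    self.pending.remove(m)
                    self.vt = [max(x, y) for x, y in zip(self.vt, vt_m)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
a = p0.send("a")
p1.receive(a)
b = p1.send("b")   # b causally follows a
p2.receive(b)      # held back: p2 has not delivered a yet
p2.receive(a)      # a is delivered, which unblocks b
print(p2.delivered, p2.vt)   # ['a', 'b'] [1, 1, 0]
```

Unlike the piggybacking protocol, each message carries only n integers; the price is that a receiver must buffer messages until their predecessors arrive.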
Total Order
Different classes of total order broadcast:
  – Fixed sequencer
  – Moving sequencer using a token
  – Distributed agreement using timestamps
Using a Sequencer (Amoeba)
The delivery algorithm is similar to FIFO, except that a special "sequencer" orders the messages.
The sender attaches a unique ID i to each message m and sends m to the sequencer as well as to all destinations.
The sequencer maintains a sequence number S (consecutive and increasing), assigns it to message i, and broadcasts (i, S) to all destinations.
The message with sequence number k is delivered only if all messages with sequence numbers j (0 ≤ j < k) have been received.
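A minimal sketch of the fixed-sequencer scheme, in which network sends are replaced by direct method calls. The names `Sequencer`, `Receiver`, `on_message` and `on_order` are hypothetical, and message IDs are assumed to be unique strings. A receiver needs both the message body (from the sender) and its sequence number (from the sequencer) before it can deliver, and it delivers strictly in sequence-number order.

```python
import itertools


class Sequencer:
    """Fixed sequencer: hands out consecutive sequence numbers 0, 1, 2, ..."""

    def __init__(self) -> None:
        self._next = itertools.count()

    def order(self, msg_id: str) -> tuple[str, int]:
        # In a real system this (i, S) pair would be broadcast to all destinations
        return msg_id, next(self._next)


class Receiver:
    def __init__(self) -> None:
        self.bodies: dict[str, str] = {}    # msg_id -> payload, from the senders
        self.order_of: dict[int, str] = {}  # sequence number -> msg_id, from the sequencer
        self.next_seq = 0
        self.delivered: list[str] = []

    def on_message(self, msg_id: str, payload: str) -> None:
        self.bodies[msg_id] = payload
        self._try_deliver()

    def on_order(self, msg_id: str, seq: int) -> None:
        self.order_of[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self) -> None:
        # Deliver number k only after 0..k-1, and only once both the body
        # and its sequence number have arrived
        while self.next_seq in self.order_of and self.order_of[self.next_seq] in self.bodies:
            msg_id = self.order_of.pop(self.next_seq)
            self.delivered.append(self.bodies.pop(msg_id))
            self.next_seq += 1


seq = Sequencer()
ox, oy = seq.order("x"), seq.order("y")   # x gets 0, y gets 1
r1, r2 = Receiver(), Receiver()
# The two receivers see bodies and sequence numbers in different orders
r1.on_message("x", "X")
r1.on_message("y", "Y")
r1.on_order(*oy)
r1.on_order(*ox)
r2.on_order(*ox)
r2.on_message("y", "Y")
r2.on_order(*oy)
r2.on_message("x", "X")
print(r1.delivered, r2.delivered)   # ['X', 'Y'] ['X', 'Y']
```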
Distributed Total Order Protocol (ISIS)
Processes collectively agree on sequence numbers (priorities) in three rounds:
  – The sender sends the message m to all receivers.
  – Each receiver suggests a priority (sequence number) and replies to the sender with its proposed priority.
  – The sender collects all proposed priorities, decides on the final priority (the largest proposal, breaking ties with process IDs), and resends the agreed final priority for m.
Receivers deliver m according to the agreed final priority.
ISIS Algorithm for Total Ordering
[Figure: group g = {P1, P2, P3, P4}; steps labelled 1 Message, 2 Proposed Seq, 3 Agreed Seq]
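The three rounds can be sketched with two concurrent messages that reach the members in different orders. Assumptions: each member proposes one more than the largest sequence number it has proposed or seen agreed (a commonly described form of the ISIS rule), priorities are (sequence number, process ID) pairs so that ties break by ID, the sender is not itself modelled as a member, and the names `IsisMember`, `propose` and `finalize` are hypothetical.

```python
Priority = tuple[int, int]   # (sequence number, process id); ties broken by id


class IsisMember:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.max_proposed = 0   # largest sequence number this member has proposed
        self.max_agreed = 0     # largest agreed sequence number it has seen
        self.queue: dict[str, tuple[Priority, bool, str]] = {}  # id -> (priority, final?, payload)
        self.delivered: list[str] = []

    def propose(self, msg_id: str, payload: str) -> Priority:
        """Round 2: hold the message and reply with a proposed priority."""
        self.max_proposed = max(self.max_agreed, self.max_proposed) + 1
        prio = (self.max_proposed, self.pid)
        self.queue[msg_id] = (prio, False, payload)
        return prio

    def finalize(self, msg_id: str, agreed: Priority) -> None:
        """Round 3: record the agreed priority, then deliver what is ready."""
        self.max_agreed = max(self.max_agreed, agreed[0])
        self.queue[msg_id] = (agreed, True, self.queue[msg_id][2])
        # Final priorities never drop below a member's own proposal, so once
        # the smallest queued priority is final it is safe to deliver it.
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][0])
            _, final, payload = self.queue[head]
            if not final:
                break
            del self.queue[head]
            self.delivered.append(payload)


g1, g2, g3 = IsisMember(1), IsisMember(2), IsisMember(3)
# Two concurrent multicasts x and y; member 2 sees y before x
px = [g1.propose("x", "X"), g3.propose("x", "X")]
py = [g1.propose("y", "Y"), g2.propose("y", "Y"), g3.propose("y", "Y")]
px.append(g2.propose("x", "X"))
ax, ay = max(px), max(py)   # each sender picks the largest proposal
# The agreed priorities also arrive in different orders
for member in (g1, g3):
    member.finalize("x", ax)
    member.finalize("y", ay)
g2.finalize("y", ay)
g2.finalize("x", ax)
print(g1.delivered, g2.delivered, g3.delivered)   # ['X', 'Y'] ['X', 'Y'] ['X', 'Y']
```

Member 2 proposes a lower number for y than for x, yet every member delivers X before Y, because delivery follows the agreed priorities (2, 2) and (2, 3), not the order of arrival.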