Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform Consensus Spring 2008 Prof. Idit Keidar

Today’s Material
Distributed Algorithms, Nancy Lynch
  – Ch. 6
Distributed Computing, Attiya and Welch
  – Ch. 5

Reminder: State Machine Replication (SMR)
[Figure: Client A, Client B, atomic broadcast]

Replica Coordination Requirements
Agreement: all replicas receive all client requests
  – What happens when a replica (server) fails?
  – What happens when a client fails?
Order: replicas process requests in the same order

Uniform Atomic Broadcast
Uniform Reliable Broadcast
  – Validity: if a correct process broadcasts m, then all correct processes eventually deliver m
  – Uniform Agreement: if any process delivers m, then all correct processes eventually deliver m
  – Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
Uniform Total Order
  – If any two processes deliver both m and m’, they deliver them in the same order

Today’s Problem: Uniform Consensus
Each process has an input and should decide on an output (one-shot problem)
Uniform Agreement: every two decisions are the same
Validity: every decision is an input of one of the processes
Termination: eventually all correct processes decide

(Uniform) Consensus versus (Uniform) Atomic Broadcast
From Atomic Broadcast to Consensus
From Consensus to Atomic Broadcast
  – Homework question
From now on, we will focus mainly on consensus, keeping in mind that it suffices for Atomic Broadcast and SMR

Today’s Model(s)
Round-based synchronous
Static set P = {p_1, …, p_n} of processes
Fault tolerance:
  1. Crash failures, reliable links
  2. Message loss

Synchronous Round-Based Model
Synchronous rounds:
  – Send messages to any set of processes;
  – Receive messages from this round;
  – Do local processing (possibly decide, halt)

Model 1: Round-Based Failstop
If process p_i does not crash in round r, and p_j does not crash in or before round r, then any message sent by p_i to p_j in round r is received by p_j in round r
Note: if p_i crashes in a round, then any subset of the messages p_i sends in this round can be lost

Round-Based Failstop Model
If no message from p_j is received, then p_j is suspected
If p_i fails in round r, then any subset of the messages p_i sends in r may arrive
If p_i is suspected in round r, then p_i fails in round r or r-1
  – No further messages from p_i will arrive
[Figure: rounds 1 and 2 for processes p_1, p_2, p_3. p_1 crashes in round 2; p_2 receives p_1’s round 2 message; p_3 suspects p_1 in round 2]
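
The delivery guarantee of the failstop model can be restated as a small predicate. The Python sketch below is only an illustration: the function name `delivery_outcome`, its parameters, and the four outcome labels are assumptions made here, not part of the lecture. Crash rounds are integers, and `None` means the process never crashes.

```python
def delivery_outcome(r, sender_crash=None, receiver_crash=None):
    """Classify a message sent in round r under the round-based failstop model.

    sender_crash / receiver_crash: round in which the process crashes,
    or None if it never crashes (hypothetical encoding for this sketch).
    """
    # A sender that crashed in an earlier round sends nothing in round r.
    if sender_crash is not None and sender_crash < r:
        return "nothing sent"
    # The model promises nothing to a receiver that crashes in or before r.
    if receiver_crash is not None and receiver_crash <= r:
        return "no guarantee"
    # A sender crashing in round r may lose any subset of its messages.
    if sender_crash == r:
        return "may be lost"
    # Neither process crashes in round r: delivery within the round.
    return "delivered"
```

In the scenario where p_1 crashes in round 2, `delivery_outcome(2, sender_crash=2)` gives "may be lost", which is why p_2 may receive p_1's round 2 message while p_3 suspects p_1 in the same round.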

t-Resilient Algorithm
t is a threshold on the number of potential failures
  – The algorithm is correct as long as no more than t processes fail
In the following algorithm, 0 ≤ t < n
We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t
We’d like t to be big (robust algorithm)
  – But f will usually be small (failures are rare)

Notation
P = {p_1, …, p_n} is the set of processes
init_i is p_i’s initial value (input)
The decide action determines the output
We show the code for process p_i
Local variables of p_i are denoted v_i, Alive_i

t-Resilient Failstop Uniform Consensus Algorithm
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
    send v_i to all
    receive round k messages
    for all p_j:
        if (received v_j) then v_i = min(v_i, v_j)
        otherwise p_j is suspected
    if ((∀ p_j ∈ Alive_i : received v_j = v_i) && !decided) then decide v_i
    for all p_j:
        if (suspect p_j) then Alive_i = Alive_i \ {p_j}
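
To experiment with the algorithm, the following Python sketch simulates it in the round-based failstop model. It is a simplified illustration rather than the lecture's code: the function `run_consensus`, the format of the `crashes` schedule, and the choice that a process crashing in round k stops before that round's local processing are all assumptions made here. Input values must be comparable and must not be `None`.

```python
def run_consensus(inputs, t, crashes=None):
    """Simulate the t-resilient failstop uniform consensus algorithm.

    inputs:  list of initial values, one per process (index = process id).
    crashes: hypothetical schedule {pid: (round, recipients)}; in its crash
             round the process reaches only `recipients`, then stops.
    Returns a list of (decided value, round) per process, or None.
    """
    n = len(inputs)
    crashes = crashes or {}
    v = list(inputs)                           # v_i
    alive = [set(range(n)) for _ in range(n)]  # Alive_i, initially P
    decision = [None] * n
    crashed = set()

    for k in range(1, t + 3):                  # rounds 1 .. t+2
        # Send phase: every live process sends v_i to all.
        inbox = [dict() for _ in range(n)]
        crashing_now = set()
        for i in range(n):
            if i in crashed:
                continue
            if i in crashes and crashes[i][0] == k:
                recipients = crashes[i][1]     # only a subset gets through
                crashing_now.add(i)
            else:
                recipients = range(n)
            for j in recipients:
                inbox[j][i] = v[i]
        crashed |= crashing_now

        # Receive and local processing phase.
        for i in range(n):
            if i in crashed:
                continue
            received = inbox[i]
            v[i] = min([v[i]] + list(received.values()))
            suspected = {j for j in range(n) if j not in received}
            # Decide if every member of Alive_i sent a value equal to v_i.
            if decision[i] is None and all(
                received.get(j) == v[i] for j in alive[i]
            ):
                decision[i] = (v[i], k)
            alive[i] -= suspected
    return decision
```

With inputs `[3, 1, 2]`, `t = 1`, and no crashes, every process decides 1 in round 2. With inputs `[0, 5, 7]`, `t = 1`, and process 0 crashing in round 1 after reaching only process 1, the two survivors decide 0 in round 3, which is f+2 for f = 1.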

Proof: Validity
Lemma: For every process p_i, v_i is always the initial value init_j of some process p_j.

Proof: Uniform Agreement
Lemma:
  – If there exist a value v, a round r, and a process p_i such that
  – all processes that are in Alive_i at the beginning of round r send v in round r,
  – then v is the only possible decision value from r onward.

Proof: Uniform Agreement (Cont’d)
From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward.
Now look at the first round in which some process decides.

Proof: Termination
After a round r in which no process fails, all processes have the same v_i forever.
  – Because all receive the same messages in r,
  – By induction…
Consider a run where f processes fail.
  – In f+2 rounds, there is at least one failure-free round followed by a round in which Alive_i does not change for a correct process p_i.
  – Thus, after at most f+2 rounds, there is a round in which Alive_i does not change and all received values are the same.

How Long Does it Take?
Early-deciding: in a run with f failures, a decision is reached by the end of round f+2
This is optimal
  – For Uniform Consensus, but not for Consensus
  – As long as f < t-1

Deciding vs. Stopping (Halting)
The algorithm is not early-stopping:
  – It continues running for t+2 rounds
  – Even after reaching a decision
Homework question: can you change the algorithm to be early-stopping?
  – Stop (halt) after f+k rounds in runs with t ≥ f ≥ 0 failures, for some constant k

Model 2: Message Loss
Aka “Two Generals Problem”

Example: Coordinated Attack
[Figure: generals A and B; message “Let’s attack”]

Model
Two generals (processes)
  – Do not fail
Synchronous
  – Processing and communication
Lossy communication

The Coordinated Attack Problem
Requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack, they must not attack
  – If both are ready to attack, then they must attack
Motivation: atomic transaction commit in distributed databases [Gray 78]

Coordinated Attack is 2-Process Uniform Consensus
Agreement: If both generals decide, they decide the same
Termination: Every general eventually decides
Validity: If both inputs are “not ready”, the decision is “no attack”; if both inputs are “ready”, then the decision is “attack”

A Simple Solution
General A sends his vote (“yes” or “no”)
General B responds with his vote
If both say yes, they attack
Otherwise they do not
Aka 2-phase commit
Problems?

Not so Fast…
Any number of messengers can be captured (message loss)
Agreement impossible
Proof on the board
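
One concrete way the two-message vote exchange breaks under message loss can be sketched in Python. This shows a single failing run of that particular protocol, not the general impossibility proof. The function `simple_vote_exchange` and its loss flags are hypothetical, and the rule that a general who never learns the other's vote does not attack is an assumption made here, since the slides do not say what happens in that case.

```python
def simple_vote_exchange(vote_a, vote_b, a_to_b_lost=False, b_to_a_lost=False):
    """Return (a_attacks, b_attacks) for one run of the two-message exchange.

    Assumption (not from the lecture): a general who does not learn the
    other's vote plays safe and does not attack.
    """
    # Message 1: A -> B carries A's vote.
    b_knows_a = not a_to_b_lost
    # Message 2: B -> A carries B's vote; B replies only if it heard from A.
    a_knows_b = b_knows_a and not b_to_a_lost

    # Each general attacks only if it knows both votes and both are "yes".
    b_attacks = b_knows_a and vote_a and vote_b
    a_attacks = a_knows_b and vote_a and vote_b
    return a_attacks, b_attacks
```

With both votes "yes" and no loss, the result is `(True, True)`. If only B's reply is lost, the result is `(False, True)`: B attacks alone, violating agreement. Adding further acknowledgements only moves the problem to the last message sent.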

Coordinated Attack Definition: Take II
Revised requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack, they must not attack
  – If both are ready to attack and no messages are lost, then they must attack
Note: this is not an assumption about the model. It is a conditional requirement that has to hold only in runs in which no messages are lost.
Proof on the board!