Principles of Reliable Distributed Systems
Lecture 9: SMR with Paxos and Authenticated Byzantine Paxos
Prof. Idit Keidar, Technion EE, Spring 2008

Material
- The Part-Time Parliament. Lamport, TOCS 1998
- Practical Byzantine Fault-Tolerance. Castro and Liskov, OSDI 1999
- The ABCDs of Paxos. Lampson, 2001

SMR (Atomic Broadcast) by Running a Sequence of Consensus Instances

Reminder: State-Machine Replication (SMR)
- Data is replicated at n servers
- Operations are initiated by clients
- Operations need to be performed at all correct servers in the same order
- Servers need to agree upon the sequence of operations

Reminder: Paxos
- Algorithm for state machine replication with eventual synchrony (ES)
  - Uses the Ω failure detector (leader election)
- Overcomes transient crashes and recoveries, and message loss
- Main component: (one-shot) consensus protocol (aka Synod)
  - Covered last week

The 2 Phases of Paxos (Synod)
[Figure: message flow among processes 1, 2, ..., n. Phase 1 (learn about smaller ballots): (“prepare”, b), answered by (“ack”, b, n’, v’). Phase 2 (majority accepts v): (“accept”, b, v), echoed as (“accept”, b, v).]

SMR: Client-Server Interaction
- Leader-based: each process (client/server) has an estimate of who is the current leader
- A client sends a request to its current leader
  - E.g., “store X 100”
- The leader runs the Paxos (Synod) consensus algorithm to agree on the place of the request in the sequence
  - Input value: request + proposed sequence number
  - E.g., “store X 100” is the 7th operation in the sequence
- The leader sends the response to the client
  - After invoking the operation on its copy

Consensus per Request Number
- Many consensus instances run at the same time, each for some request number, ReqNum
- Each node has unbounded arrays
  - AcceptNum[r], AcceptVal[r], r = 1, 2, ...
  - AcceptVal holds the client’s requested operation, e.g., “store X 100”
- Invoke operations on the state machine
  - In order: AcceptVal[1], then AcceptVal[2], etc.
  - After the respective consensus instance decides
  - Then send the outcome as the response to the client (leader only)
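
The in-order rule above can be sketched in a few lines of Python. This is only an illustration: the class name `ReplicaLog`, the method `on_decide`, and the choice of a key-value store as the state machine are assumptions made for the example, not part of the lecture.

```python
class ReplicaLog:
    """Applies decided operations to a key-value store strictly in slot order.

    Hypothetical helper for illustration; operations are (key, value) pairs,
    mimicking requests such as "store X 100".
    """

    def __init__(self):
        self.decided = {}    # request number -> decided operation
        self.next_slot = 1   # lowest request number not yet applied
        self.store = {}      # the replicated state machine (a key-value map)

    def on_decide(self, req_num, op):
        """Record a decision, then apply every operation that is now in order.

        Returns the list of request numbers applied by this call.
        """
        self.decided[req_num] = op
        applied = []
        while self.next_slot in self.decided:
            key, value = self.decided[self.next_slot]
            self.store[key] = value
            applied.append(self.next_slot)
            self.next_slot += 1
        return applied
```

A decision for request number 2 that arrives before the decision for request number 1 is buffered and applied only once number 1 is decided, which is what keeps all replicas in the same state.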

Failure-Free Message Flow
[Figure: client C and servers S1, S2, ..., Sn. The client’s request arrives; Phase 1: (“prepare”, b) and (“ack”, b, n’, v’); Phase 2: (“accept”, b, r, v), sent and then echoed, where v is the client’s request and r is its ReqNum; finally the response returns to C.]

Observation
- In Phase 1, no new consensus values are sent:
  - Leader chooses the largest unique ballot number
  - Gets a majority to join this ballot number
  - Learns the outcome of all smaller ballots from this majority
- In Phase 2, the leader proposes either its initial value (request from client) or the latest value it learned in Phase 1

Failure-Free Message Flow (repeated)
[Figure: client C and servers S1, S2, ..., Sn. Request from C; Phase 1: (“prepare”, b) and (“ack”, b, n’, v’); Phase 2: (“accept”, b, r, v), sent and then echoed; response to C.]

Message Flow: Take 2
[Figure: client C and servers S1, S2, ..., Sn. Phase 1: (“prepare”, b) and (“ack”, b, n’, v’). Phase 2: request from C, (“accept”, b, r, v) sent and then echoed, response to C.]

Optimization
- Run Phase 1 only when the leader changes
  - Phase 1 is called “view change” or “recovery mode”
  - Phase 2 is the “normal mode”
- Each message includes BallotNum (from the last Phase 1) and ReqNum
  - E.g., ReqNum = 7 when we’re trying to agree on what the 7th operation to invoke on the state machine should be
- Respond only to messages with the “right” BallotNum

Paxos Atomic Broadcast: Normal Mode

    Upon receive (“request”, v) from client
      if (I am not the leader) then forward to leader
      else /* propose v as request number ReqNum */
        ReqNum ← ReqNum + 1
        send (“accept”, BallotNum, ReqNum, v) to all

    Upon receive (“accept”, b, r, v) with b = BallotNum
      /* accept proposal for request number r */
      AcceptNum[r] ← b; AcceptVal[r] ← v
      send (“accept”, b, r, v) to all (first time only)
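
The two handlers above can be exercised in a small single-threaded simulation. The sketch below is a minimal illustration under several assumptions that are not on the slide: server 0 is the fixed leader, messages are delivered reliably and in order through a shared queue, and a server decides request number r once it has seen “accept” messages for it from a majority (the Synod rule from the recap). The names `Server` and `run_normal_mode` are hypothetical.

```python
class Server:
    def __init__(self, sid, n, ballot, is_leader=False):
        self.sid = sid
        self.n = n
        self.ballot_num = ballot     # BallotNum from the last Phase 1
        self.is_leader = is_leader
        self.req_num = 0             # ReqNum (used by the leader only)
        self.accept_num = {}         # AcceptNum[r]
        self.accept_val = {}         # AcceptVal[r]
        self.echoed = set()          # (b, r) pairs already echoed
        self.votes = {}              # (b, r, v) -> set of senders seen
        self.decided = {}            # r -> decided value

    def on_request(self, v):
        """Leader: assign the next request number and build the proposal."""
        if not self.is_leader:
            return None              # a real server would forward to the leader
        self.req_num += 1
        return ("accept", self.ballot_num, self.req_num, v)

    def on_accept(self, sender, msg):
        """Accept the proposal; return the echo to broadcast (first time only)."""
        _, b, r, v = msg
        if b != self.ballot_num:
            return None              # respond only to the "right" BallotNum
        self.accept_num[r] = b
        self.accept_val[r] = v
        self.votes.setdefault((b, r, v), set()).add(sender)
        if len(self.votes[(b, r, v)]) > self.n // 2:
            self.decided[r] = v      # assumed decision rule: majority accepted
        if (b, r) in self.echoed:
            return None
        self.echoed.add((b, r))
        return msg


def run_normal_mode(n, requests):
    """Deliver every broadcast to every server until no messages remain."""
    servers = [Server(i, n, ballot=1, is_leader=(i == 0)) for i in range(n)]
    for v in requests:
        outbox = [(0, servers[0].on_request(v))]
        while outbox:
            sender, msg = outbox.pop(0)
            for s in servers:
                echo = s.on_accept(sender, msg)
                if echo is not None:
                    outbox.append((s.sid, echo))
    return servers
```

Calling `run_normal_mode(3, ["store X 100", "store Y 7"])` returns three servers whose `decided` maps assign request numbers 1 and 2 to the two operations in the order the leader received them.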

Recovery Mode
- Run once per ballot, for all request numbers
  - No ReqNum in “prepare” and “ack” messages
- The new leader must learn the outcome of all the pending requests that have smaller BallotNums
  - The “ack” messages include AcceptNum[r] and AcceptVal[r] for all pending requests
- For each of the pending requests, the leader sends an “accept” message
- What if there are holes?
  - E.g., the leader learns of request number 13 and not of 12
  - Fill in the gaps with dummy “do nothing” requests
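
A new leader's bookkeeping in recovery mode might look roughly like the following Python sketch. The function name `recover_pending`, the `NOOP` marker, the `first_pending` parameter, and the representation of each “ack” as a dictionary from request number to an (AcceptNum, AcceptVal) pair are all assumptions made for the example.

```python
NOOP = "no-op"   # hypothetical dummy "do nothing" request


def recover_pending(acks, first_pending):
    """Pick the value to re-propose for every pending request number.

    acks: one dict per responding server, mapping a request number r to
          the pair (AcceptNum[r], AcceptVal[r]) reported in its "ack".
    first_pending: lowest request number not yet known to be decided.
    Returns a dict r -> value covering first_pending .. highest reported r.
    """
    best = {}   # r -> (highest AcceptNum seen, its value)
    for ack in acks:
        for r, (accept_num, accept_val) in ack.items():
            if r not in best or accept_num > best[r][0]:
                best[r] = (accept_num, accept_val)
    if not best:
        return {}
    highest = max(best)
    # Holes (request numbers nobody reported) are filled with no-ops.
    return {r: (best[r][1] if r in best else NOOP)
            for r in range(first_pending, highest + 1)}
```

For each entry of the result, the leader would then send an “accept” message in its new ballot, exactly as in normal mode.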

Practical Byzantine Fault-Tolerance (aka Byzantine Paxos)

Reminder: Byzantine Faults
- Faulty processes can behave arbitrarily, i.e., they don’t have to follow the protocol. E.g., they
  - Can suffer benign failures: crash, timing
  - Can send bogus values in messages
  - Can send messages at the wrong time
  - Can send different messages to different processes, etc.
- Captures software bugs, hacker intrusions

Reminder: Authenticated (Byzantine) Model
- Authentication: The receiver of a message can ascertain its origin
  - An intruder cannot masquerade as someone else
- Integrity: The receiver of a message can verify that it has not been modified in transit
  - An intruder cannot substitute a false message for a legitimate one
- Nonrepudiation: A sender cannot falsely deny later that he sent a message

Byzantine Fault-Tolerant Consensus: Overview of Results
- A synchronous t-resilient algorithm exists
  - iff t < n with authentication and weak unanimity
  - iff t < n/2 with authentication and strong unanimity
  - iff t < n/3 without authentication
- An eventually synchronous (ES) t-resilient algorithm exists
  - iff t < n/3 with or without authentication
- Homework problem: show the lower bound

Overcoming Byzantine Failures With 3t+1 Processes
- Recall what we did for crash failures:
  - We gathered “votes” from a majority in every ballot
  - Since every two majorities intersect, for every two ballots, at least one process votes in both
- But now, a faulty process can lie about what it did in the other ballot
  - We want a correct process in the intersection
  - Since n-t ≥ 2t+1, two sets of size n-t intersect in at least n-2t ≥ t+1 processes, so at least one correct process is in both
  - Gather n-t votes in a ballot, to ensure that for every two ballots, at least one correct process votes in both
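
The counting behind the intersection claim can be written out explicitly. Here Q_1 and Q_2 are names introduced for this note; they stand for any two sets of n-t processes, with n ≥ 3t+1 as on the slide.

```latex
\begin{align*}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(n-t) - n = n - 2t, \\
n \ge 3t+1 \;&\Longrightarrow\; n - 2t \ge t+1 .
\end{align*}
```

At most t of the processes in the intersection are faulty, so at least one of them is correct.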

Byzantine Paxos Setting
- State machine replication
- Structured like Paxos:
  - Updates are sent to the current leader
  - Leader uses a consensus algorithm to have all replicas agree on the order of updates
  - Our focus today is the consensus algorithm
- Used to implement BFS, a Byzantine fault-tolerant NFS
  - Only 3% slower than un-replicated NFS

Model
- n processes: {1, ..., n}
- Up to t Byzantine failures, t < n/3
  - For simplicity, assume n = 3t+1
- Authentication (PKI)
- Reliable links, no recovery (for now)

Reminder: Classic Paxos Phase I

    Periodically, until decision is reached do:
      if leader (by Ω) then
        BallotNum ← ⟨BallotNum.num+1, myId⟩
        send (“prepare”, BallotNum) to all

    Upon receive (“prepare”, b) from i
      if b ≥ BallotNum then
        BallotNum ← b
        send (“ack”, b, AcceptNum, AcceptVal) to i
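
As a rough illustration of Phase I, ballots can be modeled as Python tuples (num, id), which compare lexicographically and are therefore unique per process. The class name `Acceptor`, the helper `next_ballot`, and the use of `None` for the empty value ⊥ are assumptions made for this sketch.

```python
class Acceptor:
    """State kept by every process for one consensus instance."""

    def __init__(self):
        self.ballot_num = (0, 0)   # highest ballot joined so far
        self.accept_num = (0, 0)   # ballot of the last accepted proposal
        self.accept_val = None     # last accepted value; None stands for ⊥

    def on_prepare(self, b):
        """Join ballot b if it is not smaller than the current one.

        Returns the "ack" to send back to the proposer, or None if ignored.
        """
        if b >= self.ballot_num:
            self.ballot_num = b
            return ("ack", b, self.accept_num, self.accept_val)
        return None


def next_ballot(current, my_id):
    """Leader side: pick a ballot larger than `current`, tagged with my id."""
    return (current[0] + 1, my_id)
```

Because the process id is the second tuple component, two leaders that increment the same counter still obtain distinct ballots, and every process orders them the same way.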

Reminder: Classic Paxos Phase II

    Upon receive (“ack”, BallotNum, b, val) from n-t
      if all vals = ⊥ then myVal = initial value
      else myVal = received val with highest b
      send (“accept”, BallotNum, myVal) to all /* proposal */

    Upon receive (“accept”, b, v) with b ≥ BallotNum
      AcceptNum ← b; AcceptVal ← v /* accept proposal */
      send (“accept”, b, v) to all (first time only)
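
The leader's choice of myVal in Phase II is small enough to state as a function. In this sketch, `choose_value` is a hypothetical name, each “ack” is reduced to its (b, val) pair, and `None` stands for ⊥.

```python
def choose_value(acks, initial_value):
    """Return the value the leader must propose after collecting n-t acks.

    acks: list of (accept_num, accept_val) pairs, with accept_val None
          when the sender has not accepted anything yet.
    """
    accepted = [(num, val) for num, val in acks if val is not None]
    if not accepted:
        return initial_value                # all vals are ⊥
    # Otherwise adopt the value accepted in the highest ballot.
    return max(accepted, key=lambda pair: pair[0])[1]
```

The leader is free to use its own initial value only when nobody in the quorum reports an accepted value; this is the rule a Byzantine leader can violate, which motivates the signed “ack” messages discussed later.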

How Can Byzantine Failures Cause Problems?

Safety Problems: Leader Can Lie
- Problem 1: Leader can choose a value different from the highest one accepted by n-t processes
  - Solution: The leader can “prove” it is not lying by sending the signed “ack” (Phase 1) messages to all processes
- Problem 2: If no previous ballot was accepted, leader can send different new values to different processes

Solution to the 2nd Problem
- Before accepting a value proposed by the leader, verify that the value was proposed to “enough” processes
- Byzantine Paxos phases:
  - Phase 1: Prepare
  - Phase 2: Propose (echo leader’s proposal)
  - Phase 3: Accept (now only if n-t proposed)
- Add new variable: PropNum, initially 0

Safety Problems: Others Can Lie
- Problem 3: Faulty users can send invalid “accept” messages
  - Solution: Wait for n-t = 2t+1 “accept” messages
- Problem 4: Faulty users can send invalid values with higher AcceptNums in “ack” messages
  - Solution: Can “prove” value is valid by forwarding signed “propose” messages
  - Add new variable: Proof, initially empty set

Liveness Problems
- Problem 5: Faulty leader can deadlock the algorithm
  - Solution: Propose a new leader when the current one does not deliver
  - Use a rotating coordinator until one is correct; the leader will be (BallotNum mod n)+1
- Problem 6: Faulty processes may keep selecting new leaders all the time (livelock)
  - Solution: Accept a new ballot only if t+1 processes propose a new leader

And Now For Our Feature Presentation: The Byzantine Paxos Consensus Algorithm

Byzantine Paxos Variables

    Int BallotNum, initially 0
    Int PropNum, initially 0
    Int AcceptNum, initially 0
    Value ∪ {⊥} AcceptVal, initially ⊥
    Message Set Proof, initially empty

    Define: Leader = (BallotNum mod n)+1

Byzantine Paxos Phase I: Prepare

    Upon timeout on Leader
      BallotNum ← BallotNum + 1
      send (“prepare”, BallotNum) to all

    Upon receive (“prepare”, b) from t+1
      if (b < BallotNum) then return
      if (b > BallotNum) then
        BallotNum ← b
        send (“prepare”, BallotNum) to all
      send (“ack”, b, AcceptNum, AcceptVal, Proof) to Leader
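
The Prepare phase can be sketched in Python as below. The class name `Process`, the per-ballot sender sets, and the `acked` set that suppresses repeated acks for the same ballot are assumptions of this sketch; signature checking and timers are left out. Process ids run from 1 to n so that the leader formula matches the slide.

```python
class Process:
    def __init__(self, pid, n, t):
        self.pid, self.n, self.t = pid, n, t
        self.ballot_num = 0
        self.accept_num = 0
        self.accept_val = None      # None stands for ⊥
        self.proof = frozenset()
        self.prepares = {}          # ballot -> set of distinct senders
        self.acked = set()          # ballots already acked (assumption)

    def leader(self):
        """Rotating coordinator: Leader = (BallotNum mod n) + 1."""
        return (self.ballot_num % self.n) + 1

    def on_timeout(self):
        """Suspect the leader: move to the next ballot and tell everyone."""
        self.ballot_num += 1
        return [("all", ("prepare", self.ballot_num))]

    def on_prepare(self, sender, b):
        """Act only once t+1 distinct processes ask for ballot b.

        Returns a list of (destination, message) pairs to send.
        """
        self.prepares.setdefault(b, set()).add(sender)
        if len(self.prepares[b]) < self.t + 1:
            return []               # could be faulty processes only
        if b < self.ballot_num or b in self.acked:
            return []
        out = []
        if b > self.ballot_num:
            self.ballot_num = b
            out.append(("all", ("prepare", b)))
        self.acked.add(b)
        ack = ("ack", b, self.accept_num, self.accept_val, self.proof)
        out.append((self.leader(), ack))
        return out
```

Requiring t+1 distinct senders means at least one correct process timed out on the leader, so the t faulty processes alone cannot force a ballot change.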

Byzantine Paxos Phase II: Propose

    Upon receive (“ack”, BallotNum, b, val, proof) from n-t
      S = {received (signed) “ack” messages}
      if (all vals that have valid proofs in S are ⊥) then myVal ← init value
      else myVal ← val that has valid proof with highest b in S
      send (“propose”, BallotNum, myVal, S) to all

    Upon receive (“propose”, BallotNum, v, S)
      if (BallotNum ≤ PropNum) then return
      if (v is not a valid choice given S) then return
      PropNum ← BallotNum
      send (“propose”, BallotNum, v, S) to all
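
One way to read “valid proof” and “valid choice given S” is sketched below. It assumes, beyond what the slide states, that a proof is a set of signed “propose” messages modeled as (sender, ballot, value) tuples whose signatures have already been verified, and that a proof for (b, val) is valid when it contains matching messages from at least n-t distinct senders. All function names are hypothetical.

```python
def proof_is_valid(accept_num, accept_val, proof, n, t):
    """True if n-t distinct senders proposed (accept_num, accept_val)."""
    senders = {s for (s, b, v) in proof if b == accept_num and v == accept_val}
    return len(senders) >= n - t


def highest_proven(acks, n, t):
    """Best (accept_num, accept_val) with a valid proof in S, or None.

    acks: list of (sender, accept_num, accept_val, proof) tuples;
          accept_val is None when the sender reports ⊥.
    """
    proven = [(num, val) for (_sender, num, val, proof) in acks
              if val is not None and proof_is_valid(num, val, proof, n, t)]
    return max(proven, key=lambda nv: nv[0]) if proven else None


def choose_value(acks, init_value, n, t):
    """Leader side: the value to send in the "propose" message."""
    best = highest_proven(acks, n, t)
    return init_value if best is None else best[1]


def is_valid_choice(v, acks, n, t):
    """Receiver side: re-run the leader's rule on S and compare."""
    best = highest_proven(acks, n, t)
    return best is None or v == best[1]
```

An “ack” that claims a high AcceptNum without enough matching “propose” messages is simply ignored, which addresses Problem 4, and every receiver repeats the leader's computation on S, which addresses Problem 1.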

Byzantine Paxos Phase III: Accept

    Upon receive (“propose”, b, v, S) from n-t
      if (b < BallotNum) then return
      AcceptNum ← b; AcceptVal ← v
      Proof ← set of n-t signed “propose” messages
      send (“accept”, b, v) to all

    Upon receive (“accept”, b, v) from n-t
      decide v
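
The two thresholds of the Accept phase can be tracked with per-proposal sets of senders, as in this sketch. The class name `AcceptPhase` and the message representation are assumptions; the set S carried by “propose” messages and all signature checks are omitted for brevity.

```python
class AcceptPhase:
    def __init__(self, n, t):
        self.n, self.t = n, t
        self.ballot_num = 0
        self.accept_num = 0
        self.accept_val = None
        self.proof = set()
        self.proposes = {}     # (b, v) -> set of (sender, b, v) messages
        self.accepts = {}      # (b, v) -> set of senders
        self.decision = None

    def on_propose(self, sender, b, v):
        """Accept (b, v) when the n-t'th distinct "propose" for it arrives.

        Returns the "accept" message to broadcast, or None.
        """
        msgs = self.proposes.setdefault((b, v), set())
        entry = (sender, b, v)
        if entry in msgs:
            return None                      # duplicates do not count
        msgs.add(entry)
        if len(msgs) != self.n - self.t or b < self.ballot_num:
            return None
        self.accept_num, self.accept_val = b, v
        self.proof = set(msgs)               # the n-t "propose" messages
        return ("accept", b, v)

    def on_accept(self, sender, b, v):
        """Decide v once n-t distinct processes sent ("accept", b, v)."""
        senders = self.accepts.setdefault((b, v), set())
        senders.add(sender)
        if self.decision is None and len(senders) >= self.n - self.t:
            self.decision = v
        return self.decision
```

The stored `proof` is what the process would later attach to its “ack” in a higher ballot to show that its accepted value is genuine.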

In Failure-Free Runs
[Figure: processes 1, 2, ..., n exchange message rounds labeled prepare, ack, propose, accept. Annotations: “All send prepare” and “All echo propose”.]

Some Optimizations
- Prepare and its “ack” can be merged into one message round
- Proofs don’t have to be sent with messages: processes can have the information to check the proofs locally because the original messages are multicast

Invariant
If proposals (b,v) and (b,v’) are accepted by correct processes i and j (possibly i = j), then v’ = v.
Proof:
- An accepted proposal is proposed by n-t processes
- Two sets of n-t = 2t+1 processes have at least one correct process in common
- A correct process sends no more than one propose message with the same b
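
The second step of the proof can be checked exhaustively for small systems. The following Python sketch (the function name `min_correct_overlap` is made up for this note) enumerates every choice of t faulty processes and every pair of sets of size n-t, and reports the smallest number of correct processes the two sets can share.

```python
from itertools import combinations


def min_correct_overlap(n, t):
    """Smallest number of correct processes shared by two (n-t)-sets,
    minimized over all pairs of sets and all choices of t faulty processes.
    Brute force, so only suitable for small n.
    """
    procs = range(n)
    worst = n
    for faulty in combinations(procs, t):
        faulty_set = set(faulty)
        for q1 in combinations(procs, n - t):
            for q2 in combinations(procs, n - t):
                overlap = (set(q1) & set(q2)) - faulty_set
                worst = min(worst, len(overlap))
    return worst
```

With n = 3t+1 the result stays at least 1, while with n = 3t (for example n = 3, t = 1) it drops to 0, which is where the argument would break.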

Lemma 1
If a proposal (b,v) is accepted by t+1 correct processes, then for every proposal (b’,v’) with b’ > b that is proposed by a correct process, v’ = v.
Again, follows from Lemma 2…

Lemma 2
If a proposal (b,v) is proposed by a correct process, then there is a set S including at least t+1 correct processes such that either
- (1) no correct p in S accepts a proposal ranked less than b; or
- (2) v is the value of the highest-ranked proposal among proposals ranked less than b accepted by correct processes in S

Proving Lemma 1 from Lemma 2
- Assume (b,v) is accepted by t+1 correct processes, and consider the lowest-ranked proposal (b’,v’) with b’ > b proposed by a correct process
- Since two sets of t+1 correct processes have at least one correct process in common, case (1) of Lemma 2 is impossible, and by case (2), v’ = v
- Continue by induction on ballot number

Proving Agreement
- Let v be a decided value. The first process that decides v receives n-t “accept” messages for v with some ballot b, i.e., (b,v) is accepted by at least t+1 correct processes
- No other value is accepted by a correct process with the same b. Why?
- Let (v1, b1) be the first proposal accepted by n-t processes
- By Lemma 1, v1 is the only possible decision value

Liveness
- Is the current leader making progress?
  - If yes, some correct process decides. This process can periodically forward the “proof” for its decision to others so they will decide too.
  - If not, all time out on the leader and start a new ballot.
- Once there is a correct leader:
  - The n-t correct processes will send all the needed messages.
  - The t faulty processes will not be able to force a new ballot.

Atomic Broadcast: Issues
- Leader can propose invalid client requests
- Leader can refrain from proposing client requests
- Leader can lie to client about response
- Leader can refrain from sending client responses
- Solution: clients cannot trust a single server
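
The slide does not spell out what the client does instead. One common approach, used in the Castro and Liskov paper listed under Material, is for the client to wait for t+1 matching replies from different servers, since at least one of them must then come from a correct server. The sketch below illustrates that rule; the function name `accept_response` and the dictionary representation of replies are assumptions.

```python
from collections import Counter


def accept_response(replies, t):
    """Return a response vouched for by at least t+1 servers, else None.

    replies: dict mapping server id -> the response that server sent
             (one entry per server, so each server is counted once).
    """
    counts = Counter(replies.values())
    for response, count in counts.most_common():
        if count >= t + 1:
            return response
    return None   # keep waiting, or retransmit the request to all servers
```

The same idea covers a silent leader: a client that gets no answer can send its request to all servers rather than relying on the leader alone.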

Byzantine Message Flow
[Figure: client C and servers S1, S2, ..., Sn. Request from C; message rounds labeled prepare, ack, propose, propose, accept; response to C.]
