© Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008. Lecture 9: SMR with Paxos.

Principles of Reliable Distributed Systems
Lecture 9: SMR with Paxos and Authenticated Byzantine Paxos
Spring 2008
Prof. Idit Keidar

Material
  – The Part-Time Parliament. Lamport, TOCS 1998
  – Practical Byzantine Fault-Tolerance. Castro and Liskov, OSDI 1999
  – The ABCDs of Paxos. Lampson, 2001

SMR (Atomic Broadcast) by Running a Sequence of Consensus Instances

Reminder: State-Machine Replication (SMR)
Data is replicated at n servers
Operations are initiated by clients
Operations need to be performed at all correct servers in the same order
Servers need to agree upon the sequence of operations

Reminder: Paxos
Algorithm for state machine replication with eventual synchrony (ES)
  – Uses Ω failure detector (leader election)
Overcomes transient crashes & recoveries and message loss
Main component: (one-shot) consensus protocol (aka Synod)
  – Last week

The 2 Phases of Paxos (Synod)
[Figure: message flow among processes 1, 2, …, n. Messages, in order: (“prepare”, b); (“ack”, b, n’, v’); (“accept”, b, v); (“accept”, b, v)]
Phase 1: Learn about smaller ballots
Phase 2: Majority accepts v

SMR: Client-Server Interaction
Leader-based: each process (client/server) has an estimate of who is the current leader
A client sends a request to its current leader
  – E.g., “store X 100”
The leader runs the Paxos (Synod) consensus algorithm to agree on the place of the request in the sequence
  – Input value: request + proposed sequence number
  – E.g., “store X 100” is the 7th operation in the sequence
The leader sends the response to the client
  – After invoking the operation on its copy

Consensus per Request Number
Many consensus instances are running at the same time, each for some request number, ReqNum
Each node has unbounded arrays
  – AcceptNum[r], AcceptVal[r], r = 1, 2, …
  – AcceptVal holds the client’s requested operation, e.g., “store X 100”
Invoke operations on the state machine
  – In order: AcceptVal[1], then AcceptVal[2], etc.
  – After the respective consensus algorithm decides
  – Then send outcome as response to client (leader only)
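The in-order rule above can be sketched in Python. This is a minimal illustration, not the lecture's code: the class name `Replica`, the `apply_op` callback, and the `on_decide` hook are assumptions introduced here. A decision for request number r is buffered until every smaller request number has been decided and applied.

```python
from typing import Callable, Dict, List


class Replica:
    """Applies decided requests to a state machine strictly in sequence order."""

    def __init__(self, apply_op: Callable[[str], str]) -> None:
        self.apply_op = apply_op             # hypothetical state-machine callback
        self.decided: Dict[int, str] = {}    # r -> decided value of AcceptVal[r]
        self.next_to_apply = 1               # request numbers start at 1
        self.responses: Dict[int, str] = {}  # r -> outcome (the leader would send it)

    def on_decide(self, r: int, value: str) -> List[int]:
        """Record the decision for request number r, then apply any ready prefix."""
        self.decided[r] = value
        applied = []
        # Apply AcceptVal[1], AcceptVal[2], ... and stop at the first undecided slot.
        while self.next_to_apply in self.decided:
            n = self.next_to_apply
            self.responses[n] = self.apply_op(self.decided[n])
            applied.append(n)
            self.next_to_apply += 1
        return applied
```

If instance 2 decides before instance 1, `on_decide(2, ...)` applies nothing, and the later `on_decide(1, ...)` applies both in order.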

Failure-Free Message Flow
[Figure: client C sends a request; servers S1, S2, …, Sn exchange (“prepare”, b) and (“ack”, b, n’, v’) in Phase 1, then (“accept”, b, r, v) twice in Phase 2; the response returns to C. Labels on the “accept” message: client’s request, ReqNum]

Observation
In Phase 1, no new consensus values are sent:
  – Leader chooses largest unique ballot number
  – Gets a majority to join this ballot number
  – Learns the outcome of all smaller ballots from this majority
In Phase 2, leader proposes either its initial value (request from client) or the latest value it learned in Phase 1

Failure-Free Message Flow
[Figure repeated: client C sends a request; servers S1, S2, …, Sn exchange (“prepare”, b) and (“ack”, b, n’, v’) in Phase 1, then (“accept”, b, r, v) twice in Phase 2; the response returns to C]

Message Flow: Take 2
[Figure: the same flow among C and S1, S2, …, Sn, redrawn with Phase 1 ((“prepare”, b), (“ack”, b, n’, v’)) separated from Phase 2 (request, (“accept”, b, r, v) twice, response)]

Optimization
Run Phase 1 only when the leader changes
  – Phase 1 is called “view change” or “recovery mode”
  – Phase 2 is the “normal mode”
Each message includes BallotNum (from the last Phase 1) and ReqNum
  – E.g., ReqNum = 7 when we’re trying to agree on what the 7th operation to invoke on the state machine should be
Respond only to messages with the “right” BallotNum

Paxos Atomic Broadcast: Normal Mode
Upon receive (“request”, v) from client
    if (I am not the leader) then forward to leader
    else
        /* propose v as request number ReqNum */
        ReqNum ← ReqNum + 1
        send (“accept”, BallotNum, ReqNum, v) to all
Upon receive (“accept”, b, r, v) with b = BallotNum
    /* accept proposal for request number r */
    AcceptNum[r] ← b; AcceptVal[r] ← v
    send (“accept”, b, r, v) to all (first time only)
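A possible Python rendering of the two normal-mode handlers, as a sketch only. The class `NormalModeNode`, the `outbox` list that stands in for the network, and the `echoed` set that implements "first time only" are assumptions made for illustration; ballot numbers are plain integers here.

```python
class NormalModeNode:
    """Normal-mode (Phase 2 only) handlers for one process."""

    def __init__(self, my_id, leader_id, ballot_num):
        self.my_id = my_id
        self.leader_id = leader_id
        self.ballot_num = ballot_num  # fixed by the last Phase 1
        self.req_num = 0
        self.accept_num = {}          # r -> ballot accepted for request r
        self.accept_val = {}          # r -> value accepted for request r
        self.echoed = set()           # (b, r) pairs already sent to all
        self.outbox = []              # (destination, message); stand-in for the network

    def on_request(self, v):
        if self.my_id != self.leader_id:
            # Not the leader: forward the client's request.
            self.outbox.append((self.leader_id, ("request", v)))
            return
        # Leader: propose v as the next request number.
        self.req_num += 1
        self.echoed.add((self.ballot_num, self.req_num))
        self.outbox.append(("all", ("accept", self.ballot_num, self.req_num, v)))

    def on_accept(self, b, r, v):
        if b != self.ballot_num:
            return  # respond only to the "right" BallotNum
        self.accept_num[r] = b
        self.accept_val[r] = v
        if (b, r) not in self.echoed:  # echo the first time only
            self.echoed.add((b, r))
            self.outbox.append(("all", ("accept", b, r, v)))
```

Counting echoed "accept" messages to reach a decision is left out; the sketch covers only the two handlers on the slide.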

Recovery Mode
Run once per ballot, for all request numbers
  – No ReqNum in “prepare” and “ack” messages
The new leader must learn the outcome of all the pending requests that have smaller BallotNums
  – The “ack” messages include AcceptNum[r] and AcceptVal[r] for all pending requests
For each of the pending requests, the leader sends an “accept” message
What if there are holes?
  – E.g., leader learns of request number 13 and not of 12
  – Fill in the gaps with dummy “do nothing” requests
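The hole-filling step can be illustrated with a short Python sketch. The function name `pending_to_propose`, the `NOOP` marker, and the encoding of each "ack" as a dictionary from request number to an (AcceptNum, AcceptVal) pair are assumptions for this example; ballot numbers are compared as plain integers.

```python
NOOP = "do nothing"  # dummy request used to fill holes


def pending_to_propose(acks, first_pending):
    """Return {request number: value} that the new leader should propose.

    acks: one dict per "ack" message, mapping r -> (accept_num, accept_val).
    first_pending: smallest request number not yet known to be decided.
    """
    best = {}  # r -> (highest accept_num seen, its value)
    for ack in acks:
        for r, (num, val) in ack.items():
            if r >= first_pending and (r not in best or num > best[r][0]):
                best[r] = (num, val)
    if not best:
        return {}
    highest = max(best)
    # Every slot up to the highest learned request gets a value or a no-op.
    return {
        r: (best[r][1] if r in best else NOOP)
        for r in range(first_pending, highest + 1)
    }
```

With acks reporting requests 11 and 13 but not 12, the result proposes the learned values for 11 and 13 and a "do nothing" request for 12.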

Practical Byzantine Fault-Tolerance
Aka Byzantine Paxos

Reminder: Byzantine Faults
Faulty processes can behave arbitrarily, i.e., they don’t have to follow the protocol. E.g.,
  – Can suffer benign failures: crash, timing
  – Can send bogus values in messages
  – Can send messages at the wrong time
  – Can send different messages to different processes, etc.
Captures software bugs, hacker intrusions

Reminder: Authenticated (Byzantine) Model
Authentication: The receiver of a message can ascertain its origin
  – An intruder cannot masquerade as someone else
Integrity: The receiver of a message can verify that it has not been modified in transit
  – An intruder cannot substitute a false message for a legitimate one
Nonrepudiation: A sender cannot falsely deny later that he sent a message

Byzantine Fault-Tolerant Consensus: Overview of Results
A synchronous t-resilient algorithm exists
  – iff t < n with authentication and weak unanimity
  – iff t < n/2 with authentication and strong unanimity
  – iff t < n/3 without authentication
An eventually synchronous (ES) t-resilient algorithm exists
  – iff t < n/3 with or without authentication
Homework problem: show the lower bound

Overcoming Byzantine Failures With 3t+1 Processes
Recall what we did for crash failures
  – We gathered “votes” from a majority in every ballot
  – Since every two majorities intersect, for every two ballots, at least one process votes in both
But now, a faulty process can lie about what it did in the other ballot
  – We want a correct process in the intersection
  – Since n-t ≥ 2t+1, two sets of size n-t intersect in at least one correct process
  – Gather n-t votes in a ballot, to ensure that for every two ballots, at least one correct process votes in both
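The intersection claim follows from a short counting argument, written out here for two vote sets Q_1 and Q_2 of size n-t each (the names Q_1 and Q_2 are introduced only for this derivation):

```latex
\begin{align*}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(n - t) - n = n - 2t, \\
n \ge 3t + 1 &\implies n - 2t \ge t + 1 .
\end{align*}
```

The intersection therefore holds at least t+1 processes, and since at most t processes are faulty, at least one of them is correct.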

Byzantine Paxos Setting
State machine replication
Structured like Paxos:
  – Updates are sent to the current leader
  – Leader uses a consensus algorithm to have all replicas agree on the order of updates
  – Our focus today is the consensus algorithm
Used to implement BFS, a Byzantine Fault Tolerant NFS
  – Only 3% slower than un-replicated NFS

Model
n processes: {1, …, n}
Up to t Byzantine failures, t < n/3
  – For simplicity, assume n = 3t+1
Authentication (PKI)
Reliable links, no recovery (for now)

Reminder: Classic Paxos Phase I
Periodically, until decision is reached do:
    if leader (by Ω) then
        BallotNum ← ⟨BallotNum.num+1, myId⟩
        send (“prepare”, BallotNum) to all
Upon receive (“prepare”, b) from i
    if b ≥ BallotNum then
        BallotNum ← b
        send (“ack”, b, AcceptNum, AcceptVal) to i

Reminder: Classic Paxos Phase II
Upon receive (“ack”, BallotNum, b, val) from n-t
    if all vals = ⊥ then myVal = initial value
    else myVal = received val with highest b
    send (“accept”, BallotNum, myVal) to all /* proposal */
Upon receive (“accept”, b, v) with b ≥ BallotNum
    AcceptNum ← b; AcceptVal ← v /* accept proposal */
    send (“accept”, b, v) to all (first time only)

How Can Byzantine Failures Cause Problems?

Safety Problems: Leader Can Lie
Problem 1: Leader can choose a value different from the highest accepted by n-t processes
  – Solution: Leader can “prove” he’s not lying by sending the signed “ack” (Phase 1) messages to all processes
Problem 2: If no previous ballot was accepted, leader can send different new values to different processes

Solution to the 2nd Problem
Before accepting a value proposed by the leader, verify that the value was proposed to “enough” processes
Byzantine Paxos Phases:
  – Phase 1: Prepare
  – Phase 2: Propose (echo leader’s proposal)
  – Phase 3: Accept (now only if n-t proposed)
Add new variable: PropNum, initially 0

Safety Problems: Others Can Lie
Problem 3: Faulty users can send invalid “accept” messages
  – Solution: Wait for n-t = 2t+1 “accept” messages
Problem 4: Faulty users can send invalid values with higher AcceptNums in “ack” messages
  – Solution: Can “prove” value is valid by forwarding signed “propose” messages
  – Add new variable: Proof, initially empty set

Liveness Problems
Problem 5: Faulty leader can deadlock the algorithm
  – Solution: Propose a new leader when the current one does not deliver
  – Use a rotating coordinator until one is correct; the leader will be (BallotNum mod n)+1
Problem 6: Faulty processes may keep selecting new leaders all the time (livelock)
  – Solution: Accept a new ballot only if t+1 processes propose a new leader

And Now For Our Feature Presentation
The Byzantine Paxos Consensus Algorithm

Byzantine Paxos Variables
Int BallotNum, initially 0
Int PropNum, initially 0
Int AcceptNum, initially 0
Value ∪ {⊥} AcceptVal, initially ⊥
Message Set Proof, initially empty
Define: Leader = (BallotNum mod n)+1

Byzantine Paxos Phase I: Prepare
Upon timeout on Leader
    BallotNum ← BallotNum + 1
    send (“prepare”, BallotNum) to all
Upon receive (“prepare”, b) from t+1
    if (b < BallotNum) then return
    if (b > BallotNum) then
        BallotNum ← b
        send (“prepare”, BallotNum) to all
    send (“ack”, b, AcceptNum, AcceptVal, Proof) to Leader
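The "from t+1" condition is what keeps t faulty processes from forcing a new ballot on their own. A rough Python sketch of that handler follows; the class `PreparePhase`, the returned action tuples, and the `_acked` set (which limits each ballot to one "ack") are assumptions for illustration, and the "ack" payload (AcceptNum, AcceptVal, Proof) is omitted for brevity.

```python
class PreparePhase:
    """Counts "prepare" messages per ballot; joins once t+1 distinct senders agree."""

    def __init__(self, n, t):
        self.n, self.t = n, t
        self.ballot_num = 0
        self._senders = {}   # ballot -> set of distinct senders seen
        self._acked = set()  # ballots already acknowledged (assumption: ack once)

    def leader(self):
        # Rotating coordinator: Leader = (BallotNum mod n) + 1
        return (self.ballot_num % self.n) + 1

    def on_prepare(self, b, sender):
        """Return the list of messages to send, as (type, ballot, destination)."""
        self._senders.setdefault(b, set()).add(sender)
        if len(self._senders[b]) < self.t + 1:
            return []  # not yet backed by at least one correct process
        if b < self.ballot_num or b in self._acked:
            return []
        actions = []
        if b > self.ballot_num:
            self.ballot_num = b
            actions.append(("prepare", b, "all"))  # join and relay the new ballot
        self._acked.add(b)
        actions.append(("ack", b, self.leader()))
        return actions
```

With n = 4 and t = 1, a single sender repeating "prepare" for ballot 1 triggers nothing; a second distinct sender makes the process join ballot 1 and send its "ack" to process 2, the leader of that ballot.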

Byzantine Paxos Phase II: Propose
Upon receive (“ack”, BallotNum, b, val, proof) from n-t
    S = {received (signed) “ack” messages}
    if (all vals that have valid proofs in S are ⊥) then myVal ← init value
    else myVal ← val that has valid proof with highest b in S
    send (“propose”, BallotNum, myVal, S) to all
Upon receive (“propose”, BallotNum, v, S)
    if (BallotNum ≤ PropNum) then return
    if (v is not a valid choice given S) then return
    PropNum ← BallotNum
    send (“propose”, BallotNum, v, S) to all
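The leader's value-selection rule can be sketched as a small Python function. The name `choose_value`, the tuple encoding of an "ack", the use of `None` for ⊥, and the `proof_is_valid` callback are all assumptions; real proof checking would verify n-t signed "propose" messages, which is not modeled here.

```python
def choose_value(acks, init_value, proof_is_valid):
    """Pick myVal from n-t "ack" messages.

    acks: list of (accept_num, accept_val, proof); accept_val None stands for ⊥.
    proof_is_valid: hypothetical callback (accept_num, accept_val, proof) -> bool.
    """
    # Keep only non-⊥ values whose proof checks out; unproven claims are ignored.
    valid = [
        (num, val)
        for (num, val, proof) in acks
        if val is not None and proof_is_valid(num, val, proof)
    ]
    if not valid:
        return init_value  # nothing was provably accepted before
    # Otherwise adopt the value with the highest accepted ballot number.
    return max(valid, key=lambda pair: pair[0])[1]
```

A faulty process that reports a high AcceptNum without a valid proof (Problem 4) does not influence the result, because its entry is filtered out before the maximum is taken.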

Byzantine Paxos Phase III: Accept
Upon receive (“propose”, b, v, S) from n-t
    if (b < BallotNum) then return
    AcceptNum ← b; AcceptVal ← v
    Proof ← set of n-t signed “propose” messages
    send (“accept”, b, v) to all
Upon receive (“accept”, b, v) from n-t
    decide v
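Both handlers in this phase wait for n-t matching messages from distinct senders. A minimal Python sketch of that counting follows; the class `AcceptPhase` and its bookkeeping sets are assumptions, signatures and the Proof variable are not modeled, and the local BallotNum is left at its initial value.

```python
from collections import defaultdict


class AcceptPhase:
    """Quorum counting for Phase III: accept after n-t proposes, decide after n-t accepts."""

    def __init__(self, n, t):
        self.quorum = n - t
        self.ballot_num = 0
        self.accept_num = 0
        self.accept_val = None
        self.decision = None
        self._proposers = defaultdict(set)  # (b, v) -> distinct senders of "propose"
        self._acceptors = defaultdict(set)  # (b, v) -> distinct senders of "accept"
        self._accepted = set()              # (b, v) pairs already accepted locally

    def on_propose(self, b, v, sender):
        """Return the "accept" message to send to all, or None."""
        self._proposers[(b, v)].add(sender)
        if len(self._proposers[(b, v)]) < self.quorum:
            return None
        if b < self.ballot_num or (b, v) in self._accepted:
            return None
        self._accepted.add((b, v))
        self.accept_num, self.accept_val = b, v
        return ("accept", b, v)

    def on_accept(self, b, v, sender):
        """Return the decision once n-t distinct processes accepted (b, v)."""
        self._acceptors[(b, v)].add(sender)
        if self.decision is None and len(self._acceptors[(b, v)]) >= self.quorum:
            self.decision = v
        return self.decision
```

Using sets of sender ids means that a faulty process repeating the same message counts only once toward the quorum.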

In Failure-Free Runs
[Figure: message flow among processes 1, 2, …, n through the prepare, ack, propose, and accept rounds. Annotations: all send prepare; all echo propose]

Some Optimizations
Prepare and its “ack” can be merged into one message round
Proofs don’t have to be sent with messages: processes can have the information to check the proofs locally because the original messages are multicast

Invariant
If proposals (b, v) and (b, v’) are accepted by correct processes i and j (possibly i = j), then v’ = v
Proof:
  – An accepted proposal is proposed by n-t processes
  – Two sets of n-t = 2t+1 processes have at least one correct process in common
  – A correct process sends no more than one propose message with the same b

Lemma 1
If a proposal (b, v) is accepted by t+1 correct processes, then for every proposal (b’, v’) with b’ > b that is proposed by a correct process, v’ = v.
Again, follows from Lemma 2…

Lemma 2
If a proposal (b, v) is proposed by a correct process, then there is a set S including at least t+1 correct processes such that either
  – (1) no correct p in S accepts a proposal ranked less than b; or
  – (2) v is the value of the highest-ranked proposal among proposals ranked less than b accepted by correct processes in S

Proving Lemma 1 from Lemma 2
Assume (b, v) is accepted by t+1 correct processes, and consider the lowest-ranked proposal (b’, v’) with b’ > b proposed by a correct process
Since two sets of t+1 correct processes have at least one correct process in common, case (1) of Lemma 2 is impossible, and by case (2), v’ = v
Continue by induction on ballot number

Proving Agreement
Let v be a decided value. The first process that decides v receives n-t accept messages for v with some ballot b, i.e., (b, v) is accepted by at least t+1 correct processes
No other value is accepted by a correct process with the same b. Why?
Let (v₁, b₁) be the first proposal accepted by n-t
By Lemma 1, v₁ is the only possible decision value

Liveness
Is the current leader making progress?
  – If yes, some correct process decides. This process can periodically forward the “proof” for its decision to others so they will decide too.
  – If not, all time out on the leader and start a new ballot.
Once there is a correct leader:
  – The n-t correct processes will send all the needed messages.
  – The t faulty processes will not be able to force a new ballot.

Atomic Broadcast: Issues
Leader can propose invalid client requests
Leader can refrain from proposing client requests
Leader can lie to client about response
Leader can refrain from sending client responses
Solution: clients cannot trust a single server

Byzantine Message Flow
[Figure: client C sends a request; servers S1, S2, …, Sn go through the prepare, ack, propose, propose, and accept rounds; the response returns to C]