Sergio Rajsbaum 2006 Lecture 4 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico.


Lecture 4: Consensus in partially synchronous systems, and failure detectors
Part I: Realistic timing model and metric
Part II: Failure detectors, algorithms
Part III: This is the best possible
Part IV: New directions and extensions

CONSENSUS: A Fundamental Abstraction
Each process has an input and should decide an output such that:
Agreement: correct processes’ decisions are the same
Validity: the decision is the input of one process
Termination: eventually all correct processes decide
There are at least two possible input values, 0 and 1.
All possible vectors over the input values V
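The three properties can be read as predicates over one finished run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary encoding of a run are assumptions, not part of the lecture, and "eventually" is approximated by "by the end of the recorded run".

```python
from typing import Dict, Optional, Set


def check_consensus(inputs: Dict[int, int],
                    decisions: Dict[int, Optional[int]],
                    correct: Set[int]) -> Dict[str, bool]:
    """Check agreement, validity and termination on one recorded run.

    inputs:    process id -> input value
    decisions: process id -> decided value, or None if it never decided
    correct:   ids of the processes that did not crash
    """
    decided_by_correct = {decisions[p] for p in correct
                          if decisions.get(p) is not None}
    # Agreement: correct processes never decide two different values.
    agreement = len(decided_by_correct) <= 1
    # Validity: every decision is the input of some process.
    validity = all(d in inputs.values()
                   for d in decisions.values() if d is not None)
    # Termination (finite approximation): every correct process has decided.
    termination = all(decisions.get(p) is not None for p in correct)
    return {"agreement": agreement,
            "validity": validity,
            "termination": termination}
```

For example, a run where process 3 crashes before deciding still satisfies all three properties as long as processes 1 and 2 decide the same input value.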

The lecture in a nutshell
Consensus solvability depends on how long connectivity is preserved by a particular model
In the synchronous model it is solvable, in the asynchronous model it is not.
What about intermediate, more realistic models?
[Figure: initial states X_0, states after one round L(X_0), states after 2 rounds L^2(X_0); connectivity preserved vs. connectivity destroyed]

Basic Model
Message passing (essentially equivalent to the read/write shared memory model)
Channels between every pair of processes
Crash failures of up to t ≥ 1 processes
No message loss among correct processes

Is consensus solvable? If so, how long does it take to solve it?
It depends on what exactly the model is
But what is a realistic model? And what are the common scenarios within the model?
The nature of a distributed system is to include complex combinations of failures and delays

How Fast Can We Solve Consensus?
Depends on the timing model:
– Message delays
– Processing times
– Clocks
And on the metric used:
– Worst case
– Average
– etc.

The Rest of This Lecture
Part I: Realistic timing model and metric
Part II: Upper bounds
Part III: This is the best possible
Part IV: New directions and extensions

Part I: Realistic Timing Model

First two simple models

Asynchronous Model
Unbounded message delay, processor speed
Consensus impossible even for t=1 [FLP85]

Round Synchronous Model
Algorithm runs in synchronous rounds:
– send messages to any set of processes,
– receive messages from previous round,
– do local processing (possibly decide, halt)
If process i crashes in a round, then any subset of the messages i sends in this round can be lost

Synchronous Consensus
In a run with f failures (f<t)
– Processes can decide in f+1 rounds [Lamport Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)
– 1 round with no failures
In this talk we count rounds until deciding; halting takes min(f+2,t+1) rounds [Dolev, Reischuk, Strong 90]
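To make the round model concrete, here is a minimal simulation of a flooding algorithm that runs a fixed t+1 rounds. It is deliberately the simpler, non-early-deciding variant, not the f+1 early-deciding algorithms cited above. The function name `floodset`, the `crashes` encoding, and the "decide the minimum value seen" rule are all assumptions made for this sketch.

```python
from typing import Dict, Optional, Set, Tuple


def floodset(inputs: Dict[int, int], t: int,
             crashes: Optional[Dict[int, Tuple[int, Set[int]]]] = None
             ) -> Dict[int, int]:
    """Run t+1 synchronous rounds of flooding; return surviving decisions.

    crashes maps a process id to (round, receivers): the process crashes in
    that round and only `receivers` get the message it sends in that round.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in inputs.items()}   # values each process has seen
    alive = set(inputs)
    for rnd in range(1, t + 2):
        # Every live process sends the set of values it knows so far.
        outbox = {}
        for p in alive:
            crashing = p in crashes and crashes[p][0] == rnd
            receivers = set(crashes[p][1]) if crashing else set(inputs)
            outbox[p] = (set(known[p]), receivers)
        # Processes crashing in this round do not survive it.
        alive -= {p for p in alive if p in crashes and crashes[p][0] == rnd}
        # Deliver: a crashing sender reaches only a subset of the processes.
        for vals, receivers in outbox.values():
            for q in receivers & alive:
                known[q] |= vals
    # Deterministic decision rule (assumed here): smallest value seen.
    return {p: min(known[p]) for p in alive}
```

With at most t crashes, at least one of the t+1 rounds is crash-free, after which all surviving processes hold the same set of values. That is why this variant needs no failure detection at all.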

The Middle Ground
Many real networks are neither synchronous nor asynchronous
During long stable periods, delays and processing times are bounded
– Like synchronous model
Some unstable periods
– Like asynchronous model

Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]
Processes have clocks (with bounded drift)
Δ, upper bound on message delay
φ, upper bound on processing time
GST, global stabilization time
– Until GST, unstable: bounds do not hold
– After GST, stable: bounds hold
– GST unknown

Partial Synchrony in Practice
For the bounds Δ (message delay) and φ (processing time), choose values that hold with high probability
Stability forever?
– We assume that once stable, the system remains stable
– In practice, stability has to last “long enough” for the given algorithm to terminate
– A commonly used model that alternates between stable and unstable times: Timed Asynchronous Model [Cristian, Fetzer 98]

Consensus with Partial Synchrony
Solvability requires t < n/2 [DLS88]
Unbounded running time by [FLP85], because the model can be asynchronous for unbounded time
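One way to build intuition for the t < n/2 requirement: algorithms in this setting typically wait for messages from n-t processes, and two such sets are guaranteed to share a process exactly when 2(n-t) > n, that is, when t < n/2. The brute-force check below is a sanity check of that counting fact only, not a proof of the impossibility result. The function name is hypothetical.

```python
from itertools import combinations


def quorums_always_intersect(n: int, t: int) -> bool:
    """Brute force: do any two sets of n-t processes share a member?"""
    quorums = [set(q) for q in combinations(range(n), n - t)]
    # An empty intersection is falsy, so all(...) fails on disjoint quorums.
    return all(a & b for a in quorums for b in quorums)
```

For n = 4 and t = 2 the sets {0, 1} and {2, 3} are disjoint, so two halves of the system could each collect n-t messages without ever hearing from the other half.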

Exercise
Prove that consensus is not solvable in the partially synchronous model if t ≥ n/2
Prove that if t < n/2, it takes unbounded running time to be solved

In a Practical System
Can we say more than: consensus will be solved eventually?

Performance Metric
Number of rounds in well-behaved runs
Well-behaved:
– No failures
– Stable from the beginning
Motivation: common case

The Rest of This Lecture
Part II: best known algorithms decide in 2 rounds in well-behaved runs
– 2Δ time (with message delay bound Δ, 0 processing time)
Part III: this is the best possible
Part IV: new directions and extensions

Part II: Algorithms, and the Failure Detector Abstraction
II.a Failure Detectors and Partial Synchrony
II.b Algorithms

Time-Free Algorithms
Goal: abstract away time, get simpler algorithms
We describe the algorithms using the failure detector abstraction [Chandra, Toueg 96]

Unreliable Failure Detectors [Chandra, Toueg 96]
Each process has a local failure detector oracle
– Typically outputs a list of processes suspected to have crashed at any given time
Unreliable: failure detector output can be arbitrary for an unbounded (finite) prefix of the run

Performance of Failure Detector Based Consensus Algorithms
Implement a failure detector in the partial synchrony model
Design an algorithm for the failure detector
Analyze the performance in well-behaved runs of the combined algorithm

A Natural Failure Detector Implementation in Partial Synchrony Model
Implement failure detector using timeouts:
– When expecting a message from a process i, wait Δ + φ + clock skew (message delay bound plus processing time bound plus skew) before suspecting i
In well-behaved runs, the bounds Δ and φ always hold, hence no false suspicions
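A minimal sketch of such a timeout-based detector follows. The class and method names are hypothetical. `delta`, `phi` and `skew` are assumed names for the message delay bound, the processing time bound and the clock skew. Time is passed in explicitly so the sketch stays deterministic.

```python
class TimeoutDetector:
    """Suspect a process once an expected message is overdue."""

    def __init__(self, delta: float, phi: float, skew: float):
        # Wait for the delay bound, the processing bound and the clock skew.
        self.timeout = delta + phi + skew
        self.deadline = {}      # process id -> time after which we suspect it
        self.suspected = set()

    def expect(self, p: int, now: float) -> None:
        """Start waiting for a message from p."""
        self.deadline[p] = now + self.timeout

    def on_message(self, p: int, now: float) -> None:
        """A message from p arrived; a late message revokes the suspicion."""
        self.deadline.pop(p, None)
        self.suspected.discard(p)

    def suspects(self, now: float) -> set:
        """Return the current list of suspected processes."""
        self.suspected |= {p for p, d in self.deadline.items() if now > d}
        return set(self.suspected)
```

Before the bounds hold, a slow but correct process can be suspected and later un-suspected. Once the bounds hold, every expected message from a correct process arrives before its deadline, so false suspicions stop.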

The resulting failure detector is <>P - Eventually Perfect
Strong Completeness: From some point on, every faulty process is suspected by every correct process
Eventual Strong Accuracy: From some point on, no correct process is suspected

Weakest Failure Detectors for Consensus
<>S - Eventually Strong
– Strong Completeness
– Eventual Weak Accuracy: From some point on, some correct process is not suspected
Ω - Leader
– Outputs one trusted process
– From some point on, all correct processes trust the same correct process

A Simple Ω Implementation
Use the <>P implementation
Output the lowest-id non-suspected process
In well-behaved runs: process 1 always trusted
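The "lowest id non-suspected" rule is short enough to state as code. This is one possible shape only: the function name `omega_leader` is hypothetical, and the `suspected` set is assumed to come from some eventually perfect detector.

```python
from typing import Iterable, Optional, Set


def omega_leader(processes: Iterable[int], suspected: Set[int]) -> Optional[int]:
    """Trust the lowest-id process that is not currently suspected."""
    trusted = [p for p in sorted(processes) if p not in suspected]
    return trusted[0] if trusted else None
```

Once the underlying detector stops making mistakes, every correct process computes the same suspected set, so all of them output the same lowest-id correct process.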

Exercise
Write the algorithm code for this failure detector Ω and prove it is correct

Relationships among Failure Detector Classes
<>P is a subset of <>S (every <>P detector also satisfies the <>S properties)
<>S is strictly weaker than <>P
<>S ~ Ω [Chandra, Hadzilacos, Toueg 96]
Food for thought: What is the weakest timing model where <>S and/or Ω are implementable but <>P is not?

Relationships among Failure Detector Classes - Recent Results
Partial answer: In PODC’03, Aguilera et al. present a system S with synchronous processes:
– any number of them may crash, and
– only the output links of an unknown correct process are eventually timely (all other links can be asynchronous and/or lossy)
<>P is not implementable in S, but Ω is
New proof that <>S is strictly weaker than <>P

Note on the Power of Consensus
Consensus cannot implement <>P, interactive consistency, atomic commit, …
So its “universality”, in the sense of
– wait-free objects in shared memory [Herlihy 93]
– state machine replication [Lamport 78; Schneider 90]
does not cover sensitivity to failures, timing, etc.

Other Failure Detector Implementations
Food for thought: When is building <>P more costly than <>S or Ω?
Partial answer: Aguilera et al. (PODC’03) observe
– any implementation of <>P (even in a perfectly synchronous system) requires all alive processes to send messages forever, while Ω can be implemented such that eventually only the leader sends messages

Other Failure Detector Implementations
Message-efficient <>S implementation [Larrea, Fernández, Arévalo 00]
QoS tradeoffs between accuracy and completeness [Chen, Toueg, Aguilera 00]
Leader Election [Aguilera, Delporte, Fauconnier, Toueg 01]
Adaptive <>P [Fetzer, Raynal, Tronel 01]

Part II: Algorithms, and the Failure Detector Abstraction
II.a Failure Detectors and Partial Synchrony
II.b Algorithms

Algorithms that Take 2 Rounds in Well-Behaved Runs
<>S-based [Schiper 97; Hurfin, Raynal 99; Mostefaoui, Raynal 99]
Ω-based for t < n/3 [Mostefaoui, Raynal 00]
Ω-based for t < n/2 [Dutta, Guerraoui 01]
Paxos (optimized version) [Lamport 89; 96]
– Leader-based (Ω)
– Also tolerates omissions, crash recoveries
COReL - Atomic Broadcast [Keidar, Dolev 96]
– Group membership based (<>P)

Of This Laundry List, We Present Two Algorithms
1. <>S-based [MR99]
2. Paxos

<>S-based Consensus [MR99]
val ← input v; est ← null
for r = 1, 2, … do
  coord ← (r mod n)+1
  if I am coord, then send (r, val) to all
  wait for ((r, val) from coord OR suspect coord (by <>S))
  if receive val from coord then est ← val else est ← null
  send (r, est) to all
  wait for (r, est) from n-t processes
  if any non-null est received then val ← est
  if all ests have same v then send (“decide”, v) to all; return(v)
od
Upon receive (“decide”, v), forward to all, return(v)
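The two communication steps of one round can be traced with a small lock-step simulation. This is a sketch under stated assumptions, not the published algorithm's code. It models no crashes. `suspects_coord` is the assumed set of processes whose detector (wrongly or rightly) suspects the coordinator. `heard_from[p]` is the assumed set of n-t processes whose step-2 message p waits for. The coordinator formula is copied from the pseudocode.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def mr99_round(r: int, val: Dict[int, Any], suspects_coord: Set[int],
               heard_from: Dict[int, Iterable[int]], t: int
               ) -> Tuple[Dict[int, Any], Dict[int, Any]]:
    """Simulate one round; return (updated val per process, decisions)."""
    n = len(val)
    coord = (r % n) + 1
    # Step 1: a process adopts the coordinator's value unless it suspects it.
    est = {p: (None if p in suspects_coord else val[coord]) for p in val}
    new_val, decided = dict(val), {}
    # Step 2: everyone broadcasts est and collects n-t of them.
    for p in val:
        received = [est[q] for q in heard_from[p]]
        assert len(received) >= n - t
        non_null = [e for e in received if e is not None]
        if non_null:
            new_val[p] = non_null[0]      # any non-null est: adopt it
        if len(non_null) == len(received):
            decided[p] = non_null[0]      # all ests equal and non-null: decide
    return new_val, decided
```

In a run where one process suspects the coordinator, the others can still decide. Because any two sets of n-t processes intersect when t < n/2, the non-deciding process sees at least one non-null est and adopts the decided value for the next round.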

In Well-Behaved Runs
[Figure: message diagram for processes 1, 2, ..., n: (1, v_1) sent in step 1, est = v_1 sent in step 2, then decide v_1]

In Case of Omissions
The algorithm can block in case of transient message omissions, waiting for a specific round message that will not arrive

Paxos [Lamport 88; 96; 01]
Uses the Ω failure detector
Phase 1: prepare
– A process that trusts itself tries to become leader
– Chooses largest unique (using ids) ballot number
– Learns outcome of all smaller ballots
Phase 2: accept
– Leader proposes a value with its ballot number
– Leader gets a majority to accept its proposal
– A value accepted by a majority can be decided

Paxos - Variables
Type Rank
– totally ordered set with minimum element r_0
Variables:
Rank BallotNum, initially r_0
Rank AcceptNum, initially r_0
Value ∪ {⊥} AcceptVal, initially ⊥

Paxos Phase I: Prepare
Periodically, until decision is reached do:
  if leader (by Ω) then
    BallotNum ← (unique rank > BallotNum)
    send (“prepare”, rank) to all
Upon receive (“prepare”, rank) from i
  if rank > BallotNum then
    BallotNum ← rank
    send (“ack”, rank, AcceptNum, AcceptVal) to i
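The slides leave the Rank type abstract. A common way to obtain a "unique rank > BallotNum" is a (counter, process id) pair compared lexicographically. That encoding is an assumption of this sketch, as are the names `R0` and `next_rank`.

```python
from typing import Tuple

Rank = Tuple[int, int]   # (counter, process id), compared lexicographically
R0: Rank = (0, 0)        # minimum element r0; real process ids start at 1


def next_rank(ballot_num: Rank, pid: int) -> Rank:
    """Return a rank above ballot_num that no other process can produce."""
    counter, _ = ballot_num
    # A larger counter makes it greater; the id component makes it unique.
    return (counter + 1, pid)
```

Python tuples already compare lexicographically, so `next_rank(b, pid) > b` holds for every `b`, and two different processes can never generate the same rank.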

Paxos Phase II: Accept
Upon receive (“ack”, BallotNum, b, val) from n-t
  if all vals = ⊥ then myVal = initial value
  else myVal = received val with highest b
  send (“accept”, BallotNum, myVal) to all /* proposal */
Upon receive (“accept”, b, v) with b ≥ BallotNum
  AcceptNum ← b; AcceptVal ← v /* accept proposal */
  send (“accept”, b, v) to all (first time only)

Paxos – Deciding
Upon receive (“accept”, b, v) from n-t
  decide v
  periodically send (“decide”, v) to all
Upon receive (“decide”, v)
  decide v
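The three Paxos slides can be tied together in a small lock-step simulation of a single attempt by one leader, with no message loss. It is a sketch under assumptions: ranks are (counter, id) tuples, `None` stands in for the "no value" marker, and `Process` and `run_once` are hypothetical names. Retries, periodic sending and the "decide" messages are omitted.

```python
from typing import Any, Dict, List, Optional, Tuple

Rank = Tuple[int, int]          # assumed encoding: (counter, process id)
R0: Rank = (0, 0)               # minimum rank r0


class Process:
    def __init__(self, pid: int, initial: Any):
        self.pid, self.initial = pid, initial
        self.ballot_num: Rank = R0
        self.accept_num: Rank = R0
        self.accept_val: Optional[Any] = None   # None: nothing accepted yet

    def on_prepare(self, rank: Rank):
        """Phase I: ack only ranks above the highest one seen so far."""
        if rank > self.ballot_num:
            self.ballot_num = rank
            return (rank, self.accept_num, self.accept_val)
        return None

    def on_accept(self, b: Rank, v: Any) -> bool:
        """Phase II: accept a proposal unless a higher rank was promised."""
        if b >= self.ballot_num:
            self.accept_num, self.accept_val = b, v
            return True
        return False


def run_once(procs: List[Process], leader: Process, t: int) -> Dict[int, Any]:
    """One leader attempt; returns decisions, or {} if the attempt fails."""
    n = len(procs)
    rank = (leader.ballot_num[0] + 1, leader.pid)
    leader.ballot_num = rank    # as in the slide, so the leader does not ack itself
    acks = [a for a in (p.on_prepare(rank) for p in procs) if a is not None]
    if len(acks) < n - t:
        return {}               # a real leader would retry with a higher rank
    accepted = [(b, v) for (_, b, v) in acks if v is not None]
    # Propose the value accepted under the highest rank, else the own input.
    my_val = max(accepted, key=lambda bv: bv[0])[1] if accepted else leader.initial
    accepts = sum(1 for p in procs if p.on_accept(rank, my_val))
    if accepts < n - t:
        return {}
    return {p.pid: my_val for p in procs}
```

The key safety step is the choice of `my_val`: a new leader must re-propose any value that may already have been accepted by n-t processes, which it learns from the acks.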

In Well-Behaved Runs
[Figure: message diagram for processes 1, 2, ..., n: (“prepare”,1) from process 1, (“ack”,1,r_0,⊥) back to process 1, (“accept”,1,v_1) from process 1 to all, (“accept”,1,v_1) from all to all, then decide v_1]
Our Ω implementation always trusts process 1

Optimization
Allow process 1 (only!) to skip Phase 1
– use rank r_0
– propose its own initial value
Takes 2 rounds in well-behaved runs
Takes 2 rounds for repeated invocations with the same leader

What About Message Loss?
Does not block in case of a lost message
– Phase I can start with a new rank even if previous attempts never ended
But constant omissions can violate liveness
Specify conditional liveness: If n-t correct processes, including the leader, can communicate with each other, then they eventually decide

Synchronous Consensus
In a run with f failures (f<t)
– Processes can decide in f+1 rounds
– And no less! [Lamport Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)
– 1 round with no failures
In this talk we count rounds until deciding; halting takes min(f+2,t+1) rounds [Dolev, Reischuk, Strong 90]

Uniform Consensus
Uniform agreement: the decisions of every two processes are the same
Recall: with consensus, only correct processes have to agree (disagreement with the dead is OK)
This version of consensus will be useful to extend the lower bound argument to asynchronous models

Synchronous Uniform Consensus
Every algorithm has a run with f failures (f<t-1) that takes at least f+2 rounds to decide [Charron-Bost, Schiper 00; KR 01]
– as opposed to f+1 for consensus

A Simple Proof of the Uniform Consensus Synchronous Lower Bound [Keidar, Rajsbaum IPL 02]

Theorem: f+2 Lower Bound
Assume n>t, and f < t-1
L^f(X_0) - final states of runs with ≤ f failures
– connected
– in any state in L^f(X_0) there exist at least 3 non-failed processes, and 2 can fail
Take z, z’ ∈ X_0 s.t. val(z) ≠ val(z’),
– let x, x’ be failure-free extensions of z, z’: x = z.(i,[0])^f ∈ L^f(X_0)

Exercise
1. Modify the theorem and the proof of this talk for the consensus problem (instead of the uniform consensus problem)

Upper Bounds From Part II
We saw that there are algorithms that take 2 rounds to decide in well-behaved runs
<>S-based, Ω-based, Paxos, COReL
Presented two of them.

Why are there no 1-Round Algorithms?
There is a lower bound of 2 rounds in well-behaved executions
– Similar bounds shown in [Dwork, Skeen 83; Lamport 00]
We will show that the bound follows from a similar bound on Uniform Consensus in the synchronous model

Uniform Consensus
Uniform agreement: the decisions of every two processes are the same
Recall: with consensus, only correct processes have to agree

From Consensus to Uniform Consensus
In the partial synchrony model, any algorithm A for consensus solves uniform consensus [Guerraoui 95]
Proof: Assume by contradiction that A does not solve uniform consensus
– in some run, p and q decide differently, and p fails
– but p may instead be non-faulty and merely slow, waking up after q decides, in which case two correct processes disagree

Synchronous Uniform Consensus
Every algorithm has a well-behaved run that takes 2 rounds to decide
More generally, it has a run with f failures (f<t-1) that takes at least f+2 rounds to decide [Charron-Bost, Schiper 00; KR 01]
– as opposed to f+1 for consensus

Bibliography
Keidar and Rajsbaum, “A Simple Proof of the Uniform Consensus Synchronous Lower Bound,” in IPL, Vol. 85, pp ,
Keidar and Rajsbaum, “On the Cost of Fault-Tolerant Consensus When There Are No Faults” in Keidar’s page, including slides and papers.
Moses, Rajsbaum, “A Layered Analysis of Consensus,” SIAM J. Comput. 31(4): ,
Mostéfaoui, Rajsbaum, Raynal, “Conditions on input vectors for consensus solvability in asynchronous distributed systems,” J. ACM, 2003
