The weakest failure detector question in distributed computing Petr Kouznetsov Distributed Programming Lab EPFL.

2 Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit, boosting consensus power of atomic objects)

3 Centralized computing Clients Centralized computing unit

4 Distributed computing Clients Distributed computing unit

5 Redundancy and synchronization
The distributed implementation should create the illusion of a centralized one: the components (processes) must be synchronized in a consistent way.

6 Consensus
Processes propose values and must agree on a common value in a non-trivial manner:
- Agreement: no two correct processes decide differently
- Validity: every decided value is a proposed value
- Termination: every correct process eventually decides
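The three properties above can be checked mechanically on a finished run. A minimal sketch in Python (the run representation, with `proposals` and `decisions` dicts, is illustrative and not from the slides):

```python
def check_consensus(proposals, decisions, correct):
    """Check Agreement, Validity and Termination on a finished run.

    proposals: process -> proposed value (every process proposes)
    decisions: process -> decided value (crashed processes may be absent)
    correct:   the set of processes that did not crash
    """
    # Agreement: no two processes decide differently.
    agreement = len(set(decisions.values())) <= 1
    # Validity: every decided value was proposed by someone.
    validity = all(v in proposals.values() for v in decisions.values())
    # Termination: every correct process eventually decides.
    termination = all(p in decisions for p in correct)
    return agreement and validity and termination
```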

7 Ideal computing
The consistency and progress of the implementation are preserved even if:
- Processes can fail by crashing
- The system is asynchronous: communication is not bounded, and processing is not bounded (there is no bound Δ such that, by taking Δ local steps, a process can surely “hear” from every correct process)

8 FLP impossibility Consensus is impossible in an asynchronous system if at least one process might crash. [Fischer, Lynch and Paterson, 1985]

9 Adding (some) synchrony Consensus is impossible in a system with asynchronous processing or asynchronous communication if at least one process might crash. [Dolev, Dwork, Stockmeyer, 1987] (… in a shared memory system [Loui, Abu-Amara, 1987])

10 Why?
It is impossible to distinguish a crashed process from a “sleeping” one, no matter how many steps you take.

11 Adding partial synchrony
Assume that in every execution there is an upper bound on the time to execute a processing step and to communicate a message. Then consensus is solvable if a majority of processes are correct. (If communication is synchronous and processing is partially synchronous, then consensus is solvable for any number of failures.) [Dwork, Lynch, Stockmeyer, 1988]

12 Adding less synchrony
Assume we (eventually) have a leader, i.e., eventually all processes that take “enough” steps will “hear” from some correct process.

13 Eventual leader abstraction Ω
- At every process, Ω outputs a process identifier.
- Eventually, the same correct process id is output at all processes.
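A standard way to obtain an Ω output from an eventually accurate suspicion list (a sketch, not from the slides) is to trust the smallest process id that is not currently suspected; once suspicions stabilize, all processes trust the same correct process:

```python
def omega_output(n, suspected):
    """Trust the smallest-id process not currently suspected.

    n: number of processes (ids 1..n); suspected: set of suspected ids.
    If the suspicion list eventually contains exactly the crashed
    processes, this output eventually stabilizes on one correct process.
    """
    for p in range(1, n + 1):
        if p not in suspected:
            return p
    return None  # everyone suspected: no leader can be chosen
```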

14 Ω is sufficient for consensus! Consensus is solvable in an asynchronous system equipped with Ω, where a majority of processes are correct. [Lam90,CT91] (If communication is synchronous, then consensus is solvable for any number of failures.) [DLS88,LH94]

15 The question What is the smallest amount of synchrony that must be introduced into the asynchronous system to solve an unsolvable problem?

16 Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)

17 General system model
Processes p1,…,pn communicate through reliable message-passing channels. (*) In addition, every process can query its failure detector module, which produces some (possibly incomplete and inaccurate) information about failures.
(*) Later we also consider registers and atomic objects of a given power.

18 Failure detector modules
[figure: processes p, q and r, each equipped with a failure detector module FD]

19 Failure detectors
[figure: process p queries its failure detector module, which returns information on failures, e.g., fail(q)]
- The information output to the processes depends only on the failures

20 Example: perfect failure detector P
At each process, P outputs a set of suspected process identifiers.
- Eventually, every crashed process is suspected
- No process is suspected before it crashes
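The two properties of P can be checked on a recorded trace of its outputs at one process; a small sketch (the trace format is an illustrative choice):

```python
def check_perfect_fd(trace, crash_times):
    """Check P's properties on a finished trace at one process.

    trace: chronological list of (time, suspected_set)
    crash_times: process -> time at which it crashed
    """
    # Strong accuracy: no process is suspected before it crashes.
    accuracy = all(crash_times.get(q, float('inf')) <= t
                   for t, suspected in trace
                   for q in suspected)
    # Strong completeness: every crashed process is eventually
    # (here: in the final output) suspected.
    completeness = all(q in trace[-1][1] for q in crash_times)
    return accuracy and completeness
```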

21 Example: failure signal failure detector FS
At each process, FS outputs green or red.
- If red is output, then a failure previously occurred.
- If a failure occurs, then eventually red is output at all correct processes.

22 Environments
An environment E specifies when and where failures might occur. Examples:
- A majority of processes are correct
- At most one process crashes

23 Failure detector reductions
Failure detector D is weaker than failure detector D’ if D can be extracted from D’, i.e., there exists an algorithm that emulates D using D’.
[figure: each process runs the extraction algorithm, querying D’ to produce the outputs of D]

24 The weakest failure detector
D is the weakest failure detector to solve problem M in an environment E if and only if:
- D is sufficient for M in E: D can be used to solve M in E
- D is necessary for M in E: D is weaker than any failure detector D’ that can be used to solve M in E

25 The question Given a problem M and an environment E, what is the weakest failure detector for solving M in E?

26 Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)

27 The CHT result
- The CHT Theorem: If a failure detector D implements consensus, then D implements Ω
- Corollary: Ω is the weakest failure detector for consensus with a majority of correct processes [Chandra, Hadzilacos and Toueg, 1996]

28 The CHT (96) proof
Assume D implements consensus, i.e., there is some algorithm A that uses D to implement consensus. We build an algorithm T that uses A to implement Ω.
NB: Implementing Ω means that every process trusts some process, and eventually all correct processes permanently trust the same correct process.

29 Algorithm T in 5 acts (1) The exchange (2) The simulation (3) The tagging (4) The stabilization (5) The extraction

30 (1) The Exchange
- Every process periodically queries its failure detector module (D) and sends all outputs it has seen to all processes
- A process builds a growing DAG using the outputs provided by the other processes
- A vertex of the DAG is a pair (process, failure detector value)
- An edge (p1,d1) -> (p2,d2) means that p1 saw d1 before p2 saw d2
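The DAG growth can be sketched as follows (a toy in-memory version; in the actual algorithm the vertices and edges are exchanged in messages between processes):

```python
def extend_dag(vertices, edges, new_vertex):
    """Add a (process, fd_value) vertex to the sample DAG.

    An edge u -> v records that sample u was seen before sample v,
    so the new vertex gets an incoming edge from every existing one.
    """
    for v in vertices:
        edges.add((v, new_vertex))
    vertices.append(new_vertex)
```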

31 (1) The Exchange
[figure: processes p1 and p2 sample values d1–d4 and exchange them, growing identical DAGs over the vertices (p1,d1), (p2,d2), (p1,d3), (p2,d4)]

32 (2) The Simulation
Every process pi uses its DAG to simulate runs of A in the system, i.e., every process locally plays the role of all the other processes. Whenever pi updates its DAG, pi triggers runs of A for:
- All paths in the DAG
- All input vectors I0, I1, …, In, where Ij makes processes p1,…,pj propose 1 and the rest propose 0
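The family of input vectors I0,…,In can be generated directly from the description above (a sketch; the slide's 1-based process indexing is mapped to list positions):

```python
def input_vectors(n):
    """I_j: processes p1..pj propose 1, the rest propose 0.

    I_0 is the all-0 vector and I_n is the all-1 vector, which is
    what makes them 0-valent and 1-valent respectively (next slides).
    """
    return [[1] * j + [0] * (n - j) for j in range(n + 1)]
```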

33 (2) The Simulation
[figure: simulated runs of A for the input vectors I0, I1, I2, some deciding 1 and some deciding 0]

34 (3) The Tagging
Periodically, every process pi looks at the results of the simulations, i.e., the decisions of the simulated runs of A. For every vector Ij, pi gathers all decisions and tags Ij:
- 0-valent if only 0 is decided starting from Ij
- 1-valent if only 1 is decided starting from Ij
- bivalent if both 0 and 1 can be decided (in different simulated runs)

35 (3) The Tagging
Notice that:
- by validity of consensus, I0 is 0-valent and In is 1-valent
- a 0-valent or 1-valent input vector can only become bivalent
- a bivalent input vector stays bivalent forever

36 (3) The Tagging
There is some index k in the sequence I1, …, In such that Ik-1 is 0-valent and Ik is not: k is called the critical index. If Ik is 1-valent, then pi trusts pk. (We do not consider here the more complicated case where Ik is bivalent.)
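Locating the critical index from the tags is a one-pass scan; a sketch (tags encoded as strings, an illustrative choice):

```python
def critical_index(tags):
    """Smallest k with tags[k-1] == '0' and tags[k] != '0'.

    tags[j] is the valence of I_j: '0', '1' or 'bi'.  Since I_0 is
    0-valent and I_n is 1-valent, such a k always exists.
    """
    for k in range(1, len(tags)):
        if tags[k - 1] == '0' and tags[k] != '0':
            return k
    raise ValueError("no critical index: tags violate the valence facts")
```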

37 (4) The Stabilization
Eventually, the critical index at a given process no longer changes: the index can only decrease, and it cannot go below 1. All DAGs converge to the same infinite DAG, and the same critical index k is eventually computed at all processes.

38 (5) The Extraction
Assume that k is such that Ik-1 is 0-valent and Ik is 1-valent. Thus, eventually, all correct processes permanently trust pk.
Claim: pk is correct

39 (5) The Extraction
Proof (by contradiction): Assume pk is faulty. Then there is a simulated run r of A starting from Ik in which pk takes no steps. Since Ik-1 and Ik differ only in the input value of pk, pi cannot distinguish r from a run starting from Ik-1. But Ik-1 is 0-valent and Ik is 1-valent, a contradiction.

40 (5) The Extraction
Assume now that k is such that Ik-1 is 0-valent and Ik is bivalent.
Claim: there exists an algorithm that eventually deduces a correct process from the simulated runs starting from Ik

41 Finally
Eventually, all correct processes trust the same correct process: Ω is emulated!

42 Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems

43 Problem: implementing a register
A register is an object accessed through reads and writes:
- write(v) stores v in the register and returns ok
- read returns the last value written to the register
NB: In an asynchronous system, a register can be implemented if and only if a majority of processes are correct [ABD95].

44 Quorum failure detector Σ
At each process, Σ outputs a set of processes.
- Any two sets (output at any times and at any processes) intersect.
- Eventually, every set contains only correct processes.
NB: Given a majority of correct processes, Σ can be implemented in an asynchronous system.
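Σ's two properties can likewise be phrased as a check over a recorded history of its outputs (a sketch; the history format is illustrative):

```python
from itertools import combinations

def check_sigma(history, correct):
    """Check Sigma's properties on a finished history of outputs.

    history: chronological list of output sets (from any processes)
    correct: the set of correct processes
    """
    # Intersection: any two output sets intersect.
    intersection = all(a & b for a, b in combinations(history, 2))
    # Completeness: the final output contains only correct processes.
    completeness = history[-1] <= correct
    return intersection and completeness
```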

45 Σ is sufficient to implement registers
Adapt the majority-based algorithm of [ABD95] to implement a (1-reader, 1-writer) atomic register using Σ: substitute “process p waits until a majority of processes reply” with “process p waits until all processes in Σ reply”.

46 Σ is necessary to implement registers
Let A be any implementation of registers that uses some failure detector D. We must show that we can extract Σ from D.
- Each write operation involves a set of “participants”: the processes that help the operation take effect (w.r.t. A and D)
Claim: the set of participants includes at least one correct process

47 Extraction algorithm
Every process p periodically:
- writes in its register the participant sets of its previous writes
- reads the participant sets of other processes
- outputs the participant set of its previous write and, for every known participant set S, one live process in S
All output sets intersect and eventually contain only correct processes.

48 Emulating Σ: the reduction algorithm
Let Pi(k) be the set of participants in the k-th write operation by process i.
Round k:
  Ei := {Pi(j)} for j ≤ k
  write(Ei) to register Ri
  Ei := Ei ∪ Pi(k)
  send (k,?) to all
  for every j = 1,…,n: read register Rj
  wait until received (k,ack) from at least one process in every known participant set S
  current output of Σ := the set of all processes from which (k,ack) was received, plus Pi(k-1)

49 Emulating Σ: the proof intuition
- For any round k, process i stores all Pi(k’), k’ < k, in Ri and includes Pi(k-1) in its emulated set Σi. Hence any process j that reads Ri afterwards will include at least one process from Pi(k-1) in its emulated set Σj, so every two emulated sets intersect.
- Eventually, only correct processes send acks, so eventually the emulated set includes only correct processes.

50 Registers: the weakest failure detector Σ is the weakest failure detector to implement atomic registers, in any environment

51 Consensus ≡ registers + Ω
- Ω can be used to solve consensus with registers, in any environment [LH94]
- Consensus => registers: any consensus algorithm can be used to implement registers, in any environment [Lam86, Sch90]
- Consensus => Ω: Ω can be extracted from any failure detector D that solves consensus, in any environment [CHT96]

52 Consensus: the weakest failure detector
- Consensus ≡ registers + Ω (in any environment)
- Σ is the weakest FD to implement registers (in any environment)
Thus, (Ω, Σ) is the weakest failure detector to solve consensus, in any environment.

53 Problem: quittable consensus (QC)
QC is like consensus, except that if a failure occurs, processes may agree either on one of the proposed values (as in consensus) or on the special value Q (“quit”).

54 Quittable consensus (QC)
propose(v), with v in {0,1}, returns a value in {0,1,Q} (Q stands for “quit”).
- Agreement: no two processes return different values
- Termination: every correct process eventually returns a value
- Validity: only a value v in {0,1,Q} can be returned; if v in {0,1}, then some process previously proposed v; if v = Q, then a failure previously occurred

55 Failure detector Ψ
- For some initial period of time, Ψ outputs the predefined value Ø
- Eventually, Ψ behaves like (Ω,Σ), or (only if a failure occurs) Ψ behaves like FS (outputs red)
NB: If a failure occurs, Ψ can choose to behave like (Ω,Σ) or like FS (the choice is the same at all processes).

56 Ψ is sufficient to solve QC
Propose(v)  // v in {0,1}
  wait until Ψ ≠ Ø
  if Ψ = red then return Q   // Ψ behaves like FS
  d := ConsPropose(v)        // Ψ behaves like (Ω,Σ): run a consensus algorithm
  return d
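The slide's protocol translates almost line for line into Python (a sketch; `psi` and `cons_propose` are stand-in callables for the failure detector query and a (Ω,Σ)-based consensus routine):

```python
def qc_propose(v, psi, cons_propose):
    """Quittable consensus using Psi, following the slide.

    psi(): returns None during the initial period, then either 'red'
    (Psi behaves like FS) or an (Omega, Sigma)-style output.
    """
    out = psi()
    while out is None:          # wait until Psi leaves its initial output
        out = psi()
    if out == 'red':            # a failure occurred: safe to quit
        return 'Q'
    return cons_propose(v)      # otherwise run a consensus algorithm
```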

57 Ψ is necessary to solve QC
Let A be a QC algorithm that uses a failure detector D. We must show that we can extract Ψ from A and D.

58 Simulating runs of A
Every process periodically samples D and exchanges its FD samples with the other processes. Using these samples, the process locally simulates runs of A [CHT96].
[figure: processes p, q and r each sample D and locally simulate A]

59 Extracting Ψ
Each process pi runs the simulation until, for every j = 1,…,n, there is a simulated run starting from Ij in which pi decides.
- If pi decides Q in one of the simulated runs: propose 0 to QC. Otherwise, propose 1 to QC.
- If QC decides 0 or Q: output red. Otherwise, it is possible to output (Ω,Σ).

60 Extracting (Ω,Σ)
If there are “enough” simulated runs of A in which non-Q values are decided, then it is possible to extract (Ω,Σ).
- Extracting Ω: as in CHT, locating a critical index, etc. (by construction, a critical index exists)
- Extracting Σ: a novel technique

61 QC: the weakest failure detector Ψ is the weakest failure detector to solve QC, in any environment

62 Problem: NBAC
A set of processes need to agree on whether to commit or to abort a transaction. Initially, each process votes Yes (“I want to commit”) or No (“We must abort”). Eventually, the processes must reach a common decision (Commit or Abort).

63 Problem: NBAC
- Agreement: no two processes return different values
- Termination: every correct process eventually returns a value
- Validity: a value in {Commit, Abort} is returned; if Commit is returned, then every process voted Yes; if Abort is returned, then some process voted No or a failure previously occurred

64 NBAC ≡ QC + FS
- NBAC => QC: any algorithm for NBAC can be used to solve QC
- NBAC => FS: any algorithm for NBAC can be used to extract FS
- QC + FS => NBAC: given (a) any algorithm for QC and (b) FS, we can solve NBAC

65 (QC, FS) => NBAC
Given (a) any algorithm for QC and (b) FS, we can solve NBAC:
  send vote v to all
  wait until received all votes or FS outputs red   // all votes received or a failure occurs
  if all votes are received and are Yes
    then proposal := 1   // propose to commit
    else proposal := 0   // propose to abort
  if QC.Propose(proposal) returns 1
    then return Commit
    else return Abort
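A sketch of the decision logic at one process once the wait has ended (the `qc_propose` callable stands in for a Ψ-based QC implementation; note that a QC decision of Q also maps to Abort):

```python
def nbac_decide(votes, fs_red, qc_propose):
    """NBAC from QC + FS, following the slide above.

    votes: the votes received so far (True = Yes); fs_red: whether FS
    output red before all votes arrived.  The caller has already
    waited until all votes arrived or FS turned red.
    """
    all_yes = (not fs_red) and all(votes)   # all votes received and Yes
    proposal = 1 if all_yes else 0          # propose commit or abort
    return 'Commit' if qc_propose(proposal) == 1 else 'Abort'
```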

66 NBAC: the weakest failure detector r NBAC  QC + FS (in any environment) r Ψ is the weakest FD to solve QC (in any environment) Thus, (Ψ,FS) is the weakest failure detector to solve NBAC, in any environment

67 Problem: boosting consensus power
Assume that processes communicate through atomic (wait-free linearizable) objects. An object type specifies the interface of the object:
- The set of states
- The set of operations
- The set of possible state transitions

68 Problem: boosting consensus power
The consensus power [Herlihy, 1991] of an object type T is the maximum number of processes that can solve consensus using atomic objects of type T and registers:
- cons(Register) = 1
- cons(T&S) = 2
- cons(C&S) = infinity
By definition, given a type T with consensus power n, n+1 processes cannot solve consensus using objects of type T and registers.
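The claim cons(T&S) = 2 has a constructive half: two processes can solve consensus with one test&set object and registers. A minimal sequential sketch in the style of Herlihy's construction (plain Python objects only model atomicity here, they do not provide it):

```python
class TestAndSet:
    """One-shot test&set: the first caller gets 0, later callers get 1."""
    def __init__(self):
        self.flag = 0
    def test_and_set(self):
        old, self.flag = self.flag, 1
        return old

def ts_consensus(pid, value, ts, announce):
    """2-process consensus (pids 0 and 1) from one T&S and registers.

    Each process announces its value in a register, then races on the
    T&S object: the winner decides its own value, the loser adopts
    the winner's announced value.
    """
    announce[pid] = value        # announce via a shared register
    if ts.test_and_set() == 0:   # won the race
        return value
    return announce[1 - pid]     # lost: adopt the other value
```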

69 Problem: boosting consensus power
- n + 1 processes
- Registers
- Shared objects of type T with cons(T) = n
What is the weakest failure detector D to solve consensus?

70 Neiger’s conjecture [Nei95]
Ω(k) outputs a set of at most k processes such that eventually all correct processes output the same set, which includes at least one correct process.
- Ω(k+1) is weaker than Ω(k)
- Ω(n) is sufficient to solve (n+1)-process consensus using objects of type T and registers.
- Is Ω(n) necessary?

71 Partial response
Yes, if T is one-shot deterministic:
- Every operation triggers exactly one transition
- At most one operation on an object of type T is allowed per process

72 Partial response
Theorem: Ω(n) is necessary to implement wait-free (n+1)-process consensus with registers and objects of a one-shot deterministic type T such that cons(T) ≤ n.
Corollary: Ω(n) is necessary to implement (n+1)-process consensus using registers and (n−1)-resilient objects of any types.

73 The sources
- C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, V. Hadzilacos, P. Kouznetsov, and S. Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. PODC 2004.
- R. Guerraoui and P. Kouznetsov. Failure Detectors and Type Boosters. DISC 2003.
- C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, and P. Kouznetsov. Mutual Exclusion in Asynchronous Systems with Failure Detectors. To appear in JPDC 2005.

74 Thank you!