1 © R. Guerraoui Implementing the Consensus Object with Timing Assumptions R. Guerraoui Distributed Programming Laboratory

2 A modular approach We implement Wait-free Consensus (Consensus) through Lock-free Consensus (L-Consensus) and Registers. We implement L-Consensus through Obstruction-free Consensus (O-Consensus) and <>Leader (which encapsulates timing assumptions and is sometimes denoted by Ω).

3 A modular approach (figure: a layered stack) Consensus is built on L-Consensus; L-Consensus on O-Consensus and <>Leader; <>Leader on <>Synchrony; Registers underlie all layers.

4 Consensus Wait-Free-Termination: If a correct process proposes, then it eventually decides. Agreement: No two processes decide differently. Validity: Any value decided must have been proposed.

5 L-Consensus Lock-Free-Termination: If a correct process proposes, then at least one correct process eventually decides. Agreement: No two processes decide differently. Validity: Any value decided must have been proposed.

6 O-Consensus Obstruction-Free-Termination: If a correct process proposes and eventually executes alone, then the process eventually decides. Agreement: No two processes decide differently. Validity: Any value decided must have been proposed.
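All three abstractions share the same operation and the same safety properties (Agreement, Validity); they differ only in the liveness guarantee attached to propose(). A minimal Java sketch of that common interface (the name and generics are illustrative, not from the slides):

    // Common interface of O-Consensus, L-Consensus and (wait-free) Consensus.
    public interface ConsensusObject<V> {
        // Proposes v and returns the decided value. Whether the call is
        // guaranteed to return depends on the variant:
        //   Consensus:   every correct proposer eventually decides
        //   L-Consensus: at least one correct proposer eventually decides
        //   O-Consensus: a proposer that eventually runs alone decides
        V propose(V v);
    }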

7 Example 1 (figure) P1 invokes prop(5) and decides 5; P2's prop(0) and P3's prop(8) do not return.

8 Example 2 (figure) P1 invokes prop(5) and decides 8; P2's prop(0) and P3's prop(8) do not return.

9 O-Consensus algorithm (idea) A process that is eventually «left alone» to execute steps eventually decides. Several processes might keep concurrently trying to decide until some unknown time: agreement (and validity) must be ensured during this preliminary period.

10 O-Consensus algorithm (data) Each process pi maintains a timestamp ts, initialized to i and incremented by n. The processes share an array of register pairs Reg[1..n]; each element of the array contains two registers: Reg[i].T contains a timestamp (init to 0), and Reg[i].V contains a pair (value, timestamp) (init to (⊥,0)).

11 O-Consensus algorithm (functions) To simplify the presentation, we assume two functions applied to Reg[1..n]: highestTsp() returns the highest timestamp among all elements Reg[1].T, Reg[2].T, .., Reg[n].T; highestTspValue() returns the value with the highest timestamp among all elements Reg[1].V, Reg[2].V, .., Reg[n].V.

12 O-Consensus algorithm

propose(v):
  while(true)
    Reg[i].T.write(ts);
    val := Reg[1..n].highestTspValue();
    if val = ⊥ then val := v;
    Reg[i].V.write(val, ts);
    if ts = Reg[1..n].highestTsp() then return(val);
    ts := ts + n

13 O-Consensus algorithm

propose(v):
  while(true)
    (1) Reg[i].T.write(ts);
    (2) val := Reg[1..n].highestTspValue();
        if val = ⊥ then val := v;
    (3) Reg[i].V.write(val, ts);
    (4) if ts = Reg[1..n].highestTsp() then return(val);
    ts := ts + n

14 O-Consensus algorithm (1) pi announces its timestamp. (2) pi selects the value with the highest timestamp (or its own if there is none). (3) pi announces the value with its timestamp. (4) If pi's timestamp is the highest, then pi decides (i.e., pi knows that any process that subsequently executes line (2) will select pi's value).
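A runnable Java sketch of this O-Consensus algorithm, with java.util.concurrent atomics standing in for the shared registers Reg[1..n] and null standing in for ⊥ (class and member names are assumptions for illustration, not from the slides):

    import java.util.concurrent.atomic.AtomicLongArray;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    public class OConsensus {
        static final class ValTs {                      // contents of Reg[j].V: (value, timestamp)
            final Object val; final long ts;
            ValTs(Object val, long ts) { this.val = val; this.ts = ts; }
        }
        private final int n, i;                         // number of processes, my index (0-based)
        private long ts;                                // my timestamp: i+1, i+1+n, i+1+2n, ...
        private final AtomicLongArray regT;             // Reg[j].T, all init to 0
        private final AtomicReferenceArray<ValTs> regV; // Reg[j].V, all init to null, i.e. (⊥,0)

        public OConsensus(int n, int i, AtomicLongArray regT, AtomicReferenceArray<ValTs> regV) {
            this.n = n; this.i = i; this.ts = i + 1;
            this.regT = regT; this.regV = regV;
        }

        private long highestTsp() {                     // highest timestamp in Reg[1..n].T
            long max = 0;
            for (int j = 0; j < n; j++) max = Math.max(max, regT.get(j));
            return max;
        }

        private Object highestTspValue() {              // value with highest timestamp in Reg[1..n].V
            ValTs best = null;
            for (int j = 0; j < n; j++) {
                ValTs cur = regV.get(j);
                if (cur != null && (best == null || cur.ts > best.ts)) best = cur;
            }
            return best == null ? null : best.val;      // null plays the role of ⊥
        }

        public Object propose(Object v) {
            while (true) {
                regT.set(i, ts);                        // (1) announce my timestamp
                Object val = highestTspValue();         // (2) adopt the value with the highest timestamp
                if (val == null) val = v;               //     or my own value if there is none
                regV.set(i, new ValTs(val, ts));        // (3) announce (value, timestamp)
                if (ts == highestTsp()) return val;     // (4) decide if my timestamp is still the highest
                ts += n;                                // otherwise retry with a higher timestamp
            }
        }
    }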

15 L-Consensus We implement L-Consensus using <>Leader (operation leader()) and the O-Consensus algorithm. The idea is to use <>Leader to make sure that, eventually, one process keeps executing steps alone, until it decides.

16 <>Leader One operation leader(), which takes no input parameter and returns a boolean. A process considers itself leader if the boolean is true. Property: If a correct process invokes leader(), then the invocation returns and, eventually, some correct process is permanently the only leader.

17 Example (figure) P1, P2 and P3 invoke leader() over time; several invocations may return true at first, but eventually a single correct process permanently gets true while the others get false.

18 L-Consensus

propose(v):
  while(true)
    if leader() then
      Reg[i].T.write(ts);
      val := Reg[1..n].highestTspValue();
      if val = ⊥ then val := v;
      Reg[i].V.write(val, ts);
      if ts = Reg[1..n].highestTsp() then return(val);
      ts := ts + n
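Relative to the O-Consensus sketch above, the only change is the leader() guard. A sketch, assuming a hypothetical DiamondLeader interface for the oracle and reusing the fields of the earlier OConsensus class:

    interface DiamondLeader { boolean leader(); }       // assumed <>Leader oracle

    // propose() as in the O-Consensus sketch, guarded by leader();
    // regT, regV, ts, n, i are as in OConsensus, omega is a DiamondLeader.
    public Object propose(Object v) {
        while (true) {
            if (omega.leader()) {                       // take steps only while leader
                regT.set(i, ts);
                Object val = highestTspValue();
                if (val == null) val = v;               // null stands for ⊥
                regV.set(i, new ValTs(val, ts));
                if (ts == highestTsp()) return val;
                ts += n;
            }
        }
    }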

19 From L-Consensus to Consensus (helping) Every process that decides writes its value in a register Dec (init to ⊥). Every process periodically checks for a value in Dec.

20 Consensus

propose(v):
  while (Dec.read() = ⊥)
    if leader() then
      Reg[i].T.write(ts);
      val := Reg[1..n].highestTspValue();
      if val = ⊥ then val := v;
      Reg[i].V.write(val, ts);
      if ts = Reg[1..n].highestTsp() then Dec.write(val);
      ts := ts + n;
  return(Dec.read())
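A sketch of this wait-free version, again reusing the fields of the earlier sketches, with the decision register Dec modeled as an AtomicReference (null for ⊥):

    // dec is a shared AtomicReference<Object>, initially null (⊥).
    public Object propose(Object v) {
        while (dec.get() == null) {                     // stop as soon as anyone decided
            if (omega.leader()) {
                regT.set(i, ts);
                Object val = highestTspValue();
                if (val == null) val = v;
                regV.set(i, new ValTs(val, ts));
                if (ts == highestTsp()) dec.set(val);   // helping: publish the decision
                ts += n;
            }
        }
        return dec.get();
    }

Publishing into Dec is what turns lock-freedom into wait-freedom: a process that never becomes leader still returns once the leader has decided.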

21 <>Leader One operation leader(), which takes no input parameter and returns a boolean. A process considers itself leader if the boolean is true. Property: If a correct process invokes leader(), then the invocation returns and, eventually, some correct process is permanently the only leader.

22 <>Leader: algorithm We assume that the system is <>synchronous: there is a time after which there is a lower and an upper bound on the delay for a process to execute a local action, a read, or a write in shared memory. The time after which the system becomes synchronous is called the global stabilization time (GST) and is unknown to the processes. This model captures the practical observation that distributed systems are usually synchronous and sometimes asynchronous.

23 <>Leader: algorithm (shared variables) Every process pi elects (stores in a local variable leader) the process with the lowest identity that pi considers non-crashed; if pi elects a process pj other than itself, then j < i. A process pi that considers itself leader keeps incrementing Reg[i]; pi thereby claims that it wants to remain leader. NB. Eventually, only the leader keeps incrementing its shared register.

24 <>Leader: algorithm (local variables) Every process periodically increments a local variable clock, and increases local variables check and delay whenever its leader changes. Process pi maintains last_i[j] to record the last value of Reg[j] that pi has read (pi can hence tell whether pj has made progress). The next leader is the one with the smallest id that makes some progress; if no such process pj with j < i exists, then pi elects itself (noLeader is true).

25 <>Leader: algorithm (variables) check and delay are initialized to 1. last_i[j] and Reg[j] are initialized to 0. The next leader is the one with the smallest id that makes some progress; if no such process pj with j < i exists, then pi elects itself (noLeader is true).

26 <>Leader: algorithm

leader(): return(leader = self)

check, delay and leader init to 1; clock init to 0;
last_i[j] and Reg[j] init to 0;

Task:
  while(true) do
    if (leader = self) then Reg[i].write(Reg[i].read() + 1);
    clock := clock + 1;
    if (clock = check) then { elect(); clock := 0 }

27 <>Leader: algorithm (cont'd)

elect():
  noLeader := true;
  for j = 1 to (i-1) do
    if (Reg[j].read() > last_i[j]) then
      last_i[j] := Reg[j].read();
      if (leader ≠ pj) then
        delay := delay * 2;
        check := check + delay;
      leader := pj;
      noLeader := false;
      break (for);
  if (noLeader) then leader := self
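A Java sketch of this <>Leader algorithm; the shared heartbeat counters Reg[1..n] are modeled with an AtomicLongArray, all names are illustrative, and the placement of the clock reset is an assumption consistent with the slides:

    import java.util.concurrent.atomic.AtomicLongArray;

    public class EventualLeader {
        private final int n, i;                 // number of processes, my index (0-based)
        private final AtomicLongArray reg;      // Reg[1..n], init 0 (Reg[i] written only by pi)
        private final long[] last;              // last value of Reg[j] that I read
        private long clock = 0, check = 1, delay = 1;
        private volatile int leader = 0;        // currently elected process, init p1

        public EventualLeader(int n, int i, AtomicLongArray reg) {
            this.n = n; this.i = i; this.reg = reg; this.last = new long[n];
        }

        public boolean leader() { return leader == i; }

        // Background task: heartbeat while leader, re-elect every 'check' steps.
        public void run() {
            while (true) {
                if (leader == i) reg.set(i, reg.get(i) + 1);  // claim I want to stay leader
                clock++;
                if (clock == check) { elect(); clock = 0; }
            }
        }

        // Elect the smallest-id process below me that made progress, doubling
        // the waiting period whenever the leader changes; else elect myself.
        private void elect() {
            boolean noLeader = true;
            for (int j = 0; j < i; j++) {
                long v = reg.get(j);
                if (v > last[j]) {              // pj made progress since the last election
                    last[j] = v;
                    if (leader != j) {          // leader changed: be more patient next time
                        delay *= 2;
                        check += delay;
                    }
                    leader = j;
                    noLeader = false;
                    break;
                }
            }
            if (noLeader) leader = i;           // no smaller-id process makes progress
        }
    }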

28 Consensus = Registers + <>Leader <>Leader has one operation leader(), which takes no input parameter and returns a boolean; a process considers itself leader if the boolean is true. Property: If a correct process invokes leader(), then the invocation returns and, eventually, some correct process is permanently the only leader. <>Leader encapsulates the following synchrony assumption: there is a time after which a lower and an upper bound hold on the time it takes for every process to execute a step (eventual synchrony).

29 Minimal Assumptions Consensus is impossible in an asynchronous system with Registers (FLP83, LA88). Consensus is possible in an eventually synchronous system (i.e., with <>Leader) and Registers (DLS88, LH95). What is the minimal synchrony assumption needed to implement Consensus with Registers? Is there any weaker timing abstraction than <>Leader that helps Registers solve Consensus?

30 Failure detector A failure detector is a distributed (wait-free) oracle that provides processes with information about the crashes of processes. Examples: P, ◊P, ◊S, ◊W, Ω, ◊Leader. NB. A failure detector only provides information about crashes (CT96).

31 Failure detector relations We say that a failure detector D implements abstraction A (e.g., object O) if there is an algorithm that implements A using D. We say that a failure detector D is weaker than a failure detector D’ if D’ implements D (D ≤ D’). If D is weaker than D’ and D’ is not weaker than D, then D is said to be strictly weaker than D’ (D < D’). We say that two failure detectors are equivalent if each is weaker than the other (D ≡ D’).

32 Failure detector Ω Failure detector Ω outputs a process q at every process p (we say that p trusts q) and ensures the following property: eventually, the same correct process is permanently trusted by every process. NB. The trusted process might keep changing until some (unknown) time.

33 <>Leader ≡ Ω To implement <>Leader using Ω, every process simply returns true from leader() if Ω currently outputs that process itself (the process emulates the output of <>Leader). Conversely, to implement Ω using <>Leader, every process writes its name in a shared register L when leader() returns true; all processes periodically read L and trust the process in L (eventually, only one process is elected).
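A sketch of the two emulations, with hypothetical Omega and DiamondLeader interfaces and an AtomicInteger standing in for the shared register L:

    import java.util.concurrent.atomic.AtomicInteger;

    public class LeaderOmegaConversions {
        interface Omega { int trusted(); }              // outputs a (trusted) process id
        interface DiamondLeader { boolean leader(); }   // outputs a boolean

        // <>Leader from Omega: I am leader iff Omega trusts me.
        static boolean leaderFromOmega(Omega omega, int self) {
            return omega.trusted() == self;
        }

        // Omega from <>Leader: whoever sees leader() = true keeps writing
        // its name into the shared register L ...
        static void publishTask(DiamondLeader dl, AtomicInteger L, int self) {
            while (true) if (dl.leader()) L.set(self);
        }

        // ... and every process trusts whoever is currently in L; eventually
        // a single correct process writes L, so all trust the same process.
        static int trustedFromLeader(AtomicInteger L) {
            return L.get();
        }
    }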

34 Failure detector example Failure detector Ω outputs a process q at every process p (we say that p trusts q) and ensures the following property: ◊ unique leader: eventually, the same correct process is permanently trusted by every process. NB. The trusted process might keep changing until some (unknown) time.

35 Questions (1) Show that Ω is the weakest failure detector to implement Consensus with Registers (i.e., give an algorithm that implements Ω with any failure detector that implements Consensus with Registers). (2) What is the weakest failure detector to implement Consensus with objects of consensus number k and Registers? (3) What is the weakest failure detector to implement an object with consensus number k using Registers?