Failure detection and consensus Ludovic Henrio CNRS - projet OASIS Distributed Algorithms.

Slides:



Advertisements
Similar presentations
Distributed systems Total Order Broadcast Prof R. Guerraoui Distributed Programming Laboratory.
Advertisements

Fault Tolerance. Basic System Concept Basic Definitions Failure: deviation of a system from behaviour described in its specification. Error: part of.
Impossibility of Distributed Consensus with One Faulty Process
Prof R. Guerraoui Distributed Programming Laboratory
CS 542: Topics in Distributed Systems Diganta Goswami.
A General Characterization of Indulgence R. Guerraoui EPFL joint work with N. Lynch (MIT)
Teaser - Introduction to Distributed Computing
6.852: Distributed Algorithms Spring, 2008 Class 7.
Distributed Systems Overview Ali Ghodsi
Failure detector The story goes back to the FLP’85 impossibility result about consensus in presence of crash failures. If crash can be detected, then consensus.
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
1 © P. Kouznetsov On the weakest failure detector for non-blocking atomic commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory Swiss.
Byzantine Generals Problem: Solution using signed messages.
Failure Detectors. Can we do anything in asynchronous systems? Reliable broadcast –Process j sends a message m to all processes in the system –Requirement:
1 Principles of Reliable Distributed Systems Lecture 3: Synchronous Uniform Consensus Spring 2006 Dr. Idit Keidar.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Failure Detectors Steve Ko Computer Sciences and Engineering University at Buffalo.
1 Principles of Reliable Distributed Systems Recitation 8 ◊S-based Consensus Spring 2009 Alex Shraer.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Impossibility.
Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach.
Systems of Distributed systems Module 2 - Distributed algorithms Teaching unit 2 – Properties of distributed algorithms Ernesto Damiani University of Bozen.
Distributed Systems Terminating Reliable Broadcast Prof R. Guerraoui Distributed Programming Laboratory.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de.
1 Principles of Reliable Distributed Systems Recitation 7 Byz. Consensus without Authentication ◊S-based Consensus Spring 2008 Alex Shraer.
Composition Model and its code. bound:=bound+1.
State Machines CS 614 Thursday, Feb 21, 2002 Bill McCloskey.
Distributed Systems – CS425/CSE424/ECE428 – Fall Nikita Borisov — UIUC1.
CSE 486/586 Distributed Systems Failure Detectors
CS542: Topics in Distributed Systems Diganta Goswami.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
Review for Exam 2. Topics included Deadlock detection Resource and communication deadlock Graph algorithms: Routing, spanning tree, MST, leader election.
1 © R. Guerraoui Regular register algorithms R. Guerraoui Distributed Programming Laboratory lpdwww.epfl.ch.
Lecture 11 Failure Detectors (Sections 12.1 and part of 2.3.2) Klara Nahrstedt CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009)
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
SysRép / 2.5A. SchiperEté The consensus problem.
1 © R. Guerraoui Distributed algorithms Prof R. Guerraoui Assistant Marko Vukolic Exam: Written, Feb 5th Reference: Book - Springer.
Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.
Distributed Systems Lecture 4 Failure detection 1.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
CSE 486/586 Distributed Systems Failure Detectors
CSE 486/586 Distributed Systems Failure Detectors
CSE 486/586 Distributed Systems Failure Detectors
Distributed systems Total Order Broadcast
Agreement Protocols CS60002: Distributed Systems
Distributed Systems, Consensus and Replicated State Machines
Presented By: Md Amjad Hossain
EEC 688/788 Secure and Dependable Computing
Distributed systems Reliable Broadcast
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Distributed systems Consensus
CSE 486/586 Distributed Systems Failure Detectors
Broadcasting with failures
Distributed Systems Terminating Reliable Broadcast
Presentation transcript:

Failure detection and consensus Ludovic Henrio CNRS - projet OASIS Distributed Algorithms

2 Acknowledgement  The slides for this lecture are based on ideas and materials from the following sources:  Introduction to Reliable Distributed Programming Guerraoui, Rachid, Rodrigues, Luís, 2006, 300 p., ISBN: (+ teaching material)  ID2203 Distributed Systems Advanced Course by Prof. Seif Haridi from KTH – Royal Institute of Technology (Sweden)  CS5410/514: Fault-tolerant Distributed Computer Systems Course by Prof. Ken Birman from Cornell University  Distributed Systems : An Algorithmic Approach by Sukumar, Ghosh, 2006, 424 p.,ISBN: (+teaching material)  Various research papers  Course from F. Bongiovanni  A few slides from SARDAR MUHAMMAD SULAMAN

3 Failure detectors

4 System models - reminder  synchronous distributed system  each message is received within bounded time  each step in a process takes lb < time < ub  each local clock’s drift has a known bound  asynchronous distributed system  no bounds on process execution  no bounds on message transmission delays  arbitrary clock drifts the Internet is an asynchronous distributed system

5 Arbitrary (Byzantine) Crashes and recoveries Omissions Crashes Failure model - reminder  First we must decide what do we mean by failure?  Different types of failures  Crash-stop (fail-stop)  A process halts and does not execute any further operations  Crash-recovery  A process halts, but then recovers (reboots) after a while  Crash-stop failures can be detected in synchronous systems  Next: detecting crash-stop failures in asynchronous systems

6 What's a Failure Detector ? PiPi PjPj Crash failure Needs to know about P J 's failure

7 1. Ping-ack protocol PiPi PjPj Needs to know about P J 's failure - P i queries P j once every T time units - if P j does not respond within T time units, P i marks p j as failed If p j fails, within T time units, p i will send it a ping message, and will time out within another T time units. Detection time = 2 T ping ack - P j replies

8 2. Heart-beating protocol PiPi PjPj Needs to know about P J 's failure - if P i has not received a new heartbeat for the past T time units, P i declares P j as failed heartbeat - P j maintains a sequence number - P j send P i a heartbeat with incremented seq. number after T' (=T) time units

9 Failure Detectors  Basic properties  Completeness  Every crashed process is suspected  Accuracy  No correct process is suspected Both properties comes in two flavours  Strong and Weak

10 Failure Detectors  Strong Completeness  Every crashed process is eventually suspected by every correct process  Weak Completeness  Every crashed process is eventually suspected by at least one correct process  Strong Accuracy  No correct process is ever suspected  Weak Accuracy  There is at least one correct process that is never suspected Perfect Failure Detector (P) = Strong completeness + strong accuracy (difficult)

Perfect failure detector P  Assume synchronous system  Max transmission delay between 0 and δ time units  Every γ time units, each node:  Sends to all nodes  Each node waits γ+δ time units  If did not get from pi Detect

12 Correctness of P  PFD1 (strong completeness)  A crashed node doesn’t send  Eventually every node will notice the absence of  PFD2 (strong accuracy)  Assuming local computation is negligible  Maximum time between 2 heartbeats  γ + δ time units  If alive, all nodes will recv hb in time  No inaccuracy

An algorithm for P Upon event (HBTimeout) For all pi in P Send HeartBeat to pi startTimer (gamma, HBTimeout) Upon event Receive HeartBeat from pj alive:=alive pj Upon event (DetectTimeout) crashed := P \ alive for all pi in crashed Trigger (crashed, pi) alive := startTimer (delta+gamma, DetectTimeout) P: set of processes

Eventually perfect failure detector <>P  For asynchronous system  We suppose there is an unknown maximal transmission delay -- partially synchronous system  Every γ time units, each node:  Sends to all nodes  Each node waits T time units  If did not get from pi Indicate if pi is not in suspected Put pi in suspected set  If get from pi and pi is suspected Indicate remove pi from suspected Increase timeout T

15 Correctness of P  PFD1 (strong completeness)  Idem  PFD2 (strong accuracy)  Each time p is inaccurately suspected by a correct q  Timeout T is increased at q  Eventually system becomes synchronous, and T becomes larger than the unknown bound δ (T>γ+δ)  q will receive HB on time, and never suspect p again

isis An algorithm for <>P Upon event (HBTimeout) For all pi in P Send HeartBeat to pi startTimer (gamma, HBTimeout) Upon event Receive HeartBeat from pj alive:=alive pj Upon event (DetectTimeout) for all pi in P if pi not in alive and pi not in suspected suspected :=suspected pi Trigger (suspected, pi) if pi in alive and pi in suspected suspected :=suspected \ pi Trigger (restore, pi) T:=T+delta alive := startTimer (T, DetectTimeout) idem suspected initialized to

17 Exercise Eventually Perfect Failure Detector: an alternative algorithm

Exercise: is this a good algorithm? What is the delay between two heartbeats? At the begining? At any point in time? Can you find a formula for this depending on the number of failures suspected/recovered. Is there a maximal time before a failure is detected?(supposing there is a bound Delta on maximal communication time)

19 Consensus (agreement) 19  In the consensus problem, the processes propose values and have to agree on one among these values  Solving consensus is key to solving many problems in distributed computing (e.g., total order broadcast, atomic commit, terminating reliable broadcast) B A C

20 Consensus – basic properties  Termination  Every correct node eventually decides  Agreement  No two correct processes decide differently  Validity  Any value decided is a value proposed  Integrity:  A node decides at most once  A variant: UNIFORM CONSENSUS  Uniform agreement: No two processes decide differently

Consensus Events Request: Indication: Properties: C1, C2, C3, C4` algorithm I A P-based (fail-stop) consensus algorithm The processes exchange and update proposals in rounds and decide on the value of the non-suspected process with the smallest id [Gue95] Consensus algorithm II A P-based (i.e., fail-stop) uniform consensus algorithm The processes exchange and update proposal in rounds, and after n rounds decide on the current proposal value [Lyn96]

Consensus algorithm I The processes go through rounds incrementally (1 to n): in each round, the process with the id corresponding to that round is the leader of the round The leader of a round decides its current proposal and broadcasts it to all A process that is not leader in a round waits (a) to deliver the proposal of the leader in that round to adopt it, or (b) to suspect the leader

Consensus algorithm I Implements: Consensus (cons). Uses: BestEffortBroadcast (beb). PerfectFailureDetector (P). upon event do suspected := empty; round := 1; currentProposal := nil; broadcast := delivered[] := false;

upon event do currentProposal := value; delivered[round] := true; upon eventdelivered[round] = true or p round  suspected do round := round + 1; upon event p round =self and broadcast=false and currentProposal  nil do trigger ; broadcast := true; upon event do suspected := suspected U {pi}; upon event do if currentProposal = nil then currentProposal := v;

p1 p2 p3 propose(0)decide(0) propose(1) propose(0) Consensus algorithm I decide(0)

p1 p2 p3 propose(0) decide(0) propose(1) propose(0) decide(1) crash Consensus algorithm I

Failure – another example

Correctness argument Let pi be the correct process with the smallest id in a run R. Assume pi decides v. If i = n, then pn is the only correct process. Otherwise, in round i, all correct processes receive v and will not decide anything different from v. They are all located after i. Question: How do you ensure that a message does not arrive too late? (in the wrong round)

Algorithm II: Uniform consensus The “Hierarchical Uniform Consensus” algorithm uses a perfect failure-detector, a best-effort broadcast to disseminate the proposal, a perfect links abstraction to acknowledge the receipt of a proposal, and a reliable broadcast abstraction to disseminate the decision Every process maintains a single proposal value that it broadcasts in the round corresponding to its rank. When it receives a proposal from a more importantly ranked process, it adopts the value In every round of the algorithm, the process whose rank corresponds to the number of the round is the leader, i.e., the most importantly ranked process is the leader of round 1

Algorithm II: Uniform consensus (2) A round here consists of two communication steps: within the same round, the leader broadcasts a PROPOSAL message to all processes, trying to impose its value, and then expects to obtain an acknowledgment from all correct processes Processes that receive a proposal from the leader of the round adopt this proposal as their own and send an acknowledgment back to the leader of the round If the leader succeeds in collecting an acknowledgment from all processes except detected as crashed, the leader can decide. It disseminates the decided value using a reliable broadcast communication abstraction

Example – no failure

Example – failure (1)

Example – failure (2)

Correctness ??? Validity and Integrity follows from the properties of the underlying communication, and the algorithm Agreement Assume two processes decide differently, this can happens if two decisions were rbBroadcast Assume pi and pj, j > i, rbBroadcast two decisions vi and vj, because of accuracy of P, pj must have adopted the value vi

Exercise: uniform consensus What if process 2 fails? draw an example How many processes can fail (how many faults does the algorithm tolerate)? Is the reliable broadcast necessary?

Final words Can you write a distributed algorithm now? Study its properties? Study the required conditions for its safety? One word on formal methods

For next week Study the algorithm on the next slides: 1 - Show a failure free execution and 2 execution with faults 2 – is it a correct consensus? Why? 3 – is it a uniform consensus? Why?