Failure Detectors & Consensus. Agenda Unreliable Failure Detectors (CHANDRA TOUEG) Reducibility ◊S≥◊W, ◊W≥◊S Solving Consensus using ◊S (MOSTEFAOUI RAYNAL)

Slides:



Advertisements
Similar presentations
Prof R. Guerraoui Distributed Programming Laboratory
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
The weakest failure detector question in distributed computing Petr Kouznetsov Distributed Programming Lab EPFL.
A General Characterization of Indulgence R. Guerraoui EPFL joint work with N. Lynch (MIT)
IMPOSSIBILITY OF CONSENSUS Ken Birman Fall Consensus… a classic problem  Consensus abstraction underlies many distributed systems and protocols.
Distributed Systems Overview Ali Ghodsi
P. Kouznetsov, 2006 Abstracting out Byzantine Behavior Peter Druschel Andreas Haeberlen Petr Kouznetsov Max Planck Institute for Software Systems.
An evaluation of ring-based algorithms for the Eventually Perfect failure detector class Joachim Wieland Mikel Larrea Alberto Lafuente The University of.
Failure detector The story goes back to the FLP’85 impossibility result about consensus in presence of crash failures. If crash can be detected, then consensus.
Consensus problem Agreement. All processes that decide choose the same value. Termination. All non-faulty processes eventually decide. Validity. The common.
1 © R. Guerraoui Implementing the Consensus Object with Timing Assumptions R. Guerraoui Distributed Programming Laboratory.
1 © P. Kouznetsov On the weakest failure detector for non-blocking atomic commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory Swiss.
Failure Detectors CS 717 Ashish Motivala Dec 6 th 2001.
UPV / EHU Efficient Eventual Leader Election in Crash-Recovery Systems Mikel Larrea, Cristian Martín, Iratxe Soraluze University of the Basque Country,
Byzantine Generals Problem: Solution using signed messages.
Failures and Consensus. Coordination If the solution to availability and scalability is to decentralize and replicate functions and data, how do we coordinate.
Failure Detectors. Can we do anything in asynchronous systems? Reliable broadcast –Process j sends a message m to all processes in the system –Requirement:
UPV / EHU Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments Mikel Larrea Distributed Systems.
1 Principles of Reliable Distributed Systems Lecture 6: Synchronous Uniform Consensus Spring 2005 Dr. Idit Keidar.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
Asynchronous Consensus
1 Secure Failure Detection in TrustedPals Felix Freiling University of Mannheim San Sebastian Aachen Mannheim Joint Work with: Marjan Ghajar-Azadanlou.
Outline Why distributed computing? Atomic Broadcast The atom system Relevance for e-textiles What’s next? Q&A.
Distributed Systems Non-Blocking Atomic Commit
Sergio Rajsbaum 2006 Lecture 4 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico.
1 Principles of Reliable Distributed Systems Recitation 8 ◊S-based Consensus Spring 2009 Alex Shraer.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Impossibility.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 12: Impossibility.
1 Failure Detectors: A Perspective Sam Toueg LIX, Ecole Polytechnique Cornell University.
Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
Systems of Distributed systems Module 2 - Distributed algorithms Teaching unit 2 – Properties of distributed algorithms Ernesto Damiani University of Bozen.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de.
1 Principles of Reliable Distributed Systems Recitation 7 Byz. Consensus without Authentication ◊S-based Consensus Spring 2008 Alex Shraer.
Composition Model and its code. bound:=bound+1.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 8: Failure Detectors.
1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:
Failure detection and consensus Ludovic Henrio CNRS - projet OASIS Distributed Algorithms.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Consensus with Partial Synchrony Kevin Schaffer Chapter 25 from “Distributed Algorithms” by Nancy A. Lynch.
1 © R. Guerraoui Regular register algorithms R. Guerraoui Distributed Programming Laboratory lpdwww.epfl.ch.
BFTW 3 workshop (Sep 22, 2009)© 2009 Andreas Haeberlen 1 The Fault Detection Problem Andreas Haeberlen MPI-SWS Petr Kuznetsov TU Berlin / Deutsche Telekom.
CS294, Yelick Consensus revisited, p1 CS Consensus Revisited
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
SysRép / 2.5A. SchiperEté The consensus problem.
Agreement in Distributed Systems n definition of agreement problems n impossibility of consensus with a single crash n solvable problems u consensus with.
Chapter 21 Asynchronous Network Computing with Process Failures By Sindhu Karthikeyan.
1 Fault tolerance in distributed systems n Motivation n robust and stabilizing algorithms n failure models n robust algorithms u decision problems u impossibility.
Failure Detectors n motivation n failure detector properties n failure detector classes u detector reduction u equivalence between classes n consensus.
Replication predicates for dependent-failure algorithms Flavio Junqueira and Keith Marzullo University of California, San Diego Euro-Par Conference, Lisbon,
“Distributed Algorithms” by Nancy A. Lynch SHARED MEMORY vs NETWORKS Presented By: Sumit Sukhramani Kent State University.
1 © R. Guerraoui Set-Agreement (Generalizing Consensus) R. Guerraoui.
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.
© 2007 P. Kouznetsov On the Weakest Failure Detector Ever Petr Kouznetsov (Max Planck Institute for SWS) Joint work with: Rachid Guerraoui (EPFL) Maurice.
Distributed systems Total Order Broadcast
Distributed Systems, Consensus and Replicated State Machines
Presented By: Md Amjad Hossain
Distributed Algorithms for Failure Detection in Crash Environments
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Failure Detectors motivation failure detector properties
Distributed systems Consensus
Distributed Systems Terminating Reliable Broadcast
Presentation transcript:

Failure Detectors & Consensus

Agenda Unreliable Failure Detectors (CHANDRA TOUEG) Reducibility ◊S≥◊W, ◊W≥◊S Solving Consensus using ◊S (MOSTEFAOUI RAYNAL)

Unreliable Failure Detectors A distributed failure detector D consists of a local failure detector module D p at each process p When D p suspects a process j to have crashed it adds j to suspects p, if later on D p realizes it made a mistake it can remove j from suspects p Failure detectors are defined in terms of abstract properties. Namely, two classes of competence and four classes of accuracy.

Completeness Classes Strong Completeness  Eventually, every process that crashes is permanently suspected by every correct process Weak Completeness  Eventually, every process that crashes is permanently suspected by some correct process

Accuracy Classes Strong Accuracy  No process is suspected before it crashes Weak Accuracy  Some correct process is never suspected Eventual Strong Accuracy  There is a time after which correct processes are not suspected by any correct process Eventual Weak Accuracy  There is a time after which some correct process is never suspected by any correct process.

Failure Detectors Classes Strong Weak StrongWeakEventual StrongEventual Weak Accuracy Completeness Perfect P Q Strong S Weak W Eventually Perfect ◊P ◊Q◊Q Eventually Strong ◊S Eventually Weak ◊W

Reducibility A Distributed Algorithm T D→D’ transforms a failure detector D into a failure detector D’ if it maintains a variable output p at every process p which emulates the output of D’ T D→D’ is called a reduction algorithm and D’ is reducible to D, denoted D ≥ D’ (D’ is “weaker”) A simple T ◊S → ◊W ?

From Weak Completeness to Strong Completeness T ◊W → ◊S Code for process p output p ← Φ Task 1: repeat forever suspects p ← ◊W p send(p, suspects p ) to all Task 2: upon receiving (q, suspects q ) for some q output p ← (output p U suspects q ) – {q} ◊S≥◊W && ◊W≥◊S → ◊W=◊S

Consensus In the Consensus problem every process p i proposes a value v i and all correct processes have to decide on some value v, in relation to the set of proposed values. More formally, a distributed consensus algorithm must satisfy:  Termination: Every correct process eventually decides on some value.  Validity: If a process decides v, then v was proposed by some process (non triviality)  Agreement: No two correct processes decide differently It is impossible to solve consensus in asynchronous system even if only one process might crash [FLP]

Solving Consensus using ◊S Code for process p i 1 ≤ i ≤ n Task 1: r i ← 0; est i ← v i ; 1. while didn’t decide do 2. c ← (r i mod n) + 1; est_from_c i ← ∟; r i ← r i if (i = c) then est_from_c i ← est i 4. else wait until is received from p c or c is suspected 5. if received then est_from_c i ← v 6. send to all 7. wait until collected from a majority of processes 8. rec i ← {est_from_c | was received} 9. if rec i = {v} then decide v and send to all 10. if rec i = {v, ∟} then est i ← v Task 2: 1. Upon reception of decide v and send to all