Distributed Algorithms for Failure Detection in Crash Environments

R. Cortiñas, A. Lafuente, M. Larrea
Distributed Systems Group, University of the Basque Country UPV/EHU

Guest Stars: P, S and Omega P: strong completeness, eventual strong accuracy Eventually every process that crashes is permanently suspected by every correct process There is a time after which correct processes are not suspected by any correct process S: strong completeness, eventual weak accuracy There is a time after which some correct process is never suspected by any correct process Omega: eventual leader election There is a time after which all the correct processes always trust the same correct process Master SIA – Sistemas Distribuidos

The First P Algorithm [CT96] Master SIA – Sistemas Distribuidos

Communication Optimality
A ring arrangement of processes.
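The slide only shows the ring itself. A minimal sketch of the index arithmetic, assuming 1-based ids p1, ..., pn with pn followed by p1; the helper `live_predecessor` assumes (as an illustration, not as the slides' algorithm) that a process monitors its closest predecessor that it does not currently suspect. All three function names are hypothetical.

```python
from typing import Set


def successor(i: int, n: int) -> int:
    """Next process on the ring p1 -> p2 -> ... -> pn -> p1 (ids are 1-based)."""
    return i % n + 1


def predecessor(i: int, n: int) -> int:
    """Previous process on the ring."""
    return (i - 2) % n + 1


def live_predecessor(i: int, n: int, suspected: Set[int]) -> int:
    """Closest predecessor of p_i that p_i does not suspect (p_i itself if none)."""
    q = predecessor(i, n)
    while q in suspected and q != i:
        q = predecessor(q, n)
    return q
```

For example, with n = 5 and p2 suspected, `live_predecessor(3, 5, {2})` skips p2 and returns 1.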

Communication Optimality
Communication-efficient algorithms: only n links are used forever, n being the number of processes.

Communication Optimality
Communication-optimal algorithms: only C links are used forever, C being the number of correct processes.
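To put the link counts side by side, this sketch counts the directed links used forever by an all-to-all heartbeat pattern and by a ring, for n = 5 and a hypothetical run in which p1 and p4 crash (so C = 3). A ring restricted to the correct processes is used here only as an arithmetic illustration of a pattern with exactly C links; the helper names are hypothetical.

```python
from itertools import permutations
from typing import List, Set, Tuple

Link = Tuple[int, int]  # directed link (sender, receiver)


def all_to_all(processes: List[int]) -> Set[Link]:
    """Every process sends heartbeats to every other process."""
    return set(permutations(processes, 2))


def ring(processes: List[int]) -> Set[Link]:
    """Each process sends only to the next one in the list, cyclically (assumes >= 2)."""
    k = len(processes)
    return {(processes[j], processes[(j + 1) % k]) for j in range(k)}


n = 5
everyone = list(range(1, n + 1))
correct = [2, 3, 5]  # hypothetical run in which p1 and p4 crash

print(len(all_to_all(everyone)))  # n * (n - 1) = 20
print(len(ring(everyone)))        # n = 5
print(len(ring(correct)))         # C = 3
```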

Communication-optimal P Master SIA – Sistemas Distribuidos

Communication-optimal Omega
We also propose an optimal implementation of ◊S, the weakest failure detector for solving Consensus:
  processes are ordered p1, ..., pn;
  heartbeat strategy;
  communication pattern: one-to-successors;
  based on a trusted process (instead of a list of suspected processes).

Communication-optimal Omega
i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1.
[Figure: processes p1 to p5; trusted1 = trusted2 = trusted3 = trusted4 = trusted5 = p1]

Communication-optimal Omega
ii) If a process does not receive a message from its trusted process p_i within some timeout period, it suspects p_i and takes the next process, p_{i+1}, as its new trusted process.
[Figure: p4 times out on p1 and sets trusted4 = p2; trusted1 = trusted2 = trusted3 = trusted5 = p1]

Communication-optimal Omega
iii) If a process trusts itself, it starts sending messages periodically to its successors.
[Figure: p2 times out on p1 and sets trusted2 = p2; trusted4 = p2; trusted1 = trusted3 = trusted5 = p1]

Communication-optimal Omega
iv) If a process receives a message from a process p_i that precedes its trusted process, it trusts p_i again and increases its timeout period with respect to p_i.
[Figure: p2 and p4 receive a message from p1, set trusted2 = trusted4 = p1, and increment timeout_period21 and timeout_period41; trusted1 = p1, trusted3 = p2, trusted5 = p1]
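Rules i-iv can be sketched as a tick-driven simulation. This is a minimal illustration, not the authors' implementation: `OmegaProcess` and `simulate` are hypothetical names, time is counted in abstract ticks, the initial timeout of 2 ticks is an assumption, resetting `last_heard` when a process moves to a new candidate is an assumption, and message delivery is assumed instant and reliable (messages to crashed processes are simply dropped).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OmegaProcess:
    """State of process p_me for rules i-iv (a sketch; ticks are abstract)."""
    me: int                   # index of this process, 1..n
    n: int                    # number of processes
    initial_timeout: int = 2  # assumption: starting timeout, in ticks
    trusted: int = 1          # rule i: every process starts trusting p1
    last_heard: int = 0       # last tick a message from `trusted` arrived
    timeout: Dict[int, int] = field(default_factory=dict)

    def timeout_for(self, j: int) -> int:
        return self.timeout.setdefault(j, self.initial_timeout)

    def is_sender(self) -> bool:
        # Rules i and iii: a process that trusts itself sends periodically.
        return self.trusted == self.me

    def destinations(self) -> List[int]:
        # Rule iii: messages go to the successors p_{me+1}, ..., p_n.
        return list(range(self.me + 1, self.n + 1))

    def on_tick(self, now: int) -> None:
        # Rule ii: on timeout, suspect the trusted process and trust the next one.
        if self.trusted != self.me and now - self.last_heard > self.timeout_for(self.trusted):
            self.trusted += 1
            self.last_heard = now  # assumption: give the new candidate a full timeout

    def on_message(self, sender: int, now: int) -> None:
        if sender < self.trusted:
            # Rule iv: trust a preceding process again and be more patient with it.
            self.timeout[sender] = self.timeout_for(sender) + 1
            self.trusted = sender
        if sender == self.trusted:
            self.last_heard = now


def simulate(n: int, crashed: Set[int], ticks: int) -> Dict[int, int]:
    """Run the rules with instant, reliable delivery; return trusted per live process."""
    procs = {i: OmegaProcess(i, n) for i in range(1, n + 1) if i not in crashed}
    for now in range(1, ticks + 1):
        senders = [p for p in procs.values() if p.is_sender()]
        for p in senders:
            for d in p.destinations():
                if d in procs:  # messages to crashed processes are lost
                    procs[d].on_message(p.me, now)
        for p in procs.values():
            p.on_tick(now)
    return {i: p.trusted for i, p in procs.items()}
```

With p1 crashed, `simulate(5, {1}, 20)` ends with every live process trusting p2; with p1 and p2 crashed, they all end up trusting p3. Only the process that trusts itself sends, and only to its successors, which is the one-to-successors pattern described above.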

Communication-optimal Omega
Lemma. With the previous algorithm, all the correct processes eventually trust, permanently, the first correct process in p1, ..., pn.
This property trivially provides the properties of ◊S:
  Eventual weak accuracy: by not suspecting the trusted process.
  Strong completeness: by suspecting all the processes except the trusted process.
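The derivation of an eventually strong (◊S) output from the trusted process can be sketched in a few lines. This assumes the lemma's stable state has been reached, so every correct process trusts the first correct process; `first_correct` and `suspects_from_trusted` are hypothetical helper names, and the crash set is an illustrative example.

```python
from typing import Set


def first_correct(n: int, crashed: Set[int]) -> int:
    """Index of the first correct process in p1, ..., pn (the lemma's eventual leader)."""
    return min(j for j in range(1, n + 1) if j not in crashed)


def suspects_from_trusted(trusted: int, n: int) -> Set[int]:
    """Derived Diamond-S output: suspect every process except the trusted one."""
    return {j for j in range(1, n + 1) if j != trusted}


n, crashed = 5, {1, 4}
leader = first_correct(n, crashed)
suspected = suspects_from_trusted(leader, n)
# Strong completeness: every crashed process is in the suspected set.
assert crashed <= suspected
# Eventual weak accuracy: the trusted (correct) leader is never suspected.
assert leader not in suspected
print(leader, sorted(suspected))
```

Here p3 and p5 are correct but still suspected. ◊S tolerates this, because it only requires that some correct process (the trusted one) is never suspected.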

Communication-optimal Omega