Algorithms for Extracting Timeliness Graphs

Algorithms for Extracting Timeliness Graphs
Carole Delporte, LIAFA, Univ. D. Diderot
Stéphane Devismes, VERIMAG, Univ. J. Fourier
Hugues Fauconnier, LIAFA, Univ. D. Diderot
Mikel Larrea, University of the Basque Country
Delporte-Gallet et al., SIROCCO'2010

Goals
(In partially synchronous distributed systems)
How to determine the timeliness relations between processes? That is, whether communication from p to q happens within a bounded delay.
What does "determine" mean? Eventually, all processes agree on the timeliness of some links.

Why?
For example:
(leader) There exists a process that communicates in a timely way with all others -> leader election
(tree) There exist timely paths from p to every other process -> routing
(ring) There exists at least one timely ring linking all correct processes -> ring overlay

Also…
Timeliness is often used to determine correct processes (p receives messages from q in a timely way => q is correct).
Leader -> Ω
Tree routing -> the source is correct (Ω)
Ring -> exactly all correct processes (◊P)
(Ω and ◊P are failure detectors.)

Context…
Processes: timely (bounds on the time to execute a step -> processes can accurately measure time)
Some processes crash (correct / faulty)
Communication: fully connected graph
Communication: by messages
Reliable links (no message loss)

Timeliness
The link (p,q) is timely: there exists an unknown bound D such that no message sent by p at time t is received by q after time t+D.
(If (p,q) is not timely, the communication delays from p to q are unbounded.)
("There exists an unknown bound" is equivalent to "eventually there exists an unknown bound".)
(Timeliness is a property defined with respect to a given run.)

Reminders
In asynchronous systems, there is no hypothesis on link timeliness.
In synchronous systems, all links are timely.
Asynchronous <-> no consensus
Synchronous <-> consensus
Partially synchronous: some links are timely

Partially synchronous systems
Examples:
There exists a process having all its outgoing links timely.
There exists a time from which all links are timely.
Remark: in both cases, consensus is possible (Ω can be implemented in the first case and ◊P in the second).

Timeliness graph
The timeliness graph of a given run r: T(r)=<S,E>
Nodes: correct processes
Directed edges: (p,q) is an edge iff the link from p to q is timely in r

Basic tool: Watchdog
q can test the link from p to q:
p regularly sends "ALIVE" on the link (p,q).
q sets a timer with timeout T; if it does not receive "ALIVE" from p within time T, q blames (p,q) and increases T.
If the link (p,q) is timely (and p is correct), T eventually becomes large enough that q never blames (p,q) again.
If the link (p,q) is not timely (and q is correct), q blames (p,q) infinitely often.
Timely link if and only if finite number of blames (assumption: FIFO links)
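The watchdog can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: the names `Watchdog` and `run`, the discrete clock measured in ticks, and the rule "increase T by one tick per blame" are all hypothetical choices made for the example.

```python
class Watchdog:
    """q's view of the link (p, q): counts blames and adapts the timeout T."""

    def __init__(self, initial_timeout=1):
        self.timeout = initial_timeout   # current value of T, in ticks (assumed unit)
        self.deadline = initial_timeout  # time at which the timer expires
        self.blames = 0                  # number of times q has blamed (p, q)

    def on_alive(self, now):
        # An ALIVE message arrived from p: restart the timer.
        self.deadline = now + self.timeout

    def on_tick(self, now):
        # No ALIVE at this tick; if the timer has expired, blame and increase T.
        if now >= self.deadline:
            self.blames += 1
            self.timeout += 1            # assumed increase rule: one tick per blame
            self.deadline = now + self.timeout


def run(arrival_times, horizon, initial_timeout=1):
    """Feed ALIVE arrival times (in ticks) to a watchdog for `horizon` ticks."""
    watchdog = Watchdog(initial_timeout)
    arrivals = set(arrival_times)
    for now in range(horizon):
        if now in arrivals:
            watchdog.on_alive(now)
        else:
            watchdog.on_tick(now)
    return watchdog


# Bounded gaps between arrivals (a timely link): the blame count stabilizes.
timely = run(range(0, 2000, 5), horizon=2000)
# Ever-growing gaps (a non-timely link): blames keep accumulating.
untimely = run([k * k for k in range(50)], horizon=2000)
```

With bounded gaps, T eventually exceeds the gap and the blame counter stops growing; with growing gaps, any fixed T is eventually too small, so the counter keeps increasing.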

Systems
A system X is defined by a set of timeliness graphs.
G=<S,E> is compatible with T(r)=<Sr,Er> if (1) S=Sr and (2) all edges of E are timely in T(r).
Let R(X) be the set of runs of X: r is in R(X) if there exists G in X that is compatible with T(r).
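The compatibility test and membership in R(X) translate directly into set operations. The sketch below assumes a graph is represented as a pair (nodes, edges) of Python sets, with edges given as (p, q) tuples; the function names are hypothetical.

```python
def is_compatible(G, T):
    """G=<S,E> is compatible with T(r)=<Sr,Er> iff S == Sr and E is a subset of Er."""
    nodes_g, edges_g = G
    nodes_t, edges_t = T
    return set(nodes_g) == set(nodes_t) and set(edges_g) <= set(edges_t)


def run_belongs_to_system(X, T):
    """r is in R(X) iff some graph G of X is compatible with T(r)."""
    return any(is_compatible(G, T) for G in X)


# Timeliness graph of a hypothetical run: all links out of process 1 are timely,
# and so is the link (2, 3).
T_r = ({1, 2, 3}, {(1, 2), (1, 3), (2, 3)})
star_at_1 = ({1, 2, 3}, {(1, 2), (1, 3)})
star_at_3 = ({1, 2, 3}, {(3, 1), (3, 2)})
```

Here `star_at_1` is compatible with `T_r` while `star_at_3` is not, so the run belongs to any system containing `star_at_1`.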

Some systems…
ASYNC: G=<S,∅>
COMPLETE: all complete graphs
STAR: all star graphs
TREE: all out-trees
RING: all rings
SC: all strongly connected graphs
PAIR: all cycles of two elements

Extraction
Examples: we want (when it is possible)
to build a star,
to build an (out-)tree,
to build a ring.
Moreover, we want:
only timely links;
all nodes to be (or almost be) correct processes.

Almost?
In the general case, it is not possible to ensure that all processes of the extracted graph are correct…
(We can only evaluate timeliness to know whether a process is correct.)
However, we can ensure that the extracted graph G contains at least all the correct processes.
We don't know G[Correct], but…

Di-cut (directed cut)
In the extracted graph, if no link outgoing from p is supposed to be timely (e.g., p is a sink), no process can determine whether p is correct…
The same holds if all the links from p lead to faulty processes.
(X,Y) is a dicut of G=<S,E> iff (X,Y) is a partition of S such that there is no (directed) link from Y to X.
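The dicut definition can be checked mechanically. A minimal sketch, assuming edges are (p, q) tuples and that both parts of the partition must be non-empty (the slide does not say so explicitly); the name `is_dicut` is hypothetical.

```python
def is_dicut(nodes, edges, X, Y):
    """True iff (X, Y) partitions `nodes` and no edge goes from Y to X."""
    X, Y = set(X), set(Y)
    # Assumption: a partition has two non-empty, disjoint parts covering all nodes.
    is_partition = bool(X) and bool(Y) and not (X & Y) and (X | Y) == set(nodes)
    return is_partition and not any(p in Y and q in X for (p, q) in edges)


# A star centered at process 1: edges go from the center to the leaves.
nodes = {1, 2, 3}
edges = {(1, 2), (1, 3)}
```

For this star, ({1, 2}, {3}) is a dicut because the sink 3 has no outgoing link, whereas ({2}, {1, 3}) is not because the edge (1, 2) goes from the second part to the first.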

Almost?
In the general case, it is not possible to ensure that all nodes of the extracted graph are correct…
However, we can ensure that the extracted graph G satisfies:
G contains at least all the correct processes;
G[Correct] is either G itself, or (Correct, F) is a dicut of G, where F is a subset of the faulty processes.

Extraction
Algorithm for extracting a graph from X: each process p has a variable Gp, and for every run r there exists G in X such that:
Convergence: for every correct process p, there exists a time t from which Gp=G
Compatibility: G[Correct(r)] is compatible with T(r)
Closure: G[Correct(r)] is a dicut reduction of G (or G itself)
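The Closure condition can be tested on a candidate output once the set of correct processes is known (which, of course, no process knows during a run). The sketch below is illustrative only: the function names are hypothetical, and it reads "dicut reduction" as the induced subgraph G[A] for a dicut (A, B) of G.

```python
def induced(nodes, edges, A):
    """Induced subgraph G[A]: keep the nodes of A and the edges inside A."""
    A = set(A) & set(nodes)
    return A, {(p, q) for (p, q) in edges if p in A and q in A}


def satisfies_closure(nodes, edges, correct):
    """True iff G contains all correct processes and G[Correct] is G itself
    or the first part of a dicut (Correct, F) of G."""
    nodes, correct = set(nodes), set(correct)
    if not correct <= nodes:
        return False                      # G must contain every correct process
    extra = nodes - correct               # F: the faulty processes kept in G
    if not extra:
        return True                       # G[Correct] is G itself
    # (Correct, F) is a dicut iff no edge goes from F to Correct.
    return not any(p in extra and q in correct for (p, q) in edges)


# An out-tree (here a chain) 1 -> 2 -> 3.
tree_nodes = {1, 2, 3}
tree_edges = {(1, 2), (2, 3)}
```

With this chain, the condition holds when the leaf 3 is the only faulty process, but fails when the root 1 is faulty, since the edge (1, 2) would lead from a faulty process to a correct one.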

Some results
If G is extracted, (p,q) is an edge of G, and q is correct, then p is correct.
If p0,…,pm is a path of the extracted graph such that p0 and pm are correct, then for 0≤i<m, (pi,pi+1) is timely and all pi are correct (in particular, we obtain a route from p0 to pm that contains only timely links).
If G is strongly connected, G[Correct]=G.

Main result
If a family of graphs X is closed under dicut reduction (for every G in X and every dicut (A,B) of G, G[A] is in X), then we can always extract a graph from X.
If every graph of X is strongly connected, then the extracted graph G satisfies G[Correct]=G.
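For small families, closure under dicut reduction can be verified by brute force. The following is a sketch under assumptions that are not in the slides: graphs are pairs of frozensets, a star has its edges oriented from the center to the other nodes, a single node counts as a star, and both parts of a dicut are non-empty. All function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(nodes):
    """Yield every non-empty subset of `nodes` as a frozenset."""
    nodes = sorted(nodes)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            yield frozenset(subset)


def induced(graph, A):
    """Induced subgraph G[A] as a hashable (nodes, edges) pair."""
    _, edges = graph
    return frozenset(A), frozenset((p, q) for (p, q) in edges if p in A and q in A)


def dicuts(graph):
    """Yield every dicut (A, B): a partition with no edge from B to A."""
    nodes, edges = graph
    for A in nonempty_subsets(nodes):
        B = nodes - A
        if B and not any(p in B and q in A for (p, q) in edges):
            yield A, B


def closed_under_dicut_reduction(family):
    """True iff G[A] is in the family for every G in it and every dicut (A, B) of G."""
    family = set(family)
    return all(induced(G, A) in family for G in family for A, _ in dicuts(G))


def stars(nodes):
    """All stars (center with outgoing edges) over every non-empty subset of `nodes`."""
    return {
        (S, frozenset((c, q) for q in S if q != c))
        for S in nonempty_subsets(nodes)
        for c in S
    }


all_stars = stars({1, 2, 3})
full_stars_only = {G for G in all_stars if len(G[0]) == 3}
```

Under these assumptions, `all_stars` is closed: any dicut must keep the center in its first part, so the reduction is again a star. The family `full_stars_only` is not closed, because reducing a three-node star yields a smaller star that the family lacks.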

Example
In STAR, we extract a star graph whose center is a correct process (Ω).
In TREE, we extract an out-tree whose root is a correct process p0 and such that, for every correct process q, there exists a tree path from p0 to q that contains only correct processes and timely links.
In RING, we extract a ring among all correct processes that contains only timely links.
(In contrast, for PAIR, there is no extraction algorithm.)

Principles of the algorithm
Watch and punish
Regularly test (p,q): (p,q) is timely if and only if q blames (p,q) only a finite number of times.
For each blame of (p,q), punish all graphs G containing (p,q): increase the counter of G.
For each process p, punish all graphs G that do not contain p.
(Reliably) broadcast the counters.
Choose the graph with the smallest counter value.
Any graph whose links are all timely and that contains all correct processes of the run is blamed only finitely often -> finite counter
Any graph that has at least one asynchronous link or that misses some correct process is blamed infinitely often -> infinite counter
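The "watch and punish" bookkeeping can be sketched for a single process as follows. This is a simplified illustration, not the algorithm from the paper: the reliable broadcast of counters is omitted, the class and method names are hypothetical, ties are broken by candidate order, and "punish all G that do not contain p" is read as each live process p regularly announcing itself.

```python
class Extractor:
    """Keeps one punishment counter per candidate graph of the system X."""

    def __init__(self, candidates):
        # candidates: list of (nodes, edges) pairs; list order breaks ties (assumption).
        self.candidates = list(candidates)
        self.counter = [0] * len(self.candidates)

    def on_blame(self, p, q):
        # The watchdog blamed link (p, q): punish every graph that uses this link.
        for i, (_, edges) in enumerate(self.candidates):
            if (p, q) in edges:
                self.counter[i] += 1

    def on_presence(self, p):
        # Process p showed it is alive: punish every graph that leaves p out.
        for i, (nodes, _) in enumerate(self.candidates):
            if p not in nodes:
                self.counter[i] += 1

    def choose(self):
        # Output the candidate with the smallest counter.
        best = min(range(len(self.candidates)), key=lambda i: (self.counter[i], i))
        return self.candidates[best]


def star(center, nodes):
    """Star over `nodes` with edges from `center` to every other node."""
    return frozenset(nodes), frozenset((center, q) for q in nodes if q != center)


candidates = [star(1, {1, 2, 3}), star(2, {1, 2, 3}), star(3, {1, 2, 3}), star(1, {1, 2})]
extractor = Extractor(candidates)
for _ in range(10):
    extractor.on_blame(1, 3)      # link (1, 3) is not timely in this toy run
    extractor.on_blame(3, 2)      # link (3, 2) is not timely either
    for p in (1, 2, 3):
        extractor.on_presence(p)  # all three processes are correct
```

In this toy run, only the star centered at 2 over all three processes is never punished, so `choose()` returns it; the two-node star is punished because it misses the correct process 3.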

Moreover…
Enhancement: if every graph of X contains a spanning out-tree, eventually the messages are only sent through the links of the extracted graph.
Examples: in STAR, TREE, and RING, O(n) links are used (instead of O(n²)).

Conclusion and perspectives
Timeliness <-> failures
Timeliness makes it possible to detect failures (the only way?)
Timeliness is useful (independently of failure detection)
Algorithm
Complexity…
Impossibility results
