Around Self-Stabilization Part 2: Strengthened Forms of Self-Stabilization Stéphane Devismes Post-Doc CNRS at the LRI (Paris VII)

06/08/2008 Computer Science Department, University of Osaka

Roadmap
1. Self-Stabilization (recall)
2. Motivation
3. Tolerating more types of fault
4. FTSS
5. Enhance the convergence
6. Snap-Stabilization
7. Conclusion

Self-Stabilization (recall)
[Dijkstra 1974] A general approach for recovering from the effects of any transient fault.

Motivation
Self-stabilization has several advantages:
1. Tolerance to any transient fault: no hypothesis on the nature or extent of transient faults; recovers from the effects of those faults in a unified manner
2. No initialization: large-scale systems
3. Dynamicity: self-organization in sensor and ad hoc networks

Motivation
But also several drawbacks:
1. Impossibility results: some fundamental problems have no self-stabilizing solution
2. Overhead: self-stabilizing protocols can use a large amount of resources
3. Usually not tolerant of other kinds of faults
4. Eventual safety: during the convergence, almost nothing is guaranteed
Two responses: weakened forms and strengthened forms.

Motivation
Strengthened forms for:
- Tolerating more types of faults
- Enhancing the convergence property: converging quickly in some (frequent) cases; ensuring some weak safety property when there are faults

Tolerating more types of faults
Types of faults:
- Transient
- Intermittent
- Crash
- Byzantine

Tolerating more types of faults
Transient faults:
- Usually handled by self-stabilization
- Duration: finite
- Periodicity: rare
- Effect: alter the content of some component(s) of the network (processes and/or links)
- E.g., memory/message corruption, crash-recover, loss of messages…

Tolerating more types of faults
Intermittent faults:
- Duration: finite
- Periodicity: frequent
- Effect: alter the content of some component(s) of the network (processes and/or links)
- E.g., memory/message corruption, crash-recover, loss of messages…
- Some papers deal with both self-stabilization and certain types of intermittent faults, e.g., [Delaët and Tixeuil, JPDC’02]: fair loss of messages + a finite number of message corruptions

Tolerating more types of faults
Crash failures:
- Duration: permanent
- Effect: some component(s) of the network (processes and/or links) permanently stop working
- E.g., process crash, link removal
- Fault-Tolerant Self-Stabilization (FTSS) [Gopal and Perry, PODC’93]: usually considers process crashes only

Tolerating more types of faults
Byzantine failures:
- Duration: unlimited
- Effect: some component(s) of the network (usually processes) behave arbitrarily
- E.g., processes hit by an attack
- Byzantine-Tolerant Self-Stabilization [Dolev and Welch, PODC’95]: restrictions on the number of Byzantine processes and/or some synchrony assumptions

Robust Stabilizing Leader Election Carole Delporte-Gallet (LIAFA) Stéphane Devismes (CNRS, LRI) Hugues Fauconnier (LIAFA)

Topics
Designing leader election protocols in the message-passing model that are:
1. Crash tolerant
2. Self-stabilizing
3. Communication-efficient
4. Based on weak synchrony assumptions

Model
Fully-connected network
Communications using messages
Link:
- Unidirectional
- No ordering on deliveries
- May be synchronous
Process:
- Synchronous or crashed
- With identifier
- State initially arbitrary

Communication-Efficiency
[Larrea, Fernandez, and Arevalo, 2000]: « An algorithm is communication-efficient if it eventually only uses n - 1 unidirectional links »
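
The quoted definition can be checked mechanically. The sketch below is only an illustration: the helper names `links_in_use` and `is_communication_efficient` are hypothetical, and it assumes that in the steady state the elected leader is the only process still sending, one message stream per other process.

```python
def links_in_use(processes, leader):
    """Directed links still carrying traffic once only `leader` keeps sending."""
    return {(leader, q) for q in processes if q != leader}


def is_communication_efficient(processes, used_links):
    """The quoted criterion: eventually only n - 1 unidirectional links are used."""
    return len(used_links) == len(processes) - 1


procs = [1, 2, 3, 4]
star = links_in_use(procs, leader=1)                             # 3 links
all_to_all = {(p, q) for p in procs for q in procs if p != q}    # 12 links

print(is_communication_efficient(procs, star))        # star centred on the leader
print(is_communication_efficient(procs, all_to_all))  # everybody talks to everybody
```

With four processes, the star centred on the leader uses 3 directed links, whereas an all-to-all heartbeat uses 12.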

Related Works
[Gopal and Perry, PODC’93]
[Anagnostou and Hadzilacos, WDAG’93]
[Beauquier and Kekkonen-Moneta, JSS’97]
Communication-efficiency never considered

Self-stabilizing leader election in a fully timely network?
Yes + communication-efficiency

Algorithm (1/4)
Each process p such that Leader = p periodically sends ALIVE,p to every other process.
[Figure: example with Leader values and ALIVE,1 / ALIVE,2 messages]

Algorithm (2/4)
When an alive process p such that Leader = p receives ALIVE from process q, it sets Leader := q if q < p.
[Figure: example with Leader values and ALIVE,1 / ALIVE,2 messages]

Algorithm (3/4)
Each alive process q such that Leader ≠ q always chooses as its leader the process from which it most recently received ALIVE.
[Figure: example with Leader values and ALIVE,1 messages]

Algorithm (4/4)
On timeout, each alive process p sets Leader to p.
[Figure: example with Leader values and ALIVE,1 / ALIVE,2 messages]
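
The four rules can be sketched as a small simulation. It rests on assumptions that are not in the slides: time advances in synchronous rounds (every ALIVE sent in a round is delivered in that round), messages arrive in increasing order of sender id, and the timeout is counted in rounds. The names `step`, `run`, `timer` and `timeout` are illustrative only.

```python
def step(alive, leader, timer, timeout):
    """One synchronous round of the four rules, applied to every alive process."""
    # Rule 1: every process that believes it is the leader sends ALIVE.
    senders = sorted(p for p in alive if leader[p] == p)
    for p in alive:
        heard = [q for q in senders if q != p]
        if heard:
            timer[p] = 0
            for q in heard:  # messages handled in (assumed) arrival order
                # Rule 2: a self-declared leader yields only to a smaller id.
                # Rule 3: a follower adopts the most recent sender.
                if leader[p] != p or q < p:
                    leader[p] = q
        else:
            # Rule 4: no ALIVE for `timeout` rounds, so p elects itself.
            timer[p] += 1
            if timer[p] >= timeout:
                leader[p], timer[p] = p, 0


def run(alive, leader, rounds=10, timeout=2):
    leader = dict(leader)
    timer = {p: 0 for p in alive}
    for _ in range(rounds):
        step(alive, leader, timer, timeout)
    return leader


# Process 1 has crashed; the survivors start from an arbitrary state
# in which two of them still point at the crashed process.
final = run(alive=[2, 3, 4], leader={2: 1, 3: 1, 4: 3})
print(final)
```

In this run nobody sends at first, all three survivors time out and elect themselves, processes 3 and 4 then yield, and from the fourth round on only process 2 sends ALIVE.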

Communication-efficient self-stabilizing leader election in a system where at most one link is asynchronous?
No

Impossibility of communication-efficiency in a system with at most one asynchronous link
Claim: any process p such that Leader ≠ p must periodically receive messages within a bounded time; otherwise, it chooses another leader.

Self-stabilizing (non-communication-efficient) leader election in a system where some links are asynchronous?
Yes

Self-stabilizing leader election in a system with a timely routing overlay
For each pair of alive processes (p,q), there exist at least two paths of timely links:
- One from p to q
- One from q to p

Algorithm
Each process computes the set of alive processes and chooses as leader the smallest process of this set.
To compute the set:
1. Each process p periodically sends ALIVE,p to every other process
2. Any ALIVE,p message is repeated n - 1 times (any other process periodically receives such a message)
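
A minimal sketch of this computation, assuming that "repeated n - 1 times" means an ALIVE,p message is relayed over at most n - 1 hops, and modelling only the timely links (messages on other links are ignored here). The function name `elect` and the `timely` adjacency map are hypothetical.

```python
def elect(ids, alive, timely):
    """timely[p] is the set of processes reachable from p over one timely link."""
    known = {p: {p} for p in alive}
    for origin in alive:
        reached, frontier = {origin}, {origin}
        for _ in range(len(ids) - 1):  # ALIVE,origin is relayed n - 1 times
            frontier = {r for q in frontier for r in timely.get(q, ())
                        if r in alive and r not in reached}
            reached |= frontier
        for r in reached:
            known[r].add(origin)
    # Each process picks the smallest identifier it has heard of.
    return {p: min(known[p]) for p in alive}


# Four processes, process 1 crashed; the timely links form a directed ring
# 2 -> 3 -> 4 -> 2, so every pair is connected by timely paths in both directions.
leaders = elect(ids=[1, 2, 3, 4], alive={2, 3, 4},
                timely={2: {3}, 3: {4}, 4: {2}})
print(leaders)
```

Because every alive process eventually hears from every other alive process, all of them compute the same set and therefore the same minimum.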

Self-stabilizing leader election in a system without a timely routing overlay?
No

Conclusion
Obtaining algorithms that are both self-stabilizing and crash tolerant is highly desirable.
But designing communication-efficient solutions requires strong synchrony assumptions, even if the network is fully connected.
Solution: FTPS (Fault-Tolerant Pseudo-Stabilization)

Enhance the Convergence
- Fault-containing self-stabilization
- Time-adaptive self-stabilization
- Safe-converging self-stabilization
- Superstabilization
- Snap-stabilization

Fault-Containing Self-Stabilization
[Ghosh et al, PODC’96]
Self-stabilizing + if there are only a few faults:
- Spatial containment: only a few processes can be contaminated by the faults
- Fast convergence time

Time-Adaptive Self-Stabilization
[Kutten & Patt-Shamir, PODC’97]
Self-stabilizing, and if f < k processes are faulty:
- The output of the algorithm stabilizes in O(f) time
[Figure: timeline. Faults hit f processes, then the output is stabilized, then the state is stabilized]

Safe-Converging Self-Stabilization
[Kakugawa & Masuzawa, IPDPS’06]
Self-stabilizing, with fast convergence to a weaker (useful) predicate.
E.g., Minimal Dominating Set (MDS): from an arbitrary initial configuration, the system first reaches a dominating set (DS), then an MDS.
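
The two predicates in the MDS example can be written down directly. The sketch below is not from the slides; it reads "minimal" as minimal by inclusion, and the adjacency-map representation and function names are assumptions made for illustration.

```python
def is_dominating(graph, s):
    """Weaker predicate (DS): every node is in s or has a neighbour in s."""
    return all(v in s or graph[v] & s for v in graph)


def is_minimal_dominating(graph, s):
    """Stronger predicate (MDS): s dominates and no single node can be removed."""
    return is_dominating(graph, s) and not any(
        is_dominating(graph, s - {v}) for v in s)


# Path 1 - 2 - 3, as an adjacency map of sets.
path = {1: {2}, 2: {1, 3}, 3: {2}}

print(is_dominating(path, {1, 2, 3}))          # a DS, reached quickly
print(is_minimal_dominating(path, {1, 2, 3}))  # but not yet minimal
print(is_minimal_dominating(path, {2}))        # the final MDS
```

Checking single removals is enough because any superset of a dominating set is itself dominating.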

Superstabilization
[Dolev & Herman, CJTCS’97]
A superstabilizing algorithm:
- Must be self-stabilizing
- Must preserve a “passage predicate”
Passage predicate: defined with respect to a class of topology changes. A topology change falsifies legitimacy, so the passage predicate must be weaker than legitimacy but strong enough to be useful.
[Figure: topological change, passage predicate]

Passage Predicate - Example
In a token ring: a processor crash can lose the token but still not falsify the passage predicate.
Passage predicate: at most one token exists in the system (e.g., the existence of 2 tokens isn’t legal).
Legitimate state: exactly one token exists in the system.
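
As a small illustration of the two predicates, the sketch below represents a ring as a list of booleans (True where a processor holds a token). This representation and the function names are assumptions, not part of the original slides.

```python
def legitimate(tokens):
    """Legitimate state: exactly one token exists in the system."""
    return sum(tokens) == 1


def passage_predicate(tokens):
    """Passage predicate: at most one token exists in the system."""
    return sum(tokens) <= 1


ring = [False, True, False, False]   # the second processor holds the token
after_crash = ring[:1] + ring[2:]    # the token holder crashes and leaves the ring

print(legitimate(after_crash))         # the token is lost, legitimacy is falsified
print(passage_predicate(after_crash))  # but the passage predicate still holds
```

The crash loses the only token, so legitimacy fails until the algorithm regenerates one, while "at most one token" is never violated along the way.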

Snap-Stabilization
[Bui et al, WSS’99]
A snap-stabilizing algorithm operates correctly immediately after the end of the faults.
Request-based algorithm and user-centric point of view:
- Each time a user initiates a request, it obtains a correct result for its request

Snap-Stabilization
[Figure]

Self vs. Snap
[Figure; surviving labels: 1.X, 2.X, N.X]

Self vs. Snap
[Figure; surviving label: 1.X]

Snap-Stabilization in Message-Passing Systems Sylvie Delaët (LRI) Stéphane Devismes (CNRS, LRI) Mikhail Nesterenko (Kent State University) Sébastien Tixeuil (LIP6)

Message-Passing Model
- Network bidirectional and fully-connected
- Communications by messages
- Links asynchronous, fair, and FIFO
- Ids on processes
- Transient faults

Related Works in message-passing (reliable communication in self-stabilization)
[Gouda & Multari, 1991]
- Deterministic + Unbounded Capacity => Unbounded Counter
- Deterministic + Bounded Capacity => Bounded Counter
[Afek & Brown, 1993]
- Probabilistic + Unbounded Capacity + Bounded Counter

Related Works in message-passing (self-stabilization)
[Varghese, 1993]
- Deterministic + Bounded Capacity
[Katz & Perry, 1993]
- Unbounded Capacity, deterministic, infinite counter
[Delaët et al]
- Unbounded Capacity, deterministic, finite memory
- Silent tasks

Related Works (snap-stabilization)
Nothing in the message-passing model
Only in the state model:
- Locally shared memory
- Composite atomicity
[Cournier et al, 2003]

Snap-Stabilization in Message-Passing Systems

Case 1: unbounded capacity links
Impossible for safety-distributed specifications

Safety-distributed specification
Example: Mutual Exclusion
[Figure: processes p and q, labelled A and B]

Safety-distributed specification
[Figure: p (A) in state s_p with messages m_1 to m_5; q (B) in state s_q with messages m'_1 to m'_4]

Case 2: bounded capacity links
Problem to solve: Reliable Communication
Starting from any configuration, if Tintin sends a question to Captain Haddock, then:
- Tintin eventually receives good answers
- Tintin only delivers the good answers

Case 2: bounded capacity links
Case study: single-message capacity (each link holds 0 or 1 message)

Case 2: bounded capacity links
Sequence number: State ∈ {0,1,2,3,4}
[Figure: p and q each hold State and NeigState; State_p starts at 0 and is incremented until State_p = 4]

Case 2: bounded capacity links
Pathological case:
[Figure: p and q with State_p = 0, NeigState_q = 1, and unknown (?) values elsewhere]
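
The slides do not spell out the handshake, so the following is a sketch of one plausible reading: p resets State_p to 0 when it starts a request, both sides keep sending (State, NeigState), q stores the State it receives in NeigState_q, and p increments State_p whenever the echoed value equals State_p. Only p's side of the exchange is modelled, the schedule is chosen by hand to be adversarial, and the `fresh` flag is bookkeeping added for this sketch that the protocol itself never reads.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    state: int    # sender's State
    echo: int     # sender's NeigState
    fresh: bool   # bookkeeping for this sketch only, never read by the protocol


def run(schedule, link_pq, link_qp, neig_q):
    state_p = 0            # p starts a request by resetting State_p
    neig_fresh = False     # q's initial NeigState is arbitrary
    trigger_fresh = None   # was the echo that caused 3 -> 4 genuine?
    for action in schedule:
        if action == "p_send" and link_pq is None:
            link_pq = Msg(state_p, 0, True)
        elif action == "q_recv" and link_pq is not None:
            neig_q, neig_fresh = link_pq.state, link_pq.fresh
            link_pq = None
        elif action == "q_send" and link_qp is None:
            link_qp = Msg(0, neig_q, neig_fresh)
        elif action == "p_recv" and link_qp is not None:
            msg, link_qp = link_qp, None
            if msg.echo == state_p and state_p < 4:
                state_p += 1
                if state_p == 4:
                    trigger_fresh = msg.fresh
    return state_p, trigger_fresh


schedule = ["p_recv", "q_send", "p_recv", "q_recv", "q_send", "p_recv",
            "p_send", "q_recv", "q_send", "p_recv"]
result = run(schedule,
             link_pq=Msg(2, 0, False),   # stale message in the link p -> q
             link_qp=Msg(0, 0, False),   # stale message in the link q -> p
             neig_q=1)                   # stale NeigState at q
print(result)
```

In this schedule the three stale values (one per link, plus q's initial NeigState) push State_p from 0 to 3 without any genuine exchange, which is consistent with the need for a fifth value: the step from 3 to 4 can only be triggered by an echo of a message that p sent after starting its request.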

Generalizations
Arbitrary bounded capacity C_max: 2 × C_max + 3 values
[Figure: p and q; C_max values per link, 1 value for NeigState]
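
One way to read the bound, offered as an interpretation rather than as the authors' proof: every stale value present in the initial configuration can cause at most one spurious increment of State_p, and there are at most C_max stale messages in each of the two links plus one stale NeigState value.

```latex
\[
  \underbrace{C_{\max}}_{\text{stale in } p \to q}
  + \underbrace{C_{\max}}_{\text{stale in } q \to p}
  + \underbrace{1}_{\text{stale } \mathit{NeigState}_q}
  = 2C_{\max} + 1 \quad \text{spurious increments at most,}
\]
\[
  \text{so } \mathit{State}_p \in \{0, \dots, 2C_{\max} + 2\}
  \text{ needs } 2C_{\max} + 3 \text{ values; for } C_{\max} = 1:\;
  2 \cdot 1 + 3 = 5 .
\]
```

For C_max = 1 this gives the five values {0,1,2,3,4} of the single-message case study.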

Generalizations
PIF in a fully-connected network
[Figure: message m sent to every other process, acknowledgments A_m returned]

Application
Mutual exclusion in a fully-connected and identified network using the PIF

Mutual Exclusion
Specification:
- Any process that requests the CS enters the CS in finite time (Liveness)
- If a requesting process enters the CS, then it executes the CS alone (Safety)
N.B. Some non-requesting processes may initially be in the CS.

Principles (1/6)
Let L be the process with the smallest ID.
L uses Value_L to decide which process is authorized to access the CS:
1. If Value_L = 0, then L is authorized
2. If Value_L = i, then the i-th neighbour of L is authorized
When a process learns that it is authorized by L to access the CS:
1. It ensures that no other process can execute the CS
2. It executes the CS, if it requests it
3. It notifies L when it terminates Step 2 (so that L increments Value_L)
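
The arbitration rule can be sketched as follows. The slides only say that L increments Value_L; wrapping around modulo the number of processes is an assumption made here so that every process is served in turn, and the neighbour ordering `[3, 5, 8]` is purely illustrative.

```python
def authorized(leader, neighbours, value):
    """Value_L = 0 authorizes L itself; Value_L = i authorizes L's i-th neighbour."""
    return leader if value == 0 else neighbours[value - 1]


def next_value(value, n):
    """Increment Value_L (assumption: wrap around after the last neighbour)."""
    return (value + 1) % n


L, neigh = 2, [3, 5, 8]   # L = 2 is the smallest ID; neighbours in L's local order
order, v = [], 0
for _ in range(5):
    order.append(authorized(L, neigh, v))
    v = next_value(v, len(neigh) + 1)
print(order)   # [2, 3, 5, 8, 2]
```

Each process is authorized once per cycle, which is what the liveness requirement needs as long as an authorized process always notifies L when it is done.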

Principles (2/6)
Each process sequentially executes 4 phases infinitely often.
A requesting process p can enter the CS only after executing Phases 1 to 4 consecutively.
- The CS is in Phase 4

Principles (3/6)
Process p evaluates the IDs.
[Figure: p = 5 sends Id? to the other processes and learns Leader=2; Phase=1]

Principles (4/6)
Process p asks every other process q whether Value_q = p.
[Figure: processes 5, 2, 3, 8; p = 5 sends Ok? and receives Yes/No answers; Leader=2, Value=2 at one process, Ok=true, Phase=2]

Principles (5/6)
If Winner(p), then p broadcasts EXIT to every other process.
[Figure: processes 5, 2, 3, 8; p = 5 with Leader=2, Ok=true, Phase=3, Winner(5)=true; the other processes have unknown Winner, Phase, Leader and Ok values and return to Phase=1]

Principles (6/6)
If Winner(p), then p executes the CS; then, if p ≠ L, p broadcasts ExitCS, else p increments Value_p.
[Figure: processes 5, 2, 3, 8; p = 5 with Leader=2, Ok=true, Phase=4, Winner(5)=true; after ExitCS, Value=3 and Phase=1]

Conclusion
Snap-stabilization in message-passing is no longer an open question.

Extensions
Apply snap-stabilization in message-passing to:
- Other topologies (tree, arbitrary topology)
- Other problems
- Other failure patterns
Space requirement

Thank you very much!