Fast Leader (Full) Recovery despite Dynamic Faults
Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil


Joint Work
Ajoy K. Datta & Lawrence L. Larmore; Sébastien Tixeuil
ICDCN, 04/01/2013, Mumbai

Self-Stabilization [Dijkstra,74]
– A fault = a process state corruption
– The system recovers after any number of transient faults
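
To make the recovery property concrete, here is a minimal Python sketch of the classic K-state token ring from the cited [Dijkstra,74] paper. It is not taken from these slides: the function names (`privileged`, `step`, `run`), the ring sizes and the "lowest-index privileged process moves" scheduler are illustrative assumptions. Note also that this ring has a distinguished process 0, unlike the anonymous ring studied later in the talk.

```python
def privileged(x):
    """Indices of the processes that currently hold a privilege (token)."""
    n = len(x)
    result = []
    for i in range(n):
        if i == 0:
            # The distinguished process is privileged when it equals its predecessor.
            if x[0] == x[n - 1]:
                result.append(0)
        elif x[i] != x[i - 1]:
            # Any other process is privileged when it differs from its predecessor.
            result.append(i)
    return result


def step(x, K, i):
    """Let privileged process i make its move (in place)."""
    if i == 0:
        x[0] = (x[0] + 1) % K
    else:
        x[i] = x[i - 1]


def run(x, K, steps):
    """Run `steps` moves from configuration x (assumed scheduler: lowest index first)."""
    x = list(x)
    for _ in range(steps):
        enabled = privileged(x)   # never empty: some process is always privileged
        step(x, K, enabled[0])
    return x


# An arbitrary (corrupted) configuration of 5 processes, with K = 5 states.
corrupted = [3, 1, 4, 1, 0]
recovered = run(corrupted, 5, 200)
print(len(privileged(recovered)))  # number of tokens after recovery
```

Whatever the starting configuration, the ring should end up with exactly one privilege circulating, which is the "recover after any number of transient faults" behaviour in miniature.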

Price of the Versatility
1. Several impossibility results
– E.g., Leader Election and Token Circulation in anonymous networks
2. The stabilization time usually depends on global parameters (diameter, size of the network, …)

When only a few faults hit the system
Self-Stabilization: Ω(D) rounds
Stronger forms:
– Fault Containment [Ghosh et al, Dist Comp 2007]
– k-adaptive Self-Stabilization [Burman et al, OPODIS’05]
Weakened forms:
– k-stabilization [Beauquier et al, PODC’98]

Fault-Containment
Pros
– Self-stabilizing
– If f ≤ k faults, stabilization time in O(f) rounds
– Containment radius
– Fault gap is small
Cons (currently)
– k = 1, or
– Surrounded by a majority of correct processes, or
– Synchronous setting, or
– Probabilistic recovery

Fault gap
The minimum time between consecutive faulty transitions needed to keep an O(f) recovery time
[Figure: timeline alternating between Legitimate and Illegitimate configurations. When faults are separated by at least the fault gap, each recovery takes O(f); when they are separated by less than the fault gap, recovery takes Ω(D).]

Time-Adaptive Self-Stabilization
– Self-Stabilization
– If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults), “output” stabilization in O(f) rounds
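
The "Hamming distance to a legitimate configuration" can be illustrated with a short Python sketch. It assumes a configuration is simply a list with one state per process and that the set of legitimate configurations is given explicitly; both choices, and the function names, are illustrative rather than taken from the talk.

```python
def hamming_distance(config_a, config_b):
    """Number of processes whose state differs between two configurations."""
    if len(config_a) != len(config_b):
        raise ValueError("configurations must cover the same processes")
    return sum(1 for a, b in zip(config_a, config_b) if a != b)


def distance_to_legitimate(config, legitimate_configs):
    """Smallest number of corrupted process states that explains `config`."""
    return min(hamming_distance(config, legit) for legit in legitimate_configs)


def within_k_faults(config, legitimate_configs, k):
    """True when `config` is at most k state corruptions from a legitimate one."""
    return distance_to_legitimate(config, legitimate_configs) <= k


# Toy example (assumed): the legitimate configurations are the two uniform ones.
legit = [[0, 0, 0, 0], [1, 1, 1, 1]]
print(distance_to_legitimate([0, 1, 0, 0], legit))
print(within_k_faults([0, 1, 1, 0], legit, 1))
```

In this toy setting f is the value returned by `distance_to_legitimate`, and the O(f) guarantee applies only when `within_k_faults` holds.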

Output vs. State Stabilization
[Figure: after f ≤ k faults move the system from a Legitimate to an Illegitimate configuration, the output is correct again within O(f), while the state becomes legitimate again only after Ω(D).]
The fault gap depends on global parameters

k-Stabilization (first definition)
– If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers
– Otherwise, no guarantee

k-Stabilization (first definition)
Pros
– Can solve more problems than self-stabilization
– Usually, only-k-dependent stabilization time
– Usually, only-k-dependent fault gap
Cons
– Not self-stabilizing
– Static faults: f ≤ k faults should occur in a single transition

Our definition of k-stabilization
– Faulty transition = one process state corruption
– Dynamic faults: if f ≤ k faulty transitions occur in an arbitrary manner, the system eventually recovers
[Figure: from a Legitimate configuration, the system reaches an Illegitimate one through f ≤ k faults, one fault per faulty transition.]

Our contribution
Leader recovery protocol
– On an anonymous (yet oriented) ring
– Asynchronous atomic read/write
– k-stabilizing if n ≥ 18k + 1
– Stabilization time O(k²) rounds
– log(k) bits per process
– This problem is unsolvable in the self-stabilizing setting

Our contribution
– The system starts in a legitimate configuration where one process is elected
– Some faulty transitions occur in an arbitrary manner
– Fault propagation
– If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds

Fault gap
[Figure: timeline between Legitimate and Illegitimate configurations; figure labels: f ≤ k faulty transitions (twice), 0, 0, O(k²) rounds.]

Main ideas of the algorithm

Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
[Figure: ring with votes 0 and ⊥, and a span of 3k on each side of the process voted for.]
Interval of relevance: 6k+1 votes

After k faults
[Figure: ring with votes 0, 1 and ⊥.]
– At most 3k processes change their votes
– Always a majority of votes for the previous leader
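
The majority argument can be checked on a small ring with the Python sketch below. It assumes processes are numbered 0..n-1 only for the simulation (the ring itself is anonymous), that a vote `v` cast by process `i` designates process `i + v` along the ring orientation, and that ⊥ is modelled by the string `BOT`. These conventions and the function names are illustrative, not taken from the paper.

```python
BOT = "⊥"  # models the "no vote" value


def signed_offset(src, dst, n):
    """Signed ring distance from src to dst, in the range (-n/2, n/2]."""
    d = (dst - src) % n
    return d if d <= n // 2 else d - n


def legitimate_votes(n, k, leader):
    """Each process within distance 3k of the leader votes its relative address."""
    votes = []
    for i in range(n):
        off = signed_offset(i, leader, n)
        votes.append(off if abs(off) <= 3 * k else BOT)
    return votes


def count_votes(votes, k, candidate):
    """Count the votes for `candidate` inside its interval of relevance (6k+1 processes)."""
    n = len(votes)
    total = 0
    for off in range(-3 * k, 3 * k + 1):
        voter = (candidate - off) % n  # the process whose vote `off` designates candidate
        if votes[voter] == off:
            total += 1
    return total


n, k, leader = 19, 1, 5               # n = 18k + 1
votes = legitimate_votes(n, k, leader)
votes[4], votes[6], votes[7] = 3, 1, 0  # 3k changed votes, all now designating process 7
print(count_votes(votes, k, leader), count_votes(votes, k, 7))
```

Even when all 3k changed votes agree on another process, the previous leader keeps at least 3k+1 of the 6k+1 votes in its interval of relevance.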

Rumors
[Figure: a process with Vote = 1 and Rumor = 1.]
In a legitimate state, Vote = Rumor for all processes
Main idea:
– Vote: hard to change
– Rumor: easy to change

Rumors
[Figure: a process with Vote = 1 and Rumor = 2.]
If Rumor ≠ Vote:
– If Rumor ≠ ⊥, Candidate ← Rumor
– Else Candidate ← Vote
– Initiate Query(Candidate)

Rumors
[Figure: a process with Vote = 1 and Rumor = 2.]
Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
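
The candidate rule from the Rumors slides is small enough to write down directly. The Python sketch below models ⊥ with the string `BOT` and returns `None` when Vote = Rumor, meaning no query is initiated; the function name and these encodings are assumptions made for illustration.

```python
BOT = "⊥"  # models the "no vote / no rumor" value


def choose_candidate(vote, rumor):
    """Return the candidate to query, or None when Vote = Rumor (nothing to check)."""
    if rumor == vote:
        return None
    # Rumor != Vote: prefer the rumor when there is one, otherwise fall back to the vote.
    return rumor if rumor != BOT else vote


print(choose_candidate(1, 1))    # no disagreement, no query
print(choose_candidate(1, 2))    # the rumor becomes the candidate
print(choose_candidate(1, BOT))  # no rumor, so the vote is re-checked
```

Since Rumor ≠ Vote rules out both being ⊥, the candidate returned by this rule is never ⊥.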

Query Return
If at least 3k+1 votes for the Candidate:
– If Rumor ≠ ⊥ ≠ Candidate, initiate a Denial of Rumor in its interval of relevance
– Vote ← Candidate
– Rumor ← Candidate
Else:
– If Rumor = Candidate, then Rumor ← ⊥
– Initiate a Denial of Candidate in its interval of relevance
– If Vote = Candidate, then Vote ← ⊥
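
The Query Return rule can be sketched as a pure function on the initiator's local state. The Python below is a simplified model: it returns the list of values for which a Denial would be initiated rather than sending anything, it models ⊥ with the string `BOT`, and it reads the condition "Rumor ≠ ⊥ ≠ Candidate" as "Rumor is not ⊥ and differs from Candidate". That reading, like the function name, is an assumption and is not confirmed by the slides.

```python
BOT = "⊥"  # models the "no vote / no rumor" value


def query_return(vote, rumor, candidate, count, k):
    """Apply the Query Return rule; return (new_vote, new_rumor, denials)."""
    denials = []  # values whose Denial would be initiated (assumed representation)
    if count >= 3 * k + 1:
        # Assumed reading of "Rumor ≠ ⊥ ≠ Candidate": a competing, non-empty rumor.
        if rumor != BOT and rumor != candidate:
            denials.append(rumor)
        vote, rumor = candidate, candidate
    else:
        if rumor == candidate:
            rumor = BOT
        denials.append(candidate)
        if vote == candidate:
            vote = BOT
    return vote, rumor, denials


# k = 1: a candidate needs at least 4 of the 7 votes in its interval of relevance.
print(query_return(1, 2, 2, 2, 1))    # rumor 2 is rejected and denied
print(query_return(BOT, 1, 1, 4, 1))  # candidate 1 is confirmed
```

Keeping the rule free of side effects makes both branches easy to exercise separately from the ring traversal.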

Query Tracks

Other tracks
Denial (to kill a rumor)
To manage lost queries
– Probe wave
– Report (see the paper)

Deadlock Prevention
– Each pair of neighboring processes shares a resource (think of the chopstick between 2 philosophers)
– Only a process that holds both its left and right resources can initiate a query
– So, at any time, at most n/2 initiated queries are pending
– Now, we can have up to 9k rogue queries, i.e., non-initiated queries
– So, n > n/2 + 9k, that is, n ≥ 18k + 1
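
The last step is plain arithmetic: n > n/2 + 9k is equivalent to n/2 > 9k, i.e. n > 18k, and since n is an integer this means n ≥ 18k + 1. The short Python check below confirms it by brute force; the function name is illustrative, and the inequality is multiplied by 2 to stay in integer arithmetic.

```python
def min_ring_size(k):
    """Smallest integer n satisfying n > n/2 + 9k (tested as 2n > n + 18k)."""
    n = 1
    while not (2 * n > n + 18 * k):
        n += 1
    return n


for k in range(1, 6):
    print(k, min_ring_size(k), 18 * k + 1)
```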

Conclusion
– Less restrictive definition of k-stabilization
– Using this definition, we solve a problem having no self-stabilizing solution: a leader recovery protocol on an anonymous (yet oriented) ring
– Only-k-dependent complexity: stabilization time O(k²) rounds, log(k) bits per process

Thank You!