Fast Leader (Full) Recovery despite Dynamic Faults
Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil
Joint Work: Ajoy K. Datta & Lawrence L. Larmore, Sébastien Tixeuil
ICDCN, 04/01/2013, Mumbai
Self-Stabilization [Dijkstra,74]
–A fault = a process state corruption
–Recover after any number of transient faults
Price of the Versatility
1. Several impossibility results
–E.g., Leader Election and Token Circulation in anonymous networks
2. The stabilization time usually depends on global parameters (diameter, size of the network, …)
When only a few faults hit the system
Self-Stabilization: Ω(D) rounds (D = network diameter)
Stronger forms:
–Fault Containment [Ghosh et al, Dist Comp 2007]
–k-adaptive Self-Stabilization [Burman et al, OPODIS’05]
Weakened forms:
–k-stabilization [Beauquier et al, PODC’98]
Fault-Containment
Pros
–Self-stabilizing
–If f ≤ k faults, stabilization time in O(f) rounds
–Containment radius
–Fault gap is small
Cons (currently)
–k=1, or
–Surrounded by a majority of correct processes, or
–Synchronous setting, or
–Probabilistic recovery
Fault gap
The minimum time between consecutive faulty transitions needed to keep the recovery time in O(f)
[Figure: Legitimate/Illegitimate timeline. Faults separated by ≥ fault gap: recovery in O(f)]
[Figure: Legitimate/Illegitimate timeline. Faults separated by < fault gap: recovery in Ω(D)]
Time-Adaptive Self-stabilization
–Self-Stabilization
–If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults), “output” stabilization in O(f) rounds
Output vs. State Stabilization
[Figure: timeline with labels Legitimate, Illegitimate (after f ≤ k faults), Correct Output; durations O(f) and Ω(D)]
The fault gap depends on global parameters
k-Stabilization (first definition)
–If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers
–Otherwise, no guarantee
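The static-fault condition above can be sketched as a Hamming-distance check between configurations. This is only an illustration: configurations are modeled as plain sequences of process states, and the names `hamming_distance` and `within_k_faults` are hypothetical, not taken from the paper.

```python
from typing import Hashable, Iterable, Sequence


def hamming_distance(config: Sequence[Hashable],
                     legitimate: Sequence[Hashable]) -> int:
    """Number of processes whose state differs between two configurations."""
    if len(config) != len(legitimate):
        raise ValueError("configurations must have the same number of processes")
    return sum(1 for a, b in zip(config, legitimate) if a != b)


def within_k_faults(config: Sequence[Hashable],
                    legitimate_configs: Iterable[Sequence[Hashable]],
                    k: int) -> bool:
    """True if some legitimate configuration is at Hamming distance f <= k."""
    return any(hamming_distance(config, leg) <= k for leg in legitimate_configs)
```

Under this first definition, recovery is only guaranteed for configurations where `within_k_faults` holds; nothing is promised otherwise.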
k-Stabilization (first definition)
Pros
–Can solve more problems than self-stabilization
–Usually, only-k-dependent stabilization time
–Usually, only-k-dependent fault gap
Cons
–Not self-stabilizing
–Static faults: f ≤ k faults should occur in a single transition
Our definition of k-stabilization
Faulty transition = one process state corruption
Dynamic faults:
–If f ≤ k faulty transitions occur in an arbitrary manner, the system eventually recovers
Our definition of k-stabilization
[Figure: Legitimate/Illegitimate timeline with labels "1 fault" and "f ≤ k faults"]
Our contribution
Leader recovery protocol
–On an anonymous (yet oriented) ring
–Asynchronous atomic read/write
–k-stabilizing if n ≥ 18k + 1
–Stabilization time O(k²) rounds
–Log(k) bits per process
–This problem is unsolvable in the self-stabilizing setting
Our contribution
–The system starts in a legitimate configuration where one process is elected
–Some faulty transitions occur in an arbitrary manner
–Fault propagation
–If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds
Fault gap
[Figure: Legitimate/Illegitimate timeline with labels "f ≤ k faulty transitions" (twice), "0", and "O(k²) rounds"]
Main ideas of the algorithm
Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
Interval of relevance: 6k+1 votes
[Figure: ring with votes 0, ⊥, ⊥, ⊥ and distance 3k marked]
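A minimal sketch of votes as relative addresses, for a simulated ring. The absolute positions, the sign convention (offset from the voter to the leader), and the function names are assumptions made for illustration; the processes themselves are anonymous and never see these indices. `None` stands for ⊥.

```python
from typing import List, Optional


def relative_vote(i: int, leader: int, n: int, k: int) -> Optional[int]:
    """Signed ring offset from process i to the leader, or None (⊥) beyond 3k."""
    offset = (leader - i) % n      # distance in one direction, in 0..n-1
    if offset > n // 2:            # shorter to go the other way: use a negative offset
        offset -= n
    return offset if abs(offset) <= 3 * k else None


def legitimate_votes(n: int, k: int, leader: int) -> List[Optional[int]]:
    """Votes of all n processes when everyone agrees on the same leader."""
    return [relative_vote(i, leader, n, k) for i in range(n)]


def interval_of_relevance_size(votes: List[Optional[int]]) -> int:
    """Number of processes holding a non-⊥ vote."""
    return sum(1 for v in votes if v is not None)
```

With n = 19 and k = 1, exactly 6k+1 = 7 processes hold a non-⊥ vote, matching the interval of relevance stated on the slide.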
After k faults
[Figure: ring of votes (0, 1, ⊥) around the previous leader]
–At most 3k processes change their votes
–Always a majority of votes for the previous leader
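The majority claim follows from simple counting: the interval of relevance holds 6k+1 votes and at most 3k of them change, so at least 3k+1 still designate the previous leader. A small sketch of that arithmetic (function names are hypothetical):

```python
def surviving_votes_lower_bound(k: int) -> int:
    """Votes for the previous leader that remain after at most 3k votes change."""
    interval = 6 * k + 1           # size of the interval of relevance
    changed = 3 * k                # at most 3k processes change their votes
    return interval - changed      # = 3k + 1


def is_strict_majority(k: int) -> bool:
    """Check that the surviving votes exceed half of the interval."""
    return 2 * surviving_votes_lower_bound(k) > 6 * k + 1
```

The 3k+1 threshold is the same value used later when a query decides whether a candidate is confirmed.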
Rumors
[Figure: process with Vote = 1, Rumor = 1]
In a legitimate state, Vote = Rumor for every process
Main idea:
–Vote: hard to change
–Rumor: easy to change
Rumors
[Figure: process with Vote = 1, Rumor = 2]
If Rumor ≠ Vote
  If Rumor ≠ ⊥, Candidate ← Rumor
  Else Candidate ← Vote
  Initiate Query(Candidate)
Rumors
[Figure: process with Vote = 1, Rumor = 2]
Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
Query Return
If at least 3k+1 votes for the Candidate
–If Rumor ≠ ⊥ ≠ Candidate, initiate a Denial of Rumor in its interval of relevance
–Vote ← Candidate
–Rumor ← Candidate
Else
–If Rumor = Candidate, then Rumor ← ⊥
–Initiate a Denial of Candidate in its interval of relevance
–If Vote = Candidate, then Vote ← ⊥
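The candidate selection and the query-return rule can be sketched as local steps on one process. This is an illustrative model only: it ignores message passing, the shifting of relative addresses along the ring, and the other tracks. `None` stands for ⊥, the condition "Rumor ≠ ⊥ ≠ Candidate" is read here as "Rumor is neither ⊥ nor the Candidate" (an assumption), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

BOTTOM = None  # stands for ⊥


@dataclass
class ProcessState:
    vote: Optional[int]
    rumor: Optional[int]


def choose_candidate(s: ProcessState) -> Optional[int]:
    """Candidate to query when Rumor ≠ Vote; None means no query is needed."""
    if s.rumor == s.vote:
        return None
    # Rumor takes priority; if it is ⊥, Vote is necessarily non-⊥ here.
    return s.rumor if s.rumor is not BOTTOM else s.vote


def on_query_return(s: ProcessState, candidate: int,
                    votes_for_candidate: int, k: int) -> List[int]:
    """Apply the Query Return rule; return the values whose Denial is initiated."""
    denials: List[int] = []
    if votes_for_candidate >= 3 * k + 1:
        # Assumed reading of "Rumor ≠ ⊥ ≠ Candidate".
        if s.rumor is not BOTTOM and s.rumor != candidate:
            denials.append(s.rumor)
        s.vote = candidate
        s.rumor = candidate
    else:
        if s.rumor == candidate:
            s.rumor = BOTTOM
        denials.append(candidate)
        if s.vote == candidate:
            s.vote = BOTTOM
    return denials
```

For example, a process with Vote = 1 and Rumor = 2 first queries 2; if the count falls short of 3k+1, the rumor is erased and denied, and the next query confirms the vote 1.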
Query Tracks
Other tracks
Denial (to kill a rumor)
To manage lost queries
–Probe wave
–Report
(see the paper)
Deadlock Prevention
Every two neighboring processes share a resource
–Think of a chopstick between 2 philosophers
Only a process that holds both its left and right resources can initiate a query
So, at any time, there are at most n/2 pending initiated queries
Now, we can have up to 9k rogue queries, i.e., non-initiated queries
So, n > n/2 + 9k, that is, n ≥ 18k + 1
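The last step is plain arithmetic: n > n/2 + 9k is equivalent to n > 18k, and since n is an integer this gives n ≥ 18k + 1. A short sketch that checks the bound without fractions (function names are hypothetical):

```python
def satisfies_query_bound(n: int, k: int) -> bool:
    """n > n/2 + 9k, multiplied by 2 to stay in integers: 2n > n + 18k."""
    return 2 * n > n + 18 * k


def min_ring_size(k: int) -> int:
    """Smallest ring size that satisfies the bound, found by direct search."""
    n = 1
    while not satisfies_query_bound(n, k):
        n += 1
    return n
```

The search returns 18k + 1 for every k, which is the ring-size condition stated for the protocol.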
Conclusion
Less restrictive definition of k-stabilization
Using this definition, we solve a problem having no self-stabilizing solution:
–Leader recovery protocol on an anonymous (yet oriented) ring
Only-k-dependent complexity:
–Stabilization time O(k²) rounds
–Log(k) bits per process
Thank You!