Published by Patrick Ward. Modified over 9 years ago.
1
Fast Leader (Full) Recovery despite Dynamic Faults
Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil
2
Joint Work
Ajoy K. Datta & Lawrence L. Larmore, Sébastien Tixeuil
ICDCN, 04/01/2013, Mumbai
3
Self-Stabilization [Dijkstra, 74]
4
Self-Stabilization [Dijkstra, 74]
A fault = a process state corruption
10
Self-Stabilization [Dijkstra, 74]
Recover after any number of transient faults
12
Price of the Versatility
1. Several impossibility results
– E.g., Leader Election and Token Circulation in Anonymous Networks
2. The stabilization time usually depends on global parameters (diameter, size of the network, …)
15
When only a few faults hit the system
Self-Stabilization: Ω(D) rounds
Stronger forms:
– Fault Containment [Ghosh et al., Dist. Comp. 2007]
– k-adaptive Self-Stabilization [Burman et al., OPODIS’05]
Weakened forms:
– k-stabilization [Beauquier et al., PODC’98]
16
Fault-Containment
Pros
– Self-stabilizing
– If f ≤ k faults, stabilization time in O(f) rounds
– Containment radius
– Fault gap is small
Cons (currently)
– k = 1, or
– Surrounded by a majority of correct processes, or
– Synchronous setting, or
– Probabilistic recovery
17
Fault gap
The minimum time between consecutive faulty transitions to have O(f) recovery time
[Figure: timeline with labels Legitimate, Illegitimate, ≥ fault gap, O(f)]
18
Fault gap
The minimum time between consecutive faulty transitions to have O(f) recovery time
[Figure: timeline with labels Legitimate, Illegitimate, < fault gap, >Ω(D)]
19
Time-Adaptive Self-stabilization
Self-Stabilization
If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults),
– “output” stabilization in O(f) rounds
21
Output vs. State Stabilization
[Figure: timeline with labels Legitimate, Correct Output, Illegitimate, f ≤ k faults, O(f), >Ω(D)]
The fault gap depends on global parameters
22
k-Stabilization (first definition)
If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers
Otherwise, no guarantee
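The distance in this definition can be illustrated with a minimal sketch. It assumes a configuration is modeled as a list holding one state per process; the names `hamming_distance` and `within_k_faults` are hypothetical and do not come from the talk.

```python
from typing import Hashable, Iterable, Sequence


def hamming_distance(config: Sequence[Hashable], legitimate: Sequence[Hashable]) -> int:
    """Count the processes whose state differs between two configurations."""
    if len(config) != len(legitimate):
        raise ValueError("both configurations must cover the same processes")
    return sum(1 for a, b in zip(config, legitimate) if a != b)


def within_k_faults(
    config: Sequence[Hashable],
    legitimate_configs: Iterable[Sequence[Hashable]],
    k: int,
) -> bool:
    """True if some legitimate configuration is at Hamming distance f <= k."""
    return any(hamming_distance(config, leg) <= k for leg in legitimate_configs)
```

Under the first definition, recovery is promised only for configurations where `within_k_faults` holds; anything farther away carries no guarantee.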
23
k-Stabilization (first definition)
Pros
– Can solve more problems than self-stabilization
– Usually, only-k-dependent stabilization time
– Usually, only-k-dependent fault gap
Cons
– Not self-stabilizing
– Static faults: f ≤ k faults should occur in a single transition
24
Our definition of k-stabilization
Faulty transition = one process state corruption
Dynamic faults:
– If f ≤ k faulty transitions occur in an arbitrary manner, the system eventually recovers
25
Our definition of k-stabilization
[Figure: timeline with labels Legitimate, Illegitimate, 1 fault, f ≤ k faults]
26
Our contribution
Leader recovery protocol
– On an anonymous (yet oriented) ring
– Asynchronous atomic read/write
– k-stabilizing if n ≥ 18k + 1
– Stabilization time O(k²) rounds
– Log(k) bits per process
– This problem is unsolvable in the self-stabilizing setting
27
Our contribution
The system starts in a legitimate configuration where one process is elected
30
Our contribution
Some faulty transitions occur in an arbitrary manner
Fault propagation
31
Our contribution
If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds
36
Fault gap
[Figure: timeline with labels Legitimate, Illegitimate, f ≤ k faulty transitions, 0, O(k²) rounds]
37
Main ideas of the algorithm
38
Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
[Figure: ring of votes 3, 2, 1, 0, -1, -2, -3, with ⊥ elsewhere; label 3k]
Interval of relevance: 6k+1 votes
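As a rough illustration, the vote layout of a legitimate configuration could be built as below. This is a sketch under assumptions the slide does not state: processes are indexed 0..n-1 along the ring orientation, a vote v at process p designates process (p + v) mod n, and ⊥ is represented by `None`. The function name `legitimate_votes` is hypothetical.

```python
from typing import List, Optional


def legitimate_votes(n: int, k: int, leader: int) -> List[Optional[int]]:
    """Votes of a ring of n processes that all agree on `leader`.

    Processes within distance 3k of the leader store its relative address;
    every other process stores None (standing for the bottom symbol).
    """
    if n < 6 * k + 1:
        raise ValueError("the ring must contain the whole interval of relevance")
    votes: List[Optional[int]] = [None] * n
    for d in range(-3 * k, 3 * k + 1):
        # Assumed sign convention: the process holding vote d sits d steps
        # before the leader, so that (p + d) % n == leader.
        votes[(leader - d) % n] = d
    return votes
```

With k = 1 this yields 7 votes around the leader, matching the 3, 2, 1, 0, -1, -2, -3 pattern in the figure.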
39
After k faults
[Figure: ring of votes 3, 2, 1, 0, -1, -2, -3, with ⊥ elsewhere]
40
After k faults
[Figure: same ring; the vote 2 has changed to 0]
42
After k faults
[Figure: same ring; three votes have changed (0 to 1, 2 to 0, -1 to 0)]
At most 3k processes change their votes
Always a majority of votes for the previous leader
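The majority claim follows from counting inside the interval of relevance. The worked inequality below takes the bound of 3k changed votes from the slide as given; it is a restatement of that argument, not an additional result.

```latex
\[
\underbrace{(6k+1)}_{\text{votes in the interval}}
\;-\;
\underbrace{3k}_{\text{changed votes, at most}}
\;=\; 3k+1 \;>\; \frac{6k+1}{2}
\]
```

So at least 3k+1 of the 6k+1 votes still designate the previous leader, which is a strict majority of the interval.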
43
Rumors
[Figure: a process with Vote = 1 and Rumor = 1]
In a legitimate state, Vote = Rumor for every process
Main idea:
– Vote: hard to change
– Rumor: easy to change
44
Rumors
[Figure: a process with Vote = 1 and Rumor = 2]
If Rumor ≠ Vote
  If Rumor ≠ ⊥
    Candidate ← Rumor
  Else
    Candidate ← Vote
  Initiate Query(Candidate)
45
Rumors
[Figure: a process with Vote = 1 and Rumor = 2]
Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
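The candidate choice and the vote count can be sketched as a centralized simulation. This is only an illustration: the actual protocol is a distributed traversal over asynchronous atomic read/write, which is not modeled here. It assumes ring indices 0..n-1, that a vote v at process p designates process (p + v) mod n, and that ⊥ is `None`; the names `choose_candidate` and `count_votes` are hypothetical.

```python
from typing import List, Optional


def choose_candidate(vote: Optional[int], rumor: Optional[int]) -> Optional[int]:
    """Relative address to query, or None when Vote = Rumor (no query needed)."""
    if rumor == vote:
        return None
    # The rumor takes priority unless it is the bottom symbol.
    return rumor if rumor is not None else vote


def count_votes(votes: List[Optional[int]], k: int, initiator: int, candidate: int) -> int:
    """Count the votes for the candidate over its interval of relevance."""
    n = len(votes)
    target = (initiator + candidate) % n  # absolute index of the candidate
    count = 0
    for offset in range(-3 * k, 3 * k + 1):  # the 6k+1 processes around it
        r = (target + offset) % n
        v = votes[r]
        if v is not None and (r + v) % n == target:
            count += 1
    return count
```

The count returned here is what the Query Return step compares against the 3k+1 threshold.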
46
Query Return
If at least 3k+1 votes for the Candidate
– If Rumor ≠ ⊥ ≠ Candidate, initiate a Denial of Rumor in its interval of relevance
– Vote ← Candidate
– Rumor ← Candidate
Else
– If Rumor = Candidate, then Rumor ← ⊥
– Initiate a Denial of Candidate in its interval of relevance
– If Vote = Candidate, then Vote ← ⊥
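The rule above can be written as a small local handler. This is a sketch only: it reads the condition "Rumor ≠ ⊥ ≠ Candidate" as "Rumor is neither ⊥ nor the Candidate", which is an interpretation, and it returns the denials to start instead of sending messages. `ProcessState` and `on_query_return` are hypothetical names, and ⊥ is represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessState:
    vote: Optional[int]
    rumor: Optional[int]


def on_query_return(state: ProcessState, candidate: int, votes_for: int, k: int) -> List[int]:
    """Update Vote and Rumor; return the addresses whose Denial must be initiated."""
    denials: List[int] = []
    if votes_for >= 3 * k + 1:
        # Candidate confirmed: kill a competing rumor, then adopt the candidate.
        if state.rumor is not None and state.rumor != candidate:
            denials.append(state.rumor)
        state.vote = candidate
        state.rumor = candidate
    else:
        # Candidate rejected: forget it locally and deny it in its interval.
        if state.rumor == candidate:
            state.rumor = None
        denials.append(candidate)
        if state.vote == candidate:
            state.vote = None
    return denials
```

For example, a process with Vote = 1 and Rumor = 2 whose query for 2 gathers fewer than 3k+1 votes resets its Rumor to ⊥, keeps its Vote, and starts a Denial of 2.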
47
Query Tracks
48
Other tracks
Denial (to kill a rumor)
To manage lost queries
– Probe wave
– Report
(see the paper)
53
Deadlock Prevention
Every two neighboring processes share a resource
– Think of a chopstick between 2 philosophers
Only a process that holds both its left and right resources can initiate a query
So, at any time, at most n/2 initiated queries are pending
Now, we can have up to 9k rogue queries, i.e., non-initiated queries
So, n > n/2 + 9k, that is, n ≥ 18k + 1
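The last step is plain arithmetic on the two bounds from the slide: at most n/2 initiated queries and at most 9k rogue queries. It can be spelled out as follows, with n and k taken to be positive integers.

```latex
\[
n > \frac{n}{2} + 9k
\;\Longleftrightarrow\;
\frac{n}{2} > 9k
\;\Longleftrightarrow\;
n > 18k
\;\Longleftrightarrow\;
n \ge 18k + 1
\]
```

The final equivalence uses only the fact that n and 18k are integers.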
54
Conclusion
Less restrictive definition of k-stabilization
Using this definition, we solve a problem having no self-stabilizing solution:
– Leader recovery protocol on an anonymous (yet oriented) ring
Only-k-dependent complexity:
– Stabilization time O(k²) rounds
– Log(k) bits per process
55
Thank You!