Chapter 7 – Local Stabilization
Self-Stabilization, Shlomi Dolev, MIT Press, 2000. Draft of January 2004. Shlomi Dolev, All Rights Reserved ©

Chapter 7: roadmap
7.1 Superstabilization
7.2 Self-Stabilizing Fault-Containing Algorithms
7.3 Error-Detection Codes and Repair

Dynamic Systems & Self-Stabilization
Dynamic systems: algorithms for dynamic systems are designed to cope with processor failures without global re-initialization. Such algorithms consider only global states reachable from a predefined initial state under a restricted sequence of failures, and attempt to cope with each failure with as few adjustments as possible.
Self-stabilization: self-stabilizing algorithms are designed to eventually guarantee a particular behavior, starting from any state. Traditionally, changes in the communication graph were ignored.
Combining the two: superstabilizing algorithms combine the benefits of both self-stabilizing and dynamic algorithms.

Definitions
A superstabilizing algorithm:
- Must be self-stabilizing
- Must preserve a “passage predicate”
- Should exhibit a fast convergence rate
Passage predicate: defined with respect to a class of topology changes. A topology change falsifies legitimacy, so the passage predicate must be weaker than legitimacy, yet strong enough to be useful.

Passage Predicate - Example
In a token ring, a processor crash can lose the token yet still not falsify the passage predicate:

Passage predicate: at most one token exists in the system (e.g., the existence of 2 tokens isn’t legal).
Legitimate state: exactly one token exists in the system.
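The example above can be made concrete with two tiny predicates over the number of tokens in the ring (a minimal sketch; counting tokens as an integer is our abstraction, the book defines these predicates over configurations):

```python
# Passage predicate vs. legitimacy on a token ring, abstracted to the
# token count. Function names are ours, not the book's.

def legitimate(tokens: int) -> bool:
    return tokens == 1          # exactly one token in the ring

def passage(tokens: int) -> bool:
    return tokens <= 1          # at most one token: weaker than legitimacy

assert legitimate(1) and passage(1)
assert not legitimate(0) and passage(0)   # crash lost the token: passage holds
assert not passage(2) and not legitimate(2)
```

Every legitimate configuration satisfies the passage predicate, but not vice versa, which is exactly the "weaker than legitimacy but strong enough to be useful" requirement.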

Evaluation of a Superstabilizing Algorithm
a. Time complexity: the maximal number of rounds from a legitimate state, through a single topology change, until a legitimate state is reached again.
b. Adjustment measure: the maximal number of processors that must change their local state upon a topology change in order to achieve legitimacy.

Motivation for Superstabilization
A self-stabilizing algorithm that does not ignore the occurrence of topology changes (“events”) can be initialized in a predefined way and react better to dynamic changes during execution.
Question: is it possible for an algorithm that detects a fault when it occurs to maintain a “nearly legitimate” state during convergence?

Motivation for Superstabilization
While transient faults are rare (but harmful), dynamic changes in the topology may be frequent. Thus, a superstabilizing algorithm has a lower worst-case time for reaching a legitimate state again once a topology change occurs.
In the following slides we present a self-stabilizing and a super-stabilizing algorithm for the graph coloring task.

Graph Coloring
a. The coloring task is to assign a color value to each processor such that no two neighboring processors are assigned the same color.
b. Minimizing the number of colors is not required. The algorithm uses Δ+1 colors, where Δ is an upper bound on the number of neighbors of a processor.

Graph Coloring - A Self-Stabilizing Algorithm
If P_i has the color of one of its neighbors with a higher ID, it chooses another color and writes it.

01 do forever
02   GColors := ∅                      // colors of P_i’s neighbors
03   for m := 1 to δ do
04     lr_m := read(r_m)
05     if ID(m) > i then               // gather only the colors of neighbors with greater ID than P_i’s
06       GColors := GColors ∪ {lr_m.color}
07   od
08   if color_i ∈ GColors then
09     color_i := choose(ΔColors \ GColors)   // ΔColors: the palette of Δ+1 colors
10   write r_i.color := color_i
11 od
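The rule above can be exercised in a synchronous round simulation (a minimal sketch; the function names, the integer palette, and the synchronous scheduler are our assumptions, the book's model uses read/write registers and finer-grained steps):

```python
# Sketch: synchronous simulation of the self-stabilizing coloring rule.
# Each round, every processor reads a snapshot of its neighbors' colors
# and recolors only if it conflicts with a higher-ID neighbor.

def step(colors, adj):
    """One synchronous round: all processors apply the rule at once."""
    delta = max(len(n) for n in adj.values())      # Δ: max degree
    palette = set(range(delta + 1))                # Δ+1 colors suffice
    snapshot = dict(colors)                        # colors read this round
    for i, neighbors in adj.items():
        gcolors = {snapshot[j] for j in neighbors if j > i}
        if snapshot[i] in gcolors:                 # conflict with higher ID
            colors[i] = min(palette - gcolors)     # choose a free color
    return colors

def stabilize(colors, adj, max_rounds=100):
    """Run rounds until a fixpoint (a legitimate coloring) is reached."""
    for r in range(max_rounds):
        before = dict(colors)
        step(colors, adj)
        if colors == before:
            return colors, r
    return colors, max_rounds

# 5-cycle starting from the worst arbitrary state: all colors equal.
adj = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
final, rounds = stabilize({i: 0 for i in adj}, adj)
assert all(final[i] != final[j] for i in adj for j in adj[i])
```

At a fixpoint no processor's color appears among its higher-ID neighbors, so every edge is properly colored: for an edge (i, j) with i < j, P_i's stability implies color_i ≠ color_j.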

Graph Coloring - Self-Stabilizing Algorithm - Simulation, Phase I
[Figure: five processors, Id = 1…5, with their GColors sets, e.g. GColors = {Blue} and GColors = {}]

Graph Coloring - Self-Stabilizing Algorithm - Simulation, Phase II
[Figure: the same five processors; GColors = {Green}, GColors = {}, GColors = {Blue, Green, Red}, GColors = {Blue, Green}]

Graph Coloring - Self-Stabilizing Algorithm - Simulation, Phase III
[Figure: same GColors sets as Phase II; no processor changes its color - stabilized]

Graph Coloring - Self-Stabilizing Algorithm (continued)
What happens when the topology changes? If a new neighbor is added, two neighboring processors may have the same color. Moreover, during convergence every processor may change its color.
[Figure: after an added edge, the conflict ripples through the graph, e.g. i=4 with GColors = {blue}, i=5 with GColors = ∅, i=1 with GColors = {blue}, i=2 with GColors = {red}, i=3 with GColors = {red}, until the system stabilizes]
The system stabilizes, but at what cost?

Graph Coloring – Super-Stabilizing Motivation
a. Every processor changed its color, but only one processor really needed to.
b. If we could identify the topology change, we could confine the changes to its environment.
c. We add some elements to the algorithm:
   - AColors – a variable that collects the colors of all of the processor’s neighbors.
   - Interrupt section – identifies the problematic area.
   - ⊥ – a symbol flagging a non-existing color.

Graph Coloring – A Super-Stabilizing Algorithm

01 do forever
02   AColors := ∅                      // the colors of all of P_i’s neighbors
03   GColors := ∅
04   for m := 1 to δ do
05     lr_m := read(r_m)
06     AColors := AColors ∪ {lr_m.color}
07     if ID(m) > i then GColors := GColors ∪ {lr_m.color}
08   od
09   if color_i = ⊥ or color_i ∈ GColors then
10     color_i := choose(ΔColors \ AColors)
11   write r_i.color := color_i
12 od
13 Interrupt section     // activated after a topology change to identify the critical processor
14   if recover_ij and j > i then
15     color_i := ⊥
16     write r_i.color := ⊥

recover_ij is the interrupt that P_i receives upon a change in the communication between P_i and P_j.
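The interrupt-plus-⊥ mechanism can be sketched in the same synchronous-simulation style (an illustrative sketch; BOTTOM stands in for ⊥, recover() for the interrupt section, and the names and scheduler are our assumptions, not the book's model):

```python
# Sketch of the super-stabilizing rule: on a topology change on edge
# (i, j), the lower-ID endpoint resets to BOTTOM (line 14: j > i), and a
# BOTTOM processor picks a color avoiding ALL neighbors (AColors).
BOTTOM = None

def recover(colors, i, j):
    """Interrupt for a changed edge (i, j): the lower-ID endpoint resets."""
    colors[min(i, j)] = BOTTOM

def step(colors, adj, palette):
    """One synchronous round of the super-stabilizing coloring rule."""
    snapshot = dict(colors)
    for i, nbrs in adj.items():
        acolors = {snapshot[j] for j in nbrs}            # AColors
        gcolors = {snapshot[j] for j in nbrs if j > i}   # GColors
        if snapshot[i] is BOTTOM or snapshot[i] in gcolors:
            colors[i] = min(palette - acolors - {BOTTOM})
    return colors

# Safe configuration on a 5-cycle, then a new edge (2, 4) creating a conflict:
adj = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
colors = {1: 2, 2: 1, 3: 0, 4: 1, 5: 0}                  # proper coloring
adj[2].append(4); adj[4].append(2)                       # topology change
recover(colors, 2, 4)                                    # interrupt fires
step(colors, adj, palette={0, 1, 2, 3})                  # one round suffices
assert all(colors[i] != colors[j] for i in adj for j in adj[i])
assert colors[1] == 2 and colors[3] == 0                 # only P_2 recolored
```

Only the flagged processor changes color, which previews the O(1) super-stabilizing time and the adjustment measure of 1 claimed on the following slides.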

Graph Coloring - Super-Stabilizing Algorithm - Example
Notice that the new algorithm stabilizes faster than the previous one. Consider the previous example, this time using the super-stabilizing algorithm:
[Figure: only the interrupted processor recolors, e.g. i=4 with GColors = {blue}, AColors = {blue, red}; color_4 = r_4.color is set to a color outside AColors. Stabilized in O(1).]

Graph Coloring – Super-Stabilizing Proof
Lemma 1: the algorithm is self-stabilizing.
Proof by induction:
a. After the first iteration:
   - The ⊥ value no longer exists in the system.
   - P_n (the processor with the highest ID) has a fixed color.
b. Assume that every P_k with i < k ≤ n has a fixed color. If P_i has a neighbor P_k, then P_i does not change to P_k’s color, but chooses a different one. By the assumption, P_i’s color becomes fixed as well. Hence the color of every P_i, 1 ≤ i ≤ n, becomes fixed, and the system stabilizes.

Graph Coloring – Super-Stabilizing
Passage predicate – in every execution that starts in a safe configuration and in which a single topology change occurs before the next safe configuration is reached, neighboring processors always have different colors.

Graph Coloring – Super-Stabilizing
Super-stabilizing time – the number of cycles required to reach a safe configuration following a topology change.

Super-stabilizing: O(1)
Self-stabilizing: O(n)

Graph Coloring – Super-Stabilizing
Adjustment measure – the number of processors that change color upon a topology change. The super-stabilizing algorithm changes the color of one processor: the one at which the single topology change occurred.

Chapter 7: roadmap
7.1 Superstabilization
7.2 Self-Stabilizing Fault-Containing Algorithms
7.3 Error-Detection Codes and Repair

Self-Stabilizing Fault-Containing Algorithms
Fault model: several transient faults in the system. This fault model is less severe than dynamic topology changes; it considers the case where f transient faults occur, changing the states of f processors.
The goals of self-stabilizing fault-containing algorithms:
a) From any arbitrary configuration, a safe configuration is reached.
b) Starting from a safe configuration followed by transient faults that corrupt the states of f processors, a safe configuration is reached within O(f) cycles.

A Self-Stabilizing Algorithm for Fixed-Output Tasks
Our goal: to design a self-stabilizing fault-containing algorithm for fixed-input, fixed-output tasks.
Fixed input: the algorithm has a fixed input (such as its fixed local topology); I_i contains the input of processor P_i.
Fixed output: the variable O_i contains the output of processor P_i; the output should not change over time.
A version of the update algorithm is a self-stabilizing algorithm for any fixed-input, fixed-output task.

Fixed-Output Algorithm for Processor P_i

1. upon a pulse
2.   ReadSet_i := ∅
3.   forall P_j ∈ N(i) do
4.     ReadSet_i := ReadSet_i ∪ read(Processors_j)
5.   ReadSet_i := ReadSet_i \ {⟨i,*,*⟩}          // remove tuples about P_i itself
6.   ReadSet_i := ReadSet_i ++                    // increment the distance field of every tuple
7.   ReadSet_i := ReadSet_i ∪ {⟨i,0,I_i⟩}         // add P_i’s own tuple: id, distance 0, input
8.   forall P_j ∈ processors(ReadSet_i) do
9.     ReadSet_i := ReadSet_i \ NotMinDist(P_j, ReadSet_i)   // keep only the closest tuple per id
10.  write Processors_i := ConPrefix(ReadSet_i)
11.  write O_i := ComputeOutput(Inputs(Processors_i))

Explaining the Algorithm
The algorithm is a version of the self-stabilizing update algorithm; each tuple has an extra field containing the fixed input I_i of processor P_i.
Just as in the update algorithm, it is guaranteed that eventually each processor has a tuple for every other processor.
Each processor then has all the inputs and computes the correct output.
But is this algorithm fault-containing?
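The update-style propagation the slide describes can be sketched as follows (a simplified synchronous sketch; the dictionaries stand in for the Processors_i tuple sets, ConPrefix is omitted, and all names are our assumptions):

```python
# Sketch: one pulse of the fixed-output update propagation for every
# processor. state[i] maps processor id -> (dist, input), mirroring
# the slide's ⟨id, distance, I⟩ tuples.

def update_step(state, inputs, adj):
    """Read neighbors' tuples, drop tuples about yourself, increment
    distances, add your own tuple, keep the min-distance tuple per id
    (the effect of the NotMinDist removal)."""
    new = {}
    for i, nbrs in adj.items():
        readset = {}
        for j in nbrs:
            for pid, (dist, inp) in state[j].items():
                if pid == i:
                    continue                         # remove ⟨i,*,*⟩ tuples
                cand = (dist + 1, inp)               # increment distance field
                if pid not in readset or cand < readset[pid]:
                    readset[pid] = cand              # keep min-distance tuple
        readset[i] = (0, inputs[i])                  # own tuple at distance 0
        new[i] = readset
    return new

# Line P_1 - P_2 - P_3, output = OR of all inputs:
adj = {1: [2], 2: [1, 3], 3: [2]}
inputs = {1: 1, 2: 0, 3: 0}
state = {i: {} for i in adj}
for _ in range(3):                                   # d + 1 pulses suffice here
    state = update_step(state, inputs, adj)
outputs = {i: int(any(inp for _, inp in state[i].values())) for i in adj}
assert outputs == {1: 1, 2: 1, 3: 1}                 # everyone learned I_1 = 1
```

After d + 1 pulses every processor holds a min-distance tuple for every other processor and can apply ComputeOutput to the full input vector.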

Does this Algorithm have the Fault-Containment Property?
Error scenario (assuming the output is the OR of the inputs), on a line of five processors P_1 … P_5, all inputs 0: in a safe configuration P_5 holds a correct tuple for P_1; a single fault corrupts the input value recorded in that tuple (0 becomes 1).
Error propagation:
cycle 0: O_1 = 0, O_2 = 0, O_3 = 0, O_4 = 0, O_5 = 0 (safe; then P_5’s tuple is corrupted)
cycle 1: O_4 = 0, O_5 = 1
cycle 2: O_4 = 1, O_5 = 0
cycle 3: O_4 = 0, O_5 = 1
cycle 4: O_5 = 0
Conclusion: the algorithm does not have the fault-containment property. The system stabilizes only after O(d/2) cycles.

Fault Containment – Naive Approach
A processor that is about to change its output waits d cycles before doing so.
This approach ensures:
- Stabilization from every arbitrary state.
- Starting from a safe configuration followed by several simultaneous transient faults, only the outputs of the processors that experienced a fault can change, and each such change is to the correct output value.
During this waiting interval, correct input values propagate towards the faulty processor.

Fault Containment – Naive Approach (cont.)
The above approach ensures self-stabilization, but has a serious drawback: the time it takes for the correct input values to propagate to a faulty processor and correct its output is O(d), which contradicts the second requirement of self-stabilizing fault containment, namely that a safe configuration be reached within O(f) cycles.
Let us consider a more sophisticated approach that meets all fault-containment requirements.

Designing a Fault-Containing Algorithm
Evidence: the values of all the input fields in Processors_i, stored in a variable A_i. Each tuple carries this evidence in addition to its other 3 fields.
The additional A_i variable enables a processor that experienced a fault to learn quickly about the inputs of remote processors.
When a processor experiences a fault, the A variables of most of the processors within distance 2f+1 from it still hold the set of correct inputs.
We maintain this as an invariant throughout the algorithm, for a time sufficient to let the faulty processors regain the correct input values I_j of the other processors.

Self-Stabilizing Fault-Containing Algorithm for Processor P_i

1. upon a pulse
2.   ReadSet_i := ∅
3.   forall P_j ∈ N(i) do
4.     ReadSet_i := ReadSet_i ∪ read(Processors_j)
5.     if (RepairCounter_i ≠ d + 1) then   // in repair process
6.       RepairCounter_i := min(RepairCounter_i, read(RepairCounter_j))
7.   od
8.   ReadSet_i := ReadSet_i \ {⟨i,*,*,*⟩}          // remove tuples about P_i itself
9.   ReadSet_i := ReadSet_i ++                      // increment the distance field of every tuple
10.  ReadSet_i := ReadSet_i ∪ {⟨i,0,I_i,A_i⟩}       // add P_i’s own tuple, now carrying the evidence A_i
11.  forall P_j ∈ processors(ReadSet_i) do
12.    ReadSet_i := ReadSet_i \ NotMinDist(P_j, ReadSet_i)
13.  write Processors_i := ConPrefix(ReadSet_i)

Self-Stabilizing Fault-Containing Algorithm (cont.)

14. if (RepairCounter_i = d + 1) then   // not in repair process
15.   if (O_i ≠ ComputeOutput_i(Inputs(Processors_i))) or
16.      (∃⟨*,*,*,A⟩ ∈ Processors_i | A ≠ Inputs(Processors_i))
17.   then RepairCounter_i := 0          // repair started
18. else                                  // in repair process
19.   RepairCounter_i := min(RepairCounter_i + 1, d + 1)
20. write O_i := ComputeOutput_i(RepairCounter_i,
21.                              MajorityInputs(Processors_i))
22. if (RepairCounter_i = d + 1) then   // repair over
23.   A_i := Inputs(Processors_i)

Explaining the Algorithm
Initial state: RepairCounter_i = d + 1. RepairCounter_i changes when P_i detects an error in its state, or when an error from another processor propagates towards it during the repair process.
ComputeOutput is applied to the majority of the inputs of processors within distance ≤ RepairCounter_i from P_i, which ensures that eventually the distance 2f+1 is reached and O_i is set correctly.
The value of A_i does not change throughout the repair process, which maintains the invariant the faulty processors need in order to regain the correct input values and eventually output correct values.
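The majority vote over evidence can be sketched as follows (an illustrative sketch of the MajorityInputs idea; the record layout, the radius parameter, and all names are our assumptions, not the book's exact definitions):

```python
# Sketch: majority vote over the evidence (A) fields of the tuples
# within a given radius, as used during the repair process.
from collections import Counter

def majority_inputs(tuples, radius):
    """tuples: iterable of (id, dist, input, evidence) records, where
    evidence maps a processor id to its claimed input value. Vote only
    over tuples whose distance is within the radius."""
    votes = {}
    for _, dist, _, evidence in tuples:
        if dist <= radius:
            for pid, val in evidence.items():
                votes.setdefault(pid, Counter())[val] += 1
    return {pid: c.most_common(1)[0][0] for pid, c in votes.items()}

# The slides' OR-of-inputs scenario as seen from P_3: the evidence for
# P_1 held by P_2 and P_4 is corrupted (claims I_1 = 1), the rest is correct.
tuples = [
    (3, 0, 0, {1: 0}),   # P_3's own (correct) evidence
    (2, 1, 0, {1: 1}),   # corrupted
    (4, 1, 0, {1: 1}),   # corrupted
    (1, 2, 0, {1: 0}),
    (5, 2, 0, {1: 0}),
]
assert majority_inputs(tuples, radius=1) == {1: 1}   # wrong while radius is 1
assert majority_inputs(tuples, radius=2) == {1: 0}   # fixed once radius grows
```

As RepairCounter grows, the voting radius grows with it, and once the neighborhood contains more correct evidence than corrupted evidence the majority (and hence the output) becomes correct.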

Error scenario on a line of six processors P_1 … P_6, all inputs I = 0 and all outputs O = 0 (the output is the OR of the inputs):
- The input of P_1 recorded at P_2 and P_4 erroneously changes to 1, and so does the evidence for P_1 held there (A_2: i_1 = 1, A_4: i_1 = 1); elsewhere the evidence stays correct (A_1, A_3, A_5: i_1 = 0).
- The erroneous input propagates to P_3, and so does the erroneous evidence, causing P_3 to compute an erroneous output based on the majority of inputs at distance ≤ 1 (cycle 3: O = 1).
- The output of P_3 is fixed once the voting distance grows to 2 and the correct evidence regains the majority (cycle 4: O = 0).

Error Scenario
Initially the network is in a safe configuration.
In the first cycle several processors (the red ones in the figure) experience faults: their evidence concerning the blue processor is erroneous, and so are the recorded distance to the blue processor and its input.
In the following cycles the error propagates through the graph.
The output becomes erroneous at many processors, but convergence back to a safe configuration is quick.

Feel the Power (example)
[Figure: a cycle-by-cycle animation of the error scenario; legend: regular, wrong evidence, source (blue), wrong output, wrong evidence and output (RepairCounter shown per processor)]

Algorithm Analysis
A_i of processor P_i stays unchanged for d+1 cycles after the repair process starts, which ensures that the extent of the fault does not grow.
The majority-based output calculation is applied during the repair process: after 2f+1 cycles the majority of the inputs around a faulty processor is correct, so the correct output is computed.
Thus, after 2f+1 cycles the system’s outputs stabilize to the correct values, despite the continuing changes in the processors’ internal states (for d+1 cycles).

Conclusions
Compared to the naive implementation, this algorithm significantly shortens the system’s stabilization time.
The price we pay for this improvement:
- Network load grows, because each tuple includes an additional variable A_i of size O(n) (n – the number of processors).
- During the execution, faulty output is allowed at non-faulty processors for short periods of time.

Chapter 7: roadmap
7.1 Superstabilization
7.2 Self-Stabilizing Fault-Containing Algorithms
7.3 Error-Detection Codes and Repair