UBE529 Distributed Algorithms Self Stabilization.


2 Self-Stabilization
- Formalizing the notion of self-stabilization
- A toy problem: "Rotating Privilege on a Ring"
  - The very first self-stabilizing algorithm
  - Mainly of theoretical interest
- A practical problem: "Self-Stabilizing Spanning Tree Construction"
  - Very useful in multicast (BitTorrent-style data streaming)

3 Introduction
- Self-stabilization: tolerate "data faults"
  - Example: parent pointers in a spanning tree getting corrupted
- Assume that the code itself does not get corrupted
- System state: legal or illegal
- Faults may result in an illegal system state
- Self-stabilizing system: irrespective of the initial state, it always reaches a legal state in finite time

4 Motivation for Self-Stabilization (the motivation given in the book is not very practical)
- Distributed systems can get into an illegal state due to:
  - Topology changes
  - Failures / reboots
  - Malicious processes
  - Generally called "faults"
- Example: a multicast tree, where each node records who its parent and children are
  - (Figure: after a fault, the parent values of two nodes are no longer valid)

5 Motivation for Self-Stabilization
- Distributed systems can get into an illegal state due to topology changes, failures / reboots, and malicious processes, generally called "faults"
- Example: mobile ad hoc networks, maintaining the shortest route back to the sink
  - (Figure: after nodes move, node A should now have an improved route back to the sink)

6 Defining Self-Stabilization
- The state (i.e., the data state of all processes) of a distributed system is either legal or illegal
  - The definition is based on application semantics
  - The code on each process is assumed to be correct at all times
- A distributed algorithm is self-stabilizing if:
  - Starting from any (legal or illegal) state, the protocol eventually reaches a legal state, provided there are no more faults
  - Once the system is in a legal state, it only transits to other legal states, unless there are faults
- Intuitively: the system always recovers from faults, and once recovered, it stays recovered forever
- A self-stabilizing algorithm typically runs in the background and never stops

7 Mutual Exclusion
- Legal state: exactly one machine in the system is "privileged"
- Assume there are N machines, numbered 0 ... N-1
- Each machine is a K-state machine
  - Label the possible states from the set {0 ... K-1}
- There is one special machine called the bottom machine
- L, S, R = states of the left machine, self, and right machine, respectively

8 Algorithm
- Bottom machine: privileged if L = S
- Other machines: privileged if L ≠ S

9 Algorithm: A move by the bottom machine (when privileged, it sets S := (S + 1) mod K)

10 Algorithm: A move by a normal machine (when privileged, it sets S := L)

11 Another Example

12 Implementation
- Each process needs to query its left neighbor
- Instead of periodic queries, use a TOKEN for message efficiency
- What if the token gets lost?
  - The bottom machine maintains a timer
  - If it does not receive a token for a long time, it regenerates the token
  - Multiple tokens do not affect the correctness of the algorithm

13 // Program for the bottom node
public class StableBottom extends Process implements Lock {
  int myState = 0;
  int leftState = 0;
  int next;
  Timer t = new Timer();
  boolean tokenSent = false;

  public StableBottom(Linker initComm) {
    super(initComm);
    next = (myId + 1) % N;
  }
  public synchronized void initiate() {
    t.schedule(new RestartTask(this), 1000, 1000);
  }
  public synchronized void requestCS() {
    // privileged when L = S
    while (leftState != myState) myWait();
  }
  public synchronized void releaseCS() {
    myState = (leftState + 1) % N;  // bottom's move: S := (S + 1) mod K
  }
  public synchronized void sendToken() {
    if (!tokenSent) {
      sendMsg(next, "token", myState);
      tokenSent = true;
    } else
      tokenSent = false;
  }
  public synchronized void handleMsg(Message m, int src, String tag) {
    if (tag.equals("token")) {
      leftState = m.getMessageInt();
      notify();
      Util.mySleep(1000);
      sendMsg(next, "token", myState);
      tokenSent = true;
    } else if (tag.equals("restart"))
      sendToken();  // timer fired: regenerate the token if it was lost
  }
}

14 // Program for a normal node
public class StableNormal extends Process implements Lock {
  int myState = 0;
  int leftState = 0;

  public StableNormal(Linker initComm) {
    super(initComm);
  }
  public synchronized void requestCS() {
    // privileged when L != S
    while (leftState == myState) myWait();
  }
  public synchronized void releaseCS() {
    myState = leftState;  // normal move: S := L
    sendToken();
  }
  public synchronized void sendToken() {
    int next = (myId + 1) % N;
    sendMsg(next, "token", myState);
  }
  public synchronized void handleMsg(Message m, int src, String tag) {
    if (tag.equals("token")) {
      leftState = m.getMessageInt();
      notify();
      Util.mySleep(1000);
      sendToken();
    }
  }
}

15 Dijkstra's Second Algorithm for Mutual Exclusion
- An array (line) of processors, 3 states per machine: {0, 1, 2}, with all arithmetic mod 3
- Bottom: if B + 1 = R then B := B + 2
- Normal: if L = S + 1 or R = S + 1 then S := S + 1
- Top: if L = B and T ≠ B + 1 then T := B + 1

16 Second Algorithm: Example

17 Use of the First Algorithm: The Rotating Privilege Problem
- A ring of n processes; each process can only communicate with its neighbors
- There is a privilege in the system
  - At any time, only one node may have the privilege (you can think of it as a token)
  - The node with the privilege may, for example, have exclusive access to some resource
  - The privilege needs to "rotate" among the nodes so that each node gets a chance

18 The Rotating Privilege Algorithm
- Each process i has a local integer variable V_i
  - 0 ≤ V_i ≤ k-1, where k is some constant no smaller than n
- Example: n = 5 and k = 12

Red process's action (executed repeatedly):
  Retrieve the value L of my clockwise neighbor;
  Let V be my value;
  if (L == V) {  // I have the privilege
    // complete whatever I want to do;
    V = (V + 1) % k;
  }

Green process's action (executed repeatedly):
  Retrieve the value L of my clockwise neighbor;
  Let V be my value;
  if (L != V) {  // I have the privilege
    // complete whatever I want to do;
    V = L;
  }

Each process executes its action repeatedly; we will assume each action happens instantaneously (for this algorithm only).
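The two actions above can be simulated directly. Below is a minimal sketch of the K-state ring under a central daemon that always activates the first enabled process; the ring size n, the constant k, the initial values, and the scheduler are illustrative assumptions, not part of the slides.

```java
// A minimal sketch of Dijkstra's K-state token ring under a central daemon.
// Process 0 is the bottom (red) process. The ring size n, constant k >= n,
// the initial values, and the scheduler are illustrative assumptions.
public class KStateRing {
    static final int n = 5, k = 12;
    static int[] v = {3, 1, 4, 1, 5};   // arbitrary, possibly illegal, start

    // Index of one enabled (privileged) process; at least one always exists.
    static int privileged() {
        if (v[0] == v[n - 1]) return 0;          // bottom: enabled iff L == S
        for (int i = 1; i < n; i++)
            if (v[i] != v[i - 1]) return i;      // normal: enabled iff L != S
        return -1;                               // unreachable
    }

    static void move(int i) {
        if (i == 0) v[0] = (v[0] + 1) % k;       // bottom's move: S := (S + 1) mod k
        else v[i] = v[i - 1];                    // normal move: S := L
    }

    // Number of enabled processes; the state is legal iff this equals 1.
    static int enabledCount() {
        int c = (v[0] == v[n - 1]) ? 1 : 0;
        for (int i = 1; i < n; i++) if (v[i] != v[i - 1]) c++;
        return c;
    }

    public static void main(String[] args) {
        for (int step = 0; step < 100; step++) move(privileged());
        System.out.println(enabledCount());      // prints 1: stabilized
    }
}
```

Running the sketch from the illegal start state, the ring stabilizes after a handful of moves, and from then on exactly one process is privileged at any time.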

21 What's Interesting about the Algorithm
- This problem is mainly of theoretical interest
- What is interesting about it: regardless of the initial values of the processes, the system eventually gets into a legal state and stays in legal states
- Self-stabilizing!

22 Legal States
- We say that a process makes a "move" if it has the privilege and changes its value
- The system is in a legal state if exactly one machine can make a move
  - It is easy to prove that in any state, at least one machine can make a move
- Lemma: the following are legal states, and they are the only legal states:
  - All n values are the same, OR
  - There are only two different values, forming two consecutive bands, one of which starts at the red process
- To prove these are the only legal states, consider the value V of the red process and the value L of its clockwise neighbor:
  - Case I: V = L
  - Case II: V ≠ L
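The lemma's characterization can be sanity-checked by brute force: a state has exactly one enabled move iff its values form a single band, or two consecutive bands with one band starting at the red process. The sketch below samples random states and compares the two conditions; the choices of n, k, and random sampling are assumptions made for illustration.

```java
import java.util.Random;

// Brute-force sanity check of the legal-state lemma: exactly one enabled
// move iff at most one band boundary when walking the ring from the red
// process. n, k, and the random sampling are illustrative assumptions.
public class LegalStateCheck {
    static final int n = 5, k = 7;

    static int enabledMoves(int[] v) {
        int c = (v[0] == v[n - 1]) ? 1 : 0;             // red: enabled iff L == V
        for (int i = 1; i < n; i++)
            if (v[i] != v[i - 1]) c++;                  // green: enabled iff L != V
        return c;
    }

    // One band (all equal), or two consecutive bands of two distinct values,
    // with the first band starting at the red process (index 0).
    static boolean bandForm(int[] v) {
        int changes = 0;
        for (int i = 1; i < n; i++) if (v[i] != v[i - 1]) changes++;
        return changes <= 1;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int t = 0; t < 10000; t++) {
            int[] v = new int[n];
            for (int i = 0; i < n; i++) v[i] = rnd.nextInt(k);
            if ((enabledMoves(v) == 1) != bandForm(v))
                throw new AssertionError("lemma violated for a sampled state");
        }
        System.out.println("legal-state characterization holds on 10000 samples");
    }
}
```

This is not a proof, of course, but it makes the band characterization concrete before reading the case analysis.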

23 Legal States → Legal States
- Theorem: if the system is in a legal state, then it stays in legal states
  - Our assumption of instantaneous actions simplifies this proof
  - We can consider actions one by one

24 Illegal States → Legal States
- Lemma: let P be a green process, and let Q be P's clockwise neighbor. If Q makes i moves, then P can make at most i + 1 moves.
- Lemma: let Q be the red process. If Q makes i moves, then system-wide there can be at most the following number of moves: (formula on the slide, not captured in this transcript)
- Lemma: let Q be the red process, and consider any sequence of n^2 moves in the system. Q makes at least one move in the sequence.

25 Illegal States → Legal States
- Lemma 1: regardless of the starting state, the system eventually reaches a state T where the red process has a value different from all other processes (though the system may not stay in such states)
  - Proof: let Q be the red process. If in the starting state Q has the same value as some other process, then there must be an integer j (0 ≤ j ≤ k-1) that is not the value of any process (since k ≥ n). Q will eventually take j as its value.
  - (It takes Q at most n moves to do so.)

26 Illegal States → Legal States
- Lemma 2: if the system is in a state T where the red process has a value different from all other processes, then the system eventually reaches a state where all processes have the same value (though the system may not stay in such states)
- Theorem: regardless of the initial state of the system, the system eventually reaches a legal state
  - Proof: from Lemma 1 and Lemma 2.

27 Self-Stabilizing Dominating Partition (Hedetniemi)
- R1: if x(i) = 0 and (∀j ∈ N(i)) (x(j) = 0) then x(i) := 1
- R2: if x(i) = 1 and (∀j ∈ N(i)) (x(j) = 1) then x(i) := 0

28 Hedetniemi Example (all transformations in the figure are by R1)

29 Hedetniemi MIS Algorithm
- R1: if s(i) = 0 and (∀j ∈ N(i)) (s(j) = 0) then s(i) := 1
- R2: if s(i) = 1 and (∃j ∈ N(i)) (s(j) = 1) then s(i) := 0
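The two rules can be exercised on a small graph: apply R1/R2 until no rule is enabled, at which point s encodes a maximal independent set. In the sketch below, the 5-node path graph, the all-ones initial state, and the serial (central-daemon) scheduler are assumptions chosen for illustration; only the rules themselves come from the slide.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the Hedetniemi-style self-stabilizing MIS rules. The 5-node
// path graph 0-1-2-3-4, the initial state, and the serial scheduler are
// illustrative assumptions; only rules R1/R2 come from the slide.
public class StabilizingMIS {
    static List<List<Integer>> adj = List.of(
        List.of(1), List.of(0, 2), List.of(1, 3), List.of(2, 4), List.of(3));
    static int[] s = {1, 1, 1, 1, 1};   // arbitrary initial (illegal) state

    // Apply one enabled rule, if any; returns false when the state is stable.
    static boolean step() {
        for (int i = 0; i < s.length; i++) {
            boolean hasInNeighbor = adj.get(i).stream().anyMatch(j -> s[j] == 1);
            if (s[i] == 0 && !hasInNeighbor) { s[i] = 1; return true; }  // R1
            if (s[i] == 1 && hasInNeighbor)  { s[i] = 0; return true; }  // R2
        }
        return false;  // no enabled rule: s is a maximal independent set
    }

    public static void main(String[] args) {
        while (step()) { }
        System.out.println(Arrays.toString(s));  // prints [1, 0, 1, 0, 1]
    }
}
```

When neither rule is enabled anywhere, every s = 1 node has no s = 1 neighbor (independence) and every s = 0 node has an s = 1 neighbor (maximality), which is exactly the MIS condition.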

30 Self-Stabilizing Spanning Tree Algorithm
- Given n processes connected by an undirected graph, and one special process P1, construct a spanning tree rooted at P1
  - Not all processes can communicate with each other directly
- A very useful / practical algorithm
  - Can also be used to compute shortest paths

31 Self-Stabilizing Spanning Tree Algorithm
- Each process maintains two variables:
  - parent: who my parent is
  - dist: my distance to the root
- Runs in the background: parent and dist are continuously updated
- At any given point in time, the values of the two variables can be wrong
  - Due to "faults" such as topology changes resulting from node movement

32 Self-Stabilizing Spanning Tree Algorithm
- On P1 (executed periodically): dist = 0; parent = -1
- On all other processes (executed periodically):
  - Retrieve dist from all neighbors
  - Set my own dist = 1 + (the smallest dist received)
  - Set my own parent = the neighbor with the smallest dist (breaking ties if needed)
- (Figure: processes P1 ... P8 annotated with (dist, parent) pairs; red values are initially incorrect, green values have become correct)

33-38 Self-Stabilizing Spanning Tree Algorithm (example, continued)
- These slides repeat the same two rules and step through the example in the figure: in each step, the processes closest to P1 fix their (dist, parent) values first, and correct (green) values spread outward from P1 until every process's dist and parent are correct.

39 Self-Stabilizing Spanning Tree
- Maintain a spanning tree rooted at the 'root' node
- A data fault may corrupt the 'parent' pointer at any node
- Recalculate parent pointers regularly

40 Algorithm dist maintains the distance of a node from the root

41 Algorithm
- The root periodically sets parent to -1 (null) and dist to 0
- A non-root reads dist from all neighbors, sets its own dist to one more than the smallest value read, and points its parent to the neighbor with the least distance from the root
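The rules above can be exercised in a small simulation. In the sketch below, the 6-node topology, the corrupted initial values, and the synchronous "every process acts once per phase" scheduler are assumptions for illustration; the update rules are the ones from the slides.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the self-stabilizing spanning-tree (BFS) rule. The 6-node
// topology, corrupted initial values, and synchronous per-phase scheduler
// are illustrative assumptions; the update rules come from the slides.
public class StabilizingTree {
    static List<List<Integer>> adj = List.of(
        List.of(1, 2),      // P0 (root)
        List.of(0, 3),
        List.of(0, 3, 4),
        List.of(1, 2, 5),
        List.of(2),
        List.of(3));
    static int[] dist   = {7, 9, 4, 8, 2, 6};   // corrupted initial values
    static int[] parent = {3, 4, 5, 0, 1, 2};   // corrupted initial values

    static void phase() {
        int[] d = dist.clone();                 // snapshot of neighbors' dist
        dist[0] = 0; parent[0] = -1;            // root rule
        for (int i = 1; i < d.length; i++) {
            int best = adj.get(i).get(0);       // neighbor with smallest dist
            for (int j : adj.get(i)) if (d[j] < d[best]) best = j;
            dist[i] = 1 + d[best];              // 1 + smallest neighbor dist
            parent[i] = best;
        }
    }

    public static void main(String[] args) {
        for (int r = 0; r < adj.size(); r++) phase();  // H phases suffice
        System.out.println(Arrays.toString(dist));     // prints [0, 1, 1, 2, 2, 3]
        System.out.println(Arrays.toString(parent));   // prints [-1, 0, 0, 1, 2, 3]
    }
}
```

After enough phases, dist converges to the true shortest-path distances and the parent pointers form a BFS tree rooted at P0, no matter how corrupted the starting values were.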

42 Correctness Proof
- Define a phase to be the minimum time period in which every process has executed its code at least once (we say it "has taken an action")
  - Some processes may execute their code more than once per phase
  - The definition of a phase here is different from a round in synchronous systems!
- Let A_i be the length of the shortest path from process i to the root, and let dist_i be the value of dist on process i
  - dist_i is not allowed to be negative

43 Correctness Proof
- Lemma: at the end of phase 1, dist_1 = 0 and dist_i ≥ 1 for any i ≥ 2
- Lemma: at the end of phase 2:
  - For any process i with A_i = 0, we have dist_i = 0
  - For any process i with A_i = 1, we have dist_i = 1
  - For any process i with A_i ≥ 2, we have dist_i ≥ 2

44 Correctness Proof
- Lemma: at the end of phase r:
  - For any process i with A_i ≤ r-1, we have dist_i = A_i
  - For any process i with A_i ≥ r, we have dist_i ≥ r
- Prove by induction: assume the lemma holds for phase r; for phase r+1 we need to prove:
  - For any process i with A_i ≤ r-1, we have dist_i = A_i
  - For any process i with A_i = r, we have dist_i = A_i
  - For any process i with A_i ≥ r+1, we have dist_i ≥ r+1

45 Correctness Proof
- Consider all t actions taken during phase r+1; we will use an induction on t
- This proof is tricky if it is your first self-stabilization proof:
  - A process may take multiple actions in a phase!
  - Processes may take actions in parallel, so we cannot assume a serialization of all actions!
- The proof technique is typical for proving self-stabilization:
  - Step 1: prove that the t actions do not roll back what has already been achieved (no backward move)
  - Step 2: prove that at some point, each node achieves more (forward move)
  - Step 3: prove that the t actions do not roll back the effects of the forward move (no backward move after the forward move)

46 Already known (green conditions, end of phase r): for A_i ≤ r-1, dist_i = A_i; for A_i ≥ r, dist_i ≥ r. Want to show (red conditions, end of phase r+1): for A_i ≤ r-1, dist_i = A_i; for A_i = r, dist_i = A_i; for A_i ≥ r+1, dist_i ≥ r+1.
- Step 1: the t actions will not violate the green conditions
- Proof: induction on t; consider action (t+1) by some process. (We cannot assume action (t+1) happens after the other t actions.) Regardless of what values the process reads from its neighbors, the action will not end up violating the conditions.

47
- Step 1 (continued): the t actions will not violate the green conditions satisfied at the beginning of phase r+1
- Proof (continued): true because a process at level A_i only has neighbors at levels A_i - 1, A_i, and A_i + 1.

48
- Step 2: for each process, at some point during phase r+1, it will satisfy the red conditions
- Proof: by the definition of a phase, each process takes at least one action during phase r+1.

49
- Step 3: for each process, after it first satisfies the red conditions, it continues to satisfy them for the remainder of the phase
- Proof: trivial, but one does need to enumerate the three cases.

50 Correctness Proof
- Theorem: after H phases, A_i = dist_i on all processes, where H is the length of the shortest path from the farthest process to the root
- Theorem: after H phases, the dist and parent values on all processes are correct
  - Proof: every process except the root has a single parent pointer, so the graph formed by the parent pointers has n nodes and n-1 edges. Every process has a path to the root, so this graph is connected; a connected graph with n nodes and n-1 edges is a spanning tree.

51 Acknowledgements
This part draws heavily on the course CS4231 Parallel and Distributed Algorithms (NUS) by Dr. Haifeng Yu, and on Vijay Garg's book Elements of Distributed Computing.