1 Autonomic distributed systems

2 Think about this [Figure: human population and computer population plotted from 1980 to 2010; vertical axis in units of 10^9, with ticks from 4 to 7]

3 Think about this Machines will fail from time to time, regardless of how carefully they are designed. But who will manage these systems? Even if everyone joined IT, it would not be enough! Isn't this a crisis? Systems have to take care of themselves. Self-help is the best help.

4 What does it mean? There are many such desirable self-* properties that can be added to the wish list. Collectively, these self-* properties characterize an autonomic system: self-help, self-healing, self-organizing, self-optimizing, self-protecting, self-managing, self-stabilizing.

5 Self-healing The Spirit Mars rover has a radiation-hardened RAD6000 CPU from Lockheed-Martin Federal Systems. One day, while performing a crucial task, Spirit fell silent, alone in the emptiness of Mars. What next? Courtesy: Jet Propulsion Lab

6 Self-healing The problem was eventually detected remotely by ground control. The operating system had tried to allocate more files than the RAM-based directory structure could accommodate. This caused an exception that suspended the task that attempted the allocation. NASA ground control deleted some files and reformatted the entire flash memory system. On February 6, 2004 the rover was restored to its original working condition, and science activities resumed. It would have been nice if the detection and repair could have been done by the rover itself … Courtesy: Jet Propulsion Lab

7 Self-stabilization A technique for spontaneous restoration of a system predicate. It is a form of forward error recovery (memoryless): it does not bother about the impact of the failure as long as recovery is guaranteed. It guarantees eventual safety following failures. Feasibility was demonstrated by Dijkstra (CACM 1974).

8 Self-stabilizing systems Starting from any initial configuration, the system is guaranteed to recover to a legitimate configuration (one in which the predicate L is true) in a bounded number of steps, as long as the program code is not corrupted.

9 Self-stabilizing systems Transient failures perturb the global state. The ability to spontaneously recover from any initial state implies that no initialization is ever required. [Figure: the state space, with the legal configurations marked as a region inside it]

10 Self-stabilizing systems Self-stabilizing systems exhibit non-masking fault tolerance. They satisfy the following two criteria: 1. Convergence 2. Closure [Figure: a fault moves the system from L to "not L"; convergence leads from "not L" back to L; closure keeps the system inside L]

11 Adaptive Distributed Systems System behavior changes spontaneously when the environment changes. Example: a traffic control system whose environment is AM / PM. AM ⇒ L_AM holds. PM ⇒ L_PM holds. L = (AM ⇒ L_AM) ∧ (PM ⇒ L_PM) defines the system invariant.

12 Example 1: Stabilizing mutual exclusion Consider a unidirectional ring of processes 0, 1, 2, ..., N-1. In the legal configuration, exactly one token circulates in the network. [Figure: ring of processes numbered 0 through N-1]

13 A solution
The state of process j is x[j] ∈ {0, 1, 2, ..., K-1}, where K > N.
{Process 0} repeat x[0] = x[N-1] → x[0] := (x[0] + 1) mod K forever
{Process j > 0} repeat x[j] ≠ x[j-1] → x[j] := x[j-1] forever
Each rule has the form guard (condition) → action. A process holds a token exactly when its guard is enabled (TOKEN = ENABLED GUARD).
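The ring protocol can be tried out with a small simulation. This is a minimal sketch, not the presenter's code: it assumes the increment at process 0 is taken modulo K with K > N (as in Dijkstra's K-state algorithm) and that a central scheduler fires one enabled process at a time. The function names `enabled`, `step`, `tokens` and `stabilize` are illustrative.

```python
import random

def enabled(x, j):
    """A process holds a token exactly when its guard is true."""
    if j == 0:
        return x[0] == x[-1]          # process 0 compares with process N-1
    return x[j] != x[j - 1]           # process j > 0 compares with j-1

def step(x, j, k):
    """Execute the action of process j (its guard is assumed to be enabled)."""
    if j == 0:
        x[0] = (x[0] + 1) % k         # assumption: increment modulo K
    else:
        x[j] = x[j - 1]

def tokens(x):
    """Indices of all processes whose guard is currently enabled."""
    return [j for j in range(len(x)) if enabled(x, j)]

def stabilize(x, k, rng, max_steps=10_000):
    """Fire randomly chosen enabled processes until one token remains."""
    for steps in range(max_steps):
        holders = tokens(x)
        if len(holders) == 1:
            return steps
        step(x, rng.choice(holders), k)
    raise RuntimeError("did not stabilize within max_steps")

if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 5, 6                                  # K > N
    x = [rng.randrange(k) for _ in range(n)]     # arbitrary (corrupted) state
    print("initial:", x, "tokens:", tokens(x))
    print("steps to stabilize:", stabilize(x, k, rng))
    print("final:  ", x, "tokens:", tokens(x))
```

At least one guard is always enabled: if no process j > 0 is enabled, all x values are equal, which enables process 0. Running the loop further after stabilization shows closure, since the single token simply moves around the ring.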

14 Does it work? First, be convinced that it works. Then think about why it will work.

15 Example 2: Stabilizing spanning tree Given a connected graph G = (V, E) and a root r, design an algorithm for maintaining a spanning tree in the presence of transient failures that may corrupt the local states of processes. Let n = |V|.

16 A solution Each process i has two variables L(i) and P(i): L(i) = distance from the root via tree edges, P(i) = parent of process i. By definition L(r) = 0, and P(r) is undefined. In a legal state, ∀ i ∈ V, i ≠ r : L(i) ≠ n ∧ L(i) = L(P(i)) + 1.

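17 Sample case [Figure: a graph on nodes 0 through 5, shown before and after a transient failure, with L values 1 through 5 marked on the non-root nodes] P(2) is corrupted.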

18 The algorithm The algorithm has three rules R0, R1, R2:
(R0) (L(i) ≠ n) ∧ (L(i) ≠ L(P(i)) + 1) ∧ (L(P(i)) ≠ n) → L(i) := L(P(i)) + 1
(R1) (L(i) ≠ n) ∧ (L(P(i)) = n) → L(i) := n
(R2) (L(i) = n) ∧ (∃ k ∈ Neighbors(i) : L(k) < n-1) → L(i) := L(k) + 1; P(i) := k
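The three rules can be exercised in a short simulation. This is a sketch under stated assumptions: a central scheduler fires one enabled rule at a time, the root's state (L(r) = 0) is never corrupted, and P(i) always names a neighbor of i. The 6-node graph below (a ring with a chord between nodes 2 and 4) and the helper names are hypothetical; the slides do not give the edge set of the sample graph.

```python
import random

def enabled_rules(i, L, P, adj, n):
    """Return the (rule, k) actions enabled at non-root node i."""
    acts = []
    p = P[i]
    if L[i] != n and L[p] != n and L[i] != L[p] + 1:
        acts.append(("R0", None))
    if L[i] != n and L[p] == n:
        acts.append(("R1", None))
    if L[i] == n:
        acts.extend(("R2", k) for k in adj[i] if L[k] < n - 1)
    return acts

def fire(i, rule, k, L, P, n):
    """Apply the action part of the chosen rule at node i."""
    if rule == "R0":
        L[i] = L[P[i]] + 1
    elif rule == "R1":
        L[i] = n
    else:                              # R2: adopt neighbor k as the new parent
        L[i] = L[k] + 1
        P[i] = k

def legal(L, P, root):
    """The legal-state predicate from the slides."""
    n = len(L)
    return all(L[i] != n and L[i] == L[P[i]] + 1
               for i in range(n) if i != root)

def stabilize(L, P, adj, root, rng, max_steps=100_000):
    """Fire randomly chosen enabled rules until none is enabled."""
    n = len(L)
    for steps in range(max_steps):
        choices = [(i, r, k) for i in range(n) if i != root
                   for r, k in enabled_rules(i, L, P, adj, n)]
        if not choices:
            return steps
        i, r, k = rng.choice(choices)
        fire(i, r, k, L, P, n)
    raise RuntimeError("did not stabilize within max_steps")

# Hypothetical sample graph: ring 0-1-2-3-4-5-0 plus the chord 2-4.
ADJ = {0: [1, 5], 1: [0, 2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [0, 4]}

if __name__ == "__main__":
    L = [0, 1, 2, 3, 4, 5]
    P = [None, 0, 4, 2, 3, 4]          # P(2) corrupted to point at node 4
    steps = stabilize(L, P, ADJ, 0, random.Random(0))
    print("steps:", steps, "L:", L, "P:", P, "legal:", legal(L, P, 0))
```

When no rule is enabled, every non-root node has L(i) ≠ n and L(i) = L(P(i)) + 1, so the parent pointers form a spanning tree rooted at r.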

19 Proof of stabilization Define an edge from i to P(i) to be well-formed when L(i) ≠ n, L(P(i)) ≠ n and L(i) = L(P(i)) + 1. In any configuration, the well-formed edges form a spanning forest. Delete all edges that are not well-formed. Designate each tree T(k) in the forest by the lowest value of L in it.

20 Example In the sample graph shown earlier, T(0) = {0, 1} and T(2) = {2, 3, 4, 5}. Let F(k) denote the number of T(k)'s in the forest. Define a tuple F = (F(0), F(1), F(2), …, F(n)). For the sample graph, F = (1, 0, 1, 0, 0, 0) after node 2 had the transient failure that changed P(2) from 2 to 4.
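The tuple F can be computed directly from the definition of a well-formed edge. The sketch below is illustrative only: it assumes the sample state L = (0, 1, 2, 3, 4, 5) with parents forming a chain and P(2) corrupted to 4, which is a guess at the figure, and the name `forest_tuple` is hypothetical. Because the tuple is indexed 0 through n, the code returns n + 1 entries.

```python
def forest_tuple(L, P, root):
    """Count the trees of the well-formed forest by the L value of their root."""
    n = len(L)

    def well_formed(i):
        # The edge i -> P(i) is well-formed per the definition in the proof.
        return (i != root and L[i] != n and L[P[i]] != n
                and L[i] == L[P[i]] + 1)

    F = [0] * (n + 1)                  # F(0), F(1), ..., F(n)
    for i in range(n):
        if not well_formed(i):
            # A node without a well-formed edge to its parent is the root of
            # a tree, and its L value is the lowest in that tree.
            F[L[i]] += 1
    return tuple(F)

if __name__ == "__main__":
    L = [0, 1, 2, 3, 4, 5]
    corrupted = [None, 0, 4, 2, 3, 4]  # assumed state after the failure
    repaired = [None, 0, 1, 2, 3, 4]   # assumed legal state
    print(forest_tuple(L, corrupted, 0))
    print(forest_tuple(L, repaired, 0))
    # Python compares tuples lexicographically, as the proof does.
    print(forest_tuple(L, repaired, 0) < forest_tuple(L, corrupted, 0))
```

Counting nodes that lack a well-formed edge is enough, because L strictly decreases along well-formed edges, so each tree has exactly one such node.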

21 Skeleton of the proof Minimum F = (1,0,0,0,0,0) {legal configuration} Maximum F = (1, n-1, 0, 0, 0, 0). With each action, F decreases lexicographically. Verify the claim! This proves that eventually F becomes (1,0,0,0,0,0) and the spanning tree stabilizes.

