Fault-containment in Weakly Stabilizing Systems
Anurag Dasgupta, Sukumar Ghosh, Xin Xiao
University of Iowa
Preview
Weak stabilization (Gouda 2001) guarantees:
- reachability of the legal configuration from any configuration, and
- closure of the legal configuration under system actions.
Once "stable", if there is a minor perturbation, no recovery guarantee exists under a weakly fair scheduler, let alone "efficient recovery". We take a weakly stabilizing leader election algorithm and add fault-containment to it.
Our contributions
An exercise in adding fault-containment to a weakly stabilizing leader election algorithm on a line topology. Processes are anonymous.
- Expected recovery time from all single failures is O(1).
- As m → ∞, the contamination number is O(1) (precisely 4), where m is a tuning parameter.
(Contamination number = maximum number of non-faulty processes that change their states during recovery.)
The big picture
[figure: a line of processes with one designated leader]
Model and Notations
Consider n processes in a line topology.
N(i) = neighbors of process i
Variable P(i) ∈ N(i) ∪ {⊥} (parent of i)
Macro C(i) = {q ∈ N(i) : P(q) = i} (children of i)
Predicate Leader(i) ≡ (P(i) = ⊥)
Legal configuration:
1. For exactly one process i: P(i) = ⊥
2. ∀ j ≠ i: P(j) = k ⇒ P(k) ≠ j
[figure: node i with its parent P(i), its children C(i), and the leader]
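The notation above translates directly into a few predicates. Below is a minimal Python sketch of the model, assuming 0-indexed processes on a line and `None` standing in for ⊥; the names (`neighbors`, `legal_configuration`, …) are illustrative, not from the paper.

```python
BOTTOM = None  # stands in for the ⊥ value

def neighbors(i, n):
    """N(i): neighbors of process i on a line 0..n-1."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]

def children(i, P, n):
    """C(i) = {q in N(i) : P(q) = i}."""
    return [q for q in neighbors(i, n) if P[q] == i]

def is_leader(i, P):
    """Leader(i) ≡ (P(i) = ⊥)."""
    return P[i] is BOTTOM

def legal_configuration(P, n):
    """Exactly one leader, and no two processes are each other's parent."""
    leaders = [i for i in range(n) if is_leader(i, P)]
    if len(leaders) != 1:
        return False
    return all(is_leader(j, P) or P[P[j]] != j for j in range(n))
```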
Model and Notations
Shared memory model and central scheduler.
Weak fairness of the scheduler.
Guarded action by a process: g → A
A computation is a sequence of (global) states and state transitions.
[figure: node i with its parent P(i), its children C(i), and the leader]
Stabilization
A stable (or legal) configuration satisfies a predicate LC defined in terms of the primary variables p that are observable by the application. However, fault-containment often needs the use of secondary variables (a.k.a. auxiliary or state variables) s. Thus,
Local state of process i = (p_i, s_i)
Global state of the system = (p, s), where p = the set of all p_i and s = the set of all s_i
(p, s) ⊨ LC ≡ (p ⊨ LC_p) ∧ (s ⊨ LC_s)
Definitions
Containment time is the maximum time needed to establish LC_p from a 1-faulty configuration.
Containment in space means the primary variables of only O(1) processes change their state during recovery from any 1-faulty configuration.
Fault gap is the time to reach LC (both LC_p and LC_s) from any 1-faulty configuration.
[figure: timeline — LC_p restored, then LC_s restored; the interval up to LC_s is the fault gap]
Weakly stabilizing leader election
We start from the weakly stabilizing leader election algorithm by Devismes, Tixeuil, Yamashita [ICDCS 2007], and then modify it to add fault-containment. Here is the DTY algorithm for an array of processes.
DTY algorithm: program for any process i in the array
Guarded actions:
R1 :: ¬leader ∧ N(i) = C(i) → be a leader
R2 :: ¬leader ∧ N(i) \ (C(i) ∪ {P(i)}) ≠ ∅ → switch parent
R3 :: leader ∧ N(i) ≠ C(i) → parent := k, where k ∈ N(i) \ C(i)
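For concreteness, here is one way the three DTY guarded actions could be simulated in Python. The rule encoding is my reading of the slide, so treat it as a sketch rather than the authors' implementation.

```python
import random

BOTTOM = None  # stands in for the ⊥ value

def neighbors(i, n): return [j for j in (i - 1, i + 1) if 0 <= j < n]
def children(i, P, n): return [q for q in neighbors(i, n) if P[q] == i]

def dty_step(i, P, n):
    """Execute the enabled DTY rule (if any) at process i; return its name."""
    N, C = set(neighbors(i, n)), set(children(i, P, n))
    if P[i] is not BOTTOM:              # i is not a leader
        if N == C:                      # R1: all neighbors are children
            P[i] = BOTTOM
            return "R1"
        free = N - C - {P[i]}
        if free:                        # R2: a neighbor is neither child nor parent
            P[i] = random.choice(sorted(free))
            return "R2"
    elif N != C:                        # R3: leader with a non-child neighbor resigns
        P[i] = random.choice(sorted(N - C))
        return "R3"
    return None
```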
Recovery from a single failure
With a randomized scheduler, weakly stabilizing systems recover to a legal configuration with probability 1. However, if a single failure occurs, the recovery time can be as large as n (consider situations similar to the gambler's ruin). For fault-containment, we need something better.
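The gambler's-ruin analogy can be made concrete with a toy experiment: model the contaminated region's boundary as an unbiased random walk that must be absorbed at an end of the line before recovery completes. This walk is my simplification of the slide's intuition, not the paper's analysis.

```python
import random

def symmetric_walk_steps(start, n):
    """Steps for an unbiased walk on 0..n to get absorbed at 0 or n."""
    pos, steps = start, 0
    while 0 < pos < n:
        pos += random.choice((-1, 1))
        steps += 1
    return steps

# The expected absorption time from position d is d*(n - d), so it grows
# with n even when the walk starts right next to an endpoint (d = 1).
n = 50
trials = [symmetric_walk_steps(1, n) for _ in range(10_000)]
print(sum(trials) / len(trials))  # ~ 1 * (n - 1) = 49
```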
Our strategy We bias a randomized scheduler to achieve our goal. The technique was first illustrated in [Dasgupta, Ghosh, Xiao: SSS 2007]. Here we show that the technique is indeed powerful enough to solve a larger class of problems.
Biasing a random scheduler
For fault-containment, each process i uses a secondary variable x(i). A node i updates its primary variable P(i) when the following conditions hold:
1. The guard involving the primary variables is true.
2. The randomized scheduler chooses i.
3. x(i) ≥ x(k), where k ∈ N(i).
Biasing a random scheduler
After the action, x(i) is updated as x(i) := max_{q ∈ N(i)} x(q) + m, where m ∈ Z+ is a tuning parameter (call this UPDATE x(i)). When x(i) < x(k) but conditions 1–2 hold, the primary variable P(i) remains unchanged; only x(i) is incremented by 1 (call this INCREMENT x(i)).
[figure: example on the line i–j–k with m = 5. UPDATE: from x(i)=10, x(j)=8, x(k)=7, node i acts and x(i) := 8 + 5 = 13. INCREMENT: with x(k)=7 < x(j)=8, node k only increments x(k) to 8.]
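The two x-moves can be written down directly. The following sketch replays the slide's m = 5 example; mapping processes i, j, k to indices 0, 1, 2 is my choice.

```python
def neighbors(i, n): return [j for j in (i - 1, i + 1) if 0 <= j < n]

def update_x(i, x, n, m):
    """UPDATE x(i): run together with a change of P(i)."""
    x[i] = max(x[q] for q in neighbors(i, n)) + m

def increment_x(i, x):
    """INCREMENT x(i): run when the primary guard holds but x(i) < x(k)."""
    x[i] += 1

# Replay of the slide's example on the line i-j-k with m = 5:
x = [10, 8, 7]                 # x(i)=10, x(j)=8, x(k)=7
update_x(0, x, n=3, m=5)       # i acts: x(i) := max{x(j)} + 5 = 13
assert x[0] == 13
increment_x(2, x)              # k is outrun (x(k)=7 < x(j)=8): x(k) := 8
assert x[2] == 8
```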
The Algorithm
Algorithm 1 (containment): program for process i
Guarded actions:
R1 :: (P(i) ≠ ⊥) ∧ (N(i) = C(i)) → P(i) := ⊥
R2 :: (P(i) = ⊥) ∧ (∃ k ∈ N(i) \ C(i)) → P(i) := k
R3a :: (P(i) = j) ∧ (P(j) ≠ ⊥) ∧ (∃ k ∈ N(i) : P(k) ∉ {i, ⊥}) ∧ x(i) ≥ x(k) → P(i) := k; update x(i)
R3b :: (P(i) = j) ∧ (P(j) ≠ ⊥) ∧ (∃ k ∈ N(i) : P(k) ∉ {i, ⊥}) ∧ x(i) < x(k) → increment x(i)
R4a :: (P(i) = j) ∧ (∃ k ∈ N(i) : P(k) = ⊥) ∧ x(i) ≥ x(k) → P(i) := k
R4b :: (P(i) = j) ∧ (∃ k ∈ N(i) : P(k) = ⊥) ∧ x(i) < x(k) → increment x(i)
R5 :: (P(i) = j) ∧ (P(j) = ⊥) ∧ (∃ k ∈ N(i) : P(k) ∉ {i, ⊥}) → P(i) := k
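Putting the guards together, one possible single-activation step function looks as follows. The rule precedence and the restriction k ≠ P(i) in R3–R5 are my assumptions for making the sketch executable; they are not details fixed by the slide.

```python
import random

BOTTOM, M = None, 5  # ⊥ and the tuning parameter m (m = 5 is arbitrary here)

def neighbors(i, n): return [j for j in (i - 1, i + 1) if 0 <= j < n]
def children(i, P, n): return [q for q in neighbors(i, n) if P[q] == i]

def step(i, P, x, n):
    """One activation of process i under Algorithm 1; returns the rule fired."""
    N, C, j = neighbors(i, n), children(i, P, n), P[i]
    if j is not BOTTOM and set(N) == set(C):          # R1: become leader
        P[i] = BOTTOM
        return "R1"
    if j is BOTTOM:                                   # R2: leader resigns
        ks = [k for k in N if k not in C]
        if ks:
            P[i] = random.choice(ks)
            return "R2"
        return None
    for k in N:                                       # R4a / R4b: a neighbor leads
        if k != j and P[k] is BOTTOM:
            if x[i] >= x[k]:
                P[i] = k
                return "R4a"
            x[i] += 1
            return "R4b"
    for k in N:                                       # R3a / R3b / R5
        if k != j and P[k] not in (i, BOTTOM):
            if P[j] is BOTTOM:                        # parent is the leader: R5
                P[i] = k
                return "R5"
            if x[i] >= x[k]:                          # R3a: switch and update x(i)
                P[i] = k
                x[i] = max(x[q] for q in N) + M
                return "R3a"
            x[i] += 1                                 # R3b: increment x(i)
            return "R3b"
    return None
```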
Analysis of containment
Consider six cases:
1. Fault at the leader
2. Fault at distance-1 from the leader
3. Fault at distance-2 from the leader
4. Fault at distance-3 from the leader
5. Fault at distance-4 from the leader
6. Fault at distance-5 or greater from the leader
Case 1: fault at the leader node
[figure: R1 applied by node 5; then R1 applied by node 4 — node 4 is the new leader]
R1 :: (P(i) ≠ ⊥) ∧ (N(i) = C(i)) → P(i) := ⊥
Case 2: fault at distance-1 from the leader node
[figure: recovery sequence — R1, then R2 applied by node 5]
R2 :: (P(i) = ⊥) ∧ (∃ k ∈ N(i) \ C(i)) → P(i) := k
Case 5: fault at distance-4 from the leader node
[figure: recovery sequence — R4a at node 2 (x(2) > x(1)); R5 at node 4; R2 at node 5; R3a at node 3 (x(3) > x(2)); stable]
Non-faulty processes up to distance 4 from the faulty node are affected.
R4a :: (P(i) = j) ∧ (∃ k ∈ N(i) : P(k) = ⊥) ∧ x(i) ≥ x(k) → P(i) := k
Case 6: fault at distance ≥ 5 from the leader node
[figure: recovery sequence — R4a at node 2 (x(2) > x(1)); R3a at node 3; R5 at node 2; R2 at node 1; R3a at node 3 (x(3) > x(2) and x(4)); recovery complete, current leader unchanged]
With a high m, it is difficult for node 4 to change its parent, but node 3 can easily do it.
Fault-containment in space
Theorem 1. As m → ∞, the effect of a single failure is restricted to within distance-4 of the faulty process, i.e., the algorithm is spatially fault-containing.
Proof idea. Uses the exhaustive case-by-case analysis. The worst case occurs when a node at distance-4 from the leader node fails, as shown earlier.
Fault-containment in time
Theorem 2. The expected number of steps needed to contain a single fault is independent of n. Hence algorithm containment is fault-containing in time.
Proof idea. Case-by-case analysis. When a node beyond distance-4 from the leader fails, its impact on the time complexity remains unchanged. A summary of these calculations follows:
Fault-containment in time
Case 1: the leader fails. Recovery is completed in a single move, regardless of whether node 3 or node 4 executes a move.
Case 2: a node i at distance-1 from the leader fails.
(a) P(i) becomes ⊥: recovery is completed in one step.
(b) P(i) switches to a new parent: recovery time = 2 + Σ_{n=1}^{∞} n/2^n = 4.
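The series in case 2(b) is the standard arithmetico-geometric sum; one way to verify the value 4:

\[
\sum_{n=1}^{\infty} n r^{n} = \frac{r}{(1-r)^{2}} \;\; (|r| < 1)
\quad\Longrightarrow\quad
\sum_{n=1}^{\infty} \frac{n}{2^{n}} = \frac{1/2}{(1/2)^{2}} = 2,
\qquad
2 + \sum_{n=1}^{\infty} \frac{n}{2^{n}} = 2 + 2 = 4.
\]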
Fault-containment in time
Summary of expected containment times:

Fault location      P(i) becomes ⊥    P(i) switches
Fault at leader     1                 —
Fault at dist-1     1                 4
Fault at dist-2     2                 151/108
Fault at dist-3     131/54            115/36
Fault at dist-4     10/9              29/27
Fault at dist ≥ 5   33/32             115/36

Thus, the expected containment time is O(1).
Proof idea of weak stabilization
[figure: DTY algorithm {R1, R2, R3} mapped to our algorithm {R1, R2, R3, R4, R5}]
Our algorithm executes the same action (P(i) := k) as in DTY, but the guards are biased differently. This is equivalent to adding "different delays" in different paths. Every computation of our algorithm is a computation of the DTY algorithm too. Since the DTY algorithm is weakly stabilizing, so is our algorithm.
Stabilization from multiple failures
Theorem 3. When m → ∞, the expected recovery time from multiple failures is O(1) if the faults occur at distance 9 or more apart.
Proof sketch. Since the contamination number is 4, no non-faulty process is influenced by both failures.
[figure: two faults at distance ≥ 9 apart; each contamination region spans 4 hops]
Conclusion
1. With increasing m, the containment in space is tighter, but stabilization from arbitrary initial configurations slows down.
2. LC_s = true, so the system is ready to deal with the next single failure as soon as LC_p holds. This reduces the fault gap and increases system availability.
3. The unbounded secondary variable x can be bounded using the technique discussed in [Dasgupta, Ghosh, Xiao: SSS 2007].
4. It is possible to extend this algorithm to a tree topology (but we do not do it here).
Questions?
Proof of convergence
Theorem 3. The proposed algorithm recovers from all single faults to a legal configuration.
Proof (using the martingale convergence theorem). A martingale is a sequence of random variables X_1, X_2, X_3, … such that ∀ n:
1. E(|X_n|) < ∞, and
2. E(X_{n+1} | X_1, …, X_n) = X_n
(for a super-martingale, replace = with ≤; for a sub-martingale, replace = with ≥).
We use the following corollary of the martingale convergence theorem:
Corollary. If X_n ≥ 0 is a super-martingale, then as n → ∞, X_n converges to a random variable X with probability 1, and E(X) ≤ E(X_0).
Proof of convergence (continued)
Let X_i be the number of processes with enabled guards in step i. After 0 or 1 failure, X can be 0, 2, or 3 (exhaustive enumeration).
When X_i = 0: X_{i+1} = 0 (already stable).
When X_i = 2: E(X_{i+1}) = (1/2)·0 + (1/2)·2 = 1 ≤ 2.
When X_i = 3: E(X_{i+1}) = (1/3)·0 + (1/3)·2 + (1/3)·4 = 2 ≤ 3.
Thus X_1, X_2, X_3, … is a super-martingale. Using the corollary, as n → ∞, E(X_n) ≤ E(X_0). Since X is non-negative and integer-valued, X_n converges to 0 with probability 1, and the system stabilizes.
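As a sanity check, the conditional expectations above can be estimated empirically. The transition distributions in this sketch are taken as uniform over the successor values enumerated on the slide, which is an assumption on my part.

```python
import random

# Successor distributions for X_i, uniform over the slide's enumerated values:
NEXT = {0: (0,), 2: (0, 2), 3: (0, 2, 4)}

def sample_next(x):
    """Draw X_(i+1) given X_i = x, uniformly over the assumed successor set."""
    return random.choice(NEXT[x])

# Empirically confirm the super-martingale property E(X_(i+1) | X_i) <= X_i:
for x0 in (0, 2, 3):
    est = sum(sample_next(x0) for _ in range(100_000)) / 100_000
    print(f"X_i = {x0}: estimated E(X_(i+1)) = {est:.2f} <= {x0}")
```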