LSRP: Local Stabilization in Shortest Path Routing
Anish Arora, Hongwei Zhang
Motivations
Local fault containment is important in large-scale systems: it supports stability, availability, and scalability.
Self-stabilization is desirable in the presence of unanticipated faults: even simple faults (such as node crashes and message loss) can drive a network protocol into arbitrary states.
Local containment and local self-stabilization in routing remain unsolved.
This work considers only distance-vector (D-V) routing: RIP, BGP (path-vector), DSDV, AODV, …
Outline
  Network and fault model
  Definitions & problem statement
  LSRP design & analysis
  Related work
  Summary
Network model
A network is a connected graph G = (V, E).
Each node has a unique ID.
Each node has a clock, with a single constraint: the ratio of clock rates between any two neighboring nodes is bounded from above (the absolute clock rates do not matter).
Fault model
  Fail-stop: node and link
  Join: node and link
  State corruption
Definitions
  Perturbation size (problem specific and algorithm independent)
  Range of contamination (algorithm dependent)
  F-local stabilizing (algorithm dependent)
Perturbation size: definition
Problem-specific variables: e.g., “next-hop” in routing.
The perturbation size at a network state q, denoted P(q), is the minimum number of up nodes at which transient faults have occurred, or whose problem-specific variables must change value, for the network to stabilize to a legitimate state.
It characterizes the minimum amount of work needed for the network to stabilize.
Perturbation size: examples
[Figure: example network states, including states with perturbation size 0 and perturbation size 1]
Range of contamination
When a network self-stabilizes from an arbitrary state q to a legitimate state q’, the range of contamination R_c is the maximum distance from any node that changed state at least once during stabilization, but whose state is the same at q and q’, to the set of nodes whose state differs between q and q’.
[Figure: graph G with the range of contamination R_c marked]
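The definition above can be evaluated directly once a stabilization run has been recorded. The following is a minimal sketch, not part of the original slides: the function name `range_of_contamination`, the adjacency-dict graph representation, and the `changed_during` set are all assumptions made for illustration, and distance is taken to be hop count.

```python
from collections import deque


def range_of_contamination(adj, q, q_final, changed_during):
    """Hop-count range of contamination for one recorded stabilization run.

    adj            -- dict: node -> list of neighbours (connected, undirected)
    q, q_final     -- dict: node -> state at the start / end of stabilization
    changed_during -- set of nodes that changed state at least once
    (All names are hypothetical; they do not come from the slides.)
    """
    # Nodes whose state differs between q and q'.
    perturbed = {v for v in adj if q[v] != q_final[v]}
    # Nodes that changed on the way but ended in the state they started in.
    contaminated = {v for v in changed_during if q[v] == q_final[v]}
    if not perturbed or not contaminated:
        return 0
    # Multi-source BFS: hop distance from every node to the perturbed set.
    dist = {v: 0 for v in perturbed}
    queue = deque(perturbed)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist[v] for v in contaminated)


# Chain 0-1-2-3-4 where only node 1 really had to change,
# but nodes 2 and 3 were disturbed along the way.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {0: 0, 1: 0, 2: 2, 3: 3, 4: 4}
final = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
r_c = range_of_contamination(chain, start, final, {1, 2, 3})
```

A multi-source breadth-first search is used so that the distance to the whole set of perturbed nodes is computed in one pass.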
F-local stabilizing
A network is F-local stabilizing if, starting at an arbitrary state q, it self-stabilizes to a legitimate state within F(P(q)) time, where F is a function and P(q) is the perturbation size at state q.
“A network is F-local stabilizing” implies that the range of contamination during stabilization is O(F(P(q))).
Problem statement: local stabilization in shortest path routing
Design a protocol that, given a network G = (V, E) and a destination node r, constructs and maintains a spanning tree T of G (called the shortest path tree) such that:
  r is the root of T;
  for every node i ∈ V, the path from i to r in T is a shortest path between i and r in G;
  the network is F-local stabilizing.
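The first two requirements describe the legitimate states of the protocol, and they can be checked mechanically. The sketch below is an illustration rather than anything taken from the slides: it assumes a hop-count metric, stores the parent and distance of node i in dicts `p` and `d` (mirroring the p.i and d.i variables used later), and uses hypothetical function names.

```python
from collections import deque


def bfs_distances(adj, r):
    """Hop distance from every node to the destination r."""
    dist = {r: 0}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_shortest_path_tree(adj, r, p, d):
    """True if (p, d) encodes a shortest path tree of adj rooted at r."""
    true_dist = bfs_distances(adj, r)
    if p[r] != r or d[r] != 0:
        return False
    for i in adj:
        if i == r:
            continue
        # The recorded distance must be the real shortest distance.
        if d[i] != true_dist.get(i):
            return False
        # The parent must be a neighbour that is one hop closer to r.
        if p[i] not in adj[i] or d[p[i]] != d[i] - 1:
            return False
    return True


# Triangle 0-1-2 with a pendant node 3 attached to node 2; destination is 0.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
good = is_shortest_path_tree(graph, 0, {0: 0, 1: 0, 2: 0, 3: 2},
                             {0: 0, 1: 1, 2: 1, 3: 2})
bad = is_shortest_path_tree(graph, 0, {0: 0, 1: 0, 2: 1, 3: 2},
                            {0: 0, 1: 1, 2: 2, 3: 3})
```

Because every non-root node must point to a neighbour exactly one hop closer to r, following parent pointers always reaches r along a shortest path, so no separate cycle check is needed.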
Fault propagation in existing D-V protocols
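How far a single corrupted value can spread in a basic distance-vector protocol can be reproduced in a small simulation. The sketch below is not the example from the slides: it assumes a chain topology with the destination at node 0, synchronous rounds, a hop-count metric, and lowest-ID tie-breaking, and all function names are hypothetical.

```python
def dv_round(adj, r, d):
    """One synchronous round of basic distributed Bellman-Ford (hop count)."""
    new_d, new_p = {}, {}
    for i in adj:
        if i == r:
            new_d[i], new_p[i] = 0, i
            continue
        # Pick the neighbour advertising the smallest distance (ties: lowest ID).
        best = min(adj[i], key=lambda j: (d[j], j))
        new_d[i], new_p[i] = d[best] + 1, best
    return new_d, new_p


def simulate_fault(n, corrupted, max_rounds=1000):
    """Corrupt d at one node of an n-node chain and run until nothing changes.

    Returns the final distances, the set of nodes that ever changed state,
    and the number of rounds in which some node changed.
    """
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    d = {i: i for i in range(n)}              # legitimate distances
    p = {i: max(i - 1, 0) for i in range(n)}  # legitimate parents
    d[corrupted] = 0                          # the single state corruption
    changed = set()
    for rounds in range(max_rounds):
        new_d, new_p = dv_round(adj, 0, d)
        step = {i for i in adj if (new_d[i], new_p[i]) != (d[i], p[i])}
        if not step:
            return d, changed, rounds
        changed |= step
        d, p = new_d, new_p
    raise RuntimeError("did not stabilize within max_rounds")


final_d, touched, rounds = simulate_fault(10, corrupted=5)
```

In this run only one node is corrupted, yet `touched` contains several nodes: neighbours adopt the bogus short distance before the corrupted node's own correction reaches them, which is the lag that LSRP is designed to remove.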
LSRP design
Cause of fault propagation: the “correction” action always lags behind the “fault propagation” action.
Solution: the source of fault propagation (such as node 8) detects the fault propagation and initiates a “containment” action that catches up with and stops the “fault propagation” action.
The protocol also avoids forming cycles during stabilization and removes existing cycles quickly.
Approach: layering of diffusing waves
Use three diffusing waves such that:
  each diffusing wave has a different propagation speed; speed is controlled by introducing delay in action execution;
  a mistakenly initiated layer-i wave W_i is contained, and prevented from propagating unboundedly, by a layer-(i+1) wave initiated at the same node that initiated W_i;
  the top-layer wave self-stabilizes locally upon perturbations.
Specifically, the layers and their propagation speeds are:
  stabilization wave, speed V_0;
  containment wave, speed V_1 > V_0;
  super-containment wave, speed V_2 > V_1 > V_0.
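Why a strictly faster wave bounds the spread of a slower one can be seen with a short calculation. The following is an idealized sketch, not a result from the slides: it assumes both waves travel along the same path at constant speeds (in hops per unit time), and the head start h is a symbol introduced here for illustration.

```latex
% Idealized model (an assumption): constant wave speeds, in hops per unit time.
% h : head start, in hops, of the stabilization wave when the containment wave starts.
\begin{align*}
  x_0(t) &= h + V_0\, t, \qquad x_1(t) = V_1\, t, \qquad V_1 > V_0, \\
  x_1(t^*) = x_0(t^*) &\;\Longrightarrow\; t^* = \frac{h}{V_1 - V_0}, \\
  x_0(t^*) &= h + \frac{V_0\, h}{V_1 - V_0} = \frac{V_1}{V_1 - V_0}\, h .
\end{align*}
```

Under these assumptions the faulty wave is stopped within a distance proportional to its head start; for example, with V_1 = 2 V_0 it travels at most 2h hops. The same argument applies one layer up, with V_2 and V_1 in place of V_1 and V_0.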
Stabilization wave
Implements the basic distributed Bellman-Ford algorithm, with slight changes to interact with the containment wave (no interaction with the super-containment wave).
Variables: (p.i, d.i) for each node i
Actions:
  S1 :: (i is the destination node ∨ i initiated a containment wave) ∧ p.i ≠ i → p.i := i
  [] S2 :: i propagates the stabilization wave (SW) from j ∧ j is not in a containment wave (CW) → d.i, p.i := d.j+1, j; ghost.i := false
Can be mistakenly initiated and cause fault propagation, and thus calls for the containment wave.
Containment wave
Prevents a mistakenly initiated stabilization wave from propagating faults unboundedly.
Additional variable: ghost.i for each node i
Actions:
  C1 :: ¬ghost.i ∧ (i is a source of fault propagation ∨ i propagates CW from p.i) → ghost.i := true; if i is a source of fault propagation → p.i := i fi
  [] C2 :: ghost.i ∧ no other node is using the corrupted state of i → ghost.i := false; set (d.i, p.i)
Catches up with and stops the corresponding stabilization wave.
Can be mistakenly initiated, and thus calls for the super-containment wave.
Super-containment wave
Prevents a mistakenly initiated containment wave from propagating unboundedly.
No additional variables needed (stateless).
Action:
  ghost.i ∧ (i is not a source of fault propagation ∧ p.i is not in CW) → ghost.i := false
Catches up with and stops the corresponding containment wave.
Self-stabilizes locally:
  stateless: trivial stabilization (no action needed);
  no unbounded propagation: constrained by the range of the containment wave (which is a function of the perturbation size).
Example revisited
C1 is enabled at node 8; S2 is enabled at nodes 6 and 5.
C1 executes at node 8 first, which disables S2 at nodes 6 and 5.
C2 then executes at node 8, and the network self-stabilizes.
Protocol analysis
LSRP is F-local stabilizing, where F is a linear function: starting at an arbitrary state q_0, the network reaches a state where the shortest path tree is formed within O(P(q_0)) time.
The range of contamination is O(MAXP), where MAXP denotes the number of nodes in the largest perturbed region at q_0 and is no greater than P(q_0).
Perturbed regions that are far from one another (i.e., whose half-distance is ω(MAXP)) self-stabilize in parallel.
Quick loop removal: existing loops are removed within a small constant time (i.e., d_sc + U).
Loop freedom: no new loop is formed during stabilization.
Related work
Ghosh, Gupta, Herman, and Pemmaraju (PODC ’96) [4]
  Algorithms for locally containing a single state corruption during stabilization of a shortest path tree.
  Do not deal with multiple faults or with node or link fail-stop.
Ghosh and He (WSS ’99) [5]
  Fault-containing self-stabilizing algorithm for a consensus problem.
  Considers only linear topologies, and the range of contamination can be exponential in the perturbation size.
Zhang and Arora (PODC ’02) [16]
  Locally stabilizing algorithm for clustering and shortest path routing in wireless sensor networks.
  The approach rests on different model assumptions: dense node distribution and knowledge of geometric information.
Conclusion
Formulated the concepts of perturbation size, range of contamination, and F-local stabilization.
Designed LSRP for linear-local stabilization in shortest path routing; quick loop removal and loop freedom are automatically guaranteed by local stabilization.
Faults are regarded as state corruption and dealt with by way of self-stabilization.