Lecture 3: State, Detection
Anish Arora
CSE 763
The Stability Detection Problem

A stable property of a distributed system is one that persists: once a stable property is true, it remains true thereafter.

Examples:
  "the computation has terminated"
  "the system is deadlocked"
  "all tokens in a token ring have disappeared"

Solution:
  Determine the global state of the system.
  Test the global state to see if the stable property holds.
Termination Detection

Processes 0..N-1 are arbitrarily connected by channels.
Each process is either idle or active.
An active process can become idle spontaneously.
An idle process can become active only upon receiving a message.

The Problem: Detect that all processes are idle and all channels are empty.
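Program and Proof (hand-in-hand)

Design Step 0: How to count messages in channels.
Each process j keeps a counter c.j: the number of messages it has sent minus the number it has received.

process j
     {send msg}     c.j := c.j + 1
  ▯  {receive msg}  c.j := c.j - 1

Proof: Invariant ≡ I1, where
  I1 ≡ (Sum j :: c.j) = # of messages in channels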
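The counting step can be checked with a small simulation. This is a minimal sketch, not part of the lecture: the function name simulate_counters, the single shared list in_transit standing in for all channels, and the random scheduler are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_counters(n=4, steps=200, seed=0):
    """Randomly interleave send and receive actions, checking I1 after each."""
    rng = random.Random(seed)
    c = [0] * n            # c[j]: messages sent minus messages received by j
    in_transit = deque()   # destination of every message still in a channel
    for _ in range(steps):
        if in_transit and rng.random() < 0.5:
            j = in_transit.popleft()             # {receive msg} at process j
            c[j] -= 1
        else:
            j = rng.randrange(n)                 # {send msg} at process j
            in_transit.append(rng.randrange(n))  # hypothetical destination
            c[j] += 1
        # Invariant I1: (Sum j :: c.j) = # of messages in channels
        assert sum(c) == len(in_transit)
    return c, len(in_transit)


if __name__ == "__main__":
    counters, pending = simulate_counters()
    print(counters, pending)
```

Individual counters may go negative (a process can receive more than it sends); only their sum is tied to the channel contents.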
Refining the program

Step 1: How to detect that all processes are idle.
Consider a logical ring 0 -> N-1 -> N-2 -> … -> 0 and pass a token around it.
Let t denote the location of the token. The token carries q, the sum of the counters c.j of the processes it has visited in the current traversal.

process j
     {send msg}          c.j := c.j + 1
  ▯  {receive msg}       c.j := c.j - 1
  ▯  {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j  →  t := t - 1 ; q := q + c.j
  ▯  {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0)  →  t := N - 1 ; q := 0
Refining the proof

Proof: We begin with an idealized Invariant ≡ I1 ∧ Q, where
  Q ≡ (∀j : t<j ∧ j<N : idle.j) ∧ (q = (Sum j : t<j ∧ j<N : c.j))

However, Q is not preserved by one of the actions (the receive action for j, t < j ∧ j < N).
But when Q is violated, R becomes true, where
  R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0

So, we weaken: Invariant ≡ I1 ∧ (Q ∨ R)

However, R is not preserved by one of the actions (the receive action for j, 0 ≤ j ∧ j ≤ t).
Refining the program again

Step 2: How to abort a detection when unsure that the token traversal was uninterrupted.

process j
     {send msg}          c.j := c.j + 1
  ▯  {receive msg}       c.j := c.j - 1 ; blacken j
  ▯  {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j  →  t := t - 1 ; q := q + c.j ; whiten j
  ▯  {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0 ∧ 0 is white)  →  t := N - 1 ; q := 0 ; whiten j
Iterated refinement

Proof: Invariant ≡ I1 ∧ (Q ∨ R ∨ S), where
  S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)

However, S is not preserved by one of the actions (the propagate action at a black node).
So we introduce a color for the token and get the final program.

program of process j
     {send msg}          c.j := c.j + 1
  ▯  {receive msg}       c.j := c.j - 1 ; blacken j
  ▯  {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j  →  t := t - 1 ; q := q + c.j ; if j is black then blacken token ; whiten j
  ▯  {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0 ∧ token is white ∧ 0 is white)  →  t := N - 1 ; q := 0 ; whiten token ; whiten j
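The final program can be exercised with a randomized simulation. The sketch below rests on several assumptions that are not in the slides: the function name run, a send budget that forces the underlying computation to terminate, all processes starting active, the token starting black at process 0 (so that the invariant holds initially), only active processes sending, and the retransmit guard read as the negation of the detection test.

```python
import random


def run(n=5, budget=30, seed=1, max_steps=100_000):
    """Simulate the final program; return the step at which termination is
    detected, or None if max_steps is exhausted first."""
    rng = random.Random(seed)
    active = [True] * n              # idle.j is `not active[j]`
    c = [0] * n                      # per-process message counters
    black = [False] * n              # process colors
    in_transit = []                  # destination of each undelivered message
    t, q, token_black = 0, 0, True   # assumed start: black token at process 0
    for step in range(max_steps):
        idle_t = not active[t]
        # Detection test at process 0 (negation of the retransmit guard).
        if (t == 0 and idle_t and q + c[0] == 0
                and not token_black and not black[0]):
            # Safety check: a detection must never be a false alarm.
            assert not any(active) and not in_transit
            return step
        moves = []
        for j in range(n):
            if active[j]:
                moves.append(("idle", j))
                if budget > 0:
                    moves.append(("send", j))
        moves.extend(("receive", k) for k in range(len(in_transit)))
        if idle_t:
            moves.append(("token", t))
        kind, x = rng.choice(moves)
        if kind == "idle":
            active[x] = False
        elif kind == "send":
            c[x] += 1
            in_transit.append(rng.randrange(n))
            budget -= 1
        elif kind == "receive":
            j = in_transit.pop(x)
            c[j] -= 1
            black[j] = True          # blacken j
            active[j] = True         # an idle receiver becomes active
        elif t != 0:
            # {propagate token}
            q += c[t]
            token_black = token_black or black[t]
            black[t] = False         # whiten j
            t -= 1
        else:
            # {retransmit token}
            t, q, token_black = n - 1, 0, False
            black[0] = False
    return None


if __name__ == "__main__":
    print("detected at step", run())
```

The assertion inside the detection branch checks the safety half of the argument on every run; trying several seeds is a cheap way to look for a schedule that would break it.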
Termination Detection

Predicate
  Termination ≡ (∀j :: idle.j) ∧ # of msgs sent - # of msgs received = 0

Invariant ≡ (Sum j :: c.j) = # of msgs sent - # of msgs received  ∧  (Q ∨ R ∨ S ∨ T)
  Q ≡ (∀j : t<j ∧ j<N : idle.j) ∧ (q = (Sum j : t<j ∧ j<N : c.j))
  R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0
  S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)
  T ≡ token is black
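The four disjuncts can be evaluated mechanically on a snapshot of the global state. This is an illustrative sketch only: the Snapshot class, its field names, and the two example states are hypothetical, and in_transit stands for "# of msgs sent - # of msgs received".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snapshot:
    idle: List[bool]     # idle.j
    c: List[int]         # c.j
    black: List[bool]    # "j is black"
    t: int               # token location
    q: int               # count carried by the token
    token_black: bool    # "token is black"
    in_transit: int      # messages sent but not yet received


def disjuncts(s: Snapshot) -> Dict[str, bool]:
    n = len(s.c)
    visited = range(s.t + 1, n)   # t < j < N
    ahead = range(0, s.t + 1)     # 0 <= j <= t
    return {
        "Q": all(s.idle[j] for j in visited)
             and s.q == sum(s.c[j] for j in visited),
        "R": s.q + sum(s.c[j] for j in ahead) > 0,
        "S": any(s.black[j] for j in ahead),
        "T": s.token_black,
    }


def invariant_holds(s: Snapshot) -> bool:
    counts_match = sum(s.c) == s.in_transit   # the counting conjunct (I1)
    return counts_match and any(disjuncts(s).values())


# Process 0 has sent one message to process 2, which the token already passed.
before = Snapshot(idle=[False, True, True], c=[1, 0, 0],
                  black=[False, False, False], t=1, q=0,
                  token_black=False, in_transit=1)
# Process 2 receives it: Q is falsified, but R still holds.
after = Snapshot(idle=[False, True, False], c=[1, 0, -1],
                 black=[False, False, True], t=1, q=0,
                 token_black=False, in_transit=0)
```

In the second snapshot the black process lies behind the token, so S does not hold; the invariant survives through R alone, which mirrors the step from Q to Q ∨ R in the refinement.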
Proof of correctness

(1) Invariant ∧ t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white  ⇒  Termination

(2) Invariant ∧ Termination  leads-to  t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white
Termination Detection

Proof of (1):
  0 is white ∧ t = 0  ⇒  ¬S
  q + c.0 = 0 ∧ t = 0  ⇒  ¬R
  token is white  ⇒  ¬T
Hence the antecedent implies Invariant ∧ Q ∧ q + c.0 = 0.
With t = 0, Q states that every process j > 0 is idle and that q = (Sum j : 0<j ∧ j<N : c.j). Together with idle.0 and q + c.0 = 0, this gives (Sum j :: c.j) = 0, so no messages are in the channels;
i.e., the antecedent implies Termination.

Proof of (2):
If termination has occurred, only the propagation and retransmission actions can execute.
After the first complete traversal of the ring by the token, all processes are white, and the token is white once process 0 retransmits it.
At the end of the next traversal, when t = 0, the algorithm detects the termination of the underlying computation.
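The argument for (2) can be illustrated by running only the token actions from a terminated state and counting how often process 0 has to retransmit. This is a sketch under stated assumptions: the function name retransmissions_until_detection is hypothetical, the starting colors, token position, and q are arbitrary, and the retransmit guard is read as the negation of the detection test.

```python
import random


def retransmissions_until_detection(c, black, t, q, token_black):
    """From a terminated state (all idle, channels empty), run the token
    actions and return how many retransmissions precede detection."""
    n = len(c)
    assert sum(c) == 0        # channels are empty, so the counters sum to 0
    black = list(black)       # work on a copy of the process colors
    retransmits = 0
    while True:
        if t != 0:
            # {propagate token}: every process is idle, so the guard holds.
            q += c[t]
            token_black = token_black or black[t]
            black[t] = False
            t -= 1
        elif q + c[0] == 0 and not token_black and not black[0]:
            return retransmits            # termination detected
        else:
            # {retransmit token}
            t, q, token_black = n - 1, 0, False
            black[0] = False
            retransmits += 1


if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    c = [rng.randint(-3, 3) for _ in range(n - 1)]
    c.append(-sum(c))                     # force the counters to sum to 0
    colors = [rng.random() < 0.5 for _ in range(n)]
    print(retransmissions_until_detection(c, colors, 3, 7, True))
```

At most two retransmissions are needed: the first full traversal whitens every process but may leave the token black, and the second runs with a white token over white processes.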