Failure Detectors & Consensus
Agenda
  Unreliable Failure Detectors (Chandra, Toueg)
  Reducibility: ◊S ≥ ◊W, ◊W ≥ ◊S
  Solving Consensus using ◊S (Mostefaoui, Raynal)
Unreliable Failure Detectors
  A distributed failure detector D consists of a local failure detector module D_p at each process p.
  When D_p suspects that a process j has crashed, it adds j to suspects_p; if D_p later realizes it made a mistake, it can remove j from suspects_p.
  Failure detectors are defined in terms of abstract properties: two classes of completeness and four classes of accuracy.
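One common way to realize such a local module is with heartbeats and timeouts. The sketch below is only an illustration under that assumption: the class `LocalDetector`, its methods, and the timeout-doubling rule are hypothetical names and choices, not part of the abstract definition above. It shows suspects_p growing when a heartbeat is overdue and shrinking when the module learns it was wrong.

```python
class LocalDetector:
    """Hypothetical local module D_p: suspects peers whose heartbeats are overdue."""

    def __init__(self, peers, timeout):
        self.timeout = {q: timeout for q in peers}   # per-peer patience
        self.last_heard = {q: 0.0 for q in peers}    # time of the last heartbeat
        self.suspects = set()                        # plays the role of suspects_p

    def on_heartbeat(self, q, now):
        """Record a heartbeat from q; retract a suspicion that turned out wrong."""
        self.last_heard[q] = now
        if q in self.suspects:
            self.suspects.discard(q)   # D_p made a mistake: remove q
            self.timeout[q] *= 2       # assumed policy: be more patient with q

    def tick(self, now):
        """Add every peer whose heartbeat is overdue; return a copy of suspects_p."""
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspects.add(q)
        return set(self.suspects)


d = LocalDetector(peers=["q", "r"], timeout=1.0)
d.on_heartbeat("q", 0.5)
d.on_heartbeat("r", 0.5)
first = d.tick(2.0)        # both heartbeats are overdue, so both are suspected
d.on_heartbeat("q", 2.1)   # q was only slow: the suspicion is withdrawn
second = d.tick(3.0)       # r stays suspected, q does not
```

Doubling the timeout after a mistake is one way to make false suspicions rarer over time; whether they eventually stop depends on the timing behavior of the underlying system.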
Completeness Classes
  Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process.
  Weak Completeness: Eventually, every process that crashes is permanently suspected by some correct process.
Accuracy Classes
  Strong Accuracy: No process is suspected before it crashes.
  Weak Accuracy: Some correct process is never suspected.
  Eventual Strong Accuracy: There is a time after which correct processes are not suspected by any correct process.
  Eventual Weak Accuracy: There is a time after which some correct process is never suspected by any correct process.
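The completeness and accuracy properties can be made concrete by checking them on a recorded run. The sketch below is an approximation, not a definition: the properties speak about infinite runs, while the code reads "eventually" as "on some non-empty suffix of a finite trace". The trace layout (one dict per time step, mapping each running process to its suspect set) and all function names are assumptions made for this example.

```python
def holds_eventually(trace, pred):
    """True if pred holds on every step of some non-empty suffix of the trace."""
    return any(all(pred(step) for step in trace[t:]) for t in range(len(trace)))


def strong_completeness(trace, correct, crashed):
    # Eventually every correct process permanently suspects every crashed process.
    return holds_eventually(trace, lambda s: all(crashed <= s[p] for p in correct))


def weak_completeness(trace, correct, crashed):
    # Each crashed process is eventually permanently suspected by some correct process.
    return all(
        any(holds_eventually(trace, lambda s: c in s[p]) for p in correct)
        for c in crashed
    )


def weak_accuracy(trace, correct):
    # Some correct process is never suspected by anyone, at any step.
    return any(
        all(q not in suspects for step in trace for suspects in step.values())
        for q in correct
    )


def eventual_weak_accuracy(trace, correct):
    # After some time, some correct process is not suspected by any correct process.
    return any(
        holds_eventually(trace, lambda s: all(q not in s[p] for p in correct))
        for q in correct
    )


# Processes 1 and 2 are correct; process 3 crashes after the first step.
correct, crashed = {1, 2}, {3}
trace = [
    {1: {2}, 2: {1}, 3: set()},   # early mistakes: 1 and 2 suspect each other
    {1: {3}, 2: set()},
    {1: {3}, 2: {3}},
]
```

On this trace strong completeness and eventual weak accuracy hold while weak accuracy fails, because both correct processes were wrongly suspected once.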
Failure Detector Classes

                 Accuracy
  Completeness   Strong       Weak        Eventual Strong          Eventual Weak
  Strong         Perfect P    Strong S    Eventually Perfect ◊P    Eventually Strong ◊S
  Weak           Q            Weak W      ◊Q                       Eventually Weak ◊W
Reducibility
  A distributed algorithm T_{D→D'} transforms a failure detector D into a failure detector D' if it maintains a variable output_p at every process p that emulates the output of D'.
  T_{D→D'} is called a reduction algorithm, and D' is said to be reducible to D, denoted D ≥ D' (D' is "weaker").
  A simple T_{◊S→◊W}?
From Weak Completeness to Strong Completeness: T_{◊W→◊S}
  Code for process p:
    output_p ← ∅
    Task 1: repeat forever
      suspects_p ← ◊W_p
      send (p, suspects_p) to all
    Task 2: upon receiving (q, suspects_q) for some q
      output_p ← (output_p ∪ suspects_q) - {q}
  ◊S ≥ ◊W and ◊W ≥ ◊S, therefore ◊W ≅ ◊S (the two classes are equivalent).
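The effect of the transformation can be seen in a small simulation. This is a minimal sketch under simplifying assumptions: communication is modeled as one synchronous round in which every running process broadcasts once and every message is delivered, and the function name `transform_round` is made up for the example.

```python
def transform_round(output, local_suspects, alive):
    """One round: every running process broadcasts (p, suspects_p) as in Task 1,
    and every running process applies the Task 2 update to each message."""
    messages = [(p, set(local_suspects[p])) for p in sorted(alive)]
    for receiver in alive:
        for sender, suspects in messages:
            # output_p ← (output_p ∪ suspects_q) - {q}
            output[receiver] = (output[receiver] | suspects) - {sender}
    return output


# Process 4 has crashed. Under weak completeness only process 1 suspects it.
alive = {1, 2, 3}
local_suspects = {1: {4}, 2: set(), 3: set()}
output = {p: set() for p in alive}

transform_round(output, local_suspects, alive)
```

After one round every running process has 4 in its output, so the suspicion held by a single process has spread to all of them. Removing the sender q from output_p is what lets a wrongly suspected process clear itself simply by continuing to send.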
Consensus
  In the Consensus problem, every process p_i proposes a value v_i, and all correct processes have to decide on some value v taken from the set of proposed values.
  More formally, a distributed consensus algorithm must satisfy:
    Termination: Every correct process eventually decides on some value.
    Validity: If a process decides v, then v was proposed by some process (non-triviality).
    Agreement: No two correct processes decide differently.
  It is impossible to solve consensus deterministically in an asynchronous system, even if only one process might crash [FLP].
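The three properties can be checked mechanically on the outcome of a finished run. The sketch below assumes the run is summarized by three hypothetical inputs: `proposals` (process to proposed value), `decisions` (process to decided value, only for processes that decided), and `correct` (the set of processes that never crashed). Termination is an eventual property, so the check is only meaningful once the run is considered over.

```python
def check_consensus(proposals, decisions, correct):
    """Return which of the three consensus properties hold for one finished run."""
    return {
        # Every correct process has decided.
        "termination": all(p in decisions for p in correct),
        # Every decided value was proposed by some process.
        "validity": all(v in proposals.values() for v in decisions.values()),
        # Correct processes that decided all chose the same value.
        "agreement": len({decisions[p] for p in correct if p in decisions}) <= 1,
    }


proposals = {1: "a", 2: "b", 3: "a"}
good = check_consensus(proposals, {1: "a", 2: "a"}, correct={1, 2})
split = check_consensus(proposals, {1: "a", 2: "b"}, correct={1, 2})
bogus = check_consensus(proposals, {1: "c"}, correct={1, 2})
```

Here `good` satisfies all three properties, `split` violates only agreement, and `bogus` violates termination and validity.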
Solving Consensus using ◊S
  Code for process p_i, 1 ≤ i ≤ n
  Task 1:
    r_i ← 0; est_i ← v_i
    while not decided do
      c ← (r_i mod n) + 1; est_from_c_i ← ⊥; r_i ← r_i + 1
      if (i = c) then est_from_c_i ← est_i
      else wait until (r_i, v) is received from p_c or c is suspected
        if (r_i, v) was received then est_from_c_i ← v
      send (r_i, est_from_c_i) to all
      wait until (r_i, est_from_c) has been collected from a majority of processes
      rec_i ← {est_from_c | (r_i, est_from_c) was received}
      if rec_i = {v} then decide v and send (decide, v) to all
      if rec_i = {v, ⊥} then est_i ← v
  Task 2:
    upon reception of (decide, v): decide v and send (decide, v) to all
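To see how a round plays out, the sketch below simulates rounds of the algorithm in a single thread. It is an illustration under stated assumptions, not the authors' implementation: message passing is replaced by direct dictionary lookups, the caller chooses who suspects the coordinator (`suspecting`) and which majority each process hears from (`quorum`), proposed values are assumed never to be `None`, and every name in the code is hypothetical.

```python
BOTTOM = None  # stands for ⊥, "no value obtained from the coordinator"


def run_round(r, n, est, active, suspecting, quorum):
    """Simulate round r (1-based) for the processes in `active`.
    Updates `est` in place and returns {process: decided value}."""
    c = ((r - 1) % n) + 1   # same coordinator as c ← (r_i mod n) + 1 before the increment
    # First part of the round: obtain the coordinator's estimate, or ⊥ when
    # the coordinator is suspected (or has crashed and is therefore suspected).
    est_from_c = {}
    for i in active:
        if i == c:
            est_from_c[i] = est[i]
        elif c in active and i not in suspecting:
            est_from_c[i] = est[c]
        else:
            est_from_c[i] = BOTTOM
    # Second part: collect est_from_c values from a majority of processes.
    decided = {}
    for i in active:
        assert quorum[i] <= active and 2 * len(quorum[i]) > n
        rec = {est_from_c[j] for j in quorum[i]}
        values = rec - {BOTTOM}
        if rec == values and len(values) == 1:
            decided[i] = next(iter(values))   # rec_i = {v}: decide v
        elif values:
            est[i] = next(iter(values))       # rec_i = {v, ⊥}: adopt v
    return decided


est = {1: "a", 2: "b", 3: "c"}
# Round 1: coordinator is process 1, and process 3 wrongly suspects it.
d1 = run_round(1, 3, est, active={1, 2, 3}, suspecting={3},
               quorum={1: {1, 2}, 2: {2, 3}, 3: {1, 3}})
# Round 2: process 1 has decided; 2 and 3 continue with coordinator 2.
d2 = run_round(2, 3, est, active={2, 3}, suspecting=set(),
               quorum={2: {2, 3}, 3: {2, 3}})
```

In round 1 only process 1 sees rec = {a} and decides, but because any two majorities intersect, processes 2 and 3 each see a as well and adopt it. Whatever they decide later, here in round 2, can therefore only be a.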