1 Principles of Reliable Distributed Systems Recitation 8 ◊S-based Consensus Spring 2009 Alex Shraer
2 Reminder: ◊P and ◊S Failure Detectors
◊P - Eventually Perfect:
–Strong Completeness: From some point on, every faulty process is suspected by every correct process
–Eventual Strong Accuracy: From some point on, no correct process is suspected by any correct process
◊S - Eventually Strong:
–Strong Completeness
–Eventual Weak Accuracy: There exists some correct process that is not suspected by any correct process from some point on
Processes do not know which process this is
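To make the difference between ◊P and ◊S concrete, here is a minimal sketch that checks the three properties against a snapshot of suspicion sets. It assumes the snapshot is taken after the "some point on" in the definitions, so the sets no longer change; the function names and the example data are illustrative and not part of the slides.

```python
def strong_completeness(suspects, correct, faulty):
    # Every correct process suspects every faulty process.
    return all(faulty <= suspects[p] for p in correct)

def eventual_strong_accuracy(suspects, correct):
    # No correct process is suspected by any correct process.
    return all(not (suspects[p] & correct) for p in correct)

def eventual_weak_accuracy(suspects, correct):
    # Some correct process is suspected by no correct process.
    return any(all(c not in suspects[p] for p in correct) for c in correct)

# Hypothetical stable state: process 4 crashed; 2 and 3 wrongly suspect each other.
correct, faulty = {1, 2, 3}, {4}
suspects = {1: {4}, 2: {3, 4}, 3: {2, 4}}

is_S = strong_completeness(suspects, correct, faulty) and eventual_weak_accuracy(suspects, correct)
is_P = strong_completeness(suspects, correct, faulty) and eventual_strong_accuracy(suspects, correct)
print(is_S, is_P)  # True False
```

In this example process 1 is the correct process that nobody suspects, which is enough for ◊S but not for ◊P.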
3 Our Model
n processes 1,…,n
Reliable links between correct processes
Asynchronous (a non-assumption)
–Messages can be delayed arbitrarily
–Processes take steps at asynchronous times
–No clocks
◊S failure detector
t<n/2 crash failures
–Optimal for ◊S (Chandra, Toueg JACM 96)
–A process that crashes at any point in a run is faulty in that run
4 ◊S-based Consensus: MR Algorithm [Mostefaoui, Raynal 99]
Asynchronous rounds:
–Each process locally progresses through rounds r = 1, 2, 3, …
–Different processes can progress at different times
Rotating coordinator
–Process ((r-1) mod n)+1 is the coordinator of round r
Each round consists of two phases
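The rotating coordinator rule can be sketched as a one-line function. This follows the formula used in the algorithm's pseudocode, (r-1 mod n)+1, with process ids and rounds both starting at 1; the function name is an illustrative choice.

```python
def coordinator(r: int, n: int) -> int:
    """Coordinator of round r among processes 1..n (rounds start at 1)."""
    return (r - 1) % n + 1

# With n = 4 the coordinator cycles through 1, 2, 3, 4 and wraps around.
schedule = [coordinator(r, 4) for r in range(1, 9)]
print(schedule)  # [1, 2, 3, 4, 1, 2, 3, 4]
```

Every process computes the coordinator locally from its own round number, so no communication is needed to agree on who coordinates a round.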
5 ◊S-based Consensus [Mostefaoui, Raynal 99]
val ← input; est ← ⊥
||
for r = 1, 2, … do
    coord ← ((r-1) mod n)+1
    Phase 1:
    if I am coord, then send (r, val) to all
    wait for ((r, v) from coord OR suspect coord (by ◊S))
    if receive v from coord then est ← v else est ← ⊥
    Phase 2:
    send (r, est) to all
    wait for (r, e) from n-t processes
    if any non-⊥ value e received then val ← e
    if all received e's have same non-⊥ value v then
        send ("decide", v) to all
        return(v)
||
Upon receive ("decide", v), forward to all; return(v)
Note: the only values sent are the coord's val and ⊥
Note: return is like decide and halt
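The two phases of a single round can be sketched as pure functions, with the failure detector and the network replaced by explicit inputs. This is a simplified illustration, not the authors' implementation: `suspecting` (who gives up on the coordinator) and `heard_from` (which n-t senders each process hears in phase 2) are hypothetical parameters chosen by the caller, and `None` stands in for ⊥.

```python
BOTTOM = None  # stands in for ⊥; input values must not be None

def coordinator(r, n):
    return (r - 1) % n + 1

def phase1(coord_val, suspects_coord):
    # est is the coordinator's value if it was received, else ⊥.
    return BOTTOM if suspects_coord else coord_val

def phase2(val, received):
    # Adopt any non-⊥ value; decide only if every received value is non-⊥.
    non_bottom = [e for e in received if e is not BOTTOM]
    if non_bottom:
        val = non_bottom[0]  # all non-⊥ values equal the coordinator's val
    decided = bool(received) and len(non_bottom) == len(received)
    return val, (val if decided else None)

def run_round(r, vals, suspecting, heard_from):
    coord_val = vals[coordinator(r, len(vals))]
    est = {p: phase1(coord_val, p in suspecting) for p in vals}
    new_vals, decisions = {}, {}
    for p in vals:
        received = [est[q] for q in heard_from[p]]
        new_vals[p], decisions[p] = phase2(vals[p], received)
    return new_vals, decisions

# n = 4, t = 1: process 4 suspects the coordinator (process 1) in round 1.
vals = {1: "a", 2: "b", 3: "c", 4: "d"}
heard_from = {1: [1, 2, 3], 2: [1, 2, 3], 3: [2, 3, 4], 4: [2, 3, 4]}
new_vals, decisions = run_round(1, vals, suspecting={4}, heard_from=heard_from)
print(new_vals)   # {1: 'a', 2: 'a', 3: 'a', 4: 'a'}
print(decisions)  # {1: 'a', 2: 'a', 3: None, 4: None}
```

Processes 1 and 2 decide in this round, while 3 and 4 see a ⊥ and only adopt the value, so not all processes decide at the same time.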
6 MR Principles: Phase 1
The purpose of the 1st phase:
–Ensure that for every p_i, est_i ∈ {val_coord, ⊥}
Progress: why does the 1st phase terminate?
–If the coordinator is correct, its message eventually reaches every correct process, since links are reliable
–If the coordinator crashes, then by the Strong Completeness property of ◊S every correct process will eventually either receive a message the coordinator sent before crashing, or suspect the crashed coordinator
Note: Because of asynchrony, and since the failure detector is unreliable, some of the processes may have est = ⊥ while others have est = val_coord
7 MR Principles: Phase 2
A process p_i finishes the 2nd phase when it has received (r, est) from n-t processes, which is a majority since t<n/2
Why is the majority important?
–Every two majority sets intersect
–If one process got n-t values of v:
    if all received e's have same non-⊥ value v then send ("decide", v) to all; return(v)
–then every other process that finishes the phase got at least one value of v:
    if any non-⊥ value e received then val ← e
–Thus, if process p_i decides v during round r, and process p_j progresses to round r+1, then p_j does so with val = v
The purpose of the 2nd phase:
–Ensure that the Agreement property is never violated
Progress: why does the 2nd phase terminate?
–Since there are at least n-t correct processes
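The claim that any two sets of n-t processes intersect when t<n/2 can be checked by brute force for small n. The sketch below enumerates all such sets; the function name is illustrative.

```python
from itertools import combinations

def quorums_intersect(n: int, t: int) -> bool:
    """True if every two sets of n-t processes share at least one process."""
    quorums = [set(q) for q in combinations(range(1, n + 1), n - t)]
    return all(a & b for a in quorums for b in quorums)

print(quorums_intersect(4, 1))  # True
print(quorums_intersect(5, 2))  # True
print(quorums_intersect(4, 2))  # False: t = n/2 allows {1,2} and {3,4}
```

The last call shows why the bound is strict: with t = n/2 two disjoint sets of n-t processes exist, so one process could see only v while another sees only ⊥.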
8 Second Phase (cont'd)
Notation:
–v = val_coord
–rec_i: the set of est values received by p_i by the end of phase 2
–rec_i = {⊥} or {v} or {v, ⊥}
Consider three cases:
1. rec_i = {v} ⇒ (rec_j = {v}) or (rec_j = {⊥, v}); decide v
–p_i knows that all correct processes also know v, and will either decide v or set val = v
2. rec_i = {⊥} ⇒ (rec_j = {⊥}) or (rec_j = {⊥, v}); skip to the next round
–p_i knows that all other processes include ⊥ in their rec_j. No other process can decide
3. rec_i = {v, ⊥} ⇒ (rec_j = {v}) or (rec_j = {⊥}) or (rec_j = {⊥, v}); update val to v (why?)
–Some process might have decided v, so every such p_i proceeds to the next round with val = v
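The key consequence of the three cases is that rec_i = {v} and rec_j = {⊥} can never occur in the same round. The following sketch checks this exhaustively for small n, under the assumptions that each process sends the same est to everyone (crash failures only) and that each process's rec set comes from some set of n-t senders; the names are illustrative and `None` stands in for ⊥.

```python
from itertools import combinations, product

BOTTOM = None  # stands in for ⊥

def cases_are_consistent(n, t, v="v"):
    """True if no est assignment lets one process see {v} and another see {⊥}."""
    quorums = list(combinations(range(n), n - t))
    for ests in product([v, BOTTOM], repeat=n):
        # All rec sets that some process could end up with in this round.
        recs = {frozenset(ests[p] for p in q) for q in quorums}
        if frozenset([v]) in recs and frozenset([BOTTOM]) in recs:
            return False
    return True

print(cases_are_consistent(4, 1))  # True
print(cases_are_consistent(5, 2))  # True
print(cases_are_consistent(4, 2))  # False: t = n/2 breaks the argument
```

This is why a process with rec_i = {v, ⊥} must adopt v: it cannot tell whether some other process saw only v and decided.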
9 ◊S-based Consensus [Mostefaoui, Raynal 99] (same pseudocode as on slide 5)
Do all processes decide at the same time?
Why do we need "send ("decide", v) to all"?
10 [Figure: example run with n=4, t=1 over Round 1 and Round 2; labels: "Suspect p1", "return(v)".]
They are stuck waiting for n – t = 3 messages
11 Disseminating the decision
Q: OK, so we need the "send ("decide", v) to all". But why "forward to all"?
–|| Upon receive ("decide", v), forward to all; return(v)
A: To prevent a process from blocking forever. A process that decides uses reliable broadcast to disseminate its decision value.
12 [Figure: example run with n=4, t=1 over Round 1 and Round 2; labels: "Suspect p1", "return(v)".]
The "decide" message reaches only one process since the sender crashes. Stuck again… We need the receiver to forward to all, i.e., reliable broadcast.
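The effect of "forward to all" can be illustrated with a small reachability sketch. It assumes a decider that crashes after its "decide" message reached only some processes, and reliable links from every correct forwarder to every correct process; the function and parameter names are hypothetical.

```python
def learners(n, first_receivers, crashed, forward):
    """Correct processes that eventually receive the "decide" message."""
    decided = set()
    pending = [p for p in first_receivers if p not in crashed]
    while pending:
        p = pending.pop()
        if p in decided:
            continue
        decided.add(p)
        if forward:
            # A correct receiver relays the decision to all correct processes.
            pending.extend(q for q in range(1, n + 1)
                           if q not in crashed and q not in decided)
    return decided

# n = 4: p1 decides, its message reaches only p2, then p1 crashes.
print(learners(4, first_receivers=[2], crashed={1}, forward=False))  # {2}
print(learners(4, first_receivers=[2], crashed={1}, forward=True))   # {2, 3, 4}
```

Without forwarding, p3 and p4 never learn the decision and keep waiting for n-t messages that will not arrive; with forwarding, one correct receiver is enough for all correct processes to decide.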