Distributed Algorithms (22903)


1 Distributed Algorithms (22903)
The wait-free hierarchy and the universality of consensus
Lecturer: Danny Hendler
This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch.


5 Formally: the Consensus Object
Supports a single operation: decide.
Each process pi calls decide with some input vi from some domain. decide returns a value from the same domain.
The following requirements must be met:
Agreement: In any execution E, all decide operations must return the same value.
Validity: The values returned by the operations must equal one of the inputs.

6 Wait-free consensus can be solved easily by compare&swap
Compare&swap(b, old, new)
    atomically
        v := read from b
        if (v = old) { b := new; return success }
        else return failure
Hardware listed on the slide: Motorola 680x0, IBM 370, Sun SPARC, 80x86, MIPS, PowerPC, DEC Alpha
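The claim in the slide title can be illustrated with a minimal Python sketch. It is not taken from the slides: the class names `CASRegister` and `CASConsensus`, the `_EMPTY` sentinel, and the helper `run_consensus` are hypothetical, and a lock stands in for the hardware atomicity of compare&swap. Each process tries to swap its input into an initially empty cell and then returns whatever the cell holds, so the first successful swap fixes the decision.

```python
import threading

_EMPTY = object()  # hypothetical sentinel meaning "no value decided yet"


class CASRegister:
    """Simulated compare&swap register; the lock is an assumption that
    stands in for the atomicity provided by hardware."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._value is old:
                self._value = new
                return True   # success
            return False      # failure

    def read(self):
        with self._lock:
            return self._value


class CASConsensus:
    """Consensus object for any number of processes, built on one CAS register."""

    def __init__(self):
        self._cell = CASRegister(_EMPTY)

    def decide(self, v):
        # Only the first swap succeeds; every caller then reads the winner.
        self._cell.compare_and_swap(_EMPTY, v)
        return self._cell.read()


def run_consensus(inputs):
    """Run one decide() per input in its own thread and collect the results."""
    obj = CASConsensus()
    results = [None] * len(inputs)

    def worker(i):
        results[i] = obj.decide(inputs[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every call finishes in a constant number of its own steps regardless of what other threads do, which is the wait-freedom the slide refers to.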

7 Would this consensus algorithm from reads/writes work?
Initially decision = null
Decide(v) ; code for pi, i=0,1
    if (decision = null)
        decision := v
        return v
    else
        return decision
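The answer to the question on this slide is no, and a single bad interleaving shows why. The sketch below is illustrative only: `decide_steps` and `run_schedule` are hypothetical names, and Python generators are used to model a scheduler that can switch processes between the read and the write of `decision`.

```python
def decide_steps(shared, v):
    """Decide(v) split into atomic steps; each yield is a point where the
    (simulated) scheduler may switch to the other process."""
    seen = shared["decision"]          # read the register
    yield
    if seen is None:
        shared["decision"] = v         # write the register
        yield
        return v
    return shared["decision"]


def run_schedule(schedule, inputs):
    """Execute one step of process i for each i in schedule, in order."""
    shared = {"decision": None}
    procs = [decide_steps(shared, v) for v in inputs]
    results = [None] * len(inputs)
    finished = set()
    for i in schedule:
        if i in finished:
            continue
        try:
            next(procs[i])
        except StopIteration as stop:
            results[i] = stop.value    # the value returned by Decide
            finished.add(i)
    return results
```

With the schedule `[0, 1, 0, 1, 0, 1]` both processes read null before either writes, so each returns its own input and agreement is violated. With the sequential schedule `[0, 0, 0, 1, 1]` the second process sees the first one's write and both return the same value.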

8 A proof that wait-free consensus for 2 or more processes cannot be solved by registers.

9 A FIFO queue
Supports 2 operations:
    q.enqueue(x): returns ack
    q.dequeue: returns the first item in the queue, or empty if the queue is empty.

10 FIFO queue + registers can implement 2-process consensus
Initially Q=<0> and Prefer[i]=null, i=0,1
Decide(v) ; code for pi, i=0,1
    Prefer[i] := v
    qval := Q.deq()
    if (qval = 0) then return v
    else return Prefer[1-i]
Consequently, there is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers.
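The two-process algorithm on this slide can be sketched in Python as follows. The names `QueueConsensus2` and `run_pair` are hypothetical, the atomic FIFO queue is simulated with a `deque` guarded by a lock (an assumption, since the slide treats the queue as a given atomic object), and the string `"empty"` is used as the empty-queue response.

```python
import threading
from collections import deque


class QueueConsensus2:
    """2-process consensus from one FIFO queue plus two registers."""

    def __init__(self):
        self._queue = deque([0])        # Initially Q = <0>
        self._lock = threading.Lock()   # assumption: simulates an atomic queue
        self._prefer = [None, None]     # Prefer[0], Prefer[1]

    def _deq(self):
        with self._lock:
            return self._queue.popleft() if self._queue else "empty"

    def decide(self, i, v):
        self._prefer[i] = v             # announce own input first
        if self._deq() == 0:            # the process that dequeues 0 wins
            return v
        return self._prefer[1 - i]      # the loser adopts the winner's input


def run_pair(v0, v1):
    """Run decide for processes 0 and 1 concurrently and return both results."""
    obj = QueueConsensus2()
    results = [None, None]

    def worker(i, v):
        results[i] = obj.decide(i, v)

    threads = [threading.Thread(target=worker, args=(i, v)) for i, v in enumerate((v0, v1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The loser can safely read `Prefer[1-i]` because the winner wrote its preference before dequeuing, and the loser dequeues only after the winner did.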

11 A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)

12 The wait-free hierarchy
We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers.
The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. It is defined to be infinity if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y with X and registers in a system with more than m processes.

13 The wait-free hierarchy (cont’d)
Object type                        Consensus number
registers                          1
FIFO queue, stack, test-and-set    2
compare-and-swap                   ∞

14 The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
An algorithm is nonblocking if it guarantees that some operation terminates after some finite total number of steps performed by processes. The nonblocking progress property is weaker than wait-freedom.

15 A nonblocking universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
    inv        ; the operation invocation, including its parameters
    new-state  ; the new state of the object, after applying the operation
    response   ; the response of the operation
    before     ; a pointer to the record of the previous operation on the object
}
[Figure: Head and a chain of opr records, each with fields inv, new-state, response, before.]

16 A nonblocking universal algorithm using CAS (cont’d)
[Figure: Head, the anchor record, and a chain of opr records with fields inv, new-state, response, before.]
Initially Head points to the anchor record. Head.new-state is initialized with the implemented object’s initial state.
When inv occurs
    point := new opr, point.inv := inv
    repeat
        h := Head
        point.new-state, point.response := apply(inv, h.new-state)
    until compare&swap(Head, h, point) = success
    return point.response
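As a rough illustration of the retry loop above, here is a Python sketch that uses the construction to implement a fetch-and-increment counter. The names `Opr`, `Universal`, `fetch_and_increment`, and `run_counter` are hypothetical, compare&swap on Head is simulated with a lock, and setting `before` to the observed head is an assumption (the slide's pseudocode declares the field but does not show the assignment).

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Opr:
    inv: Any = None
    new_state: Any = None
    response: Any = None
    before: Optional["Opr"] = None


class Universal:
    """Nonblocking universal construction; apply(inv, state) must be a
    deterministic function returning (new_state, response)."""

    def __init__(self, initial_state, apply):
        self._apply = apply
        self._head = Opr(new_state=initial_state)   # the anchor record
        self._lock = threading.Lock()               # assumption: simulates CAS atomicity

    def _cas_head(self, old, new):
        with self._lock:
            if self._head is old:
                self._head = new
                return True
            return False

    def invoke(self, inv):
        point = Opr(inv=inv)
        while True:
            h = self._head                           # h := Head
            point.new_state, point.response = self._apply(inv, h.new_state)
            point.before = h                         # assumed link to the previous record
            if self._cas_head(h, point):             # retry if Head moved meanwhile
                return point.response


def fetch_and_increment(inv, state):
    return state + 1, state


def run_counter(n_threads, ops_each):
    """Have n_threads threads each perform ops_each increments; return sorted responses."""
    counter = Universal(0, fetch_and_increment)
    responses, guard = [], threading.Lock()

    def worker():
        mine = [counter.invoke("inc") for _ in range(ops_each)]
        with guard:
            responses.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses)
```

A failed swap means some other operation succeeded, which is why the construction is nonblocking; an individual thread can still lose the race indefinitely, so it is not wait-free.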

17 A nonblocking universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq        ; the operation’s sequential number (register)
    inv        ; the operation invocation, including its parameters (register)
    new-state  ; the new state of the object, after applying the operation (register)
    response   ; the response of the operation, including its return value (register)
    after      ; a pointer to the next record (consensus object)
}
[Figure: Head points to the anchor record (seq=1, inv=null, new-state=init, response=null), followed by opr records linked through their after fields.]

18 A nonblocking universal algorithm using consensus (cont’d)
[Figure: the Head array; entries point into a chain of opr records that starts at the anchor record (seq=1, inv=null, new-state=init, response=null) and is linked through the after fields.]
Initially all Head entries point to the anchor record.
When inv occurs
    point := new opr, point.inv := inv
    for j=0 to n-1                           ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    repeat
        win := decide(Head[i].after, point)  ; try to thread your operation
        win.seq := Head[i].seq+1
        win.new-state, win.response := apply(win.inv, Head[i].new-state)
        Head[i] := win                       ; point to the following record
    until win = point
    return point.response
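The threading loop can be sketched in Python as below, here used to implement a FIFO queue whose state is an immutable tuple. All names (`Consensus`, `Opr`, `NonblockingUniversal`, `queue_apply`, `run_enqueues`) are hypothetical, and the single-use consensus object is simulated with a lock rather than built from a real primitive.

```python
import threading


class Consensus:
    """Single-use consensus object; the lock is an assumption standing in
    for any primitive with a sufficient consensus number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value


class Opr:
    def __init__(self, inv=None, seq=0, new_state=None):
        self.seq, self.inv = seq, inv
        self.new_state, self.response = new_state, None
        self.after = Consensus()


class NonblockingUniversal:
    def __init__(self, n, initial_state, apply):
        self.n, self.apply = n, apply
        anchor = Opr(seq=1, new_state=initial_state)
        self.head = [anchor] * n                     # Head[0..n-1]

    def invoke(self, i, inv):
        point = Opr(inv=inv)
        for j in range(self.n):                      # find the most advanced Head entry
            hj = self.head[j]
            if hj.seq > self.head[i].seq:
                self.head[i] = hj
        while True:
            h = self.head[i]
            win = h.after.decide(point)              # try to thread own record after h
            win.seq = h.seq + 1
            win.new_state, win.response = self.apply(win.inv, h.new_state)
            self.head[i] = win
            if win is point:
                return point.response


def queue_apply(inv, state):
    if inv[0] == "enq":
        return state + (inv[1],), "ack"
    if not state:
        return state, "empty"
    return state[1:], state[0]


def run_enqueues(n, per_process):
    """Each of n processes enqueues per_process distinct items concurrently."""
    obj = NonblockingUniversal(n, (), queue_apply)

    def worker(i):
        for k in range(per_process):
            obj.invoke(i, ("enq", (i, k)))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj
```

Several processes may write the fields of the same winning record, but since `apply` is deterministic they all write identical values.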

19 A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq        ; the operation’s sequential number (register)
    inv        ; the operation invocation, including its parameters (register)
    new-state  ; the new state of the object, after applying the operation (register)
    response   ; the response of the operation, including its return value (register)
    after      ; a pointer to the next record (consensus object)
}
We add a helping mechanism.
[Figure: the Announce array; each entry points to an opr record with fields seq, inv, new-state, response, after.]
When performing the operation with sequence number j, try to help process (j mod n).

20 A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs
    Announce[i] := new opr, Announce[i].inv := inv, Announce[i].seq := 0
    for j=0 to n-1                            ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq+1) mod n      ; ID of process with priority
        if Announce[priority].seq = 0          ; if help is needed
            then point := Announce[priority]   ; help the other process
            else point := Announce[i]          ; perform own operation
        win := decide(Head[i].after, point)
        win.new-state, win.response := apply(win.inv, Head[i].new-state)
        win.seq := Head[i].seq+1
        Head[i] := win
    return Announce[i].response
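The helping mechanism can be sketched in Python as follows, again for a fetch-and-increment counter. The names `Consensus`, `Opr`, `WaitFreeUniversal`, and `run_wait_free_counter` are hypothetical, and the consensus object is simulated with a lock. Note that the sketch keeps the slide's write order: `response` is written before `seq`, so a process that sees its record's `seq` become nonzero can safely read the response.

```python
import threading


class Consensus:
    """Single-use consensus object, simulated with a lock (an assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value


class Opr:
    def __init__(self, inv=None, seq=0, new_state=None):
        self.seq, self.inv = seq, inv
        self.new_state, self.response = new_state, None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n, initial_state, apply):
        self.n, self.apply = n, apply
        anchor = Opr(seq=1, new_state=initial_state)
        self.head = [anchor] * n
        self.announce = [anchor] * n

    def invoke(self, i, inv):
        mine = Opr(inv=inv)                          # seq = 0 means "not threaded yet"
        self.announce[i] = mine
        for j in range(self.n):                      # find the most advanced Head entry
            hj = self.head[j]
            if hj.seq > self.head[i].seq:
                self.head[i] = hj
        while mine.seq == 0:
            h = self.head[i]
            priority = (h.seq + 1) % self.n          # process with priority for the next slot
            helped = self.announce[priority]
            point = helped if helped.seq == 0 else mine
            win = h.after.decide(point)
            win.new_state, win.response = self.apply(win.inv, h.new_state)
            win.seq = h.seq + 1                      # written last, after response
            self.head[i] = win
        return mine.response


def run_wait_free_counter(n, ops_each):
    """n processes each perform ops_each increments; return the sorted responses."""
    counter = WaitFreeUniversal(n, 0, lambda inv, state: (state + 1, state))
    responses, guard = [], threading.Lock()

    def worker(i):
        got = [counter.invoke(i, "inc") for _ in range(ops_each)]
        with guard:
            responses.extend(got)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses)
```

If every increment is threaded exactly once, the responses are exactly 0 through n*ops_each - 1, which is what a quick check of `run_wait_free_counter` should show.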

21 A proof that the universal algorithm using consensus is wait-free

22 A bounded-memory wait-free universal algorithm using consensus
What is the number of records needed by the algorithm? Unbounded!
The following algorithm uses a bounded number of records.
Each process allocates records from its private pool.
A record is recycled once we’re sure it will not be referenced anymore.
We don’t need this mechanism if we use a language with a GC (such as Java).

23 A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
No process trying to thread record (k+n+1) or higher will access record k.
After all the processes that thread records k…k+n terminate, record k can be freed.
When process p finishes threading record m, it releases records m-1…m-n.
After record k is released by the operations threading records k…k+n, it can be recycled.

24 A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq             ; the operation’s sequential number (register)
    inv             ; the operation invocation, including its parameters (register)
    new-state       ; the new state of the object, after applying the operation (register)
    response        ; the response of the operation, including its return value (register)
    after           ; a pointer to the next record (consensus object)
    before          ; a pointer to the previous record
    released[1..n]  ; initially true
}
[Figure: Head, the anchor record, and a chain of opr records with fields seq, inv, new-state, response, before, after.]

25 A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs
    point := a free record from private pool, point.inv := inv, point.seq := 0
    for r:=1 to n do point.released[r] := false
    Announce[i] := point
    for j=0 to n-1                            ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq+1) mod n      ; ID of process with priority
        if Announce[priority].seq = 0          ; if help is needed
            then point := Announce[priority]   ; help the other process
            else point := Announce[i]          ; perform own operation
        win := decide(Head[i].after, point)
        win.new-state, win.response := apply(win.inv, Head[i].new-state)
        win.before := Head[i]
        win.seq := Head[i].seq+1
        Head[i] := win
    temp := Announce[i].before
    for r:=1 to n do
        if temp <> anchor then
            before-temp := temp.before
            temp.released[r] := true
            temp := before-temp
    return Announce[i].response

26 How many records are required by the algorithm?
Each incomplete operation may waste n distinct records.
There may be up to n incomplete operations.
At any point in time, there are up to n^2 non-recyclable records.
All non-recyclable records may belong to the same process!
Each pool should have n^2+1 records; O(n^3) total records are needed.
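The counting argument above is simple arithmetic, spelled out here as a small Python sketch. The function names are hypothetical and the numbers only restate the bounds given on the slide.

```python
def max_unrecyclable(n):
    """Up to n incomplete operations, each pinning up to n distinct records."""
    return n * n


def pool_size(n):
    """One more record than the worst case, so a free record always exists."""
    return max_unrecyclable(n) + 1


def total_records(n):
    """n private pools of n^2 + 1 records each, i.e. O(n^3) overall."""
    return n * pool_size(n)
```

For n = 4 processes this gives pools of 17 records and 68 records in total.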

