Distributed Algorithms (22903) Lecturer: Danny Hendler The wait-free hierarchy and the universality of consensus This presentation is based on the book.

Presentation transcript:

Distributed Algorithms (22903) Lecturer: Danny Hendler
The wait-free hierarchy and the universality of consensus
This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch.


5 Formally: the Consensus Object
- Supports a single operation: decide.
- Each process p_i calls decide with some input v_i from some domain. decide returns a value from the same domain.
- The following requirements must be met:
  - Agreement: In any execution E, all decide operations must return the same value.
  - Validity: Every value returned by a decide operation must equal one of the inputs.

6 Wait-free consensus can be solved easily by compare&swap
Compare&swap(b, old, new)
  atomically
    v := read from b
    if (v = old) { b := new; return success }
    else return failure
[Figure: architectures listed on the slide: MIPS, PowerPC, DEC Alpha, Motorola 680x0, IBM 370, Sun SPARC, 80x86]
How?
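One standard answer to the "How?" on this slide: keep a single shared variable, initially null; each process tries to compare&swap its own input into it and then returns whatever the variable holds. The sketch below illustrates the idea in Python. Python exposes no hardware compare&swap, so `CASRegister` (a hypothetical name) emulates the atomic step with a lock; the lock is only a stand-in for the primitive and is not part of the algorithm. Inputs are assumed to be different from `None`.

```python
import threading


class CASRegister:
    """Emulated compare&swap register (assumption: a lock stands in for hardware atomicity)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        # Atomically: if the register holds `old`, replace it with `new`.
        with self._lock:
            if self._value == old:
                self._value = new
                return True   # success
            return False      # failure


class CASConsensus:
    """Consensus for any number of processes from a single CAS register."""

    def __init__(self):
        self._decision = CASRegister(None)

    def decide(self, v):
        # Only the first compare&swap succeeds; every caller returns that value.
        self._decision.compare_and_swap(None, v)
        return self._decision.read()


def run_demo(inputs):
    """Run one decide per input in its own thread and collect the results."""
    consensus = CASConsensus()
    results = [None] * len(inputs)

    def worker(i):
        results[i] = consensus.decide(inputs[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each decide takes a constant number of steps regardless of what the other processes do, which is why the algorithm is wait-free for any number of processes.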

7 Would this consensus algorithm from reads/writes work?
Initially decision = null
Decide(v) ; code for p_i, i=0,1
  if (decision = null)
    decision := v
    return v
  else
    return decision
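The algorithm does not work: the test and the write are separate steps, so both processes can read null before either one writes. The following sketch replays such a schedule deterministically. It models each process as a Python generator in which every `yield` ends one shared-memory step; the `run` helper and the `schedule` encoding are illustrative assumptions, not part of the slides.

```python
def decide(shared, v):
    """Generator form of Decide(v); each `yield` ends one shared-memory step."""
    seen = shared["decision"]      # read the register
    yield
    if seen is None:
        shared["decision"] = v     # write own input
        yield
        return v
    return seen


def run(schedule, inputs):
    """Execute the processes; schedule[k] names the process taking the k-th step."""
    shared = {"decision": None}
    procs = [decide(shared, v) for v in inputs]
    results = {}
    for pid in schedule:
        if pid in results:
            continue               # this process has already returned
        try:
            next(procs[pid])
        except StopIteration as stop:
            results[pid] = stop.value
    return results


# Both processes read null before either writes: agreement is violated.
bad = run([0, 1, 0, 1, 0, 1], ["a", "b"])

# A sequential schedule happens to work.
good = run([0, 0, 0, 1, 1], ["a", "b"])
```

In the first schedule p_0 returns "a" and p_1 returns "b"; in the second both return "a".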

8 A proof that wait-free consensus for 2 or more processes cannot be solved by registers.

9 A FIFO queue Supports 2 operations: q.enqueue(x) – returns ack q.dequeue – returns the first item in the queue or empty if the queue is empty.

10 FIFO queue + registers can implement 2-process consensus
Initially Q = <0> (the queue holds the single item 0) and Prefer[i] = null, i=0,1
Decide(v) ; code for p_i, i=0,1
  Prefer[i] := v
  qval := Q.dequeue()
  if (qval = 0) then return v
  else return Prefer[1-i]
Hence, there is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers.
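A minimal Python sketch of this 2-process algorithm follows. It assumes the queue initially holds the single item 0, as the test `qval = 0` implies. `AtomicQueue` is a hypothetical helper that emulates an atomic FIFO queue with a lock, and the process ID `i` is passed explicitly to `decide`.

```python
import threading
from collections import deque


class AtomicQueue:
    """FIFO queue with atomic operations (assumption: a lock emulates atomicity)."""

    def __init__(self, items=()):
        self._items = deque(items)
        self._lock = threading.Lock()

    def enqueue(self, x):
        with self._lock:
            self._items.append(x)
        return "ack"

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else "empty"


class QueueConsensus2:
    """2-process consensus from one FIFO queue and two registers."""

    def __init__(self):
        self.q = AtomicQueue([0])      # the queue initially holds the single item 0
        self.prefer = [None, None]     # Prefer[i]: the input announced by p_i

    def decide(self, i, v):
        self.prefer[i] = v             # announce own input before dequeuing
        qval = self.q.dequeue()
        if qval == 0:
            return v                   # dequeued the 0: this process wins
        return self.prefer[1 - i]      # lost: the winner has already announced


def run_demo(v0, v1):
    """Run both processes concurrently and return their two decisions."""
    consensus = QueueConsensus2()
    results = [None, None]

    def worker(i, v):
        results[i] = consensus.decide(i, v)

    threads = [threading.Thread(target=worker, args=(i, v)) for i, v in enumerate((v0, v1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The loser can safely read `Prefer[1-i]` because the winner wrote its preference before dequeuing, and the loser dequeues after the winner.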

11 A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)

12 The wait-free hierarchy
We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers.
The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. It is defined to be infinity if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y from instances of X and registers in a system with more than m processes.

13 The wait-free hierarchy (cont’d)
Consensus number | Object types
∞                | Compare-and-swap
…                | …
2                | FIFO queue, stack, test-and-set
1                | registers

14 The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
The lock-freedom progress property is weaker than wait-freedom. An algorithm is lock-free if it guarantees that some operation terminates after a finite total number of steps performed by processes.

15 Universal constructions
Given the sequential specification of any object, implement a linearizable wait-free concurrent version of it:
- A lock-free construction using CAS
- A lock-free construction using consensus
- A wait-free construction using consensus
- A bounded-memory wait-free construction using consensus

16 A lock-free universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
  inv        ; the operation invocation, including its parameters
  new-state  ; the new state of the object, after applying the operation
  response   ; the response of the operation
}
[Figure: Head pointing to an opr record (inv, new-state, response)]

17 A lock-free universal algorithm using CAS (cont’d)
Initially Head points to the anchor record. Head.new-state is initialized with the implemented object’s initial state.
When inv occurs
  point := new opr, point.inv := inv
  repeat
    h := Head
    point.new-state, point.response := apply(inv, h.new-state)
  until compare&swap(Head, h, point) = success
  return point.response
[Figure: Head pointing to a chain of opr records (inv, new-state, response) that ends at the anchor record, whose new-state is init]
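The retry loop can be sketched in Python as follows. Three things here are assumptions made for illustration: `CASPointer` emulates compare&swap with a lock, object states are immutable values, and the implemented object is a fetch-and-increment counter given by `counter_apply`. None of these names come from the slides.

```python
import threading


class Opr:
    """Record describing one applied operation."""

    def __init__(self, inv=None, new_state=None, response=None):
        self.inv = inv
        self.new_state = new_state
        self.response = response


class CASPointer:
    """Emulated compare&swap on a reference (assumption: a lock stands in for hardware)."""

    def __init__(self, initial):
        self._ref = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._ref

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._ref is old:
                self._ref = new
                return True
            return False


class LockFreeUniversal:
    def __init__(self, apply, init_state):
        self._apply = apply                                   # apply(inv, state) -> (new_state, response)
        self._head = CASPointer(Opr(new_state=init_state))    # anchor record

    def invoke(self, inv):
        point = Opr(inv=inv)
        while True:
            h = self._head.read()
            point.new_state, point.response = self._apply(inv, h.new_state)
            if self._head.compare_and_swap(h, point):         # retry if Head moved meanwhile
                return point.response


def counter_apply(inv, state):
    """Sequential specification of a fetch-and-increment counter (example object)."""
    if inv == "inc":
        return state + 1, state
    return state, state                                       # any other invocation acts as a read


def run_counter_demo(num_threads=4, ops_per_thread=50):
    obj = LockFreeUniversal(counter_apply, 0)
    responses, guard = [], threading.Lock()

    def worker():
        for _ in range(ops_per_thread):
            r = obj.invoke("inc")
            with guard:                                       # guards only the demo's result list
                responses.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses), obj.invoke("read")
```

A failed compare&swap means another operation was installed in the meantime, so some process always makes progress (lock-freedom), while an individual process may keep retrying.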

18 A lock-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequence number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
[Figure: Head entries pointing into a list of opr records (inv, new-state, response, after, seq); the anchor record has inv=null, new-state=init, response=null, seq=1]

19 A lock-free universal algorithm using consensus (cont’d)
Initially all Head entries point to the anchor record.
When inv occurs ; code for p_i
  point := new opr, point.inv := inv
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  repeat
    win := decide(Head[i].after, point) ; try to thread your operation
    win.seq := Head[i].seq+1
    win.new-state, win.response := apply(win.inv, Head[i].new-state)
    Head[i] := win ; point to the following record
  until win = point
  return point.response
[Figure: Head entries pointing into the list of opr records that starts at the anchor record (inv=null, new-state=init, response=null, seq=1)]
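The threading loop can be sketched in Python as follows. The `Consensus` class is an assumption: it emulates a one-shot consensus object with a lock instead of building it from a primitive with a sufficient consensus number. The `apply` function is assumed to be deterministic and states immutable, so concurrent writers of a record's fields all write the same values. The counter object is only an example.

```python
import threading


class Consensus:
    """One-shot consensus object (assumption: emulated with a lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._value, self._decided = v, True
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()       # decides which record follows this one


class LockFreeUniversal:
    def __init__(self, n, apply, init_state):
        self.n = n
        self._apply = apply            # apply(inv, state) -> (new_state, response)
        anchor = Opr()
        anchor.seq, anchor.new_state = 1, init_state
        self.head = [anchor] * n       # Head[i]: last record known to process i

    def invoke(self, i, inv):
        point = Opr(inv)
        for j in range(self.n):        # catch up to the furthest known record
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while True:
            cur = self.head[i]
            win = cur.after.decide(point)          # try to thread own record after cur
            win.seq = cur.seq + 1
            win.new_state, win.response = self._apply(win.inv, cur.new_state)
            self.head[i] = win                     # move on to the following record
            if win is point:
                return point.response


def counter_apply(inv, state):
    """Fetch-and-increment counter used as the example object."""
    return (state + 1, state) if inv == "inc" else (state, state)


def run_counter_demo(n=4, ops_per_proc=50):
    obj = LockFreeUniversal(n, counter_apply, 0)
    responses, guard = [], threading.Lock()

    def worker(i):
        for _ in range(ops_per_proc):
            r = obj.invoke(i, "inc")
            with guard:                            # guards only the demo's result list
                responses.append(r)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses), obj.invoke(0, "read")
```

A process that loses a decide still completes the winner's record before moving on, so the list is always fully computed up to every Head entry.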

20 A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequence number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
We add a helping mechanism.
[Figure: the Announce array, whose entries point to opr records (inv, new-state, response, after, seq)]
When performing the operation with sequence number j, try to help process (j mod n).

21 A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for p_i
  Announce[i] := new opr, Announce[i].inv := inv, Announce[i].seq := 0
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq = 0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq = 0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    win.new-state, win.response := apply(win.inv, Head[i].new-state)
    win.seq := Head[i].seq+1
    Head[i] := win
  return Announce[i].response
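The helping mechanism can be sketched in Python as follows. As before, `Consensus` is an assumed lock-based emulation of a one-shot consensus object, `apply` is assumed deterministic with immutable states, and the counter is only an example object. A record with seq = 0 has not been threaded yet; seq is written last so that a process leaves its loop only after its response is in place.

```python
import threading


class Consensus:
    """One-shot consensus object (assumption: emulated with a lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._value, self._decided = v, True
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0                   # 0 means "not threaded yet"
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n, apply, init_state):
        self.n = n
        self._apply = apply            # apply(inv, state) -> (new_state, response)
        anchor = Opr()
        anchor.seq, anchor.new_state = 1, init_state
        self.head = [anchor] * n
        self.announce = [anchor] * n   # Announce[i]: latest operation of process i

    def invoke(self, i, inv):
        mine = Opr(inv)
        self.announce[i] = mine        # publish the operation so others can help
        for j in range(self.n):        # catch up to the furthest known record
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while mine.seq == 0:
            cur = self.head[i]
            priority = (cur.seq + 1) % self.n      # process favoured at this position
            candidate = self.announce[priority]
            point = candidate if candidate.seq == 0 else mine
            win = cur.after.decide(point)
            win.new_state, win.response = self._apply(win.inv, cur.new_state)
            win.seq = cur.seq + 1                  # written last: marks win as threaded
            self.head[i] = win
        return mine.response


def counter_apply(inv, state):
    """Fetch-and-increment counter used as the example object."""
    return (state + 1, state) if inv == "inc" else (state, state)


def run_counter_demo(n=4, ops_per_proc=50):
    obj = WaitFreeUniversal(n, counter_apply, 0)
    responses, guard = [], threading.Lock()

    def worker(i):
        for _ in range(ops_per_proc):
            r = obj.invoke(i, "inc")
            with guard:                            # guards only the demo's result list
                responses.append(r)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses), obj.invoke(0, "read")
```

Because position j of the list favours process (j mod n), an announced operation is threaded within a bounded number of positions, no matter how often its owner loses the decide calls.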

22 A proof that the universal algorithm using consensus is wait-free

23 A bounded-memory wait-free universal algorithm using consensus
What is the number of records needed by the algorithm? Unbounded!
The following algorithm uses a bounded number of records:
- Each process allocates records from its private pool.
- A record is recycled once we’re sure it will not be referenced anymore.
- We don’t need this mechanism if we use a language with a GC (such as Java).

24 A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
- No process trying to thread record (k+n+1) or higher will write record k.
- After all the processes that thread records k…k+n terminate, record k can be freed.
- When process p finishes threading record m, it releases records m-1…m-n.
- After record k is released by the operations threading records k+1…k+n, it can be recycled.

25 A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq             ; the operation’s sequence number (register)
  inv             ; the operation invocation, including its parameters (register)
  new-state       ; the new state of the object, after applying the operation (register)
  response        ; the response of the operation, including its return value (register)
  after           ; a pointer to the next record (consensus object)
  before          ; a pointer to the previous record
  released[1..n]  ; initially true
}
[Figure: Head entries pointing into a list of opr records (inv, new-state, response, before, after, seq, released) that starts at the anchor record]

26 A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for p_i
  point := a free record from private pool, point.inv := inv, point.seq := 0
  for r:=1 to n do point.released[r] := false
  Announce[i] := point
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq = 0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq = 0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    win.new-state, win.response := apply(win.inv, Head[i].new-state)
    win.before := Head[i]
    win.seq := Head[i].seq+1
    Head[i] := win
  temp := Announce[i].before
  for r:=1 to n do
    if temp <> anchor then
      before-temp := temp.before, temp.released[r] := true, temp := before-temp
  return Announce[i].response

27 How many records are required by the algorithm?
- Each incomplete operation may waste n distinct records.
- There may be up to n incomplete operations.
- At any point in time, there may be up to n^2 non-recyclable records.
- All non-recyclable records may belong to the same process!
- Each pool should have O(n^2) records, so O(n^3) total records are needed.