Local-Spin Algorithms Multiprocessor synchronization algorithms (20225241) Lecturer: Danny Hendler This presentation is based on the book “Synchronization.

Local-Spin Algorithms
Multiprocessor synchronization algorithms ( )
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

The CC and DSM models
[Figure: the cache-coherent (CC) and distributed shared-memory (DSM) models, taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman]

Remote and local memory accesses
In a DSM system: an access of v by p is local if v resides in p’s own memory module, and remote otherwise.
In a cache-coherent (CC) system: an access of v by p is remote if it is p’s first access of v, or if v has been written by another process since p’s last access of it.

Local-spin algorithms
In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses, which cause no interconnect traffic.
The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other!
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).
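The two definitions of a remote access can be made concrete with a small accounting model. The following is only a sketch under simplifying assumptions: the class name `RMRCounter`, its `home` map (which memory module holds each variable in DSM) and the idealized cache with unlimited capacity are illustrative choices, not part of the slides.

```python
class RMRCounter:
    """Counts remote memory references (RMRs) under the CC or the DSM model."""

    def __init__(self, model, home=None):
        assert model in ("CC", "DSM")
        self.model = model
        self.home = home or {}   # DSM only: variable -> process whose module holds it
        self.cached = set()      # CC only: (process, variable) pairs with a valid copy
        self.rmrs = {}           # process -> RMRs charged so far

    def _charge(self, p):
        self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, v):
        if self.model == "DSM":
            if self.home.get(v) != p:          # v lives in another module
                self._charge(p)
        elif (p, v) not in self.cached:        # first access, or invalidated since
            self._charge(p)
            self.cached.add((p, v))

    def write(self, p, v):
        if self.model == "DSM":
            if self.home.get(v) != p:
                self._charge(p)
        else:
            if (p, v) not in self.cached:
                self._charge(p)
            # A write invalidates every other process's copy of v.
            self.cached = {(q, w) for (q, w) in self.cached if w != v}
            self.cached.add((p, v))


# Process 0 spins on "b1" five times.
cc = RMRCounter("CC")
dsm = RMRCounter("DSM", home={"b1": 1})   # b1 is stored in process 1's module
for _ in range(5):
    cc.read(0, "b1")
    dsm.read(0, "b1")
```

Under these assumptions the CC spin costs process 0 a single RMR until another process writes `b1`, while in DSM every iteration is charged, which is why a spin loop on a remote variable has unbounded RMR cost.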

Peterson’s 2-process algorithm
Program for process 0
  b[0] := true
  turn := 0
  await (b[1] = false or turn = 1)
  CS
  b[0] := false
Program for process 1
  b[1] := true
  turn := 1
  await (b[0] = false or turn = 0)
  CS
  b[1] := false
Is this algorithm local-spin on a DSM machine? No (both processes spin on turn, which can be local to at most one of them)
Is this algorithm local-spin on a CC machine? Yes (each process spins on cached copies)

Peterson’s 2-process algorithm
Program for process 0
  b[0] := true
  turn := 0
  await (b[1] = false or turn = 1)
  CS
  b[0] := false
Program for process 1
  b[1] := true
  turn := 1
  await (b[0] = false or turn = 0)
  CS
  b[1] := false
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
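The program above can be tried out with two Python threads. This is a minimal sketch, assuming the standard CPython interpreter, whose global interpreter lock makes the reads and writes of `b` and `turn` behave like sequentially consistent registers; in a lower-level language memory fences would be needed. The names `PetersonLock` and `run_peterson` are illustrative, and `time.sleep(0)` is only there to yield the interpreter while spinning.

```python
import threading
import time


class PetersonLock:
    """Two-process lock following the slide: process i sets turn := i and
    waits until b[other] is false or turn has changed."""

    def __init__(self):
        self.b = [False, False]
        self.turn = 0

    def acquire(self, i):
        other = 1 - i
        self.b[i] = True
        self.turn = i
        # await (b[other] = false or turn = other)
        while self.b[other] and self.turn == i:
            time.sleep(0)  # yield so the other thread can make progress

    def release(self, i):
        self.b[i] = False


def run_peterson(iterations=200):
    """Both threads increment a shared counter inside the critical section."""
    lock = PetersonLock()
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            counter[0] += 1      # unprotected read-modify-write, guarded by the lock
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

If mutual exclusion holds, `run_peterson(k)` returns `2 * k`, since no increment is lost.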

Kessel’s single-writer algorithm
Program for process 0
  b[0] := true
  local[0] := turn[1]
  turn[0] := local[0]
  await (b[1] = false or local[0] <> turn[1])
  CS
  b[0] := false
Program for process 1
  b[1] := true
  local[1] := 1 - turn[0]
  turn[1] := local[1]
  await (b[0] = false or local[1] = turn[0])
  CS
  b[1] := false
Can Kessel’s algorithm be made local-spin on a DSM machine? Yes, if:
  - b[1], turn[1] are located at p0’s memory module
  - b[0], turn[0] are located at p1’s memory module

Local Spinning Mutual Exclusion Using Strong Primitives

Anderson’s queue-based algorithm
Shared:
  integer ticket, an RMW object, initially 0
  bit valid[0..n-1], initially valid[0]=1 and valid[i]=0 for i ∈ {1,..,n-1}
Local:
  integer myTicket
Program for process i
  myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
  await valid[myTicket] = 1 ; wait for your turn
  CS
  valid[myTicket] := 0 ; dequeue
  valid[(myTicket+1) mod n] := 1 ; signal successor
[Figure: the valid array, indexed 0..n-1, and the ticket variable]

Anderson’s queue-based algorithm (cont’d)
[Figure: values of ticket, valid and myTicket in four configurations: the initial configuration, after the entry section of p3, after p1 performs its entry section, and after p3 exits]

Anderson’s queue-based algorithm (cont’d)
Program for process i
  myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
  await valid[myTicket] = 1 ; wait for your turn
  CS
  valid[myTicket] := 0 ; dequeue
  valid[(myTicket+1) mod n] := 1 ; signal successor
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
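A Python sketch of the array-based queue lock may help to see the ticket mechanism in action. Python has no hardware fetch-and-increment, so the sketch emulates the atomicity of `fetch-and-inc-modulo-n` with an internal `threading.Lock`; that emulation, and the names `AndersonLock` and `run_anderson`, are assumptions made for illustration only. As in the slides, at most n processes may use the lock concurrently.

```python
import threading
import time


class AndersonLock:
    def __init__(self, n):
        self.n = n
        self.valid = [1] + [0] * (n - 1)   # valid[0] = 1, all other entries 0
        self.ticket = 0
        self._rmw = threading.Lock()       # stands in for the atomic RMW primitive

    def _fetch_and_inc_mod_n(self):
        with self._rmw:
            old = self.ticket
            self.ticket = (old + 1) % self.n
            return old

    def acquire(self):
        my_ticket = self._fetch_and_inc_mod_n()   # take a ticket
        while self.valid[my_ticket] != 1:         # wait for your turn
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                     # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1      # signal successor


def run_anderson(n=4, iterations=50):
    """n threads each increment a shared counter `iterations` times."""
    lock = AndersonLock(n)
    counter = [0]

    def worker():
        for _ in range(iterations):
            t = lock.acquire()
            counter[0] += 1
            lock.release(t)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Each waiting thread spins on its own array entry, which is what yields constant RMR cost in the CC model; in DSM the entry a process spins on changes with every ticket, so it cannot be placed in the spinner's own module.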

Graunke and Thakkar’s algorithm
Uses the more common swap primitive:
swap(w, new)
  do atomically
    prev := w
    w := new
    return prev

Graunke and Thakkar’s algorithm (cont’d)
Shared:
  bit slots[0..n-1], initially slots[i]=1 for i ∈ {0,..,n-1}
  structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:
  structure {bit value, bit *slot} myRecord, prev
  bit temp
[Figure: the slots array, indexed 0..n-1, and the tail record]

Graunke and Thakkar’s algorithm (cont’d)
Shared:
  bit slots[0..n-1], initially slots[i]=1 for i ∈ {0,..,n-1}
  structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:
  structure {bit value, bit *slot} myRecord, prev
  bit temp
Program for process i
  myRecord.value := slots[i] ; prepare to thread yourself to queue
  myRecord.slot := &slots[i]
  prev := swap(&tail, myRecord) ; prev now points to predecessor
  await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
  CS
  temp := 1 - slots[i]
  slots[i] := temp ; signal successor

Graunke and Thakkar’s algorithm (cont’d)
Program for process i
  myRecord.value := slots[i] ; prepare to thread yourself to queue
  myRecord.slot := &slots[i]
  prev := swap(&tail, myRecord) ; prev now points to predecessor
  await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
  CS
  temp := 1 - slots[i]
  slots[i] := temp ; signal successor
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
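The following Python sketch mirrors the program for process i. Two liberties are taken and should be read as assumptions: the pointer `&slots[i]` is represented by the index `i`, and the atomic `swap` on `tail` is emulated with an internal `threading.Lock`. The names `GraunkeThakkarLock` and `run_graunke_thakkar` are illustrative.

```python
import threading
import time


class GraunkeThakkarLock:
    def __init__(self, n):
        self.slots = [1] * n
        self.tail = (0, 0)              # (value, slot index), i.e. {0, &slots[0]}
        self._rmw = threading.Lock()    # stands in for the atomic swap primitive

    def _swap_tail(self, new):
        with self._rmw:
            prev = self.tail
            self.tail = new
            return prev

    def acquire(self, i):
        my_record = (self.slots[i], i)              # prepare to join the queue
        prev_value, prev_slot = self._swap_tail(my_record)
        # Spin until the predecessor toggles the slot it recorded.
        while self.slots[prev_slot] == prev_value:
            time.sleep(0)

    def release(self, i):
        self.slots[i] = 1 - self.slots[i]           # signal successor


def run_graunke_thakkar(n=4, iterations=50):
    """n threads each increment a shared counter `iterations` times."""
    lock = GraunkeThakkarLock(n)
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            counter[0] += 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Note that a process spins on its predecessor's slot rather than on its own, which is why the algorithm is local-spin in the CC model but not in DSM.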

The MCS queue-based algorithm
Type:
  Qnode: structure {bit locked, Qnode *next}
Shared:
  Qnode nodes[0..n-1]
  Qnode *tail, initially nil
Local:
  Qnode *myNode, initially &nodes[i]
  Qnode *prev, *successor
Has constant RMR complexity under both the DSM and CC models
Uses swap and CAS

The MCS queue-based algorithm (cont’d)
Program for process i
  myNode->next := nil ; prepare to be last in queue
  prev := swap(&tail, myNode) ; tail now points to myNode, prev to my predecessor (if any)
  if (prev ≠ nil) ; I need to wait for a predecessor
    myNode->locked := true ; prepare to wait
    prev->next := myNode ; let my predecessor know it has to unlock me
    await (myNode->locked = false)
  CS
  if (myNode->next = nil) ; if not sure there is a successor
    if (compare-and-swap(&tail, myNode, nil) = false) ; if there is a successor
      await (myNode->next ≠ nil) ; spin until successor lets me know its identity
      successor := myNode->next ; get a pointer to my successor
      successor->locked := false ; unlock my successor
  else ; for sure, I have a successor
    successor := myNode->next ; get a pointer to my successor
    successor->locked := false ; unlock my successor
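The entry and exit code of the MCS lock can be sketched in Python as follows. The atomic `swap` and `compare-and-swap` on `tail` are emulated with an internal `threading.Lock`, which is an assumption of this sketch rather than how a real implementation would obtain them; the names `QNode`, `MCSLock` and `run_mcs` are likewise illustrative.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None
        self._rmw = threading.Lock()   # stands in for atomic swap / CAS on tail

    def _swap_tail(self, new):
        with self._rmw:
            prev, self.tail = self.tail, new
            return prev

    def _cas_tail(self, expected, new):
        with self._rmw:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None                    # prepare to be last in queue
        prev = self._swap_tail(node)        # tail now points to my node
        if prev is not None:                # there is a predecessor
            node.locked = True              # prepare to wait
            prev.next = node                # tell predecessor whom to unlock
            while node.locked:              # spin on my own node only
                time.sleep(0)

    def release(self, node):
        if node.next is None:               # not sure there is a successor
            if self._cas_tail(node, None):  # queue was empty behind me
                return
            while node.next is None:        # successor is about to link itself
                time.sleep(0)
        node.next.locked = False            # unlock my successor


def run_mcs(n=4, iterations=50):
    """n threads, each with its own queue node, increment a shared counter."""
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()
        for _ in range(iterations):
            lock.acquire(node)
            counter[0] += 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Because each process spins only on fields of its own node, the node can be placed in that process's memory module, which is what makes the algorithm local-spin in DSM as well as in CC.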

The MCS queue-based algorithm (cont’d)

Local Spinning Mutual Exclusion Using reads and writes

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)
O(log n) RMR complexity for both DSM and CC systems
This is ‘suspected’ to be optimal!
Uses O(n log n) registers
Each node is identified by (level, number)
[Figure: the tournament tree over the processes, with levels 0, 1 and 2 labelled]

A local-spin tournament-tree algorithm (cont’d)
Shared:
  - For each node v, there are 3 registers:
      name[level, 2node], name[level, 2node+1], initially -1
      turn[level, node]
  - For each level l and process i, a spin flag: flag[level, i]
Local: level, node, id

A local-spin tournament-tree algorithm (cont’d)
Program for process i
  id := i
  for level = 0 to log n - 1 do ; from leaf to root
    node := ⌊id/2⌋ ; the current node
    name[level, 2node+(id mod 2)] := i ; identify yourself
    turn[level, node] := i ; update the tie-breaker
    flag[level, i] := 0 ; initialize the locally-accessible spin flag
    if (even(id))
      rival := name[level, id+1]
    else
      rival := name[level, id-1]
    if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
      if (flag[level, rival] = 0)
        flag[level, rival] := 1 ; release the rival from waiting
      await flag[level, i] ≠ 0 ; wait until sure the rival updated the tie-breaker
      if (turn[level, node] = i) ; if I lost
        await flag[level, i] = 2 ; wait till rival notifies me it’s my turn
    id := node ; move to the next level
  CS
  for level = log n - 1 downto 0 do ; begin exit code
    id := ⌊i/2^level⌋, node := ⌊id/2⌋ ; set node and id
    name[level, 2node+(id mod 2)] := -1 ; erase name
    rival := turn[level, node] ; find who rival is (if there is one)
    if rival ≠ i ; if there is a rival
      flag[level, rival] := 2 ; notify rival
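The index arithmetic in the entry and exit code can be checked in isolation. The helper below is a sketch with an illustrative name, `tree_path`, and it assumes n is a power of two; it lists, for process i, the node it competes at on each level and which of the node's two name registers it writes.

```python
def tree_path(i, n):
    """Return [(level, node, side)] visited by process i, from leaf level to root.

    `side` is id mod 2, so the process writes name[level, 2*node + side].
    Assumes n is a power of two.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    levels = n.bit_length() - 1          # log2(n)
    path = []
    ident = i                            # the slide's local variable id
    for level in range(levels):
        node = ident // 2                # the current node
        path.append((level, node, ident % 2))
        ident = node                     # move to the next level
    return path


path = tree_path(5, 8)
```

For process 5 of 8 the path is (0, 2, 1), (1, 1, 0), (2, 0, 1). The exit code recomputes the same values top-down as id = ⌊i/2^level⌋, and since a process does a constant number of remote accesses at each of the log n levels, the O(log n) RMR bound follows.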