Local-Spin Algorithms Multiprocessor synchronization algorithms (20225241) Lecturer: Danny Hendler This presentation is based on the book “Synchronization.

Local-Spin Algorithms
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

The CC and DSM models
[Figure: the cache-coherent (CC) and distributed shared-memory (DSM) models, taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.]

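Remote and local memory accesses
In a DSM system: each shared variable is stored in the memory module of one processor. An access by that processor is local; an access by any other processor is remote. [Figure: local and remote accesses in a DSM system.]
In a cache-coherent system: an access of v by p is remote if it is the first access of v, or if v has been written by another process since p’s last access of it.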
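The cache-coherent rule above can be turned into a small counter. The following is only a sketch: the class `CcVariable` and the function `spin_demo` are hypothetical names introduced for illustration, and the model is simplified to a single shared variable whose cached copies are invalidated by every write from another process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: tracks which processes hold a valid cached copy of
// one shared variable and counts remote accesses under the CC rule.
class CcVariable {
public:
    explicit CcVariable(std::size_t n) : valid_(n, false), rmrs_(n, 0) {}

    int read(std::size_t p) {
        touch(p);
        return value_;
    }

    void write(std::size_t p, int v) {
        touch(p);
        // A write by p invalidates every other process's cached copy.
        for (std::size_t q = 0; q < valid_.size(); ++q) {
            if (q != p) valid_[q] = false;
        }
        value_ = v;
    }

    long rmrs(std::size_t p) const { return rmrs_[p]; }

private:
    // Remote if this is p's first access, or if another process has
    // written the variable since p's last access.
    void touch(std::size_t p) {
        if (!valid_[p]) {
            ++rmrs_[p];
            valid_[p] = true;
        }
    }

    std::vector<bool> valid_;
    std::vector<long> rmrs_;
    int value_ = 0;
};

// Process 0 spins on v while process 1 writes it once.
inline long spin_demo() {
    CcVariable v(2);
    for (int k = 0; k < 100; ++k) v.read(0);  // 1 RMR, then 99 local reads
    v.write(1, 1);                            // invalidates process 0's copy
    assert(v.rmrs(1) == 1);                   // first access by process 1
    for (int k = 0; k < 100; ++k) v.read(0);  // 1 more RMR, then local reads
    return v.rmrs(0);
}
```

In this sketch, 200 reads by process 0 cost only two RMRs, which is why read-only spinning counts as local spinning in the CC model.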

Local-spin algorithms
In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses, which do not cause interconnect traffic.
The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other!
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).

Peterson’s 2-process algorithm

Program for process 0:
    b[0] := true
    turn := 0
    await (b[1] = false or turn = 1)
    CS
    b[0] := false

Program for process 1:
    b[1] := true
    turn := 1
    await (b[0] = false or turn = 0)
    CS
    b[1] := false

Is this algorithm local-spin on a DSM machine? No.
Is this algorithm local-spin on a CC machine? Yes.

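Peterson’s 2-process algorithm (cont’d)

Program for process 0:
    b[0] := true
    turn := 0
    await (b[1] = false or turn = 1)
    CS
    b[0] := false

Program for process 1:
    b[1] := true
    turn := 1
    await (b[0] = false or turn = 0)
    CS
    b[1] := false

What is the RMR complexity on a DSM machine? Unbounded: both processes spin on turn, which can be local to at most one of them.
What is the RMR complexity on a CC machine? Constant: a waiting process spins on cached copies, which the other process invalidates only a constant number of times per passage.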
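As a rough illustration, the two programs above can be written with C++ atomics. This is a sketch, not part of the original slides: the names `Peterson2` and `peterson_demo` are made up, the default sequentially consistent ordering of `std::atomic` is assumed to stand in for the atomic registers of the model, and the `yield()` call is only a courtesy to the scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two-process Peterson lock; process ids are 0 and 1.
struct Peterson2 {
    std::atomic<bool> b[2];
    std::atomic<int> turn;

    Peterson2() : turn(0) {
        b[0].store(false);
        b[1].store(false);
    }

    void lock(int i) {
        assert(i == 0 || i == 1);
        b[i].store(true);
        turn.store(i);
        // await (b[1-i] = false or turn = 1-i)
        while (b[1 - i].load() && turn.load() == i) {
            std::this_thread::yield();
        }
    }

    void unlock(int i) { b[i].store(false); }
};

// Hypothetical demo: both processes increment a plain counter inside the
// critical section; lost updates would indicate a mutual exclusion failure.
inline long peterson_demo(int rounds) {
    Peterson2 mutex;
    long counter = 0;
    auto body = [&](int i) {
        for (int k = 0; k < rounds; ++k) {
            mutex.lock(i);
            ++counter;
            mutex.unlock(i);
        }
    };
    std::thread t0(body, 0), t1(body, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Weakening the memory ordering of the stores and loads is not safe here; the algorithm relies on the write to `b[i]` being visible before the read of `b[1 - i]`.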

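Recall the following simple test-and-set based algorithm

Shared: lock, initially 0

    while (! lock.test-and-set())    // entry section: retry until the test-and-set succeeds
    Critical Section
    lock := 0                        // exit section

Is this algorithm local-spin on either a DSM or a CC machine? No: every iteration of the waiting loop writes lock, so contending processes keep generating remote accesses.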

A better algorithm

Shared: lock, initially 0

    while (! lock.test-and-set())    // entry section
        await (lock == 0)
    Critical Section
    lock := 0                        // exit section

Creates less traffic on CC machines, but is still not local-spin.
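The “better algorithm” above is commonly called test-and-test-and-set. A minimal C++ sketch follows, under some assumptions: `TtasLock` and `ttas_demo` are illustrative names, `exchange(true)` plays the role of test-and-set and returns the previous value (so the loop continues while that value was already true), and the acquire/release orderings are one reasonable choice rather than something the slides specify.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class TtasLock {
public:
    void lock() {
        // Outer loop: the test-and-set itself, a writing (remote) access.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Inner loop: read-only spinning, served from the cache on a
            // CC machine until the holder writes the lock word.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: several threads increment a shared counter.
inline long ttas_demo(int threads, int rounds) {
    TtasLock mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int k = 0; k < rounds; ++k) {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

When the lock is released, every waiter leaves the inner loop and retries the exchange, so the number of remote accesses per passage still grows with contention. That is why the algorithm is not local-spin.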

Local Spinning Mutual Exclusion Using Strong Primitives

Anderson’s queue-based algorithm

Shared: integer ticket, an RMW object, initially 0
        bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1,..,n-1}
Local:  integer myTicket

Program for process i:
    myTicket := fetch-and-inc-modulo-n(ticket)    ; take a ticket
    await valid[myTicket] = 1                     ; wait for your turn
    CS
    valid[myTicket] := 0                          ; dequeue
    valid[(myTicket + 1) mod n] := 1              ; signal successor

[Figure: the valid array, with entries 0..n-1, and the ticket counter.]

Anderson’s queue-based algorithm (cont’d)
[Figure: the values of ticket, valid and myTicket in four configurations: the initial configuration; after the entry section of p3; after p1 performs its entry section; after p3 exits.]

Anderson’s queue-based algorithm (cont’d)

Program for process i:
    myTicket := fetch-and-inc-modulo-n(ticket)    ; take a ticket
    await valid[myTicket] = 1                     ; wait for your turn
    CS
    valid[myTicket] := 0                          ; dequeue
    valid[(myTicket + 1) mod n] := 1              ; signal successor

What is the RMR complexity on a DSM machine? Unbounded: the entry of valid a process spins on depends on the ticket it draws, so it cannot be placed in that process’s local memory.
What is the RMR complexity on a CC machine? Constant.
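A possible C++ rendering of the program above is sketched here. Several details are assumptions rather than part of the slides: the names `AndersonLock` and `anderson_demo`, the use of `fetch_add` followed by `% N` in place of fetch-and-inc-modulo-n (which ignores counter wrap-around), the 64-byte padding that keeps each `valid` entry on its own cache line, and the requirement that at most `N` threads use the lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Array-based queue lock for at most N concurrent processes.
template <std::size_t N>
class AndersonLock {
public:
    AndersonLock() {
        for (auto& s : slots_) s.valid.store(false);
        slots_[0].valid.store(true);  // valid[0] = 1 initially
    }

    // Returns the ticket; the caller passes it back to unlock().
    std::size_t lock() {
        std::size_t my = ticket_.fetch_add(1) % N;  // take a ticket
        while (!slots_[my].valid.load()) {          // wait for your turn
            std::this_thread::yield();
        }
        return my;
    }

    void unlock(std::size_t my) {
        assert(my < N);
        slots_[my].valid.store(false);           // dequeue
        slots_[(my + 1) % N].valid.store(true);  // signal successor
    }

private:
    // Assumed cache-line size of 64 bytes, to avoid false sharing.
    struct alignas(64) Slot {
        std::atomic<bool> valid;
    };
    Slot slots_[N];
    std::atomic<std::size_t> ticket_{0};
};

// Hypothetical demo with four threads sharing one counter.
inline long anderson_demo(int rounds) {
    constexpr std::size_t kThreads = 4;
    AndersonLock<kThreads> mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < kThreads; ++t) {
        pool.emplace_back([&] {
            for (int k = 0; k < rounds; ++k) {
                std::size_t ticket = mutex.lock();
                ++counter;
                mutex.unlock(ticket);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Each waiter spins on a different array entry, so a release invalidates only the successor’s cached copy.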

Graunke and Thakkar’s algorithm

Uses the more common swap (a.k.a. fetch-and-store) primitive:

swap(w, new)
    do atomically
        prev := w
        w := new
        return prev

Graunke and Thakkar’s algorithm (cont’d)

Shared: bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
        structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:  structure {bit value, bit *slot} myRecord, prev
        bit temp

[Figure: tail and the slots array, with entries 0..n-1.]

Graunke and Thakkar’s algorithm (cont’d)

Shared: bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
        structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:  structure {bit value, bit *slot} myRecord, prev
        bit temp

Program for process i:
    myRecord.value := slots[i]            ; prepare to thread yourself to queue
    myRecord.slot := &slots[i]
    prev := swap(&tail, myRecord)         ; prev now points to predecessor
    await (*prev.slot ≠ prev.value)       ; local spin until predecessor’s value changes
    CS
    temp := 1 - slots[i]
    slots[i] := temp                      ; signal successor

Graunke and Thakkar’s algorithm (cont’d)

Program for process i:
    myRecord.value := slots[i]            ; prepare to thread yourself to queue
    myRecord.slot := &slots[i]
    prev := swap(&tail, myRecord)         ; prev now points to predecessor
    await (*prev.slot ≠ prev.value)       ; local spin until predecessor’s value changes
    CS
    temp := 1 - slots[i]
    slots[i] := temp                      ; signal successor

What is the RMR complexity on a DSM machine? Unbounded: a process spins on its predecessor’s slot, which is remote to it.
What is the RMR complexity on a CC machine? Constant.

It is not known if there is a local-spin DSM mutual exclusion algorithm that uses only swap (in addition to reads and writes).
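The program above needs to swap a {value, slot} pair atomically. One way to sketch this in C++, assuming it is acceptable to replace the slot pointer by the slot’s index, is to pack the pair into a single word as `(index << 1) | value`. The names `GraunkeThakkarLock` and `gt_demo` and the packing scheme are assumptions made for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Queue lock for processes 0..N-1 that uses only swap, reads and writes.
template <std::size_t N>
class GraunkeThakkarLock {
public:
    GraunkeThakkarLock() : tail_(0u) {  // tail = {value 0, slot 0}
        for (auto& s : slots_) s.store(1u);
    }

    void lock(unsigned i) {
        assert(i < N);
        // myRecord = {slots[i], &slots[i]}, packed into one word.
        unsigned mine = (i << 1) | slots_[i].load();
        unsigned prev = tail_.exchange(mine);  // swap(&tail, myRecord)
        unsigned prev_slot = prev >> 1;
        unsigned prev_value = prev & 1u;
        // Spin until the predecessor flips its slot.
        while (slots_[prev_slot].load() == prev_value) {
            std::this_thread::yield();
        }
    }

    void unlock(unsigned i) {
        slots_[i].store(1u - slots_[i].load());  // signal successor
    }

private:
    std::atomic<unsigned> slots_[N];
    std::atomic<unsigned> tail_;
};

// Hypothetical demo with four processes sharing one counter.
inline long gt_demo(int rounds) {
    constexpr unsigned kThreads = 4;
    GraunkeThakkarLock<kThreads> mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < kThreads; ++t) {
        pool.emplace_back([&, t] {
            for (int k = 0; k < rounds; ++k) {
                mutex.lock(t);
                ++counter;
                mutex.unlock(t);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Only process i ever writes `slots_[i]`, so the read-then-write in `unlock` does not need to be a single atomic operation.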

The MCS queue-based algorithm

Type:   Qnode: structure {bit locked, Qnode *next}
Shared: Qnode nodes[0..n-1]
        Qnode *tail, initially nil
Local:  Qnode *myNode, initially &nodes[i]
        Qnode *prev, *successor

Has constant RMR complexity under both the DSM and CC models.
Uses swap and CAS.

The MCS queue-based algorithm (cont’d)

Program for process i:
    myNode->next := nil                          ; prepare to be last in queue
    prev := swap(&tail, myNode)                  ; thread yourself: tail now points to myNode
    if (prev ≠ nil)                              ; I need to wait for a predecessor
        myNode->locked := true                   ; prepare to wait
        prev->next := myNode                     ; let my predecessor know it has to unlock me
        await (myNode->locked = false)
    CS
    if (myNode->next = nil)                      ; if not sure there is a successor
        if (compare-and-swap(&tail, myNode, nil) = false)    ; if there is a successor
            await (myNode->next ≠ nil)           ; spin until successor lets me know its identity
            successor := myNode->next            ; get a pointer to my successor
            successor->locked := false           ; unlock my successor
    else                                         ; for sure, I have a successor
        successor := myNode->next                ; get a pointer to my successor
        successor->locked := false               ; unlock my successor
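The MCS program above maps fairly directly onto C++ atomics, with `exchange` as swap and `compare_exchange_strong` as compare-and-swap. The sketch below makes a few assumptions: the names `QNode`, `McsLock` and `mcs_demo` are illustrative, each thread owns one queue node (the role of `nodes[i]`), and all operations use the default sequentially consistent ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct QNode {
    std::atomic<bool> locked{false};
    std::atomic<QNode*> next{nullptr};
};

class McsLock {
public:
    void lock(QNode& me) {
        me.next.store(nullptr);             // prepare to be last in queue
        QNode* prev = tail_.exchange(&me);  // tail now points to my node
        if (prev != nullptr) {              // there is a predecessor
            me.locked.store(true);          // prepare to wait
            prev->next.store(&me);          // tell predecessor to unlock me
            while (me.locked.load()) {      // spin on my own node only
                std::this_thread::yield();
            }
        }
    }

    void unlock(QNode& me) {
        QNode* succ = me.next.load();
        if (succ == nullptr) {              // not sure there is a successor
            QNode* expected = &me;
            if (tail_.compare_exchange_strong(expected, nullptr)) {
                return;                     // queue is empty again
            }
            // A successor has swapped the tail but not linked itself yet.
            while ((succ = me.next.load()) == nullptr) {
                std::this_thread::yield();
            }
        }
        assert(succ != nullptr);
        succ->locked.store(false);          // unlock my successor
    }

private:
    std::atomic<QNode*> tail_{nullptr};
};

// Hypothetical demo: each thread uses its own queue node.
inline long mcs_demo(int threads, int rounds) {
    McsLock mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            QNode me;  // plays the role of nodes[i]
            for (int k = 0; k < rounds; ++k) {
                mutex.lock(me);
                ++counter;
                mutex.unlock(me);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because every process spins on a field of its own node, the node can be placed in that process’s local memory, which is what makes the algorithm local-spin on DSM as well as CC machines.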


Local Spinning Mutual Exclusion Using reads and writes

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)

O(log n) RMR complexity for both DSM and CC systems. This is optimal!
Uses O(n log n) registers.

[Figure: a binary tournament tree with the processes at the bottom and the levels labelled Level 0, Level 1, Level 2. Each node is identified by (level, number).]

A local-spin tournament-tree algorithm (cont’d)

Shared:
    - For each node (level, node), there are 3 registers:
        name[level, 2node], name[level, 2node+1], initially -1
        turn[level, node]
    - For each level and process i, a spin flag: flag[level, i]
Local: level, node, id

A local-spin tournament-tree algorithm (cont’d)

Program for process i:
    id := i
    for level = 0 to log n - 1 do                ; from leaf to root
        node := ⌊id/2⌋                           ; the current node
        name[level, 2node + (id mod 2)] := i     ; identify yourself
        turn[level, node] := i                   ; update the tie-breaker
        flag[level, i] := 0                      ; initialize the locally-accessible spin flag
        if (even(id))
            rival := name[level, id+1]
        else
            rival := name[level, id-1]
        if ((rival ≠ -1) and (turn[level, node] = i))    ; if not sure I should precede rival
            if (flag[level, rival] = 0)
                flag[level, rival] := 1          ; release the rival from waiting
            await flag[level, i] ≠ 0             ; await until sure the rival updated the tie-breaker
            if (turn[level, node] = i)           ; if I lost
                await flag[level, i] = 2         ; wait till rival notifies me it’s my turn
        id := node                               ; move to the next level
    CS
    for level = log n - 1 downto 0 do            ; begin exit code
        id := ⌊i/2^level⌋, node := ⌊id/2⌋        ; set node and id
        name[level, 2node + (id mod 2)] := -1    ; erase name
        rival := turn[level, node]               ; find who rival is (if there is one)
        if rival ≠ i                             ; if there is a rival
            flag[level, rival] := 2              ; notify rival
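For readers who want to experiment, here is a sketch of the tournament-tree program using only atomic loads and stores. It rests on several assumptions: the names `TournamentLock` and `tournament_demo` are illustrative, n is a power of two (n = 2^LOG_N), the tie-breaker stores the process number i because the later comparisons are against i, and the default sequentially consistent ordering is used throughout.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

template <int LOG_N>
class TournamentLock {
    static_assert(LOG_N >= 1, "need at least two processes");
    static constexpr int N = 1 << LOG_N;

public:
    TournamentLock() {
        for (int l = 0; l < LOG_N; ++l) {
            for (int k = 0; k < N; ++k) {
                name_[l][k].store(-1);
                flag_[l][k].store(0);
            }
            for (int k = 0; k < N / 2; ++k) turn_[l][k].store(0);
        }
    }

    void lock(int i) {
        assert(0 <= i && i < N);
        int id = i;
        for (int level = 0; level < LOG_N; ++level) {  // from leaf to root
            int node = id / 2;
            name_[level][id].store(i);     // note: 2*node + id%2 == id
            turn_[level][node].store(i);   // update the tie-breaker
            flag_[level][i].store(0);
            int rival = name_[level][id ^ 1].load();   // sibling entry
            if (rival != -1 && turn_[level][node].load() == i) {
                if (flag_[level][rival].load() == 0) {
                    flag_[level][rival].store(1);      // release the rival
                }
                while (flag_[level][i].load() == 0) std::this_thread::yield();
                if (turn_[level][node].load() == i) {  // I lost this round
                    while (flag_[level][i].load() != 2) std::this_thread::yield();
                }
            }
            id = node;                     // move to the next level
        }
    }

    void unlock(int i) {
        for (int level = LOG_N - 1; level >= 0; --level) {
            int id = i >> level;
            int node = id / 2;
            name_[level][id].store(-1);                // erase name
            int rival = turn_[level][node].load();
            if (rival != i) flag_[level][rival].store(2);  // notify rival
        }
    }

private:
    std::atomic<int> name_[LOG_N][N];
    std::atomic<int> turn_[LOG_N][N / 2];
    std::atomic<int> flag_[LOG_N][N];
};

// Hypothetical demo with four processes (LOG_N = 2).
inline long tournament_demo(int rounds) {
    TournamentLock<2> mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) {
        pool.emplace_back([&, t] {
            for (int k = 0; k < rounds; ++k) {
                mutex.lock(t);
                ++counter;
                mutex.unlock(t);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Each process spins only on its own `flag_[level][i]` entries, which is the property that lets the spin flags be placed in local memory on a DSM machine.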

Local-Spin Leader Election
Exactly one process is elected.
All other processes are not-elected.
Processes may busy-wait.

Choy and Singh’s filter

[Figure: m processes enter the filter; between 1 and ⌈m/2⌉ processes “exit”; the rest are “halted”.]

Filter guarantees:
    Safety: if m processes enter a filter, at most ⌈m/2⌉ exit.
    Progress: if some processes enter a filter, at least one exits.

Choy and Singh’s filter (cont’d)

Shared: integer turn
        Boolean b, initially false

Program for process i:
    turn := i
    await ¬b          // wait for barrier to open
    b := true         // close barrier
    if turn ≠ i       // not last to cross the barrier
        b := false    // open barrier
        halt
    else
        exit

Why are filter guarantees satisfied?

Choy and Singh’s filter algorithm
[Figure: a sequence of filters: Filter #1, Filter #2, ..., Filter #i.]

Choy and Singh’s filter algorithm (cont’d)

Shared: typedef struct {integer turn; boolean b, c, initially false} filter
        filter A[log n + 1]

Program for process i:
    curr := 0
    A[curr].turn := i
    await ¬A[curr].b
    if (A[curr].turn ≠ i)
        A[curr].c := true         // mark that some process failed on filter
        A[curr].b := false
        return not-elected
    else if ((curr > 0) ∧ ¬A[curr-1].c)
        return elected            // other processes will never reach this filter
    else
        curr := curr + 1

Do you see any problem with this algorithm?

Choy and Singh’s filter algorithm (cont’d)
What is the DSM RMR complexity?
What is the CC RMR complexity?
What is the worst-case average (CC) RMR complexity?

Is there an O(1) RMRs leader election algorithm from reads and writes? Yes [Golab, Hendler and Woelfel, 2006] Conditional primitives (e.g. compare-and-swap) are no stronger than reads & writes for RMR complexity [Golab, Hadzilacos, Hendler and Woelfel, 2007]