Local-Spin Algorithms
Multiprocessor synchronization algorithms ( )
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.
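Remote and local memory accesses
In a DSM (distributed shared memory) system, every shared variable resides in the memory module of exactly one processor. An access of v by p is local if v is stored in p's module, and remote otherwise.
In a cache-coherent (CC) system, an access of v by p is remote if it is the first access of v by p, or if v has been written by another process since p’s last access of it. Otherwise it is served from p's cache and is local.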
Local-spin algorithms
In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses, which generate no interconnect traffic. The same algorithm may be local-spin on one architecture (DSM or CC) and not local-spin on the other. For local-spin algorithms, our complexity metric is the worst-case number of remote memory references (RMRs).
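A small bookkeeping model makes the two definitions concrete. The Python sketch below is illustrative only: `CCModel`, `DSMModel` and `spin_scenario` are hypothetical names. The CC model follows the slide's rule literally, so it ignores cache capacity and protocol details. The DSM model needs an assumed map from each variable to the process whose memory module holds it.

```python
class CCModel:
    """Classifies accesses with the cache-coherent (CC) rule stated on the slide."""

    def __init__(self):
        self.cached = set()  # (process, variable) pairs that hold a valid cached copy
        self.rmrs = {}       # process -> RMRs incurred so far

    def access(self, p, v, write=False):
        # Remote iff p has no valid copy: first access, or another process wrote v since.
        remote = (p, v) not in self.cached
        if remote:
            self.rmrs[p] = self.rmrs.get(p, 0) + 1
        self.cached.add((p, v))
        if write:
            # A write by p invalidates every other process's copy of v.
            self.cached = {(q, u) for (q, u) in self.cached if u != v or q == p}
        return remote


class DSMModel:
    """Classifies accesses with the DSM rule: remote iff v lives in another module."""

    def __init__(self, home):
        self.home = home  # variable -> process whose memory module stores it
        self.rmrs = {}

    def access(self, p, v, write=False):
        # Reads and writes are treated alike: only the location of v matters.
        remote = self.home[v] != p
        if remote:
            self.rmrs[p] = self.rmrs.get(p, 0) + 1
        return remote


def spin_scenario(model, spins=5):
    """Process 0 busy-waits on 'flag'; process 1 writes it once; process 0 reads it again."""
    for _ in range(spins):
        model.access(0, "flag")
    model.access(1, "flag", write=True)
    model.access(0, "flag")
    return model.rmrs.get(0, 0)


if __name__ == "__main__":
    print("CC: ", spin_scenario(CCModel()))              # 2 RMRs for the spinner
    print("DSM:", spin_scenario(DSMModel({"flag": 1})))  # 6 RMRs: every read is remote
```

Under the CC rule the spinner pays 2 RMRs no matter how long it spins. Under DSM, with `flag` stored at process 1, every iteration is remote. This is why the same loop can be local-spin on one architecture and not on the other.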
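Peterson’s 2-process algorithm
Program for process 1
1. b[1] := true
2. turn := 1
3. await (b[0] = false or turn = 0)
4. CS
5. b[1] := false
Program for process 0
1. b[0] := true
2. turn := 0
3. await (b[1] = false or turn = 1)
4. CS
5. b[0] := false
Is this algorithm local-spin on a DSM machine? No: both processes spin on turn, which resides in the memory module of only one of them.
Is this algorithm local-spin on a CC machine? Yes: a waiting process re-reads cached copies of b and turn, which are invalidated only when the other process writes them.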
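Here is a minimal Python sketch of the algorithm above, with threads standing in for processes. It assumes CPython with the GIL, under which single reads and writes of list elements are atomic and sequentially consistent. Real hardware would also need memory fences. `run_peterson` and the counter check are illustrative additions, not part of the slide.

```python
import threading
import time


def run_peterson(iterations=200):
    """Two threads run Peterson's algorithm; returns the final value of a shared counter."""
    b = [False, False]  # b[i]: process i wants to enter
    turn = [0]          # tie-breaker, boxed in a list so both threads share it
    counter = [0]       # updated non-atomically inside the CS, so a race would lose updates

    def entry(i):
        other = 1 - i
        b[i] = True
        turn[0] = i
        # await (b[other] = false or turn = other)
        while b[other] and turn[0] != other:
            time.sleep(0)  # yield the GIL while busy-waiting

    def exit_section(i):
        b[i] = False

    def worker(i):
        for _ in range(iterations):
            entry(i)
            value = counter[0]  # critical section
            time.sleep(0)       # invite a context switch inside the CS
            counter[0] = value + 1
            exit_section(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_peterson())  # 400 if mutual exclusion holds
```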
Recall the following simple test-and-set based algorithm (here test-and-set() returns true iff it changed lock from 0 to 1)
Shared: lock, initially 0
1. while (!lock.test-and-set()) // entry section
2. Critical Section
3. lock := 0 // exit section
This algorithm is local-spin on neither a DSM nor a CC machine (an RMW operation always incurs an RMR).
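A better algorithm: test-and-test-and-set
Shared: lock, initially 0
1. while (!lock.test-and-set()) // entry section
2.    await (lock == 0)
3. Critical Section
4. lock := 0 // exit section
This creates less traffic on CC machines, but it is still not local-spin. After every release, each waiting process leaves its read-only loop and issues a test-and-set, and it may lose that race an unbounded number of times.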
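To compare the two entry loops, the Python sketch below runs either variant with several threads and counts the test-and-set operations issued. It assumes CPython with the GIL. Python has no hardware test-and-set, so `TestAndSetBit` (a hypothetical helper) emulates the atomic RMW with a private `threading.Lock`. It follows the slide's convention that test-and-set returns true exactly when it changed the bit from 0 to 1.

```python
import threading
import time


class TestAndSetBit:
    """Shared bit with an atomic test_and_set(); atomicity is emulated with a threading.Lock."""

    def __init__(self):
        self.value = 0
        self.tas_calls = 0              # number of RMW operations issued
        self._guard = threading.Lock()  # stands in for the hardware RMW instruction

    def test_and_set(self):
        # Slide convention: return True iff this call changed the bit from 0 to 1.
        with self._guard:
            self.tas_calls += 1
            if self.value == 0:
                self.value = 1
                return True
            return False


def run_lock(variant, threads=4, iterations=100):
    """Run the 'tas' or 'ttas' lock; return (final counter, number of test_and_set calls)."""
    lock = TestAndSetBit()
    counter = [0]

    def acquire():
        while not lock.test_and_set():
            if variant == "ttas":
                while lock.value != 0:  # await (lock == 0): plain reads only
                    time.sleep(0)
            else:
                time.sleep(0)           # plain TAS: retry the RMW right away

    def worker():
        for _ in range(iterations):
            acquire()
            value = counter[0]          # critical section (non-atomic on purpose)
            time.sleep(0)
            counter[0] = value + 1
            lock.value = 0              # exit section: lock := 0

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0], lock.tas_calls


if __name__ == "__main__":
    for variant in ("tas", "ttas"):
        total, calls = run_lock(variant)
        print(variant, "counter:", total, "test_and_set calls:", calls)
```

The RMW counts depend on scheduling, so no particular values are claimed here. The only design difference is that the TTAS variant waits with plain reads of `lock.value` and retries the RMW after seeing the lock free.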
Local Spinning Mutual Exclusion Using Strong Primitives
Anderson’s queue-based algorithm (Anderson, 1990)
Shared:
  integer ticket: an RMW object, initially 0
  bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1, ..., n-1}
Local: integer myTicket
Program for process i
1. myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
2. await valid[myTicket] = 1 ; wait for your turn
3. CS
4. valid[myTicket] := 0 ; dequeue
5. valid[(myTicket + 1) mod n] := 1 ; signal successor
[Figure: the array valid[0..n-1] and the shared ticket pointing into it]
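A minimal Python sketch of the algorithm above follows. It assumes CPython with the GIL and at most n concurrently competing threads, a bound the algorithm needs so that two waiting processes never hold the same ticket. `fetch-and-inc-modulo-n` is emulated with a private `threading.Lock`. `AndersonLock` and `run_anderson` are illustrative names.

```python
import threading
import time


class AndersonLock:
    """Sketch of Anderson's array-based queue lock for at most n concurrent threads."""

    def __init__(self, n):
        self.n = n
        self.ticket = 0
        self.valid = [1] + [0] * (n - 1)  # valid[0] = 1, all other entries 0
        self._guard = threading.Lock()    # emulates the atomic fetch-and-inc

    def _fetch_and_inc_mod_n(self):
        with self._guard:
            t = self.ticket
            self.ticket = (t + 1) % self.n
            return t

    def acquire(self):
        my_ticket = self._fetch_and_inc_mod_n()  # take a ticket
        while self.valid[my_ticket] != 1:        # wait for your turn
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                 # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1  # signal successor


def run_anderson(threads=4, iterations=100):
    lock = AndersonLock(threads)
    counter = [0]

    def worker():
        for _ in range(iterations):
            t = lock.acquire()
            value = counter[0]  # critical section (non-atomic on purpose)
            time.sleep(0)
            counter[0] = value + 1
            lock.release(t)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_anderson())
```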
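Anderson’s queue-based algorithm (cont’d)
[Figure: an example run of the algorithm]
Initial configuration: ticket = 0, valid[0] = 1.
After the entry section of p3: ticket = 1 and p3's myTicket = 0, so p3 is in its CS.
After p1 performs its entry section: ticket = 2 and p1's myTicket = 1, so p1 waits on valid[1].
After p3 exits: valid[0] = 0 and valid[1] = 1, so p1 (myTicket = 1) may enter its CS.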
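The example run can also be replayed sequentially, without threads, so that the recorded states can be compared with the figure. In this sketch, n = 4 is an assumption because the figure does not fix n, and `anderson_trace` is a hypothetical helper.

```python
def anderson_trace(n=4):
    """Replay the example run step by step; return (label, ticket, valid, myTickets) snapshots."""
    ticket = 0
    valid = [1] + [0] * (n - 1)  # valid[0] = 1, all other entries 0
    my_ticket = {}               # process -> ticket it drew
    snapshots = []

    def snapshot(label):
        snapshots.append((label, ticket, list(valid), dict(my_ticket)))

    snapshot("initial")
    for p in (3, 1):             # line 1 for p3, then for p1 (fetch-and-inc-modulo-n)
        my_ticket[p] = ticket
        ticket = (ticket + 1) % n
        snapshot(f"after entry of p{p}")
    t = my_ticket[3]             # p3 exits: lines 4 and 5
    valid[t] = 0
    valid[(t + 1) % n] = 1
    snapshot("after p3 exits")
    return snapshots


if __name__ == "__main__":
    for label, ticket, valid, tickets in anderson_trace():
        may_enter = [p for p, t in tickets.items() if valid[t] == 1]
        print(f"{label:20} ticket={ticket} valid={valid} myTicket={tickets} may enter: {may_enter}")
```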
Anderson’s queue-based algorithm (cont’d)
What is the RMR complexity on a DSM machine? Unbounded: the entry valid[myTicket] that a process spins on depends on the ticket it draws, so it cannot be placed in that process's memory module in advance.
What is the RMR complexity on a CC machine? Constant: a waiting process spins on its own entry valid[myTicket], and its cached copy is invalidated only when its predecessor signals it.
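The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)
Type: Qnode: structure {bit locked, Qnode *next}
Shared: Qnode nodes[0..n-1]; Qnode *tail, initially null
Local: Qnode *myNode, initially &nodes[i]; Qnode *successor
Has constant RMR complexity under both the DSM and CC models (each process spins only on fields of its own node, nodes[i], which can be allocated in its local memory module).
Uses swap and CAS.
[Figure: tail points to the last node of a queue formed from the array nodes[0..n-1]]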
The MCS queue-based algorithm (cont’d)
Program for process i
1. myNode->next := null ; prepare to be last in queue
2. pred := swap(&tail, myNode) ; tail now points to myNode
3. if (pred ≠ null) ; I need to wait for a predecessor
4.    myNode->locked := true ; prepare to wait
5.    pred->next := myNode ; let my predecessor know it has to unlock me
6.    await (myNode->locked = false)
7. CS
8. if (myNode->next = null) ; if not sure there is a successor
9.    if (compare-and-swap(&tail, myNode, null) = false) ; if there is a successor
10.      await (myNode->next ≠ null) ; spin until successor lets me know its identity
11.      successor := myNode->next ; get a pointer to my successor
12.      successor->locked := false ; unlock my successor
13. else ; for sure, I have a successor
14.    successor := myNode->next ; get a pointer to my successor
15.    successor->locked := false ; unlock my successor
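Here is a minimal Python sketch of the MCS lock. It assumes CPython with the GIL, so plain attribute reads and writes are atomic. Python has no hardware swap or compare-and-swap, so both are emulated by a private `threading.Lock` that guards only the tail pointer. Every waiting thread still spins on a field of its own `QNode`, which is the point of the algorithm. The class and method names are illustrative. The release path merges lines 8 to 15 into a single hand-off to the successor.

```python
import threading
import time


class QNode:
    """Queue node; each thread owns one (nodes[i] on the slide)."""

    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None
        self._guard = threading.Lock()  # emulates atomic swap and CAS on tail

    def _swap_tail(self, node):
        with self._guard:
            old, self.tail = self.tail, node
            return old

    def _cas_tail(self, expected, new):
        with self._guard:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, my_node):
        my_node.next = None              # prepare to be last in queue
        pred = self._swap_tail(my_node)  # tail now points to my_node
        if pred is not None:             # I need to wait for a predecessor
            my_node.locked = True        # prepare to wait
            pred.next = my_node          # let my predecessor know it must unlock me
            while my_node.locked:        # spin on my own node only
                time.sleep(0)

    def release(self, my_node):
        if my_node.next is None:               # not sure there is a successor
            if self._cas_tail(my_node, None):  # no successor: the queue is now empty
                return
            while my_node.next is None:        # successor has not linked itself in yet
                time.sleep(0)
        my_node.next.locked = False            # unlock my successor


def run_mcs(threads=4, iterations=100):
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()  # this thread's own queue node
        for _ in range(iterations):
            lock.acquire(node)
            value = counter[0]  # critical section (non-atomic on purpose)
            time.sleep(0)
            counter[0] = value + 1
            lock.release(node)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_mcs())
```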
Local Spinning Mutual Exclusion Using reads and writes
A local-spin tournament-tree algorithm (Anderson, Yang, 1993)
O(log n) RMR complexity for both DSM and CC systems.
This is optimal (Attiya, Hendler, Woelfel, 2008).
Uses O(n log n) registers.
[Figure: a binary tree with the processes at the leaves and levels 0, 1, 2 above them; each node is identified by (level, number)]
A local-spin tournament-tree algorithm (cont’d)
Shared:
- For each node v = (level, node), there are 3 registers: name[level, 2node] and name[level, 2node+1], initially -1, and turn[level, node]
- For each level and process i, a spin flag flag[level, i], initially 0
Local: level, node, id
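The indexing is easy to get wrong, so here is a small Python sketch that checks it. The helper names are hypothetical, and n is assumed to be a power of 2. The sketch lists the (level, node, id) triples a process visits. It also checks that the closed forms node = ⌊i/2^(level+1)⌋ and id = ⌊i/2^level⌋ mod 2, which the exit code needs, agree with the repeated halving done on the way up. Finally, it counts registers of each kind. The spin flags dominate: there are 2(n-1) name registers and n-1 turn registers, against n log n flags.

```python
def tree_path(i, n):
    """(level, node, id) triples visited by process i from level 0 up to the root."""
    if n < 2 or n & (n - 1):
        raise ValueError("n is assumed to be a power of 2, at least 2")
    node, triples = i, []
    for level in range(n.bit_length() - 1):  # log2(n) levels
        side = node % 2                       # 'id': which of the node's two names i writes
        node //= 2
        triples.append((level, node, side))
    return triples


def exit_coordinates(i, level):
    """Closed form for the exit code: node = i / 2^(level+1), id = (i / 2^level) mod 2."""
    return i >> (level + 1), (i >> level) & 1


def register_count(n):
    """Registers of each kind for n processes (n a power of 2)."""
    levels = n.bit_length() - 1
    internal_nodes = n - 1
    return {"name": 2 * internal_nodes, "turn": internal_nodes, "flag": n * levels}


if __name__ == "__main__":
    print(tree_path(5, 8))    # [(0, 2, 1), (1, 1, 0), (2, 0, 1)]
    print(register_count(8))  # {'name': 14, 'turn': 7, 'flag': 24}
```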
A local-spin tournament-tree algorithm (cont’d)
Program for process i
1. node := i
2. for level := 0 to log n - 1 do ; from leaf to root
3.    id := node mod 2 ; compute ID for the 2-process mutex algorithm (0 or 1)
4.    node := ⌊node/2⌋ ; compute node in new level
5.    name[level, 2node + id] := i ; identify yourself
6.    turn[level, node] := i ; update the tie-breaker
7.    flag[level, i] := 0 ; initialize my locally-accessible spin flag
8.    rival := name[level, 2node + 1 - id]
9.    if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
10.      if (flag[level, rival] = 0) ; if rival may get to wait at line 12
11.         flag[level, rival] := 1 ; release rival by letting it know I updated the tie-breaker
12.      await flag[level, i] ≠ 0 ; wait until signaled by rival (so it updated the tie-breaker)
13.      if (turn[level, node] = i) ; if I lost
14.         await flag[level, i] = 2 ; wait until rival notifies me it is my turn
15. endfor
16. CS
17. for level := log n - 1 downto 0 do ; begin exit code
18.    node := ⌊i / 2^(level+1)⌋; id := ⌊i / 2^level⌋ mod 2 ; set node and id
19.    name[level, 2node + id] := -1 ; erase name
20.    rival := turn[level, node] ; find who rival is (if there is one)
21.    if (rival ≠ i) ; if there is a rival
22.       flag[level, rival] := 2 ; notify rival
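The sketch below transcribes the tournament-tree algorithm into Python threads. It again assumes CPython with the GIL, so that list reads and writes are atomic and sequentially consistent. `TournamentLock` and `run_tournament` are illustrative names. n must be a power of 2, and `flag[level][i]` plays the role of process i's locally-accessible spin flag.

```python
import threading
import time


class TournamentLock:
    """Sketch of the local-spin tournament tree; n must be a power of 2 (n >= 2)."""

    def __init__(self, n):
        if n < 2 or n & (n - 1):
            raise ValueError("n must be a power of 2 and at least 2")
        self.levels = n.bit_length() - 1                   # log2(n)
        self.name = [[-1] * n for _ in range(self.levels)]
        self.turn = [[0] * n for _ in range(self.levels)]
        self.flag = [[0] * n for _ in range(self.levels)]  # flag[level][i]: i's spin flag

    def acquire(self, i):
        node = i
        for level in range(self.levels):                   # from leaf to root
            side = node % 2                                # 'id' in the pseudocode
            node //= 2
            name, turn, flag = self.name[level], self.turn[level], self.flag[level]
            name[2 * node + side] = i                      # identify yourself
            turn[node] = i                                 # update the tie-breaker
            flag[i] = 0                                    # reset my spin flag
            rival = name[2 * node + 1 - side]
            if rival != -1 and turn[node] == i:            # not sure I should precede rival
                if flag[rival] == 0:
                    flag[rival] = 1                        # release rival from its first await
                while flag[i] == 0:                        # wait until rival signals me
                    time.sleep(0)
                if turn[node] == i:                        # I lost the tie-breaker
                    while flag[i] != 2:                    # wait for rival's exit code
                        time.sleep(0)

    def release(self, i):
        for level in reversed(range(self.levels)):         # from root back to leaf
            node, side = i >> (level + 1), (i >> level) & 1
            self.name[level][2 * node + side] = -1         # erase name
            rival = self.turn[level][node]
            if rival != i:                                 # a rival may be waiting for me
                self.flag[level][rival] = 2                # notify rival


def run_tournament(n=4, iterations=50):
    lock = TournamentLock(n)
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            value = counter[0]  # critical section (non-atomic on purpose)
            time.sleep(0)
            counter[0] = value + 1
            lock.release(i)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_tournament())
```

Each waiting thread spins only on its own `flag[level][i]`. That is what keeps the waiting local on both architectures.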
Local-Spin Leader Election
Exactly one process is elected; all other processes are not-elected.
Processes may busy-wait.
Choy and Singh’s filter
[Figure: m processes enter a filter; between 1 and ⌈m/2⌉ of them exit, and the rest are “halted”]
Filter guarantees:
Safety: if m processes enter a filter, at most ⌈m/2⌉ exit.
Progress: if some processes enter a filter, at least one exits.
Choy and Singh’s filter (cont’d)
Shared: integer turn; Boolean b, initially false
Program for process i
1. turn := i
2. await (b = false) // wait for barrier to open
3. b := true // close barrier
4. if turn ≠ i // not the last to write turn
5.    b := false // re-open barrier
6.    halt
7. else
8.    exit
Why are the filter guarantees satisfied? Why does the barrier have to be re-opened?
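One way to explore these two questions is to run the filter under many interleavings. The sketch below is a hypothetical single-threaded model, not a concurrent implementation. Each process is a small state machine whose states follow the program's line numbers, and one scheduler step is one shared-memory access. A process waiting at a closed barrier is simply not scheduled. When no process can move, the model reports how many processes exited, halted, or are still blocked at the barrier. The scripted schedule shows that with m = 3, two processes can exit, so for odd m the safety bound has to be read as ⌈m/2⌉.

```python
import math
import random


def run_filter(m, seed=0, schedule=None):
    """Run one filter with m processes; return (exited, halted, blocked).

    `schedule` optionally fixes the first steps (a list of process indices);
    the remaining steps are chosen at random from the processes that can move.
    """
    shared = {"turn": None, "b": False}  # b = False means the barrier is open
    pc = [1] * m                         # next program line of each process
    outcome = [None] * m                 # becomes "exit" or "halt"
    rng = random.Random(seed)
    script = list(schedule or [])

    def enabled(i):
        # Finished processes and processes waiting at a closed barrier cannot move.
        return outcome[i] is None and not (pc[i] == 2 and shared["b"])

    def step(i):
        if pc[i] == 1:                   # 1. turn := i
            shared["turn"], pc[i] = i, 2
        elif pc[i] == 2:                 # 2. await (b = false): the barrier is open
            pc[i] = 3
        elif pc[i] == 3:                 # 3. b := true (close barrier)
            shared["b"], pc[i] = True, 4
        elif pc[i] == 4:                 # 4. if turn ≠ i ...
            if shared["turn"] == i:
                outcome[i] = "exit"      # 8. exit
            else:
                pc[i] = 5
        else:                            # 5-6. b := false (re-open barrier); halt
            shared["b"] = False
            outcome[i] = "halt"

    while True:
        ready = [i for i in range(m) if enabled(i)]
        if not ready:
            break
        if script:
            i = script.pop(0)
            if i not in ready:
                raise ValueError(f"process {i} cannot take a step now")
        else:
            i = rng.choice(ready)
        step(i)

    exited, halted = outcome.count("exit"), outcome.count("halt")
    return exited, halted, m - exited - halted


if __name__ == "__main__":
    print(run_filter(3, schedule=[2, 2, 0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 1]))  # (2, 1, 0)
    for m in range(1, 9):
        worst = max(run_filter(m, seed)[0] for seed in range(500))
        print(m, "processes: at most", worst, "exited; ceil(m/2) =", math.ceil(m / 2))
```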
Choy and Singh’s filter algorithm
[Figure: filters arranged in sequence: Filter #1, Filter #2, ..., Filter #i, ...]
Choy and Singh’s filter algorithm (cont’d)
Shared:
  typedef struct {integer turn; boolean b, c, initially false} filter
  filter A[log n + 1]
Program for process i
1. for (curr = 0; curr < log n + 1; curr++)
2.    A[curr].turn := i
3.    await (A[curr].b = false)
4.    A[curr].b := true
5.    if (A[curr].turn ≠ i)
6.       A[curr].c := true // mark that some process failed on filter
7.       A[curr].b := false
8.       return not-elected
9.    else if (curr > 0) A[curr-1].c
10.      return elected // Other processes will never exit this filter
11.   else
12.      curr := curr + 1
13. endfor
Do you see any problem with this algorithm? How can this be fixed?
Choy and Singh’s filter algorithm (cont’d)
What is the DSM RMR complexity? Unbounded: processes busy-wait on the shared bit A[curr].b, which is remote to every process except the one in whose memory module it resides.
Choy and Singh’s filter algorithm (cont’d)
What is the CC RMR complexity? A process may incur a linear number of RMRs while busy-waiting at line 3, because its cached copy of A[curr].b is invalidated whenever another process crossing the barrier writes b.
Choy and Singh’s filter algorithm (cont’d)
What is the worst-case CC RMR complexity? Linear.
Any ideas for an O(log n)-RMR algorithm? A simple modification of the tournament-tree algorithm.
Is there an O(1)-RMR leader election algorithm that uses only reads and writes? Yes [Golab, Hendler and Woelfel, 2006].
Conditional primitives (e.g., compare-and-swap) are no stronger than reads and writes for RMR complexity [Golab, Hadzilacos, Hendler and Woelfel, 2007].