Published by Magdalene Hancock. Modified over 9 years ago.
1
Queue Locks and Local Spinning
Some slides based on: The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
2
Memory Models
– Memory contention
– Communication contention
– Communication latency
– Cache-coherent (CC) memory
– Distributed shared memory (DSM)
3
Today: Revisit Mutual Exclusion
– Think of performance, not just correctness and progress
– Begin to understand how performance depends on our software properly utilizing the multiprocessor machine's hardware
4
Remote Access
Remote access is expensive! Allow spinning only on local variables:
– DSM: spin only on variables in the local memory
– CC: spin only on variables in cache
5
Basic Spin-Lock (diagram labels: spin lock, critical section (CS), "Resets lock upon exit")
6
Basic Spin-Lock (same diagram, with note) …the lock suffers from contention: no local spinning!
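The basic spin-lock on this slide can be sketched as a test-and-set lock. The block below is an illustrative sketch, not code from the slides: the names `TasLock`, `TasDemo` and `run` are assumptions, and `Thread.yield()` is added only so the demo still makes progress when threads outnumber cores. Every spin iteration is an atomic `getAndSet` on the one shared word, which is why the slide says there is no local spinning.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical test-and-set spin lock (illustration only).
class TasLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // Each getAndSet is a write to the shared word, so every waiting
        // thread keeps invalidating the others' cached copies.
        while (state.getAndSet(true)) {
            Thread.yield(); // demo convenience, not part of the algorithm
        }
    }

    void unlock() {
        state.set(false); // resets the lock upon exit
    }
}

class TasDemo {
    static int counter;

    // Starts `threads` workers; each increments the counter `iters` times
    // inside the critical section. Returns the final counter value.
    static int run(int threads, int iters) {
        counter = 0;
        final TasLock lock = new TasLock();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

If mutual exclusion holds, `run(threads, iters)` returns `threads * iters`, since no increment is lost.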
7
Idea
Avoid useless invalidations
– By keeping a queue of threads
Each thread
– Notifies next in line
– Without bothering the others
8
Anderson Queue Lock (diagram labels: flags array T F F F F F F F; next; acquired; acquiring; getAndIncrement)
9
Anderson Queue Lock
Good
– Local spinning (CC model)
– Simple, easy to implement
Bad
– One bit per thread, for every lock
– What if the number of threads is unknown?
– What if only a small number of threads actually contend?
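A minimal sketch of an array-based queue lock in the spirit of the Anderson lock above. The class and method names (`AndersonLock`, `AndersonDemo`, `run`) are assumptions for illustration. It assumes that at most `capacity` threads use the lock at once, leaves out the cache-line padding a tuned version would add between flags, and uses `Thread.yield()` only so the demo behaves well on few cores.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical array-based queue lock (illustration only).
class AndersonLock {
    private final int size;
    private final AtomicLong next = new AtomicLong(0);   // next free slot
    private final AtomicIntegerArray flags;               // 1 = go, 0 = wait
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    AndersonLock(int capacity) {
        size = capacity;
        flags = new AtomicIntegerArray(capacity);
        flags.set(0, 1); // the first arrival may enter immediately (T F F F ...)
    }

    void lock() {
        // getAndIncrement hands every arriving thread its own slot.
        int slot = (int) (next.getAndIncrement() % size);
        mySlot.set(slot);
        // Each thread spins on a different array element.
        while (flags.get(slot) == 0) {
            Thread.yield(); // demo convenience, not part of the algorithm
        }
    }

    void unlock() {
        int slot = mySlot.get();
        flags.set(slot, 0);              // reset own slot for later reuse
        flags.set((slot + 1) % size, 1); // notify only the next in line
    }
}

class AndersonDemo {
    static int counter;

    // Runs `threads` workers, each doing `iters` locked increments.
    static int run(int threads, int iters) {
        counter = 0;
        final AndersonLock lock = new AndersonLock(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }
}
```

The fixed `capacity` argument is the "one bit per thread" cost from the slide: the array must be sized for the maximum number of contenders even when few threads actually compete.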
10
CLH Lock
– FIFO order
– Small, constant-size overhead per thread
11
Initially (diagram labels: tail, false; thread idle)
12
Green Wants the Lock (diagram labels: tail, false; green acquiring)
13
Green Wants the Lock (diagram labels: tail, false; green acquiring; new node true)
14
Green Wants the Lock (diagram labels: tail, false; green acquiring; true; Swap)
15
Green Has the Lock (diagram labels: tail, false; green acquired; true)
16
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true)
17
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true; Swap; true)
18
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true)
19
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true)
20
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true) Implicitly linked list
21
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true)
22
Blue Wants the Lock (diagram labels: tail, false; green acquired; blue acquiring; true) Actually, it spins on a cached copy
23
Green Releases (diagram labels: tail, false; green release; blue acquiring; false, true, false) Bingo!
24
Green Releases (diagram labels: tail; green released; blue acquired; true)
25
CLH Queue Lock
Entry section:
    new myNode
    myNode := true
    do
        myPred := tail
    while !CAS(tail, myPred, myNode)
    wait until !myPred
Exit section:
    myNode := false
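The entry and exit sections above can be sketched in Java as follows. This is a hedged illustration rather than the slides' own code: `ClhLock`, `ClhDemo` and `run` are assumed names, each node is modeled as a single `AtomicBoolean`, and `getAndSet` stands in for the do/while CAS loop (both atomically swap `tail`). As in the pseudocode, a fresh node is allocated per acquisition and reclamation is left to the garbage collector.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical CLH queue lock following the slide's pseudocode.
class ClhLock {
    // Initially tail points to a node holding false (lock is free).
    private final AtomicReference<AtomicBoolean> tail =
            new AtomicReference<>(new AtomicBoolean(false));
    private final ThreadLocal<AtomicBoolean> myNode = new ThreadLocal<>();

    void lock() {
        AtomicBoolean node = new AtomicBoolean(true); // new myNode; myNode := true
        myNode.set(node);
        AtomicBoolean pred = tail.getAndSet(node);    // swap: enqueue behind pred
        // wait until !myPred: spin on the predecessor's flag only.
        while (pred.get()) {
            Thread.yield(); // demo convenience, not part of the algorithm
        }
    }

    void unlock() {
        myNode.get().set(false); // myNode := false releases the successor
    }
}

class ClhDemo {
    static int counter;

    // Runs `threads` workers, each doing `iters` locked increments.
    static int run(int threads, int iters) {
        counter = 0;
        final ClhLock lock = new ClhLock();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }
}
```

There is no explicit `next` pointer: the queue is linked implicitly through each thread's private `pred` reference.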
26
CLH Lock
Good
– Lock release affects predecessor only
– Small, constant-sized space
Bad
– Not local spinning for DSM model
27
CLH Lock
Each thread spins on its predecessor's memory
Could be far away …
28
MCS Lock
– FIFO order
– Spin on local memory only
– Small, constant-size overhead
29
Initially (diagram labels: tail, false; thread idle)
30
Acquiring (diagram labels: tail, false, true; thread acquiring, allocates Qnode)
31
Acquiring (diagram labels: tail, true, swap, false; thread acquiring)
32
Acquiring (diagram labels: tail, true, false; thread acquiring)
33
Acquired (diagram labels: tail, true, false; thread acquired)
34
Acquiring (diagram labels: tail, false; first thread acquired; second thread acquiring; true; swap)
35
Acquiring (diagram labels: tail; acquired; acquiring; true, false)
36
Acquiring (diagram labels: tail; acquired; acquiring; true, false)
37
Acquiring (diagram labels: tail; acquired; acquiring; true, false)
38
Acquiring (diagram labels: tail; acquired; acquiring; true, false)
39
Acquiring (diagram labels: tail; acquired; acquiring; true) Yes!
40
MCS Queue Lock
Entry section:
    new myNode
    do
        myPred := tail
    while !CAS(tail, myPred, myNode)
    if myPred != null
        myNode.locked := true
        myPred.next := myNode
        wait until !(myNode.locked)
Exit section:
    if myNode.next == null
        if CAS(tail, myNode, null) then return
        wait until myNode.next != null
    myNode.next.locked := false
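A Java sketch of the MCS entry and exit sections, under stated assumptions: `McsLock`, `QNode`, `McsDemo` and `run` are illustrative names, `getAndSet` replaces the CAS retry loop for the swap on `tail`, and `Thread.yield()` is there only for the demo. The waiting thread spins on the `locked` field of its own node, which is what makes the spinning local, and the releasing thread clears the successor's flag.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical MCS queue lock (illustration only).
class McsLock {
    static final class QNode {
        volatile boolean locked = false;
        volatile QNode next = null;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    void lock() {
        QNode node = new QNode();          // new myNode
        myNode.set(node);
        QNode pred = tail.getAndSet(node); // swap: append to the queue
        if (pred != null) {                // someone is ahead of us
            node.locked = true;
            pred.next = node;              // make ourselves reachable
            while (node.locked) {          // spin on our own node
                Thread.yield();            // demo convenience only
            }
        }
    }

    void unlock() {
        QNode node = myNode.get();
        if (node.next == null) {
            // No visible successor: try to swing tail back to null.
            if (tail.compareAndSet(node, null)) {
                return;
            }
            // A thread already swapped tail but has not yet linked itself;
            // wait for it to finish setting our next pointer.
            while (node.next == null) {
                Thread.yield();
            }
        }
        node.next.locked = false;          // hand the lock to the successor
    }
}

class McsDemo {
    static int counter;

    // Runs `threads` workers, each doing `iters` locked increments.
    static int run(int threads, int iters) {
        counter = 0;
        final McsLock lock = new McsLock();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }
}
```

The second wait loop in `unlock` corresponds to the release animation: the releasing thread sees from `tail` that another thread is active and waits until that thread has linked itself in.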
41
Green Release (diagram labels: false; green releasing; swap; false)
42
Green Release (diagram labels: false; green releasing; swap; false) By looking at the queue, I see another thread is active
43
Green Release (diagram labels: false; green releasing; swap; false) By looking at the queue, I see another thread is active. I have to wait for that thread to finish
44
Green Release (diagram labels: false; green releasing; other thread prepares to spin; true)
45
Green Release (diagram labels: false; green releasing; other thread spinning; true)
46
Green Release (diagram labels: false; green releasing; other thread spinning; false)
47
Green Release (diagram labels: false; green releasing; other thread acquired lock; false)
48
Non-Uniform Memory Architecture (NUMA) (diagram label: memory)
49
Non-Uniform Memory Architecture (NUMA)
Today, many large-scale multiprocessors are NUMA:
– Clusters of processors with shared local memory
– A processor accesses its own cluster's memory two or more times faster than remote memory
– Per-cluster cache
50
Lock Bouncing (diagram label: memory)
51
Hierarchical Locks
– Encourage threads with high mutual memory locality to acquire the lock consecutively
– Reduce overall cache misses
52
Hierarchical CLH (HCLH) Lock [Luchangco, Nussbaum and Shavit 2006]
– Local queue per cluster
– Global queue to enter the critical section
– A local queue is added to the global queue with a single CAS
53
HCLH Lock
– First, add the thread to the local queue
– If a thread is the first in the local queue, it is responsible for merging the local queue into the global queue
54
HCLH Lock (diagram labels: local tail, false; thread acquiring)
55
HCLH Lock (diagram labels: local tail, false; thread acquiring; new node with fields cid, Successor_must_wait = true, Tail_when_merged = false)
56
HCLH Lock (diagram labels: local tail, false; thread acquiring; Swap; node (cid, true, false))
57
HCLH Lock (diagram labels: local tail, false; node (cid, true, false); thread acquiring)
58
HCLH Lock (diagram labels: local tail, false; node (cid, true, false), thread acquiring; node (cid, true, false), thread acquiring)
59
HCLH Lock (diagram labels: local tail, false; node (cid, true, false), thread acquiring; node (cid, true, false); Swap; thread acquiring)
60
HCLH Lock (diagram labels: local tail, false; node (cid, true, false), thread acquiring; node (cid, true, false), thread acquiring)
61
HCLH Lock (diagram labels: local tail, false; node (cid, true, false), thread acquiring; node (cid, true, false), thread acquiring)
62
HCLH Lock (diagram labels: local tail, false; node (cid, true, false); node (cid, true, false))
63
HCLH Lock (diagram labels: local tail, false; three nodes (cid, true, false); one node (cid, true, TRUE); global tail) Cluster master: sees lock is held, so waits a "combining delay"
64
HCLH Lock (diagram labels: local tail; three nodes (cid, true, false); one node (cid, true, TRUE); global tail) Cluster master: sees lock is held, so waits a "combining delay"
65
HCLH Lock (diagram labels: local tail; three nodes (cid, true, false); one node (cid, true, TRUE); SWAP; global tail)
66
HCLH Lock (diagram labels: local tail; three nodes (cid, true, false); one node (cid, true, TRUE); global tail)
67
HCLH Lock (diagram labels: local tail; node (cid, true, false); node (cid, true, TRUE); false; node (cid, true, false); node (cid, true, TRUE); global tail)
68
References
– Spin, Anderson, CLH, MCS locks: Herlihy and Shavit, "The Art of Multiprocessor Programming", Chapter 7.
– HCLH lock: Luchangco, Nussbaum and Shavit, "A Hierarchical CLH Queue Lock", Euro-Par 2006.