Download presentation
Presentation is loading. Please wait.
Published byCory Montgomery Modified over 9 years ago
1
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Rajeev Alur for CIS 640, University of Pennsylvania
2
Art of Multiprocessor Programming 2 Muddy children’s puzzle (Common Knowledge) A group of kids are playing. A stranger walks by and announces “Some of you have mud on your forehead Each kid can see everyone else’s forehead, but not his/her own (and they don’t talk to one another) Stranger says “Raise your hand if you conclude that you have mud on your forehead”. Nobody does. Stranger keeps on repeating the statement. If k kids have muddy foreheads, then exactly these k kids raise their hands after the stranger repeats the statement exactly k times
3
Art of Multiprocessor Programming 3 Muddy children’s puzzle Why does this happen? For every k: –If >=k kids have muddy foreheads, then in the first k-1 rounds nobody raises hands –If k kids have muddy foreheads, then in the k-th round, exactly muddy kids raise their hands This claim can be proved by induction on k –Base case k=1 –Inductive case (assume for k, and prove for k+1)
4
Art of Multiprocessor Programming 4 What is the role of stranger’s statement? Let p stand for “> 0 kids have muddy foreheads” Assuming >1 kids are muddy, stranger announcing p does not add to anyone’s information However, without stranger’s announcement, nobody will ever raise their hands So what’s going on Well, the base case for our proof fails, but exactly what information do kids acquire from the stranger’s announcement?
5
Art of Multiprocessor Programming 5 Common Knowledge E p : Everybody knows p E E p: Everybody knows that everybody knows p E k p : defined similarly (k repetitions) C p : p is “common knowledge”: limit of Everybody knows that everybody knows …. For k =2, each kid knows p, but not Ep, and after stranger’s announcement, each kid knows E p If k kids are muddy, before announcement, each kid knows E k-1 p, but not E k p Stranger makes p the common knowledge
6
Art of Multiprocessor Programming6 Mutual Exclusion Focus so far: Correctness Models –Accurate –But idealized Protocols –Elegant –Important –But used in practice
7
Art of Multiprocessor Programming7 New Focus: Performance Models –More complicated –Still focus on principles Protocols –Elegant –Important –And realistic
8
Art of Multiprocessor Programming8 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data.
9
Art of Multiprocessor Programming9 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data. Our space (1)
10
Art of Multiprocessor Programming10 MIMD Architectures Memory Contention Communication Contention Communication Latency Shared Bus memory Distributed
11
Art of Multiprocessor Programming11 Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware And get to know a collection of locking algorithms… (1)
12
Art of Multiprocessor Programming12 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor (1)
13
Art of Multiprocessor Programming13 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor our focus
14
Art of Multiprocessor Programming14 Basic Spin-Lock CS Resets lock upon exit spin lock critical section...
15
Art of Multiprocessor Programming15 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock introduces sequential bottleneck
16
Art of Multiprocessor Programming16 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention
17
Art of Multiprocessor Programming17 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Notice: these are distinct phenomena …lock suffers from contention
18
Art of Multiprocessor Programming18 Test-and-Set Primitive Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”
19
Art of Multiprocessor Programming19 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } (5)
20
Art of Multiprocessor Programming20 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Package java.util.concurrent.atomic
21
Art of Multiprocessor Programming21 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Swap old and new values
22
Art of Multiprocessor Programming22 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true)
23
Art of Multiprocessor Programming23 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true) (5) Swapping in true is called “test-and-set” or TAS
24
Art of Multiprocessor Programming24 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false
25
Art of Multiprocessor Programming25 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
26
Art of Multiprocessor Programming26 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean
27
Art of Multiprocessor Programming27 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired
28
Art of Multiprocessor Programming28 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Release lock by resetting state to false
29
Art of Multiprocessor Programming29 Space Complexity TAS spin-lock has small “footprint” N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the (n) lower bound? We used a combined read-write operation…
30
Art of Multiprocessor Programming30 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?
31
Art of Multiprocessor Programming31 Mystery #1 time threads TAS lock Ideal (1) What is going on?
32
Art of Multiprocessor Programming32 Test-and-Test-and-Set Locks Lurking stage –Wait until lock “looks” free –Spin while read returns true (lock taken) Pouncing state –As soon as lock “looks” available –Read returns false (lock free) –Call TAS to acquire lock –If TAS loses, back to lurking
33
Art of Multiprocessor Programming33 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }
34
Art of Multiprocessor Programming34 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } Wait until lock looks free
35
Art of Multiprocessor Programming35 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } Then try to acquire it
36
Art of Multiprocessor Programming36 Mystery #2 TAS lock TTAS lock Ideal time threads
37
Art of Multiprocessor Programming37 Mystery Both –TAS and TTAS –Do the same thing (in our model) Except that –TTAS performs much better than TAS –Neither approaches ideal
38
Art of Multiprocessor Programming38 Opinion Our memory abstraction is broken TAS & TTAS methods –Are provably the same (in our model) –Except they aren’t (in field tests) Need a more detailed model …
39
Art of Multiprocessor Programming39 Bus-Based Architectures Bus cache memory cache
40
Art of Multiprocessor Programming40 Bus-Based Architectures Bus cache memory cache Random access memory (10s of cycles)
41
Art of Multiprocessor Programming41 Bus-Based Architectures cache memory cache Shared Bus Broadcast medium One broadcaster at a time Processors and memory all “snoop” Bus
42
Art of Multiprocessor Programming42 Bus-Based Architectures Bus cache memory cache Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information
43
Art of Multiprocessor Programming43 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing Cache miss –“I had to go all the way to memory for that data” –Bad Thing
44
Art of Multiprocessor Programming44 Caveat This model is still a simplification –But not in any essential way –Illustrates basic principles Will discuss complexities later
45
Art of Multiprocessor Programming45 Bus Processor Issues Load Request cache memory cache data
46
Art of Multiprocessor Programming46 Bus Processor Issues Load Request Bus cache memory cache data Gimme data
47
Art of Multiprocessor Programming47 cache Bus Memory Responds Bus memory cache data Got your data right here data
48
Art of Multiprocessor Programming48 Bus Processor Issues Load Request memory cache data Gimme data
49
Art of Multiprocessor Programming49 Bus Processor Issues Load Request Bus memory cache data Gimme data
50
Art of Multiprocessor Programming50 Bus Processor Issues Load Request Bus memory cache data I got data
51
Art of Multiprocessor Programming51 Bus Other Processor Responds memory cache data I got data data Bus
52
Art of Multiprocessor Programming52 Bus Other Processor Responds memory cache data Bus
53
Art of Multiprocessor Programming53 Modify Cached Data Bus data memory cachedata (1)
54
Art of Multiprocessor Programming54 Modify Cached Data Bus data memory cachedata (1)
55
Art of Multiprocessor Programming55 memory Bus data Modify Cached Data cachedata
56
Art of Multiprocessor Programming56 memory Bus data Modify Cached Data cache What’s up with the other copies? data
57
Art of Multiprocessor Programming57 Cache Coherence We have lots of copies of data –Original copy in memory –Cached copies at processors Some processor modifies its own copy –What do we do with the others? –How to avoid confusion?
58
Art of Multiprocessor Programming58 Write-Back Caches Accumulate changes in cache Write back when needed –Need the cache for something else –Another processor wants it On first modification –Invalidate other entries –Requires non-trivial protocol …
59
Art of Multiprocessor Programming59 Write-Back Caches Cache entry has three states –Invalid: contains raw seething bits –Valid: I can read but I can’t write –Dirty: Data has been modified Intercept other load requests Write back to memory before using cache
60
Art of Multiprocessor Programming60 Bus Invalidate memory cachedata
61
Art of Multiprocessor Programming61 Bus Invalidate Bus memory cachedata Mine, all mine!
62
Art of Multiprocessor Programming62 Bus Invalidate Bus memory cachedata cache Uh,oh
63
Art of Multiprocessor Programming63 cache Bus Invalidate memory cachedata Other caches lose read permission
64
Art of Multiprocessor Programming64 cache Bus Invalidate memory cachedata Other caches lose read permission This cache acquires write permission
65
Art of Multiprocessor Programming65 cache Bus Invalidate memory cachedata Memory provides data only if not present in any cache, so no need to change it now (expensive) (2)
66
Art of Multiprocessor Programming66 cache Bus Another Processor Asks for Data memory cachedata (2) Bus
67
Art of Multiprocessor Programming67 cache data Bus Owner Responds memory cachedata (2) Bus Here it is!
68
Art of Multiprocessor Programming68 Bus End of the Day … memory cachedata (1) Reading OK, no writing data
69
Art of Multiprocessor Programming69 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock
70
Art of Multiprocessor Programming70 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus Thread wants to release lock –delayed behind spinners
71
Art of Multiprocessor Programming71 Test-and-test-and-set Wait until lock “looks” free –Spin on local cache –No bus use while lock busy Problem: when lock is released –Invalidation storm …
72
Art of Multiprocessor Programming72 Local Spinning while Lock is Busy Bus memory busy
73
Art of Multiprocessor Programming73 Bus On Release memory freeinvalid free
74
Art of Multiprocessor Programming74 On Release Bus memory freeinvalid free miss Everyone misses, rereads (1)
75
Art of Multiprocessor Programming75 On Release Bus memory freeinvalid free TAS(…) Everyone tries TAS (1)
76
Art of Multiprocessor Programming76 Problems Everyone misses –Reads satisfied sequentially Everyone does TAS –Invalidates others’ caches Eventually quiesces after lock acquired –How long does this take?
77
Art of Multiprocessor Programming77 Mystery Explained TAS lock TTAS lock Ideal time threads Better than TAS but still not as good as ideal
78
Art of Multiprocessor Programming78 Solution: Introduce Delay spin lock time d r1dr1d r2dr2d If the lock looks free But I fail to get it There must be contention Better to back off than to collide again
79
Art of Multiprocessor Programming79 Dynamic Example: Exponential Backoff time d 2d4d spin lock If I fail to get lock –wait random duration before retry –Each subsequent failure doubles expected wait
80
Art of Multiprocessor Programming80 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
81
Art of Multiprocessor Programming81 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay
82
Art of Multiprocessor Programming82 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free
83
Art of Multiprocessor Programming83 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return
84
Art of Multiprocessor Programming84 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration
85
Art of Multiprocessor Programming85 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason
86
Art of Multiprocessor Programming86 Spin-Waiting Overhead TTAS Lock Backoff lock time threads
87
Art of Multiprocessor Programming87 Backoff: Other Issues Good –Easy to implement –Beats TTAS lock Bad –Must choose parameters carefully –Not portable across platforms
88
Art of Multiprocessor Programming88 Idea Avoid useless invalidations –By keeping a queue of threads Each thread –Notifies next in line –Without bothering the others
89
Art of Multiprocessor Programming89 Anderson Queue Lock flags next TFFFFFFF idle
90
Art of Multiprocessor Programming90 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement
91
Art of Multiprocessor Programming91 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement
92
Art of Multiprocessor Programming92 Anderson Queue Lock flags next TFFFFFFF acquired Mine!
93
Art of Multiprocessor Programming93 Anderson Queue Lock flags next TFFFFFFF acquired acquiring
94
Art of Multiprocessor Programming94 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement
95
Art of Multiprocessor Programming95 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement
96
Art of Multiprocessor Programming96 acquired Anderson Queue Lock flags next TFFFFFFF acquiring
97
Art of Multiprocessor Programming97 released Anderson Queue Lock flags next TTFFFFFF acquired
98
Art of Multiprocessor Programming98 released Anderson Queue Lock flags next TTFFFFFF acquired Yow!
99
Art of Multiprocessor Programming99 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot;
100
Art of Multiprocessor Programming100 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot; One flag per thread
101
Art of Multiprocessor Programming101 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot; Next flag to use
102
Art of Multiprocessor Programming102 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot; Thread-local variable
103
Art of Multiprocessor Programming103 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; }
104
Art of Multiprocessor Programming104 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Take next slot
105
Art of Multiprocessor Programming105 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Spin until told to go
106
Art of Multiprocessor Programming106 Anderson Queue Lock public lock() { myslot = next.getAndIncrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Prepare slot for re-use
107
Art of Multiprocessor Programming107 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Tell next thread to go
108
Art of Multiprocessor Programming108 Performance Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness queue TTAS
109
Art of Multiprocessor Programming109 Anderson Queue Lock Good –First truly scalable lock –Simple, easy to implement Bad –Space hog –One bit per thread Unknown number of threads? Small number of actual contenders?
110
Art of Multiprocessor Programming110 CLH Lock FIFO order Small, constant-size overhead per thread
111
Art of Multiprocessor Programming111 Initially false tail idle
112
Art of Multiprocessor Programming112 Initially false tail idle Queue tail
113
Art of Multiprocessor Programming113 Initially false tail idle Lock is free
114
Art of Multiprocessor Programming114 Initially false tail idle
115
Art of Multiprocessor Programming115 Purple Wants the Lock false tail acquiring
116
Art of Multiprocessor Programming116 Purple Wants the Lock false tail acquiring true
117
Art of Multiprocessor Programming117 Purple Wants the Lock false tail acquiring true Swap
118
Art of Multiprocessor Programming118 Purple Has the Lock false tail acquired true
119
Art of Multiprocessor Programming119 Red Wants the Lock false tail acquired acquiring true
120
Art of Multiprocessor Programming120 Red Wants the Lock false tail acquired acquiring true Swap true
121
Art of Multiprocessor Programming121 Red Wants the Lock false tail acquired acquiring true
122
Art of Multiprocessor Programming122 Red Wants the Lock false tail acquired acquiring true
123
Art of Multiprocessor Programming123 Red Wants the Lock false tail acquired acquiring true Implicit Linked list
124
Art of Multiprocessor Programming124 Red Wants the Lock false tail acquired acquiring true
125
Art of Multiprocessor Programming125 Red Wants the Lock false tail acquired acquiring true Actually, it spins on cached copy
126
Art of Multiprocessor Programming126 Purple Releases false tail release acquiring false true false Bingo!
127
Art of Multiprocessor Programming127 Purple Releases tail released acquired true
128
Art of Multiprocessor Programming128 Space Usage Let –L = number of locks –N = number of threads ALock –O(LN) CLH lock –O(L+N)
129
Art of Multiprocessor Programming129 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); }
130
Art of Multiprocessor Programming130 CLH Queue Lock class CLHLock implements Lock { AtomicReference tail; ThreadLocal myNode = new Qnode(); public void lock() { myNode.locked.set(true); Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} (3)
131
Art of Multiprocessor Programming131 CLH Queue Lock class CLHLock implements Lock { AtomicReference tail; ThreadLocal myNode = new Qnode(); public void lock() { mynode.locked.set(true); Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} (3) Queue tail
132
Art of Multiprocessor Programming132 CLH Queue Lock class CLHLock implements Lock { AtomicReference tail; ThreadLocal myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} (3) Thread-local Qnode
133
Art of Multiprocessor Programming133 CLH Queue Lock class CLHLock implements Lock { AtomicReference tail; ThreadLocal myNode = new Qnode(); public void lock() { mynode.locked.set(true); Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} (3) Swap in my node
134
Art of Multiprocessor Programming134 CLH Queue Lock class CLHLock implements Lock { AtomicReference tail; ThreadLocal myNode = new Qnode(); public void lock() { mynode.locked.set(true); Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} (3) Spin until predecessor releases lock
135
Art of Multiprocessor Programming135 CLH Queue Lock Class CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; } (3)
136
Art of Multiprocessor Programming136 CLH Queue Lock Class CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; } (3) Notify successor
137
Art of Multiprocessor Programming137 CLH Queue Lock Class CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; } (3) Recycle predecessor’s node
138
Art of Multiprocessor Programming138 CLH Queue Lock Class CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; } (3) (Code in book shows how it’s done using myPred reference.)
139
Art of Multiprocessor Programming139 CLH Lock Good –Lock release affects predecessor only –Small, constant-sized space Bad –Doesn’t work for uncached NUMA architectures
140
Art of Multiprocessor Programming140 NUMA Architecturs Acronym: –Non-Uniform Memory Architecture Illusion: –Flat shared memory Truth: –No caches (sometimes) –Some memory regions faster than others
141
Art of Multiprocessor Programming141 NUMA Machines Spinning on local memory is fast
142
Art of Multiprocessor Programming142 NUMA Machines Spinning on remote memory is slow
143
Art of Multiprocessor Programming143 CLH Lock Each thread spins on predecessor’s memory Could be far away …
144
Art of Multiprocessor Programming144 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead
145
Art of Multiprocessor Programming145 Initially false idle tail
146
Art of Multiprocessor Programming146 Acquiring false true acquiring (allocate Qnode) tail
147
Art of Multiprocessor Programming147 Acquiring false tail false true acquired swap
148
Art of Multiprocessor Programming148 Acquiring false tail false true acquired
149
Art of Multiprocessor Programming149 Acquired false tail false true acquired
150
Art of Multiprocessor Programming150 Acquiring tail false acquired acquiring true swap
151
Art of Multiprocessor Programming151 Acquiring tail acquired acquiring true false
152
Art of Multiprocessor Programming152 Acquiring tail acquired acquiring true false
153
Art of Multiprocessor Programming153 Acquiring tail acquired acquiring true false
154
Art of Multiprocessor Programming154 Acquiring tail acquired acquiring true false
155
Art of Multiprocessor Programming155 Acquiring tail acquired acquiring true Yes! false
156
Art of Multiprocessor Programming156 MCS Queue Lock class Qnode { boolean locked = false; qnode next = null; }
157
Art of Multiprocessor Programming157 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3)
158
Art of Multiprocessor Programming158 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Make a QNode
159
Art of Multiprocessor Programming159 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) add my Node to the tail of queue
160
Art of Multiprocessor Programming160 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Fix if queue was non-empty
161
Art of Multiprocessor Programming161 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Wait until unlocked
162
Art of Multiprocessor Programming162 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3)
163
Art of Multiprocessor Programming163 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Missing successor?
164
Art of Multiprocessor Programming164 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) If really no successor, return
165
Art of Multiprocessor Programming165 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Otherwise wait for successor to catch up
166
Art of Multiprocessor Programming166 MCS Queue Lock class MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Pass lock to successor
167
Art of Multiprocessor Programming167 Purple Release false releasing swap false (2)
168
Art of Multiprocessor Programming168 Purple Release false releasing swap false By looking at the queue, I see another thread is active (2)
169
Art of Multiprocessor Programming169 Purple Release false releasing swap false By looking at the queue, I see another thread is active I have to wait for that thread to finish (2)
170
Art of Multiprocessor Programming170 Purple Release false releasing prepare to spin true
171
Art of Multiprocessor Programming171 Purple Release false releasing spinning true
172
Art of Multiprocessor Programming172 Purple Release false releasing spinning true false
173
Art of Multiprocessor Programming173 Purple Release false releasing true Acquired lock false
174
Art of Multiprocessor Programming174 Abortable Locks What if you want to give up waiting for a lock? For example –Timeout –Database transaction aborted by user
175
Art of Multiprocessor Programming175 Back-off Lock Aborting is trivial –Just return from lock() call Extra benefit: –No cleaning up –Wait-free –Immediate return
176
Art of Multiprocessor Programming176 Queue Locks Can’t just quit –Thread in line behind will starve Need a graceful way out Timeout Queue Lock
177
Art of Multiprocessor Programming177 One Lock To Rule Them All? TTAS+Backoff, CLH, MCS, ToLock… Each better than others in some way There is no one solution Lock we pick really depends on: – the application – the hardware – which properties are important
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.