Published by Raquel Kimberley. Modified over 9 years ago.
1
Spin Locks and Contention Management The Art of Multiprocessor Programming Spring 2007
2
© Herlihy-Shavit 2007 Focus so far: Correctness Models –Accurate (we never lied to you) –But idealized (so we forgot to mention a few things) Protocols –Elegant –Important –But naïve
3
© Herlihy-Shavit 20073 New Focus: Performance Models –More complicated (not the same as complex!) –Still focus on principles (not soon obsolete) Protocols –Elegant (in their fashion) –Important (why else would we pay attention) –And realistic (your mileage may vary)
4
© Herlihy-Shavit 20074 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data.
5
© Herlihy-Shavit 20075 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data. Our space (1)
6
© Herlihy-Shavit 20076 MIMD Architectures Memory Contention Communication Contention Communication Latency Shared Bus memory Distributed
7
Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware And get to know a collection of locking algorithms…
8
© Herlihy-Shavit 20078 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor (1)
9
© Herlihy-Shavit 20079 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor our focus
10
Basic Spin-Lock [figure: spin lock guarding a critical section (CS); the lock is reset upon exit]
…lock introduces sequential bottleneck (no parallelism)
…lock suffers from contention
Notice: these are two different phenomena
Contention ???
15
© Herlihy-Shavit 200715 Review: Test-and-Set Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”
16
Review: Test-and-Set (package java.util.concurrent.atomic)
public class AtomicBoolean {
  boolean value;
  // Swap old and new values
  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;
    value = newValue;
    return prior;
  }
}
19
Review: Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false);
…
boolean prior = lock.getAndSet(true);  // Swapping in true is called “test-and-set” or TAS
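To make the return-value convention concrete, here is a minimal sketch using the real java.util.concurrent.atomic.AtomicBoolean. The class name TasDemo and the helper testAndSet are illustrative names, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasDemo {
    // Hypothetical helper: returns true if the caller "won",
    // i.e. the prior value was false and is now true.
    static boolean testAndSet(AtomicBoolean flag) {
        return !flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicBoolean lock = new AtomicBoolean(false);
        System.out.println(testAndSet(lock)); // prior value false: caller wins
        System.out.println(testAndSet(lock)); // prior value true: caller loses
        lock.set(false);                      // reset just by writing false
        System.out.println(testAndSet(lock)); // wins again
    }
}
```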
21
© Herlihy-Shavit 200721 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false
22
Test-and-set Lock
class TASlock {
  // Lock state is AtomicBoolean
  AtomicBoolean state = new AtomicBoolean(false);
  void lock() {
    // Keep trying until lock acquired
    while (state.getAndSet(true)) {}
  }
  void unlock() {
    // Release lock by resetting state to false
    state.set(false);
  }
}
26
Space Complexity TAS spin-lock has small “footprint” n-thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the Ω(n) lower bound? We used a RMW operation…
27
© Herlihy-Shavit 200727 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?
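The experiment can be sketched roughly as follows. This is an illustrative harness, not the authors' benchmark: the class names, the thread counts in main, and the use of System.nanoTime for timing are assumptions, and absolute timings will vary by machine.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal TAS spin lock in the style of the slides.
class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    void lock()   { while (state.getAndSet(true)) { /* spin */ } }
    void unlock() { state.set(false); }
}

class CounterExperiment {
    // Runs `threads` workers, each doing `perThread` locked increments,
    // and returns the final counter value.
    static long run(int threads, int perThread) {
        TASLock lock = new TASLock();
        long[] counter = {0};                  // shared; protected by the lock
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        int total = 1_000_000;                 // one million increments in all
        for (int n = 1; n <= 8; n *= 2) {
            long start = System.nanoTime();
            long result = run(n, total / n);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " threads: count=" + result + ", " + ms + " ms");
        }
    }
}
```

Since the total work is fixed and the critical section is sequential, the ideal elapsed time would stay flat as n grows; the interesting measurement is how far the real curve departs from that.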
28
© Herlihy-Shavit 200728 Graph ideal time threads no speedup because of sequential bottleneck
29
Mystery #1 [graph: time vs. threads; curves: TAS lock, ideal] What is going on? Let’s try and fix it…
30
© Herlihy-Shavit 200730 Test-and-Test-and-Set Locks Lurking stage –Wait until lock “looks” free –Spin while read returns true (lock taken) Pouncing state –As soon as lock “looks” available –Read returns false (lock free) –Call TAS to acquire lock –If TAS loses, back to lurking
31
Test-and-test-and-set Lock
class TTASlock {
  AtomicBoolean state = new AtomicBoolean(false);
  void lock() {
    while (true) {
      // Wait until lock looks free
      while (state.get()) {}
      // Then try to acquire it
      if (!state.getAndSet(true)) return;
    }
  }
}
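The slide shows only lock(). A self-contained variant might look like the sketch below; unlock() follows the TAS lock's convention of writing false, while tryLock() is an added convenience for the demo and does not appear in the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) { }              // lurk: read-only spin while taken
            if (!state.getAndSet(true)) return;  // pounce: TAS once it looks free
        }
    }

    // Assumed helper: one test-and-test-and-set attempt, no spinning.
    boolean tryLock() {
        return !state.get() && !state.getAndSet(true);
    }

    void unlock() { state.set(false); }

    public static void main(String[] args) {
        TTASLock lock = new TTASLock();
        lock.lock();
        System.out.println("held, second attempt succeeds? " + lock.tryLock());
        lock.unlock();
        System.out.println("released, attempt succeeds? " + lock.tryLock());
    }
}
```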
34
© Herlihy-Shavit 200734 Mystery #2 TAS lock TTAS lock Ideal time threads
35
© Herlihy-Shavit 200735 Mystery Both –TAS and TTAS –Do the same thing (in our model) Except that –TTAS performs much better than TAS –Neither approaches ideal
36
© Herlihy-Shavit 200736 Opinion Our memory abstraction is broken TAS & TTAS methods –Are provably the same (in our model) –Except they aren’t (in field tests) Need a more detailed model …
37
© Herlihy-Shavit 200737 Bus-Based Architectures Bus cache memory cache
38
© Herlihy-Shavit 200738 Bus-Based Architectures Bus cache memory cache Random access memory (10s of cycles)
39
© Herlihy-Shavit 200739 Bus-Based Architectures cache memory cache Shared Bus broadcast medium One broadcaster at a time Processors and memory all “snoop” Bus
40
© Herlihy-Shavit 200740 Bus-Based Architectures Bus cache memory cache Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information
41
© Herlihy-Shavit 200741 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™
42
© Herlihy-Shavit 200742 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™ Cache miss –“I had to shlep all the way to memory for that data” –Bad Thing™
43
© Herlihy-Shavit 200743 Cave Canem This model is still a simplification –But not in any essential way –Illustrates basic principles Will discuss complexities later
44
Processor Issues Load Request [figure sequence: caches and memory on a shared bus] A processor puts a load request on the bus (“Gimme data”). Memory Responds (“Got your data right here”). Later another processor issues a load request (“Gimme data”), and a cache that already holds the data answers (“I got data”). Other Processor Responds: that cache supplies the data over the bus.
52
Modify Cached Data [figure sequence: memory and caches hold copies of the data; one processor modifies its cached copy] What’s up with the other copies?
56
© Herlihy-Shavit 200756 Cache Coherence We have lots of copies of data –Original copy in memory –Cached copies at processors Some processor modifies its own copy –What do we do with the others? –How to avoid confusion?
57
© Herlihy-Shavit 200757 Write-Back Caches Accumulate changes in cache Write back when needed –Need the cache for something else –Another processor wants it On first modification –Invalidate other entries –Requires non-trivial protocol …
58
© Herlihy-Shavit 200758 Write-Back Caches Cache entry has three states –Invalid: contains raw seething bits –Valid: I can read but I can’t write –Dirty: Data has been modified Intercept other load requests Write back to memory before using cache
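The three states can be read as a small state machine per cache line. The sketch below is a toy model under assumptions: the four event names are invented for illustration, and real protocols (MESI and relatives) have more states and transitions than this.

```java
class CacheLineModel {
    enum State { INVALID, VALID, DIRTY }

    // This processor loads the line: an invalid entry is fetched and becomes readable.
    static State onLocalRead(State s) {
        return s == State.INVALID ? State.VALID : s;
    }

    // This processor stores to the line: other copies are invalidated, ours becomes dirty.
    static State onLocalWrite(State s) {
        return State.DIRTY;
    }

    // Another processor's load is seen on the bus: a dirty owner intercepts it,
    // supplies the data, and keeps only a readable copy.
    static State onRemoteRead(State s) {
        return s == State.DIRTY ? State.VALID : s;
    }

    // Another processor's invalidation is seen on the bus: our copy is dropped.
    static State onRemoteWrite(State s) {
        return State.INVALID;
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        s = onLocalRead(s);   System.out.println(s);
        s = onLocalWrite(s);  System.out.println(s);
        s = onRemoteRead(s);  System.out.println(s);
        s = onRemoteWrite(s); System.out.println(s);
    }
}
```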
59
Invalidate [figure sequence] The writing cache broadcasts an invalidation on the bus (“Mine, all mine!”). The other cache notices (“Uh, oh”). Other caches lose read permission This cache acquires write permission Memory provides data only if not present in any cache, so no need to change it now (expensive)
Another Processor Asks for Data [figure] Owner Responds (“Here it is!”)
End of the Day … Reading OK, no writing
68
© Herlihy-Shavit 200768 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock
69
© Herlihy-Shavit 200769 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus Thread wants to release lock –delayed behind spinners
70
© Herlihy-Shavit 200770 Test-and-test-and-set Wait until lock “looks” free –Spin on local cache –No bus use while lock busy Problem: when lock is released –Invalidation storm …
71
Local Spinning while Lock is Busy [figure: cached copies read “busy”]
On Release [figure: the releaser’s copy reads “free”, the other cached copies are invalid] Everyone misses, rereads Everyone tries TAS(…)
75
© Herlihy-Shavit 200775 Problems Everyone misses –Reads satisfied sequentially Everyone does TAS –Invalidates others’ caches Eventually quiesces after lock acquired –How long does this take?
76
Measuring Quiescence Time [figure: processors P1, P2, …, Pn] X = time of ops that don’t use the bus Y = time of ops that cause intensive bus traffic In critical section, run ops X then ops Y. As long as quiescence time is less than X, no drop in performance. By gradually varying X, can determine the exact time to quiesce.
77
Quiescence Time Increases linearly with the number of processors for bus architecture [graph: time vs. threads]
78
© Herlihy-Shavit 200778 Mystery Explained TAS lock TTAS lock Ideal time threads Better than TAS but still not as good as ideal
79
Solution: Introduce Delay [figure: timeline of spin-lock attempts separated by delays d, r1·d, r2·d] If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again
80
Dynamic Example: Exponential Backoff [figure: timeline of spin-lock attempts separated by delays d, 2d, 4d] If I fail to get lock –wait random duration before retry –Each subsequent failure doubles expected wait
81
Exponential Backoff Lock
public class Backoff implements Lock {
  AtomicBoolean state = new AtomicBoolean(false);
  public void lock() {
    int delay = MIN_DELAY;                       // Fix minimum delay
    while (true) {
      while (state.get()) {}                     // Wait until lock looks free
      if (!state.getAndSet(true)) return;        // If we win, return
      sleep(random() % delay);                   // Back off for random duration
      if (delay < MAX_DELAY) delay = 2 * delay;  // Double max delay, within reason
    }
  }
}
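A self-contained version of the backoff lock could look like the sketch below. Several details are assumptions not given in the slides: the delay bounds (1 µs and 1 ms), the use of LockSupport.parkNanos in place of the unspecified sleep(), ThreadLocalRandom in place of random(), and the helper names nextLimit, isHeld, and hammer.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLock {
    static final int MIN_DELAY_NANOS = 1_000;       // assumed tuning value
    static final int MAX_DELAY_NANOS = 1_000_000;   // assumed tuning value
    private final AtomicBoolean state = new AtomicBoolean(false);

    // Doubles the backoff ceiling, capped at MAX_DELAY_NANOS.
    static int nextLimit(int limit) {
        return limit < MAX_DELAY_NANOS ? Math.min(MAX_DELAY_NANOS, 2 * limit) : limit;
    }

    void lock() {
        int limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) { }                  // wait until lock looks free
            if (!state.getAndSet(true)) return;      // if we win, return
            // Lost the race: back off for a random duration below the ceiling.
            LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(limit) + 1);
            limit = nextLimit(limit);
        }
    }

    void unlock() { state.set(false); }

    boolean isHeld() { return state.get(); }

    // Demo harness: `threads` workers each do `perThread` locked increments.
    static int hammer(int threads, int perThread) {
        BackoffLock lock = new BackoffLock();
        int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```

The right bounds depend on the machine and on the length of the critical section, so the two constants should be treated as placeholders to tune.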
87
© Herlihy-Shavit 200787 Spin-Waiting Overhead TTAS Lock Backoff lock time threads
88
© Herlihy-Shavit 200788 Backoff: Other Issues Good –Easy to implement –Beats TTAS lock Bad –Must choose parameters carefully –Not portable across platforms
89
© Herlihy-Shavit 200789 Idea Avoid useless invalidations –By keeping a queue of threads Each thread –Notifies next in line –Without bothering the others
90
Anderson Queue Lock [figure sequence: array flags = T F F F F F F F and counter next] Idle: only the first flag is true. An acquiring thread calls getAndIncrement on next to take a slot; its flag is true, so the lock is acquired (“Mine!”). A second acquiring thread calls getAndIncrement and takes the following slot, whose flag is still false. When the first thread has released, the flags read T T F F F F F F and the second thread has acquired the lock (“Yow!”).
100
Anderson Queue Lock
class ALock implements Lock {
  boolean[] flags = {true, false, …, false};   // One flag per thread
  AtomicInteger next = new AtomicInteger(0);   // Next flag to use
  ThreadLocal<Integer> mySlot;                 // Thread-local variable
  …
}
104
Anderson Queue Lock
public void lock() {
  int slot = next.getAndIncrement();      // Take next slot
  mySlot.set(slot);                       // Remember it for unlock()
  while (!flags[slot % n]) {}             // Spin until told to go
  flags[slot % n] = false;                // Prepare slot for re-use
}
public void unlock() {
  flags[(mySlot.get() + 1) % n] = true;   // Tell next thread to go
}
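For a version that compiles and runs, the sketch below makes a few substitutions that are assumptions rather than the slides' code: an AtomicIntegerArray stands in for the boolean array so that each flag read has volatile semantics, the capacity is a constructor argument, and Thread.yield() is called inside the spin loop so the demo also finishes promptly when there are more threads than cores. The counter is not guarded against int overflow.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class ALock {
    private final int capacity;                       // max concurrent threads
    private final AtomicIntegerArray flags;           // 1 = go, 0 = wait
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    ALock(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity); // all slots start at 0
        flags.set(0, 1);                               // first slot may go
    }

    void lock() {
        int slot = next.getAndIncrement() % capacity;  // take next slot
        mySlot.set(slot);
        while (flags.get(slot) == 0) { Thread.yield(); } // spin until told to go
        flags.set(slot, 0);                            // prepare slot for re-use
    }

    void unlock() {
        flags.set((mySlot.get() + 1) % capacity, 1);   // tell next thread to go
    }

    // Demo harness: `threads` workers each do `perThread` locked increments.
    static int hammer(int threads, int perThread) {
        ALock lock = new ALock(threads);
        int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```

Adjacent flags in this sketch can share a cache line, so it does not reproduce the one-line-per-slot layout a tuned implementation would want.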
109
© Herlihy-Shavit 2007109 Performance Curve is practically flat Scalable performance FIFO fairness queue TTAS
110
© Herlihy-Shavit 2007110 Anderson Queue Lock Good –First truly scalable lock –Simple, easy to implement Bad –Space hog –One bit per thread Unknown number of threads? Small number of actual contenders?
111
© Herlihy-Shavit 2007111 CLH Lock FIFO order Small, constant-size overhead per thread
112
Initially [figure sequence] The queue tail points to a node holding false; the lock is idle, and false means the lock is free.
Purple Wants the Lock: purple is acquiring; it takes a node holding true and swaps it into tail.
Purple Has the Lock: the node purple got back from the swap holds false, so purple has acquired the lock.
Red Wants the Lock: red is acquiring; it takes a node holding true, swaps it into tail, and spins on purple’s node, which holds true. Actually, it spins on cached copy
Purple Releases: purple sets its node to false and red has acquired the lock (“Bingo!”).
127
© Herlihy-Shavit 2007127 Space Usage Let –L = number of locks –N = number of threads ALock –O(LN) CLH lock –O(L+N)
128
© Herlihy-Shavit 2007128 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); }
129
© Herlihy-Shavit 2007129 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); } Not released yet
130
CLH Queue Lock
class CLHLock implements Lock {
  AtomicReference<Qnode> tail;                                        // Tail of the queue
  ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);    // Thread-local Qnode
  public void lock() {
    Qnode pred = tail.getAndSet(myNode.get());   // Swap in my node
    while (pred.locked.get()) {}                 // Spin until predecessor releases lock
  }
}
135
CLH Queue Lock
class CLHLock implements Lock {
  …
  public void unlock() {
    myNode.get().locked.set(false);   // Notify successor
    myNode.set(pred);                 // Recycle predecessor’s node (pred as saved by lock())
  }
}
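Putting lock() and unlock() together, a runnable CLH lock might look like the following sketch. The second thread-local (myPred) is one assumed way of carrying the predecessor from lock() to unlock(); the volatile field, the Thread.yield() in the spin loop, and the hammer harness are likewise additions for the demo.

```java
import java.util.concurrent.atomic.AtomicReference;

class CLHLock {
    static final class QNode { volatile boolean locked; }

    // Tail starts at a dummy node whose locked flag is false: lock is free.
    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    void lock() {
        QNode node = myNode.get();
        node.locked = true;                      // not released yet
        QNode pred = tail.getAndSet(node);       // swap in my node
        myPred.set(pred);
        while (pred.locked) { Thread.yield(); }  // spin on predecessor's flag
    }

    void unlock() {
        QNode node = myNode.get();
        node.locked = false;                     // notify successor
        myNode.set(myPred.get());                // recycle predecessor's node
    }

    // Demo harness: `threads` workers each do `perThread` locked increments.
    static int hammer(int threads, int perThread) {
        CLHLock lock = new CLHLock();
        int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```

A thread cannot keep its own node after unlock() because its successor may still be reading it, which is why it adopts the predecessor's node instead.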
138
CLH Lock Good –Lock release affects successor only –Small, constant-sized space Bad –Doesn’t work for uncached NUMA architectures
139
NUMA Architectures Acronym: –Non-Uniform Memory Architecture Illusion: –Flat shared memory Truth: –No caches (sometimes) –Some memory regions faster than others
140
© Herlihy-Shavit 2007140 NUMA Machines Spinning on local memory is fast
141
© Herlihy-Shavit 2007141 NUMA Machines Spinning on remote memory is slow
142
CLH Lock Each thread spins on predecessor’s memory Could be far away …
143
© Herlihy-Shavit 2007143 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead
144
Initially [figure sequence] The lock is idle.
Acquiring: a thread allocates a Qnode and swaps it into tail.
Acquired: the first thread holds the lock.
Acquiring: a second thread swaps its own Qnode into tail; its flag is true while the first thread holds the lock. When the flag becomes false, the second thread proceeds (“Yes!”).
155
MCS Queue Lock
class Qnode {
  volatile boolean locked = false;   // volatile so spinning threads see updates
  volatile Qnode next = null;
}
156
MCS Queue Lock
class MCSLock implements Lock {
  AtomicReference<Qnode> tail;
  public void lock() {
    Qnode qnode = new Qnode();            // Make a QNode
    Qnode pred = tail.getAndSet(qnode);   // Add my node to the tail of queue
    if (pred != null) {                   // Fix if queue was non-empty
      qnode.locked = true;
      pred.next = qnode;
      while (qnode.locked) {}             // Wait until unlocked
    }
  }
}
161
MCS Queue Lock
class MCSLock implements Lock {
  AtomicReference<Qnode> tail;
  public void unlock() {                       // qnode is the node this thread enqueued in lock()
    if (qnode.next == null) {                  // Missing successor?
      if (tail.compareAndSet(qnode, null))     // If really no successor, return
        return;
      while (qnode.next == null) {}            // Otherwise wait for successor to catch up
    }
    qnode.next.locked = false;                 // Pass lock to successor
  }
}
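A runnable combination of the two methods is sketched below. Keeping the node in a ThreadLocal (so unlock() can find it), clearing next on release so the node can be reused, the volatile fields, and Thread.yield() in the spin loops are all assumptions added to make the example self-contained.

```java
import java.util.concurrent.atomic.AtomicReference;

class MCSLock {
    static final class QNode {
        volatile boolean locked;
        volatile QNode next;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode node = myNode.get();
        QNode pred = tail.getAndSet(node);            // add my node to the tail
        if (pred != null) {                           // queue was non-empty
            node.locked = true;
            pred.next = node;                         // link behind predecessor
            while (node.locked) { Thread.yield(); }   // spin on my own node
        }
    }

    void unlock() {
        QNode node = myNode.get();
        if (node.next == null) {                      // missing successor?
            if (tail.compareAndSet(node, null)) return;    // really no successor
            while (node.next == null) { Thread.yield(); }  // successor is mid-enqueue
        }
        node.next.locked = false;                     // pass lock to successor
        node.next = null;                             // reset node for reuse
    }

    // Demo harness: `threads` workers each do `perThread` locked increments.
    static int hammer(int threads, int perThread) {
        MCSLock lock = new MCSLock();
        int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```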
166
Purple Release [figure sequence] Purple is releasing and has no successor linked yet, so it tries the swap on the queue. By looking at the queue, I see another thread is active I have to wait for that thread to finish The successor, spinning on true, links itself in; purple then sets the successor’s flag to false and the successor is locked.
172
© Herlihy-Shavit 2007172 Abortable Locks What if you want to give up waiting for a lock? For example –Timeout –Database transaction aborted by user
173
© Herlihy-Shavit 2007173 Back-off Lock Aborting is trivial –Just return from lock() call Extra benefit: –No cleaning up –Wait-free –Immediate return
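As an illustration of why aborting is trivial here, the sketch below adds a timeout to a backoff lock: giving up is a plain return with no shared state to repair. The class name, the method name tryLock, the nanosecond timeout parameter, and the delay constants are assumptions for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class AbortableBackoffLock {
    private static final int MIN_DELAY_NANOS = 1_000;     // assumed tuning value
    private static final int MAX_DELAY_NANOS = 100_000;   // assumed tuning value
    private final AtomicBoolean state = new AtomicBoolean(false);

    // Returns true if the lock was acquired within roughly timeoutNanos.
    boolean tryLock(long timeoutNanos) {
        long start = System.nanoTime();
        int limit = MIN_DELAY_NANOS;
        while (true) {
            if (!state.get() && !state.getAndSet(true)) return true;
            // Abort: just return. No queue entry to unlink, nobody waits on us.
            if (System.nanoTime() - start >= timeoutNanos) return false;
            LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(limit) + 1);
            limit = Math.min(MAX_DELAY_NANOS, 2 * limit);
        }
    }

    void unlock() { state.set(false); }

    public static void main(String[] args) {
        AbortableBackoffLock lock = new AbortableBackoffLock();
        System.out.println(lock.tryLock(1_000_000));   // free lock: acquired
        System.out.println(lock.tryLock(2_000_000));   // held: times out
        lock.unlock();
    }
}
```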
174
© Herlihy-Shavit 2007174 Queue Locks Can’t just quit –Thread in line behind will starve Need a graceful way out
175
Queue Locks [figure sequence: three threads queued, each spinning on true] Normal case: the flags turn false one at a time and the lock passes down the queue until the last thread is locked. With an abort: if a spinning thread in the middle simply leaves, the thread behind it keeps spinning on a flag that stays true (“pwned”).
184
© Herlihy-Shavit 2007184 Abortable CLH Lock When a thread gives up –Removing node in a wait-free way is hard Idea: –let successor deal with it.
185
Initially [figure sequence] tail is idle. Each node holds a pointer to predecessor (or null). Distinguished available node (A) means lock is free
Acquiring: the acquiring thread creates a node with a null predecessor (null predecessor means lock not released or aborted) and swaps it into tail.
Acquired: pointer to AVAILABLE means lock is free.
Normal Case: one thread is locked and its successor is spinning. Null means lock is not free & request not aborted
One Thread Aborts: a spinning thread times out.
Successor Notices: non-null means predecessor aborted
Recycle Predecessor’s Node
Spin on Earlier Node: once the earlier node is released (points to A), the lock is now mine
198
Time-out Lock
public class TOLock implements Lock {
  static Qnode AVAILABLE = new Qnode();   // Distinguished node to signify free lock
  AtomicReference<Qnode> tail;            // Tail of the queue
  ThreadLocal<Qnode> myNode;              // Remember my node …
  …
}
202
Time-out Lock
public boolean lock(long timeout) {
  Qnode qnode = new Qnode();                       // Create & initialize node
  myNode.set(qnode);
  qnode.prev = null;
  Qnode pred = tail.getAndSet(qnode);              // Swap with tail
  if (pred == null || pred.prev == AVAILABLE) {    // If predecessor absent or released, we are done
    return true;
  }
  …
206
Time-out Lock
  …
  long start = now();
  while (now() - start < timeout) {        // Keep trying for a while …
    Qnode predPred = pred.prev;            // Spin on predecessor’s prev field
    if (predPred == AVAILABLE) {           // Predecessor released lock
      return true;
    } else if (predPred != null) {         // Predecessor aborted, advance one
      pred = predPred;
    }
  }
  …
211
Time-out Lock
  …
  if (!tail.compareAndSet(qnode, pred))   // Try to put predecessor back
    qnode.prev = pred;                    // Otherwise, if it’s too late, redirect to predecessor
  return false;
}
214
Time-out Lock
public void unlock() {
  Qnode qnode = myNode.get();
  if (!tail.compareAndSet(qnode, null))   // Clean up if no one waiting
    qnode.prev = AVAILABLE;               // Notify successor by pointing to AVAILABLE node
}
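Assembled into one class, the time-out lock might look like the sketch below. It follows the fragments above, with some assumptions: the method is named tryLock and takes a timeout in nanoseconds, System.nanoTime() stands in for now(), Thread.yield() is added to the spin loop, and tryFromOtherThread is a helper invented for the demo.

```java
import java.util.concurrent.atomic.AtomicReference;

class TOLock {
    static final class QNode { volatile QNode prev; }

    private static final QNode AVAILABLE = new QNode();   // signifies free lock
    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    boolean tryLock(long timeoutNanos) {
        long start = System.nanoTime();
        QNode qnode = new QNode();                 // prev == null: not released, not aborted
        myNode.set(qnode);
        QNode pred = tail.getAndSet(qnode);        // swap with tail
        if (pred == null || pred.prev == AVAILABLE) return true;
        while (System.nanoTime() - start < timeoutNanos) {
            QNode predPred = pred.prev;            // spin on predecessor's prev field
            if (predPred == AVAILABLE) return true;       // predecessor released lock
            if (predPred != null) pred = predPred;        // predecessor aborted, advance one
            Thread.yield();
        }
        // Timed out: try to put predecessor back, else redirect successor to it.
        if (!tail.compareAndSet(qnode, pred)) qnode.prev = pred;
        return false;
    }

    void unlock() {
        QNode qnode = myNode.get();
        // Clean up if no one is waiting, else notify successor via AVAILABLE.
        if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE;
    }

    // Demo helper: attempts the lock from a fresh thread, releasing it on success.
    static boolean tryFromOtherThread(TOLock lock, long timeoutNanos) {
        boolean[] got = {false};
        Thread t = new Thread(() -> {
            got[0] = lock.tryLock(timeoutNanos);
            if (got[0]) lock.unlock();
        });
        t.start();
        try { t.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return got[0];
    }

    public static void main(String[] args) {
        TOLock lock = new TOLock();
        System.out.println(lock.tryLock(1_000_000));                 // acquired
        System.out.println(tryFromOtherThread(lock, 2_000_000));     // times out
        lock.unlock();
        System.out.println(tryFromOtherThread(lock, 2_000_000));     // acquired
    }
}
```

A fresh node is allocated on every attempt because an abandoned node may still be referenced by a successor, so it cannot safely be reused.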
217
© Herlihy-Shavit 2007217 One Lock To Rule Them All? TTAS+Backoff, CLH, MCS, ToLock… Each better than others in some way There is no one solution Lock we pick really depends on: – the application – the hardware – which properties are important