Download presentation
Presentation is loading. Please wait.
Published byBlaze Butler Modified over 8 years ago
1
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
2
Art of Multiprocessor Programming2 Focus so far: Correctness and Progress Models –Accurate (we never lied to you) –But idealized (so we forgot to mention a few things) Protocols –Elegant –Important –But naïve
3
Art of Multiprocessor Programming3 Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware And get to know a collection of locking algorithms… (1)
4
Art of Multiprocessor Programming4 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor (1)
5
Art of Multiprocessor Programming5 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor our focus
6
Art of Multiprocessor Programming6 Basic Spin-Lock CS Resets lock upon exit spin lock critical section...
7
Art of Multiprocessor Programming7 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock introduces sequential bottleneck
8
Art of Multiprocessor Programming8 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention
9
Art of Multiprocessor Programming9 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Notice: these are distinct phenomena …lock suffers from contention
10
Art of Multiprocessor Programming10 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention Seq Bottleneck no parallelism
11
Art of Multiprocessor Programming11 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Contention ??? …lock suffers from contention
12
Art of Multiprocessor Programming12 Review: Test-and-Set Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”
13
Art of Multiprocessor Programming13 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } (5)
14
Art of Multiprocessor Programming14 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Package java.util.concurrent.atomic
15
Art of Multiprocessor Programming15 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Swap old and new values
16
Art of Multiprocessor Programming16 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true)
17
Art of Multiprocessor Programming17 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true) (5) Swapping in true is called “test-and-set” or TAS
18
Art of Multiprocessor Programming18 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false
19
Art of Multiprocessor Programming19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
20
Art of Multiprocessor Programming20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean
21
Art of Multiprocessor Programming21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired
22
Art of Multiprocessor Programming22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Release lock by resetting state to false
23
Art of Multiprocessor Programming23 Space Complexity TAS spin-lock has small “footprint” N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the (n) lower bound? We used a RMW operation…
24
Art of Multiprocessor Programming24 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?
25
Art of Multiprocessor Programming25 Graph ideal time threads no speedup because of sequential bottleneck better
26
Art of Multiprocessor Programming26 Mystery #1 time threads TAS lock Ideal (1) What is going on? better
27
Art of Multiprocessor Programming27 Test-and-Test-and-Set Locks Lurking stage –Wait until lock “looks” free –Spin while read returns true (lock taken) Pouncing state –As soon as lock “looks” available –Read returns false (lock free) –Call TAS to acquire lock –If TAS loses, back to lurking
28
Art of Multiprocessor Programming28 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }
29
Art of Multiprocessor Programming29 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } Read until lock looks free
30
Art of Multiprocessor Programming30 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } Then try to acquire it
31
Art of Multiprocessor Programming31 Mystery #2 TAS lock TTAS lock Ideal time threads better
32
Art of Multiprocessor Programming32 Mystery Both –TAS and TTAS –Do the same thing (in our model) Except that –TTAS performs much better than TAS –Neither approaches ideal Why is TTAS better than TAS? Why is it so much worse than ideal?
33
Art of Multiprocessor Programming33 Caches Our memory abstraction is broken TAS & TTAS methods –Are provably the same (in our model) –Except they aren’t (in field tests) Need to consider caches, cache hits, dirty cache lines, etc.
34
Art of Multiprocessor Programming34 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock
35
Art of Multiprocessor Programming35 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus Thread wants to release lock –delayed behind spinners
36
Art of Multiprocessor Programming36 Test-and-test-and-set Read until lock “looks” free –Spin on local cache –No bus use while lock busy Problem: when lock is released –Invalidation storm …
37
Art of Multiprocessor Programming37 Local Spinning while Lock is Busy Bus memory busy
38
Art of Multiprocessor Programming38 Bus On Release memory freeinvalid free
39
Art of Multiprocessor Programming39 On Release Bus memory freeinvalid free miss Everyone misses, rereads (1)
40
Art of Multiprocessor Programming40 On Release Bus memory freeinvalid free TAS(…) Everyone tries TAS (1)
41
Art of Multiprocessor Programming41 Problems Everyone misses –Reads satisfied sequentially Everyone does TAS –Invalidates others’ caches Eventually quiesces after lock acquired
42
Art of Multiprocessor Programming42 Mystery Explained TAS lock TTAS lock Ideal time threads Better than TAS but still not as good as ideal better
43
Art of Multiprocessor Programming43 Solution: Introduce Delay spin lock time d r1dr1d r2dr2d If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again
44
Art of Multiprocessor Programming44 Dynamic Example: Exponential Backoff time d 2d4d spin lock If I fail to get lock –wait random duration before retry –Each subsequent failure doubles expected wait
45
Art of Multiprocessor Programming45 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
46
Art of Multiprocessor Programming46 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay
47
Art of Multiprocessor Programming47 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free
48
Art of Multiprocessor Programming48 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return
49
Art of Multiprocessor Programming49 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration
50
Art of Multiprocessor Programming50 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason
51
Art of Multiprocessor Programming51 Spin-Waiting Overhead TTAS Lock Backoff lock time threads better
52
Art of Multiprocessor Programming52 Backoff: Other Issues Good –Easy to implement –Beats TTAS lock Bad –Must choose parameters carefully –Not portable across platforms
53
Art of Multiprocessor Programming53 Idea Avoid useless invalidations –By keeping a queue of threads Each thread –Notifies next in line –Without bothering the others
54
Art of Multiprocessor Programming54 Anderson Queue Lock flags next TFFFFFFF idle
55
Art of Multiprocessor Programming55 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement
56
Art of Multiprocessor Programming56 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement
57
Art of Multiprocessor Programming57 Anderson Queue Lock flags next TFFFFFFF acquired Mine!
58
Art of Multiprocessor Programming58 Anderson Queue Lock flags next TFFFFFFF acquired acquiring
59
Art of Multiprocessor Programming59 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement
60
Art of Multiprocessor Programming60 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement
61
Art of Multiprocessor Programming61 acquired Anderson Queue Lock flags next TFFFFFFF acquiring
62
Art of Multiprocessor Programming62 released Anderson Queue Lock flags next TTFFFFFF acquired
63
Art of Multiprocessor Programming63 released Anderson Queue Lock flags next TTFFFFFF acquired Yow!
64
Art of Multiprocessor Programming64 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n];
65
Art of Multiprocessor Programming65 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; One flag per thread
66
Art of Multiprocessor Programming66 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Next flag to use
67
Art of Multiprocessor Programming67 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot; Thread-local variable
68
Art of Multiprocessor Programming68 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; }
69
Art of Multiprocessor Programming69 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Take next slot
70
Art of Multiprocessor Programming70 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Spin until told to go
71
Art of Multiprocessor Programming71 Anderson Queue Lock public lock() { myslot = next.getAndIncrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Prepare slot for re-use
72
Art of Multiprocessor Programming72 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Tell next thread to go
73
Art of Multiprocessor Programming73 Performance Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness queue TTAS better
74
Art of Multiprocessor Programming74 Anderson Queue Lock Good –First truly scalable lock –Simple, easy to implement Bad –Space hog –One bit per thread Unknown number of threads? Small number of actual contenders?
75
Art of Multiprocessor Programming75 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead
76
Art of Multiprocessor Programming76 Initially false tail false idle
77
Art of Multiprocessor Programming77 Acquiring false queue false true acquiring (allocate Qnode)
78
Art of Multiprocessor Programming78 Acquiring false tail false true acquired swap
79
Art of Multiprocessor Programming79 Acquiring false tail false true acquired
80
Art of Multiprocessor Programming80 Acquired false tail false true acquired
81
Art of Multiprocessor Programming81 Acquiring tail false acquired acquiring true swap
82
Art of Multiprocessor Programming82 Acquiring tail acquired acquiring true false
83
Art of Multiprocessor Programming83 Acquiring tail acquired acquiring true false
84
Art of Multiprocessor Programming84 Acquiring tail acquired acquiring true false
85
Art of Multiprocessor Programming85 Acquiring tail acquired acquiring true false
86
Art of Multiprocessor Programming86 Acquiring tail acquired acquiring true Yes! false
87
Art of Multiprocessor Programming87 MCS Queue Lock class Qnode { boolean locked = false; qnode next = null; }
88
Art of Multiprocessor Programming88 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3)
89
Art of Multiprocessor Programming89 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Make a QNode
90
Art of Multiprocessor Programming90 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) add my Node to the tail of queue
91
Art of Multiprocessor Programming91 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Fix if queue was non-empty
92
Art of Multiprocessor Programming92 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} (3) Wait until unlocked
93
Art of Multiprocessor Programming93 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3)
94
Art of Multiprocessor Programming94 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Missing successor?
95
Art of Multiprocessor Programming95 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) If really no successor, return
96
Art of Multiprocessor Programming96 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Otherwise wait for successor to catch up
97
Art of Multiprocessor Programming97 MCS Queue Lock class MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} (3) Pass lock to successor
98
Art of Multiprocessor Programming98 Purple Release false releasing swap false (2)
99
Art of Multiprocessor Programming99 Purple Release false releasing swap false By looking at the queue, I see another thread is active (2)
100
Art of Multiprocessor Programming100 Purple Release false releasing swap false By looking at the queue, I see another thread is active I have to wait for that thread to finish (2)
101
Art of Multiprocessor Programming101 Purple Release false releasing prepare to spin true
102
Art of Multiprocessor Programming102 Purple Release false releasing spinning true
103
Art of Multiprocessor Programming103 Purple Release false releasing spinning true false
104
Art of Multiprocessor Programming104 Purple Release false releasing true Acquired lock false
105
Art of Multiprocessor Programming105 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike 2.5 License You are free: –to Share — to copy, distribute and transmit the work –to Remix — to adapt the work Under the following conditions: –Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). –Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to –http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.