Published by Keyshawn Basden. Modified over 9 years ago.
1
Spin Locks and Contention
Based on slides by Maurice Herlihy & Nir Shavit
Tomer Gurevich
2
Mutual Exclusion
Most programs aren’t embarrassingly parallel
“Critical sections” of the code must be executed by one thread at a time to ensure correctness
Use locks for mutual exclusion
Art of Multiprocessor Programming
3
Example: concurrent counter
[Diagram: Thread 1 and Thread 2 access a shared counter; labels R1, W2]
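The lost update in this example can be replayed by hand. The sketch below is an illustration only: it assumes the slide's R1 and W2 labels mean "read 1" and "write 2", and the class and method names are hypothetical.

```java
class LostUpdateDemo {
    // Replays one bad interleaving of two unsynchronized increments.
    // A real race depends on timing; here the interleaving is fixed by hand.
    static int replay() {
        int counter = 1;
        int r1 = counter;   // thread 1 reads 1
        int r2 = counter;   // thread 2 reads 1 (before thread 1 writes)
        counter = r1 + 1;   // thread 1 writes 2
        counter = r2 + 1;   // thread 2 also writes 2: one increment is lost
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints 2, although two increments ran
    }
}
```

Making the read and the write one critical section rules out this interleaving.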
4
Locks
[Diagram: lock, critical section (CS), resets lock upon exit]
…lock introduces sequential bottleneck
5
What Should You Do If You Can’t Get a Lock?
Keep trying
–“spin” or “busy-wait”
–Good if delays are short
Give up the processor
–Good if delays are long
–Always good on uniprocessor
6
Outline
Spinlock review
TAS-lock optimizations
Queue locks
Abortable locks
7
Review: Test-and-Set
Atomic operation
Test-and-set (addr, new_val)
–Set the current value of the word addr to new_val
–Return the old value
TAS aka “getAndSet”
8
Review: Test-and-Set
public class AtomicBoolean {
  boolean value;

  // synchronized makes the read and the write one atomic step
  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;   // remember the old value
    value = newValue;        // install the new value
    return prior;            // report what was there before
  }
}
9
Test-and-Set Locks
Locking
–Lock is free: value is false
–Lock is taken: value is true
Acquire lock by calling TAS
–If result is false, you win
–If result is true, you lose
Release lock by writing false
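The win/lose rule can be checked directly against the standard java.util.concurrent.atomic.AtomicBoolean. This is a minimal single-threaded sketch; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GetAndSetDemo {
    // Returns the old values seen by two back-to-back getAndSet(true) calls
    // on a lock word that starts out free (false).
    static boolean[] twoCalls() {
        AtomicBoolean state = new AtomicBoolean(false);
        boolean first = state.getAndSet(true);   // old value false: this caller wins
        boolean second = state.getAndSet(true);  // old value true: this caller loses
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = twoCalls();
        System.out.println(r[0] + " " + r[1]); // prints "false true"
    }
}
```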
10
Test-and-set Lock
class TASlock {
  // Lock state is AtomicBoolean: false = free, true = taken
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    // Keep trying until lock acquired
    while (state.getAndSet(true)) {}
  }

  void unlock() {
    // Release lock by resetting state to false
    state.set(false);
  }
}
14
Space Complexity
TAS spin-lock has small “footprint”
N-thread spin-lock uses O(1) space
15
Performance
Experiment
–n threads
–Increment shared counter 1 million times
How long should it take?
How long does it take?
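One way to set up this experiment is sketched below. It is not the harness behind the slides' graphs: the class names, the thread counts tried in main, and the timing method are all assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    void lock() { while (state.getAndSet(true)) { } }
    void unlock() { state.set(false); }
}

class CounterExperiment {
    static long counter;   // only touched while holding the lock

    // nThreads threads together perform `total` locked increments
    // (total is assumed divisible by nThreads); returns the final count.
    static long run(int nThreads, int total) {
        TASLock lock = new TASLock();
        counter = 0;
        int perThread = total / nThreads;
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return counter;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long start = System.nanoTime();
            long result = run(n, 1_000_000);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " threads: count=" + result + ", " + ms + " ms");
        }
    }
}
```

The total amount of work is the same for every n, so ideally the elapsed time would not grow as threads are added.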
16
Mystery #1
[Plot: time vs. threads; curves: TAS lock, Ideal]
What is going on?
17
Bus-Based Architectures
[Diagram: processor caches and memory connected by a shared bus]
18
Processor Issues Load Request
[Diagram: a processor puts a load request on the bus (“Gimme data”)]
20
Memory Responds
[Diagram: memory supplies the data over the bus (“Got your data right here”)]
21
Processor Issues Load Request
[Diagram: another processor puts a load request on the bus (“Gimme data”); a cache that already holds the data answers (“I got data”)]
24
Other Processor Responds
[Diagram: the other processor’s cache supplies the data over the bus]
26
Cache Coherence
We have lots of copies of data
–Original copy in memory
–Cached copies at processors
Some processor modifies its own copy
–What do we do with the others?
–How to avoid confusion?
27
Modify Cached Data
[Diagram: memory and two caches hold copies of the data; one processor modifies its cached copy]
What’s up with the other copies?
31
Modified
[Diagram: the writing cache holds the modified data; the other caches no longer hold a valid copy]
Other caches invalidate data
This cache acquires write permission
Memory can be updated later
33
What’s wrong with TASLock?
TAS invalidates cache lines
Spinners
–Miss in cache
–Go to bus
Thread wants to release lock
–delayed behind spinners
34
Test-and-Test-and-Set Locks
Lurking stage
–Wait until lock “looks” free
–Spin while read returns true (lock taken)
Pouncing stage
–As soon as lock “looks” available
–Read returns false (lock free)
–Call TAS to acquire lock
–If TAS loses, back to lurking
35
Test-and-test-and-set Lock
class TTASlock {
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    while (true) {
      while (state.get()) {}          // wait until lock looks free
      if (!state.getAndSet(true))     // then try to acquire it
        return;
    }
  }
}
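The slide shows only lock(). A complete version might look like the sketch below, which assumes unlock() resets the state exactly as the TAS lock does. The hammer helper is a hypothetical stress test, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) { }              // lurk: read-only spin on the cached value
            if (!state.getAndSet(true)) return;  // pounce: one TAS once the lock looks free
        }
    }

    void unlock() { state.set(false); }          // assumed: same release as the TAS lock

    // Hypothetical helper: nThreads threads each do `iters` locked increments.
    static int hammer(int nThreads, int iters) {
        TTASLock lock = new TTASLock();
        int[] count = {0};
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return count[0];
    }
}
```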
38
Graph
[Plot: time vs. threads; curves: TAS lock, TTAS lock, Ideal]
39
Test-and-test-and-set
Wait until lock “looks” free
–Spin on local cache
–No bus use while lock busy
Problem: when lock is released
–Invalidation storm …
40
Local Spinning while Lock is Busy
[Diagram: the caches hold the value “busy”; memory and bus are shown idle]
41
On Release
[Diagram: the lock becomes “free”; the spinners’ cached copies are marked invalid]
Everyone misses, rereads
Everyone tries TAS
44
An important observation
[Timeline: spin, lock; time axis marked d, r1d, r2d]
If the lock looks free
But I fail to get it
There must be contention
Better to back off than to collide again
45
Solution: delay
[Timeline: spin, lock; time axis marked d, 2d, 4d]
If I fail to get lock
–wait random duration before retry
–Each subsequent failure doubles expected wait
46
Exponential Backoff Lock
public class Backoff implements Lock {
  AtomicBoolean state = new AtomicBoolean(false);

  public void lock() {
    int delay = MIN_DELAY;              // fix minimum delay
    while (true) {
      while (state.get()) {}            // wait until lock looks free
      if (!state.getAndSet(true))       // if we win, return
        return;
      sleep(random() % delay);          // back off for random duration
      if (delay < MAX_DELAY)            // double max delay, within reason
        delay = 2 * delay;
    }
  }
}
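The slide's sleep() and random() are shorthand. A runnable sketch could stand in ThreadLocalRandom and LockSupport.parkNanos for them, as below. The delay bounds are arbitrary placeholder values, not tuned numbers from the slides, and the hammer helper is hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLock {
    private static final long MIN_DELAY_NANOS = 1_000;      // assumed value
    private static final long MAX_DELAY_NANOS = 1_000_000;  // assumed value
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        long limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) { }                  // wait until lock looks free
            if (!state.getAndSet(true)) return;      // won the TAS
            // Lost the TAS, so there is contention: pause for a random time in [0, limit).
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(limit));
            if (limit < MAX_DELAY_NANOS) limit *= 2; // double the bound, within reason
        }
    }

    void unlock() { state.set(false); }

    // Hypothetical helper: nThreads threads each do `iters` locked increments.
    static int hammer(int nThreads, int iters) {
        BackoffLock lock = new BackoffLock();
        int[] count = {0};
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return count[0];
    }
}
```

parkNanos may return early, which only shortens a pause and does not affect correctness.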
52
Spin-Waiting Overhead
[Plot: time vs. threads; curves: TTAS lock, Backoff lock]
53
Backoff: Other Issues
Good
–Easy to implement
–Beats TTAS lock
Bad
–Must choose parameters carefully
–Not portable across platforms
54
Summary: basic TAS-Lock
Performs well under low contention, but basic spinlocks aren’t scalable
All threads spin on the same shared memory location, causing a lot of bus traffic
No fairness, so a thread might starve
55
Queue locks
Keep FIFO order
Scalable locks
Harder to implement
Hurt performance under low contention
56
Anderson Queue Lock
[Diagram: array “flags” with 8 slots, initially T F F F F F F F, and a counter “next”; the lock is idle]
57
Anderson Queue Lock
[Animation: an acquiring thread calls getAndIncrement on next, finds its flag is T (“Mine!”) and has acquired the lock]
60
Anderson Queue Lock
[Animation: a second acquiring thread calls getAndIncrement on next and keeps acquiring while the first thread holds the lock]
64
Anderson Queue Lock
[Animation: the first thread has released; flags are now F T F F F F F F and the second thread has acquired the lock]
65
Problem: false sharing
Each thread spins on a different variable, so in principle there is no contention.
But adjacent array elements are contained within the same cache line, so a write to one flag also invalidates the cached copies of its neighbours…
66
The Solution: Padding
[Diagram: flags padded so that each occupies its own cache line: T / / / on Line 1, F / / / on Line 2; counter “next”]
Spin on my line
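A padded array lock along these lines might be sketched as follows. Several things here are assumptions rather than the slides' code: a 64-byte cache line (16 ints), AtomicIntegerArray for the flags so that reads and writes of elements are visible across threads, and Thread.yield() in the spin loop so the sketch stays responsive with more threads than cores. The hammer helper is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class PaddedALock {
    private static final int PAD = 16;   // assumption: 16 ints = 64 bytes = one cache line
    private final int size;              // must be >= the number of contending threads
    private final AtomicInteger next = new AtomicInteger(0);
    private final AtomicIntegerArray flags;   // 1 = "go", 0 = "wait"; only every PAD-th slot is used
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    PaddedALock(int capacity) {
        size = capacity;
        flags = new AtomicIntegerArray(capacity * PAD);
        flags.set(0, 1);                 // slot 0 starts out as T
    }

    void lock() {
        // floorMod keeps the index non-negative; counter wrap-around is ignored in this sketch
        int slot = Math.floorMod(next.getAndIncrement(), size);
        mySlot.set(slot);
        while (flags.get(slot * PAD) == 0) { Thread.yield(); }  // spin on my own line
    }

    void unlock() {
        int slot = mySlot.get();
        flags.set(slot * PAD, 0);                      // reset my flag for later reuse
        flags.set(((slot + 1) % size) * PAD, 1);       // hand the lock to the next slot
    }

    // Hypothetical helper: nThreads threads each do `iters` locked increments.
    static int hammer(int nThreads, int iters) {
        PaddedALock lock = new PaddedALock(nThreads);
        int[] count = {0};
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return count[0];
    }
}
```

The padding multiplies the array size by PAD, trading space for the absence of false sharing.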
67
Performance
[Plot: curves: queue lock, TTAS]
Shorter handover than backoff
Curve is practically flat
Scalable performance
68
Anderson Queue Lock
Good
–Easy-to-implement queue lock
Bad
–Not space efficient
–What if the number of threads is unknown?
–What if there is only a small number of actual contenders?
69
CLH Lock
FIFO order
Small, constant-size overhead per thread
70
CLH Queue Lock
class Qnode {
  AtomicBoolean locked = new AtomicBoolean(true);
}
71
CLH Queue Lock
class CLHLock implements Lock {
  AtomicReference<Qnode> tail;                                       // queue tail
  ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);   // thread-local Qnode

  public void lock() {
    myNode.get().locked.set(true);               // mark my node as locked (needed once nodes are recycled)
    Qnode pred = tail.getAndSet(myNode.get());   // swap in my node
    while (pred.locked.get()) {}                 // spin until predecessor releases lock
  }
}
76
Initially
[Diagram: tail points to a node whose value is false; the lock is idle]
78
Purple Wants the Lock
[Animation: Purple, acquiring, creates a node with value true and swaps it into tail]
81
Purple Has the Lock
[Diagram: the node Purple got back from the swap is false, so Purple has acquired the lock; Purple’s own node stays true]
82
Red Wants the Lock
[Animation: Red, acquiring, swaps its own true node into tail and waits on Purple’s node, which is still true]
Implicit linked list
87
CLH Queue Lock
class CLHLock implements Lock {
  …
  public void unlock() {
    myNode.locked.set(false);   // notify successor
    myNode = pred;              // recycle predecessor’s node (pred is the node obtained in lock())
  }
}
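Putting lock() and unlock() together, a runnable CLH lock might look like the sketch below. It departs from the slide shorthand in ways that are assumptions of this sketch: the predecessor is kept in a second ThreadLocal so unlock() can recycle it, the node flag is a volatile boolean, and the spin loop calls Thread.yield() so the example behaves with more threads than cores. The hammer helper is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class CLHLock {
    static final class QNode { volatile boolean locked; }   // false = released

    // The initial tail is a dummy node that is already released.
    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    void lock() {
        QNode node = myNode.get();
        node.locked = true;                  // announce: I want (or hold) the lock
        QNode pred = tail.getAndSet(node);   // swap in my node
        myPred.set(pred);                    // remember predecessor for recycling
        while (pred.locked) { Thread.yield(); }   // spin until predecessor releases
    }

    void unlock() {
        QNode node = myNode.get();
        node.locked = false;                 // notify successor
        myNode.set(myPred.get());            // recycle predecessor's node
    }

    // Hypothetical helper: nThreads threads each do `iters` locked increments.
    static int hammer(int nThreads, int iters) {
        CLHLock lock = new CLHLock();
        int[] count = {0};
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return count[0];
    }
}
```

A thread cannot reuse its own node right away because its successor may still be reading it, which is why the predecessor's node is taken over instead.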
90
Purple Releases
[Animation: Purple, releasing, sets its node to false; Red, still acquiring, sees false (“Bingo!”) and has acquired the lock; Purple has released]
92
Space Usage
Let
–L = number of locks
–N = number of threads
ALock
–O(LN)
CLH lock
–O(L+N)
93
CLH Lock
Good
–Lock release affects successor only
–Small, constant-sized space
Bad
–Doesn’t work for uncached NUMA architectures
94
NUMA Architectures
Acronym:
–Non-Uniform Memory Architecture
Illusion:
–Flat shared memory
Truth:
–No caches (sometimes)
–Some memory regions faster than others
95
MCS Lock
FIFO order, list-based queue lock
Similar to CLH
Spin on local memory only, solving the NUMA problem
96
MCS lock
Each node now contains a “next” field.
Each thread spins locally on its own node’s “locked” field.
Upon release, notify the next node that you have finished.
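The three points above can be sketched as a small lock class. The slides give no MCS code, so the field names, the use of volatile fields, and the Thread.yield() calls in the spin loops are assumptions of this sketch. The hammer helper is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class MCSLock {
    static final class QNode {
        volatile boolean locked;
        volatile QNode next;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode node = myNode.get();
        QNode pred = tail.getAndSet(node);       // append my node to the queue
        if (pred != null) {                      // queue was not empty: wait my turn
            node.locked = true;
            pred.next = node;                    // link in behind my predecessor
            while (node.locked) { Thread.yield(); }   // spin on my OWN node
        }
    }

    void unlock() {
        QNode node = myNode.get();
        if (node.next == null) {
            if (tail.compareAndSet(node, null)) return;   // nobody is waiting
            while (node.next == null) { Thread.yield(); } // a successor is still linking in
        }
        node.next.locked = false;                // notify the next node
        node.next = null;                        // reset my node for reuse
    }

    // Hypothetical helper: nThreads threads each do `iters` locked increments.
    static int hammer(int nThreads, int iters) {
        MCSLock lock = new MCSLock();
        int[] count = {0};
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try { count[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return count[0];
    }
}
```

Unlike CLH, each thread spins on a field of its own node, which it can keep in memory local to it.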
97
Abortable Locks
What if you want to give up waiting for a lock?
For example
–Timeout
–Database transaction aborted by user
98
Back-off Lock
Aborting is trivial
–Just return from lock() call
Extra benefit:
–No cleaning up
–Immediate return
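"Just return" can be made concrete with a timed acquire on a backoff lock. The sketch below is illustrative: the tryLock name and signature, the delay bounds, and the use of parkNanos are assumptions, not code from the slides.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class AbortableBackoffLock {
    private static final long MIN_DELAY_NANOS = 1_000;      // assumed value
    private static final long MAX_DELAY_NANOS = 1_000_000;  // assumed value
    private final AtomicBoolean state = new AtomicBoolean(false);

    // Returns true if the lock was acquired, false if the timeout expired first.
    boolean tryLock(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        long limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) {
                // Aborting: no shared state was changed, so just return.
                if (System.nanoTime() - deadline >= 0) return false;
            }
            if (!state.getAndSet(true)) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(limit));
            if (limit < MAX_DELAY_NANOS) limit *= 2;
        }
    }

    void unlock() { state.set(false); }
}
```

A waiter leaves no trace in the lock's state, so giving up needs no cleanup.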
99
Queue Locks
Can’t just quit
–Thread in line behind will starve
Need a graceful way out
100
Abortable CLH Lock
When a thread gives up
–Removing node in a wait-free way is hard
Idea:
–let successor deal with it.
101
Queue Locks
[Animation: one thread holds the lock (locked) and two waiters are spinning on true nodes; one waiter times out and its node is marked aborted; its successor notices that its predecessor aborted and carries on spinning]
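One way to let the successor deal with an aborted node is a timeout queue lock in which an abandoned node points at its own predecessor. The sketch below follows that idea; the slides show no code for it, so the class layout, the AVAILABLE sentinel, and the probe helper are assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class TOLock {
    // pred == null: owner is waiting or holds the lock
    // pred == AVAILABLE: owner released the lock
    // pred == other node: owner aborted; wait on that node instead
    static final class QNode { volatile QNode pred; }

    private static final QNode AVAILABLE = new QNode();
    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    boolean tryLock(long time, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(time);
        QNode node = new QNode();                 // fresh node per attempt
        myNode.set(node);
        QNode pred = tail.getAndSet(node);
        if (pred == null || pred.pred == AVAILABLE) return true;
        while (System.nanoTime() - deadline < 0) {
            QNode predPred = pred.pred;
            if (predPred == AVAILABLE) return true;   // predecessor released
            if (predPred != null) pred = predPred;    // predecessor aborted: skip over it
        }
        // Timed out. If I am still last, unlink myself; otherwise leave a
        // forwarding pointer so my successor can skip over my node.
        if (!tail.compareAndSet(node, pred)) node.pred = pred;
        return false;
    }

    void unlock() {
        QNode node = myNode.get();
        if (!tail.compareAndSet(node, null)) node.pred = AVAILABLE;
    }

    // Hypothetical helper: try the lock from a fresh thread, releasing at once if acquired.
    static boolean probe(TOLock lock, long millis) {
        boolean[] got = {false};
        Thread t = new Thread(() -> {
            got[0] = lock.tryLock(millis, TimeUnit.MILLISECONDS);
            if (got[0]) lock.unlock();
        });
        t.start();
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return got[0];
    }
}
```

The lock is not reentrant: calling tryLock again on the thread that already holds it would overwrite that thread's node.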
105
One Lock To Rule Them All?
TTAS+Backoff, CLH, MCS, ToLock…
Each better than others in some way
There is no one solution
Lock we pick really depends on:
–the application
–the hardware
–which properties are important
106
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
–to Share: to copy, distribute and transmit the work
–to Remix: to adapt the work
Under the following conditions:
–Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
–Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
–http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.