Spin Locks and Contention. Based on slides by Maurice Herlihy & Nir Shavit. Tomer Gurevich.

2 Mutual Exclusion
Most programs aren’t embarrassingly parallel
– “Critical sections” of the code must be executed by one thread at a time to ensure correctness
– Use locks for mutual exclusion
Art of Multiprocessor Programming

3 Example: concurrent counter [Figure: Thread 1 and Thread 2 each read (R) and write (W) a shared counter; labels R1, W2]

4 Locks
Acquire the lock, execute the critical section (CS), and reset the lock upon exit.
…but the lock introduces a sequential bottleneck.
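A minimal Java sketch of the counter-plus-lock pattern from slides 3 and 4, assuming the standard java.util.concurrent.locks.ReentrantLock as the lock (the names LockedCounter and runDemo are hypothetical). The increment is a read-modify-write, so it sits inside the critical section, and unlock() is in a finally block so the lock is always reset on exit.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: a shared counter whose increment is a critical section.
class LockedCounter {
    private final Lock lock = new ReentrantLock();
    private long value = 0;

    void increment() {
        lock.lock();               // enter the critical section
        try {
            value = value + 1;     // read, modify, write: must not interleave with other threads
        } finally {
            lock.unlock();         // reset the lock on exit, even if an exception is thrown
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Starts `threads` threads, each incrementing `perThread` times; returns the final count.
    static long runDemo(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) counter.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counter.get();
    }
}
```

Because every increment passes through the same lock, the threads run the critical section one at a time, which is exactly the sequential bottleneck described above.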

5 What should you do if you can’t get a lock?
Keep trying
– “spin” or “busy-wait”
– Good if delays are short
Give up the processor
– Good if delays are long
– Always good on a uniprocessor
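The two options can be combined. Below is a hypothetical sketch (SpinThenYieldLock and SPIN_LIMIT are assumed names and values, not from the slides) of a test-and-set lock that busy-waits for a bounded number of attempts and then starts calling Thread.yield(). Note that Thread.yield() is only a scheduler hint; a lock that truly gives up the processor would block the thread instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical hybrid lock: spin for a while, then start yielding the processor.
class SpinThenYieldLock {
    // Assumed tuning knob (not from the slides): failed attempts before yielding.
    private static final int SPIN_LIMIT = 100;
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        int attempts = 0;
        while (state.getAndSet(true)) {
            if (attempts < SPIN_LIMIT) {
                attempts++;
                Thread.onSpinWait(); // keep trying: cheap if the holder releases soon
            } else {
                Thread.yield();      // give up the processor: better if the wait is long
            }
        }
    }

    void unlock() {
        state.set(false);
    }
}
```

Choosing SPIN_LIMIT is the same kind of platform-dependent tuning problem that the backoff lock faces later in the deck.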

6 Outline
– Spinlock review
– TAS-lock optimizations
– Queue locks
– Abortable locks

7 Review: Test-and-Set
Atomic operation test-and-set(addr, new_val)
– Atomically sets the word at addr to new_val
– Returns the old value
TAS, a.k.a. “getAndSet”

8 Review: Test-and-Set
    // Model of getAndSet semantics; the real class is java.util.concurrent.atomic.AtomicBoolean
    public class AtomicBoolean {
      boolean value;

      public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;
        value = newValue;
        return prior;
      }
    }
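In real Java code one would normally use the library class java.util.concurrent.atomic.AtomicBoolean, whose getAndSet(boolean) atomically stores the new value and returns the previous one. A small check (GetAndSetDemo is a hypothetical name): on a flag that starts false, the first getAndSet(true) reports false and the second reports true.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of AtomicBoolean.getAndSet returning the prior value.
class GetAndSetDemo {
    static boolean[] twoCalls() {
        AtomicBoolean flag = new AtomicBoolean(false);
        boolean first = flag.getAndSet(true);  // old value was false
        boolean second = flag.getAndSet(true); // old value was true
        return new boolean[] { first, second };
    }
}
```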

9 Test-and-Set Locks
Locking
– Lock is free: value is false
– Lock is taken: value is true
Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
Release lock by writing false

10–13 Test-and-set Lock
    import java.util.concurrent.atomic.AtomicBoolean;

    class TASLock {
      // Lock state is an AtomicBoolean: false = free, true = taken
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        // Keep trying until the lock is acquired
        while (state.getAndSet(true)) {}
      }

      void unlock() {
        // Release the lock by resetting state to false
        state.set(false);
      }
    }

14 Space Complexity
The TAS spin-lock has a small “footprint”: an N-thread spin-lock uses O(1) space.

15 Performance
Experiment
– n threads
– Increment a shared counter 1 million times
How long should it take?
How long does it take?
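A rough Java sketch of this experiment, assuming a plain test-and-set lock built on AtomicBoolean (CounterExperiment, run and count are hypothetical names). It divides a fixed total number of increments among n threads and reports elapsed wall-clock time in nanoseconds; such timings are noisy and depend heavily on the hardware and JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CounterExperiment {
    // Minimal test-and-set lock.
    static final class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        void lock() { while (state.getAndSet(true)) { } }
        void unlock() { state.set(false); }
    }

    private static long counter = 0; // shared counter, protected only by the lock

    // Splits `total` increments across `threads` threads; returns elapsed nanoseconds.
    static long run(int threads, int total) {
        TASLock lock = new TASLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int share = total / threads + (t < total % threads ? 1 : 0);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < share; i++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return System.nanoTime() - start;
    }

    static long count() { return counter; }
}
```

Calling run(n, 1_000_000) for increasing n and plotting the times is one way to reproduce the kind of curve discussed next.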

16 Mystery #1 [Figure: time vs. number of threads, TAS lock compared with the ideal]
What is going on?

17 Bus-Based Architectures [Figure: processors, each with its own cache, connected by a bus to memory]

18–19 Processor Issues Load Request [Figure: a processor sends a load request over the bus: “Gimme data”]

20 Memory Responds [Figure: memory answers over the bus: “Got your data right here”; the data is now in the requester’s cache]

21–23 Processor Issues Load Request [Figure: a second processor asks for the same data; the cache that already holds it answers “I got data”]

24–25 Other Processor Responds [Figure: the other processor’s cache supplies the data over the bus]

26 Cache Coherence
We have lots of copies of data
– Original copy in memory
– Cached copies at processors
Some processor modifies its own copy
– What do we do with the others?
– How to avoid confusion?

27–30 Modify Cached Data [Figure: one processor modifies its cached copy of the data. What’s up with the other copies?]

31 Modified Cache Data [Figure: this cache acquires write permission; other caches invalidate their copies of the data]

32 Modified Cache Data [Figure: memory can be updated later]

33 What’s wrong with TASLock?
TAS invalidates cache lines
Spinners
– Miss in cache
– Go to bus
Thread wants to release lock
– Delayed behind spinners

34 Test-and-Test-and-Set Locks
Lurking stage
– Wait until lock “looks” free
– Spin while read returns true (lock taken)
Pouncing stage
– As soon as lock “looks” available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking

35–37 Test-and-test-and-set Lock
    import java.util.concurrent.atomic.AtomicBoolean;

    class TTASLock {
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        while (true) {
          // Wait until the lock looks free (reads hit the local cache)
          while (state.get()) {}
          // Then try to acquire it; if TAS loses, go back to lurking
          if (!state.getAndSet(true))
            return;
        }
      }

      void unlock() {
        state.set(false);
      }
    }

38 Graph [Figure: time vs. number of threads for the TAS lock, the TTAS lock, and the ideal]

39 Test-and-test-and-set
Wait until lock “looks” free
– Spin on local cache
– No bus use while lock busy
Problem: when lock is released
– Invalidation storm …

40 Local Spinning while Lock is Busy [Figure: every spinner reads “busy” from its own cache; the bus is idle]

41 On Release [Figure: the holder writes “free”; all other cached copies become invalid]

42 On Release [Figure: everyone misses and rereads]

43 On Release [Figure: everyone tries TAS(…)]

44 An important observation [Figure: timeline of spin and lock attempts separated by delays]
If the lock looks free but I fail to get it, there must be contention.
It is better to back off than to collide again.

45 Solution: delay [Figure: timeline with successive delays d, 2d, 4d between spin and lock attempts]
If I fail to get lock
– Wait random duration before retry
– Each subsequent failure doubles expected wait
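The delay rule can be isolated in a small helper. The sketch below is hypothetical (RandomBackoff, millisecond units, and the parameter names are assumptions): each call waits a uniformly random time below the current limit and then doubles the limit up to a cap, so the expected wait roughly doubles after each failure.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: randomized exponential backoff with a capped limit.
class RandomBackoff {
    private final int maxDelayMillis;
    private int limit; // current exclusive upper bound on the random wait, in ms

    RandomBackoff(int minDelayMillis, int maxDelayMillis) {
        if (minDelayMillis < 1 || maxDelayMillis < minDelayMillis)
            throw new IllegalArgumentException("need 1 <= min <= max");
        this.limit = minDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Waits a random time in [0, limit) ms, then doubles the limit up to the cap.
    void backoff() {
        int delay = ThreadLocalRandom.current().nextInt(limit);
        limit = Math.min(maxDelayMillis, 2 * limit);
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    int currentLimit() {
        return limit;
    }
}
```

A lock would call backoff() each time its TAS fails after the lock looked free.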

46–51 Exponential Backoff Lock
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class BackoffLock {
      // MIN_DELAY and MAX_DELAY (ms) must be tuned per platform; these values are placeholders
      private static final int MIN_DELAY = 1, MAX_DELAY = 64;
      private final AtomicBoolean state = new AtomicBoolean(false);

      public void lock() throws InterruptedException {
        int delay = MIN_DELAY;                 // fix minimum delay
        while (true) {
          while (state.get()) {}               // wait until lock looks free
          if (!state.getAndSet(true))          // if we win, return
            return;
          Thread.sleep(ThreadLocalRandom.current().nextInt(delay)); // back off for random duration
          if (delay < MAX_DELAY)               // double max delay, within reason
            delay = 2 * delay;
        }
      }

      public void unlock() {
        state.set(false);
      }
    }

52 Spin-Waiting Overhead [Figure: time vs. number of threads for the TTAS lock and the backoff lock]

53 Backoff: Other Issues
Good
– Easy to implement
– Beats TTAS lock
Bad
– Must choose parameters carefully
– Not portable across platforms

54 Summary: basic TAS-Lock
– Performs well under low contention, but basic spinlocks aren’t scalable
– All threads spin on the same shared memory location, causing a lot of bus traffic
– No fairness, so a thread might starve

55 Queue locks
– Keep FIFO order
– Scalable locks
– Harder to implement
– Hurt performance under low contention

56–64 Anderson Queue Lock [Figure: a flags array, initially T F F F F F F F, and a next counter; the lock is idle. An acquiring thread calls getAndIncrement on next to claim a slot; its flag is T, so it has acquired the lock (“Mine!”). A second thread claims the following slot with getAndIncrement and spins on its F flag. When the first thread releases, the flags become F T F F F F F F and the second thread acquires the lock.]

65 Problem: false sharing
Each thread spins on a different variable, so there should be no contention. But adjacent array elements are contained within the same cache line…

66 The Solution: Padding [Figure: the flags array is padded so each slot (T///, F///) occupies its own cache line (Line 1, Line 2); each thread spins on its own line]
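A sketch of the Anderson array lock with padding, under stated assumptions: 64-byte cache lines and 4-byte ints (so one slot every 16 ints), at most `capacity` threads contending at once, and flags kept in a java.util.concurrent.atomic.AtomicIntegerArray with 1 meaning “your turn”. PaddedALock and PAD are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of an Anderson array lock whose flags are padded onto separate cache lines.
class PaddedALock {
    // Assumption: 64-byte cache lines and 4-byte ints, so 16 ints per line.
    private static final int PAD = 16;
    private final int capacity;                // max number of concurrent threads
    private final AtomicIntegerArray flags;    // flags[slot * PAD] == 1 means "your turn"
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    PaddedALock(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity * PAD);
        flags.set(0, 1); // slot 0 starts as T, all others as F
    }

    void lock() {
        // Claim a slot; overflow of `next` after 2^31 acquisitions is ignored in this sketch.
        int slot = next.getAndIncrement() % capacity;
        mySlot.set(slot);
        while (flags.get(slot * PAD) == 0) {
            Thread.onSpinWait(); // spin on my own padded cache line
        }
    }

    void unlock() {
        int slot = mySlot.get();
        flags.set(slot * PAD, 0);                    // reset my flag for later reuse
        flags.set(((slot + 1) % capacity) * PAD, 1); // pass the lock to the next slot
    }
}
```

The padding trades space for the absence of false sharing, which sharpens the space-efficiency concern listed for this lock.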

67 Performance [Figure: time vs. number of threads for the queue lock and the TTAS lock]
– Shorter handover than backoff
– Curve is practically flat
– Scalable performance

68 Anderson Queue Lock
Good
– Easy to implement
– Queue lock
Bad
– Not space efficient
– What if the number of threads is unknown?
– What if there is only a small number of actual contenders?

69 CLH Lock
– FIFO order
– Small, constant-size overhead per thread

70 CLH Queue Lock
    import java.util.concurrent.atomic.AtomicBoolean;

    class Qnode {
      AtomicBoolean locked = new AtomicBoolean(true);
    }

71–75 CLH Queue Lock
    import java.util.concurrent.atomic.AtomicReference;

    class CLHLock {
      // Queue tail: initially points to a node whose locked flag is false
      final AtomicReference<Qnode> tail;
      // Thread-local Qnode, plus the predecessor's node saved for unlock()
      final ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);
      final ThreadLocal<Qnode> myPred = new ThreadLocal<>();

      CLHLock() {
        Qnode sentinel = new Qnode();
        sentinel.locked.set(false);
        tail = new AtomicReference<>(sentinel);
      }

      public void lock() {
        Qnode qnode = myNode.get();
        qnode.locked.set(true);             // a recycled node may still read false
        Qnode pred = tail.getAndSet(qnode); // swap in my node
        myPred.set(pred);
        while (pred.locked.get()) {}        // spin until predecessor releases lock
      }
      …
    }

76–86 CLH walkthrough [Figure: initially, tail points to a node whose flag is false and the lock is idle. Purple wants the lock: it sets its own node to true and swaps it into tail; its predecessor’s flag is false, so purple has the lock. Red wants the lock: it sets its node to true, swaps it into tail, and spins on purple’s node. The nodes form an implicit linked list.]

87–89 CLH Queue Lock
    class CLHLock {
      …
      public void unlock() {
        myNode.get().locked.set(false);     // notify successor
        myNode.set(myPred.get());           // recycle predecessor’s node
      }
    }

90–91 Purple Releases [Figure: purple sets its node to false; red, spinning on that node, sees false (“Bingo!”) and acquires the lock]

92 Space Usage
Let
– L = number of locks
– N = number of threads
ALock
– O(LN)
CLH lock
– O(L+N)

93 CLH Lock
Good
– Lock release affects predecessor only
– Small, constant-sized space
Bad
– Doesn’t work for uncached NUMA architectures

94 NUMA Architectures
Acronym:
– Non-Uniform Memory Architecture
Illusion:
– Flat shared memory
Truth:
– No caches (sometimes)
– Some memory regions faster than others

95 MCS Lock
– FIFO order, list-based queue lock
– Similar to CLH
– Spins on local memory only, solving the NUMA problem

96 MCS lock
– Each node now contains a “next” field
– Each node spins locally on its own “locked” field
– Upon release, notify the next node that you are finished
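A compact Java sketch of the MCS lock as described above. Details not on the slides are assumptions: volatile fields for each node’s locked and next, a null tail meaning the queue is empty, and a compareAndSet in unlock() to handle a successor that is still linking itself in.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of an MCS queue lock: each thread spins only on its own node.
class MCSLock {
    static final class QNode {
        volatile boolean locked = false;
        volatile QNode next = null;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode qnode = myNode.get();
        QNode pred = tail.getAndSet(qnode);  // enqueue at the tail
        if (pred != null) {                  // queue was non-empty: wait our turn
            qnode.locked = true;
            pred.next = qnode;               // link in behind the predecessor
            while (qnode.locked) {           // spin locally on my own node
                Thread.onSpinWait();
            }
        }
    }

    void unlock() {
        QNode qnode = myNode.get();
        if (qnode.next == null) {
            // No visible successor: try to reset the queue to empty.
            if (tail.compareAndSet(qnode, null)) return;
            // A successor swapped itself into tail but has not linked in yet.
            while (qnode.next == null) {
                Thread.onSpinWait();
            }
        }
        qnode.next.locked = false;           // notify the next node: lock handed over
        qnode.next = null;                   // clean up so the node can be reused
    }
}
```

Unlike CLH, a waiter spins on a field of its own node, which can live in memory local to that thread on a NUMA machine.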

97 Abortable Locks
What if you want to give up waiting for a lock?
For example
– Timeout
– Database transaction aborted by user

98 Back-off Lock
Aborting is trivial
– Just return from lock() call
Extra benefit:
– No cleaning up
– Immediate return
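A sketch of how abandoning a back-off lock can look in Java, assuming a tryLock(timeout, unit) method (TimeoutBackoffLock and the nanosecond delay constants are hypothetical). Giving up is just returning false: the thread never published anything other threads depend on, so there is nothing to clean up. The back-off pause uses java.util.concurrent.locks.LockSupport.parkNanos.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch of a back-off lock whose acquire can be abandoned after a timeout.
class TimeoutBackoffLock {
    // Assumed tuning values in nanoseconds; the slides do not fix them.
    private static final int MIN_DELAY = 1_000;
    private static final int MAX_DELAY = 1_000_000;
    private final AtomicBoolean state = new AtomicBoolean(false);

    // Returns true if the lock was acquired before the deadline, false if the caller gave up.
    boolean tryLock(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        int limit = MIN_DELAY;
        while (true) {
            while (state.get()) {                     // wait until the lock looks free
                if (System.nanoTime() - deadline >= 0) {
                    return false;                     // abort: just return
                }
                Thread.onSpinWait();
            }
            if (!state.getAndSet(true)) {
                return true;                          // won the race
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(limit)); // back off
            limit = Math.min(MAX_DELAY, 2 * limit);
        }
    }

    void unlock() {
        state.set(false);
    }
}
```

In a queue lock, by contrast, a thread that simply returned would leave its successor waiting on a node that is never released.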

99 Queue Locks
Can’t just quit
– Thread in line behind will starve
Need a graceful way out

100 Abortable CLH Lock
When a thread gives up
– Removing its node in a wait-free way is hard
Idea:
– Let the successor deal with it

101–104 Queue Locks [Figure: a lock holder followed by two spinning threads whose nodes are true. The middle thread times out and marks its node as aborted; its successor notices “Predecessor aborted”, skips that node, and continues spinning behind the lock holder.]

105 One Lock To Rule Them All?
TTAS+Backoff, CLH, MCS, TOLock…
Each is better than the others in some way
There is no one solution
The lock we pick really depends on:
– the application
– the hardware
– which properties are important

106 This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
– http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.

