Spin Locks and Contention: Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.


1 Spin Locks and Contention: Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Art of Multiprocessor Programming2 Focus so far: Correctness and Progress Models –Accurate (we never lied to you) –But idealized (so we forgot to mention a few things) Protocols –Elegant –Important –But naïve

3 Art of Multiprocessor Programming3 Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware And get to know a collection of locking algorithms…

4 Art of Multiprocessor Programming4 What should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor

5 Art of Multiprocessor Programming5 What should you do if you can’t get a lock? Keep trying (our focus) –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor

6 Art of Multiprocessor Programming6 Basic Spin-Lock CS Resets lock upon exit spin lock critical section...

7 Art of Multiprocessor Programming7 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock introduces sequential bottleneck

8 Art of Multiprocessor Programming8 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention

9 Art of Multiprocessor Programming9 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Notice: these are distinct phenomena …lock suffers from contention

10 Art of Multiprocessor Programming10 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention Seq Bottleneck → no parallelism

11 Art of Multiprocessor Programming11 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Contention → ??? …lock suffers from contention

12 Art of Multiprocessor Programming12 Review: Test-and-Set Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”

13 Art of Multiprocessor Programming13 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } }

14 Art of Multiprocessor Programming14 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } } Package java.util.concurrent.atomic

15 Art of Multiprocessor Programming15 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } } Swap old and new values

16 Art of Multiprocessor Programming16 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true)

17 Art of Multiprocessor Programming17 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true) Swapping in true is called “test-and-set” or TAS
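
The return-value convention reviewed above can be checked with the real `java.util.concurrent.atomic.AtomicBoolean`. This is a minimal sketch; the class name `TestAndSetDemo` and the method `run` are made up for illustration and are not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class (not from the slides): exercises getAndSet as TAS.
class TestAndSetDemo {
    // Returns {result of first TAS, result of second TAS, value after reset}.
    static boolean[] run() {
        AtomicBoolean lock = new AtomicBoolean(false); // lock starts free
        boolean first = lock.getAndSet(true);          // prior value false: this caller "wins"
        boolean second = lock.getAndSet(true);         // prior value true: lock already taken
        lock.set(false);                               // reset just by writing false
        return new boolean[] { first, second, lock.get() };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false true false"
    }
}
```

A `false` result from `getAndSet(true)` means the caller flipped the value from free to taken; a `true` result means someone else already holds it.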

18 Art of Multiprocessor Programming18 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false

19 Art of Multiprocessor Programming19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

20 Art of Multiprocessor Programming20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean

21 Art of Multiprocessor Programming21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired

22 Art of Multiprocessor Programming22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Release lock by resetting state to false

23 Art of Multiprocessor Programming23 Space Complexity TAS spin-lock has small “footprint” n-thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the Ω(n) lower bound? We used an RMW (read-modify-write) operation…

24 Art of Multiprocessor Programming24 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?
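
One way to set up the experiment described above is sketched below. It is an assumption-laden harness, not the authors' benchmark: the names `CounterExperiment` and `run` are hypothetical, the nested `TASLock` simply restates the lock from the earlier slides, and wall-clock timing with `System.nanoTime()` ignores JIT warm-up.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness (not from the slides) for the shared-counter experiment.
class CounterExperiment {
    // The TAS lock from the earlier slides, restated so the sketch is self-contained.
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        void lock() { while (state.getAndSet(true)) { } }
        void unlock() { state.set(false); }
    }

    // Runs nThreads threads that together perform totalIncrements increments
    // (totalIncrements is assumed to be divisible by nThreads).
    // Returns {final counter value, elapsed nanoseconds}.
    static long[] run(int nThreads, int totalIncrements) {
        final TASLock lock = new TASLock();
        final long[] counter = { 0 };               // protected by the lock
        final int perThread = totalIncrements / nThreads;
        Thread[] workers = new Thread[nThreads];
        long start = System.nanoTime();
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        counter[0]++;               // the critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { counter[0], System.nanoTime() - start };
    }
}
```

Calling `CounterExperiment.run(n, 1_000_000)` for several values of `n` and comparing the elapsed times gives a rough version of the time-versus-threads curve; the absolute numbers depend heavily on the machine.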

25 Art of Multiprocessor Programming25 Graph [Plot: time vs. threads, showing the ideal curve] No speedup because of sequential bottleneck

26 Art of Multiprocessor Programming26 Mystery #1 [Plot: time vs. threads; curves: TAS lock, Ideal] What is going on?

27 Art of Multiprocessor Programming27 Test-and-Test-and-Set Locks Lurking stage –Wait until lock “looks” free –Spin while read returns true (lock taken) Pouncing stage –As soon as lock “looks” available –Read returns false (lock free) –Call TAS to acquire lock –If TAS loses, back to lurking

28 Art of Multiprocessor Programming28 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } } }

29 Art of Multiprocessor Programming29 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } } } Read until lock looks free

30 Art of Multiprocessor Programming30 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } } } Then try to acquire it
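
The slides show only `lock()` for the TTAS lock. A complete version might look like the sketch below, assuming `unlock()` is the same single write used by the TAS lock; the class name `TTASLock` and the `private final` modifiers are choices made here, not taken from the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a complete test-and-test-and-set lock.
class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Lurking stage: spin on plain reads, which can be served from the local cache.
            while (state.get()) { }
            // Pouncing stage: the lock looked free, so try to grab it with one TAS.
            if (!state.getAndSet(true)) {
                return;
            }
            // TAS lost the race: go back to lurking.
        }
    }

    // Assumed to mirror the TAS lock: release by writing false.
    void unlock() {
        state.set(false);
    }
}
```

The only difference from the TAS lock is the inner read loop, which is why the two are equivalent in the idealized model yet behave differently once caches are taken into account.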

31 Art of Multiprocessor Programming31 Mystery #2 [Plot: time vs. threads; curves: TAS lock, TTAS lock, Ideal]

32 Art of Multiprocessor Programming32 Mystery Both –TAS and TTAS –Do the same thing (in our model) Except that –TTAS performs much better than TAS –Neither approaches ideal Why is TTAS better than TAS? Why is it so much worse than ideal?

33 Art of Multiprocessor Programming33 Caches Our memory abstraction is broken TAS & TTAS methods –Are provably the same (in our model) –Except they aren’t (in field tests) Need to consider caches, cache hits, dirty cache lines, etc.

34 Art of Multiprocessor Programming34 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock

35 Art of Multiprocessor Programming35 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus Thread wants to release lock –delayed behind spinners

36 Art of Multiprocessor Programming36 Test-and-test-and-set Read until lock “looks” free –Spin on local cache –No bus use while lock busy Problem: when lock is released –Invalidation storm …

37 Art of Multiprocessor Programming37 Local Spinning while Lock is Busy [Diagram labels: bus, memory, busy]

38 Art of Multiprocessor Programming38 On Release [Diagram labels: bus, memory, free, invalid, free]

39 Art of Multiprocessor Programming39 On Release [Diagram labels: bus, memory, free, invalid, free, miss] Everyone misses, rereads

40 Art of Multiprocessor Programming40 On Release [Diagram labels: bus, memory, free, invalid, free, TAS(…)] Everyone tries TAS

41 Art of Multiprocessor Programming41 Problems Everyone misses –Reads satisfied sequentially Everyone does TAS –Invalidates others’ caches Eventually quiesces after lock acquired

42 Art of Multiprocessor Programming42 Mystery Explained [Plot: time vs. threads; curves: TAS lock, TTAS lock, Ideal] TTAS is better than TAS but still not as good as ideal

43 Art of Multiprocessor Programming43 Solution: Introduce Delay [Timeline: spin, then delays d, r1·d, r2·d between attempts on the lock] If the lock looks free but I fail to get it, there must be lots of contention. Better to back off than to collide again

44 Art of Multiprocessor Programming44 Dynamic Example: Exponential Backoff [Timeline: delays d, 2d, 4d between attempts on the lock] If I fail to get lock –wait random duration before retry –Each subsequent failure doubles expected wait

45 Art of Multiprocessor Programming45 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}

46 Art of Multiprocessor Programming46 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay

47 Art of Multiprocessor Programming47 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free

48 Art of Multiprocessor Programming48 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return

49 Art of Multiprocessor Programming49 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration

50 Art of Multiprocessor Programming50 Exponential Backoff Lock public class Backoff implements Lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason
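
The slide code leaves `state`, `sleep`, `random`, and the delay bounds undefined. The sketch below fills them in under stated assumptions: the delay bounds are arbitrary placeholder values in nanoseconds, the pause uses `LockSupport.parkNanos`, and the random draw uses `ThreadLocalRandom`. None of these choices come from the slides, and the "Bad" points on the next slide about parameter tuning apply directly to the two constants.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch of a TTAS lock with randomized exponential backoff.
class BackoffLock {
    // Assumed values; real settings need tuning per platform.
    private static final long MIN_DELAY_NANOS = 1_000;
    private static final long MAX_DELAY_NANOS = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        long limit = MIN_DELAY_NANOS;                  // current upper bound on the pause
        while (true) {
            while (state.get()) { }                    // wait until the lock looks free
            if (!state.getAndSet(true)) {
                return;                                // we won the TAS
            }
            // The lock looked free but the TAS failed: high contention, so back off
            // for a random duration in [0, limit) nanoseconds.
            long pause = ThreadLocalRandom.current().nextLong(limit);
            LockSupport.parkNanos(pause);
            if (limit < MAX_DELAY_NANOS) {
                limit = 2 * limit;                     // double the bound, within reason
            }
        }
    }

    void unlock() {
        state.set(false);
    }
}
```

`parkNanos` may return early, which is harmless here because the loop simply re-checks the lock.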

51 Art of Multiprocessor Programming51 Spin-Waiting Overhead [Plot: time vs. threads; curves: TTAS lock, Backoff lock]

52 Art of Multiprocessor Programming52 Backoff: Other Issues Good –Easy to implement –Beats TTAS lock Bad –Must choose parameters carefully –Not portable across platforms

53 Art of Multiprocessor Programming53 Idea Avoid useless invalidations –By keeping a queue of threads Each thread –Notifies next in line –Without bothering the others

54 Art of Multiprocessor Programming54 Anderson Queue Lock flags next TFFFFFFF idle

55 Art of Multiprocessor Programming55 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement

56 Art of Multiprocessor Programming56 Anderson Queue Lock flags next TFFFFFFF acquiring getAndIncrement

57 Art of Multiprocessor Programming57 Anderson Queue Lock flags next TFFFFFFF acquired Mine!

58 Art of Multiprocessor Programming58 Anderson Queue Lock flags next TFFFFFFF acquired acquiring

59 Art of Multiprocessor Programming59 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement

60 Art of Multiprocessor Programming60 Anderson Queue Lock flags next TFFFFFFF acquired acquiring getAndIncrement

61 Art of Multiprocessor Programming61 acquired Anderson Queue Lock flags next TFFFFFFF acquiring

62 Art of Multiprocessor Programming62 released Anderson Queue Lock flags next TTFFFFFF acquired

63 Art of Multiprocessor Programming63 released Anderson Queue Lock flags next TTFFFFFF acquired Yow!

64 Art of Multiprocessor Programming64 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n];

65 Art of Multiprocessor Programming65 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; One flag per thread

66 Art of Multiprocessor Programming66 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Next flag to use

67 Art of Multiprocessor Programming67 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal mySlot; Thread-local variable

68 Art of Multiprocessor Programming68 Anderson Queue Lock public void lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public void unlock() { flags[(mySlot+1) % n] = true; }

69 Art of Multiprocessor Programming69 Anderson Queue Lock public void lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public void unlock() { flags[(mySlot+1) % n] = true; } Take next slot

70 Art of Multiprocessor Programming70 Anderson Queue Lock public void lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public void unlock() { flags[(mySlot+1) % n] = true; } Spin until told to go

71 Art of Multiprocessor Programming71 Anderson Queue Lock public void lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public void unlock() { flags[(mySlot+1) % n] = true; } Prepare slot for re-use

72 Art of Multiprocessor Programming72 Anderson Queue Lock public void lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public void unlock() { flags[(mySlot+1) % n] = true; } Tell next thread to go
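
Putting the fields and the two methods together, a runnable version might look like the sketch below. Several details are assumptions rather than the slides' code: each flag is an `AtomicBoolean` so that writes are visible across threads (a plain `boolean[]` gives no such guarantee in Java), `mySlot` is a `ThreadLocal<Integer>`, and the constructor parameter `capacity` plays the role of `n`. The sketch is only correct while at most `capacity` threads use the lock at once, and it does not pad the flags onto separate cache lines.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the Anderson array-based queue lock.
class ALock {
    private final int n;                                   // number of slots
    private final AtomicBoolean[] flags;                   // flags[i] == true means "slot i may go"
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    ALock(int capacity) {
        n = capacity;
        flags = new AtomicBoolean[n];
        for (int i = 0; i < n; i++) {
            flags[i] = new AtomicBoolean(i == 0);          // only slot 0 starts enabled
        }
    }

    void lock() {
        // Take the next slot. floorMod keeps the index non-negative if the counter
        // overflows; FIFO order across that wrap-around is not addressed here.
        int slot = Math.floorMod(next.getAndIncrement(), n);
        mySlot.set(slot);
        while (!flags[slot].get()) { }                     // spin until told to go
        flags[slot].set(false);                            // prepare slot for re-use
    }

    void unlock() {
        int slot = mySlot.get();
        flags[(slot + 1) % n].set(true);                   // tell the next thread to go
    }
}
```

Each waiting thread spins on its own flag, so a release changes only the one flag that the successor is watching.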

73 Art of Multiprocessor Programming73 Performance [Plot: time vs. threads; curves: queue, TTAS] Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness

74 Art of Multiprocessor Programming74 Anderson Queue Lock Good –First truly scalable lock –Simple, easy to implement Bad –Space hog –One bit per thread Unknown number of threads? Small number of actual contenders?

75 Art of Multiprocessor Programming75 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead

76 Art of Multiprocessor Programming76 Initially false tail false idle

77 Art of Multiprocessor Programming77 Acquiring false queue false true acquiring (allocate Qnode)

78 Art of Multiprocessor Programming78 Acquiring false tail false true acquired swap

79 Art of Multiprocessor Programming79 Acquiring false tail false true acquired

80 Art of Multiprocessor Programming80 Acquired false tail false true acquired

81 Art of Multiprocessor Programming81 Acquiring tail false acquired acquiring true swap

82 Art of Multiprocessor Programming82 Acquiring tail acquired acquiring true false

83 Art of Multiprocessor Programming83 Acquiring tail acquired acquiring true false

84 Art of Multiprocessor Programming84 Acquiring tail acquired acquiring true false

85 Art of Multiprocessor Programming85 Acquiring tail acquired acquiring true false

86 Art of Multiprocessor Programming86 Acquiring tail acquired acquiring true Yes! false

87 Art of Multiprocessor Programming87 MCS Queue Lock class Qnode { boolean locked = false; Qnode next = null; }

88 Art of Multiprocessor Programming88 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

89 Art of Multiprocessor Programming89 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Make a Qnode

90 Art of Multiprocessor Programming90 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Add my node to the tail of the queue

91 Art of Multiprocessor Programming91 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Fix if queue was non-empty

92 Art of Multiprocessor Programming92 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Wait until unlocked

93 Art of Multiprocessor Programming93 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.compareAndSet(qnode, null)) return; while (qnode.next == null) {} } qnode.next.locked = false; }}

94 Art of Multiprocessor Programming94 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.compareAndSet(qnode, null)) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Missing successor?

95 Art of Multiprocessor Programming95 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.compareAndSet(qnode, null)) return; while (qnode.next == null) {} } qnode.next.locked = false; }} If really no successor, return

96 Art of Multiprocessor Programming96 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.compareAndSet(qnode, null)) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Otherwise wait for successor to catch up

97 Art of Multiprocessor Programming97 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.compareAndSet(qnode, null)) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Pass lock to successor
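
The `lock()` and `unlock()` fragments above can be combined into one class roughly as follows. The slides do not show how `unlock()` finds the node that `lock()` created, so this sketch assumes a per-thread node kept in a `ThreadLocal` and reused across acquisitions; the `volatile` fields and the reset of `next` in `unlock()` are likewise assumptions needed to make the sketch safe under the Java memory model.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the MCS queue lock with a per-thread, reusable queue node.
class MCSLock {
    static final class QNode {
        volatile boolean locked = false;   // true while the owner must keep waiting
        volatile QNode next = null;        // successor in the queue, if any
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    // Assumption: one node per thread, created lazily and reused.
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode qnode = myNode.get();
        QNode pred = tail.getAndSet(qnode);        // add my node to the tail of the queue
        if (pred != null) {                        // queue was non-empty
            qnode.locked = true;
            pred.next = qnode;                     // link in behind the predecessor
            while (qnode.locked) { }               // spin on my own node until unlocked
        }
    }

    void unlock() {
        QNode qnode = myNode.get();
        if (qnode.next == null) {                  // missing successor?
            if (tail.compareAndSet(qnode, null)) {
                return;                            // really no successor
            }
            while (qnode.next == null) { }         // wait for the successor to catch up
        }
        qnode.next.locked = false;                 // pass the lock to the successor
        qnode.next = null;                         // reset so the node can be reused
    }
}
```

The window between a successor's `getAndSet` on `tail` and its write to `pred.next` is exactly why `unlock()` must spin when the compare-and-set fails.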

98 Art of Multiprocessor Programming98 Purple Release [Diagram labels: false, releasing, swap, false]

99 Art of Multiprocessor Programming99 Purple Release [Diagram labels: false, releasing, swap, false] By looking at the queue, I see another thread is active

100 Art of Multiprocessor Programming100 Purple Release [Diagram labels: false, releasing, swap, false] By looking at the queue, I see another thread is active I have to wait for that thread to finish

101 Art of Multiprocessor Programming101 Purple Release false releasing prepare to spin true

102 Art of Multiprocessor Programming102 Purple Release false releasing spinning true

103 Art of Multiprocessor Programming103 Purple Release false releasing spinning true false

104 Art of Multiprocessor Programming104 Purple Release false releasing true Acquired lock false

105 Art of Multiprocessor Programming105 This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. You are free: –to Share: to copy, distribute and transmit the work –to Remix: to adapt the work Under the following conditions: –Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). –Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to –http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.

