Spin Locks and Contention Management

Presentation transcript:

Spin Locks and Contention Management. Multiprocessor Synchronization, Nir Shavit, Spring 2003. Slides © 2003 Herlihy & Shavit.

Focus so far: Correctness. Models: accurate (we never lied to you) but idealized (so we forgot to mention a few things). Protocols: elegant and important, but naïve.

New focus: Performance. Models: more complicated (not the same as complex!), but still focused on principles (not soon obsolete). Protocols: elegant (in their fashion), important (why else would we pay attention), and realistic (your mileage may vary).

Kinds of Architectures. SISD (uniprocessor): single instruction stream, single data stream. SIMD (vector): single instruction, multiple data. MIMD (multiprocessors): multiple instruction, multiple data. MIMD is our space.

MIMD Architectures: shared-bus and distributed-memory designs. The issues: memory contention, communication contention, and communication latency.

Let's look again at mutual exclusion, this time using synchronization operations stronger than reads or writes, and on the way learn more about the hardware and about memory contention, communication contention, and communication latency.

Real World: What should a thread do if it can't get the lock? Keep trying ("spin", "busy-wait"): good if delays are short; this is our focus. Give up the processor: good if delays are long, and always good on a uniprocessor.

Basic Spin-Lock: threads P1 … Pn spin on a shared 0/1 lock before entering the critical section, and the lock is reset upon exit. The lock suffers from contention; let's try to understand this phenomenon.

Review: Test-and-Set
public class RMW extends Register {
  int value;
  public synchronized int TAS() {
    int result = value;   // remember old value
    value = 1;            // new value is 1
    return result;        // return old value
  }
}

Test-and-Set, atomically: returns the previous value and sets the current value to 1 (an atomic swap). Use the write method to reset.

Test-and-Set Locks. Lock is free: value is 0; lock is taken: value is 1. Acquire the lock by calling TAS: if the result is 0 you win, if the result is 1 you lose. Release the lock by writing 0.

TASLock
public class TASLock implements Lock {
  TASRegister lock = new TASRegister(0);
  public void acquire(int i) {
    while (lock.TAS() == 1) {}   // keep trying until lock acquired
  }
  public void release(int i) {
    lock.write(0);               // simple write to release
  }
}
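
For readers who want to try this outside the slides' TASRegister abstraction, here is a minimal runnable sketch using java.util.concurrent (the class name SimpleTASLock is mine; AtomicBoolean.getAndSet plays the role of TAS):

import java.util.concurrent.atomic.AtomicBoolean;

public class SimpleTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void acquire() {
        while (state.getAndSet(true)) {}   // keep trying until the lock is acquired
    }

    public void release() {
        state.set(false);                  // a simple write releases the lock
    }
}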

Performance Experiment: n threads increment a shared counter 1 million times in total. How long should it take? How long does it take?
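
A rough harness for this experiment might look like the sketch below (my own naming and structure, not the slides' benchmark; the lock is the AtomicBoolean-based TAS lock from above, inlined so the class stands alone):

import java.util.concurrent.atomic.AtomicBoolean;

public class CounterBenchmark {
    static final AtomicBoolean lock = new AtomicBoolean(false);
    static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        final int n = Integer.parseInt(args[0]);          // number of threads
        final long perThread = 1_000_000L / n;            // split the 1M increments
        Thread[] workers = new Thread[n];
        long start = System.nanoTime();
        for (int t = 0; t < n; t++) {
            workers[t] = new Thread(() -> {
                for (long i = 0; i < perThread; i++) {
                    while (lock.getAndSet(true)) {}       // TAS acquire
                    counter++;                            // critical section
                    lock.set(false);                      // release
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " threads: counter = " + counter + ", " + ms + " ms");
    }
}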

Graph: time vs. number of threads. The ideal curve is flat, since the total work is independent of the number of threads; any initial speedup comes from the loop overhead running in parallel.

Mystery #1: plotting the TAS lock against the ideal curve, the TAS lock's time grows with the number of threads. Houston, we have a problem …

Test-and-Test-and-Set Locks. Lurking stage: wait until the lock "looks" free, spinning while a read returns 1 (lock taken). Pouncing stage: as soon as the lock "looks" available (a read returns 0), call TAS to acquire it; if TAS loses, go back to lurking.

TTASLock
public class TTASLock implements Lock {
  TASRegister lock = new TASRegister(0);
  public void acquire(int i) {
    while (true) {
      while (lock.read() == 1) {}    // wait until lock looks free
      if (lock.TAS() == 0) return;   // then try to acquire it
    }
  }
}
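
The same lurk-then-pounce pattern in runnable form (again my naming, with AtomicBoolean standing in for the slides' register):

import java.util.concurrent.atomic.AtomicBoolean;

public class SimpleTTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void acquire() {
        while (true) {
            while (state.get()) {}               // lurk: spin on the cached value
            if (!state.getAndSet(true)) return;  // pounce: TAS, return if we win
        }
    }

    public void release() {
        state.set(false);
    }
}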

Mystery #2: graph of time vs. threads comparing the ideal curve, the TAS lock, and the TTAS lock.

Mystery: TAS and TTAS do the same thing (in our model), yet TTAS performs much better than TAS, and neither approaches the ideal.

Opinion: our memory abstraction is broken. The TAS and TTAS methods are provably the same (in our model), except they aren't (in field tests). We need a more detailed model …

Bus-Based Architectures: processors, each with a cache, connected by a shared bus to memory.

Random-access memory: 10s of cycles per access.

The shared bus is a broadcast medium: one broadcaster at a time, and processors and memory all "snoop" on it.

Per-processor caches: small, fast (1 or 2 cycles), holding address and state information.

Jargon Watch. Cache hit: "I found what I wanted in my cache" (Good Thing™). Cache miss: "I had to shlep all the way to memory for that data" (Bad Thing™).

Cave Canem: this model is still a simplification, but not in any essential way. It illustrates the basic principles; we will discuss complexities later.

A processor issues a load request ("Gimme data") on the bus.

Memory responds ("I got data") and the data is stored in the requesting processor's cache.

Another processor issues a load request for the same data.

The other processor, which already holds the data in its cache, responds instead of memory.

A processor modifies its cached copy of the data.

What's up with the other copies?

Cache Coherence: we have lots of copies of the data, the original in memory and cached copies at the processors. Some processor modifies its own copy. What do we do with the others? How do we avoid confusion? The generic version is a fundamental problem™ in Computer Science.

Fundamental Problem™: managing replicated data. It shows up in multiprocessor architecture, distributed file systems, distributed databases, and more.

Write-Through Cache: on every write, the cache broadcasts the new value on the bus ("Listen to me!") so that memory and the other caches stay up to date.

Write-Through Caches: immediately broadcast changes. Good: memory and the caches always agree, and maybe more read hits. Bad: bus traffic on all writes, and most writes are to unshared data (for example, loop indexes). These are show stoppers.

Write-Back Caches: accumulate changes in the cache and write back when needed, either because the cache slot is needed for something else or because another processor wants the data. On first modification, invalidate the other entries; this requires a non-trivial protocol …

Write-Back Caches: a cache entry has three states. Invalid: contains raw seething bits. Valid: I can read but I can't write. Dirty: the data has been modified; intercept other processors' load requests and write back to memory before reusing the entry.
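
As a toy illustration only (not the actual coherence protocol; the class and method names are mine), the three per-entry states and their transitions can be sketched like this:

enum EntryState { INVALID, VALID, DIRTY }

class CacheEntry {
    EntryState state = EntryState.INVALID;
    long value;

    long read(long valueFromMemory) {
        if (state == EntryState.INVALID) {       // miss: fetch from memory
            value = valueFromMemory;
            state = EntryState.VALID;
        }
        return value;                            // hit: fast, no bus traffic
    }

    void write(long newValue) {
        value = newValue;
        state = EntryState.DIRTY;                // accumulate the change locally
    }

    boolean evict() {                            // slot needed, or another processor asked
        boolean needsWriteBack = (state == EntryState.DIRTY);
        state = EntryState.INVALID;
        return needsWriteBack;                   // true: write the value back to memory first
    }
}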

Invalidate: the writing processor announces "Mine, all mine!" on the bus; the other caches holding the data think "Uh, oh".

Invalidate: the other caches lose read permission, while this cache acquires write permission.

Invalidate: memory provides the data only if it is not present in any cache, so there is no need to update it now (that would be expensive).

Another processor asks for the data.

The owner (the cache holding the dirty copy) responds: "Here it is!"

End of the day: the data now sits in memory and in both caches; reading is OK, but no writing.

Mutual Exclusion: what do we want to optimize? The bus bandwidth used by spinning threads, the release/acquire latency, and the acquire latency for an idle lock.

Simple TAS: each TAS invalidates cache lines, so the spinners miss in their caches and go to the bus, and the thread that wants to release the lock is delayed behind the spinners.

Test-and-test-and-set: wait until the lock "looks" free, spinning on the local cache, so there is no bus use while the lock is busy. Problem: when the lock is released, an invalidation storm …

Local spinning while the lock is busy: every spinner reads "busy" from its own cache, so there is no bus traffic.

On release: the releaser writes "free", invalidating the spinners' cached copies.

On release: everyone misses and rereads the lock word.

On release: everyone tries TAS.

Problems: everyone misses, and the reads are satisfied sequentially; everyone does TAS, invalidating the others' caches. The system eventually reaches quiescence after the lock is acquired. How long does this take?

Measuring Quiescence Time: let X be the time spent on operations that don't use the bus and Y the time spent on operations that cause intensive bus traffic. In the critical section, run the X operations and then the Y operations. As long as the quiescence time is less than X, there is no drop in performance; by gradually varying X, we can determine the exact time to quiesce.
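
One way to picture the experiment is the rough sketch below; it is illustrative rather than Anderson's actual methodology, and the thread count, iteration counts, and the choice of "bus-heavy" work are assumptions of mine:

import java.util.concurrent.atomic.AtomicBoolean;

public class QuiescenceSketch {
    static final AtomicBoolean lock = new AtomicBoolean(false);
    static final long[] shared = new long[1 << 22];       // assumed much larger than the caches
    static volatile long sink;                             // keeps localWork from being optimized away

    static long localWork(long x) {                        // "X" ops: stay in registers, no bus use
        long acc = 1;
        for (long i = 0; i < x; i++) acc = acc * 31 + i;
        return acc;
    }

    static void busWork(long y) {                          // "Y" ops: scattered writes, bus traffic
        for (long i = 0; i < y; i++) shared[(int) ((i * 64) % shared.length)]++;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 8;                                   // contending threads (assumption)
        final long y = 1_000;                              // fixed bus-heavy portion (assumption)
        for (long x = 1 << 20; x >= 1; x /= 2) {           // gradually shrink X
            final long xi = x;
            Thread[] ts = new Thread[n];
            long start = System.nanoTime();
            for (int t = 0; t < n; t++) {
                ts[t] = new Thread(() -> {
                    for (int rep = 0; rep < 100; rep++) {
                        while (lock.getAndSet(true)) {}    // spin lock
                        sink += localWork(xi);
                        busWork(y);
                        lock.set(false);
                    }
                });
                ts[t].start();
            }
            for (Thread th : ts) th.join();
            System.out.println("x=" + xi + ": " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}

Throughput first degrades roughly when the time spent in localWork falls below the post-release quiescence time.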

Quiescence Time: graph of time vs. threads; it increases linearly with the number of processors on a bus architecture.

Mystery Explained: the time vs. threads graph (ideal, TAS lock, TTAS lock) reflects exactly this bus traffic.

Solution: Introduce Delay. Where the delay is inserted: after a lock release, or after every lock reference. How the delay size is set: statically or dynamically.

Static Example: Slotted Delays. Split time into slots (d, 2d, 3d, …); each process delays by an amount that places it in a predetermined slot.

Dynamic Example: Exponential Backoff. If I fail to get the lock, I wait a random duration before retrying, and each subsequent failure doubles the expected wait (d, 2d, 4d, …).

Delay Strategies. Where the delay is inserted: after a lock release, or after every lock reference. How the size is set: statically or dynamically. Exponential backoff is usually the most effective.

Exponential Backoff Lock
public class Backoff implements Lock {
  public void acquire() {
    int delay = MIN_DELAY;                         // fix minimum delay
    while (true) {
      while (lock.read() == 1) {}                  // wait until lock looks free
      if (lock.TAS() == 0) return;                 // if we win, return
      sleep(random() % delay);                     // back off for random duration
      if (delay < MAX_DELAY) delay = 2 * delay;    // double max delay, within reason
    }
  }
}
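
A runnable counterpart in plain Java (the class name and the MIN/MAX constants are illustrative choices of mine; AtomicBoolean and ThreadLocalRandom stand in for the slides' primitives):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;

public class BackoffLock {
    private static final int MIN_DELAY_MS = 1;     // illustrative constants
    private static final int MAX_DELAY_MS = 64;
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void acquire() throws InterruptedException {
        int delay = MIN_DELAY_MS;
        while (true) {
            while (state.get()) {}                           // lurk: spin on the cached value
            if (!state.getAndSet(true)) return;              // pounce: TAS, return if we win
            Thread.sleep(ThreadLocalRandom.current().nextInt(delay) + 1);  // random backoff
            if (delay < MAX_DELAY_MS) delay *= 2;            // double the cap, within reason
        }
    }

    public void release() {
        state.set(false);
    }
}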

Spin-Waiting Overhead: graph comparing TTAS against exponential backoff.

Can We Improve On This? Optimize the "slot" size before trying to enter the CS and avoid useless invalidations. How? By keeping a queue of threads: each thread notifies the next in line without bothering the others.

Anderson's Queue Lock
class ALock implements Lock {
  int[] flags = {ENTER, WAIT, …, WAIT};   // size n
  int[] myslot = new int[n];              // initially all 0
  RMW tail = new RMW(0);

Anderson's Queue Lock
  public void acquire() {
    myslot[i] = tail.fetchInc();               // thread i gets a slot
    while (flags[myslot[i] % n] == WAIT) {}    // wait while busy
    flags[myslot[i] % n] = WAIT;               // reset the slot for the next round
  }
  public void release() {
    flags[(myslot[i] + 1) % n] = ENTER;        // release the next thread
  }
}
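
A self-contained runnable sketch of the same idea (my naming; the padding factor is an illustrative guess at keeping flags on separate cache lines, and unlike the slide variant it resets the slot in release rather than in acquire):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class ArrayQueueLock {
    private static final int PAD = 16;                   // spread slots across cache lines
    private final int capacity;                          // max threads using the lock
    private final AtomicInteger tail = new AtomicInteger(0);
    private final AtomicIntegerArray flags;              // 1 = ENTER, 0 = WAIT
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    public ArrayQueueLock(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity * PAD);
        flags.set(0, 1);                                 // slot 0 starts as ENTER
    }

    public void acquire() {
        int slot = Math.floorMod(tail.getAndIncrement(), capacity) * PAD;  // get my slot
        mySlot.set(slot);
        while (flags.get(slot) == 0) {}                  // spin on my own slot only
    }

    public void release() {
        int slot = mySlot.get();
        flags.set(slot, 0);                              // reset my slot for the next round
        flags.set((slot + PAD) % (capacity * PAD), 1);   // hand the lock to the next slot
    }
}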

Performance: graph comparing TTAS against the queue lock; the queue lock's curve is practically flat, giving scalable performance.

Observations: we need to allocate a size-n array no matter how many threads actually access the lock. And what about cache-line granularity: might a processor read the whole array into its cache? Can we do better?

CLH Lock: FIFO order, with a small, constant-size overhead per thread.

CLH Queue Lock: swap your node into the queue tail and wait for a "0" (released) in your predecessor's node; on release, write the "0" into your own node.

CLH Queue Lock
class Qnode {
  boolean locked = true;   // "I have not released yet"
}

CLH Queue Lock
class CLHLock implements Lock {
  RMW queue;
  public void acquire(Qnode mynode) {
    /* mynode.locked = true; */
    Qnode pred = queue.swap(mynode);   // get pred and point the tail to me
    while (pred.locked) {}             // wait until unlocked
  }
}

CLH Queue Lock
class CLHLock implements Lock {
  RMW queue;
  public void release(Qnode mynode) {
    mynode.locked = false;   // notify successor
  }
}
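
For completeness, the standard runnable form of the CLH lock (AtomicReference as the tail and ThreadLocal node management; these details go beyond what the slides show and follow the usual textbook idiom):

import java.util.concurrent.atomic.AtomicReference;

public class CLHQueueLock {
    static class QNode { volatile boolean locked = false; }

    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    public void acquire() {
        QNode node = myNode.get();
        node.locked = true;                     // "I have not released yet"
        QNode pred = tail.getAndSet(node);      // swap into the queue tail
        myPred.set(pred);
        while (pred.locked) {}                  // spin on the predecessor's node
    }

    public void release() {
        QNode node = myNode.get();
        node.locked = false;                    // notify the successor
        myNode.set(myPred.get());               // recycle the predecessor's node
    }
}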

Initially: the queue tail points to a single idle node with locked = false.

Purple acquires the lock: it swaps its locked = true node into the tail, sees false in its predecessor's node, and enters the CS.

Red wants the lock: it swaps its own locked = true node into the tail and spins on Purple's node until Purple releases; then Red enters the CS.

NUMA Machines: distributed shared-memory machines with Non-Uniform Memory Access. Shared local memory is fast; shared remote memory is slow.

MCS Lock: on a NUMA machine without caches, CLH is problematic because each thread spins on a remote location. The MCS queue lock gives FIFO order, small constant-size overhead, and local spinning!

MCS Queue Lock: swap into the tail of the list and wait for a "0" in your own local node.

MCS Queue Lock
class Qnode {
  boolean locked = false;
  Qnode next = null;
}

MCS Queue Lock
class MCSLock implements Lock {
  RMW queue;
  public void acquire(Qnode mynode) {
    Qnode pred = queue.swap(mynode);   // point the tail to my qnode
    if (pred != null) {
      mynode.locked = true;
      pred.next = mynode;              // point pred to my node
      while (mynode.locked) {}         // wait until unlocked
    }
  }
}

Purple acquires the lock: the tail points to its node (locked = false, idle).

Red wants the lock: it allocates a qnode with locked = true, swaps it into the tail, links Purple's node to it, and spins on its own node.

MCS Queue Lock
class MCSLock implements Lock {
  RMW queue;
  public void release(Qnode mynode) {
    if (mynode.next == null) {               // no successor visible?
      if (queue.CAS(mynode, null)) return;   // tail is still me: the queue is empty
      while (mynode.next == null) {}         // otherwise wait for the successor to link in
    }
    mynode.next.locked = false;              // notify successor
  }
}
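
And the standard runnable form of the MCS lock (again, the ThreadLocal node management and naming follow the usual textbook idiom rather than anything shown on the slides):

import java.util.concurrent.atomic.AtomicReference;

public class MCSQueueLock {
    static class QNode {
        volatile boolean locked = false;
        volatile QNode next = null;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    public void acquire() {
        QNode node = myNode.get();
        QNode pred = tail.getAndSet(node);          // point the tail to my node
        if (pred != null) {
            node.locked = true;
            pred.next = node;                       // link myself behind my predecessor
            while (node.locked) {}                  // spin locally on my own node
        }
    }

    public void release() {
        QNode node = myNode.get();
        if (node.next == null) {                    // no successor visible?
            if (tail.compareAndSet(node, null)) return;  // queue is empty: done
            while (node.next == null) {}            // wait for the successor to link in
        }
        node.next.locked = false;                   // hand the lock to the successor
        node.next = null;                           // clean up for reuse
    }
}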

Purple releases: its next field is still null, so it tries a CAS on the tail; by looking at the queue it sees another thread is active (the CAS fails), so it has to wait for that thread to finish linking in.

Purple releases: once the successor has linked itself in, Purple clears the successor's locked flag and the successor enters the CS.

Performance on a NUMA machine without coherence (c. 1982): graph comparing test&test&set, exponential backoff, the A-Lock, and MCS.

How different are modern machines? Graph comparing TAS with backoff against MCS at 16, 32, and 48 processors on a Sun Wildfire (c. 1998); experiments courtesy of Scott.

Contention Eliminated: we reduced contention by slotting thread access to the lock over time. We saw that queue locks provide very tight slotting and limit invalidation traffic, thus lowering contention with minimal latency.

Java Synchronization. synchronized (exp) { … actions … }: lock an object. wait: release the lock and suspend the thread. notify, notifyAll: wake one or all waiting threads to resume execution where they were suspended.
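
A minimal example of this monitor pattern in use (a one-slot buffer of my own devising, not from the slides):

public class OneSlotBuffer<T> {
    private T item = null;

    public synchronized void put(T x) throws InterruptedException {
        while (item != null) {
            wait();                 // release the lock and suspend until notified
        }
        item = x;
        notifyAll();                // wake any waiting consumers
    }

    public synchronized T take() throws InterruptedException {
        while (item == null) {
            wait();
        }
        T x = item;
        item = null;
        notifyAll();                // wake any waiting producers
        return x;
    }
}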

Locks in Java are frequent (benchmark: 765,000/second) and ubiquitous (every object has a potential lock). Space overhead? Potentially huge, but actually small (6% in javac).

Paradox? Frequency requires time efficiency; ubiquity requires space efficiency.

Solution: create the lock only when needed, with a fast path for the common case. The meta lock: 2 bits in the object header, with local spinning only.

Java Synchronization: Java is compiled to bytecode, which must respect block structure and must deal with exceptions. Nested locks are OK, so locks need to count.

Jargon Watch. Monitor lock: protects the object. Meta lock: protects the monitor lock. Modus operandi: acquire the meta lock, manipulate the monitor lock, release the meta lock.

Java Objects: the object header consists of a class pointer and a multi-use word, followed by the user-defined fields.

Meta-Lock: the multi-use word holds 2 bits for the meta lock and 30 bits of other stuff.

Meta-Lock: 2 bits = 4 states: neutral, locked, waiters, and busy.

Neutral State: the multi-use word holds the hash code and age. This is the usual state: nothing happening.

Locked State: the multi-use word holds a pointer to the lock records; the object is monitor-locked.

Lock Record fields: the owner thread, the lock count, the hash and age (displaced from the header), the next lock record in the queue, and a free list for unused records.
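
Purely as an illustration of the fields just listed (the names and types are mine, not the JVM's actual layout):

class LockRecord {
    Thread owner;          // owning thread
    int lockCount;         // recursion count for nested locking
    int displacedHashAge;  // hash code and age displaced from the object header
    LockRecord next;       // next lock record in the queue
    LockRecord freeNext;   // link in the free list of unused records
}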

Waiters State: the multi-use word still points to the lock records; the monitor lock has been released, but other threads are waiting to get in.

Busy State: the multi-use word holds a pointer to the owner's execution environment; the metalock itself is locked.

Acquire Meta-Lock
BitField getMetaLock(ExecEnv *ee, Object *obj) {
  BitField busyBits = ee | BUSY;                               // prepare new value
  BitField lockBits = SWAP(busyBits, multiUseWordAddr(obj));   // swap it in
  if (getLockState(lockBits) != BUSY)
    return lockBits;                        // return if not already locked
  else
    return getMetaLockSlow(ee, lockBits);   // otherwise take the slow path
}

Slow Path Acquire: the first thread knows it's first because it didn't see BUSY bits; later threads know their predecessor from the result of the SWAP.

Release Meta-Lock
BitField releaseMetaLock(ExecEnv *ee, Object *obj, BitField releaseBits) {
  BitField busyBits = ee | BUSY;                                          // the value we expect
  BitField lockBits = CAS(releaseBits, busyBits, multiUseWordAddr(obj));  // try to replace it (this CAS returns the old value)
  if (lockBits != busyBits)
    releaseMetaLockSlow(ee, lockBits);      // take the slow path if unsuccessful
}

Release Slow Path: hand off the metalock to the next waiting thread, synchronizing via the successor's environment structure …

Locking Objects, common cases: the object is neutral, has waiters, or is recursively locked; no thread interaction is needed.

Locking Objects, contended case: the thread suspends on a mutex/condition variable, releasing the processor until the condition is signalled; this is not a spin lock. When it awakes it takes the slow path: if the object is still locked it goes back to sleep; if unlocked, it updates the object and goes for it.

Unlocking Objects, common cases: a recursive lock, or no other threads; no thread interaction is needed.

Unlocking an Object, contended case: obtain the metalock, remove your own lock record (shorter queue, Waiters state), wake up the successor, and release the metalock.

Wait: acquire the metalock, set the isWaitingForNotify field in the execution environment, release the metalock, and wait for the bit to be set. This is not a busy wait, and it can time out.

Notify: acquire the metalock, walk through the queue (notify wakes the first waiting thread, notifyAll wakes all waiting threads), then release the metalock.

Principles: create the lock only when needed, provide a fast path vs. a slow path, and optimize the common case. Locking… not so easy after all.