Inferring Locks for Atomic Sections
Sigmund Cherem, Cornell University (summer intern at Microsoft Research)
Trishul Chilimbi, Microsoft Research
Sumit Gulwani, Microsoft Research

Inferring Locks for Atomic Sections | Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani

What Is This Talk About?
- Multi-cores are widely available
- Developing concurrent software is not trivial
  - Many challenges: parallelization, synchronization, isolation
  - Manual locking is error-prone and non-compositional
- Recent proposal: atomic sections
  - Raise the level of abstraction and are compositional
  - Optimistic (transactional) implementations [Herlihy, Moss ISCA 93; Hammond et al. ISCA 04] [Shavit, Touitou PODC 95; Dice et al. DISC 06; Fraser, Harris TOPLAS 07]
  - Limitations: non-reversible ops, overhead
- This talk: compiler support for atomic sections via pessimistic concurrency

Static Lock Inference Framework
- Compiler support for atomic sections based on pessimistic concurrency
  - Prevent conflicts using locks, no deadlocks
  - Goal: reduce contention while avoiding deadlocks
- Input to the Lock Inference Compiler: concurrent program with atomic sections (runs on STM)
  - Specifies where, but not how
- Output: same program with locks for implementing atomic sections
  - Lightweight runtime support (locking library)
  - Automatically supports non-reversible ops.

Moving List Elements
    move (list* to, list* from) {
        atomic {
            elem* x = to->head;
            elem* y = from->head;
            from->head = null;
            …
            while (x->next != null) {
                x = x->next;
            }
            x->next = y;
        }
    }
[Diagram: the lists to and from with their head pointers, and the pointers x and y]

Attempt 1: Global Lock
    move (list* to, list* from) {
        acquire( GLOBAL );
        elem* x = to->head;
        elem* y = from->head;
        from->head = null;
        …
        while (x->next != null) {
            x = x->next;
        }
        x->next = y;
        release( GLOBAL );
    }
- Global lock protects entire memory
- Problem with Attempt 1: No parallelism with any other atomic sections

Attempt 2: Fine-Grain Locks
    move (list* to, list* from) {
        acquire( &(to->head) );
        elem* x = to->head;
        acquire( &(from->head) );
        elem* y = from->head;
        from->head = null;
        …
        acquire( &(x->next) );
        while (x->next != null) {
            x = x->next;
            acquire( &(x->next) );
        }
        x->next = y;
        releaseAll();
    }
- A fine-grain lock protects an individual memory address

Attempt 2: Fine-Grain Locks (continued)
- Locks acquired inside move: acquire( &(to->head) ); acquire( &(from->head) ); acquire( &(x->next) );
- Problem with Attempt 2: may lead to deadlock
    move(a, b)            | move(b, a)
    acq(&(a->head));      |
                          | acq(&(b->head));
                          | acq(&(a->head));  // deadlock here
- move(b, a) waits for &(a->head), which move(a, b) holds, while move(a, b) goes on to request &(b->head), which move(b, a) holds

Attempt 3: Fine-Grain Locks at Entry
    move (list* to, list* from) {
        acquireAll({ });
        elem* x = to->head;
        elem* y = from->head;
        from->head = null;
        …
        while (x->next != null) {
            x = x->next;
        }
        x->next = y;
        releaseAll();
    }
- Acquisitions to hoist into acquireAll: acquire( &(to->head) ); acquire( &(from->head) ); acquire( &(x->next) );
- Challenge #1: Protect locations ahead of time (at entry of atomic), i.e., find which addresses will be used inside atomic
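The slide leaves acquireAll and releaseAll to the locking library. The following is a minimal sketch in C with POSIX mutexes of how such a pair might look, assuming deadlock is avoided by locking in one fixed global order (ascending lock-table slot). That ordering, the hashed lock table, and every name here (lock_table, NLOCKS, slot_of, lock_table_init, acquire_all, release_all) are assumptions for illustration, not the authors' runtime.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NLOCKS 64  /* hypothetical size of the lock table */

static pthread_mutex_t lock_table[NLOCKS];

/* Call once, before any thread uses the table. */
void lock_table_init(void) {
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&lock_table[i], NULL);
}

/* Assumption: addresses are hashed to slots, so two addresses may share a lock. */
static size_t slot_of(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) % NLOCKS);
}

static int cmp_slot(const void *a, const void *b) {
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Lock the slots covering addrs[0..n-1] in ascending slot order.
   Stores the distinct slots in held[] (capacity n) and returns their count. */
size_t acquire_all(void *const addrs[], size_t n, size_t held[]) {
    for (size_t i = 0; i < n; i++)
        held[i] = slot_of(addrs[i]);
    qsort(held, n, sizeof held[0], cmp_slot);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count > 0 && held[count - 1] == held[i])
            continue;                       /* slot already locked */
        held[count] = held[i];
        pthread_mutex_lock(&lock_table[held[count]]);
        count++;
    }
    return count;
}

/* Unlock everything recorded by acquire_all, in reverse order. */
void release_all(const size_t held[], size_t count) {
    while (count > 0)
        pthread_mutex_unlock(&lock_table[held[--count]]);
}
```

Because every thread locks slots in the same ascending order, move(a, b) and move(b, a) can no longer wait on each other in a cycle. The price of the hashed table is that unrelated addresses sometimes share a mutex.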

Protect when Entering Atomic Block
- Find corresponding expressions: acquire a lock for each shared location accessed within the atomic section, expressed in terms of expressions valid at the entry of the atomic block
    atomic {
        list* x = y[5];
        list* d = x;
        d->head = NULL;
    }
- Working backward from the store: acquire( &(d->head) ) becomes acquire( &(x->head) ), which becomes acquire( &(y[5]->head) ) at entry
- Contribution #1: Identifying appropriate fine-grain locks at entry (via inter-procedural backward data-flow analysis)
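The rewriting of d->head into y[5]->head can be illustrated with a toy backward pass over straight-line code. This sketch assumes assignments of the form variable = expression, stored as strings, and it ignores heap stores, aliasing, branches, loops, and calls, all of which the real inter-procedural analysis must handle. The names assign, substitute, and lock_expr_at_entry are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *lhs; const char *rhs; } assign;  /* lhs = rhs; */

static int is_ident(char c) { return isalnum((unsigned char)c) || c == '_'; }

/* Copy expr into out, replacing each use of variable var by repl.
   Field names (identifiers right after "->" or ".") are left alone. */
static void substitute(const char *expr, const char *var, const char *repl,
                       char *out, size_t cap) {
    size_t vlen = strlen(var), o = 0;
    for (size_t i = 0; expr[i] != '\0' && o + 1 < cap; ) {
        int starts_ident = is_ident(expr[i]) && (i == 0 || !is_ident(expr[i - 1]));
        int is_field = i > 0 && (expr[i - 1] == '>' || expr[i - 1] == '.');
        if (starts_ident && !is_field && strncmp(expr + i, var, vlen) == 0 &&
            !is_ident(expr[i + vlen])) {
            o += (size_t)snprintf(out + o, cap - o, "%s", repl);
            if (o >= cap) o = cap - 1;      /* replacement was truncated */
            i += vlen;
        } else {
            out[o++] = expr[i++];
        }
    }
    out[o] = '\0';
}

/* Walk the assignments from last to first, rewriting access so that it only
   mentions names that are valid before the first assignment runs. */
void lock_expr_at_entry(const assign *stmts, size_t n, const char *access,
                        char *out, size_t cap) {
    char cur[128], next[128];
    snprintf(cur, sizeof cur, "%s", access);
    for (size_t i = n; i-- > 0; ) {
        substitute(cur, stmts[i].lhs, stmts[i].rhs, next, sizeof next);
        memcpy(cur, next, sizeof cur);
    }
    snprintf(out, cap, "%s", cur);
}
```

For the slide's block, passing the assignments {"x", "y[5]"} and {"d", "x"} with the access "d->head" yields "y[5]->head", so the lock to take at entry is &(y[5]->head).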

Attempt 3: Fine-Grain Locks at Entry
    move (list* to, list* from) {
        acquireAll({ &(to->head), &(from->head), &(to->head->next) });
        elem* x = to->head;
        elem* y = from->head;
        from->head = null;
        …
        while (x->next != null) {
            x = x->next;
        }
        x->next = y;
        releaseAll();
    }
- Problem with Attempt 3: Can't protect an unbounded number of locations

Attempt 4: Multi-Grain Locks at Entry
    move (list* to, list* from) {
        acquireAll({ });
        elem* x = to->head;
        elem* y = from->head;
        from->head = null;
        …
        while (x->next != null) {
            x = x->next;
        }
        x->next = y;
        releaseAll();
    }
- A coarse-grain lock protects a set of memory locations
- Challenge #2: Mixing locks of multiple granularities while avoiding deadlocks

Defining Multi-Grain Locks
- A fine-grain lock protects a single location
- A coarse-grain lock protects a set of locations
- Any traditional heap abstraction can be used to define coarse-grain locks
  - E.g. types, points-to sets, shape abstractions
- Our compiler framework is parameterized
  - Clients can specify the kind of locks they want to use

Mixing Locks of Multiple Granularities
- Borrow databases' locking protocol based on intention locks [Gray 76]
- Lock hierarchy, top to bottom: Global lock, Coarse-grain locks, Fine-grain locks, Memory locations
- [Diagram annotation: Can't be held concurrently]
- Contribution #2: We allow mixing locks of multiple granularities and avoid deadlocks
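As background on intention locks, the classic multi-granularity protocol from the database literature uses five modes: IS (intention shared), IX (intention exclusive), S (shared), SIX (shared plus intention exclusive), and X (exclusive). The sketch below encodes the standard compatibility table and the rule for which mode to take on ancestors before locking a node. The slides do not say which of these modes the framework actually uses, so this is an illustration of the borrowed protocol rather than the compiler's runtime; the identifiers are hypothetical.

```c
#include <stdbool.h>

typedef enum { MODE_IS, MODE_IX, MODE_S, MODE_SIX, MODE_X } lock_mode;

/* compat[held][requested]: may two different threads hold these modes
   on the same lock at the same time? */
static const bool compat[5][5] = {
    /*            IS     IX     S      SIX    X     */
    /* IS  */ { true,  true,  true,  true,  false },
    /* IX  */ { true,  true,  false, false, false },
    /* S   */ { true,  false, true,  false, false },
    /* SIX */ { true,  false, false, false, false },
    /* X   */ { false, false, false, false, false },
};

bool compatible(lock_mode held, lock_mode requested) {
    return compat[held][requested];
}

/* Mode to take on every ancestor (e.g. the coarse-grain and global locks)
   before taking mode m on a node lower in the hierarchy. */
lock_mode ancestor_mode(lock_mode m) {
    return (m == MODE_IS || m == MODE_S) ? MODE_IS : MODE_IX;
}
```

Two threads that write different fine-grain locations under the same coarse-grain lock each hold IX on the coarse lock, and compatible(MODE_IX, MODE_IX) is true, so they run in parallel. A thread that holds the coarse lock itself in X mode excludes both.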

Soundness Results
- Sound locking structure provided
  - Protected by child is also protected by parent
  - Map of expressions to locks
  - Bounded (for termination)
- Example hierarchy: &(to->head->next), &(to->head->next->next), … are children of &(*->next), which is a child of *
- Soundness Theorem: Compiler chooses set of locks protecting all memory accesses within atomic block
- Contribution #3: Framework is sound (for any sound lock structure instantiation)
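A bounded map from expressions to locks might look like the following sketch, which keeps an access path as its own fine-grain lock while it is short and otherwise falls back to the coarse lock for its last field, mirroring the slide's &(to->head->next->next) under &(*->next). The bound MAX_DEREFS, the string encoding of locks, and the function names are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEREFS 2  /* hypothetical bound on expression size */

/* Count "->" occurrences in an access path such as "to->head->next". */
static int count_derefs(const char *path) {
    int n = 0;
    for (const char *p = strstr(path, "->"); p != NULL; p = strstr(p + 2, "->"))
        n++;
    return n;
}

/* Choose a lock name for path: the path itself while it is short enough,
   otherwise the coarse lock covering every location with the same last field. */
void lock_for(const char *path, char *out, size_t cap) {
    if (count_derefs(path) <= MAX_DEREFS) {
        snprintf(out, cap, "&(%s)", path);
        return;
    }
    const char *last = path, *p = path;
    while ((p = strstr(p, "->")) != NULL) {
        p += 2;
        last = p;                 /* text after the most recent "->" */
    }
    snprintf(out, cap, "&(*->%s)", last);
}
```

With this bound, "to->head->next" maps to &(to->head->next) and "to->head->next->next" maps to &(*->next). Since only finitely many paths fit under the bound, an analysis that keeps generating longer paths for a loop still ends up with a finite set of locks.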

Experimental Evaluation
- Lock structure instance: 3-level locks + effects
  - Global lock
  - Points-to set locks [Steensgaard 96]
  - Expression locks (limited in size)
  - Each lock carries an effect: rw or ro
- Experiments
  - Concurrent data structures: rb-tree, hashtable
  - Concurrent get (read-only), put, and remove operations
  - 1.86 GHz Intel Xeon dual quad-core machine

Scalability Results
[Plot: execution time (sec) vs. number of threads for Global lock, TL2 STM [Dice et al. DISC 06], Only coarse-grain locks, and Coarse + fine-grain locks]

TH (rb-tree + hash w/rehash): 80% gets
[Plot: execution time (sec) vs. number of threads for Global lock, TL2 STM [Dice et al. DISC 06], Only coarse-grain locks, and Coarse + fine-grain locks]
- Compiler didn't use fine-grain locks
- Scalability comparable to STM
- Global lock (exclusive) doesn't scale

TH (rb-tree + hash w/rehash): 80% puts
[Plot: execution time (sec) vs. number of threads for Global lock, TL2 STM [Dice et al. DISC 06], Only coarse-grain locks, and Coarse + fine-grain locks]
- 2 coarse-grain (exclusive) locks are better than a single global lock
- High contention from re-hashing degrades STM performance

simple-hashtable: 80% gets
[Plot: execution time (sec) vs. number of threads for Global lock, TL2 STM [Dice et al. DISC 06], Only coarse-grain locks, and Coarse + fine-grain locks]
- Compiler didn't use fine-grain locks for gets
- STM allows put and get concurrently

simple-hashtable: 80% puts
[Plot: execution time (sec) vs. number of threads for Global lock, TL2 STM [Dice et al. DISC 06], Only coarse-grain locks, and Coarse + fine-grain locks]
- Compiler uses fine-grain locks for puts

Differences with Recent Work
- No programmer annotations (other than atomic)
  - Autolocker [McCloskey et al. POPL 06] requires programmer annotations to choose appropriate granularity
- Moving fine-grain lock acquisitions to entry of atomic
  - Acquiring fine-grain locks right before first use [Hindman, Grossman MSPC 06] is not fully pessimistic: it may generate deadlocks and need rollbacks
- Multi-grain locks without deadlocks
  - Several pessimistic approaches use coarse-grained locks only [Hicks et al. 06; Halpert et al. 07; Emmi et al. 07]

Conclusions and Future Work
- Lock inference framework for atomic sections
  - Multi-grain locks to reduce contention and avoid deadlocks
  - Soundness: accesses are protected, atomicity preserved
  - Validation: resulting performance depends on application
  - Locks preferable for non-reversible ops. or high contention
- Future directions
  - Better locking hierarchy instantiations (e.g. ownership)
  - Optimizations (e.g. delay lock acquisitions)
  - Hybrid systems (e.g. compiler support to optimize STMs)
?