Inferring Locks for Atomic Sections
Sigmund Cherem (Cornell University; summer intern at Microsoft Research), Trishul Chilimbi and Sumit Gulwani (Microsoft Research)

Presentation transcript:


Inferring Locks for Atomic Sections | Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani

What Is This Talk About?
Multi-core machines are widely available, but developing concurrent software is not trivial. It raises many challenges: parallelization, synchronization, and isolation. Manual locking is error-prone and non-compositional.
A recent proposal is atomic sections, which raise the level of abstraction and are compositional. Existing implementations are optimistic (transactions), in hardware [Herlihy, Moss ISCA 93; Hammond et al. ISCA 04] and in software [Shavit, Touitou PODC 95; Dice et al. DISC 06; Fraser, Harris TOPLAS 07]. Their limitations are non-reversible operations and runtime overhead.
This talk: compiler support for atomic sections via pessimistic concurrency.

Static Lock Inference Framework
The framework provides compiler support for atomic sections based on pessimistic concurrency. It prevents conflicts using locks, without deadlocks. The goal is to reduce contention while avoiding deadlocks.
The lock inference compiler takes a concurrent program with atomic sections (which could run on an STM) and produces the same program with locks that implement the atomic sections. An atomic section specifies where atomicity is needed, but not how to achieve it.
Only lightweight runtime support (a locking library) is needed, and non-reversible operations are supported automatically.

Moving List Elements
  move (list* to, list* from) {
    atomic {
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      …
      while (x->next != null) {
        x = x->next;
      }
      x->next = y;
    }
  }
(Diagram: lists to and from with their head fields, and pointers x and y.)
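The following is a minimal single-threaded C sketch of move for readers who want to run the slide's example. The elem and list definitions are hypothetical, because the slides do not show them. The atomic block appears only as a comment, since C has no such construct. The check for an empty to list is an addition; the slide's version assumes to is non-empty.

```c
#include <stddef.h>

/* Hypothetical list types; the slides do not show their definitions. */
typedef struct elem {
    int value;
    struct elem *next;
} elem;

typedef struct list {
    elem *head;
} list;

/* Append every element of `from` to the end of `to`, leaving `from` empty.
   In the talk this body sits inside `atomic { ... }`; plain C has no such
   construct, so this version is only correct when run by a single thread. */
void move(list *to, list *from) {
    elem *x = to->head;
    elem *y = from->head;
    from->head = NULL;
    if (x == NULL) {             /* empty `to`: guard not shown on the slide */
        to->head = y;
        return;
    }
    while (x->next != NULL) {    /* walk to the last element of `to` */
        x = x->next;
    }
    x->next = y;                 /* splice `from`'s elements after it */
}
```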


Attempt 1: Global Lock
A global lock protects the entire memory:
  move (list* to, list* from) {
    acquire( GLOBAL );
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    …
    while (x->next != null) {
      x = x->next;
    }
    x->next = y;
    release( GLOBAL );
  }
Problem with Attempt 1: no parallelism with any other atomic section.
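Here is a minimal sketch of Attempt 1 with POSIX threads. It assumes a single pthread_mutex_t plays the role of GLOBAL; the name global_lock is hypothetical, as are the list types. Every call serializes on that one mutex, so atomic sections cannot run in parallel. Compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct elem { int value; struct elem *next; } elem;  /* hypothetical */
typedef struct list { elem *head; } list;                     /* hypothetical */

/* One lock standing for all of memory: every atomic section takes it. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

void move_global(list *to, list *from) {
    pthread_mutex_lock(&global_lock);     /* acquire( GLOBAL ) */
    elem *x = to->head;
    elem *y = from->head;
    from->head = NULL;
    if (x == NULL) {
        to->head = y;                     /* empty `to`: guard not on the slide */
    } else {
        while (x->next != NULL)
            x = x->next;
        x->next = y;
    }
    pthread_mutex_unlock(&global_lock);   /* release( GLOBAL ) */
}
```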

Attempt 2: Fine-Grain Locks
A fine-grain lock protects an individual memory address. Each location is locked before it is accessed, and all locks are released at the end:
  move (list* to, list* from) {
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    …
    while (x->next != null) {
      x = x->next;
    }
    x->next = y;
    releaseAll();
  }
Acquisitions inserted into the body: acquire( &(from->head) ); … acquire( &(to->head) ); acquire( &(x->next) );
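The slides do not say how the runtime finds the lock for a given address. One common technique, assumed here and not taken from the talk, is lock striping: a fixed table of mutexes indexed by a hash of the address. The acquire and release functions and the NUM_STRIPES size below are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STRIPES 1024  /* hypothetical table size; must be a power of two */

static pthread_mutex_t stripes[NUM_STRIPES];
static pthread_once_t stripes_once = PTHREAD_ONCE_INIT;

static void init_stripes(void) {
    for (size_t i = 0; i < NUM_STRIPES; i++)
        pthread_mutex_init(&stripes[i], NULL);
}

/* Map an address to a stripe; drop the low 3 bits so adjacent words differ. */
size_t stripe_index(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return (size_t)((a >> 3) & (NUM_STRIPES - 1));
}

/* Lock the stripe covering `addr`, e.g. acquire(&(from->head)). */
void acquire(const void *addr) {
    pthread_once(&stripes_once, init_stripes);
    pthread_mutex_lock(&stripes[stripe_index(addr)]);
}

void release(const void *addr) {
    pthread_mutex_unlock(&stripes[stripe_index(addr)]);
}
```

Two addresses can share a stripe. That is safe, because the lock simply covers more memory, but it adds false contention. These mutexes are not recursive, so a real runtime would also have to avoid acquiring the same stripe twice within one atomic section.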

Problem with Attempt 2: it may lead to deadlock. Suppose move(a, b) and move(b, a) run concurrently. One thread can hold &(a->head) while waiting for &(b->head), and the other can hold &(b->head) while waiting for &(a->head), so neither can proceed.

Attempt 3: Fine-Grain Locks at Entry
The separate acquisitions are replaced by a single acquireAll at the entry of the atomic section:
  move (list* to, list* from) {
    acquireAll({ &(from->head), &(to->head), &(x->next) });
    elem* x = to->head;
    elem* y = from->head;
    from->head = null;
    …
    while (x->next != null) {
      x = x->next;
    }
    x->next = y;
    releaseAll();
  }
However, &(x->next) refers to x, which is not yet defined at entry.
Challenge #1: protect locations ahead of time (at the entry of the atomic section), i.e., find which addresses will be used inside the atomic section.
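Acquiring everything at entry removes the deadlock of Attempt 2 only if every thread acquires its locks in one consistent order. The slides do not show how acquireAll does this. A standard way, assumed here, is to sort the locks by address and drop duplicates before locking. The names acquire_all and release_all are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Order mutex pointers by address: the global acquisition order. */
static int cmp_lock(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
    uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;
    return (x > y) - (x < y);
}

/* Sort `locks` in place, drop duplicates, then lock in ascending address
   order. Returns the number of distinct locks held (pass it to release_all). */
size_t acquire_all(pthread_mutex_t **locks, size_t n) {
    if (n == 0)
        return 0;
    qsort(locks, n, sizeof *locks, cmp_lock);
    size_t m = 1;
    for (size_t i = 1; i < n; i++)
        if (locks[i] != locks[m - 1])
            locks[m++] = locks[i];
    for (size_t i = 0; i < m; i++)
        pthread_mutex_lock(locks[i]);
    return m;
}

/* Release the first m locks in reverse order of acquisition. */
void release_all(pthread_mutex_t **locks, size_t m) {
    for (size_t i = m; i > 0; i--)
        pthread_mutex_unlock(locks[i - 1]);
}
```

With this ordering, move(a, b) and move(b, a) both request the two head locks lowest address first. Neither thread can then hold one lock while waiting for the other.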

Protect when Entering Atomic Block
Acquire a lock for each shared location accessed within the atomic section. Each lock must be expressed in terms of expressions valid at the entry of the atomic block, so the compiler finds the corresponding expressions by working backward:
  atomic {
    list* x = y[5];
    list* d = x;
    d->head = NULL;
  }
The write needs acquire( &(d->head) ). Moving back past d = x gives acquire( &(x->head) ), and moving back past x = y[5] gives acquire( &(y[5]->head) ), which is valid at entry.
Contribution #1: identifying appropriate fine-grain locks at entry (via an inter-procedural backward data-flow analysis).
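The slide's example can be replayed with a toy backward pass. Visit the statements from last to first. Whenever a statement assigns v = e, substitute e for a leading v in the pending lock expression. This minimal sketch makes strong simplifying assumptions: straight-line code, expressions stored as strings, and purely textual prefix substitution. The assign type and the to_entry function are hypothetical. The talk's actual analysis is an inter-procedural data-flow analysis.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* A pointer copy `lhs = rhs;` in straight-line code (hypothetical IR). */
typedef struct { const char *lhs; const char *rhs; } assign;

/* If `expr` starts with the whole identifier `var`, rewrite that prefix to
   `rhs`. Returns 1 on substitution, 0 otherwise (including overflow). */
static int subst_prefix(char *expr, size_t cap, const char *var, const char *rhs) {
    size_t n = strlen(var);
    if (strncmp(expr, var, n) != 0)
        return 0;
    char next = expr[n];
    if (isalnum((unsigned char)next) || next == '_')
        return 0;                       /* `x` must not match `xs` */
    char buf[256];
    int len = snprintf(buf, sizeof buf, "%s%s", rhs, expr + n);
    if (len < 0 || (size_t)len >= sizeof buf || (size_t)len >= cap)
        return 0;
    memcpy(expr, buf, (size_t)len + 1);
    return 1;
}

/* Rewrite `expr`, accessed after stmts[0..n-1], into an expression valid at
   block entry by walking the statements backward. */
void to_entry(char *expr, size_t cap, const assign *stmts, size_t n) {
    for (size_t i = n; i > 0; i--)
        subst_prefix(expr, cap, stmts[i - 1].lhs, stmts[i - 1].rhs);
}
```

Variables that appear elsewhere in the expression, such as an index in y[i], are not rewritten here. A real analysis must handle them, and must fall back to a coarser lock when no entry expression exists.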

Attempt 3: Fine-Grain Locks at Entry (continued)
Expressed at the entry of move, the locked locations become:
    acquireAll({ &(to->head), &(from->head), &(to->head->next) });
Problem with Attempt 3: it can't protect an unbounded number of locations. The loop reads x->next for every element of the to list, i.e. &(to->head->next), &(to->head->next->next), and so on, and the length of the list is not known at compile time.

Attempt 4: Multi-Grain Locks at Entry
A coarse-grain lock protects a set of memory locations. A bounded set of locks acquired by acquireAll at entry can therefore cover all the locations the loop may touch.
Challenge #2: mixing locks of multiple granularities while avoiding deadlocks.

Defining Multi-Grain Locks
A fine-grain lock protects a single location, and a coarse-grain lock protects a set of locations. Any traditional heap abstraction can be used to define coarse-grain locks, e.g. types, points-to sets, or shape abstractions. Our compiler framework is parameterized, so clients can specify the kinds of locks they want to use.

Mixing Locks of Multiple Granularities
We borrow the locking protocol of databases, which is based on intention locks [Gray 76].
(Diagram: a hierarchy with the global lock above coarse-grain locks, above fine-grain locks, above memory locations, marking locks that can't be held concurrently.)
Contribution #2: we allow mixing locks of multiple granularities while avoiding deadlocks.
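The intention-lock protocol that the slide borrows from databases [Gray 76] is usually summarized by the compatibility matrix below. This C rendering is illustrative only, since the slides do not show how the runtime encodes lock modes. Before locking a node in S or X mode, a thread takes IS (for S) or IX (for X) on every ancestor, that is, on the coarse-grain locks and the global lock above it.

```c
#include <stdbool.h>

/* Multi-granularity lock modes [Gray 76]. */
typedef enum { IS, IX, S, SIX, X, NUM_MODES } lock_mode;

/* compatible[held][requested]: may `requested` be granted on a node while
   another thread holds `held` on the same node? */
static const bool compatible[NUM_MODES][NUM_MODES] = {
    /*           IS     IX     S      SIX    X     */
    /* IS  */ { true,  true,  true,  true,  false },
    /* IX  */ { true,  true,  false, false, false },
    /* S   */ { true,  false, true,  false, false },
    /* SIX */ { true,  false, false, false, false },
    /* X   */ { false, false, false, false, false },
};

/* Mode a thread must hold on every ancestor before taking `m` on a node. */
lock_mode ancestor_mode(lock_mode m) {
    return (m == S || m == IS) ? IS : IX;
}

bool can_grant(lock_mode held, lock_mode requested) {
    return compatible[held][requested];
}
```

For instance, can_grant(IX, X) is false. A thread holding fine-grain locks holds IX on the global lock, which prevents another thread from taking the global lock exclusively.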

Soundness Results
The framework is given a sound locking structure, with two requirements. First, anything protected by a child lock is also protected by its parent. Second, the map from expressions to locks is bounded, which guarantees termination. For example, &(to->head->next), &(to->head->next->next), … are mapped to &(*->next).
Soundness theorem: the compiler chooses a set of locks that protects all memory accesses within the atomic block.
Contribution #3: the framework is sound for any sound lock structure instantiation.
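The sketch below shows one way to keep the expression-to-lock map bounded, in the spirit of k-limiting. It assumes a limit LOCK_DEPTH_LIMIT, which is a hypothetical parameter. An access path with at most that many dereferences keeps its own fine-grain lock, and any longer path collapses to the coarse lock named by its last field. So &(to->head->next) and &(to->head->next->next) both map to &(*->next), matching the slide's example. The framework itself is parameterized over the lock structure; this is only one illustrative instance.

```c
#include <stdio.h>
#include <string.h>

#define LOCK_DEPTH_LIMIT 1  /* hypothetical bound on dereferences per fine lock */

/* Count the `->` dereferences in an access path such as "to->head->next". */
static int deref_count(const char *expr) {
    int n = 0;
    for (const char *p = strstr(expr, "->"); p != NULL; p = strstr(p + 2, "->"))
        n++;
    return n;
}

/* Write the name of the lock protecting &(expr) into `out`. Short paths get a
   fine-grain lock named by the path itself; longer paths collapse to the
   coarse-grain lock "&(*->f)", where f is the last field accessed. */
void lock_name(const char *expr, char *out, size_t cap) {
    if (deref_count(expr) <= LOCK_DEPTH_LIMIT) {
        snprintf(out, cap, "&(%s)", expr);
        return;
    }
    const char *last = expr;
    for (const char *p = strstr(expr, "->"); p != NULL; p = strstr(p + 2, "->"))
        last = p + 2;                   /* text after the final `->` */
    snprintf(out, cap, "&(*->%s)", last);
}
```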

Experimental Evaluation
The lock structure instance has three levels of locks, each with effects (read-write rw or read-only ro):
- a global lock;
- points-to set locks [Steensgaard 96];
- expression locks (limited in size).
Experiments use concurrent data structures (rb-tree, hashtable) with concurrent get (read-only), put, and remove operations. They run on a 1.86 GHz Intel Xeon dual quad-core machine.

Scalability Results
(Plot: execution time (sec) vs. number of threads for four configurations: global lock, TL2 STM [Dice et al. DISC 06], only coarse-grain locks, and coarse + fine-grain locks.)

TH (rb-tree + hash w/rehash): 80% gets
(Plot: execution time (sec) vs. number of threads for global lock, TL2 STM, only coarse-grain locks, and coarse + fine-grain locks.)
The compiler didn't use fine-grain locks. Scalability is comparable to the STM. The global lock (exclusive) doesn't scale.

TH (rb-tree + hash w/rehash): 80% puts
(Plot: execution time (sec) vs. number of threads for global lock, TL2 STM, only coarse-grain locks, and coarse + fine-grain locks.)
Two coarse-grain (exclusive) locks are better than a single global lock. High contention from rehashing degrades STM performance.

simple-hashtable: 80% gets
(Plot: execution time (sec) vs. number of threads for global lock, TL2 STM, only coarse-grain locks, and coarse + fine-grain locks.)
The compiler didn't use fine-grain locks for gets. The STM allows puts and gets to run concurrently.

simple-hashtable: 80% puts
(Plot: execution time (sec) vs. number of threads for global lock, TL2 STM, only coarse-grain locks, and coarse + fine-grain locks.)
The compiler uses fine-grain locks for puts.

Differences with Recent Work
No programmer annotations (other than atomic): Autolocker [McCloskey et al. POPL 06] requires programmer annotations to choose the appropriate granularity.
Moving fine-grain lock acquisitions to the entry of the atomic section: acquiring fine-grain locks right before first use [Hindman, Grossman MSPC 06] is not fully pessimistic. It may generate deadlocks and need rollbacks.
Multi-grain locks without deadlocks: several pessimistic approaches use coarse-grain locks only [Hicks et al. 06; Halpert et al. 07; Emmi et al. 07].

Conclusions and Future Work
We presented a lock inference framework for atomic sections that uses multi-grain locks to reduce contention and avoid deadlocks.
Soundness: all accesses are protected and atomicity is preserved.
Validation: the resulting performance depends on the application. Locks are preferable for non-reversible operations or high contention.
Future directions: better locking hierarchy instantiations (e.g. ownership), optimizations (e.g. delaying lock acquisitions), and hybrid systems (e.g. compiler support to optimize STMs).
