Adaptive Software Lock Elision

Amitabha Roy
Systems Research Group, Computer Laboratory, University of Cambridge
{amitabha.roy}@cl.cam.ac.uk

1. Introduction

Problem: Issues with Atomic Blocks + Optimistic STM
  - Inflexible concurrency control: usually only optimistic concurrency control is offered, which is not suitable for critical sections with low contention or low disjoint access parallelism, e.g. the Linux kernel [4]
  - Not compatible with legacy software: atomic blocks need to be specified
  - Difficult to handle irrevocable actions such as call-outs to legacy code, system calls or I/O

Solution: Software Lock Elision
  - Retain locks as the primary means for concurrency control
  - Enhance the locking API to support lock elision, that is, critical sections executed optimistically/speculatively
  - Coarse-grained locks can now scale, and can still be used when the critical section does I/O
  - Provide support for explicit lock composition
  - Dynamically elide locks for adaptive concurrency control

Consequences:
  - Easy retrofit to legacy software and elegant new applications
  - Minimal programmer effort
  - Allow multigranularity concurrency control on data structures
  - Retain properties of locks such as fairness and priority inheritance

2. Mechanics

  - Add metadata to locks:
      struct sle_lock { base_lock lock; int version_number; int readers; }
  - Elide locks dynamically
  - Independent of the underlying lock implementation
  - Handle non-2PL nesting of locks in the program

  /* count the number of speculative locks held if positive
   * and the number of non-speculative locks held if negative */
  speculation_level = 0

  do_sle_lock(sle_lock)
    If (dynamic_elide() or speculation_level > 0) and speculation_level >= 0 :
      speculation_level++;
      log_elided_lock(sle_lock);
    Else :
      speculation_level--;
      do_base_lock(sle_lock.lock);
      If (exclusive_mode) sle_lock.version_number++;
      else atomic_inc(sle_lock.readers);

  do_sle_unlock(sle_lock)
    If speculation_level < 0 :
      speculation_level++;
      If (exclusive_mode) sle_lock.version_number++;
      else atomic_dec(sle_lock.readers);
    Else :
      speculation_level--;
      If (speculation_level == 0) commit_speculative_changes();

3. Speculation

  - Execute speculatively whenever (speculation_level > 0)
  - Need to version reads and shadow changes to shared state
  - The programmer knows what lock protects what data, and must explicitly mark data protected by elidable locks
  - Option: object granularity using compiler extensions, e.g. with gcc-style attributes. The attribute adds a version number to the object:
      struct red_black_tree_node { … } __attribute__((__speculative__))
  - Pointer dereferences call into the runtime:
      struct red_black_tree_node *rbnode1, *rbnode2;
      ……
      rbnode1->parent = rbnode2
    (figure annotations on this assignment: log read, return shadow copy, log dirty)
  - Runtime actions:
      Read: log rbnode version, snapshot state
      Write: log dirty
      Commit time: 2PL fine-grained write locks + verify read versions

4. Design Challenges

  - Seamless co-existence of threads that do not speculate past a lock
      - Basic idea: log the version number of speculated locks
      - Speculative threads ensure the lock versions are unchanged at commit time
      - Non-speculative threads check version numbers of objects before using them, to ensure there are no committed but unwritten changes
      - A rudimentary version of this multigranularity locking idea was published in [1]
  - Memory management (no write after free by speculative threads)
      - Should support a variable number of threads in the system: avoid epoch-based solutions
      - Use external metadata like TL2
      - For efficiency, readers should not need to indirect outside objects
      - Solution: version number in objects + external lock
  - Lock properties (preserve priority inheritance/fairness properties of locks)
      - Non-speculative threads should never be blocked by speculative threads
      - Should be able to copy out unwritten data from committed threads
      - Should be able to prevent failed threads from writing to version numbers of freed objects
      - Can achieve this by using OS/scheduler support to revoke fine-grained locks
  - Lock composition (make lock-based programs easier to write)
      - Problem: lock(foo) lock(bar) in one code path and lock(bar) lock(foo) in another. Deadlock!!
      - compose(foo, foobar); compose(bar, foobar);
      - safe_lock(foo)/safe_lock(bar): acquire foobar in place of foo/bar, and ensure foo/bar is free before proceeding

5. Preliminary Results

  - Scalable Locking [1]: allows locks to be acquired transactionally and non-transactionally. Illustrated the key ideas in software lock elision
  - Test bed: Altix 4700, 38 NUMA nodes * 2 sockets * dual core = 152 Itanium2 cores, 456 GB overall shared memory
  - Benchmark: skip lists and red-black trees, scalable locks vs. OSTM [2]
      → Scalable locks scale as well as OSTM and provide better performance by a constant factor (~2X)
  - Asymmetry: 2 threads, each on a different NUMA node, all memory local to the first node
  - Benchmark: increment a counter; compare OSTM [2], RSTM [3] (all contention managers) and scalable locks (with an MCS fair lock for conflict handling)
      → Scalable locks provide perfect thread fairness, 50% of accesses by each thread

6. Adaptive Concurrency Control

  - Measure the amount of contention (waiting threads) on a lock
  - Measure the amount of disjoint access parallelism behind a lock (conflicts among speculating threads)
  - Elide the lock only if there is sufficient contention AND disjoint access parallelism [decided by a call to dynamic_elide()]

7. References

[1] Amitabha Roy, Keir Fraser and Steven Hand. A Transactional Approach to Lock Scalability. Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA08), Munich, Germany, June 2008.
[2] Keir Fraser. Practical lock freedom. PhD thesis, Cambridge University Computer Laboratory, 2003. Also available as Technical Report UCAM-CL-TR-579.
[3] Virendra J. Marathe et al. Lowering the overhead of software transactional memory. Technical report; condensed version appeared in TRANSACT 2006.
[4] Christopher J. Rossbach et al. TxLinux: using and managing hardware transactional memory in an operating system. In SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pages 87–102. ACM, 2007.
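
The do_sle_lock / do_sle_unlock pseudocode in section 2 (Mechanics) can be read as a small per-thread state machine. The following C sketch is only an illustration under stated assumptions: the base lock is stubbed as a flag, exclusive_mode is passed as a parameter, the hooks dynamic_elide(), log_elided_lock() and commit_speculative_changes() are replaced by hypothetical counters, and the call to do_base_unlock() is an assumption, since the poster does not show where the base lock is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stub base lock (assumption): a real one would be a mutex, MCS lock, etc. */
typedef struct { bool held; } base_lock;

/* Lock metadata as listed on the poster. */
struct sle_lock {
    base_lock lock;
    int version_number;
    atomic_int readers;
};

/* > 0: number of elided (speculative) locks held by this thread,
 * < 0: number of really acquired (non-speculative) locks held. */
_Thread_local int speculation_level = 0;

/* Hypothetical stand-ins for the runtime hooks named in the pseudocode. */
bool elide_policy = true;   /* what dynamic_elide() answers */
int elided_logged = 0;      /* calls to log_elided_lock() */
int commits = 0;            /* calls to commit_speculative_changes() */

bool dynamic_elide(void) { return elide_policy; }
void log_elided_lock(struct sle_lock *l) { (void)l; elided_logged++; }
void commit_speculative_changes(void) { commits++; }
void do_base_lock(base_lock *b) { assert(!b->held); b->held = true; }
void do_base_unlock(base_lock *b) { assert(b->held); b->held = false; }

void do_sle_lock(struct sle_lock *l, bool exclusive_mode)
{
    /* Elide if the policy says so or we are already speculating,
     * but never while a non-speculative lock is held. */
    if ((dynamic_elide() || speculation_level > 0) && speculation_level >= 0) {
        speculation_level++;
        log_elided_lock(l);
    } else {
        speculation_level--;
        do_base_lock(&l->lock);
        if (exclusive_mode)
            l->version_number++;
        else
            atomic_fetch_add(&l->readers, 1);
    }
}

void do_sle_unlock(struct sle_lock *l, bool exclusive_mode)
{
    if (speculation_level < 0) {
        speculation_level++;
        if (exclusive_mode)
            l->version_number++;
        else
            atomic_fetch_sub(&l->readers, 1);
        do_base_unlock(&l->lock);   /* assumption: not shown on the poster */
    } else {
        assert(speculation_level > 0);
        speculation_level--;
        /* The outermost elided unlock commits the whole speculative region. */
        if (speculation_level == 0)
            commit_speculative_changes();
    }
}
```

In this sketch, two nested elided locks leave both base locks untouched and commit once, at the outermost unlock. Once a thread holds a real lock (negative level), every nested lock is also taken for real, whatever dynamic_elide() returns.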
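
The runtime actions listed in section 3 (log the object version and snapshot its state on read, mark the shadow copy dirty on write, verify read versions at commit) can be sketched as below. This is a single-threaded illustration, not the poster's runtime: the types node, log_entry and txn and the txn_* functions are hypothetical names, and the 2PL fine-grained write locks taken at commit time are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical speculative object: a version number plus one payload field. */
struct node {
    unsigned version;
    int key;
};

enum { MAX_LOG = 8 };

struct log_entry {
    struct node *obj;       /* the shared object */
    unsigned seen_version;  /* version logged at first access */
    struct node shadow;     /* private snapshot; speculative writes go here */
    bool dirty;             /* set once the shadow copy has been written */
};

struct txn {
    struct log_entry log[MAX_LOG];
    int n;
};

/* Find the log entry for obj, or create it: log the version, snapshot state. */
struct log_entry *txn_open(struct txn *t, struct node *obj)
{
    for (int i = 0; i < t->n; i++)
        if (t->log[i].obj == obj)
            return &t->log[i];
    assert(t->n < MAX_LOG);
    struct log_entry *e = &t->log[t->n++];
    e->obj = obj;
    e->seen_version = obj->version;
    e->shadow = *obj;
    e->dirty = false;
    return e;
}

int txn_read_key(struct txn *t, struct node *obj)
{
    return txn_open(t, obj)->shadow.key;
}

void txn_write_key(struct txn *t, struct node *obj, int key)
{
    struct log_entry *e = txn_open(t, obj);
    e->shadow.key = key;
    e->dirty = true;
}

/* Verify every logged version; if all match, publish the dirty shadows and
 * bump their versions. A real runtime would first take fine-grained write
 * locks in two-phase fashion (omitted in this single-threaded sketch). */
bool txn_commit(struct txn *t)
{
    bool ok = true;
    for (int i = 0; i < t->n; i++)
        if (t->log[i].obj->version != t->log[i].seen_version)
            ok = false;
    if (ok) {
        for (int i = 0; i < t->n; i++) {
            if (t->log[i].dirty) {
                t->log[i].obj->key = t->log[i].shadow.key;
                t->log[i].obj->version++;
            }
        }
    }
    t->n = 0;   /* the log is discarded whether or not the commit succeeded */
    return ok;
}
```

A version bump by another writer between the first access and the commit makes txn_commit() return false and leaves the shared objects untouched, which is the behaviour the version check is meant to provide.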
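
Section 6 (Adaptive Concurrency Control) names two measurements, contention and disjoint access parallelism, but not how they are stored or thresholded. The sketch below shows one possible decision rule that a dynamic_elide() implementation could delegate to. The struct elide_stats, the function names and all three thresholds are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lock statistics. */
struct elide_stats {
    unsigned waiters;   /* threads currently waiting on the lock (contention) */
    unsigned commits;   /* speculative sections that committed */
    unsigned aborts;    /* speculative sections that hit a conflict */
};

/* Illustrative thresholds; the poster gives no concrete values. */
enum { MIN_WAITERS = 2, MIN_SAMPLES = 16, MAX_ABORT_PERCENT = 25 };

/* Elide only if the lock is contended AND speculation rarely conflicts. */
bool should_elide(const struct elide_stats *s)
{
    unsigned samples = s->commits + s->aborts;
    bool contended = s->waiters >= MIN_WAITERS;
    /* With too few samples, optimistically assume accesses are disjoint. */
    bool disjoint = samples < MIN_SAMPLES ||
                    s->aborts * 100 <= samples * MAX_ABORT_PERCENT;
    return contended && disjoint;
}

/* Record the outcome of one speculative section. */
void record_outcome(struct elide_stats *s, bool committed)
{
    if (committed)
        s->commits++;
    else
        s->aborts++;
    /* Halve the counters periodically so that old behaviour ages out. */
    if (s->commits + s->aborts >= 1024) {
        s->commits /= 2;
        s->aborts /= 2;
    }
}
```

With these thresholds an uncontended lock is never elided, and a contended lock stops being elided once more than a quarter of recent speculative sections conflict.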