Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory By McKenney, Michael, Triplett and Walpole.

Presentation transcript:

Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory By McKenney, Michael, Triplett and Walpole

Agenda
- Locking critique
- TM critique
- Need for a combined approach

Locking
- Simple approach based on mutual exclusion: allow only a single CPU at a time to manipulate a given set of shared objects
- Lock granularity determines scalability
- Partition the shared data and protect each partition with a separate lock
- This allows greater concurrency but also creates problems

Locking Strengths
- Can be used on existing commodity hardware
- Standardized, well-defined locking APIs; e.g., the POSIX pthread API allows lock-based code to run on multiple platforms
- Contention effects are concentrated within locking primitives, allowing critical sections to run at full speed
- Locking can protect a wide range of operations, including non-idempotent operations such as I/O
- Waiting on a lock minimally degrades performance of the rest of the system
- Interacts naturally with a variety of synchronization mechanisms, including reference counting, atomic operations, non-blocking synchronization, and RCU
- Interacts in a natural manner with debuggers

Locking Weaknesses
- Since lock granularity determines scalability, the shared data is partitioned and each partition is protected with a separate lock. While this increases concurrency, it also creates problems:
  - Loss of modularity: need to know what locks other modules use before calling them in order to avoid self-deadlock
  - Deadlock: multiple threads may need to acquire the same set of locks, and acquiring these in different orders can cause deadlock
  - Self-deadlock can result if an interrupt is received while a thread holds a lock and the interrupt handler also needs that lock
  - Lack of composability: operations may be thread-safe individually but not when composed together. E.g., delete an item from one hash table and insert it into another: the intermediate state (item is in neither hash table) is visible
- Some data structures, such as unstructured graphs, are difficult to partition. May have to settle for coarser locks, which leads to high contention and reduced scalability
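The composability problem can be shown with a small Python sketch. The `LockedTable` class, the `move` function, and the `observe` hook are hypothetical names made up for this illustration; the hook stands in for another thread that happens to run between the two individually thread-safe operations.

```python
import threading


class LockedTable:
    """Hypothetical hash table: every single operation holds the table's own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def insert(self, key, value):
        with self._lock:
            self._items[key] = value

    def delete(self, key):
        with self._lock:
            return self._items.pop(key)

    def contains(self, key):
        with self._lock:
            return key in self._items


def move(src, dst, key, observe=None):
    """Compose two thread-safe operations; the composition itself is not atomic."""
    value = src.delete(key)
    if observe is not None:
        observe()  # stands in for another thread running at exactly this point
    dst.insert(key, value)


a, b = LockedTable(), LockedTable()
a.insert("x", 1)
seen = []
move(a, b, "x", observe=lambda: seen.append((a.contains("x"), b.contains("x"))))
print(seen)  # [(False, False)]: the item was visible in neither table
```

Making `move` atomic with locks alone would require the caller to hold both tables' internal locks, which is the loss of modularity listed above.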

Locking Weaknesses (cont.)
- Priority inversion can cause a high-priority thread to miss its real-time scheduling deadline, which is unacceptable in safety-critical systems
- Non-deterministic lock acquisition latency is a problem for real-time workloads
- Locking uses expensive instructions and creates high synchronization overhead even at low levels of contention; this is worse with fine-grained locking
- Locking introduces communication-related cache misses into read-mostly workloads that would otherwise run entirely within the CPU cache
- Indefinite blocking: if the lock holder terminates while holding the lock, waiters block forever. This creates problems for fault-tolerant software
- Convoying: preemption or blocking (due to I/O, page fault, etc.) of the lock holder can block other threads

Solutions to Locking Problems
- Priority inversion
  - Priority inheritance: a lower-priority lock holder temporarily inherits the priority of the highest-priority thread blocked on the lock
  - Priority ceiling: the lock holder is assigned the priority of the highest-priority task that might acquire that lock
  - Preemption is disabled entirely while locks are held
- Deadlock
  - Require a clear locking hierarchy: when multiple locks are acquired, they are acquired in a pre-specified order
  - If a lock is not available, the thread surrenders conflicting locks and retries
  - Detect deadlock; break the cycle by terminating selected threads based upon priority, work done, etc.
  - Track lock acquisition, dynamically detect potential deadlock, and prevent it
- Self-deadlock
  - Disable interrupts while the lock is held
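Two of the deadlock remedies above, a fixed acquisition order and "surrender and retry", can be sketched in Python as follows. All names are hypothetical, and ordering locks by `id()` is only one arbitrary way to get a consistent global order; a real system would normally use a documented lock hierarchy.

```python
import random
import threading
import time


def acquire_in_order(*locks):
    """Acquire locks in one global order (assumption: id() serves as the order)."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(held):
    for lock in reversed(held):
        lock.release()


def acquire_or_back_off(first, second):
    """Hold `first`, try `second` without blocking; on failure give up `first` and retry."""
    while True:
        first.acquire()
        if second.acquire(blocking=False):
            return
        first.release()                        # surrender the conflicting lock
        time.sleep(random.uniform(0, 0.001))   # brief random pause to reduce livelock


balances = {"a": 100, "b": 100}
locks = {"a": threading.Lock(), "b": threading.Lock()}


def transfer(src, dst, amount):
    held = acquire_in_order(locks[src], locks[dst])
    try:
        balances[src] -= amount
        balances[dst] += amount
    finally:
        release_all(held)


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


# Opposite transfer directions would risk deadlock with naive src-then-dst locking.
threads = [threading.Thread(target=worker, args=("a", "b")),
           threading.Thread(target=worker, args=("b", "a"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances)  # {'a': 100, 'b': 100}
```

The back-off variant avoids deadlock without a global order, at the cost of possible repeated retries under contention.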

Solutions to Locking Problems (cont.)
- Non-partitionable data structures
  - Redesign to use partitionable data structures such as hash tables
  - In read-mostly situations, locked updates may be paired with read-copy-update (RCU) or hazard pointers
- Convoying
  - Use scheduler-conscious synchronization
  - But this does not help in the case of the lock holder terminating
- Non-deterministic lock acquisition latency
  - Use RCU for read-side critical sections
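The pairing of locked updates with lock-free reads can be illustrated with a copy-then-publish sketch in Python. This is only an analogy to RCU, not how kernel RCU is implemented: it assumes CPython, where rebinding an attribute is a single reference assignment and the garbage collector reclaims old snapshots, so there are no explicit grace periods. The class and method names are hypothetical.

```python
import threading


class ReadMostlyTable:
    """Readers never lock; updaters copy, modify the copy, then publish it."""

    def __init__(self, initial):
        self._snapshot = dict(initial)         # never mutated after publication
        self._update_lock = threading.Lock()   # updaters still serialize on a lock

    def read(self, key):
        snapshot = self._snapshot              # load the current snapshot reference once
        return snapshot.get(key)

    def update(self, key, value):
        with self._update_lock:
            new_snapshot = dict(self._snapshot)  # read and copy
            new_snapshot[key] = value            # update the private copy
            self._snapshot = new_snapshot        # publish with one reference assignment


table = ReadMostlyTable({"mode": "fast"})
old_view = table._snapshot       # what a reader that started earlier still holds
table.update("mode", "safe")
print(old_view["mode"], table.read("mode"))  # fast safe
```

A reader that started before the update keeps seeing a consistent old version, while readers that start afterwards see the new one; read-side latency does not depend on any lock.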

Transactional Memory
- Approach borrowed from DBMS:
  - A programmer delimits the regions of code that access shared data
  - The TM system executes these regions atomically and in isolation
- Mechanism:
  - Updates done during the transaction are buffered
  - Validation is done to check that isolation was not violated by a conflict
  - If validation passes, updates are committed
  - Otherwise, updates are discarded and the transaction is retried
- TM is a non-blocking synchronization mechanism: at least one thread will succeed
- Optimistic approach performs well when critical regions do not interfere with each other
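The buffer, validate, commit-or-retry cycle can be sketched as a toy software TM in Python. Everything here (`TVar`, `Transaction`, `atomically`) is a hypothetical illustration, not any real TM system's API. To stay short it validates under a single global commit lock, so unlike the mechanism described above it is not non-blocking, and a transaction may observe inconsistent values before it is aborted at commit time.

```python
import threading


class TVar:
    """Shared variable with a version number that is bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()  # assumption: one global lock serializes commits


class Transaction:
    def __init__(self):
        self.read_versions = {}   # TVar -> version observed at first read
        self.write_buffer = {}    # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.write_buffer:
            return self.write_buffer[tvar]
        # Record the version before loading the value, so a racing commit
        # can only cause a later validation failure, never a missed conflict.
        self.read_versions.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.write_buffer[tvar] = value   # buffered, invisible to other threads

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:  # validation: someone committed in between
                    return False          # caller discards the buffer
            for tvar, value in self.write_buffer.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body):
    """Run body(tx) until it commits; each retry starts with empty buffers."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


counter = TVar(0)


def increment_many():
    for _ in range(500):
        atomically(lambda tx: tx.write(counter, tx.read(counter) + 1))


threads = [threading.Thread(target=increment_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 2000
```

No increment is lost because a transaction whose read has been overtaken by another commit fails validation and reruns against the new value.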

HW Transactional Memory
- Hardware TM:
  - New instructions (LT, LTX, ST, Abort, Commit, Validate)
  - Fully-associative transactional cache for buffering updates
  - Straightforward extensions to the multi-processor cache coherence protocol to detect transaction conflicts
- Drawbacks:
  - Portability: need special hardware
  - Size of a transaction is limited by the transactional cache; overflow of the transactional cache is addressed by virtualization in newer implementations

SW Transactional Memory
- Software TM:
  - Revocable two-phase locking for writes: a transaction locks all objects that it writes and does not release these locks until the transaction terminates. If deadlock occurs, one transaction aborts, releasing its locks and reverting its writes
  - Optimistic concurrency control for reads: whenever a transaction reads from an object, it logs the version it read. When the transaction commits, it verifies that these are still the current versions of the objects
- Drawbacks:
  - Poor performance compared to locking
  - Atomic operations for acquiring shared object handles
  - Cost of consistency validation
  - Effect on cache of shared object metadata
  - Dynamic allocation, data copying and memory reclamation

TM’s Strengths
- Provides performance and scalability by allowing multiple non-interfering threads to execute concurrently in a critical section
- Attains the benefits of fine-grained locking without the effort and complexity
- Non-blocking: at least one transaction succeeds
- Fault tolerance: failure of one transaction will not affect others
- For multi-word objects, requires fewer memory accesses than locks, since there is no explicit lock variable
- Can be used with difficult-to-partition data structures such as unstructured graphs
- Can exploit concurrency where locks cannot: e.g., enqueue at head and dequeue at tail can typically proceed concurrently, except when the queue is empty, in which case both must update both head and tail
- Modular and composable: transactions may be nested or composed

TM’s Weaknesses
- The performance of transactions might suffer from excessive restarts along high-contention access paths to particular data structures
- When transactions collide, only one can proceed; the others must be rolled back. This can result in:
  - Starvation of large transactions by smaller ones
  - Delay of a high-priority thread via rollback of its transactions due to conflicts with those of a lower-priority thread
- Cannot be used with non-idempotent operations such as I/O because of the possibility of restarts
- The drawbacks of HTM and STM covered in the earlier slides
- Certain STM optimizations can allow concurrent access to privatized data

TM’s Weaknesses (cont.)
- Certain STM optimizations can allow concurrent access to privatized data

TM’s Weaknesses (cont.)
- Cannot be used with non-idempotent operations such as I/O because of the possibility of restarts
  - Example: a client cannot defer its message until commit, since the rest of its transaction depends on the server’s reply

Solutions to TM’s Problems
- Buffered I/O might be addressed by including the buffering mechanism within the scope of the transactions doing I/O
- Inevitable transactions, which always commit, can contain non-idempotent operations. However, there can be at most one of these at a time
- Contention management: carefully select the transactions to roll back based on priority, amount of work done, etc.
- Convert read-only transactions to non-transactional form, in a manner similar to the pairing of locking with RCU
- For portability: use HTM when applicable, but fall back to STM otherwise
- Reduce the STM overheads of indirection, dynamic allocation, data copying, and memory reclamation by relaxing the non-blocking property
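The buffered I/O idea can be sketched by making the output buffer part of the transaction's state, so an aborted attempt discards its pending output along with its other updates. The Python below is a simulation only: `run_with_retries` and its `conflicts` parameter are hypothetical and simply pretend that the first few attempts fail validation.

```python
def run_with_retries(body, conflicts):
    """Run body(defer) until it 'commits'; the first `conflicts` attempts abort."""
    attempt = 0
    while True:
        deferred = []                # output buffer owned by this attempt
        body(deferred.append)
        if attempt >= conflicts:     # stand-in for successful validation
            return deferred          # commit: the buffered output may now be sent
        attempt += 1                 # abort: the buffer is thrown away


sent_immediately = []


def eager(defer):
    # Non-idempotent side effect performed inside the transaction body.
    sent_immediately.append("debit $10")


def buffered(defer):
    # Same effect, but only queued; it leaves the transaction at commit.
    defer("debit $10")


run_with_retries(eager, conflicts=2)
committed_output = run_with_retries(buffered, conflicts=2)
print(len(sent_immediately), committed_output)  # 3 ['debit $10']
```

The eager version repeats its side effect on every restart, while the buffered version emits it once. Buffering does not help when the transaction needs a reply to its own output before it can commit, which is the client and server case raised earlier.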

Combined Approach
- Transactions perform well when critical regions do not interfere with each other, while locks usually perform better for highly contended critical sections
- Use locks for partitionable data which can be assigned to different CPUs
- Use RCU/hazard pointers for read-heavy workloads
- Use TM for:
  - Update-heavy workloads using large non-partitionable data structures
  - Atomic operations spanning multiple data structures
- TM should be made easily usable with locking so that the best approach can be chosen for each situation