Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory
  Written by: Paul E. McKenney, Jonathan Walpole, Maged M. Michael, Josh Triplett
  Presented by: Jacob Lear
  (Some slides borrowed from Dr. Walpole’s lectures)
Review: Why Do Concurrent Programming?
  Hardware has been forced down the path of concurrency:
    – Can’t make cores much faster
    – Can put more than one core on a chip
  Concurrent programming is required for scalable performance of software on multi-core systems
    – Synchronization performance and scalability have become critical issues
Review: Lock-Based Synchronization
  Pessimistic concurrency control
  Simple approach
    – Identify shared data
    – Associate lock with data
    – Acquire lock before use
    – Release lock after use
  Enforces mutual exclusion
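The four steps above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the `Account` class and `run_deposits` helper are hypothetical names chosen for the example.

```python
import threading


class Account:
    """Hypothetical shared data item with its own associated lock."""

    def __init__(self, balance=0):
        self.balance = balance            # step 1: the shared data
        self.lock = threading.Lock()      # step 2: the lock associated with it

    def deposit(self, amount):
        # Steps 3 and 4: acquire before use, release after use.
        # The "with" statement releases the lock even if an exception occurs.
        with self.lock:
            self.balance += amount


def run_deposits(account, n_threads=4, per_thread=10_000):
    """Have several threads deposit 1 unit repeatedly; return the final balance."""
    def worker():
        for _ in range(per_thread):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

Because every update to `balance` happens inside the critical section, no deposit is lost regardless of how the threads interleave.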
Locking’s Strengths
  Intuitive and relatively easy to use.
  Can be used on existing commodity hardware.
  Well-defined locking APIs are standardized.
  Widespread usage and experienced user base.
  Contention effects are concentrated within locking primitives, allowing critical sections to run at full speed.
  Waiting on a lock minimally degrades performance of the rest of the system.
  Can protect a wide range of operations, including non-idempotent operations.
  Interacts naturally with a large variety of synchronization mechanisms.
  Interacts naturally with debuggers and other software tools.
  Ability to support disjoint access parallelism (with sufficient effort).
  Power-friendly.
Locking’s Weaknesses
  Prone to deadlock with other threads and with interrupt handlers.
    – Implies non-composability.
  Susceptible to priority inversion.
  High contention on non-partitionable data structures.
  Can block other threads on thread failure.
  High synchronization overhead even at low levels of contention.
  Non-deterministic lock-acquisition latency.
Improving Locking
  Avoiding deadlock
    – Use a clear locking hierarchy.
    – Use conditional lock acquisition primitives.
    – Use deadlock detection and recovery.
    – Mask signals or interrupts while locks are held.
    – Avoid lock acquisition in interrupt handlers.
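The first two techniques above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Account` class, its `ident` field, and both transfer functions are hypothetical, and `src` and `dst` are assumed to be different accounts. The random back-off in the second function is one possible way to reduce the chance of two threads retrying in lockstep.

```python
import random
import threading
import time


class Account:
    """Hypothetical lock-protected account; ident defines the lock order."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer_ordered(src, dst, amount):
    """Locking hierarchy: always lock the account with the lower ident first."""
    first, second = (src, dst) if src.ident < dst.ident else (dst, src)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def transfer_trylock(src, dst, amount):
    """Conditional acquisition: never wait for a lock while holding another."""
    while True:
        with src.lock:
            # Try the second lock without blocking.
            if dst.lock.acquire(blocking=False):
                try:
                    src.balance -= amount
                    dst.balance += amount
                    return
                finally:
                    dst.lock.release()
        # Second lock was busy: src.lock has been released, so back off and retry.
        time.sleep(random.uniform(0, 0.001))
```

Either function can be called by threads transferring in opposite directions without the classic two-lock deadlock. The ordered version never retries, while the try-lock version trades possible retries for freedom from a global ordering.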
Improving Locking
  Avoiding priority inversion
    – Use priority inheritance.
    – Temporarily raise lock holder’s priority to that of highest priority task that might acquire lock.
    – Disable preemption.
    – Use RCU for readers.
Improving Locking
  Lock contention
    – Partition data and redesign algorithms, if possible.
    – Non-partitionable data structures are a problem.
  Lock overhead
    – Use RCU for readers.
    – Update-heavy workloads are a problem.
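As an illustration of data partitioning, the sketch below splits a table of counters into shards, each protected by its own lock, so that operations on keys in different shards do not contend. The `ShardedCounter` class and its method names are hypothetical. In CPython the global interpreter lock limits true parallelism, so this shows the structure of the technique rather than its performance.

```python
import threading


class ShardedCounter:
    """Hypothetical per-key counter table partitioned across several locks."""

    def __init__(self, n_shards=8):
        self._locks = [threading.Lock() for _ in range(n_shards)]
        self._tables = [{} for _ in range(n_shards)]

    def _shard(self, key):
        # Each key maps to exactly one shard, and therefore to one lock.
        return hash(key) % len(self._locks)

    def increment(self, key, delta=1):
        i = self._shard(key)
        with self._locks[i]:              # only this shard is locked
            self._tables[i][key] = self._tables[i].get(key, 0) + delta

    def get(self, key):
        i = self._shard(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)
```

An operation that must see all keys at once (for example, a consistent total) would have to take every shard lock, which is the kind of non-partitionable access the slide flags as a problem.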
Improving Locking
  Avoiding convoying
    – Use scheduler-conscious synchronization.
  Resolving thread failures
    – Abort and restart.
    – Detect thread death and clean up state.
  Addressing non-determinism
    – Use RCU for readers.
    – Use FIFO lock-acquisition primitives with limited threads.
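One common form of FIFO lock-acquisition primitive is a ticket lock, sketched below. This is an assumption about what such a primitive might look like, not the specific primitive the authors had in mind; the `TicketLock` class is hypothetical and is built on `threading.Condition` rather than on atomic hardware instructions.

```python
import threading


class TicketLock:
    """Hypothetical FIFO lock: threads are served in the order they arrive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0     # next ticket number to hand out
        self._now_serving = 0     # ticket number currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier ticket holder has released the lock.
            while self._now_serving != my_ticket:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()   # wake waiters; only the next ticket proceeds

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

With a bounded number of threads, a waiter's delay is bounded by the number of tickets ahead of it times the longest critical section, which is what makes acquisition latency more predictable.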
Remaining Locking Challenges
  Better software tools for static analysis of lock-based software.
  Better software tools to evaluate lock contention.
  Better design rules for use of locking in large software systems.
  More work using locking with other synchronization methodologies.
  Need good locking algorithms for large update-heavy non-partitionable data structures.
Review: Transactional Memory
  Transactional Memory (TM) is a lock-free, non-blocking concurrency control mechanism based on transactions.
  TM allows programmers to define customized atomic operations that apply to multiple, independently chosen memory locations.
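To make the idea of an atomic operation over independently chosen locations concrete, here is a toy software TM sketch. Everything in it (`TVar`, `Transaction`, `atomically`, the global version clock) is a hypothetical design for illustration, not the API of any real TM system. It also takes a single global lock at commit time, so it is a blocking implementation and does not have the non-blocking property described above. It assumes CPython, where replacing an attribute with a new tuple is a single atomic step.

```python
import threading


class Conflict(Exception):
    """Raised when a transaction observes data committed after it started."""


_commit_lock = threading.Lock()   # serializes commits (assumption: blocking commit)
_clock = 0                        # global version clock, advanced on each commit


class TVar:
    """Transactional variable holding a (value, version) pair."""

    def __init__(self, value):
        self._cell = (value, 0)


class Transaction:
    def __init__(self):
        self.start = _clock       # snapshot time of this transaction
        self.reads = set()
        self.writes = {}          # buffered writes, applied only on commit

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        value, version = tvar._cell
        if version > self.start:  # written after we started: snapshot is stale
            raise Conflict
        self.reads.add(tvar)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        global _clock
        with _commit_lock:
            # Validate: nothing we read has changed since we started.
            for tvar in self.reads:
                if tvar._cell[1] > self.start:
                    raise Conflict
            new_version = _clock + 1
            for tvar, value in self.writes.items():
                tvar._cell = (value, new_version)
            _clock = new_version  # publish the clock only after all writes


def atomically(fn):
    """Run fn(tx) as a transaction, retrying from scratch on conflict."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Conflict:
            continue
```

A caller writes something like `atomically(lambda tx: tx.write(b, tx.read(a) + 1))`; the loads and stores inside either all take effect at commit or are discarded and re-executed.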
TM’s Strengths
  TM is simple and elegant.
  Any sequence of memory loads and stores may be composed into a single atomic operation.
  Not prone to deadlock.
  Easier to create and understand multi-threaded code.
  Many implementations of TM are composable, meaning transactions can be nested.
  TM attains many of the performance and scalability benefits of fine-grained locking.
  Some implementations of TM are non-blocking.
TM’s Weaknesses
  Issues with non-idempotent operations, which would be performed multiple times on transaction retry.
  Expensive interaction with other synchronization mechanisms.
  Transactionalizing existing sequential programs can result in excessive conflicts.
  Starvation of large transactions by smaller ones.
  Delay of high-priority processes via rollback of their transactions due to conflicts with those of a lower-priority process.
  Lack of support for HTM (Hardware Transactional Memory).
  Portability problems.
  Poor performance of STM (Software Transactional Memory).
  Privatization invalidated in high-performance STM.
  Poor interaction with many existing software tools.
Improving TM
  Adding support for non-idempotent operations
    – Include buffering mechanism within transaction.
    – Use pessimistic concurrency control.
    – “Inevitable transactions”
  Improving contention management
    – Use a contention manager with a good policy.
    – Convert read-only transactions to non-transactions.
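The buffering approach to non-idempotent operations can be sketched as follows: side effects are queued inside the transaction and performed only once, after the transaction finally succeeds. This is a minimal illustration with hypothetical names (`Tx`, `Retry`, `run_transaction`), and conflicts are simulated by the transaction body raising `Retry` rather than detected by a real TM system.

```python
class Retry(Exception):
    """Signals that the transaction hit a (simulated) conflict and must re-run."""


class Tx:
    """Hypothetical transaction handle that buffers side effects."""

    def __init__(self):
        self._deferred = []

    def defer(self, action, *args):
        # Record the non-idempotent action instead of performing it now.
        self._deferred.append((action, args))

    def run_deferred(self):
        for action, args in self._deferred:
            action(*args)


def run_transaction(body, max_attempts=10):
    """Run body(tx, attempt) until it completes without raising Retry."""
    for attempt in range(1, max_attempts + 1):
        tx = Tx()
        try:
            result = body(tx, attempt)
        except Retry:
            continue              # buffered effects of the failed attempt are dropped
        tx.run_deferred()         # effects happen exactly once, after success
        return result
    raise RuntimeError("transaction did not complete")


# Usage: the body "conflicts" on its first two attempts, yet the message
# is appended to the log only once.
log = []


def send_message(tx, attempt):
    tx.defer(log.append, "message sent")
    if attempt < 3:
        raise Retry
    return attempt
```

Buffering only works for effects that can be postponed to commit time. An operation whose result the transaction needs immediately, such as a read from a device, cannot be deferred this way, which is one motivation for the other two options on the slide.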
Improving TM
  Supporting HTM
    – Need to wait for hardware to be available.
    – Fall back to STM when hardware is not available.
  Improving STM performance
    – Relax non-blocking property.
  Improving debugger usage
    – Debug HTM using STM.
    – Requires better integration between HTM and STM.
Improving TM
  Addressing high overhead
    – Use TM for heavy-weight operations.
Conclusion
  Use the right tool for the job.
  Understand all the techniques, their strengths and weaknesses, and potential interactions.