
Transactional Memory: Architectural Support for a Lock-Free Data Structure
Copyright © 2006, CS 612
By: Major Bhadauria
Some material borrowed from: Konrad Lai, Microprocessor Technology Labs, Intel (Intel Multicore University Research Conference, Dec 8, 2005)

Motivation
- Multiple cores face a serious programmability problem
  - Writing correct parallel programs is very difficult
- Transactional Memory addresses a key part of the problem
  - Makes parallel programming easier by simplifying coordination
  - Requires hardware support for performance
  - Implements lock-free data structures

Benefits of Going Lock-Free
Going lock-free allows us to avoid:
- Priority inversion: a lower-priority process is preempted while holding a lock needed by a higher-priority process.
- Convoying: a process holding a lock is de-scheduled, possibly stopping other processes from progressing needlessly.
- Deadlock: the common problem of processes A and B both needing to lock C and D. Process B has the lock for D and process A has the lock for C, so neither can proceed.
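The deadlock case can be reproduced in a few lines. The following is a minimal sketch, not part of the original slides: the function name `demonstrate_circular_wait` and the two barriers are illustrative assumptions used to force the interleaving in which A holds C and B holds D. A timeout replaces the blocking acquire so that the program reports the circular wait instead of hanging.

```python
import threading


def demonstrate_circular_wait(timeout=0.2):
    """Force the 'A holds C, B holds D' interleaving and report whether
    each thread managed to take its second lock."""
    lock_c, lock_d = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)    # both threads hold their first lock
    both_tried_second = threading.Barrier(2)  # both threads finished their attempt
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # With a plain blocking acquire() this call would never return.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            # Keep holding the first lock until the other thread has also tried.
            both_tried_second.wait()
            if acquired:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("A", lock_c, lock_d)),
        threading.Thread(target=worker, args=("B", lock_d, lock_c)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(demonstrate_circular_wait())
```

Both entries in the returned dictionary are `False`: each thread waits for a lock that the other one will not release.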

Problem: Lock-Based Synchronization
Lock-based synchronization of shared data access is fundamentally problematic.
- Software engineering problems
  - Lock-based programs do not compose
  - Performance and correctness are tightly coupled
  - Timing-dependent errors are difficult to find and debug
- Performance problems
  - High performance requires finer-grained locking
  - More and more locks add more overhead
Need a better concurrency model for multi-core software.

What is Transactional Memory (TM)?
Transactional Memory (TM) allows arbitrary multiple memory locations to be updated atomically and serially.
Basic mechanisms:
- Isolation: track reads and writes, detect when conflicts occur
- Version management: record new/old values
- Atomicity: commit new values or abort back to old values
Thread 1:
    begin_xaction
    A = A - 20
    B = B + 20
    A = A - B
    C = C + 20
    end_xaction
Thread 2:
    begin_xaction
    C = C - 30
    A = A + 30
    end_xaction
Thread 1's accesses and updates to A, B, C are atomic. Thread 2 sees either "all" or "none" of Thread 1's updates.
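The three mechanisms can be imitated in software. The sketch below is a toy illustration only, not how TM hardware works: `TVar`, `Transaction`, `atomically`, and `run_example` are hypothetical names, and a single global lock stands in for the atomic commit step. It runs the two transactions from the slide on starting values chosen here (A = 100, B = 50, C = 10).

```python
import threading


class Conflict(Exception):
    """Raised at commit time when a location we read has since changed."""


class TVar:
    """Hypothetical shared location holding an immutable (version, value) pair."""
    def __init__(self, value):
        self.state = (0, value)


# Assumption: one global lock stands in for the atomic commit that TM provides.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.seen_versions = {}  # isolation: versions observed by our reads
        self.new_values = {}     # version management: tentative new values

    def read(self, var):
        if var in self.new_values:
            return self.new_values[var]
        version, value = var.state
        self.seen_versions.setdefault(var, version)
        return value

    def write(self, var, value):
        self.new_values[var] = value

    def commit(self):
        # Atomicity: publish every new value, or none of them.
        with _commit_lock:
            if any(var.state[0] != seen
                   for var, seen in self.seen_versions.items()):
                raise Conflict()
            for var, value in self.new_values.items():
                var.state = (var.state[0] + 1, value)


def atomically(body):
    """Run body(tx) and retry from scratch whenever the commit conflicts."""
    while True:
        tx = Transaction()
        body(tx)
        try:
            tx.commit()
            return
        except Conflict:
            pass  # tentative values are simply dropped


def run_example():
    A, B, C = TVar(100), TVar(50), TVar(10)

    def thread_1(tx):
        tx.write(A, tx.read(A) - 20)
        tx.write(B, tx.read(B) + 20)
        tx.write(A, tx.read(A) - tx.read(B))
        tx.write(C, tx.read(C) + 20)

    def thread_2(tx):
        tx.write(C, tx.read(C) - 30)
        tx.write(A, tx.read(A) + 30)

    threads = [threading.Thread(target=atomically, args=(body,))
               for body in (thread_1, thread_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A.state[1], B.state[1], C.state[1]


if __name__ == "__main__":
    print(run_example())
```

For these two particular transactions both serial orders give the same final state, so `run_example()` returns `(40, 70, 0)` regardless of which thread commits first.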

Transactional Memory benefits
- Focus: the multithreaded programmability crisis
- Programmability & performance
  - Allows conservative synchronization
  - Programmer focuses on parallelism & correctness; HW extracts performance
- Software engineering and composability
  - Allows library re-use and composition (locks make this very difficult)
  - Critical for wide multi-core demand
- Makes high-performance MT programming easier
- Captures a fundamental, well-known, intuitive "atomic" construct
  - Been around for decades
  - Similar to a "critical section" but without its problems
  - No deadlocks, priority inversion, data races, unnecessary serialization

Software Transactional Memory
- Software Transactional Memory (1995 until now)
  - Significant work from Sun, Brown, Cambridge, Microsoft
- Serious performance limitations
  - Degrades the "common" case of no conflicts/contention
  - >90% of transactions have no conflicts
    - 90% of critical sections are uncontended: what if all of these slowed down by 5X?
- Serious deployability limitations
  - Relies on special runtime support
  - Invasive to applications and libraries
- Is STM too slow and too invasive to deploy?
  - Could there be a better implementation?
  - But great for understanding complex usage models of the future

Herlihy & Moss' Implementation
- Minimum implementation builds on the previous LL/SC structure:
  - Loading a shared value to read
  - Writing to a shared value
  - Commit/abort functions to commit or flush new values
- Extensions for performance:
  - Store a copy of the old value in cache: reduces bus traffic and latency
  - Store committed data in cache: reduces bus traffic
  - Have a read value that you can later change: reduces bus traffic
  - Have a validate function to check current status: reduces the number of orphan transactions
  - Busy bus signal for transactions: reduces the number of aborts

Herlihy & Moss' Implementation
- Transactional Memory primitives:
  - Load-transactional (LT): reads the value of a shared memory location into a private register.
  - Load-transactional-exclusive (LTX): similar to LT, but indicates that the location is likely to be updated.
  - Store-transactional (ST): tentatively writes a value to a shared memory location; the value is not visible until a successful commit.
- TM instructions: commit, abort, validate.
  - Commit makes tentative changes permanent if the transaction's read set (locations referenced by LT) has not changed and no other process has read any location within the write set (locations referenced by LTX and ST).
  - Abort discards updates to the write set.
  - Validate tests the current transaction status: true means the transaction may continue, false means it has aborted.
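The six operations can be modelled with a small single-threaded simulation. This is a sketch under stated assumptions, not the paper's hardware design: the `Memory` and `Transaction` classes are hypothetical, and the conflict policy (a conflicting access marks the other transaction as failed) is a simplification of what the cache-coherence protocol would do.

```python
class Memory:
    """Hypothetical shared memory plus the list of running transactions."""
    def __init__(self, **initial):
        self.values = dict(initial)
        self.active = []


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()    # locations referenced by LT
        self.write_set = set()   # locations referenced by LTX and ST
        self.tentative = {}      # ST values, invisible until commit
        self.ok = True
        memory.active.append(self)

    def _signal(self, location, writing):
        # Assumed policy: touching another transaction's write set, or
        # writing into its read set, makes that transaction fail.
        for other in self.memory.active:
            if other is self:
                continue
            if location in other.write_set or (
                    writing and location in other.read_set):
                other.ok = False

    def _current(self, location):
        return self.tentative.get(location, self.memory.values[location])

    def lt(self, location):
        self._signal(location, writing=False)
        self.read_set.add(location)
        return self._current(location)

    def ltx(self, location):
        self._signal(location, writing=True)
        self.write_set.add(location)
        return self._current(location)

    def st(self, location, value):
        self._signal(location, writing=True)
        self.write_set.add(location)
        self.tentative[location] = value

    def validate(self):
        return self.ok

    def abort(self):
        self.tentative.clear()
        self.read_set.clear()
        self.write_set.clear()
        self.ok = False
        if self in self.memory.active:
            self.memory.active.remove(self)

    def commit(self):
        if not self.ok:
            self.abort()
            return False
        self.memory.values.update(self.tentative)
        self.memory.active.remove(self)
        return True


def demo():
    memory = Memory(counter=0)
    t1, t2 = Transaction(memory), Transaction(memory)
    t1.ltx("counter")                        # t1 announces intent to update
    t2.st("counter", t2.ltx("counter") + 1)  # t2 conflicts with t1's write set
    log = [t2.commit(), t1.validate(), t1.commit()]
    retry = Transaction(memory)              # t1's work, restarted
    retry.st("counter", retry.ltx("counter") + 1)
    log.append(retry.commit())
    return log, memory.values["counter"]


if __name__ == "__main__":
    print(demo())
```

In `demo()`, the second transaction commits, the first one fails validation and its commit returns `False`, and the retry then succeeds, leaving the counter at 2.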

Hardware Support for Performance
[Figure: Core 1 runs a transaction (begin_xaction; A.withdraw(20); B.deposit(20); end_xaction) against the architectural memory state A.sum = 100, B.sum = 200, producing tentative values A.sum = 80, B.sum = 220. Core 2 runs a transaction (begin_xaction; Sum = A.sum + B.sum; end_xaction). The coherence protocol is used to detect conflicts; on a conflict the transaction aborts and restarts.]
1. Record recovery state
2. Buffer updates/track accesses
3. Commit if no external access (discard all updates if conflict)
Core 2 sees $300, never $280 or $320.

Herlihy & Moss' Implementation
- Limitations
  - Transaction size is implementation-dependent
    - Must finish in a single scheduling quantum
    - Locations accessed cannot exceed an architecturally specified limit
  - Short duration, small data sets
  - Requires a cache coherence model that is sequentially consistent (otherwise may need fences)
  - Nested transactions are problematic
- Hence the programmer must be aware of the hardware

What if HW is not sufficient?
- This is the deployability challenge
  - The missing piece of the puzzle for all prior work…
- Resource limitations are fundamental
  - Space: caches
    - More HW delays the inevitable: there will always be an n+1 case
  - Time: scheduling quanta
    - Programmers have no control over time
  - Affects functionality, not just performance
    - Some transactions may never complete
- Making the HW limit explicit is difficult
  - Limited usage only
  - Unreasonable for high-level languages
  - How do you architect it in an evolvable manner?

Virtualizing Transactional Memory
- Hides hardware limitations, just as virtual memory hides physical memory limitations
- 2 modes
  - 1st, for the common case, is built into HW
  - 2nd, for buffer overflow, page faults, context switches, or thread migration, is built using SW and HW data structures
- Virtualization allows transactions to be suspended, to migrate, or to overflow state from local buffers, and allows nested transactions
- Extra challenge: unlike normal TM, it cannot rely only on cache coherence mechanisms, since it needs to detect conflicts between active transactions and transactions whose state has partially or completely overflowed to virtual memory
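The two-mode idea can be pictured with a small software analogy. This is a loose sketch, not VTM's actual design: the class `VirtualizedTransactionState`, its fixed `hw_capacity`, and the rule that the newest address is the one that spills are all assumptions made for illustration. The point it shows is that conflict checks must consult the overflow structure as well as the bounded fast-path buffer.

```python
class VirtualizedTransactionState:
    """Toy model: a bounded 'hardware' buffer plus an unbounded overflow table."""

    def __init__(self, hw_capacity):
        self.hw_capacity = hw_capacity
        self.hw_buffer = {}       # common case: fits in the local buffer
        self.overflow_table = {}  # slow path: state spilled to (virtual) memory

    def store(self, address, value):
        if address in self.hw_buffer:
            self.hw_buffer[address] = value
        elif address in self.overflow_table:
            self.overflow_table[address] = value
        elif len(self.hw_buffer) < self.hw_capacity:
            self.hw_buffer[address] = value
        else:
            # Buffer is full: spill instead of aborting the transaction.
            self.overflow_table[address] = value

    def load(self, address, memory):
        if address in self.hw_buffer:
            return self.hw_buffer[address]
        if address in self.overflow_table:
            return self.overflow_table[address]
        return memory[address]

    def conflicts_with(self, address):
        # Checking only hw_buffer would miss overflowed state.
        return address in self.hw_buffer or address in self.overflow_table

    def commit(self, memory):
        memory.update(self.hw_buffer)
        memory.update(self.overflow_table)
        self.hw_buffer.clear()
        self.overflow_table.clear()


if __name__ == "__main__":
    memory = {"a": 0, "b": 0, "c": 0}
    state = VirtualizedTransactionState(hw_capacity=2)
    for address in ("a", "b", "c"):
        state.store(address, 1)
    print(sorted(state.hw_buffer), sorted(state.overflow_table))
    print(state.conflicts_with("c"))
    state.commit(memory)
    print(memory)
```

With a capacity of two, the third store lands in the overflow table, yet `conflicts_with("c")` still reports the conflict and the commit publishes all three values.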

Virtualize for Completeness
[Figure: Core 1 with limited buffers, subject to timer interrupts, context switches, exceptions, …; transaction state spills into log/buffer space in the Virtual TM virtual address space.]
- Overflow management
  - Using virtual memory
  - Software libs. and microcode
- Out-of-band concurrency control
- Programmer transparent
- Performance isolation
- Suspendable/swappable

Recent TM Research
- Recently, focus on solving the harder problem of TM
  - Making the model immune to cache buffer size limitations, scheduling limitations, etc.
- TCC (Stanford) (2004)
  - Same limitation as Herlihy/Moss for TM (size limited to local caches)
- LTM (MIT), VTM (Intel), LogTM (Wisconsin) (2005)
  - Assume hardware TM support
  - Add support to allow transactions to be immune to resource limitations
- Goals of each similar, approaches very different
  - LTM: only resource overflow
  - VTM: complete virtualization
  - LogTM: only resource overflow, and optimizes for commits
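One way to see what "optimizes for commits" means is the log-based version management that gives LogTM its name: new values are written in place and old values are saved in a log, so a commit only discards the log while an abort has to walk it backwards. The sketch below is a software illustration with a hypothetical `UndoLogTransaction` class; it leaves out conflict detection entirely.

```python
class UndoLogTransaction:
    """Toy eager version management: update in place, keep old values in a log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (address, old value) pairs, oldest first

    def store(self, address, value):
        self.undo_log.append((address, self.memory[address]))
        self.memory[address] = value  # new value goes straight to memory

    def commit(self):
        # Cheap: the new values are already in place.
        self.undo_log.clear()

    def abort(self):
        # Costly: restore old values in reverse order.
        while self.undo_log:
            address, old_value = self.undo_log.pop()
            self.memory[address] = old_value


if __name__ == "__main__":
    memory = {"A.sum": 100, "B.sum": 200}

    tx = UndoLogTransaction(memory)
    tx.store("A.sum", 80)
    tx.store("B.sum", 220)
    tx.abort()
    print(memory)

    tx = UndoLogTransaction(memory)
    tx.store("A.sum", 80)
    tx.store("B.sum", 220)
    tx.commit()
    print(memory)
```

After the abort the memory is back at 100 and 200; after the second transaction commits it holds 80 and 220.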

Some Research Challenges
- Large transactions
- Language extensions
- I/O, loopholes, escape hatches, …
- Interaction and co-existence with
  - Other synchronization schemes: locks, flags, …
  - Other transactions
    - Database transactions
    - System transactions (Microsoft)
  - Other libraries, system software, operating system, …
- Performance monitoring, tuning, debugging, …
- Open vs. closed nesting
- Interaction between transactional & non-transactional code
- Usage & workload
  - PLDI workshop

TM: First Decade
- IBM 801 Database Storage (1980s)
  - Lock attribute bits on virtual memory (via TLBs, PTEs) at 128-byte granularity
- Load-Linked/Store-Conditional (Jensen et al. 1987)
  - Optimistic update of a single cache line (Alpha, MIPS, PowerPC)
- Transactional Memory (Herlihy & Moss 1993)
  - Coined the term; TM is a generalization of LL/SC
  - Instructions explicitly identify transactional loads and stores
  - Used a dedicated transaction cache
  - Size of transactions limited to the transaction cache
- Oklahoma Update (Stone et al./IBM 1993)
  - Similar to TM, concurrently proposed
  - Did not use a cache but dedicated monitored registers to operate upon

References
1. "Transactional Memory: Architectural Support for Lock-Free Data Structures", Herlihy and Moss, ISCA 1993
2. "Virtualizing Transactional Memory", K. Lai et al., ISCA 2005
3. "LogTM: Log-based Transactional Memory", Wood et al., HPCA-12
