Copyright © 2006, CS 612 Transactional Memory Architectural Support for a Lock-Free Data Structure Some material borrowed from : Konrad Lai, Microprocessor.

Presentation transcript:

Transactional Memory: Architectural Support for a Lock-Free Data Structure
By: Major Bhadauria
Copyright © 2006, CS 612
Some material borrowed from: Konrad Lai, Microprocessor Technology Labs, Intel (Intel Multicore University Research Conference, Dec 8, 2005)

Slide 2: Motivation
• Multiple cores face a serious programmability problem
  - Writing correct parallel programs is very difficult
• Transactional Memory addresses a key part of the problem
  - Makes parallel programming easier by simplifying coordination
  - Requires hardware support for performance
  - Implements lock-free data structures

Slide 3: Benefits of Going Lock-Free
Going lock-free allows us to avoid:
• Priority inversion: a lower-priority process is preempted while holding a lock needed by a higher-priority process.
• Convoying: a process holding a lock is descheduled, possibly stopping other processes from making progress needlessly.
• Deadlock: a common problem in which processes A and B both need to lock C and D. Process A holds the lock for C while process B holds the lock for D, so neither can proceed.
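The deadlock case above can be checked mechanically. The following is a minimal sketch, not part of the original slides: it assumes each process holds one lock and waits for at most one, and the helper name `find_deadlock` is hypothetical.

```python
def find_deadlock(holds, wants):
    """Return the processes forming a wait-for cycle, or [] if there is none.

    holds: dict mapping process -> lock it currently holds
    wants: dict mapping process -> lock it is waiting for
    (Hypothetical helper for illustration only.)
    """
    owner = {lock: proc for proc, lock in holds.items()}
    for start in wants:
        seen = []
        proc = start
        # Follow "waits for the owner of" edges until the chain ends or repeats.
        while proc in wants and wants[proc] in owner:
            if proc in seen:
                return seen[seen.index(proc):]
            seen.append(proc)
            proc = owner[wants[proc]]
    return []


# The slide's scenario: A holds C and wants D, B holds D and wants C.
cycle = find_deadlock({"A": "C", "B": "D"}, {"A": "D", "B": "C"})

# A case with ordinary waiting but no cycle: B simply waits for A to finish.
no_cycle = find_deadlock({"A": "C"}, {"B": "C"})
```

Here `cycle` contains both processes, while `no_cycle` is empty because A is not waiting on anything.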

Slide 4: Problem: Lock-Based Synchronization
Lock-based synchronization of shared data access is fundamentally problematic.
• Software engineering problems
  - Lock-based programs do not compose
  - Performance and correctness are tightly coupled
  - Timing-dependent errors are difficult to find and debug
• Performance problems
  - High performance requires finer-grained locking
  - More and more locks add more overhead
We need a better concurrency model for multi-core software.

Slide 5: What is Transactional Memory (TM)?
Transactional Memory (TM) allows multiple arbitrary memory locations to be updated atomically, as if the transactions executed serially.
Basic mechanisms:
• Isolation: track reads and writes, detect when conflicts occur
• Version management: record new and old values
• Atomicity: commit new values or abort back to old values
Example:
  Thread 1:
    begin_xaction
      A = A - 20
      B = B + 20
      A = A - B
      C = C + 20
    end_xaction
  Thread 2:
    begin_xaction
      C = C - 30
      A = A + 30
    end_xaction
Thread 1’s accesses and updates to A, B, and C are atomic. Thread 2 sees either “all” or “none” of Thread 1’s updates.
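As a rough illustration of version management and atomicity, the sketch below buffers Thread 1's writes privately and publishes them all at once. The `Transaction` class and the starting values of A, B, and C are assumptions made for this example, and conflict detection (isolation) is deliberately left out.

```python
class Transaction:
    """Hypothetical sketch: writes stay private until commit publishes them."""

    def __init__(self, memory):
        self.memory = memory   # shared state, modelled as a dict
        self.writes = {}       # tentative new values (version management)

    def read(self, name):
        # A transaction sees its own tentative writes first.
        return self.writes.get(name, self.memory[name])

    def write(self, name, value):
        self.writes[name] = value

    def commit(self):
        # Publish every buffered write in one step.
        self.memory.update(self.writes)
        self.writes = {}

    def abort(self):
        # Drop the buffered writes; shared memory keeps the old values.
        self.writes = {}


memory = {"A": 100, "B": 50, "C": 10}   # assumed starting values

t1 = Transaction(memory)
t1.write("A", t1.read("A") - 20)            # A = 80 (tentative)
t1.write("B", t1.read("B") + 20)            # B = 70 (tentative)
t1.write("A", t1.read("A") - t1.read("B"))  # A = 10 (tentative)
t1.write("C", t1.read("C") + 20)            # C = 30 (tentative)

before = dict(memory)   # what another thread would see before the commit
t1.commit()
after = dict(memory)    # what another thread would see after the commit
```

An observer sees either `before` (none of the updates) or `after` (all of them), never a mixture.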

Slide 6: Transactional Memory Benefits
• Focus: the multithreaded programmability crisis
• Programmability and performance
  - Allows conservative synchronization
  - Programmer focuses on parallelism and correctness; hardware extracts performance
• Software engineering and composability
  - Allows library reuse and composition (locks make this very difficult)
  - Critical for wide multi-core demand
• Makes high-performance multithreaded (MT) programming easier
• Captures a fundamental, well-known, intuitive “atomic” construct
  - Has been around for decades
  - Similar to a “critical section” but without its problems
  - No deadlocks, priority inversion, data races, or unnecessary serialization

Slide 7: Software Transactional Memory
• Software Transactional Memory (1995 until now)
  - Significant work from Sun, Brown, Cambridge, Microsoft
• Serious performance limitations
  - Degrades the “common” case of no conflicts or contention
  - More than 90% of transactions have no conflicts
  - 90% of critical sections are uncontended: what if all of these slowed down by 5X?
• Serious deployability limitations
  - Relies on special runtime support
  - Invasive to applications and libraries
• Is STM too slow and too invasive to deploy?
  - Could there be a better implementation?
• But great for understanding complex usage models of the future

Slide 8: Herlihy & Moss’ Implementation
• Minimum implementation builds on the earlier LL/SC structure:
  - Loading a shared value to read
  - Writing to a shared value
  - Commit/Abort functions to commit or flush new values
• Extensions for performance:
  - Store a copy of the old value in cache: reduces bus traffic and latency
  - Store committed data in cache: reduces bus traffic
  - Read a value that you can later change: reduces bus traffic
  - Validate function to check the current status: reduces the number of orphan transactions
  - Busy bus signal for transactions: reduces the number of aborts

Slide 9: Herlihy & Moss’ Implementation
• Transactional memory primitives:
  - Load-transactional (LT): reads the value of a shared memory location into a private register.
  - Load-transactional-exclusive (LTX): like LT, but indicates that the location is likely to be updated.
  - Store-transactional (ST): tentatively writes a value to a shared memory location; the new value is not visible to other processors until the transaction commits successfully.
• TM instructions: Commit, Abort, Validate.
  - Commit makes tentative changes permanent if the transaction’s read set (locations referenced by LT) has not been changed by another process, and no other process has read any location in its write set (locations referenced by LTX and ST).
  - Abort discards all updates to the write set.
  - Validate tests the current transaction status: true means the transaction can continue, false means it has aborted.
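The primitives above can be modelled in a few lines of software. This is a toy sketch only: the class names `SharedMemory` and `Transaction` are hypothetical, and the conflict policy (the requesting transaction is the one marked aborted) is an assumption chosen for simplicity, not a description of the hardware protocol.

```python
class SharedMemory:
    """Toy shared memory plus the list of in-flight transactions."""

    def __init__(self, values):
        self.values = dict(values)
        self.active = []


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_set = set()    # locations referenced by LT
        self.write_set = set()   # locations referenced by LTX and ST
        self.tentative = {}      # values written by ST, invisible until commit
        self.ok = True
        mem.active.append(self)

    def _check(self, addr, exclusive):
        # Assumed policy: the requester aborts if another live transaction
        # already holds addr in a conflicting mode.
        for other in self.mem.active:
            if other is self or not other.ok:
                continue
            if addr in other.write_set or (exclusive and addr in other.read_set):
                self.ok = False

    def lt(self, addr):
        self._check(addr, exclusive=False)
        self.read_set.add(addr)
        return self.tentative.get(addr, self.mem.values[addr])

    def ltx(self, addr):
        self._check(addr, exclusive=True)
        self.write_set.add(addr)
        return self.tentative.get(addr, self.mem.values[addr])

    def st(self, addr, value):
        self._check(addr, exclusive=True)
        self.write_set.add(addr)
        self.tentative[addr] = value

    def validate(self):
        return self.ok           # True: continue, False: aborted

    def abort(self):
        self.ok = False
        self._finish()

    def commit(self):
        committed = self.ok
        if committed:
            self.mem.values.update(self.tentative)   # publish tentative writes
        self._finish()
        return committed

    def _finish(self):
        self.tentative = {}
        self.read_set.clear()
        self.write_set.clear()
        if self in self.mem.active:
            self.mem.active.remove(self)


mem = SharedMemory({"x": 1})
t1 = Transaction(mem)
t2 = Transaction(mem)

v = t1.ltx("x")              # t1 announces it will update x
t1.st("x", v + 1)            # tentative write, not yet visible
t2_saw = t2.lt("x")          # conflicts with t1's write set; t2 still reads the old value
t2_valid = t2.validate()     # False: t2 has been marked aborted
t2_committed = t2.commit()   # False
t1_committed = t1.commit()   # True
final_x = mem.values["x"]    # 2
```

Note that `t2` reads the old value of `x` and only learns of the conflict through `validate` or `commit`, which is the role Validate plays in limiting orphan transactions.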

Slide 10: Example

Slide 11: Herlihy & Moss’ Implementation
• Hardware modifications (somewhat)
Figure panels: OLD, NEW

Slide 12: Hardware Support for Performance
Architectural memory state at the start: A.sum = 100, B.sum = 200
Core 1:
  begin_xaction
    A.withdraw(20)
    B.deposit(20)
  end_xaction
Core 2:
  begin_xaction
    Sum = A.sum + B.sum
  end_xaction
Steps:
1. Record recovery state
2. Buffer updates and track accesses
3. Commit if no external access (discard all updates on conflict)
The coherence protocol detects conflicts; a conflicting transaction aborts and restarts. After Core 1 commits, A.sum = 80 and B.sum = 220. Core 2 sees $300, never $280 or $320.
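The "abort and restart" behaviour can be imitated in software with per-location version counters. This sketch is an assumption-laden stand-in for what the slide attributes to the coherence protocol: the `version` dict, `commit_transfer`, and `read_sum` are hypothetical names, and the interleaving is forced by a callback rather than by real concurrency.

```python
accounts = {"A": 100, "B": 200}
version = {"A": 0, "B": 0}   # bumped whenever a location is updated


def commit_transfer(amount):
    """Core 1's transaction: publish both updates together."""
    accounts["A"] -= amount
    version["A"] += 1
    accounts["B"] += amount
    version["B"] += 1


def read_sum(interfere=None):
    """Core 2's transaction: sum A and B, restarting if either changed."""
    attempts = 0
    while True:
        attempts += 1
        seen = dict(version)          # record recovery/validation state
        a = accounts["A"]
        if interfere is not None and attempts == 1:
            interfere()               # simulated Core 1 commit between the two reads
        b = accounts["B"]
        if seen == version:           # no external update: safe to "commit"
            return a + b, attempts
        # Otherwise a conflict occurred: discard a and b, then restart.


total, attempts = read_sum(lambda: commit_transfer(20))
```

Without the validation step the first attempt would have returned 100 + 220 = 320; with it, the read restarts and returns 300 on the second attempt.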

Slide 13: Herlihy & Moss’ Benchmarks
Compared against:
• Test-and-test-and-set (TTS) with exponential backoff
• Software queuing
• Hardware queuing (QOSB mechanism)
• LOAD_LINKED/STORE_COND (LL/SC)
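For readers unfamiliar with the first baseline, here is a minimal sketch of a test-and-test-and-set lock with exponential backoff. It is not the benchmark code from the paper: the class `TTSLock`, the delay bounds, and the use of a `threading.Lock` to emulate an atomic test-and-set instruction are all assumptions for illustration.

```python
import random
import threading
import time


class TTSLock:
    """Test-and-test-and-set spin lock with exponential backoff (toy model)."""

    def __init__(self, max_delay=1e-3):
        self._flag = False
        # Assumption: this mutex only emulates the atomicity that a hardware
        # test-and-set instruction would provide.
        self._atomic = threading.Lock()
        self._max_delay = max_delay

    def _test_and_set(self):
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def acquire(self):
        delay = 1e-6
        while True:
            while self._flag:      # first "test": spin on an ordinary read
                time.sleep(0)      # yield so the lock holder can run
            if not self._test_and_set():
                return             # the flag was clear and is now ours
            # Lost the race: wait a random time, doubling the bound each time.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, self._max_delay)

    def release(self):
        self._flag = False


def run_demo(threads=4, increments=100):
    lock = TTSLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1        # critical section
            lock.release()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0]


total = run_demo()   # every increment is applied exactly once under the lock
```

The inner read-only spin is what distinguishes TTS from plain test-and-set: waiters avoid issuing the expensive atomic operation until the lock looks free.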

Slide 14: Herlihy & Moss’ Test 1 Results

Slide 15: Herlihy & Moss’ Implementation
• Limitations
  - Transaction size is implementation-dependent
  - Must finish within a single scheduling quantum
  - Locations accessed cannot exceed an architecturally specified limit
  - Short duration, small data sets
  - Requires a cache coherence model that is sequentially consistent (otherwise fences may be needed)
  - Nested transactions are problematic
• Hence the programmer must be aware of the hardware

Slide 16: What if HW is not sufficient?
• This is the deployability challenge
  - The missing piece of the puzzle for all prior work
• Resource limitations are fundamental
  - Space: caches
    - More HW only delays the inevitable: there will always be an n+1 case
  - Time: scheduling quanta
    - Programmers have no control over time
• Affects functionality, not just performance
  - Some transactions may never complete
• Making the HW limit explicit is difficult
  - Limited usage only
  - Unreasonable for high-level languages
  - How do you architect it in an evolvable manner?

Slide 17: Virtualizing Transactional Memory
• Hides hardware limitations, much as virtual memory hides physical memory limitations
• Two modes
  - The first, for the common case, is built into hardware
  - The second, for buffer overflow, page faults, context switches, or thread migration, is built using software and hardware data structures
• Virtualization allows transactions to be suspended, to migrate, or to overflow state from local buffers, and allows nested transactions
• Extra challenge: unlike ordinary TM, it cannot rely on cache coherence mechanisms alone, since it must detect conflicts between active transactions and transactions whose state has partially or completely overflowed to virtual memory
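The two-mode idea can be pictured with a bounded buffer that spills into a second structure. This is only a conceptual sketch: `OverflowingTransaction`, its `capacity` parameter, and the two dicts are hypothetical stand-ins, not the data structures any particular virtualized TM design actually uses, and conflict detection is omitted.

```python
class OverflowingTransaction:
    """Toy model: a small 'hardware' buffer that spills to a 'software' table."""

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.fast = {}       # stands in for the bounded hardware buffer
        self.overflow = {}   # stands in for a structure kept in virtual memory

    def write(self, addr, value):
        # Common case: the location fits in (or is already in) the fast buffer.
        if addr in self.fast or len(self.fast) < self.capacity:
            self.fast[addr] = value
        else:
            # Second mode: the buffer is full, so the write spills over.
            self.overflow[addr] = value

    def read(self, addr):
        if addr in self.fast:
            return self.fast[addr]
        if addr in self.overflow:
            return self.overflow[addr]
        return self.memory[addr]

    def commit(self):
        # Publish both parts; a location lives in exactly one of the two.
        self.memory.update(self.overflow)
        self.memory.update(self.fast)
        self.fast = {}
        self.overflow = {}


memory = {"a": 0, "b": 0, "c": 0}
tx = OverflowingTransaction(memory, capacity=2)
for name, value in [("a", 1), ("b", 2), ("c", 3)]:
    tx.write(name, value)

spilled = sorted(tx.overflow)   # locations that did not fit in the fast buffer
untouched = dict(memory)        # shared memory is unchanged before commit
tx.commit()
```

The transaction completes even though it touched more locations than the fast buffer holds, which is the property the slide calls hiding hardware limitations.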

Slide 18: Virtualize for Completeness
Diagram labels: core with limited buffers; timer interrupts, context switches, exceptions, ...; Virtual TM; virtual address space; log/buffer space
• Overflow management
  - Using virtual memory
  - Software libraries and microcode
• Out-of-band concurrency control
• Programmer transparent
• Performance isolation
• Suspendable/swappable

Slide 19: Recent TM Research
• Recently, the focus has been on solving the harder problem of TM
  - Making the model immune to cache buffer size limitations, scheduling limitations, etc.
• TCC (Stanford) (2004)
  - Same limitation as Herlihy/Moss TM (size limited to local caches)
• LTM (MIT), VTM (Intel), LogTM (Wisconsin) (2005)
  - Assume hardware TM support
  - Add support to make transactions immune to resource limitations
• Goals of each are similar; approaches are very different
  - LTM: only resource overflow
  - VTM: complete virtualization
  - LogTM: only resource overflow, and optimizes for commits

Slide 20: Some Research Challenges
• Large transactions
• Language extensions
• I/O, loopholes, escape hatches, ...
• Interaction and coexistence with
  - Other synchronization schemes: locks, flags, ...
  - Other transactions
    - Database transactions
    - System transactions (Microsoft)
  - Other libraries, system software, operating system, ...
• Performance monitoring, tuning, debugging, ...
• Open vs. closed nesting
• Interaction between transactional and non-transactional code
• Usage and workloads
  - PLDI workshop

Slide 21: TM: First Decade
• IBM 801 Database Storage (1980s)
  - Lock attribute bits on virtual memory (via TLBs, PTEs) at 128-byte granularity
• Load-Linked/Store-Conditional (Jensen et al. 1987)
  - Optimistic update of a single cache line (Alpha, MIPS, PowerPC)
• Transactional Memory (Herlihy & Moss 1993)
  - Coined the term; TM is a generalization of LL/SC
  - Instructions explicitly identify transactional loads and stores
  - Used a dedicated transaction cache
  - Size of transactions limited to the transaction cache
• Oklahoma Update (Stone et al./IBM 1993)
  - Similar to TM, proposed concurrently
  - Did not use the cache; operated on dedicated monitored registers instead
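Since TM is described here as a generalization of LL/SC, a small model of LL/SC on a single location may help. This is a software emulation under stated assumptions: real LL/SC tracks a reservation in hardware, whereas the hypothetical `Cell` class below uses a version counter as the reservation token.

```python
class Cell:
    """Toy single-location LL/SC: a version counter plays the reservation."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def load_linked(self):
        # Return the value plus a token identifying the version that was read.
        return self.value, self.version

    def store_conditional(self, token, new_value):
        # Succeed only if nobody has stored to the cell since the linked load.
        if token != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


def increment(cell, interfere=None):
    """Optimistic increment: retry until the conditional store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        value, token = cell.load_linked()
        if interfere is not None and attempts == 1:
            interfere()   # simulated store by another processor
        if cell.store_conditional(token, value + 1):
            return attempts


cell = Cell(0)


def other_processor():
    _, token = cell.load_linked()
    cell.store_conditional(token, 10)


attempts = increment(cell, other_processor)
result = cell.value
```

The first attempt fails because the other store invalidated the token; the retry reads 10 and stores 11. TM extends the same optimistic pattern from one location to a whole read and write set.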

Slide 22: References
1. “Transactional Memory: Architectural Support for Lock-Free Data Structures”, Moss et al., ISCA 1993
2. “Virtualizing Transactional Memory”, K. Lai et al., ISCA 2005
3. “LogTM: Log-based Transactional Memory”, Wood et al., HPCA-12