Hybrid Transactional Memory
Nir Shavit, MIT and Tel-Aviv University
Joint work with Alex Matveev (and describing the work of many in this summer school)


Haswell

Transactional Memory [HerlihyMoss93]

Transactional Memory
Memory transactions are collections of reads and writes executed atomically.
Should provide:
– Disjoint Access Parallelism
Should maintain internal and external consistency:
– External (Serializability): with respect to the interleavings of other transactions.
– Internal (Opacity): the transaction itself should operate on a consistent state.

External Consistency
Application Memory: X = 0, Y = 0
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
Cannot both return 4.
Canonical synchronization problem all STM/HTM implementations must solve.
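A small sketch of why both transactions cannot return 4 under serializability. It replays the slide's two transactions step by step on a toy memory; the step names and the `run` helper are illustrative assumptions, not part of any TM library.

```python
def run(schedule):
    """Replay one interleaving of A's and B's steps; return (A's result, B's result)."""
    mem = {"x": 0, "y": 0}   # initial application memory from the slide
    seen = {}                # values each transaction read

    def a_read():
        seen["A_y"] = mem["y"]      # Transaction A: Read y

    def a_write():
        mem["x"] = 4                # Transaction A: Write x = 4

    def b_read():
        seen["B_x"] = mem["x"]      # Transaction B: Read x

    def b_write():
        mem["y"] = 4                # Transaction B: Write y = 4

    steps = {"A_read": a_read, "A_write": a_write,
             "B_read": b_read, "B_write": b_write}
    for name in schedule:
        steps[name]()
    # Each transaction returns x+y using its own write and its own read.
    return 4 + seen["A_y"], seen["B_x"] + 4


serial_a_then_b = run(["A_read", "A_write", "B_read", "B_write"])
serial_b_then_a = run(["B_read", "B_write", "A_read", "A_write"])
interleaved = run(["A_read", "B_read", "A_write", "B_write"])
```

Only the unsynchronized interleaving lets both transactions return 4; in either serial order one of them observes the other's write and returns 8.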

Locking STMs
Map: Application Memory → Array of Versioned-Write-Locks (V#)

Commit Time Locking (Write Buff)
1. To Read/Write: check unlocked, add to Read/Write set
2. Acquire locks
3. Validate read/write v#’s unchanged
4. Write values
5. Release each lock with v#+1
[Diagram: Mem and Locks for X and Y, with each V# advancing to V#+1 across the phases Read/Write, Lock, Validate, Write, Unlock]
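The five steps above can be sketched as a single-threaded simulation. This is only an illustration under stated assumptions: the class names (`Store`, `Txn`, `VersionedLock`, `Abort`) are hypothetical, addresses are mapped to locks with Python's `hash`, and a real STM would acquire each lock with an atomic compare-and-swap rather than a plain flag.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class VersionedLock:
    def __init__(self):
        self.version = 0
        self.locked = False


class Store:
    def __init__(self, n_locks=16):
        self.mem = {}
        self.locks = [VersionedLock() for _ in range(n_locks)]

    def lock_for(self, addr):
        # The "Map" from application memory to the lock array.
        return self.locks[hash(addr) % len(self.locks)]


class Txn:
    def __init__(self, store):
        self.store = store
        self.seen = {}        # lock -> version observed (read/write set)
        self.write_buf = {}   # buffered writes, applied only at commit

    def _track(self, addr):
        lock = self.store.lock_for(addr)
        if lock.locked:                      # step 1: check unlocked
            raise Abort(addr)
        self.seen.setdefault(lock, lock.version)

    def read(self, addr):
        if addr in self.write_buf:
            return self.write_buf[addr]
        self._track(addr)
        return self.store.mem.get(addr, 0)

    def write(self, addr, value):
        self._track(addr)
        self.write_buf[addr] = value

    def commit(self):
        wlocks = {self.store.lock_for(a) for a in self.write_buf}
        acquired = []
        try:
            for lock in wlocks:              # step 2: acquire locks
                if lock.locked:
                    raise Abort("lock busy")
                lock.locked = True
                acquired.append(lock)
            for lock, version in self.seen.items():   # step 3: validate
                if lock.version != version:
                    raise Abort("version changed")
                if lock.locked and lock not in wlocks:
                    raise Abort("locked by another transaction")
            self.store.mem.update(self.write_buf)     # step 4: write values
            for lock in acquired:            # step 5: release with v#+1
                lock.version += 1
        finally:
            for lock in acquired:
                lock.locked = False
```

A transaction that read a location whose version later changed fails validation in step 3 and leaves memory untouched, because its writes were only buffered.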

Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Transaction B: Write x; Write y
Transaction A: Read x = 4
Transaction A: Read y = 4
Transaction A: Compute z = 1/(x-y) → DIV by 0 ERROR

Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Transaction A: Write x = 4; Write y = 2
Transaction B: Read x; Read y; Compute z = 1/(x-y) → DIV by 0 ERROR!
[Diagram: X changes from 8 to 4 and Y from 4 to 2]
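A minimal replay of this failure, assuming the starting values X = 8 and Y = 4 that the diagram residue suggests (an inference, not stated in the slide text). Transaction B reads x after A's first write and y before A's second write, so it computes on a state that no serial execution could produce.

```python
def inconsistent_reads():
    """Replay the interleaving; return the (x, y) pair Transaction B observes."""
    mem = {"x": 8, "y": 4}   # assumed initial values

    mem["x"] = 4             # Transaction A: Write x = 4
    x = mem["x"]             # Transaction B: Read x  (sees A's new value)
    y = mem["y"]             # Transaction B: Read y  (still the old value)
    mem["y"] = 2             # Transaction A: Write y = 2
    return x, y


def compute_z():
    x, y = inconsistent_reads()
    try:
        return 1 / (x - y)
    except ZeroDivisionError:
        # B fails before it ever reaches a commit-time validation.
        return "DIV by 0"
```

Commit-time validation alone does not help here: B would eventually abort, but the error happens while it is still running.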

TL2/TinySTM’s Global Clock [DiceShalevShavit06/RiegelFelberFetzer06]
Have a shared global version clock:
– Incremented by writing transactions (as infrequently as possible)
– Read by all transactions
– Used to validate that the state viewed by a transaction is always consistent (opacity)

TL2 Style STM
1. Read VClock
2. Read/Write: if unlocked and v# less than clock, add to Read/Write-Set
3. Acquire locks
4. Increment clock
5. Validate each v# less than clock
6. Write values
7. Release locks with v# = new clock
[Diagram: VClock = 100; Mem and Locks (V#) for X and Y across the phases Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock]
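The seven steps can be sketched as a single-threaded simulation. The names (`TL2`, `Txn`, `Abort`) are hypothetical, there is one version per key instead of a hashed lock array, the check is "v# not greater than the clock value read at start", and locking is a plain set where a real implementation would use atomic operations.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class TL2:
    def __init__(self, init):
        self.clock = 0                       # shared global version clock
        self.mem = dict(init)
        self.ver = {k: 0 for k in init}      # v# per location
        self.locked = set()


class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.rv = stm.clock                  # step 1: read the clock
        self.reads = set()
        self.writes = {}

    def _check(self, key):
        # step 2: location must be unlocked and not newer than our clock read
        if key in self.stm.locked or self.stm.ver[key] > self.rv:
            raise Abort(key)

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self._check(key)
        self.reads.add(key)
        return self.stm.mem[key]

    def write(self, key, value):
        self._check(key)
        self.writes[key] = value

    def commit(self):
        stm = self.stm
        if any(k in stm.locked for k in self.writes):
            raise Abort("lock busy")
        stm.locked |= set(self.writes)       # step 3: acquire locks
        try:
            stm.clock += 1                   # step 4: increment clock
            wv = stm.clock
            for k in self.reads:             # step 5: validate read set
                if stm.ver[k] > self.rv:
                    raise Abort(k)
                if k in stm.locked and k not in self.writes:
                    raise Abort(k)
            for k, v in self.writes.items():  # step 6: write values
                stm.mem[k] = v
                stm.ver[k] = wv              # step 7: v# = new clock
        finally:
            stm.locked -= set(self.writes)   # step 7: release locks
```

Because every read is checked against the clock value taken at the start, a transaction that began before a writer committed aborts on its next read of the updated data instead of computing on a mixed state.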

TL2 Style STM
Advantages
– Great Disjoint Access Parallelism
Disadvantages
– Accessing Meta-Data is Expensive
– Progress guarantee is only deadlock freedom

NOrec STM [DalessandroSpearScott10]
– Use shared global clock as a seqlock
– Validation in every read if a seqlock change is detected
– Value-based validation: no need for meta-data (local time stamps or locks)

NOrec STM
1. Not odd? seqlock
2. Read/Write (with validation if seqlock changed)
3. Lock seqlock (set odd), with validation if seqlock changed
4. Write
5. Unlock seqlock (set even)
[Diagram: seqlock = 100; memory cells X, Y, Z; the transaction's R/W Set]
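A single-threaded sketch of the NOrec flow, with hypothetical class names (`NOrec`, `Txn`, `Abort`). It keeps only a value log per transaction and re-validates by comparing values whenever the seqlock has moved. A real implementation would spin while the seqlock is odd and take it with an atomic compare-and-swap; here both are simplified.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class NOrec:
    def __init__(self, init):
        self.seqlock = 0          # even = free, odd = a writer is committing
        self.mem = dict(init)


class Txn:
    def __init__(self, stm):
        self.stm = stm
        if stm.seqlock % 2:       # "Not odd?" check (a real STM would wait)
            raise Abort("writer in progress")
        self.snapshot = stm.seqlock
        self.reads = {}           # key -> value seen (no per-location meta-data)
        self.writes = {}

    def validate(self):
        # Value-based validation: every logged read must still hold.
        if self.stm.seqlock % 2:
            raise Abort("writer in progress")
        for key, value in self.reads.items():
            if self.stm.mem[key] != value:
                raise Abort(key)
        self.snapshot = self.stm.seqlock

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        if self.stm.seqlock != self.snapshot:   # validate only on change
            self.validate()
        value = self.stm.mem[key]
        self.reads[key] = value
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if not self.writes:
            return
        if self.stm.seqlock != self.snapshot:
            self.validate()
        self.stm.seqlock += 1                   # lock seqlock (set odd)
        self.stm.mem.update(self.writes)        # write values
        self.stm.seqlock += 1                   # unlock seqlock (set even)
```

A commit to an unrelated location forces a re-validation but not an abort, since the logged values are unchanged; a commit that changes a logged value makes the next read abort.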

NOrec STM
Advantages
– No Expensive Meta-Data
Disadvantages
– Poor Disjoint Access Parallelism (all writes are serialized by clock)
– Progress guarantee is only starvation freedom

Hardware TM [HerlihyMoss93, IBM/Intel13]
Advantages
– Everything in Hardware, No Meta Data
– Great Disjoint Access Parallelism
Disadvantages
– No Progress Guarantee; fails because of:
  – Unsupported instructions: system or protected instructions
  – Exceptions: page faults and similar
  – Capacity limit: too many accessed locations

Hybrid TM [Moir, Damron et al., Kumar et al.]
Fast-Path: Execute Trans Using Best Effort HTM
– If it aborts because of special instructions or because the transaction is too large, then…
Slow-Path: Execute Trans Using STM
Performance of HTM with progress guarantee of STM
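The fast-path/slow-path policy can be sketched as a small retry wrapper. Everything here is a hypothetical stand-in: `try_htm` and `run_stm` are callables supplied by the caller, and `HTMAbort` with its `retryable` flag is an assumed way of telling transient conflicts from persistent failures such as capacity or unsupported instructions. The retry limit is likewise an assumption.

```python
class HTMAbort(Exception):
    """Hypothetical signal that a best-effort hardware transaction aborted."""

    def __init__(self, reason, retryable):
        super().__init__(reason)
        self.reason = reason
        self.retryable = retryable


def run_hybrid(body, try_htm, run_stm, max_htm_attempts=3):
    """Try the hardware fast-path first; fall back to the software slow-path."""
    for _ in range(max_htm_attempts):
        try:
            return "fast-path", try_htm(body)
        except HTMAbort as abort:
            if not abort.retryable:
                break          # e.g. capacity: retrying in hardware cannot help
    return "slow-path", run_stm(body)
```

The caller always gets a result, because the software path is the one that carries the progress guarantee.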

Traditional Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum06]
Software Transaction: Update locks
Hardware Transaction: Test Versioned-Write-Lock in every Read/Write. Update in Write
[Diagram: both paths access the shared Versioned-Write-Locks]

Traditional Hybrid TM
Advantages
– Progress Guarantee of STM
Disadvantages
– HTM must access meta data
– Fast path is actually slow because of extra load and branch on every read

Traditional Hybrid TM

Phased TM [LevMoirNussbaum07]
– Two modes: all hardware or all software
– Shared global mode indicator
– If some hardware transaction aborts, switch to software mode
– Eventually mode reverts back to hardware
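A toy sketch of the shared mode indicator. The slide only says the mode "eventually" reverts, so the revert rule used here (return to hardware after a fixed number of software transactions) is purely an assumption for illustration, as are the names `PhasedTM`, `HTMAbort`, `try_htm` and `run_stm`.

```python
class HTMAbort(Exception):
    """Hypothetical signal that a hardware transaction aborted."""


class PhasedTM:
    def __init__(self, software_quota=2):
        self.mode = "hardware"            # shared global mode indicator
        self.software_quota = software_quota
        self.remaining = 0

    def execute(self, body, try_htm, run_stm):
        if self.mode == "hardware":
            try:
                return try_htm(body)
            except HTMAbort:
                # One hardware abort moves every transaction to software mode.
                self.mode = "software"
                self.remaining = self.software_quota
        result = run_stm(body)
        self.remaining -= 1
        if self.remaining <= 0:           # assumed revert policy
            self.mode = "hardware"
        return result
```

With a quota of 2, the aborting transaction and the next one both run in software even if the second would have succeeded in hardware, which is the cost listed on the following slide.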

Phased TM
Advantages
– Fast-path Pure HTM: No Meta Data Accesses
Disadvantages
– Single Software Transaction Causes all HTM to switch to STM slow path
– Not clear how to tune to avoid frequent mode transitions…

Hybrid NOrec (1st Attempt)
Hardware: Read/Write (no validation); Not odd? seqlock; Write seqlock +2
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Software will fail seqlock validation!

Hybrid NOrec (1st Attempt)
Hardware: Read/Write (no validation); Not odd? seqlock; Write seqlock +2
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware will fail seqlock validation!

Hybrid NOrec (1st Attempt)
Hardware: Read/Write (no validation); Not odd? seqlock; Write seqlock +2
Software NOrec: Odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware will fail seqlock validation!
Guaranteed External Consistency

Hybrid NOrec (1st Attempt)
Hardware: Read/Write (no validation); Not odd? seqlock; Write seqlock +2
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware will fail seqlock validation!
Problem: hardware opacity

Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Software A: Lock seqlock +1; Write x = 4; Write y = 2; Unlock seqlock +1
Hardware B: Read x; Read y; Compute z = 1/(x-y) → DIV by 0 ERROR! … Odd? seqlock
[Diagram: X changes from 8 to 4 and Y from 4 to 2]

Hybrid NOrec (2nd Attempt)
Hardware: Not odd? seqlock; Read/Write (no validation); Write seqlock +2
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware will detect seqlock invalidation!
Guarantee hardware opacity

Hybrid NOrec
Advantages
– Fast-path HTM: No Meta Data Accesses
Disadvantages
– Limited Disjoint Access Parallelism
– Seqlock is in hardware tracking set throughout HTM transaction
– Major sequential bottleneck

Possible Solutions
– Forget Opacity, Use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
– Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read and validate that the seqlock has not changed after every read
But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not actual IBM or Intel …

Reduced Hardware Approach to HyTM [MatveevShavit13]
Use short hardware transactions in the software slow-path
– I.e. create a new “mixed” software/hardware path
Not in order to make the slow-path faster
– But rather, in order to remove meta-data accesses from the fast path
Default to all software if the mixed path fails

Transactional Writes Imply Hardware Opacity
Trans A: Write x = 4; Write y = 2
Hardware B: Read x; Read y; Compute z = 1/(x-y) → DIV by 0 ERROR!
If A's writes are performed in a hardware transaction, this cannot happen…
[Diagram: X changes from 8 to 4 and Y from 4 to 2]

Reduced Hardware NOrec [MatveevShavit13]
In Slow-path commit, use a small hardware transaction to:
– Write all values
– Check seqlock has not changed
– Write seqlock+1
In Fast-path:
– Move seqlock test to end, un-instrumented read/writes
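A toy simulation of why moving the seqlock access to the end of the fast path matters. No real HTM is involved: `SimMemory` and `HwTxn` are hypothetical classes that mimic a hardware tracking set by flagging a running transaction as aborted when another party stores to a location it has touched. The slow-path commit follows the three bullets above; the fast path is run once with an early seqlock read (as in Hybrid NOrec) and once with the access deferred to the end.

```python
class SimMemory:
    def __init__(self, init):
        self.cells = dict(init)
        self.active = []                 # running simulated hardware transactions

    def store(self, key, value, writer=None):
        self.cells[key] = value
        for txn in self.active:          # conflict detection on tracked locations
            if txn is not writer and key in txn.tracked:
                txn.aborted = True


class HwTxn:
    def __init__(self, mem):
        self.mem = mem
        self.tracked = set()             # simulated hardware tracking set
        self.buffer = {}
        self.aborted = False
        mem.active.append(self)

    def load(self, key):
        self.tracked.add(key)
        return self.buffer.get(key, self.mem.cells[key])

    def store(self, key, value):
        self.tracked.add(key)
        self.buffer[key] = value

    def abort(self):
        self.mem.active.remove(self)
        return False

    def commit(self):
        self.mem.active.remove(self)
        if self.aborted:
            return False
        for key, value in self.buffer.items():
            self.mem.store(key, value, writer=self)
        return True


def slow_path_commit(mem, writes, seq_seen):
    """Small hardware transaction: write values, check seqlock, write seqlock+1."""
    txn = HwTxn(mem)
    if txn.load("seqlock") != seq_seen:
        return txn.abort()
    for key, value in writes.items():
        txn.store(key, value)
    txn.store("seqlock", seq_seen + 1)
    return txn.commit()


def fast_path_increment_x(mem, early_seqlock_read, interfere):
    txn = HwTxn(mem)
    if early_seqlock_read:
        txn.load("seqlock")              # seqlock tracked for the whole transaction
    x = txn.load("x")                    # un-instrumented read
    interfere()                          # a slow-path commit happens concurrently
    txn.store("x", x + 1)                # un-instrumented write
    seq = txn.load("seqlock")            # seqlock touched only at the end
    txn.store("seqlock", seq + 1)
    return txn.commit()


def demo(early_seqlock_read):
    mem = SimMemory({"seqlock": 0, "x": 1, "z": 0})
    ok = fast_path_increment_x(
        mem, early_seqlock_read,
        lambda: slow_path_commit(mem, {"z": 5}, seq_seen=0))
    return ok, mem.cells
```

In this simulation the early read makes the fast path abort on a slow-path commit to the unrelated location z, while the deferred access lets both transactions commit.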

Reduced Hardware NOrec
Hardware: Read/Write (no instrumentation); Read seqlock; Changed? seqlock; Write seqlock +1
Software NOrec: Read/Write (with validation); Changed? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
In HTM Trans: Write values; Changed? seqlock; seqlock +1
Hardware will detect write conflict without seqlock!
Guarantee fast-path opacity without having seqlock in TM tracking set for long

Reduced Hardware NOrec
Properties
– Fast-path: No Meta Data; No instrumentation of reads or writes
– Slow-path:
  – short hardware transaction: size of write set
  – can repeatedly attempt short hardware transaction in commit

Reduced Hardware NOrec
Advantages
– Hardware Disjoint Access Parallelism: seqlock accessed only at end of HTM transaction
– Surprise: 1st HyTM that is Obstruction-free and Privatizing
Disadvantages
– Still window of possible abort due to seqlock increment

Reduced Hardware NOrec

Reduced Hardware TL2 Style
Hardware: Read Clock; Read/Write (no validation); Write values With Clock +1
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values
Hardware Will See Software Write
Hardware will detect write conflict

Reduced Hardware TL2 Style
Hardware: Read Clock; Read/Write (no validation); Write values With Clock +1
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values
Hardware will detect write conflict
Problem: if another transaction commits between the validate and the hardware write, can have inconsistency
Solution: combine validation and writes in a single transaction
– In HTM Trans: Validate and Write values

Reduced Hardware TL2 Style
Advantages
– Complete Disjoint Access Parallelism: GV6 clock incremented on aborts only
– Obstruction-free
Disadvantages
– No privatization
– Mixed path transaction is the size of the meta-data set

RH1: Reduced Hardware TL2 Style

HyTM: Long Journey
Combination of ideas:
– hardware transactions,
– global clocks,
– no meta data access,
– mixed hardware software paths
And there is still room for improvement

Reduced Hardware NOrec

Reduced Hardware Transactions

RH Performance