Hybrid Transactional Memory
Nir Shavit, MIT and Tel-Aviv University
Joint work with Alex Matveev (and describing the work of many in this summer school)
Haswell
Transactional Memory [HerlihyMoss93]
Transactional Memory
Memory transactions are collections of reads and writes executed atomically
Should provide
–Disjoint access parallelism
Should maintain internal and external consistency
–External (serializability): with respect to the interleavings of other transactions
–Internal (opacity): the transaction itself should operate on a consistent state
External Consistency
Application memory: X = 0, Y = 0
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
A and B cannot both return 4
Canonical synchronization problem that all STM/HTM implementations must solve
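The claim that A and B cannot both return 4 can be checked by brute force. The sketch below is only an illustration (the function names are made up for this example): it runs the two transactions in both serial orders, and then once with no synchronization at all.

```python
def run_serial(order):
    """Run transactions A and B one after the other in the given order."""
    mem = {"x": 0, "y": 0}
    results = {}
    for name in order:
        if name == "A":
            y = mem["y"]                 # A: Read y
            mem["x"] = 4                 # A: Write x = 4
            results["A"] = mem["x"] + y  # A: Return x+y
        else:
            x = mem["x"]                 # B: Read x
            mem["y"] = 4                 # B: Write y = 4
            results["B"] = x + mem["y"]  # B: Return x+y
    return results


def run_unsynchronized():
    """Interleave A and B with no synchronization: both reads happen first."""
    mem = {"x": 0, "y": 0}
    a_y = mem["y"]                       # A: Read y (sees 0)
    b_x = mem["x"]                       # B: Read x (sees 0)
    mem["x"] = 4                         # A: Write x = 4
    mem["y"] = 4                         # B: Write y = 4
    return {"A": mem["x"] + a_y, "B": b_x + mem["y"]}


a_then_b = run_serial("AB")
b_then_a = run_serial("BA")
racy = run_unsynchronized()
print(a_then_b, b_then_a, racy)
```

In either serial order one transaction returns 4 and the other returns 8; only the unsynchronized interleaving lets both return 4, which is exactly the outcome a TM must rule out.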
Locking STMs
Application memory is mapped onto an array of versioned write-locks (each holds a version number, V#)
Commit Time Locking (Write Buffer)
1. To read/write: check the location is unlocked, add it to the read/write set
2. Acquire locks
3. Validate that the read/write v#'s are unchanged
4. Write values
5. Release each lock with v#+1
[Diagram: memory locations X and Y with their versioned locks, stepped through Read/Write, Lock, Validate, Write, Unlock]
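The five steps above can be sketched as a single-threaded model. This is an illustration under assumptions, not the code of any particular STM: the class names are hypothetical, there is one versioned lock per location, and "acquiring" a lock is a plain flag rather than an atomic operation.

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class VersionedLock:
    def __init__(self):
        self.version = 0
        self.locked = False


class Stm:
    def __init__(self, **initial):
        self.mem = dict(initial)
        self.locks = {addr: VersionedLock() for addr in initial}


class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.seen = {}        # addr -> v# observed at first access
        self.write_buf = {}   # addr -> buffered value

    def _track(self, addr):
        lock = self.stm.locks[addr]
        if lock.locked:                               # step 1: check unlocked
            raise Abort(addr)
        self.seen.setdefault(addr, lock.version)

    def read(self, addr):
        self._track(addr)
        return self.write_buf.get(addr, self.stm.mem[addr])

    def write(self, addr, value):
        self._track(addr)
        self.write_buf[addr] = value                  # buffered until commit

    def commit(self):
        held = []
        try:
            for addr in sorted(self.write_buf):       # step 2: acquire locks
                lock = self.stm.locks[addr]
                if lock.locked:
                    raise Abort(addr)
                lock.locked = True
                held.append(lock)
            for addr, version in self.seen.items():   # step 3: validate v#'s
                lock = self.stm.locks[addr]
                foreign = lock.locked and addr not in self.write_buf
                if lock.version != version or foreign:
                    raise Abort(addr)
            self.stm.mem.update(self.write_buf)       # step 4: write values
            for lock in held:
                lock.version += 1                     # step 5: release with v#+1
        finally:
            for lock in held:
                lock.locked = False


def demo():
    stm = Stm(x=0, y=0)
    a, b = Txn(stm), Txn(stm)
    a.read("y")
    b.read("x")
    a.write("x", 4)
    b.write("y", 4)
    a.commit()
    try:
        b.commit()
        b_outcome = "committed"
    except Abort:
        b_outcome = "aborted"
    return stm.mem, b_outcome


final_mem, b_outcome = demo()
```

In the demo both transactions from the External Consistency slide read first; A commits, and B then fails validation because the version of x changed, so B's buffered write to y is never applied.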
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Transaction B: Write x; Write y
Transaction A: Read x = 4; Read y = 4; Compute z = 1/(x-y)
A sees x and y from two different states, so x-y = 0: DIV by 0 ERROR
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X goes from 8 to 4, Y goes from 4 to 2
Transaction A: Write x = 4; Write y = 2
Transaction B: Read x; Read y; Compute z = 1/(x-y)
If B runs between A's two writes it reads x = 4 and y = 4: DIV by 0 ERROR!
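The opacity violation can be reproduced with a small generator-based interleaving. The initial values x = 8 and y = 4 are an assumption read off the slide's diagram residue, and the function names are made up for this sketch.

```python
def writer_a(mem):
    """Transaction A's writes, one step at a time, with no isolation."""
    mem["x"] = 4
    yield
    mem["y"] = 2
    yield


def reader_b(mem):
    """Transaction B: read x, read y, compute z = 1/(x-y)."""
    x = mem["x"]
    y = mem["y"]
    return 1 / (x - y)


def run(interleaved):
    mem = {"x": 8, "y": 4}      # assumed initial values
    a = writer_a(mem)
    next(a)                     # A: Write x = 4
    if not interleaved:
        next(a)                 # A: Write y = 2 before B starts
    try:
        return reader_b(mem)
    except ZeroDivisionError:
        return "DIV by 0"


between_writes = run(interleaved=True)
after_both = run(interleaved=False)
```

B is doomed to abort in the interleaved case anyway, but it hits the division before any commit-time check could stop it. That is why opacity asks that even transactions that will later abort only ever see consistent states.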
TL2/TinySTM's Global Clock [DiceShalevShavit06/RiegelFelberFetzer06]
Have a shared global version clock
–Incremented by writing transactions (as infrequently as possible)
–Read by all transactions
–Used to validate that the state viewed by a transaction is always consistent (opacity)
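TL2 Style STM
1. Read VClock
2. Read/write: if the location is unlocked and its v# is no newer than the clock value read in step 1, add it to the read/write set
3. Acquire locks
4. Increment clock
5. Validate that each v# is still no newer than the clock value read in step 1
6. Write values
7. Release locks with v# = new clock
[Diagram: VClock = 100; memory locations X and Y with their versioned locks, stepped through Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock]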
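A minimal single-threaded sketch of the seven steps, assuming one version number per location and a set standing in for the lock bits. The class and attribute names are hypothetical and this is not the TL2 or TinySTM code. The demo reuses the opacity example with assumed initial values x = 8 and y = 4.

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class Tl2:
    def __init__(self, **initial):
        self.clock = 0                               # global version clock
        self.mem = dict(initial)
        self.version = {addr: 0 for addr in initial}
        self.locked = set()


class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.rv = stm.clock                          # step 1: read VClock
        self.read_set = set()
        self.write_buf = {}

    def _check(self, addr):
        # step 2: location must be unlocked and no newer than rv
        if addr in self.stm.locked or self.stm.version[addr] > self.rv:
            raise Abort(addr)

    def read(self, addr):
        if addr in self.write_buf:
            return self.write_buf[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.stm.mem[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_buf[addr] = value

    def commit(self):
        stm = self.stm
        if not self.write_buf:
            return                                   # read-only: nothing to publish
        held = []
        try:
            for addr in sorted(self.write_buf):      # step 3: acquire locks
                if addr in stm.locked:
                    raise Abort(addr)
                stm.locked.add(addr)
                held.append(addr)
            stm.clock += 1                           # step 4: increment clock
            new_version = stm.clock
            for addr in self.read_set:               # step 5: validate
                foreign = addr in stm.locked and addr not in self.write_buf
                if stm.version[addr] > self.rv or foreign:
                    raise Abort(addr)
            for addr, value in self.write_buf.items():
                stm.mem[addr] = value                # step 6: write values
                stm.version[addr] = new_version      # step 7: v# = new clock
        finally:
            for addr in held:
                stm.locked.discard(addr)             # step 7: release locks


def demo():
    stm = Tl2(x=8, y=4)
    reader = Txn(stm)
    reader.read("x")                 # sees the old x
    writer = Txn(stm)
    writer.write("x", 4)
    writer.write("y", 2)
    writer.commit()                  # clock and both versions become 1
    try:
        reader.read("y")             # v# of y is newer than reader.rv
        first = "read succeeded"
    except Abort:
        first = "reader aborted"
    fresh = Txn(stm)                 # restarts with the new clock value
    return first, 1 / (fresh.read("x") - fresh.read("y"))


first_attempt, z = demo()
```

The reader aborts on its second read instead of mixing the old x with the new y, and the restarted transaction sees x = 4, y = 2.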
TL2 Style STM
Advantages
–Great disjoint access parallelism
Disadvantages
–Accessing meta-data is expensive
–Progress guarantee is only deadlock freedom
NOrec STM [DalessandroSpearScott10]
Use the shared global clock as a seqlock
Validation in every read if a seqlock change is detected
Value-based validation: no need for meta-data (local time stamps or locks)
NOrec STM
Start: Not odd? seqlock
Read/Write (with validation if seqlock changed)
Commit: Lock seqlock (set odd), with validation if seqlock changed; Write X, Y; Unlock seqlock (set even)
[Diagram: seqlock = 100; memory locations X, Y, Z; the transaction's R/W set]
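The NOrec flow can be sketched in a single-threaded model as follows. This is a hedged illustration, not the published NOrec code: the names are hypothetical, the compare-and-swap that locks the seqlock is modelled by a plain comparison, and a transaction that finds the seqlock odd aborts where a real thread would spin.

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class NOrec:
    def __init__(self, **initial):
        self.seqlock = 0             # even: free, odd: a writer is committing
        self.mem = dict(initial)


class Txn:
    def __init__(self, stm):
        self.stm = stm
        if stm.seqlock % 2:          # "Not odd?" check at start
            raise Abort("seqlock held")
        self.snapshot = stm.seqlock
        self.read_log = {}           # addr -> value seen (no per-location meta-data)
        self.write_buf = {}

    def _validate(self):
        """Value-based validation: every logged read must still hold."""
        stm = self.stm
        if stm.seqlock % 2:
            raise Abort("seqlock held")
        for addr, value in self.read_log.items():
            if stm.mem[addr] != value:
                raise Abort(addr)
        self.snapshot = stm.seqlock

    def read(self, addr):
        if addr in self.write_buf:
            return self.write_buf[addr]
        if self.stm.seqlock != self.snapshot:   # validate only on a change
            self._validate()
        value = self.stm.mem[addr]
        self.read_log[addr] = value
        return value

    def write(self, addr, value):
        self.write_buf[addr] = value

    def commit(self):
        stm = self.stm
        if not self.write_buf:
            return
        if stm.seqlock != self.snapshot:        # stands in for a failed CAS
            self._validate()
        stm.seqlock = self.snapshot + 1         # lock seqlock (set odd)
        stm.mem.update(self.write_buf)          # write values
        stm.seqlock = self.snapshot + 2         # unlock seqlock (set even)


def demo():
    stm = NOrec(x=8, y=4, z=1)
    conflicting, disjoint = Txn(stm), Txn(stm)
    conflicting.read("x")            # logs x = 8
    disjoint.read("z")               # logs z = 1
    writer = Txn(stm)
    writer.write("x", 4)
    writer.write("y", 2)
    writer.commit()                  # seqlock goes 0 -> 1 -> 2
    try:
        conflicting.read("y")
        outcome = "read succeeded"
    except Abort:
        outcome = "aborted"          # logged x = 8 no longer matches memory
    return outcome, disjoint.read("y"), stm.seqlock


outcome, y_seen, seqlock = demo()
```

The transaction that read only z survives the writer's commit because its logged value still matches memory, while the one that read the old x aborts. All writers still serialize on the single seqlock.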
NOrec STM
Advantages
–No expensive meta-data
Disadvantages
–Poor disjoint access parallelism (all writes are serialized by the clock)
–Progress guarantee is only starvation freedom
Hardware TM [HerlihyMoss93, IBM/Intel13]
Advantages
–Everything in hardware, no meta-data
–Great disjoint access parallelism
Disadvantages
–No progress guarantee; transactions fail because of:
  Unsupported instructions: system or protected instructions
  Exceptions: page faults and similar
  Capacity limit: too many accessed locations
Hybrid TM [Moir, Damron et al., Kumar et al.]
Fast-path: execute the transaction using best-effort HTM
–If it aborts because of special instructions or because the transaction is too large, then…
Slow-path: execute the transaction using STM
Performance of HTM with the progress guarantee of STM
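The fast-path/slow-path structure can be sketched as a small retry loop. Everything here is hypothetical scaffolding: the `Htm` status codes, the `run_hybrid` helper and the retry limit are assumptions for illustration, and retrying on conflicts before falling back is a common policy rather than something the slide specifies.

```python
from enum import Enum


class Htm(Enum):
    COMMITTED = "committed"
    CONFLICT = "conflict"
    CAPACITY = "capacity"
    UNSUPPORTED = "unsupported instruction"


def run_hybrid(fast_path, slow_path, max_conflict_retries=3):
    """Try best-effort HTM first; fall back to STM when it cannot succeed."""
    for _ in range(max_conflict_retries):
        status, result = fast_path()
        if status is Htm.COMMITTED:
            return "fast-path", result
        if status in (Htm.CAPACITY, Htm.UNSUPPORTED):
            break                     # retrying in hardware cannot help
    return "slow-path", slow_path()   # STM provides the progress guarantee


def scripted(statuses):
    """Fake fast path that replays a fixed list of hardware outcomes."""
    replay = iter(statuses)
    return lambda: (next(replay), 42)


retried = run_hybrid(scripted([Htm.CONFLICT, Htm.COMMITTED]), lambda: 42)
too_large = run_hybrid(scripted([Htm.CAPACITY]), lambda: 42)
contended = run_hybrid(scripted([Htm.CONFLICT] * 3), lambda: 42)
```

The hard part, which the following slides address, is making a hardware transaction and a concurrently running software transaction see each other's conflicts.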
Traditional Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum06]
Software transaction: updates the versioned write-locks
Hardware transaction: tests the versioned write-lock in every read/write, and updates it in every write
Traditional Hybrid TM
Advantages
–Progress guarantee of STM
Disadvantages
–HTM must access meta-data
–Fast-path is actually slow because of the extra load and branch on every read
Traditional Hybrid TM
Phased TM [LevMoirNussbaum07]
Two modes: all hardware or all software
Shared global mode indicator
If some hardware transaction aborts, switch to software mode
Eventually the mode reverts back to hardware
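A minimal sketch of the mode indicator, assuming a made-up revert policy: after a fixed quota of software transactions the mode flips back to hardware. The slide does not say how the revert is decided, so `software_quota` and the class itself are purely illustrative.

```python
class PhasedTM:
    def __init__(self, software_quota=2):
        self.mode = "hardware"              # shared global mode indicator
        self.software_quota = software_quota
        self.software_left = 0

    def run(self, hw_attempt, sw_attempt):
        if self.mode == "hardware":
            committed, result = hw_attempt()
            if committed:
                return "hardware", result
            self.mode = "software"          # one abort switches every transaction
            self.software_left = self.software_quota
        result = sw_attempt()
        self.software_left -= 1
        if self.software_left == 0:
            self.mode = "hardware"          # assumed policy: revert after a quota
        return "software", result


def hw_ok():
    return True, "done"


def hw_fail():
    return False, None


def sw():
    return "done"


tm = PhasedTM(software_quota=2)
modes = [tm.run(hw_fail, sw)[0], tm.run(hw_ok, sw)[0], tm.run(hw_ok, sw)[0]]
```

The second transaction would have committed in hardware, yet it runs in software because the first one forced the mode switch.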
Phased TM
Advantages
–Fast-path is pure HTM: no meta-data accesses
Disadvantages
–A single software transaction causes all HTM transactions to switch to the STM slow-path
–Not clear how to tune to avoid frequent mode transitions…
Hybrid NOrec (1st Attempt)
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware: Read/Write (no validation); Not odd? seqlock; Write seqlock +2
–A hardware commit changes the seqlock: software will fail seqlock validation!
–A software commit holds the seqlock odd: hardware will fail seqlock validation!
–Guaranteed external consistency
–Problem: hardware opacity
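Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X goes from 8 to 4, Y goes from 4 to 2
Software A: Lock seqlock (+1); Write x = 4; Write y = 2; Unlock seqlock (+1)
Hardware B: Read x; Read y; Compute z = 1/(x-y); …; Odd? seqlock
B can read x = 4 and y = 4 between A's two writes: DIV by 0 ERROR! B's seqlock check comes too late.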
Hybrid NOrec (2nd Attempt)
Software NOrec: Not odd? seqlock; Read/Write (with validation); Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
Hardware: Not odd? seqlock at the start, which puts the seqlock in the hardware tracking set; Read/Write (no validation); Write seqlock +2
–Hardware will detect seqlock invalidation!
–Guarantee hardware opacity
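The difference between the two attempts can be seen in a toy model of best-effort HTM. The model is an assumption for illustration only: a simulated hardware transaction keeps a set of tracked addresses, any outside store to a tracked address dooms it, and a doomed transaction aborts at its next access. Initial values x = 8 and y = 4 are assumed as in the opacity example.

```python
class Abort(Exception):
    """Simulated hardware abort."""


class Memory:
    def __init__(self, **initial):
        self.cells = dict(initial)
        self.active = []                 # live simulated hardware transactions

    def store(self, addr, value):
        """Plain software store; dooms hardware transactions tracking addr."""
        self.cells[addr] = value
        for txn in self.active:
            if addr in txn.tracked:
                txn.doomed = True


class HwTxn:
    def __init__(self, mem):
        self.mem, self.tracked, self.doomed = mem, set(), False
        mem.active.append(self)

    def load(self, addr):
        if self.doomed:                  # conflict detected by the "hardware"
            self.mem.active.remove(self)
            raise Abort(addr)
        self.tracked.add(addr)
        return self.mem.cells[addr]


def software_a_commit(mem):
    """Software slow-path commit, paused between its two writes."""
    mem.store("seqlock", mem.cells["seqlock"] + 1)   # lock seqlock (set odd)
    mem.store("x", 4)
    yield                                            # hardware B runs here
    mem.store("y", 2)
    mem.store("seqlock", mem.cells["seqlock"] + 1)   # unlock seqlock (set even)
    yield


def hardware_b(seqlock_first):
    mem = Memory(x=8, y=4, seqlock=0)
    a = software_a_commit(mem)
    b = HwTxn(mem)
    try:
        if seqlock_first and b.load("seqlock") % 2:      # 2nd attempt: check first
            raise Abort("seqlock odd")
        next(a)                                          # A locks seqlock, writes x
        x, y = b.load("x"), b.load("y")
        z = 1 / (x - y)
        if not seqlock_first and b.load("seqlock") % 2:  # 1st attempt: check last
            raise Abort("seqlock odd")
        return z
    except Abort:
        return "hardware aborted"
    except ZeroDivisionError:
        return "DIV by 0"


first_attempt = hardware_b(seqlock_first=False)
second_attempt = hardware_b(seqlock_first=True)
```

Reading the seqlock first means A's lock acquisition conflicts with B's tracking set, so B aborts before it can use x and y. The price is that every software commit now aborts every running hardware transaction.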
Hybrid NOrec
Advantages
–Fast-path HTM: no meta-data accesses
Disadvantages
–Limited disjoint access parallelism
–Seqlock is in the hardware tracking set throughout the HTM transaction
–Major sequential bottleneck
Possible Solutions
Forget opacity, use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read the seqlock and validate that it has not changed after every read
But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel hardware…
Reduced Hardware Approach to HyTM [MatveevShavit13]
Use short hardware transactions in the software slow-path
–I.e. create a new "mixed" software/hardware path
Not in order to make the slow-path faster
–But rather, in order to remove meta-data accesses from the fast-path
Default to all software if the mixed path fails
Transactional Writes Imply Hardware Opacity
Memory: X goes from 8 to 4, Y goes from 4 to 2
Trans A: Write x = 4; Write y = 2
Hardware B: Read x; Read y; Compute z = 1/(x-y)
DIV by 0 ERROR! If A's writes are in a hardware transaction this cannot happen…
Reduced Hardware NOrec [MatveevShavit13]
In slow-path commit, use a small hardware transaction to:
–Write all values
–Check seqlock has not changed
–Write seqlock+1
In fast-path:
–Move the seqlock test to the end; un-instrumented reads/writes
Reduced Hardware NOrec
Hardware (fast-path): Read/Write (no instrumentation); Write seqlock +1
Software NOrec (mixed slow-path): Read seqlock; Read/Write (with validation; Changed? seqlock); In HTM trans: Write values, Changed? seqlock, seqlock +1
All-software fallback: Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
–Hardware will detect write conflict without seqlock!
–Guarantee fast-path opacity without having the seqlock in the TM tracking set for long
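The mixed slow-path commit and the un-instrumented fast path can be sketched with a toy single-threaded HTM model. The `HwTxn` class, its conflict rule (a commit dooms every other live transaction that tracks a written address) and the treatment of the seqlock as a plain counter are assumptions of this sketch; the all-software fallback is left out.

```python
class Abort(Exception):
    """Simulated hardware abort."""


class Memory:
    def __init__(self, **initial):
        self.cells = dict(initial)
        self.active = []                      # live simulated hardware transactions


class HwTxn:
    def __init__(self, mem):
        self.mem, self.tracked, self.buf, self.doomed = mem, set(), {}, False
        mem.active.append(self)

    def abort(self):
        if self in self.mem.active:
            self.mem.active.remove(self)
        raise Abort

    def load(self, addr):
        if self.doomed:
            self.abort()
        self.tracked.add(addr)
        return self.buf.get(addr, self.mem.cells[addr])

    def store(self, addr, value):
        if self.doomed:
            self.abort()
        self.tracked.add(addr)
        self.buf[addr] = value

    def commit(self):
        if self.doomed:
            self.abort()
        self.mem.active.remove(self)
        self.mem.cells.update(self.buf)       # all writes become visible at once
        for other in self.mem.active:         # conflicting transactions abort
            if not other.tracked.isdisjoint(self.buf):
                other.doomed = True


def slow_path_commit(mem, snapshot, write_buf):
    """Mixed slow-path commit: one short hardware transaction."""
    txn = HwTxn(mem)
    for addr, value in write_buf.items():     # write all values
        txn.store(addr, value)
    if txn.load("seqlock") != snapshot:       # check seqlock has not changed
        txn.abort()                           # caller would revalidate and retry
    txn.store("seqlock", snapshot + 1)        # write seqlock+1
    txn.commit()


def demo():
    mem = Memory(x=8, y=4, seqlock=0)
    b = HwTxn(mem)                            # fast-path hardware transaction B
    x = b.load("x")                           # un-instrumented read, sees 8
    slow_path_commit(mem, 0, {"x": 4, "y": 2})
    try:
        outcome = 1 / (x - b.load("y"))
    except Abort:
        outcome = "fast-path aborted"         # conflict on x, no seqlock involved
    c = HwTxn(mem)                            # a later fast-path writer
    c.store("y", c.load("x") + 1)
    c.store("seqlock", c.load("seqlock") + 1) # seqlock touched only at the end
    c.commit()
    return outcome, mem.cells


outcome, cells = demo()
```

B is aborted by the ordinary data conflict on x, so it never needs the seqlock in its tracking set while it runs; the fast-path writer touches the seqlock only in its last step before commit.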
Reduced Hardware NOrec Properties
–Fast-path: no meta-data; no instrumentation of reads or writes
–Slow-path:
  short hardware transaction: size of the write set
  can repeatedly attempt the short hardware transaction in commit
Reduced Hardware NOrec
Advantages
–Hardware disjoint access parallelism: seqlock accessed only at the end of the HTM transaction
–Surprise: 1st HyTM that is obstruction-free and privatizing
Disadvantages
–Still a window of possible abort due to the seqlock increment
Reduced Hardware NOrec
Reduced Hardware TL2 Style
Software TL2 style: Read clock; Read/Write (validate); Validate; In HTM trans: Write values
Hardware: Read clock; Read/Write (no validation); Write values with clock +1
–Hardware will see software
–Hardware will detect write conflict
Reduced Hardware TL2 Style
Software TL2 style: Read clock; Read/Write (validate); Validate; In HTM trans: Write values
Hardware: Read clock; Read/Write (no validation); Write values with clock +1
–Hardware will detect write conflict
Problem: if another transaction commits between the validate and the hardware write transaction, can have inconsistency
Solution: combine validation and writes in a single transaction (In HTM trans: Validate and Write values)
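The window between validation and write-back can be shown with the same kind of toy HTM model. This sketch rests on several assumptions: per-location versions are stored under made-up keys such as "x.v", fast-path writers stamp locations with clock + 1 as the slide suggests, and the `intruder` callback is only a device for forcing another commit at the racy point.

```python
class Abort(Exception):
    """Simulated abort."""


class Memory:
    def __init__(self, **initial):
        self.cells = dict(initial)
        self.active = []                      # live simulated hardware transactions


class HwTxn:
    def __init__(self, mem):
        self.mem, self.tracked, self.buf, self.doomed = mem, set(), {}, False
        mem.active.append(self)

    def abort(self):
        if self in self.mem.active:
            self.mem.active.remove(self)
        raise Abort

    def load(self, addr):
        if self.doomed:
            self.abort()
        self.tracked.add(addr)
        return self.buf.get(addr, self.mem.cells[addr])

    def store(self, addr, value):
        if self.doomed:
            self.abort()
        self.tracked.add(addr)
        self.buf[addr] = value

    def commit(self):
        if self.doomed:
            self.abort()
        self.mem.active.remove(self)
        self.mem.cells.update(self.buf)
        for other in self.mem.active:         # conflicting transactions abort
            if not other.tracked.isdisjoint(self.buf):
                other.doomed = True


def fast_path_write(mem, addr, value):
    """Hardware fast path: write the value and stamp it with clock + 1."""
    h = HwTxn(mem)
    h.store(addr, value)
    h.store(addr + ".v", h.load("clock") + 1)
    h.commit()


def mixed_commit(mem, rv, read_set, write_buf, combined, intruder):
    """Software commit; `intruder` commits another transaction at the racy point."""
    txn = HwTxn(mem) if combined else None
    if combined:                              # validate inside the HTM transaction
        valid = all(txn.load(a + ".v") <= rv for a in read_set)
    else:                                     # validate with plain reads
        valid = all(mem.cells[a + ".v"] <= rv for a in read_set)
    if not valid:
        if txn:
            txn.abort()
        raise Abort
    intruder()
    if txn is None:
        txn = HwTxn(mem)                      # HTM covers the writes only
    new_version = txn.load("clock") + 1
    for addr, value in write_buf.items():
        txn.store(addr, value)
        txn.store(addr + ".v", new_version)
    txn.commit()


def demo(combined):
    mem = Memory(**{"x": 8, "y": 4, "x.v": 0, "y.v": 0, "clock": 0})
    rv = mem.cells["clock"]                   # read clock
    x = mem.cells["x"]                        # software read, version 0 <= rv
    try:
        mixed_commit(mem, rv, ["x"], {"y": x + 1}, combined,
                     intruder=lambda: fast_path_write(mem, "x", 4))
        status = "committed"
    except Abort:
        status = "aborted"
    return status, mem.cells["x"], mem.cells["y"]


split = demo(combined=False)
single = demo(combined=True)
```

With validation outside the hardware transaction, the software transaction commits y = 9 computed from a stale x = 8 even though x is already 4. With validation inside, the version of x is in the tracking set, so the intervening commit aborts the transaction.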
Reduced Hardware TL2 Style
Advantages
–Complete disjoint access parallelism: GV6 clock incremented on aborts only
–Obstruction-free
Disadvantages
–No privatization
–Mixed path transaction is the size of the meta-data set
RH1: Reduced Hardware TL2 Style
HyTM: Long Journey
Combination of ideas:
–hardware transactions,
–global clocks,
–no meta-data access,
–mixed hardware/software paths
And there is still room for improvement
Reduced Hardware NOrec
Reduced Hardware Transactions
RH Performance