Published by Geraldine Greer. Modified over 9 years ago.
1
Hybrid Transactional Memory
Nir Shavit, MIT and Tel-Aviv University
Joint work with Alex Matveev (and describing the work of many in this summer school)
2
Haswell
3
Transactional Memory [HerlihyMoss93]
4
Transactional Memory
Memory transactions are collections of reads and writes executed atomically.
Should provide:
– Disjoint Access Parallelism
Should maintain internal and external consistency:
– External (Serializability): with respect to the interleavings of other transactions.
– Internal (Opacity): the transaction itself should operate on a consistent state.
5
External Consistency
Application memory: X = 0, Y = 0
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
Cannot both return 4.
Canonical synchronization problem that all STM/HTM implementations must solve.
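To see why both transactions cannot return 4, it may help to enumerate the two serial orders. The sketch below is an illustration only: the function names `txn_a`, `txn_b`, and `serial_results` are hypothetical, and memory is modeled as a plain dictionary.

```python
from itertools import permutations


def txn_a(mem):
    y = mem["y"]            # Read y
    mem["x"] = 4            # Write x = 4
    return mem["x"] + y     # Return x + y


def txn_b(mem):
    x = mem["x"]            # Read x
    mem["y"] = 4            # Write y = 4
    return x + mem["y"]     # Return x + y


def serial_results():
    """Run A and B in every serial order, starting from X = Y = 0."""
    results = {}
    for order in permutations([("A", txn_a), ("B", txn_b)]):
        mem = {"x": 0, "y": 0}
        returned = {}
        for name, txn in order:
            returned[name] = txn(mem)
        results[tuple(name for name, _ in order)] = returned
    return results


results = serial_results()
```

In either serial order, whichever transaction runs second sees the first one's write and returns 8, so an execution in which both return 4 is not serializable.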
6
Locking STMs
Map application memory to an array of versioned write-locks (V#).
7
Commit Time Locking (Write Buff)
1. To Read/Write: check unlocked, add to Read/Write set
2. Acquire locks
3. Validate read/write v#’s unchanged
4. Write values
5. Release each lock with v#+1
[Figure: memory locations X and Y and their versioned locks (V#, V#+1, lock bit 0/1); phase labels: Read/Write, Lock, Unlock, Validate, Write]
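The five steps above can be sketched as a single-threaded simulation. This is a minimal sketch under stated assumptions, not a real STM: the class names `Store`, `Txn`, and `Abort` are hypothetical, lock acquisition is a plain flag rather than an atomic compare-and-swap, and aborting is modeled as an exception.

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class Store:
    def __init__(self, **initial):
        self.values = dict(initial)
        self.version = {addr: 0 for addr in initial}      # the v# per location
        self.locked = {addr: False for addr in initial}   # the lock bit


class Txn:
    def __init__(self, store):
        self.store = store
        self.seen = {}     # address -> v# observed at first access
        self.buffer = {}   # address -> value to write at commit

    def _access(self, addr):
        # Step 1: check unlocked, record the location and its v#.
        if self.store.locked[addr]:
            raise Abort(addr)
        self.seen.setdefault(addr, self.store.version[addr])

    def read(self, addr):
        self._access(addr)
        return self.buffer.get(addr, self.store.values[addr])

    def write(self, addr, value):
        self._access(addr)
        self.buffer[addr] = value          # buffered until commit

    def commit(self):
        s, acquired = self.store, []
        try:
            for addr in sorted(self.buffer):             # Step 2: acquire locks
                if s.locked[addr]:
                    raise Abort(addr)
                s.locked[addr] = True
                acquired.append(addr)
            for addr, version in self.seen.items():      # Step 3: validate v#'s
                held_by_other = s.locked[addr] and addr not in self.buffer
                if held_by_other or s.version[addr] != version:
                    raise Abort(addr)
            for addr, value in self.buffer.items():      # Step 4: write values
                s.values[addr] = value
                s.version[addr] += 1                     # Step 5: new v# ...
        finally:
            for addr in acquired:                        # ... and release locks
                s.locked[addr] = False
```

Run on the External Consistency example (A reads y and writes x, B reads x and writes y), whichever transaction commits second fails validation because the version of the location it read has changed.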
8
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X = 4, Y = 2, updated by Transaction B to X = 8, Y = 4
Transaction A: Read x = 4
Transaction B: Write x; Write y
Transaction A: Read y = 4
Transaction A: Compute z = 1/(x-y): DIV by 0 ERROR
9
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X = 8, Y = 4 initially
Transaction A: Write x = 4; Write y = 2
Transaction B: Read x; Read y (between A's two writes)
Transaction B: Compute z = 1/(x-y): DIV by 0 ERROR!
10
TL2/TinySTM’s Global Clock
– Have a shared global version clock
– Incremented by writing transactions (as infrequently as possible)
– Read by all transactions
– Used to validate that the state viewed by a transaction is always consistent (opacity)
[DiceShalevShavit06/RiegelFelberFetzer06]
11
TL2 Style STM
1. Read VClock
2. Read/Write: if unlocked and v# less than the clock, add to Read/Write set
3. Acquire locks
4. Increment clock
5. Validate each v# less than the clock (as read in step 1)
6. Write values
7. Release locks with v# = new clock
[Figure: VClock advancing 100, 120, 121; memory locations X and Y with versioned locks; written locations released with v# = 121; phase labels: Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock]
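The seven steps above can be sketched as a single-threaded simulation. This is a sketch under stated assumptions rather than the TL2 implementation: the names `TL2Store`, `TL2Txn`, and `Abort` are hypothetical, locks are plain flags instead of atomic operations, and the version test accepts v# up to and including the clock value read at start (as in the published TL2 algorithm), where the slide says "less than".

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class TL2Store:
    def __init__(self, **initial):
        self.clock = 0                                    # global version clock
        self.values = dict(initial)
        self.version = {addr: 0 for addr in initial}
        self.locked = {addr: False for addr in initial}


class TL2Txn:
    def __init__(self, store):
        self.store = store
        self.rv = store.clock        # Step 1: read the global clock
        self.read_set = set()
        self.write_set = {}          # address -> buffered value

    def _check(self, addr):
        # Step 2: the location must be unlocked and no newer than our clock read.
        s = self.store
        if s.locked[addr] or s.version[addr] > self.rv:
            raise Abort(addr)

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.store.values[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_set[addr] = value

    def commit(self):
        s, acquired = self.store, []
        if not self.write_set:
            return                   # read-only: every read was already validated
        try:
            for addr in sorted(self.write_set):           # Step 3: acquire locks
                if s.locked[addr]:
                    raise Abort(addr)
                s.locked[addr] = True
                acquired.append(addr)
            s.clock += 1                                  # Step 4: increment clock
            new_version = s.clock
            for addr in self.read_set:                    # Step 5: validate v#'s
                held_by_other = s.locked[addr] and addr not in self.write_set
                if held_by_other or s.version[addr] > self.rv:
                    raise Abort(addr)
            for addr, value in self.write_set.items():    # Step 6: write values
                s.values[addr] = value
                s.version[addr] = new_version             # Step 7: v# = new clock
        finally:
            for addr in acquired:                         # Step 7: release locks
                s.locked[addr] = False
```

Because every read compares the location's version with the clock value taken at the start, a transaction that began before a writer committed aborts on its first read of an updated location instead of computing on a mixed state.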
12
TL2 Style STM
Advantages
– Great Disjoint Access Parallelism
Disadvantages
– Accessing Meta-Data is Expensive
– Progress guarantee is only deadlock freedom
13
NOrec STM
– Use shared global clock as a seqlock
– Validation in every read if a seqlock change is detected
– Value-based validation: no need for meta-data (local time stamps or locks)
[DalessandroSpearScott10]
14
NOrec STM
[Figure: seqlock advancing from 100 to 104 over memory locations X, Y, Z and the R/W set. Labeled steps: Not odd? seqlock; Read/Write (with validation if seqlock changed); Lock seqlock (set odd), with validation if seqlock changed; Write; Unlock seqlock (set even)]
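The NOrec steps can be sketched as a single-threaded simulation. This is a sketch under stated assumptions: the names `NOrecStore`, `NOrecTxn`, and `Abort` are hypothetical, the seqlock is a plain integer (a real implementation would acquire it with an atomic compare-and-swap and spin while it is odd), and aborting is modeled as an exception.

```python
class Abort(Exception):
    """Raised when a transaction must abort and retry."""


class NOrecStore:
    def __init__(self, **initial):
        self.seqlock = 0               # even: free, odd: a writer is committing
        self.values = dict(initial)


class NOrecTxn:
    def __init__(self, store):
        self.store = store
        if store.seqlock % 2:          # "Not odd?" (a real version would spin)
            raise Abort("writer in progress")
        self.snapshot = store.seqlock
        self.reads = {}                # address -> value seen (no versions, no locks)
        self.writes = {}               # address -> buffered value

    def _validate(self):
        # Value-based validation: every earlier read must still hold its value.
        s = self.store
        if s.seqlock % 2:
            raise Abort("writer in progress")
        for addr, value in self.reads.items():
            if s.values[addr] != value:
                raise Abort(addr)
        self.snapshot = s.seqlock

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        value = self.store.values[addr]
        if self.store.seqlock != self.snapshot:   # seqlock changed: validate
            self._validate()
            value = self.store.values[addr]       # re-read after validation
        self.reads[addr] = value
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        s = self.store
        if not self.writes:
            return                     # read-only transactions never lock
        if s.seqlock != self.snapshot:
            self._validate()
        s.seqlock += 1                 # lock seqlock (set odd)
        for addr, value in self.writes.items():
            s.values[addr] = value     # write
        s.seqlock += 1                 # unlock seqlock (set even)
```

A transaction that has only read locations untouched by a concurrent commit passes validation and continues, while one that read an overwritten value aborts.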
15
NOrec STM
Advantages
– No Expensive Meta-Data
Disadvantages
– Poor Disjoint Access Parallelism (all writes are serialized by clock)
– Progress guarantee is only starvation freedom
16
Hardware TM [HerlihyMoss93, IBM/Intel13]
Advantages
– Everything in Hardware, No Meta Data
– Great Disjoint Access Parallelism
Disadvantages
– No Progress Guarantee; transactions fail because of:
  – Unsupported instructions: system or protected instructions
  – Exceptions: page faults and similar
  – Capacity limit: too many accessed locations
17
Hybrid TM [Moir, Damron et al., Kumar et al.]
Fast-Path: Execute transaction using best-effort HTM
– If it aborts because of special instructions or because the transaction is too large, then…
Slow-Path: Execute transaction using STM
Performance of HTM with the progress guarantee of STM
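The fast-path/slow-path control flow can be sketched as follows. Python has no hardware transactions, so `fast_path` is only a stand-in for a best-effort HTM attempt; `HtmAbort`, `run_hybrid`, the `retryable` flag, and the retry limit are all hypothetical names and policies chosen for illustration.

```python
class HtmAbort(Exception):
    """Stand-in for a hardware transaction abort."""

    def __init__(self, reason, retryable):
        super().__init__(reason)
        # Conflicts may succeed on retry; unsupported instructions and
        # capacity overflows will not.
        self.retryable = retryable


def run_hybrid(fast_path, slow_path, max_htm_attempts=3):
    """Try the (simulated) hardware fast path, then fall back to software."""
    for _ in range(max_htm_attempts):
        try:
            return "htm", fast_path()
        except HtmAbort as abort:
            if not abort.retryable:
                break                  # retrying in hardware cannot help
    return "stm", slow_path()          # slow path supplies the progress guarantee
```

The caller passes two callables that execute the same transaction body, one for each path; the returned tag records which path committed.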
18
Traditional Hybrid TM
Software transaction: update locks.
Hardware transaction: test versioned write-lock in every read/write; update in write.
[Figure: versioned write-locks (values 0 and 1) shared by the software and hardware transactions]
[DamronFedorovaLevLuchangcoMoirNussbaum06]
19
Traditional Hybrid TM
Advantages
– Progress Guarantee of STM
Disadvantages
– HTM must access meta data
– Fast path is actually slow because of extra load and branch on every read
20
Traditional Hybrid TM
21
Phased TM [LevMoirNussbaum07]
– Two modes: all hardware or all software
– Shared global mode indicator
– If some hardware transaction aborts, switch to software mode
– Eventually mode reverts back to hardware
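The mode switching can be sketched as a small state machine. This is a sketch under stated assumptions: `PhasedTM`, `HtmAbort`, and `software_quota` are hypothetical names, the hardware transaction is a plain callable, and reverting to hardware after a fixed number of software transactions is an invented policy standing in for "eventually".

```python
class HtmAbort(Exception):
    """Stand-in for a hardware transaction abort."""


class PhasedTM:
    def __init__(self, software_quota=2):
        self.mode = "HW"                     # shared global mode indicator
        self.software_quota = software_quota
        self._remaining = 0

    def run(self, hw_txn, sw_txn):
        if self.mode == "HW":
            try:
                return "HW", hw_txn()
            except HtmAbort:
                # One hardware abort moves every transaction to software mode.
                self.mode = "SW"
                self._remaining = self.software_quota
        result = sw_txn()
        self._remaining -= 1
        if self._remaining <= 0:
            self.mode = "HW"                 # revert to hardware mode
        return "SW", result
```

With `software_quota=2`, a single hardware abort forces that transaction and the next one onto the software path before hardware mode resumes.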
22
Phased TM
Advantages
– Fast-path Pure HTM: No Meta Data Accesses
Disadvantages
– Single Software Transaction Causes all HTM to switch to STM slow path
– Not clear how to tune to avoid frequent mode transitions…
23
Hybrid NOrec (1st Attempt)
Hardware:
– Not odd? seqlock
– Read/Write (no validation)
– Write seqlock +2
Software NOrec:
– Not odd? seqlock
– Read/Write (with validation)
– Lock seqlock (set odd)
– Validate
– Write
– Unlock seqlock (set even)
Software will fail seqlock validation!
24
Hybrid NOrec (1st Attempt)
Hardware:
– Not odd? seqlock
– Read/Write (no validation)
– Write seqlock +2
Software NOrec:
– Not odd? seqlock
– Read/Write (with validation)
– Lock seqlock (set odd)
– Validate
– Write
– Unlock seqlock (set even)
Hardware will fail seqlock validation!
25
Hybrid NOrec (1st Attempt)
Hardware:
– Not odd? seqlock
– Read/Write (no validation)
– Write seqlock +2
Software NOrec:
– Odd? seqlock
– Read/Write (with validation)
– Lock seqlock (set odd)
– Validate
– Write
– Unlock seqlock (set even)
Hardware will fail seqlock validation!
Guaranteed External Consistency
26
Hybrid NOrec (1st Attempt)
Hardware:
– Not odd? seqlock
– Read/Write (no validation)
– Write seqlock +2
Software NOrec:
– Not odd? seqlock
– Read/Write (with validation)
– Lock seqlock (set odd)
– Validate
– Write
– Unlock seqlock (set even)
Hardware will fail seqlock validation!
Problem: hardware opacity
27
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X = 8, Y = 4 initially
Software A: Lock seqlock +1; Write x = 4; Write y = 2; Unlock seqlock +1
Hardware B: Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR! … Odd? seqlock
28
Hybrid NOrec (2nd Attempt)
Hardware:
– Not odd? seqlock
– Read/Write (no validation)
– Write seqlock +2
Software NOrec:
– Not odd? seqlock
– Read/Write (with validation)
– Lock seqlock (set odd)
– Validate
– Write
– Unlock seqlock (set even)
Hardware will detect seqlock invalidation!
Guarantee hardware opacity
29
Hybrid NOrec
Advantages
– Fast-path HTM: No Meta Data Accesses
Disadvantages
– Limited Disjoint Access Parallelism
– Seqlock is in hardware tracking set throughout HTM transaction
– Major sequential bottleneck
30
Possible Solutions
– Forget opacity, use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
– Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read the seqlock and validate that it has not changed after every read
But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel …
31
Reduced Hardware Approach to HyTM
– Use short hardware transactions in the software slow-path
– I.e. create a new “mixed” software/hardware path
– Not in order to make the slow-path faster
  – But rather, in order to remove meta-data accesses from the fast path
– Default to all software if the mixed path fails
[MatveevShavit13]
32
Transactional Writes Imply Hardware Opacity
Memory: X = 8, Y = 4 initially
Trans A: Write x = 4; Write y = 2
Hardware B: Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR!
If A's writes are performed in a hardware transaction, this cannot happen…
33
Reduced Hardware NOrec
In Slow-path commit, use a small hardware transaction to:
– Write all values
– Check seqlock has not changed
– Write seqlock+1
In Fast-path:
– Move seqlock test to end, un-instrumented reads/writes
[MatveevShavit13]
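The slow-path commit can be sketched as a single-threaded simulation. This is a sketch under stated assumptions, not the authors' code: `RHStore`, `SlowTxn`, `fast_path`, and `Abort` are hypothetical names, the "small hardware transaction" is modeled as a method that either applies all of its writes or none, the fast path is modeled with a speculative copy of memory, and the all-software fallback is reduced to raising an exception.

```python
class Abort(Exception):
    """Raised when a transaction must abort (or fall back to all software)."""


class RHStore:
    def __init__(self, **initial):
        self.seqlock = 0
        self.values = dict(initial)


def fast_path(store, body):
    """Simulated hardware transaction: plain reads/writes, seqlock touched last."""
    shadow = dict(store.values)      # speculative state, invisible until commit
    body(shadow)                     # un-instrumented reads and writes
    store.values = shadow            # hardware commit publishes all writes at once
    store.seqlock += 1               # seqlock is accessed only at the very end


class SlowTxn:
    def __init__(self, store):
        self.store = store
        self.snapshot = store.seqlock
        self.reads = {}              # address -> value seen
        self.writes = {}             # address -> buffered value

    def _revalidate(self):
        # NOrec-style value-based validation of everything read so far.
        for addr, value in self.reads.items():
            if self.store.values[addr] != value:
                raise Abort(addr)
        self.snapshot = self.store.seqlock

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        if self.store.seqlock != self.snapshot:
            self._revalidate()
        value = self.store.values[addr]
        self.reads[addr] = value
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def _short_htm(self):
        # Stand-in for the small hardware transaction in the commit.
        s = self.store
        if s.seqlock != self.snapshot:       # check seqlock has not changed
            return False                     # abort: nothing was written
        s.values.update(self.writes)         # write all values
        s.seqlock += 1                       # write seqlock+1
        return True

    def commit(self, max_attempts=3):
        for _ in range(max_attempts):
            if self._short_htm():
                return "mixed"
            self._revalidate()               # seqlock moved: re-check reads, retry
        raise Abort("fall back to the all-software path")
```

The sketch shows the commit being retried after a fast-path transaction bumps the seqlock: the retry succeeds when the slow-path reads are still valid and aborts otherwise.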
34
Reduced Hardware NOrec
Hardware:
– Read/Write (no instrumentation)
– Changed? seqlock
– Write seqlock +1
Software NOrec:
– Read seqlock
– Read/Write (with validation)
– Changed? seqlock
– In HTM Trans: Write values; Changed? seqlock; seqlock +1
Hardware will detect write conflict without seqlock!
Guarantee fast-path opacity without having seqlock in TM tracking set for long
[Figure also shows the labels: Lock seqlock (set odd), Validate, Write, Lock seqlock (set even)]
35
Reduced Hardware NOrec Properties
– Fast-path: No Meta Data; No instrumentation of reads or writes
– Slow-path:
  – short hardware transaction: size of write set
  – can repeatedly attempt short hardware transaction in commit
36
Reduced Hardware NOrec
Advantages
– Hardware Disjoint Access Parallelism
  – seqlock accessed only at end of HTM transaction
– Surprise: 1st HyTM that is Obstruction-free and Privatizing
Disadvantages
– Still window of possible abort due to seqlock increment
37
Reduced Hardware NOrec
39
Reduced Hardware TL2 Style
Hardware:
– Read Clock
– Read/Write (no validation)
Software TL2 style:
– Read Clock
– Read/Write (validate)
– Validate
– In HTM Trans: Write values with Clock +1
Hardware will see software write
Hardware will detect write conflict
40
Reduced Hardware TL2 Style
Hardware:
– Read Clock
– Read/Write (no validation)
Software TL2 style:
– Read Clock
– Read/Write (validate)
– Validate
– In HTM Trans: Write values with Clock +1
Hardware will detect write conflict
Problem: a conflicting write between the validate step and the hardware write can cause an inconsistency
Solution: combine validation and writes in a single transaction
– In HTM Trans: Validate and Write values
41
Reduced Hardware TL2 Style
Advantages
– Complete Disjoint Access Parallelism
  – GV6 clock incremented on aborts only
– Obstruction-free
Disadvantages
– No privatization
– Mixed path transaction: size of meta-data set
42
RH1: Reduced Hardware TL2 Style
44
HyTM: Long Journey
Combination of ideas:
– hardware transactions,
– global clocks,
– no meta data access,
– mixed hardware software paths
And there is still room for improvement
46
Reduced Hardware NOrec
47
Reduced Hardware Transactions
48
RH Performance