Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)


1 Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)

2 Haswell

3 Transactional Memory [HerlihyMoss93]

4 Transactional Memory
Memory transactions are collections of reads and writes executed atomically.
Should provide
–Disjoint access parallelism
Should maintain internal and external consistency
–External (serializability): with respect to the interleavings of other transactions.
–Internal (opacity): the transaction itself should operate on a consistent state.

5 External Consistency
Application memory: X = 0, Y = 0
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
The two transactions cannot both return 4.
This is the canonical synchronization problem that all STM/HTM implementations must solve.
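A small sketch of why both transactions cannot return 4, assuming unsynchronized steps on a shared dictionary. The `run` helper and the step names are hypothetical, introduced only for illustration.

```python
def run(schedule):
    """Execute the steps of transactions A and B in the given order, with no synchronization."""
    mem = {"x": 0, "y": 0}
    seen = {}
    steps = {
        "A_read": lambda: seen.__setitem__("A_y", mem["y"]),   # A: Read y
        "A_write": lambda: mem.__setitem__("x", 4),            # A: Write x = 4
        "B_read": lambda: seen.__setitem__("B_x", mem["x"]),   # B: Read x
        "B_write": lambda: mem.__setitem__("y", 4),            # B: Write y = 4
    }
    for step in schedule:
        steps[step]()
    # Each transaction returns x + y from its own write and its earlier read.
    return 4 + seen["A_y"], seen["B_x"] + 4


serial_a_then_b = run(["A_read", "A_write", "B_read", "B_write"])
serial_b_then_a = run(["B_read", "B_write", "A_read", "A_write"])
interleaved = run(["A_read", "B_read", "A_write", "B_write"])

print(serial_a_then_b, serial_b_then_a, interleaved)
```

Every serial order makes one transaction return 8, so the interleaved outcome in which both return 4 matches no serial execution and must be prevented.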

6 Locking STMs
[Diagram: application memory mapped to an array of versioned write-locks (V#)]

7 Commit Time Locking (Write Buffer)
1. To Read/Write: check unlocked, add to read/write set
2. Acquire locks
3. Validate read/write v#'s unchanged
4. Write values
5. Release each lock with v#+1
[Diagram: memory locations X and Y with their versioned locks; panels labeled Read/Write, Lock, Unlock, Validate, Write]
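The five steps above can be sketched as follows. This is a single-threaded model under stated assumptions, not the authors' implementation: the class names (`Store`, `Txn`, `VersionedLock`, `Abort`) are hypothetical, and a real STM would acquire each lock with an atomic compare-and-swap rather than a plain assignment.

```python
class Abort(Exception):
    """Raised when a transaction must restart."""


class VersionedLock:
    def __init__(self):
        self.version = 0
        self.locked = False


class Store:
    def __init__(self, names):
        self.mem = {n: 0 for n in names}
        self.locks = {n: VersionedLock() for n in names}


class Txn:
    def __init__(self, store):
        self.store = store
        self.seen = {}        # name -> version observed on first access
        self.write_set = {}   # name -> buffered value (the write buffer)

    def _track(self, name):
        # Step 1: check unlocked, then add to the read/write set.
        lock = self.store.locks[name]
        if lock.locked:
            raise Abort(name)
        self.seen.setdefault(name, lock.version)

    def read(self, name):
        self._track(name)
        return self.write_set.get(name, self.store.mem[name])

    def write(self, name, value):
        self._track(name)
        self.write_set[name] = value

    def commit(self):
        locks = self.store.locks
        acquired = []
        try:
            for name in sorted(self.write_set):           # Step 2: acquire locks
                if locks[name].locked:
                    raise Abort(name)
                locks[name].locked = True
                acquired.append(name)
            for name, version in self.seen.items():       # Step 3: validate v#'s
                held_by_other = locks[name].locked and name not in self.write_set
                if held_by_other or locks[name].version != version:
                    raise Abort(name)
        except Abort:
            for name in acquired:                         # abort: release, no version bump
                locks[name].locked = False
            raise
        for name, value in self.write_set.items():        # Step 4: write values
            self.store.mem[name] = value
        for name in acquired:                             # Step 5: release with v#+1
            locks[name].version += 1
            locks[name].locked = False
```

A transaction that read `x` before another transaction committed a write to `x` fails validation in step 3 and leaves memory untouched.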

8 Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Memory: X changes from 4 to 8, Y changes from 2 to 4
Transaction A: Read x = 4
Transaction B: Write x; Write y
Transaction A: Read y = 4
Compute z = 1/(x-y): DIV by 0 ERROR

9 Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Transaction B: Read x; Read y
Transaction A: Write x = 4
Transaction A: Write y = 2
Compute z = 1/(x-y): DIV by 0 ERROR!
[Diagram: values of X and Y in memory]
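The hazard can be reproduced in a few lines. This sketch assumes X = 8 and Y = 4 initially (read off the diagram values) and assumes that B's two reads fall between A's two writes; `unsafe_interleaving` is a hypothetical helper.

```python
def unsafe_interleaving():
    """Transaction B reads x and y between transaction A's two writes."""
    mem = {"x": 8, "y": 4}      # assumed initial state; x - y != 0
    mem["x"] = 4                # A: Write x = 4
    x = mem["x"]                # B: Read x (new value)
    y = mem["y"]                # B: Read y (old value)
    mem["y"] = 2                # A: Write y = 2; final state again has x - y != 0
    try:
        return 1 / (x - y)      # B: Compute z = 1/(x-y)
    except ZeroDivisionError:
        return "DIV by 0"


print(unsafe_interleaving())
```

Both the state before A (8, 4) and the state after A (4, 2) are safe for B. Only the mixed view (4, 4) crashes it, which is why the doomed transaction must never observe such a state, even if it would abort later.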

10 TL2/TinySTM's Global Clock [DiceShalevShavit06/RiegelFelberFetzer06]
Have a shared global version clock
Incremented by writing transactions (as infrequently as possible)
Read by all transactions
Used to validate that the state viewed by a transaction is always opaque

11 TL2 Style STM
1. Read VClock
2. Read/Write: if unlocked and v# less than clock, add to read/write set
3. Acquire locks
4. Increment clock
5. Validate each v# less than clock
6. Write values
7. Release locks with v# = new clock
[Diagram: VClock advancing from 100 to 121 while locations X and Y and their versioned locks step through panels labeled Read/Write, Lock, Unlock, Validate, Write, Read Clock, Inc]
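A minimal single-threaded sketch of the seven steps above. The names (`TL2`, `Txn`, `Abort`) are hypothetical, lock acquisition would be an atomic compare-and-swap in a real STM, and the version test here is "no newer than the clock value sampled at start", which is an assumption about how "less than clock" is meant.

```python
class Abort(Exception):
    """Raised when a transaction must restart."""


class TL2:
    def __init__(self, names):
        self.clock = 0
        self.mem = {n: 0 for n in names}
        self.version = {n: 0 for n in names}
        self.locked = {n: False for n in names}


class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.rv = stm.clock          # Step 1: read the global version clock
        self.read_set = set()
        self.write_set = {}

    def _check(self, name):
        # Step 2: location must be unlocked and no newer than our clock sample.
        if self.stm.locked[name] or self.stm.version[name] > self.rv:
            raise Abort(name)

    def read(self, name):
        if name in self.write_set:
            return self.write_set[name]
        self._check(name)
        self.read_set.add(name)
        return self.stm.mem[name]

    def write(self, name, value):
        self._check(name)
        self.write_set[name] = value

    def commit(self):
        stm = self.stm
        acquired = []
        try:
            for name in sorted(self.write_set):          # Step 3: acquire locks
                if stm.locked[name]:
                    raise Abort(name)
                stm.locked[name] = True
                acquired.append(name)
            stm.clock += 1                               # Step 4: increment clock
            wv = stm.clock
            for name in self.read_set:                   # Step 5: validate versions
                held_by_other = stm.locked[name] and name not in self.write_set
                if held_by_other or stm.version[name] > self.rv:
                    raise Abort(name)
        except Abort:
            for name in acquired:
                stm.locked[name] = False
            raise
        for name, value in self.write_set.items():       # Step 6: write values
            stm.mem[name] = value
        for name in acquired:                            # Step 7: release with new clock
            stm.version[name] = wv
            stm.locked[name] = False
```

Because every read is checked against the clock sample taken at start, a transaction that began before a conflicting commit aborts at the read itself instead of continuing on an inconsistent view.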

12 TL2 Style STM
Advantages
–Great disjoint access parallelism
Disadvantages
–Accessing meta-data is expensive
–Progress guarantee is only deadlock freedom

13 NOrec STM [DalessandroSpearScott10]
Use a shared global clock as a seqlock
Validation in every read if a seqlock change is detected
Value-based validation: no need for meta-data (local time stamps or locks)

14 NOrec STM
Not odd? seqlock
Read/Write (with validation if seqlock changed)
Lock seqlock (set odd), with validation if seqlock changed
Write
Unlock seqlock (set even)
[Diagram: seqlock advancing from 100 to 104 while locations X, Y, and Z are compared by value against the R/W set]
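A single-threaded sketch of the NOrec flow, assuming a dictionary as memory and hypothetical names (`NOrec`, `Txn`, `Abort`). In a real implementation the step that sets the seqlock odd is a compare-and-swap from the transaction's snapshot value, and a transaction first waits for the seqlock to be even.

```python
class Abort(Exception):
    """Raised when value-based validation fails."""


class NOrec:
    def __init__(self, names):
        self.mem = {n: 0 for n in names}
        self.seqlock = 0             # even: unlocked, odd: a writer is committing


class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.snapshot = stm.seqlock  # assumed even in this single-threaded sketch
        self.reads = {}              # name -> value seen (no per-location meta-data)
        self.writes = {}

    def _validate(self):
        # Value-based validation: every value read so far must be unchanged.
        for name, value in self.reads.items():
            if self.stm.mem[name] != value:
                raise Abort(name)
        self.snapshot = self.stm.seqlock

    def read(self, name):
        if name in self.writes:
            return self.writes[name]
        value = self.stm.mem[name]
        if self.stm.seqlock != self.snapshot:   # validate only if the seqlock moved
            self._validate()
            value = self.stm.mem[name]
        self.reads[name] = value
        return value

    def write(self, name, value):
        self.writes[name] = value

    def commit(self):
        if not self.writes:
            return                               # read-only: nothing to publish
        if self.stm.seqlock != self.snapshot:
            self._validate()
        self.stm.seqlock += 1                    # lock seqlock (set odd)
        for name, value in self.writes.items():
            self.stm.mem[name] = value
        self.stm.seqlock += 1                    # unlock seqlock (set even)
```

A concurrent commit to an unrelated location forces a revalidation but not an abort, while a commit that changes a value already read aborts the reader.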

15 NOrec STM
Advantages
–No expensive meta-data
Disadvantages
–Poor disjoint access parallelism (all writes are serialized by the clock)
–Progress guarantee is only starvation freedom

16 Hardware TM [HerlihyMoss93,IBM/Intel13]
Advantages
–Everything in hardware, no meta-data
–Great disjoint access parallelism
Disadvantages
–No progress guarantee; transactions fail because of:
  Unsupported instructions: system or protected instructions
  Exceptions: page faults and similar
  Capacity limit: too many accessed locations

17 Hybrid TM [Moir, Damron et al., Kumar et al.]
Fast-path: execute the transaction using best-effort HTM
–If it aborts because of special instructions or because the transaction is too large, then…
Slow-path: execute the transaction using STM
Performance of HTM with the progress guarantee of STM
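The fast-path/slow-path split can be sketched as a retry loop. Everything here is hypothetical scaffolding: `HtmAbort`, its `retryable` flag, `run_hybrid`, and the retry budget of three attempts are assumptions for illustration, and the hardware transaction is represented by an ordinary callable.

```python
class HtmAbort(Exception):
    """Stand-in for a best-effort hardware transaction abort."""

    def __init__(self, reason, retryable):
        super().__init__(reason)
        self.retryable = retryable


def run_hybrid(fast_path, slow_path, max_attempts=3):
    """Try the hardware fast path a few times, then fall back to software."""
    for _ in range(max_attempts):
        try:
            return "htm", fast_path()
        except HtmAbort as abort:
            if not abort.retryable:      # e.g. capacity or unsupported instruction
                break
    return "stm", slow_path()


attempts = []

def too_large():
    attempts.append("htm")
    raise HtmAbort("capacity limit", retryable=False)

print(run_hybrid(too_large, lambda: 42))
```

Aborts that will recur on every attempt (capacity, unsupported instructions) go straight to the slow path, while transient conflicts are retried in hardware first.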

18 Traditional Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum06]
Software transaction: update locks.
Hardware transaction: test the versioned write-lock in every read/write; update it in every write.
[Diagram: software and hardware transactions sharing the versioned write-locks]

19 Traditional Hybrid TM
Advantages
–Progress guarantee of STM
Disadvantages
–HTM must access meta-data
–Fast path is actually slow because of an extra load and branch on every read

20 Traditional Hybrid TM

21 Phased TM [LevMoirNussbaum07]
Two modes: all hardware or all software
Shared global mode indicator
If some hardware transaction aborts, switch to software mode
Eventually the mode reverts back to hardware

22 Phased TM
Advantages
–Fast-path is pure HTM: no meta-data accesses
Disadvantages
–A single software transaction causes all HTM transactions to switch to the STM slow path
–Not clear how to tune to avoid frequent mode transitions…

23 Hybrid NOrec (1st Attempt) Software NOrec: Hardware: Read/Write (no validation) Not odd? seqlock Write seqlock +2 Read/Write (with validation) Not odd? seqlock Lock seqlock (set odd) Unlock seqlock (set even) Validate Write Software will fail seqlock validation!

24 Hybrid NOrec (1st Attempt) Software NOrec: Hardware: Read/Write (no validation) Write seqlock +2 Read/Write (with validation) Not odd? seqlock Lock seqlock (set odd) Unlock seqlock (set even) Validate Write Hardware will fail seqlock validation! Not odd? seqlock

25 Hybrid NOrec (1st Attempt) Software NOrec: Hardware: Read/Write (no validation) Write seqlock +2 Read/Write (with validation) Odd? seqlock Lock seqlock (set odd) Unlock seqlock (set even) Validate Write Hardware will fail seqlock validation! Not odd? seqlock Guaranteed External Consistency

26 Hybrid NOrec (1st Attempt) Software NOrec: Hardware: Read/Write (no validation) Write seqlock +2 Read/Write (with validation) Not odd? seqlock Lock seqlock (set odd) Unlock seqlock (set even) Validate Write Hardware will fail seqlock validation! Not odd? seqlock Problem: hardware opacity

27 Internal Inconsistency (Opacity) [GuerraouiKapalka07]
Hardware B: Odd? seqlock … Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR!
Software A: Lock seqlock +1; Write x = 4; Write y = 2; Unlock seqlock +1
[Diagram: values of X and Y in memory]

28 Hybrid NOrec (2nd Attempt) Software NOrec: Hardware: Read/Write (no validation) Write seqlock +2 Read/Write (with validation) Not odd? seqlock Lock seqlock (set odd) Unlock seqlock (set even) Validate Write Hardware will detect seqlock invalidation! Not odd? seqlock Guarantee hardware opacity

29 Hybrid NOrec
Advantages
–Fast-path HTM: no meta-data accesses
Disadvantages
–Limited disjoint access parallelism
–Seqlock is in the hardware tracking set throughout the HTM transaction
–Major sequential bottleneck

30 Possible Solutions
Forget opacity, use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read and validate that the seqlock has not changed after every read
But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel hardware…

31 Reduced Hardware Approach to HyTM [MatveevShavit13]
Use short hardware transactions in the software slow-path
–I.e., create a new "mixed" software/hardware path
Not in order to make the slow-path faster
–But rather, in order to remove meta-data accesses from the fast path
Default to all software if the mixed path fails

32 Transactional Writes Imply Hardware Opacity
Hardware B: Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR!
Trans A: Write x = 4; Write y = 2
If A's writes are in a hardware transaction, this cannot happen…
[Diagram: values of X and Y in memory]

33 Reduced Hardware NOrec [MatveevShavit13]
In slow-path commit, use a small hardware transaction to:
–Write all values
–Check seqlock has not changed
–Write seqlock+1
In fast-path:
–Move seqlock test to end, un-instrumented reads/writes
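A toy model of the commit step described above. Python has no hardware transactions, so `hardware_txn` stands in for one using a lock plus rollback; that substitution, the `Memory` class, and the choice to have the fast path increment the seqlock as its final action are all assumptions made for illustration, not the authors' code.

```python
import threading


class HtmAbort(Exception):
    """Stand-in for a hardware transaction abort."""


class Memory:
    def __init__(self, **values):
        self.values = dict(values)
        self.seqlock = 0
        self._htm = threading.Lock()     # stand-in for hardware atomicity

    def hardware_txn(self, body):
        """Run body atomically; undo all of its effects if it aborts."""
        with self._htm:
            saved = (dict(self.values), self.seqlock)
            try:
                return body()
            except HtmAbort:
                self.values, self.seqlock = saved
                raise


def slow_path_commit(mem, snapshot, write_set):
    """Software slow path: publish the write set inside one short hardware transaction."""
    def body():
        for name, value in write_set.items():    # write all values
            mem.values[name] = value
        if mem.seqlock != snapshot:              # check seqlock has not changed
            raise HtmAbort("seqlock changed")
        mem.seqlock += 1                         # write seqlock+1
    mem.hardware_txn(body)


def fast_path(mem, body):
    """Hardware fast path: un-instrumented accesses, seqlock touched only at the end."""
    def wrapped():
        result = body(mem.values)
        mem.seqlock += 1
        return result
    return mem.hardware_txn(wrapped)
```

The short transaction is only as large as the write set, so the slow path can retry it several times before giving up and defaulting to an all-software commit.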

34 Reduced Hardware NOrec Software NOrec: Hardware: Read/Write (no instrumentation) Write seqlock +1 Read/Write (with validation) Changed? seqlock Lock seqlock (set odd) Validate In HTM Trans: Write values Changed? seqlock seqlock +1 Hardware will detect write conflict without seqlock! Changed? seqlock Guarantee fast-path opacity without having seqlock in TM tracking set for long Write Lock seqlock (set even) Read seqlock

35 Reduced Hardware NOrec
Properties
–Fast-path: no meta-data; no instrumentation of reads or writes
–Slow-path:
  short hardware transaction: size of the write set
  can repeatedly attempt the short hardware transaction in commit

36 Reduced Hardware NOrec
Advantages
–Hardware disjoint access parallelism: seqlock accessed only at the end of the HTM transaction
–Surprise: 1st HyTM that is obstruction-free and privatizing
Disadvantages
–Still a window of possible abort due to the seqlock increment

37 Reduced Hardware NOrec

38

39 Reduced Hardware TL2 Style
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values with Clock +1
Hardware: Read Clock; Read/Write (no validation); Write
Hardware will see software
Hardware will detect write conflict

40 Reduced Hardware TL2 Style
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values with Clock +1
Hardware: Read Clock; Read/Write (no validation)
Hardware will detect write conflict
Problem: between the validate and the hardware write, an inconsistency can arise
Solution: combine validation and writes in a single transaction (In HTM Trans: Validate and Write values)

41 Reduced Hardware TL2 Style
Advantages
–Complete disjoint access parallelism: GV6 clock incremented on aborts only
–Obstruction-free
Disadvantages
–No privatization
–Mixed-path transaction size is that of the meta-data set

42 RH1: Reduced Hardware TL2 Style

43

44 HyTM: Long Journey
Combination of ideas:
–hardware transactions,
–global clocks,
–no meta-data access,
–mixed hardware/software paths
And there is still room for improvement

45

46 Reduced Hardware NOrec

47 Reduced Hardware Transactions

48 RH Performance

