Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics. Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond.

1 Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics. Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond

2 Do We Need Efficient STM?

3 Problem Solved! (Blue Gene/Q)

4 Problem Solved? HTM is limited…

5 Problem Solved? Best-effort HTM: no completion guarantee [1]. Performance penalty: short transactions [2]. Language-level support for atomic blocks: STM fallback. Example transaction: atomic { from.balance -= amount; to.balance += amount; }
[1] I. Calciu et al. Invyswell: A Hybrid Transactional Memory for Haswell’s Restricted Transactional Memory. In PACT, 2014.
[2] R. M. Yoo et al. Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing. In SC, 2013.
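To make the meaning of the atomic block on slide 5 concrete, here is a minimal Java sketch that emulates it with a single global lock. This is only a stand-in for the semantics the block asks for, not how an STM or HTM implements it; the class `TransferDemo`, the nested `Account` type, and the `demo` helper are illustrative names that do not come from the talk.

```java
import java.util.concurrent.locks.ReentrantLock;

class TransferDemo {
    // Hypothetical account type; only the balance field appears on the slide.
    static final class Account {
        long balance;
        Account(long balance) { this.balance = balance; }
    }

    // Assumption: one global lock stands in for `atomic { ... }`.
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    // Both updates happen under the lock, so no thread sees a half-done transfer.
    static void transfer(Account from, Account to, long amount) {
        GLOBAL.lock();
        try {
            from.balance -= amount;
            to.balance += amount;
        } finally {
            GLOBAL.unlock();
        }
    }

    // Two threads transfer in opposite directions; returns the combined balance.
    static long demo() {
        Account a = new Account(1000);
        Account b = new Account(1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }
}
```

Calling `TransferDemo.demo()` should leave the combined balance unchanged, because each transfer is applied as one indivisible step. A global lock serializes all transfers, which is the scalability cost that transactional memory aims to avoid.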

6 Software Transactional Memory Is Slow. Existing STMs add high overhead [1,2,3].

7 Software Transactional Memory Is Slow. Existing STMs add high overhead [1,2,3]. Related challenges: scalability, progress guarantees, strong semantics.
[1] C. Cascaval et al. Software Transactional Memory: Why Is It Only a Research Toy? In CACM, 2008.
[2] A. Dragojević et al. Why STM Can Be More than a Research Toy. In CACM, 2011.
[3] R. M. Yoo et al. Kicking the Tires of Software Transactional Memory: Why the Going Gets Tough. In SPAA, 2008.

8 Challenge: Expensive to detect conflicts. T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … } T2: o.f = …

9 Challenge: Expensive to detect conflicts. T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … } T2: p.g = …

10 Challenge: Expensive to detect conflicts. T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … } T2: t.k = …

11 Challenge: Expensive to detect conflicts. T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … } T2: instrumentation?

13 LarkTM Contributions: adds very low overhead; achieves good scalability by using a hybrid approach; provides strong progress guarantees; provides strong atomicity.

14 Key Insight: Keep overall instrumentation cost low by minimizing the cost of instrumentation for non-conflicting accesses.

15 LarkTM Design. Per-object biased reader-writer locks [1,2]. Eager concurrency control. Piggybacking conflict detection and conflict resolution on lock transfers.

16 LarkTM Design. Per-object biased reader-writer locks [1,2]. Eager concurrency control. Piggybacking conflict detection and conflict resolution on lock transfers. Minimal instrumentation and synchronization for both transactional and non-transactional non-conflicting accesses. Locks are not released even when transactions commit.
[1] M. D. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA, 2013.
[2] B. Hindman and D. Grossman. Atomicity via Source-to-Source Translation. In MSPC, 2006.

17 Biased Locks. Object o has a field f and a per-object lock state.

18 Biased Locks. Object o has a field f and a per-object lock state ∈ {WrEx T, RdEx T, RdSh}.
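The three lock states named on slide 18 can be sketched as a small Java type. The access rules below are an assumption along the lines of the cited Octet work (write-exclusive for thread T, read-exclusive for thread T, read-shared); the names `Kind`, `BiasedLockState`, and `allowsWithoutStateChange`, and the use of integer thread ids, are illustrative and not taken from the talk.

```java
// The three lock-state kinds from slide 18.
enum Kind { WR_EX, RD_EX, RD_SH }

final class BiasedLockState {
    final Kind kind;
    final int owner; // hypothetical thread id; ignored for RD_SH

    BiasedLockState(Kind kind, int owner) {
        this.kind = kind;
        this.owner = owner;
    }

    // True when thread t can perform the access while the lock state stays as it is.
    // Assumed rules: WrEx T lets T read and write, RdEx T lets T read,
    // RdSh lets every thread read.
    boolean allowsWithoutStateChange(int t, boolean isWrite) {
        switch (kind) {
            case WR_EX: return owner == t;
            case RD_EX: return owner == t && !isWrite;
            case RD_SH: return !isWrite;
            default:    return false;
        }
    }
}
```

Under these assumed rules, accesses for which the method returns true are the non-conflicting case that the design aims to keep cheap; every other access needs some change to the lock state first.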

19 Multi-thread Execution. Timeline with threads T1 and T2. Object o: field f, lock state WrEx T1.

20 Multi-thread Execution. T1 starts a transaction (txn id: 42) and is about to execute o.f = 1. Object o: f, last txn, lock state WrEx T1.

21 Multi-thread Execution. T1 updates o's last txn. Object o: f, last txn 42, lock state WrEx T1.

22 Multi-thread Execution. T1 adds o.f to its undo log. Object o: f, last txn 42, …, lock state WrEx T1.

23 Multi-thread Execution. T1 updates o.f. Object o: f = 1, last txn 42, …, lock state WrEx T1.
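The steps T1 performs for o.f = 1 on slides 20 to 23 (update last txn, add o.f to the undo log, write the field in place) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the classes `TxObject`, `UndoEntry`, and `UndoTxn`, the `-1` initial value of `lastTxn`, and the choice to log every write are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical object with one field f and per-object "last txn" metadata.
class TxObject {
    int f;
    int lastTxn = -1; // assumption: -1 means no transaction has accessed it yet
}

// One undo-log entry: which object was written and the old value of f.
class UndoEntry {
    final TxObject target;
    final int oldF;
    UndoEntry(TxObject target, int oldF) {
        this.target = target;
        this.oldF = oldF;
    }
}

class UndoTxn {
    final int id;
    private final Deque<UndoEntry> undoLog = new ArrayDeque<>();

    UndoTxn(int id) { this.id = id; }

    // Transactional write to o.f, in the order the slides show.
    void writeF(TxObject o, int value) {
        o.lastTxn = id;                      // update last txn
        undoLog.push(new UndoEntry(o, o.f)); // add o.f to the undo log
        o.f = value;                         // eager, in-place update
    }

    // Abort: restore old values, most recent write first.
    void abort() {
        while (!undoLog.isEmpty()) {
            UndoEntry e = undoLog.pop();
            e.target.f = e.oldF;
        }
    }
}
```

Because the write is applied in place, the undo log is what makes an abort possible: rolling back newest-first restores the value the field had before the transaction started.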

24 Multi-thread Execution. T1 is in its transaction (txn id: 42, o.f = 1) when T2 reaches o.f = 2. Object o: f = 1, last txn 42, …, lock state WrEx T1.

25 Multi-thread Execution. Problem! There is no synchronization on T1's accesses to o. Object o: f = 1, last txn 42, …, lock state WrEx T1.

26 Multi-thread Execution. T2 starts coordination for its access o.f = 2. Object o: f = 1, last txn 42, …, lock state WrEx T1.

27 Coordination. T2 updates o's lock state. Object o: f = 1, last txn 42, …, lock state Int T2.

28 Coordination. T2 sends a request to T1. Object o: f = 1, last txn 42, …, lock state Int T2.

29 Coordination. T1 continues its transaction (… = o.f) until it reaches a safe point. Object o: f = 1, last txn 42, …, lock state Int T2.

30 Coordination. At the safe point, T1 detects conflicts. Object o: f = 1, last txn 42, …, lock state Int T2.

31 A Transactional Conflict. T1 is in the transaction with txn id: 42 and o's last txn is 42. Detecting conflicts: at the safe point, T1 detects the conflict. Resolving conflicts: contention management handles the detected conflicts. Object o: f = 1, last txn 42, …, lock state Int T2.

32 Not a Transactional Conflict. T1 is in a transaction with txn id: 43, while o's last txn is 42. Detecting conflicts: at the safe point, no conflict is found. Object o: f = 1, last txn 42, …, lock state Int T2.
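Slides 31 and 32 suggest that the responding thread decides at its safe point by comparing the object's last txn with its own current transaction id. The Java sketch below encodes that reading of the diagrams; it is an interpretation, not a confirmed description of the implementation, and `ConflictCheck`, `NO_TXN`, and the method name are hypothetical.

```java
final class ConflictCheck {
    // Hypothetical sentinel: the responding thread is not inside a transaction.
    static final int NO_TXN = -1;

    // Assumed rule: there is a transactional conflict only if the responder is
    // still running the transaction recorded as the object's last txn.
    static boolean isTransactionalConflict(int objectLastTxn, int responderCurrentTxn) {
        return responderCurrentTxn != NO_TXN && objectLastTxn == responderCurrentTxn;
    }
}
```

With the values from the slides, last txn 42 against current txn 42 is a conflict, while last txn 42 against current txn 43 is not, since transaction 42 has already finished.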

33 Coordination. T2 has sent its request for o.f = 2; T1, in transaction 42, detects conflicts at its safe point. Object o: f = 1, last txn 42, …, lock state Int T2.

34 Coordination. T2 waits after its request until T1 sends a response from its safe point. Object o: f = 1, last txn 42, …, lock state Int T2.

35 Strong Progress Guarantees. Once a conflict is detected, either side may abort: T1, which responds at the safe point, or T2, which is waiting. Object o: f = 1, last txn 42, …, lock state Int T2.

36 Strong Progress Guarantees. Either side may abort, which gives starvation and livelock freedom. Object o: f = 1, last txn 42, …, lock state Int T2.
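Slides 35 and 36 say that either thread may abort but do not state how the choice is made. The Java sketch below assumes an age-based policy (the older transaction survives) as one example of a policy that this freedom of choice allows; `AgeBasedPolicy`, `Loser`, and the timestamp parameters are hypothetical names, and the policy itself is an assumption rather than something shown on the slides.

```java
final class AgeBasedPolicy {
    enum Loser { REQUESTER, RESPONDER }

    // Start timestamps: a smaller value means an older transaction.
    // Assumed rule: the older transaction wins; on a tie the requester aborts.
    static Loser chooseLoser(long requesterStart, long responderStart) {
        return requesterStart < responderStart ? Loser.RESPONDER : Loser.REQUESTER;
    }
}
```

If an aborted transaction keeps its original timestamp when it retries, it eventually becomes the oldest one in the system and can no longer lose, which is the usual argument for why age-based policies avoid starvation and livelock.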

37 Strong Atomicity Semantics. Transactional vs. transactional conflict: T2's o.f = 2 is a transactional access. T2 sends a request and waits; T1 (txn id: 42) detects the conflict at its safe point and responds; the conflict is resolved by an abort. Object o: f = 1, last txn 42, …, lock state Int T2.

38 Strong Atomicity Semantics. Transactional vs. transactional conflict: after the abort, the aborted transaction starts again (retry). Object o: f = 1, last txn 42, …, lock state Int T2.

39 Strong Atomicity Semantics. Transactional vs. non-transactional conflict: T2's o.f = 2 is a non-transactional access. T2 sends a request and waits; T1 (txn id: 42) detects the conflict at its safe point and responds; the conflict is resolved by an abort. Object o: f = 1, last txn 42, …, lock state Int T2.

40 Strong Atomicity Semantics. Transactional vs. non-transactional conflict: the abort is followed by a retry. Object o: f = 1, last txn 42, …, lock state Int T2.

41 Strong Atomicity Semantics. Non-transactional accesses behave like short transactions, with no setup or teardown cost. In the diagram, T2's non-transactional access o.f = 2 sends a request, and T1 responds at a safe point after its transaction ends.

42 No Transactional Conflict. T2 starts a transaction (txn id: 51), sends a request for o.f = 2, and waits. T1's transaction ends before its safe point, so T1 detects no conflict there and responds. Object o: f = 1, last txn 42, …, lock state Int T2.

43 No Transactional Conflict. After the response, T2 acquires the lock. Object o: f = 1, last txn 42, …, lock state WrEx T2.

44 No Transactional Conflict. T2 updates o's last txn and adds o.f to its undo log. Object o: f = 2, last txn 51, …, lock state WrEx T2.

45 No Transactional Conflict. T2 completes o.f = 2, with o.f recorded in its undo log. Slide note: "Two versions of coordination protocol". Object o: f = 2, last txn 51, …, lock state WrEx T2.

46 LarkTM-O: adds very low overhead and scales well for low-contention cases.

47 High-Contention Applications. T1 runs transactions 42 and 43 and T2 runs transactions 51 and 52; each transaction reads and writes o.f (… = o.f … o.f = …).

48 High-Contention Applications. Same workload as slide 47, now with the coordination shown: the accesses to o.f keep triggering request, safe point, and response steps between T1 and T2.

49 LarkTM-S: Handling High Contention

50 LarkTM-S: Hybrid with Traditional Locking. Same workload (T1: txn 42 and 43; T2: txn 51 and 52; each reads and writes o.f). Object o causes high contention.

51 LarkTM-S: Hybrid with Traditional Locking. Same workload as slide 50 (T1: txn 42 and 43; T2: txn 51 and 52; each reads and writes o.f).
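The slides say that LarkTM-S is a hybrid with traditional locking and that it targets objects that cause high contention, but they do not show how an object is moved from one scheme to the other. The Java sketch below is one hypothetical way to express the idea: count coordination events per object and switch the object to traditional locking once a cutoff is reached. The counter, the `THRESHOLD` value, and all names are assumptions made for illustration.

```java
final class HybridObject {
    enum Mode { BIASED_LOCK, TRADITIONAL_LOCK }

    // Hypothetical cutoff; the talk does not give a value or a mechanism.
    static final int THRESHOLD = 3;

    private int coordinationCount = 0;
    private Mode mode = Mode.BIASED_LOCK;

    // Record one cross-thread coordination on this object and
    // switch schemes once the object looks highly contended.
    void recordCoordination() {
        coordinationCount++;
        if (coordinationCount >= THRESHOLD) {
            mode = Mode.TRADITIONAL_LOCK;
        }
    }

    Mode mode() { return mode; }
}
```

The design intent being illustrated is that low-contention objects keep the cheap biased-lock path, while objects like o on slide 50 stop paying for repeated coordination.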

52 Comparison of Concurrency Control (write concurrency control / read concurrency control)
LarkTM-O: eager per-object biased reader–writer lock (one entry covering writes and reads)
LarkTM-S: IntelSTM–LarkTM-O hybrid (one entry covering writes and reads)
IntelSTM [1,2]: eager per-object lock / lazy version validation
NOrec [3]: lazy global seqlock / lazy value validation
[1] B. Saha et al. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime. In PPoPP, 2006.
[2] T. Shpeisman et al. Enforcing Isolation and Ordering in STM. In PLDI, 2007.
[3] L. Dalessandro et al. NOrec: Streamlining STM by Abolishing Ownership Records. In PPoPP, 2010.

53 Comparison of Instrumentation (instrumented accesses)
LarkTM-O: all accesses
LarkTM-S: all accesses
IntelSTM: all accesses
NOrec: all transactional accesses
Slide annotation: except redundant accesses

54 Comparison of Progress Guarantees
LarkTM-O: livelock and starvation free
LarkTM-S: livelock and starvation free
IntelSTM: none
NOrec: livelock free

55 Comparison of Semantics
LarkTM-O: strong atomicity
LarkTM-S: strong atomicity
IntelSTM: strong atomicity
NOrec: single global lock atomicity (SLA)

56 Implementation. LarkTM-O, LarkTM-S, IntelSTM (McRT), and NOrec were developed in Jikes RVM 3.1.3. All STMs share features as much as possible (e.g., inlining decisions, redundant barrier analysis, name-mangling). Source code is publicly available on the Jikes RVM Research Archive.

57 Evaluation Methodology. TM programs: STAMP benchmarks. STM comparison: NOrec, IntelSTM, LarkTM-O, LarkTM-S. Platforms: eight 8-core processors (AMD Opteron 6272); four 8-core processors (Intel Xeon E5-4620).

58-63 Single-Thread Performance (chart built up over six slides; the only labels that survive in the transcript are 610, 2870, 40%, and 73%)

64-67 Speedup, geomean (charts; no data survives in the transcript)

68-72 Toward Practical STM (built up over five slides): low instrumentation overhead; scales well; strong progress guarantees; strong semantics. Thank you.

