Published by Kellie Newman. Modified over 9 years ago.
1
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics
Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond
2
Do We Need Efficient STM?
3
Problem Solved! Blue Gene/Q
4
Problem Solved? HTM is limited…
5
Problem Solved?
Best-effort HTM: no completion guarantee [1]
Performance penalty: short transactions [2]
Language-level support for atomic blocks: STM fallback
Example transaction: atomic { from.balance -= amount; to.balance += amount; }
[1] I. Calciu et al. Invyswell: A Hybrid Transactional Memory for Haswell’s Restricted Transactional Memory. In PACT, 2014.
[2] R. M. Yoo et al. Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing. In SC, 2013.
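To make the meaning of the atomic block on this slide concrete, here is a minimal Java sketch of the same transfer. It only illustrates the atomicity the block is supposed to provide; it uses one global lock, which is not how LarkTM or an HTM executes transactions. The names `Account`, `AtomicTransferSketch`, and `GLOBAL` are assumptions made for the example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type; only the balance field comes from the slide.
class Account {
    long balance;
    Account(long balance) { this.balance = balance; }
}

class AtomicTransferSketch {
    // One global lock stands in for "atomic { ... }". A real TM would let
    // non-conflicting transactions run concurrently instead.
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static void transfer(Account from, Account to, long amount) {
        GLOBAL.lock();
        try {
            from.balance -= amount; // both updates become visible together
            to.balance += amount;
        } finally {
            GLOBAL.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1000), b = new Account(0);
        Runnable work = () -> { for (int i = 0; i < 500; i++) transfer(a, b, 1); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(a.balance + " " + b.balance);
    }
}
```

The sum of the two balances is the same before and after every transfer, which is the invariant an atomic block is meant to protect.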
7
Software Transactional Memory Is Slow
Existing STMs add high overhead [1,2,3]
Related challenges: scalability, progress guarantees, strong semantics
[1] C. Cascaval et al. Software Transactional Memory: Why Is It Only a Research Toy? In CACM, 2008.
[2] A. Dragojević et al. Why STM Can Be More than a Research Toy. In CACM, 2011.
[3] R. M. Yoo et al. Kicking the Tires of Software Transactional Memory: Why the Going Gets Tough. In SPAA, 2008.
8
Challenge: Expensive to detect conflicts
T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … }
T2: o.f = …
9
Challenge: Expensive to detect conflicts
T1: same atomic block
T2: p.g = …
10
Challenge: Expensive to detect conflicts
T1: same atomic block
T2: t.k = …
11
Challenge: Expensive to detect conflicts
T1: same atomic block
T2: instrumentation?
13
LarkTM Contributions
Adds very low overhead
Achieves good scalability by using a hybrid approach
Provides strong progress guarantees
Provides strong atomicity
14
Key Insight
Avoid high overhead by minimizing instrumentation costs for non-conflicting accesses
16
LarkTM Design
Per-object biased reader-writer locks [1,2]
Eager concurrency control
Piggybacking conflict detection and conflict resolution on lock transfers
Minimal instrumentation and synchronization for both transactional and non-transactional non-conflicting accesses
Does not release locks even if transactions commit
[1] M. D. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA, 2013.
[2] B. Hindman and D. Grossman. Atomicity via Source-to-Source Translation. In MSPC, 2006.
18
Biased Locks
Object o: field f plus a lock state
Lock state ∈ {WrEx T, RdEx T, RdSh} (write-exclusive for thread T, read-exclusive for thread T, read-shared)
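The three lock states on this slide can be sketched as a small Java type with a fast-path check. This is a simplified illustration under stated assumptions, not LarkTM's or Octet's code: `LockState`, `Kind`, `ownerId`, and the two `...FastPath` methods are hypothetical names, and everything that is not a fast-path hit (upgrades, coordination with other threads) is left out.

```java
// Hypothetical sketch of a per-object biased lock state.
final class LockState {
    enum Kind { WR_EX, RD_EX, RD_SH }

    final Kind kind;
    final long ownerId; // id of thread T for WR_EX / RD_EX; -1 for RD_SH

    private LockState(Kind kind, long ownerId) {
        this.kind = kind;
        this.ownerId = ownerId;
    }

    static LockState wrEx(long t) { return new LockState(Kind.WR_EX, t); }
    static LockState rdEx(long t) { return new LockState(Kind.RD_EX, t); }
    static LockState rdSh()       { return new LockState(Kind.RD_SH, -1); }

    // A write by thread t needs no further work only if t already owns
    // the object in write-exclusive mode.
    boolean writeFastPath(long t) {
        return kind == Kind.WR_EX && ownerId == t;
    }

    // A read by thread t needs no further work if the object is read-shared
    // or t owns it exclusively (for reading or writing).
    boolean readFastPath(long t) {
        return kind == Kind.RD_SH || ownerId == t;
    }
}
```

The point of the bias is that the common case, a thread re-accessing an object it already owns, is a plain comparison with no atomic operation.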
19
Multi-thread Execution
(Timeline of T1 and T2; object o has field f, a lock state, and a last txn field.)
o's lock state: WrEx T1
T1: transaction start (txn id: 42), then o.f = 1
update o's last txn to 42
add o.f to the undo log
update f to 1
T2: o.f = 2
Problem! No synchronization on T1’s accesses to o
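The write steps shown in these slides (record the last txn, log the old value, then update the field in place) can be sketched as follows. This is a single-threaded illustration that assumes the writing thread already holds the object in WrEx state; `UndoLogSketch`, `Obj`, `Entry`, and the method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class UndoLogSketch {
    // Hypothetical object layout: one field f plus the "last txn" metadata.
    static final class Obj {
        int f;
        long lastTxn = -1;
    }

    static final class Entry {
        final Obj target;
        final int oldValue;
        Entry(Obj target, int oldValue) { this.target = target; this.oldValue = oldValue; }
    }

    private final Deque<Entry> undoLog = new ArrayDeque<>();
    private long txnId;

    void begin(long id) { txnId = id; undoLog.clear(); }

    // Eager, in-place write: remember who wrote last and what the old value was.
    void writeF(Obj o, int value) {
        o.lastTxn = txnId;                 // update last txn
        undoLog.push(new Entry(o, o.f));   // add o.f to the undo log
        o.f = value;                       // update f
    }

    // Abort restores old values in reverse order of the writes.
    void abort() {
        while (!undoLog.isEmpty()) {
            Entry e = undoLog.pop();
            e.target.f = e.oldValue;
        }
    }

    void commit() { undoLog.clear(); }
}
```

Because updates are made in place, commit is cheap (the log is simply dropped) and the cost of restoring state is paid only on abort.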
26
Coordination
T1: transaction start (txn id: 42), o.f = 1 …
T2 starts coordination for its write o.f = 2
update o's lock state to Int T2
T2 sends a request to T1
T1 reaches a safe point (before … = o.f)
Detecting Conflicts
State of o: f = 1, last txn = 42, lock state = Int T2
31
A Transactional Conflict
T1: transaction start (txn id: 42), o.f = 1 …, safe point before … = o.f
T2: o.f = 2, request
o: f = 1, last txn = 42, lock state = Int T2
Detecting Conflicts: detected conflicts
Resolving Conflicts: Contention Management
32
Not A Transactional Conflict
T1: transaction start (txn id: 43), safe point
T2: o.f = 2, request
o: f = 1, last txn = 42, lock state = Int T2
Detecting Conflicts: no conflict
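The difference between these two slides comes down to one comparison: in the conflicting case the object's last txn (42) equals the id of the transaction T1 is still running, and in the non-conflicting case T1 has moved on to txn 43. A minimal sketch of that check, assuming it is made by the responding thread at its safe point and covering only the write case drawn on the slides (`ConflictCheckSketch` and its parameter names are hypothetical):

```java
class ConflictCheckSketch {
    // Returns true when the responding thread is inside a transaction and that
    // same transaction was the last one to access the requested object.
    static boolean isTransactionalConflict(long objectLastTxn,
                                           boolean responderInTxn,
                                           long responderTxnId) {
        return responderInTxn && objectLastTxn == responderTxnId;
    }

    public static void main(String[] args) {
        // Slide "A Transactional Conflict": last txn 42, T1 in txn 42.
        System.out.println(isTransactionalConflict(42, true, 42));
        // Slide "Not A Transactional Conflict": last txn 42, T1 in txn 43.
        System.out.println(isTransactionalConflict(42, true, 43));
    }
}
```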
33
Coordination
T1: transaction start (txn id: 42), o.f = 1 …, safe point before … = o.f
T2: o.f = 2, request, then waiting for T1's response
T1: Detecting Conflicts at the safe point, then response
o: f = 1, last txn = 42, lock state = Int T2
35
Strong Progress Guarantees
T1 may abort; the waiting T2 may abort
Starvation and livelock freedom
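The slides say that either side may abort and that the result is starvation and livelock freedom, but they do not state which contention-management policy is used. One common way to obtain such guarantees is an age-based policy: the older transaction always wins, and an aborted transaction keeps its original start timestamp when it retries, so it eventually becomes the oldest. The sketch below shows only that generic idea and is an assumption, not LarkTM's documented policy; all names are hypothetical.

```java
class AgeBasedPolicySketch {
    enum Decision { ABORT_REQUESTER, ABORT_RESPONDER }

    // Smaller start timestamp means older transaction. The older one wins;
    // ties go to the responder so the decision is always well defined.
    static Decision resolve(long requesterStart, long responderStart) {
        return requesterStart < responderStart
                ? Decision.ABORT_RESPONDER
                : Decision.ABORT_REQUESTER;
    }

    public static void main(String[] args) {
        // A requester that started at time 5 beats a responder that started at 9.
        System.out.println(resolve(5, 9));
        // A retried transaction keeps its old timestamp (5), so it still wins.
        System.out.println(resolve(5, 12));
    }
}
```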
37
Strong Atomicity Semantics
Transactional vs. Transactional Conflict
T1: transaction start (txn id: 42), o.f = 1 …, safe point before … = o.f, Detecting Conflicts, response
T2: transaction start, transactional access o.f = 2, request, waiting, abort
38
T2: retry, transaction start
39
Strong Atomicity Semantics
Transactional vs. Non-transactional Conflict
T1: transaction start (txn id: 42), o.f = 1 …, safe point before … = o.f, Detecting Conflicts, response
T2: safe point, non-transactional access o.f = 2, request, waiting, abort
40
T2: retry of the non-transactional access
41
Strong Atomicity Semantics
T1: … = o.f, o.f = …, T1 transaction end, safe point, response
T2: non-transactional access o.f = 2, request
Non-transactional accesses: short transactions, no setting up/tearing down cost
42
No Transactional Conflict
T1: transaction end, safe point, Detecting Conflicts, response
T2: transaction start (txn id: 51), o.f = 2, request, waiting
o: f = 1, last txn = 42, lock state = Int T2
43
T2: acquire lock; o's lock state becomes WrEx T2
44
T2: update last txn to 51, add o.f to the undo log, update f to 2
o: f = 2, last txn = 51, lock state = WrEx T2
45
Two versions of coordination protocol
46
LarkTM-O
Adds very low overhead and scales well for low-contention cases
47
High-Contention Applications
T1 (txn: 42, txn: 43) and T2 (txn: 51, txn: 52) each repeat: … = o.f … o.f = …
48
High-Contention Applications
Same accesses, now with the coordination they trigger: request, safe point, response, repeated between T1 and T2
49
LarkTM-S
Handling High Contention
50
LarkTM-S: Hybrid with Traditional Locking
T1 (txn: 42, txn: 43) and T2 (txn: 51, txn: 52) each repeat: … = o.f … o.f = …
o causes high contention
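The slides state that LarkTM-S is a hybrid with traditional locking and that object o causes high contention, but not how an object is judged to be contended. Purely as an illustration, one hypothetical policy is a per-object counter of ownership transfers that switches the object from biased locking to traditional locking once a threshold is crossed. The threshold value, the class `HybridModeSketch`, and its methods are assumptions made for this sketch.

```java
class HybridModeSketch {
    enum Mode { BIASED, TRADITIONAL }

    // Arbitrary illustrative threshold; not taken from the presentation.
    static final int THRESHOLD = 4;

    private int transfers = 0;
    private Mode mode = Mode.BIASED;

    // Called each time ownership of the object moves between threads
    // (each such move costs a request/response round trip in biased mode).
    void recordOwnershipTransfer() {
        if (mode == Mode.BIASED && ++transfers >= THRESHOLD) {
            mode = Mode.TRADITIONAL; // stop paying for coordination on this object
        }
    }

    Mode mode() { return mode; }
}
```

Objects that are rarely transferred stay in biased mode and keep the cheap fast path; only objects like o in the slide would change mode.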
52
Comparison Of Concurrency Control
STM | Write concurrency control | Read concurrency control
LarkTM-O | Eager per-object biased reader–writer lock (reads and writes)
LarkTM-S | IntelSTM–LarkTM-O hybrid (reads and writes)
IntelSTM [1,2] | Eager per-object lock | Lazy version validation
NOrec [3] | Lazy global seqlock | Lazy value validation
[1] B. Saha et al. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime. In PPoPP, 2006.
[2] T. Shpeisman et al. Enforcing Isolation and Ordering in STM. In PLDI, 2007.
[3] L. Dalessandro et al. NOrec: Streamlining STM by Abolishing Ownership Records. In PPoPP, 2010.
53
Comparison Of Instrumentation
STM | Instrumented accesses
LarkTM-O | All accesses
LarkTM-S | All accesses
IntelSTM | All accesses
NOrec | All transactional accesses
Note on slide: except redundant accesses
54
Comparison Of Progress Guarantees
STM | Progress guarantee
LarkTM-O | Livelock and starvation free
LarkTM-S | Livelock and starvation free
IntelSTM | None
NOrec | Livelock free
55
Comparison Of Semantics
STM | Semantics
LarkTM-O | Strong atomicity
LarkTM-S | Strong atomicity
IntelSTM | Strong atomicity
NOrec | Single global lock atomicity (SLA)
56
Implementation
LarkTM-O, LarkTM-S, IntelSTM (McRT), and NOrec
Developed in Jikes RVM 3.1.3
All STMs share features as much as possible (e.g., inlining decisions, redundant barrier analysis, name-mangling)
Source code publicly available on the Jikes RVM Research Archive
57
Evaluation Methodology
TM programs: STAMP benchmarks
STM comparison: NOrec, IntelSTM, LarkTM-O, LarkTM-S
Platform: Eight 8-core processors (AMD Opteron 6272); Four 8-core processors (Intel Xeon E5-4620)
58
Single-Thread Performance
[Figure: single-thread performance chart, built up over slides 58 to 63; value labels shown: 610, 2870, 40%, 73%]
64
Speedup (Geomean)
[Figure: speedup chart, built up over slides 64 to 67]
68
Toward Practical STM
Low instrumentation overhead
Scales well
Strong progress guarantees
Strong semantics
Thank you