Published by Russell Allison. Modified over 9 years ago.
1
Consistency Oblivious Programming
Hillel Avni, Tel Aviv University
2
Agenda
- Transactional Memory and Locking
- Consistency Oblivious Programming (COP)
- COP with STM
- COP with HTM
- Future Work
3
Global Lock
- Easy to use
- Composable: concatenate critical sections
- Not scalable
4
Fine Grain Locking
- Hard to use
- Not composable
- Scalable
Lazy linked list is a good example…
5
Lazy Traversal
[Figure: sorted list a, b, d, e. add(c) traverses the list without locks until it finds its position: "Aha!"]
6
Lock and Validate
[Figure: sorted list a, b, d, e. add(c) locks and validates: "Yes, b still points to d."]
7
Perform Updates and Release Locks
[Figure: add(c) links c into the list a, b, d, e and releases the locks.]
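The three steps above (lazy traversal, lock and validate, update and release) can be sketched as follows. This is a minimal illustration, not the code behind the slides: the `Node` layout, the `marked` flag, the sentinel keys and the class name `LazyList` are all assumptions, and removed nodes are never freed.

```cpp
#include <atomic>
#include <climits>
#include <mutex>

// Hypothetical node layout for a sorted lazy list (names are illustrative).
struct Node {
    int key;
    std::atomic<Node*> next;
    std::atomic<bool> marked{false};  // would be set by a remove() on logical deletion
    std::mutex lock;
    Node(int k, Node* n) : key(k), next(n) {}
};

class LazyList {
public:
    // Two sentinels; user keys are assumed to lie strictly between them.
    LazyList() : head_(new Node(INT_MIN, new Node(INT_MAX, nullptr))) {}

    bool add(int key) {
        while (true) {
            // 1. Lazy traversal: no locks are taken.
            Node* pred = head_;
            Node* curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }

            // 2. Lock in list order, then validate what the traversal saw.
            std::lock_guard<std::mutex> lock_pred(pred->lock);
            std::lock_guard<std::mutex> lock_curr(curr->lock);
            if (pred->marked.load() || curr->marked.load() ||
                pred->next.load() != curr)
                continue;  // validation failed: locks are released, retry

            // 3. Perform the update; locks are released at scope exit.
            if (curr->key == key) return false;  // already present
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool contains(int key) const {
        Node* curr = head_->next.load();
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }

private:
    Node* head_;
};
```

Calling `add(3)` twice on an empty list returns true and then false; the second call finds the key during the locked check and leaves the list unchanged.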
8
Transactional Memory
- Easy to use
- Composable
- Scalable
How is it done?
9
Java (Deuce)
boolean CAS(int location, int expected, int newVal) {
    atomic {
        if (location != expected) return false;
        location = newVal;
    }
    return true;
}
10
C/C++ (GCC-4.7)
bool CAS(int *location, int expected, int new_val) {
    /* The whole compare-and-swap runs as one atomic transaction. */
    __transaction_atomic {
        if (*location != expected) return false;
        *location = new_val;
    }
    return true;
}
11
Software Transactional Memory
Different algorithms are used for:
- consistency checking
- rollback
Compiler recognizes shared accesses.
12
STM Problem - Overhead
template <typename V> static V load(const V* addr, ls_modifier mod)
{
  if (unlikely(mod == RfW))
    {
      pre_write(addr, sizeof(V));
      return *addr;
    }
  if (unlikely(mod == RaW))
    return *addr;
  gtm_thread *tx = gtm_thr();
  gtm_rwlog_entry* log = pre_load(tx, addr, sizeof(V));
  V v = *addr;
  atomic_thread_fence(memory_order_acquire);
  post_load(tx, log);
  return v;
}
load function from GCC 4.8.1
13
STM Problem - Overhead
static gtm_rwlog_entry* pre_load(gtm_thread *tx, const void* addr, size_t len)
{
  size_t log_start = tx->readlog.size();
  gtm_word snapshot = tx->shared_state.load(memory_order_relaxed);
  gtm_word locked_by_tx = ml_mg::set_locked(tx);
  size_t orec = ml_mg::get_orec(addr);
  size_t orec_end = ml_mg::get_orec_end(addr, len);
  do
    {
      gtm_word o = o_ml_mg.orecs[orec].load(memory_order_acquire);
      if (likely (!ml_mg::is_more_recent_or_locked(o, snapshot)))
        {
        success:
          gtm_rwlog_entry *e = tx->readlog.push();
          e->orec = o_ml_mg.orecs + orec;
          e->value = o;
        }
      else if (!ml_mg::is_locked(o))
        {
          snapshot = extend(tx);
          goto success;
        }
      else
        {
          if (o != locked_by_tx)
            tx->restart(RESTART_LOCKED_READ);
        }
      orec = o_ml_mg.get_next_orec(orec);
    }
  while (orec != orec_end);
  return &tx->readlog[log_start];
}
load always calls pre_load
14
STM Problem - Overhead
static void post_load(gtm_thread *tx, gtm_rwlog_entry* log)
{
  for (gtm_rwlog_entry *end = tx->readlog.end(); log != end; log++)
    {
      gtm_word o = log->orec->load(memory_order_relaxed);
      if (log->value != o)
        tx->restart(RESTART_VALIDATE_READ);
    }
}
and post_load
Compare to mov eax, [ebx] on x86
15
Hardware Transactional Memory
Exploit native cache coherence for:
- consistency checking
- rollback
16
HTM Problem – Resource Limits
A transaction cannot commit if it is:
- too big: cache size limits the data footprint
- too slow: quantum size limits the duration
17
All TM Problem – False Conflicts
Any address that was encountered during the transaction is monitored until the end of that transaction. An address may abort a transaction long after it is no longer relevant…
18
Agenda
- Transactional Memory and Locking
- Consistency Oblivious Programming (COP)
- COP with STM
- COP with HTM
- Future Work
19
COP Operation
In non-transactional mode:
– Execute the read-only prefix (ROP) of the operation and record its output.
In transactional mode:
– Verify that the output is correct.
– Perform updates.
20
COP Example – RB Tree
[Figure: red-black tree rooted at 20, with keys 10, 20, 25, 27, 28, 30, 40.]
21
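Add 26 – Tree Unbalanced
[Figure: key 26 is added to the tree (keys 10, 20, 25, 26, 27, 28, 30, 40), leaving it unbalanced, while a TM search for 26 is in progress.]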
22
Tree Balanced
[Figure: the rebalanced tree, now rooted at 27. The TM search continues from 27: conflict and abort.]
23
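Add 26 – Tree Unbalanced
[Figure: key 26 is added to the tree (keys 10, 20, 25, 26, 27, 28, 30, 40), leaving it unbalanced, while a COP search for 26 is in progress.]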
24
Tree Balanced
[Figure: the rebalanced tree, now rooted at 27. The COP search continues from 27: found.]
25
COP RB-Tree Verify
To facilitate verification, all nodes in the RB-Tree are connected in a successor-predecessor doubly linked list, and each node has a live mark. Search returns a node n that holds k, or a leaf that holds k's successor or predecessor.
26
COP RB-Tree Suffix
Resume a transaction.
Verify:
– k found and n is live: done.
– k not found, check: (n.k > k > n.pred.k && !n.left) or (n.k < k < n.succ.k && !n.right)
If verification failed, abort the transaction.
Complete updates (add / remove / rebalance) using n.
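The verification step can be sketched as a predicate over the node returned by the search. This is a hedged illustration only: the `Node` fields and the name `cop_verify` are hypothetical. It also makes three assumptions that the slide does not state: the live mark is checked in the not-found case as well, a missing predecessor or successor counts as an open bound, and the child checks follow the usual BST rule that a key below n.k would become n's left child (so n.left must be empty) and a key above n.k its right child.

```cpp
// Hypothetical RB-tree node with the extra COP fields (illustrative names).
struct Node {
    int key;
    bool live = true;       // cleared when the node is removed from the tree
    Node* left = nullptr;
    Node* right = nullptr;
    Node* pred = nullptr;   // predecessor in key order, nullptr if none
    Node* succ = nullptr;   // successor in key order, nullptr if none
    explicit Node(int k) : key(k) {}
};

// Returns true if n is still a valid answer for a search of k.
// Meant to run inside the transaction, after the read-only search.
bool cop_verify(const Node* n, int k) {
    if (!n->live) return false;   // ASSUMPTION: checked in both cases
    if (n->key == k) return true; // k found and n is live
    if (k < n->key)               // n holds k's successor
        return n->left == nullptr &&
               (n->pred == nullptr || n->pred->key < k);
    // n holds k's predecessor
    return n->right == nullptr &&
           (n->succ == nullptr || k < n->succ->key);
}
```

If the predicate returns false the transaction aborts; otherwise n is the correct attachment point (or the node to remove) and the update can proceed from it.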
27
COP Template for op
start-transaction
    any-code
    suspend-transaction
    output = op-rop();
    resume-transaction
    if (not(op-verify(output)))
        abort-transaction
    op-complete(output)
    any-code
end-transaction
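A minimal sketch of the template on a sorted linked list follows. It is an illustration under stated assumptions, not the talk's implementation: a single global mutex (`tx_lock`) stands in for the transaction, the read-only prefix runs before the "transaction" starts rather than in a suspended section, a failed verification is retried instead of surfacing an abort, and all names (`CopList`, `rop`, `verify`, `complete`) are hypothetical. Removed nodes are not freed, so a traversal that wanders through them stays safe.

```cpp
#include <atomic>
#include <climits>
#include <mutex>

std::mutex tx_lock;  // ASSUMPTION: a global lock plays the role of the TM

struct Node {
    int key;
    std::atomic<Node*> next;
    std::atomic<bool> live{true};  // cleared when the node is unlinked
    Node(int k, Node* n) : key(k), next(n) {}
};

struct CopList {
    Node* head;
    // Two sentinels; user keys are assumed to lie strictly between them.
    CopList() : head(new Node(INT_MIN, new Node(INT_MAX, nullptr))) {}

    // op-rop: unsynchronized search, returns the candidate predecessor.
    Node* rop(int key) const {
        Node* pred = head;
        for (Node* curr = pred->next.load(); curr->key < key;
             curr = curr->next.load())
            pred = curr;
        return pred;
    }

    // op-verify: pred is still in the list and key belongs right after it.
    static bool verify(Node* pred, int key) {
        return pred->live.load() && pred->key < key &&
               key <= pred->next.load()->key;
    }

    bool insert(int key) {
        while (true) {
            Node* pred = rop(key);                    // non-transactional
            std::lock_guard<std::mutex> tx(tx_lock);  // "transaction"
            if (!verify(pred, key)) continue;         // "abort", then retry
            Node* succ = pred->next.load();           // op-complete
            if (succ->key == key) return false;
            pred->next.store(new Node(key, succ));
            return true;
        }
    }

    bool remove(int key) {
        while (true) {
            Node* pred = rop(key);
            std::lock_guard<std::mutex> tx(tx_lock);
            if (!verify(pred, key)) continue;
            Node* victim = pred->next.load();
            if (victim->key != key) return false;
            victim->live.store(false);                // mark, then unlink
            pred->next.store(victim->next.load());
            return true;
        }
    }
};
```

The point of the split is that only `verify` and the update run inside the "transaction", so the nodes visited by `rop` are never added to its footprint.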
28
COP Correctness
The underlying TM: Transactional Regular Registers
The COP algorithm: Obliviousness, Verifiability, Separation
We prove that if the TM yields transactional regular registers, and the COP algorithm demonstrates obliviousness, verifiability, and separation, then the COP operation is linearizable.
29
Agenda
- Transactional Memory and Locking
- Consistency Oblivious Programming (COP)
- COP with STM
- COP with HTM
- Future Work
30
STM Algorithm
GCC's default STM algorithm is the one that proved to be the most efficient and scalable in most scenarios:
– Write Through (WT)
– Encounter Time Locking (ETL)
– Multi Lock (ML)
31
STM: WT – ETL – ML
1. RV = Shared Version Clock
2. On read: check unlocked and v# <= RV, then add to read-set
3. On write: check v# <= RV, lock, and add to undo-set
4. WV = F&I(VClock)
5. Validate that in the read-set each v# <= RV
6. Release locks with v# = WV
[Figure: the shared version clock, and the version number (V#) and lock bit of memory locations X and Y, before and after commit.]
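The six steps can be sketched in a few dozen lines. This is a simplified illustration of the bookkeeping only, not libitm's implementation: one ownership record per memory cell, the word layout `(version << 1) | lock bit`, and the names `Cell`, `Tx` and `vclock` are all assumptions. A failed read, write or commit rolls the transaction back and returns false.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// ASSUMPTION: one orec per cell, laid out as (version << 1) | lock bit.
struct Cell {
    std::atomic<std::uint64_t> orec{0};
    std::atomic<int> value{0};
};

std::atomic<std::uint64_t> vclock{0};  // shared version clock

class Tx {
public:
    Tx() : rv_(vclock.load()) {}                        // step 1: RV

    bool read(Cell& c, int& out) {                      // step 2
        if (owns(c)) { out = c.value.load(); return true; }
        std::uint64_t before = c.orec.load();
        out = c.value.load();
        if (!ok(before) || c.orec.load() != before) { rollback(); return false; }
        reads_.push_back(&c);
        return true;
    }

    bool write(Cell& c, int v) {                        // step 3
        if (!owns(c)) {
            std::uint64_t o = c.orec.load();
            if (!ok(o) || !c.orec.compare_exchange_strong(o, o | 1)) {
                rollback();
                return false;
            }
            undo_.push_back({&c, c.value.load(), o});   // remember old state
        }
        c.value.store(v);                               // write through
        return true;
    }

    bool commit() {
        std::uint64_t wv = vclock.fetch_add(1) + 1;     // step 4: WV
        for (const Cell* c : reads_)                    // step 5: validate
            if (!owns(*c) && !ok(c->orec.load())) { rollback(); return false; }
        for (const Undo& u : undo_)                     // step 6: release
            u.cell->orec.store(wv << 1);
        reads_.clear();
        undo_.clear();
        return true;
    }

    void rollback() {  // restore old values and orecs, newest first
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) {
            it->cell->value.store(it->old_value);
            it->cell->orec.store(it->old_orec);
        }
        reads_.clear();
        undo_.clear();
    }

private:
    struct Undo { Cell* cell; int old_value; std::uint64_t old_orec; };
    bool ok(std::uint64_t o) const { return !(o & 1) && (o >> 1) <= rv_; }
    bool owns(const Cell& c) const {
        for (const Undo& u : undo_) if (u.cell == &c) return true;
        return false;
    }
    std::uint64_t rv_;
    std::vector<const Cell*> reads_;
    std::vector<Undo> undo_;
};
```

Every transactional read pays for two orec loads, a version comparison and a read-set append, which is the overhead the earlier slides contrast with a single mov.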
32
GCC Constructs
- __transaction_atomic {}: mark the transaction.
- __transaction_cancel: explicit abort.
- __attribute__((transaction_safe)): instrument the code.
- __attribute__((transaction_pure)): do not instrument the code.
We will show that transaction_pure can be used efficiently as __transaction_suspend with the default WT-ETL-ML STM algorithm in GCC.
33
pure = suspend
Transactional Regular Registers: all values up to one architecture word in size are written and read atomically. The rollback may use memcpy, but the memcpy is optimized to write at maximal alignment.
Now we will compare the suspended mode of the future Power architecture HTM to transaction_pure with the WT-ETL-ML STM algorithm.
34
Power tsuspend - tresume
1. Until failure occurs, load instructions that access memory locations that were transactionally written by the same thread will return the transactionally written data.
2. In the event of transaction failure, failure recording is performed, but failure handling is deferred until transactional execution is resumed.
3. The initiation of a new transaction is prevented.
4. Store instructions that access memory locations that have been accessed transactionally (due to load or store) by the same thread will cause the transaction to fail.
35
[Figure: RB-Tree, size 1M, 20% updates, 10 operations per transaction.]
36
[Figure: RB-Tree, size 1K, 8 threads, 20% updates.]
37
Agenda
- Transactional Memory and Locking
- Consistency Oblivious Programming (COP)
- COP with STM
- COP with HTM
- Future Work
38
Haswell HTM with COP
There is no suspend mode, so to compose COP operations we execute all read-only prefixes (ROPs) before the transaction. This limits the composition to at most one writing COP operation per transaction.
39
Capacity and Cache Associativity
Packed Memory Array (PMA) search is done by divide and conquer. Assume a PMA of size 0x800000 that starts at address 0. A search for an item located at addresses 0x0…0x7FFF must go through the addresses:
0x400000 0x200000 0x100000 0x80000 0x40000 0x20000 0x10000 0x8000
As the cache size in Haswell is 0x8000, all these addresses have the same cache index (0), and the transaction will always abort.
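The address arithmetic can be checked with a short sketch. The cache geometry below is an assumption made for the illustration (a 0x8000-byte, 8-way L1 data cache with 64-byte lines, hence 64 sets), and the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// ASSUMED geometry: 0x8000 bytes / (8 ways * 64-byte lines) = 64 sets.
constexpr std::uint64_t kLineBytes = 64;
constexpr std::uint64_t kSets = 64;

// Cache set that a byte address maps to.
constexpr std::uint64_t set_index(std::uint64_t addr) {
    return (addr / kLineBytes) % kSets;
}

// Midpoints probed by a divide-and-conquer search over [0, size) when the
// target lies below `stop`: size/2, size/4, ... down to `stop` inclusive.
std::vector<std::uint64_t> probe_addresses(std::uint64_t size,
                                           std::uint64_t stop) {
    std::vector<std::uint64_t> out;
    for (std::uint64_t mid = size / 2; mid > 0 && mid >= stop; mid /= 2)
        out.push_back(mid);
    return out;
}
```

With `probe_addresses(0x800000, 0x8000)` the result is the eight addresses listed above, and `set_index` is 0 for each of them. Under the assumed 8-way geometry those eight lines already occupy every way of set 0 before the search reaches the item itself.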
40
PMA
[Figure]
41
RB-Tree Capacity Aborts
[Figure]
42
RB-Tree Conflict Aborts
[Figure]
43
Agenda
- Transactional Memory and Locking
- Consistency Oblivious Programming (COP)
- COP with STM
- COP with HTM
- Future Work
44
Data Structures
We already have COP versions of:
- RB-Tree
- Linked list
- PMA
- Cache Oblivious B-Tree
- Leaplist (k-ary skip list, tailored for range queries)
Can we design more COP data structures?
45
Applications
Use COP in applications. Many applications use shared data structures, so it is interesting to see the impact of COP on their performance.
46
Infrastructure
- Add statistics (transactional accesses, conflicts) to GCC.
- Add a real suspend mode to GCC and to hardware.
47
Theory
- How can the transformation to COP be made automatic?
- Is COP applicable outside the data-structures area?
- Bounds on the number of transactional accesses?
- Bounds on the number of false conflicts?
48
Thank You