Published by Lorraine Harvey. Modified over 9 years ago.
1
A Dynamic Binary-Rewriting Approach to Software Transactional Memory
Appeared in PACT 2007, Brasov, Romania
Marek Olszewski, Jeremy Cutler, Greg Steffan
University of Toronto
2
The Parallel Programming Challenge
- Coarse-grained locking: easy to program, but scales poorly
- Fine-grained locking: scales well, but hard to get right (e.g., deadlock, priority inversion)
- The promise of Transactional Memory:
  - As easy to program as coarse-grained locking
  - The performance/scalability of fine-grained locking
3
Transactional Memory (TM)
Programmer: specifies threads/transactions in source code, e.g.
    ... atomic { ... access_shared_data(); ... } ...
TM system: executes transactions optimistically in parallel:
1) Checkpoints execution
2) Detects conflicts
3) Commits, or aborts and re-executes
4
TM Implementations
- Flavors of TM: hardware (HTM), software (STM), hybrid (HyTM)
- STM is especially compelling:
  - Exploits current commodity hardware (multicores)
  - Lets us learn about real TM systems and applications
- Current STM systems:
  - Java: DSTM, ASTM
  - C or C++: McRT icc, TL2, RSTM, OSTM
  - All are object-based or programmer-intensive (or both)
- Our focus: arbitrary C/C++ in a realistic environment
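5
Programming with STM
[Figure: the source code of my_app,
    #include
    GTree *tree;
    ...
    atomic { g_tree_insert(tree, &key, &val); }
    ...
is built by an STM compiler into an executable. At run time the executable uses a pre-compiled shared library (glib), is started by the loader, and makes system calls into the kernel.]
Not handled by current compiler/library-based STMs: pre-compiled binaries (such as the glib shared library), “legacy locks”, and system calls.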
6
JudoSTM: An Overview
Key design choices:
1) Dynamic binary rewriting (DBR): insert instrumentation to implement STM
2) Value-based conflict detection
Resulting key features:
1) Privileged transactions (support system calls)
2) Legacy lock elision
3) Efficient invisible readers
7
JudoSTM Design Choice 1: Dynamic Binary Rewriting (DBR)
Judo DBR framework (user-space version of JIFL†)
† JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007
8-10
Dynamic Binary Rewriting
[Animated figure: the original code contains basic blocks bb1, bb2, bb3 and bb4. As the program executes, Judo copies each block into a code cache when the block is first reached: first bb1, then bb2, then bb4. bb3 is not copied in the animation.]
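The code-cache behavior in the animation can be modeled in a few lines of C. This is a minimal sketch and not Judo's implementation. `code_cache_lookup` and `translate_block` are hypothetical names, a "translation" is only a counter that stands in for an instrumented copy of the block, and blocks are identified by made-up addresses. Each block is translated on its first execution and reused after that, which is how the cost of rewriting gets amortized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy DBR code cache (hypothetical): maps an original basic-block address
 * to a handle for its instrumented copy. A real system emits machine code;
 * here a "translation" is only a counter value. */
#define CACHE_SLOTS 64

typedef struct {
    uintptr_t orig_addr;  /* original block address; 0 marks an empty slot */
    int       copy_id;    /* stands in for the copy in the code cache */
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static int translations = 0;  /* number of blocks rewritten so far */

/* Pretend to copy and instrument one basic block. */
static int translate_block(uintptr_t orig_addr) {
    (void)orig_addr;
    return ++translations;
}

/* Return the cached copy of a block, translating it on first execution.
 * Collisions are resolved by linear probing. */
int code_cache_lookup(uintptr_t orig_addr) {
    assert(orig_addr != 0);
    size_t i = (size_t)(orig_addr % CACHE_SLOTS);
    for (size_t n = 0; n < CACHE_SLOTS; n++, i = (i + 1) % CACHE_SLOTS) {
        if (cache[i].orig_addr == orig_addr)
            return cache[i].copy_id;              /* hit: reuse the copy */
        if (cache[i].orig_addr == 0) {            /* miss: translate once */
            cache[i].orig_addr = orig_addr;
            cache[i].copy_id = translate_block(orig_addr);
            return cache[i].copy_id;
        }
    }
    assert(!"code cache full");
    return -1;
}

int translation_count(void) { return translations; }
```

If the path bb1, bb2, bb4 executes twice, `translate_block` is called only three times. The second pass runs entirely out of the cache.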
11
Judo - Performance
[Chart: normalized runtime overhead]
Overhead low enough to implement STM?
12
DBR-Based STM
Goal: perform these efficiently:
- For all non-stack write instructions:
  - Track write addresses and values (write-set)
  - Buffer the values instead of writing them to regular memory
- For all non-stack read instructions:
  - Redirect to the write-buffer
  - On a miss: track read addresses and values (read-set)
- When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)
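The read/write redirection and the four commit steps above can be sketched in C as a word-granular STM. This is an illustrative sketch under stated assumptions, not JudoSTM's code:

- The names (`tx_begin`, `tx_read`, `tx_write`, `tx_commit`) are hypothetical.
- The sets are small linear arrays rather than hashtables.
- A single C11 `atomic_flag` spin lock plays the role of the commit lock.
- Loads and stores are plain (non-atomic), so the example is only exercised single-threaded here.

A DBR-based STM would insert equivalent logic around each non-stack load and store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-granular STM following the slide's steps. */
#define MAX_ENTRIES 64

typedef struct { intptr_t *addr; intptr_t val; } entry_t;

typedef struct {
    entry_t writes[MAX_ENTRIES]; size_t nwrites;  /* write-set = write buffer */
    entry_t reads[MAX_ENTRIES];  size_t nreads;   /* read-set: addr + value seen */
} tx_t;

static atomic_flag commit_lock = ATOMIC_FLAG_INIT;  /* one global commit lock */

static entry_t *find(entry_t *set, size_t n, intptr_t *addr) {
    for (size_t i = 0; i < n; i++)
        if (set[i].addr == addr) return &set[i];
    return NULL;
}

void tx_begin(tx_t *tx) { tx->nwrites = tx->nreads = 0; }

/* Writes go to the buffer (one entry per address), never to memory. */
void tx_write(tx_t *tx, intptr_t *addr, intptr_t val) {
    entry_t *e = find(tx->writes, tx->nwrites, addr);
    if (!e) {
        assert(tx->nwrites < MAX_ENTRIES);
        e = &tx->writes[tx->nwrites++];
        e->addr = addr;
    }
    e->val = val;
}

/* Reads are redirected to the write buffer; on a miss, (addr, value) is
 * logged. A repeated read returns the logged value (a design choice that
 * keeps the transaction's view consistent). */
intptr_t tx_read(tx_t *tx, intptr_t *addr) {
    entry_t *e = find(tx->writes, tx->nwrites, addr);
    if (e) return e->val;
    e = find(tx->reads, tx->nreads, addr);
    if (e) return e->val;
    assert(tx->nreads < MAX_ENTRIES);
    intptr_t v = *addr;
    tx->reads[tx->nreads++] = (entry_t){ addr, v };
    return v;
}

/* 1) acquire lock, 2) validate by value, 3) publish, 4) release. */
bool tx_commit(tx_t *tx) {
    while (atomic_flag_test_and_set_explicit(&commit_lock, memory_order_acquire)) {
        /* spin */
    }
    bool ok = true;
    for (size_t i = 0; i < tx->nreads && ok; i++)
        ok = (*tx->reads[i].addr == tx->reads[i].val);
    if (ok)
        for (size_t i = 0; i < tx->nwrites; i++)
            *tx->writes[i].addr = tx->writes[i].val;
    atomic_flag_clear_explicit(&commit_lock, memory_order_release);
    return ok;  /* false = abort: discard buffers and re-execute */
}
```

A failed `tx_commit` leaves memory untouched. The caller would then discard the transaction's buffers and run it again.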
13
DBR: Attractive Properties for STM
- Performance: overheads are amortized by the code cache
- Can handle arbitrary code and shared libraries: any/all code is transactionalized as it executes
- Sandboxed transactions:
  - Typical STM: inconsistent values could lead execution astray, i.e., into non-transactionalized code (very bad!). Solution: frequent and costly read-set validation
  - DBR-based STM: any/all code is transactionalized as it executes
- Tough problems for conventional STMs are addressed by DBR
14
JudoSTM Design Choice 2: Value-Based Conflict Detection (as opposed to location-based)
15-19
Location-Based Conflict Detection
[Animated figure: main memory holds the values 2, 3, 5 and 6, grouped into strips. Each strip carries a version number, initially 0. Transaction 1 and Transaction 2 each read from memory and record the strip versions they saw. Transaction 2 also buffers a write of 9. Transaction 2 commits first: 1) validate read set, 2) publish writes and increment version numbers, so the value 9 reaches memory and its strip's version becomes 1. Transaction 1 then validates its read set, finds a changed version, and aborts.]
Note: all transactions must maintain strip version numbers.
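For contrast with the value-based scheme, here is a sketch of location-based validation with per-strip version numbers. The strip size (16 bytes), the table of 1024 versions, and all function names are assumptions chosen for illustration. Publishing a value identical to the old one still bumps the version, so a reader aborts even though nothing it read has actually changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Location-based validation: every strip of memory has a version number.
 * Strip size and table size are illustrative assumptions. */
#define NSTRIPS     1024
#define STRIP_SHIFT 4          /* 16-byte strips (hypothetical) */
#define MAX_READS   64

static unsigned strip_version[NSTRIPS];

static size_t strip_of(const void *addr) {
    return ((uintptr_t)addr >> STRIP_SHIFT) % NSTRIPS;
}

typedef struct { size_t strip; unsigned version; } read_entry;
typedef struct { read_entry reads[MAX_READS]; size_t nreads; } loc_tx;

/* Each transactional read records the version of the strip it touched. */
intptr_t loc_read(loc_tx *tx, const intptr_t *addr) {
    assert(tx->nreads < MAX_READS);
    size_t s = strip_of(addr);
    tx->reads[tx->nreads++] = (read_entry){ s, strip_version[s] };
    return *addr;
}

/* Validation compares versions, not values. */
bool loc_validate(const loc_tx *tx) {
    for (size_t i = 0; i < tx->nreads; i++)
        if (strip_version[tx->reads[i].strip] != tx->reads[i].version)
            return false;
    return true;
}

/* Every committed write must also bump its strip's version. */
void loc_publish(intptr_t *addr, intptr_t val) {
    *addr = val;
    strip_version[strip_of(addr)]++;
}
```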
20-23
Value-Based Conflict Detection
[Animated figure: the same two transactions, this time without version numbers. Transaction 2 commits first: 1) validate read set, 2) publish writes, so the value 9 reaches memory. Transaction 1 then validates its read set by comparing the values it read against memory, finds a mismatch, and aborts.]
Note: no version information to maintain.
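The value-based counterpart needs no per-location metadata. The read-set stores each value seen, and validation re-reads memory and compares. This is a minimal sketch with hypothetical names (`val_read`, `val_validate`). Writers, including code that knows nothing about the STM, publish with plain stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value-based validation: the read-set holds (address, value seen) pairs;
 * no versions or other metadata are kept per memory location. */
#define MAX_READS 64

typedef struct { const intptr_t *addr; intptr_t val; } vread;
typedef struct { vread reads[MAX_READS]; size_t nreads; } val_tx;

/* Log the value observed at addr and return it. */
intptr_t val_read(val_tx *tx, const intptr_t *addr) {
    assert(tx->nreads < MAX_READS);
    intptr_t v = *addr;
    tx->reads[tx->nreads++] = (vread){ addr, v };
    return v;
}

/* Passes iff every location still holds the value the transaction saw. */
bool val_validate(const val_tx *tx) {
    for (size_t i = 0; i < tx->nreads; i++)
        if (*tx->reads[i].addr != tx->reads[i].val)
            return false;
    return true;
}
```

Unlike the version-based scheme, a silent store (writing back the value already present) does not cause an abort here.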
24
JudoSTM Feature 1: Privileged Transactions
- Can execute (but not roll back) system calls
- Grab the commit lock(s) when about to make a system call; release them when the transaction completes
- Only one privileged transaction exists at a time
25-27
Privileged Transactions
[Animated figure: Transaction 2 becomes privileged. It can write directly to memory (here, the value 9) and may run uninstrumented code, such as system calls. Transaction 1 then validates its read set at commit, finds a changed value, and aborts.]
Value-based conflict detection facilitates system calls within transactions! Uninstrumented code cannot update version numbers, but its writes still show up as changed values.
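The privileged-transaction protocol can be sketched as follows. The slides state only that a transaction grabs the commit lock before a system call and releases it when the transaction completes. This sketch additionally assumes that, at that point, the transaction validates its read-set and publishes its buffered writes, since afterwards it can no longer roll back. The names (`ptx_t`, `ptx_go_privileged`, `ptx_end`) are hypothetical. Memory accesses are plain loads and stores, so the example is exercised single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical privileged-transaction sketch (see assumptions above). */
#define MAX_ENTRIES 32

typedef struct { intptr_t *addr; intptr_t val; } entry_t;
typedef struct {
    entry_t reads[MAX_ENTRIES], writes[MAX_ENTRIES];
    size_t nreads, nwrites;
    bool privileged;
} ptx_t;

static atomic_flag commit_lock = ATOMIC_FLAG_INIT;

static void lock_commit(void) {
    while (atomic_flag_test_and_set_explicit(&commit_lock, memory_order_acquire)) { }
}
static void unlock_commit(void) {
    atomic_flag_clear_explicit(&commit_lock, memory_order_release);
}

static bool validate(const ptx_t *tx) {
    for (size_t i = 0; i < tx->nreads; i++)
        if (*tx->reads[i].addr != tx->reads[i].val) return false;
    return true;
}

intptr_t ptx_read(ptx_t *tx, intptr_t *addr) {
    if (tx->privileged) return *addr;                /* nothing to log */
    for (size_t i = tx->nwrites; i-- > 0; )          /* newest write wins */
        if (tx->writes[i].addr == addr) return tx->writes[i].val;
    assert(tx->nreads < MAX_ENTRIES);
    tx->reads[tx->nreads] = (entry_t){ addr, *addr };
    return tx->reads[tx->nreads++].val;
}

void ptx_write(ptx_t *tx, intptr_t *addr, intptr_t val) {
    if (tx->privileged) { *addr = val; return; }     /* direct, irrevocable */
    assert(tx->nwrites < MAX_ENTRIES);
    tx->writes[tx->nwrites++] = (entry_t){ addr, val };  /* append-only log */
}

/* Call just before a system call. Returns false if validation fails, in
 * which case nothing was published and the transaction must abort. */
bool ptx_go_privileged(ptx_t *tx) {
    if (tx->privileged) return true;
    lock_commit();
    if (!validate(tx)) { unlock_commit(); return false; }
    for (size_t i = 0; i < tx->nwrites; i++)         /* in program order */
        *tx->writes[i].addr = tx->writes[i].val;
    tx->nwrites = 0;
    tx->privileged = true;
    return true;
}

/* Completion: a privileged transaction only releases the lock; a normal
 * one validates and publishes while holding it. */
bool ptx_end(ptx_t *tx) {
    if (tx->privileged) { tx->privileged = false; unlock_commit(); return true; }
    lock_commit();
    bool ok = validate(tx);
    for (size_t i = 0; ok && i < tx->nwrites; i++)
        *tx->writes[i].addr = tx->writes[i].val;
    unlock_commit();
    return ok;
}
```

Every other transaction needs the same lock to commit or to become privileged. As a result, at most one privileged transaction exists at a time.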
28
JudoSTM Feature 2: Legacy Lock Elision
Safely ignore locks within legacy code
29-35
Legacy Lock Elision
[Animated figure: a legacy lock word, initially 0, guards shared data. Each transaction acquires the lock (0 to 1) in its own write buffer, reads and writes data, and releases the lock (back to 0). Transaction 2 commits first: 1) validate read set, 2) publish writes. Writing 0 over the lock's 0 is a silent store. The remaining transaction then releases the lock, validates its read set (the lock still reads 0), and publishes its writes.]
Value-based conflict detection facilitates the elision of legacy locks!
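The elision argument comes down to the write buffer coalescing the lock's acquire and release into a single store of 0. In this sketch, a stand-in legacy routine takes a lock, updates one counter, and releases the lock. Every access goes through hypothetical transactional helpers (`rd`, `wr`, `commit`). The commit lock is omitted because the example runs single-threaded. A lock observed as held simply returns `false` rather than spinning, since re-reading a logged value inside a transaction would never see it change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transactional helpers for a lock-elision demo. */
#define MAX_ENTRIES 16

typedef struct { intptr_t *addr; intptr_t val; } entry_t;
typedef struct { entry_t r[MAX_ENTRIES], w[MAX_ENTRIES]; size_t nr, nw; } etx_t;

static entry_t *find(entry_t *set, size_t n, const intptr_t *addr) {
    for (size_t i = 0; i < n; i++)
        if (set[i].addr == addr) return &set[i];
    return NULL;
}

/* Read: write buffer first, then read-set, else log (addr, value). */
static intptr_t rd(etx_t *t, intptr_t *addr) {
    entry_t *e = find(t->w, t->nw, addr);
    if (!e) e = find(t->r, t->nr, addr);
    if (e) return e->val;
    assert(t->nr < MAX_ENTRIES);
    t->r[t->nr] = (entry_t){ addr, *addr };
    return t->r[t->nr++].val;
}

/* Coalescing write buffer: one entry per address, last value wins. */
static void wr(etx_t *t, intptr_t *addr, intptr_t val) {
    entry_t *e = find(t->w, t->nw, addr);
    if (!e) { assert(t->nw < MAX_ENTRIES); e = &t->w[t->nw++]; e->addr = addr; }
    e->val = val;
}

/* Stand-in legacy code: lock, update, unlock, all run transactionally. */
static bool legacy_locked_add(etx_t *t, intptr_t *lock, intptr_t *data, intptr_t n) {
    if (rd(t, lock) != 0) return false;  /* real code would spin here */
    wr(t, lock, 1);                      /* acquire: buffered only */
    wr(t, data, rd(t, data) + n);
    wr(t, lock, 0);                      /* release: buffer now holds 0 */
    return true;
}

/* Value-based validation, then publish (commit lock omitted). */
static bool commit(etx_t *t) {
    for (size_t i = 0; i < t->nr; i++)
        if (*t->r[i].addr != t->r[i].val) return false;
    for (size_t i = 0; i < t->nw; i++)
        *t->w[i].addr = t->w[i].val;
    return true;
}
```

Both transactions commit even though each "held" the same lock, because each one's net effect on the lock word is a silent store. Only a genuine conflict on the protected data forces an abort.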
36
JudoSTM Feature 3: Efficient Invisible Readers
37
Supporting Invisible Readers
- Invisible readers: don't report reads to other transactions
  - Good performance, but can lead to inconsistent read data: errors!
- Data errors: segfault, divide by zero
  - Cheap solution: catch with trap/signal handlers
- Control errors: jump to non-instrumented code
  - Typical solution: verify the read-set after every load. Expensive! O(N^2)
  - DBR solution: prevented by sandboxing, because DBR instruments all code as it executes
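The quadratic cost of validating after every load follows from a short count. Assume each of the N loads adds one new entry to the read-set, and each validation re-checks the whole set. The k-th load then triggers k comparisons:

```latex
\sum_{k=1}^{N} k \;=\; \frac{N(N+1)}{2} \;=\; O(N^2)
```

A single validation at commit costs only N comparisons.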
38
JudoSTM Details: Implementation
39
(Reminder) Goal: perform these efficiently:
- For all non-stack write instructions:
  - Track write addresses and values (write-set)
  - Buffer the values instead of writing them to regular memory
- For all non-stack read instructions:
  - Redirect to the write-buffer
  - On a miss: track read addresses and values (read-set)
- When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)
40
Read/Write Buffer Implementation
[Figure: a read hashtable indexing a read buffer, and a write hashtable indexing a write buffer, both keyed by address]
- Linear-probed, open-addressed hashtables
- Efficient lookup: 5 instructions for a hit (+ state saving?)
- Efficient validate and commit?
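A linear-probed, open-addressed table keyed by address might look like the following C sketch. Several details are assumptions rather than JudoSTM's actual layout: the table size, the hash (drop the two low address bits, then mask), and the `slot` payload pointing into the matching buffer. One slot is always kept empty so that an unsuccessful probe terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear-probed, open-addressed table keyed by address (sketch). */
#define TABLE_BITS 8
#define TABLE_SIZE (1u << TABLE_BITS)  /* power of two: mask instead of mod */

typedef struct {
    uintptr_t key;   /* address; 0 marks an empty slot */
    uint32_t  slot;  /* e.g. index of the entry in the read/write buffer */
} ht_entry;

typedef struct { ht_entry e[TABLE_SIZE]; size_t count; } addr_table;

static size_t hash_addr(uintptr_t a) {
    return (size_t)((a >> 2) & (TABLE_SIZE - 1));  /* ignore byte offset */
}

/* Returns the entry for addr, or NULL if it is absent. */
ht_entry *ht_find(addr_table *t, uintptr_t addr) {
    for (size_t i = hash_addr(addr); ; i = (i + 1) & (TABLE_SIZE - 1)) {
        if (t->e[i].key == addr) return &t->e[i];
        if (t->e[i].key == 0) return NULL;  /* empty slot ends the probe */
    }
}

/* Inserts addr if missing; *added reports whether it was new. */
ht_entry *ht_insert(addr_table *t, uintptr_t addr, uint32_t slot, bool *added) {
    assert(addr != 0 && t->count < TABLE_SIZE - 1);  /* keep one slot empty */
    size_t i = hash_addr(addr);
    while (t->e[i].key != 0 && t->e[i].key != addr)
        i = (i + 1) & (TABLE_SIZE - 1);
    *added = (t->e[i].key == 0);
    if (*added) { t->e[i].key = addr; t->e[i].slot = slot; t->count++; }
    return &t->e[i];
}
```

The table would need to be reset between transactions. The sketch leaves that step to the caller.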
41-45
Efficient Commit: Executable Write-Buffer
- Pre-allocated buffer of move instructions
- Emit value-address pairs as the transaction executes
[Figure: the write hashtable and a top pointer into the write buffer. Unused pre-allocated entries read movl $0x00000000,0x00000000. New entries are emitted just above the previous ones, and the buffer ends in ret.] After three writes, the buffer from the top pointer down reads:
    movl $0x80B10CFC,0x80B10CA4
    movl $0x0000ab42,0x80B10BCC
    movl $0x00000025,0x80B10BB8
    ret
- Execute the write-buffer to commit!
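To make the executable write-buffer concrete, the following C sketch emits the machine-code bytes for each `movl $value, address` entry. It assumes 32-bit x86 (IA-32), where this instruction encodes as `C7 05 <addr32> <imm32>`. In 64-bit mode the same bytes would mean a RIP-relative store, so the sketch only builds and inspects bytes and never executes them; executing them would also require an executable mapping. Filling upward from the trailing `ret` mirrors the slides' picture. `write_buffer`, `wb_emit` and `wb_update` are hypothetical names. Assuming the write hashtable maps each address to a single entry, a repeated write can patch the immediate in place, so the order in which entries execute does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Executable write-buffer bytes for IA-32 (sketch, not executed). */
#define WB_ENTRY 10   /* C7 05 <addr32> <imm32> */
#define WB_CAP   64

typedef struct {
    uint8_t code[WB_CAP * WB_ENTRY + 1];  /* entries + trailing ret */
    size_t  top;                          /* offset of the newest entry */
} write_buffer;

static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

void wb_init(write_buffer *wb) {
    memset(wb->code, 0, sizeof wb->code);
    wb->top = WB_CAP * WB_ENTRY;
    wb->code[wb->top] = 0xC3;             /* ret */
}

/* Emit a value-address pair above the previous entries; return its
 * offset so the write hashtable could record it. */
size_t wb_emit(write_buffer *wb, uint32_t addr, uint32_t value) {
    assert(wb->top >= WB_ENTRY);
    wb->top -= WB_ENTRY;
    uint8_t *p = &wb->code[wb->top];
    p[0] = 0xC7;                          /* MOV r/m32, imm32 */
    p[1] = 0x05;                          /* ModRM: absolute [disp32] */
    put_le32(p + 2, addr);
    put_le32(p + 6, value);
    return wb->top;
}

/* A later write to the same address patches the immediate in place. */
void wb_update(write_buffer *wb, size_t entry, uint32_t value) {
    put_le32(&wb->code[entry + 6], value);
}
```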
46-50
Efficient Validation: Executable Read-Buffer
- Pre-allocated buffer of compare & jump instructions
- Emit value-address pairs as the transaction executes
[Figure: the read hashtable and a top pointer into the read buffer. Unused pre-allocated entries read cmp $0x00000000, 0x00000000 / jne,pn judostm_trans_abort.] After three reads, the buffer from the top pointer down reads:
    cmp $0x00000100, 0x80B10BCC
    jne,pn judostm_trans_abort
    cmp $0x00000005, 0x80B10BB8
    jne,pn judostm_trans_abort
    cmp $0x00000a34, 0x80B10CA4
    jne,pn judostm_trans_abort
    ret
- Execute the read-buffer to validate the read-set!
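The read-buffer entries can be encoded the same way. This sketch assumes two IA-32 encodings:

- `cmpl $imm32, addr32` as `81 3D <addr32> <imm32>`.
- `jne,pn rel32` as the not-taken hint prefix `2E` followed by `0F 85 <rel32>`.

Together these give a fixed 17-byte entry. The sketch uses the full 32-bit immediate form even for small values (an assembler might pick the shorter `83 /7 ib` form) so that every entry has the same size. `rb_encode`, `code_addr` and `abort_addr` are hypothetical; `abort_addr` stands in for the address of `judostm_trans_abort`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One executable read-buffer entry for IA-32 (sketch, not executed). */
#define RB_ENTRY 17   /* 10-byte cmp + 7-byte hinted jne */

static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* code_addr: where this entry will live once the buffer is executable. */
void rb_encode(uint8_t out[RB_ENTRY], uint32_t code_addr,
               uint32_t addr, uint32_t expected, uint32_t abort_addr) {
    out[0] = 0x81; out[1] = 0x3D;     /* CMP r/m32, imm32 with [disp32] */
    put_le32(out + 2, addr);          /* location that was read */
    put_le32(out + 6, expected);      /* value the transaction saw */
    out[10] = 0x2E;                   /* branch hint: predict not taken */
    out[11] = 0x0F; out[12] = 0x85;   /* JNE rel32 */
    /* rel32 is relative to the end of this entry (the next instruction);
     * unsigned wrap-around yields the two's-complement displacement. */
    put_le32(out + 13, abort_addr - (code_addr + RB_ENTRY));
}
```

Execution starts at the top pointer. It either falls through every comparison to `ret`, meaning the read-set is valid, or branches to the abort handler at the first mismatch.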
51
Evaluation
- JudoSTM performance
- Comparison with Rochester's RSTM†
† http://www.cs.rochester.edu/research/synchronization/rstm
52
RSTM vs JudoSTM: Design
                         RSTM                                  JudoSTM
    Language             C++                                   C/C++
    Programming model    Library API, rewrite code             atomic{…}
    Conflict detection   Object-level, location-based          Value-based
    Memory allocation    Custom                                “Hoard” scalable parallel allocator
    Fast commit          Object cloning & pointer switching    Executable write-buffer
JudoSTM is more flexible and less intrusive; but what about performance?
53
Experimental Framework
- RSTM micro-benchmarks: Linked List, Hash Table, RBTree
  - Equal mix of insert, remove, and lookup
  - Measure throughput (transactions/sec)
- Test platform:
  - 4-way SMP Intel Pentium 4 Xeon, 2.8 GHz
  - L1d/L2/L3 cache sizes: 8KB/512KB/2MB
  - Linux 2.6.17.13 with per-thread signal handler support
54
Linked List
Coarse-grained locking is best, but not scaling
55
Linked List (Zoomed In)
Single-lock JudoSTM scaling nicely; RSTM flatlined
56
Hash Table
Distributed-lock JudoSTM beats CG-locking, tracks RSTM
57
RBTree
JudoSTM on track to scale past CG-locking; RSTM flatlined
58
Conclusions
- Judo: highly efficient DBR framework
  - Beats DynamoRIO on SPEC benchmarks
- JudoSTM: first STM based on DBR
  - Value-based conflict detection
  - Executable read/write buffers
- Desirable features:
  - Efficient invisible readers (sandboxing)
  - Legacy lock elision
  - Privileged transactions (system call support)
- Performance comparable to RSTM
- Facilitates STM for real programs & environments!
59
Backups
60
JudoSTM Details: Programming with JudoSTM
61
Programming with JudoSTM
[Figure: the source code of my_app,
    #include
    GTree *tree;
    ...
    atomic { g_tree_insert(tree, &key, &val); }
    ...
is compiled by gcc, with the atomic macro expanding around the block to
    judostm_start();
    g_tree_insert(tree, &key, &val);
    judostm_stop();
The loader and kernel run the resulting executable together with the glib shared library and the judoSTM library. At run time, the instrumented my_app + glib execute from the code cache.]
Easy to use, with no compiler support!
Header defining the atomic macro:
    #ifndef JUDOSTM_H
    #define JUDOSTM_H
    extern void judostm_start(void);
    extern void judostm_stop(void);
    #define atomic \
        asm __volatile__ ("":::"eax", "ecx", "edx", "ebx", "edi", \
                          "esi", "flags", "memory");\
        int __count = 0; \
        judostm_start();\
        for (; __count < 1; judostm_stop(), __count++)
    #endif
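The `atomic` macro relies on a `for` loop that runs its body exactly once and calls `judostm_stop()` in the loop's increment expression, after the body. The portable sketch below reproduces that control flow without the `asm` clobber statement. It uses hypothetical stubs `fake_start` and `fake_stop` that only record the call order. It also moves the counter into the `for` initializer (C99). In the header as shown, `__count` is declared in the enclosing scope, so two `atomic` blocks in the same scope would redeclare it.

```c
#include <assert.h>

/* Hypothetical stand-ins for judostm_start/judostm_stop that record the
 * order of calls: 'S' = start, 'E' = stop. */
static char trace[16];
static int  ntrace = 0;

static void fake_start(void) { assert(ntrace < 16); trace[ntrace++] = 'S'; }
static void fake_stop(void)  { assert(ntrace < 16); trace[ntrace++] = 'E'; }

/* C99 variant of the macro's loop: the comma expression in the initializer
 * calls fake_start() once, the body runs once, and the increment
 * expression calls fake_stop() after the body. The counter is scoped to
 * the loop, so several blocks may share a scope. */
#define ATOMIC_SKETCH \
    for (int atomic_once_ = (fake_start(), 0); atomic_once_ < 1; \
         fake_stop(), atomic_once_++)
```

As with the original macro, a `break`, `return` or `goto` out of the body would skip the stop call.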