A Dynamic Binary-Rewriting Approach to Software Transactional Memory appeared in PACT 2007, Brasov, Romania University of Toronto Marek Olszewski Jeremy.


1 A Dynamic Binary-Rewriting Approach to Software Transactional Memory
Marek Olszewski, Jeremy Cutler, Greg Steffan (University of Toronto)
Appeared in PACT 2007, Brasov, Romania

2 The Parallel Programming Challenge
- Coarse-grained locking: easy to program, scales poorly
- Fine-grained locking: scales well, hard to get right (e.g., deadlock, priority inversion)
- The promise of Transactional Memory: as easy to program as coarse-grained locking, with the performance/scalability of fine-grained locking

3 Transactional Memory (TM)
Source code:
    ...
    atomic {
        ...
        access_shared_data();
        ...
    }
    ...
Programmer: specifies threads/transactions in source code
TM system: executes transactions optimistically in parallel
  1) Checkpoints execution
  2) Detects conflicts
  3) Commits, or aborts and re-executes

4 TM Implementations
- Flavors of TM: Hardware (HTM), Software (STM), Hybrid (HyTM)
- STM is especially compelling
  - Exploit current commodity hardware (multicores)
  - Learn about real TM systems and apps
- Current STM systems
  - Java: DSTM, ASTM
  - C or C++: McRT icc, TL2, RSTM, OSTM
  - Object-based or programmer-intensive (or both)
- Our focus: arbitrary C/C++, realistic environment

5 Programming with STM
Source code:
    #include <glib.h>
    GTree *tree;
    ...
    atomic { g_tree_insert(tree, &key, &val); }
    ...
[Diagram: the STM compiler turns the source code into the executable my_app; the loader combines it with the shared library glib into the running application, on top of the kernel.]
Not handled by current compiler/library-based STMs: "legacy locks", pre-compiled binaries, system calls

6 JudoSTM: An Overview
- Key design choices:
  1) Dynamic Binary Rewriting (DBR): insert instrumentation to implement STM
  2) Value-based conflict detection
- Resulting key features:
  1) Privileged transactions (support system calls)
  2) Legacy lock elision
  3) Efficient invisible readers

7 JudoSTM Design Choice 1: Dynamic Binary Rewriting (DBR)
Judo DBR framework (user-space version of JIFL†)
† JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007

8-10 Dynamic Binary Rewriting
[Animation: Judo copies basic blocks from the original code (bb1, bb2, bb3, bb4) into the code cache as they execute. The code cache holds bb1, then bb1 and bb2, then bb1, bb2, and bb4.]

11 Judo - Performance
[Chart: normalized runtime overhead]
Overhead low enough to implement STM?

12 DBR-Based STM Goal: Perform These Efficiently
- For all non-stack write instructions:
  - Track write addresses and values (write-set)
  - Buffer the values instead of writing them to regular memory (write-buffer)
- For all non-stack read instructions:
  - Redirect to the write-buffer
  - On a miss: track read addresses and values (read-set)
- When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)
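
The read/write/commit steps on this slide can be sketched in plain C. This is a minimal illustration under stated assumptions, not JudoSTM's implementation: the names `txn_t`, `txn_read`, `txn_write`, `txn_commit` and the capacity `SET_CAP` are hypothetical, linear scans stand in for the hashtables described later, a single spin lock stands in for the commit lock(s), and the calls are written by hand rather than inserted by binary rewriting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SET_CAP 64  /* hypothetical fixed capacity, chosen for brevity */

typedef struct { int *addr; int value; } entry_t;

typedef struct {
    entry_t reads[SET_CAP];  size_t n_reads;   /* read-set: address + value seen */
    entry_t writes[SET_CAP]; size_t n_writes;  /* write-set: buffered stores */
} txn_t;

atomic_flag commit_lock = ATOMIC_FLAG_INIT;    /* one global commit lock */

void txn_begin(txn_t *t) { t->n_reads = 0; t->n_writes = 0; }

/* Transactional store: buffer the value instead of touching memory. */
void txn_write(txn_t *t, int *addr, int value) {
    for (size_t i = 0; i < t->n_writes; i++)
        if (t->writes[i].addr == addr) { t->writes[i].value = value; return; }
    assert(t->n_writes < SET_CAP);
    t->writes[t->n_writes++] = (entry_t){ addr, value };
}

/* Transactional load: consult the write buffer first; on a miss,
 * read memory and log the address and the value seen. */
int txn_read(txn_t *t, int *addr) {
    for (size_t i = 0; i < t->n_writes; i++)
        if (t->writes[i].addr == addr) return t->writes[i].value;
    int v = *addr;
    assert(t->n_reads < SET_CAP);
    t->reads[t->n_reads++] = (entry_t){ addr, v };
    return v;
}

/* Commit: 1) acquire lock, 2) validate reads by value,
 * 3) publish writes, 4) release lock. Returns false on abort. */
bool txn_commit(txn_t *t) {
    while (atomic_flag_test_and_set(&commit_lock)) { /* spin */ }
    bool ok = true;
    for (size_t i = 0; i < t->n_reads && ok; i++)
        ok = (*t->reads[i].addr == t->reads[i].value);
    if (ok)
        for (size_t i = 0; i < t->n_writes; i++)
            *t->writes[i].addr = t->writes[i].value;
    atomic_flag_clear(&commit_lock);
    return ok;
}
```

A caller that gets `false` from `txn_commit` would restart the transaction from `txn_begin`. The sketch omits the checkpointing needed to re-execute automatically.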

13 DBR: Attractive Properties for STM
- Performance: overheads are amortized (code cache)
- Can handle arbitrary code and shared libraries: any/all code is transactionalized as it executes
- Sandboxed transactions
  - Typical STM: inconsistent values could cause execution to stray, i.e., into non-transactionalized code (very bad!); the solution is frequent and costly read-set validation
  - DBR-based STM: any/all code is transactionalized as it executes
Tough problems for conventional STMs are addressed by DBR

14 JudoSTM Design Choice 2: Value-Based Conflict Detection (as opposed to location-based)

15-19 Location-Based Conflict Detection
[Animation: main memory is divided into strips, each with a strip version. Transaction 1 reads, recording the strip versions it saw. Transaction 2 reads and writes, then commits.]
Commit step 1) Validate read set
Commit step 2) Publish writes (and increment version #s)
[Transaction 1 then validates its read set, finds a changed strip version, and aborts.]
Note: all transactions must maintain strip version #s
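
As a rough illustration of the version bookkeeping that location-based detection requires, here is a small C model. It is an assumption-laden sketch rather than any particular STM's scheme: memory is a four-word array holding the values shown in the animation (2, 3, 5, 6), strips are two words wide, and `version_log_t`, `versioned_read`, `versions_valid`, and `publish_write` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_WORDS  4
#define N_STRIPS 2   /* hypothetical layout: two words per strip */

int      memory[N_WORDS]         = { 2, 3, 5, 6 };
unsigned strip_version[N_STRIPS] = { 0, 0 };

size_t strip_of(size_t word) { return word / (N_WORDS / N_STRIPS); }

/* Per-transaction log of the strip versions observed by each read. */
typedef struct {
    size_t   strip[N_WORDS];
    unsigned version[N_WORDS];
    size_t   n;
} version_log_t;

int versioned_read(version_log_t *log, size_t word) {
    assert(word < N_WORDS && log->n < N_WORDS);
    size_t s = strip_of(word);
    log->strip[log->n]   = s;
    log->version[log->n] = strip_version[s];
    log->n++;
    return memory[word];
}

/* Validation compares versions, not values. */
bool versions_valid(const version_log_t *log) {
    for (size_t i = 0; i < log->n; i++)
        if (strip_version[log->strip[i]] != log->version[i])
            return false;
    return true;
}

/* Publishing a write must also bump the version of its strip. */
void publish_write(size_t word, int value) {
    assert(word < N_WORDS);
    memory[word] = value;
    strip_version[strip_of(word)]++;
}
```

In this model, a reader of word 0 fails validation after another transaction publishes a write to word 1, because both words share a strip, even though the value at word 0 never changed.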

20-23 Value-Based Conflict Detection
[Animation: the same scenario without strip versions. Transaction 1 reads. Transaction 2 reads and writes, then commits.]
Commit step 1) Validate read set
Commit step 2) Publish writes
[Transaction 1 then validates its read set against the values now in main memory, finds a changed value, and aborts.]
Note: no version information to maintain

24 JudoSTM Feature 1: Privileged Transactions
- Can execute (but not roll back) system calls
- Grab commit lock(s) when about to make a syscall; release when the transaction completes
- Only one privileged transaction exists at a time

25-27 Privileged Transactions
[Animation: Transaction 2 becomes privileged and can write directly to memory; privileged code and syscalls may be uninstrumented. Transaction 1 then validates its read set and aborts.]
Value-based conflict detection facilitates system calls within transactions!

28 JudoSTM Feature 2: Legacy Lock Elision
Safely ignore locks within legacy code

29-35 Legacy Lock Elision
[Animation: both transactions run code that acquires and releases the same legacy lock. Each reads the lock word (0) and buffers its lock-acquire write (1) and its lock-release write (0) instead of storing to memory. At commit, a transaction validates its read set and publishes its writes; the write to the lock word is a silent store, because it leaves the lock at 0. The other transaction's read of the lock word therefore still validates, and it commits too.]
Commit step 1) Validate read set
Commit step 2) Publish writes
Value-based conflict detection facilitates the elision of legacy locks!
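
The silent-store argument can be checked with a tiny C model of a transaction's view of the lock word. Everything here is a hypothetical sketch: `word_view_t` and the `view_*` and `legacy_lock_*` functions are illustrative names, and a real legacy lock would use an atomic instruction that the rewriter instruments, not these hand-written calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One transaction's private view of a single memory word. */
typedef struct {
    int *addr;
    int  first_read; bool read;     /* value logged in the read-set */
    int  buffered;   bool written;  /* value held in the write-buffer */
} word_view_t;

int view_load(word_view_t *w) {
    assert(w->addr != NULL);
    if (w->written) return w->buffered;          /* hit in write-buffer */
    if (!w->read) { w->first_read = *w->addr; w->read = true; }
    return w->first_read;
}

void view_store(word_view_t *w, int v) { w->buffered = v; w->written = true; }

/* Legacy test-and-set style lock, as seen from inside a transaction. */
bool legacy_lock_acquire(word_view_t *lock) {
    if (view_load(lock) != 0) return false;      /* lock appears held */
    view_store(lock, 1);                         /* buffered, not published */
    return true;
}

void legacy_lock_release(word_view_t *lock) { view_store(lock, 0); }

/* Value-based validation and publication for this one word. */
bool view_validate(const word_view_t *w) {
    return !w->read || *w->addr == w->first_read;
}

void view_publish(word_view_t *w) { if (w->written) *w->addr = w->buffered; }
```

Two views of the same lock word can both acquire and release it without ever changing memory: each commit publishes 0 over 0, so neither invalidates the other's logged read of 0.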

36 JudoSTM Feature 3: Efficient Invisible Readers

37 Supporting Invisible Readers
- Invisible readers: don't report reads to others
  - Good performance, but can lead to inconsistent read data: errors!
- Data errors: segfault, divide by zero
  - Cheap solution: catch with trap/signal handlers
- Control errors: jump to non-instrumented code
  - Typical solution: verify read-set after every load. Expensive! O(N^2)
  - DBR solution: prevented by sandboxing, since DBR instruments all code as it executes

38 JudoSTM Details: Implementation

39 (Reminder) Goal: Perform These Efficiently
- For all non-stack write instructions:
  - Track write addresses and values (write-set)
  - Buffer the values instead of writing them to regular memory
- For all non-stack read instructions:
  - Redirect to the write-buffer
  - On a miss: track read addresses and values (read-set)
- When a transaction completes:
  1) Acquire commit lock(s)
  2) Validate read-set (value-based conflict detection)
  3) Commit write-set to memory
  4) Release commit lock(s)

40 Read/Write Buffer Implementation
[Diagram: a read hashtable indexing the read buffer and a write hashtable indexing the write buffer, both keyed by address]
- Linear-probed, open-addressed hashtables
- Efficient lookup: 5 instructions for a hit (+ state-saving?)
- Efficient validate and commit?
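
For readers unfamiliar with the data structure, a linear-probed, open-addressed table mapping an address to a buffer slot might look like the following C sketch. The table size, the hash function, and the names `addr_table_t`, `table_lookup`, and `table_insert` are assumptions for illustration; the slide does not give JudoSTM's actual layout, and this portable code makes no attempt to match its 5-instruction hit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256u  /* hypothetical; a power of two so a mask replaces modulo */

typedef struct { uintptr_t addr; int slot; } bucket_t;  /* addr == 0 marks empty */
typedef struct { bucket_t buckets[TABLE_SIZE]; } addr_table_t;

/* Hypothetical hash: drop the two alignment bits, keep the low index bits. */
unsigned hash_addr(uintptr_t addr) {
    return (unsigned)(addr >> 2) & (TABLE_SIZE - 1);
}

/* Returns the buffer slot recorded for addr, or -1 on a miss.
 * Probing walks consecutive buckets until an empty one is found. */
int table_lookup(const addr_table_t *t, uintptr_t addr) {
    for (unsigned i = hash_addr(addr); t->buckets[i].addr != 0;
         i = (i + 1) & (TABLE_SIZE - 1))
        if (t->buckets[i].addr == addr)
            return t->buckets[i].slot;
    return -1;
}

/* Inserts or updates addr. The caller must keep the table from filling
 * completely, otherwise probing would never find an empty bucket. */
void table_insert(addr_table_t *t, uintptr_t addr, int slot) {
    assert(addr != 0);  /* 0 is reserved as the empty marker */
    unsigned i = hash_addr(addr);
    while (t->buckets[i].addr != 0 && t->buckets[i].addr != addr)
        i = (i + 1) & (TABLE_SIZE - 1);
    t->buckets[i].addr = addr;
    t->buckets[i].slot = slot;
}
```

Open addressing keeps every bucket in one contiguous array, so a hit on the first probe touches a single cache line and needs no pointer chasing.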

41-45 Efficient Commit: Executable Write-Buffer
- Pre-allocated buffer of move instructions, ending in ret
- Emit value-address pairs as the transaction executes
- Execute the write-buffer to commit!
[Animation: the write hashtable indexes the write buffer. Each new write fills the placeholder slot above the previously emitted one, tracked by the top pointer. Buffer contents after three writes:]
    movl $0x00000000,0x00000000
    movl $0x80B10CFC,0x80B10CA4
    movl $0x0000ab42,0x80B10BCC
    movl $0x00000025,0x80B10BB8
    ret
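
To make the "buffer of move instructions" concrete, the C sketch below encodes `movl $imm32, addr32` entries into a byte buffer that ends in `ret`, filling it from the bottom up as in the animation. It is an illustration under assumptions: `write_buffer_t`, `wb_reset`, and `wb_emit` are hypothetical names, the encoding used is the 32-bit x86 form `C7 05 <addr32> <imm32>`, and the sketch only builds the bytes. Actually running them would need 32-bit x86 and executable memory, which is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MOV_LEN = 10, WB_SLOTS = 4 };  /* WB_SLOTS is a hypothetical capacity */

typedef struct {
    uint8_t code[WB_SLOTS * MOV_LEN + 1];  /* move slots followed by ret */
    size_t  top;                           /* offset of first instruction to run */
} write_buffer_t;

/* Store a 32-bit value in little-endian byte order. */
void put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Empty buffer: only the final ret (0xC3) is reachable from top. */
void wb_reset(write_buffer_t *wb) {
    memset(wb->code, 0, sizeof wb->code);
    wb->code[WB_SLOTS * MOV_LEN] = 0xC3;
    wb->top = WB_SLOTS * MOV_LEN;
}

/* Emit `movl $value, addr` just above the previously emitted entry.
 * A real system would abort or grow the buffer instead of asserting. */
void wb_emit(write_buffer_t *wb, uint32_t value, uint32_t addr) {
    assert(wb->top >= MOV_LEN);
    wb->top -= MOV_LEN;
    uint8_t *p = wb->code + wb->top;
    p[0] = 0xC7;             /* MOV r/m32, imm32 */
    p[1] = 0x05;             /* ModRM: absolute 32-bit address operand */
    put_u32(p + 2, addr);
    put_u32(p + 6, value);
}
```

Committing would then amount to calling the code at offset `top`: the moves run in sequence and the final `ret` returns to the commit routine, so no per-entry loop or decode is needed.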

46-50 Efficient Validation: Executable Read-Buffer
- Pre-allocated buffer of compare & jump instructions, ending in ret
- Emit value-address pairs as the transaction executes
- Execute the read-buffer to validate the read-set!
[Animation: the read hashtable indexes the read buffer. Each new read fills the placeholder compare above the previously emitted one, tracked by the top pointer. Buffer contents after three reads:]
    cmp $0x00000000, 0x00000000
    jne,pn judostm_trans_abort
    cmp $0x00000100, 0x80B10BCC
    jne,pn judostm_trans_abort
    cmp $0x00000005, 0x80B10BB8
    jne,pn judostm_trans_abort
    cmp $0x00000a34, 0x80B10CA4
    jne,pn judostm_trans_abort
    ret
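
One compare-and-branch entry of such a read-buffer could be encoded as in the C sketch below. It is a hedged illustration, not JudoSTM's emitter: `encode_check` and its parameters are hypothetical, the encodings assumed are the 32-bit x86 forms `81 3D <addr32> <imm32>` for the compare and `0F 85 <rel32>` for `jne`, and the `,pn` suffix is assumed to map to the `2E` not-taken hint prefix. The code only produces bytes and never executes them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHECK_LEN = 17 };  /* 10-byte cmp + 7-byte hinted jne */

/* Store a 32-bit value in little-endian byte order. */
void put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Encode:   cmp $expected, addr
 *           jne,pn abort_addr
 * insn_addr is the (hypothetical) run-time address at which out[0] will
 * live; the branch displacement is relative to the end of the jne. */
void encode_check(uint8_t out[CHECK_LEN], uint32_t expected, uint32_t addr,
                  uint32_t insn_addr, uint32_t abort_addr) {
    assert(out != NULL);
    out[0] = 0x81;               /* CMP r/m32, imm32 */
    out[1] = 0x3D;               /* ModRM: /7 with absolute 32-bit address */
    put_u32(out + 2, addr);
    put_u32(out + 6, expected);
    out[10] = 0x2E;              /* branch hint: predicted not taken */
    out[11] = 0x0F;              /* JNE rel32 */
    out[12] = 0x85;
    put_u32(out + 13, abort_addr - (insn_addr + CHECK_LEN));  /* wraps mod 2^32 */
}
```

Validation by execution falls through every matching compare to the final `ret`; the first mismatch branches to the abort handler, which is the value-based check from the commit steps expressed as straight-line code.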

51 Evaluation
- JudoSTM performance
- Comparison with Rochester's RSTM†
† http://www.cs.rochester.edu/research/synchronization/rstm

52 RSTM vs JudoSTM: Design
                     RSTM                                 JudoSTM
Language             C++                                  C/C++
Programming model    Library API, rewrite code            atomic{…}
Conflict detection   Object-level, location-based         Value-based
Memory allocation    Custom                               "Hoard" scalable parallel allocator
Fast commit          Object-cloning & pointer-switching   Executable write-buffer
JudoSTM is more flexible and less intrusive; but performance?

53 Experimental Framework
- RSTM micro-benchmarks
  - Linked List, Hash Table, RBTree
  - Equal mix of insert, remove, and lookup
  - Measure throughput (transactions/sec)
- Test platform
  - 4-way SMP Intel Pentium 4 Xeon, 2.8GHz
  - L1d/L2/L3 cache sizes: 8KB/512KB/2MB
  - Linux 2.6.17.13 with per-thread signal handler support

54 Linked List
[Chart] Coarse-grained locking best, but not scaling

55 Linked List (zoomed in)
[Chart] Single-lock JudoSTM scaling nicely; RSTM flatlined

56 Hash Table
[Chart] Distributed-lock JudoSTM beats CG-locking, tracks RSTM

57 RBTree
[Chart] JudoSTM on track to scale past CG-locking; RSTM flatlined

58 Conclusions
- Judo: highly efficient DBR framework
  - Beats DynamoRIO on SPEC benchmarks
- JudoSTM: first STM based on DBR
  - Value-based conflict detection
  - Executable read/write buffers
- Desirable features:
  - Efficient invisible readers (sandboxing)
  - Legacy lock elision
  - Privileged transactions (system call support)
  - Performance comparable to RSTM
Facilitates STM for real programs & environments!

59 Backups

60 JudoSTM Details: Programming with JudoSTM

61 Programming with JudoSTM
Source code:
    #include <glib.h>
    GTree *tree;
    ...
    atomic { g_tree_insert(tree, &key, &val); }
    ...
After the atomic macro is expanded:
    #include <glib.h>
    GTree *tree;
    ...
    judostm_start();
    g_tree_insert(tree, &key, &val);
    judostm_stop();
    ...
[Diagram: gcc compiles the source code into the executable my_app; the loader combines it with the shared library glib and the judoSTM library; the running application executes instrumented my_app + glib from the code cache, on top of the kernel.]
Easy to use, with no compiler support!
    #ifndef JUDOSTM_H
    #define JUDOSTM_H
    extern void judostm_start(void);
    extern void judostm_stop(void);
    #define atomic \
        asm __volatile__ ("":::"eax", "ecx", "edx", "ebx", "edi", \
                          "esi", "flags", "memory"); \
        int __count = 0; \
        judostm_start(); \
        for (; __count < 1; judostm_stop(), __count++)
    #endif
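
The one-iteration `for` loop is what lets a plain macro wrap an arbitrary block. The C sketch below demonstrates the idea with stub functions. It is a variant written for illustration, not the header above: `atomic_sketch`, `in_txn`, `starts`, `stops`, and `increment_twice` are hypothetical names, the stubs only count calls, the register-clobbering `asm` statement is omitted for portability, and the loop variable is declared inside the `for` so that two blocks can share a scope.

```c
#include <assert.h>
#include <stdbool.h>

bool in_txn;         /* true between start and stop */
int  starts, stops;  /* call counters, for demonstration only */

/* Stubs standing in for the real library entry points. */
void judostm_start(void) { assert(!in_txn); in_txn = true;  starts++; }
void judostm_stop(void)  { assert(in_txn);  in_txn = false; stops++;  }

/* The loop body runs exactly once: start is called in the initializer,
 * and stop is called in the increment expression after the body. */
#define atomic_sketch \
    for (int once_ = (judostm_start(), 0); once_ < 1; \
         judostm_stop(), once_++)

int shared_counter;

void increment_twice(void) {
    atomic_sketch { shared_counter++; }
    atomic_sketch { shared_counter++; }
}
```

One caveat follows from C's loop semantics: leaving the block with `break` or `return` skips the increment expression, so the stop call would not run.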

