Log-Based Transactional Memory


1 Log-Based Transactional Memory
LogTM: Log-based Transactional Memory 9/17/2018 Log-Based Transactional Memory Kevin E. Moore UW-Madison Architecture Seminar

2 Motivation
Chip multiprocessors (multi-core, many-core) are here.
  “Intel has 10 projects in the works that contain four or more computing cores per chip” -- Paul Otellini, Intel CEO, Fall ’05
We must program these systems effectively, but programming with locks is challenging.
  “Blocking on a mutex is a surprisingly delicate dance” -- OpenSolaris, mutex.c
Wisconsin Multifacet Project

3 Locks are Hard
// WITH LOCKS
void move(T s, T d, Obj key){
  LOCK(s);
  LOCK(d);
  tmp = s.remove(key);
  d.insert(key, tmp);
  UNLOCK(d);
  UNLOCK(s);
}
Thread 0 calls move(a, b, key1) while Thread 1 calls move(b, a, key2): DEADLOCK!
Moreover, coarse-grain locking limits concurrency and fine-grain locking is difficult.

4 Transactional Memory (TM)
void move(T s, T d, Obj key){
  atomic {
    tmp = s.remove(key);
    d.insert(key, tmp);
  }
}
Programmer says “I want this atomic”; the TM system “makes it so”.
Software TM (STM) implementations: currently slower than locks. Always slower than hardware?
Hardware TM (HTM) implementations: leverage cache coherence and speculation. Fast, but with hardware overheads and virtualization challenges.

5 Goals for Transactional Memory
Efficient implementation: make the common case fast; can’t justify expensive HW (yet).
Virtualizing TM: don’t limit the programming model; allow transactions of any size and duration.

6 Implementing TM
Version Management: new values for commit, old values for abort. Must keep both.
Conflict Detection: find read-write, write-read or write-write conflicts among concurrent transactions. Allows multiple readers OR one writer.
  Large state (must be precise)
  Checked often (must be fast)

7 LogTM: Log-Based Transactional Memory
Combined hardware/software transactional memory: conservative hardware conflict detection; software version management (with some hardware support).
Eager Version Management: stores new values in place; stores old values in user virtual memory (the transaction log).
Eager Conflict Detection: detects transaction conflicts on each load and store.
Apply this strategy to TM.

8 LogTM Publications
[HPCA 2006] LogTM: Log-based Transactional Memory
[ASPLOS 2006] Supporting Nested Transactional Memory in LogTM
[HPCA 2007] LogTM-SE: Decoupling Hardware Transactional Memory from Caches
[ISCA 2007] Performance Pathologies in Hardware Transactional Memory

9 Outline
Introduction
Background
LogTM
Implementing LogTM
Evaluation
Extending LogTM
Related Work
Conclusion

10 LOGTM

11 LogTM: Log-Based Transactional Memory
Eager software-based version management: store new values in place; store old values in the transaction log; undo failed transactions in software.
Eager all-hardware conflict detection: isolate new values; fast conflict detection for all transactions.
Apply this strategy to TM.

12 LogTM’s Eager Version Management
New values stored in place.
Old values stored in the transaction log:
  A per-thread linear (virtual) address space (like the stack)
  Filled by hardware (during transactions)
  Read by software (on abort)
[Figure: cache blocks (VA, data block, R and W bits) next to the transaction log, with the Log Base, Log Ptr and TM count registers]

13 Eager Version Management Discussion
Advantages: no extra indirection (unlike STM); fast commits (no copying; the common case).
Disadvantages: slow/complex aborts (must undo the aborting transaction); relies on eager conflict detection/prevention.
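
The trade-off above (cheap commit, costly abort) can be seen in a minimal software sketch of an undo log. This is an illustration only, not LogTM's hardware mechanism: the names `tx_log`, `tx_store`, `tx_commit` and `tx_abort`, the word-sized entries and the fixed `LOG_CAPACITY` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 64  /* hypothetical fixed log size for this sketch */

/* One undo record: where the store went and what was there before. */
typedef struct {
    uint64_t *addr;
    uint64_t  old_value;
} log_entry;

/* Per-thread transaction log; `ptr` plays the role of the log pointer. */
typedef struct {
    log_entry entries[LOG_CAPACITY];
    size_t    ptr;
} tx_log;

static void tx_begin(tx_log *log) { log->ptr = 0; }

/* Eager versioning: save the old value, then update memory in place. */
static void tx_store(tx_log *log, uint64_t *addr, uint64_t value) {
    assert(log->ptr < LOG_CAPACITY);
    log->entries[log->ptr].addr = addr;
    log->entries[log->ptr].old_value = *addr;
    log->ptr++;
    *addr = value;
}

/* Commit is cheap: new values are already in place, so just reset the log. */
static void tx_commit(tx_log *log) { log->ptr = 0; }

/* Abort walks the log from newest to oldest, restoring old values. */
static void tx_abort(tx_log *log) {
    while (log->ptr > 0) {
        log->ptr--;
        *log->entries[log->ptr].addr = log->entries[log->ptr].old_value;
    }
}
```

Because the sketch logs every store rather than only the first store to each location, undoing in reverse order is what guarantees that the oldest saved value is the one left in memory after an abort.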

14 LogTM’s Eager Conflict Detection
Requirements for conflict detection in LogTM:
Transactions must be well formed: each thread must obtain read isolation on all memory locations read and write isolation on all locations written.
Isolation must be strict two-phase: any thread that acquires read or write isolation on a memory location in a transaction must maintain that isolation until the end of the transaction.
Isolation must be released at the end of a transaction: because conflicts may prevent transactions from making progress, a thread completing a transaction must release isolation when it aborts or commits.

15 LogTM’s Conflict Detection in Practice
LogTM detects conflicts using coherence:
  Requesting processor issues a coherence request to the memory system
  Coherence mechanism forwards it to the other processor(s)
  Responding processor detects the conflict using local state and informs the requesting processor
  Requesting processor resolves the conflict (discussed later)
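
A minimal sketch of the check a responding processor might make on a forwarded request, following the "multiple readers OR one writer" rule. The type and function names (`tx_bits`, `coh_request`, `is_conflict`) are hypothetical; only the GETS/GETX request kinds and the R/W bits come from the slides.

```c
#include <assert.h>
#include <stdbool.h>

/* Forwarded coherence request: shared (read) or exclusive (write). */
typedef enum { REQ_GETS, REQ_GETX } coh_request;

/* Per-block transactional state kept by the responding processor. */
typedef struct {
    bool r;  /* block was read in the current transaction */
    bool w;  /* block was written in the current transaction */
} tx_bits;

/* Returns true when the responder should answer with a NACK. */
static bool is_conflict(tx_bits local, coh_request req) {
    if (req == REQ_GETS)
        return local.w;          /* remote read vs. local transactional write */
    return local.r || local.w;   /* remote write vs. any local transactional access */
}
```

Two transactional readers never conflict under this rule, which is why a GETS request only inspects the W bit.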

16 Example Implementation (LogTM-Dir)
P0 store:
  P0 sends a get exclusive (GETX) request
  Directory responds with data (old)
  P0 executes the store
[Figure: directory, P0 and P1 cache states and R/W metadata during the GETX/DATA exchange; P0 ends in M [new] with metadata (W-)]

17 Example Implementation (LogTM-Dir)
In-cache transaction conflict:
  P1 sends a get shared (GETS) request
  Directory forwards it to P0
  P0 detects the conflict and sends a NACK
[Figure: directory holds [old]; P0 is in M [new] with metadata (W-); P1 is in I [none] with metadata (--); GETS, Fwd_GETS and NACK messages]

18 Conflict Resolution
Conflict resolution:
  Can wait, risking deadlock
  Can abort, risking livelock
  Wait/abort transaction at requesting or responding processor?
LogTM resolves conflicts at the requesting processor:
  Requester can wait (using coherence nacks/retries)
  But must abort if deadlock is possible
Requester Stalls policy:
  Logically order transactions with timestamps
  On conflict notification, wait unless already causing an older transaction to wait
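
The Requester Stalls policy can be sketched as two small handlers. This is a hedged illustration: the names (`tx_state`, `possible_cycle`, `on_send_nack`, `on_receive_nack`) are made up for the example, and the extra check that the NACKing transaction is older is an assumption added so that the older of two transactions keeps making progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACTION_STALL, ACTION_ABORT } tx_action;

typedef struct {
    uint64_t timestamp;       /* smaller value = older transaction */
    bool     possible_cycle;  /* set once this transaction makes an older one wait */
} tx_state;

/* Responder side: called when this transaction NACKs a conflicting request. */
static void on_send_nack(tx_state *self, uint64_t requester_ts) {
    if (requester_ts < self->timestamp)
        self->possible_cycle = true;  /* we are now stalling an older transaction */
}

/* Requester side: called when this transaction's own request is NACKed. */
static tx_action on_receive_nack(const tx_state *self, uint64_t responder_ts) {
    /* Assumption: abort only if we already stall an older transaction AND
       the transaction blocking us is older, since that may close a cycle. */
    if (self->possible_cycle && responder_ts < self->timestamp)
        return ACTION_ABORT;
    return ACTION_STALL;
}
```

In the common case a NACKed requester simply retries, so no work is thrown away; the abort path is taken only when waiting could deadlock.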

19 LogTM API
User: begin_transaction(), commit_transaction(), abort_transaction()
System/Library: Initialize_logtm_transactions(), Register_abort_handler(void (*) handler)
Low-Level: Undo_log_entry(), Complete_abort_with_restart(), Complete_abort_wo_restart()

20 IMPLEMENTING LOGTM

21 Version Management Trade-offs
Hardware vs. software register checkpointing
Implicit vs. explicit logging
Buffered vs. direct logging
Logging granularity
Logging location

22 Compiler-Supported Software Logging
Software register checkpointing: compiler generates instructions to save registers to the transaction log.
Software-only logging: compiler generates instructions to save old values and virtual addresses to the transaction log.
Lowest implementation cost: all-software version management.
High overhead: slow to start transactions (save registers); slow writes (extra load and instructions).

23 In-Cache Hardware Logging
Hardware register checkpointing: bulk save architectural registers (like USIII).
Hardware logging: hardware saves old values and virtual address to memory at the first level of writeback cache.
Best performance: little or no logging-induced delay; single-cycle transaction begin/commit.
Complex implementation: shadow register file; buffering and forwarding logic in caches.

24 In-Cache Hardware Logging
[Figure: CPUs with store buffers and L1 D caches in front of a two-bank L2 cache; each bank carries ECC-protected log target VA, store target and data paths]

25 Hardware/Software Hybrid Buffered Logging
Hardware register checkpointing: bulk save architectural registers (like USIII).
Buffered logging: hardware saves old values and virtual address to a small buffer.
Good performance: little or no logging-induced delay for small transactions; single-cycle transaction begin/commit; reduces processor-to-cache memory traffic.
Less-complex implementation: shadow register file; no changes to caches.

26 Hardware/Software Hybrid Buffered Logging
[Figure: CPU with register file, store buffer and log buffer in front of the cache (log target VA, store target), shown during transaction execution and during a buffer spill]

27 Implementing Conflict Detection
Existing cache coherence mechanisms can support conflict detection for cached data by adding R (read) and W (write) bits to each cache line.
Challenges for detecting conflicts on un-cached data differ for broadcast and directory systems:
  Broadcast: easy to find all possible conflicts; hard to filter false conflicts
  Directory: hard to find all possible conflicts; easy to filter false conflicts

28 LogTM-Bcast
Adds a Bloom filter to track memory blocks touched in a transaction, then evicted from the cache.
  Allows any number of addresses to be added to the filter
  Detects all true conflicts
  Allows some false conflicts
[Figure: CPU with L1 I and L1 D caches, an L2 cache (tag, RW bits, data) and R and W overflow filters]
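
A small sketch of why a Bloom filter fits this role: inserts never fail, lookups never miss an inserted block, and the only error is an occasional false "maybe". The filter size, the two hash functions and all names (`overflow_filter`, `filter_add`, `filter_may_contain`) are assumptions for illustration; the slide's design keeps separate R and W filters, while this sketch tracks a single set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_BITS 1024u  /* hypothetical filter size (2^10 bits) */

typedef struct { uint8_t bits[FILTER_BITS / 8]; } overflow_filter;

/* Two illustrative hash functions over the block address. */
static uint32_t hash1(uint64_t block) { return (uint32_t)(block % FILTER_BITS); }
static uint32_t hash2(uint64_t block) {
    /* Multiplicative hash; the top 10 bits index the filter. */
    return (uint32_t)((block * 0x9E3779B97F4A7C15ull) >> 54) % FILTER_BITS;
}

static void set_bit(overflow_filter *f, uint32_t i) {
    f->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}
static bool get_bit(const overflow_filter *f, uint32_t i) {
    return ((f->bits[i / 8] >> (i % 8)) & 1u) != 0;
}

/* Called at transaction end (commit or abort). */
static void filter_clear(overflow_filter *f) { memset(f->bits, 0, sizeof f->bits); }

/* Called when a transactionally accessed block is evicted from the cache. */
static void filter_add(overflow_filter *f, uint64_t block) {
    set_bit(f, hash1(block));
    set_bit(f, hash2(block));
}

/* false = definitely not evicted in this transaction; true = possible conflict. */
static bool filter_may_contain(const overflow_filter *f, uint64_t block) {
    return get_bit(f, hash1(block)) && get_bit(f, hash2(block));
}
```

Any number of blocks can be added because adding only sets bits; the cost of a fuller filter is a higher false-conflict rate, never a missed conflict.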

29 LogTM-Dir
Extends a standard MESI directory with sticky states.
The directory continues to forward coherence traffic for a memory location to processors that touch that location in a transaction and then evict it from the cache.
Removes most false conflicts with a single overflow bit per cache.

30 Sticky States
[Table relating directory state to cache state over the states M, S, I, E, Sticky-M and Sticky-S]

31 LogTM-Dir Conflict Detection w/ Cache Overflow
At overflow at processor P0:
  Set P0’s overflow bit (1 bit per processor)
  Allow the writeback, but set the directory state to a sticky state
At a (potentially) conflicting request by processor P1:
  Directory forwards P1’s request to P0
  P0 tells P1 “no conflict” if its overflow bit is reset
  But asserts a conflict if it is set (with a small chance of a false positive)
At transaction end (commit or abort) at processor P0:
  Reset P0’s overflow bit
  Clean sticky states lazily on next access
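
The three events above can be sketched as handlers on a per-processor overflow bit. The struct and function names are hypothetical; the NACK and CLEAN responses follow the messages named elsewhere in this talk.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RESP_NACK, RESP_CLEAN } fwd_response;

typedef struct {
    bool in_tx;     /* a transaction is active on this processor */
    bool overflow;  /* some transactional block has been evicted */
} proc_state;

/* Event 1: a block is evicted from the cache. */
static void on_evict(proc_state *p, bool block_was_transactional) {
    if (p->in_tx && block_was_transactional)
        p->overflow = true;  /* the directory keeps forwarding requests to us */
}

/* Event 2: the directory forwards a request for a block that is no longer
   cached here. With a single bit we cannot tell which blocks overflowed,
   so a set bit conservatively signals a conflict. */
static fwd_response on_forwarded_miss(const proc_state *p) {
    return p->overflow ? RESP_NACK : RESP_CLEAN;
}

/* Event 3: commit or abort. */
static void on_tx_end(proc_state *p) {
    p->in_tx = false;
    p->overflow = false;  /* stale sticky states are cleaned lazily */
}
```

After the transaction ends, the next forwarded request gets a CLEAN response, which is what lets the directory drop the sticky state lazily.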

32 LogTM-Dir: Cache overflow
  P0 sends a put exclusive (PUTX) request
  Directory acknowledges
  P0 writes data back to memory
[Figure: PUTX, ACK and DATA messages between P0 and the directory; P0 (TM count 1, R/W (W-)) goes from M [new] to I [none]; P1 stays in I [none]]

33 LogTM-Dir: Out-of-cache conflict
  P1 sends a GETS request
  Directory forwards it to P0
  P0 detects a (possible) conflict
  P0 sends a NACK
[Figure: GETS, Fwd_GETS and NACK messages; P0 (TM count 1, signature (-W)) no longer caches the block; P1 remains in I [none]]

34 LogTM-Dir: Commit
  P0 clears TM count and signature
[Figure: directory, P0 and P1 state after commit; P0's signature changes from (-W) to (--)]

35 LogTM-Dir: Lazy cleanup
  P1 sends a GETS request
  Directory forwards the request to P0
  P0 detects no conflict and sends CLEAN
  Directory sends data to P1
[Figure: GETS, Fwd_GETS, CLEAN and DATA messages; the directory moves to S(P1) [new]; P1 ends in S [new] with signature (R-)]

36 EVALUATION

37 System Model
LogTM-Dir with In-Cache Hardware Logging & Hybrid Buffered Logging
Component | Settings
Processors | 32, 1 GHz, single-issue, in-order, non-memory IPC=1
L1 Cache | 16 kB 4-way split, 1-cycle latency
L2 Cache | 4 MB 4-way unified, 12-cycle latency
Memory | 4 GB, 80-cycle latency
Directory | Full-bit-vector sharers list, directory cache, 6-cycle latency
Interconnection Network | Hierarchical switch topology, 14-cycle link latency

38 Benchmarks
Benchmark | Synchronization | Inputs
Shared Counter | Counter lock | 2500 cycle random think time
B-Tree | Transactions only | 9-ary tree, 5 levels deep
Barnes | Locks on tree nodes | 512 bodies
Cholesky | Task queue locks | 14
Berkeley DB (BkDB) | Locks on object lists | 512 operations
MP3D | Locks | 4096 molecules
Radiosity | | Large room
Raytrace | Work list and counter locks | Car

39 Read Set Size

40 Write Set Size

41 Microbenchmark Scalability
[Figures: Btree with 0%, 10% and 20% updates; Shared Counter, LogTM vs. locks]

42 Benchmark Scalability
[Figures: Barnes, BkDB]

43 Benchmark Scalability
[Figures: Cholesky, MP3D]

44 Benchmark Scalability
[Figures: Radiosity, Raytrace]

45 Scalability Summary
Benchmarks scale as well or better using LogTM transactions.
Performance is better for all benchmarks.
LogTM improves the scalability of some benchmarks, but not others.
Abort rates are low.
Next: write set prediction, buffered logging, log granularity.

46 Write Set Prediction
Predicts if the target of each load will be modified in this transaction.
Eagerly acquires write isolation.
Reduces “waits for” cycles that force aborts in LogTM.
Four predictors:
  None -- never predict
  1-Entry -- remembers a single address
  Load PC -- history based on PC of load instruction
  Always -- acquire write isolation for all loads and stores
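
As an illustration of the Load PC idea only, the sketch below keeps one history bit per table entry, indexed by the load's program counter. The table size, the indexing function and the training rule (mark a load PC once a block it loaded is later stored to in the same transaction) are assumptions; the slides do not specify the predictor's organization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRED_ENTRIES 256u  /* hypothetical table size */

typedef struct { bool will_write[PRED_ENTRIES]; } wsp_table;

/* Drop the low two bits (assumes 4-byte instructions) and index the table. */
static uint32_t pred_index(uint64_t load_pc) {
    return (uint32_t)((load_pc >> 2) % PRED_ENTRIES);
}

/* On a transactional load: true means "request exclusive (write) access now". */
static bool wsp_predict(const wsp_table *t, uint64_t load_pc) {
    return t->will_write[pred_index(load_pc)];
}

/* Training (assumed rule): a block loaded at load_pc was later written. */
static void wsp_train(wsp_table *t, uint64_t load_pc) {
    t->will_write[pred_index(load_pc)] = true;
}
```

A correct prediction turns a read-then-upgrade sequence into a single exclusive request, which removes one opportunity for two transactions to end up waiting on each other.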

47 Abort Rate with Write Set Prediction

48 Performance Impact of WSP

49 Impact of Buffer-Spill Stalls

50 Log Granularity

51 Modeling Abort Penalty
An abort delays coherence requests and delays the transaction restart.
Penalty consists of: trap overhead (constant) plus rollback overhead (per log entry).
Measured performance for 3 settings:
  Ideal -- single-cycle abort
  Medium -- trap overhead plus 40 cycles per undo record
  Slow -- trap overhead plus 200 cycles per undo record
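
The penalty model is a constant term plus a per-entry term. A one-function sketch, with the trap overhead left as a parameter because its values are not given here; the function name and the figures in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Abort penalty = constant trap overhead + per-entry rollback cost. */
static uint64_t abort_penalty_cycles(uint64_t trap_cycles,
                                     uint64_t cycles_per_undo_record,
                                     uint64_t log_entries) {
    return trap_cycles + cycles_per_undo_record * log_entries;
}
```

For example, with a hypothetical 100-cycle trap, undoing 5 log entries at 40 cycles each would cost 100 + 5 * 40 = 300 cycles.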

52 Sensitivity to Abort Penalty (no WSP)

53 Sensitivity to Abort Penalty (with WSP)

54 EXTENDING LOGTM

55 Extending LogTM
Supporting nesting in LogTM:
  Support nested VM by segmenting the transaction log
  Non-transactional escape actions facilitate OS interactions
Virtualizing conflict detection with signatures:
  LogTM-Signature Edition (LogTM-SE) tracks read and write sets with signatures (like Bloom filters)
  Supports thread switching and paging by saving, restoring and manipulating signatures

56 RELATED WORK

57 Related Work
Hardware support for database transactions
Early transactional memory systems
Hardware TM (HTM)
Software TM (STM)
Hybrid TM
TM virtualization

58 Early Transactional Memory Systems
Hardware support for database transactions:
  801 Storage System: database-like transactions on a 1-level store (memory and disk); transactions are durable
Early HTM:
  Knight used transactions to parallelize code written in ‘mostly functional’ languages
  Herlihy and Moss: first HTM implementation, based on a separate transaction cache; transactions limited to cached data

59 Unbounded Transactional Memory
Uses eager VM and eager CD.
Supports unbounded transactions in hardware.
Complex hardware:
  Pointer and state bits for each line in memory
  Hardware state machine for transaction rollback
  Global virtual address space

60 Transactional Memory Coherence and Consistency (TCC)
[Figure: CPUs with L1 D caches (an R bit per line; the L1 cache tracks the read set) and fully-associative write buffers (~4 kB), joined by an on-chip interconnect with broadcast-based communication to a logically shared L2 cache]

61 Bulk
Encodes read and write sets in signatures (like Bloom filters).
Like TCC, uses lazy VM and lazy CD.
Can detect conflicts for non-cached data.

62 Hybrid Transactional Memory
Combines HTM and STM.
Executes small transactions in hardware, large transactions in software.
Allows program execution on existing hardware (without HTM support).

63 Transaction Virtualization
Virtual Transactional Memory (VTM), Rajwar and Herlihy:
  Adds a virtualization mechanism to a limited HTM (e.g. Herlihy and Moss TM)
  Implements CD and VM in micro-code for transactions that exceed hardware capabilities
Page-granularity transaction virtualization:
  PTM -- Chuang et al.
  XTM -- Chung et al.

64 HTM Virtualization Mechanisms
[Table: how UTM, VTM, UnrestrictedTM, XTM, XTM-g, PTM-Copy, PTM-Select and LogTM-SE handle $Miss, Commit, Abort, $Eviction, Paging and Thread Switch events before and after virtualization]
Legend: Shaded = virtualization event; - = handled in simple HW; H = complex hardware; S = handled in software; A = abort transaction; C = copy values; W = walk cache; V = validate read set; B = block other transactions

65 Conclusion
TM can make parallel programs faster and easier to write.
LogTM provides:
  Hardware/software implementation: simple, flexible hardware
  Software-based eager version management: makes the common case (commit) fast; reduces hardware complexity
  Hardware-based eager conflict detection: allows blocking to reduce wasted work

66 Thanks to my Collaborators
Jayaram Bobba, Mark Hill, Derek Hower, Steve Jackson, Nick Kidd, Ben Liblit, Mike Marty, Michelle Moravan, Tom Reps, Mike Swift, Haris Volos, David Wood, Luke Yen

67 BACKUP SLIDES

68 Database Locks and Cache Coherence States
Database lock | Cache state
No Lock | I
S | E, O, S
X | M
Coherence states are analogous to short database locks.
Most protocols have no provision to hold long locks.

69 Herlihy and Moss, ISCA 1993
Transaction cache:
  Stores all data accessed by transactions
  2 copies of each updated cache line
  Fully associative
  Acts as a victim cache
Long locks: processors are allowed to refuse coherence requests.
[Figure: CPU with a cache and a transaction cache (states M, S, XCommit, XAbort), connected to memory]

70 Transactions Limited by Cache Size and Associativity
Exposes the size of the transaction cache to the architecture.
Requires minimum associativity.
Difficult for dynamic transactions.

71 Transactional Lock Removal (TLR)
Uses Speculative Lock Elision (SLE) to elide lock operations in short critical sections.
Extends SLE with lock-based concurrency control.
Long locks: processors can defer coherence responses during speculative transactions.

72 LogTM-SE Processor Hardware
Segmented log, like LogTM.
Track R / W sets with R / W signatures:
  Over-approximate R / W sets
  Track physical addresses
  Summary signature used for virtualization
Conflict detection by coherence protocol:
  Check signatures on every memory access for SMT
[Figure: SMT thread context (registers, register checkpoint, LogFrame, LogPtr, TMcount, Read and Write signatures, SummaryRead and SummaryWrite signatures) and data caches (tag, data) holding no TM state]

73 Escape Actions
Allow non-transactional escapes from a transaction (e.g., system calls, I/O).
Similar to Zilles’s pause/unpause.
Escape actions never:
  Abort
  Stall
  Cause other transactions to abort
  Cause other transactions to stall
Commit and compensating actions similar to open nests.
Not recommended for the average programmer!

74 Escape Actions in LogTM
Loads and stores to non-transactional blocks behave as normal coherent accesses.
Loads return the latest value in coherent memory:
  A load to a transactionally modified cache block triggers a writeback (sticky-M state)
  Memory responds with an uncacheable copy of the block
Stores modify coherent memory:
  A store to a transactionally modified block triggers a writeback (sticky-M)
  The store updates the value in memory (non-cacheable write-through)

75 Thread Switching Support
Why? Support long-running transactions.
What? Conflict detection for descheduled transactions.
How? Summary read / write signatures: if thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads of process P.
Updated using a TLB-shootdown-like mechanism.
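
The "union of the saved signatures" is a bitwise OR when signatures are bit vectors. A minimal sketch under that assumption; the signature width, the single bit-select hash and all names (`signature`, `sig_insert`, `sig_test`, `build_summary`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_WORDS 4u                /* hypothetical 256-bit signature */
#define SIG_BITS  (SIG_WORDS * 64u)

typedef struct { uint64_t bits[SIG_WORDS]; } signature;

/* Illustrative hash: select one bit from the block address. */
static void sig_insert(signature *s, uint64_t block) {
    uint32_t i = (uint32_t)(block % SIG_BITS);
    s->bits[i / 64] |= 1ull << (i % 64);
}

static bool sig_test(const signature *s, uint64_t block) {
    uint32_t i = (uint32_t)(block % SIG_BITS);
    return ((s->bits[i / 64] >> (i % 64)) & 1ull) != 0;
}

/* Summary = union (bitwise OR) of the saved signatures of all
   descheduled threads of one process. */
static signature build_summary(const signature *saved, size_t count) {
    signature summary = { { 0 } };
    for (size_t t = 0; t < count; t++)
        for (size_t w = 0; w < SIG_WORDS; w++)
            summary.bits[w] |= saved[t].bits[w];
    return summary;
}
```

A running thread that checks the summary on each access therefore detects a conflict with any descheduled transaction of its process, at the cost of the same false positives that individual signatures already have.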

76 Handling Thread Switching
[Figure: processors P1-P4, each with R and W signatures and summary R/W signatures, running threads T1-T3; the OS holds a summary R/W signature]

77 Handling Thread Switching
[Figure: the same system as a thread is descheduled and its signatures are folded into the OS summary]

78 Handling Thread Switching
[Figure: the same system as the updated summary signatures are distributed to the processors]

79 Handling Thread Switching
[Figure: the same system after the thread switch]

80 Thread Switching Support Summary
Summary read / write signatures:
  Summarize descheduled threads with active transactions
  One OS structure per process
  Check summary signature on every memory access
  Updated on transaction deschedule
Coherence similar to TLB shootdown.

81 Improving LogTM

82 Comparing HTMs

83 Multifacet Group Projects:
IEEE Computer: Simulating a $2M Commercial Server on a $2K PC. Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill and David A. Wood.
ASPLOS: Timestamp Snooping: An Approach for Extending SMPs. Milo M. K. Martin, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill, and David A. Wood.

84 How Do Transactional Memory Systems Differ?
(Data) Version Management:
  Eager: record old values “elsewhere”; update “in place”
  Lazy: update “elsewhere”; keep old values “in place”
(Data) Conflict Detection:
  Eager: detect conflict on every read/write
  Lazy: detect conflict at end (commit/abort)
-> Fast commit
-> Less wasted work

85 Transaction Log Example
Initial state: LogBase = LogPointer; R & W bits are clear.
[Figure: blocks 00, 40 and C0 with clear R and W bits; Log Base 1000, Log Ptr 1000, TM mode 1]

86 Transaction Log Example
Load r1, (00) /* r1 gets 12 */
  Set R bit for block (00)
  (no changes to log)
[Figure: block 00 now has R set; Log Base 1000, Log Ptr 1000, TM mode 1]

87 Transaction Log Example
Store r2, (c0) /* r2 = 56 */
  Set W bit for block (c0)
  Store address (c0) and old data on the log
  Increment Log Ptr to 1048
  Update memory
[Figure: block C0 now has W set; the log holds one entry for c0; Log Base 1000, Log Ptr 1048, TM mode 1]

88 Transaction Log Example
Load r3, (78)
  Set R bit for block (40)
r3 = r3 + 1
Store r3, (78)
  Set W bit for block (40)
  Store address (40) and old data on the log
  Increment Log Ptr to 1090
  Update memory
[Figure: block 40 now has R and W set; the log holds entries for c0 and 40; Log Base 1000, Log Ptr 1090, TM mode 1]

89 Transaction Log Example
Commit transaction
  Clear R & W for all blocks
  Reset Log Ptr to Log Base (1000)
  Clear TM mode
[Figure: R and W bits cleared; Log Ptr moves from 1090 back to 1000]

90 Transaction Log Example
Abort transaction
  Replay log entries to “undo” the transaction
  Reset Log Ptr to Log Base (1000)
  Clear R & W bits for all blocks
  Clear TM mode
[Figure: old values restored from the log entries for 40 and c0; Log Ptr moves from 1090 back to 1000]
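
The log pointer values in this example (1000, 1048, 1090, read as hexadecimal) are consistent with each log entry holding an 8-byte virtual address plus one 64-byte block, i.e. 0x48 bytes. That entry layout is inferred from the numbers, not stated on the slides. The sketch below tracks only the R/W bits and the log pointer under that assumption; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 0x40u               /* 64-byte blocks: 00, 40, 80, C0 */
#define NUM_BLOCKS  4u
#define ENTRY_BYTES (8u + BLOCK_BYTES)  /* assumed: 8-byte VA + one block = 0x48 */

typedef struct {
    uint64_t log_base;
    uint64_t log_ptr;
    bool     r_bit[NUM_BLOCKS];
    bool     w_bit[NUM_BLOCKS];
} tm_context;

/* Initial state: LogBase = LogPointer, R and W bits clear. */
static void tm_begin(tm_context *c, uint64_t log_base) {
    c->log_base = c->log_ptr = log_base;
    for (unsigned i = 0; i < NUM_BLOCKS; i++)
        c->r_bit[i] = c->w_bit[i] = false;
}

/* A transactional load sets the R bit and leaves the log untouched. */
static void tm_load(tm_context *c, uint64_t va) {
    assert(va / BLOCK_BYTES < NUM_BLOCKS);
    c->r_bit[va / BLOCK_BYTES] = true;
}

/* Only the first store to a block appends a log entry. */
static void tm_store(tm_context *c, uint64_t va) {
    uint64_t block = va / BLOCK_BYTES;
    assert(block < NUM_BLOCKS);
    if (!c->w_bit[block]) {
        c->w_bit[block] = true;
        c->log_ptr += ENTRY_BYTES;
    }
}

/* Commit and abort both end with a reset log pointer and clear bits. */
static void tm_end(tm_context *c) { tm_begin(c, c->log_base); }
```

Replaying the slides' sequence (load 00, store c0, load 78, store 78) from a log base of 0x1000 moves the pointer to 0x1048 and then 0x1090; a second store to block 40 would leave it unchanged because the W bit is already set.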

91 Primitive: Logging
Software-defined log location (in virtual memory), based on the log pointer register.
Hardware copies old values and virtual address to memory at the log pointer.
Overlaps logging with stores.
Allows logging with library calls.

92 Primitive: Address Matching
Software creates and activates multiple contexts.
Not strictly nested.
Many uses:
  Hand-over-hand locking
  Pointer alias checks
  Transactional memory

93 LogTM Interface
User-Level: begin/commit/abort
System/Library: initialize transactions; register conflict handler
Low-Level: undo log entry; complete abort with/without restart
-> Currently, undo log to abort, but conflict managers in future

94 HTM (in general)
Version Management: new values in cache; old values in memory.
Conflict Detection: coherence protocol detects conflicts.
[Figure: two CPUs with caches and memory; an invalidate message moves a block holding NEW between the M, S and I states]

95 Conflict Detection in Other TM Schemes
Cache overflow of transactional data is hard for (hardware) TM.
Prohibit: Herlihy/Moss TM
Action at overflow: LTM, VTM, & TCC

96 Outline
Background/Motivation: multicores are here; we need to program them
Need HW/SW solution: HW primitives; SW control
TM: clear, intuitive model; likely benefits; but all-HW won’t work
LogTM: LogTM family
Eager Version Management: basic log; segmented log
Eager Conflict Detection: signatures; coherence; sticky states
Conflict Resolution: requester stalls; write set prediction
Operating Systems Interaction: thread switching; skip paging; open nesting + escape actions
Future Work: deconstructing transactional memory

97 Software Transactional Memory
Transactional programming without hardware support. Atomic swap of pointers enforces atomicity. Adds a level of indirection.

98 MSI Coherence 101 (per memory block)
States: M (one writer), S (many readers), I (no access). Protocol: detects & orders data conflicts (write-read, read-write, write-write). E.g., a writer seeks an M copy & must invalidate S copies.
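The MSI rules above can be sketched as a tiny state update for one memory block. This is a toy model, not the protocol simulated in the talk: the function name `msi_request`, the `'R'`/`'W'` request kinds, and the dictionary layout are assumptions made for illustration.

```python
def msi_request(states, requester, kind):
    """Apply one read ('R') or write ('W') request to a block's per-cache MSI states.

    states: dict mapping cache id -> 'M' | 'S' | 'I' (hypothetical layout).
    Returns the caches that had to invalidate or downgrade their copy,
    i.e. the places where the protocol observed a data conflict.
    """
    affected = []
    for cache, state in states.items():
        if cache == requester:
            continue
        if kind == 'W' and state in ('M', 'S'):
            states[cache] = 'I'      # a writer invalidates every other copy
            affected.append(cache)
        elif kind == 'R' and state == 'M':
            states[cache] = 'S'      # a reader downgrades the single writer
            affected.append(cache)
    if kind == 'W':
        states[requester] = 'M'
    elif states[requester] == 'I':
        states[requester] = 'S'      # a reader that already holds M or S keeps its state
    return affected
```

For example, with two sharers in S, a write from a third cache invalidates both; a later read from one of them downgrades the writer back to S.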

99 Why Hardware Transactional Memory (HTM)?
Speed: HTMs faster than STMs (leverage cache coherence; mitigate extra indirection & copying). Speed: HTMs faster than some lock regimes (auto-magical fine grain; don't have to get lock). Speed: whole reason for parallelism. But HTM virtualization issues: cache size & associativity, OS calls, paging, process switching & migration. LogTM helps. Needs work.

100 Conflict Detection in HTM
Most hardware TMs do eager conflict detection (at reads/writes), leveraging invalidation-based cache coherence. Most hardware TMs add per-processor transactional write (W) & read (R) bits. Setting the W bit requires M state; setting R requires S or M. This ensures the coherence protocol detects transactional data conflicts. E.g., a writer seeks an M copy, must invalidate S copies, & finds an R bit set.
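The R/W-bit check described above can be written as a small predicate. This is only a sketch: the request names `GETS` (another core wants a readable copy) and `GETM` (another core wants a writable copy) are assumed labels, not names taken from the slides.

```python
def conflicts(request, r_bit, w_bit):
    """Return True if an incoming coherence request conflicts with this
    cache's transactional use of the block.

    request: 'GETS' or 'GETM' (hypothetical request names).
    r_bit / w_bit: this cache's transactional read / write bits for the block.
    """
    if request == 'GETM':
        return r_bit or w_bit    # remote write vs. our transactional read or write
    if request == 'GETS':
        return w_bit             # remote read vs. our transactional write
    raise ValueError(request)
```

Two transactional readers never conflict, which matches the S state allowing many readers.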

101 The State of the World*
GHz race is over: frequency increase limited by heat and power constraints. Size of processor limited by communication delay, not transistors: increasing wire delay on chip. All high-performance processors will be CMP. Software must become parallel. *(in computer architecture)

102 Parallel Programming is Hard!
Data races cause subtle bugs. Locks are a mess: deadlock, granularity problem, not composable. Lock-free solutions still challenging. We need a better way to write parallel software.

103 Solution: Let the hardware help
Provide a better interface for parallel software: plenty of transistors; access to run-time information. Transactional Memory: intuitive interface -- serial execution; high performance -- run transactions in parallel when possible. Current cache coherence schemes already do much of the work.

104 LogTM Overview
Hardware Transactional Memory is promising. Most HTMs use lazy version management: old values “in place”, new values “elsewhere”; commits are slower than aborts, but commits are more common. New: LogTM, Log-based Transactional Memory, uses eager version management (like most databases): old values go to a log in thread-private virtual memory, new values “in place”. Makes common commits fast! Also allows cache overflow & software abort handling.
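Eager version management as described above can be sketched in a few lines. This is a toy software model under stated assumptions (a dictionary stands in for memory, a Python list for the per-thread log, and the class name `EagerTx` is hypothetical); LogTM itself does the logging in hardware.

```python
class EagerTx:
    """Toy eager version management: new values in place, old values in a log."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> value (stand-in for memory)
        self.log = []             # undo records: (address, old value)

    def write(self, addr, value):
        self.log.append((addr, self.memory[addr]))   # old value goes to the log
        self.memory[addr] = value                     # new value is written in place

    def commit(self):
        self.log.clear()          # common case is cheap: just discard the log

    def abort(self):
        while self.log:           # walk the log newest-first, restoring old values
            addr, old = self.log.pop()
            self.memory[addr] = old
```

Commit does no copying at all, while abort does work proportional to the number of logged writes, which is the trade-off the slide argues for.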

105 What is Transactional Memory?
void move(T s, T d, Obj key){
  atomic {
    tmp = s.remove(key);
    d.insert(key, tmp);
  }
}
The atomic block replaces the LOCK(s); LOCK(d); ... UNLOCK(d); UNLOCK(s); of the lock-based version. Atomic and isolated execution. Replaces locks for many applications: no lock granularity problem, no deadlock, composable synchronization. With locks, Thread 0 running move(a, b, key1) and Thread 1 running move(b, a, key2) can DEADLOCK!

106 Single-CMP System
[Figure: single-CMP system; cores Core1, Core2, ..., Core14, Core15, Core16, each with an L1 cache, connected through an interconnect to the L2 cache and DRAM.]

107 Methods
Simulated machine: 32-way non-CMP. 32 SPARC V9 processors running Solaris 9 OS; 1 GHz in-order processors w/ ideal IPC=1 & private caches; 16 kB 4-way split L1 cache, 1-cycle latency; 4 MB 4-way unified L2 cache, 12-cycle latency; 4 GB main memory, 80-cycle access latency; full-bit-vector directory w/ directory cache; hierarchical switch interconnect, 14-cycle latency. Simulation infrastructure: Virtutech Simics for full-system function; Multifacet GEMS for memory system timing (Ruby only). GPL release. Magic no-op instructions for begin_transaction() etc.

108 Microbenchmark Analysis
Shared counter: all threads update the same counter. High contention, small transactions. LogTM v. locks: EXP (test-and-test-and-set locks with exponential backoff) and MCS (software queue-based locks).
BEGIN_TRANSACTION();
new_total = total.count + 1;
private_data[id].count++;
total.count = new_total;
COMMIT_TRANSACTION();

109 Shared Counter
LogTM (like other HTMs) does not read/write lock. LogTM has few aborts despite conflicts.

110 SPLASH2 Benchmarks
Benchmark | Input | Synchronization
Barnes | 512 Bodies | Locks on tree nodes
Cholesky | 14 | Task queue locks
Ocean | Contiguous partitions, 258 | Barriers
Radiosity | Room | Task queue and buffer locks
Raytrace | Small image (teapot) | Work list and counter locks
Raytrace-Opt | |
Water N-Squared | 512 Molecules | Barriers

111 SPLASH2 Benchmark Results

112 SPLASH2 Benchmark Results
Columns: Benchmark, Transactions, % Stalls, % Aborts, % R-M-W
Barnes 3,067 4.89 15.3 27.9
Cholesky 22,309 4.54 2.07 82.3
Ocean 6,693 .30 .52 100
Radiosity 279,750 3.96 1.03 82.7
Raytrace-Base 48,285 24.7 1.24 99.9
Raytrace-Opt 47,884 2.04 .41
Water 35,398 .11 99.6
Conflicts less common. Very few aborts (except Barnes): software implementation practical. Stalls more frequent than aborts: waiting can eliminate unnecessary aborts. Most transactional data is read before it is written.

113 LogTM
No limit on transaction size. New values stored in place (even in main memory). All-hardware conflict detection using “sticky states”. Aborts processed in software. [Figure: virtual memory holding new values in place and transaction logs holding old values.] HPCA: LogTM: Log-Based Transactional Memory, Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill and David A. Wood.

114 Nested LogTM
Supports closed and open nesting by: splitting the log into “frames” (like a stack of activation records); replicating R/W bits. Escape actions provide non-transactional execution for system calls and I/O. [Figure: transaction log with a Level 0 header and two undo records, followed by a Level 1 header and two undo records.] ASPLOS: Supporting Nested Transactional Memory in LogTM, Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift and David A. Wood.

115 LogTM-SE: Signature Edition
Nested LogTM has several implementation issues: nesting depth limited by hardware; multiple R and W bits per cache block (SMT makes this worse); mucks with the latency-critical L1 cache; not easy to virtualize. Decouple conflict detection from the L1 cache array: use signatures to conservatively detect conflicts, e.g., Bloom filters. Small filters are sufficient for most transactions.
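A Bloom-filter signature of the kind mentioned above might look like the following sketch. The 64-bit filter size, the 64-byte block size, and the two simple hash functions are all assumptions chosen for readability, not the hash functions LogTM-SE uses.

```python
class Signature:
    """Toy Bloom-filter signature over block addresses (conservative set)."""

    def __init__(self, bits=64):
        self.bits = bits
        self.field = 0                         # the filter's bit field

    def _positions(self, addr):
        block = addr >> 6                      # assume 64-byte cache blocks
        return (block % self.bits, (block // self.bits) % self.bits)

    def insert(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        # True means "possibly in the set"; False means "definitely not".
        return all((self.field >> p) & 1 for p in self._positions(addr))
```

With these hash functions, inserting 0x1040 also makes 0x41040 report a match: a false positive, which costs a needless conflict but never misses a real one.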

116 LogTM-SE and Nesting
Single hardware signature: save current signature on nested begin; on conflict, abort inner transaction and reload signature; check if conflict resolved, if not repeat. Closed nested commit: no change to hardware signature; child merges with parent. Open nested commit: restore saved signature from log.

117 Virtualizing LogTM-SE
Cache overflow: sticky states or broadcast coherence ensure conflict detection; filter (conservatively) checks for conflicts. Thread suspension/migration: a second hardware signature summarizes suspended transactions; the OS manages it on scheduling events. Paging: pageout checks for (potential) conflict and the OS saves state; pagein updates filters with the new physical address.

118 Characterization of Java Middleware:
ICPP: Exploring Processor Design Options for Java Based Middleware. HPCA: Memory System Behavior of Java-Based Middleware. Martin Karlsson, Kevin E. Moore, Erik Hagersten and David A. Wood.

119 Closed Nesting in LogTM
Conflict detection: Nested LogTM replicates R/W bits for each level; a flash-OR circuit merges child and parent R/W bits. Version management: Nested LogTM segments the log into frames (similar to a stack of activation records). [Figure: processor with registers, register checkpoint, LogFrame, LogBase, TMcount and LogPtr; data cache lines holding two R/W bit pairs plus tag and data.]
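The flash-OR merge on a closed nested commit can be modeled as below. This is a software sketch of what the slide describes as a circuit; the function name `flash_or_commit` and the list-of-lists layout (one entry per cache line, one bit per nesting level) are assumptions.

```python
def flash_or_commit(r_bits, w_bits, child):
    """Merge nesting level `child` into level `child - 1` for every cache line.

    r_bits / w_bits: list (one entry per cache line) of lists of booleans
    (one bit per nesting level). Hardware would do this for all lines at once.
    """
    for line in range(len(r_bits)):
        r_bits[line][child - 1] |= r_bits[line][child]   # parent inherits child's reads
        w_bits[line][child - 1] |= w_bits[line][child]   # parent inherits child's writes
        r_bits[line][child] = False                      # child's bits are cleared
        w_bits[line][child] = False
```

After the merge, any block the child touched is protected by the parent's bits, so isolation is preserved until the parent commits.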

120 Hardware State
R and W bit per cache line track read and write sets; overflow bit. Register checkpoint: fast save/restore. Log Base and Log Pointer registers. TM nesting count. [Figure: processor with registers, register checkpoint, LogBase, TMcount, LogPtr and Overflow; data cache lines holding R and W bits plus tag and data.]

121 How Do Transactional Memory Systems Differ?
 | Lazy Version Management | Eager Version Management
Lazy Conflict Detection | Databases with Optimistic Conc. Ctrl.; Stanford TCC; UIUC Bulk | Not done (yet)
Eager Conflict Detection | Herlihy/Moss TM; MIT LTM; Intel/Brown VTM | Databases with Conservative C. Ctrl.; MIT UTM; Wisconsin LogTM

122 Virtualization Challenge
Hardware TM implementations are finite (hardware signatures) and multiplexed (thread switching, virtual memory). LogTM-SE version management: transaction log in virtual memory (already virtualized). LogTM-SE conflict detection: signatures over physical addresses (coming up…).

123 Open Nesting
Child transaction exposes state on commit (i.e., before the parent commits). Raises the level of abstraction for isolation and abort: eliminates semantically unnecessary conflicts; increases concurrency. Higher-level isolation: release memory-level isolation; programmer enforces isolation at a higher level (e.g., locks); use a commit action to release isolation at parent commit. Higher-level abort: child's memory updates are not undone if the parent aborts; use a compensating action to undo the child's forward action at a higher level of abstraction, e.g., malloc() compensated by free().

124 Commit and Compensating Actions
Commit actions: execute when the innermost open ancestor commits (the outermost transaction is considered open); use to release isolation at a higher level of abstraction. Compensating actions: discard when the innermost open ancestor commits; execute in LIFO order when an ancestor aborts; execute “in the state that held when its forward action committed” [Moss, TRANSACT ‘06].

125 Open Nested Example
insert(int key, int value) {
  open_begin;
  leaf = find_leaf(key);
  entry = insert_into_leaf(key, value);
  // lock entry to isolate node
  entry->lock = 1;
  open_commit(abort_action(delete(key)), commit_action(unlock(key)));
}
insert_set(set S) {
  while ((key,value) = next(S))
    insert(key, value);
  open_commit(abort_action(delete_set(S)));
}
Annotations: isolate the entry at a higher level of abstraction; delete the entry if an ancestor aborts; release high-level isolation on ancestor commit; replace the compensating action with a higher-level action on commit.

126 Timing of Compensating Actions
// initialize to 0
counter = 0;
transaction_begin();   // top-level 1
counter++;             // counter gets 1
open_begin();          // level 2
counter++;             // counter gets 2
open_commit(abort_action(counter--));
...
// Abort and run compensating action
// Expect counter to be restored to 0
transaction_commit();  // not executed
LogTM behaves correctly: the compensating action sees the state of the counter when the open transaction committed (2); the decrement restores the value to what it was before the open nest executed (1); undo of the parent restores the value back to (0). TCC doesn't: the counter ends up at 1. Condition O1: no writes to blocks written by an ancestor transaction.
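The LogTM ordering in the counter example can be traced with a short simulation. This is a sketch under simplifying assumptions: a single Python list models the parent's log, tagged tuples model undo and compensating-action records, and the helper `logged_write` is hypothetical. It models only the LogTM behaviour, not TCC's.

```python
memory = {"counter": 0}
log = []                       # parent's log: undo and compensating-action records

def logged_write(addr, value):
    log.append(("undo", addr, memory[addr]))   # save old value, then write in place
    memory[addr] = value

# Top-level transaction
logged_write("counter", memory["counter"] + 1)        # counter gets 1

# Open-nested child
child_start = len(log)
logged_write("counter", memory["counter"] + 1)        # counter gets 2
del log[child_start:]                                  # open commit discards child's undo records
log.append(("compensate",
            lambda: memory.update(counter=memory["counter"] - 1)))

# Parent aborts: the handler processes records newest-first,
# interleaving compensating actions with memory restoration.
observed = []
while log:
    record = log.pop()
    if record[0] == "compensate":
        observed.append(memory["counter"])   # state the compensating action sees
        record[1]()                          # 2 -> 1
    else:
        _, addr, old = record
        memory[addr] = old                   # 1 -> 0
```

Because the compensating action runs before the parent's older undo record is applied, it sees the value 2 and the counter ends at 0.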

127 Open Nesting in LogTM
Conflict detection: R/W bits cleared on open commit (no flash-OR). Version management: open commit pops the most recent frame off the log; (optionally) add commit and compensating action records; compensating actions are run by the software abort handler; the software handler interleaves restoration of memory state and compensating action execution.

128 Open Nested Commit
Discard child's log frame. (Optionally) append commit and compensating actions to log. [Figure: transaction log with a Header and Undo records, a second Header with Undo records, and Commit Action and Comp Action records; LogFrame and LogPtr registers point into the log; TM count labels 1 and 2.]

129 Paging Support
Why? Support large transactions. What? Physical relocation of virtual pages. How? Update signatures on paging activity.

130 Updating Signatures
Suppose: Virtual Page (VP) 0x40000 -> Physical Frame (PP) 0x1000. Signature A: {0x1040, 0x1080, 0x30c0}. At page out: remember 0x40000 -> 0x1000. At page in: suppose 0x40000 -> 0x2000. Signature A: {0x1040, 0x1080, 0x2040, 0x2080, 0x30c0}.
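The signature update in this example can be reproduced with an exact set standing in for the hardware signature. This is a sketch: the 4 KB page size and the function name `update_on_page_in` are assumptions, and a real signature is a lossy filter rather than a set.

```python
PAGE_SIZE = 0x1000   # assumed 4 KB pages

def update_on_page_in(signature, old_frame, new_frame):
    """Return the signature with entries from old_frame also added at new_frame.

    Old entries are kept, so the result is conservative: it can only
    report extra conflicts, never miss one.
    """
    updated = set(signature)
    for addr in signature:
        if old_frame <= addr < old_frame + PAGE_SIZE:
            updated.add(new_frame + (addr - old_frame))
    return updated
```

Calling `update_on_page_in({0x1040, 0x1080, 0x30c0}, 0x1000, 0x2000)` yields the five-entry signature shown on the slide.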

131 Paging Support Summary
Problem: changing page frames; need to maintain isolation on transactional blocks. Solution: on page-out, save the virtual -> physical mapping; on page-in, if the page frame is different, update signatures with the physical addresses of transactional blocks in the new page frame.

132 The State of the World*
Chip-multiprocessors/Multi-core/Many-core are here: “Intel has 10 projects in the works that contain four or more computing cores per chip” -- Paul Otellini, Intel CEO, Fall ’05. GHz race is over: frequency increase limited by heat and power constraints. Size of processor limited by communication delay, not transistors: increasing wire delay on chip. All high-performance processors will be CMP. Software must become parallel. *(in computer architecture)


135 LogTM: Log-Based Transactional Memory
Combined hardware/software implementation: conflicts detected in hardware, aborts processed in software. Policy-free hardware: simple hardware primitives, software-accessible state. Supports transactions with: large memory footprints, thread switching, unbounded nesting, paging.

136 Transactional Memory
Promising programming technique: begin_transaction { atomic execution } end_transaction. Good first step. Likely benefits. Can be integrated into current hardware and programming languages. Will not save the world.

137 Nested Transactions for Software Composition
Modules expose interfaces, NOT implementations. Example: insert() calls getID() from within a transaction, so the getID() transaction is nested inside the insert() transaction.
void insert(object o){
  // parent TX
  begin_transaction();
  t.insert(getID(), o);
  commit_transaction();
}
int getID() {
  // child TX
  begin_transaction();
  id = global_id++;
  commit_transaction();
  return id;
}

138 Closed Nesting
Child transactions remain isolated until the parent commits; on commit, a child transaction is merged with its parent. Flat: nested transactions are “flattened” into a single transaction; only outermost begins/commits are meaningful; any conflict aborts to the outermost transaction. Partial rollback: a child transaction can be aborted independently, which can avoid costly re-execution of the parent transaction; but the child merges its transaction state with the parent on commit, so most conflicts with the child end up affecting the parent.

139 Thesis: We need new hardware and software
Architects should devote resources to support parallelism: manycore will succeed only if we find a way to program it (only if software is parallel); using resources to facilitate parallelism is less risky. Hardware primitives & software solutions: HW implements difficult functions, coordinated by SW. We should be exploring ways in hardware can

140 Segmented Transaction Log for Nesting
LogTM's log is a stack of frames. A frame contains: a header (including saved registers and a pointer to the parent's frame); undo records (block address, old value pairs); garbage headers (headers of committed closed transactions); commit action records; compensating action records. [Figure: log with two frames, each a Header followed by Undo records; LogFrame and LogPtr registers point into the log; TM count labels 2 and 1.]

141 Closed Nested Commit
Merge child's log frame with parent's: mark the child's header as a “dummy header”; copy the pointer from the child's header to LogFrame. [Figure: log with two frames, each a Header followed by Undo records; LogFrame and LogPtr registers point into the log; TM count labels 2 and 1.]
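The frame stack and the closed nested commit can be sketched as follows. This is a simplified model, not the hardware mechanism: headers and saved registers are elided, the class name `SegmentedLog` is hypothetical, and the merge is modeled by appending the child's records to the parent's frame, whereas the slide describes leaving the records in place and only marking the child's header as a dummy.

```python
class SegmentedLog:
    """Toy segmented transaction log: a stack of frames of undo records."""

    def __init__(self):
        self.frames = []                        # innermost transaction is last

    def begin(self):
        self.frames.append([])                  # push a new frame

    def record(self, addr, old_value):
        self.frames[-1].append((addr, old_value))

    def closed_commit(self):
        child = self.frames.pop()
        if self.frames:
            self.frames[-1].extend(child)       # child's undo records now belong to parent

    def abort_innermost(self, memory):
        for addr, old in reversed(self.frames.pop()):   # undo newest-first
            memory[addr] = old
```

Aborting the innermost frame gives partial rollback; after a closed commit, a later parent abort also undoes the merged child's writes.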

142 LogTM-SE Signatures
Conflict-detection signatures: summarize read and write sets; similar to Bulk [ISCA 2006]; aliasing is a performance issue (results in false conflicts; rare for current apps). Version-management signatures: prevent redundant entries in the log; aliasing is a functional issue (results in incorrect abort); use a small full-address filter, accepting some redundant log entries.

143 LogTM-SE: Unbounded Nesting Support
Why? Composability: libraries. Software constructs: Retry, OrElse [Harris, PPoPP ‘05]. What? Signatures for each nesting level. How? One R/W signature set per SMT thread; save/restore signatures using the transaction log.

144 Nested Begin
[Figure: program (xbegin; LD …; ST …), processor state (R and W signatures, TMCount 1, Log Frame, Log Ptr) and transaction log (Xact header followed by undo entries, then a second Xact header).]

145 Nested Begin
[Figure: program (xbegin; LD …; ST …), processor state (R and W signatures, TMCount 2, Log Frame, Log Ptr) and transaction log (Xact header followed by undo entries, then a second Xact header).]

146 Partial Abort
[Figure: program (xbegin; LD …; ST …; ABORT!), processor state (R and W signatures, TMCount labels 2 and 1, Log Frame, Log Ptr) and transaction log (Xact header and undo entries, then a second Xact header and undo entries).]

147 Nested Commit
[Figure: program (xbegin; LD …; ST …; xend), processor state (R and W signatures, TMCount labels 1 and 2, Log Frame, Log Ptr) and transaction log (Xact header and undo entries, then a second Xact header and undo entries).]

148 Unbounded Nesting Support Summary
Closed nesting: Begin: save signatures. Abort: restore signatures. Commit: no signature action. Open nesting: Commit: restore signatures.
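The four signature rules in this summary can be sketched with exact sets standing in for the hardware signatures. The class name `NestedSignatures` and the `saved` list (a stand-in for signature copies kept in the transaction log) are assumptions for illustration.

```python
class NestedSignatures:
    """Toy model of one R/W signature pair shared by all nesting levels."""

    def __init__(self):
        self.read_sig, self.write_sig = set(), set()   # the single hardware pair
        self.saved = []                                 # copies saved in the log

    def begin(self):
        # Begin: save the current signatures before the child adds to them.
        self.saved.append((set(self.read_sig), set(self.write_sig)))

    def closed_commit(self):
        # Closed commit: no signature action; child's sets stay merged in.
        self.saved.pop()

    def abort(self):
        # Abort: restore the signatures saved at the matching begin.
        self.read_sig, self.write_sig = self.saved.pop()

    # Open commit: also restores the saved signatures, releasing the
    # child's isolation before the parent commits.
    open_commit = abort
```

Because the saved copies live in the log rather than in dedicated hardware, the nesting depth is limited only by log space.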

149 Terminology
Transaction: a transformation of state that is Atomic (all or nothing), Consistent, Isolated (serializable) and Durable (permanent). Commit: successful completion of a transaction. Abort: unsuccessful termination of a transaction, requiring that all updates from the transaction are undone. Conflict: two transactions conflict if both access the same object and at least one of the accesses is an update.

