1
Log-Based Transactional Memory
Kevin E. Moore. UW-Madison Architecture Seminar, 9/17/2018.
2
Motivation
Chip multiprocessors (multi-core, many-core) are here: “Intel has 10 projects in the works that contain four or more computing cores per chip” -- Paul Otellini, Intel CEO, Fall ’05. We must program these systems effectively, but programming with locks is challenging: “Blocking on a mutex is a surprisingly delicate dance” (OpenSolaris, mutex.c). Wisconsin Multifacet Project.
3
Locks are Hard
// WITH LOCKS
void move(T s, T d, Obj key){
  LOCK(s);
  LOCK(d);
  tmp = s.remove(key);
  d.insert(key, tmp);
  UNLOCK(d);
  UNLOCK(s);
}
Thread 0 calls move(a, b, key1) while Thread 1 calls move(b, a, key2): each thread holds one lock and waits for the other. DEADLOCK! Moreover, coarse-grain locking limits concurrency, and fine-grain locking is difficult.
4
Transactional Memory (TM)
void move(T s, T d, Obj key){
  atomic {
    tmp = s.remove(key);
    d.insert(key, tmp);
  }
}
The programmer says “I want this atomic”; the TM system “makes it so.” Software TM (STM) implementations: currently slower than locks; always slower than hardware? Hardware TM (HTM) implementations: leverage cache coherence and speculation; fast, but with hardware overheads and virtualization challenges.
5
Goals for Transactional Memory
Efficient implementation: make the common case fast; can’t justify expensive hardware (yet). Virtualizing TM: don’t limit the programming model; allow transactions of any size and duration.
6
Implementing TM
Version management: keep new values for commit and old values for abort; both must be kept. Conflict detection: find read-write, write-read, or write-write conflicts among concurrent transactions (allows multiple readers OR one writer); the state is large (must be precise) and checked often (must be fast).
7
LogTM: Log-Based Transactional Memory
Combined hardware/software transactional memory: conservative hardware conflict detection; software version management (with some hardware support). Eager version management: stores new values in place; stores old values in user virtual memory (the transaction log). Eager conflict detection: detects transaction conflicts on each load and store. Apply this strategy to TM.
8
LogTM Publications
[HPCA 2006] LogTM: Log-based Transactional Memory
[ASPLOS 2006] Supporting Nested Transactional Memory in LogTM
[HPCA 2007] LogTM-SE: Decoupling Hardware Transactional Memory from Caches
[ISCA 2007] Performance Pathologies in Hardware Transactional Memory
9
Outline
Introduction; Background; LogTM; Implementing LogTM; Evaluation; Extending LogTM; Related Work; Conclusion.
10
LOGTM
11
LogTM: Log-Based Transactional Memory
Eager software-based version management: store new values in place; store old values in the transaction log; undo failed transactions in software. Eager all-hardware conflict detection: isolate new values; fast conflict detection for all transactions. Apply this strategy to TM.
12
LogTM’s Eager Version Management
New values are stored in place. Old values are stored in the transaction log, a per-thread linear (virtual) address space (like the stack) that is filled by hardware (during transactions) and read by software (on abort). [Diagram: example data blocks tagged with VA and R/W bits, the transaction log in memory, and the Log Base, Log Ptr, and TM count registers.]
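The bookkeeping described on this slide can be sketched in C. This is a small software model under stated assumptions, not LogTM's hardware: the names `tx_log`, `tx_store`, `tx_commit`, `tx_abort`, and `LOG_CAPACITY` are hypothetical, and the sketch logs one 64-bit word on every store, whereas the slides describe hardware filling the log with old block values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 64   /* hypothetical fixed log size for this sketch */

/* One undo record: where the store went and what was there before. */
typedef struct {
    uint64_t *addr;
    uint64_t  old_value;
} log_entry;

typedef struct {
    log_entry entries[LOG_CAPACITY];
    size_t    log_ptr;    /* next free slot; index 0 plays the role of Log Base */
} tx_log;

/* Transactional store: append the old value to the log, then update in place.
   Returns 0 on success, -1 if the log is full. */
int tx_store(tx_log *log, uint64_t *addr, uint64_t new_value)
{
    assert(log != NULL && addr != NULL);
    if (log->log_ptr == LOG_CAPACITY)
        return -1;
    log->entries[log->log_ptr].addr = addr;
    log->entries[log->log_ptr].old_value = *addr;
    log->log_ptr++;
    *addr = new_value;
    return 0;
}

/* Commit: new values are already in place, so only the log pointer is reset. */
void tx_commit(tx_log *log)
{
    assert(log != NULL);
    log->log_ptr = 0;
}

/* Abort: walk the log from newest to oldest, restoring old values. */
void tx_abort(tx_log *log)
{
    assert(log != NULL);
    while (log->log_ptr > 0) {
        log->log_ptr--;
        *log->entries[log->log_ptr].addr = log->entries[log->log_ptr].old_value;
    }
}
```

Undoing newest-first matters in this sketch because the same location may be logged more than once; the oldest record, restored last, holds the pre-transaction value.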
13
Eager Version Management Discussion
Advantages: no extra indirection (unlike STM); fast commits, which are the common case and need no copying. Disadvantages: slow/complex aborts (the aborting transaction must be undone); relies on eager conflict detection/prevention.
14
LogTM’s Eager Conflict Detection
Requirements for conflict detection in LogTM. Transactions must be well formed: each thread must obtain read isolation on all memory locations read and write isolation on all locations written. Isolation must be strict two-phase: any thread that acquires read or write isolation on a memory location in a transaction must maintain that isolation until the end of the transaction. Isolation must be released at the end of a transaction: because conflicts may prevent transactions from making progress, a thread completing a transaction must release isolation when it aborts or commits.
15
LogTM’s Conflict Detection in Practice
LogTM detects conflicts using coherence: (1) the requesting processor issues a coherence request to the memory system; (2) the coherence mechanism forwards it to the other processor(s); (3) the responding processor detects a conflict using local state and informs the requesting processor; (4) the requesting processor resolves the conflict (discussed later).
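Step (3), the responder's local check, can be illustrated with a small C sketch. It assumes per-block read (R) and write (W) bits as described elsewhere in the deck, and the names `tm_bits`, `coh_request`, and `tm_conflict` are hypothetical. GETS and GETX are the read-shared and exclusive requests used in the LogTM-Dir example slides.

```c
#include <assert.h>
#include <stdbool.h>

/* Forwarded coherence request types: read-shared vs. exclusive. */
typedef enum { REQ_GETS, REQ_GETX } coh_request;

/* Per-block transactional state at the responding processor. */
typedef struct {
    bool r;   /* block was read in the current transaction */
    bool w;   /* block was written in the current transaction */
} tm_bits;

/* Responder-side check: returns true when the request conflicts with the
   local transaction, in which case the responder would send a NACK. */
bool tm_conflict(tm_bits local, coh_request req)
{
    if (req == REQ_GETX)
        return local.r || local.w;   /* remote write vs. local read or write */
    assert(req == REQ_GETS);
    return local.w;                  /* remote read vs. local write */
}
```

This encodes the "multiple readers OR one writer" rule: a remote read only conflicts with a local write, while a remote write conflicts with any local transactional access.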
16
Example Implementation (LogTM-Dir)
P0 store: P0 sends a get exclusive (GETX) request; the directory responds with the (old) data; P0 executes the store. [Diagram: directory, P0, and P1 with the GETX and DATA messages; P0's line goes from I [none] to M [old] to M [new] and its metadata from (--) to (W-); P1 stays at I [none] with metadata (--).]
17
Example Implementation (LogTM-Dir)
In-cache transaction conflict: P1 sends a get shared (GETS) request; the directory forwards it to P0; P0 detects the conflict and sends a NACK. [Diagram: directory [old], P0 with M [new] and metadata (W-), P1 with I [none] and metadata (--); messages GETS, Fwd_GETS, NACK; “Conflict!” at P0.]
18
Conflict Resolution
A conflicting transaction can wait (risking deadlock) or abort (risking livelock), and the wait/abort decision can be made at the requesting or the responding processor. LogTM resolves conflicts at the requesting processor: the requester can wait (using coherence NACKs and retries) but must abort if deadlock is possible. Requester-stalls policy: logically order transactions with timestamps; on conflict notification, wait unless already causing an older transaction to wait.
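The requester-stalls rule can be sketched as two small C functions. The names `tx_state`, `possible_cycle`, `on_send_nack`, and `on_receive_nack` are hypothetical. The slide only states "wait unless already causing an older transaction to wait"; the extra condition below, that the NACK itself comes from an older transaction, is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t timestamp;       /* smaller value = older transaction */
    bool     possible_cycle;  /* set once this transaction has stalled an older one */
} tx_state;

typedef enum { TX_WAIT, TX_ABORT } tx_action;

/* Responder side: called whenever this transaction NACKs a requester. */
void on_send_nack(tx_state *self, uint64_t requester_timestamp)
{
    assert(self != NULL);
    if (requester_timestamp < self->timestamp)
        self->possible_cycle = true;   /* we are making an older transaction wait */
}

/* Requester side: called when a request of ours is NACKed. */
tx_action on_receive_nack(const tx_state *self, uint64_t responder_timestamp)
{
    assert(self != NULL);
    /* Assumption: abort only if an older transaction blocks us while we may
       be blocking an older transaction ourselves (a possible deadlock cycle). */
    if (responder_timestamp < self->timestamp && self->possible_cycle)
        return TX_ABORT;
    return TX_WAIT;                    /* otherwise stall and retry the request */
}
```

Because the oldest transaction never satisfies the abort condition in this sketch, it can always wait, which is what lets waiting transactions make progress without deadlock.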
19
LogTM API
User: begin_transaction(), commit_transaction(), abort_transaction()
System/Library: Initialize_logtm_transactions(), Register_abort_handler(void (*) handler)
Low-Level: Undo_log_entry(), Complete_abort_with_restart(), Complete_abort_wo_restart()
20
IMPLEMENTING LOGTM
21
Version Management Trade-offs
Hardware vs. software register checkpointing; implicit vs. explicit logging; buffered vs. direct logging; logging granularity; logging location.
22
Compiler-Supported Software Logging
Software register checkpointing: the compiler generates instructions to save registers to the transaction log. Software-only logging: the compiler generates instructions to save old values and their addresses to the transaction log. Lowest implementation cost: all-software version management. High overhead: slow to start transactions (save registers); slow writes (extra load and instructions).
23
In-Cache Hardware Logging
Hardware register checkpointing: bulk save of architectural registers (like USIII). Hardware logging: hardware saves old values and virtual addresses to memory at the first level of writeback cache. Best performance: little or no logging-induced delay; single-cycle transaction begin/commit. Complex implementation: shadow register file; buffering and forwarding logic in caches.
24
In-Cache Hardware Logging
[Diagram labels: CPU, Store Buffer, L1 D Cache, L2 Cache (Bank 0, Bank 1), ECC, Log Target VA, Store Target, Data.]
25
Hardware/Software Hybrid Buffered Logging
Hardware register checkpointing: bulk save of architectural registers (like USIII). Buffered logging: hardware saves old values and virtual addresses to a small buffer. Good performance: little or no logging-induced delay for small transactions; single-cycle transaction begin/commit; reduces processor-to-cache memory traffic. Less-complex implementation: shadow register file; no changes to caches.
26
Hardware/Software Hybrid Buffered Logging
[Diagram labels: CPU, Register File, Log Buffer, Store Buffer, Cache, Log Target VA, Store Target; two panels, Transaction Execution and Buffer Spill.]
27
Implementing Conflict Detection
Existing cache coherence mechanisms can support conflict detection for cached data by adding R (read) and W (write) bits to each cache line. The challenges for detecting conflicts on un-cached data differ for broadcast and directory systems. Broadcast: easy to find all possible conflicts; hard to filter false conflicts. Directory: hard to find all possible conflicts; easy to filter false conflicts.
28
LogTM-Bcast
Adds a Bloom filter to track memory blocks touched in a transaction and then evicted from the cache. Any number of addresses can be added to the filter; it detects all true conflicts but allows some false conflicts. [Diagram labels: CPU, L1 I, L1 D, L2 Cache (Tag, RW, Data), Overflow filter (R, W).]
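A Bloom filter with these properties can be sketched in a few lines of C. The filter size, the 64-byte block size, the two hash functions, and the names `overflow_filter`, `filter_add`, and `filter_may_contain` are all assumptions for illustration; the slides do not specify LogTM-Bcast's actual filter design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_BITS 1024u   /* hypothetical filter size (a power of two) */
#define BLOCK_SHIFT 6u      /* assumes 64-byte memory blocks */

typedef struct {
    uint64_t bits[FILTER_BITS / 64u];
} overflow_filter;

/* Two simple hash functions over the block number, each in 0..1023. */
static uint32_t hash_a(uint64_t block) { return (uint32_t)(block % FILTER_BITS); }
static uint32_t hash_b(uint64_t block) { return (uint32_t)((block * 0x9E3779B97F4A7C15ull) >> 54); }

static void set_bit(overflow_filter *f, uint32_t i)
{
    f->bits[i / 64u] |= (uint64_t)1 << (i % 64u);
}

static bool test_bit(const overflow_filter *f, uint32_t i)
{
    return ((f->bits[i / 64u] >> (i % 64u)) & 1u) != 0;
}

/* Cleared at transaction end. */
void filter_clear(overflow_filter *f)
{
    assert(f != NULL);
    memset(f->bits, 0, sizeof f->bits);
}

/* Called when a transactionally accessed block is evicted from the cache. */
void filter_add(overflow_filter *f, uint64_t addr)
{
    uint64_t block = addr >> BLOCK_SHIFT;
    set_bit(f, hash_a(block));
    set_bit(f, hash_b(block));
}

/* Never returns false for an added block (all true conflicts are detected),
   but may return true for a block that was never added (a false conflict). */
bool filter_may_contain(const overflow_filter *f, uint64_t addr)
{
    uint64_t block = addr >> BLOCK_SHIFT;
    return test_bit(f, hash_a(block)) && test_bit(f, hash_b(block));
}
```

The fixed-size bit array is what allows "any number of addresses" to be added: the filter never fills up, it only becomes less precise.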
29
LogTM-Dir
Extends a standard MESI directory with sticky states: the directory continues to forward coherence traffic for a memory location to processors that touched that location in a transaction and then evicted it from the cache. Removes most false conflicts with a single overflow bit per cache.
30
Sticky States
[State table: directory states M, S, I, E, Sticky-M, Sticky-S versus cache state.]
31
LogTM-Dir Conflict Detection w/ Cache Overflow
On overflow at processor P0: set P0’s overflow bit (1 bit per processor); allow the writeback, but set the directory state to a sticky state. On a (potentially) conflicting request by processor P1: the directory forwards P1’s request to P0; P0 tells P1 “no conflict” if its overflow bit is reset, but asserts a conflict if it is set (with a small chance of a false positive). At transaction end (commit or abort) at processor P0: reset P0’s overflow bit; clean sticky states lazily on the next access.
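The three events above map onto a very small state machine. The C sketch below is illustrative only: `proc_state`, `sticky_reply`, and the function names are hypothetical, and it models only the case where the requested block is no longer in P0's cache (the sticky-state case), answering with NACK or CLEAN as in the LogTM-Dir example slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool overflow;   /* one bit per processor: a transactional block was evicted */
} proc_state;

typedef enum { REPLY_NACK, REPLY_CLEAN } sticky_reply;

/* Overflow: a block touched by the current transaction leaves the cache. */
void on_transactional_eviction(proc_state *p)
{
    assert(p != NULL);
    p->overflow = true;
}

/* The directory forwarded a request for a block this processor no longer
   caches (the directory entry is in a sticky state). */
sticky_reply on_forwarded_request(const proc_state *p)
{
    assert(p != NULL);
    /* Conservative: with the bit set, any evicted block might be the one
       requested, so assert a conflict (possibly a false positive). */
    return p->overflow ? REPLY_NACK : REPLY_CLEAN;
}

/* Commit or abort: reset the bit; stale sticky states are cleaned lazily
   when later requests receive REPLY_CLEAN. */
void on_transaction_end(proc_state *p)
{
    assert(p != NULL);
    p->overflow = false;
}
```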
32
LogTM-Dir: Cache overflow
P0 sends a put exclusive (PUTX) request; the directory acknowledges; P0 writes the data back to memory. [Diagram: directory goes from [old] to [new]; messages PUTX, ACK, DATA; P0 has TM count 1 and R/W (W-), and its line goes from M [new] to I [none]; P1 has R/W (--) and I [none].]
33
LogTM-Dir: Out-of-cache conflict
P1 sends a GETS request; the directory forwards it to P0; P0 detects a (possible) conflict; P0 sends a NACK. [Diagram: directory [new]; messages GETS, Fwd_GETS, NACK; P0 has TM count 1, signature (-W), and I [none]; P1 has signature (--) and I [none]; “Conflict!” at P0.]
34
LogTM-Dir: Commit
P0 clears its TM count and signature. [Diagram: directory [new]; P0's signature goes from (-W) to (--); P0 and P1 both hold I [none].]
35
LogTM-Dir: Lazy cleanup
P1 sends a GETS request; the directory forwards the request to P0; P0 detects no conflict and sends CLEAN; the directory sends the data to P1. [Diagram: directory becomes S(P1) [new]; messages GETS, Fwd_GETS, CLEAN, DATA; P0 has signature (--) and I [none]; P1's line goes from I [none] to S [new] with signature (R-).]
36
EVALUATION
37
System Model
LogTM-Dir with In-Cache Hardware Logging & Hybrid Buffered Logging
Component | Settings
Processors | 32, 1 GHz, single-issue, in-order, non-memory IPC=1
L1 Cache | 16 kB 4-way split, 1-cycle latency
L2 Cache | 4 MB 4-way unified, 12-cycle latency
Memory | 4 GB, 80-cycle latency
Directory | Full-bit-vector sharers list, directory cache, 6-cycle latency
Interconnection Network | Hierarchical switch topology, 14-cycle link latency
38
Benchmarks
Benchmark | Synchronization | Inputs
Shared Counter | Counter lock | 2500 cycle random think time
B-Tree | Transactions only | 9-ary tree, 5 levels deep
Barnes | Locks on tree nodes | 512 bodies
Cholesky | Task queue locks | 14
Berkeley DB (BkDB) | Locks on object lists | 512 operations
MP3D | Locks | 4096 molecules
Radiosity | | Large room
Raytrace | Work list and counter locks | Car
39
Read Set Size
40
Write Set Size
41
Microbenchmark Scalability
Panels: Btree with 0%, 10%, and 20% updates; Shared Counter, LogTM vs. locks.
42
Benchmark Scalability
Panels: Barnes, BkDB.
43
Benchmark Scalability
Panels: Cholesky, MP3D.
44
Benchmark Scalability
Panels: Radiosity, Raytrace.
45
Scalability Summary
Benchmarks scale as well or better using LogTM transactions. Performance is better for all benchmarks. LogTM improves the scalability of some benchmarks, but not others. Abort rates are low. Next: write set prediction, buffered logging, log granularity.
46
Write Set Prediction
Predicts whether the target of each load will be modified in this transaction and eagerly acquires write isolation. This reduces the “waits for” cycles that force aborts in LogTM. Four predictors: None -- never predict; 1-Entry -- remembers a single address; Load PC -- history based on the PC of the load instruction; Always -- acquire write isolation for all loads and stores.
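As an illustration of the Load PC idea, the C sketch below keeps one history bit per table entry, indexed by a hash of the load's PC. The table size, the indexing, the training rule, and the names `wsp_table`, `wsp_predict`, and `wsp_train` are assumptions; the slides do not describe the evaluated predictor's organization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WSP_ENTRIES 256u   /* hypothetical table size */

typedef struct {
    bool will_write[WSP_ENTRIES];   /* one history bit per (hashed) load PC */
} wsp_table;

/* Hypothetical index function: drop the low PC bits, then wrap. */
static size_t wsp_index(uint64_t load_pc)
{
    return (size_t)((load_pc >> 2) % WSP_ENTRIES);
}

/* At a transactional load: should the block be requested with write
   isolation right away instead of read isolation? */
bool wsp_predict(const wsp_table *t, uint64_t load_pc)
{
    assert(t != NULL);
    return t->will_write[wsp_index(load_pc)];
}

/* Training: called when a block first loaded by `load_pc` is later stored
   to within the same transaction. */
void wsp_train(wsp_table *t, uint64_t load_pc)
{
    assert(t != NULL);
    t->will_write[wsp_index(load_pc)] = true;
}
```

In these terms, the None predictor always returns false and the Always predictor always returns true.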
47
Abort Rate with Write Set Prediction
48
Performance Impact of WSP
49
Impact of Buffer-Spill Stalls
50
Log Granularity
51
Modeling Abort Penalty
An abort delays coherence requests and delays transaction restart. The penalty consists of trap overhead (constant) plus rollback overhead (per log entry). Performance was measured for 3 settings: Ideal -- single-cycle abort; Medium -- [trap cycle count lost in extraction], 40 cycles per undo record; Slow -- [trap cycle count lost in extraction], 200 cycles per undo record.
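The penalty model is a constant plus a per-entry term, which the following C sketch makes explicit. The function name is hypothetical, and because the trap overheads for the Medium and Slow settings did not survive in the slide text, the trap value in the usage note is a made-up placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Abort penalty in cycles: constant trap overhead plus a rollback cost
   for each undo record in the transaction log. */
uint64_t abort_penalty_cycles(uint64_t trap_cycles,
                              uint64_t cycles_per_undo_record,
                              uint64_t undo_records)
{
    uint64_t rollback = cycles_per_undo_record * undo_records;
    assert(undo_records == 0 || rollback / undo_records == cycles_per_undo_record); /* no overflow */
    return trap_cycles + rollback;
}
```

For example, with a hypothetical 100-cycle trap and the slide's 40 cycles per undo record, aborting a transaction with 10 log entries costs 100 + 40 * 10 = 500 cycles in this model.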
52
Sensitivity to Abort Penalty (no WSP)
53
Sensitivity to Abort Penalty (with WSP)
54
EXTENDING LOGTM
55
Extending LogTM
Supporting nesting in LogTM: support nested version management by segmenting the transaction log; non-transactional escape actions facilitate OS interactions. Virtualizing conflict detection with signatures: LogTM-Signature Edition (LogTM-SE) tracks read and write sets with signatures (like Bloom filters) and supports thread switching and paging by saving, restoring, and manipulating signatures.
56
RELATED WORK
57
Related Work
Hardware support for database transactions; early transactional memory systems; hardware TM (HTM); software TM (STM); hybrid TM; TM virtualization.
58
Early Transactional Memory Systems
Hardware support for database transactions: the 801 Storage System offered database-like transactions on a 1-level store (memory and disk); its transactions are durable. Early HTM: Knight used transactions to parallelize code written in ‘mostly functional’ languages; Herlihy and Moss gave the first HTM implementation, based on a separate transaction cache, with transactions limited to cached data.
59
Unbounded Transactional Memory
Uses eager VM and eager CD. Supports unbounded transactions in hardware. Complex hardware: pointer and state bits for each line in memory; hardware state machine for transaction rollback; global virtual address space.
60
Transactional Memory Coherence and Consistency (TCC)
[Diagram labels: CPU; L1 D cache with R bits (the L1 cache tracks the read set); write buffer, ~4 kB, fully associative; logically shared L2 cache; on-chip interconnect with broadcast-based communication.]
61
Bulk
Encodes read and write sets in signatures (like Bloom filters). Like TCC, uses lazy VM and lazy CD. Can detect conflicts for non-cached data.
62
Hybrid Transactional Memory
Combines HTM and STM: executes small transactions in hardware and large transactions in software. Allows program execution on existing hardware (without HTM support).
63
Transaction Virtualization
Virtual Transactional Memory (VTM), Rajwar and Herlihy: adds a virtualization mechanism to a limited HTM (e.g., Herlihy and Moss TM); implements CD and VM in micro-code for transactions that exceed hardware capabilities. Page-granularity transaction virtualization: PTM -- Chuang et al.; XTM -- Chung et al.
64
HTM Virtualization Mechanisms
Column headings: Before Virtualization; After Virtualization; $Miss; Commit; Abort; $Eviction; Paging; Thread Switch.
Systems compared: UTM, VTM, UnrestrictedTM, XTM, XTM-g, PTM-Copy, PTM-Select, LogTM-SE.
Cell codes as extracted (column alignment lost): UTM - H HC; VTM S SC SWV; UnrestrictedTM A B AS; XTM XTM-g ASC SCV; PTM-Copy PTM-Select LogTM-SE.
Legend: Shaded = virtualization event; - = handled in simple HW; H = complex hardware; S = handled in software; A = abort transaction; C = copy values; W = walk cache; V = validate read set; B = block other transactions.
65
Conclusion
TM can make parallel programs faster and easier to write. LogTM provides: a hardware/software implementation with simple, flexible hardware; software-based eager version management, which makes the common case (commit) fast and reduces hardware complexity; and hardware-based eager conflict detection, which allows blocking to reduce wasted work.
66
Thanks to my Collaborators
Jayaram Bobba, Mark Hill, Derek Hower, Steve Jackson, Nick Kidd, Ben Liblit, Mike Marty, Michelle Moravan, Tom Reps, Mike Swift, Haris Volos, David Wood, Luke Yen
67
BACKUP SLIDES
68
Database Locks and Cache Coherence States
Database lock | Cache state
No Lock | I
S | E, O, S
X | M
Coherence states are analogous to short database locks. Most protocols have no provision to hold long locks.
69
Herlihy and Moss, ISCA 1993
Transaction cache: stores all data accessed by transactions; holds 2 copies of each updated cache line; fully associative; acts as a victim cache. Long locks: processors are allowed to refuse coherence requests. [Diagram labels: CPU, Cache (M, S), Transaction Cache (XCommit, XAbort), Memory.]
70
Transactions Limited by Cache Size and Associativity
Exposes the size of the transaction cache to the architecture. Requires minimum associativity. Difficult for dynamic transactions.
71
Transactional Lock Removal (TLR)
Uses Speculative Lock Elision (SLE) to elide lock operations in short critical sections. Extends SLE with lock-based concurrency control. Long locks: processors can defer coherence responses during speculative transactions.
72
LogTM-SE Processor Hardware
Slide header: LogTM-SE: Log-based Transactional Memory: Signature Edition, 10/25/06, © 2007 Multifacet Project. Segmented log, like LogTM. Tracks R/W sets with R/W signatures, which over-approximate the R/W sets and track physical addresses. A summary signature is used for virtualization. Conflict detection is done by the coherence protocol; signatures are checked on every memory access for SMT. [Diagram labels: SMT thread context with Registers, Register Checkpoint, LogFrame, LogPtr, TMcount, Read, Write, SummaryRead, SummaryWrite; data caches with Tag and Data only (NO TM STATE).]
73
Escape Actions
Allow non-transactional escapes from a transaction (e.g., system calls, I/O). Similar to Zilles’s pause/unpause. Escape actions never abort, stall, cause other transactions to abort, or cause other transactions to stall. Commit and compensating actions are similar to open nests. Not recommended for the average programmer!
74
Escape Actions in LogTM
Loads and stores to non-transactional blocks behave as normal coherent accesses. Loads return the latest value in coherent memory: a load to a transactionally modified cache block triggers a writeback (sticky-M state), and memory responds with an uncacheable copy of the block. Stores modify coherent memory: a store to a transactionally modified block triggers a writeback (sticky-M) and updates the value in memory (non-cacheable write through).
75
Thread Switching Support
Why? Support long-running transactions. What? Conflict detection for descheduled transactions. How? Summary read/write signatures: if thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P. Updated using a TLB-shootdown-like mechanism.
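The "union of the saved signatures" has a direct bitwise reading, sketched below in C. The signature width, the single hash function, the 64-byte block size, and the names `signature`, `sig_insert`, `sig_may_contain`, and `sig_summary` are assumptions for illustration, not LogTM-SE's actual encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_BITS  256u              /* hypothetical signature width */
#define SIG_WORDS (SIG_BITS / 64u)

typedef struct {
    uint64_t bits[SIG_WORDS];
} signature;

/* Hypothetical hash: 64-byte block number modulo the signature width. */
static uint32_t sig_hash(uint64_t addr)
{
    return (uint32_t)((addr >> 6) % SIG_BITS);
}

/* Record an address in a thread's read or write signature. */
void sig_insert(signature *s, uint64_t addr)
{
    uint32_t i = sig_hash(addr);
    assert(s != NULL);
    s->bits[i / 64u] |= (uint64_t)1 << (i % 64u);
}

/* Over-approximate membership test: may report false positives. */
bool sig_may_contain(const signature *s, uint64_t addr)
{
    uint32_t i = sig_hash(addr);
    assert(s != NULL);
    return ((s->bits[i / 64u] >> (i % 64u)) & 1u) != 0;
}

/* Summary signature: bitwise OR of the saved signatures of all
   descheduled threads of one process. */
signature sig_summary(const signature *saved, size_t count)
{
    signature sum = { { 0 } };
    for (size_t t = 0; t < count; t++)
        for (size_t w = 0; w < SIG_WORDS; w++)
            sum.bits[w] |= saved[t].bits[w];
    return sum;
}
```

Because the union is a plain OR, a running thread that checks the summary on each access sees a conflict whenever any descheduled transaction's signature would have reported one.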
76
Handling Thread Switching
[Diagram, frame 1 of 4: processors P1 to P4, each with R and W signatures and summary R/W signatures; the OS and threads T1, T2, T3.]
77
Handling Thread Switching
[Diagram, frame 2 of 4: same layout, with a Deschedule event and an OS summary signature.]
78
Handling Thread Switching
[Diagram, frame 3 of 4: same layout, with the Deschedule event and summary signatures shown at additional processors.]
79
Handling Thread Switching
[Diagram, frame 4 of 4: same layout, with threads T1, T2, T3, the OS summary signature, and per-processor summary signatures.]
80
Thread Switching Support Summary
Summary read/write signatures summarize descheduled threads with active transactions: one OS structure per process; the summary signature is checked on every memory access; it is updated on transaction deschedule, similar to TLB shootdown coherence.
81
Improving LogTM
82
Comparing HTMs
83
Multifacet Group Projects:
IEEE Computer: “Simulating a $2M Commercial Server on a $2K PC,” Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill, and David A. Wood. ASPLOS: “Timestamp Snooping: An Approach for Extending SMPs,” Milo M. K. Martin, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill, and David A. Wood.
84
How Do Transactional Memory Systems Differ?
(Data) version management. Eager: record old values “elsewhere”; update “in place” (fast commit). Lazy: update “elsewhere”; keep old values “in place”. (Data) conflict detection. Eager: detect conflicts on every read/write (less wasted work). Lazy: detect conflicts at the end (commit/abort).
85
Transaction Log Example
Initial state: LogBase = LogPointer; R and W bits are clear. [Diagram: data blocks at VA 00, 40, C0 with R/W bits; log memory at 1000, 1040, 1080; Log Base = 1000, Log Ptr = 1000, TM mode = 1.]
86
Transaction Log Example
Load r1, (00) /* r1 gets 12 */: set the R bit for block (00); no changes to the log. [Diagram: R = 1 for block 00; Log Base = 1000, Log Ptr = 1000, TM mode = 1.]
87
Transaction Log Example
Store r2, (c0) /* r2 = 56 */: set the W bit for block (c0); store the address (c0) and the old data on the log; increment Log Ptr to 1048; update memory. [Diagram: R = 1 for block 00, W = 1 for block C0; log entry for c0 at 1000; Log Base = 1000, Log Ptr moves from 1000 to 1048, TM mode = 1.]
88
Transaction Log Example
Load r3, (78): set the R bit for block (40). r3 = r3 + 1. Store r3, (78): set the W bit for block (40); store the address (40) and the old data on the log; increment Log Ptr to 1090; update memory. [Diagram: R = 1 for block 00, R = W = 1 for block 40, W = 1 for block C0; log entries for c0 and 40; Log Base = 1000, Log Ptr moves from 1048 to 1090, TM mode = 1.]
89
Transaction Log Example
Commit transaction: clear the R and W bits for all blocks; reset Log Ptr to Log Base (1000); clear TM mode. [Diagram: log entries for c0 and 40 remain in memory but are no longer live; Log Ptr moves from 1090 to 1000.]
90
Transaction Log Example
Abort transaction: replay the log entries to “undo” the transaction; reset Log Ptr to Log Base (1000); clear the R and W bits for all blocks; clear TM mode. [Diagram: Log Ptr steps back from 1090 to 1048 to 1000 as the entries for 40 and c0 are undone.]
91
Primitive: Logging
Software-defined log location (in virtual memory), based on a log pointer register. Hardware copies old values and virtual addresses to memory at the log pointer. Overlaps logging with stores. Allows logging with library calls.
92
Primitive: Address Matching
Software creates and activates multiple contexts; they are not strictly nested. Many uses: hand-over-hand locking; pointer alias checks; transactional memory.
93
LogTM Interface
User-level: begin/commit/abort. System/library: initialize transactions; register conflict handler. Low-level: undo log entry; complete abort with/without restart. Currently, aborts undo the log; conflict managers are planned for the future.
94
HTM (in general)
Version management: new values in the cache; old values in memory. Conflict detection: the coherence protocol detects conflicts (invalidate). [Diagram labels: two CPUs with caches and memory; one cache holds the line in M with the NEW value while the other copy moves among S and I.]
95
Conflict Detection in Other TM Schemes
Cache overflow of transactional data is hard for (hardware) TM. Prohibit overflow: Herlihy/Moss TM. Take action at overflow: LTM, VTM, and TCC.
96
Outline
Background/Motivation: multicores are here; we need to program them. Need a HW/SW solution: HW primitives; SW control. TM: clear, intuitive model; likely benefits; but all-HW won’t work. LogTM: the LogTM family. Eager version management: basic log; segmented log. Eager conflict detection: signatures; coherence; sticky states. Conflict resolution: requester stalls; write set prediction. Operating systems interaction: thread switching; skip paging. Open nesting + escape actions. Future work: deconstructing transactional memory.
97
Software Transactional Memory
Transactional programming without hardware support. Atomic swap of pointers to enforce atomicity. Adds a level of indirection.
98
MSI Coherence 101 (per memory block)
States:
  M - one writer
  S - many readers
  I - no access
Protocol: detects & orders data conflicts
  write-read
  read-write
  write-write
E.g., Writer seeks M copy & must invalidate S copies
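The per-block MSI rules on this slide can be modeled in a few lines. This is a minimal sketch, not the protocol of any real machine: the `Block` class and its `read`/`write` methods are hypothetical names, and the model ignores the directory, messages, and timing. It only shows how a read or write request reveals which other caches hold a conflicting copy.

```python
from enum import Enum


class State(Enum):
    M = "modified"  # one writer
    S = "shared"    # many readers
    I = "invalid"   # no access


class Block:
    """Coherence state of one memory block in each cache (hypothetical model)."""

    def __init__(self, n_caches):
        self.state = [State.I] * n_caches

    def read(self, cache):
        """Request a readable copy; an M copy elsewhere is downgraded to S."""
        downgraded = []
        for other, st in enumerate(self.state):
            if other != cache and st is State.M:
                self.state[other] = State.S
                downgraded.append(other)
        if self.state[cache] is State.I:
            self.state[cache] = State.S
        return downgraded

    def write(self, cache):
        """Request the M copy; every other valid copy is invalidated."""
        invalidated = []
        for other, st in enumerate(self.state):
            if other != cache and st is not State.I:
                self.state[other] = State.I
                invalidated.append(other)
        self.state[cache] = State.M
        return invalidated


block = Block(3)
block.read(0)                 # cache 0 gets an S copy
block.read(1)                 # cache 1 gets an S copy
invalidated = block.write(2)  # writer seeks M and invalidates both S copies
```

The list returned by `write` is the set of caches that saw the request, which is the hook hardware TM systems use to notice conflicts.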
99
Why Hardware Transactional Memory (HTM)?
Speed: HTMs faster than STMs
  Leverage cache coherence
  Mitigate extra indirection & copying
Speed: HTMs faster than some lock regimes
  Auto-magical fine-grain
  Don’t have to get lock
Speed: Whole reason for parallelism
But HTM virtualization issues
  Cache size & associativity, OS calls
  Paging, process switching & migration
  LogTM helps
  Needs work
100
Conflict Detection in HTM
Most hardware TMs
  Do eager conflict detection (at reads/writes)
  Leverage invalidation-based cache coherence
Most hardware TMs add
  Per-processor transactional write (W) & read (R) bits
  Setting W bit requires M state; setting R requires S or M
  Ensures coherence protocol detects transactional data conflicts
  E.g., Writer seeks M copy, invalidates S copies, & finds R bit set
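The R/W-bit rule above can be illustrated with a small model. This is a sketch under stated assumptions: `TxCache` and `access` are hypothetical names, block addresses stand in for cache lines, and the coherence request is reduced to a direct check of every other cache's bits.

```python
class TxCache:
    """One processor's transactional read (R) and write (W) bits, by block address."""

    def __init__(self):
        self.r_bits = set()
        self.w_bits = set()

    def conflicts_with(self, addr, is_write):
        """Would a remote request for addr conflict with this cache's transaction?"""
        if is_write:
            # A remote write conflicts with anything this transaction read or wrote.
            return addr in self.r_bits or addr in self.w_bits
        # A remote read conflicts only with blocks this transaction wrote.
        return addr in self.w_bits


def access(caches, cpu, addr, is_write):
    """Issue a transactional access; return the CPUs that report a conflict."""
    conflicting = [i for i, cache in enumerate(caches)
                   if i != cpu and cache.conflicts_with(addr, is_write)]
    if not conflicting:
        bits = caches[cpu].w_bits if is_write else caches[cpu].r_bits
        bits.add(addr)
    return conflicting


caches = [TxCache(), TxCache()]
first_read = access(caches, 0, 0x40, is_write=False)   # CPU 0 sets R
second_read = access(caches, 1, 0x40, is_write=False)  # read-read: no conflict
write_try = access(caches, 1, 0x40, is_write=True)     # finds CPU 0's R bit set
```

What happens after a conflict is reported (stall, abort, or something else) is a separate policy choice that this sketch leaves out.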
101
LogTM: Log-based Transactional Memory
The State of the World*
GHz race is over
  Frequency increase limited by heat and power constraints
Size of processor limited by communication delay, not transistors
  Increasing wire delay on chip
All high-performance processors will be CMP
Software must become parallel
*(in computer architecture)
102
Parallel Programming is Hard!
Data races cause subtle bugs
Locks are a mess
  Deadlock
  Granularity problem
  Not composable
Lock-free solutions still challenging
We need a better way to write parallel software
103
Solution: Let the hardware help
Provide a better interface for parallel software
  Plenty of transistors
  Access to run-time information
Transactional Memory
  Intuitive interface -- serial execution
  High performance -- run transactions in parallel when possible
  Current cache coherence schemes already do much of the work
104
LogTM: Log-based Transactional Memory
LogTM Overview
Hardware Transactional Memory promising
  Most use lazy version management
    Old values “in place”
    New values “elsewhere”
    Commits slower than aborts
    But commits more common
New LogTM: Log-based Transactional Memory
  Uses eager version management (like most databases)
    Old values to log in thread-private virtual memory
    New values “in place”
    Makes common commits fast!
  Also allows cache overflow & software abort handling
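Eager version management as described on this slide can be sketched as follows. The `EagerTx` class is a hypothetical illustration, not LogTM's hardware or its log format: memory is a plain dictionary, and the log is a list of (address, old value) pairs. It shows why commit is cheap (discard the log) while abort has to walk the log backwards.

```python
class EagerTx:
    """Toy eager version management: new values in place, old values in a log."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.log = []          # undo records: (address, old value)
        self.logged = set()    # addresses already logged by this transaction

    def write(self, addr, value):
        if addr not in self.logged:
            # Save the old value once, before the first transactional write.
            self.log.append((addr, self.memory[addr]))
            self.logged.add(addr)
        self.memory[addr] = value  # the new value goes "in place"

    def commit(self):
        # Commit is the fast path: the new values are already in place.
        self.log.clear()
        self.logged.clear()

    def abort(self):
        # Abort walks the log in LIFO order and restores the old values.
        while self.log:
            addr, old = self.log.pop()
            self.memory[addr] = old
        self.logged.clear()


memory = {0x10: 1, 0x20: 2}
tx = EagerTx(memory)
tx.write(0x10, 99)
tx.write(0x10, 100)   # second write to the same address is not logged again
tx.abort()
after_abort = dict(memory)

tx.write(0x20, 7)
tx.commit()
```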
105
What is Transactional Memory?
void move(T s, T d, Obj key){
  atomic {
    tmp = s.remove(key);
    d.insert(key, tmp);
  }
}
The atomic block replaces the lock version: LOCK(s); LOCK(d); ... UNLOCK(d); UNLOCK(s);
Atomic and isolated execution
Replaces locks for many applications
  No lock granularity problem
  No deadlock
  Composable synchronization
With locks, Thread 0 running move(a, b, key1) and Thread 1 running move(b, a, key2) can DEADLOCK!
106
LogTM: Log-based Transactional Memory
Single-CMP System
[Figure: Core1 through Core16, each with its own L1 cache, connected by an interconnect to an L2 cache and DRAM]
107
LogTM: Log-based Transactional Memory
Methods
Simulated Machine: 32-way non-CMP
  32 SPARC V9 processors running Solaris 9 OS
  1 GHz in-order processors w/ ideal IPC=1 & private caches
  16 kB 4-way split L1 cache, 1-cycle latency
  4 MB 4-way unified L2 cache, 12-cycle latency
  4 GB main memory, 80-cycle access latency
  Full-bit vector directory w/ directory cache
  Hierarchical switch interconnect, 14-cycle latency
Simulation Infrastructure
  Virtutech Simics for full-system function
  Multifacet GEMS for memory system timing (Ruby only)
  GPL Release:
  Magic no-op instructions for begin_transaction() etc.
108
Microbenchmark Analysis
Shared Counter
  All threads update the same counter
  High contention
  Small transactions
LogTM v. Locks
  EXP - Test-And-Test-And-Set Locks with Exponential Backoff
  MCS - Software Queue-Based Locks

  BEGIN_TRANSACTION();
  new_total = total.count + 1;
  private_data[id].count++;
  total.count = new_total;
  COMMIT_TRANSACTION();
109
LogTM: Log-based Transactional Memory
Shared Counter
LogTM (like other HTMs) does not read/write lock
LogTM has few aborts despite conflicts
110
LogTM: Log-based Transactional Memory
SPLASH2 Benchmarks
Benchmark | Input | Synchronization
Barnes | 512 Bodies | Locks on tree nodes
Cholesky | 14 | Task queue locks
Ocean | Contiguous partitions, 258 | Barriers
Radiosity | Room | Task queue and buffer locks
Raytrace | Small image (teapot) | Work list and counter locks
Raytrace-Opt | |
Water N-Squared | 512 Molecules | Barriers
111
SPLASH2 Benchmark Results
[Figure: SPLASH2 benchmark results]
112
SPLASH2 Benchmark Results
Benchmark  Transactions  % Stalls  % Aborts  % R-M-W
Barnes  3,067  4.89  15.3  27.9
Cholesky  22,309  4.54  2.07  82.3
Ocean  6,693  .30  .52  100
Radiosity  279,750  3.96  1.03  82.7
Raytrace-Base  48,285  24.7  1.24  99.9
Raytrace-Opt  47,884  2.04  .41
Water  35,398  .11  99.6
Conflicts less common
Aborts
  Very few aborts (except Barnes)
  Software implementation practical
Stalls more frequent than aborts
  Waiting can eliminate unnecessary aborts
Most transactional data read before written
113
LogTM: Log-based Transactional Memory
LogTM
No limit on transaction size
New values stored in place (even in main memory)
All-hardware conflict detection using “sticky states”
Aborts processed in software
[Figure: virtual memory holding new values in place, with transaction logs holding old values]
HPCA: LogTM: Log-Based Transactional Memory, Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill and David A. Wood
114
LogTM: Log-based Transactional Memory
Nested LogTM
Supports closed and open nesting by:
  Splitting log into “frames” (like a stack of activation records)
  Replicating R/W bits
Escape actions provide non-transactional execution for system calls and I/O
[Figure: transaction log with a level 0 header and its undo records, followed by a level 1 header and its undo records]
ASPLOS: Supporting Nested Transactional Memory in LogTM, Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift and David A. Wood
115
LogTM-SE: Signature Edition
Nested LogTM has several implementation issues
  Nesting depth limited by hardware
  Multiple R and W bits per cache block
    SMT makes this worse
    Mucks with latency-critical L1 cache
    Not easy to virtualize
Decouple conflict detection from L1 cache array
Use signatures to conservatively detect conflicts
  E.g., Bloom filters
  Small filters sufficient for most transactions
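A signature in the sense used here is a conservative summary of a read or write set. The sketch below uses a small Bloom filter; the `Signature` class, its 64-bit width, and the use of SHA-256 as the hash are all assumptions chosen for illustration (hardware would use much cheaper hash functions). The point is the asymmetry: membership tests can give false positives but never false negatives.

```python
import hashlib


class Signature:
    """Toy Bloom-filter signature over block addresses (illustrative parameters)."""

    def __init__(self, bits=64, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # the filter, stored as an integer bit field

    def _positions(self, addr):
        # Derive `hashes` bit positions from the address.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.field |= 1 << pos

    def may_contain(self, addr):
        # True for every inserted address; may also be True for others (aliasing).
        return all((self.field >> pos) & 1 for pos in self._positions(addr))

    def clear(self):
        self.field = 0


read_sig = Signature()
for addr in (0x1040, 0x1080, 0x30C0):
    read_sig.insert(addr)
```

A false positive here only costs performance (a false conflict), which is why a small filter can be acceptable for conflict detection.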
116
LogTM: Log-based Transactional Memory
LogTM-SE and Nesting
Single hardware signature
  Save current signature on nested begin
On conflict, abort inner transaction and reload signature
  Check if conflict resolved; if not, repeat
Closed nested commit
  No change to hardware signature
  Child merges with parent
Open nested commit
  Restore saved signature from log
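The save and restore rules on this slide can be traced with a small model. This is a sketch only: `NestedSignature` is a hypothetical name, an exact Python set stands in for the single hardware signature, and a list stands in for the copies saved in the log.

```python
class NestedSignature:
    """Single signature plus saved copies, following the slide's nesting rules."""

    def __init__(self):
        self.sig = set()   # stand-in for the one hardware signature
        self.saved = []    # copies saved in the log at each begin

    def begin(self):
        self.saved.append(set(self.sig))  # save current signature on begin

    def access(self, addr):
        self.sig.add(addr)

    def abort_inner(self):
        self.sig = self.saved.pop()       # reload the parent's signature

    def closed_commit(self):
        self.saved.pop()                  # child merges with parent: no change

    def open_commit(self):
        self.sig = self.saved.pop()       # restore: child's isolation is released


s = NestedSignature()
s.begin(); s.access(1)                       # outer transaction
s.begin(); s.access(2); s.closed_commit()    # closed child: 2 stays isolated
after_closed = set(s.sig)
s.begin(); s.access(3); s.open_commit()      # open child: 3 is released
after_open = set(s.sig)
s.begin(); s.access(4); s.abort_inner()      # aborted child: 4 is dropped
after_abort = set(s.sig)
```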
117
Virtualizing LogTM-SE
Cache overflow
  Sticky states or broadcast coherence
    Ensures conflict detection
  Filter (conservatively) checks for conflicts
Thread suspension/migration
  Second hardware signature
    Summarizes suspended transactions
    OS manages on scheduling events
Paging
  Pageout checks for (potential) conflict, OS saves state
  Pagein updates filters with new physical address
118
Characterization of Java Middleware:
ICPP: Exploring Processor Design Options for Java Based Middleware
HPCA: Memory System Behavior of Java-Based Middleware
Martin Karlsson, Kevin E. Moore, Erik Hagersten and David A. Wood
119
Closed Nesting in LogTM
Conflict Detection
  Nested LogTM replicates R/W bits for each level
  Flash-OR circuit merges child and parent R/W bits
Version Management
  Nested LogTM segments the log into frames (similar to a stack of activation records)
[Figure: processor with data caches holding two R/W bit pairs per line plus tag and data, and registers: Register Checkpoint, LogFrame, LogBase, TMcount, LogPtr]
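The flash-OR merge can be pictured with one bitmask per nesting level, one bit per cache line. This is a sketch under those assumptions; `NestedBits` and its methods are hypothetical names, and real hardware would do the merge across all lines in a single step rather than with integer arithmetic.

```python
class NestedBits:
    """Replicated R/W bits: r[level] and w[level] are bitmasks over cache lines."""

    def __init__(self, levels):
        self.r = [0] * levels
        self.w = [0] * levels

    def mark(self, level, line, is_write):
        if is_write:
            self.w[level] |= 1 << line
        else:
            self.r[level] |= 1 << line

    def closed_commit(self, child):
        """Flash-OR the child's bits into the parent, then clear the child's."""
        parent = child - 1
        self.r[parent] |= self.r[child]
        self.w[parent] |= self.w[child]
        self.r[child] = 0
        self.w[child] = 0


bits = NestedBits(levels=2)
bits.mark(0, line=0, is_write=False)  # parent read line 0
bits.mark(1, line=3, is_write=True)   # child wrote line 3
bits.mark(1, line=0, is_write=False)  # child also read line 0
bits.closed_commit(child=1)
```

After the merge the parent's bits cover everything the child touched, so the child's accesses stay isolated until the parent commits.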
120
LogTM: Log-based Transactional Memory
Hardware State
R and W bit per cache line
  Track read and write sets
Overflow bit
Register checkpoint
  Fast save/restore
Log Base and Log Pointer registers
TM nesting count
[Figure: processor with a data cache (R, W, tag, and data per line, plus an Overflow bit) and registers: Register Checkpoint, LogBase, TMcount, LogPtr]
121
How Do Transactional Memory Systems Differ?
Conflict Detection \ Version Management | Lazy Version Management | Eager Version Management
Lazy Conflict Detection | Databases with Optimistic Conc. Ctrl.; Stanford TCC; UIUC Bulk | Not done (yet)
Eager Conflict Detection | Herlihy/Moss TM; MIT LTM; Intel/Brown VTM | Databases with Conservative C. Ctrl.; MIT UTM; Wisconsin LogTM
122
Virtualization Challenge
Hardware TM Implementations
  Finite: Hardware Signatures
  Multiplexed: Thread Switching, Virtual Memory
LogTM-SE
  Version Management
    Transaction Log, Virtual Memory: Already Virtualized
  Conflict Detection
    Signatures, Physical Addresses: Coming up…
123
LogTM: Log-based Transactional Memory
Open Nesting
Child transaction exposes state on commit (i.e., before the parent commits)
Raise level of abstraction for isolation and abort
  Eliminates semantically unnecessary conflicts
  Increases concurrency
Higher-level isolation
  Release memory-level isolation
  Programmer enforces isolation at a higher level (e.g., locks)
  Use commit action to release isolation at parent commit
Higher-level abort
  Child’s memory updates not undone if parent aborts
  Use compensating action to undo the child’s forward action at a higher level of abstraction
  E.g., malloc() compensated by free()
124
Commit and Compensating Actions
Commit Actions
  Execute when innermost open ancestor commits
    Outermost transaction is considered open
  Use to release isolation at a higher level of abstraction
Compensating Actions
  Discard when innermost open ancestor commits
  Execute in LIFO order when ancestor aborts
  Execute “in the state that held when its forward action committed” [Moss, TRANSACT ‘06]
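The rules for when each kind of action runs can be sketched as a small registry. This is a simplified illustration: `AncestorTx` is a hypothetical name, it models a single open ancestor only, and it ignores the memory state that the real abort handler restores in between actions.

```python
class AncestorTx:
    """Actions registered with the innermost open ancestor (single-level toy model)."""

    def __init__(self):
        self.commit_actions = []
        self.comp_actions = []

    def open_child_committed(self, commit_action=None, comp_action=None):
        """Record the actions an open-nested child registers when it commits."""
        if commit_action is not None:
            self.commit_actions.append(commit_action)
        if comp_action is not None:
            self.comp_actions.append(comp_action)

    def commit(self):
        self.comp_actions.clear()          # compensating actions are discarded
        for action in self.commit_actions:
            action()                       # commit actions execute
        self.commit_actions.clear()

    def abort(self):
        self.commit_actions.clear()
        while self.comp_actions:
            self.comp_actions.pop()()      # compensating actions run in LIFO order


trace = []
tx = AncestorTx()
for key in ("a", "b"):
    tx.open_child_committed(
        commit_action=lambda k=key: trace.append(f"unlock {k}"),
        comp_action=lambda k=key: trace.append(f"delete {k}"),
    )
tx.abort()
abort_trace = list(trace)

trace.clear()
tx2 = AncestorTx()
tx2.open_child_committed(commit_action=lambda: trace.append("unlock c"),
                         comp_action=lambda: trace.append("delete c"))
tx2.commit()
commit_trace = list(trace)
```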
125
LogTM: Log-based Transactional Memory
Open Nested Example
insert(int key, int value) {
  open_begin;
  leaf = find_leaf(key);
  entry = insert_into_leaf(key, value);
  // lock entry to isolate node
  entry->lock = 1;
  open_commit(abort_action(delete(key)),
              commit_action(unlock(key)));
}

insert_set(set S) {
  open_begin;
  while ((key,value) = next(S))
    insert(key, value);
  open_commit(abort_action(delete_set(S)));
}
Isolate entry at a higher level of abstraction
Delete entry if ancestor aborts
Release high-level isolation on ancestor commit
Replace compensating action with higher-level action on commit
126
Timing of Compensating Actions
// initialize to 0
counter = 0;
transaction_begin();   // top-level 1
counter++;             // counter gets 1
open_begin();          // level 2
counter++;             // counter gets 2
open_commit(abort_action(counter--));
...
// Abort and run compensating action
// Expect counter to be restored to 0
transaction_commit();  // not executed

LogTM behaves correctly
  Compensating action sees the state of the counter when the open transaction committed (2)
  Decrement restores the value to what it was before the open nest executed (1)
  Undo of the parent restores the value back to (0)
TCC doesn’t
  Counter ends up at 1
Condition O1: No writes to blocks written by an ancestor transaction.
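The three steps listed for LogTM can be replayed with a toy log. This is a sketch, not LogTM's implementation: the log is a Python list holding undo records and compensating-action records, and helper names such as `tx_write`, `open_commit`, and `abort` are hypothetical. It shows why interleaving compensation with undo, in reverse log order, brings the counter back to 0.

```python
memory = {"counter": 0}
log = []    # entries: ("undo", addr, old_value) or ("comp", function)
seen = []   # counter values observed by the compensating action


def tx_write(addr, value):
    log.append(("undo", addr, memory[addr]))  # old value goes to the log
    memory[addr] = value                      # new value goes in place


def open_commit(frame_start, compensating_action):
    del log[frame_start:]                     # discard the child's undo records
    log.append(("comp", compensating_action))


def abort():
    # Walk the log backwards, interleaving undo and compensation.
    while log:
        entry = log.pop()
        if entry[0] == "undo":
            memory[entry[1]] = entry[2]
        else:
            entry[1]()


def decrement():
    seen.append(memory["counter"])
    memory["counter"] -= 1


tx_write("counter", memory["counter"] + 1)   # top level: counter gets 1
frame = len(log)                             # open_begin: child's frame starts here
tx_write("counter", memory["counter"] + 1)   # level 2: counter gets 2
open_commit(frame, decrement)
abort()                                      # parent aborts
```

The compensating action runs first and sees 2, then the parent's undo record restores 0.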
127
LogTM: Log-based Transactional Memory
Open Nesting in LogTM
Conflict Detection
  R/W bits cleared on open commit (no flash-OR)
Version Management
  Open commit pops the most recent frame off the log
  (Optionally) add commit and compensating action records
  Compensating actions are run by the software abort handler
  Software handler interleaves restoration of memory state and compensating action execution
128
LogTM: Log-based Transactional Memory
Open Nested Commit
Discard child’s log frame
(Optionally) append commit and compensating actions to log
[Figure: transaction log with headers and undo records; LogFrame and LogPtr registers; TM count values 1 and 2; the child’s frame is replaced by a commit action and a compensating action record]
129
LogTM: Log-based Transactional Memory
Paging Support
Why?
  Support large transactions
What?
  Physical relocation of virtual pages
How?
  Update signatures on paging activity
130
LogTM: Log-based Transactional Memory
Updating Signatures
Suppose: Virtual Page (VP) 0x40000 -> Physical Frame (PP) 0x1000
  Signature A: {0x1040, 0x1080, 0x30c0}
At Page Out:
  Remember 0x40000 -> 0x1000
At Page In:
  Suppose 0x40000 -> 0x2000
  Signature A: {0x1040, 0x1080, 0x2040, 0x2080, 0x30c0}
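The page-in update in this example can be reproduced directly. This is a sketch with assumptions stated plainly: the signature is modeled as an exact set of block addresses rather than a hardware filter, `update_signature` is a hypothetical helper, and `PAGE_SIZE` of 0x1000 is chosen to match the frame addresses on the slide.

```python
PAGE_SIZE = 0x1000  # assumed page size, consistent with frames 0x1000 and 0x2000


def update_signature(signature, old_frame, new_frame):
    """Add the new physical address of every entry that lay in old_frame."""
    moved = {
        new_frame + (addr - old_frame)
        for addr in signature
        if old_frame <= addr < old_frame + PAGE_SIZE
    }
    # Old addresses are kept: the result stays conservative.
    return signature | moved


signature_a = {0x1040, 0x1080, 0x30C0}
signature_a = update_signature(signature_a, old_frame=0x1000, new_frame=0x2000)
```

Keeping the stale entries mirrors the slide's result and can only cause extra conflicts, never missed ones.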
131
Paging Support Summary
Problem:
  Changing page frames
  Need to maintain isolation on transactional blocks
Solution:
  On Page-Out: Save virtual -> physical mapping
  On Page-In: If the page frame is different, update signatures with the physical addresses of transactional blocks in the new page frame
132
LogTM: Log-based Transactional Memory
The State of the World*
Chip-multiprocessors/Multi-core/Many-core are here
  “Intel has 10 projects in the works that contain four or more computing cores per chip” -- Paul Otellini, Intel CEO, Fall ’05
GHz race is over
  Frequency increase limited by heat and power constraints
Size of processor limited by communication delay, not transistors
  Increasing wire delay on chip
All high-performance processors will be CMP
Software must become parallel
*(in computer architecture)
135
LogTM: Log-Based Transactional Memory
LogTM: Log-Based Transactional Memory
Combined Hardware/Software Implementation
  Conflicts detected in hardware
  Aborts processed in software
Policy-Free Hardware
  Simple hardware primitives
  Software-accessible state
Supports Transactions with:
  Large memory footprints
  Thread switching
  Unbounded nesting
  Paging
136
LogTM: Log-based Transactional Memory
Transactional Memory
Promising programming technique:
  begin_transaction { atomic execution } end_transaction
Good first step
  Likely benefits
  Can be integrated into current hardware and programming languages
Will not save the world
137
Nested Transactions for Software Composition
Modules expose interfaces, NOT implementations
Example
  insert() calls getID() from within a transaction
  The getID() transaction is nested inside the insert() transaction

void insert(object o){
  // parent TX
  begin_transaction();
  t.insert(getID(), o);
  commit_transaction();
}

int getID() {
  // child TX
  begin_transaction();
  id = global_id++;
  commit_transaction();
  return id;
}
138
LogTM: Log-based Transactional Memory
Closed Nesting
Child transactions remain isolated until parent commits
  On commit, child transaction is merged with its parent
Flat
  Nested transactions “flattened” into a single transaction
  Only outermost begins/commits are meaningful
  Any conflict aborts to outermost transaction
Partial rollback
  Child transaction can be aborted independently
  Can avoid costly re-execution of parent transaction
  But child merges transaction state with parent on commit
  So most conflicts with child end up affecting the parent
139
Thesis: We need new hardware and software
Architects should devote resources to support parallelism
  Manycore will succeed only if we find a way to program it (only if software is parallel)
  Using resources to facilitate parallelism is less risky
Hardware Primitives & Software Solutions
  HW implements difficult functions
  Coordinated by SW
We should be exploring ways in hardware can
140
Segmented Transaction Log for Nesting
LogTM’s log is a stack of frames
A frame contains:
  Header (including saved registers and pointer to parent’s frame)
  Undo records (block address, old value pairs)
  Garbage headers (headers of committed closed transactions)
  Commit action records
  Compensating action records
[Figure: transaction log with two frames, each a header followed by undo records; LogFrame and LogPtr registers point into the log; TM count values 2 and 1]
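The frame structure listed above can be sketched as a flat list with header entries. This is an illustrative model only: `SegmentedLog` and its field names are hypothetical, headers carry just a parent pointer (no saved registers), and commit or compensating action records are left out. It shows nested begin, closed commit (the child's header becomes garbage and LogFrame moves to the parent), and partial abort of the innermost frame.

```python
class SegmentedLog:
    """Toy segmented undo log: headers delimit frames, one frame per nesting level."""

    def __init__(self, memory):
        self.memory = memory
        self.log = []
        self.log_frame = None   # index of the innermost live header (LogFrame)
        self.tm_count = 0       # nesting depth (TMcount)

    def begin(self):
        self.log.append({"kind": "header", "parent": self.log_frame})
        self.log_frame = len(self.log) - 1
        self.tm_count += 1

    def write(self, addr, value):
        self.log.append({"kind": "undo", "addr": addr, "old": self.memory[addr]})
        self.memory[addr] = value

    def closed_commit(self):
        header = self.log[self.log_frame]
        self.log_frame = header["parent"]   # copy parent pointer into LogFrame
        self.tm_count -= 1
        if self.log_frame is None:
            self.log.clear()                # outermost commit: discard the log
        else:
            header["kind"] = "garbage"      # child's records now belong to parent

    def abort_innermost(self):
        if self.log_frame is None:
            raise RuntimeError("no active transaction")
        parent = None
        while len(self.log) > self.log_frame:
            entry = self.log.pop()
            if entry["kind"] == "undo":
                self.memory[entry["addr"]] = entry["old"]
            elif entry["kind"] == "header":
                parent = entry["parent"]    # the live header is popped last
        self.log_frame = parent
        self.tm_count -= 1


memory = {"x": 0, "y": 0}
tlog = SegmentedLog(memory)
tlog.begin(); tlog.write("x", 1)                          # level 1
tlog.begin(); tlog.write("y", 2); tlog.closed_commit()    # level 2 merges into 1
tlog.begin(); tlog.write("x", 5); tlog.abort_innermost()  # partial abort
after_partial = dict(memory)
tlog.abort_innermost()                                    # abort the outer level
```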
141
LogTM: Log-based Transactional Memory
Closed Nested Commit
Merge child’s log frame with parent’s
  Mark child’s header as “dummy header”
  Copy pointer from child’s header to LogFrame
[Figure: transaction log with two frames of headers and undo records; LogFrame and LogPtr registers; TM count values 2 and 1]
142
LogTM: Log-based Transactional Memory
LogTM-SE Signatures
Conflict-detection signatures
  Summarize read and write sets
  Similar to Bulk [ISCA 2006]
  Aliasing is a performance issue
    Results in false conflicts
    Rare for current apps
Version-management signatures
  Prevent redundant entries in the log
  Aliasing is a functional issue
    Results in incorrect abort
  Use small full-address filter
    Some redundant log entries
143
LogTM-SE: Unbounded Nesting Support
Why?
  Composability: libraries
  Software constructs: Retry, OrElse [Harris, PPoPP ‘05]
What?
  Signatures for each nesting level
How?
  One R / W signature set per SMT thread
  Save / restore signatures using the transaction log
144
LogTM: Log-based Transactional Memory
Nested Begin
[Figure: program (xbegin, LD, ST), processor state (R and W signatures, TMCount 1, Log Frame, Log Ptr), and transaction log (Xact header followed by undo entries)]
145
LogTM: Log-based Transactional Memory
Nested Begin
[Figure: program (xbegin, LD, ST), processor state (R and W signatures, TMCount 2, Log Frame, Log Ptr), and transaction log (Xact header and undo entries, followed by a second Xact header)]
146
LogTM: Log-based Transactional Memory
Partial Abort
[Figure: program (xbegin, LD, ST, ABORT!), processor state (R and W signatures, TMCount values 2 and 1, Log Frame, Log Ptr), and transaction log (two Xact headers, each followed by undo entries)]
147
LogTM: Log-based Transactional Memory
Nested Commit
[Figure: program (xbegin, LD, ST, xend), processor state (R and W signatures, TMCount values 1 and 2, Log Frame, Log Ptr), and transaction log (two Xact headers, each followed by undo entries)]
148
Unbounded Nesting Support Summary
Closed nesting:
  Begin: save signatures
  Abort: restore signatures
  Commit: no signature action
Open nesting:
  Commit: restore signatures
149
LogTM: Log-based Transactional Memory
Terminology
Transaction: A transformation of state that is Atomic (all or nothing), Consistent, Isolated (serializable) and Durable (permanent)
Commit: Successful completion of a transaction
Abort: Unsuccessful termination of a transaction, requiring that all updates from the transaction are undone
Conflict: Two transactions conflict if both access the same object and at least one of the accesses is an update