Published by Dennis Osborne. Modified over 9 years ago.
1
Hybrid Transactional Memory
Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen
Intel Labs, University of Michigan, Intel Labs
2
Promise of Transactional Memory (TM)
Simplify parallel programming:
1. Easier to program; transactions compose naturally
2. Easier to get parallel performance
3. No deadlocks
4. Maintain consistency in the presence of errors
5. Avoid priority inversion and convoying
6. Supports fault tolerance

With a transaction:
    transaction {
        A = A - 10;
        B = B + 10;
    }
    ... if ( error ) abort_transaction; ...

With locks:
    lock(l1); lock(l2);
    A = A - 10;
    B = B + 10;
    unlock(l1); unlock(l2);
    ... if ( error ) recovery_code(); ...
3
Flavors of Transactional Memory
1. Easier to program; transactions compose naturally
2. Easier to get parallel performance
3. No deadlocks
4. Maintain consistency in the presence of errors
5. Avoid priority inversion and convoying
6. Supports fault tolerance

Flavors: basic; support for programmer abort; support for nonblocking
Our work: efficient support for a TM that supports all these features
4
TM Implementations
Requires versioning support and conflict detection
- Hardware approach [ Herlihy’93 ]
  - Bounded number of locations
  - Maintain versions in cache → low overhead
- Pure-software approach [ Herlihy’03, Harris’03 ]
  - Unbounded number of locations can be accessed within a transaction
  - Slow due to overhead of maintaining multiple copies (potentially orders of magnitude)
- Unbounded hardware approach [ Hammond’04, Ananian’05, Rajwar’05, Moore’06 ]
  - Requires significant hardware support
  - Discussed in more detail in the paper
5
Hardware vs. Software TM
- Hardware approach
  - Low overhead: buffers transactional state in cache
  - More concurrency: cache-line granularity
  - Bounded resource
  - Assembly
  - Within a module
  - Useful BUT limited to library writers
- Software approach
  - High overhead: uses object copying to keep transactional state
  - Less concurrency: object granularity
  - No resource limits
  - High-level languages
  - Across modules
  - Useful BUT limited to special data structures
Neither is satisfactory for broader use
6
This Work
A Hybrid Transactional Memory scheme
- Requires modest hardware support
  - Changes are localized
- Supports an unbounded number of locations
- Performance of hardware when within hardware resource limits (low overhead of pure hardware TM)
- Gracefully falls back to software if the hardware resource limits are exceeded (unbounded resources of pure software TM)
- Experimentally demonstrate the effectiveness of our approach
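The fallback behavior described on this slide can be sketched as a small control-flow model. This is only an illustration, not the scheme from the talk: `HW_CAPACITY`, `CapacityExceeded`, `run_hardware`, `run_software`, and `run_transaction` are hypothetical names, and the "hardware" mode is simulated by a simple limit on the number of locations touched.

```python
class CapacityExceeded(Exception):
    """Raised by the modeled hardware mode when too many locations are touched."""


HW_CAPACITY = 4  # hypothetical bound on locations per hardware transaction


def run_hardware(memory, updates):
    # Modeled hardware mode: cheap, but bounded in the number of locations.
    if len(updates) > HW_CAPACITY:
        raise CapacityExceeded(len(updates))
    memory.update(updates)
    return "hardware"


def run_software(memory, updates):
    # Modeled software mode: no bound, but pays for working on a copy.
    working_copy = dict(memory)
    working_copy.update(updates)
    memory.clear()
    memory.update(working_copy)
    return "software"


def run_transaction(memory, updates):
    # Try the fast path first; fall back only when the bound is exceeded.
    try:
        return run_hardware(memory, updates)
    except CapacityExceeded:
        return run_software(memory, updates)


memory = {}
small_mode = run_transaction(memory, {"a": 1})
large_mode = run_transaction(memory, {k: k for k in range(10)})
```

In this model a small transaction completes in the simulated hardware mode, and one that touches more than `HW_CAPACITY` locations completes in the simulated software mode with the same final effect on `memory`.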
7
Outline
- Motivation
- Proposed Architectural Support
- Hybrid Transactional Memory
- Performance Evaluation
- Conclusions
8
ISA Extensions
- Start of a transaction
  - Begin Transaction: All ( XBA ) or Select ( XBS )
  - Save Register State ( SSTATE )
  - Specify handler on abort due to conflict ( XHAND )
- During a transaction
  - Perform memory loads and stores
  - Override defaults ( LDX, STX, LDR, STR )
- On transaction abort
  - Explicit Abort Transaction ( XA )
  - Restore Register State ( RSTATE )
- On transaction commit
  - Commit Transaction ( XC )
9
Baseline CMP Architecture
Our proposed changes are modest and localized
- Modifications to: Core, L1 $
- No changes to: Interconnect, Coherence Protocol, L2 $, Memory
[Diagram labels: Core, L1 $, Interconnect, L2 $]
10
Hardware Support for TM
Three requirements:
- Maintain two versions
- Detect conflict
  - Same core: Tag
  - Another core: Cache coherence
- Atomic commit and abort
Bounded:
- Capacity of TM $
- Associativity of TM $ and L2
[Diagram labels: Core; Regular Accesses; Transactional Accesses; L1 $ (Tag, Data); Transactional $ (Tag, Addl. Tag, Old Data, New Data); To Interconnect]
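A software model may help make the three requirements concrete. The sketch below is an assumption-laden illustration, not the proposed hardware: `TransactionalCache` is a hypothetical class, the "old" version is simply the value still in the backing `memory`, the "new" version lives in a bounded buffer, and conflict detection and associativity are not modeled at all.

```python
class TransactionalCache:
    """Toy model of a bounded buffer that holds the new version of each location."""

    def __init__(self, memory, capacity):
        self.memory = memory      # backing store: address -> old (committed) value
        self.capacity = capacity  # max distinct addresses written per transaction
        self.new_data = {}        # address -> new (speculative) value

    def store(self, addr, value):
        if addr not in self.new_data and len(self.new_data) >= self.capacity:
            # Models the bounded capacity named on the slide.
            raise OverflowError("transaction exceeds transactional cache capacity")
        self.new_data[addr] = value

    def load(self, addr):
        # A transaction sees its own speculative writes first.
        if addr in self.new_data:
            return self.new_data[addr]
        return self.memory.get(addr)

    def commit(self):
        # Publish all new versions together, then drop the buffer.
        self.memory.update(self.new_data)
        self.new_data.clear()

    def abort(self):
        # Discard the new versions; the old values in memory stay valid.
        self.new_data.clear()
```

A transaction that writes more distinct addresses than `capacity` raises `OverflowError` in this model, which is the point at which a hybrid scheme would have to leave the hardware mode.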
11
Outline
- Motivation
- Proposed Architectural Support
- Hybrid Transactional Memory
  - Existing pure software scheme
  - Our hybrid scheme
- Performance Evaluation
- Conclusions
12
Pure Software TM [ Herlihy’03 ]
We use this Pure Software TM as a starting point
Implemented without any special architectural support, using two techniques:
- Use copies of objects to keep transactional state
  - Make modifications on the copy during a transaction
- Add a level of indirection
  - Switch the versions when a transaction is committed
[Diagram labels: Object Pointer; State Pointer, Old, New; State; Object Contents (old copy); Object Contents (new copy)]

State       Valid Copy
Active      Old
Aborted     Old
Committed   New
13
Pure Software TM Scheme (Cont’d)
Before accessing an object within a transaction
[Diagram labels: Object Pointer; two records, each with State Pointer, Old, New; State (one per record); Object Contents; one element marked X; Valid Copy; Modify]
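The indirection scheme on these two slides can be sketched in a few lines. This is a single-threaded illustration under stated assumptions, not the published algorithm: `Transaction`, `Locator`, `TMObject`, and `open_for_write` are hypothetical names; a real implementation would switch the object pointer with an atomic compare-and-swap; and aborting the current owner on contention is an arbitrary policy chosen here for brevity.

```python
import copy

ACTIVE, ABORTED, COMMITTED = "Active", "Aborted", "Committed"


class Transaction:
    def __init__(self):
        self.state = ACTIVE


class Locator:
    """The record with State Pointer, Old, and New shown on the slide."""

    def __init__(self, owner, old, new):
        self.owner = owner  # "State Pointer": the transaction that installed this record
        self.old = old
        self.new = new

    def valid_copy(self):
        # Committed -> New; Active or Aborted -> Old (the table on the slide).
        return self.new if self.owner.state == COMMITTED else self.old


class TMObject:
    def __init__(self, contents):
        initial = Transaction()
        initial.state = COMMITTED
        self.locator = Locator(initial, None, contents)  # the "Object Pointer"

    def open_for_write(self, tx):
        current = self.locator
        if current.owner is tx:
            return current.new
        if current.owner.state == ACTIVE:
            current.owner.state = ABORTED  # assumed contention policy
        valid = current.valid_copy()
        # Install a new record: Old is the valid copy, New is a private copy to modify.
        self.locator = Locator(tx, valid, copy.deepcopy(valid))
        return self.locator.new

    def read_committed(self):
        return self.locator.valid_copy()
```

Committing is just the state change `tx.state = COMMITTED`: every object the transaction opened switches to its New copy at once, because each record points at the same transaction state.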
14
Our Hybrid Transactional Memory
- Two modes: hardware mode and software mode
- The two modes need to coexist
  - Non-solution: make all threads transition modes in lockstep
- Avoid versioning overheads (allocation and copying) in the hardware mode
  - Still incur the indirection overheads
- Tricky because it needs to bridge the hardware and software schemes
  - Hardware mode needs to modify data in place
    - Pure Software TM assumes data is never modified in place
  - Different sharing granularity
    - Cache line (hardware) vs. object (software)
  - Different conflict detection scheme
    - Data (hardware) vs. state (software)
15
Hybrid Scheme Example
- In the software mode: copy and modify
- In the hardware mode: modify in place
- Thread 1: HW mode; Thread 2: HW mode; Thread 3: SW mode
- Conflict detected by the threads in the hardware mode
[Diagram labels: Object Pointer; two records, each with State Pointer, Old, New; State (one per record); Object Contents; one element marked X]
16
Hybrid Scheme Summary
[Diagram labels: Object Pointer; State Pointer, Old, New; State; Object Contents]

Conflict detection
                              Active thread: Hardware   Active thread: Software
Conflicting thread: Hardware  Contents                  State
Conflicting thread: Software  Object Pointer            State

Sharing granularity
                              Active thread: Hardware   Active thread: Software
Conflicting thread: Hardware  Cache line                Object
Conflicting thread: Software  Object (both active modes)
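For readers who prefer a lookup to a grid, the two summary tables can be written as dictionaries keyed by thread modes. This encodes one reading of the flattened slide (columns taken as the active thread's mode, rows as the conflicting thread's mode), so the key order is an assumption, and `describe` is a hypothetical helper.

```python
# Keys are (active_thread_mode, conflicting_thread_mode).
CONFLICT_DETECTED_VIA = {
    ("hardware", "hardware"): "contents",
    ("software", "hardware"): "state",
    ("hardware", "software"): "object pointer",
    ("software", "software"): "state",
}

SHARING_GRANULARITY = {
    ("hardware", "hardware"): "cache line",
    ("software", "hardware"): "object",
    ("hardware", "software"): "object",
    ("software", "software"): "object",
}


def describe(active, conflicting):
    """Return (what the conflict is detected on, sharing granularity)."""
    key = (active, conflicting)
    return CONFLICT_DETECTED_VIA[key], SHARING_GRANULARITY[key]
```

Under this reading, only the case where both threads are in hardware mode shares at cache-line granularity; every pairing that involves a software-mode thread shares whole objects.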
18
Experimental Framework
- Infrastructure
  - Cycle-accurate, execution-driven multi-core simulator
  - Modified GCC
- Three microbenchmarks
- Two scenarios: low and high contention
- Compare four synchronization implementations
  - Lock
  - Pure Hardware Transactional Memory
  - Pure Software Transactional Memory
  - Hybrid Transactional Memory
19
Performance
[Figure: normalized execution time vs. number of cores. Benchmark: Vector-Reduce. Contention: low.]
21
Conclusions
- Transactional Memory is a promising approach
  - Makes parallel programming an easier task
  - Easier to achieve parallel speedup
- The Hybrid Transactional Memory approach works
  - Requires only modest hardware support
  - Common case: good performance for most transactions
  - Uncommon case: graceful fallback to software mode when a transaction cannot complete within the hardware bounds
22
Questions?
23
Transactions
- A synchronization mechanism to coordinate accesses to shared data by concurrent threads (an alternative to locks)
- Transaction: a group of operations on shared data
    transaction {
        A = A - 10;
        B = B + 10;
        ...
        if (error) abort_transaction;
    }
- An API enhancement:
  1. Abort in the middle of a transaction
     - On encountering an error
24
Transactional Memory (TM)
A transaction satisfies the following properties:
1) Atomicity: all-or-nothing
   - On commit: all operations become visible
   - On abort: none of the operations are performed
2) Isolation (serializable)
   - The committed transactions appear to have been performed in some serial order
Additional properties:
3) Optimistic concurrency control
   - Necessary for achieving good parallel speedup
4) Non-blocking (optional)
   - Avoid priority inversion
   - Avoid convoying
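Optimistic concurrency control is easiest to see in a version-validation sketch. The code below is a generic, single-threaded illustration and not the mechanism used in the talk: `VersionedStore`, `OptimisticTransaction`, and `Conflict` are hypothetical names, and a real implementation would have to make the validate-and-publish step in `commit` atomic.

```python
class Conflict(Exception):
    """Raised at commit when a location read by the transaction has changed."""


class VersionedStore:
    def __init__(self, data):
        self.values = dict(data)
        self.versions = {key: 0 for key in data}


class OptimisticTransaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version observed at first read
        self.writes = {}         # key -> buffered new value

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.versions[key])
        return self.store.values[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: every location read must be unchanged since it was read.
        for key, seen in self.read_versions.items():
            if self.store.versions[key] != seen:
                raise Conflict(key)
        # Publish all buffered writes together (all-or-nothing).
        for key, value in self.writes.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
```

Two transactions that read the same location and both try to update it cannot both commit: the second one raises `Conflict` and has to be restarted, which yields an outcome equivalent to running them one after the other.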
25
Advantage 1: Performance
- Locks
  - Serialized on locks
  - Finer-granularity locks help
    - Burden on programmer
- Transactions
  - Optimistically execute concurrently
  - Abort and restart on data conflict
    - Automatically done by runtime
[Diagram labels: Locks: A, B, L1, A, C, D; Transactions: A, B, C, D; Data Conflict: A, A]
26
Advantage 2: Reduces Bugs
- With locks, programmers need to
  - Remember the mapping between shared data and the locks that guard them
    - Make sure the appropriate locks are held while accessing shared data
  - Make lock granularity as small as possible
  - Avoid deadlocks due to locks
- All of these can cause subtle bugs
- With TM, the programmer does not have to deal with these problems
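The bookkeeping that locks demand shows up even in a two-account transfer. The sketch below uses Python's standard `threading.Lock`; `Account` and `transfer` are hypothetical names, and ordering the locks by `id` is just one common convention for avoiding deadlock, chosen here for illustration.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # the programmer must remember: this lock guards balance


def transfer(src, dst, amount):
    # Always acquire the two locks in the same global order. Without this,
    # transfer(a, b) and transfer(b, a) running concurrently can deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a, b = Account(100), Account(100)
workers = [threading.Thread(target=transfer, args=(a, b, 10)) for _ in range(4)]
workers += [threading.Thread(target=transfer, args=(b, a, 10)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Every caller has to follow the same lock-to-data mapping and the same acquisition order, which is exactly the kind of convention a transactional version would not need.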
27
Other Advantages
- Allows new programming paradigms
  - Simplifies error handling
  - A new style of programming: speculate and verify
  - Programmer can abort offending transactions
- Avoids other problems that locks suffer from
  - Priority inversion: a low-priority thread can grab a lock and block a higher-priority thread
  - Convoying: if a thread holding a lock blocks on a high-latency event (like a context switch or I/O), it can cause other threads to wait for long periods
  - Fault tolerance: if a process holding a lock dies, other processes will hang forever
  - Runtime system can abort offending transactions