
1 Hardware Transactional Memory
Eyal Widder, Nimrod Reiss. Instructor: Yehuda Afek. Tel Aviv University, 10/06/2007

2 References
Thread-Level Transactional Memory. Kevin E. Moore, Mark D. Hill & David A. Wood [2005]
LogTM: Log-based Transactional Memory. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood [2006]

3 Outline
Locks Vs. Transactional Memory
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions

4 The Challenge of Multithreaded SW
Goal: Parallelization
Problem: Unrestricted concurrency → bugs
Solution: Synchronization
New problem: Synchronization
  Tension between performance and correctness

5 Current Mechanism: Locks
Locks: objects only one thread can hold at a time
  Organization: lock for each shared structure
  Usage: (block) → acquire → access → release
Correctness issues
  Under-locking → data races
  Acquires in different orders → deadlock
Performance issues
  Conservative serialization
  Overhead of acquiring
  Difficult to find right granularity
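The "acquires in different orders → deadlock" issue can be illustrated with a minimal Python sketch. The `Account` class and `transfer` function are hypothetical names that do not appear in the slides, and the sketch assumes the two accounts are distinct. It shows the usual lock-based workaround, acquiring locks in one global order, which is exactly the kind of ordering discipline transactions aim to make unnecessary.

```python
import threading


class Account:
    """Hypothetical shared structure protected by its own lock."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Acquiring src.lock then dst.lock could deadlock when another thread
    # transfers in the opposite direction. Sorting by a global key makes
    # every thread acquire the two locks in the same order.
    # Assumes src and dst are different accounts.
    first, second = sorted((src, dst), key=lambda acc: acc.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

The caller never sees the ordering rule, but every function that takes more than one lock has to follow it, which is the composition burden the later slides return to.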

6 Transactions vs. Locks
Lock issues                  | How transactions help
Under-locking                | Simpler interface
Acquires in different orders | No ordering
Blocking                     | Can cancel transactions
Conservative serialization   | Serialization only on conflicts
Locks → simplicity/performance tension
Transactions → (potentially) simple and efficient

7 Transaction Semantics - ACI Properties
Atomicity: all or nothing
Consistency: correct at beginning and end
Isolation: partially done work is not visible to other threads

8 Thread-Level Transactional Memory
Separate semantics from implementation
Adapt DBMS (database management system) concepts:
  Concurrency control algorithms
  Conflict detection
  Taking the appropriate action (commit/abort/delay)
Main challenge: reduce the overhead of enforcing the ACI properties!

9 Basic Idea
Model TM like virtual memory: a thread-level abstraction
Use 3 types of interfaces: User, System/Library, Low-level
An interface independent of implementation
Combine HW and SW in implementation

10 How Do Transactional Memory Systems Differ?
(Data) Version Management
  Keep old values for abort AND new values for commit
  Eager: record old values “elsewhere”; update “in place” → fast commit
  Lazy: update “elsewhere”; keep old values “in place”
(Data) Conflict Detection
  Find read-write, write-read or write-write conflicts among concurrent transactions
  Eager: detect conflict on every read/write → less wasted work
  Lazy: detect conflict at end (commit/abort)
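The two version-management styles can be contrasted in a small Python sketch. `EagerTx` and `LazyTx` are hypothetical names, memory is modelled as a plain dictionary, and conflict detection is left out entirely; this is only meant to show where old and new values live in each scheme.

```python
class EagerTx:
    """Eager: new values go in place, old values go to an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []                 # (address, old value) pairs

    def read(self, addr):
        return self.memory[addr]

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value          # update "in place"

    def commit(self):
        self.undo_log.clear()              # nothing to copy: fast commit

    def abort(self):
        while self.undo_log:               # undo newest write first
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyTx:
    """Lazy: old values stay in place, new values go to a write buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def read(self, addr):
        return self.write_buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.write_buffer[addr] = value    # update "elsewhere"

    def commit(self):
        self.memory.update(self.write_buffer)   # copy on commit
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()          # fast abort
```

In this model the eager commit does no per-write work, while the lazy commit copies every buffered value, which is the trade-off the slide points at.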

12 Log Based Transactional Memory – LogTM
(Hardware) Transactional Memory is promising
Most systems use lazy version management
  Old values “in place”
  New values “elsewhere”
  Commits slower than aborts
New: LogTM, Log-based Transactional Memory
  Uses eager version management (like most databases)
  Old values go to a log in thread-private virtual memory
  New values “in place”
  Makes common commits fast!
  Hardware traps to a software handler
  Aborts handled in software

14 LogTM’s Eager Version Management
Old values stored in the transaction log
  A per-thread linear (virtual) address space (like the stack)
  Filled by hardware (during transactions)
  Read by software (on abort)
New values stored “in place”

15 Transaction Log Example
Initial state: LogBase = LogPointer, TM count > 0
[Figure: memory table with columns VA, Data Block, R, W and rows at VA 00, 40, C0, 1000, 1040, 1080. Log Base = 1000, Log Ptr = 1000, TM count = 1.]
Where is all the extra data stored? (R, W bits, Log Base, Log Ptr and TM count)

16 Transaction Log Example
Store r2, (c0) /* r2 = 56 */
  Set W bit for block (c0)
  Store address (c0) and old data on the log
  Increment Log Ptr to 1048
  Update memory
[Figure: same memory table; the W bit of block C0 is 1, the log entry at VA 1000 holds address c0 and the old data. Log Base = 1000, Log Ptr moves from 1000 to 1048, TM count = 1.]

17 Transaction Log Example
Commit transaction
  Clear R & W for all blocks
  Reset Log Ptr to Log Base (1000)
  Clear TM count
[Figure: same memory table; Log Base = 1000, Log Ptr moves from 1048 back to 1000, TM count is cleared.]

18 Transaction Log Example
Abort transaction
  Replay log entries to “undo” the transaction
  Reset Log Ptr to Log Base (1000)
  Clear R & W bits for all blocks
  Clear TM count
[Figure: same memory table; the log entry for c0 is replayed. Log Base = 1000, Log Ptr moves from 1048 back to 1000, TM count is cleared.]
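The four log-example slides can be replayed with a small Python model. This is a sketch under several assumptions that are not stated in the slides: addresses are hexadecimal, a log entry is an 8-byte address plus a 64-byte block (which is what makes the pointer move from 1000 to 1048), a block is logged only on its first write, and the old value 34 at c0 is invented for illustration. `LogTMThread` is a hypothetical name.

```python
BLOCK = 0x40                 # assumed 64-byte data block
ENTRY = 0x08 + BLOCK         # assumed entry: logged address + old block


class LogTMThread:
    def __init__(self, memory, log_base=0x1000):
        self.memory = memory             # block address -> value
        self.log_base = log_base
        self.log_ptr = log_base
        self.log = []                    # (block address, old value)
        self.r_bits = set()
        self.w_bits = set()
        self.tm_count = 0

    def begin(self):
        self.tm_count += 1

    def load(self, addr):
        self.r_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.w_bits:      # first write to this block
            self.w_bits.add(addr)
            self.log.append((addr, self.memory[addr]))
            self.log_ptr += ENTRY
        self.memory[addr] = value        # new value goes in place

    def _reset(self):
        self.log.clear()
        self.log_ptr = self.log_base
        self.r_bits.clear()
        self.w_bits.clear()
        self.tm_count = 0

    def commit(self):
        self._reset()                    # no data is copied

    def abort(self):
        for addr, old in reversed(self.log):   # replay log to undo
            self.memory[addr] = old
        self._reset()


memory = {0xC0: 34}                      # 34 is a made-up old value
thread = LogTMThread(memory)
thread.begin()
thread.store(0xC0, 56)                   # Store r2, (c0) with r2 = 56
```

After the store, `thread.log_ptr` is `0x1048` and the W bit for `0xC0` is set; calling `thread.abort()` restores the old value and moves the pointer back to `0x1000`, while `thread.commit()` only resets the bookkeeping.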

19 Eager Version Management Discussion
Advantages:
  Fast commits
    No copying
    Common case
Disadvantages:
  Slow/complex aborts
    Undo aborting transaction
  Relies on eager conflict detection/prevention

21 LogTM’s Eager Conflict Detection
Requesting processor sends a coherence request to the directory.
The directory responds and possibly forwards the request to one or more processors.
Each responding processor examines some local state to detect a conflict.
The responding processors each ack or nack the request.
The requesting processor resolves any conflict.
Presenter notes: What is “Most Hardware TM Leverage Invalidation Cache Coherence”? E.g., writer seeks M copy, seeks S copies, & finds R bit set.

22 Conflict Detection
Validation is retained by using the R/W bits and the directory MOESI states.
A “Sticky State” is used to detect possible conflicts from overflows.
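The responder-side check with R/W bits can be sketched in a few lines of Python. The function names `respond` and `request_block` are hypothetical, the per-processor state is reduced to two sets of block addresses, and the directory and MOESI states are not modelled; the sketch only shows which combinations of request and bit lead to a nack.

```python
def respond(request, block, r_bits, w_bits):
    """Responder side: "NACK" on a conflict, otherwise "ACK".

    request is "GETS" (read) or "GETX" (write); r_bits and w_bits are the
    sets of blocks the local transaction has read and written.
    """
    if request == "GETS":
        conflict = block in w_bits                      # read vs. write
    elif request == "GETX":
        conflict = block in r_bits or block in w_bits   # write vs. anything
    else:
        raise ValueError(f"unknown request: {request}")
    return "NACK" if conflict else "ACK"


def request_block(request, block, responders):
    """Requester side: proceed only if no responder nacks the request."""
    replies = [respond(request, block, r, w) for r, w in responders]
    return "retry" if "NACK" in replies else "proceed"
```

Here `responders` is a list of `(r_bits, w_bits)` pairs, one per processor that the directory would forward the request to.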

23 Conflict Detection (example)
P0 store
  P0 sends get exclusive (GETX) request
  Directory responds with data (old)
  P0 executes store
[Figure: Directory, P0 and P1, each processor with TM mode and Overflow bits; messages GETX and DATA. The block in P0’s cache goes from I (--) [none] to M (--) [old] to M (-W) [new]; P1 holds I (--) [none].]

24 Conflict Detection (example)
In-cache transaction conflict
  P1 sends get shared (GETS) request
  Directory forwards to P0
  P0 detects conflict and sends NACK
[Figure: Directory, P0 and P1; messages GETS, Fwd_GETS and NACK. P0 holds the block as M (-W) [new]; P1 holds I (--) [none].]

25 Conflict Detection (example)
Cache overflow
  P0 sends put exclusive (PUTX) request
  Directory acknowledges
  P0 sets overflow bit
  P0 writes data back to memory
[Figure: Directory, P0 and P1; messages PUTX, ACK and DATA. Memory goes from [old] to [new], P0’s Overflow bit becomes 1 and its copy goes from M (-W) [new] to I (--) [none].]

26 Conflict Detection (example)
Out-of-cache conflict
  P1 sends GETS request
  Directory forwards to P0
  P0 detects a (possible) conflict
  P0 sends NACK
Can this be a false NACK?
[Figure: Directory, P0 and P1; messages GETS, Fwd_GETS and NACK. P0’s Overflow bit is 1 and the block is no longer in its cache.]

27 Conflict Detection (example)
Commit
  P0 clears TM mode and Overflow bits
[Figure: Directory, P0 and P1 after the commit; P0’s TM mode and Overflow bits are cleared.]

28 Conflict Detection (example)
Lazy cleanup
  P1 sends GETS request
  Directory forwards request to P0
  P0 detects no conflict, sends CLEAN
  Directory sends Data to P1
[Figure: Directory, P0 and P1; messages GETS, Fwd_GETS, CLEAN and DATA. The directory state becomes S(P1) [new] and P1’s copy goes from I (--) [none] to S (--) [new].]

29 LogTM’s Conflict Detection w/ Cache Overflow
At overflow at processor P
  Set P’s overflow bit (1 bit per processor)
  Allow writeback, but set directory state to
At transaction end (commit or abort) at processor P
  Reset P’s overflow bit
At (potential) conflicting request by processor R
  Directory forwards R’s request to P
  P tells R “no conflict” if overflow is reset
  But asserts conflict if set (w/ small chance of false positive)
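The overflow rule can be modelled with one conservative bit per processor. The Python sketch below uses a hypothetical `Processor` class and treats `respond` as P’s answer to a request the directory has forwarded to it; the directory itself and the exact directory state are not modelled. It is meant to show why a nack after an overflow may be a false positive.

```python
class Processor:
    def __init__(self):
        self.r_bits = set()      # blocks read by the transaction, in cache
        self.w_bits = set()      # blocks written by the transaction, in cache
        self.overflow = False    # one bit per processor

    def read(self, block):
        self.r_bits.add(block)

    def write(self, block):
        self.w_bits.add(block)

    def evict(self, block):
        """A transactional block leaves the cache, so its R/W bits are lost."""
        if block in self.r_bits or block in self.w_bits:
            self.overflow = True
        self.r_bits.discard(block)
        self.w_bits.discard(block)

    def end_transaction(self):
        """Commit or abort: clear the R/W bits and the overflow bit."""
        self.r_bits.clear()
        self.w_bits.clear()
        self.overflow = False

    def respond(self, request, block):
        if block in self.w_bits or (request == "GETX" and block in self.r_bits):
            return "NACK"                    # precise in-cache conflict
        if block in self.r_bits:
            return "ACK"                     # read sharing, no conflict
        # Block is not tracked in the cache: be conservative after overflow.
        return "NACK" if self.overflow else "CLEAN"
```

Once the overflow bit is set, every forwarded request for an untracked block is nacked, including blocks the transaction never touched. After the transaction ends, the same request gets `CLEAN`, which corresponds to the lazy cleanup step in the earlier example.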

30 Conflict Resolution
Can wait, risking deadlock
Can abort, risking livelock
Wait/abort transaction at requesting or responding processor?
LogTM resolves conflicts at the requesting processor
  Requesting processor waits (using coherence nacks/retries)
  But aborts if other processor is waiting (deadlock possible) & it is logically younger (using timestamps)
Future: requesting processor traps to a software contention manager that decides who waits/aborts
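One reading of the wait-or-abort rule can be written down as a tiny state machine. The Python sketch below is an interpretation, not the slides’ definition: it assumes a smaller timestamp means a logically older transaction, and it tracks "another processor is waiting on me" with a flag named `possible_cycle`; `TxState` and its method names are hypothetical.

```python
class TxState:
    def __init__(self, timestamp):
        self.timestamp = timestamp       # smaller = logically older (assumed)
        self.possible_cycle = False

    def on_nack_sent(self, requester_ts):
        """We made another transaction wait by nacking its request."""
        if requester_ts < self.timestamp:
            # An older transaction is now waiting on us, so waiting
            # ourselves could close a cycle.
            self.possible_cycle = True

    def on_nack_received(self, responder_ts):
        """Decide what to do after our own request was nacked."""
        if responder_ts < self.timestamp and self.possible_cycle:
            return "abort"               # younger and possibly in a cycle
        return "retry"                   # otherwise just wait and retry
```

With this rule the oldest transaction in any waiting cycle never aborts, so some transaction always makes progress, while a transaction that is merely nacked keeps retrying instead of throwing its work away.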

32 Conclusion
Commits are far more common than aborts
  Conflicts are rare
  Most conflicts can be resolved w/o aborts
  Software aborts do not impact performance
Overflows are rare (in current benchmarks)
LogTM
  Eager Version Management makes the common case (commit) fast
  Sticky States/Lazy Cleanup detects conflicts outside the cache (if overflows are infrequent)

33 QUESTIONS?

34 Break Time!

35 References
LogTM: Log-based Transactional Memory. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood [2006]
Supporting Nested Transactional Memory in LogTM. Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift & David A. Wood [2006]

36 Motivation
Till now: Transactional Memory promises lock-free, atomic, consistent and isolated execution.
But what should occur when a transaction executes another transaction within it?

37 LogTM enables flattening
In the last lecture we introduced LogTM, which subsumes inner transactions into the top-level transaction.
A counter tracks the nesting level: Transaction_begin() increments it and Transaction_end() decrements it.
A conflict in an inner transaction may cause a complete abort back to the beginning of the top-level one.
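Flattening with a nesting counter can be sketched in Python as follows. `FlatTx` is a hypothetical name, memory is a dictionary, and a single undo log is shared by all nesting levels, which is what makes an inner abort roll back the whole top-level transaction.

```python
class FlatTx:
    def __init__(self, memory):
        self.memory = memory
        self.nesting = 0             # the nesting-level counter
        self.undo_log = []           # one log for all levels

    def transaction_begin(self):
        self.nesting += 1

    def store(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value

    def transaction_end(self):
        self.nesting -= 1
        if self.nesting == 0:        # only the top level really commits
            self.undo_log.clear()

    def abort(self):
        # No per-level boundary exists, so everything is undone.
        while self.undo_log:
            addr, old = self.undo_log.pop()
            self.memory[addr] = old
        self.nesting = 0
```

An inner `transaction_end()` only decrements the counter; the writes of the inner transaction stay in the shared log until the outermost level ends.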

38 Challenges in nesting transactions
Facilitating Software Composition.
Enhancing Concurrency.
Escaping to non-transactional systems.

39 Facilitating Software Composition
Calling modules that use locks internally requires the caller to know the modules’ internal implementation details.
In order to aid modular programming, transactional memory should support nesting.

41 Enhancing Concurrency
Closed nesting does not eliminate all problems posed by modular software. Concurrency is limited by maintaining isolation until the top-level transaction commits.

42 Example
[Figure: processors P1 and P2 running transactions L and T with nested transactions S; conflicts arise on pNextFree.]
How would you do it differently?
Ideally S should release pNextFree so that other transactions can access the allocator without conflicting with transaction L.
Presenter notes: ask the guiding question; emphasize that atomicity must be relaxed in order to release isolation.

44 Escaping to non-transactional systems.
Many TM systems will run on top of non-transactional base systems that may include:
  Runtime libraries
  Operating systems
  Language virtual machines (e.g. JVM)
STMs handle such escapes easily.
An escape to a non-transactional system must disable HTM mechanisms to allow correct operation.
Allow inter-transaction / device communication.
HTMs implement low-level mechanisms that operate below the OS (or other runtime library).
STMs implement those mechanisms at a higher abstraction level → easier to cope with escape actions.

45 Outline
Motivation and challenges.
Closed vs. Open Nesting.
Nested LogTM.
  Supporting Closed Nesting.
    Partial aborts.
  Supporting Open Nesting.
    Abort actions / Commit actions.
    Condition O1.
  Escape actions.
Conclusions.

46 Closed vs. Open nesting
Closed nested transactions extend the isolation of an inner transaction until the top-level transaction commits.
Open nested transactions allow a committing inner transaction to release isolation immediately.

47 Closed Nested Transactions
May flatten transactions into the top-level one (as we’ve already seen).
May allow partial roll-back.

48 Open Nested Transactions
Increase concurrency and expressiveness.
May increase both SW & HW complexity.
Higher-level atomicity
  Child’s memory updates are not undone if the parent aborts
  Use an abort action to undo the child’s forward action at a higher level of abstraction
  E.g., malloc() compensated by free()
Higher-level isolation
  Release memory-level isolation
  Programmer enforces isolation at a higher level (e.g., locks)
  Use a commit action to release isolation at parent commit

50 Nested LogTM
Nested LogTM extends Flat LogTM (last lecture).
Splits the log into “frames”.
Header contains a Frame Pointer to the parent’s Header.
Header contains a register checkpoint.
[Figure: the log as a sequence of frames, each a Header followed by Undo records, starting with the Level 1 frame; Log Ptr marks the end of the log.]

51 Nested LogTM
Replicates R/W bits.
Maintains a separate Read set and Write set for each nesting level.
Uses a constant number (k) of R/W sets, and flattens transactions whose nesting level is greater than k.

52 Closed Nested LogTM
On Commit:
  Top-level transactions commit normally.
  If (1 < curr_level ≤ k):
    Merge the current log frame with the parent’s.
    “Flash-OR” R/W bits of curr_level - 1 with curr_level’s.
    Decrement curr_level.
  Otherwise:
    Merge the current log frame with the parent’s.
    Decrement curr_level.
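The commit rule can be sketched in Python with k = 2 hardware R/W sets. Everything here is an illustrative assumption: the class name `ClosedNesting`, the value of `K`, the use of Python sets for the bits, and the choice that levels above k share the last R/W set (one way to "flatten" them). Only commit is modelled.

```python
K = 2   # assumed number of hardware R/W bit sets


class ClosedNesting:
    def __init__(self):
        self.curr_level = 0
        self.frames = []                       # one undo-record list per level
        self.r = [set() for _ in range(K)]     # R bits for levels 1..K
        self.w = [set() for _ in range(K)]     # W bits for levels 1..K

    def _bits(self):
        # Levels deeper than K share the bits of level K.
        return min(self.curr_level, K) - 1

    def begin(self):
        self.curr_level += 1
        self.frames.append([])

    def read(self, block):
        self.r[self._bits()].add(block)

    def write(self, block, old_value):
        self.w[self._bits()].add(block)
        self.frames[-1].append((block, old_value))

    def commit(self):
        frame = self.frames.pop()
        if self.curr_level == 1:
            # Top level commits normally: drop the log, clear all bits.
            for bits in self.r + self.w:
                bits.clear()
        else:
            self.frames[-1].extend(frame)      # merge frame into parent's
            if self.curr_level <= K:
                i = self.curr_level - 1        # index of this level's bits
                self.r[i - 1] |= self.r[i]     # "flash-OR" into the parent
                self.w[i - 1] |= self.w[i]
                self.r[i].clear()
                self.w[i].clear()
        self.curr_level -= 1
```

For a level above `K` the commit only merges log frames, because that level never had R/W bits of its own.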

53 Closed Nested LogTM
Conflict detection:
  An incoming read from memory location m conflicts with another thread’s level j transaction if j is the minimal level where block(m)’s Write bit is set.
  An incoming write to memory location m conflicts with another thread’s level j transaction if j is the minimal level where block(m)’s Write or Read bit is set.
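The "minimal level" rule translates directly into a short search over the per-level bits. In this Python sketch the function name `conflict_level` is hypothetical, and `r_bits` and `w_bits` are lists of sets indexed by nesting level (position 0 is level 1).

```python
def conflict_level(access, block, r_bits, w_bits):
    """Return the minimal level j the incoming access conflicts with, or None.

    access is "read" or "write"; r_bits[i] and w_bits[i] hold the blocks
    whose R and W bits are set at nesting level i + 1.
    """
    for level, (reads, writes) in enumerate(zip(r_bits, w_bits), start=1):
        if block in writes:
            return level
        if access == "write" and block in reads:
            return level
    return None
```

The returned level is what an abort handler would need in order to decide how many log frames to unwind.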

54 Closed Nested LogTM
On Abort:
  An abort of the current transaction at curr_level traps to a software handler.
  Suppose the transaction aborts because of a conflict with the transaction at abort_level.
  The software handler walks the log backwards and undoes curr_level - abort_level + 1 log frames.
  Finally, it restores the register state saved in the header.

55 Closed Nested LogTM (example)
// thread i at level 0 (non-transactional)
a = 2; b = 4; c = 6;    // Initialize
transaction_begin();    // top-level (level 1)
a = b + 1;              // a gets 5.
transaction_begin();    // level 2
c = b - 3;              // c gets 1.
b = a + 2;              // b gets 7.
a = c + 7;              // a gets 8.
transaction_commit();   // level 2.
transaction_commit();   // level 1.
[Figure: the log (frame pointer, end pointer) with a level 1 frame holding a header and the undo record (2, a), and a level 2 frame holding a header and the undo records (6, c), (4, b), (5, a); cache snapshots with columns Var, R1, R2, W1, W2, Val for a, b, c after each step, ending with a = 8, b = 7, c = 1.]
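The example and the partial-abort rule (undo curr_level - abort_level + 1 frames) can be replayed in Python. `NestedTx` is a hypothetical name; log frames are plain lists of (old value, variable) records, headers and register checkpoints are omitted, and a variable is logged once per level, which is an assumption that matches the undo records shown in the figure.

```python
class NestedTx:
    def __init__(self, memory):
        self.memory = memory
        self.frames = []      # frames[i]: undo records of level i + 1
        self.written = []     # written[i]: variables with W bit at level i + 1

    def begin(self):
        self.frames.append([])
        self.written.append(set())

    def store(self, var, value):
        if var not in self.written[-1]:          # first write at this level
            self.written[-1].add(var)
            self.frames[-1].append((self.memory[var], var))
        self.memory[var] = value

    def commit(self):
        frame = self.frames.pop()
        writes = self.written.pop()
        if self.frames:                          # closed nested commit
            self.frames[-1].extend(frame)
            self.written[-1] |= writes

    def abort(self, abort_level):
        curr_level = len(self.frames)
        for _ in range(curr_level - abort_level + 1):
            for old, var in reversed(self.frames.pop()):
                self.memory[var] = old           # walk the frame backwards
            self.written.pop()


mem = {"a": 2, "b": 4, "c": 6}
tx = NestedTx(mem)
tx.begin()                        # level 1
tx.store("a", mem["b"] + 1)       # a gets 5
tx.begin()                        # level 2
tx.store("c", mem["b"] - 3)       # c gets 1
tx.store("b", mem["a"] + 2)       # b gets 7
tx.store("a", mem["c"] + 7)       # a gets 8
```

At this point `tx.frames` is `[[(2, "a")], [(6, "c"), (4, "b"), (5, "a")]]`. Calling `tx.abort(2)` undoes only the inner frame and leaves a = 5, b = 4, c = 6, whereas `tx.abort(1)` would undo both frames and restore the initial values.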

56 Supporting Open Transactions
When an open nested transaction Topen at level j commits:
  Its frame is discarded from the log.
  R/W bits for level j are cleared.
  (Optionally) Commit and abort action records, Copen and Aopen, are appended to the newly exposed end of Topen’s parent’s frame.

57 Commit and Abort Actions
To ensure consistency, open nested transactions must raise the abstraction level of both isolation and rollback.
Commit actions are executed in FIFO order, while abort actions are executed in LIFO order.
This ordering promises that each action sees memory in the same snapshot as when the transaction committed and registered its actions.
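The two execution orders can be shown with a minimal registry of actions. In this Python sketch `OpenNestingActions` and its method names are hypothetical, actions are zero-argument callables, and the parent transaction’s own undo log is not modelled.

```python
class OpenNestingActions:
    """Actions registered by open nested children of one parent."""

    def __init__(self):
        self.commit_actions = []
        self.abort_actions = []

    def register(self, commit_action=None, abort_action=None):
        if commit_action is not None:
            self.commit_actions.append(commit_action)
        if abort_action is not None:
            self.abort_actions.append(abort_action)

    def _clear(self):
        self.commit_actions.clear()
        self.abort_actions.clear()

    def on_parent_commit(self):
        for action in self.commit_actions:            # FIFO: oldest first
            action()
        self._clear()

    def on_parent_abort(self):
        for action in reversed(self.abort_actions):   # LIFO: newest first
            action()
        self._clear()
```

Running abort actions newest-first mirrors how an undo log is unwound, so each compensation sees the state left by the child that registered it.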

58 Supporting Open Transactions (example)
// thread i at level 0 (non-transactional)
a = 2; b = 4; c = 6;    // Initialize
transaction_begin();    // top-level (level 1)
a = b + 1;              // a gets 5.
transaction_begin();    // level 2
c = b - 3;              // c gets 1.
b = a + 2;              // b gets 7.
a = c + 7;              // a gets 8.
transaction_commit();   // level 2.
transaction_commit();   // level 1.
[Figure: the log (frame pointer, end pointer) with the records (2, a), Aopen, (6, c), (4, b), (5, a) for levels 1 and 2; cache snapshots with columns Var, R1, R2, W1, W2, Val for a, b, c, ending with a = 8, b = 7, c = 1.]

59 Condition O1: No Writes to Data Written by Ancestors
Neither an open transaction Topen nor its commit and abort actions, Copen and Aopen, write any data written by Topen’s ancestors.

60 Example O1 Violation
counter = 0;            // initialize
transaction_begin();    // top-level
counter++;              // counter gets 1.
open_begin();           // level 2
counter++;              // counter gets 2.
// commit with an abort action.
open_commit(abort_action(decr(counter)));
...
// Abort and run abort action
// Expect counter to be 0.
...
transaction_commit();   // not executed.
In Nested LogTM this example works as expected, but in other implementations (that do not do eager updates) it might fail.
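Why eager updates matter for this example can be shown by simulating it twice in Python. Both functions are hypothetical models written for illustration: `run_eager` assumes in-place updates with a parent undo log and abort actions run before the parent’s undo records, and `run_lazy` assumes one possible lazy design in which the parent’s write sits in a private buffer while the open child publishes its value to memory at open commit. Other lazy designs may behave differently.

```python
def run_eager():
    memory = {"counter": 0}
    parent_undo = []                   # parent's log frame
    abort_actions = []

    def decr():
        memory["counter"] -= 1

    # top-level transaction: counter++
    parent_undo.append(("counter", memory["counter"]))
    memory["counter"] += 1             # counter gets 1, in place

    # open nested transaction: counter++, then open commit
    memory["counter"] += 1             # counter gets 2, in place
    abort_actions.append(decr)         # child's own undo record is discarded

    # parent aborts: abort actions (LIFO), then the parent's undo records
    for action in reversed(abort_actions):
        action()
    for addr, old in reversed(parent_undo):
        memory[addr] = old
    return memory["counter"]


def run_lazy():
    memory = {"counter": 0}
    parent_buffer = {}                 # parent's writes are buffered
    abort_actions = []

    def decr():
        memory["counter"] -= 1

    # top-level transaction: counter++ goes to the buffer only
    parent_buffer["counter"] = memory["counter"] + 1

    # open nested transaction reads 1 from the parent, writes 2,
    # and its open commit publishes 2 to memory
    memory["counter"] = parent_buffer["counter"] + 1
    abort_actions.append(decr)

    # parent aborts: run abort actions, then drop the buffer
    for action in reversed(abort_actions):
        action()
    parent_buffer.clear()
    return memory["counter"]
```

In this model `run_eager()` returns 0, the expected value, because the parent’s undo record restores the original counter after the compensation runs. `run_lazy()` returns 1: the child’s published value included the parent’s uncommitted increment, and discarding the buffer cannot take it back.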

61 Escape Actions
“Real world” is not transactional
Current OS’s are not transactional
Systems should allow non-transactional escapes from a transaction
Interact with OS, VM, devices, etc.

62 Escape Actions – First Class
Keep a per-thread “Escape” bit.
Escape actions read the most recent values from memory (even uncommitted ones).
Escape actions never abort or stall.
Similar to an open transaction, an escape action may register commit/abort actions.
Commit/abort actions of an escape action run in escape mode.

63 Conclusions
Closed nesting is easy to implement, and may allow partial rollback to improve efficiency.
Open nesting improves concurrency at the cost of higher-level atomicity and isolation and a more complex software implementation.
Using open nesting, it is possible to provide non-transactional operations inside transactions.

64 QUESTIONS?

65 The End 10/06/2007

