Lecture 17: Transactional Memories I

Presentation transcript:

Lecture 17: Transactional Memories I
Papers:
- A Scalable Non-Blocking Approach to Transactional Memory, HPCA'07, Stanford
- The Common Case Transactional Behavior of Multi-threaded Programs, HPCA'06, Stanford
- Characterization of TCC on Chip Multiprocessors, PACT'05, Stanford

TM Overview
Recall the basic TM implementation:
- Every transaction maintains a read-set and a write-set
- Writes are not propagated until commit time
- At commit, the transaction acquires permission to commit from a central arbiter (no parallel commits for now)
- The write-set is sent to other nodes; if it intersects a node's read-set, that node's transaction is aborted and restarted
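The basic lazy-TM commit rule above can be sketched in software. This is a minimal illustration, not the hardware mechanism: the class and function names (`Transaction`, `central_commit`) are mine, and the "central arbiter" is modeled simply as serial execution of the commit routine.

```python
class Transaction:
    def __init__(self, tid):
        self.tid = tid
        self.read_set = set()   # addresses read inside the transaction
        self.write_set = {}     # address -> speculative value, buffered until commit
        self.aborted = False

    def read(self, addr, memory):
        self.read_set.add(addr)
        # reading your own speculative write is allowed
        return self.write_set.get(addr, memory.get(addr))

    def write(self, addr, value):
        self.write_set[addr] = value  # not propagated until commit time

def central_commit(committer, memory, others):
    """Serial commit via a central arbiter: propagate the write-set and
    abort any in-flight transaction whose read-set intersects it."""
    for t in others:
        if not t.aborted and t.read_set & committer.write_set.keys():
            t.aborted = True            # conflict: must restart
    memory.update(committer.write_set)  # writes become visible only now
```

In this sketch an aborted transaction would simply be re-executed from the start with fresh (empty) read and write sets.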

Implementation Issues
Design details:
- On attempting a write (while the Tx is not yet ready to commit), the Tx must obtain the most recently committed version of the block in read-only state (the block is not yet part of the read-set, though)
- At commit time, either write through to memory (there is no M state in the coherence protocol), or move from S to M state (write-back policy); a dirty line can't hold speculative writes, though
- For parallel commits: is an ordering implied between these transactions? Is a write-set/read-set conflict allowed? A read-set/write-set conflict? A write-set/write-set conflict?

Parallel Commits I
- Ordering is implied: a programmer believes that a transaction executes "in isolation"
- A write-set/read-set conflict should cause an abort; hence, the second transaction must either confirm there is no conflict before propagating writes, or propagate writes in a manner that does not affect correctness (it can't employ write-through or write-update, and can't respond to others' read requests, or must keep track of dependences)
- A read-set/write-set conflict need not cause an abort: the ordering indicates that the first transaction need not abort
- A write-set/write-set conflict need not cause an abort, but a mechanism is required to ensure that everyone sees the writes in the same correct order
* Reading your own write is truly not a problem, but since information is maintained at block granularity, the block is included in the read-set

Parallel Commits II
- Conflicting writes must be merged correctly (note that the writes may be to different words in the same line)
- If we're not checking early for write-set/read-set conflicts, the first transaction must inform the next transaction after it is done (has received all acks)
- Consistency model: two parallel transactions send their write-sets to other nodes over unordered networks:
  - Just as we saw for SC, a reader must not proceed until everyone has seen the write (the entire transaction need not have committed)
  - Writes to a location must be serialized
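The first bullet above, merging writes to different words of the same cache line, can be sketched as follows. This is an illustrative model only (the helper name and word-indexed line representation are mine): applying write-sets in TID order makes a later transaction's write win on any true overlap.

```python
def merge_line(base_line, writes_in_tid_order):
    """base_line: list of words in one cache line.
    writes_in_tid_order: per-transaction lists of (word_index, value),
    already sorted by commit (TID) order."""
    line = list(base_line)
    for tx_writes in writes_in_tid_order:  # apply in commit order
        for idx, val in tx_writes:
            line[idx] = val                # later commits overwrite earlier ones
    return line
```

Because each transaction's writes are applied atomically and in the same order everywhere, every node converges on the same final line contents, which is exactly the write-serialization requirement in the last bullet.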

The Stanford Approach
- Transactions are ordered (by contacting a central agent)
- A transaction first engages in validation and proceeds with commit only if it is guaranteed to have no conflicts
- Write-set/read-set conflicts are not allowed: Ti checks its read-set directories and proceeds only after those directories have seen all previous writes
- Read-set/write-set conflicts are allowed: Ti will ignore write notices from transactions > i
- Write-set/write-set conflicts are allowed; to maintain write serialization, Ti confirms that a directory is done with all transactions < i
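The TID-based conflict rules above reduce to a simple abort test. A sketch under my own naming (`must_abort` is not from the paper): a transaction with TID i aborts only when an earlier transaction's write-set touches its read-set; write notices from later TIDs are ignored, since the ordering already places this transaction before them.

```python
def must_abort(my_tid, my_read_set, notice_tid, notice_write_set):
    """Decide whether a write notice from transaction notice_tid forces
    transaction my_tid to abort."""
    if notice_tid > my_tid:
        return False  # RS/WS conflict with a later Tx is allowed: ignore it
    # An earlier Tx wrote something we read: WS/RS conflict, must abort
    return bool(my_read_set & notice_write_set)
```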

Algorithm
1. Probe your write-set directories to see if it is your turn to write (helps serialize writes)
2. Let the other directories know that you don't plan to write there (thereby allowing parallel commits to unrelated directories)
3. Mark your write-set (helps hide latency)
4. Probe your read-set directories to see if previous writes have completed
5. Validation is now complete; send the actual commit message to the write-set
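The five steps above can be sequenced as follows. This only illustrates the ordering, not the real message-passing hardware; the `Directory` class and its NS ("next TID to service") counter are my toy stand-ins, with the probes modeled as assertions rather than retried messages.

```python
class Directory:
    """Toy directory with an NS counter, as in the slides' example."""
    def __init__(self):
        self.ns = 1                # next TID this directory expects

    def wait_until_turn(self, tid):
        # in hardware, this probe is retried until NS reaches tid
        assert self.ns == tid, "would keep probing"

    def skip(self, tid):
        if self.ns == tid:
            self.ns += 1           # no write coming from tid: later Txs proceed

    def mark(self, tid):
        pass                       # marking early hides the commit latency

    def wait_done_below(self, tid):
        # read-set probe: all transactions < tid must be done here
        assert self.ns >= tid, "would keep probing"

    def commit(self, tid):
        self.ns = tid + 1          # this directory is done with tid

def validate_and_commit(tid, write_dirs, read_dirs, all_dirs):
    for d in write_dirs:               # 1. serialize writes: wait for our turn
        d.wait_until_turn(tid)
    for d in all_dirs - write_dirs:    # 2. notify unrelated directories
        d.skip(tid)
    for d in write_dirs:               # 3. mark the write-set early
        d.mark(tid)
    for d in read_dirs:                # 4. previous writes must have completed
        d.wait_done_below(tid)
    for d in write_dirs:               # 5. validation done: commit
        d.commit(tid)
```

Note how step 2 is what enables parallel commits: a directory that receives "skip" notices from every lower TID can service a later transaction's write without waiting for those transactions to finish elsewhere.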

Example
(A sequence of slides walks this through two processors, a TID vendor, and two directories: one holding X and Z, the other holding Y. Only the caption text of the figures is recoverable.)
- P1 (Rd X, Wr X) and P2 (Rd Y, Wr Z) each obtain a TID from the TID vendor: P1 gets TID=1, P2 gets TID=2
- Each processor probes its write-set directory to see if it can proceed, and notifies the other directory that no writes are coming ("No writes here. I'm done with you."); since neither transaction writes to Y, that directory's NS counter advances from 1 to 3
- Both writes (X and Z) target the same directory, so P1 can go ahead with its write and marks X, while P2 must wait its turn; the mark messages hide the latency of the subsequent commit
- P1 probes its read-set directory to make sure previous transactions are done, then commits, advancing the X/Z directory's NS to 2 and invalidating sharers (which may cause aborts); P2 keeps probing and waiting
- P2's probe finally succeeds: it can mark Z, then check its read-set, then proceed with its commit


Issues
- The protocol requires many messages, though this enables latency hiding for the commit process
- There may be high directory locality on a NUMA machine, but possibly not on a single large-scale CMP with a directory-based cache-coherence protocol
- To support a write-back cache policy, a new non-speculative cache is introduced (so that a transaction does not speculatively overwrite a dirty line)

Evaluation

Results
- Applications with small transactions suffer more from commit latency

Results

Transaction Characteristics
An evaluation of 35 multi-threaded programs:
- Transaction length
- Sizes of read and write sets

Transaction Characteristics
