1 Lecture 10: TM Implementations Topics: wrap-up of eager implementation (LogTM), scalable lazy implementation.

Slides:



Advertisements
Similar presentations
CM20145 Concurrency Control
Advertisements

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
1 Lecture 18: Transactional Memories II Papers: LogTM: Log-Based Transactional Memory, HPCA’06, Wisconsin LogTM-SE: Decoupling Hardware Transactional Memory.
1 Lecture 6: Directory Protocols Topics: directory-based cache coherence implementations (wrap-up of SGI Origin and Sequent NUMA case study)
Cache Optimization Summary
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 10: TM Pathologies Topics: scalable lazy implementation, paper on TM performance pathologies.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 4: Directory-Based Coherence Details of memory-based (SGI Origin) and cache-based (Sequent NUMA-Q) directory protocols.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 8: Eager Transactional Memory Topics: implementation details of eager TM, various TM pathologies.
1 Lecture 8: Transactional Memory – TCC Topics: “lazy” implementation (TCC)
1 Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.
CS252/Patterson Lec /23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem.
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
1 Lecture 6: TM – Eager Implementations Topics: Eager conflict detection (LogTM), TM pathologies.
Scalable, Reliable, Power-Efficient Communication for Hardware Transactional Memory Seth Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar and Rajeev.
1 Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of “lazy” TM.
1 Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations.
1 Lecture 5: TM – Lazy Implementations Topics: TM design (TCC) with lazy conflict detection and lazy versioning, intro to eager conflict detection.
1 Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM.
NUMA coherence CSE 471 Aut 011 Cache Coherence in NUMA Machines Snooping is not possible on media other than bus/ring Broadcast / multicast is not that.
1 Lecture 9: TM Implementations Topics: wrap-up of “lazy” implementation (TCC), eager implementation (LogTM)
1 Lecture 7: Lazy & Eager Transactional Memory Topics: details of “lazy” TM, scalable lazy TM, implementation details of eager TM.
1 Lecture 3: Directory-Based Coherence Basic operations, memory-based and cache-based directories.
1 Lecture 20: Protocols and Synchronization Topics: distributed shared-memory multiprocessors, synchronization (Sections )
LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood Presented by Colleen Lewis.
Distributed Deadlocks and Transaction Recovery.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
1 Lecture 24: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04.
1 Lecture 10: Transactional Memory Topics: lazy and eager TM implementations, TM pathologies.
Lecture 8: Snooping and Directory Protocols
Lecture 20: Consistency Models, TM
Cache Coherence: Directory Protocol
Multiprocessor Cache Coherency
Two Ideas of This Paper Using Permissions-only Cache to deduce the rate at which less-efficient overflow handling mechanisms are invoked. When the overflow.
Lecture 19: Transactional Memories III
Lecture 9: Directory-Based Examples II
Lecture 2: Snooping-Based Coherence
Lecture 11: Transactional Memory
Lecture 12: TM, Consistency Models
Lecture: Transactional Memory, Networks
Lecture 6: Transactions
Lecture 17: Transactional Memories I
Lecture 21: Transactional Memory
Lecture 8: Directory-Based Cache Coherence
Lecture 7: Directory-Based Cache Coherence
Lecture 22: Consistency Models, TM
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
Lecture 9: Directory Protocol Implementations
Lecture 25: Multiprocessors
Lecture 9: Directory-Based Examples
Lecture 10: Consistency Models
Lecture 4: Synchronization
Lecture 8: Directory-Based Examples
Lecture 25: Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors
Lecture 24: Multiprocessors
Lecture 3: Coherence Protocols
Lecture 23: Transactional Memory
Lecture 21: Transactional Memory
Lecture: Consistency Models, TM
Lecture: Transactional Memory
Lecture 18: Coherence and Synchronization
Lecture 10: Directory-Based Examples II
Lecture 11: Consistency Models
Presentation transcript:

1 Lecture 10: TM Implementations Topics: wrap-up of eager implementation (LogTM), scalable lazy implementation

2 Eager Overview Topics: Logs Log optimization Conflict examples Handling deadlocks Sticky scenarios Aborts/commits/parallelism

3 “Eager” Implementation (Based Primarily on LogTM) A write is made permanent immediately (we do not wait until the end of the transaction) This means that if some other transaction attempts a read, the latest value is returned and the memory may also be updated with this latest value Can’t lose the old value (in case this transaction is aborted) – hence, before the write, we copy the old value into a log (the log is some space in virtual memory -- the log itself may be in cache, so not too expensive) This is eager versioning

4 Versioning Every write first requires a read and a write to log the old value – the log is maintained in virtual memory and will likely be found in cache Aborts are uncommon – typically only when the contention manager kicks in on a potential deadlock; the logs are walked through in reverse order If a block is already marked as being logged (wr-set), the next write by that transaction can avoid the re-log Log writes can be placed in a write buffer to reduce contention for L1 cache ports

5 Conflict Detection and Resolution Since Transaction-A’s writes are made permanent rightaway, it is possible that another Transaction-B’s rd/wr miss is re-directed to Tr-A At this point, we detect a conflict (neither transaction has reached its end, hence, eager conflict detection): two transactions handling the same cache line and at least one of them does a write One solution: requester stalls: Tr-A sends a NACK to Tr-B; Tr-B waits and re-tries again; hopefully, Tr-A has committed and can hand off the latest cache line to B  neither transaction needs to abort

6 Deadlocks Can lead to deadlocks: each transaction is waiting for the other to finish Need a separate (hw/sw) contention manager to detect such deadlocks and force one of them to abort Tr-A Tr-B write X write Y … … read Y read X Alternatively, every transaction maintains an “age” and a young transaction aborts and re-starts if it is keeping an older transaction waiting and itself receives a nack from an older transaction

7 Block Replacement If a block in a transaction’s rd/wr-set is evicted, the data is written back to memory if necessary, but the directory continues to maintain a “sticky” pointer to that node (subsequent requests have to confirm that the transaction has committed before proceeding) The sticky pointers are lazily removed over time (commits continue to be fast)

8 Scalable Algorithm – Lazy Implementation Probe your write-set to see if it is your turn to write (helps serialize writes) Let others know that you don’t plan to write (thereby allowing parallel commits to unrelated directories) Mark your write-set (helps hide latency) Probe your read-set to see if previous writes have completed Validation is now complete – send the actual commit message to the write set

9 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 1

10 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 1 TID=1 TID=2

11 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 1NS: 3 TID=1 TID=2 Probe to write-set to see if it can proceed No writes here. I’m done with you. P2 sends the same set of probes/notifications

12 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 1NS: 3 TID=1 TID=2 Must wait my turn Can go ahead with my wr Mark X Mark messages are hiding the latency for the subsequent commit

13 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 1NS: 3 TID=1TID=2 Keep probing and waiting Probe read set and make sure they’re done

14 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 2NS: 3 TID=1TID=2 Keep probing and waiting Commit Invalidate sharers; May cause aborts

15 Example P1 D: X Z P2 D: Y TID Vendor Rd X Wr X Rd Y Wr Z NS: 2NS: 3 TID=1TID=2 P2: Probe finally successful. Can mark Z. Will then check read-set. Then proceed with commit

16 Title Bullet