Transactional Memory Coherence and Consistency

Presentation transcript:

Transactional Memory Coherence and Consistency
Presented to CS258 on 4/28/08 by David McGrogan

Review: Multiprocessor configurations
- Message passing
  - Simple hardware
  - Tricky to program
- Shared memory
  - Complex hardware
  - No implicit synchronization
  - Sometimes easier to program

Something new: TCC
- Transactional memory Coherence and Consistency model
- Provides transactional shared memory
- Requires specialized hardware
- Relies on broadcast and snooping
- Not infinitely scalable

Something new: TCC
- Implements Conditional Critical Regions (CCRs):
    atomic (condition) { statements; }
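For comparison, here is a minimal lock-based C++ sketch of what a CCR means: the body waits until the condition holds and then executes atomically. The queue example and names are mine, and this only illustrates the semantics; TCC provides the atomicity in hardware, without locks.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical lock-based rendering of:
//   atomic (!q.empty()) { item = q.front(); q.pop(); }
std::mutex m;
std::condition_variable cv;
std::queue<int> q;

int pop_when_available() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return !q.empty(); });  // block until the CCR condition holds
    int item = q.front();                      // body runs atomically under the lock
    q.pop();
    return item;
}

void push_item(int item) {
    {
        std::lock_guard<std::mutex> lock(m);
        q.push(item);
    }
    cv.notify_one();  // wake waiters whose condition may now hold
}
```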

TCC protocol
- Writes within a transaction are buffered locally
- System-wide commit arbitration
- At commit, all of a transaction's writes are broadcast as a single packet
- Different commits cannot overlap
- Transactions can be ordered via hardware-managed 'phase numbers'
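The sketch below is a simplified software model of this commit flow, assuming a single global arbiter; the Node and CommitArbiter types and their fields are illustrative, not taken from the paper.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <vector>

using Addr  = std::uintptr_t;
using Value = std::uint64_t;

// One node: speculative writes buffered locally, read-set tracked so that
// broadcast commit packets from other nodes can be checked for conflicts.
struct Node {
    std::set<Addr>        read_set;      // addresses read in the current transaction
    std::map<Addr, Value> write_buffer;  // speculative writes, not yet globally visible
    bool violated = false;

    // Another node's committed write-set arrives as a single broadcast packet.
    void snoop_commit(const std::map<Addr, Value>& packet) {
        for (const auto& w : packet)
            if (read_set.count(w.first))
                violated = true;  // we read data that is now stale: restart transaction
    }
};

// Global arbiter: holding the token serializes commits, so two commit packets
// never overlap on the interconnect.
struct CommitArbiter {
    std::mutex commit_token;

    void commit(Node& committer, std::vector<Node*>& others,
                std::map<Addr, Value>& memory) {
        std::lock_guard<std::mutex> lock(commit_token);  // system-wide arbitration
        for (Node* n : others)
            n->snoop_commit(committer.write_buffer);     // broadcast to all other nodes
        for (const auto& w : committer.write_buffer)
            memory[w.first] = w.second;                  // writes become visible at once
        committer.write_buffer.clear();
        committer.read_set.clear();
    }
};
```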

TCC hardware
- Extra cache bits
- Write buffer
- Commit control
- Optional:
  - Shadow registers
  - Victim buffer

TCC hardware: cache bits
- Read (can be multiple per line)
- Modified
- Renamed (multiple): the entire associated word/byte has already been written, so future reads cannot cause violations
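A rough illustration of the per-line state follows, assuming 64-byte lines tracked at 4-byte word granularity; the field names and bit layout are guesses for clarity, not the paper's exact encoding.

```cpp
#include <cstdint>

// Illustrative per-cache-line TCC metadata (not the paper's exact layout).
struct TccLineState {
    static constexpr int kWordsPerLine = 16;

    std::uint16_t read_bits    = 0;      // per-word: read by the current transaction
    bool          modified     = false;  // line holds speculative (uncommitted) data
    std::uint16_t renamed_bits = 0;      // per-word: fully written before any read, so
                                         // later local reads cannot cause violations

    // A snooped committed write to this word is a violation only if the word
    // was read before being renamed (rewritten) locally.
    bool causes_violation(int word) const {
        std::uint16_t mask = std::uint16_t(1u << word);
        return (read_bits & mask) && !(renamed_bits & mask);
    }
};
```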

TCC hardware: write buffer and commit control
- Write buffer
  - Hardware-based packet buffer
  - Frees the processor to do more useful work
- Commit control
  - Stores phase numbers for all other nodes
  - Greatly reduces arbitration request traffic
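The commit-control idea can be sketched as below, under the assumption that a node running ordered transactions only arbitrates for commit once no peer is still in an older phase; the names and the exact rule are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each node caches the last phase number it has observed from every peer.
// A node only bothers to request commit arbitration once no peer is still
// in an older phase, which cuts arbitration request traffic.
struct CommitControl {
    std::uint32_t              my_phase = 0;
    std::vector<std::uint32_t> peer_phase;  // last phase number seen per peer

    bool may_request_commit() const {
        for (std::uint32_t p : peer_phase)
            if (p < my_phase)
                return false;  // a peer in an older phase must commit first
        return true;
    }

    void observe_peer(std::size_t peer, std::uint32_t phase) {
        peer_phase[peer] = phase;
    }
};
```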

Optional TCC hardware
- Shadow registers
  - Copy the core registers at transaction start
  - Permit rollback
  - Can be done in software
- Victim buffer
  - Holds flushed read/modified cache lines
  - If absent, the processor must wait for and hold commit permission until the transaction commits
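The software-rollback option can be pictured with a setjmp/longjmp analogue: the checkpoint plays the role of the shadow registers, and a violation restores it and re-executes the transaction. The hooks below are stand-ins, and discarding the buffered memory state is not shown.

```cpp
#include <csetjmp>

static std::jmp_buf tx_checkpoint;  // plays the role of the shadow registers
static int attempts = 0;

static bool violation_detected() { return attempts < 2; }  // pretend the first try conflicts
static void transaction_body()   { ++attempts; }           // speculative work goes here

void run_transaction() {
    setjmp(tx_checkpoint);               // checkpoint processor state at transaction start
    transaction_body();
    if (violation_detected())
        std::longjmp(tx_checkpoint, 1);  // violation: restore the checkpoint and re-execute
}
```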

Errata
- I/O
  - Input can only be accepted if rollback is impossible
  - Commit permission is requested immediately
- Hardware optimizations
  - Break up large transactions
  - Merge small transactions
  - Force forward progress by increasing the phase number
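A tiny sketch of the I/O rule, with hypothetical names: input cannot be un-read, so the transaction obtains commit permission first, making rollback impossible before the input is accepted.

```cpp
// Hypothetical handling of input inside a transaction; all names are stand-ins.
struct TxContext {
    bool holds_commit_permission = false;
};

void acquire_commit_permission(TxContext& tx) {
    tx.holds_commit_permission = true;  // would arbitrate with the commit controller
}

int read_input(TxContext& tx, int (*device_read)()) {
    if (!tx.holds_commit_permission)
        acquire_commit_permission(tx);  // requested immediately, per the TCC I/O rule
    return device_read();               // safe: this transaction can no longer roll back
}
```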

Cache state size
- Most benchmarks operate within 12 kB of read state and 8 kB of write state
- This easily fits in the smallest of modern caches

Performance (ideal)
- Limited by sequential code, load imbalance, and inter-transaction dependencies

Performance Errata
- More realistic simulations
  - Viability retained in most cases
  - Sometimes crippled by very high commit arbitration times (200 cycles)
- Various optimizations had little effect
  - Double-buffering the write buffer
  - Extra read bits (sub-line read records)