Cache Coherence Protocols:

What is Cache Coherence? When one core writes a value into its own cache, the other cores must be able to observe that write when they later read the same location out of their own caches. Coherence provides the underlying guarantees the programmer relies on for data validity. Note that a single large L1 cache shared by all cores is not a practical alternative: it cannot keep up with the request rate of multiple processors, so it lowers throughput.

Cache Coherence: Do we need it?

Coherence Property - I: A read R of address X on core C0 returns the value written by the most recent write W to X on C0, provided no other core has written to X between W and R.

Coherence Property - II: If C0 writes to X and C1 reads X after a sufficient time, with no other writes in between, then C1's read returns the value written by C0.

Coherence Property - III: Writes to the same location are serialized: any two or more writes to X must be observed in the same order by all cores.
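To see why these properties matter, here is a minimal sketch (all class and variable names are illustrative, not from any real simulator) of two cores with private caches and no coherence mechanism: C1 keeps reading a stale copy of X after C0 writes it, violating Property II.

```python
# Toy model of two cores with private, incoherent caches.
memory = {"X": 0}

class Core:
    def __init__(self):
        self.cache = {}                 # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:      # miss: fetch from memory
            self.cache[addr] = memory[addr]
        return self.cache[addr]         # hit: may return a stale value

    def write(self, addr, value):
        self.cache[addr] = value        # update own copy...
        memory[addr] = value            # ...and write through to memory

c0, c1 = Core(), Core()
c1.read("X")          # C1 caches X = 0
c0.write("X", 42)     # C0 writes X = 42 (memory is updated too)
stale = c1.read("X")  # C1 still sees its cached 0: Property II violated
print(stale)          # -> 0
```

Nothing here ever tells C1 that its copy is out of date; the snooping protocols below exist precisely to close that gap.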

How to get Cache Coherence? Option 1: no caches (bad performance). Option 2: all cores share the same L1 cache (bad performance). Option 3: force a read in one cache to see a write made in another, by broadcasting writes so other caches can update their copies (write-update coherence).

Without Write-Update Snooping Coherence (the initial problem):

Write-Update Snooping (Issue Resolved) - II. Snooping: cache 0 monitors cache 1's write to block A over the shared bus. Update: when the write is seen, every core's cache holding a copy of memory block A updates its value.

Multiple writes stay synchronized (via broadcast):
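The write-update behavior described above can be sketched as a toy bus model (class names are hypothetical): every write is broadcast on the bus, and each snooping cache that holds the block updates its copy, so repeated writes from different cores stay synchronized.

```python
# Toy write-update snooping model with write-through caches.
memory = {"A": 0}

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        memory[addr] = value                 # write-through: memory updated
        for cache in self.caches:
            if cache is not writer and addr in cache.data:
                cache.data[addr] = value     # snooping caches update copies

class Cache:
    def __init__(self, bus):
        self.data = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.data:
            self.data[addr] = memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value
        self.bus.broadcast_write(self, addr, value)

bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.read("A"); c1.read("A")   # both caches hold A = 0
c0.write("A", 1)             # broadcast: c1's copy becomes 1
c1.write("A", 2)             # broadcast: c0's copy becomes 2
print(c0.read("A"), c1.read("A"), memory["A"])  # -> 2 2 2
```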

Write-Update Enhanced Version - I (Avoid Memory Writes): In the previous write-update protocol, every write must be broadcast on the bus and also sent to memory (write-through caches). Adding a dirty bit to each cache block lets us delay the memory write until the block is replaced (evicted) from the cache.

Dirty Bit: a set dirty bit means that memory needs to be updated (the block must eventually be written back to RAM) and that only the cache holding the dirty block has the up-to-date value.

Multiple Writes and Dirty-Block Replacement: memory is not updated until the dirty block is replaced.

Writing from a different Cache:

Dirty Bit Benefits: Write to memory only when a dirty block is replaced. Read from memory only if no cache holds the block in a dirty state; otherwise the read is served by the cache holding the dirty block. This significantly reduces read and write transactions to memory.
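A minimal sketch of the dirty-bit scheme (names are illustrative): writes mark the block dirty instead of going to memory, a read miss is served by the dirty owner if one exists, and memory is written back only on eviction.

```python
# Toy write-back model with a dirty bit: memory writes are deferred
# until the dirty block is evicted.
memory = {"A": 0}

class Cache:
    def __init__(self, caches):
        self.data, self.dirty = {}, {}
        self.peers = caches
        caches.append(self)

    def read(self, addr):
        if addr not in self.data:
            # If a peer holds the block dirty, it supplies the data.
            owner = next((c for c in self.peers
                          if c is not self and c.dirty.get(addr)), None)
            self.data[addr] = owner.data[addr] if owner else memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value
        self.dirty[addr] = True          # memory NOT updated yet
        for c in self.peers:             # write-update: refresh other copies
            if c is not self and addr in c.data:
                c.data[addr] = value
                c.dirty[addr] = False    # only the writer owns the dirty copy

    def evict(self, addr):
        if self.dirty.pop(addr, False):  # write back only if dirty
            memory[addr] = self.data[addr]
        self.data.pop(addr, None)

caches = []
c0, c1 = Cache(caches), Cache(caches)
c0.write("A", 7)          # dirty in c0; memory still holds 0
print(memory["A"])        # -> 0
print(c1.read("A"))       # -> 7 (supplied by c0's cache, not memory)
c0.evict("A")             # write-back happens on replacement
print(memory["A"])        # -> 7
```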

Write-Update Optimization #2 (Bus Optimization): Motivation: we have reduced read/write traffic to memory; now we need to do the same for the bus, since the bus is the bottleneck of the system.

Write to the same memory location when shared (S = 1): the shared bit is set, so the write must still be broadcast on the bus.

Broadcast a write on the bus only when the block is shared among cores:
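The shared-bit optimization can be sketched as follows (names are hypothetical; writes are kept write-through here for simplicity, whereas a real design would combine this with the dirty-bit scheme above): each block carries a shared (S) bit, and a write appears on the bus only when S = 1.

```python
# Toy sketch of the shared-bit (S) optimization: a write is broadcast
# on the bus only when the block is marked shared.
memory = {"A": 0}
bus_transactions = []      # counts broadcasts actually put on the bus

class Cache:
    def __init__(self, caches):
        self.data, self.shared = {}, {}
        self.peers = caches
        caches.append(self)

    def read(self, addr):
        if addr not in self.data:
            self.data[addr] = memory[addr]
            holders = [c for c in self.peers
                       if c is not self and addr in c.data]
            self.shared[addr] = bool(holders)  # shared if a peer has a copy
            for c in holders:
                c.shared[addr] = True          # peers now know it's shared
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value
        memory[addr] = value                   # write-through (for simplicity)
        if self.shared.get(addr):              # broadcast only if S = 1
            bus_transactions.append((addr, value))
            for c in self.peers:
                if c is not self and addr in c.data:
                    c.data[addr] = value

caches = []
c0, c1 = Cache(caches), Cache(caches)
c0.read("A")
c0.write("A", 1)               # S = 0: no bus traffic at all
print(len(bus_transactions))   # -> 0
c1.read("A")                   # now both hold A, so S = 1 in both
c0.write("A", 2)               # S = 1: broadcast so c1 stays up to date
print(len(bus_transactions), c1.data["A"])   # -> 1 2
```

The design point: a core writing to a private (unshared) block generates no bus traffic, which is the common case and is exactly where the bus bottleneck is relieved.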