James Archibald and Jean-Loup Baer CS258 (Prof. John Kubiatowicz)

Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model
James Archibald and Jean-Loup Baer
CS258 (Prof. John Kubiatowicz), March 19, 2008
Presentation by: Marghoob Mohiuddin

Outline
- Cache coherence protocols for shared-bus multiprocessors
  - Write-back caches: Write-Once, Synapse, Berkeley, Illinois, Firefly, Dragon
- Simulation
  - Workload modeled probabilistically, with private blocks and shared blocks
  - Cache hits and misses occur with fixed probability

Write-Once
- Dirty → memory write on replace; invalidations
- Reserved means written once: the block is dirty with respect to other caches, but memory is up to date
- Read miss: block supplied by the Dirty copy or from memory; Dirty → Valid
- Write hit: no bus transaction if already written once (Reserved → Dirty, Dirty → Dirty); Valid → write the word through to memory, other caches invalidate
- Write miss: other caches invalidate
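As a rough sketch (not the paper's code), the Write-Once transitions above can be encoded as a small state machine; the operation and bus-action names here are made up for illustration:

```python
from enum import Enum

class WO(Enum):
    INVALID = "Invalid"
    VALID = "Valid"        # clean, possibly shared
    RESERVED = "Reserved"  # written once; memory is up to date
    DIRTY = "Dirty"        # written repeatedly; memory is stale

def write_once_cpu(state, op):
    """Local CPU access. Returns (new_state, bus_action or None)."""
    if op == "read_miss":
        # Block comes from memory, or from a Dirty cache that also
        # updates memory as it supplies the block.
        return WO.VALID, "bus_read"
    if op == "write_hit":
        assert state is not WO.INVALID, "a hit implies a valid copy"
        if state is WO.VALID:
            # First write is written through; other copies invalidate.
            return WO.RESERVED, "write_through_word"
        # Reserved or Dirty: later writes stay local, no bus traffic.
        return WO.DIRTY, None
    if op == "write_miss":
        return WO.DIRTY, "bus_read_invalidate"
    raise ValueError(op)

def write_once_snoop(state, bus_op):
    """Reaction to a bus transaction issued by another cache."""
    if bus_op == "bus_read":
        # A Dirty copy supplies the block and updates memory; Reserved
        # is already current in memory. Both drop back to Valid.
        return WO.VALID if state in (WO.RESERVED, WO.DIRTY) else state
    if bus_op in ("write_through_word", "bus_read_invalidate"):
        return WO.INVALID
    raise ValueError(bus_op)
```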

Synapse
- Dirty → memory write on replace; no invalidations
- Owner: the cache with the Dirty copy, or memory; a 1-bit tag per block in memory records whether memory owns it
- The block always comes from memory (no cache-to-cache transfers)
- Read miss: a Dirty copy is first written back to memory, Dirty → Invalid
- Write hit: Dirty → no bus transaction; Valid → treated as a write miss
- Write miss: same as a read miss, but the block is loaded as Dirty

Berkeley
- Dirty/Shared-Dirty → memory write on replace
- Invalidations and cache-to-cache transfers; dirty blocks are not written to memory on being shared
- Read miss: the owner supplies the block, Dirty → Shared-Dirty
- Write hit: invalidate other copies, change to Dirty
- Write miss: the owner supplies the block, other copies are invalidated, loaded as Dirty

Illinois
- Dirty → memory write on replace; invalidations
- The requesting cache can determine where the block came from (another cache vs. memory)
- Read miss: supplied from a cached copy if possible; a Dirty copy is written to memory and all copies become Shared; no cached copies → Valid-Exclusive
- Write hit: shared copies invalidated, change to Dirty
- Write miss: similar to a read miss, but other copies are invalidated
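The Illinois transitions (the ancestor of MESI) can be sketched the same way; this is an illustrative state machine, with invented operation names, not the paper's code:

```python
from enum import Enum

class IL(Enum):
    INVALID = "Invalid"
    EXCLUSIVE = "Valid-Exclusive"   # only cached copy, clean
    SHARED = "Shared"
    DIRTY = "Dirty"

def illinois_cpu(state, op, others_have_copy):
    """Local access; others_have_copy is what the snooped bus reveals.
    Returns (new_state, bus_action or None)."""
    if op == "read_miss":
        # Another cache supplies the block if it can; a Dirty supplier
        # also writes it back to memory, so every copy ends up Shared.
        return (IL.SHARED if others_have_copy else IL.EXCLUSIVE), "bus_read"
    if op == "write_hit":
        if state in (IL.EXCLUSIVE, IL.DIRTY):
            return IL.DIRTY, None          # sole copy: no bus traffic
        return IL.DIRTY, "invalidate"      # Shared: kill the other copies
    if op == "write_miss":
        # Like a read miss, but other copies are invalidated.
        return IL.DIRTY, "bus_read_invalidate"
    raise ValueError(op)
```

The Valid-Exclusive state is what makes later write hits to private blocks free of bus traffic, which is why Illinois does well on private-block workloads.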

Firefly
- Dirty → memory write on replace; no invalidations; SharedLine
- Read miss: supplied from a cached copy if possible, with the SharedLine raised; a Dirty block is written to memory; no cached copies → Valid-Exclusive
- Write hit: Shared → write through to memory and update the shared copies; SharedLine decides Valid vs. Valid-Exclusive
- Write miss: supplied from a cached copy if possible; write on the bus to update shared copies

Dragon
- Shared-Dirty/Dirty → memory write on replace; no invalidations; SharedLine
- Read miss: block from the Dirty copy or from memory; SharedLine decides Shared-Clean vs. Valid-Exclusive
- Write hit: no memory write; if shared, other caches update their copy; SharedLine decides Shared-Dirty vs. Dirty
- Write miss: supplied from a cached copy if possible; write on the bus to update shared copies
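For contrast with the invalidation protocols, the update-based Dragon transitions might be sketched as follows; again the operation and signal names are illustrative assumptions:

```python
from enum import Enum

class DR(Enum):
    EXCLUSIVE = "Valid-Exclusive"    # only cached copy, clean
    SHARED_CLEAN = "Shared-Clean"
    SHARED_DIRTY = "Shared-Dirty"    # shared; this cache owes the write-back
    DIRTY = "Dirty"

def dragon_cpu(state, op, shared_line):
    """Local access; shared_line is the wired-OR 'another cache holds
    this block' bus signal. Returns (new_state, bus_action or None)."""
    if op == "read_miss":
        # Block comes from a Dirty copy or from memory.
        return (DR.SHARED_CLEAN if shared_line else DR.EXCLUSIVE), "bus_read"
    if op == "write_hit":
        if shared_line:
            # No invalidation: broadcast the word so sharers update in place.
            return DR.SHARED_DIRTY, "bus_update"
        return DR.DIRTY, None              # no memory write in either case
    if op == "write_miss":
        # Fetch the block, then update any sharers over the bus.
        if shared_line:
            return DR.SHARED_DIRTY, "bus_read+bus_update"
        return DR.DIRTY, "bus_read"
    raise ValueError(op)
```

Unlike Firefly, a write hit to a shared block updates only the other caches, not memory; the Shared-Dirty owner carries the eventual write-back.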

Simulation Model: Multiprocessor
- Processor: works for w cycles, generates a memory request, waits for the response from its cache
- Cache: bus commands have higher priority than processor requests
- Bus: services requests from caches in FIFO order
- Request types: read miss, write miss, dirty-block write-back, request for write permission / invalidate / write broadcast
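A toy discrete-time version of this driver might look as follows; the parameter values, the single shared bus queue, and the instant-hit simplification are assumptions for illustration, not the paper's model:

```python
import random
from collections import deque

def simulate(num_procs=4, w=5, miss_prob=0.05, service_cycles=4,
             total_cycles=25000, seed=0):
    """Each processor computes for w cycles, then issues a memory
    request; a hit is serviced instantly (a simplification), a miss
    queues on a FIFO bus and stalls the processor until serviced."""
    rng = random.Random(seed)
    remaining = [w] * num_procs        # compute cycles left before next request
    blocked = [False] * num_procs      # waiting on the bus?
    bus = deque()                      # FIFO of (processor, cycles_left)
    busy = [0] * num_procs             # cycles each processor spent computing

    for _ in range(total_cycles):
        # The bus services one request at a time, in FIFO order.
        if bus:
            proc, left = bus[0]
            if left == 1:
                bus.popleft()
                blocked[proc] = False
                remaining[proc] = w
            else:
                bus[0] = (proc, left - 1)
        for p in range(num_procs):
            if blocked[p]:
                continue
            busy[p] += 1
            remaining[p] -= 1
            if remaining[p] == 0:
                if rng.random() < miss_prob:
                    blocked[p] = True
                    bus.append((p, service_cycles))
                else:
                    remaining[p] = w   # hit: keep computing
    # "System power" in the paper's sense: sum of processor utilizations.
    return sum(b / total_cycles for b in busy)
```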

Simulation Model: Workload
- Shared and private cache blocks; a private block is never present in other caches
- Processor generates requests with P(shared) = shd and P(read) = rd
- Private-block requests modeled probabilistically: P(hit) = h; on a write hit, P(block already modified) = wmd
- A fixed number of shared blocks is represented explicitly, with a higher probability of accessing a recently accessed block; more shared blocks → less actual sharing
- Replacement: blocks chosen at random; P(a shared block is chosen) ∝ number of shared blocks in the cache; P(a replaced private block is modified) = md
- md, wmd, and rd are not independent
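A sketch of one draw from this reference generator, under assumed (not the paper's) parameter values:

```python
import random

def gen_request(rng, shd=0.3, rd=0.7, h=0.95, wmd=0.5):
    """One processor reference, drawn with the model's fixed
    probabilities: P(shared) = shd, P(read) = rd; for private blocks
    P(hit) = h, and a write hit finds the block already modified with
    probability wmd. Returns (class, op, hit/miss, already_modified)."""
    kind = "shared" if rng.random() < shd else "private"
    op = "read" if rng.random() < rd else "write"
    if kind == "private":
        hit = rng.random() < h
        if op == "write" and hit:
            return (kind, op, "hit", rng.random() < wmd)
        return (kind, op, "hit" if hit else "miss", None)
    # Shared blocks are tracked explicitly in the real model, with a
    # bias toward recently accessed blocks; here we only report the class.
    return (kind, op, None, None)
```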

Simulation
- The memory/cache speed mismatch was small compared to today's; caches were small
- The cache stalls until the full block is loaded; block = 4 words; an invalidate takes 1 cycle
- Each run lasts 25,000 cycles
- Metric: system power = sum of processor utilizations
- A write-through protocol (no write-allocate) was also simulated

Simulation Results: Private Block Handling
- Efficiency in handling private blocks, i.e., write hits to unmodified blocks:
  - Illinois, Firefly, and Dragon are efficient thanks to the Valid-Exclusive state
  - Berkeley pays a 1-cycle invalidate overhead
  - Write-Once pays a memory write of 1 word; Synapse a memory write of 1 block
  - Write-Once and Synapse have high overhead if memory latency is hundreds of cycles
- Replacement strategy: under Write-Once, P(memory write for the replaced block) is smaller, since written-once blocks are already up to date in memory

Simulation Results: Private Block Handling (figure)

Simulation Results: Shared Block Handling
- Efficiency in handling shared blocks: Dragon and Firefly are best, since they update instead of invalidating
- Performance decreases with decreasing contention: as the number of shared blocks grows, cache hit rates drop
- Firefly pays a memory write on each write hit to a shared block
- Berkeley beats Illinois under high contention: Illinois updates main memory on a miss for a dirty block
- Write-Once performs poorly for the same reason: a memory update on each miss for a dirty block

Simulation Results: Shared Block Handling (figures)