Designing a Directory Protocol: Nomenclature (EECS 570, Lecture 11, © Wenisch 2012)

Presentation transcript:

Slide 1: Designing a Directory Protocol: Nomenclature
- Local Node (L): the node initiating the transaction we care about.
- Home Node (H): the node where the directory and main memory for the block live.
- Remote Node (R): any other node that participates in the transaction.

Slide 2: Read Transaction
L has a cache miss on a load instruction.
- 1: Get-S (L to H)
- 2: Data (H to L)
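The two-message read can be sketched as a single handler at the home node. This is a minimal illustration, not the lecture's implementation: the `DirEntry` class, the `handle_get_s` function, and the choice to track an "I"/"S" directory state with a sharer set are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    # Hypothetical directory entry: "I" (no cached copies) or "S" (shared).
    state: str = "I"
    sharers: set = field(default_factory=set)


def handle_get_s(entry, requester):
    """Home node H serves a Get-S when no cache holds the block modified."""
    if entry.state not in ("I", "S"):
        raise ValueError("block has an owner; H cannot answer from memory")
    entry.state = "S"
    entry.sharers.add(requester)
    # Message 2 on the slide: H replies to the requester with the data.
    return [("Data", "H", requester)]


entry = DirEntry()
msgs = handle_get_s(entry, "L")
```

Because memory at H is up to date in this case, the miss costs only the request and the reply.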

Slide 3: 4-hop Read Transaction
L has a cache miss on a load instruction. The block was previously in modified state at R.
- 1: Get-S (L to H)
- 2: Recall (H to R)
- 3: Recall-Ack+Data (R to H)
- 4: Data (H to L)
Directory at H: starts as State: M, Owner: R; moves to the transient State: S^D, Req: L while the recall is outstanding; ends as State: S, Sharers: R, L.
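A small event-driven sketch of the home node's side of the 4-hop read follows. The class and method names are hypothetical; only the message names and the directory states (M, S^D, S) come from the slide.

```python
class RecallDirectory:
    """Hypothetical directory entry at H for one block, starting in M."""

    def __init__(self, owner):
        self.state = "M"
        self.owner = owner
        self.sharers = set()
        self.req = None

    def on_get_s(self, requester):
        # The block is modified elsewhere: recall it and remember who asked.
        self.state = "S^D"
        self.req = requester
        return ("Recall", "H", self.owner)

    def on_recall_ack_data(self, sender):
        # Data is back at H; the old owner and the requester both share it.
        requester = self.req
        self.sharers = {sender, requester}
        self.owner = None
        self.req = None
        self.state = "S"
        return ("Data", "H", requester)


d = RecallDirectory(owner="R")
recall = d.on_get_s("L")              # messages 1 and 2
data = d.on_recall_ack_data("R")      # messages 3 and 4
```

All data passes through H here, which is why the requester waits for four messages in sequence.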

Slide 4: 3-hop Read Transaction
L has a cache miss on a load instruction. The block was previously in modified state at R.
- 1: Get-S (L to H)
- 2: Fwd-Get-S (H to R)
- 3: Data (R to L) and, in parallel, 3: Data (R to H)
Directory at H: starts as State: M, Owner: R; moves to the transient State: S^D, Sharers: L, R until the data arrives; ends as State: S, Sharers: L, R.
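The forwarding variant can be sketched the same way. In this illustration (names are hypothetical, and carrying the requester inside the forwarded message is an assumption), H forwards the request and the old owner answers both L and H directly.

```python
class ForwardingDirectory:
    """Hypothetical directory entry at H for one block, starting in M."""

    def __init__(self, owner):
        self.state = "M"
        self.owner = owner
        self.sharers = set()

    def on_get_s(self, requester):
        # Forward to the owner instead of recalling the data through H.
        old_owner = self.owner
        self.state = "S^D"                 # waiting for the owner's data
        self.sharers = {requester, old_owner}
        self.owner = None
        return ("Fwd-Get-S", "H", old_owner, requester)

    def on_data(self):
        # The owner's copy has reached memory at H.
        self.state = "S"


def owner_on_fwd_get_s(msg):
    """Old owner sends the data to the requester and to H in parallel."""
    _, _, owner, requester = msg
    return [("Data", owner, requester), ("Data", owner, "H")]


d = ForwardingDirectory(owner="R")
fwd = d.on_get_s("L")                  # messages 1 and 2
replies = owner_on_fwd_get_s(fwd)      # both messages labeled 3
d.on_data()
```

The requester's critical path drops from four messages to three because the data no longer detours through H.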

Slide 5: An Example Race: Writeback & Read
L has a dirty copy and wants to write it back to H. R concurrently sends a read to H.
- 1: Put-M+Data (L to H)
- 2: Get-S (R to H)
- 3: Fwd-Get-S (H to L)
- 4: Race! Put-M & Fwd-Get-S
- 5: Data (L to R)
- 6: [missing]
- 7: Put-Ack (H to L)
Directory at H: starts as State: M, Owner: L; moves to State: S^D, Sharers: L, R; on detecting the race, Final State: S.
Cache at L: MI^A, then SI^A.
To make your head really hurt: SI^A and the Put-Ack can be optimized away! L and H each know the race happened, so no more messages are needed.
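L's side of this race can be sketched as a small transition table. The table layout and the `run` helper are hypothetical, the no-race path (Put-Ack arriving in MI^A) is an assumption not shown on the slide, and only the Data message to the requester is modeled.

```python
# (current state, event) -> (next state, messages sent by L)
TRANSITIONS = {
    ("M", "evict"): ("MI^A", ["Put-M+Data -> H"]),
    # Race: the forwarded read arrives while L still awaits its Put-Ack.
    ("MI^A", "Fwd-Get-S"): ("SI^A", ["Data -> requester"]),
    # Assumed no-race path: the Put-Ack arrives first.
    ("MI^A", "Put-Ack"): ("I", []),
    ("SI^A", "Put-Ack"): ("I", []),
}


def run(events, state="M"):
    """Feed events to L's cache controller; return final state and sends."""
    sent = []
    for event in events:
        state, out = TRANSITIONS[(state, event)]
        sent.extend(out)
    return state, sent


final_state, sent = run(["evict", "Fwd-Get-S", "Put-Ack"])
```

Dropping the SI^A row and the Put-Ack, as the slide suggests, would mean L moves straight to I once it sees the Fwd-Get-S, since that message alone proves the race occurred.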

Slide 6: Store-Store Race
The line is invalid; both L and R race to obtain write permission.
- 1: Get-M (L to H)
- 2: Get-M (R to H)
- 3: At H: Fwd-Get-M to L; New Owner: R
- 4: Data [ack=0] (H to L)
- 5: [missing]
- 6: Fwd-Get-M (H to L)
- 7: Race! L stalls for Data, does 1 store, then forwards to R
- 8: Data [ack=0] (L to R)
Directory at H after L's Get-M: State: M, Owner: L.
Cache at L while waiting: IM^AD.
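The rule "stall for Data, do 1 store, then forward to R" can be sketched at L as follows. This is an illustrative model only: the class, its method names, and the explicit `store_value` argument are assumptions.

```python
class LocalCache:
    """Hypothetical cache controller at L for a single block."""

    def __init__(self):
        self.state = "I"
        self.value = None
        self.pending_fwd = None    # requester of a stalled Fwd-Get-M

    def store_miss(self):
        self.state = "IM^AD"
        return ("Get-M", "L", "H")

    def on_fwd_get_m(self, requester):
        if self.state == "IM^AD":
            # Race: L is already the owner on paper but has no data yet.
            self.pending_fwd = requester
            return None
        return self._forward(requester)

    def on_data(self, data, store_value):
        self.value = data
        self.state = "M"
        self.value = store_value   # perform exactly one store
        out = []
        if self.pending_fwd is not None:
            out.append(self._forward(self.pending_fwd))
            self.pending_fwd = None
        return out

    def _forward(self, requester):
        msg = ("Data [ack=0]", "L", requester, self.value)
        self.state = "I"
        self.value = None
        return msg


cache = LocalCache()
request = cache.store_miss()
stalled = cache.on_fwd_get_m("R")
out = cache.on_data(data=0, store_value=42)
```

Performing one store before forwarding guarantees that L makes progress even when the line is handed straight on to R.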

Slide 7: Worst-case scenario?
L evicts a dirty copy; R concurrently seeks write permission.
- 1: Put-M (L to H)
- 2: Get-M (R to H)
- 3: Fwd-Get-M (H to L)
- 4: Data [ack=0] (L to R)
- 5: At H: Put-M from NonOwner: Race! L is waiting to ensure the Put-M is gone…
- 6: Put-Ack (H to L)
Directory at H: starts as State: M, Owner: L.
Cache at L: MI^A, then II^A. Race! The Put-M is floating around! Wait till it's gone…
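A sketch of how H and L might each handle this case is shown below. The function names and the dictionary-based directory entry are hypothetical, and the normal-writeback branch (Put-M from the current owner) is an assumption not shown on the slide.

```python
def dir_on_get_m(entry, requester):
    """H hands ownership to the requester and forwards to the old owner."""
    old_owner = entry["owner"]
    entry["owner"] = requester
    return ("Fwd-Get-M", "H", old_owner, requester)


def dir_on_put_m(entry, sender):
    """H acks every Put-M, but only a Put-M from the owner changes state."""
    if entry["owner"] == sender:
        entry.update(state="I", owner=None)   # assumed normal writeback
    # Put-M from a non-owner is stale: ownership has already moved on.
    return ("Put-Ack", "H", sender)


# L's cache controller: (state, event) -> (next state, message sent)
L_TRANSITIONS = {
    ("M", "evict"): ("MI^A", "Put-M -> H"),
    ("MI^A", "Fwd-Get-M"): ("II^A", "Data [ack=0] -> R"),
    ("II^A", "Put-Ack"): ("I", None),
}

entry = {"state": "M", "owner": "L"}
fwd = dir_on_get_m(entry, "R")     # H processes R's Get-M first
ack = dir_on_put_m(entry, "L")     # L's Put-M arrives late

l_state = "M"
for event in ["evict", "Fwd-Get-M", "Put-Ack"]:
    l_state, _ = L_TRANSITIONS[(l_state, event)]
```

L stays in II^A until the Put-Ack arrives, so it cannot reuse the line while its stale Put-M might still be in the network.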