E. Bilir, R. Dickson, Y. Hu, M. Plakal, D. Sorin,

Multicast Snooping
E. Bilir, R. Dickson, Y. Hu, M. Plakal, D. Sorin, M. Hill, D. Wood
Presented By Derek Hower

Why Multicast?
- Goal: reduce communication overhead in cache-coherent multiprocessors
  - Scalable snooping
  - Reduced-latency directories
- Solution: a hybrid snoop/directory protocol

What is it?
- Replace the snooping bus with a multicast address network
- Predict the participants in each snoop transaction
- Back up the speculation with a directory
- The back end is a point-to-point data network (like Starfire)

The Protocol
- Snoop requests go only to the processors predicted to be involved in the transaction
- Assume the transaction is correct until told otherwise
- Incorrect predictions are handled via nack and semiack
- A small, predictive directory protocol backs up the speculative snooping
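The directory's backup role can be sketched as a mask-sufficiency check. This is a minimal illustration under our own assumptions, not the paper's protocol: the directory knows each block's owner and sharers, a shared request must reach the owner, an exclusive request must also reach every sharer, and an insufficient mask is nacked with a corrected mask so the requester can retry. The names `DirEntry`, `check_mask`, and `issue` are hypothetical, and semiacks and state updates are not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirEntry:
    # Hypothetical directory state for one block (owner None means memory owns it).
    owner: Optional[int] = None
    sharers: set[int] = field(default_factory=set)


def required_nodes(entry: DirEntry, requester: int, exclusive: bool) -> set[int]:
    """Nodes that must see the request for the snoop to be correct (our assumption)."""
    needed: set[int] = set()
    if entry.owner is not None:
        needed.add(entry.owner)          # the owner must supply or give up the data
    if exclusive:
        needed |= entry.sharers          # every sharer must invalidate its copy
    needed.discard(requester)            # the requester never needs to snoop itself
    return needed


def check_mask(entry: DirEntry, requester: int, exclusive: bool, mask: set[int]):
    """Return ("ack", None) if the predicted mask suffices, else ("nack", corrected mask)."""
    needed = required_nodes(entry, requester, exclusive)
    if needed <= mask:
        return "ack", None
    return "nack", mask | needed


def issue(entry: DirEntry, requester: int, exclusive: bool, predicted: set[int]):
    """Multicast with the predicted mask, retrying with the directory's correction on a nack."""
    mask, attempts = set(predicted), 0
    while True:
        attempts += 1
        verdict, corrected = check_mask(entry, requester, exclusive, mask)
        if verdict == "ack":
            return mask, attempts
        mask = corrected
```

For a block owned by node 3 and shared by nodes 3 and 5, an exclusive request from node 1 predicted as `{3}` is nacked once and succeeds on the retry with `{3, 5}`; a shared request with the same prediction succeeds immediately. Under these assumptions a misprediction costs a retry, never correctness.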

Mask Prediction
- Node locality makes prediction feasible:
  - local data (stack, some parts of the heap)
  - repeated misses to the same block
- Sticky-Spatial(k) prediction
  - Tracks block accesses and the last invalidator
  - Introduces spatial locality by using adjacent blocks in the prediction table
  - An unrelated block can influence a prediction
  - Memory corrects mistakes
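A predictor in the spirit of Sticky-Spatial(k) can be sketched with per-block bitmasks. This is our own approximation, not the paper's exact definition: each table entry accumulates the nodes seen touching a block (other requesters and the last invalidator), bits are only ever added ("sticky"), and a prediction ORs together the entries of the block and its k neighbours on each side ("spatial"). The class and method names are hypothetical, and the real table indexing and reset policy may differ.

```python
class StickySpatialPredictor:
    """Hypothetical sketch of a Sticky-Spatial(k)-style destination-mask predictor."""

    def __init__(self, k: int, self_node: int):
        self.k = k
        self.self_node = self_node
        self.table: dict[int, int] = {}  # block number -> sticky bitmask of nodes

    def train(self, block: int, node: int) -> None:
        # Record a node observed on this block (a requester or the last invalidator).
        # Bits are OR-ed in and never cleared, which is the "sticky" part.
        self.table[block] = self.table.get(block, 0) | (1 << node)

    def predict(self, block: int) -> int:
        # Combine the entries of the k adjacent blocks on each side: the "spatial" part.
        mask = 0
        for b in range(block - self.k, block + self.k + 1):
            mask |= self.table.get(b, 0)
        return mask & ~(1 << self.self_node)  # the requester never snoops itself
```

With `k=1`, training block 10 with node 3 and block 11 with node 5 makes a prediction for block 12 include node 5 even though block 12 was never seen. That is exactly the "unrelated block can influence a prediction" effect on the slide, and the directory's mask check is what corrects it.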

Address Network
- Built as a fat tree (modified Isotach)
- Total ordering is achieved with timestamps, so there is no need for synchronized delivery
- Capable of multiple broadcasts in parallel
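The idea that timestamps give a total order without synchronized delivery can be illustrated with a conservative receiver. This is a generic logical-time sketch under our own assumptions, not the paper's fat-tree design: each source stamps messages with strictly increasing timestamps, each source's messages arrive in FIFO order, and idle sources send heartbeats (payload `None`). A receiver delivers a buffered message only once every source has been seen at or past its timestamp, ordering ties by source id. `OrderedReceiver` is a hypothetical name.

```python
import heapq


class OrderedReceiver:
    """Hypothetical sketch: deliver timestamped address messages in one global order."""

    def __init__(self, sources):
        self.last_ts = {s: -1 for s in sources}  # latest timestamp seen from each source
        self.pending = []                        # min-heap of (ts, src, payload)
        self.delivered = []

    def receive(self, src, ts, payload=None):
        # payload None is a heartbeat: it only advances the source's timestamp.
        assert ts > self.last_ts[src], "per-source timestamps must strictly increase"
        self.last_ts[src] = ts
        if payload is not None:
            heapq.heappush(self.pending, (ts, src, payload))
        self._drain()

    def _drain(self):
        # Once every source has been seen at time >= t, any later message must carry
        # a timestamp > t, so everything stamped <= t can be delivered in (ts, src) order.
        horizon = min(self.last_ts.values())
        while self.pending and self.pending[0][0] <= horizon:
            self.delivered.append(heapq.heappop(self.pending))
```

Two receivers that see the same messages in different arrival orders still deliver them in the same sequence, which is the property snooping needs from the address network. The cost of this conservative approach is waiting for the slowest source's timestamp to advance.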

Evaluation
- "Big picture" simulations measuring:
  - mean number of sharers
  - prediction capability
  - mask set size
  - network availability
- Simulated an MSI (not MOSI) protocol, which only hurts the results

Results
- Prediction accuracy: 73-95%
- Avg. nodes in multicast: 2.4-5.6 (out of 32)
- Avg. excess nodes predicted: 0.3-3.4
- Implementation better than half of optimal
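To put the multicast sizes in perspective, the arithmetic below compares them with a full broadcast to all 32 nodes. It assumes, as a simplification of our own, that address-network load scales with the number of destinations, and it only evaluates the endpoints of the reported range, which may come from different workloads. `fraction_of_broadcast` is a hypothetical helper.

```python
NODES = 32  # system size reported on the slide


def fraction_of_broadcast(avg_dest: float, nodes: int = NODES) -> float:
    """Share of a full broadcast's destinations that an average multicast reaches."""
    return avg_dest / nodes


for avg in (2.4, 5.6):
    print(f"{avg} nodes per multicast -> {fraction_of_broadcast(avg):.1%} of a broadcast")
```

Under that assumption, a multicast reaches roughly 7.5% to 17.5% of the destinations a broadcast would, which is where the bandwidth savings over plain snooping come from.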

Deep Thinking
- Evaluation of specifics
- Timing: what if the time to traverse the fat tree overwhelms the benefits of decreased communication?
- Complexity: what is the range (in system size) for which the benefits of multicast networking outweigh the added complexity?
- Much room for improvement:
  - Better prediction
  - Smarter address network