Manager-Client Pairing: A Framework for Implementing Coherence Hierarchies Jesse G. Beu Michael C. Rosier Thomas M. Conte Tinker Research Georgia Institute.

Presentation transcript:

Manager-Client Pairing: A Framework for Implementing Coherence Hierarchies
  Jesse G. Beu, Michael C. Rosier, Thomas M. Conte
  Tinker Research, Georgia Institute of Technology

The Problem
  Coherence protocols can be difficult to design properly.
  Integrating coherence protocols is even more difficult.
  This leads to monolithic, homogeneous coherence in a heterogeneous future.
  [Figure label: Directory MESI]

The Solution
  Use existing protocols as building blocks.
  Enable coherence integration and composition.
  This leads to heterogeneous hierarchies in a heterogeneous future.
  Design using the best local protocol for the 'common case'.
  [Figure labels: Directory MESI, Broadcast MSI, Broadcast MSI, Token Rings]

Width Variation Observation
  Ocean_C while varying tier width at fixed 2-level
  [Figure legend: L2 Hit, Off-Chip]

Manager-Client Pairing

Outline
  Motivation
  Introduce Manager-Client Pairing
  Communication Similarity and Recursion
  Types of Action: Query, Get and Grant
  MCP Algorithm and Example
  Impact of Tier Width and Hierarchy Height
  Future Work and Conclusion

Self-Similarity for Recursion
  Processor and Cache
    Processor requests Data
    Cache transparently asks if we have permission
    Gets permission if not
    Cache supplies Data
  Cache and Memory
    Cache requests Data
    Memory supplies Data
    Add 'asking' feature
  Internals of each layer can be 'black-boxed'
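The self-similarity on this slide can be illustrated with a small recursive sketch. This is only an illustration and not the authors' implementation: the `Layer` class, its method names, and the `trace` argument are hypothetical, and "permission" is reduced to "the layer already holds the line".

```python
class Layer:
    """One level of the hierarchy, treated as a black box by its neighbours.

    Hypothetical illustration: every layer exposes the same interface, so a
    processor-to-cache request looks exactly like a cache-to-memory request.
    """

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower   # next layer down; None marks the backing memory
        self.lines = {}      # addr -> data this layer currently may supply

    def has_permission(self, addr):
        # The 'asking' feature: check permission without moving any data.
        # Assumption: the backing memory can always supply the data.
        return self.lower is None or addr in self.lines

    def request(self, addr, trace):
        trace.append(self.name)              # record which layers were visited
        if self.lower is None:
            return f"data@{addr}"            # memory supplies the data
        if not self.has_permission(addr):
            # Get permission (and data) from the layer below, recursively.
            self.lines[addr] = self.lower.request(addr, trace)
        return self.lines[addr]              # this layer supplies the data


memory = Layer("memory")
l2 = Layer("L2", lower=memory)
l1 = Layer("L1", lower=l2)

first, second = [], []
l1.request(0x40, first)    # cold: the request recurses down to memory
l1.request(0x40, second)   # warm: L1 already has permission
```

Because each layer only talks to the one below it through `has_permission` and `request`, the internals of any layer can be replaced without the others noticing, which is the black-boxing property the slide points at.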

Types of Actions
  Query – Permission
    Query to check permission level
  Get – Request permissions and Data
    Read and Write Permission, supplying Data
    Permission upgrade (e.g. Shared -> Modified)
  Grant – Response to earlier Get request
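One way to picture the three actions is as a tiny message vocabulary between a client and its manager. The sketch below is an assumption-laden illustration, not the MCP specification: the `Perm`, `Action`, `Message`, `Client` and `Manager` names are hypothetical, only three permission levels are modelled, and the manager does not invalidate or downgrade other clients as a real protocol would have to.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Perm(IntEnum):
    """Simplified permission levels, ordered so that >= means 'at least'."""
    INVALID = 0
    SHARED = 1
    MODIFIED = 2


class Action(Enum):
    QUERY = "query"   # check the current permission level
    GET = "get"       # request permission (and data)
    GRANT = "grant"   # response to an earlier Get


@dataclass
class Message:
    action: Action
    addr: int
    perm: Perm
    data: object = None


class Client:
    def __init__(self):
        self.perms = {}
        self.data = {}

    def query(self, addr, needed):
        # Query: do we already hold at least the needed permission?
        return self.perms.get(addr, Perm.INVALID) >= needed

    def make_get(self, addr, needed):
        # Get: ask the manager for the permission we are missing.
        return Message(Action.GET, addr, needed)

    def accept_grant(self, msg):
        assert msg.action is Action.GRANT
        self.perms[msg.addr] = msg.perm
        if msg.data is not None:
            self.data[msg.addr] = msg.data


class Manager:
    def __init__(self, memory):
        self.memory = memory

    def handle_get(self, msg):
        # Grant: answer the Get. Coordination with other clients is omitted.
        assert msg.action is Action.GET
        return Message(Action.GRANT, msg.addr, msg.perm, self.memory.get(msg.addr))


manager = Manager({0x40: "payload"})
client = Client()

# Read miss: Query fails, so Get is sent and the Grant is accepted.
if not client.query(0x40, Perm.SHARED):
    client.accept_grant(manager.handle_get(client.make_get(0x40, Perm.SHARED)))

# Permission upgrade, Shared -> Modified, uses the same Get/Grant exchange.
needs_upgrade = not client.query(0x40, Perm.MODIFIED)
if needs_upgrade:
    client.accept_grant(manager.handle_get(client.make_get(0x40, Perm.MODIFIED)))
```

Ordering the permission levels as integers keeps the Query check to a single comparison, and it lets an upgrade reuse the same Get and Grant path as an ordinary miss.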

Manager and Client Pair

MCP Algorithm
  [Figure labels: Processor, Load, Get, Grant]

Example – Realm Hit

Example – Realm Miss
  [Figure labels: Downgrade; cache-line states I, E, M]

Latency Impact of Hierarchies
  Strong analogy with cache design
  Tier width (# of clients) is analogous to cache sizing
    Smaller tiers result in 'lower capacity' with 'faster access'
    Larger tiers have 'higher capacity' with 'slower access'
  Hierarchy height (# of tiers) is analogous to cache levels
    Motivation of this work! A single flat protocol won't scale
      Analogous to having a monolithic cache
    Deeper hierarchies are not always good
      Benefit of smaller, faster tiers while retaining capacity
      Make them too small and the lowest level will frequently miss
      Additional penalty of hierarchy indirection
      Consider L3/L4 caches vs. larger L2/L3 caches
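The cache analogy suggests a back-of-the-envelope latency model in the style of average memory access time. The function below is a hedged sketch under that assumption. It is not a model from the talk, the `average_latency` name and its parameters are hypothetical, and every number in the usage lines is made up purely for illustration.

```python
def average_latency(tiers, off_chip_latency, indirection_penalty=0.0):
    """Average request latency for a coherence hierarchy (illustrative only).

    tiers: list of (access_latency, miss_rate) pairs, lowest tier first.
    indirection_penalty: extra cost paid each time a request misses a tier
    and has to be forwarded onward (an assumed, uniform cost).
    """
    total = 0.0
    reach = 1.0  # fraction of all requests that reach the current tier
    for access_latency, miss_rate in tiers:
        total += reach * access_latency        # everyone reaching a tier pays its access time
        reach *= miss_rate                     # only the misses continue onward
        total += reach * indirection_penalty   # forwarding cost for those misses
    return total + reach * off_chip_latency    # the remaining requests go off-chip


# Made-up numbers: one wide flat tier vs. two-tier hierarchies.
flat = average_latency([(30, 0.1)], off_chip_latency=200)
two_level = average_latency([(10, 0.3), (30, 0.1)], 200, indirection_penalty=5)
too_small = average_latency([(5, 0.8), (30, 0.1)], 200, indirection_penalty=5)
```

With these invented inputs, the two-level hierarchy comes out ahead of the flat one, while the hierarchy whose lowest tier is too small loses most of that advantage because it misses often and pays the indirection penalty each time. That matches the qualitative points in the bullets above, though real behaviour depends on the workload.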

Tier Width
  [Figure labels: Home Node, Realm Hit, Realm Miss]

Width Variation Observation
  Ocean_C while varying tier width at fixed 2-level

Future Work
  MCP's role in validation
    Willing to discuss off-line
  Protocol interactions/selection
  Protocol and NoC topology co-design
    Hierarchical topologies
  Cross-vendor coherence integration

Conclusion
  MCP does address concerns regarding future coherence
    Uses existing protocols as building blocks
    Enables coherence integration and composition
  Demonstration of rapid development of a variety of hierarchy configurations
  MCP provides a generic coherence hierarchy composition framework to support continued scaling of diverse, massively coherent systems

Questions? Thank you!