Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems Ayse Yilmazer, University of Rhode Island Resit Sendag, University.


Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems Ayse Yilmazer, University of Rhode Island Resit Sendag, University of Rhode Island Joshua J. Yi, Freescale Semiconductor, Inc.

Motivation
Previous work on wrong-path (WP) effects in uniprocessors:
  - Positive effect: prefetching. Up to 20% better performance for 181.mcf (SPECint 2000).
  - Negative effect: pollution. L1 and L2 cache pollution, extra traffic.
  - It is important to simulate WP references, especially for some applications.
What about WP effects in multiple-CMP systems?

Outline
  - Wrong-Path Effects in SMPs and Multi-CMPs
  - Simulation Methodology
  - Evaluation Results
  - Conclusion

Wrong-path effects in SMPs – 0 / 4
Broadcast (snoop)-based and directory-based SMP systems
  - MSI, MOSI, MESI, MOESI cache coherence protocols
The same issues as in uniprocessors apply:
  - Pollution effect
  - Prefetching effect
  - Extra cache/memory traffic
In contrast to uniprocessors, WP references also cause:
  - Extra coherence traffic: data, invalidations, write-backs, acknowledgements
  - Additional cache block state transitions

Wrong-path effects in SMPs – 1 / 4
Replacements
  - A speculatively replaces B
  - A is a wrong-path block!
  - Initial states

Wrong-path effects in SMPs – 2 / 4
Write-backs
  - Write-back of the dirty copy of B
  - Write-back of the dirty copy of A
  - Only for MESI (or MSI): M -> S

Wrong-path effects in SMPs – 3 / 4
Invalidations
  - P1 loses its write privileges for block A
  - P1 asks for permission to write and sends an invalidation
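The write-back and invalidation slides above can be replayed with a toy model. The following is a minimal sketch, assuming a two-processor MSI snooping bus; the `Cache` and `Bus` classes and the message strings are hypothetical names made up for illustration and are not taken from the presentation or from GEMS. It shows how one wrong-path load by P2 turns a later correct-path store by P1 from a silent hit into three coherence messages.

```python
from dataclasses import dataclass, field

M, S, I = "M", "S", "I"  # MSI states: Modified, Shared, Invalid


@dataclass
class Cache:
    name: str
    state: dict = field(default_factory=dict)  # block name -> MSI state

    def get(self, block):
        return self.state.get(block, I)


@dataclass
class Bus:
    caches: list
    log: list = field(default_factory=list)  # coherence messages, in order

    def load(self, requester, block):
        if requester.get(block) != I:
            return  # read hit: no bus traffic
        for cache in self.caches:
            if cache is not requester and cache.get(block) == M:
                self.log.append(f"write-back {block} from {cache.name}")
                cache.state[block] = S  # M -> S downgrade
        self.log.append(f"data {block} to {requester.name}")
        requester.state[block] = S

    def store(self, requester, block):
        if requester.get(block) == M:
            return  # write hit with write permission: no bus traffic
        had_copy = requester.get(block) == S
        for cache in self.caches:
            if cache is not requester and cache.get(block) != I:
                if cache.get(block) == M:
                    self.log.append(f"write-back {block} from {cache.name}")
                self.log.append(f"invalidate {block} in {cache.name}")
                cache.state[block] = I
        if not had_copy:
            self.log.append(f"data {block} to {requester.name}")
        requester.state[block] = M


p1, p2 = Cache("P1"), Cache("P2")
bus = Bus([p1, p2])

bus.store(p1, "A")  # setup: P1 holds block A in state M
bus.log.clear()     # only count traffic caused from here on

bus.load(p2, "A")   # wrong-path load by P2 (the load is later squashed)
bus.store(p1, "A")  # correct-path store by P1; a hit if P2 had not interfered

for message in bus.log:
    print(message)
```

Without the wrong-path load, the second store would hit in state M and the log would stay empty; with it, P1 writes back A, P2 receives data it never uses, and P1 must invalidate P2's copy to regain write permission.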

Wrong-path effects in SMPs – 4 / 4
Data/bus and coherence traffic increases
  - L1 references
  - L2 references
  - Coherence traffic: snoop and directory requests for data and invalidations
Power consumption increases
  - Due to extra cache references, coherence traffic, and cache block state transitions
Resource contention
  - WP references compete with correct-path references for resources
In contrast to uniprocessors, service buffers become full more frequently
  - Critical when there are many cache-to-cache transfers

WP effects in Multiple-CMPs – 0 / 2
A CMP node and a 4-CMP system
  - We studied inclusive L1 and L2 caches
  - The L2 cache also tracks the coherence state of cache blocks in L1

WP effects in Multiple-CMPs – 1 / 2
State transitions on replacement of an SO line in the L2 cache
[Figure: state-transition diagram. Labels surviving in the transcript: SO, OIV, OIN, I; S, I]

WP effects in Multiple-CMPs – 2 / 2
State transitions when an MT line in the L2 cache receives a WP request
[Figure: state-transition diagram. Labels surviving in the transcript: MT, MO, SO; M, S]

Experimental Methodology
GEMS simulator (Wisconsin Multifacet Group)
  - Based on Virtutech Simics
  - Aggressive out-of-order superscalar processor
  - Detailed shared-memory model
We evaluate a 16-processor (4- and 8-CMP) SPARC V9 system running unmodified Solaris 9
We evaluated a 2-level MOSI directory coherence protocol
  - MOSI: Modified, Owned, Shared, Invalid
We track the speculatively generated memory references
  - and mark them as being on the wrong path once the branch misprediction is known
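The last point of the methodology slide, marking references as wrong-path only once the misprediction is known, can be sketched as follows. This is an illustrative sketch and not the GEMS implementation: the `MemRef` and `WrongPathTracker` names are hypothetical, and it assumes every instruction carries a sequence number that increases in fetch order, so everything fetched after a mispredicted branch has a larger number than the branch.

```python
from dataclasses import dataclass


@dataclass
class MemRef:
    seq: int                  # fetch-order sequence number (assumed monotonic)
    addr: int
    wrong_path: bool = False  # unknown at issue time, set retroactively


class WrongPathTracker:
    def __init__(self):
        self.refs = []

    def issue(self, seq, addr):
        """Record a speculatively issued memory reference."""
        ref = MemRef(seq, addr)
        self.refs.append(ref)
        return ref

    def resolve_branch(self, branch_seq, mispredicted):
        """On a misprediction, mark every reference younger than the branch."""
        if not mispredicted:
            return 0
        marked = 0
        for ref in self.refs:
            if ref.seq > branch_seq and not ref.wrong_path:
                ref.wrong_path = True
                marked += 1
        return marked


tracker = WrongPathTracker()
tracker.issue(1, 0x100)                  # before the branch: correct path
# the branch itself is instruction 2
tracker.issue(3, 0x200)                  # fetched past the unresolved branch
tracker.issue(4, 0x240)
marked = tracker.resolve_branch(2, mispredicted=True)
tracker.issue(5, 0x300)                  # fetched after recovery: correct path

flags = [ref.wrong_path for ref in tracker.refs]
print(marked, flags)
```

The references have already reached the memory system by the time they are marked, which is why their cache and coherence side effects persist even though the instructions themselves are squashed.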

Experimental Methodology

Evaluation Results 1 / 5 -- L1 and L2 Cache Traffic
[Figure panels: 4 CMPs, 8 CMPs]
  - Total memory references increase by 16% and 14% for 4- and 8-CMPs, respectively.
  - L2 cache references increase by 35% and 36%, respectively.
  - For em3d, the number of L1 misses increases by as much as 70%.

Evaluation Results 2 / 5 -- Coherence Traffic
[Figure panels: 4 CMPs, 8 CMPs]
  - Internal: 36%
  - External: 30%

Evaluation Results 3 / 5 -- L1 and L2 cache replacements
L %, L %
Potential Cache Performance Impact

Type          | Meaning                                                                                                      | L1  | L2
Used          | Used by a correct-path reference                                                                             | 50% | 7%
Unused        | Evicted before being used, or never used by a correct-path reference                                         | 42% | 70%
Direct Miss   | Replaces a cache block that is needed by a later correct-path load, and is evicted before being used         | 4%  | 20%
Indirect Miss | Changes the LRU order of a set, which may eventually cause correct-path misses                               | 4%  | 3%
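The four categories in the table above can be read as a classification applied to each block that a wrong-path reference brings into a cache. The sketch below is one possible reading and not the authors' method: the three boolean flags are hypothetical inputs that a simulator would have to record, and the order in which the rules are checked is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    USED = "Used"
    UNUSED = "Unused"
    DIRECT_MISS = "Direct Miss"
    INDIRECT_MISS = "Indirect Miss"


@dataclass(frozen=True)
class WrongPathFill:
    # Hypothetical per-block flags; not part of the original presentation.
    used_by_correct_path: bool    # touched by a correct-path reference before eviction
    victim_needed_later: bool     # the block it replaced was later loaded on the correct path
    lru_change_caused_miss: bool  # its effect on LRU order led to a later correct-path miss


def classify(fill):
    # Assumed precedence: usefulness first, then direct harm, then indirect harm.
    if fill.used_by_correct_path:
        return Impact.USED
    if fill.victim_needed_later:
        return Impact.DIRECT_MISS
    if fill.lru_change_caused_miss:
        return Impact.INDIRECT_MISS
    return Impact.UNUSED


def breakdown(fills):
    """Percentage of wrong-path fills in each category."""
    counts = Counter(classify(f) for f in fills)
    return {impact: 100.0 * counts[impact] / len(fills) for impact in Impact}


fills = [
    WrongPathFill(True, False, False),
    WrongPathFill(True, False, False),
    WrongPathFill(False, False, False),
    WrongPathFill(False, True, False),
]
shares = breakdown(fills)
for impact, percent in shares.items():
    print(f"{impact.value}: {percent:.0f}%")
```

Because each fill lands in exactly one category, the shares of a cache level sum to 100%, which matches the L1 and L2 columns of the table.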

Evaluation Results 4 / 5 -- Write Misses
  - 4 CMPs: on average 4%
  - 8 CMPs: on average 7%

Evaluation Results 5 / 5 -- Cache Line State Transitions
  - 4 CMPs: internal 2% to 13%, external 1% to 9%
  - 8 CMPs: internal 2% to 17%, external 1% to 10%

It is important to model WP memory references in cache-coherent multi-CMP systems.
For multi-CMPs, WP references not only affect the performance of individual processors through prefetching and pollution, they also affect the performance of the entire system by increasing:
  - cache coherence transactions
  - cache block state transitions
  - write-backs
  - invalidations
  - resource contention
For a workload with many cache-to-cache transfers, WP references can significantly affect coherence actions.

The End Thank You !