Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems. Ayse Yilmazer, University of Rhode Island; Resit Sendag, University of Rhode Island; Joshua J. Yi, Freescale Semiconductor, Inc.
Motivation. Previous work on wrong-path (WP) effects in uniprocessors. Positive effects: prefetching, with up to 20% better performance for 181.mcf (SPECint 2000). Negative effects: L1 and L2 cache pollution and extra traffic. It is important to simulate WP references, especially for some applications. What about WP effects in multiple-CMP systems?
Outline: Wrong-Path Effects in SMPs and Multi-CMPs; Simulation Methodology; Evaluation Results; Conclusion
Wrong-path effects in SMPs – 0 / 4. Broadcast (snoop)- and directory-based SMP systems; MSI, MOSI, MESI, and MOESI cache coherence protocols. The same issues as in uniprocessors apply: pollution effect, prefetching effect, and extra cache/memory traffic. In contrast to uniprocessors, WP references also cause extra coherence traffic (data, invalidations, write-backs, acknowledgements) and additional cache block state transitions.
Wrong-path effects in SMPs – 1 / 4. Replacements. [Figure: initial states.] Block A speculatively replaces block B; A is a wrong-path block.
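The replacement scenario on this slide can be sketched with a single cache set. This is a minimal illustration, assuming a 2-way set with LRU replacement; the `LRUSet` class, the extra block "C", and the access order are hypothetical and are not taken from the simulator used in this work.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement (hypothetical, for illustration only)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # least recently used block comes first

    def access(self, block):
        """Return (hit, evicted_block); evicted_block is None if nothing was evicted."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.blocks) >= self.ways:
            evicted, _ = self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[block] = True
        return False, evicted


cache_set = LRUSet(ways=2)
cache_set.access("B")                        # correct-path fill
cache_set.access("C")                        # correct-path fill; B is now LRU
wp_hit, wp_evicted = cache_set.access("A")   # wrong-path reference brings in A
later_hit, _ = cache_set.access("B")         # later correct-path reference to B
```

In this sketch the wrong-path block A evicts B, so the later correct-path reference to B misses. If the correct path had needed A instead, the same wrong-path fill would have acted as a prefetch.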
Wrong-path effects in SMPs – 2 / 4. Write-backs. Write-back of the dirty copy of B; write-back of the dirty copy of A, only for MESI (or MSI): M -> S.
Wrong-path effects in SMPs – 3 / 4. Invalidations. P1 loses its write privileges for block A. P1 then asks for a grant to write and sends an invalidation.
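The write-back and invalidation cases above can be sketched as two small state functions. This is a simplified textbook-style model, assuming P1 holds block A in state M and P2 issues a wrong-path load of A; the function names are hypothetical and the model ignores transient states, so it is not the protocol implementation evaluated here.

```python
def remote_read(owner_state, protocol):
    """State of the owner's copy after another processor reads the block.

    Returns (new_owner_state, writeback_to_memory).
    """
    if owner_state != "M":
        return owner_state, False
    if protocol in ("MSI", "MESI"):
        return "S", True      # no Owned state: the dirty data is written back
    if protocol in ("MOSI", "MOESI"):
        return "O", False     # owner keeps the dirty data and supplies it
    raise ValueError(f"unknown protocol: {protocol}")


def local_write(state, sharers):
    """Writer regains M from S or O; returns (new_state, invalidations_sent)."""
    if state == "M":
        return "M", 0         # write privilege still held, no coherence action
    return "M", sharers       # must invalidate every other copy first


results = {}
for protocol in ("MSI", "MOSI"):
    state, writeback = remote_read("M", protocol)             # P2's wrong-path load of A
    new_state, invalidations = local_write(state, sharers=1)  # P1 writes A again
    results[protocol] = (state, writeback, new_state, invalidations)
```

In both protocols P1 loses its write privilege because of a reference that the program never needed, and its next write costs one invalidation. Only the protocols without an Owned state also pay for a write-back.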
Wrong-path effects in SMPs – 4 / 4. Data/bus and coherence traffic increases: L1 references, L2 references, and coherence traffic (snoop and directory requests for data and invalidations). Power consumption increases, due to extra cache references, coherence traffic, and cache block state transitions. Resource contention: WP references compete with correct-path references for resources. In contrast to uniprocessors, the increased frequency of full service buffers is critical when there are many cache-to-cache transfers.
WP effects in Multiple-CMPs – 0 / 2. [Figure: a CMP node and a 4-CMP system.] We studied inclusive L1 and L2 caches. The L2 cache also tracks the coherence of cache blocks in the L1 caches.
WP effects in Multiple-CMPs – 1 / 2. State transitions on replacement of an SO line in the L2 cache. [Figure: state-transition diagram; labels as extracted: SOOIV OINI S I]
WP effects in Multiple-CMPs – 2 / 2. State transitions when an MT line in the L2 cache receives a WP request. [Figure: state-transition diagram; labels as extracted: MTMO SO M S]
Experimental Methodology. GEMS simulator (Wisconsin Multifacet Group), based on Virtutech Simics, with an aggressive out-of-order superscalar processor model and a detailed shared-memory model. We evaluate a 16-processor (4 and 8 CMPs) SPARC V9 system running unmodified Solaris 9, with a 2-level MOSI directory coherence protocol (MOSI: Modified, Owned, Shared, Invalid). We track the speculatively generated memory references and mark them as wrong-path once the branch misprediction is known.
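The tracking step described above can be sketched as follows. This is a minimal illustration, assuming each speculative memory reference is tagged with the unresolved branch it depends on; `MemRef`, `WrongPathTracker`, and the branch IDs are hypothetical names and do not correspond to any GEMS or Simics interface. The sketch also ignores nested speculation, where an older misprediction would squash younger branches as well.

```python
from dataclasses import dataclass, field


@dataclass
class MemRef:
    address: int
    branch_id: int            # unresolved branch this reference depends on
    wrong_path: bool = False  # set only once the branch outcome is known


@dataclass
class WrongPathTracker:
    pending: dict = field(default_factory=dict)   # branch_id -> list of MemRef
    resolved: list = field(default_factory=list)  # references with known status

    def issue(self, address, branch_id):
        """Record a speculatively generated memory reference."""
        ref = MemRef(address, branch_id)
        self.pending.setdefault(branch_id, []).append(ref)
        return ref

    def resolve(self, branch_id, mispredicted):
        """Mark all references under this branch once its outcome is known."""
        refs = self.pending.pop(branch_id, [])
        for ref in refs:
            ref.wrong_path = mispredicted
        self.resolved.extend(refs)
        return refs


tracker = WrongPathTracker()
tracker.issue(0x1000, branch_id=1)
tracker.issue(0x2000, branch_id=2)
tracker.resolve(1, mispredicted=False)   # branch 1 was predicted correctly
tracker.resolve(2, mispredicted=True)    # branch 2 was mispredicted
wrong = [hex(r.address) for r in tracker.resolved if r.wrong_path]
```

Deferring the marking until the branch resolves mirrors the slide's point: a reference cannot be classified when it is issued, only when the misprediction is known.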
Evaluation Results 1 / 5 -- L1 and L2 Cache Traffic. [Figure panels: 4 CMPs, 8 CMPs.] Total memory references increase by 16% and 14% for 4 and 8 CMPs, respectively. L2 cache references increase by 35% and 36%, respectively. For em3d, the number of L1 misses increases by as much as 70%.
Evaluation Results 2 / 5 -- Coherence Traffic. Internal: 36%. External: 30%. [Figure panels: 4 CMPs, 8 CMPs.]
Evaluation Results 3 / 5 -- L1 and L2 Cache Replacements. L %, L %
Potential Cache Performance Impact:
Type | Meaning | L1 | L2
Used | Used by a correct-path reference | 50% | 7%
Unused | Evicted before being used, or never used, by a correct-path reference | 42% | 70%
Direct Miss | Replaces a cache block that is needed by a later correct-path load, and is evicted before being used | 4% | 20%
Indirect Miss | Changes the LRU order of a set, which may eventually cause correct-path misses | 4% | 3%
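The table's arithmetic can be checked with a short sketch. The percentages are those on the slide; grouping Direct Miss and Indirect Miss together as the potentially harmful share is an assumption made here for illustration, and the names `breakdown` and `summarize` are hypothetical.

```python
# Percentages as reported in the table above, per cache level.
breakdown = {
    "L1": {"used": 50, "unused": 42, "direct_miss": 4, "indirect_miss": 4},
    "L2": {"used": 7, "unused": 70, "direct_miss": 20, "indirect_miss": 3},
}


def summarize(shares):
    """Return (total, harmful): the column total and the direct + indirect miss share."""
    total = sum(shares.values())
    harmful = shares["direct_miss"] + shares["indirect_miss"]
    return total, harmful


summary = {level: summarize(shares) for level, shares in breakdown.items()}
```

Both columns sum to 100%. Under the grouping assumed here, the potentially harmful share is 8% in L1 and 23% in L2, while the share used by a correct-path reference drops from 50% in L1 to 7% in L2.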
Evaluation Results 4 / 5 -- Write Misses. 4 CMPs: on average 4%. 8 CMPs: on average 7%.
Evaluation Results 5 / 5 -- Cache Line State Transitions. 4 CMPs: internal 2% to 13%, external 1% to 9%. 8 CMPs: internal 2% to 17%, external 1% to 10%.
Conclusion. It is important to model WP memory references in cache-coherent multi-CMP systems. For multi-CMPs, WP references not only affect the performance of individual processors through prefetching and pollution, they also affect the performance of the entire system by increasing cache coherence transactions, cache block state transitions, write-backs, invalidations, and resource contention. For a workload with many cache-to-cache transfers, WP references can significantly affect coherence actions.
The End Thank You !