Reuse Distance as a Metric for Cache Behavior Kristof Beyls and Erik D’Hollander Ghent University PDCS - August 2001.


Reuse Distance as a Metric for Cache Behavior Kristof Beyls and Erik D’Hollander Ghent University PDCS - August 2001

Overview
1. Introduction
2. Reuse distance ↔ cache behavior
3. Effect of compiler optimization
4. Capacity miss reduction techniques
5. Conclusion

Introduction
• The gap between processor and memory speed widens exponentially fast; typically, 1 memory access costs about 100 processor cycles.
• Caches can deliver data more quickly, but have limited capacity.
• Reuse distance is a metric for a program's cache performance.

Reuse distance
Definition: The reuse distance of a memory access is the number of unique addresses referenced since the last reference to the requested data.

address:  A  B  C  A  B  B  A  C
distance: ∞  ∞  ∞  2  2  0  1  2
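The definition can be expressed directly in code. The following is a minimal sketch (not part of the original slides; the function name and quadratic implementation are illustrative only):

```python
def reuse_distances(stream):
    """For each access, count the distinct addresses referenced since
    the previous access to the same address (infinity on first use)."""
    last_pos = {}            # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(stream):
        if addr not in last_pos:
            distances.append(float("inf"))   # cold reference
        else:
            # distinct addresses touched strictly between the two uses
            between = set(stream[last_pos[addr] + 1 : i])
            distances.append(len(between))
        last_pos[addr] = i
    return distances

print(reuse_distances(list("ABCABBAC")))
# [inf, inf, inf, 2, 2, 0, 1, 2] -- matches the table above
```

Production tools compute this in near-linear time with a tree over the access trace, but the quadratic version suffices to check the example.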

Reuse distance and fully associative caches
Lemma: In a fully associative LRU cache with n cache lines, a reference will hit if its reuse distance d < n.
Corollary: In any cache with n lines, a cache miss with reuse distance d is classified as:
• d < n: conflict miss
• n ≤ d < ∞: capacity miss
• d = ∞: cold miss
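The lemma can be checked by simulation. This hypothetical sketch (not from the slides) classifies each access of a trace against a fully associative LRU cache with n lines, where the only possible misses are cold and capacity, and asserts that the reuse-distance prediction agrees with an actual LRU cache:

```python
from collections import OrderedDict

def classify(stream, n):
    """Label each access 'hit', 'cold', or 'capacity' for a fully
    associative LRU cache with n lines, using the reuse-distance lemma."""
    lru = OrderedDict()      # cache contents, in LRU -> MRU order
    last = {}                # address -> index of its most recent access
    out = []
    for i, addr in enumerate(stream):
        if addr not in last:
            out.append("cold")                        # d = infinity
        else:
            d = len(set(stream[last[addr] + 1 : i]))  # reuse distance
            out.append("hit" if d < n else "capacity")
        last[addr] = i
        # maintain the real LRU cache and check the lemma's prediction
        hit = addr in lru
        assert hit == (out[-1] == "hit")
        if hit:
            lru.move_to_end(addr)
        else:
            lru[addr] = None
            if len(lru) > n:
                lru.popitem(last=False)               # evict LRU line
    return out

print(classify(list("ABCABBAC"), 2))
# ['cold', 'cold', 'cold', 'capacity', 'capacity', 'hit', 'hit', 'capacity']
```

With n = 3 every non-cold access in this trace hits, since all its reuse distances are below 3.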

Reuse distance distribution for SPEC95fp [figure]

Classifying cache misses for SPEC95fp [figure: conflict and capacity misses as a function of cache size]

Reuse distance vs. hit probability [figure]

Reuse distance after optimization [figure: conflict and capacity misses]

Effect of compiler optimization
• SGIpro compiler for Itanium.
• 30% of conflict misses are removed; only 1% of capacity misses are removed.
• Conclusion: much work needs to be done to remove the most important kind of cache misses: capacity misses.

Capacity miss reduction
1. Hardware level
– Increasing the cache size CS: a reuse becomes a hit once its distance is smaller than the cache size.
2. Compiler level
– Loop tiling
– Loop fusion
3. Algorithmic level

Hardware level
• Increasing the cache size.
• Other hardware techniques are hard to imagine: for a capacity miss, the distance between use and reuse of the data is long, so an overview over a large portion of the execution is needed.

Compiler level
• Loop tiling reduces long reuse distances within a single loop nest.
• Loop fusion can reduce distances between consecutive loops.
• Existing compiler techniques are not powerful enough: only 1% of capacity misses is eliminated.
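The effect of tiling on reuse distance can be seen on a synthetic address trace. In this sketch (illustrative only; the loop structure and parameter names are our own), a computation sweeps an N-element array R times, so each element is reused at distance N − 1; tiling the sweep into blocks of B elements shrinks the reuse distance to B − 1, small enough to fit a modest cache:

```python
def max_reuse_distance(trace):
    """Largest reuse distance over all reuses in an address trace."""
    last, worst = {}, 0
    for i, addr in enumerate(trace):
        if addr in last:
            worst = max(worst, len(set(trace[last[addr] + 1 : i])))
        last[addr] = i
    return worst

N, R, B = 64, 4, 8

# untiled: repeat the full sweep R times -> reuse distance N - 1
untiled = [i for _ in range(R) for i in range(N)]

# tiled: repeat each B-element block R times before moving on
tiled = [i for i0 in range(0, N, B)
           for _ in range(R)
           for i in range(i0, i0 + B)]

print(max_reuse_distance(untiled))  # 63  (N - 1)
print(max_reuse_distance(tiled))    # 7   (B - 1)
```

By the lemma above, the untiled trace misses in any fully associative LRU cache with fewer than 64 lines, while the tiled trace hits with just 8.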

Algorithmic level
• The programmer has a better understanding of the global program structure.
• The programmer can change the algorithm so that long-distance reuses decrease.
• Visualizing long reuse distances can help the programmer identify bad data locality in the code.

Conclusion
• Reuse distance predicts cache behavior accurately, even for direct-mapped caches.
• Compiler optimizations for eliminating capacity misses are currently not powerful enough; a large overview over the code is needed.
• The programmer has that large overview; reuse distance visualization can help the programmer identify regions with bad locality.