Caches: AAT, 3C’s model of misses Prof. Eric Rotenberg


Caches: AAT, 3C’s model of misses
Prof. Eric Rotenberg
ECE 463/563, Microprocessor Architecture, Fall 2018

Three factors of cache performance

- Hit time: the cache access time (in units of seconds). Depends on the cache configuration, circuit-level implementation, and technology.
- Miss rate: the fraction of memory references that miss in the cache: miss rate = number of misses / number of references. Depends on the cache configuration and the running program's memory reference stream.
- Miss penalty: the time it takes to bring a memory block into the cache. With one level of cache and a simple memory system, the miss penalty is often approximated with a fixed value. More generally, different misses perceive different miss penalties in a complex memory hierarchy (multiple levels of cache and a complex memory system); in that case, the miss penalty is the average miss penalty.

Average access time (AAT)

Memory stall time:
    Memory stall time = Number of misses x Miss penalty

Total time spent on memory references, including both hits and misses:
    Total access time = (Number of references) x (Hit time) + (Number of misses) x (Miss penalty)

Average access time (AAT) for a single memory reference:
    AAT = Total access time / Number of references
        = (Hit time) + (Number of misses / Number of references) x (Miss penalty)
        = (Hit time) + (Miss rate) x (Miss penalty)
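The AAT formula can be sketched as a one-line helper. The numbers in the usage line (1 ns hit time, 5% miss rate, 100 ns miss penalty) are hypothetical, chosen only to illustrate the arithmetic; they are not from the slides.

```python
def aat(hit_time, miss_rate, miss_penalty):
    # Every reference pays the hit time; the fraction of references
    # that miss additionally pays the miss penalty.
    return hit_time + miss_rate * miss_penalty

# Hypothetical values: 1 ns hit time, 5% miss rate, 100 ns miss penalty
print(aat(1.0, 0.05, 100.0))  # 1 + 0.05 * 100 = 6.0 ns
```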

Measuring cache performance

- Run a program and collect a trace of accesses
- Simulate the "tag store" part of the caches under consideration
- Measure the miss rate
- Use the miss rate to estimate the average access time
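A minimal sketch of the tag-store-only simulation described above, assuming a direct-mapped cache with power-of-two block size and set count. The trace of byte addresses and the cache parameters in the usage line are made up for illustration.

```python
def measure_miss_rate(trace, num_sets, block_size):
    # Simulate only the tag store: for each set, remember which block's
    # tag is resident. No data array is needed to measure the miss rate.
    tags = [None] * num_sets
    misses = 0
    for addr in trace:
        block = addr // block_size   # block address
        index = block % num_sets     # set-index bits
        tag = block // num_sets      # remaining high-order bits
        if tags[index] != tag:
            misses += 1
            tags[index] = tag        # allocate the block on a miss
    return misses / len(trace)

# Hypothetical trace of byte addresses; 2 sets, 4-byte blocks.
# Blocks 0 and 2 both map to set 0, so they keep evicting each other:
print(measure_miss_rate([0, 4, 8, 0, 4, 8], num_sets=2, block_size=4))
# 5 misses out of 6 references
```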

Improving cache performance

Reduce miss rate:
- Block size, cache size, associativity
- Prefetching (hardware or software)
- Transform the program to increase locality

Reduce miss penalty:
- L2 caches
- Victim caches
- Early restart, critical word first
- Write buffers

Reduce hit time:
- Simple caches, small caches
- Pipelined writes
- Overlap address translation (TLB access) with cache access

Categories of misses (3C's model)

- Compulsory miss: the first reference to a memory block.
- Capacity and conflict misses arise in this scenario: a memory block is in the cache, is then replaced, and is then re-referenced; the re-reference is a miss (either a capacity miss or a conflict miss).
  - Capacity miss: a miss that occurs due to the limited total capacity of the cache. It is attributed to limited capacity, not to constraints of the cache's mapping function.
  - Conflict miss: a miss that occurs due to limited capacity within a set. It is attributed to constraints of the cache's mapping function. For example, suppose a program only ever references four memory blocks, the cache is direct-mapped with a capacity of 256 memory blocks, but all four referenced blocks map to the same set. There is clearly sufficient capacity for the four blocks, yet the inflexible mapping function prevents caching all of them at the same time.

Note: a fully-associative cache suffers only compulsory and capacity misses; it does not suffer conflict misses. Any non-compulsory miss is attributed to limited capacity, not to the mapping function, because a block can be placed anywhere in the cache. Direct-mapped and set-associative caches suffer all three types of misses.
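The four-block scenario above can be checked with a tiny tag-store simulation. The block numbers here are hypothetical, chosen as multiples of 256 so that all four map to set 0 of a 256-set direct-mapped cache:

```python
def dm_misses(trace, num_sets):
    # Direct-mapped tag store: one resident block per set.
    tags = [None] * num_sets
    misses = 0
    for blk in trace:
        idx = blk % num_sets
        if tags[idx] != blk:
            misses += 1
            tags[idx] = blk
    return misses

# Four blocks, all mapping to set 0, referenced round-robin four times:
trace = [0, 256, 512, 768] * 4
print(dm_misses(trace, 256))   # 16: every reference misses
# Only 4 of those are compulsory; the other 12 are conflict misses,
# since a fully-associative cache of the same capacity (256 blocks)
# would hold all four blocks after the compulsory misses.
```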

How to classify a miss?

Suppose we are simulating a "cache under observation" and a given reference misses in the cache. We want to classify the miss as compulsory, capacity, or conflict.

- If it is the first reference to the memory block: compulsory miss. (This also means the number of compulsory misses equals the number of unique memory blocks ever referenced.)
- Otherwise, the miss is either a capacity miss or a conflict miss. To tell which, in addition to simulating the cache under observation, ALSO simulate a fully-associative test cache with the same total capacity (same number of blocks) as the cache under observation:
  - If the fully-associative test cache also misses, classify the miss as a capacity miss.
  - If the fully-associative test cache hits, classify the miss as a conflict miss.
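The classification procedure can be sketched as follows. As an assumption for this sketch, the cache under observation is direct-mapped, the test cache uses LRU replacement, and the `set_of` mapping function is a parameter so the worked example on the next slides (blocks A, B, C all mapping to one set) can be reproduced with a constant set function.

```python
from collections import OrderedDict

def classify(trace, num_blocks, set_of):
    dm = {}                 # cache under observation: direct-mapped tag store
    fa = OrderedDict()      # test cache: fully-associative, LRU, same capacity
    seen = set()            # blocks referenced at least once
    results = []
    for blk in trace:
        s = set_of(blk)
        dm_hit = dm.get(s) == blk
        fa_hit = blk in fa
        # Update the fully-associative test cache (LRU replacement).
        if fa_hit:
            fa.move_to_end(blk)            # mark most recently used
        else:
            if len(fa) == num_blocks:
                fa.popitem(last=False)     # evict the LRU block
            fa[blk] = True
        # Classify this reference.
        if dm_hit:
            results.append("hit")
        elif blk not in seen:
            results.append("compulsory")   # first-ever reference to the block
        elif fa_hit:
            results.append("conflict")     # test cache hit, observed cache missed
        else:
            results.append("capacity")     # test cache missed too
        dm[s] = blk
        seen.add(blk)
    return results

# The slides' example: 2-block direct-mapped cache, blocks all mapping
# to the same set (hence the constant set function):
print(classify(list("CCBCBCACB"), num_blocks=2, set_of=lambda b: 0))
```

Running this on the example trace reproduces the classifications on the next slides: compulsory, hit, compulsory, conflict, conflict, conflict, compulsory, conflict, capacity.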

Example

- Direct-mapped cache; capacity is 2 memory blocks
- The processor references three different memory blocks (A, B, C), all of which map to the same set, in the following sequence:

    C C B C B C A C B

Example (cont.)

reference | hit/miss, miss type | cache under observation (direct-mapped, two blocks) | test cache (fully-assoc., two blocks)
C         | Compulsory miss     | C | C
C         | Hit                 | C | C
B         | Compulsory miss     | B | C (lru), B
C         | Conflict miss       | C | C, B (lru)
B         | Conflict miss       | B | C (lru), B
C         | Conflict miss       | C | C, B (lru)
A         | Compulsory miss     | A | C (lru), A
C         | Conflict miss       | C | C, A (lru)
B         | Capacity miss       | B | C (lru), B