Reducing Cache Misses (Sec. 5.3)
Three categories of cache misses:
1. Compulsory – The very first access to a block cannot be in the cache
2. Capacity – Due to the cache's limited capacity, some blocks are discarded, resulting in a miss
3. Conflict – Due to a conflict within a set of blocks, some blocks are discarded, resulting in a miss

Miss Rate Reduction Techniques
1. Larger Block Size:
– Increasing the block size decreases compulsory misses
– Larger blocks take advantage of spatial locality
– But larger blocks increase the miss penalty
– Larger blocks may also increase conflict misses
– There is a trade-off between miss rate (reduction) and miss penalty (increase)

Miss Rate Reduction Techniques (Cont’d)
(Example on page 394) The memory system takes 40 clock cycles of overhead, then delivers 16 bytes every 2 clock cycles. Which block size in the figure gives the smallest average access time for each cache size? Assume hit time = 1 clock cycle.
Average access time (in clock cycles) = Hit time + Miss rate x Miss penalty = 1 + Miss rate x (40 + 2 x (Block size / 16))
High latency and high bandwidth encourage larger block sizes.
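To make the trade-off concrete, here is a minimal sketch in C of the calculation above. The miss penalty follows directly from the stated memory parameters (40 cycles of overhead plus 2 cycles per 16 bytes delivered); the miss rates are hypothetical placeholders, since the figure's measured values are not reproduced in this transcript.

```c
#include <stdio.h>

int main(void) {
    /* Miss penalty = 40 cycles of overhead + 2 cycles per 16 bytes delivered */
    int block_sizes[] = {16, 32, 64, 128, 256};   /* bytes */
    /* Hypothetical miss rates for one cache size; the real measured
       values are in the figure referenced by the example */
    double miss_rates[] = {0.057, 0.042, 0.035, 0.034, 0.040};
    double hit_time = 1.0;                        /* clock cycles */

    for (int i = 0; i < 5; i++) {
        double miss_penalty = 40.0 + 2.0 * (block_sizes[i] / 16.0);
        double amat = hit_time + miss_rates[i] * miss_penalty;
        printf("block = %3d B  miss penalty = %2.0f CC  avg access = %.3f CC\n",
               block_sizes[i], miss_penalty, amat);
    }
    return 0;
}
```

With placeholder rates like these, the minimum lands on a mid-size block: the falling miss rate initially outweighs the growing penalty, and then the penalty wins.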

Miss Rate Reduction Techniques (Cont’d)
2. Higher Associativity:
Two rules of thumb:
i. An 8-way set-associative cache is as effective as a fully associative one
ii. A direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2
Greater associativity may increase the hit time
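As a rough mechanical illustration (not an example from the text): for a fixed capacity, doubling the ways halves the number of sets, so fewer address bits select the set and more blocks compete within each one; a fully associative cache is the limiting case of a single set. The cache and block sizes below are illustrative.

```c
#include <stdio.h>

int main(void) {
    unsigned capacity = 8 * 1024;   /* 8-KB cache, illustrative */
    unsigned block    = 32;         /* 32-byte blocks, illustrative */
    unsigned addr     = 0x1234ABC0; /* arbitrary example address */

    /* More ways per set => fewer sets => fewer index bits */
    for (unsigned assoc = 1; assoc <= 8; assoc *= 2) {
        unsigned sets  = capacity / (block * assoc);
        unsigned index = (addr / block) % sets;
        printf("%u-way: %3u sets, address 0x%08X maps to set %3u\n",
               assoc, sets, addr, index);
    }
    return 0;
}
```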

Miss Rate Reduction Techniques (Cont’d)
(Example on page 396) For the miss rates, see Fig. 5.9 on page 391

Miss Rate Reduction Techniques (Cont’d)
Larger Caches:
– Increasing the capacity of the cache will reduce capacity misses
– The drawbacks are a longer hit time and higher cost
Pseudo-Associative Caches:
– The first cache access proceeds just as in a direct-mapped cache
– If the first access is not a hit, another cache entry is checked to see if the block matches there (i.e., in the pseudo set) (see Figure 5.16)
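A minimal sketch of the pseudo-associative lookup, assuming the common design in which the partner entry of the "pseudo set" is found by inverting the most significant index bit; the sizes and names here are illustrative, not from the text.

```c
#include <stdio.h>

#define SETS 256   /* illustrative direct-mapped cache size (in blocks) */

struct line { int valid; unsigned tag; };
static struct line cache[SETS];

/* Returns 1 on a first-probe hit, 2 on a pseudo-hit (costs an extra
   probe), 0 on a real miss.  The stored tag includes the top index
   bit so a block's two possible homes stay distinguishable. */
int pseudo_lookup(unsigned block_addr) {
    unsigned index = block_addr % SETS;            /* preferred slot  */
    unsigned tag   = block_addr / (SETS / 2);      /* tag + index MSB */

    if (cache[index].valid && cache[index].tag == tag)
        return 1;                                  /* fast hit        */

    unsigned partner = index ^ (SETS / 2);         /* flip index MSB  */
    if (cache[partner].valid && cache[partner].tag == tag)
        return 2;                                  /* slow hit        */

    return 0;                                      /* miss            */
}

int main(void) {
    cache[10]  = (struct line){1, 10  / (SETS / 2)}; /* block 10 in its own slot */
    cache[138] = (struct line){1, 266 / (SETS / 2)}; /* block 266 in its partner slot */

    printf("%d %d %d\n", pseudo_lookup(10),   /* 1: fast hit   */
                         pseudo_lookup(266),  /* 2: pseudo-hit */
                         pseudo_lookup(42));  /* 0: miss       */
    return 0;
}
```

A pseudo-hit costs extra cycles, so designs typically also swap the two blocks so the next access to the promoted block is fast.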

Miss Rate Reduction Techniques (Cont’d)
– Work through the example on page 399

Miss Rate Reduction Techniques (Cont’d)
Victim Caches:
– A small fully associative cache, called the victim cache, is placed between the main cache and its refill path (see Fig. 5.15)
– Blocks discarded from the main cache because of a miss are kept in the victim cache
– The victim cache is checked after a miss in the main cache, before going to the next lower level of memory
– Victim caches reduce conflict misses; a four-entry victim cache removed 20% to 95% of the conflict misses in a 4-KB direct-mapped cache
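A minimal sketch of the victim-cache lookup path, assuming a direct-mapped main cache with a four-entry FIFO victim buffer; the structure, sizes, and swap policy are illustrative.

```c
#include <stdio.h>

#define SETS    128   /* illustrative direct-mapped main cache (in blocks) */
#define VICTIMS 4     /* four-entry fully associative victim cache */

struct line  { int valid; unsigned tag; };          /* main-cache entry */
struct vline { int valid; unsigned block_addr; };   /* victim holds the full address */

static struct line  main_cache[SETS];
static struct vline victim[VICTIMS];
static int victim_next;                             /* simple FIFO replacement */

/* Returns 1 on a main-cache hit, 2 on a victim-cache hit, 0 on a miss. */
int access_block(unsigned block_addr) {
    unsigned index = block_addr % SETS;
    unsigned tag   = block_addr / SETS;

    if (main_cache[index].valid && main_cache[index].tag == tag)
        return 1;

    /* Check the victim cache before going to the next lower level */
    for (int i = 0; i < VICTIMS; i++) {
        if (victim[i].valid && victim[i].block_addr == block_addr) {
            /* Swap: the recovered block re-enters the main cache and
               the block it displaces becomes the new victim */
            unsigned displaced = main_cache[index].tag * SETS + index;
            int was_valid = main_cache[index].valid;
            main_cache[index] = (struct line){1, tag};
            victim[i] = (struct vline){was_valid, displaced};
            return 2;
        }
    }

    /* Real miss: fetch from below; retire the evicted block to the victim cache */
    if (main_cache[index].valid) {
        unsigned evicted = main_cache[index].tag * SETS + index;
        victim[victim_next] = (struct vline){1, evicted};
        victim_next = (victim_next + 1) % VICTIMS;
    }
    main_cache[index] = (struct line){1, tag};
    return 0;
}

int main(void) {
    int a = access_block(5);    /* 0: cold miss                          */
    int b = access_block(133);  /* 0: conflict miss, evicts block 5      */
    int c = access_block(5);    /* 2: block 5 recovered from the victim  */
    printf("%d %d %d\n", a, b, c);
    return 0;
}
```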

Miss Rate Reduction Techniques (Cont’d)
Hardware Prefetching of Instructions and Data:
– On a miss, in addition to the requested block, the next consecutive block is also fetched
– The extra block is kept in an instruction stream buffer
– Example on page 401
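A minimal sketch of sequential prefetch with a one-entry stream buffer (real designs use several entries); the cache organization and sizes are illustrative.

```c
#include <stdio.h>

#define SETS 64   /* illustrative direct-mapped instruction cache (in blocks) */

static struct { int valid; unsigned tag; } cache[SETS];
static unsigned stream_buf;   /* block address held in the stream buffer */
static int      stream_valid;

static int lookup(unsigned b) {
    return cache[b % SETS].valid && cache[b % SETS].tag == b / SETS;
}

static void fill(unsigned b) {
    cache[b % SETS].valid = 1;
    cache[b % SETS].tag   = b / SETS;
}

/* 0 = cache hit, 1 = stream-buffer hit (1 extra CC), 2 = full miss.
   Any miss also prefetches the next sequential block into the buffer. */
int fetch(unsigned b) {
    if (lookup(b))
        return 0;
    int in_buffer = stream_valid && stream_buf == b;
    fill(b);              /* move from buffer or demand-fetch from memory */
    stream_buf   = b + 1; /* prefetch the next consecutive block */
    stream_valid = 1;
    return in_buffer ? 1 : 2;
}

int main(void) {
    /* Sequential fetch: after one full miss, later blocks arrive from
       the stream buffer instead of paying the full miss penalty */
    for (unsigned b = 100; b < 104; b++)
        printf("block %u -> %d\n", b, fetch(b));   /* prints 2, 1, 1, 1 */
    return 0;
}
```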

Miss Rate Reduction Techniques (Cont’d)
– The Alpha uses instruction prefetching
– It takes 1 extra CC if the instruction is found in the instruction stream buffer; prefetch hit rate = 25%
– Miss rate for an 8-KB instruction cache = 1.1%
– Hit time = 2 CC; miss penalty = 50 CC
– Average memory access time = 2 + (0.011 x 0.25) x 1 + (0.011 x 0.75) x 50 = 2.415 CC
– To find the effective miss rate, use: Memory access time = Hit time + Miss rate x Miss penalty
– Effective miss rate = (2.415 - 2) / 50 = 0.83%
– This 0.83% is better than the 1.1% miss rate of the 8-KB instruction cache without prefetching
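The same calculation restated as executable arithmetic, a small check using only the numbers on this slide:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 2.0;    /* CC */
    double miss_rate    = 0.011;  /* 8-KB instruction cache */
    double prefetch_hit = 0.25;   /* fraction of misses caught by the buffer */
    double miss_penalty = 50.0;   /* CC */

    /* A stream-buffer hit costs 1 extra CC; a real miss pays the penalty */
    double amat = hit_time
                + miss_rate * prefetch_hit * 1.0
                + miss_rate * (1.0 - prefetch_hit) * miss_penalty;

    /* Solve amat = hit_time + rate * penalty for the miss rate a cache
       without prefetching would need in order to match this performance */
    double effective_rate = (amat - hit_time) / miss_penalty;

    printf("AMAT = %.3f CC, effective miss rate = %.2f%%\n",
           amat, effective_rate * 100.0);   /* 2.415 CC and 0.83% */
    return 0;
}
```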

Miss Rate Reduction Techniques (Cont’d)
Compiler Optimizations:
– Code is rearranged by the compiler to reduce miss rates, particularly conflict misses
– Two examples (both sketched below):
i. Loop Interchange
– In multiply nested loops, changing the order of nesting can reduce the miss rate
– This approach makes use of spatial locality
ii. Blocking
– Loop interchange does not always work, e.g., for matrix multiplication
– Blocked algorithms operate on submatrices, or blocks, which can fit within the cache
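Compact C sketches of both transformations, following the textbook's well-known examples; the array dimension N and tile size B are illustrative. Interchange turns a column-strided traversal of a row-major array into a sequential one; blocking picks B so a B x B working set fits in the cache.

```c
#include <stdio.h>

#define N 512          /* illustrative matrix dimension */
#define B 32           /* illustrative block (tile) size */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static double x[N][N], y[N][N], z[N][N];

/* Before: the inner loop strides down a column of the row-major array,
   touching a new cache block on nearly every access */
void before_interchange(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 2.0 * x[i][j];
}

/* After interchange: the inner loop walks along a row, so consecutive
   accesses fall in the same cache block (spatial locality) */
void after_interchange(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];
}

/* Blocking: interchange alone cannot fix matrix multiply, because y is
   read by rows and z by columns.  Tiling the j and k loops keeps a
   B x B working set of y and z resident in the cache while it is reused. */
void blocked_matmul(void) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < MIN(jj + B, N); j++) {
                    double r = x[i][j];
                    for (int k = kk; k < MIN(kk + B, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] = r;
                }
}

int main(void) {
    after_interchange();
    blocked_matmul();
    printf("done: x[0][0] = %f\n", x[0][0]);
    return 0;
}
```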