Memory Hierarchy and Caches

Who Cares about the Memory Hierarchy?
Processor only thus far in the course; now the CPU-DRAM gap.
1980: no cache in µproc; 1995: 2-level cache, 60% of transistors on the Alpha 21164 µproc.

General Principles
Locality
– Temporal locality: an item referenced now tends to be referenced again soon
– Spatial locality: items near a referenced item tend to be referenced soon
Locality + "smaller HW is faster" = memory hierarchy
– Levels: each smaller, faster, and more expensive per byte than the level below
– Inclusive: data found in the top level is also found in the levels below
Definitions
– Upper level: the level closer to the processor
– Block: the minimum unit of information that is either present or not present in the upper level
– Address = block frame address + block offset address
– Hit time: time to access the upper level, including the time to determine hit or miss
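To make the locality principle concrete, here is a minimal sketch (my illustration, not from the slides) comparing row-major and column-major traversal of a 2-D array; the row-major loop touches adjacent addresses and so exploits spatial locality. (In CPython the effect is muted, since list elements are pointers; in a C version the gap is dramatic.)

```python
import time

N = 2000
# Flat list emulating a row-major 2-D array: element (i, j) lives at i*N + j.
a = [0] * (N * N)

def row_major():
    s = 0
    for i in range(N):
        for j in range(N):      # consecutive j: adjacent addresses, good locality
            s += a[i * N + j]
    return s

def col_major():
    s = 0
    for j in range(N):
        for i in range(N):      # consecutive i: strides of N, poor locality
            s += a[i * N + j]
    return s

for f in (row_major, col_major):
    t = time.perf_counter()
    f()
    print(f.__name__, f"{time.perf_counter() - t:.3f}s")
```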

Cache Measures
Hit rate: fraction of accesses found in that level
– Usually so high that we quote the miss rate instead
Average memory-access time = Hit time + Miss rate × Miss penalty (ns or clocks)
Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU
– Access time: time to reach the lower level = f(lower-level latency)
– Transfer time: time to transfer the block = f(bandwidth between upper and lower levels, block size)
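As a quick sanity check of the formula, a minimal calculation with made-up numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # -> 6.0 cycles
```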

Block Size vs. Cache Measures
Increasing the block size generally increases the miss penalty and (up to a point) decreases the miss rate.
[Figure: miss rate, miss penalty, and their product plotted against block size: Miss Rate × Miss Penalty = Avg. Memory Access Time.]
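The trade-off can be explored numerically. The sketch below assumes a purely hypothetical model in which the miss penalty grows linearly with block size while the miss rate falls toward a floor; the U-shaped AMAT that results is the point of the figure.

```python
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Hypothetical model: 40-cycle latency plus 4 bytes transferred per cycle;
# miss rate falls with block size toward a 1% floor.
for block in (16, 32, 64, 128, 256):
    penalty = 40 + block / 4
    miss_rate = 0.01 + 0.5 / block
    print(f"{block:>3} B blocks: AMAT = {amat(1, miss_rate, penalty):.2f} cycles")
```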

Four Questions for Memory Hierarchy Designers
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

Q1: Where Can a Block Be Placed in the Upper Level?
Example: block 12 placed in an 8-block cache:
– Fully associative: anywhere
– Direct mapped: mapping = block number modulo number of blocks in the cache (12 mod 8 = frame 4)
– Set associative: mapping = block number modulo number of sets in the cache (2-way: 4 sets, so 12 mod 4 = set 0)
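A minimal sketch of the three placement policies (illustrative; the function names are my own):

```python
NUM_BLOCKS = 8

def direct_mapped_frame(block_number):
    # Exactly one legal frame: block number mod number of blocks.
    return block_number % NUM_BLOCKS

def set_associative_frames(block_number, ways):
    # One legal set, any way within it; ways == NUM_BLOCKS is fully associative.
    num_sets = NUM_BLOCKS // ways
    s = block_number % num_sets
    return list(range(s * ways, s * ways + ways))

print(direct_mapped_frame(12))             # -> 4
print(set_associative_frames(12, ways=2))  # -> [0, 1] (set 0)
print(set_associative_frames(12, ways=8))  # -> all 8 frames (fully associative)
```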

Q2: How Is a Block Found If It Is in the Upper Level?
Tag on each block
– No need to store or compare the index or block-offset bits: they are implied by the block's position
– A valid bit indicates whether the entry contains a valid address
Increasing associativity shrinks the index and expands the tag
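A sketch of splitting an address into tag / index / offset fields; the default sizes below match the 8 KB, 32-byte-block direct-mapped data cache in the example slide further on.

```python
def split_address(addr, cache_bytes=8192, block_bytes=32):
    """Decompose addr into (tag, index, offset) for a direct-mapped cache."""
    offset_bits = block_bytes.bit_length() - 1   # 32 B blocks -> 5 offset bits
    num_blocks  = cache_bytes // block_bytes     # 256 blocks
    index_bits  = num_blocks.bit_length() - 1    #             -> 8 index bits
    offset = addr & (block_bytes - 1)
    index  = (addr >> offset_bits) & (num_blocks - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(split_address(0x12345678))  # -> (37282, 179, 24)
```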

Q3: Which Block Should Be Replaced on a Miss?
Easy for direct mapped: there is only one candidate.
Set associative or fully associative:
– Random
– LRU (least recently used)
Data-cache miss rates, LRU vs. random replacement:

Size     2-way LRU  2-way Random  4-way LRU  4-way Random  8-way LRU  8-way Random
16 KB    5.18%      5.69%         4.67%      5.29%         4.39%      4.96%
64 KB    1.88%      2.01%         1.54%      1.66%         1.39%      1.53%
256 KB   1.15%      1.17%         1.13%      1.13%         1.12%      1.12%
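A minimal LRU sketch for a single set (illustrative; real hardware approximates LRU with a few status bits per block):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()          # tag -> None, oldest first

    def access(self, tag):
        if tag in self.blocks:               # hit: mark most recently used
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) == self.ways:    # full: evict least recently used
            self.blocks.popitem(last=False)
        self.blocks[tag] = None              # fill with the new block
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in (12, 20, 12, 28, 20)])
# -> [False, False, True, False, False]; 28 evicted 20, so 20 misses again
```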

Q4: What Happens on a Write?
Write through: the information is written to both the block in the cache and the block in lower-level memory.
Write back: the information is written only to the block in the cache; the modified block is written to main memory only when it is replaced.
– Requires a dirty bit: is the block clean or dirty?
Pros and cons of each:
– WT: read misses can never result in writes to memory
– WB: no repeated writes to memory for repeated writes to the same block
Write allocate: the block is loaded on a write miss. Usually used with write-back caches, because subsequent writes to the block will be captured by the cache.
No-write allocate: the block is modified in the lower level and not loaded into the cache. Typically used with write-through caches, since subsequent writes would have to go to memory anyway.
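A sketch contrasting the two write-hit policies for a single cached block (my own illustration):

```python
class Block:
    def __init__(self):
        self.data = 0
        self.dirty = False

def write_through(block, memory, addr, value):
    block.data = value
    memory[addr] = value           # every write also goes to lower-level memory

def write_back(block, addr, value):
    block.data = value
    block.dirty = True             # memory is updated only at eviction time

def evict(block, memory, addr):
    if block.dirty:                # one write-back covers many cached writes
        memory[addr] = block.data
        block.dirty = False
```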

Example: Data Cache
Direct mapped; index = 8 bits: 256 blocks = 8192 bytes / (32 bytes per block × 1 way).

Writes in Alpha
No write merging vs. write merging in the write buffer (4 entries of 4 words each).
Example: 16 sequential word writes in a row; without merging each write occupies its own entry, while merging packs the sequential words four to an entry.
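A sketch of the merging idea (my own illustration): count how many buffer entries a burst of sequential word writes consumes with and without merging.

```python
WORDS_PER_ENTRY = 4

def entries_used(word_addresses, merging):
    """Write-buffer entries consumed by a sequence of word writes."""
    if merging:
        # Writes to the same aligned 4-word group share one entry.
        return len({addr // WORDS_PER_ENTRY for addr in word_addresses})
    return len(word_addresses)      # one entry per write

writes = list(range(16))            # 16 sequential word addresses
print(entries_used(writes, merging=False))  # -> 16
print(entries_used(writes, merging=True))   # -> 4
```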

Structural Hazard: Instruction and Data?
Separate instruction cache and data cache vs. a unified cache. Miss rates:

Size     Instruction Cache  Data Cache  Unified Cache
1 KB     3.06%              24.61%      13.34%
2 KB     2.26%              20.57%      9.78%
4 KB     1.78%              15.94%      7.24%
8 KB     1.10%              10.19%      4.57%
16 KB    0.64%              6.47%       2.87%
32 KB    0.39%              4.82%       1.99%
64 KB    0.15%              3.77%       1.35%
128 KB   0.02%              2.88%       0.95%

A fair comparison needs the relative weighting of instruction vs. data accesses.
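For example, taking 75% of accesses as instruction fetches (an assumed weighting), split 16 KB + 16 KB caches can be compared with a 32 KB unified cache using the table:

```python
# Miss rates from the table above.
i_rate, d_rate = 0.64 / 100, 6.47 / 100   # 16 KB I-cache, 16 KB D-cache
unified_rate   = 1.99 / 100               # 32 KB unified cache

instr_frac = 0.75                         # assumed share of instruction accesses
split_rate = instr_frac * i_rate + (1 - instr_frac) * d_rate
print(f"split: {split_rate:.2%}  unified: {unified_rate:.2%}")
# split ~2.10% vs. unified 1.99%: the unified cache has the lower miss rate,
# but the split design removes the structural hazard of simultaneous accesses.
```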

2-Way Set Associative: Address to Select Word
Two sets of address tags and data RAMs, read in parallel.
Index bits of the address select the set within each data RAM; the tag comparison drives a 2:1 mux that picks the matching way.

Cache Performance
CPU time = (CPU execution clock cycles + Memory stall clock cycles) × Clock cycle time
Memory stall clock cycles = (Reads × Read miss rate × Read miss penalty) + (Writes × Write miss rate × Write miss penalty)
Combining reads and writes into one term:
Memory stall clock cycles = Memory accesses × Miss rate × Miss penalty

Cache Performance (per instruction)
CPU time = IC × (CPI_execution + Memory accesses per instruction × Miss rate × Miss penalty) × Clock cycle time
Misses per instruction = Memory accesses per instruction × Miss rate
CPU time = IC × (CPI_execution + Misses per instruction × Miss penalty) × Clock cycle time
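A worked check with assumed numbers, showing the two forms agree:

```python
IC                 = 1_000_000  # instruction count (assumed)
cpi_execution      = 1.0        # base CPI without memory stalls (assumed)
accesses_per_instr = 1.3        # 1 fetch + 0.3 loads/stores (assumed)
miss_rate          = 0.02
miss_penalty       = 100        # clock cycles
cycle_time         = 1e-9       # seconds

misses_per_instr = accesses_per_instr * miss_rate
t1 = IC * (cpi_execution + accesses_per_instr * miss_rate * miss_penalty) * cycle_time
t2 = IC * (cpi_execution + misses_per_instr * miss_penalty) * cycle_time
print(t1, t2)  # identical: memory stalls add 2.6 to the base CPI here
```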

Improving Cache Performance
Average memory-access time = Hit time + Miss rate × Miss penalty (ns or clocks)
Improve performance by:
1. reducing the miss rate,
2. reducing the miss penalty, or
3. reducing the time to hit in the cache.