Slide 1: Motivation for Memory Hierarchy

What we want from memory: fast, large, and cheap. There are different kinds of memory technologies: register files, SRAM, DRAM, MRAM, disk...

              Register    Cache        Memory      Disk
  size:       32 B        32 KB-4 MB   1024 MB     300 GB
  speed:      0.3 ns      1 ns         30 ns       8 x 10^6 ns
  $/MB:       --          $60/MB       $0.10/MB    $0.001/MB
  line size:  8 B         32 B         4 KB        --

  ---> larger, slower, cheaper --->
Slide 2: Need for Speed

Assume the CPU runs at 3 GHz, and that every instruction requires 4 B of instruction fetch plus at least one memory access (4 B of data):

  3 GHz * 8 B = 24 GB/s

That is the demand at peak; the figures below are peak performance for a sequential burst of transfers (performance for random access is much, much slower due to latency).

  Interface                                  Width        Frequency     Bytes/sec
  4-way interleaved PC1600 (DDR200) SDRAM    4 x 64 bits  100 MHz DDR   6.4 GB/s
  Opteron HyperTransport memory bus          128 bits     200 MHz DDR   6.4 GB/s
  Pentium 4 "800 MHz" FSB                    64 bits      200 MHz QDR   6.4 GB/s
  PC2 6400 (DDR-II 800) SDRAM                64 bits      400 MHz DDR   6.4 GB/s
  PC2 5300 (DDR-II 667) SDRAM                64 bits      333 MHz DDR   5.3 GB/s
  Pentium 4 "533 MHz" FSB                    64 bits      133 MHz QDR   4.3 GB/s
Slide 3: Need for Large Memory

Small memories are fast, so why not just write small programs?

  "640 K of memory should be enough for anybody." -- Bill Gates, 1981

But real programs require large memories:
  PowerPoint 2003: ~25 megabytes
  Database applications may require gigabytes of memory
Slide 4: Levels in Memory Hierarchy

A hierarchy makes memory appear faster, larger, and cheaper by exploiting locality of reference:
  Temporal locality: a recently accessed item is likely to be accessed again soon
  Spatial locality: items near a recently accessed item are likely to be accessed soon

Memory latency (remember it from the pipeline?) is what random access needs; bandwidth is what moving blocks of memory needs.

Strategy: provide a small, fast memory which holds a subset of main memory. It is both low latency (smaller address space) and high bandwidth (larger data width).
Slide 5: Basic Philosophy

Move data into the 'smaller, faster' memory, operate on it there (latency), then move it back to the 'larger, cheaper' memory (bandwidth).

Two questions arise:
  How do we keep track of whether the data has changed?
  What if we run out of space in the 'smaller, faster' memory?
Slide 6: Typical Hierarchy

  CPU regs <--8 B--> Cache <--32 B--> Memory <--4 KB--> Disk
                 (cache)         (virtual memory)

Notice that the data width changes at each level. Why? Bandwidth: the transfer rate between the levels differs:
  CPU-Cache:  24 GB/s
  Cache-Main: 0.5-6.4 GB/s
  Main-Disk:  187 MB/s (Serial ATA/1500)
Slide 7: Bandwidth Issue

Fetch large blocks at a time (bandwidth): this supports spatial locality.

  for (i = 0; i < length; i++)
      sum += array[i];

Here array has spatial locality (consecutive elements are accessed) and sum has temporal locality (the same variable is reused every iteration). A runnable version is sketched below.
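A minimal runnable sketch of the loop above, in plain C; array, sum, and length are the slide's names, while the array size and contents are illustrative assumptions:

  #include <stdio.h>

  int main(void) {
      int array[1024];                /* illustrative size, not from the slide */
      int length = 1024;
      long sum = 0;

      for (int i = 0; i < length; i++)
          array[i] = i;               /* fill with sample data */

      /* Sequential walk: array exhibits spatial locality (adjacent
       * elements share a cache line); sum exhibits temporal locality
       * (the same variable is touched on every iteration). */
      for (int i = 0; i < length; i++)
          sum += array[i];

      printf("sum = %ld\n", sum);
      return 0;
  }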
Slide 8: Figure of Merit

Why are we building the cache? To minimize the average memory access time, which means maximizing the number of accesses found in the cache.

"Hit rate": the percentage of memory accesses found in the cache.

Assumptions:
  Every instruction requires exactly 1 memory access
  Every instruction requires 1 clock cycle to complete
  Cache access time is the same as the clock cycle
  Main memory access time is 20 cycles

  CPI (cycles/instruction) = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss
Slide 9: CPI

CPI is highly sensitive to hit rate:
  90% hit rate: 0.90 * 1 + 0.10 * 20 = 2.90 CPI
  95% hit rate: 0.95 * 1 + 0.05 * 20 = 1.95 CPI
  99% hit rate: 0.99 * 1 + 0.01 * 20 = 1.19 CPI

Hit rate matters: a larger cache or a multi-level cache improves it. The sketch below reproduces these numbers.
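A small C sketch of the CPI formula from slide 8, under that slide's assumptions (1-cycle hits, 20-cycle misses); the function and variable names are mine, not from the slides:

  #include <stdio.h>

  /* CPI = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss */
  static double cpi(double hit_rate, double hit_cycles, double miss_cycles) {
      return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
  }

  int main(void) {
      double rates[] = {0.90, 0.95, 0.99};
      for (int i = 0; i < 3; i++)
          printf("hit rate %.0f%% -> %.2f CPI\n",
                 100.0 * rates[i], cpi(rates[i], 1.0, 20.0));
      return 0;   /* prints 2.90, 1.95, 1.19 */
  }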
Slide 10: How Is the Cache Implemented?

Basic concept:
  Traditional memory: given an address, provide some data.
  Associative memory: given data, provide an address. Also known as "content addressable memory" (CAM).

For a cache, the "data" being matched is the memory address, and the "address" returned is which cache line holds it.
Slide 11: Cache Implementation

Fully associative organization (read the text for set associative):

  Associative memory              Cache data array
  Memory Addr    Cache Line       Cache Line   Memory Contents
  0x400800XX     1                1            ...
  0x204500XX     4                2            ...
  0x143300XX     2                3            ...
  0x542300XX     3                4            ...
  ...            ...              ...          ...

The associative memory has one entry per cache line; the data array is (# of cache lines) x (width of cache lines). A software sketch of the lookup follows.
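A software sketch of a fully associative lookup, under stated assumptions: 4 lines (matching the slide's 4 entries) and a 256-byte line, inferred from the slide's 0x...XX addresses hiding 8 offset bits. Real hardware compares all tags in parallel; the loop here only simulates that:

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define NUM_LINES 4        /* assumed: matches the 4 entries on the slide */
  #define LINE_SIZE 256      /* assumed: 0x...XX hides 8 offset bits */

  struct cache_line {
      bool     valid;
      uint32_t tag;          /* memory address with the offset bits dropped */
      uint8_t  data[LINE_SIZE];
  };

  static struct cache_line cache[NUM_LINES];

  /* Fully associative lookup: compare the address's tag against every
   * line (in hardware, all comparisons happen at once).
   * Returns the matching line index, or -1 on a miss. */
  static int lookup(uint32_t addr) {
      uint32_t tag = addr / LINE_SIZE;
      for (int i = 0; i < NUM_LINES; i++)
          if (cache[i].valid && cache[i].tag == tag)
              return i;
      return -1;
  }

  int main(void) {
      /* Load the slide's 0x400800XX entry (lines are 0-indexed here,
       * 1-indexed on the slide). */
      cache[0].valid = true;
      cache[0].tag   = 0x400800;

      printf("0x40080012 -> line %d\n", lookup(0x40080012));  /* hit: 0 */
      printf("0x12345678 -> line %d\n", lookup(0x12345678));  /* miss: -1 */
      return 0;
  }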
Slide 12: The Issues

How is the cache organized?
  Size
  Line size
  Number of lines
  Write policy
  Replacement strategy
Slide 13: Cache Size

We need to choose the size of the lines:
  Bigger lines exploit more spatial locality
  Diminishing returns set in for larger and larger lines; line size tends to be around 128 B

And the number of lines:
  More lines means a higher hit rate, but a slower memory
  Use as many as practical

[Figure: cache lines and their width, as on slide 11]
Slide 14: Writing to the Cache

We need to keep the cache consistent with memory.
  "Write-through": write to cache and memory simultaneously.
  "Write-back" (a refinement): write to the cache only and mark the line as 'dirty'; it will eventually need to be copied back to main memory.

Both policies are sketched below.
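A minimal sketch of the two write policies, assuming the line layout from the earlier lookup sketch plus a dirty bit, and a toy memory[] array small enough that the example addresses must fit inside it; all names here are illustrative, not from the slides:

  #include <string.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define LINE_SIZE 256
  static uint8_t memory[1 << 20];     /* toy main memory (assumed 1 MB) */

  struct line {
      bool     valid, dirty;
      uint32_t tag;
      uint8_t  data[LINE_SIZE];
  };

  /* Write-through: cache and main memory are updated together,
   * so memory is always consistent. */
  static void write_through(struct line *l, uint32_t addr, uint8_t value) {
      l->data[addr % LINE_SIZE] = value;
      memory[addr] = value;
  }

  /* Write-back: only the cache is updated; the dirty bit records
   * that the line must be copied back before it is discarded. */
  static void write_back(struct line *l, uint32_t addr, uint8_t value) {
      l->data[addr % LINE_SIZE] = value;
      l->dirty = true;
  }

  /* On eviction, a dirty line is flushed to main memory first. */
  static void evict(struct line *l) {
      if (l->valid && l->dirty)
          memcpy(&memory[l->tag * LINE_SIZE], l->data, LINE_SIZE);
      l->valid = l->dirty = false;
  }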
Slide 15: Replacement Strategies

Problem: we need to make space in the cache for a new entry. Which line should be 'evicted'?
  Ideal: the line with the longest time until its next access (requires knowing the future)
  Least-recently used (LRU): complicated to track
  Random selection: simple, and its effect on hit rate is relatively small

The two practical policies are sketched below.
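Sketches of the two practical policies over the same 4-line cache; last_used[] is an assumed timestamp array that a real cache would have to update on every access, which is exactly the bookkeeping that makes LRU "complicated":

  #include <stdlib.h>

  #define NUM_LINES 4
  static unsigned long last_used[NUM_LINES];  /* time of last access per line */

  /* LRU: evict the line with the oldest access time. */
  static int victim_lru(void) {
      int v = 0;
      for (int i = 1; i < NUM_LINES; i++)
          if (last_used[i] < last_used[v])
              v = i;
      return v;
  }

  /* Random: no bookkeeping at all; in practice its hit rate is
   * close to LRU's. */
  static int victim_random(void) {
      return rand() % NUM_LINES;
  }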
Slide 16: Processor-DRAM Gap (Latency)

[Figure: performance vs. year, 1980-2000, log scale from 1 to 1000. CPU performance ("Moore's Law") improves ~60%/yr while DRAM improves ~7%/yr, so the processor-memory performance gap grows ~50% per year. Patterson, 1998]
Slide 17: Will Do Almost Anything to Improve Hit Rate

Lots of techniques exist. Most important: make the cache big.
  An improvement of 1% in hit rate is very worthwhile
  Avoid the worst case whenever possible
  Use multilevel caching