4/6/2005 ECE
Motivation for Memory Hierarchy
- What we want from memory: fast, large, cheap
- There are different kinds of memory technologies: register files, SRAM, DRAM, MRAM, disk, …

  Level         Size          Speed          $/MB         Line size
  Register      32 B          0.3 ns         -            8 B
  Cache         32 KB-4 MB    1 ns           $60/MB       32 B
  Memory        1024 MB       30 ns          $0.10/MB     4 KB
  Disk memory   300 GB        8 x 10^6 ns    $0.001/MB    -

- Moving down the table, each level is larger, slower, and cheaper

Need for speed
- Assume the CPU runs at 3 GHz
- Every instruction requires 4 B of instruction and at least one memory access (4 B of data), so 8 B per instruction
- Required bandwidth: 3 * 8 = 24 GB/sec
- Peak performance of a sequential burst of transfer (performance for random access is much, much slower due to latency):

  Interface                                     Width         Frequency      Bytes/Sec
  4-way interleaved PC1600 (DDR200) SDRAM       4 x 64 bits   100 MHz DDR    6.4 GB/s
  Opteron HyperTransport memory bus             128 bits      200 MHz DDR    6.4 GB/s
  Pentium 4 "800 MHz" FSB                       64 bits       200 MHz QDR    6.4 GB/s
  PC (DDR-II 800) SDRAM                         64 bits       400 MHz DDR    6.4 GB/s
  PC (DDR-II 667) SDRAM                         64 bits       333 MHz DDR    5.3 GB/s
  Pentium 4 "533 MHz" FSB                       64 bits       133 MHz QDR    4.3 GB/s

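The arithmetic on this slide can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the model assumes one instruction per clock and that peak bus bandwidth is simply width times base clock times transfers per clock (2 for DDR, 4 for QDR).

```c
/* Bytes per second the CPU asks for, assuming one instruction per clock. */
double cpu_demand_bytes_per_sec(double clock_hz, double bytes_per_instr)
{
    return clock_hz * bytes_per_instr;
}

/* Peak burst bandwidth of a bus in bytes per second.
 * width_bits          : data width of the interface
 * clock_hz            : base clock frequency
 * transfers_per_clock : 2 for DDR, 4 for QDR (assumed model) */
double peak_bus_bytes_per_sec(unsigned width_bits, double clock_hz,
                              unsigned transfers_per_clock)
{
    return (width_bits / 8.0) * clock_hz * transfers_per_clock;
}
```

For example, `cpu_demand_bytes_per_sec(3e9, 8)` gives 24e9 (24 GB/s), while `peak_bus_bytes_per_sec(64, 200e6, 4)` gives 6.4e9, the Pentium 4 "800 MHz" FSB row. Even the fastest interface in the table supplies only about a quarter of what the CPU could consume.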
Need for Large Memory
- Small memories are fast
- So just write small programs
  "640 K of memory should be enough for anybody." -- Bill Gates, 1981
- Real programs require large memories
  - PowerPoint 2003: 25 megabytes
  - Database applications may require gigabytes of memory

Levels in Memory Hierarchy
- Hierarchy makes memory appear faster, larger, and cheaper by exploiting locality of reference
  - Temporal locality: data used recently is likely to be used again soon
  - Spatial locality: data near a recent access is likely to be used soon
- Memory
  - Low latency (remember from pipeline?) is needed for random access
  - Bandwidth is needed for moving blocks of memory
- Strategy: provide a small, fast memory that holds a subset of the main memory
  - It is both low latency (smaller address space) and high bandwidth (larger data width)

Basic Philosophy
- Move data into ‘smaller, faster’ memory
- Operate on it (latency)
- Move it back to ‘larger, cheaper’ memory (bandwidth)
  - How do we keep track of whether it has changed?
  - What if we run out of space in ‘smaller, faster’ memory?

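Typical Hierarchy
- Figure: CPU regs <-> Cache in 8 B transfers, Cache <-> Memory in 32 B transfers, Memory <-> Disk in 4 KB transfers; the upper boundary is labeled "cache" and the lower boundary "virtual memory"
- Notice that the data width is changing. Why?
- Bandwidth: transfer rate between the various levels
  - CPU-Cache: 24 GBps
  - Cache-Main: GBps
  - Main-Disk: 187 MBps (Serial ATA/1500)
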
Bandwidth Issue
- Fetch large blocks at a time (bandwidth)
- Supports spatial locality

    for (i = 0; i < length; i++)
        sum += array[i];

- array has spatial locality
- sum has temporal locality

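To make the contrast concrete, here is a hedged sketch that wraps the slide's loop in a function and adds a strided variant for comparison. The function names and the `stride` parameter are illustrative assumptions, not part of the lecture. Both functions compute the same sum; only the order of memory accesses differs.

```c
#include <stddef.h>

/* Sequential walk: neighbouring elements share a cache line, so one
 * fetched block serves several iterations (spatial locality), and
 * `sum` is reused on every iteration (temporal locality). */
long sum_sequential(const int *array, size_t length)
{
    long sum = 0;
    for (size_t i = 0; i < length; i++)
        sum += array[i];
    return sum;
}

/* Same total, but consecutive accesses are `stride` elements apart.
 * With a large stride each access may land in a different cache line,
 * so far less of each fetched block is used before moving on.
 * A stride of 0 visits nothing and returns 0. */
long sum_strided(const int *array, size_t length, size_t stride)
{
    long sum = 0;
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < length; i += stride)
            sum += array[i];
    return sum;
}
```

Timing the two on a large array is one way to observe the effect; the size of the gap depends on the machine's line size and cache capacity.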
Figure of Merit
- Why are we building the cache?
  - Minimize the average memory access time
  - That means maximizing the number of accesses found in the cache
- "Hit rate": percentage of memory accesses found in the cache
- Assumptions
  - Every instruction requires exactly 1 memory access
  - Every instruction requires 1 clock cycle to complete
  - Cache access time is the same as the clock cycle
  - Main memory access time is 20 cycles
- CPI (cycles/instruction) = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss

CPI
- Highly sensitive to hit rate
  - 90% hit rate: .90 * 1 + .10 * 20 = 2.9 CPI
  - 95% hit rate: .95 * 1 + .05 * 20 = 1.95 CPI
  - 99% hit rate: .99 * 1 + .01 * 20 = 1.19 CPI
- Hit rate matters
  - Larger caches and multi-level caches improve hit rate

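The CPI figures can be checked with a small function. This is a sketch of the formula from the Figure of Merit slide under its stated assumptions (a hit costs 1 cycle, a miss costs 20); the function name `cpi` is chosen here for illustration.

```c
/* Average cycles per instruction for a given cache hit rate.
 * hit_rate    : fraction of accesses found in the cache (0.0 to 1.0)
 * clocks_hit  : cycles for an access that hits the cache
 * clocks_miss : cycles for an access that goes to main memory */
double cpi(double hit_rate, double clocks_hit, double clocks_miss)
{
    return hit_rate * clocks_hit + (1.0 - hit_rate) * clocks_miss;
}
```

With `clocks_hit = 1` and `clocks_miss = 20`, hit rates of 0.90, 0.95, and 0.99 give approximately 2.9, 1.95, and 1.19 cycles per instruction. Going from 90% to 99% cuts CPI by more than half, which is why a single percentage point of hit rate is worth chasing.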
How is cache implemented
- Basic concept
  - Traditional memory: given an address, provide some data
  - Associative memory: given data, provide an address
    - AKA “Content Addressable Memory”
    - “Data” is the memory address being looked up
    - “Address” is which cache line holds it

Cache Implementation
- Fully associative (read text for set associative)
- The associative memory maps a memory address to a cache line:

  Memory Addr    Cache Line
  0x400800XX     1
  0x204500XX     4
  0x143300XX     2
  0x542300XX     3
  …              …

- A second table maps each cache line to its memory contents; its dimensions are the number of cache lines by the width of a cache line

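A fully associative lookup can be modeled in software as a search over stored tags. The sketch below is an illustration under assumptions, not the hardware design: it uses 4 lines, treats the low 8 address bits (the "XX" in the table) as the offset within a 256-byte line, and replaces the parallel tag comparison of a real content-addressable memory with a loop. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAM_NUM_LINES  4     /* assumed number of cache lines */
#define CAM_LINE_BYTES 256   /* assumed line width: low 8 bits are the offset */

struct cache_line {
    bool     valid;                  /* does this line hold anything? */
    uint32_t tag;                    /* memory address with offset bits removed */
    uint8_t  data[CAM_LINE_BYTES];   /* memory contents of the line */
};

struct cache {
    struct cache_line lines[CAM_NUM_LINES];
};

/* Return the index of the line holding `addr`, or -1 on a miss.
 * Hardware compares every tag at once; the loop stands in for that. */
int cache_lookup(const struct cache *c, uint32_t addr)
{
    uint32_t tag = addr / CAM_LINE_BYTES;
    for (int i = 0; i < CAM_NUM_LINES; i++)
        if (c->lines[i].valid && c->lines[i].tag == tag)
            return i;
    return -1;
}

/* Record that line `slot` now holds the block containing `addr`.
 * Copying the data itself is omitted from this sketch. */
void cache_fill(struct cache *c, int slot, uint32_t addr)
{
    c->lines[slot].valid = true;
    c->lines[slot].tag = addr / CAM_LINE_BYTES;
}
```

After `cache_fill(&c, 1, 0x40080000)`, any address of the form 0x400800XX hits in line 1, while 0x40080100 belongs to the next block and misses.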
The Issues
- How is the cache organized?
- Size
  - Line size
  - Number of lines
- Write policy
- Replacement strategy

Cache Size
- Need to choose the size of lines
  - Bigger lines exploit more spatial locality
  - Diminishing returns for larger and larger lines
  - Tends to be around 128 B
- And the number of lines
  - More lines == higher hit rate
  - More lines also make the memory slower
  - As many as practical
- Figure: cache lines 1, 2, 3, 4, … with their memory contents; the horizontal dimension is the width of a cache line

Writing to the Cache
- Need to keep the cache consistent with memory
- Write to cache and memory simultaneously: “write-through”
- Refinement: write to cache and mark the line as ‘dirty’: “write-back”
  - Will need to eventually copy it back to main memory

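The two write policies differ only in when main memory is updated. The following is a minimal sketch for a single cache line, assuming a 32-byte line, a byte-array model of main memory, and that `addr` always falls inside the line. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WB_LINE_BYTES 32   /* assumed line width */

struct line {
    bool     dirty;                 /* set when the cache copy is newer than memory */
    uint32_t base;                  /* main-memory address of byte 0 of the line */
    uint8_t  data[WB_LINE_BYTES];
};

/* Write-through: update the cache and main memory together. */
void write_through(struct line *l, uint8_t *memory, uint32_t addr, uint8_t value)
{
    l->data[addr - l->base] = value;
    memory[addr] = value;
}

/* Write-back: update only the cache and mark the line dirty. */
void write_back(struct line *l, uint32_t addr, uint8_t value)
{
    l->data[addr - l->base] = value;
    l->dirty = true;
}

/* On eviction, a dirty line must be copied back to main memory. */
void evict(struct line *l, uint8_t *memory)
{
    if (l->dirty) {
        memcpy(&memory[l->base], l->data, WB_LINE_BYTES);
        l->dirty = false;
    }
}
```

Write-back saves memory traffic when the same line is written many times, at the price of the extra dirty bit and a copy on eviction.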
Replacement Strategies
- Problem: we need to make space in the cache for a new entry
- Which line should be ‘evicted’?
  - Ideal?: the line with the longest time till its next access
  - Least recently used: complicated
  - Random selection: simple
- Effect on hit rate is relatively small

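The two practical policies can be sketched as victim-selection functions. This is an illustration under assumptions: a 4-line cache, and a per-line timestamp array (`last_used`, a hypothetical name) that the cache would update on every access. Keeping those timestamps is what makes least-recently-used more complicated than random selection.

```c
#include <stdint.h>
#include <stdlib.h>

#define RS_NUM_LINES 4   /* assumed number of cache lines */

/* Least recently used: evict the line with the oldest access time.
 * last_used[i] holds the time of the most recent access to line i. */
int pick_lru(const uint64_t last_used[RS_NUM_LINES])
{
    int victim = 0;
    for (int i = 1; i < RS_NUM_LINES; i++)
        if (last_used[i] < last_used[victim])
            victim = i;
    return victim;
}

/* Random selection: no bookkeeping at all. */
int pick_random(void)
{
    return rand() % RS_NUM_LINES;
}
```

With access times `{5, 2, 9, 7}`, `pick_lru` selects line 1, the one touched longest ago.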
Processor-DRAM Gap (latency)
- Figure (Patterson, 1998): performance vs. time for CPU and DRAM, starting in 1982
  - µProc (“Moore’s Law”): 60%/yr
  - DRAM: 7%/yr
- Processor-memory performance gap grows 50% / year

Will Do Almost Anything to Improve Hit Rate
- Lots of techniques
- Most important: make the cache big
- An improvement of 1% is very worthwhile
- Avoid the worst case whenever possible
- Multilevel caching