ECE 232, 4/6/2005

Motivation for Memory Hierarchy

What we want from memory:
- Fast
- Large
- Cheap

There are different kinds of memory technologies: register files, SRAM, DRAM, MRAM, disk, ...

              Register     Cache          Memory      Disk
  size:       32 B         32 KB - 4 MB   1024 MB     300 GB
  speed:      0.3 ns       1 ns           30 ns       8 x 10^6 ns
  $/MB:       --           $60            $0.10       $0.001
  line size:  8 B          32 B           4 KB        --

Moving from register toward disk: larger, slower, cheaper.

Need for Speed

Assume the CPU runs at 3 GHz, and that every instruction requires a 4 B instruction fetch plus at least one memory access (4 B of data):
- 3 GHz x 8 B = 24 GB/s of memory bandwidth

Compare that demand with the peak performance of a sequential burst transfer (performance for random access is much slower due to latency):

  Interface                                  Width         Frequency     Bytes/sec
  4-way interleaved PC1600 (DDR200) SDRAM    4 x 64 bits   100 MHz DDR   6.4 GB/s
  Opteron HyperTransport memory bus          128 bits      200 MHz DDR   6.4 GB/s
  Pentium 4 "800 MHz" FSB                    64 bits       200 MHz QDR   6.4 GB/s
  PC2-6400 (DDR-II 800) SDRAM                64 bits       400 MHz DDR   6.4 GB/s
  PC2-5300 (DDR-II 667) SDRAM                64 bits       333 MHz DDR   5.3 GB/s
  Pentium 4 "533 MHz" FSB                    64 bits       133 MHz QDR   4.3 GB/s
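To make the arithmetic concrete, here is a minimal C sketch of the demand calculation (the constants mirror the slide's assumptions; the variable names are mine):

    #include <stdio.h>

    int main(void) {
        double clock_hz    = 3e9;  /* 3 GHz CPU, one instruction per cycle */
        double instr_bytes = 4.0;  /* 4 B instruction fetch per instruction */
        double data_bytes  = 4.0;  /* at least one 4 B data access */

        double demand = clock_hz * (instr_bytes + data_bytes);
        printf("required bandwidth: %.1f GB/s\n", demand / 1e9);  /* 24.0 */
        return 0;
    }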

Need for Large Memory

Small memories are fast, so why not just write small programs?

  "640 K of memory should be enough for anybody." -- Bill Gates, 1981

Real programs require large memories:
- PowerPoint 2003: about 25 megabytes
- Database applications may require gigabytes of memory

Levels in Memory Hierarchy

A hierarchy makes memory appear faster, larger, and cheaper by exploiting locality of reference:
- Temporal locality: a recently accessed item is likely to be accessed again soon
- Spatial locality: items near a recently accessed item are likely to be accessed soon

Two properties of memory matter here:
- Latency (remember the pipeline?), which governs a random access
- Bandwidth, which governs moving blocks of memory

Strategy: provide a small, fast memory that holds a subset of main memory. It is both:
- Low latency (smaller address space), and
- High bandwidth (larger data width)

Basic Philosophy

- Move data into the 'smaller, faster' memory
- Operate on it (latency)
- Move it back to the 'larger, cheaper' memory (bandwidth)

Two questions follow:
- How do we keep track of whether the data has changed?
- What if we run out of space in the 'smaller, faster' memory?

Typical Hierarchy

  CPU regs <- 8 B -> Cache <- 32 B -> Memory <- 4 KB -> disk
                  (cache)          (virtual memory)

Notice that the data width changes at each level. Why?

Bandwidth (transfer rate) between the various levels:
- CPU-Cache: 24 GB/s
- Cache-Main: on the order of GB/s
- Main-Disk: 187 MB/s (Serial ATA/1500)

Bandwidth Issue

Fetch large blocks at a time (bandwidth); this exploits spatial locality:

    for (i = 0; i < length; i++)
        sum += array[i];

- array has spatial locality
- sum has temporal locality

A self-contained version of this loop is sketched below.
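Here is that loop as a complete C program, with the two kinds of locality marked in comments (the array size and contents are arbitrary placeholders):

    #include <stdio.h>

    #define LENGTH 1024

    int main(void) {
        int array[LENGTH];                /* consecutive addresses: spatial locality */
        for (int i = 0; i < LENGTH; i++)
            array[i] = i;

        int sum = 0;                      /* one variable reused every iteration:
                                             temporal locality */
        for (int i = 0; i < LENGTH; i++)
            sum += array[i];              /* sequential reads stay within one cache
                                             line until the next block is needed */

        printf("sum = %d\n", sum);
        return 0;
    }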

Figure of Merit

Why are we building the cache? To minimize the average memory access time, which means maximizing the number of accesses found in the cache.

"Hit rate": the percentage of memory accesses found in the cache.

Assumptions:
- Every instruction requires exactly 1 memory access
- Every instruction requires 1 clock cycle to complete
- Cache access time is the same as the clock cycle
- Main memory access time is 20 cycles

CPI (cycles/instruction) = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss

CPI

CPI is highly sensitive to the hit rate:
- 90% hit rate: 0.90 * 1 + 0.10 * 20 = 2.90 CPI
- 95% hit rate: 0.95 * 1 + 0.05 * 20 = 1.95 CPI
- 99% hit rate: 0.99 * 1 + 0.01 * 20 = 1.19 CPI

Hit rate matters: a larger cache or a multi-level cache improves the hit rate.
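A minimal C sketch of this CPI model (the function and variable names are mine, not from the slides):

    #include <stdio.h>

    /* Average CPI under the slide's assumptions: one memory access per
       instruction, a hit costs clocks_hit cycles, a miss costs clocks_miss. */
    double cpi(double hit_rate, double clocks_hit, double clocks_miss) {
        return hit_rate * clocks_hit + (1.0 - hit_rate) * clocks_miss;
    }

    int main(void) {
        double rates[] = { 0.90, 0.95, 0.99 };
        for (int i = 0; i < 3; i++)
            printf("hit rate %.2f -> CPI %.2f\n",
                   rates[i], cpi(rates[i], 1.0, 20.0));  /* 2.90, 1.95, 1.19 */
        return 0;
    }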

How Is the Cache Implemented?

Basic concept:
- Traditional memory: given an address, provide the data stored there
- Associative memory: given data (a search key), provide the address where it is stored
  - AKA "Content Addressable Memory" (CAM)
  - In a cache, the "data" being searched for is the memory address, and the "address" returned is which cache line holds it

Cache Implementation

Fully associative (read the text for set associative). The associative memory maps a memory address to a cache line:

  Memory Addr     Cache Line
  0x400800XX      1
  0x204500XX      4
  0x143300XX      2
  0x542300XX      3
  ...             ...

A separate data store holds the contents of each line:

  Cache Line      Memory Contents
  1               ...
  2               ...
  3               ...
  4               ...

The cache is characterized by its number of cache lines and the width of each cache line.
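Below is a rough C sketch of a fully associative lookup, assuming 4 lines of 32 B; the struct and function names are my own, and real hardware compares all tags simultaneously in a CAM rather than looping:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 4
    #define LINE_SIZE 32   /* bytes per cache line */

    struct line {
        bool     valid;
        uint32_t tag;              /* address with the byte offset stripped */
        uint8_t  data[LINE_SIZE];
    };

    static struct line cache[NUM_LINES];

    /* Returns true on a hit and places the byte in *out;
       returns false on a miss (the caller would fetch from main memory). */
    bool cache_read(uint32_t addr, uint8_t *out) {
        uint32_t tag    = addr / LINE_SIZE;
        uint32_t offset = addr % LINE_SIZE;
        for (int i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data[offset];   /* hit */
                return true;
            }
        }
        return false;                           /* miss */
    }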

The Issues

How is the cache organized?
- Size
  - Line size
  - Number of lines
- Write policy
- Replacement strategy

Cache Size

Need to choose the size of the lines:
- Bigger lines exploit more spatial locality
- Diminishing returns for larger and larger lines
- Tends to be around 128 B

And the number of lines:
- More lines == higher hit rate, but a slower memory
- As many as practical

[Figure: a stack of cache lines, numbered 1, 2, 3, 4, ..., of a fixed width]
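For instance (the numbers here are mine, not from the slides): a 64 KB cache built from 128 B lines contains 64 KB / 128 B = 512 lines; doubling the line size to 256 B halves the line count to 256 while fetching more neighboring data on each miss.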

Writing to the Cache

Need to keep the cache consistent with memory. Two policies, sketched below:
- Write to the cache and to memory simultaneously: "write-through"
- Refinement: write only to the cache and mark the line as 'dirty'; the line will eventually need to be copied back to main memory: "write-back"
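A C sketch of the two policies, assuming a simplified line structure; memory_write() is a hypothetical stub standing in for a real DRAM write:

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE 32

    struct wline {
        bool    dirty;             /* write-back only: line differs from memory */
        uint8_t data[LINE_SIZE];
    };

    /* Hypothetical stand-in for writing one byte to main memory. */
    static void memory_write(uint32_t addr, uint8_t value) {
        (void)addr; (void)value;   /* a real system would update DRAM here */
    }

    /* Write-through: cache and memory are updated together,
       so memory is always consistent. */
    void write_through(struct wline *l, uint32_t addr, uint8_t value) {
        l->data[addr % LINE_SIZE] = value;
        memory_write(addr, value);
    }

    /* Write-back: only the cache is updated; the dirty flag defers the
       memory update until the line is evicted. */
    void write_back(struct wline *l, uint32_t addr, uint8_t value) {
        l->data[addr % LINE_SIZE] = value;
        l->dirty = true;
    }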

Replacement Strategies

Problem: we need to make space in the cache for a new entry. Which line should be 'evicted'?
- Ideal (but unknowable in advance): the line with the longest time until its next access
- Least-recently used (LRU): complicated to implement
- Random selection: simple, and its effect on the hit rate is relatively small

A sketch of the last two appears below.
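A C sketch of the two practical policies, choosing among line indices; last_used[] is a hypothetical per-line timestamp that would be updated on every access:

    #include <stdlib.h>

    #define NUM_LINES 4

    static unsigned long last_used[NUM_LINES];  /* hypothetical access timestamps */

    /* Least-recently used: evict the line with the oldest access time.
       Tracking these timestamps is what makes LRU hardware complicated. */
    int evict_lru(void) {
        int victim = 0;
        for (int i = 1; i < NUM_LINES; i++)
            if (last_used[i] < last_used[victim])
                victim = i;
        return victim;
    }

    /* Random: pick any line; far simpler, and typically only slightly
       worse than LRU in hit rate. */
    int evict_random(void) {
        return rand() % NUM_LINES;
    }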

Processor-DRAM Gap (latency)

[Figure: performance vs. time, starting in 1982. CPU performance ("Moore's Law") improves about 60%/yr while DRAM improves about 7%/yr, so the processor-memory performance gap grows about 50% per year. (Patterson, 1998)]

Will Do Almost Anything to Improve Hit Rate

Lots of techniques; the most important ones:
- Make the cache big (an improvement of even 1% in hit rate is very worthwhile)
- Avoid the worst case whenever possible
- Multilevel caching