Cache Issues Computer Organization II 1 Main Memory Supporting Caches Use DRAMs for main memory – Fixed width (e.g., 1 word) – Connected by fixed-width.

Presentation transcript:

Cache Issues Computer Organization II

Slide 1: Main Memory Supporting Caches

Use DRAMs for main memory
  – Fixed width (e.g., 1 word)
  – Connected by a fixed-width clocked bus
      • Bus clock is typically slower than the CPU clock
Example cache block read
  – 1 bus cycle for address transfer
  – 15 bus cycles per DRAM access
  – 1 bus cycle per data transfer
For a 4-word block and 1-word-wide DRAM
  – Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
  – Bandwidth = 16 bytes / 65 cycles ≈ 0.25 B/cycle
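The miss-penalty arithmetic for the 1-word-wide DRAM can be checked with a few lines of Python. This is a minimal sketch; the function name `narrow_miss_penalty` and its parameter names are illustrative and not part of the slides.

```python
def narrow_miss_penalty(words_per_block, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Bus cycles to fetch one block from a 1-word-wide DRAM.

    Each word needs its own DRAM access and its own bus transfer;
    the address is sent once.
    """
    return addr_cycles + words_per_block * dram_cycles + words_per_block * xfer_cycles


WORD_BYTES = 4  # assumption: 4-byte words, so a 4-word block is 16 bytes

penalty = narrow_miss_penalty(words_per_block=4)
bandwidth = (4 * WORD_BYTES) / penalty

print(penalty)              # 65
print(f"{bandwidth:.2f}")   # 0.25
```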

Slide 2: Increasing Memory Bandwidth

4-word-wide memory
  – Miss penalty = 1 + 15 + 1 = 17 bus cycles
  – Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
4-bank interleaved memory
  – Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  – Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
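The two wider organizations differ only in how many DRAM accesses and bus transfers happen one after another. The sketch below assumes the same timing as the earlier example (1 cycle for the address, 15 per DRAM access, 1 per transfer); the helper `miss_penalty` is a hypothetical name introduced for illustration.

```python
def miss_penalty(serial_dram_accesses, transfers,
                 addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Bus cycles per block fill, given how many DRAM accesses and
    bus transfers must be performed sequentially."""
    return addr_cycles + serial_dram_accesses * dram_cycles + transfers * xfer_cycles


BLOCK_BYTES = 16  # 4-word block, assuming 4-byte words

# 4-word-wide memory: one access and one transfer deliver the whole block.
wide = miss_penalty(serial_dram_accesses=1, transfers=1)
# 4-bank interleaved: the four accesses overlap, but the 1-word bus
# still needs four transfers.
interleaved = miss_penalty(serial_dram_accesses=1, transfers=4)

print(wide, f"{BLOCK_BYTES / wide:.2f}")                # 17 0.94
print(interleaved, f"{BLOCK_BYTES / interleaved:.2f}")  # 20 0.80
```

Interleaving recovers most of the benefit of the wide memory while keeping the bus one word wide.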

Slide 3: Advanced DRAM Organization

Bits in a DRAM are organized as a rectangular array
  – DRAM accesses an entire row
  – Burst mode: supply successive words from a row with reduced latency
Double data rate (DDR) DRAM
  – Transfer on both rising and falling clock edges
Quad data rate (QDR) DRAM
  – Separate DDR inputs and outputs

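Slide 4: Measuring Cache Performance

Components of CPU time
  – Program execution cycles
      • Includes cache hit time
  – Memory stall cycles
      • Mainly from cache misses
With simplifying assumptions:
  Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                      = (Instructions / Program) × (Misses / Instruction) × Miss penalty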

Slide 5: Cache Performance Example

Given
  – I-cache miss rate = 2%
  – D-cache miss rate = 4%
  – Miss penalty = 100 cycles
  – Base CPI (ideal cache) = 2
  – Loads and stores are 36% of instructions
Miss cycles per instruction
  – I-cache: 0.02 × 100 = 2
  – D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44
  – A CPU with an ideal cache is 5.44/2 = 2.72 times faster
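The CPI calculation can be reproduced as follows. This is a sketch under the slide's assumptions (every instruction is fetched through the I-cache, and 36% of instructions also access the D-cache); the function and variable names are illustrative.

```python
def miss_cycles_per_instruction(accesses_per_instr, miss_rate, miss_penalty):
    """Average stall cycles per instruction contributed by one cache."""
    return accesses_per_instr * miss_rate * miss_penalty


base_cpi = 2.0
miss_penalty = 100

# Every instruction is fetched, so the I-cache sees 1 access per instruction.
icache_stalls = miss_cycles_per_instruction(1.0, 0.02, miss_penalty)
# Only loads and stores (36% of instructions) access the D-cache.
dcache_stalls = miss_cycles_per_instruction(0.36, 0.04, miss_penalty)

actual_cpi = base_cpi + icache_stalls + dcache_stalls
slowdown = actual_cpi / base_cpi

print(round(actual_cpi, 2))  # 5.44
print(round(slowdown, 2))    # 2.72
```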

Slide 6: Average Access Time

Hit time is also important for performance
Average memory access time (AMAT)
  – AMAT = Hit time + Miss rate × Miss penalty
Example
  – CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  – AMAT = 1 + 0.05 × 20 = 2ns
      • 2 cycles per instruction
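The AMAT formula translates directly into code. The sketch below works in cycles and converts to nanoseconds using the 1 ns clock from the example; the name `amat` is chosen here for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


CLOCK_NS = 1.0  # 1 ns clock period, as in the example

cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
print(round(cycles, 2), "cycles")        # 2.0 cycles
print(round(cycles * CLOCK_NS, 2), "ns") # 2.0 ns
```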

Slide 7: Performance Summary

As CPU performance increases
  – Miss penalty becomes more significant
Decreasing base CPI
  – Greater proportion of time spent on memory stalls
Increasing clock rate
  – Memory stalls account for more CPU cycles
Can't neglect cache behavior when evaluating system performance

Slide 8: Associative Caches

Fully associative
  – Allow a given block to go in any cache entry
  – Requires all entries to be searched at once
  – Comparator per entry (expensive)
n-way set associative
  – Each set contains n entries
  – Block number determines which set
      • (Block number) modulo (#Sets in cache)
  – Search all entries in a given set at once
  – n comparators (less expensive)
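The set-selection rule, (block number) modulo (number of sets), can be sketched as below. The 8-entry cache size and block number 12 are arbitrary values picked for illustration, and `set_index` is a hypothetical helper name.

```python
def set_index(block_number, num_entries, ways):
    """Return the set a block maps to in an n-way set-associative cache.

    ways == 1 is direct mapped; ways == num_entries is fully associative
    (a single set, so every block maps to set 0).
    """
    num_sets = num_entries // ways
    return block_number % num_sets


for ways in (1, 2, 4, 8):
    print(ways, set_index(block_number=12, num_entries=8, ways=ways))
# 1 4
# 2 0
# 4 0
# 8 0
```

Within the selected set the block may occupy any of the n entries, which is why n comparators are needed to search the set in one step.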

Slide 9: Associative Cache Example

[Figure not included in the transcript]

Slide 10: Spectrum of Associativity

For a cache with 8 entries

[Figure not included in the transcript]

Slide 11: Associativity Example

Compare 4-block caches
  – Direct mapped, 2-way set associative, fully associative
  – Block access sequence: 0, 8, 0, 6, 8

Direct mapped

  Block    Cache   Hit/miss   Cache content after access
  address  index              Index 0   Index 1   Index 2   Index 3
  0        0       miss       Mem[0]    -         -         -
  8        0       miss       Mem[8]    -         -         -
  0        0       miss       Mem[0]    -         -         -
  6        2       miss       Mem[0]    -         Mem[6]    -
  8        0       miss       Mem[8]    -         Mem[6]    -

Slide 12: Associativity Example

2-way set associative

  Block    Cache   Hit/miss   Cache content after access
  address  index              Set 0               Set 1
  0        0       miss       Mem[0]              -
  8        0       miss       Mem[0]  Mem[8]      -
  0        0       hit        Mem[0]  Mem[8]      -
  6        0       miss       Mem[0]  Mem[6]      -
  8        0       miss       Mem[8]  Mem[6]      -

Fully associative

  Block    Hit/miss   Cache content after access
  address
  0        miss       Mem[0]
  8        miss       Mem[0]  Mem[8]
  0        hit        Mem[0]  Mem[8]
  6        miss       Mem[0]  Mem[8]  Mem[6]
  8        hit        Mem[0]  Mem[8]  Mem[6]
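The hit/miss columns for all three organizations can be reproduced with a small simulator. This is a sketch that assumes LRU replacement within each set, which matches the evictions in the 2-way example; the function `simulate` and its parameters are illustrative names.

```python
from collections import OrderedDict


def simulate(num_blocks, ways, accesses):
    """Return a list of 'hit'/'miss' strings for a block-address sequence.

    Each set is an OrderedDict kept in least-recently-used-first order.
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    outcome = []
    for block in accesses:
        entries = sets[block % num_sets]
        if block in entries:
            entries.move_to_end(block)       # mark as most recently used
            outcome.append("hit")
        else:
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used block
            entries[block] = True
            outcome.append("miss")
    return outcome


sequence = [0, 8, 0, 6, 8]
print(simulate(4, 1, sequence))  # ['miss', 'miss', 'miss', 'miss', 'miss']
print(simulate(4, 2, sequence))  # ['miss', 'miss', 'hit', 'miss', 'miss']
print(simulate(4, 4, sequence))  # ['miss', 'miss', 'hit', 'miss', 'hit']
```

The miss count falls from 5 (direct mapped) to 4 (2-way) to 3 (fully associative) for this sequence.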

Slide 13: How Much Associativity

Increased associativity decreases miss rate
  – But with diminishing returns
Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
  – 1-way: 10.3%
  – 2-way: 8.6%
  – 4-way: 8.3%
  – 8-way: 8.1%

Slide 14: Set Associative Cache Organization

[Figure not included in the transcript]

Slide 15: Replacement Policy

Direct mapped: no choice
Set associative
  – Prefer non-valid entry, if there is one
  – Otherwise, choose among entries in the set
Least-recently used (LRU)
  – Choose the one unused for the longest time
      • Simple for 2-way, manageable for 4-way, too hard beyond that
Random
  – Gives approximately the same performance as LRU for high associativity
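The victim-selection rules can be sketched as a single function operating on one set. This is a software illustration only (real caches implement the policy in hardware); the names `choose_victim`, `valid`, and `last_used` are assumptions introduced here.

```python
import random


def choose_victim(valid, last_used, policy="lru", rng=random):
    """Pick the entry index to replace within one set.

    valid:     list of bools, one per entry in the set
    last_used: list of timestamps (larger means more recent), one per entry
    """
    # Prefer a non-valid entry if there is one.
    for index, is_valid in enumerate(valid):
        if not is_valid:
            return index
    if policy == "lru":
        # Entry that has gone unused for the longest time.
        return min(range(len(valid)), key=lambda i: last_used[i])
    if policy == "random":
        return rng.randrange(len(valid))
    raise ValueError(f"unknown policy: {policy}")


print(choose_victim([True, False], [5, 1]))                    # 1 (invalid entry)
print(choose_victim([True, True, True, True], [7, 3, 9, 4]))   # 1 (least recently used)
```

Tracking `last_used` exactly requires ordering information per set that grows quickly with associativity, which is one reason random replacement becomes attractive for highly associative caches.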

Slide 16: Multilevel Caches

Primary cache attached to CPU
  – Small, but fast
Level-2 cache services misses from primary cache
  – Larger, slower, but still faster than main memory
Main memory services L-2 cache misses
Some high-end systems include L-3 cache

Slide 17: Multilevel Cache Example

Given
  – CPU base CPI = 1, clock rate = 4GHz
  – Miss rate/instruction = 2%
  – Main memory access time = 100ns
With just primary cache
  – Miss penalty = 100ns/0.25ns = 400 cycles
  – Effective CPI = 1 + 0.02 × 400 = 9

Slide 18: Example (cont.)

Now add L-2 cache
  – Access time = 5ns
  – Global miss rate to main memory = 0.5%
Primary miss with L-2 hit
  – Penalty = 5ns/0.25ns = 20 cycles
Primary miss with L-2 miss
  – Extra penalty = 400 cycles (the main memory access, 100ns/0.25ns)
CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
Performance ratio = 9/3.4 = 2.6
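Both CPI figures in the multilevel example can be recomputed as follows. The sketch takes the main-memory penalty as 100 ns / 0.25 ns = 400 cycles, the value used in the CPI formula; variable names are illustrative.

```python
CLOCK_GHZ = 4           # 4 GHz clock, i.e. 0.25 ns per cycle
BASE_CPI = 1.0
L1_MISS_RATE = 0.02     # misses per instruction in the primary cache
GLOBAL_MISS_RATE = 0.005  # misses per instruction that go to main memory


def ns_to_cycles(ns, clock_ghz=CLOCK_GHZ):
    """Convert a latency in nanoseconds to clock cycles."""
    return ns * clock_ghz


memory_penalty = ns_to_cycles(100)  # 400 cycles
l2_penalty = ns_to_cycles(5)        # 20 cycles

# Primary cache only: every L1 miss pays the full main-memory penalty.
cpi_l1_only = BASE_CPI + L1_MISS_RATE * memory_penalty
# With L-2: every L1 miss pays the L-2 access; only global misses also pay for memory.
cpi_with_l2 = BASE_CPI + L1_MISS_RATE * l2_penalty + GLOBAL_MISS_RATE * memory_penalty

print(f"{cpi_l1_only:.1f}")                # 9.0
print(f"{cpi_with_l2:.1f}")                # 3.4
print(f"{cpi_l1_only / cpi_with_l2:.1f}")  # 2.6
```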

Slide 19: Multilevel Cache Considerations

Primary cache
  – Focus on minimal hit time
L-2 cache
  – Focus on low miss rate to avoid main memory access
  – Hit time has less overall impact
Results
  – L-1 cache is usually smaller than the cache in a single-level design
  – L-1 block size is smaller than L-2 block size