Computer Organization CS224 Fall 2012 Lessons 39 & 40.


Write-Through
- On a data-write hit, we could just update the block in the cache
  - But then cache and memory would be inconsistent
- Write through: also update memory
- But this makes writes take longer
  - e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
    Effective CPI = 1 + 0.1 × 100 = 11
- Solution: write buffer
  - Holds data waiting to be written to memory
  - CPU continues immediately
    - Only stalls on a write if the write buffer is already full
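The slide's effective-CPI arithmetic can be sketched as a small helper. This is illustrative only; the function and parameter names are not from the slides:

```python
# Hypothetical helper: effective CPI for a write-through cache with no
# write buffer, where every store stalls for the full memory write latency.
def effective_cpi(base_cpi, store_fraction, write_cycles):
    # Each store adds the full write latency on top of the base CPI.
    return base_cpi + store_fraction * write_cycles

# The slide's numbers: base CPI 1, 10% stores, 100-cycle memory writes.
print(effective_cpi(1.0, 0.10, 100))  # 11.0
```

A write buffer removes that stall in the common case, which is exactly why the effective CPI drops back toward the base CPI.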

Write-Back
- Alternative: on a data-write hit, just update the block in the cache
  - Keep track of whether each block is dirty
- When a dirty block is replaced
  - Write it back to memory
  - Can use a write buffer to allow the replacing block to be read first
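The dirty-bit mechanism can be shown with a minimal direct-mapped sketch (block size of one word, names illustrative, not from the slides):

```python
class WriteBackCache:
    """Minimal direct-mapped write-back cache sketch (1-word blocks).

    Tracks a dirty bit per block and counts write-backs, which occur
    only when a dirty block is replaced.
    """
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.tags = [None] * num_blocks
        self.dirty = [False] * num_blocks
        self.writebacks = 0

    def access(self, block_addr, is_write):
        i = block_addr % self.num_blocks
        if self.tags[i] != block_addr:            # miss: replace the block
            if self.tags[i] is not None and self.dirty[i]:
                self.writebacks += 1              # dirty victim goes to memory
            self.tags[i] = block_addr
            self.dirty[i] = False
        if is_write:
            self.dirty[i] = True                  # update only the cached copy

cache = WriteBackCache(4)
cache.access(0, is_write=True)    # fill block 0, mark it dirty
cache.access(4, is_write=False)   # same slot: dirty block 0 is written back
print(cache.writebacks)           # 1
```

Note that the write at block 0 never touched memory by itself; memory only saw the data when the block was evicted.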

Write Allocation
- What should happen on a write miss?
- Alternatives for write-through
  - Allocate on miss: fetch the block
  - Write around: don't fetch the block
    - Since programs often write a whole block before reading it (e.g., initialization)
- For write-back
  - Usually fetch the block
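The initialization argument can be made concrete by counting block fetches for a write-only stream under both write-through policies. A hypothetical sketch (addresses are block numbers, direct-mapped cache):

```python
def fetches_on_writes(block_addrs, allocate_on_miss, num_blocks=4):
    """Count block fetches from memory for a stream of writes in a
    write-through cache, under allocate-on-miss vs. write-around."""
    tags = [None] * num_blocks
    fetches = 0
    for b in block_addrs:
        i = b % num_blocks
        if tags[i] != b:                 # write miss
            if allocate_on_miss:
                fetches += 1             # fetch a block we are about to overwrite
                tags[i] = b
            # write around: send the write to memory, don't fill the cache
        # the write itself goes to memory either way (write-through)
    return fetches

inits = [0, 1, 2, 3]                     # initialize four blocks, never read them
print(fetches_on_writes(inits, allocate_on_miss=True))   # 4 wasted fetches
print(fetches_on_writes(inits, allocate_on_miss=False))  # 0
```

For a pure initialization pattern, allocate-on-miss fetches data that is immediately overwritten, which is why write-around exists.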

Example: Intrinsity FastMATH
- Embedded MIPS processor
  - 12-stage pipeline
  - Instruction and data access on each cycle
- Split cache: separate I-cache and D-cache
  - Each 16KB: 256 blocks × 16 words/block
  - D-cache: write-through or write-back
- SPEC2000 miss rates
  - I-cache: 0.4%
  - D-cache: 11.4%
  - Weighted average: 3.2%
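The weighted average comes from weighting each cache's miss rate by its access frequency: every instruction is fetched, plus some data accesses per instruction. The slides don't state the instruction mix, so the frequency used below is an assumed figure for illustration:

```python
def weighted_miss_rate(i_rate, d_rate, data_per_instr):
    """Combine I-cache and D-cache miss rates, weighted by access
    frequency: 1 instruction fetch plus data_per_instr data accesses
    per instruction (data_per_instr is an assumed parameter)."""
    total_accesses = 1.0 + data_per_instr
    return (1.0 * i_rate + data_per_instr * d_rate) / total_accesses

# With roughly 0.36 data accesses per instruction (an assumption, not
# from the slides), the combined rate lands near the quoted 3.2%:
print(round(weighted_miss_rate(0.004, 0.114, 0.36) * 100, 2))  # 3.31
```

The exact 3.2% in the slide depends on the actual SPEC2000 load/store frequencies, which differ slightly from the round number assumed here.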

Example: Intrinsity FastMATH

Main Memory Supporting Caches
- Use DRAMs for main memory
  - Fixed width (e.g., 1 word)
  - Connected by a fixed-width clocked bus
    - Bus clock is typically slower than the CPU clock
- Example cache block read
  - 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access
  - 1 bus cycle per data transfer
- For a 4-word block and 1-word-wide DRAM
  - Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
  - Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle

Increasing Memory Bandwidth
- 4-word-wide memory
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
  - Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
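The three organizations differ only in how many DRAM accesses and data transfers a block read needs. A small sketch of the arithmetic (variable names are illustrative):

```python
ADDR, DRAM, XFER = 1, 15, 1   # bus cycles per step, from the slides
BLOCK_WORDS = 4               # 4-word (16-byte) cache block

# 1-word-wide DRAM and bus: one DRAM access and one transfer per word.
narrow = ADDR + BLOCK_WORDS * DRAM + BLOCK_WORDS * XFER
# 4-word-wide memory and bus: whole block in one access and one transfer.
wide = ADDR + DRAM + XFER
# 4 interleaved banks, 1-word bus: accesses overlap, transfers serialize.
interleaved = ADDR + DRAM + BLOCK_WORDS * XFER

for name, cycles in [("narrow", narrow), ("wide", wide),
                     ("interleaved", interleaved)]:
    print(f"{name}: {cycles} cycles, {16 / cycles:.2f} B/cycle")
```

Interleaving gets most of the wide memory's benefit while keeping the cheap 1-word bus, which is why it was the common compromise.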

Advanced DRAM Organization
- Bits in a DRAM are organized as a rectangular array
  - DRAM accesses an entire row
  - Burst mode: supply successive words from a row with reduced latency
- Double data rate (DDR) DRAM
  - Transfer on rising and falling clock edges
- Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs

DRAM Generations

Year    Capacity    $/GB
1980    64 Kbit     $1,500,000
1983    256 Kbit    $500,000
1985    1 Mbit      $200,000
1989    4 Mbit      $50,000
1992    16 Mbit     $15,000
1996    64 Mbit     $10,000
1998    128 Mbit    $4,000
2000    256 Mbit    $1,000
2004    512 Mbit    $250
2007    1 Gbit      $50

Associative Caches
- Fully associative
  - Allow a given block to go in any cache entry
  - Requires all entries to be searched at once
  - Comparator per entry (expensive)
- n-way set associative
  - Each set contains n entries
  - Block number determines the set: (block number) modulo (#sets in cache)
  - Search all entries in a given set at once
  - n comparators (less expensive)
§5.3 Measuring and Improving Cache Performance
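The set-mapping rule above is one line of arithmetic; direct-mapped and fully associative caches are just the two extremes of it. A hypothetical helper:

```python
def cache_set(block_number, num_sets):
    """Set a block maps to in a set-associative cache:
    (block number) modulo (#sets). Direct mapped is the special case
    num_sets == num_blocks; fully associative is num_sets == 1."""
    return block_number % num_sets

# Block 12 in an 8-block cache at three associativities:
print(cache_set(12, 8))  # direct mapped (8 sets of 1): set 4
print(cache_set(12, 4))  # 2-way (4 sets of 2): set 0
print(cache_set(12, 1))  # fully associative (one set of 8): set 0
```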

Associative Cache Example

Spectrum of Associativity
- For a cache with 8 entries

Associativity Example
- Compare 4-block caches
  - Direct mapped, 2-way set associative, fully associative
  - Block access sequence: 0, 8, 0, 6, 8
- Direct mapped

Block address   Cache index   Hit/miss   Cache content after access
0               0             miss       Mem[0]
8               0             miss       Mem[8]
0               0             miss       Mem[0]
6               2             miss       Mem[0], Mem[6]
8               0             miss       Mem[8], Mem[6]
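The direct-mapped run can be replayed with a few lines of simulation. A sketch (block addresses only, names illustrative):

```python
def direct_mapped_results(seq, num_blocks=4):
    """Replay a block-address sequence through a direct-mapped cache
    and report hit/miss for each access."""
    tags = [None] * num_blocks
    results = []
    for b in seq:
        i = b % num_blocks                  # index = block addr mod #blocks
        results.append("hit" if tags[i] == b else "miss")
        tags[i] = b                         # on a miss, the block is filled
    return results

print(direct_mapped_results([0, 8, 0, 6, 8]))
# ['miss', 'miss', 'miss', 'miss', 'miss']
```

Blocks 0 and 8 both map to index 0, so they keep evicting each other: five accesses, five misses, as in the table above.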

Associativity Example
- 2-way set associative

Block address   Cache index   Hit/miss   Cache content after access
                                         Set 0               Set 1
0               0             miss       Mem[0]
8               0             miss       Mem[0], Mem[8]
0               0             hit        Mem[0], Mem[8]
6               0             miss       Mem[0], Mem[6]
8               0             miss       Mem[8], Mem[6]

- Fully associative

Block address   Hit/miss   Cache content after access
0               miss       Mem[0]
8               miss       Mem[0], Mem[8]
0               hit        Mem[0], Mem[8]
6               miss       Mem[0], Mem[8], Mem[6]
8               hit        Mem[0], Mem[8], Mem[6]
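Both tables fall out of the same LRU replay once the number of sets is a parameter. A sketch (names illustrative; num_sets=1 gives the fully associative case):

```python
def set_assoc_results(seq, num_sets, ways):
    """Replay a block-address sequence through an LRU set-associative
    cache; each set is a list ordered least- to most-recently used."""
    sets = [[] for _ in range(num_sets)]
    results = []
    for b in seq:
        s = sets[b % num_sets]
        if b in s:
            results.append("hit")
            s.remove(b)                 # re-insert below as most recent
        else:
            results.append("miss")
            if len(s) == ways:
                s.pop(0)                # evict the least recently used
        s.append(b)                     # most recently used at the end
    return results

print(set_assoc_results([0, 8, 0, 6, 8], num_sets=2, ways=2))
# ['miss', 'miss', 'hit', 'miss', 'miss']
print(set_assoc_results([0, 8, 0, 6, 8], num_sets=1, ways=4))
# ['miss', 'miss', 'hit', 'miss', 'hit']
```

The 2-way cache still thrashes in set 0 (blocks 0, 6, 8 all map there), while the fully associative cache holds all three blocks and hits twice.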

How Much Associativity?
- Increased associativity decreases miss rate
  - But with diminishing returns
- Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
  - 1-way: 10.3%
  - 2-way: 8.6%
  - 4-way: 8.3%
  - 8-way: 8.1%

Set Associative Cache Organization