Computer Organization CS224 Fall 2012 Lessons 39 & 40
Write-Through
- On a data-write hit, we could just update the block in cache
  - But then cache and memory would be inconsistent
- Write through: also update memory
- But this makes writes take longer
  - e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
    Effective CPI = 1 + 0.1 × 100 = 11
- Solution: write buffer
  - Holds data waiting to be written to memory
  - CPU continues immediately
  - Only stalls on a write if the write buffer is already full
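The effective-CPI arithmetic above can be checked with a short sketch; the function name is ours, but the numbers come straight from the slide's example:

```python
# Sketch: effective CPI when every store stalls for a full memory write
# (write-through with no write buffer). Numbers match the slide's example.
def effective_cpi(base_cpi, store_fraction, write_cycles):
    return base_cpi + store_fraction * write_cycles

print(effective_cpi(1.0, 0.10, 100))  # 11.0
```

A write buffer hides this stall in the common case: the CPU pays the memory latency only when the buffer is already full.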
Write-Back
- Alternative: on a data-write hit, just update the block in cache
  - Keep track of whether each block is dirty
- When a dirty block is replaced
  - Write it back to memory
  - Can use a write buffer to allow the replacing block to be read first
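The dirty-bit bookkeeping can be sketched in a few lines (a hypothetical structure for illustration, not any particular processor's design):

```python
# Minimal write-back cache-line sketch (hypothetical, illustrative only).
class CacheLine:
    def __init__(self):
        self.tag = None
        self.data = None
        self.dirty = False

    def write(self, tag, data):
        # Write hit: update only the cache, mark the line dirty.
        self.tag, self.data, self.dirty = tag, data, True

    def evict(self, memory):
        # On replacement, memory is updated only if the line was modified.
        if self.dirty and self.tag is not None:
            memory[self.tag] = self.data
        self.tag, self.data, self.dirty = None, None, False
```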
Write Allocation
- What should happen on a write miss?
- Alternatives for write-through
  - Allocate on miss: fetch the block
  - Write around: don't fetch the block
    - Since programs often write a whole block before reading it (e.g., initialization)
- For write-back
  - Usually fetch the block
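The two write-through miss policies can be contrasted in a hypothetical handler (cache and memory modeled as plain dictionaries; the function is ours, not from the slides):

```python
# Hypothetical write-miss handler for a write-through cache.
def handle_write_miss(cache, memory, addr, value, allocate):
    memory[addr] = value       # write-through always updates memory
    if allocate:               # allocate on miss: bring the block into the cache
        cache[addr] = value
    # write around: leave the cache untouched
```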
Example: Intrinsity FastMATH
- Embedded MIPS processor
  - 12-stage pipeline
  - Instruction and data access on each cycle
- Split cache: separate I-cache and D-cache
  - Each 16KB: 256 blocks × 16 words/block
  - D-cache: write-through or write-back
- SPEC2000 miss rates
  - I-cache: 0.4%
  - D-cache: 11.4%
  - Weighted average: 3.2%
Example: Intrinsity FastMATH
Main Memory Supporting Caches
- Use DRAMs for main memory
  - Fixed width (e.g., 1 word)
  - Connected by a fixed-width clocked bus
    - Bus clock is typically slower than the CPU clock
- Example cache block read
  - 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access
  - 1 bus cycle per data transfer
- For a 4-word block and 1-word-wide DRAM
  - Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
  - Bandwidth = 16 bytes / 65 cycles ≈ 0.25 bytes/cycle
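The miss-penalty arithmetic for the 1-word-wide case can be written out as a small sketch (the cycle counts are the slide's assumptions):

```python
# Bus cycles to read an N-word block from 1-word-wide DRAM:
# 1 address cycle, then one DRAM access and one transfer per word.
def miss_penalty(words, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    return addr_cycles + words * (dram_cycles + xfer_cycles)

p = miss_penalty(4)
print(p, 16 / p)  # 65 cycles, ~0.25 bytes/cycle for a 16-byte block
```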
Increasing Memory Bandwidth
- 4-word-wide memory
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
  - Bandwidth = 16 bytes / 17 cycles ≈ 0.94 bytes/cycle
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8 bytes/cycle
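The wider and interleaved organizations change which costs are paid per word; a sketch under the same cycle assumptions (1 address cycle, 15 DRAM cycles, 1 bus cycle per transferred word; function names are ours):

```python
def penalty_wide(words):
    # Block-wide memory and bus: one DRAM access, one transfer for everything.
    return 1 + 15 + 1

def penalty_interleaved(words):
    # Banks overlap their DRAM accesses, but words still cross
    # the 1-word-wide bus one at a time.
    return 1 + 15 + words * 1

for f in (penalty_wide, penalty_interleaved):
    p = f(4)
    print(f.__name__, p, round(16 / p, 2))  # 17 cycles / 0.94, 20 cycles / 0.8
```

Interleaving gets most of the wide organization's benefit without widening the bus, which is why it was the common compromise.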
Advanced DRAM Organization
- Bits in a DRAM are organized as a rectangular array
  - DRAM accesses an entire row
  - Burst mode: supply successive words from a row with reduced latency
- Double data rate (DDR) DRAM
  - Transfer on rising and falling clock edges
- Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs
DRAM Generations

Year | Capacity | $/GB
1980 | 64 Kbit  | $1,500,000
1983 | 256 Kbit | $500,000
1985 | 1 Mbit   | $200,000
1989 | 4 Mbit   | $50,000
1992 | 16 Mbit  | $15,000
1996 | 64 Mbit  | $10,000
1998 | 128 Mbit | $4,000
2000 | 256 Mbit | $1,000
2004 | 512 Mbit | $250
2007 | 1 Gbit   | $50
Associative Caches (§5.3 Measuring and Improving Cache Performance)
- Fully associative
  - Allow a given block to go in any cache entry
  - Requires all entries to be searched at once
  - Comparator per entry (expensive)
- N-way set associative
  - Each set contains n entries
  - Block number determines the set: (block number) modulo (#sets in cache)
  - Search all entries in a given set at once
  - n comparators (less expensive)
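The set-mapping rule is just a modulo. A one-line sketch, applied to the 4-block, 2-way configuration (2 sets) used in the example that follows, shows why blocks 0, 8, and 6 all contend for the same set:

```python
# Which set does a block map to? (block number) modulo (#sets), per the slide.
def set_index(block_number, num_sets):
    return block_number % num_sets

print([set_index(b, 2) for b in (0, 8, 6)])  # [0, 0, 0]
```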
Associative Cache Example
Spectrum of Associativity (for a cache with 8 entries)
Associativity Example
- Compare 4-block caches
  - Direct mapped, 2-way set associative, fully associative
  - Block access sequence: 0, 8, 0, 6, 8

Direct mapped:

Block addr | Cache index | Hit/miss | Content[0] | Content[1] | Content[2] | Content[3]
0          | 0           | miss     | Mem[0]     |            |            |
8          | 0           | miss     | Mem[8]     |            |            |
0          | 0           | miss     | Mem[0]     |            |            |
6          | 2           | miss     | Mem[0]     |            | Mem[6]     |
8          | 0           | miss     | Mem[8]     |            | Mem[6]     |
Associativity Example (continued)

2-way set associative:

Block addr | Cache index | Hit/miss | Set 0          | Set 1
0          | 0           | miss     | Mem[0]         |
8          | 0           | miss     | Mem[0], Mem[8] |
0          | 0           | hit      | Mem[0], Mem[8] |
6          | 0           | miss     | Mem[0], Mem[6] |
8          | 0           | miss     | Mem[8], Mem[6] |

Fully associative:

Block addr | Hit/miss | Cache content after access
0          | miss     | Mem[0]
8          | miss     | Mem[0], Mem[8]
0          | hit      | Mem[0], Mem[8]
6          | miss     | Mem[0], Mem[8], Mem[6]
8          | hit      | Mem[0], Mem[8], Mem[6]
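All three tables can be reproduced by a tiny LRU set-associative simulator (a sketch of the standard technique, not the textbook's code): with 4 blocks and the sequence 0, 8, 0, 6, 8, direct-mapped gets 0 hits, 2-way gets 1, and fully associative gets 2.

```python
from collections import OrderedDict

# Minimal LRU set-associative cache simulator: returns the hit count.
def simulate(num_blocks, ways, accesses):
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in accesses:
        s = sets[block % num_sets]          # set index = block mod #sets
        if block in s:
            hits += 1
            s.move_to_end(block)            # refresh LRU order on a hit
        else:
            if len(s) == ways:
                s.popitem(last=False)       # evict the least recently used
            s[block] = True
    return hits

seq = [0, 8, 0, 6, 8]
print([simulate(4, w, seq) for w in (1, 2, 4)])  # [0, 1, 2]
```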
How Much Associativity?
- Increased associativity decreases miss rate
  - But with diminishing returns
- Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
  - 1-way: 10.3%
  - 2-way: 8.6%
  - 4-way: 8.3%
  - 8-way: 8.1%
Set Associative Cache Organization