1
Advanced Computer Architecture Lecture 19
Write back cache design
Set associative cache
Pentium 4 caching
Lillevik s06-l19 University of Portland School of Engineering
2
Project 4 team review
Team Dog
3
Project 5 team review
Team Cat
Team Dog
4
Cache design example
CPU: B2Logic model
Memory: 256 x 8 RAM (no ROM), 4X slower than cache, Rdy signal
Cache: direct mapped, write-back
Data: 16 x 8 RAM (no delay)
Tag: 16 x 4 RAM (no delay)
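The sizes on this slide imply an 8-bit memory address, a 4-bit cache index and a 4-bit tag. The sketch below shows one way that split could look; the choice of the low four bits as the index and the names `split_address`, `INDEX_BITS` and `TAG_BITS` are assumptions for illustration, not taken from the slides.

```python
# Sketch of the address split implied by the slide's sizes.
# Assumption: the low 4 bits index the 16-entry cache, the high 4 bits form the tag.
MEM_WORDS = 256      # 256 x 8 main memory -> 8-bit address
CACHE_LINES = 16     # 16 x 8 data RAM and 16 x 4 tag RAM

ADDR_BITS = MEM_WORDS.bit_length() - 1      # 8
INDEX_BITS = CACHE_LINES.bit_length() - 1   # 4
TAG_BITS = ADDR_BITS - INDEX_BITS           # 4, matches the 4-bit-wide tag RAM


def split_address(addr):
    """Return (tag, index) for an 8-bit memory address."""
    if not 0 <= addr < MEM_WORDS:
        raise ValueError("address out of range")
    index = addr & (CACHE_LINES - 1)
    tag = addr >> INDEX_BITS
    return tag, index


if __name__ == "__main__":
    print(split_address(0x7A))   # (7, 10)
```

With this split, addresses 0x0A, 0x1A, ... 0xFA all compete for cache line 0xA, which is why the tag RAM is needed to tell them apart.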
5
Memory schematic
6
Controller description
Read hit: read cache data and drive it onto bus
Write hit: write data/tag into cache, set Mod
Read miss clean: read data from memory, drive it onto bus, write data/tag into cache, clear Mod
Write miss clean: write data/tag into cache, set Mod
Write back: write cache data into memory, clear Mod; upper address bits come from tag memory
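The five controller actions can be tried out in a small behavioral model. This is a sketch under assumptions, not the B2Logic design: the class name `WriteBackCache`, the `access` method and the per-line valid bit are hypothetical additions (the slides only mention Mod), and timing signals such as Rdy are ignored.

```python
class WriteBackCache:
    """Behavioral sketch: direct-mapped, write-back, 16 lines over a 256-byte memory."""

    def __init__(self):
        self.memory = [0] * 256
        self.data = [0] * 16
        self.tag = [0] * 16
        self.mod = [False] * 16      # Mod (dirty) bit per line
        # Assumption: a valid bit per line so a cold cache cannot hit by accident.
        self.valid = [False] * 16

    def access(self, addr, write=False, value=0):
        """Perform one CPU access; return (list of action names, data byte)."""
        index, tag = addr & 0xF, addr >> 4
        hit = self.valid[index] and self.tag[index] == tag
        actions = []
        if not hit and self.valid[index] and self.mod[index]:
            # Write back: upper address bits come from the tag memory.
            old_addr = (self.tag[index] << 4) | index
            self.memory[old_addr] = self.data[index]
            self.mod[index] = False
            actions.append("WB")
        if write:
            # Write hit and write miss clean both write data/tag and set Mod.
            self.data[index] = value
            self.tag[index] = tag
            self.valid[index] = True
            self.mod[index] = True
            actions.append("WH" if hit else "WMC")
        else:
            if not hit:
                # Read miss clean: fetch from memory, fill the line, clear Mod.
                self.data[index] = self.memory[addr]
                self.tag[index] = tag
                self.valid[index] = True
                self.mod[index] = False
            actions.append("RH" if hit else "RMC")
        return actions, self.data[index]


if __name__ == "__main__":
    cache = WriteBackCache()
    print(cache.access(0x55, write=True, value=0x07))  # (['WMC'], 7)
    print(cache.access(0x55))                          # (['RH'], 7)
    print(cache.access(0x75, write=True, value=0x09))  # (['WB', 'WMC'], 9)
    print(cache.access(0x55))                          # (['WB', 'RMC'], 7)
```

A miss on a modified line shows up as two actions in a row (WB followed by the clean miss), which mirrors how the action names are paired in the exercise slides.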
7
Controller block diagram
Cache controller
Inputs: Read, Write, Hit, Mod, Rdy
Outputs: Mg1, Mrw#, Mben, Crw#, Cben, WB, ModDat, Modrw#, Ack
8
State transitions
States: S0 idle (entered on reset), S1 rd hit, S2 wr hit, S4 WB, S8 wr miss cl, S10 rd miss cl
Transition labels: (rd+wr)·mod·hit, wr·mod·hit, rd·mod·hit, wr·hit, rd·hit, rdy
9
Output table
Columns: State, Ack, WB, Cben, Crw#, Mben, Mrw#, Mg1, ModDat, Modrw#
Rows: 0 Idle, 1 Rd Hit, 2 Wr Hit, 4 WB, 8 Wr MCln, 10 Rd MCln
Table entries: Rdy
10
Find actions and states?
Adr Inst Action State 60755 WH 2 1 50700 RH 617aa WB, WMC 4, 8 3 51700 4 RMC 10 5 6 60777 WMC 8 7 617cc WB, RMC 4, 10 9 A f0000 idle
11
Example trace
WH RH WB WMC RH WB
12
Example trace, continued.
RMC RMC WMC WB WMC
13
Example trace, continued.
WB RMC RMC
14
Three C’s: reasons for misses
Compulsory: first cache accesses will always miss; often called “cold-start”
Capacity: cache size too small
Conflict: same cache location used by several addresses; often called “interference”
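One common way to tell the three kinds of miss apart is to run the same address stream through a fully associative LRU cache of equal size: a first-ever reference is compulsory, a miss that the fully associative cache would also take is capacity, and any other miss is conflict. The sketch below applies that idea to a direct-mapped cache with one-word lines; the function name `classify_misses` and the one-word line size are assumptions for illustration.

```python
from collections import OrderedDict


def classify_misses(addresses, num_lines):
    """Label each access as 'hit', 'compulsory', 'capacity' or 'conflict'."""
    direct = {}               # index -> address currently held (direct mapped)
    fully = OrderedDict()     # fully associative LRU cache of the same size
    seen = set()              # every address referenced so far
    labels = []
    for addr in addresses:
        index = addr % num_lines
        dm_hit = direct.get(index) == addr
        fa_hit = addr in fully
        if dm_hit:
            labels.append("hit")
        elif addr not in seen:
            labels.append("compulsory")
        elif not fa_hit:
            labels.append("capacity")
        else:
            labels.append("conflict")
        # Update both cache models and the reference history.
        direct[index] = addr
        seen.add(addr)
        if fa_hit:
            fully.move_to_end(addr)
        else:
            fully[addr] = True
            if len(fully) > num_lines:
                fully.popitem(last=False)   # evict least recently used
    return labels


if __name__ == "__main__":
    # Addresses 0 and 4 share index 0 in a 4-line cache.
    print(classify_misses([0, 4, 0, 4], 4))
    # ['compulsory', 'compulsory', 'conflict', 'conflict']
```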
15
Improving cache performance
Larger line (block) size
Write buffer
Larger cache
More caches
Compiler optimizations
16
Hit rate increases, larger cache & block
Figure axis: cache block/line size
17
Sets of caches
Objective: increase hit rate
Method:
Provide multiple, direct mapped caches
One index may refer to 2, 4, 8, etc. cache data values (set associative)
Requires a replacement algorithm upon misses
All caches checked for hit
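The "all caches checked for hit" step can be sketched as a loop over several direct-mapped tag arrays that share one index. The class name `SetAssociativeCache` and its methods are hypothetical; the way numbers correspond to what the slides call Set 0, Set 1, and so on, and the replacement choice is left to the caller.

```python
class SetAssociativeCache:
    """Sketch: several direct-mapped tag arrays sharing one index."""

    def __init__(self, num_indices, ways):
        self.num_indices = num_indices
        # tags[way][index] holds the stored tag, or None when the entry is empty.
        self.tags = [[None] * num_indices for _ in range(ways)]

    def lookup(self, addr):
        """Return the way that hits, or None on a miss. Every way is checked."""
        index = addr % self.num_indices
        tag = addr // self.num_indices
        for way, way_tags in enumerate(self.tags):
            if way_tags[index] == tag:
                return way
        return None

    def fill(self, addr, way):
        """Store the tag for addr in the chosen way (replacement policy is external)."""
        index = addr % self.num_indices
        self.tags[way][index] = addr // self.num_indices


if __name__ == "__main__":
    cache = SetAssociativeCache(num_indices=8, ways=2)
    cache.fill(0b10000, way=1)      # index 000, tag 10
    cache.fill(0b00000, way=0)      # index 000, tag 00
    print(cache.lookup(0b10000))    # 1
    print(cache.lookup(0b00011))    # None
```

In hardware the ways are compared in parallel; the loop here only models the result of that comparison.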
18
Set associative cache
Figure: memory addresses 0 to 1F; cache Set 0 and Set 1, each with addresses 0 to 7
One memory location maps to two possible caches
19
Map memory and cache?
Cache Address (Set 1 and Set 0)   Memory Addresses
000
001   01, 09, 11, 19
010
011
100
101
110
111
20
Read hit or miss?
Cache memory, columns index tag1 tag0: 000 10 00, 001 11, 010, 011 01, 100, 101, 110, 111
CPU read adr (find hit/miss and set):
1 0000
0 0011
1 0001
0 1111
1 1101
0 1010
0 0000
1 0011
21
Replacement algorithms
Least recently used
First-in, first-out
Round robin
Last-in, first-out
Selection based on performance and implementation cost
22
Simple LRU for 2-way
Add an access bit, a
a = 0 if hit to Set 0, a = 1 if hit to Set 1
Miss results in replacement to the set not given by a (the least recently used one)
Use 1-bit memory to hold access bit
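A minimal sketch of the access-bit scheme follows. For the policy to be least recently used, the victim on a miss has to be the set opposite to the one recorded in a, and that is what the code does. The class name `TwoWayLRU`, keeping one access bit per cache index, and updating a after a fill are assumptions for illustration.

```python
class TwoWayLRU:
    """Sketch: 2-way set associative tags with a 1-bit LRU indicator per index."""

    def __init__(self, num_indices):
        self.num_indices = num_indices
        self.tags = [[None] * num_indices, [None] * num_indices]  # Set 0, Set 1
        self.a = [0] * num_indices   # access bit: set touched most recently

    def access(self, addr):
        """Return ('hit' or 'miss', set number used)."""
        index = addr % self.num_indices
        tag = addr // self.num_indices
        for s in (0, 1):
            if self.tags[s][index] == tag:
                self.a[index] = s            # a = 0 on hit to Set 0, 1 on hit to Set 1
                return "hit", s
        victim = 1 - self.a[index]           # replace the set NOT accessed last
        self.tags[victim][index] = tag
        self.a[index] = victim               # assumption: a fill counts as an access
        return "miss", victim


if __name__ == "__main__":
    cache = TwoWayLRU(8)
    print(cache.access(0b00000))   # ('miss', 1)
    print(cache.access(0b01000))   # ('miss', 0)
    print(cache.access(0b00000))   # ('hit', 1)
    print(cache.access(0b10000))   # ('miss', 0)
```

The last access evicts 01000 from Set 0 because 00000 in Set 1 was touched more recently.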
23
Pentium 4 architecture
24
L2 advanced transfer cache
25
Full-speed L2 cache
Depth of 256 KB
Eight-way set associative, 128 B line
Wide instruction & data interface of 256 bits (32 B)
Read latency of 7 clocks, but …
Clocked at core frequency (2.0 GHz)
Internal bandwidth, 32 B x 2.0 GHz = 64 GB/s
Optimizes data transfers to/from memory
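The slide's numbers can be checked with a little arithmetic: the bandwidth is bytes per clock times clock rate, and the number of sets is capacity divided by (ways times line size). The helper names `cache_sets` and `bandwidth_gb_per_s` are made up for this sketch.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets (distinct index values) in a set associative cache."""
    return size_bytes // (ways * line_bytes)


def bandwidth_gb_per_s(bytes_per_clock, clock_ghz):
    """Peak transfer rate: bytes moved per clock times clocks per nanosecond."""
    return bytes_per_clock * clock_ghz


l2_sets = cache_sets(256 * 1024, ways=8, line_bytes=128)
l2_bandwidth = bandwidth_gb_per_s(bytes_per_clock=32, clock_ghz=2.0)

if __name__ == "__main__":
    print(l2_sets)        # 256
    print(l2_bandwidth)   # 64.0
```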
26
L1 data cache
27
L1 data cache
Depth of 8 KB
Four-way, set associative, 64 B line
Read latency of 2 clocks, but ….
Dual port for one load & one store-per-clock
Supports advanced pre-fetch algorithm
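From the stated size, associativity and line length one can work out how many address bits select a byte within a line and how many select the set. The sketch below does that for the L1 figures on this slide; the function name `field_widths` is hypothetical, and it assumes all three parameters are powers of two.

```python
def field_widths(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits); assumes power-of-two parameters."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # selects a byte within the line
    index_bits = sets.bit_length() - 1          # selects the set
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    # 8 KB, four-way, 64 B line
    print(field_widths(8 * 1024, ways=4, line_bytes=64))   # (32, 6, 5)
```

The remaining upper address bits form the tag, whose width depends on the address size, which the slide does not state.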
28
Dynamic execution
29
Trace cache & branch prediction
Replaces traditional L1 instruction cache
Trace cache contains ~12K decoded instructions (micro-operations), removes decode latency
Improved branch prediction algorithm, eliminates 33% of P3 mis-predictions (pipeline stalls)
Keeps correct instructions executing
30
31
Find actions and states?
Adr Inst Action State 60755 WH 2 1 50700 RH 617aa WB-WMC 4-8 3 51700 4 WB-RMC 4-10 5 RMC 10 6 60777 WMC 8 7 617cc 9 A f0000 nop
32
Map memory and cache?
Cache Address (Set 1 and Set 0)   Memory Addresses
000   00000, 01000, 10000, 11000
001   00001, 01001, 10001, 11001
010   00010, 01010, 10010, 11010
011   00011, 01011, 10011, 11011
100   00100, 01100, 10100, 11100
101   00101, 01101, 10101, 11101
110   00110, 01110, 10110, 11110
111   00111, 01111, 10111, 11111
33
Read hit or miss?
Cache memory, columns index tag1 tag0: 000 10 00, 001 11, 010, 011 01, 100, 101, 110, 111
CPU read adr   hit/miss   set
1 0000   h   1
0 0011   m   --
1 0001
0 1111
1 1101
0 1010
0 0000
1 0011