Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)

What will you do over Spring Break?
  Relax
  Head home
  Head to a warm destination
  Stay in (frigid) Ithaca
  Study

Big Picture: Memory
[Diagram: the five-stage pipeline (Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back) with its register file ($0/zero, $1/$at, …, $29/$sp, $31/$ra), ALU, forward unit, hazard detection, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Code, data, and the stack are all stored in memory, which the pipeline touches at the fetch and memory stages.]

Big Picture: Memory
Memory: big & slow vs. Caches: small & fast
[Same five-stage pipeline diagram, highlighting the two places it accesses memory: instruction fetch (reading code) and the memory stage (reading/writing stack and data).]

Big Picture: Memory
Memory: big & slow vs. Caches: small & fast
[Same diagram with a cache ($$) inserted between the pipeline and memory, on both the instruction-fetch side and the memory-stage side.]

Goals for Today: caches
Caches vs. memory vs. tertiary storage
  Tradeoffs: big & slow vs. small & fast
  Working set: 90/10 rule
  How to predict the future: temporal & spatial locality
Examples of caches:
  Direct mapped
  Fully associative
  N-way set associative
Caching questions:
  How does a cache work?
  How effective is the cache (hit rate / miss rate)?
  How large is the cache?
  How fast is the cache (AMAT = average memory access time)?
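
Since AMAT comes up again below, here is a tiny worked example of the formula AMAT = hit time + miss rate × miss penalty. The numbers are made up for illustration; they are not course figures.

    #include <stdio.h>

    int main(void) {
        // Hypothetical numbers, for illustration only.
        double hit_time     = 2.0;   // cycles to access the cache on a hit
        double miss_rate    = 0.05;  // fraction of accesses that miss
        double miss_penalty = 100.0; // extra cycles to fetch the block from memory
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f cycles\n", amat);  // 2 + 0.05 * 100 = 7 cycles
        return 0;
    }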

Big Picture
How do we make the processor fast, given that memory is VEEERRRYYYY SLLOOOWWW!!

Performance
CPU clock rates ~0.2 ns – 2 ns (5 GHz – 500 MHz)

Technology        Capacity   $/GB     Latency
Tape              1 TB       $0.17    100s of seconds
Disk              4 TB       $0.03    millions of cycles (ms)
SSD (Flash)       512 GB     $2       thousands of cycles (us)
DRAM              8 GB       $10      50-300 cycles (10s of ns)
SRAM (off-chip)   32 MB      $4000    5-15 cycles (few ns)
SRAM (on-chip)    256 KB     ???      1-3 cycles (ns)

Others: eDRAM (aka 1T SRAM), FeRAM, CD, DVD, …
Q: Can we create the illusion of cheap + large + fast?
Notes: capacity and latency are closely coupled; cost is inversely proportional. eDRAM is embedded DRAM that is part of the same ASIC and die instead of a separate module. 1T-SRAM (1-transistor SRAM) is an instance of eDRAM where the refresh controller is integrated, so the memory can externally be treated as SRAM.

Memory Pyramid
RegFile (100s of bytes): < 1 cycle access
L1 Cache (several KB): 1-3 cycle access
L2 Cache (½-32 MB): 5-15 cycle access
Memory (128 MB – few GB): 50-300 cycle access
Disk (many GB – few TB): 1,000,000+ cycle access

Notes: these are rough numbers, mileage may vary for latest/greatest. Caches are usually made of SRAM (or eDRAM). L1, L2, and even L3 are all on-chip these days, using SRAM and eDRAM; L3 is becoming more common (eDRAM?).

Memory Hierarchy
Memory closer to processor: small & fast, stores active data
Memory farther from processor: big & slow, stores inactive data
L1 Cache (SRAM-on-chip) → L2/L3 Cache (SRAM) → Memory (DRAM)

Memory Hierarchy
Memory closer to processor: small & fast, stores active data
Memory farther from processor: big & slow, stores inactive data
Example: LW $R3, Mem
  Reg: the 1% of data accessed the most
  L1 Cache (SRAM-on-chip), L2/L3 Cache (SRAM): the 9% of data that is “active”
  Memory (DRAM): the 90% of data that is inactive (not accessed)

Memory Hierarchy: Insight for Caches
What data should live higher in the pyramid? A: Data that will be used soon.
If Mem[x] was accessed recently...
  … then Mem[x] is likely to be accessed again soon
    Exploit temporal locality: put recently accessed Mem[x] higher in the memory hierarchy, since it will likely be accessed again soon (“soon” relative to the frequency of loads)
  … then Mem[x ± ε] is likely to be accessed soon
    Exploit spatial locality: put the entire block containing Mem[x] and surrounding addresses higher in the memory hierarchy, since nearby addresses will likely be accessed
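
As a concrete (if minimal) illustration of the two kinds of locality, consider this C sketch; it is my own example, not from the slides.

    #include <stdio.h>

    // sum is re-used on every iteration (temporal locality);
    // a[0], a[1], a[2], ... walk sequentially through memory, so consecutive
    // accesses fall in the same cache block (spatial locality).
    int sum_array(const int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    int main(void) {
        int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%d\n", sum_array(a, 8));   // prints 36
        return 0;
    }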

Memory Hierarchy
Memory closer to processor is fast but small
  usually stores a subset of memory farther away (“strictly inclusive”)
  Transfer whole blocks (cache lines):
    4 KB: disk ↔ RAM
    256 B: RAM ↔ L2
    64 B: L2 ↔ L1

Memory Hierarchy: Memory trace
Memory trace (stack/data addresses 0x7c9a2b…, code addresses 0x004003…):
0x7c9a2b18, 0x7c9a2b19, 0x7c9a2b1a, 0x7c9a2b1b, 0x7c9a2b1c, 0x7c9a2b1d, 0x7c9a2b1e, 0x7c9a2b1f,
0x7c9a2b20, 0x7c9a2b21, 0x7c9a2b22, 0x7c9a2b23, 0x7c9a2b28, 0x7c9a2b2c, 0x0040030c, 0x00400310,
0x7c9a2b04, 0x00400314, 0x7c9a2b00, 0x00400318, 0x0040031c, ...

Program that produced it:

int n = 4;
int k[] = { 3, 14, 0, 10 };

int fib(int i) {
  if (i <= 2) return i;
  else return fib(i-1) + fib(i-2);
}

int main(int ac, char **av) {
  for (int i = 0; i < n; i++) {
    printi(fib(k[i]));
    prints("\n");
  }
}

The trace exhibits both temporal locality (repeated code and stack addresses) and spatial locality (runs of consecutive addresses).

Cache Lookups (Read)
Processor tries to access Mem[x]
Check: is the block containing Mem[x] in the cache?
  Yes: cache hit
    return requested data from the cache line
  No: cache miss
    read block from memory (or lower-level cache)
    (evict an existing cache line to make room)
    place new block in the cache
    return requested data
    … and stall the pipeline while all of this happens

Three common designs
A given data block can be placed…
  … in exactly one cache line → Direct Mapped
  … in any cache line → Fully Associative
  … in a small set of cache lines → Set Associative

Direct Mapped Cache
Each block number is mapped to a single cache line index. Simplest hardware.
[Diagram: memory words 0x000000, 0x000004, …, 0x000048 mapping into a cache of 2 cache lines with 4 words per cache line. The 32-bit address is split into a 27-bit tag, a 1-bit index, and a 4-bit offset; the block 0x000000–0x00000c maps to line 0, the next block to line 1, and so on, wrapping around.]

Direct Mapped Cache
Each block number is mapped to a single cache line index. Simplest hardware.
[Diagram: the same memory, now cached with 4 cache lines of 2 words each. The 32-bit address is split into a 27-bit tag, a 2-bit index, and a 3-bit offset; e.g. the block containing 0x000000 and 0x000004 maps to line 0.]
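
To make the address split concrete, here is a small sketch that extracts tag, index, and offset for the 4-line, 8-byte-block configuration above. The example address and helper are my own, assuming 32-bit byte addresses.

    #include <stdio.h>

    // Address split for a direct mapped cache with 4 lines of 8 bytes each:
    // offset = 3 bits, index = 2 bits, tag = 27 bits.
    int main(void) {
        unsigned int addr   = 0x0000002C;          // example address (44 decimal)
        unsigned int offset = addr & 0x7;          // low 3 bits
        unsigned int index  = (addr >> 3) & 0x3;   // next 2 bits
        unsigned int tag    = addr >> 5;           // remaining 27 bits
        printf("addr=0x%08x  tag=0x%x  index=%u  offset=%u\n",
               addr, tag, index, offset);          // tag=0x1 index=1 offset=4
        return 0;
    }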

Direct Mapped Cache

Direct Mapped Cache (Reading)
[Hardware diagram: the address is split into Tag | Index | Offset. The index selects one cache line (valid bit V, tag, data block); the stored tag is compared (=) with the address tag to produce hit?; the offset drives word select and byte-offset-in-word select, returning 32 bits of data.]

Direct Mapped Cache (Reading)
n-bit index, m-bit offset
Q: How big is the cache (data only)?
A: Cache of 2^n blocks, block size of 2^m bytes.
Cache size = 2^m bytes per block × 2^n blocks = 2^(n+m) bytes

Direct Mapped Cache (Reading)
n-bit index, m-bit offset
Q: How much SRAM is needed (data + overhead)?
A: Cache of 2^n blocks, block size of 2^m bytes.
Tag field: 32 – (n + m) bits; valid bit: 1 bit.
SRAM size = 2^n × (block size + tag size + valid bit size)
          = 2^n × (2^m bytes × 8 bits-per-byte + (32 – n – m) + 1) bits
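
Plugging the formula into code, a quick sketch for one example configuration. The parameter values n = 10 and m = 6 are my own choices, not from the slides.

    #include <stdio.h>

    int main(void) {
        int n = 10, m = 6;                          // 1024 blocks of 64 bytes = 64 KB of data
        long blocks     = 1L << n;
        long data_bits  = (1L << m) * 8;            // 2^m bytes per block, 8 bits per byte
        long tag_bits   = 32 - n - m;               // address bits left over for the tag
        long line_bits  = data_bits + tag_bits + 1; // + 1 valid bit
        long total_bits = blocks * line_bits;
        printf("data: %ld KB, total SRAM: %ld bits (~%.1f KB)\n",
               blocks * (1L << m) / 1024, total_bits, total_bits / 8.0 / 1024.0);
        return 0;
    }

For these values, the 64 KB of data needs roughly 66 KB of SRAM once tags and valid bits are counted.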

Example: A Simple Direct Mapped Cache
Using byte addresses in this example! Address bus = 5 bits.
Cache: 4 cache lines, 2-word blocks → 2-bit tag field, 2-bit index field, 1-bit block offset.
Memory: M[0] = 100, M[1] = 110, M[2] = 120, …, M[15] = 250 (i.e. M[i] = 100 + 10·i).
Access sequence: LB $1 ← M[1]; LB $2 ← M[5]; LB $3 ← M[1]; LB $3 ← M[4]; LB $2 ← M[0].
All cache lines start invalid (V = 0); registers $0–$3 start empty.

Worked example: direct mapped (4 lines, 2-word blocks)
1st access, LB $1 ← M[1] (addr 00001): miss. Line 0 ← tag 00, data {M[0]=100, M[1]=110}; $1 = 110. Misses: 1, Hits: 0
2nd access, LB $2 ← M[5] (addr 00101): miss. Line 2 ← tag 00, data {M[4]=140, M[5]=150}; $2 = 150. Misses: 2, Hits: 0
3rd access, LB $3 ← M[1] (addr 00001): hit in line 0; $3 = 110. Misses: 2, Hits: 1
4th access, LB $3 ← M[4] (addr 00100): hit in line 2; $3 = 140. Misses: 2, Hits: 2
5th access, LB $2 ← M[0] (addr 00000): hit in line 0; $2 = 100. Misses: 2, Hits: 3
6th access, LB $2 ← M[10] (addr 01010): miss. Line 1 ← tag 01, data {M[10]=200, M[11]=210}; $2 = 200. Misses: 3, Hits: 3
7th access, LB $2 ← M[15] (addr 01111): miss. Line 3 ← tag 01, data {M[14]=240, M[15]=250}; $2 = 250. Misses: 4, Hits: 3
8th access, LB $2 ← M[8] (addr 01000): miss; conflicts with line 0 (tag 00), which is evicted. Line 0 ← tag 01, data {M[8]=180, M[9]=190}; $2 = 180. Misses: 5, Hits: 3
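
The whole walkthrough can be reproduced with a few lines of C. This is a minimal sketch of the 4-line, 2-byte-block direct mapped cache above (tags only, no data array), not course-provided code.

    #include <stdio.h>

    int main(void) {
        int valid[4] = {0}, tag[4] = {0};
        int trace[] = {1, 5, 1, 4, 0, 10, 15, 8};    // the LB addresses from the slides
        int hits = 0, misses = 0;
        for (int i = 0; i < 8; i++) {
            int addr  = trace[i];
            int index = (addr >> 1) & 0x3;           // 2-bit index (1-bit block offset below it)
            int t     = (addr >> 3) & 0x3;           // 2-bit tag
            if (valid[index] && tag[index] == t) {
                hits++;
            } else {
                misses++;                            // cold or conflict miss: fill the line
                valid[index] = 1;
                tag[index]   = t;
            }
        }
        printf("Misses: %d, Hits: %d\n", misses, hits);  // prints: Misses: 5, Hits: 3
        return 0;
    }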

Misses
Three types of misses:
  Cold (aka Compulsory): the line is being referenced for the first time
  Capacity: the line was evicted because the cache was not large enough
  Conflict: the line was evicted because of another access whose index conflicted

Misses
Q: How to avoid…
  Cold misses: unavoidable? The data was never in the cache… Prefetching!
  Capacity misses: buy more SRAM
  Conflict misses: use a more flexible cache design
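
One common software-side way to hide cold misses is an explicit prefetch hint. A minimal sketch using GCC/Clang's __builtin_prefetch follows; the function, the array, and the DIST constant are my own illustration, not from the slides.

    #include <stdio.h>

    #define DIST 16   // how far ahead to prefetch; a tuning knob, value chosen arbitrarily

    long sum_with_prefetch(const int *a, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (i + DIST < n)
                __builtin_prefetch(&a[i + DIST]);  // hint only; correctness is unaffected
            sum += a[i];
        }
        return sum;
    }

    int main(void) {
        int a[64];
        for (int i = 0; i < 64; i++) a[i] = i;
        printf("%ld\n", sum_with_prefetch(a, 64));  // prints 2016
        return 0;
    }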

Direct Mapped Example: a pathological case
Using byte addresses; address bus = 5 bits. Same cache (4 lines, 2-word blocks) and same first five accesses (Misses: 2, Hits: 3), but now the trace continues: LB $2 ← M[12]; LB $2 ← M[8]; LB $2 ← M[4]; LB $2 ← M[0]; then M[12] and M[8] again.
6th access, LB $2 ← M[12] (addr 01100): miss; evicts line 2 (which held M[4]–M[5]). Line 2 ← tag 01, data {M[12]=220, M[13]=230}; $2 = 220. Misses: 3, Hits: 3
7th access, LB $2 ← M[8] (addr 01000): miss; evicts line 0 (which held M[0]–M[1]). Line 0 ← tag 01, data {M[8]=180, M[9]=190}; $2 = 180. Misses: 4, Hits: 3
8th and 9th accesses, LB $2 ← M[4] and LB $2 ← M[0]: both miss, evicting the blocks just brought in. Misses: 4+2, Hits: 3
10th and 11th accesses, LB $2 ← M[12] and LB $2 ← M[8]: both miss again. Misses: 4+2+2, Hits: 3
M[0]/M[8] map to line 0 and M[4]/M[12] map to line 2, so these accesses keep evicting each other: the cache thrashes and the last six accesses all miss.

Cache Organization
How to avoid conflict misses? Three common designs:
  Fully associative: a block can be anywhere in the cache
  Direct mapped: a block can only be in one line in the cache
  Set-associative: a block can be in a few (2 to 8) places in the cache

Fully Associative Cache (Reading)
[Hardware diagram: the address is split into Tag | Offset (no index). Each cache line (“way”) holds a valid bit V, a tag, and a 64-byte block. All stored tags are compared in parallel (=) against the address tag; line select picks the matching way, word select uses the offset, and the output is 32 bits of data plus hit?.]

Fully Associative Cache (Reading)
2^n blocks (cache lines), m-bit offset
Q: How big is the cache (data only)?
A: Cache of 2^n blocks, block size of 2^m bytes.
Cache size = number of blocks × block size = 2^n × 2^m bytes = 2^(n+m) bytes

Fully Associative Cache (Reading)
2^n blocks (cache lines), m-bit offset
Q: How much SRAM is needed (data + overhead)?
A: Cache of 2^n blocks, block size of 2^m bytes.
Tag field: 32 – m bits; valid bit: 1 bit.
SRAM size = 2^n × (block size + tag size + valid bit size)
          = 2^n × (2^m bytes × 8 bits-per-byte + (32 – m) + 1) bits
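
To see why the fully associative tag is “larger” (as the tradeoff summary later in the deck puts it), here is a small sketch comparing tag storage for the same amount of data under both designs. The parameters (64 KB of data, 64-byte blocks) are my own example.

    #include <stdio.h>

    int main(void) {
        int m = 6, n = 10;                    // 1024 blocks of 64 bytes = 64 KB of data
        long blocks      = 1L << n;
        long dm_tag_bits = 32 - n - m;        // direct mapped: index bits shrink the tag
        long fa_tag_bits = 32 - m;            // fully associative: no index bits
        printf("direct mapped:     %ld tag bits/line, %ld tag bits total\n",
               dm_tag_bits, blocks * dm_tag_bits);
        printf("fully associative: %ld tag bits/line, %ld tag bits total\n",
               fa_tag_bits, blocks * fa_tag_bits);
        return 0;
    }

For this size the tags grow from 16 to 26 bits per line (about 16 Kbit to 26 Kbit of tag storage), before even counting the extra comparators.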

Example: A Simple Fully Associative Cache
Using byte addresses in this example! Address bus = 5 bits.
Cache: 4 cache lines, 2-word blocks → 4-bit tag field, 1-bit block offset (no index).
Same memory contents (M[i] = 100 + 10·i) and the same access sequence: LB $1 ← M[1]; LB $2 ← M[5]; LB $3 ← M[1]; LB $3 ← M[4]; LB $2 ← M[0]; continuing with the pathological accesses from the direct mapped example.
All cache lines start invalid (V = 0).

Worked example: fully associative (4 lines, 2-word blocks, LRU replacement)
1st access, LB $1 ← M[1] (addr 00001): miss. A free line ← tag 0000, data {100, 110}; $1 = 110. Misses: 1, Hits: 0
2nd access, LB $2 ← M[5] (addr 00101): miss. Next free line ← tag 0010, data {140, 150}; $2 = 150. Misses: 2, Hits: 0
3rd access, LB $3 ← M[1] (addr 00001): hit; $3 = 110. Misses: 2, Hits: 1
4th access, LB $3 ← M[4] (addr 00100): hit; $3 = 140. Misses: 2, Hits: 2
5th access, LB $2 ← M[0] (addr 00000): hit; $2 = 100. Misses: 2, Hits: 3
6th access, LB $2 ← M[12] (addr 01100): miss. Free line ← tag 0110, data {220, 230}; $2 = 220. Misses: 3, Hits: 3
7th access, LB $2 ← M[8] (addr 01000): miss. Last free line ← tag 0100, data {180, 190}; $2 = 180. Misses: 4, Hits: 3
8th and 9th accesses, LB $2 ← M[4] and LB $2 ← M[0]: both hit (the blocks are still resident). Misses: 4, Hits: 3+2
10th and 11th accesses, LB $2 ← M[12] and LB $2 ← M[8]: both hit. Misses: 4, Hits: 3+2+2
No line ever needs to be evicted: the same pathological trace that thrashed the direct mapped cache fits entirely in the fully associative cache.

Eviction
Which cache line should be evicted from the cache to make room for a new line?
  Direct mapped: no choice, must evict the line selected by the index
  Associative caches:
    random: select one of the lines at random
    round-robin: similar to random
    FIFO: replace the oldest line
    LRU: replace the line that has not been used for the longest time
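
A minimal sketch of LRU victim selection within one set; the function name and the timestamp scheme are my own, not the lecture's.

    #include <stdio.h>

    // Pick a victim way: prefer an invalid (free) way, else the least recently used one.
    int pick_victim_lru(const int *valid, const long *last_used, int ways) {
        int victim = 0;
        for (int w = 0; w < ways; w++) {
            if (!valid[w]) return w;                          // free way: no eviction needed
            if (last_used[w] < last_used[victim]) victim = w; // older "last use" wins
        }
        return victim;
    }

    int main(void) {
        int  valid[4]     = {1, 1, 1, 1};
        long last_used[4] = {40, 12, 33, 27};   // way 1 was touched longest ago
        printf("evict way %d\n", pick_victim_lru(valid, last_used, 4));  // prints 1
        return 0;
    }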

Cache Tradeoffs
                        Direct Mapped      Fully Associative
  Tag size              + smaller          – larger
  SRAM overhead         + less             – more
  Controller logic      + less             – more
  Speed                 + faster           – slower
  Price                 + less             – more
  Scalability           + very             – not very
  # of conflict misses  – lots             + zero
  Hit rate              – low              + high
  Pathological cases?   – common           + ?

Compromise: Set-associative cache
Like a direct-mapped cache:
  index into a location → fast
Like a fully-associative cache:
  can store multiple entries per index → decreases thrashing in the cache
  search each element (way) within the set
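
A compact sketch of a 2-way set-associative lookup with LRU within each set. This is a toy model with parameters I chose to mirror the small examples here (2-byte blocks, 2 sets), not the course's code; run on the eight loads listed in the comparison example that follows, it reports Misses: 4, Hits: 4.

    #include <stdio.h>

    #define SETS 2
    #define WAYS 2

    static int  valid[SETS][WAYS], tag[SETS][WAYS];
    static long last_used[SETS][WAYS];
    static long now;

    // 2-byte blocks => 1 offset bit; 2 sets => 1 index bit; the rest is the tag.
    // Returns 1 on a hit, 0 on a miss (filling the set with LRU replacement).
    int cache_access(int addr) {
        int set = (addr >> 1) % SETS;
        int t   = addr >> 2;
        now++;
        for (int w = 0; w < WAYS; w++)                  // search every way in the set
            if (valid[set][w] && tag[set][w] == t) {
                last_used[set][w] = now;
                return 1;
            }
        int victim = 0;                                 // miss: pick a free or LRU way
        for (int w = 0; w < WAYS; w++) {
            if (!valid[set][w]) { victim = w; break; }
            if (last_used[set][w] < last_used[set][victim]) victim = w;
        }
        valid[set][victim]     = 1;
        tag[set][victim]       = t;
        last_used[set][victim] = now;
        return 0;
    }

    int main(void) {
        int trace[] = {1, 5, 1, 4, 0, 12, 12, 5};   // eight loads, as in the comparison
        int hits = 0, misses = 0;
        for (int i = 0; i < 8; i++) {
            if (cache_access(trace[i])) hits++;
            else misses++;
        }
        printf("Misses: %d, Hits: %d\n", misses, hits);
        return 0;
    }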

3-Way Set Associative Cache (Reading)
[Hardware diagram: the address is split into Tag | Index | Offset. The index selects one set of three ways; the three stored tags are compared in parallel (=) with the address tag, line select picks the matching way's 64-byte block, word select uses the offset, and the output is 32 bits of data plus hit?.]

Comparison: Direct Mapped
Using byte addresses in this example! Address bus = 5 bits.
Cache: 4 cache lines, 2-word blocks → 2-bit tag field, 2-bit index field, 1-bit block offset field.
Trace: LB $1 ← M[1]; LB $2 ← M[5]; LB $3 ← M[1]; LB $3 ← M[4]; LB $2 ← M[0]; LB $2 ← M[12]; LB $2 ← M[12]; LB $2 ← M[5]
Result: Misses: 8, Hits: 3

Comparison: Fully Associative
Using byte addresses in this example! Address bus = 5 bits.
Cache: 4 cache lines, 2-word blocks → 4-bit tag field, 1-bit block offset field.
Same trace as above.
Result: Misses: 3, Hits: 8

Comparison: 2-Way Set Associative
Using byte addresses in this example! Address bus = 5 bits.
Cache: 2 sets, 2 ways, 2-word blocks → 3-bit tag field, 1-bit set index field, 1-bit block offset field.
Same trace as above.
Result: Misses: 4, Hits: 7

Remaining Issues
To do:
  Evicting cache lines
  Picking cache parameters
  Writing using the cache

Summary
Caching assumptions
  small working set: 90/10 rule
  can predict the future: spatial & temporal locality
Benefits
  big & fast memory built from (big & slow) + (small & fast)
Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
  Fully associative → higher hit cost, higher hit rate
  Larger block size → lower hit cost, higher miss penalty
Next up: other designs; writing to caches

Administrivia
Upcoming agenda:
  HW3 was due yesterday, Wednesday, March 13th
  PA2 Work-in-Progress circuit due before spring break
  Spring break: Saturday, March 16th to Sunday, March 24th
  Prelim2 on Thursday, March 28th, right after spring break
  PA2 due Thursday, April 4th

Have a great Spring Break!!!