Caches 2
Hakim Weatherspoon, CS 3410, Spring 2013, Computer Science, Cornell University
P&H Chapter 5.1-5.4, 5.8; also 5.13 & 5.17

Big Picture: Memory
[Five-stage pipeline diagram: Instruction Fetch → Instruction Decode → Execute → Memory → Write-Back, with register file ($0 zero, $1 $at, ..., $29 $sp, $31 $ra), ALU, immediate extend, forward unit, hazard detection, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers. Code is stored in memory (also data and stack).]

Big Picture: Memory
Memory: big & slow vs. Caches: small & fast
[Same five-stage pipeline diagram; stack, data, and code are all stored in memory.]

Big Picture: Memory
Memory: big & slow vs. Caches: small & fast
[Same pipeline diagram, now with a cache ($$) between the pipeline and memory at both the Instruction Fetch and Memory stages.]

Big Picture
How do we make the processor fast, given that memory is VEEERRRYYYY SLLOOOWWW!!

Big Picture
How do we make the processor fast, given that memory is VEEERRRYYYY SLLOOOWWW!!
Insight for Caches:
- small working set: 90/10 rule
- can predict future: spatial & temporal locality (see the locality sketch below)
Benefit:
- Abstraction: big & fast memory, built from (big & slow memory; DRAM) + (small & fast cache; SRAM)
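A quick illustration of why locality matters. This is a minimal sketch, not part of the lecture: it assumes a row-major C array and a typical line size of about 64 bytes, so the constants are illustrative.

```c
/* Spatial locality sketch: summing the same matrix two ways.
 * Row-order walks consecutive addresses, so most accesses land in a
 * cache line that was just fetched; column-order strides N ints per
 * access and misses far more often on typical caches. */
#include <stdio.h>

#define N 1024
static int a[N][N];

long sum_row_major(void) {              /* good spatial locality */
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];               /* consecutive addresses */
    return s;
}

long sum_col_major(void) {              /* poor spatial locality */
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];               /* N*sizeof(int)-byte stride */
    return s;
}

int main(void) {
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}
```

Temporal locality is the other half: both loops reuse s and the loop counters on every iteration, which is why those stay in registers or the nearest cache level.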

Memory Hierarchy
RegFile (100s bytes): < 1 cycle access
L1 Cache (several KB): 1-3 cycle access
L2 Cache (½-32MB): 5-15 cycle access (an L3 is becoming more common)
Memory (128MB - few GB): 150-200 cycle access
Disk (many GB - few TB): 1,000,000+ cycle access
*These are rough numbers, mileage may vary.

Just like your Contacts!
Last Called: ~1 click
Speed Dial: 1-3 clicks
Favorites: 5-15 clicks
Contacts: more clicks
Google / Facebook / White Pages: a bunch of clicks
*Will refer to this analogy using GREEN in the slides.

Memory Hierarchy
L1 Cache (SRAM-on-chip): 1% of data, most accessed ↔ Speed Dial: 1% of people, most called
L2/L3 Cache (SRAM): 9% of data is "active" ↔ Favorites: 9% of people called
Memory (DRAM): 90% of data inactive (not accessed) ↔ Contacts: 90% of people rarely called

Memory Hierarchy
Memory closer to processor: small & fast, stores active data (like Speed Dial) → L1 Cache (SRAM-on-chip), L2/L3 Cache (SRAM)
Memory farther from processor: big & slow, stores inactive data (like the Contact List) → Memory (DRAM)

Memory Hierarchy
Memory closer to the processor is fast but small:
- usually stores a subset of the memory farther away ("strictly inclusive")
- transfers whole blocks (cache lines): 4kB: disk ↔ RAM; 256B: RAM ↔ L2; 64B: L2 ↔ L1

Goals for Today: caches
Comparison of cache architectures:
- Direct Mapped
- Fully Associative
- N-way Set Associative
Caching questions:
- How does a cache work?
- How effective is the cache (hit rate / miss rate)?
- How large is the cache?
- How fast is the cache (AMAT = average memory access time)? (see the AMAT sketch below)
Next time: Writing to the Cache
- Write-through vs. write-back
- Caches vs. memory vs. tertiary storage
- Tradeoffs: big & slow vs. small & fast
- Working set: 90/10 rule
- How to predict the future: temporal & spatial locality
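How the pieces combine: AMAT folds hit time, miss rate, and miss penalty into one number. A minimal sketch; the 2-cycle hit time, 10% miss rate, and 100-cycle miss penalty below are made-up illustration values, not numbers from this lecture.

```c
/* AMAT = hit time + miss rate * miss penalty */
#include <stdio.h>

double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* hypothetical L1: 2-cycle hit, 10% misses, 100-cycle penalty */
    printf("AMAT = %.1f cycles\n", amat(2.0, 0.10, 100.0));  /* 12.0 */
    return 0;
}
```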

Next Goal
How do the different cache architectures compare?
- Cache architecture tradeoffs?
- Cache size?
- Cache hit rate / performance?

Cache Tradeoffs
A given data block can be placed…
- … in any cache line → Fully Associative (a contact maps to any speed dial number)
- … in exactly one cache line → Direct Mapped (a contact maps to exactly one speed dial number)
- … in a small set of cache lines → Set Associative (like direct mapped, a contact maps to exactly one speed dial number, but many different contacts can associate with the same speed dial number at the same time)

Cache Tradeoffs

                       Direct Mapped    Fully Associative
Tag Size               + Smaller        - Larger
SRAM Overhead          + Less           - More
Controller Logic       + Less           - More
Speed                  + Faster         - Slower
Price                  + Less           - More
Scalability            + Very           - Not Very
# of conflict misses   - Lots           + Zero
Hit rate               - Low            + High
Pathological Cases     - Common         + ?

Cache Tradeoffs
Compromise: set-associative cache
Like a direct-mapped cache:
- index into a location → fast
Like a fully-associative cache:
- can store multiple entries, which decreases thrashing in the cache
- search in each element
(See the address-split sketch below.)
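What "index into a location" means in hardware: the controller carves the address into fields. A minimal sketch assuming 32-bit addresses; n (set-index bits) and m (offset bits) are parameters, and the values in main are illustrative. Direct mapped is the 1-way case; fully associative is n = 0 (no index bits).

```c
/* Split a 32-bit address into tag | index | offset. */
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t tag, index, offset; } addr_fields;

addr_fields split(uint32_t addr, unsigned n, unsigned m) {
    addr_fields f;
    f.offset = addr & ((1u << m) - 1);         /* low m bits            */
    f.index  = (addr >> m) & ((1u << n) - 1);  /* next n bits: the set  */
    f.tag    = addr >> (n + m);                /* remaining 32-n-m bits */
    return f;
}

int main(void) {
    /* e.g. 64 sets (n = 6), 64-byte blocks (m = 6) */
    addr_fields f = split(0x12345678u, 6, 6);
    printf("tag=0x%x index=%u offset=%u\n", f.tag, f.index, f.offset);
    return 0;
}
```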

Direct Mapped Cache
Use the first letter to index!
Contacts: Baker, J. 111-111-1111; Baker, S. 222-222-2222; Dunn, A. 333-333-3333; Foster, D. 444-444-4444; Gill, D. 555-555-5555; Harris, F. 666-666-6666; Jones, N. 777-777-7777; Lee, V. 888-888-8888; Mann, B. 111-111-1119; Moore, F. 222-222-2229; Powell, C. 333-333-3339; Sam, J. 444-444-4449; Taylor, B. 555-555-5559; Taylor, O. 666-666-6669; Wright, T. 777-777-7779; Zhang, W. 888-888-8889
Speed Dial (block size: 1 contact number):
2 ABC: Baker, J.
3 DEF: Dunn, A.
4 GHI: Gill, D.
5 JKL: Jones, N.
6 MNO: Mann, B.
7 PQRS: Powell, C.
8 TUV: Taylor, B.
9 WXYZ: Wright, T.

Fully Associative Cache
No index!
Contacts: same list as above.
Speed Dial (block size: 1 contact number), any contact can occupy any slot:
2: Baker, J.; 3: Dunn, A.; 4: Gill, D.; 5: Jones, N.; 6: Mann, B.; 7: Powell, C.; 8: Taylor, B.; 9: Wright, T.

Fully Associative Cache
No index!
Contacts: same list as above.
Speed Dial: the same contacts can just as well sit in any other order:
2: Mann, B.; 3: Dunn, A.; 4: Taylor, B.; 5: Wright, T.; 6: Baker, J.; 7: Powell, C.; 8: Gill, D.; 9: Jones, N.

Fully Associative Cache
No index! Use the initial to offset!
Contacts: same list as above.
Speed Dial (block size: 2 contact numbers), each slot holds a pair:
2: Baker, J. / Baker, S.
3: Dunn, A. / Foster, D.
4: Gill, D. / Harris, F.
5: Jones, N. / Lee, V.
6: Mann, B. / Moore, F.
7: Powell, C. / Sam, J.
8: Taylor, B. / Taylor, O.
9: Wright, T. / Zhang, W.

Fully Associative Cache
No index! Use the initial to offset!
Contacts: same list as above.
Speed Dial (block size: 2 contact numbers): the pairs can land in any slot:
2: Mann, B. / Moore, F.
3: Powell, C. / Sam, J.
4: Gill, D. / Harris, F.
5: Wright, T. / Zhang, W.
6: Baker, J. / Baker, S.
7: Dunn, A. / Foster, D.
8: Taylor, B. / Taylor, O.
9: Jones, N. / Lee, V.

N-way Set Associative Cache
2-way set associative cache: 8 sets, use the first letter to index the set. Use the initial to offset!
Contacts: same list as above.
Speed Dial (block size: 2 contact numbers; each set can hold 2 blocks):
2 ABC: Baker, J. / Baker, S.
3 DEF: Dunn, A. / Foster, D.
4 GHI: Gill, D. / Harris, F.
5 JKL: Jones, N. / Lee, V.
6 MNO: Mann, B. / Moore, F.
7 PQRS: Powell, C. / Sam, J.
8 TUV: Taylor, B. / Taylor, O.
9 WXYZ: Wright, T. / Zhang, W.

N-way Set Associative Cache
2-way set associative cache: 8 sets, use the first letter to index the set. Use the initial to offset!
Contacts: same list as above, except Henry, J. 777-777-7777 and Isaacs, M. 888-888-8888 replace Jones, N. and Lee, V.
Speed Dial: set 4 GHI now holds two blocks at once, Gill, D. / Harris, F. and Henry, J. / Isaacs, M.:
2 ABC: Baker, J. / Baker, S.
3 DEF: Dunn, A. / Foster, D.
4 GHI: Gill, D. / Harris, F.  +  Henry, J. / Isaacs, M.
6 MNO: Mann, B. / Moore, F.
7 PQRS: Powell, C. / Sam, J.
8 TUV: Taylor, B. / Taylor, O.
9 WXYZ: Wright, T. / Zhang, W.

2-Way Set Associative Cache (Reading)
[Diagram: the address is split into Tag | Index | Offset. The index selects one set; two comparators (=) check the tag against both ways; line select picks the matching 64-byte line, word select picks the 32-bit word; the comparator outputs combine into hit? and data.]

3-Way Set Associative Cache (Reading)
[Same diagram with three ways: the index selects a set, and three comparators (=) check the tag against all three ways in parallel.]

3-Way Set Associative Cache (Reading)
Address = Tag | Index | Offset, with an n-bit index and an m-bit offset; N-way set associative.
Q: How big is the cache (data only)?
Cache of 2^n sets, block size of 2^m bytes, N-way set associative:
Cache Size = 2^m bytes-per-block × (2^n sets × N ways-per-set) = N × 2^(n+m) bytes

3-Way Set Associative Cache (Reading)
Address = Tag | Index | Offset, with an n-bit index and an m-bit offset; N-way set associative.
Q: How much SRAM is needed (data + overhead)?
Cache of 2^n sets, block size of 2^m bytes, N-way set associative.
Tag field: 32 - (n + m) bits; valid bit: 1 bit.
SRAM Size = 2^n sets × N ways-per-set × (block size + tag size + valid bit size)
          = 2^n × N × (2^m bytes × 8 bits-per-byte + (32 - n - m) + 1) bits
(See the sketch below.)
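Plugging numbers into both formulas. A minimal sketch; the parameters (n = 6, m = 6, N = 3, 32-bit addresses, one valid bit per line) are illustrative choices matching the 3-way example, not values fixed by the slide.

```c
/* Data size and total SRAM for an N-way set-associative cache:
 * data = N * 2^(n+m) bytes
 * SRAM = 2^n * N * (2^m * 8 + (32 - n - m) + 1) bits */
#include <stdio.h>

int main(void) {
    unsigned n = 6, m = 6, N = 3;          /* illustrative geometry */
    unsigned long sets = 1ul << n, block = 1ul << m;

    unsigned long data_bytes = N * sets * block;
    unsigned tag_bits = 32 - (n + m);
    unsigned long sram_bits = sets * N * (block * 8 + tag_bits + 1);

    printf("data: %lu bytes; SRAM: %lu bits (overhead: %.1f%%)\n",
           data_bytes, sram_bits,
           100.0 * (sram_bits - 8.0 * data_bytes) / (8.0 * data_bytes));
    return 0;
}
```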

Comparison: Direct Mapped
Using byte addresses in this example! Addr bus = 5 bits.
Cache: 4 cache lines, 2-word blocks; 2-bit tag field, 2-bit index field, 1-bit block offset field.
Memory: addresses 0-15 hold the values 100, 110, 120, ..., 250.
Access trace:
LB $1 ← M[1]; LB $2 ← M[5]; LB $3 ← M[1]; LB $3 ← M[4]; LB $2 ← M[0]; LB $2 ← M[12]; LB $2 ← M[5]; LB $2 ← M[12]; LB $2 ← M[5]; LB $2 ← M[12]; LB $2 ← M[5]
Misses: ___  Hits: ___

Comparison: Direct Mapped
Same setup and trace as above.
Hits? a) 3 b) 4 c) 5 d) 7 e) 8

Comparison: Direct Mapped
Same setup, trace replayed against the cache.
M[4-5] and M[12-13] both map to index 2, so the tail of the trace keeps evicting one block to fetch the other.
Result → Misses: 8, Hits: 3
(See the simulator sketch below.)
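The same bookkeeping in code: a direct-mapped simulator that replays the trace and reproduces the 8 misses / 3 hits. The geometry is the slide's (4 lines, two byte-addresses per block, 5-bit address = 2-bit tag, 2-bit index, 1-bit offset); the simulator itself is a minimal sketch, not production cache code.

```c
/* Direct-mapped: 4 lines, 2-byte blocks, address = tag(2)|index(2)|offset(1) */
#include <stdio.h>

int main(void) {
    int valid[4] = {0}, tag[4] = {0};
    int trace[] = {1, 5, 1, 4, 0, 12, 5, 12, 5, 12, 5};
    int n = sizeof trace / sizeof trace[0], hits = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        int index = (trace[i] >> 1) & 0x3;  /* 2 index bits */
        int t     = trace[i] >> 3;          /* 2 tag bits   */
        if (valid[index] && tag[index] == t) {
            hits++;
        } else {                            /* miss: fill (and maybe evict) */
            misses++;
            valid[index] = 1;
            tag[index] = t;
        }
    }
    printf("Misses: %d, Hits: %d\n", misses, hits);  /* Misses: 8, Hits: 3 */
    return 0;
}
```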

Comparison: Fully Associative
Using byte addresses in this example! Addr bus = 5 bits.
Cache: 4 cache lines, 2-word blocks; 4-bit tag field, 1-bit block offset field (no index).
Memory and trace: same as the direct-mapped example above.
Hits? a) 3 b) 4 c) 5 d) 7 e) 8

Comparison: Fully Associative
Same setup, trace replayed: only three distinct blocks are touched (M[0-1], M[4-5], M[12-13]), and all three fit in the four lines, so the only misses are the cold ones.
Result → Misses: 3, Hits: 8

Comparison: 2-Way Set Associative
Using byte addresses in this example! Addr bus = 5 bits.
Cache: 2 sets × 2 ways, 2-word blocks; 3-bit tag field, 1-bit set index field, 1-bit block offset field.
Memory and trace: same as the direct-mapped example above.
Hits? a) 3 b) 4 c) 5 d) 6 e) 7

Comparison: 2-Way Set Associative
Same setup, trace replayed: all three blocks index into set 0, but the two ways can hold M[4-5] and M[12-13] together, so the ping-ponging at the end of the trace now hits.
Result → Misses: 4, Hits: 7
(See the 2-way simulator sketch below.)
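Generalizing the simulator above to N ways with LRU replacement reproduces the 4 misses / 7 hits. A minimal sketch: the 2-set, 2-way, 2-byte-block geometry is the slide's, while the timestamp-based LRU is one common implementation choice, not something the lecture prescribes.

```c
/* N-way set-associative cache with LRU replacement.
 * Geometry matches the slide: 2 sets, 2 ways, 2-byte blocks, 5-bit addresses. */
#include <stdio.h>

#define SETS 2
#define WAYS 2

int main(void) {
    int valid[SETS][WAYS] = {0}, tag[SETS][WAYS], last_use[SETS][WAYS];
    int trace[] = {1, 5, 1, 4, 0, 12, 5, 12, 5, 12, 5};
    int n = sizeof trace / sizeof trace[0], hits = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        int set = (trace[i] >> 1) & (SETS - 1);  /* 1 set-index bit */
        int t   = trace[i] >> 2;                 /* 3 tag bits      */
        int way = -1;
        for (int w = 0; w < WAYS; w++)           /* search every way */
            if (valid[set][w] && tag[set][w] == t) way = w;
        if (way >= 0) {
            hits++;
        } else {                                 /* miss: pick an empty or LRU way */
            misses++;
            way = 0;
            for (int w = 1; w < WAYS; w++)
                if (!valid[set][w] ||
                    (valid[set][way] && last_use[set][w] < last_use[set][way]))
                    way = w;
            valid[set][way] = 1;
            tag[set][way] = t;
        }
        last_use[set][way] = i;                  /* update LRU timestamp */
    }
    printf("Misses: %d, Hits: %d\n", misses, hits);  /* Misses: 4, Hits: 7 */
    return 0;
}
```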

Misses
Q: What causes a cache miss? Cache miss classification:
- Cold (aka Compulsory) Miss: the line is being referenced for the first time.
- Conflict Miss: the line was in the cache, but was evicted because some other access had the same index (collisions, competition).
- Capacity Miss: the line was in the cache, but was evicted because the cache is too small, i.e. the working set of the program is larger than the cache.
(See the conflict-miss sketch below.)
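Conflict misses are visible directly in the example's address fields: a small sketch (reusing the slide's direct-mapped geometry) showing that M[4-5] and M[12-13] are different blocks with the same index, which is exactly why the direct-mapped cache thrashed even though most lines stayed empty.

```c
/* Print block / index / tag for the example's addresses
 * (2-byte blocks, 4 lines: index = 2 bits, tag = 2 bits). */
#include <stdio.h>

int main(void) {
    int addrs[] = {1, 4, 5, 12};
    for (int i = 0; i < 4; i++) {
        int a = addrs[i];
        printf("M[%2d]: block %d, index %d, tag %d\n",
               a, a >> 1, (a >> 1) & 0x3, a >> 3);
    }
    /* M[4], M[5], M[12] all land on index 2: evicting one block to
     * fetch the other is a conflict miss, not a capacity miss, since
     * only 3 of the 4 lines are ever in use. */
    return 0;
}
```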

Eviction
Which cache line should be evicted from the cache to make room for a new line?
- Direct-mapped: no choice, must evict the line selected by the index.
- Associative caches:
  - Random: select one of the lines at random
  - Round-Robin: similar to random
  - FIFO: replace the oldest line
  - LRU: replace the line that has not been used in the longest time (as in the 2-way simulator above)

Takeaway
- Direct Mapped → fast, but low hit rate
- Fully Associative → higher hit cost, but higher hit rate
- N-way Set Associative → middle ground

Summary
Illusion: large, fast memory.
Caching assumptions:
- small working set: 90/10 rule
- can predict future: spatial & temporal locality
Benefits:
- big & fast memory built from (big & slow) + (small & fast)
Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
- Direct mapped → fast, but low hit rate
- Fully associative → higher hit cost, higher hit rate
- N-way set associative → middle ground
- Larger block size → lower hit cost, higher miss penalty