Caches & Memory
Programs 101
Load/Store Architectures:
C Code:
  int main (int argc, char* argv[]) {
    int i;
    int m = n;
    int sum = 0;
    for (i = 1; i <= m; i++) {
      sum += i;
    }
    printf("...", n, sum);
  }
RISC-V Assembly:
  main: addi sp,sp,-48
        sw x1,44(sp)
        sw fp,40(sp)
        move fp,sp
        sw x10,-36(fp)
        sw x11,-40(fp)
        la x15,n
        lw x15,0(x15)
        sw x15,-28(fp)
        sw x0,-24(fp)
        li x15,1
        sw x15,-20(fp)
  L2:   lw x14,-20(fp)
        lw x15,-28(fp)
        blt x15,x14,L3
        . . .
Load/Store Architectures:
- Read data from memory (put it in registers)
- Manipulate it
- Store it back to memory
- Instructions that read from or write to memory…
Programs 101
Load/Store Architectures: same C code as above, with the assembly shown using ABI register names:
  main: addi sp,sp,-48
        sw ra,44(sp)
        sw fp,40(sp)
        move fp,sp
        sw a0,-36(fp)
        sw a1,-40(fp)
        la a5,n
        lw a5,0(a5)
        sw a5,-28(fp)
        sw zero,-24(fp)
        li a5,1
        sw a5,-20(fp)
  L2:   lw a4,-20(fp)
        lw a5,-28(fp)
        blt a5,a4,L3
        . . .
1 Cycle Per Stage: the Biggest Lie (So Far)
[Five-stage pipeline diagram: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, with the register file, ALU, immediate extension, jump/branch target computation, forward unit, hazard detection, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers. Both instruction fetch and the memory stage access memory: stack, data, and code are all stored in memory.]
What’s the problem?
CPU vs. Main Memory: + big, – slow, – far away
[Photo: SandyBridge motherboard, 2011, http://news.softpedia.com]
What’s the solution? Caches!
[Die photo: Intel Pentium 3, 1999, showing the Level 2 $, Insn $, and Data $]
Locality of Data
If you ask for something, you’re likely to ask for:
- the same thing again soon (Temporal Locality)
- something near that thing, soon (Spatial Locality)

  total = 0;
  for (i = 0; i < n; i++)
    total += a[i];
  return total;
Locality of Data
This code highlights the temporal and spatial locality of data.
Q1: Which line of code exhibits good temporal locality?

  1 total = 0;
  2 for (i = 0; i < n; i++) {
  3   n--;
  4   total += a[i]; }
  5 return total;

Clicker choices: 1, 2, 3, 4, 5
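To make the two kinds of locality concrete, here is a minimal C sketch (the array a, the matrix m, and the function names are illustrative, not from the slides): the running sum shows temporal locality, the a[i] walk shows spatial locality, and the row-major vs. column-major traversals show how the same data can be touched with good or poor spatial locality.

  #include <stdio.h>

  #define N 4

  /* Temporal locality: total is reused every iteration.
     Spatial locality: a[i] walks consecutive addresses. */
  int sum_array(const int *a, int n) {
      int total = 0;
      for (int i = 0; i < n; i++)
          total += a[i];
      return total;
  }

  /* Row-major order touches consecutive addresses (good spatial locality). */
  int sum_row_major(int m[N][N]) {
      int total = 0;
      for (int r = 0; r < N; r++)
          for (int c = 0; c < N; c++)
              total += m[r][c];      /* address advances by sizeof(int) each step */
      return total;
  }

  /* Column-major order jumps a whole row between accesses (poor spatial locality). */
  int sum_col_major(int m[N][N]) {
      int total = 0;
      for (int c = 0; c < N; c++)
          for (int r = 0; r < N; r++)
              total += m[r][c];      /* address jumps by N * sizeof(int) each step */
      return total;
  }

  int main(void) {
      int a[N] = {1, 2, 3, 4};
      int m[N][N] = {{0}};
      printf("%d %d %d\n", sum_array(a, N), sum_row_major(m), sum_col_major(m));
      return 0;
  }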
Locality examples:
- Last Called
- Speed Dial
- Favorites
- Contacts
- Google/Facebook/email
Locality
The Memory Hierarchy (Intel Haswell Processor, 2013)
Small, Fast:
  Registers: 1 cycle, 128 bytes
  L1 Caches: 4 cycles, 64 KB
  L2 Cache: 12 cycles, 256 KB
  L3 Cache: 36 cycles, 2-20 MB
Big, Slow:
  Main Memory: 50-70 ns, 512 MB – 4 GB
  Disk: 5-20 ms, 16 GB – 4 TB
Some Terminology
Cache hit: data is in the cache
  - thit: time it takes to access the cache
  - Hit rate (%hit): # cache hits / # cache accesses
Cache miss: data is not in the cache
  - tmiss: time it takes to get the data from below the $
  - Miss rate (%miss): # cache misses / # cache accesses
Cacheline (or cacheblock, or simply line or block): minimum unit of info that is present (or not) in the cache
The Memory Hierarchy
Average access time: tavg = thit + %miss × tmiss
  e.g., tavg = 4 + 5% × 100 = 9 cycles

Registers: 1 cycle, 128 bytes
L1 Caches: 4 cycles, 64 KB
L2 Cache: 12 cycles, 256 KB
L3 Cache: 36 cycles, 2-20 MB
Main Memory: 50-70 ns, 512 MB – 4 GB
Disk: 5-20 ms, 16 GB – 4 TB
(Intel Haswell Processor, 2013)
Single Core Memory Hierarchy
[Diagram: on chip, the processor with Regs, I$, D$, and L2; off chip, Main Memory and Disk]
Levels: Registers, L1 Caches, L2 Cache, L3 Cache, Main Memory, Disk
Multi-Core Memory Hierarchy
[Diagram: four processors on chip, each with its own Regs, I$, D$, and L2; a shared L3 on chip; Main Memory and Disk off chip]
Memory Hierarchy by the Numbers
CPU clock rates: ~0.33 ns – 2 ns (3 GHz – 500 MHz)
*Registers, D-Flip Flops: 10-100’s of registers

Memory technology | Transistor count*            | Access time | Access time in cycles | $ per GiB in 2012 | Capacity
SRAM (on chip)    | 6-8 transistors              | 0.5-2.5 ns  | 1-3 cycles            | $4k               | 256 KB
SRAM (off chip)   |                              | 1.5-30 ns   | 5-15 cycles           |                   | 32 MB
DRAM              | 1 transistor (needs refresh) | 50-70 ns    | 150-200 cycles        | $10-$20           | 8 GB
SSD (Flash)       |                              | 5k-50k ns   | Tens of thousands     | $0.75-$1          | 512 GB
Disk              |                              | 5M-20M ns   | Millions              | $0.05-$0.1        | 4 TB
Basic Cache Design
Byte-addressable memory: 4 address bits, 16 bytes total (b address bits → 2^b bytes in memory)

16-Byte MEMORY
addr data
0000 A    0001 B    0010 C    0011 D
0100 E    0101 F    0110 G    0111 H
1000 J    1001 K    1010 L    1011 M
1100 N    1101 O    1110 P    1111 Q

load 1100 → r1
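As a quick check of the address-size arithmetic above (b address bits → 2^b bytes), a minimal sketch; the loop bound is arbitrary:

  #include <stdio.h>

  int main(void) {
      /* A byte-addressable memory with b address bits holds 2^b bytes.
         The 16-byte example memory above uses b = 4. */
      for (int b = 1; b <= 8; b++)
          printf("%2d address bits -> %lu bytes of memory\n", b, 1UL << b);
      return 0;
  }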
4-Byte, Direct Mapped Cache
MEMORY: same 16 bytes as above (0000 A … 1111 Q)
Address: XXXX, with the low 2 bits as the index

CACHE
index | data
00    | A
01    | B
10    | C
11    | D

Cache entry = row = (cache) line = (cache) block
Block Size: 1 byte
Direct mapped: each address maps to 1 cache block
4 entries → 2 index bits (2^n entries → n index bits)
Index with LSBs: supports spatial locality
4-Byte, Direct Mapped Cache
Address = tag | index (XXXX: high 2 bits = tag, low 2 bits = index)

CACHE
index | tag | data
00    | 00  | A
01    | 00  | B
10    | 00  | C
11    | 00  | D

Tag: minimalist label/address
address = tag + index
4-Byte, Direct Mapped Cache
One last tweak: a valid (V) bit per entry

CACHE
index | V | tag | data
00    | 0 | xx  | X
01    | 0 | xx  | X
10    | 0 | xx  | X
11    | 0 | xx  | X
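The lookup the next slides walk through (index into the cache, compare the tag, check the valid bit) can be sketched in C. This is a minimal model of the 4-entry, 1-byte-block cache above, assuming 4-bit addresses and the 16-byte memory from the previous slides; the struct and function names are illustrative.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define NUM_LINES  4     /* 4 entries -> 2 index bits */
  #define INDEX_BITS 2

  struct line {
      bool    valid;       /* the valid bit */
      uint8_t tag;         /* upper 2 bits of the address */
      uint8_t data;        /* block size = 1 byte */
  };

  static struct line cache[NUM_LINES];          /* starts all-invalid */
  static const uint8_t memory[16] = { 'A','B','C','D','E','F','G','H',
                                      'J','K','L','M','N','O','P','Q' };

  /* Lookup: index into the $, check the tag, check the valid bit. */
  static uint8_t load(uint8_t addr, bool *hit) {
      uint8_t index = addr & (NUM_LINES - 1);   /* low 2 bits  */
      uint8_t tag   = addr >> INDEX_BITS;       /* high 2 bits */
      struct line *l = &cache[index];

      *hit = l->valid && l->tag == tag;
      if (!*hit) {                              /* miss: fetch the byte from memory */
          l->valid = true;
          l->tag   = tag;
          l->data  = memory[addr];
      }
      return l->data;
  }

  int main(void) {
      bool hit;
      uint8_t v;
      v = load(0xC, &hit);   /* load 1100: Miss, brings in N */
      printf("load 1100: %s, data = %c\n", hit ? "Hit" : "Miss", v);
      v = load(0xC, &hit);   /* load 1100 again: Hit */
      printf("load 1100: %s, data = %c\n", hit ? "Hit" : "Miss", v);
      return 0;
  }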
Simulation #1 of a 4-byte, DM Cache
load 1100   (address = tag | index: tag = 11, index = 00)
Lookup: index into $, check tag, check valid bit → Miss
CACHE: all entries still invalid (V = 0, data = X)
Simulation #1 of a 4-byte, DM Cache
load 1100 → Miss, so the byte is fetched from memory
CACHE: index 00 now holds V = 1, tag = 11, data = N; the other entries are still invalid (xx, X)
Simulation #1 of a 4-byte, DM Cache
load 1100 ... load 1100 again
Lookup: index into $, check tag, check valid bit
Results: Miss, then Hit!
Terminology again
Cache hit: data is in the cache
  - thit: time it takes to access the cache
  - Hit rate (%hit): # cache hits / # cache accesses
Cache miss: data is not in the cache
  - tmiss: time it takes to get the data from below the $
  - Miss rate (%miss): # cache misses / # cache accesses
Cacheline (or cacheblock, or simply line or block): minimum unit of info that is present (or not) in the cache
Average memory-access time (AMAT)
There are two equivalent expressions for average memory-access time (AMAT):
  AMAT = %hit × hit time + %miss × miss time,   where miss time = hit time + miss penalty
  AMAT = hit time + %miss × miss penalty

Derivation:
  AMAT = %hit × hit time + %miss × (hit time + miss penalty)
       = %hit × hit time + %miss × hit time + %miss × miss penalty
       = hit time × (%hit + %miss) + %miss × miss penalty
       = hit time + %miss × miss penalty
Questions
Processor A is a 1 GHz processor (1 cycle = 1 ns). It has an L1 cache with a 1-cycle access time and a 50% hit rate, and an L2 cache with a 10-cycle access time and a 90% hit rate. It takes 100 ns to access memory.
Q1: What is the average memory access time of Processor A in ns?
  A: 5   B: 7   C: 9   D: 11   E: 13
Processor B attempts to improve upon Processor A by making the L1 cache larger. The new hit rate is 90%. To maintain a 1-cycle access time, Processor B runs at 500 MHz (1 cycle = 2 ns).
Q2: What is the average memory access time of Processor B in ns?
  A: 5   B: 7   C: 9   D: 11   E: 13
Q3: Which processor should you buy?
  A: Processor A   B: Processor B   C: They are equally good.
Q1: 1 + 50% × (10 + 10% × 100) = 1 + 5 + 5 = 11 ns
Q2: 2 + 10% × (20 + 10% × 100) = 2 + 2 + 1 = 5 ns
Q3: Processor A. Although memory access times are over 2x faster on Processor B, the frequency of Processor B is 2x slower. Since most workloads are not more than 50% memory instructions, Processor A is still likely to be the faster processor for most workloads. (If I did have workloads that were over 50% memory instructions, Processor B might be the better choice.)
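The same arithmetic in code: a minimal sketch of AMAT = hit time + miss rate × miss penalty, applied to Processors A and B. The function name and the nesting of the two levels are illustrative; the L1 miss penalty is taken to be the AMAT of the L2, exactly as in the calculations above.

  #include <stdio.h>

  /* AMAT = hit time + miss rate * miss penalty */
  static double amat(double hit_time, double miss_rate, double miss_penalty) {
      return hit_time + miss_rate * miss_penalty;
  }

  int main(void) {
      /* Processor A: 1 GHz, L1 = 1 ns / 50% hit, L2 = 10 ns / 90% hit, memory = 100 ns */
      double a = amat(1.0, 0.50, amat(10.0, 0.10, 100.0));

      /* Processor B: 500 MHz, L1 = 2 ns / 90% hit, L2 = 20 ns / 90% hit, memory = 100 ns */
      double b = amat(2.0, 0.10, amat(20.0, 0.10, 100.0));

      printf("Processor A: %.0f ns\n", a);   /* 11 ns */
      printf("Processor B: %.0f ns\n", b);   /*  5 ns */
      return 0;
  }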
Block Diagram
4-entry, direct mapped Cache
Address (tag | index) = 1101: 2-bit tag = 11, 2-bit index = 01

CACHE
V | tag | data
1 | 00  | 1111 0000
1 | 11  | 1010 0101   ← index 01 selects this row
  | 01  | 1010 1010
  |     | 0000 0000

The index selects a row, the stored tag is compared (=) against the address tag, and the 8-bit data is read out: 1010 0101 → Hit!
Simulation #2: 4-byte, DM Cache
Loads: 1100, 1101, 0100
Clicker: A) Hit  B) Miss
load 1100: Lookup (index into $, check tag, check valid bit) → Miss
CACHE: all entries still invalid
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101, 0100
CACHE: index 00 → V = 1, tag = 11, data = N
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (?), 0100
Clicker: A) Hit  B) Miss
Lookup: index into $, check tag, check valid bit
CACHE: index 00 → V = 1, tag = 11, data = N
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (Miss), 0100
CACHE: index 00 → V = 1, tag = 11, data = N; index 01 → V = 1, tag = 11, data = O
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (Miss), 0100 (?)
Clicker: A) Hit  B) Miss
Lookup: index into $, check tag, check valid bit → Miss
CACHE: index 00 → N (tag 11), index 01 → O (tag 11)
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (Miss), 0100 (Miss)
CACHE: index 00 → V = 1, tag = 01, data = E (replacing N); index 01 → tag = 11, data = O
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (Miss), 0100 (Miss), 1100 again (?)
Clicker: A) Hit  B) Miss
Lookup: index into $, check tag, check valid bit → Miss (index 00 now holds tag 01, not 11)
CACHE: index 00 → E (tag 01), index 01 → O (tag 11)
Simulation #2: 4-byte, DM Cache
Loads: 1100 (Miss), 1101 (Miss), 0100 (Miss), 1100 (Miss)
CACHE: index 00 → N (tag 11) again, index 01 → O (tag 11)
Increasing Block Size
Address: XXXX, with the least significant bit(s) as the block offset

CACHE
index | V | tag | data
00    | x |     | A | B
01    |   |     | C | D
10    |   |     | E | F
11    |   |     | G | H

Block Size: 2 bytes
Block Offset: the least significant bits indicate where you live in the block
Which bits are the index? tag?
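One way to answer the index/tag question above: for the 8-byte cache used in Simulation #3 (2-byte blocks, 4 sets), a 4-bit address splits into 1 tag bit, 2 index bits, and 1 offset bit. A minimal sketch of that split; the constant and variable names are illustrative:

  #include <stdint.h>
  #include <stdio.h>

  #define BLOCK_BYTES 2   /* -> 1 offset bit */
  #define NUM_SETS    4   /* -> 2 index bits */

  int main(void) {
      int offset_bits = 1;                        /* log2(BLOCK_BYTES) */
      int index_bits  = 2;                        /* log2(NUM_SETS)    */

      uint8_t addr   = 0xC;                       /* address 1100 */
      uint8_t offset = addr & (BLOCK_BYTES - 1);  /* where in the block */
      uint8_t index  = (addr >> offset_bits) & (NUM_SETS - 1);
      uint8_t tag    = addr >> (offset_bits + index_bits);

      printf("addr 1100 -> tag %u, index %u, offset %u\n", tag, index, offset);
      return 0;
  }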
Simulation #3: 8-byte, DM Cache
Address = tag | index | offset
Loads: 1100, 1101, 0100
load 1100: Lookup (index into $, check tag, check valid bit) → Miss
CACHE (2-byte blocks): all entries still invalid (data X | X)
Simulation #3: 8-byte, DM Cache
Loads: 1100 (Miss), 1101, 0100
CACHE: the whole block holding 1100 and 1101 is brought in → index 10: V = 1, data = N | O
Simulation #3: 8-byte, DM Cache
Loads: 1100 (Miss), 1101 (Hit!), 0100
load 1101 hits: it is in the same block as 1100, only the offset differs
CACHE: index 10 → V = 1, data = N | O
Simulation #3: 8-byte, DM Cache
Loads: 1100 (Miss), 1101 (Hit!), 0100 (?)
Clicker: A) Hit  B) Miss
Lookup: index into $, check tag, check valid bit → Miss
CACHE: index 10 → N | O
Simulation #3: 8-byte, DM Cache
Loads: 1100 (Miss), 1101 (Hit!), 0100 (Miss)
CACHE: the block E | F replaces N | O (same index, different tag)
Simulation #3: 8-byte, DM Cache
Loads: 1100 (Miss), 1101 (Hit!), 0100 (Miss), 1100 again (?)
Clicker: A) Hit  B) Miss
Lookup: index into $, check tag, check valid bit → Miss
CACHE: index 10 → E | F
Simulation #3: 8-byte, DM Cache
Results: Miss (cold), Hit!, Miss (cold), Miss (conflict) → 1 hit, 3 misses
3 bytes don’t fit in a 4-entry cache?
CACHE: index 10 → E | F
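A minimal sketch that replays Simulation #3 in C. It assumes the access sequence is load 1100, 1101, 0100, and then 1100 again (the three loads listed plus a repeat, which is what the four outcomes and the conflict miss imply), and uses illustrative names throughout; it prints Miss, Hit, Miss, Miss, i.e. 1 hit and 3 misses.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define NUM_SETS 4                    /* 8-byte cache, 2-byte blocks */

  struct line { bool valid; uint8_t tag; uint8_t data[2]; };

  static struct line cache[NUM_SETS];   /* starts all-invalid */
  static const uint8_t memory[16] = { 'A','B','C','D','E','F','G','H',
                                      'J','K','L','M','N','O','P','Q' };

  static uint8_t load(uint8_t addr, bool *hit) {
      uint8_t offset = addr & 1;                  /* 1 offset bit  */
      uint8_t index  = (addr >> 1) & (NUM_SETS - 1);
      uint8_t tag    = addr >> 3;                 /* 1 tag bit     */
      struct line *l = &cache[index];

      *hit = l->valid && l->tag == tag;
      if (!*hit) {                                /* fetch the whole 2-byte block */
          l->valid   = true;
          l->tag     = tag;
          l->data[0] = memory[addr & ~1u];
          l->data[1] = memory[(addr & ~1u) + 1];
      }
      return l->data[offset];
  }

  int main(void) {
      const uint8_t accesses[] = { 0xC, 0xD, 0x4, 0xC };  /* 1100, 1101, 0100, 1100 */
      for (int i = 0; i < 4; i++) {
          bool hit;
          uint8_t v = load(accesses[i], &hit);
          printf("access %d: %s, data = %c\n", i + 1, hit ? "Hit" : "Miss", v);
      }
      return 0;   /* Miss, Hit, Miss, Miss -> 1 hit, 3 misses */
  }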