16.482 / 16.561 Computer Architecture and Design
Instructor: Dr. Michael Geiger
Summer 2015
Lecture 8: Memory hierarchies

Lecture outline
Announcements/reminders:
- HW 6 due today
- HW 7 to be posted; due 6/19
Review: multiple issue and multithreading
Today's lecture: memory hierarchies
2/17/2019 Computer Architecture Lecture 8

Review: TLP, multithreading
ILP (which is implicit) is limited on realistic hardware for a single thread of execution
Could instead exploit explicit parallelism: thread-level parallelism (TLP) or data-level parallelism (DLP)
Focus on TLP via multithreading:
- Coarse-grained: switch threads on a long stall
- Fine-grained: switch threads every cycle
- Simultaneous: allow multiple threads to execute in the same cycle

Motivating memory hierarchies
Why do we need out-of-order scheduling, TLP, etc.? Data dependences cause stalls
The longer it takes to satisfy a dependence, the longer the stall and the harder we have to work to find independent instructions to execute
Major source of stall cycles: memory accesses

Why care about memory hierarchies?
The processor-memory performance gap keeps growing (y-axis: performance; x-axis: time)
Note that x86 processors didn't have on-chip cache until 1989
Major source of stall cycles: memory accesses

Motivating memory hierarchies
What characteristics would we like memory to have?
- High capacity
- Low latency
- Low cost
Can't satisfy all three with one memory technology
Solution: use a little bit of everything!
- Small SRAM array(s) -> caches: small means fast and cheap; typically at least 2 levels (often 3) of cache on chip
- Larger DRAM array -> main memory or RAM: hope you rarely have to use it
- Extremely large disk: costs are decreasing at a faster rate than we fill them

Memory hierarchy analogy: looking for something to eat
First step: check the refrigerator. Find item -> eat! Latency = 1 minute
Second step: go to the store. Find item -> purchase it, take it home, eat! Latency = 20-30 minutes
Third step: grow food. Plant food, wait ... wait some more ... harvest, eat. Latency = ~250,000 minutes (~6 months)

Terminology
Find data you want at a given level: hit
Data is not present at that level: miss; in this case, check the next lower level
Hit rate: fraction of accesses that hit at a given level; (1 - hit rate) = miss rate
Another performance measure: average memory access time
AMAT = (hit time) + (miss rate) x (miss penalty)

Average memory access time
Given the following:
- Cache: 1 cycle access time
- Memory: 100 cycle access time
- Disk: 10,000 cycle access time
What is the average memory access time if the cache hit rate is 90% and the memory hit rate is 80%?

AMAT example solution
Key point: miss penalty = AMAT of the next level down
Therefore:
AMAT = (HT) + (MR)(MP)
     = 1 + (0.1)(AMAT_memory)
     = 1 + (0.1)[100 + (0.2)(AMAT_disk)]
     = 1 + (0.1)[100 + (0.2)(10,000)]
     = 1 + (0.1)[100 + 2,000]
     = 1 + (0.1)[2,100]
     = 1 + 210 = 211 cycles

Memory hierarchy operation
We'd like most accesses to use the cache, the fastest level of the hierarchy
But the cache is much smaller than the address space
Most caches have a hit rate > 80%. How is that possible?
The cache holds the data most likely to be accessed

Principle of locality
Programs don't access data randomly; they display locality, in two forms:
- Temporal locality: if you access a memory location (e.g., 1000), you are more likely to re-access that location than some random location
- Spatial locality: if you access a memory location (e.g., 1000), you are more likely to access a location near it (e.g., 1001) than some random location

Basic cache design
Each cache line consists of two main parts:
- Tag: upper bits of the memory address; the bits that uniquely identify a given block. Check the tag on each access to determine hit or miss
- Block: the actual data. Block sizes can vary, typically 32 to 128 bytes; larger blocks exploit spatial locality
Status bits:
- Valid bit: indicates whether the line holds valid data
- We'll discuss the dirty bit today; other bits appear in multiprocessors
Example lines (tag | block):
  0x1000 | 0x00 0x12 0x34 0x56
  0x2234 | 0xAB 0xCD 0x87 0xA9

Cache organization
A cache consists of multiple tag/block pairs, called cache lines; lines can be searched in parallel (within reason)
Each line also has a valid bit; write-back caches have a dirty bit
Note that block sizes can vary; most systems use between 32 and 128 bytes
- Larger blocks exploit spatial locality
- Larger block size -> smaller tag size

4 questions for the memory hierarchy
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

Q1: Block placement
Three alternatives:
- Place the block anywhere in the cache: fully associative. Impractical for anything but very small caches
- Place the block in a single location: direct mapped. Line # = (block #) mod (# lines) = index field of the address, where block # = (address) / (block size), i.e., all address bits except the offset
- Place the block in a restricted set of locations: set associative. An n-way set associative cache has n lines per set; set # = (block #) mod (# sets) = index field of the address

Block placement example
Given:
- A cache with 16 lines, numbered 0-15
- Main memory holding 2048 blocks, each the same size as a cache block
Determine which cache line(s) will be used for each of the memory blocks below if the cache is (i) direct-mapped or (ii) 4-way set associative:
- Block 0
- Block 13
- Block 249

Solution
Cache line # = (block #) mod (# sets), where "mod" is the remainder of division (e.g., 5 mod 3 = 2)
Direct-mapped cache (1 line per "set"):
- Block 0 -> line (0 mod 16) = 0
- Block 13 -> line (13 mod 16) = 13
- Block 249 -> line (249 mod 16) = 9 (since 249 / 16 = 15 remainder 9)
4-way set associative cache (4 lines per set -> 4 total sets):
- Block 0 -> set (0 mod 4) = set 0 = lines 0-3
- Block 13 -> set (13 mod 4) = set 1 = lines 4-7
- Block 249 -> set (249 mod 4) = set 1 = lines 4-7
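The two placement rules above are simple modular arithmetic; here is a minimal sketch (function names are my own, not from the slides) that reproduces the worked answers.

```python
def direct_mapped_line(block_num, num_lines):
    """Direct-mapped: each block maps to exactly one line."""
    return block_num % num_lines

def set_assoc_lines(block_num, num_lines, ways):
    """n-way set associative: each block maps to one set of `ways` lines."""
    num_sets = num_lines // ways
    s = block_num % num_sets
    return list(range(s * ways, (s + 1) * ways))  # line numbers in that set

# 16-line cache from the example
assert direct_mapped_line(249, 16) == 9
assert set_assoc_lines(0, 16, 4) == [0, 1, 2, 3]    # set 0
assert set_assoc_lines(13, 16, 4) == [4, 5, 6, 7]   # set 1
assert set_assoc_lines(249, 16, 4) == [4, 5, 6, 7]  # also set 1
```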

Q2: Block identification
Tag on each block; no need to check the index or block offset
Increasing associativity shrinks the index and expands the tag
Address fields: [ Tag | Index | Block offset ], where tag + index form the block address

Address breakdown
Can address a single byte (the smallest addressable unit); if addressing multiple bytes, the address is the lowest in the word/half-word (e.g., the word containing bytes 0-3 is accessed by address 0)
- Block offset: byte address within the block; # block offset bits = log2(block size)
- Index: line (or set) number within the cache; # index bits = log2(# of cache lines or sets), where # of cache sets = (# of cache lines) / (associativity)
- Tag: the remaining bits
Address fields: [ Tag | Index | Block offset ]

Mapping memory to cache
Example: an 8-byte cache with 2-byte blocks and a 16-byte memory
Address breakdown: 1-bit tag, 2-bit index, 1-bit block offset
[Figure: each 2-byte memory block maps to cache line (block #) mod 4; memory holds the values 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225 at addresses 0-15]

Mapping memory example
Say we have the following cache organization:
- Direct-mapped
- 4-byte blocks
- 32 B cache
- 6-bit memory addresses
Given the addresses below, answer the following questions:
- Which addresses belong to the exact same block? (i.e., both tag and index are equal)
- Which addresses map to the same cache line but are part of different blocks? (i.e., index numbers are equal, but tags differ)
Addresses: 3, 4, 6, 11, 37, 43

Solution
Figure out the size of each field:
- 4 = 2^2 bytes per block -> 2-bit offset
- 32 / (4 x 1) = 8 = 2^3 cache lines -> 3-bit index
- 6-bit address, 5 bits for offset & index -> 1-bit tag
Convert the addresses into binary (tag | index | offset):
- 3  = 0 000 11
- 4  = 0 001 00
- 6  = 0 001 10
- 11 = 0 010 11
- 37 = 1 001 01
- 43 = 1 010 11
Addresses 4 & 6 are part of the same block (index = 001, tag = 0)
Address 37 maps to the same cache line as 4 & 6, but a different block (index = 001, tag = 1)
Addresses 11 & 43 map to the same cache line but different blocks (index = 010 for both; tag = 0 for 11, tag = 1 for 43)
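The field extraction above is just shifting and masking. A small sketch (my own helper, assuming the field widths worked out in the solution) that reproduces the breakdown:

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields by shift-and-mask."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 6-bit addresses, 4-byte blocks (2 offset bits), 8 lines (3 index bits)
for a in (3, 4, 6, 11, 37, 43):
    tag, index, offset = split_address(a, 2, 3)
    print(f"{a:2d} -> tag {tag}, index {index:03b}, offset {offset:02b}")
```

Addresses with equal (tag, index) share a block; equal index with different tags means the same line but different blocks, exactly as in the solution above.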

Q3: Block replacement
On a cache miss, bring the requested data into the cache; if the line contains valid data, that data is evicted
When we need to evict a line, what do we choose?
- Easy choice for direct-mapped: only one possibility!
- For set-associative or fully-associative, choose the least recently used (LRU) line
We want to evict the data least likely to be used next; temporal locality suggests that's the line accessed farthest in the past

Direct-mapped cache example
Assume the following simple setup:
- Only 2 levels to the hierarchy
- 16-byte memory -> 4-bit addresses
- Cache organization: direct-mapped, 8 total bytes, 2 bytes per block -> 4 lines, write-back
This leads to the following address breakdown:
- Offset: 1 bit
- Index: 2 bits
- Tag: 1 bit

Direct-mapped example (cont.)
We'll use the following sequence of accesses:
lb $t0, 1($zero)
lb $t1, 8($zero)
sb $t1, 4($zero)
sb $t0, 13($zero)
lb $t1, 9($zero)

Direct-mapped cache example: initial state
Memory (address: value): 0: 78, 1: 29, 2: 120, 3: 123, 4: 71, 5: 150, 6: 162, 7: 173, 8: 18, 9: 21, 10: 33, 11: 28, 12: 19, 13: 200, 14: 210, 15: 225
Registers: $t0 = ?, $t1 = ?
Cache: all four lines invalid (V = 0)

Direct-mapped cache example: access #1
lb $t0, 1($zero): address = 1 = 0001 (binary) -> tag = 0, index = 00, offset = 1
Line 00 is invalid -> miss; load the block at addresses 0-1 (78, 29) into line 00
Cache line 00: V = 1, D = 0, tag = 0, data = 78, 29
Registers: $t0 = 29, $t1 = ?
Hits: 0, Misses: 1

Direct-mapped cache example: access #2
lb $t1, 8($zero): address = 8 = 1000 (binary) -> tag = 1, index = 00, offset = 0
Line 00 holds tag 0 -> miss; the line is clean, so no write-back; load the block at addresses 8-9 (18, 21) into line 00
Cache line 00: V = 1, D = 0, tag = 1, data = 18, 21
Registers: $t0 = 29, $t1 = 18
Hits: 0, Misses: 2

Direct-mapped cache example: access #3
sb $t1, 4($zero): address = 4 = 0100 (binary) -> tag = 0, index = 10, offset = 0
Line 10 is invalid -> miss
Cache line 00: V = 1, D = 0, tag = 1, data = 18, 21
Registers: $t0 = 29, $t1 = 18
Hits: 0, Misses: 2

Q4: What happens on a write?
                              Write-through               Write-back
Policy                        Data written to the cache   Write data only to the cache;
                              block is also written to    update the lower level when the
                              lower-level memory          block falls out of the cache
Debug                         Easy                        Hard
Do read misses produce
writes?                       No                          Yes
Do repeated writes make
it to the lower level?        Yes                         No

Write policy example
Say we have the following code sequence:
lw $t0, 0($t1)
addi $t2, $t0, 5
sw $t2, 4($t1)
addi $t2, $t2, 10
sw $t2, 8($t1)
addi $t2, $t2, 20
sw $t2, 12($t1)
Assume all addresses map to the same cache block, and assume this initial cache line state (each block entry is 1 word, not 1 byte, shown in decimal):
Valid = 1, Dirty = 0, Tag = 1000, Block = 100 (first word; remaining words not shown)

Write policy example (cont.)
In a write-through cache there is no need for a dirty bit; every single write (store instruction) causes the entire block to be written to memory
After the first store, the cache line holds: Valid = 1, Tag = 1000, Block = 100, 105
After the second store: Block = 100, 105, 115
After the third store: Block = 100, 105, 115, 135

Write policy example (cont.)
The write-through cache produces 3 memory writes
That isn't strictly necessary: as long as the data remains in the cache, the most up-to-date value is available at the top of the hierarchy, and accesses never go to memory until the block is evicted
In a write-back cache, data is written to memory only on eviction; check the dirty bit to see whether a write-back is needed
- The bit is set on every write to the block
- The bit is cleared when the block is evicted and new data is brought into the line
After the first store: Valid = 1, Dirty = 1, Tag = 1000, Block = 100, 105
The second and third stores proceed the same way; the difference is that no writes to memory occur during this sequence!
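The traffic difference between the two policies can be sketched with two tiny counters (illustrative classes of my own, assuming all three stores hit the same cached block, as in the example):

```python
class WriteThrough:
    """Every store to a cached block also writes lower-level memory."""
    def __init__(self):
        self.mem_writes = 0
    def store(self):
        self.mem_writes += 1

class WriteBack:
    """Stores only mark the block dirty; memory is written once, on eviction."""
    def __init__(self):
        self.dirty = False
        self.mem_writes = 0
    def store(self):
        self.dirty = True
    def evict(self):
        if self.dirty:
            self.mem_writes += 1  # single write-back of the whole block
            self.dirty = False

wt, wb = WriteThrough(), WriteBack()
for _ in range(3):  # the three sw instructions in the sequence
    wt.store()
    wb.store()
print(wt.mem_writes, wb.mem_writes)  # 3 memory writes vs. 0 (until eviction)
```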

Direct-mapped cache example: access #3 (cont.)
Load the block at addresses 4-5 (71, 150) into line 10, then write $t1 = 18 at offset 0 and set the dirty bit
Cache line 00: V = 1, D = 0, tag = 1, data = 18, 21
Cache line 10: V = 1, D = 1, tag = 0, data = 18, 150
Registers: $t0 = 29, $t1 = 18
Hits: 0, Misses: 3

Direct-mapped cache example: access #4
sb $t0, 13($zero): address = 13 = 1101 (binary) -> tag = 1, index = 10, offset = 1
Line 10 holds tag 0 -> miss; the line is dirty, so we must first write back its block (18, 150) to addresses 4-5 (memory address 4 now holds 18)
Then load the block at addresses 12-13 (19, 200) into line 10 and write $t0 = 29 at offset 1
Cache line 00: V = 1, D = 0, tag = 1, data = 18, 21
Cache line 10: V = 1, D = 1, tag = 1, data = 19, 29
Registers: $t0 = 29, $t1 = 18
Hits: 0, Misses: 4

Direct-mapped cache example: access #5
lb $t1, 9($zero): address = 9 = 1001 (binary) -> tag = 1, index = 00, offset = 1
Line 00 is valid with tag 1 -> hit! Read 21 at offset 1
Registers: $t0 = 29, $t1 = 21
Hits: 1, Misses: 4

Additional examples
Given the final cache state above, determine the new cache state after the following three accesses:
lb $t1, 3($zero)
sb $t0, 2($zero)
lb $t1, 11($zero)

Direct-mapped cache example: access #6
lb $t1, 3($zero): address = 3 = 0011 (binary) -> tag = 0, index = 01, offset = 1
Line 01 is invalid -> miss; load the block at addresses 2-3 (120, 123) into line 01
Cache line 01: V = 1, D = 0, tag = 0, data = 120, 123
Registers: $t0 = 29, $t1 = 123
Hits: 1, Misses: 5

Direct-mapped cache example: access #7
sb $t0, 2($zero): address = 2 = 0010 (binary) -> tag = 0, index = 01, offset = 0
Line 01 is valid with tag 0 -> hit! Write $t0 = 29 at offset 0 and set the dirty bit
Cache line 01: V = 1, D = 1, tag = 0, data = 29, 123
Registers: $t0 = 29, $t1 = 123
Hits: 2, Misses: 5

Direct-mapped cache example: access #8
lb $t1, 11($zero): address = 11 = 1011 (binary) -> tag = 1, index = 01, offset = 1
Line 01 holds tag 0 -> miss; the line is dirty, so we must first write back its block (29, 123) to addresses 2-3 (memory address 2 now holds 29)
Then load the block at addresses 10-11 (33, 28) into line 01
Final cache state:
- Line 00: V = 1, D = 0, tag = 1, data = 18, 21
- Line 01: V = 1, D = 0, tag = 1, data = 33, 28
- Line 10: V = 1, D = 1, tag = 1, data = 19, 29
- Line 11: invalid
Registers: $t0 = 29, $t1 = 28
Hits: 2, Misses: 6
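The whole worked example can be replayed with a short simulator. This is a sketch I wrote to match the example's parameters (direct-mapped, write-back, write-allocate, 4 lines of 2 bytes); the class and method names are my own, not from the slides.

```python
class DMCache:
    """Direct-mapped, write-back, write-allocate cache:
    4 lines x 2-byte blocks over a 16-byte memory (4-bit addresses)."""
    def __init__(self, mem):
        self.mem = mem
        self.lines = [{"v": False, "d": False, "tag": 0, "data": [0, 0]}
                      for _ in range(4)]
        self.hits = self.misses = 0

    def _lookup(self, addr):
        # Address breakdown: 1-bit offset, 2-bit index, 1-bit tag
        offset, index, tag = addr & 1, (addr >> 1) & 3, addr >> 3
        line = self.lines[index]
        if line["v"] and line["tag"] == tag:
            self.hits += 1
        else:
            self.misses += 1
            if line["v"] and line["d"]:          # write back the dirty victim
                base = (line["tag"] << 3) | (index << 1)
                self.mem[base:base + 2] = line["data"]
            base = addr & ~1                     # fetch the requested block
            line.update(v=True, d=False, tag=tag, data=self.mem[base:base + 2])
        return line, offset

    def load(self, addr):
        line, offset = self._lookup(addr)
        return line["data"][offset]

    def store(self, addr, value):
        line, offset = self._lookup(addr)
        line["data"][offset] = value
        line["d"] = True

mem = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
c = DMCache(mem)
t0 = c.load(1)       # lb $t0, 1  -> 29  (miss)
t1 = c.load(8)       # lb $t1, 8  -> 18  (miss)
c.store(4, t1)       # sb $t1, 4         (miss)
c.store(13, t0)      # sb $t0, 13        (miss; writes back 18,150 to mem[4:6])
t1 = c.load(9)       # lb $t1, 9  -> 21  (hit)
t1 = c.load(3)       # lb $t1, 3  -> 123 (miss)
c.store(2, t0)       # sb $t0, 2         (hit)
t1 = c.load(11)      # lb $t1, 11 -> 28  (miss; writes back 29,123 to mem[2:4])
print(c.hits, c.misses)  # 2 6, matching the slides
```

Replaying all eight accesses reproduces the slides' final tallies (2 hits, 6 misses) and the two dirty write-backs that changed memory addresses 4 and 2.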

Final notes
Next time:
- Set associative caches
- Cache optimizations
- Virtual memory
Reminders:
- HW 6 due today
- HW 7 to be posted; due 6/19