Dr. Hadi AL Saadi Faculty Of Information Technology


Computer Architecture: Introduction to Caches
Memory
Dr. Hadi AL Saadi, Faculty Of Information Technology, University of Petra
4/14/2019 Dr. Hadi Hassan Computer architecture

Random Access Memory
- Large arrays of storage cells
- Volatile memory: holds the stored data only as long as it is powered on
- Random access: access time is practically the same for any data on a RAM chip
- Output Enable (OE) control signal specifies a read operation
- Write Enable (WE) control signal specifies a write operation
- A 2^n × m RAM chip has an n-bit address and m-bit data

Data Transfer: The Big Picture
- Registers are in the datapath of the processor
- If operands are in memory, we have to load them into processor registers, operate on them, and store them back to memory

Memory Technology Today: DRAM
- DDR SDRAM (Double Data Rate Synchronous Dynamic RAM): the dominant memory technology in the PC market
- Delivers data on both the rising and falling edges of the clock (double rate)
- Generations: DDR (MemClkFreq × 2 × 8 words), DDR2 (MemClkFreq × 2 × 2 × 8 words), DDR3 (MemClkFreq × 4 × 2 × 8 words), DDR4 (introduced 2014)

DRAM Capacity Growth
- Unprecedented growth in density, but the processor–memory speed gap remains a problem

Static RAM Storage Cell
- Static RAM (SRAM): fast but expensive RAM, typically used for caches
- Provides fast access latency of 0.5–5 ns
- Cell implementation: 6-transistor cell; cross-coupled inverters store the bit; two pass transistors
- The row decoder selects the word line; the pass transistors enable the cell to be read and written

Dynamic RAM Storage Cell
- Dynamic RAM (DRAM): slow (access latency of 50–70 ns), cheap, and dense; the typical choice for main memory
- Cell implementation: 1-transistor cell (pass transistor) with a trench capacitor; the bit is stored as a charge on the capacitor
- Must be refreshed periodically, because charge leaks from the tiny capacitor
- Refreshing reads each memory row and writes it back to restore the charge

Slow Memory Technologies: Magnetic Disk
- Typical high-end hard disk: average latency 4–10 ms, capacity 500–2000 GB

Quality vs Quantity

Level      Capacity    Latency    Cost/GB
Register   100s Bytes  20 ps      $$$$
SRAM       100s KB     0.5–5 ns   $$$
DRAM       100s MB     50–70 ns   $
Hard Disk  100s GB     5–20 ms    Cents
Ideal      1 GB        1 ns       Cheap

Best of Both Worlds
- What we want: a BIG and FAST memory
- The memory system should perform like 1 GB of SRAM (1 ns access time) but cost like 1 GB of slow memory
- Key concept: use a hierarchy of memory technologies — small but fast memory near the CPU, large but slow memory farther from the CPU

The Need for Cache Memory
- Widening speed gap between CPU and main memory: a processor operation takes less than 1 ns, while a main memory access takes about 100 ns
- Each instruction involves at least one memory access: one to fetch the instruction, and a second for load and store instructions
- Memory bandwidth limits the instruction execution rate
- Cache memory can help bridge the CPU–memory gap: it is small in size but fast

Typical Memory Hierarchy
- Registers are at the top of the hierarchy: typical size < 1 KB, access time < 0.5 ns
- Level 1 cache (8–64 KB): access time ~1 ns
- L2 cache (512 KB – 8 MB): access time 3–10 ns
- Main memory (4–16 GB): access time 50–100 ns
- Disk storage (> 200 GB): access time 5–10 ms
- Moving down the hierarchy (registers → L1 → L2 → main memory → magnetic or flash disk): each level is bigger but slower

Cache: The Basic Idea
- Keep the frequently and recently used data in smaller but faster memory
- Refer to the bigger and slower memory only when you cannot find the data/instruction in the faster memory
- Why does it work? The Principle of Locality: a program accesses only a small portion of the memory address space within a small time interval

Principle of Locality of Reference
- Programs access a small portion of their address space: at any time, only a small set of instructions and data is needed
- Temporal locality (in time): if an item is accessed, it will probably be accessed again soon (the same loop instructions are fetched each iteration; the same procedure may be called and executed many times)
- Spatial locality (in space): tendency to access contiguous instructions/data in memory (sequential execution of instructions; traversing arrays element by element)

Loop: lw   $t0, 0($s1)
      add  $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $0, Loop

      sub  $sp, $sp, 16
      sw   $ra, 0($sp)
      sw   $s0, 4($sp)
      sw   $a0, 8($sp)
      sw   $a1, 12($sp)

Four Basic Questions on Caches
- Q1: Where can a block be placed in a cache? (Block placement) Direct mapped, set associative, fully associative
- Q2: How is a block found in a cache? (Block identification) Block address, tag, index
- Q3: Which block should be replaced on a miss? (Block replacement) FIFO, random, LRU
- Q4: What happens on a write? (Write strategy) Write back or write through (with a write buffer)

BASICS OF MEMORY HIERARCHIES
- Cache memory is subdivided into cache lines (blocks)
- Cache line/block: the smallest unit of memory that can be transferred between main memory and the cache
- Block size is typically one or more words, e.g. a 16-byte block is a 4-word block, and a 32-byte block is an 8-word block
- If the block size is 2^N bytes, the byte offset within a block is log2(2^N) = N bits

BASICS OF MEMORY HIERARCHIES
Where to place blocks in the cache?
- Direct-mapped cache: one block per set; each block can be placed at exactly one location
  Number of blocks in cache = cache size / block size
  Cache block # (index) = (block address) mod (# of blocks in the cache)
- Fully-associative cache: has one set; a block can go anywhere in the cache
- Set-associative cache: a set is a group of blocks, and a block is placed within a set; n blocks per set makes the cache n-way set associative
  Cache set # = (block address) mod (# of sets)

Direct Mapped Cache: Cache Index
- Example: a 16-byte main memory and a 4-byte cache (four 1-byte blocks). Memory locations 0, 4, 8 and 12 all map to cache block 0; addresses 1, 5, 9 and 13 map to cache block 1, etc.
- How can we compute this mapping? Cache block index = (block address) mod (# blocks in the cache). For instance, with the four-block cache here, address 14 maps to cache block 14 mod 4 = 2.
- With our four-block cache we can simply inspect the two least significant bits of the memory address: address 14 (1110 in binary) maps to cache block 2 (10 in binary).
- Taking the least significant k bits of a binary value is the same as computing that value mod 2^k.

Example
Given: cache size = 32 KB, block size = 32 bytes, byte address = 8160. Find the cache block number (index) of this byte.
- Cache block # (index) = (block address) mod (# of blocks in the cache)
- # blocks in cache = cache size / block size = 32 × 1024 / 32 = 1024 blocks
- Block address = byte address / block size = 8160 / 32 = 255
- Cache block # = 255 mod 1024 = 255
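The arithmetic in this example can be checked with a short sketch (the function name is illustrative, not from the slides):

```python
def direct_mapped_index(byte_address, cache_size, block_size):
    """Return (block address, cache index) for a direct-mapped cache."""
    num_blocks = cache_size // block_size        # 32 KB / 32 B = 1024 blocks
    block_address = byte_address // block_size   # 8160 // 32 = 255
    return block_address, block_address % num_blocks

block, index = direct_mapped_index(8160, 32 * 1024, 32)
# block = 255, index = 255 mod 1024 = 255
```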

BASICS OF MEMORY HIERARCHIES
How to find a block in the cache?
- We add tags to the cache, which supply the rest of the address bits and let us distinguish between different memory locations that map to the same cache block
- Tag: the part of the memory address that is checked to determine whether the block contains the required information
- Tag number: tag = floor(block number / number of cache blocks)

Index  Tag  Main memory address in cache block
00     00   00 + 00 = 0000
01     11   11 + 01 = 1101
10     01   01 + 10 = 0110
11     01   01 + 11 = 0111

Direct Mapped Cache: Mapping
- A 32-bit memory address is divided into a block number and a byte offset; with a cache block size of 2^N bytes, the offset is the low N bits
- With 2^M blocks in the cache, the block number is further divided into tag and index:
  Offset = N bits
  Index = M bits
  Tag = 32 − (N + M) bits

Example
Show cache addressing for a byte-addressable memory with 32-bit addresses. Cache line (block) size W = 16 bytes; cache size L = 4096 lines (blocks), i.e. 64 KB.
Solution
- Byte offset within a line is log2(16) = 4 bits
- Cache line index is log2(4096) = 12 bits
- This leaves 32 − 12 − 4 = 16 bits for the tag
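The field widths in this solution follow mechanically from the cache geometry; a minimal sketch (helper name is illustrative):

```python
from math import log2

def field_widths(address_bits, block_size, num_blocks):
    """Offset/index/tag widths for a direct-mapped cache."""
    offset = int(log2(block_size))   # log2(16)   = 4 bits
    index = int(log2(num_blocks))    # log2(4096) = 12 bits
    tag = address_bits - index - offset
    return offset, index, tag

print(field_widths(32, 16, 4096))  # (4, 12, 16)
```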

Mapping an Address to a Cache Block: Example
Consider a direct-mapped cache with 256 blocks and a block size of 16 bytes. Compute the tag, index, and byte offset of address 0x01FFF8AC.
Solution
The 32-bit address is divided into:
- a 4-bit byte offset field, because block size = 2^4 = 16 bytes
- an 8-bit cache index, because there are 2^8 = 256 blocks in the cache
- a 20-bit tag field
Byte offset = 0xC = 12 (least significant 4 bits of the address)
Cache index = 0x8A = 138 (next 8 bits of the address)
Tag = 0x01FFF (upper 20 bits of the address)
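The decomposition above is just bit masking and shifting; a small sketch (function name is illustrative):

```python
def split_address(addr, offset_bits, index_bits):
    """Decompose an address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)            # low bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)            # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x01FFF8AC, 4, 8)
# tag = 0x01FFF, index = 0x8A (138), offset = 0xC (12)
```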

The Valid Bit
- When started, the cache is empty and does not contain valid data; we account for this by adding a valid bit to each cache block
- When the system is initialized, all valid bits are set to 0; when data is loaded into a particular cache block, the corresponding valid bit is set to 1
- So the cache contains more than just copies of the data in memory: it also has bits that help us find data within the cache and verify its validity

What Happens on a Cache Hit
- When the CPU tries to read from memory, the address is sent to the cache controller
- The lowest k bits of the address index a block in the cache
- If the block is valid and the tag matches the upper (m − k) bits of the m-bit address, then that data is sent to the CPU
- (The slide's diagram shows a 32-bit memory address and a 2^10-byte cache)
- Cache hit: (Valid[index] = TRUE) AND (Tag[index] = Tag[memory address])

Example on Cache Placement and Misses
Consider a small direct-mapped cache with 32 blocks; the cache is initially empty (all valid bits = 0) and the block size is 16 bytes. The following memory addresses (in decimal) are referenced: 1000, 1004, 1008, 2548, 2552, 2556. Map the addresses to cache blocks and indicate whether each access is a hit or a miss.
Solution (address fields: 4-bit offset, 5-bit index, 23-bit tag)
1000 = 0x3E8 = 0000 0011 1110 1000: tag = 0x1, index = 0x1E, valid = 0 → miss (first access)
1004 = 0x3EC = 0000 0011 1110 1100: tag = 0x1, index = 0x1E, valid = 1 → hit
1008 = 0x3F0 = 0000 0011 1111 0000: tag = 0x1, index = 0x1F, valid = 0 → miss (first access)
2548 = 0x9F4 = 0000 1001 1111 0100: tag = 0x4, index = 0x1F, valid = 1 → miss (different tag)
2552 = 0x9F8 = 0000 1001 1111 1000: tag = 0x4, index = 0x1F, valid = 1 → hit
2556 = 0x9FC = 0000 1001 1111 1100: tag = 0x4, index = 0x1F, valid = 1 → hit
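The hit/miss trace in this example can be replayed with a minimal direct-mapped cache simulator (a sketch; names are illustrative):

```python
def simulate_direct_mapped(addresses, num_blocks=32, block_size=16):
    """Replay an access trace against an initially empty direct-mapped cache."""
    tags = {}  # index -> stored tag; an absent entry models valid bit = 0
    results = []
    for addr in addresses:
        block = addr // block_size
        index = block % num_blocks
        tag = block // num_blocks
        if tags.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag  # load the block, set valid bit
    return results

print(simulate_direct_mapped([1000, 1004, 1008, 2548, 2552, 2556]))
# ['miss', 'hit', 'miss', 'miss', 'hit', 'hit']
```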

Example 1a: Cache Miss; Empty Block
(Diagram: the address indexes a cache entry whose valid bit is 0, so the tag comparison fails and the access is a cache miss.)

Example 1b: … Read in Data
(Diagram: the missing block is fetched from memory into the cache entry, the tag is stored, and the valid bit is set.)

Example 1c: Cache Hit
(Diagram: a later access to the same block finds valid = 1 and a matching tag, so the access is a cache hit.)

Example 1d: Cache Miss; Incorrect Block
(Diagram: the selected entry is valid but its tag does not match the address, so the access is a cache miss.)

Example 1e: … Replace Block
(Diagram: the new block read from memory replaces the block previously stored in the entry, and the tag is updated.)

The Size of a Direct-Mapped Cache
- Assuming a 32-bit address, a direct-mapped cache of size 2^n words (number of entries) with one-word (4-byte) blocks requires a tag field of 32 − (n + 2) bits, because 2 bits are used for the byte offset and n bits for the index
- The total number of bits in a direct-mapped cache is 2^n × (block size + tag size + valid field size)
- Since the block size is one word (32 bits) and the address size is 32 bits, the number of bits in such a cache is 2^n × (32 + (32 − n − 2) + 1) = 2^n × (63 − n)

Example
How many total bits are required for a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address?
Answer
- 64 KB is 64K/4 = 16K words, which is 2^10 × 2^4 = 2^14 words, and with a block size of one word, 2^14 blocks
- Each block has 32 bits of data plus a tag of 32 − 14 − 2 bits, plus a valid bit
- Thus the total cache size is 2^14 × (32 + (32 − 14 − 2) + 1) = 2^14 × 49 = 784 × 2^10 = 784 Kbits, or 784K/8 = 98 KB for a 64-KB cache
- For this cache, the total number of bits is over 1.5 times the number needed just for the storage of the data
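The 784 Kbit total can be reproduced with a short sketch of the slide's formula (function name is illustrative):

```python
def dm_cache_total_bits(data_kib, address_bits=32):
    """Total bits for a direct-mapped cache with one-word (4 B) blocks."""
    words = data_kib * 1024 // 4      # 64 KB -> 2**14 one-word blocks
    n = words.bit_length() - 1        # index bits, since words = 2**n
    tag = address_bits - n - 2        # 2 bits of byte offset
    return words * (32 + tag + 1)     # data + tag + valid per block

print(dm_cache_total_bits(64))            # 802816 bits = 784 Kbits
print(dm_cache_total_bits(64) / 8 / 1024) # 98.0 (KB)
```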

Fully Associative Cache
- A block can be placed anywhere in the cache, so there is no indexing
- If m blocks exist, then m comparators are needed to match the tag in parallel; a multiplexer then selects the matching block's data
- Cache data size = m × 2^b bytes (with 2^b bytes per block)

Fully Associative Mapped Cache
- Main memory size = m words, cache memory size = c words, block size = b words
- Number of memory blocks = m/b; number of cache blocks = c/b
- Tag = log2(number of main memory blocks); block offset = log2(b)
Example: main memory size = 4 GB, cache memory size = 64 MB, block size = 16 bytes
- Number of main memory blocks = 4 GB / 16 B = 2^2 × 2^30 / 2^4 = 2^28
- Block offset = log2(16) = 4 bits
- Tag = log2(2^28) = 28 bits

Set-Associative Cache
- A set is a group of blocks that can be indexed; a block is first mapped onto a set
- Set index = (block address) mod (number of sets in cache)
- If there are m blocks in a set (m-way set associative), then m tags are checked in parallel using m comparators
- If 2^n sets exist, the set index consists of n bits
- Cache data size = m × 2^(n+b) bytes (with 2^b bytes per block), not counting tags and valid bits
- A direct-mapped cache has one block per set (m = 1); a fully-associative cache has one set (2^n = 1, i.e. n = 0)

Set-Associative Cache
- Let us assume an 8-block cache. It can be arranged as one-way (direct mapped), two-way, four-way, or fully associative.

Set-Associative Cache Diagram
(Diagram: the address is split into tag, index, and block offset; the index selects a set, the tags of all blocks in the set are compared in parallel, and a multiplexer selects the data on a hit.)

Set-Associative Cache
(Diagram: a two-way set-associative cache holding 32 words of data in 4-word lines and 2-line sets. The word address is split into a tag, a 2-bit set index, and a 2-bit word offset within the line; the tag and the specified word are read from each way of the set, both tags are compared, and on a match the data is output — otherwise a cache miss is signaled.)

Example
Show the cache addressing scheme for a byte-addressable memory with 32-bit addresses. Cache line width 2^W = 16 B; set size 2^S = 2 lines; cache size 2^L = 4096 lines (64 KB).
Solution
- Byte offset within a line is log2(16) = 4 bits
- Cache set index is log2(4096/2) = 11 bits
- This leaves 32 − 11 − 4 = 17 bits for the tag
So the 32-bit address of this two-way set-associative cache splits into a 17-bit line tag, an 11-bit set index, and a 4-bit byte offset within the line; the set index is used to read out the two candidate lines and their control information.
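The same computation, generalized to any associativity, can be sketched as follows (helper name is illustrative):

```python
from math import log2

def set_assoc_fields(address_bits, line_bytes, total_lines, ways):
    """(tag, set index, offset) widths for a set-associative cache."""
    offset = int(log2(line_bytes))   # log2(16) = 4 bits
    sets = total_lines // ways       # 4096 / 2 = 2048 sets
    index = int(log2(sets))          # 11 bits
    return address_bits - index - offset, index, offset

print(set_assoc_fields(32, 16, 4096, 2))  # (17, 11, 4)
```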

Example
A 64 KB four-way set-associative cache is byte-addressable and contains 32 B lines. Memory addresses are 32 bits wide.
a. How wide are the tags in this cache?
b. Which main memory addresses are mapped to set number 5?
Solution
a. Cache size = 64 KB = 2^16 B; set size = 4 × 32 B = 128 B = 2^7 B; number of sets = 2^16 / 2^7 = 2^9; line width = 32 B = 2^5 B. The 32-bit address is therefore an 18-bit tag + 9-bit set index + 5-bit byte offset, so the tag width is 32 − 5 − 9 = 18 bits.
b. All addresses whose 9-bit set index equals 5: addresses 160–191, 16 544–16 575, . . .
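Part (b) can be checked by enumerating the address ranges that land in a given set (a sketch under the slide's parameters; the function name is illustrative):

```python
def addresses_in_set(set_number, line_bytes=32, num_sets=512, count=2):
    """First `count` address ranges mapping to a set (4-way, 64 KB, 32 B lines)."""
    ranges = []
    for k in range(count):
        # consecutive memory lines with the same set index are num_sets apart
        base = (k * num_sets + set_number) * line_bytes
        ranges.append((base, base + line_bytes - 1))
    return ranges

print(addresses_in_set(5))  # [(160, 191), (16544, 16575)]
```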

Example: Accessing a Set-Associative Cache
A 2-way set-associative cache contains 2 sets of 2 one-word blocks each. Find the number of misses given this sequence of memory block accesses: 0, 8, 0, 6, 8.
Block replacement rule: replace the least recently used (LRU) block in the set.
Access 1: block 0 maps to set 0 mod 2 = 0 → miss; Mem[0] is placed in set 0.

Access 2: block 8 maps to set 8 mod 2 = 0 → miss; Mem[8] is placed in the other block of set 0 (set 0 now holds Mem[0] and Mem[8]).

Access 3: block 0 maps to set 0 mod 2 = 0 → hit; set 0, block 0 contains Mem[0].

Access 4: block 6 maps to set 6 mod 2 = 0 → miss; set 0, block 1 (holding Mem[8]) is the LRU block and is overwritten with Mem[6].

Access 5: block 8 maps to set 8 mod 2 = 0 → miss; set 0, block 0 (holding Mem[0]) is the LRU block and is overwritten with Mem[8].
Total for the sequence 0, 8, 0, 6, 8: 4 misses and 1 hit.
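The whole walkthrough can be replayed with a minimal 2-way LRU simulator (a sketch; each set is modeled as a list ordered from least to most recently used):

```python
def simulate_set_associative(blocks, num_sets=2, ways=2):
    """Replay block accesses against an LRU set-associative cache."""
    sets = [[] for _ in range(num_sets)]  # each list: LRU first, MRU last
    results = []
    for b in blocks:
        s = sets[b % num_sets]
        if b in s:
            results.append("hit")
            s.remove(b)        # will be re-appended as most recently used
        else:
            results.append("miss")
            if len(s) == ways:
                s.pop(0)       # evict the least recently used block
        s.append(b)
    return results

print(simulate_set_associative([0, 8, 0, 6, 8]))
# ['miss', 'miss', 'hit', 'miss', 'miss']
```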

Which Block Should Be Replaced on a Miss?
- The cache can't hold all the data found in the lower level!
- Direct mapped: no choice
- Set-associative and fully-associative caches need a replacement policy:
  - Least recently used (LRU): choose the block unused for the longest time; simple for 2-way, manageable for 4-way, too hard beyond that
  - Random: gives approximately the same performance as LRU for high associativity
  - FIFO: approximates LRU

Memory Access Time: Terminology
- Hit: the data is in the cache (e.g., X)
- Hit rate: fraction of memory accesses that hit
- Hit time: time to access the cache
- Miss: the data is not in the cache (e.g., Y)
- Miss rate = 1 − hit rate
- Miss penalty: time to replace the cache block and deliver the data
- Hit time < miss penalty

Hit Rate and Miss Rate
- Hit rate = hits / (hits + misses)
- Miss rate = misses / (hits + misses)
- I-cache miss rate = miss rate in the instruction cache; D-cache miss rate = miss rate in the data cache
Example: out of 1000 instructions fetched, 150 missed in the I-cache; 25% of the instructions are load/store, and 50 of those missed in the D-cache. What are the I-cache and D-cache miss rates?
- I-cache miss rate = 150 / 1000 = 15%
- D-cache miss rate = 50 / (25% × 1000) = 50 / 250 = 20%
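The slide's numbers can be checked directly (variable names are illustrative):

```python
# Slide's example: 1000 instructions fetched, 150 I-cache misses;
# 25% are load/store instructions, of which 50 missed in the D-cache.
instructions = 1000
i_miss_rate = 150 / instructions      # 0.15 -> 15%
d_accesses = 0.25 * instructions      # 250 data accesses
d_miss_rate = 50 / d_accesses         # 0.20 -> 20%
print(i_miss_rate, d_miss_rate)       # 0.15 0.2
```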

Memory Access Time: Formula
- Average access time = hit time + miss rate × miss penalty
- Average memory access time (AMAT) = L1 hit time + L1 miss rate × L1 miss penalty
- L1 miss penalty = L2 hit time + L2 miss rate × L2 miss penalty
- L2 miss penalty = L3 hit time + L3 miss rate × L3 miss penalty
Example: in 1000 memory references there are 40 misses in L1 and 20 misses in L2. Assume all are reads with no bypass. L1 hit time = 1 cycle; L2 hit time = 10 cycles; L2 miss penalty = 100 cycles. What is the average memory access time?
Answer
- L1 local miss rate = L1 local misses / total references = 40/1000 = 4%
- L2 local miss rate = L2 local misses / L1 local misses = 20/40 = 50%
- AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty) = 1 + 4% × (10 + 50% × 100) = 3.4 clock cycles
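The two-level AMAT formula from this slide can be sketched as a small function (the name is illustrative):

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, l2_penalty):
    """Two-level average memory access time, per the slide's formula."""
    l1_penalty = l2_hit + l2_miss_rate * l2_penalty  # = 10 + 0.5 * 100 = 60
    return l1_hit + l1_miss_rate * l1_penalty

print(amat(1, 40 / 1000, 10, 20 / 40, 100))  # 3.4
```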