ECE 406 – Design of Complex Digital Systems
Lecture 19: Cache Operation & Design
Spring 2009, W. Rhett Davis, NC State University
with significant material from Paul Franzon, Bill Allen, & Xun Liu
Slide 2: Announcements
- HW#8 due Thursday
- Proj#2 due in 16 days (start early!)
Slide 3: Summary of Last Lecture
- How can you tell if an interface has flow control?
- What can you do to reduce the complexity of the state transition diagram for an interface with flow control?
Slide 4: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction
Slide 5: Cache Memory Fundamentals
- A cache memory is another memory block in the system that works closely with the "main" memory block to improve the performance of memory accesses.
- Cache memory:
  - is faster than main memory
  - is usually physically closer to the decode and execution units
  - is smaller in capacity than the main memory
  - holds the frequently accessed data and/or instructions
Slide 6: Cache Memory Fundamentals
- Programmers want large amounts of fast memory, for both function and performance.
- Large main memories are usually slow.
- Programs do not access all code/data uniformly; a small fraction of the total data and code (instructions) is accessed far more frequently than the rest.
- Programs exhibit:
  - "Spatial Locality": a high probability that an instruction physically close in memory to the one just accessed will be accessed
  - "Temporal Locality": a high probability that a recently accessed instruction will be accessed again
Slide 7: Multi-Level Cache Hierarchy
[Figure: Main Memory feeds a Level 2 Cache, which feeds a Level 1 Cache; each cache holds frequently accessed blocks of memory (data and/or instructions)]
Slide 8: Elements of a Cache
- Size: in relation to the main memory
- Mapping: direct, set associative, fully associative, etc.
- Replacement algorithm (for a cache "miss"): LRU, FIFO, LFU, Random, etc.
- Write policy: write back, write through, write once, write allocate, etc.
- Line size (block size)
- Cache levels: number of caches, the "memory hierarchy"
- Cache coherency: across multiple processors with caches
- Type of accesses: unified (both instructions & data) or split (separate instruction & data caches)
Slide 9: Basic Cache Operation (flowchart)
1. The cache controller receives, from the CPU, the address of the data or instruction to be accessed.
2. Is the data/instruction in the cache?
   - Yes ("cache hit"): forward the data/instruction to the CPU. Done.
   - No ("cache miss"):
     a. The cache controller accesses main memory to get the requested data/instruction.
     b. Allocate/replace the lines in the cache for the requested data/instruction (replacement policy).
     c. Load the data/instruction and its associated block into the cache (read/write policy).
     d. Forward the data/instruction to the CPU.
- The order and sequence of the miss steps depend on the replacement and read/write policies.
Slide 10: Cache Policies
- Replacement policy
  - Miss: decide which location(s) and what contents in the cache to replace with the requested data and its associated "block"
  - Policy options: LRU, FIFO, LFU, Random
  - We will use a direct-mapped cache, which means that each main-memory location maps to exactly one cache location, so a miss always requires replacement.
- Read policy options
  - Hit: (1) forward the requested data to the CPU
  - Miss:
    - (1) "Load Through": forward to the CPU as the cache is filled from main memory
    - (2) Fill the cache first from main memory, then forward to the CPU
    - We will use option (2).
Slide 11: Cache Policies (continued)
- Write policy options
  - Hit:
    - (1) "Write Through": write to both the cache and main memory
    - (2) "Write Back": write to the cache; update main memory upon a cache "flush"
    - We will use option (1).
  - Miss:
    - (1) "Write Allocate": write to main memory and then fill the cache
    - (2) "Write No-Allocate": write to main memory only
    - We will use option (1).
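The policies chosen above can be sketched as a small behavioral model. This is a hedged illustration, not the project code: class and method names are invented, and the geometry follows Example 1 on a later slide (8 one-word blocks, 3 index bits), so there is no offset field.

```python
# Behavioral sketch of the selected policies: read miss fills the
# cache first and then forwards (read option 2); write hit is
# write-through (option 1); write miss is write-allocate (option 1).
# All names here are illustrative, not from the project.

class DirectMappedCache:
    def __init__(self, main_memory, index_bits=3):
        self.mem = main_memory                    # backing main memory (list of words)
        self.index_bits = index_bits
        self.valid = [False] * (1 << index_bits)  # valid array
        self.tag   = [0]     * (1 << index_bits)  # stored tags
        self.data  = [0]     * (1 << index_bits)  # one word per block

    def _split(self, addr):
        index = addr & ((1 << self.index_bits) - 1)  # low-order bits
        tag = addr >> self.index_bits                # high-order bits
        return tag, index

    def read(self, addr):
        tag, index = self._split(addr)
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], "read hit"
        # Read miss: fill the cache line from main memory first,
        # then forward the word to the CPU (read policy option 2).
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = self.mem[addr]
        return self.data[index], "read miss"

    def write(self, addr, word):
        tag, index = self._split(addr)
        hit = self.valid[index] and self.tag[index] == tag
        self.mem[addr] = word          # write-through on a hit,
        self.valid[index] = True       # write-allocate on a miss:
        self.tag[index] = tag          # main memory is updated and
        self.data[index] = word        # the line is (re)loaded
        return "write hit" if hit else "write miss"
```

Note that because the cache is direct-mapped, two addresses sharing an index but differing in tag evict each other on every access, which is the "conflicting mappings" case illustrated on the following slides.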
Slide 12: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction
Slide 13: Direct-Mapped Caches
- Each main-memory address is divided into three fields: Tag | Index | Offset
- Example 1:
  - 32 main-memory locations (5 address bits)
  - 16 bits per word
  - 0 offset bits (1 word per block)
  - 3 index bits (8 blocks in cache)
  - 2 tag bits
  - The cache RAM will be 8 words x 18 bits (16 data bits + 2 tag bits per entry)
Slides 14-23: Basic Cache Architecture (figure build-up)
[Figure: a 32-location main memory mapped onto a cache of 8 "blocks", indexed by the cache index; example address 10101]
- Memory locations are mapped to cache locations; the cache holds copies of what is in main memory.
- The lower-order address bits are used as the cache "index".
- Several memory locations map to the same cache block (conflicting mappings).
- The higher-order address bits are used as the cache "tag", to determine which particular memory line (that is, which copy of a given index) is currently in the cache.
- On an access, the stored tag is compared against the address tag, so the cache controller needs a comparator (for the tag) and a decoder (for the index).
- A valid array indicates whether a cache block and tag have been loaded. An invalid entry should always result in a "miss".
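The hit test that the decoder, comparator, and valid array implement together can be sketched in a few lines. This is a hedged illustration using Example 1's field widths (3 index bits, 2 tag bits); the function name is invented.

```python
# The index selects one cache entry (the decoder), the stored tag is
# compared against the address tag (the comparator), and an invalid
# entry always forces a miss (the valid array).

INDEX_BITS = 3  # Example 1: 8 blocks

def is_hit(addr, valid, tags):
    index = addr & ((1 << INDEX_BITS) - 1)       # decoder: pick one entry
    tag = addr >> INDEX_BITS                     # upper address bits
    return valid[index] and tags[index] == tag   # valid check + tag compare
```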
Slide 24: Another Direct-Mapped Example
- Example 2 (used on HW#8 & Proj#2):
  - 2^16 main-memory locations (16 address bits)
  - 16 bits per word
  - 2 offset bits: how many words per block?
  - 4 index bits: how many blocks in cache?
  - How many tag bits?
  - How big will the cache RAM be?
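The arithmetic behind these questions is the same for any direct-mapped cache, so it can be captured in one small function. This is a hedged sketch with invented names; it is checked below against Example 1 (whose answers are stated on an earlier slide), and the same function can be applied to Example 2's parameters.

```python
# Derive the geometry of a direct-mapped cache from its parameters.
# tag bits = address bits - index bits - offset bits; each cache RAM
# entry stores one block of data plus its tag (the valid bit is kept
# in a separate array, as on the slides).

def cache_geometry(addr_bits, word_bits, offset_bits, index_bits):
    tag_bits = addr_bits - index_bits - offset_bits
    words_per_block = 1 << offset_bits
    blocks = 1 << index_bits
    entry_bits = words_per_block * word_bits + tag_bits
    return {"tag_bits": tag_bits,
            "words_per_block": words_per_block,
            "blocks": blocks,
            "entry_bits": entry_bits}
```

For Example 1 (5 address bits, 16-bit words, 0 offset bits, 3 index bits) this reproduces the stated 2 tag bits and 8 entries of 18 bits each.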
Slide 25: Example Program
Address  Data   Assembly Language
3000            AND R0, R0, #0
3001            ADD R0, R0, #7
3002            AND R1, R1, #0
3003     loop1  ADD R1, R1, #5
3004            ADD R0, R0, #-1
3005            BRP loop1
3006            ST R1, var1
3007     EC04   LEA R6, dest
3008     C180   JMP R6
3009     var1   NOP
300A     0000   var2 NOP
300B     0000   var3 NOP
300C     25FC   dest LD R2, var1
300D     14A1   ADD R2, R2, #1
300E     75BE   STR R2, R6, #-2
300F     7DBF   STR R6, R6, #-1
3010     A7FA   LDI R3, var3
3011     B5F9   STI R2, var3
3012     0FFF   last BRNZP last
Slide 26: Exercise
- For the first 7 instructions, find the following:
  - the tag, index, and offset for each memory access
  - the type of cache operation (e.g. read hit, read miss, write hit, or write miss)
  - the contents of the cache RAM
Slide 27: Exercise
- 1st instruction: fetch from location 3000
  - offset:
  - index:
  - tag:
  - Operation:
  - Cache RAM contents: (table with columns Index | Valid | Tag | Data, indices 0 through F)

Slide 28: Exercise
- 2nd instruction: fetch from location 3001
  - offset:
  - index:
  - tag:
  - Operation:
  - Cache RAM contents: (table, indices 0 through F)

Slide 29: Exercise
- 5th instruction: fetch from location 3004
  - offset:
  - index:
  - tag:
  - Operation:
  - Cache RAM contents: (table, indices 0 through F)

Slide 30: Exercise
- 7th instruction: fetch from location 3006, then write 0023 to location 3009
  - offset:
  - index:
  - tag:
  - After writing to main memory, the block is loaded (write-allocate).
  - Cache RAM contents: (table, indices 0 through F)
Slide 31: Exercise
- What if the next instruction were a read from location 1105?
  - offset:
  - index:
  - tag:
- What if the next instruction were a write to location 300A?
  - offset:
  - index:
  - tag:
Slide 32: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction
Slide 33: Project #1 System
- Synchronous memory with separate din/dout/address lines
Slide 34: Project #2 Changes
- Asynchronous off-chip memory with shared din/dout/address lines
- The cache sits between the processor and memory
- The LC3 is unchanged except for a "macc" signal, which is high when the state is Fetch, Read Memory, Write Memory, or Read Indirect Address
- The SimpleLC3 and Memory blocks will be provided
Slide 35: Data Transfer Interface
[Figure: interface between the Cache and Off-chip Memory]
- Handshake signals between cache and off-chip memory: read request (rrqst), read ready (rrdy), read data ready (rdrdy), read data accept (rdacpt), write request (wrqst), write accept (wacpt), and a shared data/address bus (data)
- CPU-side and global signals: addr, din, rd, dout, complete, memory access (macc), clock, reset
Slide 36: Protocol for Read Miss
[Waveform: the shared bus carries the address followed by data0, data1, data2, data3; handshake signals shown: read request, read ready, read data ready, read data accept]
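The waveform can be read as a transaction-level sequence of events. The sketch below is a hedged reading of the diagram: signal names come from the slide, but the exact cycle timing is an assumption; only the event order is taken from the waveform.

```python
# Transaction-level reading of the read-miss protocol: the cache
# asserts rrqst and drives the block address on the shared
# data/address bus; memory answers with rrdy, then returns the four
# words of the block, pulsing rdrdy for each while the cache
# acknowledges each word with rdacpt.

def read_miss_transaction(addr, memory):
    events = [("rrqst", addr)]        # request, address on the shared bus
    events.append(("rrdy", None))     # memory accepts the request
    base = addr & ~0b11               # align to the 4-word block (2 offset bits)
    for offset in range(4):           # four data beats: data0 .. data3
        events.append(("rdrdy", memory[base + offset]))
        events.append(("rdacpt", None))   # cache takes the word
    return events
```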
Slide 37: Protocol for Write Hit
[Waveform: the shared bus carries the address followed by the data word; handshake signals shown: write request, write accept]
Slide 38: Protocol for Write Miss
[Waveform: a write transaction (write request / write accept, with address then data on the shared bus) combined with a block-fill read (read request, address, then data0-data3 with read data ready / read data accept), per the write-allocate policy]
Slide 39: Cache System Block Diagram
[Figure]
Slide 40: UnifiedCache Schematic
[Figure]
Slide 41: CacheController Block
- Takes the handshaking signals from the LC-3 CPU and off-chip memory as inputs
- Takes the miss indicator from the CacheData block as input
- Maintains the state of the cache and its interfaces
- Maintains a 2-bit counter that specifies the word offset to be loaded into the cache
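The 2-bit counter's job can be sketched in isolation: it steps once per accepted word during a block fill and wraps after the fourth word, which is the point at which the controller can leave the read-miss states. This is a hedged illustration; class and method names are invented.

```python
# Sketch of the controller's 2-bit word-offset counter. Because the
# block holds four words (2 offset bits), the counter wraps from 3
# back to 0; the wrap signals that the block fill is complete.

class OffsetCounter:
    def __init__(self):
        self.count = 0  # current word offset within the block

    def advance(self):
        """Step on each accepted data beat; return True when the
        fourth word has been loaded (counter wraps to 0)."""
        self.count = (self.count + 1) & 0b11   # 2-bit wraparound
        return self.count == 0
```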
Slide 42: Controller State Machine
[State diagram: states 0 through 8, starting in state 0 on reset; transitions labeled with macc=0, read-hit, read-miss, rrdy=1, rdrdy=1, read-complete, read-incomplete, write, wacpt=0/1 (with hit and miss cases), and an "always" edge]
Slide 43: Use Counter to Read Four Words
[Same state diagram, extended: states 2 and 3 cycle on rdrdy=1 / rdrdy=0 while the 2-bit counter steps through the four words of the block]