Memory – Caching
CS/COE 1541 (term 2174) Jarrett Billingsley

Class Announcements
- AaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAA one day, you'll have your work back. (I'm just pipelining my work.)
- Homework 2: a little smaller! Due in one week.
- Project 1! I had an idea, but then I talked to someone and got another idea, and aaaaaaaaaaaa. But ultimately I think I'll stick with the first idea – writing a cache simulator in C.
- You'll have about a month to do it, so no pressure. But some of you will start it a few days before it's due, so, like, whatever.

Caching

Terms, terms...
- A cache holds temporary copies of data.
- A cache is made of multiple lines/blocks, which can each hold multiple data items.
- We'll be talking about the memory cache – the layer between the CPU registers and the system memory.
- Caches take advantage of locality: temporal (in time) and spatial (in space/addresses).
- When we want to access data, a hit means it's in the cache; a miss means it's not.
- Hit rate/miss rate are the percentages of accesses that hit or miss.
- Both hits and misses take time: the hit time and the miss penalty.
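By the way, here's a standard way to combine those last few numbers (it's not on this slide, but it's good to know): the average memory access time, or AMAT. With some made-up example numbers:

    AMAT = hit time + miss rate × miss penalty
         = 1 cycle + 0.05 × 100 cycles
         = 6 cycles per access, on average

Even a small miss rate can dominate your average access time when the miss penalty is big.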

For those of you who are unfamiliar with caches...
I want to make a read-only cache! (We'll deal with writes later.) Let's just start with lw instructions. Here are some ideas to get started:
- Every memory access has an address.
- We need some way of seeing if things are in the cache.
- We need some way of inserting things into the cache.
- And what if you run out of cache space?
Give me some ideas on how you'd approach designing a cache. I'm not looking for "right" answers!

Cache design goals
- Unlike the BTB, false positives/negatives are unacceptable. Caching is not a predictive heuristic – it's the real data.
- Getting data from the cache when it hits should be faster than accessing the next lower level in the memory hierarchy. Kinda obvious, but it means we can't waste too many cycles on complex caching schemes, or the effort won't be worth it!
- We want our miss rate to be as low as possible... but miss rate isn't everything, as we'll see.

Direct-mapped Caches

Easy peasy
In a direct-mapped cache, each real memory location maps to one location in the cache. Implementing this is easy!
[Diagram: memory addresses 000000 through 001111 mapping into an 8-block cache, blocks 000 through 111.]
For this 8-entry cache, to find the cache block for a memory address, take the lowest 3 address bits.
But if our program accesses 001000, then 000000, how do we tell them apart? (How did the BTB handle this?)
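In C – say, for a cache simulator like Project 1 – this mapping is just bit math. A minimal sketch (the names and the 32-bit address size are my own assumptions, not anything official):

    #include <stdint.h>

    #define INDEX_BITS 3                    // 8-entry cache: 2^3 blocks
    #define NUM_BLOCKS (1u << INDEX_BITS)

    // the lowest INDEX_BITS bits of the address pick the cache block...
    uint32_t block_index(uint32_t addr) { return addr & (NUM_BLOCKS - 1); }

    // ...and all the bits above them become the tag (see the next slide)
    uint32_t block_tag(uint32_t addr) { return addr >> INDEX_BITS; }

For address 110011, block_index gives 011 and block_tag gives 110 – exactly the split the next slide talks about.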

Tags
Each cache block has a tag which indicates the memory address that it corresponds to. For address 110011, 011 is the block, and 110 is the tag.
We could store the whole address alongside the data:

    Block  Tag     Data
    000    001000  DEADBEEF
    001
    010
    011    110011  CAFEFACE
    100    101100  B0DECA7
    101
    110
    111

This seems redundant... the low bits of each address are already the block number, so we only need to store the upper bits:

    Block  Tag  Data
    000    001  DEADBEEF
    001
    010
    011    110  CAFEFACE
    100    101  B0DECA7
    101
    110
    111

How do we tell what entries are empty/full? Just add another bit (a valid bit)!
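In simulator-land, a cache line is just a little struct. Continuing the sketch from the last slide (again, the names are made up):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     valid;  // is this entry holding real data?
        uint32_t tag;    // the upper address bits
        uint32_t data;   // one word per block, for now
    } cache_line;

    cache_line cache[NUM_BLOCKS];  // globals start zeroed, so every line starts invalid

    // a hit = the line is valid AND its tag matches the address's tag
    bool is_hit(uint32_t addr) {
        cache_line *line = &cache[block_index(addr)];
        return line->valid && line->tag == block_tag(addr);
    }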

Watching it in action
When the program first starts, we set all the valid bits to 0. The cache is empty.
Now let's try a sequence of reads... do these hit or miss? How do the cache contents change?

    000000 – miss; block 000 now holds tag 000
    100101 – miss; block 101 now holds tag 100
    100100 – miss; block 100 now holds tag 100
    010000 – miss; block 000's tag (000) doesn't match, so it's replaced with tag 010

Why did the first three misses happen? Why did the last one happen? (Note that reading 100100 again right now would hit.)
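If you fed that same sequence through the little simulator sketch, you'd see the same thing. A hypothetical test driver (the addresses are the slide's, written in hex):

    #include <stdio.h>

    // on a miss, install the block; on a hit, do nothing (read-only!)
    void access(uint32_t addr) {
        cache_line *line = &cache[block_index(addr)];
        bool hit = line->valid && line->tag == block_tag(addr);
        printf("0x%02X: %s\n", addr, hit ? "hit" : "miss");
        if (!hit) {
            line->valid = true;
            line->tag   = block_tag(addr);
            // (a real simulator would also fetch line->data from memory here)
        }
    }

    int main(void) {
        uint32_t reads[] = { 0x00 /*000000*/, 0x25 /*100101*/,
                             0x24 /*100100*/, 0x10 /*010000*/ };
        for (int i = 0; i < 4; i++)
            access(reads[i]);
        return 0;  // prints four misses: three cold-start, one conflict
    }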

The 3 C's
There are three ways we can have cache misses:
- Compulsory (or cold-start) misses: when you've never had that block in the cache before.
- Capacity misses: when the cache isn't big enough to hold all the data that you need.
- Collision (or conflict) misses: when two different addresses map to the same cache block.
How could we improve/avoid each of these? Many of the techniques we'll be discussing will focus on reducing each category of miss.

The steps and hardware
1. Split the address into two (three?) parts: block and tag (and byte). (Hardware: a bus splitter.)
2. Use the block index to find the right cache entry. (A mux, or more realistically a decoder + tristate bus.)
3. Check if the valid bit is 1 and the entry's tag matches. (An equality comparator and an AND gate.)
4. If so, it's a hit! Read the cached data. Optionally, select the right byte/halfword with a mux.
5. Otherwise... it's a miss. Huh, what do we do now?? Stall! Or find something else to do, if you're out-of-order ;o

One word per block?
The example shows one word in each block. But is that the best? What about spatial locality?
In reality, most caches put multiple words in each block. Common sizes are 4-8 words. (The i7 Skylake uses 64 bytes = 8 8-byte words.)
Memory buses are usually designed to be the same or half the width of a cache block/line to improve bandwidth. Memory has high latency, but can have great throughput!
Now the address is split into four parts:

    0110 1010 0011 1000
    tag  block word byte

How many bits will each have?
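The answer depends on the cache's geometry. Here's a sketch with made-up parameters (4-byte words, 4 words per block, 16 blocks – so 2, 2, and 4 bits for the byte, word, and block fields, and whatever's left over for the tag):

    #include <stdint.h>

    #define BYTE_BITS  2   // log2(bytes per word)
    #define WORD_BITS  2   // log2(words per block)
    #define BLOCK_BITS 4   // log2(number of blocks)

    uint32_t byte_of(uint32_t a)  { return a & ((1u << BYTE_BITS) - 1); }
    uint32_t word_of(uint32_t a)  { return (a >> BYTE_BITS) & ((1u << WORD_BITS) - 1); }
    uint32_t block_of(uint32_t a) { return (a >> (BYTE_BITS + WORD_BITS)) & ((1u << BLOCK_BITS) - 1); }
    uint32_t tag_of(uint32_t a)   { return a >> (BYTE_BITS + WORD_BITS + BLOCK_BITS); }

With those parameters, the address on the slide (0110 1010 0011 1000) splits into byte 00, word 10, block 0011, and tag 0110 1010.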

Multiple words per block means…
Multiple valid bits! A block can be either partly or completely valid. Each memory access has to check the per-word valid bit.
This complicates the memory controller, because it has to account for memory transfers smaller than a whole block… but it reduces memory traffic, and gives better granularity for determining validity.
But when you have a conflict… you have to invalidate all the other words when you change the tag.
[Table: a cache with two data words per block (Data0, Data1) and a valid bit per word; some blocks have only one of their two words valid.]
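In a simulator, that might look something like this (a sketch – per-word valid bits are one design choice among several, and the names are mine):

    #include <stdbool.h>
    #include <stdint.h>

    #define WORDS_PER_BLOCK 2

    typedef struct {
        bool     valid[WORDS_PER_BLOCK];  // one valid bit per word
        uint32_t tag;
        uint32_t data[WORDS_PER_BLOCK];
    } pv_cache_line;

    // on a conflict, changing the tag means clearing EVERY word's valid bit
    void change_tag(pv_cache_line *line, uint32_t new_tag) {
        for (int i = 0; i < WORDS_PER_BLOCK; i++)
            line->valid[i] = false;
        line->tag = new_tag;
    }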

How big...
Picking the right block size is a balancing act:
- Larger blocks reduce compulsory misses.
- But they increase competition for cache space. This is because, for the same amount of cached data, you now have fewer rows! E.g. for a cache with 32 words, one word per block means 32 rows, but 2 words per block means only 16 rows.

Handling writes

oh boy
Once you introduce writes, you greatly increase the complexity of cache behavior.
The cache is supposed to be an accurate copy of the data in the next lower level. So what if you change the data in the cache? The cache is now inconsistent or invalid.
- How do we ensure consistency?
- What about exceptions????
- What about multiprocessor/multicore systems??????
- What about multiple caches that all cache the same memory????
All this, and more, on the next episode...