1
CS/COE 1541 (term 2174) Jarrett Billingsley
Memory – Caching CS/COE 1541 (term 2174) Jarrett Billingsley
2
Class Announcements AaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAA
One day, you'll have your work back. (I'm just pipelining my work.)
Homework 2: a little smaller! Due in one week.
Project 1! I had an idea, but then I talked to someone and got another idea and aaaaaaaaaaaa.
But ultimately I think I'll stick with the first idea: writing a cache simulator in C.
You'll have about a month to do it, so no pressure, but some of you will start it a few days before it's due, so like, whatever.
2/8/2017 CS/COE 1541 term 2174
3
Caching
4
Terms, terms...
A cache holds temporary copies of data.
A cache is made of multiple lines/blocks, which can each hold multiple data items.
We'll be talking about the memory cache: the layer between the CPU registers and the system memory.
Caches take advantage of locality: temporal (in time) and spatial (in space/addresses).
When we want to access data, a hit means it's in the cache. A miss means it's not in the cache.
Hit rate and miss rate are the percentages of accesses that hit or miss.
Both hits and misses take time: hit time and miss penalty.
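One common way to tie these terms together is the average memory access time: every access pays the hit time, and the fraction that miss also pays the miss penalty. The slide gives no formula or numbers, so the cycle counts below are made up purely for illustration.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per access.

    Every access pays hit_time; the fraction miss_rate of accesses
    additionally pays miss_penalty.
    """
    return hit_time + miss_rate * miss_penalty

# Hypothetical caches (numbers are assumptions, not from the lecture).
simple_cache = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=20)
fancy_cache = average_access_time(hit_time=3, miss_rate=0.02, miss_penalty=20)

print("simple:", simple_cache, "cycles on average")
print("fancy: ", fancy_cache, "cycles on average")
```

With these made-up numbers the cache with the lower miss rate is still slower on average, because its hits take longer.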
5
For those of you who are unfamiliar with caches...
I want to make a read-only cache! (We'll deal with writes later.)
Let's just start with lw instructions.
Here are some ideas to get started:
Every memory access has an address.
We need some way of seeing if things are in the cache.
We need some way of inserting things into the cache.
And what if you run out of cache space?
Give me some ideas on how you'd approach designing a cache. I'm not looking for "right" answers!
6
Cache design goals
Unlike the BTB, false positives/negatives are unacceptable.
Caching is not a predictive heuristic. It's the real data.
Getting data from the cache when it hits should be faster than accessing the next lower level in the memory hierarchy.
Kinda obvious, but this means we can't waste too many cycles on complex caching schemes, or the effort won't be worth it!
We want our miss rate to be as low as possible...
But miss rate isn't everything, as we'll see.
7
Direct-mapped Caches
8
Easy peasy
In a direct-mapped cache, each real memory location maps to exactly one location in the cache. Implementing this is easy!
For this 8-entry cache, to find the cache block for a memory address, take the lowest 3 address bits.
[Figure: memory addresses 000000 through 001111, each mapping to one of the cache blocks 000 through 111]
But if our program accesses one address, then another address with the same lowest 3 bits, how do we tell them apart? (How did the BTB handle this?)
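Taking the lowest 3 bits is just a bit mask. Here is a minimal sketch for the 8-entry cache; the names `INDEX_BITS` and `cache_index` are made up for illustration, and the two addresses are picked from the ranges on the slide to show a collision.

```python
INDEX_BITS = 3                    # 8-entry cache, as on the slide
NUM_BLOCKS = 1 << INDEX_BITS      # 2**3 = 8

def cache_index(address):
    """Keep only the lowest INDEX_BITS bits of the address."""
    return address & (NUM_BLOCKS - 1)

# Two different addresses that land in the same cache block.
for address in (0b000011, 0b001011):
    print(f"{address:06b} -> block {cache_index(address):03b}")
```

Both addresses land in block 011, which is exactly the problem the question on the slide is pointing at.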
9
Tags
Each cache block has a tag which indicates the memory address that it corresponds to.
For address 110011, 011 is the block, and 110 is the tag.

Storing the whole address as the tag:
Block  Tag     Data
000    001000  DEADBEEF
001
010
011    110011  CAFEFACE
100    101100  B0DECA7
101
110
111

This seems redundant... the low 3 bits of each stored address are always the block number, so we only need to keep the upper bits:
Block  Tag  Data
000    001  DEADBEEF
001
010
011    110  CAFEFACE
100    101  B0DECA7
101
110
111

How do we tell what entries are empty/full? Just add another bit (a valid bit)!
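Splitting an address into tag and block is a shift and a mask, and gluing them back together recovers the original address, which is why storing the low bits in the tag would be redundant. A small sketch, assuming the 6-bit addresses and 8-entry cache from the slide (function names are made up):

```python
INDEX_BITS = 3   # 8-entry cache

def split_address(address):
    """Return (tag, block): block is the low 3 bits, tag is everything above."""
    block = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS
    return tag, block

def join_address(tag, block):
    """Rebuild the full address from a stored tag and the row it sits in."""
    return (tag << INDEX_BITS) | block

tag, block = split_address(0b110011)
print(f"tag={tag:03b} block={block:03b}")
print(f"rebuilt address={join_address(tag, block):06b}")
```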
10
Watching it in action
When the program first starts, we set all the valid bits to 0. The cache is empty.
Now let's try a sequence of reads... do these hit or miss? How do the cache contents change?
[Figure: an 8-row cache table (V, Tag, Data; rows 000 through 111) stepping through reads of addresses 000000, 100101, 100100, and 010000, each marked miss or hit. Rows 100 and 101 end up valid with tag 100; row 000 holds tag 000 and later tag 010.]
Why did these misses happen?
Why did this miss happen?
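If you want to check your answers, a trace like this is easy to simulate. This is only a sketch: it uses the four addresses from the slide, but the order and the repeated reads are my own guess at a sequence, not necessarily the one shown in class.

```python
INDEX_BITS = 3   # 8-row direct-mapped cache, one item per row

def run_trace(addresses):
    """Simulate reads and return 'hit' or 'miss' for each one."""
    num_rows = 1 << INDEX_BITS
    valid = [False] * num_rows    # all valid bits start at 0: empty cache
    tags = [0] * num_rows
    results = []
    for address in addresses:
        block = address & (num_rows - 1)
        tag = address >> INDEX_BITS
        if valid[block] and tags[block] == tag:
            results.append("hit")
        else:
            results.append("miss")
            valid[block] = True   # fill the row on a miss
            tags[block] = tag
    return results

# Assumed sequence (repeats added for illustration).
trace = [0b000000, 0b100101, 0b100100, 0b100101, 0b010000, 0b000000]
for address, outcome in zip(trace, run_trace(trace)):
    print(f"{address:06b} {outcome}")
```

The first three reads miss because the cache starts empty, and the last two miss because 000000 and 010000 fight over row 000.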
11
The 3 C's
There are three ways we can have cache misses:
Compulsory (or cold-start) misses: when you've never had that block in the cache before.
Capacity misses: when the cache isn't big enough to hold all the data that you need.
Collision (or conflict) misses: when two different addresses map to the same cache block.
How could we improve/avoid each of these?
Many of the techniques we'll be discussing will be focusing on reducing each category of miss.
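One way to label misses in a simulator is sketched below. It leans on a convention the slide doesn't state: a miss counts as a conflict miss if a fully associative LRU cache of the same size would have hit, and as a capacity miss if even that cache would have missed. Treat that rule, and every name here, as an assumption. It also assumes one item per block.

```python
from collections import OrderedDict

def classify_misses(addresses, num_blocks=8):
    """Label each access as 'hit', 'compulsory', 'conflict', or 'capacity'."""
    seen = set()            # every address ever brought into the cache
    direct = {}             # direct-mapped cache: row -> address held there
    lru = OrderedDict()     # same-size fully associative cache, LRU order
    labels = []
    for address in addresses:
        row = address % num_blocks
        direct_hit = direct.get(row) == address
        assoc_hit = address in lru
        if direct_hit:
            labels.append("hit")
        elif address not in seen:
            labels.append("compulsory")   # never cached before
        elif assoc_hit:
            labels.append("conflict")     # only the mapping got in the way
        else:
            labels.append("capacity")     # would not fit even with free placement
        # Update both caches after classifying.
        seen.add(address)
        direct[row] = address
        if assoc_hit:
            lru.move_to_end(address)
        else:
            lru[address] = True
            if len(lru) > num_blocks:
                lru.popitem(last=False)   # evict the least recently used
    return labels

print(classify_misses([0, 8, 0]))
print(classify_misses([0, 1, 2, 0], num_blocks=2))
```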
12
The steps and hardware
Split the address into two (three?) parts: block and tag (and byte). (Hardware: a bus splitter.)
Use the block index to find the right cache entry. (Hardware: a mux, or more realistically a decoder + tristate bus.)
If the valid bit is 1, and the entry's tag matches... (Hardware: an equality comparator and an AND gate.)
It's a hit! Read the cached data.
Optionally, select the right byte/halfword, with a mux.
Otherwise... it's a miss. Huh, what do we do now??
Stall! Or find something else to do, if you're out-of-order ;o
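The same steps in software might look like the sketch below. It assumes 4-byte words, 8 rows, little-endian byte order, and a `memory` dictionary standing in for the next level down; none of those details come from the slide, and "stall" here just means fetching the word before answering.

```python
BYTE_BITS = 2    # assumed: 4-byte words
INDEX_BITS = 3   # assumed: 8 rows

class Row:
    def __init__(self):
        self.valid = False
        self.tag = 0
        self.data = 0

def read_byte(rows, memory, address):
    """Return (hit, byte_value) for a one-byte read."""
    # Step 1: split the address (the "bus splitter").
    byte = address & ((1 << BYTE_BITS) - 1)
    block = (address >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (BYTE_BITS + INDEX_BITS)
    # Step 2: use the block index to pick the row (the "decoder").
    row = rows[block]
    # Step 3: valid AND tag match (the "comparator and AND gate").
    hit = row.valid and row.tag == tag
    if not hit:
        # Miss: stall and fetch the whole word from the next level.
        word_address = address & ~((1 << BYTE_BITS) - 1)
        row.valid, row.tag, row.data = True, tag, memory[word_address]
    # Step 4: select the right byte out of the word (the "mux").
    return hit, (row.data >> (8 * byte)) & 0xFF

rows = [Row() for _ in range(1 << INDEX_BITS)]
memory = {0x40: 0xDEADBEEF}   # hypothetical memory contents
print(read_byte(rows, memory, 0x40))
print(read_byte(rows, memory, 0x41))
```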
13
One word per block?
The example shows one word in each block. But is that the best? What about spatial locality?
In reality, most caches put multiple words in each block.
Common sizes are 4-8 words. (i7 Skylake uses 64 bytes = 8 8-byte words.)
Memory buses are usually designed to the same/half width as a cache block/line to improve bandwidth.
Memory has high latency, but can have great throughput!
Now the address is split into four parts: tag, block, word, byte.
How many bits will each have?
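Each field needs log2 of the number of things it selects between, and the tag gets whatever is left over. A quick sketch, assuming every size is a power of two; the 32-bit address and 64 rows in the example are made-up numbers, while the 8 words of 8 bytes match the 64-byte block mentioned above.

```python
def field_widths(address_bits, num_blocks, words_per_block, bytes_per_word):
    """Return the bit width of each address field (sizes must be powers of two)."""
    byte_bits = (bytes_per_word - 1).bit_length()    # log2(bytes_per_word)
    word_bits = (words_per_block - 1).bit_length()   # log2(words_per_block)
    block_bits = (num_blocks - 1).bit_length()       # log2(num_blocks)
    tag_bits = address_bits - block_bits - word_bits - byte_bits
    return {"tag": tag_bits, "block": block_bits,
            "word": word_bits, "byte": byte_bits}

# Hypothetical cache: 32-bit addresses, 64 rows, 8 words of 8 bytes per block.
print(field_widths(32, num_blocks=64, words_per_block=8, bytes_per_word=8))
```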
14
Multiple words per block means…
Multiple valid bits!
A block can be either partly or completely valid.
Each memory access has to check the per-word valid bit.
This complicates the memory controller because it has to account for memory transfers smaller than a whole block…
But it reduces memory traffic, and gives better granularity for determining validity.
But when you have a conflict…
You have to invalidate all the other words when you change tag.
[Figure: an 8-row cache table (rows 000 through 111) with columns V, Tag, Data0, Data1, where V holds one valid bit per word]
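Here is a rough sketch of one block with per-word valid bits, assuming 2 words per block like the Data0/Data1 table. The `fetch` callback is a hypothetical stand-in for reading a single word from memory.

```python
WORDS_PER_BLOCK = 2   # matches the Data0/Data1 columns

class Block:
    def __init__(self):
        self.valid = [False] * WORDS_PER_BLOCK   # one valid bit per word
        self.tag = None
        self.data = [0] * WORDS_PER_BLOCK

def read_word(block, tag, word, fetch):
    """Return (hit, value); fetch(tag, word) reads one word from memory."""
    if block.tag != tag:
        # Conflict: the block now belongs to a different tag,
        # so every word it held becomes invalid.
        block.tag = tag
        block.valid = [False] * WORDS_PER_BLOCK
    if block.valid[word]:
        return True, block.data[word]
    # Miss on this word only: fetch just the word we need.
    block.data[word] = fetch(tag, word)
    block.valid[word] = True
    return False, block.data[word]

def fake_memory(tag, word):
    """Hypothetical memory: the value just encodes where it came from."""
    return tag * 10 + word

b = Block()
print(read_word(b, 1, 0, fake_memory), b.valid)
print(read_word(b, 1, 0, fake_memory), b.valid)
print(read_word(b, 2, 1, fake_memory), b.valid)
```

Only the requested word is fetched on a miss, which is the reduced memory traffic, and the tag change on the last read wipes the other word's valid bit.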
15
How big...
Picking the right block size is a balancing act:
Larger blocks reduce compulsory misses.
But they increase competition for cache space.
This is because for the same amount of cached data, you now have fewer rows!
E.g. for a cache with 32 words, one word per block means 32 rows, but 2 words per block means only 16 rows.
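To see both sides of the tradeoff, here is a small miss counter for a 32-word direct-mapped cache. It assumes a miss fetches the whole block, and both traces are invented for illustration: a sequential scan (where bigger blocks help) and two addresses that only collide when there are few rows (where bigger blocks hurt).

```python
def count_misses(trace, total_words, words_per_block):
    """Count misses for word addresses in a direct-mapped cache."""
    rows = total_words // words_per_block   # fewer rows as blocks grow
    cached = [None] * rows                  # block address held by each row
    misses = 0
    for word_address in trace:
        block_address = word_address // words_per_block
        row = block_address % rows
        if cached[row] != block_address:
            misses += 1
            cached[row] = block_address     # the whole block is fetched
    return misses

scan = list(range(32))    # made-up trace: read 32 words in order
ping_pong = [0, 33] * 4   # made-up trace: alternate between two words

for words_per_block in (1, 2, 8):
    print(words_per_block, "words/block:",
          32 // words_per_block, "rows,",
          count_misses(scan, 32, words_per_block), "scan misses,",
          count_misses(ping_pong, 32, words_per_block), "ping-pong misses")
```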
16
Handling writes
17
oh boy
Once you introduce writes, you greatly increase the complexity of cache behavior.
The cache is supposed to be an accurate copy of the data in the next lower level.
So what if you change the data in the cache?
The cache is now inconsistent or invalid.
How do we ensure consistency?
What about exceptions????
What about multiprocessor/core systems??????
What about multiple caches that all cache the same memory????
All this, and more, on the next episode...