Download presentation
Presentation is loading. Please wait.
1
CDA 5155 Caches
2
New Topic: Memory Systems
Cache 101 – review of undergraduate material Associativity and other organization issues Advanced designs and interactions with pipelines Tomorrow’s cache design (power/performance) Advances in memory design Virtual memory (and how to do it fast)
3
Cache Design 101 Memory pyramid 1 cycle access (early in pipeline)
Reg 100s bytes 1 cycle access (early in pipeline) L1 Cache (several KB) 1-3 cycle access L2 Cache (½-32MB) 6-15 cycle access Memory (128MB – fewGB) cycle access Millions cycle access! Disk (Many GB)
4
Terminology Temporal locality: Spatial locality:
If memory location X is accessed, then it is more likely to be re-accessed in the near future than some random location Y. Caches exploit temporal locality by placing a memory element that has been referenced into the cache. Spatial locality: If memory location X is accessed, then locations near X are more likely to be accessed in the near future than some random location Y. Caches exploit spatial locality by allocating a cache line of data (including data near the referenced location).
5
A simple cache Processor Cache Memory 2 cache lines 3 bit tag field
2 byte block 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 V 150 160 V 170 180 190 200 R0 R1 R2 R3 210 220 230 240 250
6
1st access Processor Cache Memory 100 110 120 130 tag data 140 150 160
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 150 160 170 180 190 200 R0 R1 R2 R3 210 220 230 240 250
7
1st access Processor Cache Memory 100 110 120 130 tag data 140 1 100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 1 100 150 110 Addr: 0001 160 lru 170 180 block offset 190 200 R0 R1 R2 R3 210 110 220 Misses: 1 Hits: 230 240 250
8
2nd access Processor Cache Memory 100 110 120 130 tag data 140 1 100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 1 100 150 110 160 lru 170 180 190 200 R0 R1 R2 R3 210 110 220 Misses: 1 Hits: 230 240 250
9
2nd access Processor Cache Memory 100 110 120 130 tag data 140 lru 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 lru 1 100 150 110 160 1 2 140 170 150 180 block offset 190 Addr: 0101 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 240 250
10
3rd access Processor Cache Memory 100 110 120 130 tag data 140 lru 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 lru 1 100 150 110 160 1 2 140 170 150 180 block offset 190 Addr: 0001 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 240 250
11
3rd access Processor Cache Memory 100 110 120 130 tag data 140 1 100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ 0 ] 130 tag data 140 1 100 150 110 160 lru 1 2 140 170 150 180 190 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 110 240 250
12
4th access Processor Cache Memory 100 110 120 130 tag data 140 1 100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 1 100 150 110 160 lru 1 2 140 170 150 180 block offset 190 Addr: 0100 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 110 240 250
13
4th access Processor Cache Memory 100 110 120 130 tag data 140 lru 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 lru 1 100 150 110 160 1 2 140 170 150 180 190 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 140 240 250
14
5th access Processor Cache Memory 100 110 120 130 tag data 140 lru 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 lru 1 100 150 110 160 1 2 140 170 150 180 block offset 190 Addr: 0000 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 150 230 140 240 250
15
5th access Processor Cache Memory 100 110 120 130 tag data 140 1 100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 100 110 120 Ld R1 M[ ] Ld R2 M[ ] Ld R3 M[ ] Ld R3 M[ ] Ld R2 M[ ] 130 tag data 140 1 100 150 110 160 lru 1 2 140 170 150 180 190 200 R0 R1 R2 R3 210 110 220 Misses: 2 Hits: 140 100 230 140 240 250
16
Basic Cache Organization
Decide on the block size How? Simulate lots of different block sizes and see which one gives the best performance Most systems use a block size between 32 bytes and 128 bytes Longer sizes reduce the overhead by: Reducing the number of tags Reducing the size of each tag Tag Block offset
17
What About Stores? Where should you write the result of a store?
If that memory location is in the cache? Send it to the cache. Should we also send it to memory right away? (write-through policy) Wait until we kick the block out (write-back policy) If it is not in the cache? Allocate the line (put it in the cache)? (write allocate policy) Write it directly to memory without allocation? (no write allocate policy)
18
Handling Stores (write-through)
Processor Cache Memory 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Assume write-allocate policy 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 150 162 173 18 21 33 R0 R1 R2 R3 28 19 Misses: 0 Hits: 200 210 225
19
write-through (REF 1) Processor Cache Memory 78 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 150 162 173 18 21 33 R0 R1 R2 R3 28 19 Misses: 0 Hits: 200 210 225
20
write-through (REF 1) Processor Cache Memory 78 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 78 150 29 162 lru 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 1 Hits: 200 210 225
21
write-through (REF 2) Processor Cache Memory 78 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 78 150 29 162 lru 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 1 Hits: 200 210 225
22
write-through (REF 2) Processor Cache Memory 78 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 78 150 29 162 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
23
write-through (REF 3) Processor Cache Memory 78 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 78 150 29 162 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
24
write-through (REF 3) Processor Cache Memory 173 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 173 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 173 1 150 29 162 lru 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
25
write-through (REF 4) Processor Cache Memory 173 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 173 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 173 150 29 162 lru 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
26
write-through (REF 4) Processor Cache Memory 173 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 173 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 173 29 150 29 162 1 2 71 173 29 150 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 3 Hits: 173 200 210 225
27
write-through (REF 5) Processor Cache Memory 173 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 173 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 173 29 29 162 1 2 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 3 Hits: 173 200 210 225
28
write-through (REF 5) Processor Cache Memory 173 29 120 123 V tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 173 29 120 123 V tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 5 33 29 28 162 lru 1 2 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 4 Hits: 33 200 210 225
29
How many memory references?
Each miss reads a block 2 bytes in this cache Each store writes a byte Total reads: 8 bytes Total writes: 2 bytes but caches generally miss < 20%
30
Write-through vs. write-back
We can also design the cache to NOT write all stores to memory immediately? We can keep the most current copy in the cache and update the memory when that data is evicted from the cache (a write-back policy). Do we need to write-back all evicted lines? No, only blocks that have been stored into Keep a “dirty bit”, reset when the line is allocated, set when the block is stored into. If a block is “dirty” when evicted, write its data back into memory.
31
Handling Stores (write-back)
Processor Cache Memory 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 150 162 173 18 21 33 R0 R1 R2 R3 28 19 Misses: 0 Hits: 200 210 225
32
write-back (REF 1) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 150 162 173 18 21 33 R0 R1 R2 R3 28 19 Misses: 0 Hits: 200 210 225
33
write-back (REF 1) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 78 150 29 162 lru 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 1 Hits: 200 210 225
34
write-back (REF 2) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 78 150 29 162 lru 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 1 Hits: 200 210 225
35
write-back (REF 2) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 78 150 29 162 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
36
write-back (REF 3) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 78 150 29 162 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
37
write-back (REF 3) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 1 173 150 29 162 lru 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
38
write-back (REF 4) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 1 173 150 29 162 lru 1 3 162 173 173 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 2 Hits: 173 200 210 225
39
write-back (REF 4) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 1 173 150 29 162 1 1 3 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 3 Hits: 173 200 210 225
40
write-back (REF 5) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 1 173 150 29 162 1 1 3 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 3 Hits: 173 200 210 225
41
write-back (REF 5) Processor Cache Memory 78 173 29 120 123
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 173 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 lru 1 1 173 150 29 162 1 1 3 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 4 Hits: 173 200 210 225
42
write-back (REF 5) Processor Cache Memory 78 29 120 123 V d tag data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 78 29 120 123 V d tag data Ld R1 M[ ] Ld R2 M[ ] St R2 M[ ] St R1 M[ ] Ld R2 M[ 10 ] 71 1 5 33 150 28 162 lru 1 1 3 71 173 29 18 21 33 R0 R1 R2 R3 28 29 19 Misses: 4 Hits: 33 200 210 225
43
How many memory references?
Each miss reads a block 2 bytes in this cache Each evicted dirty cache line writes a block Total reads: 8 bytes Total writes: 4 bytes (after final eviction) Choose write-back or write-through?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.