Presentation on theme: "CDA 5155 Caches."— Presentation transcript:

1 CDA 5155 Caches

2 New Topic: Memory Systems
Cache 101 – review of undergraduate material
Associativity and other organization issues
Advanced designs and interactions with pipelines
Tomorrow's cache design (power/performance)
Advances in memory design
Virtual memory (and how to do it fast)

3 Cache Design 101
The memory pyramid:
Registers (100s of bytes): 1-cycle access (early in the pipeline)
L1 cache (several KB): 1–3 cycle access
L2 cache (½–32 MB): 6–15 cycle access
Memory (128 MB – a few GB)
Disk (many GB): millions of cycles per access!

4 Terminology
Temporal locality: if memory location X is accessed, it is more likely to be re-accessed in the near future than some random location Y. Caches exploit temporal locality by placing a referenced memory element into the cache.
Spatial locality: if memory location X is accessed, locations near X are more likely to be accessed in the near future than some random location Y. Caches exploit spatial locality by allocating a whole cache line of data (including data near the referenced location).
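As a concrete illustration (a hypothetical sketch, not from the slides), summing an array shows both kinds of locality: the running total is reused every iteration (temporal locality), and the elements are visited at adjacent addresses (spatial locality). Python is used here for brevity; in a language with flat arrays such as C the elements are literally consecutive bytes.

```python
# Hypothetical sketch: one loop that exhibits both kinds of locality.
def sum_array(a):
    total = 0                    # 'total' is re-read/re-written every iteration
    for i in range(len(a)):      #   -> temporal locality
        total += a[i]            # a[0], a[1], a[2], ... sit next to each other
    return total                 #   -> spatial locality: fetching the cache line
                                 #      that holds a[i] also brings in a[i+1]

print(sum_array(list(range(1000))))   # -> 499500
```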

5 A simple cache
Setup for the worked example: a cache with 2 cache lines, a 3-bit tag field, and 2-byte blocks sits between the processor (registers R0–R3) and a 16-byte memory holding the values 100, 110, 120, ..., 250 at addresses 0–15. With 4-bit addresses (3-bit tag plus 1-bit block offset) the cache is fully associative and uses LRU replacement. The program issues five loads: Ld R1 ← M[1], Ld R2 ← M[5], Ld R3 ← M[1], Ld R3 ← M[4], Ld R2 ← M[0].

6 1st access
Before the first load (Ld R1 ← M[1]): both cache lines are invalid. Misses: 0, Hits: 0.

7 1st access
Ld R1 ← M[1] (Addr 0001): miss. The block holding bytes 0–1 (values 100, 110) is brought into a line with tag 0, and R1 = 110. Misses: 1, Hits: 0.

8 2nd access
Before the second load (Ld R2 ← M[5]): one line holds bytes 0–1; the other is still invalid and is the LRU candidate. Misses: 1, Hits: 0.

9 2nd access
Ld R2 ← M[5] (Addr 0101): miss. The block holding bytes 4–5 (values 140, 150) fills the second line with tag 2, and R2 = 150. Misses: 2, Hits: 0.

10 3rd access
Before the third load (Ld R3 ← M[1], Addr 0001): both lines are valid; the bytes 0–1 line is the LRU candidate. Misses: 2, Hits: 0.

11 3rd access
Ld R3 ← M[1]: hit in the line holding bytes 0–1; R3 = 110, and that line becomes most recently used. Misses: 2, Hits: 1.

12 4th access
Before the fourth load (Ld R3 ← M[4], Addr 0100): the bytes 4–5 line is now the LRU candidate. Misses: 2, Hits: 1.

13 4th access
Ld R3 ← M[4]: hit in the line holding bytes 4–5; R3 = 140. Misses: 2, Hits: 2.

14 5th access
Before the fifth load (Ld R2 ← M[0], Addr 0000): the bytes 0–1 line is the LRU candidate. Misses: 2, Hits: 2.

15 5th access
Ld R2 ← M[0]: hit in the line holding bytes 0–1; R2 = 100. Final tally: Misses: 2, Hits: 3.
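The whole walk-through can be reproduced in a few lines of code. Below is a minimal sketch (the names and structure are mine, not from the slides) of a 2-line, fully-associative, LRU cache with 2-byte blocks, replaying the five loads whose addresses are read off the Addr fields above; it ends with the same counts as slide 15.

```python
# Minimal sketch of the 2-line fully-associative LRU cache in the example.
# Assumptions: 2-byte blocks, 16-byte memory holding 100, 110, ..., 250.
memory = [100 + 10 * i for i in range(16)]
BLOCK = 2                        # block size in bytes

lines = []                       # each entry: {'tag': block number, 'data': [...]}
                                 # most-recently-used entry kept at the end
misses = hits = 0

def load(addr):
    """Return memory[addr] through the cache, updating the LRU order."""
    global misses, hits
    tag, offset = addr // BLOCK, addr % BLOCK
    for line in lines:
        if line['tag'] == tag:                     # hit
            hits += 1
            lines.remove(line); lines.append(line) # mark as most recently used
            return line['data'][offset]
    misses += 1                                    # miss: evict the LRU line if full
    if len(lines) == 2:
        lines.pop(0)
    block = memory[tag * BLOCK:(tag + 1) * BLOCK]  # fetch the whole block
    lines.append({'tag': tag, 'data': block})
    return block[offset]

for a in [1, 5, 1, 4, 0]:        # the five loads from the example
    load(a)
print(misses, hits)              # -> 2 3, matching slide 15
```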

16 Basic Cache Organization
Decide on the block size. How? Simulate lots of different block sizes and see which one gives the best performance. Most systems use a block size between 32 bytes and 128 bytes. Longer block sizes reduce overhead by:
- reducing the number of tags
- reducing the size of each tag
An address is then divided into a tag and a block offset, as sketched below.
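A small sketch of the tag / block-offset split. The 32-byte block size here is an assumed example (the toy cache on the earlier slides uses 2-byte blocks, i.e. a 1-bit offset); the bit arithmetic, not the exact widths, is the point.

```python
# Hypothetical sketch: splitting an address into tag and block offset.
BLOCK_SIZE = 32                              # bytes per block (assumed)
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1    # log2(32) = 5

def split(addr):
    offset = addr & (BLOCK_SIZE - 1)         # low bits select the byte in the block
    tag = addr >> OFFSET_BITS                # remaining high bits form the tag
    return tag, offset

print(split(0x1234))   # -> (145, 20), i.e. tag 0x91, offset 0x14;
                       # bytes 0x1220-0x123F all share that one tag
```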

17 What About Stores?
Where should you write the result of a store?
If that memory location is in the cache: send it to the cache. Should we also send it to memory right away (write-through policy), or wait until we kick the block out (write-back policy)?
If it is not in the cache: allocate the line, i.e. put it in the cache (write-allocate policy), or write it directly to memory without allocating (no-write-allocate policy)?
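A compact way to see the four combinations is to enumerate what each one does on a store. The sketch below is only a summary of the policy names above, not an implementation from the slides.

```python
# Sketch: actions a store triggers under each write-policy combination.
def store_actions(hit, write_through, write_allocate):
    if not hit and not write_allocate:
        return ["write the byte to memory only (line not cached)"]
    actions = [] if hit else ["fetch the block into the cache (write-allocate)"]
    actions.append("update the cached copy")
    actions.append("also write the byte to memory now" if write_through
                   else "mark the line dirty; memory updated at eviction")
    return actions

print(store_actions(hit=True,  write_through=True,  write_allocate=True))
print(store_actions(hit=False, write_through=False, write_allocate=True))
print(store_actions(hit=False, write_through=True,  write_allocate=False))
```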

18 Handling Stores (write-through)
Setup: the same 2-line cache with 2-byte blocks, now handling stores with a write-through, write-allocate policy. Memory holds the 16 bytes 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225 at addresses 0–15. The reference sequence is Ld R1 ← M[1], Ld R2 ← M[7], St R2 → M[0], St R1 → M[5], Ld R2 ← M[10].

19 write-through (REF 1)
Before REF 1 (Ld R1 ← M[1]): the cache is empty. Misses: 0, Hits: 0.

20 write-through (REF 1)
Ld R1 ← M[1]: miss. The block holding bytes 0–1 (values 78, 29) is allocated; R1 = 29. Misses: 1, Hits: 0.

21 write-through (REF 2)
Before REF 2 (Ld R2 ← M[7]): one line holds bytes 0–1; the other is still invalid. Misses: 1, Hits: 0.

22 write-through (REF 2)
Ld R2 ← M[7]: miss. The block holding bytes 6–7 (values 162, 173) fills the second line; R2 = 173. Misses: 2, Hits: 0.

23 write-through (REF 3)
Before REF 3 (St R2 → M[0]): both lines are valid; the bytes 0–1 line is the LRU candidate. Misses: 2, Hits: 0.

24 write-through (REF 3)
St R2 → M[0]: hit. Byte 0 becomes 173 in the cache, and because of write-through the store also updates memory immediately (M[0] = 173). Misses: 2, Hits: 1.

25 write-through (REF 4)
Before REF 4 (St R1 → M[5]): the bytes 6–7 line is now the LRU candidate. Misses: 2, Hits: 1.

26 write-through (REF 4)
St R1 → M[5]: miss. With write-allocate, the block holding bytes 4–5 (values 71, 150) replaces the LRU line; byte 5 then becomes 29 in the cache and, via write-through, in memory. Misses: 3, Hits: 1.

27 write-through (REF 5)
Before REF 5 (Ld R2 ← M[10]): the bytes 0–1 line is now the LRU candidate. Misses: 3, Hits: 1.

28 write-through (REF 5)
Ld R2 ← M[10]: miss. The block holding bytes 10–11 (values 33, 28) replaces the LRU line; R2 = 33. Final tally: Misses: 4, Hits: 1.

29 How many memory references?
Each miss reads a block: 2 bytes in this cache. Each store writes one byte to memory (write-through). Total reads: 8 bytes (4 misses × 2 bytes). Total writes: 2 bytes (2 stores × 1 byte). The write traffic looks small in this toy example, but caches generally miss less than 20% of the time, so with write-through the per-store writes, not the miss reads, tend to dominate memory traffic.

30 Write-through vs. write-back
We can also design the cache NOT to write every store to memory immediately. Instead, keep the most current copy in the cache and update memory only when that data is evicted from the cache (a write-back policy). Do we need to write back every evicted line? No, only blocks that have been stored into. Keep a "dirty bit": reset it when the line is allocated, set it when the block is stored into. If a block is dirty when it is evicted, write its data back into memory. A small sketch of this bookkeeping follows.
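Here is a minimal sketch of that bookkeeping (names are mine, not the slides'): the same 2-line fully-associative LRU cache with 2-byte blocks, write-allocate on store misses, and a switch between write-through and write-back. Replaying the reference sequence from the stores example (load 1, load 7, store 0, store 5, load 10, as read off the slides) reproduces the traffic totals of slides 29 and 43.

```python
# Sketch of the example cache with a write-policy switch and a dirty bit.
BLOCK = 2                                     # block size in bytes

def run(trace, write_through):
    lines = []                                # each line: [tag, data, dirty]; MRU last
    bytes_read = bytes_written = 0

    def find(tag):
        for ln in lines:
            if ln[0] == tag:
                lines.remove(ln); lines.append(ln)   # refresh LRU order
                return ln
        return None

    def fill(tag):                            # allocate a line, evicting LRU if needed
        nonlocal bytes_read, bytes_written
        if len(lines) == 2:
            victim = lines.pop(0)
            if victim[2]:                     # dirty victim: write the block back
                bytes_written += BLOCK
        bytes_read += BLOCK                   # fetch the block from memory
        ln = [tag, [0] * BLOCK, False]
        lines.append(ln)
        return ln

    for op, addr in trace:
        tag, off = addr // BLOCK, addr % BLOCK
        ln = find(tag) or fill(tag)           # write-allocate: store misses fill too
        if op == 'st':
            ln[1][off] = 0                    # stored value doesn't matter for traffic
            if write_through:
                bytes_written += 1            # one byte goes straight to memory
            else:
                ln[2] = True                  # write-back: just mark the line dirty
    if not write_through:                     # account for still-dirty lines at the end
        bytes_written += BLOCK * sum(ln[2] for ln in lines)
    return bytes_read, bytes_written

trace = [('ld', 1), ('ld', 7), ('st', 0), ('st', 5), ('ld', 10)]
print(run(trace, write_through=True))    # -> (8, 2), as on slide 29
print(run(trace, write_through=False))   # -> (8, 4), as on slide 43
```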

31 Handling Stores (write-back)
Setup: the same cache, memory contents, and reference sequence as the write-through example, but now each cache line also carries a dirty bit (d) and stores use a write-back, write-allocate policy.

32 write-back (REF 1)
Before REF 1 (Ld R1 ← M[1]): the cache is empty. Misses: 0, Hits: 0.

33 write-back (REF 1)
Ld R1 ← M[1]: miss. The block holding bytes 0–1 (78, 29) is allocated, clean; R1 = 29. Misses: 1, Hits: 0.

34 write-back (REF 2)
Before REF 2 (Ld R2 ← M[7]). Misses: 1, Hits: 0.

35 write-back (REF 2)
Ld R2 ← M[7]: miss. The block holding bytes 6–7 (162, 173) fills the second line, clean; R2 = 173. Misses: 2, Hits: 0.

36 write-back (REF 3)
Before REF 3 (St R2 → M[0]): the bytes 0–1 line is the LRU candidate. Misses: 2, Hits: 0.

37 write-back (REF 3)
St R2 → M[0]: hit. Only the cached copy changes (byte 0 becomes 173) and the line's dirty bit is set; memory still holds 78. Misses: 2, Hits: 1.

38 write-back (REF 4)
Before REF 4 (St R1 → M[5]): the bytes 6–7 line is the LRU candidate. Misses: 2, Hits: 1.

39 write-back (REF 4)
St R1 → M[5]: miss. Write-allocate brings in the block holding bytes 4–5, evicting the clean bytes 6–7 line (no write-back needed); byte 5 becomes 29 in the cache only, and the new line is marked dirty. Misses: 3, Hits: 1.

40 write-back (REF 5)
Before REF 5 (Ld R2 ← M[10]): both lines are dirty; the bytes 0–1 line is the LRU candidate. Misses: 3, Hits: 1.

41 write-back (REF 5)
Ld R2 ← M[10]: miss. The victim line (bytes 0–1) is dirty, so its contents (173, 29) are first written back to memory locations 0 and 1. Misses: 4, Hits: 1.

42 write-back (REF 5)
The block holding bytes 10–11 (33, 28) is then loaded into the freed line; R2 = 33. Final tally: Misses: 4, Hits: 1.

43 How many memory references?
Each miss reads a block: 2 bytes in this cache. Each evicted dirty cache line writes back a whole block. Total reads: 8 bytes (4 misses × 2 bytes). Total writes: 4 bytes, counting the block that is still dirty in the cache and gets written back at its final eviction (2 dirty blocks × 2 bytes). So which would you choose, write-back or write-through?

