Published by Winifred Chandler. Modified over 8 years ago.
1
Memory Hierarchy and Cache
2
A Mystery…
3
Memory Main memory = RAM : Random Access Memory – Read/write – Multiple flavors – DDR SDRAM most common, 64 bits wide DDR : Double Data Rate S : Synchronous D : Dynamic
4
Memory SRAM : Static RAM – Register technology – Maintains state as long as power is on – Flip flops – 4-6 transistors each
5
Memory DRAM : Dynamic RAM – Main memory technology – Each cell is only one transistor and one capacitor Capacitor charge represents the value – Slower to read/write – Must be refreshed
6
Since 1980, CPU has outpaced DRAM... In 1980: ~3 cycle delay for a memory access
7
Since 1980, CPU has outpaced DRAM... In 2010: an i7 has a ~107 cycle delay for a main memory access
8
Cache Cache memory – Small, fast (SRAM) memory – Stores subset of main memory we think is most important
9
Cache L1 – closest/fastest to CPU – Often separate instruction/data caches – ~64KB
10
Cache L2 & L3 – May be on chip or board – May be shared by cores – ~ 1 MB (L2) ~5-10 MB (L3)
11
Differences No hard rules about – What cache you have – Where it lives
12
Cache How important is it?
13
Hierarchy Cache / Main Memory part of a hierarchy – http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
14
Process I need memory location 0x000E – Is it in L1 cache? Yes : Hit – use it No : Miss – go search next level – Is it in L2? Yes : Hit – use it No : Miss – go search next level – Is it in L3… – Is it in memory…
15
Memory Access Speedup Assume only L1 cache and main memory – S : Speedup – t_c : time to access cache – t_m : time to access main memory – h : hit ratio S = t_m / (h·t_c + (1 − h)·t_m)
16
Memory Access Speedup Divide through by t_m and call t_c/t_m "k" – k : ratio of cache access time to memory access time S = 1 / (h·k + (1 − h))
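As a quick sanity check (the sample hit rates below are illustrative, not from the slides), the simplified speedup S = 1 / (h·k + (1 − h)) can be evaluated directly:

```python
def speedup(h, k):
    """Speedup over memory-only access.
    h: cache hit ratio, k: ratio t_c / t_m of cache to memory time."""
    return 1.0 / (h * k + (1.0 - h))

# Cache 100x faster than main memory (k = 0.01):
print(speedup(0.50, 0.01))   # ~2x  : half the accesses still pay t_m
print(speedup(0.99, 0.01))   # ~50x : big wins require high hit rates
```

Even a 50% hit rate barely doubles performance here, which is why the hit rate, not the raw cache speed, dominates the speedup.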
17
Speedup vs Hit Rate If the cache is 100x faster than main memory: – Need a high hit rate for a large speedup https://tube.geogebra.org/upload/vtnvxoxodocaabevzwgaaaaa56d3555ebed31
18
Cache & Locality Cache effectiveness based on: – Temporal locality : Recently used things tend to be needed again soon – Spatial locality : Memory accesses tend to cluster Sequential instruction access
19
Memory Units Main memory – Byte addressed
20
Memory Units Main memory – Byte addressed Registers – Words of 2-8 bytes Word 0 Word 1 Word 2 Word 3 Word 4 Word 5 …
21
Memory Units Main memory – Byte addressed Registers – Words of 2-8 bytes Cache – Line of 1+ words Line 0 Line 1 …
22
Process I need memory location 0x000E – Is it in L1 cache? Yes : Hit – return it No : Miss – go search next level and bring back whole line – Is it in L2? Yes : Hit – return line No : Miss – go search next level bring back whole line – Is it in L3… – Is it in memory…
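The multi-level lookup above can be sketched as a toy simulation (the dict-based structure and names are illustrative; real caches index by address bits, and this sketch ignores eviction):

```python
def lookup(line_no, levels, memory):
    """Search each cache level in order; on a miss everywhere, fetch
    the whole line from main memory and fill every level with it."""
    for i, cache in enumerate(levels):
        if line_no in cache:                 # hit at level i+1
            return cache[line_no], f"L{i + 1} hit"
    line = memory[line_no]                   # miss everywhere
    for cache in levels:                     # bring back the whole line
        cache[line_no] = line
    return line, "fetched from memory"

l1, l2 = {}, {}
memory = {n: f"line-{n}" for n in range(16)}
print(lookup(2, [l1, l2], memory))   # first access misses to memory
print(lookup(2, [l1, l2], memory))   # now an L1 hit
```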
23
Associative Memory Data is looked up with a key:
24
Associativity – What chunks of memory can go in which cache lines
25
Fully Associative Fully associative cache – Any memory line can go in any cache entry
26
Fully Associative Memory address – 4 bytes per word – 2 words per line – xxx lines
27
Fully Associative Address Decoding 0010100₂
28
Fully Associative Line 2 could be in any of the cache lines – Must check all tags in parallel for a match
29
Fully Associative Line 2 could be in any of the cache lines – Must check all tags in parallel for a match – Large amounts of hardware Only practical for very small caches
30
Direct Mapping Direct mapping : every memory block has one cache entry it can use
31
Direct Mapped Cache 4 byte words 2 word lines (8 bytes) Cache of 4 lines (32 bytes)
32
Direct Mapped Cache Direct Mapped Cache : Every line mapped to one cache slot slot = line % 4
33
Direct Mapped Cache Direct Mapped Cache : Need to track who is in the slot – line 0? 4? 8?
34
Direct Mapped Cache Set: Group of lines equal to the size of the cache Tag: Records which set each line is from
35
Direct Mapped Cache Address format based on – 4 bytes per word – 2 words per line – 4 lines per set – xxx sets of total memory
36
Direct Mapped Cache Address Decoding 0001000₂
37
Direct Mapped Cache Address Decoding 1000110₂
38
Using tags Need: 0110000₂ Tag shows line is from the right set
39
Using tags Need: 1000110₂ Tag shows wrong set is cached – fetch correct line
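The decoding in these examples can be sketched in Python, assuming the format above (2 bits for byte in word, 1 bit for word in line, 2 bits for line, tag in the rest):

```python
def decode(addr):
    """Split a 7-bit address into (tag, line, word, byte) fields,
    matching the slides' 4-byte-word, 2-word-line, 4-line cache."""
    byte = addr & 0b11         # byte within word (2 bits)
    word = (addr >> 2) & 0b1   # word within line (1 bit)
    line = (addr >> 3) & 0b11  # cache slot, i.e. line % 4 (2 bits)
    tag  = addr >> 5           # which set the line came from
    return tag, line, word, byte

print(decode(0b0001000))   # tag 0, slot 1
print(decode(0b1000110))   # tag 2, slot 0
```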
40
Scaled Up Byte-addressable memory of 2¹⁴ bytes Cache has 16 blocks, each of 8 bytes What do addresses look like? – 14 bit address – 3 bits for byte in block – 4 bits for block – 7 bits left for the tag
41
Scaled Up Byte-addressable memory of 2³² bytes Words of 4 bytes Cache has 16 lines, each has 8 words What do addresses look like? – 32 bit address – 2 bits for byte in word – 3 bits for word in line – 4 bits for line – Set is leftovers… 23 bits
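The same bit-budget arithmetic can be sketched as a hypothetical helper (the function name is mine, not the slides'; it assumes all sizes are powers of two):

```python
from math import log2

def address_fields(bytes_per_word, words_per_line, num_lines, addr_bits):
    """Return (byte, word, line, tag) bit widths for a direct-mapped
    cache with the given geometry."""
    byte = int(log2(bytes_per_word))
    word = int(log2(words_per_line))
    line = int(log2(num_lines))
    return byte, word, line, addr_bits - byte - word - line

print(address_fields(4, 8, 16, 32))   # the slide's example above
```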
42
Issue : Thrashing Direct Mapped Cache
43
Issue : Thrashing 0x0040 = 0x0000 + 0x0020 Fetch Line 0/ Word 0 Replace with 1/0 Replace with 2/0 0x0044 = 0x0004 + 0x0024 Fetch 0/1 Replace with 1/1 Replace with 2/1 Direct Mapped Cache
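A minimal miss-count simulation of this pattern (an illustrative sketch, using the earlier geometry of 8-byte lines and 4 slots):

```python
def count_misses(trace, num_slots=4, line_bytes=8):
    """Direct-mapped model: each slot holds one line; an access
    misses when its line isn't the one currently in its slot."""
    slots, misses = {}, 0
    for addr in trace:
        line = addr // line_bytes       # memory line number
        slot = line % num_slots         # the one slot it may use
        if slots.get(slot) != line:     # cold or conflict miss
            slots[slot] = line
            misses += 1
    return misses

# 0x0000, 0x0020, 0x0040 are lines 0, 4, 8: all map to slot 0,
# so alternating between them misses every single time.
print(count_misses([0x0000, 0x0020, 0x0040] * 3))
```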
44
Set Associative n-way Set Associative : every memory block has n slots it can be in (2-way shown)
45
Set Associative n-way Set Associative : every memory block has n slots it can be in (4-way shown)
46
Set Associative Address 2 way set associative:
47
Set Associative Address Need to check all slots in parallel for right tag
48
Replacement Strategies How do we decide which block to kick out? – FIFO : Track age – Least Used : Track accesses Very susceptible to thrashing – Least Recently Used : Track age of accesses Very complex – Random
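An LRU policy for a single set can be sketched with an ordered dictionary (an illustrative software model; hardware uses cheaper approximations, not this structure):

```python
from collections import OrderedDict

class LRUSet:
    """One n-way set: keeps resident lines in access order so the
    least recently used line is always first to be evicted."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)       # refresh its recency
            return "hit"
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)     # evict the LRU line
        self.lines[line] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(n) for n in (1, 2, 1, 3, 2)])
```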
49
Set Associative Performance Larger caches = higher hit rate Smaller caches benefit more from associativity
50
What do they use? Intel Haswell generation AMD
51
Bad Situations for Cache Data with poor locality – Complex object oriented programming structures Large 2D arrays traversed in column major order…

A 3x3 array and its row-major layout in memory:

        col 0  col 1  col 2
row 0     76     82     76
row 1     83     84     94
row 2     88     93     83

offset:   0  1  2  3  4  5  6  7  8
value:   76 82 76 83 84 94 88 93 83

Row Major Access : consecutive offsets. Col Major Access : strided offsets.
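The difference can be sketched with the slides' 3x3 values, assuming row-major storage (the convention in C, mirrored here with a flat Python list):

```python
R, C = 3, 3
# The 3x3 array from the slide, flattened row-major: element (r, c)
# lives at offset r*C + c, so each row is contiguous in memory.
flat = [76, 82, 76, 83, 84, 94, 88, 93, 83]

row_major = [flat[r * C + c] for r in range(R) for c in range(C)]
col_major = [flat[r * C + c] for c in range(C) for r in range(R)]
print(row_major)   # stride-1: consecutive offsets, good locality
print(col_major)   # stride-C: jumps C elements per step, poor locality
```

Row-major access walks offsets 0, 1, 2, …; column-major walks 0, 3, 6, 1, 4, 7, …, so with large arrays each column-major access can land in a different cache line.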