Memory Hierarchy and Cache
A Mystery…
Memory
Main memory = RAM: Random Access Memory
– Read/write
– Multiple flavors
– DDR SDRAM most common, 64 bits wide
  DDR: Double Data Rate
  S: Synchronous
  D: Dynamic
Memory
SRAM: Static RAM
– Register technology
– Maintains state as long as power is on
– Flip-flops
– 4-6 transistors each
Memory
DRAM: Dynamic RAM
– Main memory technology
– Each cell is only one transistor and one capacitor
  Capacitor charge represents the value
– Slower to read/write
– Must be refreshed
Since 1980, CPU has outpaced DRAM...
– 3 cycle delay for memory access
Since 1980, CPU has outpaced DRAM
– i7: 107 cycle delay for main memory access
Cache
Cache memory
– Small, fast (SRAM) memory
– Stores the subset of main memory we think is most important
Cache
L1
– Closest/fastest to CPU
– Often separate instruction/data caches
– ~64 KB
Cache
L2 & L3
– May be on chip or board
– May be shared by cores
– ~1 MB (L2), ~5-10 MB (L3)
Differences
No hard rules about
– What cache you have
– Where it lives
Cache How important is it?
Hierarchy
Cache and main memory are part of a hierarchy
Process
I need memory location 0x000E
– Is it in L1 cache?
  Yes: hit, use it
  No: miss, go search next level
– Is it in L2?
  Yes: hit, use it
  No: miss, go search next level
– Is it in L3…
– Is it in memory…
Memory Access Speedup
Assume only L1 cache and main memory
– $S$: speedup
– $t_m$: time to access main memory
– $t_c$: time to access cache
– $h$: hit ratio
Memory Access Speedup
Divide through by $t_m$
Call $t_c/t_m$ "$k$"
– $k$: ratio of cache access time to memory access time
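The slides' speedup formulas did not survive extraction. A minimal reconstruction from the definitions above, assuming a hit costs $t_c$ and a miss costs only $t_m$ (the memory access dominating), is:

```latex
\begin{aligned}
S &= \frac{t_m}{h\,t_c + (1-h)\,t_m} \\
  &= \frac{1}{h\,k + (1-h)} \qquad \text{(divide through by } t_m,\ k = t_c/t_m) \\
  &= \frac{1}{1 - h\,(1-k)}
\end{aligned}
```

As checks: $h = 0$ gives $S = 1$ (no benefit) and $h = 1$ gives $S = 1/k$. If a miss is instead charged $t_c + t_m$, the same steps give $S = 1/(1 + k - h)$.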
Speedup vs. Hit Rate
If cache is 100x faster than main memory ($k = 0.01$):
– Need a high hit rate for a large speedup
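A short sketch that evaluates the speedup for $k = 0.01$, assuming the miss-costs-only-$t_m$ model $S = 1/(hk + (1-h))$; `speedup` is a hypothetical helper name:

```python
def speedup(h: float, k: float) -> float:
    """S = 1 / (h*k + (1 - h)), assuming a miss costs only t_m.

    h: hit ratio in [0, 1]; k: t_c / t_m.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must be in [0, 1]")
    return 1.0 / (h * k + (1.0 - h))


# Cache 100x faster than main memory, so k = 0.01
k = 0.01
for h in (0.5, 0.9, 0.95, 0.99, 1.0):
    print(f"h = {h:.2f}  S = {speedup(h, k):6.2f}")
```

Under this model a 90% hit rate gives only about a 9x speedup and 99% about 50x; the 100x ceiling is reached only at $h = 1$.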
Cache & Locality
Cache effectiveness based on:
– Temporal locality: recently used things tend to be needed again soon
– Spatial locality: memory accesses tend to cluster
  e.g., sequential instruction access
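Both kinds of locality can be seen by counting misses in an idealized cache that never evicts anything (an assumption), with 8-byte lines. Only the first touch of each line misses, so one fetched line serves many accesses. `count_misses` is a hypothetical helper:

```python
def count_misses(addresses, line_size=8):
    """Count misses in an idealized cache with unlimited capacity (assumption).

    Each line misses once, on its first touch; every later access hits.
    """
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_size  # which line holds this byte address
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)
    return misses


# Spatial locality: 64 sequential byte accesses touch only 8 lines
sequential = list(range(64))
# Temporal locality: the same address reused 100 times
repeated = [0x000E] * 100
# Poor locality: every access lands in a different line
strided = [i * 64 for i in range(64)]

print(count_misses(sequential))  # 8
print(count_misses(repeated))    # 1
print(count_misses(strided))     # 64
```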
Memory Units
Main memory
– Byte addressed
Registers
– Words of 2-8 bytes
Cache
– Line of 1+ words
Process
I need memory location 0x000E
– Is it in L1 cache?
  Yes: hit, return it
  No: miss, go search next level and bring back whole line
– Is it in L2?
  Yes: hit, return line
  No: miss, go search next level and bring back whole line
– Is it in L3…
– Is it in memory…
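The lookup process above can be sketched as follows, modeling each cache level as a set of line numbers and assuming 8-byte lines with no capacity limit or eviction. `lookup` is a hypothetical name; it returns the index of the level that hit, or `len(levels)` for main memory:

```python
def lookup(addr, levels, line_size=8):
    """Search cache levels fastest first; return the index of the level that hit.

    levels: list of sets of line numbers, e.g. [l1, l2, l3].
    A return value of len(levels) means the line came from main memory.
    The whole line is copied into every faster level that missed.
    Capacity limits and eviction are ignored in this sketch.
    """
    line = addr // line_size
    hit_level = len(levels)  # main memory unless some cache level hits
    for i, level in enumerate(levels):
        if line in level:
            hit_level = i
            break
    for level in levels[:hit_level]:
        level.add(line)  # bring back the whole line, not just one byte
    return hit_level


l1, l2, l3 = set(), {1}, set()       # line 1 (bytes 0x0008-0x000F) is only in L2
print(lookup(0x000E, [l1, l2, l3]))  # 1: L1 miss, L2 hit
print(lookup(0x000E, [l1, l2, l3]))  # 0: now an L1 hit
print(lookup(0x000C, [l1, l2, l3]))  # 0: same line, so spatial locality pays off
print(lookup(0x0100, [l1, l2, l3]))  # 3: missed everywhere, fetched from memory
```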
Associative Memory
Data is looked up with a key
Associativity
– Which chunks of memory can go in which cache lines
Fully Associative
Fully associative cache
– Any memory line can go in any cache entry
Fully Associative
Memory address
– 4 bytes per word
– 2 words per line
– xxx lines
Fully Associative Address Decoding
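A sketch of the fully associative decoding, assuming the 4-byte words and 2-word lines from the previous slide. With no index field, every bit above the word and byte offsets is the tag, which is just the memory line number. `decode_fully_associative` is a hypothetical helper:

```python
BYTE_BITS = 2  # 4 bytes per word
WORD_BITS = 1  # 2 words per line


def decode_fully_associative(addr):
    """Split a byte address into (tag, word, byte) for a fully associative cache."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS)  # remaining bits = line number
    return tag, word, byte


print(decode_fully_associative(0x000E))  # (1, 1, 2)
print(decode_fully_associative(0x0014))  # (2, 1, 0): a byte in line 2
```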
Fully Associative
Line 2 could be in any of the cache lines
– Must check all tags in parallel for a match
– Large amounts of hardware
  Only practical for very small caches
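A toy model of the tag search, assuming 8-byte lines and FIFO replacement (the slides do not fix a policy here). Hardware compares all tags at once; the `in` test below does it one entry at a time. `FullyAssociativeCache` is a hypothetical class:

```python
from typing import List, Optional


class FullyAssociativeCache:
    """Toy fully associative cache storing tags only.

    Any memory line may sit in any entry. FIFO replacement is an assumption.
    """

    def __init__(self, num_entries: int, line_size: int = 8):
        self.line_size = line_size
        self.tags: List[Optional[int]] = [None] * num_entries  # None = empty entry
        self.next_victim = 0  # FIFO pointer to the oldest filled entry

    def access(self, addr: int) -> bool:
        tag = addr // self.line_size  # the tag is the whole line number
        if tag in self.tags:          # check every entry for a match
            return True
        self.tags[self.next_victim] = tag  # miss: fill the oldest entry
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


cache = FullyAssociativeCache(num_entries=4)
print([cache.access(a) for a in (0x0000, 0x0020, 0x0040, 0x0004, 0x0024, 0x0044)])
# [False, False, False, True, True, True]
```

Lines 0, 4 and 8 each miss once and then stay resident, because nothing forces them into the same entry.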
Direct Mapping
Direct mapping: every memory block has one cache entry it can use
Direct Mapped Cache
– 4-byte words
– 2-word lines (8 bytes)
– Cache of 4 lines (32 bytes)
Direct Mapped Cache
Every line is mapped to one cache slot
– slot = line % 4
Direct Mapped Cache
Need to track which line is in the slot: 0? 4? 8?
Direct Mapped Cache
– Set: group of lines the size of the cache
– Tag: records which set each line is from
Direct Mapped Cache
Address format based on
– 4 bytes per word
– 2 words per line
– 4 lines per set
– xxx sets of total memory
Direct Mapped Cache Address Decoding
Address Decoding Direct Mapped Cache
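The direct-mapped decoding can be sketched with shifts and masks, assuming the 4-byte-word, 2-word-line, 4-line cache above. The slot field is exactly `line % 4`, and the tag is the set number. `decode_direct_mapped` is a hypothetical helper:

```python
BYTE_BITS = 2  # 4 bytes per word
WORD_BITS = 1  # 2 words per line
SLOT_BITS = 2  # 4 lines (slots) in the cache


def decode_direct_mapped(addr):
    """Split a byte address into (tag/set, slot, word, byte)."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    slot = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << SLOT_BITS) - 1)  # line % 4
    tag = addr >> (BYTE_BITS + WORD_BITS + SLOT_BITS)                  # set number
    return tag, slot, word, byte


print(decode_direct_mapped(0x000E))  # (0, 1, 1, 2)
# Lines 0, 4 and 8 all land in slot 0; only the tag tells them apart
for addr in (0x0000, 0x0020, 0x0040):
    print(hex(addr), decode_direct_mapped(addr))
```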
Using tags Need: Tag shows line is from the right set
Using tags Need: Tag shows wrong set is cached - fetch correct line
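A minimal sketch of the tag check, assuming 4 slots and 8-byte lines and storing tags only (no data). A matching tag means the right set is cached; otherwise the correct line is fetched and its tag overwrites the old one. `DirectMappedCache` is a hypothetical class:

```python
class DirectMappedCache:
    """Toy direct-mapped cache that stores only the tag of each slot."""

    def __init__(self, num_slots=4, line_size=8):
        self.num_slots = num_slots
        self.line_size = line_size
        self.tags = [None] * num_slots  # None means the slot is empty

    def access(self, addr):
        line = addr // self.line_size
        slot = line % self.num_slots  # the one slot this line may use
        tag = line // self.num_slots  # which set the line comes from
        if self.tags[slot] == tag:
            return True               # tag shows the right set: hit
        self.tags[slot] = tag         # wrong set or empty: fetch correct line
        return False


c = DirectMappedCache()
print(c.access(0x000E))  # False: slot 1 empty, line 1 fetched
print(c.access(0x0008))  # True: same line, tag 0 matches
print(c.access(0x002E))  # False: line 5 also uses slot 1, replaces line 1
print(c.access(0x000E))  # False: line 1 was kicked out
```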
Scaled Up
Byte-addressable memory of $2^{14}$ bytes
Cache has 16 blocks, each has 8 bytes
What do addresses look like?
Scaled Up
Byte-addressable memory of $2^{32}$ bytes
Words of 4 bytes
Cache has 16 lines, each has 8 words
What do addresses look like?
– 32-bit address
– 2 bits for byte in word
– 3 bits for word in line
– 4 bits for line
– Set is the leftovers: 23 bits
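The field widths follow from taking $\log_2$ of each size, assuming every size is a power of two. A sketch with a hypothetical `address_fields` helper; for the $2^{14}$-byte case, the 8-byte block is treated as 8 one-byte words so the whole 3-bit offset lands in the word field:

```python
def _bits(n):
    """log2 of a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def address_fields(address_bits, bytes_per_word, words_per_line, num_lines):
    """Bit widths (set/tag, line, word, byte) for a direct-mapped cache."""
    byte_bits = _bits(bytes_per_word)
    word_bits = _bits(words_per_line)
    line_bits = _bits(num_lines)
    tag_bits = address_bits - line_bits - word_bits - byte_bits  # the leftovers
    return tag_bits, line_bits, word_bits, byte_bits


print(address_fields(32, 4, 8, 16))  # (23, 4, 3, 2)
print(address_fields(14, 1, 8, 16))  # (7, 4, 3, 0)
```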
Issue : Thrashing Direct Mapped Cache
Issue: Thrashing
Direct Mapped Cache
– 0x0000, 0x0020, 0x0040 are 32 bytes (one cache size) apart, so all map to slot 0
  Fetch set 0/word 0, replace with 1/0, replace with 2/0
– 0x0004, 0x0024, 0x0044 also all map to slot 0
  Fetch 0/1, replace with 1/1, replace with 2/1
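Replaying that access pattern through a 4-slot, 8-byte-line direct-mapped cache shows every access missing. This is a sketch; `trace_direct_mapped` is a hypothetical helper that logs slot, set and word per access:

```python
def trace_direct_mapped(addresses, num_slots=4, words_per_line=2, word_size=4):
    """Replay addresses through a direct-mapped cache.

    Returns a list of (address, slot, set, word, hit) tuples.
    """
    line_size = words_per_line * word_size
    slots = [None] * num_slots  # set (tag) currently held by each slot
    log = []
    for addr in addresses:
        line = addr // line_size
        slot = line % num_slots
        set_no = line // num_slots
        word = (addr % line_size) // word_size
        hit = slots[slot] == set_no
        slots[slot] = set_no  # on a miss, the old line is kicked out
        log.append((addr, slot, set_no, word, hit))
    return log


trace = [0x0000, 0x0020, 0x0040, 0x0004, 0x0024, 0x0044]
for addr, slot, set_no, word, hit in trace_direct_mapped(trace):
    print(f"{addr:#06x}: slot {slot}, {set_no}/{word} -> {'hit' if hit else 'miss'}")
```

Each fetch evicts the line the next access needs, so none of the six accesses hit even though three other slots sit unused.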
Set Associative
n-way set associative: every memory block has n slots it can be in
2-way
Set Associative
n-way set associative: every memory block has n slots it can be in
4-way
Set Associative Address
2-way set associative
Set Associative Address
Need to check all slots in the set in parallel for the right tag
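A toy n-way cache, assuming 4 lines of 8 bytes and LRU replacement. Here `ways=1` behaves as direct mapped and `ways=4` as fully associative. `SetAssociativeCache` is hypothetical; it calls each group of n slots a "group" to avoid clashing with the slides' use of "set" for tags:

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set-associative cache storing tags only (LRU is an assumption)."""

    def __init__(self, num_lines=4, ways=2, line_size=8):
        if num_lines % ways != 0:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.num_groups = num_lines // ways  # each group holds n slots
        self.line_size = line_size
        # One OrderedDict of tags per group; its order tracks recency, oldest first
        self.groups = [OrderedDict() for _ in range(self.num_groups)]

    def access(self, addr):
        line = addr // self.line_size
        group = self.groups[line % self.num_groups]  # the n slots this line may use
        tag = line // self.num_groups
        if tag in group:               # hardware checks all n tags in parallel
            group.move_to_end(tag)     # mark as most recently used
            return True
        if len(group) == self.ways:
            group.popitem(last=False)  # evict the least recently used tag
        group[tag] = True
        return False


trace = [0x0000, 0x0020, 0x0040, 0x0004, 0x0024, 0x0044]
for ways in (1, 2, 4):
    cache = SetAssociativeCache(num_lines=4, ways=ways)
    misses = sum(not cache.access(a) for a in trace)
    print(f"{ways}-way: {misses} misses out of {len(trace)}")
```

On the thrashing trace, 1-way and 2-way both miss all 6 times (three hot lines compete for at most two slots), while 4-way misses only on the first 3.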
Replacement Strategies
How do we choose which block to kick out?
– FIFO: track age
– Least Frequently Used: track accesses
  Very susceptible to thrashing
– Least Recently Used: track age of accesses
  Very complex
– Random
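FIFO and LRU can be compared on a single group of slots with a hypothetical trace in which block 0 is reused constantly. FIFO evicts by age of arrival even when a block is hot, while LRU refreshes a block's age on every hit. Both helpers are hypothetical; Least Frequently Used and Random are left out of this sketch:

```python
from collections import OrderedDict, deque


def misses_fifo(trace, capacity):
    """Evict the block that was brought in earliest (track age)."""
    resident, order, misses = set(), deque(), 0
    for block in trace:
        if block in resident:
            continue  # hit: FIFO does not care about reuse
        misses += 1
        if len(resident) == capacity:
            resident.discard(order.popleft())  # oldest arrival leaves
        resident.add(block)
        order.append(block)
    return misses


def misses_lru(trace, capacity):
    """Evict the block whose last access is oldest (track age of accesses)."""
    recency, misses = OrderedDict(), 0
    for block in trace:
        if block in recency:
            recency.move_to_end(block)  # hit: refresh this block's age
            continue
        misses += 1
        if len(recency) == capacity:
            recency.popitem(last=False)  # least recently used leaves
        recency[block] = None
    return misses


trace = [0, 1, 0, 2, 0, 3, 0, 4]
print(misses_fifo(trace, 2))  # 6
print(misses_lru(trace, 2))   # 5
```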
Set Associative Performance
– Larger caches = higher hit rate
– Smaller caches benefit more from associativity
What do they use?
– Intel Haswell generation
– AMD
Bad Situations for Cache
Data with poor locality
– Complex object-oriented programming structures
– Large 2D arrays (stored row by row) traversed in column-major order…
(Figure: row-major access vs. column-major access)
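The cost of column-major traversal can be sketched by counting misses while walking a row-major 64×64 array of 4-byte elements through the tiny 4-slot, 8-byte-line direct-mapped cache from earlier (sizes chosen for illustration). `count_misses_2d` is a hypothetical helper:

```python
def count_misses_2d(rows, cols, column_major, elem_size=4,
                    num_slots=4, line_size=8):
    """Count direct-mapped cache misses while visiting every a[i][j] once.

    The array is laid out row by row (as in C); only the visit order changes.
    """
    tags = [None] * num_slots
    misses = 0
    if column_major:
        order = ((i, j) for j in range(cols) for i in range(rows))
    else:
        order = ((i, j) for i in range(rows) for j in range(cols))
    for i, j in order:
        addr = (i * cols + j) * elem_size  # row-major address of a[i][j]
        line = addr // line_size
        slot, tag = line % num_slots, line // num_slots
        if tags[slot] != tag:
            misses += 1
            tags[slot] = tag
    return misses


print(count_misses_2d(64, 64, column_major=False))  # 2048
print(count_misses_2d(64, 64, column_major=True))   # 4096
```

Row-major order uses both words of each fetched line (one miss per two elements). Column-major order jumps 256 bytes per step, and with this cache every element in a column lands in the same slot, so every access misses.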