CSCI 232 © 2005 JW Ryder

Cache Memory Systems
- Introduced by M.V. Wilkes ("Slave Store")
- First appeared commercially in the IBM S/360 Model 85
Motivations
- Main memory access time is 5 to 25 times slower than register access (on-chip vs. off-chip issues, among others)
- Can't have too many registers in the CPU
- Program locality should allow a small, fast buffer between the CPU and MM
- Should be managed by hardware to be effective
Motivations Continued
- Most of the time, MM data has to be found in the cache for the cache to be worth it
- That can only happen if dynamic locality is tracked well
- Management is automatic and transparent to the Instruction Set Architecture (ISA)
Access and Cost
- Access time: T_reg < T_cache < T_MM
- Cost per bit: C_reg > C_cache > C_MM (chip real estate)
Cache vs. Registers
- Cache
  - Locality: tracked dynamically
  - Management: hardware
  - Expandability: easy
  - ISA visibility: invisible (mostly)
- Registers
  - Locality: static, by the compiler
  - Management: software/programmer
  - Expandability: not possible
  - ISA visibility: visible
Simple Cache-Based System
[Figure: CPU and its registers connected to MM through the cache; arrows numbered 1-5 mark the steps of the read operation described next]
Read Operation
- See if the desired MM word is in the cache (1)
- If it is (cache hit), get it from the cache (2)
- If it isn't (cache miss), get it from MM, supplying it simultaneously to the CPU and the cache (3)
  - Make room in the cache by selecting a victim, which may have to be written back to MM (4), and then install the copy (5)
- The CPU stalls until the missing word is supplied
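The steps above can be sketched in a few lines of Python. This is an illustrative model, not the hardware on the slide: it assumes a fully associative cache of whole words and picks an arbitrary victim, since no usage history is modeled here.

```python
# Minimal sketch of the read operation (illustrative).
CACHE_SIZE = 4  # words the cache can hold (assumed for the example)

def read(addr, cache, mm):
    """Return the word at addr, updating the cache on a miss."""
    if addr in cache:               # (1) check cache; (2) hit: serve from cache
        return cache[addr]
    word = mm[addr]                 # (3) miss: fetch from MM for CPU and cache
    if len(cache) >= CACHE_SIZE:    # (4) make room: select and evict a victim
        victim = next(iter(cache))  #     (a write-back to MM would happen here)
        del cache[victim]
    cache[addr] = word              # (5) install the copy
    return word

mm = {a: a * 10 for a in range(8)}
cache = {}
assert read(5, cache, mm) == 50     # miss: fetched from MM
assert read(5, cache, mm) == 50     # hit: served from the cache
```

On a real miss the CPU would stall for the duration of the MM access; here the stall is simply the slower code path.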
Locality of Reference
- Temporal: if this word is needed now, there is a good chance it will be needed again
- Spatial: a fetch from MM actually gets a chunk of words, and some word near the requested one will probably also be needed
- Registers exploit temporal locality (TLOR); caches exploit both temporal and spatial locality (TLOR, SLOR)
Selecting a Victim
- The victim must not be needed in the near future, so the cache maintains a history of usage
- The basic unit of transfer between cache and MM is a block (line) of 2^b words, where b is small (2 - 4)
- On a miss, the cache controller loads the block containing the missing word into the cache
- This ensures neighboring words are also cached (SLOR)
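With 2^b words per block, a word address splits into a block number and an offset within the block, which is why one miss pulls in the neighbors. A small sketch, using b = 2 (4-word blocks, within the 2 - 4 range above):

```python
# Illustrative address split for a block size of 2**b words.
b = 2  # assumed for the example

def split(addr):
    """Return (block_number, offset_within_block) for a word address."""
    return addr >> b, addr & ((1 << b) - 1)

# Words 12..15 all fall in block 3, so a miss on word 13 loads its
# neighbors 12, 14, and 15 as well (spatial locality).
assert split(13) == (3, 1)
assert {split(a)[0] for a in range(12, 16)} == {3}
```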
Addressing the Cache
- Addressed the same way as memory
- The cache stores entries in the form <address, data>
- The cache controller compares the address issued by the CPU with the address field of the cache entries to determine a hit or miss
- Transfers between cache and CPU are only a word or two; transfers between cache and MM are in blocks
- Hit: data comes back from the cache in 1 clock cycle; miss: 15 - 20 cycles
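The 1-cycle hit and 15 - 20-cycle miss figures combine into an effective access time, weighted by the hit rate. A quick calculation, taking 20 cycles as the miss penalty; the 0.95 hit rate is an illustrative assumption, not from the slides:

```python
# Effective access time: hit_rate * hit_cycles + miss_rate * miss_cycles.
def effective_access_cycles(hit_rate, hit_cycles=1, miss_cycles=20):
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles

assert effective_access_cycles(1.0) == 1.0                    # every access hits
assert abs(effective_access_cycles(0.95) - 1.95) < 1e-9       # 0.95*1 + 0.05*20
```

Even a 5% miss rate nearly doubles the average access time, which is why dynamic locality must be tracked well for the cache to pay off.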
Functions of the Cache Controller
- Given an address issued by the CPU, the CC should be able to determine whether the block containing the word is in the cache
  - Requires associative logic / comparators
- The CC needs to keep track of the usage of blocks in the cache, with hardware logic for victim selection
- May need to write back a line (victim) from cache to MM
- Must implement a placement policy that determines how blocks from MM are placed in the cache
- A replacement policy is needed only if there is a choice of victim
Cache Loading Strategies
- Demand fetch: load a block into the cache from MM only on a miss
- Prefetch (anticipating a miss):
  - Prefetch on Miss: on a miss to block i, prefetch block i + 1 too
  - Always Prefetch: prefetch block i + 1 on the first reference to block i
  - Tagged Prefetch: prefetch on a miss, and also prefetch block i + 1 when a previously prefetched block is referenced for the first time
    - Keep prefetching as long as the last prefetch was useful
    - Tags distinguish not-yet-accessed blocks from the others
More Strategies
- The prefetches above fetch 1 block, but they can fetch more than 1
- Selective Fetch
  - Don't fetch shared writeable blocks
  - Used in many multiprocessor systems to avoid cache incoherence
Load-Thru / Read-Thru
- The missing word is forwarded to the CPU and the cache concurrently
- The remaining words of the block (words 0 through 2^k - 1) are then fetched in wraparound fashion, starting just after the missing word
- Wrapping around saves resetting the pointer: the write pointer is already positioned
- Not needed if the block can be loaded in one shot
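The wraparound order is easy to make concrete. A sketch for a block of 2^k words, where the miss landed on some word in the middle:

```python
# Wraparound fill order (illustrative): the missing word is delivered first,
# then the pointer keeps advancing and wraps, so it never has to be reset.
def fill_order(miss, k):
    n = 1 << k                         # 2**k words per block
    return [(miss + i) % n for i in range(n)]

# Miss on word 5 of an 8-word block: 5 first, then 6, 7, wrap to 0..4.
assert fill_order(5, k=3) == [5, 6, 7, 0, 1, 2, 3, 4]
```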
Cache with Writeback Buffers
[Figure: a writeback buffer sits between the cache and MM; a read path (R) from MM, write paths (W) for write-thru and write-back caches, plus a "special" path]
- Writeback buffer = fast registers
- Special: used with both types of caches, when a word has been written to the writeback buffer and then a cache miss occurs
- Three speeds are involved: cache speed, buffer speed, memory speed
Write-Thru Caches
- A write generated by the CPU writes into the cache and also deposits the write into the writeback buffer
  - Eventually written back to MM
- Delay perceived by the CPU: max(T_cache, T_WB)
  - T_cache: cache access time
  - T_WB: time to write into the writeback buffer
  - T_cache, T_WB < T_MM
Writeback Cache
- Writes go to the cache; modified victims are written to MM via the writeback buffer
- Delay perceived by the CPU = T_cache
- The "special" path is used on a miss, whether read or write
Cache Update Policies
- Keep the MM copy and the cache copy of a word (and hence a block) consistent
- Write-Thru (Store-Thru)
  - On a write hit, the copies in MM and cache are both updated simultaneously
  - No need to write back blocks selected as victims
  - Useful for multiprocessing systems (MM always has the latest copy)
  - If the cache fails, the MM copy can serve as a hot backup
  - Can slow down the CPU on writes (since MM updates take place at slower rates)
Write-Back (No Write-Thru)
- On a write hit, only the cache copy is updated, so writes on a cache hit are faster
- Dirty blocks selected as victims must be written back to MM
  - Dirty block: a block modified after being brought into the cache
- Requires a clean/dirty bit for every block
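The two update policies can be contrasted in a small sketch. This is illustrative only: each cache entry is modeled as a [data, dirty] pair, and the writeback buffer is left out, so eviction writes straight to MM.

```python
# Write-thru vs. write-back on a write hit (illustrative).
def write_hit(addr, word, cache, mm, write_thru):
    if write_thru:
        cache[addr] = [word, False]   # both copies updated simultaneously,
        mm[addr] = word               # so a future victim needs no write-back
    else:
        cache[addr] = [word, True]    # write-back: cache only, mark dirty

def evict(addr, cache, mm):
    data, dirty = cache.pop(addr)
    if dirty:                         # only dirty victims are written back
        mm[addr] = data

mm = {7: 0}
cache = {7: [0, False]}
write_hit(7, 99, cache, mm, write_thru=False)
assert mm[7] == 0                     # MM is stale under write-back...
evict(7, cache, mm)
assert mm[7] == 99                    # ...until the dirty victim is evicted
```

The dirty bit is what lets `evict` skip the MM write for clean blocks, which is the whole payoff of write-back.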
Allocation Policies
- WTWA (Write-Thru Write Allocate): allocate the missing block in the cache on both read and write misses
- WTNWA (Write-Thru No Write Allocate): don't allocate on a write miss; allocate only on a read miss
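The difference between the two policies shows up only on a write miss. A sketch under the same illustrative assumptions as before (block granularity, no capacity limit):

```python
# WTWA vs. WTNWA on a write miss (illustrative).
def write_miss(block, word, cache, mm, allocate):
    mm[block] = word                 # write-thru: MM is always updated
    if allocate:                     # WTWA: also install the block in the cache
        cache[block] = word          # WTNWA skips this step

mm, cache = {}, {}
write_miss(3, 42, cache, mm, allocate=True)     # WTWA
assert 3 in cache
write_miss(4, 43, cache, mm, allocate=False)    # WTNWA
assert 4 not in cache and mm[4] == 43
```

Under WTNWA a block that is only ever written never occupies cache space; WTWA bets that the written block will be read again soon (temporal locality).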