March R. Smith - University of St Thomas - Minnesota

ENGR 330: Today’s Class
Caches
Direct mapped cache
Set associative cache
Magic: fully associative cache
Four Questions/3 C’s

Caches
Make computers faster via bits of extra RAM
  – CPU “sees” RAM through the MAR and MDR
  – Cache sits “behind” the MAR/MDR providing the data
“Local” data is saved in the faster cache storage
  – Fast to retrieve
  – Handles most cases
“Other” data in the regular RAM
  – Slower to retrieve
  – Stays in the cache in case it’s used again soon

Direct Mapped Cache
The basis of today’s designs
  – A collection of high speed RAM locations
  – Broken into individually addressed “cache entries”
  – Part of the RAM address chooses the cache entry (“Direct mapping”)
A cache entry
  – “Index” is its address in the cache
  – Valid bit - true if the entry contains valid RAM data
  – “Tag” holds the upper address bits, the ones not used for the index or the offset within the block
  – Data area - where the stored data resides
Store multiple words per entry (spatial locality)

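The entry layout above can be sketched in a few lines of Python. This is only an illustration, not a hardware description: the names `CacheEntry`, `split_address`, and `is_hit` are made up for this sketch, and the field widths (4 offset bits, 6 index bits) are assumptions chosen to match a 64-entry cache with 16-byte blocks.

```python
from dataclasses import dataclass

OFFSET_BITS = 4   # assumption: 16-byte blocks
INDEX_BITS = 6    # assumption: 64 cache entries


@dataclass
class CacheEntry:
    valid: bool = False                    # true once the entry holds RAM data
    tag: int = 0                           # upper address bits
    data: bytes = bytes(1 << OFFSET_BITS)  # the stored block


def split_address(addr):
    """Split a RAM address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def is_hit(cache, addr):
    """A hit needs a valid entry at the index whose tag matches."""
    tag, index, _ = split_address(addr)
    entry = cache[index]
    return entry.valid and entry.tag == tag


# One entry per index value; every entry starts out invalid.
cache = [CacheEntry() for _ in range(1 << INDEX_BITS)]
```

The index picks exactly one entry, so a lookup costs a single tag comparison. That is what makes direct mapping cheap.
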
Example
32 bit RAM addresses
64 cache entries, each contains 16 bytes
How do we resolve cache addresses?
How big is the tag field?
How much RAM does it need, in bits, per entry?
How much for the whole cache?

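One way to work this example is shown below. It assumes byte addressing and exactly one valid bit per entry (no dirty bit or other bookkeeping); the variable names are just for this sketch.

```python
ADDRESS_BITS = 32
NUM_ENTRIES = 64
BLOCK_BYTES = 16

# Bits needed to pick a byte within a block, and an entry within the cache.
offset_bits = (BLOCK_BYTES - 1).bit_length()          # 4
index_bits = (NUM_ENTRIES - 1).bit_length()           # 6

# Whatever is left of the address has to be stored as the tag.
tag_bits = ADDRESS_BITS - index_bits - offset_bits    # 22

# Each entry stores: valid bit + tag + data block.
bits_per_entry = 1 + tag_bits + BLOCK_BYTES * 8       # 1 + 22 + 128 = 151
total_bits = NUM_ENTRIES * bits_per_entry             # 64 * 151 = 9664

print(offset_bits, index_bits, tag_bits, bits_per_entry, total_bits)
```

Under these assumptions the address splits into a 22-bit tag, a 6-bit index, and a 4-bit offset. The cache needs 9664 bits in all, of which 8192 are data.
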
CPU and Cache Handling
What happens with a cache hit?
What happens with a cache miss?
  – A stall, like a pipeline stall, but simpler
  – We stall the whole CPU - inefficient but it’s the best approach
How do we replace a word in the cache?
  – Pick one to replace
  – Option: pick at random
      Easy to implement
      Not always optimal
  – Option: LRU (least recently used)
      Usually a good choice, though not guaranteed optimal
      Hard to implement; usually just approximated

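A small software sketch of exact LRU replacement follows, for one group of entries that compete with each other. The class name `LRUSet` and the letter tags are made up for illustration. Real hardware usually approximates this ordering rather than tracking it exactly.

```python
from collections import OrderedDict


class LRUSet:
    """Holds up to `ways` blocks and evicts the least recently used one."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tags, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # now the most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)    # evict the least recently used
        self.blocks[tag] = None
        return False


s = LRUSet(ways=2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)   # [False, False, True, False, False]
```

In this trace the reference to C evicts B (A was touched more recently), so the final reference to B misses. A random policy would have kept B about half the time, which illustrates why LRU is not guaranteed to be the best choice.
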
What happens when we write data?
Option: write through
  – Update the cache, then do the RAM write in the ‘background’
  – Often needs a buffer to hold the data being written
  – The usual choice in caches
Option: write back
  – Save the updated data in the cache
  – Write data back only when replacing the word in the cache
  – Makes it much slower to replace a modified cache entry
      We have to wait for the write to finish

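The difference between the two policies can be seen by counting RAM write operations for a sequence of stores. The sketch below is a toy model with several assumptions: a direct mapped cache addressed by block number, stores that always allocate an entry, and no final flush of modified entries. The function name `count_ram_writes` is hypothetical.

```python
def count_ram_writes(stores, num_entries, policy):
    """Count RAM write operations for a list of stored-to block numbers."""
    entries = {}      # index -> (block number, dirty flag)
    ram_writes = 0
    for block in stores:
        index = block % num_entries
        resident = entries.get(index)
        if policy == "write-through":
            ram_writes += 1                    # every store also goes to RAM
            entries[index] = (block, False)
        else:                                  # "write-back"
            if resident is not None and resident[0] != block and resident[1]:
                ram_writes += 1                # write the modified victim out first
            entries[index] = (block, True)     # updated data lives only in the cache
    return ram_writes


stores = [0, 0, 0, 4, 0]   # blocks 0 and 4 share entry 0 in a 4-entry cache
print(count_ram_writes(stores, 4, "write-through"))   # 5
print(count_ram_writes(stores, 4, "write-back"))      # 2
```

Write back saves RAM traffic when the same block is stored to repeatedly, but each of its RAM writes happens during a replacement, which is where the extra delay comes from.
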
Set Associative Caches
That 2-way, 4-way, 8-way stuff
Provides multiple ‘hit’ entries per mapping
Problem:
  – Calculate size information for a set associative cache
Attributes
  – Address size
  – Block size
  – Number of lines
  – N-way

A specific problem
We are building an 8-way set associative cache to handle 32 bit addresses.
  – We will use 32 byte blocks.
  – We have 256K bytes of high speed RAM we can use for the data space.
  – How much extra space do we need for address tags? How large are the address tags in bits?
  – How many "valid" bits do we need?

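A possible worked solution is sketched below. It assumes byte addressing, that "256K bytes" means 256 × 1024 bytes, and one tag plus one valid bit per block; the variable names are chosen for this sketch.

```python
ADDRESS_BITS = 32
WAYS = 8
BLOCK_BYTES = 32
DATA_BYTES = 256 * 1024

num_blocks = DATA_BYTES // BLOCK_BYTES                 # 8192 blocks
num_sets = num_blocks // WAYS                          # 1024 sets of 8 blocks

offset_bits = (BLOCK_BYTES - 1).bit_length()           # 5
index_bits = (num_sets - 1).bit_length()               # 10: index selects a set
tag_bits = ADDRESS_BITS - index_bits - offset_bits     # 17

tag_storage_bits = num_blocks * tag_bits               # 8192 * 17 = 139264
tag_storage_bytes = tag_storage_bits // 8              # 17408 bytes (17 KB)
valid_bits = num_blocks                                # 8192, one per block

print(tag_bits, tag_storage_bits, tag_storage_bytes, valid_bits)
```

The index selects a set rather than a single block, so it is 3 bits shorter than it would be in a direct mapped cache of the same size, and each tag is 3 bits longer.
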
Fully associative cache
“Association list” approach
  – Accepts an address
  – Returns the data
Not a RAM: stores tags and data
  – Tag field = full address minus the bits that select a byte within the block
  – Data field = data block
Parallel tag field checking
  – Automatically matches, retrieves data with matching tag
  – Expensive in terms of logic

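In software, the "association list" behaves much like a dictionary keyed by tag. The sketch below is only an analogy: the hardware compares every stored tag in parallel, while a Python `dict` uses hashing. The class name, the 5-bit offset (32-byte blocks), and the evict-the-oldest rule in `fill` are all assumptions made for illustration.

```python
OFFSET_BITS = 5   # assumption: 32-byte blocks


class FullyAssociativeCache:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.lines = {}                  # tag -> data block

    def lookup(self, addr):
        """Return the addressed byte, or None on a miss."""
        tag = addr >> OFFSET_BITS        # no index field: everything above the offset
        offset = addr & ((1 << OFFSET_BITS) - 1)
        block = self.lines.get(tag)
        return None if block is None else block[offset]

    def fill(self, addr, block):
        """Store a block; any block can go anywhere."""
        tag = addr >> OFFSET_BITS
        if tag not in self.lines and len(self.lines) >= self.num_blocks:
            # Arbitrary choice for this sketch: evict the oldest inserted block.
            self.lines.pop(next(iter(self.lines)))
        self.lines[tag] = block


c = FullyAssociativeCache(num_blocks=2)
c.fill(0x1000, bytes(range(32)))
print(c.lookup(0x1003))   # 3
print(c.lookup(0x2000))   # None
```
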
Four Questions
General framework for memory hierarchies
1. Where can a block be placed?
  – Different schemes have different restrictions
  – Some have no restrictions (fully associative)
2. How is a block found?
  – Fully associative - logic does all the work in one cycle
  – Direct addressing does much of the work
3. How do we choose a block to replace?
  – Option: Randomly
  – Option: LRU
4. What happens during a write?
  – Write-back
  – Write-through

Types of Misses (Three C’s)
Compulsory misses or Cold start misses
  – When a block is first accessed by the program
  – Impossible to eliminate these
  – Right block size can reduce the number
Capacity misses
  – Cache can’t contain all blocks needed by the program
  – i.e. the program keeps pulling blocks back in after they’ve been replaced by other referenced blocks
  – Suggests the cache isn’t big enough
Conflict misses or Collision misses
  – When multiple blocks compete for the same set/location
  – Happens in set associative and direct mapped
  – Doesn’t happen in fully associative cache

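The three categories can be told apart by running the same reference trace through a direct mapped cache and a fully associative cache of equal size. The sketch below is one common way to do the classification, under stated assumptions: the trace is a list of block numbers, the fully associative reference cache uses LRU, and the function name `classify_misses` is made up for this example.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Classify each direct mapped miss as compulsory, capacity, or conflict."""
    seen = set()            # every block referenced so far
    direct = {}             # direct mapped cache: index -> block
    full = OrderedDict()    # fully associative LRU cache of the same size
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Reference the fully associative cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)
            full[block] = None

        # Reference the direct mapped cache and classify any miss.
        index = block % num_blocks
        if direct.get(index) != block:
            direct[index] = block
            if block not in seen:
                counts["compulsory"] += 1   # first access ever
            elif not full_hit:
                counts["capacity"] += 1     # even full associativity would miss
            else:
                counts["conflict"] += 1     # only the mapping restriction hurt
        seen.add(block)
    return counts


print(classify_misses([0, 4, 0, 4], num_blocks=4))
print(classify_misses([0, 1, 2, 3, 4, 0], num_blocks=4))
```

In the first trace, blocks 0 and 4 fight over entry 0 even though the cache is mostly empty, giving two conflict misses after the two compulsory ones. In the second, five distinct blocks do not fit in four entries, so the return to block 0 counts as a capacity miss.
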
That’s it.
Questions?

Creative Commons License
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.