© 2008 Wayne Wolf. Overheads for Computers as Components, 2nd ed.
CPUs
• Caches.
• Memory management.
Caches and CPUs
[Figure: CPU, cache controller, cache, and main memory, linked by address and data paths.]
Cache operation
• Many main memory locations are mapped onto one cache entry.
• May have caches for:
  - instructions;
  - data;
  - data + instructions (unified).
• Memory access time is no longer deterministic.
Terms
• Cache hit: required location is in cache.
• Cache miss: required location is not in cache.
• Working set: set of locations used by program in a time interval.
Types of misses
• Compulsory (cold): location has never been accessed.
• Capacity: working set is too large.
• Conflict: multiple locations in working set map to same cache entry.
Memory system performance
• $h$ = cache hit rate.
• $t_{cache}$ = cache access time, $t_{main}$ = main memory access time.
• Average memory access time:
  - $t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
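The average access time formula can be tried out numerically. The sketch below is illustrative only: the function name and the cycle counts (1-cycle cache, 50-cycle main memory) are assumptions, not figures from the slides.

```python
def average_access_time(h, t_cache, t_main):
    """Average memory access time: t_av = h*t_cache + (1 - h)*t_main."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return h * t_cache + (1.0 - h) * t_main


# Hypothetical timings: cache = 1 cycle, main memory = 50 cycles.
for hit_rate in (0.80, 0.90, 0.99):
    print(hit_rate, average_access_time(hit_rate, 1, 50))
```

Even a few percent of misses dominates the average when main memory is much slower than the cache.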
Multiple levels of cache
[Figure: CPU, L1 cache, and L2 cache connected in sequence.]
Multi-level cache access time
• $h_1$ = L1 cache hit rate.
• $h_2$ = fraction of accesses that hit in either L1 or L2 (so $h_2 \ge h_1$).
• Average memory access time:
  - $t_{av} = h_1\,t_{L1} + (h_2 - h_1)\,t_{L2} + (1 - h_2)\,t_{main}$
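A minimal sketch of the two-level average, under one stated assumption: h2 counts every access satisfied by L1 or L2 (a cumulative rate), so the three weights h1, h2 - h1, and 1 - h2 add up to 1. The function name and the timings are hypothetical.

```python
def two_level_access_time(h1, h2, t_l1, t_l2, t_main):
    """Average access time for an L1 + L2 hierarchy.

    Assumption: h2 is cumulative (hit in L1 or L2), so h1 <= h2 <= 1.
    """
    if not 0.0 <= h1 <= h2 <= 1.0:
        raise ValueError("need 0 <= h1 <= h2 <= 1")
    w_l1 = h1          # accesses served by L1
    w_l2 = h2 - h1     # accesses that miss L1 but hit L2
    w_main = 1.0 - h2  # accesses that miss both levels
    return w_l1 * t_l1 + w_l2 * t_l2 + w_main * t_main


# Hypothetical timings: L1 = 1 cycle, L2 = 10 cycles, main memory = 100 cycles.
print(two_level_access_time(0.5, 0.75, 1, 10, 100))
```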
Replacement policies
• Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
• Two popular strategies:
  - Random.
  - Least-recently used (LRU).
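To make LRU concrete, here is a small software model of a fully-associative cache that evicts the least-recently used entry. It is only a sketch: the class name, the two-entry size, and the access trace are made up for illustration, and real hardware tracks recency very differently.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully-associative cache with LRU replacement."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = OrderedDict()  # least recently used entry comes first

    def access(self, address):
        """Return True on a hit, False on a miss (the location is then loaded)."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return True
        if len(self.entries) >= self.num_entries:
            self.entries.popitem(last=False)   # throw out the LRU entry
        self.entries[address] = None
        return False


cache = LRUCache(num_entries=2)
trace = [1, 2, 1, 3, 2]
results = [cache.access(a) for a in trace]
print(results)  # [False, False, True, False, False]
```

A random policy would replace the `popitem` call with the removal of a randomly chosen key.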
Cache organizations
• Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
• Direct-mapped: each memory location maps onto exactly one cache entry.
• N-way set-associative: each memory location can be stored in any one of n candidate entries, one in each of n direct-mapped banks (called sets here).
Cache performance benefits
• Keep frequently-accessed locations in fast cache.
• Cache retrieves more than one word at a time.
  - Sequential accesses are faster after first access.
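Direct-mapped cache
[Figure: the address is split into tag, index, and offset fields. The index selects a cache block holding a valid bit (1), a tag (0xabcd), and data bytes. The stored tag is compared (=) with the address tag to produce the hit signal; the offset selects the value from the block.]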
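The tag/index/offset split can be sketched with shifts and masks. The field widths below (5 offset bits for 32-byte blocks, 8 index bits for 256 blocks) and all names are assumptions chosen for illustration; the slide does not give sizes.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def is_hit(blocks, addr, offset_bits, index_bits):
    """blocks[index] is a (valid, tag) pair; hit if valid and the tags match."""
    tag, index, _ = split_address(addr, offset_bits, index_bits)
    valid, stored_tag = blocks[index]
    return valid and stored_tag == tag


# Assumed geometry: 256 blocks of 32 bytes each.
OFFSET_BITS, INDEX_BITS = 5, 8
blocks = [(False, 0)] * (1 << INDEX_BITS)

tag, index, offset = split_address(0xABCD1234, OFFSET_BITS, INDEX_BITS)
blocks[index] = (True, tag)  # load the block
print(hex(tag), hex(index), hex(offset))
print(is_hit(blocks, 0xABCD1234, OFFSET_BITS, INDEX_BITS))
```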
Write operations
• Write-through: immediately copy write to main memory.
• Write-back: write to main memory only when location is removed from cache.
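The difference between the two policies shows up in how many writes reach main memory. The sketch below models a single cache line with a dirty bit; the class, its fields, and the write trace are hypothetical and exist only to count memory writes.

```python
class OneLineCache:
    """A single cache line, enough to contrast the two write policies."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.address = None     # location currently held in the line
        self.dirty = False      # only meaningful for write-back
        self.memory_writes = 0  # writes that reached main memory

    def _evict(self):
        if self.write_back and self.dirty:
            self.memory_writes += 1  # copy the modified line back
        self.dirty = False

    def write(self, address):
        if self.address != address:
            self._evict()
            self.address = address
        if self.write_back:
            self.dirty = True        # defer the memory update
        else:
            self.memory_writes += 1  # write-through: update memory now

    def flush(self):
        self._evict()
        self.address = None


def count_memory_writes(write_back):
    cache = OneLineCache(write_back)
    for _ in range(10):
        cache.write(0)   # ten writes to the same location
    cache.write(64)      # a write to a different location forces removal
    cache.flush()
    return cache.memory_writes


print(count_memory_writes(write_back=False))  # 11
print(count_memory_writes(write_back=True))   # 2
```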
Direct-mapped cache locations
• Many locations map onto the same cache block.
• Conflict misses are easy to generate:
  - Array a[] uses locations 0, 1, 2, …
  - Array b[] uses locations 1024, 1025, 1026, …
  - Operation a[i] + b[i] generates conflict misses.
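The a[i] + b[i] case can be reproduced with a tiny direct-mapped cache model. The geometry is an assumption (256 blocks of 4 words, so 1024 words in total; the slide does not state the cache size), and the function names are hypothetical. With b[] at location 1024 the two arrays share cache blocks and every access misses; shifting b[] by one block removes the conflicts.

```python
def count_misses(trace, num_blocks, block_words):
    """Count misses for a word-address trace on a direct-mapped cache."""
    tags = [None] * num_blocks  # None marks an empty (invalid) block
    misses = 0
    for addr in trace:
        block = addr // block_words
        index = block % num_blocks
        tag = block // num_blocks
        if tags[index] != tag:
            misses += 1
            tags[index] = tag   # replace whatever was there
    return misses


def sum_trace(a_base, b_base, n):
    """Addresses touched by a[i] + b[i] for i = 0 .. n-1."""
    trace = []
    for i in range(n):
        trace += [a_base + i, b_base + i]
    return trace


# Assumed cache: 256 blocks x 4 words = 1024 words.
print(count_misses(sum_trace(0, 1024, 16), 256, 4))  # 32: every access misses
print(count_misses(sum_trace(0, 1028, 16), 256, 4))  # 8: one miss per block
```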
Set-associative cache
• A set of direct-mapped caches:
[Figure: Set 1, Set 2, ..., Set n accessed in parallel, producing the hit and data outputs.]
Example: direct-mapped vs. set-associative
Direct-mapped cache behavior
• After 001 access: [table columns: block, tag, data]
• After 010 access: [table columns: block, tag, data]
Direct-mapped cache behavior, cont’d.
• After 011 access: [table columns: block, tag, data]
• After 100 access: [table columns: block, tag, data]
Direct-mapped cache behavior, cont’d.
• After 101 access: [table columns: block, tag, data]
• After 111 access: [table columns: block, tag, data]
2-way set-associative cache behavior
• Final state of cache (twice as big as direct-mapped): [table columns: set, blk 0 tag, blk 0 data, blk 1 tag, blk 1 data]
2-way set-associative cache behavior
• Final state of cache (same size as direct-mapped): [table columns: set, blk 0 tag, blk 0 data, blk 1 tag, blk 1 data]
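A set-associative cache of the same total size can be modeled by letting each index hold several tags. The sketch below is an illustration under assumptions: LRU replacement within each index, 4-word blocks, and a hypothetical trace in which two arrays sit 1024 words apart. With one block per index it behaves like a direct-mapped cache; with two, both arrays fit.

```python
from collections import OrderedDict


def count_misses_set_assoc(trace, num_sets, ways, block_words):
    """Count misses for a word-address trace; LRU replacement within each set."""
    sets = [OrderedDict() for _ in range(num_sets)]  # tags, LRU first
    misses = 0
    for addr in trace:
        block = addr // block_words
        entries = sets[block % num_sets]
        tag = block // num_sets
        if tag in entries:
            entries.move_to_end(tag)          # hit: mark most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)   # evict the LRU block in this set
            entries[tag] = None
    return misses


# Hypothetical trace: a[i] at word i, b[i] at word 1024 + i, accessed in turn.
trace = [addr for i in range(16) for addr in (i, 1024 + i)]

# Same total size in both cases: 1024 words.
print(count_misses_set_assoc(trace, num_sets=256, ways=1, block_words=4))  # 32
print(count_misses_set_assoc(trace, num_sets=128, ways=2, block_words=4))  # 8
```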
Example caches
• StrongARM:
  - 16 Kbyte, 32-way, 32-byte block instruction cache.
  - 16 Kbyte, 32-way, 32-byte block data cache (write-back).
• C55x:
  - Various models have 16 Kbyte, 24 Kbyte cache.
  - Can be used as scratch pad memory.
Scratch pad memories
• Alternative to cache:
  - Software determines what is stored in scratch pad.
• Provides predictable behavior at the cost of software control.
• C55x cache can be configured as scratch pad.
Memory management units
• Memory management unit (MMU) translates addresses:
[Figure: the CPU issues a logical address to the memory management unit, which sends a physical address to main memory.]
Memory management tasks
• Allows programs to move in physical memory during execution.
• Allows virtual memory:
  - memory images kept in secondary storage;
  - images returned to main memory on demand during execution.
• Page fault: request for location not resident in memory.
Address translation
• Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
• Two basic schemes:
  - segmented;
  - paged.
• Segmentation and paging can be combined (x86).
Segments and pages
[Figure: a memory containing segment 1 and segment 2, and page 1 and page 2.]
Segment address translation
[Figure: the segment base address is added (+) to the logical address to form the physical address. A range check against the segment lower bound and segment upper bound signals a range error.]
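Segment translation amounts to an add followed by a bounds check. In this sketch the exception class, the function signature, and the example addresses are all hypothetical; the bounds are treated as inclusive physical addresses, which is an assumption.

```python
class RangeError(Exception):
    """Raised when the translated address falls outside the segment."""


def translate_segment(base, logical, lower_bound, upper_bound):
    """Add the segment base to the logical address, then range-check it.

    Assumption: lower_bound and upper_bound are inclusive physical addresses.
    """
    physical = base + logical
    if not lower_bound <= physical <= upper_bound:
        raise RangeError(hex(physical))
    return physical


# Hypothetical segment occupying physical addresses 0x4000 .. 0x4FFF.
print(hex(translate_segment(0x4000, 0x10, 0x4000, 0x4FFF)))  # 0x4010
```

An access past the end of the segment, such as a logical address of 0x1000 here, raises `RangeError` instead of returning an address.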
Page address translation
[Figure: the logical address is split into page and offset. The page number selects the page i base, which is concatenated with the offset to form the physical address.]
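Page translation replaces the page number with a page base and keeps the offset unchanged. The sketch assumes 4 Kbyte pages (12 offset bits) and models the page table as a Python dict from page number to physical page number; the names and the example mapping are hypothetical.

```python
PAGE_BITS = 12  # assumption: 4 Kbyte pages


class PageFault(Exception):
    """Raised when the requested page is not resident in memory."""


def translate_page(page_table, logical):
    """Look up the page number, then concatenate the page base with the offset."""
    page = logical >> PAGE_BITS
    offset = logical & ((1 << PAGE_BITS) - 1)
    if page not in page_table:
        raise PageFault(hex(page))
    frame = page_table[page]            # physical page number
    return (frame << PAGE_BITS) | offset


# Hypothetical mapping: logical page 0x12 lives in physical page 0x80.
page_table = {0x12: 0x80}
print(hex(translate_page(page_table, 0x12345)))  # 0x80345
```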
Page table organizations
[Figure: flat and tree organizations, each leading to a page descriptor.]
Caching address translations
• Large translation tables require main memory access.
• TLB (translation lookaside buffer): cache for address translation.
  - Typically small.
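A TLB can be modeled as a small cache placed in front of the page table. This sketch makes several assumptions that are not in the slides: LRU replacement, four entries, 4 Kbyte pages, and a dict standing in for the in-memory translation table. It counts how many translations avoid the table lookup.

```python
from collections import OrderedDict


class TLB:
    """Toy translation cache in front of a page table (a dict here)."""

    def __init__(self, page_table, entries=4, page_bits=12):
        self.page_table = page_table
        self.entries = entries
        self.page_bits = page_bits
        self.cache = OrderedDict()  # page -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, logical):
        page = logical >> self.page_bits
        offset = logical & ((1 << self.page_bits) - 1)
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)
        else:
            self.misses += 1
            frame = self.page_table[page]  # stands in for a main memory access
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)
            self.cache[page] = frame
        return (self.cache[page] << self.page_bits) | offset


tlb = TLB({0: 5, 1: 9})
for addr in (0x0000, 0x0004, 0x1000, 0x0008):
    tlb.translate(addr)
print(tlb.hits, tlb.misses)  # 2 2
```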
ARM memory management
• Memory region types:
  - section: 1 Mbyte block;
  - large page: 64 Kbytes;
  - small page: 4 Kbytes.
• An address is marked as section-mapped or page-mapped.
• Two-level translation scheme.
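ARM address translation
[Figure: the logical address is split into 1st index, 2nd index, and offset. The translation table base register and the 1st index select a descriptor in the 1st-level table; that descriptor and the 2nd index select a descriptor in the 2nd-level table; the result is concatenated with the offset to form the physical address.]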
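The two-level scheme can be sketched in software. The field widths here are an assumption based on the region sizes above: a 12-bit 1st index (one entry per 1 Mbyte), an 8-bit 2nd index, and a 12-bit offset for 4 Kbyte small pages, with a 20-bit offset for sections. The tables are plain dicts with a made-up descriptor format, and large pages are left out.

```python
def arm_translate(first_level, vaddr):
    """Toy two-level translation for sections and small pages only.

    first_level maps a 1st index to ("section", base) or
    ("page", second_level), where second_level maps a 2nd index
    to a page base. This descriptor layout is hypothetical.
    """
    index1 = vaddr >> 20                      # selects a 1 Mbyte region
    kind, target = first_level[index1]
    if kind == "section":
        return target | (vaddr & 0xFFFFF)     # base + 20-bit offset
    index2 = (vaddr >> 12) & 0xFF             # selects a 4 Kbyte page
    page_base = target[index2]
    return page_base | (vaddr & 0xFFF)        # base + 12-bit offset


# Hypothetical tables: one section-mapped region, one page-mapped region.
first_level = {
    0x001: ("section", 0x80000000),
    0x002: ("page", {0x34: 0x40005000}),
}
print(hex(arm_translate(first_level, 0x00123456)))  # 0x80023456
print(hex(arm_translate(first_level, 0x00234567)))  # 0x40005567
```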