Caches in Systems COMP25212 Cache 4
Learning Objectives
To understand:
–“3 x C’s” model of cache performance
–Time penalties for starting with an empty cache
–Systems interconnect issues with caching
–and solutions!
–Caching and Virtual Memory
Describing Cache Misses
–Compulsory Misses
–Capacity Misses
–Conflict Misses
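One common way to make the three categories concrete is to replay an address trace through a small model. The sketch below is illustrative only: it assumes a direct-mapped cache, 64-byte lines, and the usual textbook definitions (compulsory = first reference to a block; capacity = would also miss in a fully associative LRU cache of the same size; conflict = any other miss). The function name `classify_misses` and its parameters are hypothetical, not from the slides.

```python
from collections import OrderedDict

def classify_misses(trace, num_lines, line_size=64):
    """Count hits and the three kinds of miss for a direct-mapped cache."""
    seen = set()            # every block referenced so far
    direct = {}             # index -> block held by the direct-mapped cache
    full = OrderedDict()    # reference model: fully associative, LRU, same size
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in trace:
        block = addr // line_size
        # Update the fully associative reference model first.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_lines:
                full.popitem(last=False)   # evict least recently used block
            full[block] = True
        # Now the cache under study.
        index = block % num_lines
        if direct.get(index) == block:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # never referenced before
        elif not full_hit:
            counts["capacity"] += 1        # even full associativity would miss
        else:
            counts["conflict"] += 1        # only the mapping caused the miss
        direct[index] = block
        seen.add(block)
    return counts

# Two blocks that map to the same index of a 4-line cache keep evicting each other.
print(classify_misses([0, 256, 0, 256], num_lines=4))
```

In the example trace, the first two references are compulsory misses and the last two are conflict misses, because a fully associative cache with four lines would have held both blocks.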
Cache Performance again
Today’s caches, how long does it take:
a) to fill L3 cache? (8MB)
b) to fill L2 cache? (256KB)
c) to fill L1 D cache? (32KB)
Number of lines = (cache size) / (line size)
For L1 D: number of lines = 32K/64 = 512
512 x memory access time at 20 ns ≈ 10 µs
= about 20,000 clock cycles at 2 GHz
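The same arithmetic can be repeated for all three cache levels. This is a rough sketch under the slide's figures (64-byte lines, 20 ns per memory access, 2 GHz clock); it assumes every level uses 64-byte lines and that each line fill costs exactly one serial memory access, which ignores bursts and overlap. The helper name `fill_time` is hypothetical.

```python
def fill_time(cache_bytes, line_bytes=64, access_ns=20, clock_ghz=2.0):
    """Return (number of lines, fill time in ns, fill time in clock cycles)."""
    lines = cache_bytes // line_bytes
    time_ns = lines * access_ns      # one memory access per line, no overlap
    cycles = time_ns * clock_ghz     # cycles per ns equals the clock rate in GHz
    return lines, time_ns, cycles

for name, size in [("L1 D", 32 * 1024), ("L2", 256 * 1024), ("L3", 8 * 1024 * 1024)]:
    lines, time_ns, cycles = fill_time(size)
    print(f"{name}: {lines} lines, {time_ns / 1000:.1f} us, {cycles:.0f} cycles")
```

Under these assumptions the L1 D cache takes about 10 µs to fill, the L2 about 82 µs, and the L3 about 2.6 ms.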
Caches in Systems
[Diagram: on-chip CPU with L1 Inst Cache (fetch) and L1 Data Cache (data), backed by L2; an interconnect links the chip to RAM Memory and to Input/Output (e.g. disk, network). Annotation: how often?]
Cache Consistency Problem 1
[Diagram: same system; Input/Output writes to RAM Memory across the interconnect.]
Problem:
–I/O writes to mem; cache outdated
Cache Consistency Problem 2
[Diagram: same system; Input/Output reads from RAM Memory across the interconnect.]
Problem:
–I/O reads mem; cache holds newer
Cache Consistency Software Solutions
O/S knows where I/O takes place in memory
–Mark I/O areas as non-cachable (how?)
O/S knows when I/O starts and finishes
–Clear caches before & after I/O?
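The "clear caches before & after I/O" idea can be illustrated with a toy model. The sketch below assumes a write-back cache in front of a Python dict standing in for RAM; the class `WriteBackCache` and its `flush` and `invalidate` methods are hypothetical names for the two operations an O/S would request, not a real API.

```python
class WriteBackCache:
    """Toy write-back cache in front of a dict that stands in for RAM."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}                      # addr -> (value, dirty)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from RAM
            self.lines[addr] = (self.ram[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = (value, True)     # RAM is not updated yet

    def flush(self):
        """Write dirty lines back to RAM (before a device reads memory)."""
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.ram[addr] = value
                self.lines[addr] = (value, False)

    def invalidate(self):
        """Discard all cached lines (after a device has written memory)."""
        self.lines.clear()

ram = {0x100: 1}
cache = WriteBackCache(ram)

cache.write(0x100, 2)        # cache now holds newer data than RAM (problem 2)
cache.flush()                # O/S cleans the cache before the I/O starts
device_sees = ram[0x100]     # the device reads the up-to-date value

ram[0x100] = 3               # device writes RAM behind the cache (problem 1)
stale = cache.read(0x100)    # the cache still returns the old value
cache.invalidate()           # O/S clears the cache after the I/O finishes
fresh = cache.read(0x100)    # the next read misses and fetches the new value
print(device_sees, stale, fresh)
```

Flushing and invalidating the whole cache is the bluntest form of the fix; the question mark on the slide hints at its cost, since every later access starts from a cold cache.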
Hardware Solutions: 1
[Diagram: same system as on the previous slides.]
Unfortunately: tends to slow down cache
Hardware Solutions: 2 - Snooping
[Diagram: same system, with snoop logic marked on the cache.]
–Snoop logic in cache observes every memory cycle
–L2 keeps track of L1 contents
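A toy model may help show what the snoop logic has to do on each observed memory cycle. The sketch below assumes one particular policy (invalidate the cached copy when another device writes, write back dirty data when another device reads); the slide does not specify a policy, and the names `SnoopingCache` and `Bus` are hypothetical.

```python
class SnoopingCache:
    """Toy write-back cache that watches I/O traffic on a shared bus."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}                          # addr -> (value, dirty)

    def cpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.ram[addr], False)
        return self.lines[addr][0]

    def cpu_write(self, addr, value):
        self.lines[addr] = (value, True)

    def snoop_write(self, addr):
        """Another bus master writes addr: drop our (now outdated) copy."""
        self.lines.pop(addr, None)

    def snoop_read(self, addr):
        """Another bus master reads addr: write back our newer copy first."""
        line = self.lines.get(addr)
        if line is not None and line[1]:
            self.ram[addr] = line[0]
            self.lines[addr] = (line[0], False)

class Bus:
    """Every I/O memory cycle is shown to the snoopers before it completes."""

    def __init__(self, ram, snoopers):
        self.ram = ram
        self.snoopers = snoopers

    def io_write(self, addr, value):
        for cache in self.snoopers:
            cache.snoop_write(addr)
        self.ram[addr] = value

    def io_read(self, addr):
        for cache in self.snoopers:
            cache.snoop_read(addr)
        return self.ram[addr]

ram = {0: 1}
cache = SnoopingCache(ram)
bus = Bus(ram, [cache])

cache.cpu_read(0)                 # line is now cached
bus.io_write(0, 5)                # snoop invalidates the cached copy
after_io_write = cache.cpu_read(0)
cache.cpu_write(0, 9)             # cache holds newer data than RAM
after_cpu_write = bus.io_read(0)  # snoop forces a write-back first
print(after_io_write, after_cpu_write)
```

Unlike the software approach, no whole-cache clearing is needed: only the lines whose addresses appear on the bus are touched.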
Caches and Virtual Addresses
–CPU addresses: virtual
–Memory addresses: physical
–Recap: use TLB to translate v-to-p
–What addresses in cache?
Option 1: Cache by Physical Addresses
[Diagram: on-chip CPU, TLB and cache ($) with address and data paths to RAM Memory; the TLB sits between the CPU and the cache.]
BUT:
–Address translation in series with cache: SLOW
Option 2: Cache by Virtual Addresses
[Diagram: on-chip CPU, cache ($) and TLB with address and data paths to RAM Memory; the cache sits between the CPU and the TLB.]
BUT:
–Snooping?
–Aliasing?
More Functional Difficulties
Option 3: Translate in parallel with Cache Lookup
–Translation only affects high-order bits of address
–Address within page remains unchanged
–Low-order bits of Physical Address = low-order bits of Virtual Address
–Select “index” field of cache address from within low-order bits
–Only “Tag” bits changed by translation
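The constraint in the list above (the index field must lie within the untranslated low-order bits) can be checked numerically. The sketch below assumes power-of-two sizes and, for the example calls, a 4 KB page and a choice of associativity; neither figure comes from the slides, and the function name is hypothetical.

```python
def index_fits_in_page_offset(cache_bytes, line_bytes, ways, page_bytes):
    """True if the within-line and index bits all lie inside the page offset."""
    sets = cache_bytes // (line_bytes * ways)
    within_line_bits = (line_bytes - 1).bit_length()   # e.g. 64 B  -> 6 bits
    index_bits = (sets - 1).bit_length()               # e.g. 64 sets -> 6 bits
    page_offset_bits = (page_bytes - 1).bit_length()   # e.g. 4 KB  -> 12 bits
    return within_line_bits + index_bits <= page_offset_bits

# 32 KB cache, 64-byte lines, assumed 4 KB pages.
print(index_fits_in_page_offset(32 * 1024, 64, ways=8, page_bytes=4096))
print(index_fits_in_page_offset(32 * 1024, 64, ways=1, page_bytes=4096))
```

With these assumed figures an 8-way cache satisfies the constraint (6 + 6 = 12 bits), while a direct-mapped cache of the same size does not (6 + 9 = 15 bits), which is one reason associativity and page size limit how large such a cache can be.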
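Option 3 in operation:
[Diagram: the virtual address is split into virtual page no, index and within-line fields. The index selects a cache line (tag + data) while the TLB translates the virtual page no into the physical address; the stored tag is compared (= ?) with the physical address to give Hit?, and a multiplexer uses the within-line field to select Data from the line.]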
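The lookup can be sketched in software, with the caveat that the cache read and the TLB translation happen one after the other in Python but in parallel in hardware. The sketch assumes a direct-mapped cache whose size equals one assumed 4 KB page (64 sets of 64-byte lines), so the physical tag is exactly the physical page number; TLB misses are not modelled, and all names are hypothetical.

```python
PAGE_BYTES = 4096    # assumed page size
LINE_BYTES = 64
NUM_SETS = 64        # 64 x 64 B = 4 KB, so the index lies within the page offset

def lookup(vaddr, tlb, cache):
    """Return (hit, byte) for a virtually indexed, physically tagged lookup.

    tlb:   dict mapping virtual page number -> physical page number
    cache: list of NUM_SETS entries, each None or (physical_tag, line_data)
    """
    within_line = vaddr % LINE_BYTES
    index = (vaddr // LINE_BYTES) % NUM_SETS   # taken from untranslated bits
    virtual_page = vaddr // PAGE_BYTES

    entry = cache[index]               # cache read (in hardware: in parallel with...)
    physical_page = tlb[virtual_page]  # ...the TLB translation

    # Compare the stored physical tag with the translated page number.
    if entry is not None and entry[0] == physical_page:
        return True, entry[1][within_line]     # multiplexer selects the byte
    return False, None

tlb = {5: 9}                                   # virtual page 5 -> physical page 9
cache = [None] * NUM_SETS
cache[3] = (9, bytes(range(LINE_BYTES)))       # line 3 holds data from physical page 9
vaddr = 5 * PAGE_BYTES + 3 * LINE_BYTES + 7
print(lookup(vaddr, tlb, cache))
```

Because the index never depends on the translated bits, the tag comparison is the only step that has to wait for the TLB.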
The Last Word on Caching?
[Diagram: three chips, each containing two CPUs with their own L1 Inst Cache (fetch), L1 Data Cache (data) and L2, sharing an on-chip L3; the chips connect to RAM Memory and Input/Output.]
You ain’t seen nothing yet!