COMP SYSTEM ARCHITECTURE
CACHES IN SYSTEMS
Sergio Davies
Feb/Mar 2014, COMP25212 – Lecture 4
Learning Objectives
To understand:
– The "3 Cs" model of cache performance
– Time penalties for starting with an empty cache
– Systems interconnect issues with caching, and solutions
– Caching and virtual memory
Describing Cache Misses
Compulsory misses
– Cold start: the first access to a block always misses
Capacity misses
– Even with full associativity, the cache cannot contain all the blocks the program uses
Conflict misses
– Multiple blocks compete for the same set; these would not happen in a fully associative cache
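The three categories can be seen by running the same access pattern through two toy caches of equal capacity. This is an illustrative sketch, not from the lecture: a direct-mapped cache and a fully associative (LRU) cache, showing conflict misses vanishing when associativity is full.

```python
# Count misses for one access pattern under a direct-mapped cache and a
# fully associative cache of the same size (in lines), to separate
# conflict misses from compulsory misses.

def direct_mapped_misses(addresses, num_lines):
    # One block per set: block number a maps to set a % num_lines.
    lines = [None] * num_lines
    misses = 0
    for a in addresses:
        idx = a % num_lines
        if lines[idx] != a:
            misses += 1
            lines[idx] = a
    return misses

def fully_associative_misses(addresses, num_lines):
    # Any block can occupy any line; LRU replacement.
    lines = []
    misses = 0
    for a in addresses:
        if a in lines:
            lines.remove(a)      # refresh LRU position
        else:
            misses += 1
            if len(lines) == num_lines:
                lines.pop(0)     # evict least recently used
        lines.append(a)
    return misses

# Blocks 0 and 4 both map to set 0 of a 4-line direct-mapped cache,
# so alternating between them misses every time.
pattern = [0, 4] * 8
print(direct_mapped_misses(pattern, 4))      # 16: 2 compulsory + 14 conflict
print(fully_associative_misses(pattern, 4))  # 2: compulsory only
```

With four lines the fully associative cache holds both blocks comfortably, so only the two cold-start misses remain.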
Cache Performance Again
With today's caches, how long does it take:
a) to fill the L3 cache? (8 MB)
b) to fill the L2 cache? (256 KB)
c) to fill the L1 data cache? (32 KB)
e.g. for (c), with 64-byte lines:
Number of lines = (cache size) / (line size) = 32K / 64 = 512
512 memory accesses at 20 ns each ≈ 10 µs
= 20,000 clock cycles at 2 GHz
Caches in Systems
[Diagram: on-chip CPU with an L1 instruction cache (fetch) and an L1 data cache (data), backed by a shared L2, connected through the system interconnect to RAM memory and input/output devices, e.g. disk, network]
How often is each path used? (bandwidth required)
Cache Consistency Problem 1
Problem: I/O writes to memory; the cache is now outdated
[Diagram: same system as before; RAM holds the new value "5" written by I/O, while the data cache still holds the stale value "3"]
Cache Consistency Problem 2
Problem: I/O reads memory; the cache holds a newer value
[Diagram: the data cache holds the new value "5", while RAM still holds the stale value "3" that the I/O device will read]
Cache Consistency: Software Solutions
The O/S knows where I/O takes place in memory
– Mark the I/O areas as non-cacheable (how?)
The O/S knows when I/O starts and finishes
– Clear (flush/invalidate) the caches before and after I/O?
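The flush-around-I/O idea can be modelled with a toy cache. The class and method names below are illustrative, not a real O/S API: the point is that invalidating the cache after the I/O transfer forces the next read to fetch the fresh value from memory.

```python
# Toy model of the software solution to consistency problem 1:
# an I/O device writes memory behind the cache's back, and the
# O/S invalidates the cache after the I/O completes.

class ToyCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}            # address -> cached value

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def invalidate(self):
        self.lines.clear()         # "clear caches before & after I/O"

memory = {0x100: 3}
cache = ToyCache(memory)
print(cache.read(0x100))   # 3 — now cached

memory[0x100] = 5          # I/O device writes memory directly
print(cache.read(0x100))   # still 3: the stale cached copy (Problem 1)

cache.invalidate()         # O/S clears the cache after I/O finishes
print(cache.read(0x100))   # 5: fresh copy fetched from memory
```

The cost of this approach is visible in the model too: invalidation throws away every line, including ones the I/O never touched.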
Hardware Solutions: 1
[Diagram: route I/O traffic through the cache hierarchy, so the cache always sees the current value ("5")]
Disadvantage: tends to slow down the cache
Hardware Solutions: 2 – Snooping
Snoop logic in the cache observes every memory cycle
L2 keeps track of L1 contents
[Diagram: snoop logic attached to the cache watches the interconnect between CPU, L2, RAM and input/output, invalidating a stale line when it sees a matching write ("5")]
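The snooping idea can be sketched as follows. This is an illustrative model, not the lecture's hardware: every memory write travels over a bus object, and the cache's snoop logic invalidates any line it holds for the written address.

```python
# Sketch of snooping: the cache observes every bus write and drops
# its own copy of that address, so the next read refetches fresh data.

class SnoopingCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}            # address -> cached value

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def snoop_write(self, addr):
        # Snoop logic sees a write on the bus; discard our stale copy.
        self.lines.pop(addr, None)

class Bus:
    """All memory writes go over the bus, so caches can observe them."""
    def __init__(self, memory, snoopers):
        self.memory = memory
        self.snoopers = snoopers

    def io_write(self, addr, value):
        self.memory[addr] = value
        for cache in self.snoopers:
            cache.snoop_write(addr)

memory = {0x100: 3}
cache = SnoopingCache(memory)
bus = Bus(memory, [cache])

print(cache.read(0x100))   # 3, now cached
bus.io_write(0x100, 5)     # I/O write observed by the snoop logic
print(cache.read(0x100))   # 5 — the stale line was invalidated
```

Unlike the software solution, only the written line is invalidated; everything else stays cached.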
Caches and Virtual Addresses
CPU addresses – virtual
Memory addresses – physical
Recap: use a Translation Lookaside Buffer (TLB) to translate virtual to physical
Which addresses should the cache use?
Option 1: Cache by Physical Addresses
[Diagram: the CPU's address goes through the TLB first; the translated physical address then indexes the cache, which supplies data from RAM]
BUT: address translation is in series with the cache lookup – SLOW
Option 2: Cache by Virtual Addresses
[Diagram: the CPU's virtual address indexes the cache directly; the TLB is only needed on a miss, to access RAM]
BUT:
– Snooping? (memory traffic uses physical addresses)
– Aliasing? (two virtual addresses can map to the same physical location)
More functional difficulties
Option 3: Translate in Parallel with Cache Lookup
Translation only affects the high-order bits of the address
The address within the page remains unchanged
Low-order bits of the physical address = low-order bits of the virtual address
Select the "index" field of the cache address from within the low-order bits
Only the "tag" bits are changed by translation
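This scheme only works when the index and within-line offset bits fit entirely inside the page offset, so that translation cannot change them. A quick check of that constraint, using illustrative sizes (the 32 KB / 64-byte figures echo earlier slides; the 8-way associativity and 4 KB page are assumptions for the example):

```python
# Check whether a cache can be indexed in parallel with translation:
# offset bits + index bits must not exceed the page-offset bits.

import math

def index_fits_in_page(cache_bytes, line_bytes, ways, page_bytes):
    sets = cache_bytes // (line_bytes * ways)
    offset_bits = int(math.log2(line_bytes))       # within-line offset
    index_bits = int(math.log2(sets))              # set index
    page_offset_bits = int(math.log2(page_bytes))  # untranslated bits
    return offset_bits + index_bits <= page_offset_bits

# 32 KB, 64 B lines, 8-way, 4 KB pages: 64 sets -> 6 index + 6 offset
# = 12 bits, exactly the 4 KB page offset, so lookup can start before
# the TLB finishes.
print(index_fits_in_page(32 * 1024, 64, 8, 4096))   # True

# The same cache direct-mapped has 512 sets -> 9 index + 6 offset
# = 15 bits > 12, so the index would include translated bits.
print(index_fits_in_page(32 * 1024, 64, 1, 4096))   # False
```

This is one reason real L1 caches of this size tend to be highly associative: associativity shrinks the index field until it fits under the page offset.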
Option 3 in Operation
[Diagram: the virtual address splits into virtual page number, line index and within-line offset; the index selects a cache line (tag + data) while the TLB translates the page number in parallel; the stored tag is compared (=?) with the translated physical address to produce Hit?, and a multiplexer selects the data]
The Last Word on Caching?
[Diagram: several chips, each containing multiple CPUs – every CPU with its own L1 instruction and L1 data caches and an L2 – sharing an on-chip L3, all connected to RAM memory and input/output]
You ain't seen nothing yet!
Summary
The "3 Cs" model of cache performance
Systems interconnect issues with caching, and solutions:
– Non-cacheable areas
– Cache flushing
– Snooping
Caching and virtual memory:
– Virtual-to-physical address translation (TLB)
– Cache architectures that support V-to-P translation in parallel with lookup