COMP25212 - SYSTEM ARCHITECTURE
CACHES IN SYSTEMS
Antoniu Pop
Antoniu.Pop@manchester.ac.uk
Jan/Feb 2015
COMP25212 – Lecture 4

Learning Objectives
To understand:
  “3 x C’s” model of cache performance
  Time penalties for starting with an empty cache
  Systems interconnect issues with caching, and their solutions
  Caching and Virtual Memory

Describing Cache Misses
  Compulsory Misses
    Cold start
  Capacity Misses
    Even with full associativity, the cache cannot contain all the blocks of the program
  Conflict Misses
    Multiple blocks compete for the same set
    This would not happen in a fully associative cache
  How can we avoid them?

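One common way to make the three categories concrete is to replay a trace of block numbers through a direct-mapped cache and, alongside it, a fully associative LRU cache of the same capacity. The sketch below does this under stated assumptions: the function name `classify_misses`, the direct-mapped organisation and the LRU replacement policy are illustrative choices, not something the slides specify.

```python
from collections import OrderedDict

def classify_misses(block_trace, num_lines):
    """Classify the misses of a direct-mapped cache with num_lines lines.

    A miss is 'compulsory' if the block was never referenced before,
    'capacity' if a fully associative LRU cache of the same size would
    also miss, and 'conflict' otherwise.
    """
    direct = {}            # index -> block currently stored in that line
    fully = OrderedDict()  # fully associative LRU model, same capacity
    seen = set()           # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        index = block % num_lines
        dm_hit = direct.get(index) == block
        fa_hit = block in fully

        # Keep the fully associative model up to date (LRU order).
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) >= num_lines:
                fully.popitem(last=False)  # evict least recently used
            fully[block] = True

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1

        direct[index] = block
        seen.add(block)
    return counts

# Blocks 0 and 4 map to the same line of a 4-line direct-mapped cache.
print(classify_misses([0, 4, 0, 4], num_lines=4))
```

In the example trace the two repeated accesses miss only because both blocks share one line, so they count as conflict misses; a fully associative cache of the same size would have hit.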
Cache Performance again
With today’s caches, how long does it take:
  a) to fill L3 cache? (8MB)
  b) to fill L2 cache? (256KB)
  c) to fill L1 D cache? (32KB)
(e.g.) Number of lines = (cache size) / (line size), with 64-byte lines and one 20 ns memory access per line
  c) L1 D: number of lines = 32K/64 = 512
     512 x 20 ns = approx. 10 µs, i.e. approx. 20,000 clock cycles at 2 GHz
  b) L2: 4K lines x 20 ns = approx. 82 µs
  a) L3: 131 072 lines x 20 ns = approx. 2.6 ms

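The fill-time arithmetic can be checked with a few lines of Python. This is a minimal sketch using the slide's figures (64-byte lines, 20 ns per memory access, 2 GHz clock) and assuming one memory access per line with no overlap between accesses; the function and constant names are illustrative.

```python
# Figures from the slide; the "one access per line, no overlap" model is an assumption.
LINE_SIZE_BYTES = 64
MEMORY_ACCESS_NS = 20
CLOCK_GHZ = 2

KB = 1024
MB = 1024 * KB

def fill_time_ns(cache_size_bytes, line_size=LINE_SIZE_BYTES, access_ns=MEMORY_ACCESS_NS):
    """Time to fill an initially empty cache, one memory access per line."""
    lines = cache_size_bytes // line_size
    return lines * access_ns

for name, size in [("L1 D", 32 * KB), ("L2", 256 * KB), ("L3", 8 * MB)]:
    ns = fill_time_ns(size)
    cycles = ns * CLOCK_GHZ  # 1 ns is CLOCK_GHZ cycles
    print(f"{name}: {size // LINE_SIZE_BYTES} lines, {ns / 1000:.2f} us, {cycles} cycles")
```

The exact results (10.24 µs, 81.92 µs and about 2.62 ms) sit close to the rounded figures quoted on the slide.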
Caches in Systems
[Diagram: on-chip CPU with L1 Inst Cache (fetch) and L1 Data Cache (data), L2 and interconnect; off-chip RAM Memory and Input/Output (e.g. disk, network)]
how often? (bandwidth required)
  e.g. SATA = 300 MB/s: approx. one byte every 3 ns (64 bytes every 200 ns)
  e.g. 1G Ethernet = 1 bit every ns, or 64 bytes every 512 ns
  e.g. 10Gig E?

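The "how often?" question amounts to dividing the line size by the device bandwidth. The sketch below assumes 64-byte cache lines and decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits); the function name `ns_per_line` is illustrative.

```python
LINE_SIZE_BYTES = 64  # assumed cache line size, as in the slide's examples

def ns_per_line(bytes_per_second, line_size=LINE_SIZE_BYTES):
    """Interval in nanoseconds between successive cache-line-sized transfers."""
    return line_size * 1e9 / bytes_per_second

sata = ns_per_line(300e6)         # SATA at 300 MB/s
gig_e = ns_per_line(1e9 / 8)      # 1 Gbit/s Ethernet
ten_gig_e = ns_per_line(10e9 / 8) # 10 Gbit/s Ethernet

print(f"SATA:    one line every {sata:.0f} ns")
print(f"1 GigE:  one line every {gig_e:.0f} ns")
print(f"10 GigE: one line every {ten_gig_e:.1f} ns")
```

Under these assumptions a 10 Gbit/s link delivers a cache line roughly every 51 ns, which is of the same order as the 20 ns memory access time used earlier.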
Cache Consistency Problem 1
[Diagram: same system as before (CPU, Inst Cache, Data Cache, L2, on-chip interconnect, RAM Memory, Input/Output); the copies of one data item are labelled “5”, “3” and “3”, so they disagree]
Problem: I/O writes to memory; the cached copy is out of date

Cache Consistency Problem 2
[Diagram: same system as before; the copies of one data item are labelled “5”, “3” and “5”, so they disagree]
Problem: I/O reads memory; the cache holds a newer value

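Both consistency problems can be reproduced with a toy model of a single memory word and a single cached copy. The sketch assumes a write-back cache with no coherence mechanism at all; the class `ToySystem` and its method names are hypothetical.

```python
class ToySystem:
    """One memory word plus one cached copy; write-back, no coherence (toy model)."""

    def __init__(self, value):
        self.memory = value
        self.cache = None   # None means the word is not cached

    def cpu_read(self):
        if self.cache is None:       # miss: fetch from memory
            self.cache = self.memory
        return self.cache

    def cpu_write(self, value):
        self.cache = value           # write-back: memory is NOT updated yet

    def io_write(self, value):
        self.memory = value          # device writes memory, bypassing the cache

    def io_read(self):
        return self.memory           # device reads memory, bypassing the cache

# Problem 1: I/O writes to memory, the cache is outdated.
p1 = ToySystem(3)
p1.cpu_read()            # the cache now holds 3
p1.io_write(5)           # the device updates memory to 5
print(p1.cpu_read())     # the CPU still sees the stale 3

# Problem 2: I/O reads memory, the cache holds the newer value.
p2 = ToySystem(3)
p2.cpu_write(5)          # the new value lives only in the cache
print(p2.io_read())      # the device sees the stale 3
```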
Cache Consistency Software Solutions
  O/S knows where I/O takes place in memory
    Mark I/O areas as non-cacheable (how?)
  O/S knows when I/O starts and finishes
    Clear caches before and after I/O?

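The two software approaches can be sketched on a toy write-back cache: addresses marked non-cacheable always go straight to memory, and an explicit flush writes dirty data back and empties the cache around an I/O transfer. The class `SoftwareManagedCache`, its methods and the dict-based memory are hypothetical illustrations, not a real O/S interface.

```python
class SoftwareManagedCache:
    """Toy write-back cache over a dict-based memory (hypothetical model)."""

    def __init__(self, memory, noncacheable=()):
        self.memory = memory
        self.noncacheable = set(noncacheable)  # addresses the O/S marked as I/O areas
        self.lines = {}                        # addr -> (value, dirty)

    def read(self, addr):
        if addr in self.noncacheable:
            return self.memory[addr]           # always bypass the cache
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        if addr in self.noncacheable:
            self.memory[addr] = value          # write goes straight to memory
        else:
            self.lines[addr] = (value, True)   # dirty, memory not yet updated

    def flush(self):
        """Write dirty lines back to memory, then empty the cache."""
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value
        self.lines.clear()

memory = {0: 3, 1: 3}
cache = SoftwareManagedCache(memory, noncacheable={1})

cache.read(0); cache.read(1)
memory[0] = 5; memory[1] = 5     # device input: writes both words
print(cache.read(1))             # non-cacheable word: up to date (5)
print(cache.read(0))             # cached word: stale (3)
cache.flush()                    # O/S clears the cache after the input
print(cache.read(0))             # now up to date (5)

cache.write(0, 7)                # CPU produces data for output
cache.flush()                    # O/S clears the cache before the output
print(memory[0])                 # the device will read 7
```

The cost is visible even in the toy: every access to a non-cacheable address pays the full memory latency, and a flush discards lines that were still useful.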
Hardware Solutions: 1
[Diagram: same system as before; every copy of the data item (L1, L2, RAM Memory) is labelled “5”]
Issues?
Disadvantage: tends to slow down cache

Hardware Solutions: 2 - Snooping
[Diagram: same system as before; every copy of the data item is labelled “5”]
  Snoop logic in cache observes every memory cycle
  L2 keeps track of L1 contents
Issues?

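A minimal sketch of the snooping idea: every I/O access to memory is shown to the cache, which reacts before the access completes. The policy chosen here (invalidate on a snooped write, write back dirty data on a snooped read) is one possible assumption, and the classes `SnoopingCache` and `Bus` are hypothetical.

```python
class SnoopingCache:
    """Toy write-back cache that reacts to memory cycles it observes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # addr -> [value, dirty]

    def cpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def cpu_write(self, addr, value):
        self.lines[addr] = [value, True]

    def snoop_write(self, addr):
        """Another bus master writes addr: drop our copy, it is about to be stale."""
        self.lines.pop(addr, None)

    def snoop_read(self, addr):
        """Another bus master reads addr: write back our dirty copy first."""
        line = self.lines.get(addr)
        if line is not None and line[1]:
            self.memory[addr] = line[0]
            line[1] = False

class Bus:
    """Shows every I/O memory cycle to all caches before performing it."""

    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches

    def io_write(self, addr, value):
        for cache in self.caches:
            cache.snoop_write(addr)
        self.memory[addr] = value

    def io_read(self, addr):
        for cache in self.caches:
            cache.snoop_read(addr)
        return self.memory[addr]

memory = {0: 3}
cache = SnoopingCache(memory)
bus = Bus(memory, [cache])

cache.cpu_read(0)
bus.io_write(0, 5)          # the snooped write invalidates the cached 3
print(cache.cpu_read(0))    # the CPU sees 5

cache.cpu_write(0, 9)
print(bus.io_read(0))       # the snooped read forces a write-back; the device sees 9
```

The tag lookup in `snoop_write` and `snoop_read` competes with the CPU for access to the cache, which is one of the issues the slide hints at.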
Caches and Virtual Addresses
  CPU addresses: virtual
  Memory addresses: physical
  Recap: use Translation-Lookaside Buffer (TLB) to translate V-to-P
  What addresses in cache?

Option 1: Cache by Physical Addresses
[Diagram: on-chip CPU -> TLB -> $ (cache) -> RAM Memory, with address and data paths]
BUT: address translation is in series with the cache: SLOW

Option 2: Cache by Virtual Addresses
[Diagram: on-chip CPU -> $ (cache) -> TLB -> RAM Memory, with address and data paths]
BUT: Snooping? Aliasing?
More functional difficulties

Option 3: Translate in parallel with Cache Lookup
  Translation only affects high-order bits of address
  Address within page remains unchanged
  Low-order bits of Physical Address = low-order bits of Virtual Address
  Select “index” field of cache address from within low-order bits
  Only “Tag” bits changed by translation

Option 3 in operation:
[Diagram: virtual address split into 20 | 5 | 7 bits = virtual page no | index | within line. The virtual page number goes through the TLB to give the physical address; the index selects a (tag, line data) entry; the stored tag is compared (= ?) with the physical address to give Hit?; a multiplexer selects Data from the line.]
What are my assumptions?
  Line size = 2^7 bytes = 128 bytes
  Index = 5 bits => cache has 2^5 lines (32 lines): 4K byte cache
  Page offset = 12 bits: 4K page
How can we increase cache size? (multi-way set-associativity)
  8-way set associativity would give us a 32K cache

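The field split and the resulting size limit can be checked in a few lines. The sketch uses the slide's 20/5/7 layout with a 12-bit page offset; the function names and the example address are illustrative assumptions.

```python
OFFSET_BITS = 7        # byte within line: 2^7 = 128-byte lines
INDEX_BITS = 5         # line index: 2^5 = 32 lines per way
PAGE_OFFSET_BITS = 12  # 4K pages
VPN_BITS = 20          # virtual page number

# Parallel lookup only works if index and line offset lie within the page
# offset, i.e. within the bits that translation leaves unchanged.
assert INDEX_BITS + OFFSET_BITS <= PAGE_OFFSET_BITS

def split_virtual_address(va):
    """Return (virtual page number, cache index, offset within line)."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    index = (va >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    vpn = va >> PAGE_OFFSET_BITS
    return vpn, index, offset

def cache_size_bytes(ways=1):
    """Capacity when each way holds 2^INDEX_BITS lines of 2^OFFSET_BITS bytes."""
    return ways * (1 << INDEX_BITS) * (1 << OFFSET_BITS)

vpn, index, offset = split_virtual_address(0x12345ABC)
print(hex(vpn), index, offset)   # the VPN goes to the TLB, the index to the cache
print(cache_size_bytes(1))       # direct mapped
print(cache_size_bytes(8))       # 8-way set associative
```

Because the index cannot grow beyond the page offset, each way is limited to one page (4K bytes here), so extra capacity has to come from extra ways.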
The Last Word on Caching?
[Diagram: three on-chip units, each with a CPU (Inst fetch, data), L1 Data Cache, L2 and L3, all connected to RAM Memory and Input/Output]
You ain’t seen nothing yet!

Summary
  “3 x C’s” model of cache performance
  Systems interconnect issues with caching, and their solutions
    Non-cacheable areas
    Cache flushing
    Snooping
  Caching and Virtual Memory
    Virtual-to-physical conversion (TLB)
    Cache architectures to support V-to-P conversion