Processor support devices Part 2: Caches and the MESI protocol

A.C. Verschueren, Eindhoven University of Technology, Section of Digital Information Systems

2 The memory speed ‘gap’
High-performance processors are much too fast for the main memory they are connected to. A processor running at 1000 MHz (1 GHz) would like a memory read/write cycle time of 1 nanosecond, but large memories built from (relatively) cheap RAMs have cycle times on the order of 100 nanoseconds. That is 100 times slower, and this speed gap continues to grow.

3 Wide words and memory banking
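The gap can be closed IF the processor tolerates a long delay between the start and the end of a memory cycle. Two ways to do this:
1) Wide memory words: 4 words are read in parallel (words 0..3, then 4..7) and then used one after another. Drawback: lots of pins.
2) Multiple memory 'banks': 4 accesses run in parallel, overlapping in time, and each word is used as soon as its bank delivers it. Drawback: complex timing.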
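To make the banking idea concrete, here is a minimal sketch of a sequential read from 4 interleaved banks. It assumes (this is not on the slide) that word address w lives in bank w % 4, that each bank needs a full 100 ns cycle per access (the figure from slide 2), and that accesses are started a quarter cycle apart. The names `bank_of` and `ready_times` are hypothetical.

```python
BANKS = 4        # number of memory banks, as on the slide
CYCLE_NS = 100   # assumed cycle time of one bank, in ns (slide 2's figure)


def bank_of(word_addr: int) -> int:
    """Low-order interleaving (assumed): consecutive words sit in consecutive banks."""
    return word_addr % BANKS


def ready_times(n_words: int) -> list[int]:
    """Time (ns) at which each word of a sequential read is delivered.

    A new access is started every CYCLE_NS / BANKS ns in the next bank, but a bank
    can only start an access once its previous access has completed.
    """
    stagger = CYCLE_NS // BANKS
    bank_free = [0] * BANKS          # time at which each bank becomes idle
    times = []
    for w in range(n_words):
        b = bank_of(w)
        start = max(w * stagger, bank_free[b])
        bank_free[b] = start + CYCLE_NS
        times.append(start + CYCLE_NS)
    return times


print(ready_times(8))  # [100, 125, 150, 175, 200, 225, 250, 275]
```

After the first 100 ns, a word arrives every 25 ns: four times the rate of a single bank, but only because the addresses of the following words were known when the first access started.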

4 The big IF in closing the gap
Long memory access delays can be tolerated IF addresses are known in advance. This is true for sequential instruction reads, but NOT for most other read operations, so memory reading MUST become quicker! The processor is not interested in the timing of write operations: it hands data and address to the memory, then forgets about it.

5 Small-scale virtual memory: the cache
‘Cache’ is French for ‘secret hiding place’. A 'cache' is a small but very fast memory which contains the 'most active' memory words:
IF a requested memory word is in the cache
THEN supply the word from the cache {very fast}
ELSE supply the word from main memory {rather slow} and place it in the cache for later references (throwing out unused words when needed)
An ideal cache knows which words will be used soon; a good cache reaches 95% THEN and only 5% ELSE.
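The IF/THEN/ELSE rule can be sketched as a small, fully associative word cache. This is an illustration under stated assumptions, not a description of any real controller: "throwing out unused words" is approximated by least-recently-used (LRU) replacement, main memory is a plain dict, and `TinyCache` and `average_access_ns` are hypothetical names. The second function applies the 95%/5% split, assuming a hit costs the cache access time and a miss costs one main-memory access.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative word cache with LRU replacement (illustrative sketch)."""

    def __init__(self, capacity: int, memory: dict[int, int]):
        self.capacity = capacity
        self.memory = memory            # stands in for slow main memory
        self.lines = OrderedDict()      # address -> word, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr in self.lines:                  # THEN: supply from the cache
            self.hits += 1
            self.lines.move_to_end(addr)        # mark as most recently used
            return self.lines[addr]
        self.misses += 1                        # ELSE: supply from main memory
        word = self.memory[addr]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)      # throw out the least recently used word
        self.lines[addr] = word                 # keep it for later references
        return word


def average_access_ns(hit_rate: float, t_cache: float, t_memory: float) -> float:
    """Mean read time: hits cost t_cache, misses cost one main-memory access."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory
```

With slide 2's figures (1 ns cache, 100 ns memory) and a 95% hit rate, `average_access_ns(0.95, 1, 100)` comes to about 5.95 ns: the 5% of misses still account for most of the average.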

6 Keeping the cache hidden
The cache must keep a true copy of memory words. Memory-mapped I/O ports are problematic: they can spontaneously change their value! They have to be made 'non-cacheable' at all times. Shared memory is problematic too: either make it non-cacheable (from all sides) or, better, inform all attached caches of changes (write actions).
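A minimal sketch of a per-access cacheability check. The address map below (an I/O window and a non-cacheable shared-memory window) is entirely hypothetical, as are the names `Region`, `ADDRESS_MAP` and `is_cacheable`; real systems set up such regions in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int        # first address of the region
    end: int          # one past the last address
    cacheable: bool


# Hypothetical address map, for illustration only.
ADDRESS_MAP = [
    Region(0x0000_0000, 0x0010_0000, True),    # ordinary RAM
    Region(0x0010_0000, 0x0010_1000, False),   # memory-mapped I/O ports: never cache
    Region(0x0010_1000, 0x0011_0000, False),   # shared memory, non-cacheable from all sides
]


def is_cacheable(addr: int) -> bool:
    """Return True if a read of addr may be served from (and loaded into) the cache."""
    for region in ADDRESS_MAP:
        if region.start <= addr < region.end:
            return region.cacheable
    return False   # unmapped addresses are treated as non-cacheable (an assumption)
```

A cache controller using such a check simply bypasses the cache for non-cacheable accesses, so every read of an I/O port reaches the port itself.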

7 Cache writing policies
1) 'write-through': written data is also copied into memory.
   Option: write to the cache only if the word is already present. This reduces the amount of data in the cache, but a read after a non-cached write requires a true memory read.
2) 'posted write': writes are buffered until the bus is free.
   Gives priority to reads and allows high-speed write bursts, at the cost of more hardware and a delay between the CPU write and the memory write.
3) 'late write': data is written to memory only to make free space in the cache.
   Reduces the number of memory write cycles drastically, but needs complex cache control, especially with shared memory! (Pentium)
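The difference in memory write cycles between write-through and late write can be sketched by counting them for a stream of CPU writes. Assumptions, not from the slide: only written words are modelled, every written word is allocated in the cache, eviction is LRU, and remaining dirty words are flushed at the end. A posted-write buffer changes the timing of write-through, not the number of writes, so it is not modelled separately. Function names are hypothetical.

```python
from collections import OrderedDict


def write_through_writes(addresses: list[int]) -> int:
    """Write-through: every CPU write also becomes a memory write cycle."""
    return len(addresses)


def late_write_writes(addresses: list[int], capacity: int) -> int:
    """Late write: memory is written only when a dirty word is thrown out.

    Sketch assumptions: all written words are allocated, eviction is LRU,
    and the dirty words still in the cache are flushed at the end.
    """
    dirty = OrderedDict()           # address -> True, least recently written first
    memory_writes = 0
    for addr in addresses:
        if addr in dirty:
            dirty.move_to_end(addr) # repeated write: absorbed by the cache
            continue
        if len(dirty) >= capacity:
            dirty.popitem(last=False)
            memory_writes += 1      # make free space: copy the evicted word to memory
        dirty[addr] = True
    return memory_writes + len(dirty)   # final flush of the remaining dirty words
```

A counter updated 100 times costs 100 memory writes with write-through but only 1 with late write. When the written words do not fit in the cache (for example, cycling over 3 words with room for 2), late write saves nothing.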

8 An example of a cache
Figure: the CPU (80386), the cache memory and the cache controller (82385, with its administration memory) sit on the CPU bus (data, address, control); a bus switch connects the CPU bus to the system bus and main memory.
To reduce the amount of administration memory, a single cache 'line' administers a block of 8 words.
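The saving from per-line administration can be worked out with the 82385 direct-mapped figures (17-bit tags, 1024 lines of 8 words, a 'line valid' bit per line and a 'word valid' bit per word). The per-word alternative, with its own tag and valid bit for every word, is a hypothetical comparison point.

```python
TAG_BITS = 17        # 82385 direct-mapped tag width
LINES = 1024         # number of cache lines
WORDS_PER_LINE = 8   # words administered by one line


def admin_bits_per_word() -> int:
    """Hypothetical alternative: every word has its own tag and valid bit."""
    return LINES * WORDS_PER_LINE * (TAG_BITS + 1)


def admin_bits_per_line() -> int:
    """One tag and 'line valid' bit per line, plus a 'word valid' bit per word."""
    return LINES * (TAG_BITS + 1 + WORDS_PER_LINE)
```

This gives 147,456 bits (144 Kbit) for per-word administration against 26,624 bits (26 Kbit) per line, for 256 Kbit (32 KB) of cached data.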

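9 Intel 82385 'direct mapped' cache mode
32-bit address: 'tag' (17 bits) | line (10 bits) | word (3 bits) | byte (2 bits)
Figure: the line field selects one of 1024 lines, each holding a 17-bit tag, a 'line valid' bit and words #0 to #7 (32 bit data, each with a 'word valid' bit); the word field selects the word within the line, and comparing the stored tag with the address tag signals 'hit'.
Also known as '1-way set associative'; prone to 'tag clashing' (two addresses with the same line field but different tags keep throwing each other out)!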
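The address split above can be sketched as bit-field extraction, together with a tag store that shows tag clashing. This models only tags and 'line valid' bits (word valid bits and data are omitted), and the names `split_address` and `DirectMappedTags` are hypothetical.

```python
TAG_BITS, LINE_BITS, WORD_BITS, BYTE_BITS = 17, 10, 3, 2   # 82385 direct-mapped split


def split_address(addr: int) -> tuple[int, int, int, int]:
    """Split a 32-bit address into (tag, line, word, byte) fields."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    line = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << LINE_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + LINE_BITS)
    return tag, line, word, byte


class DirectMappedTags:
    """Tag store only: one tag and a 'line valid' bit per line (sketch)."""

    def __init__(self) -> None:
        self.tags = [0] * (1 << LINE_BITS)
        self.valid = [False] * (1 << LINE_BITS)

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss the line is (re)loaded with the new tag."""
        tag, line, _, _ = split_address(addr)
        if self.valid[line] and self.tags[line] == tag:
            return True
        self.tags[line], self.valid[line] = tag, True
        return False
```

Addresses 0x1000 and 0x9000 are 32 KB apart, so they share line 128 but have tags 0 and 1; alternating between them misses every time, even though the rest of the cache is empty.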

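10 Intel 82385 '2-way set associative' mode
32-bit address: 'tag' (18 bits) | line (9 bits) | word (3 bits) | byte (2 bits)
Figure: the 1024 lines are split into two sets of 512 lines each, with 18-bit tags; every line has a 'line valid' bit and words #0 to #7 (32 bit data, each with a 'word valid' bit). The line field selects a line in both sets at once, each set produces its own 'hit', and hit logic combines them.
'Least Recently Used' (LRU) bits indicate which set in each line has been used last (the other one is the replacement target).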
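A sketch of the 2-way lookup with one LRU bit per line, under the same simplifications as a tag-only model (no data, no word valid bits). `fields` and `TwoWayTags` are hypothetical names; `None` marks an invalid line.

```python
SETS = 512         # lines per set (9-bit line field)
LINE_SHIFT = 5     # skip the 2 byte bits and 3 word bits
TAG_SHIFT = 14     # 2 + 3 + 9: the remaining 18 bits form the tag


def fields(addr: int) -> tuple[int, int]:
    """Return (tag, line) for a 32-bit address in 2-way mode."""
    line = (addr >> LINE_SHIFT) & (SETS - 1)
    tag = addr >> TAG_SHIFT
    return tag, line


class TwoWayTags:
    """Two sets of tags plus one LRU bit per line (sketch)."""

    def __init__(self) -> None:
        self.tags = [[None] * SETS, [None] * SETS]   # tags[set][line]; None = invalid
        self.lru = [0] * SETS                        # which set was used last, per line

    def access(self, addr: int) -> bool:
        tag, line = fields(addr)
        for way in (0, 1):
            if self.tags[way][line] == tag:          # per-set 'hit', combined here
                self.lru[line] = way
                return True
        victim = 1 - self.lru[line]                  # the set NOT used last is replaced
        self.tags[victim][line] = tag
        self.lru[line] = victim
        return False
```

The two addresses that clashed in direct-mapped mode (0x1000 and 0x9000) now share line 128 peacefully, one per set. A third address on the same line (for example 0x11000) evicts whichever of the two was used least recently.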

11 The MESI protocol
Late write and shared memory combine badly. The 'MESI' protocol solves this with four states for each cache word (or line):
Modified: the cached data differs from main memory and is located only in this cache
Exclusive: the cached data is the same as main memory and is located only in this cache
Shared: the cached data is the same as main memory and is also located in one or more other caches
Invalid: the cache word/line is not loaded with memory data
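The four states can be captured as an enumeration whose two properties follow directly from the definitions: does the cached data equal main memory, and is this the only cached copy? The class and method names are illustrative.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

    def matches_memory(self) -> bool:
        """True when main memory holds the same value as this cache."""
        return self in (Mesi.EXCLUSIVE, Mesi.SHARED)

    def only_copy(self) -> bool:
        """True when no other cache holds this word/line."""
        return self in (Mesi.MODIFIED, Mesi.EXCLUSIVE)

# Invalid holds no data, so both predicates are False for it.
```

Only Modified has memory out of date, so it is the one state whose data must reach memory before it can be thrown out.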

12 State changes in the MESI protocol
State changes are induced by processor read/write actions and by actions of other cache controllers. Caches keep track of other read/write actions using 'bus snooping': monitoring the address and control buses when they are driven by someone else. During a memory access, the other cache controllers indicate whether one of them contains the accessed location; this is needed to decide between the Shared and Exclusive states!
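The Shared/Exclusive decision on a read miss can be sketched as follows. The snoop responses of the other controllers are modelled here as a simple OR over their states for the accessed location; how the real signal is wired is not shown on the slide, and the function names are hypothetical.

```python
def shared_signal(other_states: list[str]) -> bool:
    """Combined snoop response: does any other cache hold a valid copy?"""
    return any(state != "I" for state in other_states)


def fill_state_on_read_miss(other_states: list[str]) -> str:
    """A read miss loads the line as Shared if another cache has it, else Exclusive."""
    return "S" if shared_signal(other_states) else "E"
```

Without this response a cache would have to assume Shared for every new line, and every later write to it would go to main memory too.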

13 Intel CPU accesses (Pentium)
A read hit reads the cache and does not change the state.
A read miss reads memory; the other controllers check whether they also contain the address being read.
A write hit is handled according to the state: if Shared, the write is done in main memory too; if Exclusive or Modified, the write is done only in the cache.
A write miss writes to memory, but not to the cache (normal MESI would write the cache too). Other caches may change their state!
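The write rules can be sketched as a function returning where a processor write goes and the next local state. The next state after a Shared write hit (Exclusive) is an assumption: it holds if the other caches invalidate their copies when they snoop the memory write. `cpu_write` is a hypothetical name.

```python
def cpu_write(state: str) -> tuple[bool, bool, str]:
    """Return (write_cache, write_memory, next_state) for a processor write.

    state is this cache's MESI state ("M", "E", "S" or "I") for the addressed location.
    """
    if state == "I":            # write miss: memory only, no allocation in the cache
        return False, True, "I"
    if state == "S":            # Shared write hit: write through to main memory too
        return True, True, "E"  # assumption: other copies were invalidated by snooping
    return True, False, "M"     # Exclusive or Modified: write only in the cache
```

Writing a Shared line through to memory is what lets the other caches see the write on the bus and update their own state.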

14 Intel 82496 state diagram
Figure: state diagram over the states Invalid, Shared, Exclusive and Modified, with transitions labelled: read miss & somewhere else; read miss, only here; read hit; write miss; write hit (write to memory); write hit (setup for late write); read/write hit; snoop read; snoop read (*); snoop write; any snoop.
(*): This controller copies its local data to memory immediately.
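A transition table sketch of the diagram. The arrow endpoints are a reconstruction (the figure layout is lost), based on the labels and usual MESI behaviour; in particular, Shared going to Exclusive on a write hit is an assumption. `TRANSITIONS`, `next_state`, `copies_back` and `run` are hypothetical names.

```python
# (state, event) -> next state; event names follow the diagram's labels.
TRANSITIONS = {
    ("I", "read miss, somewhere else"): "S",
    ("I", "read miss, only here"): "E",
    ("I", "write miss"): "I",          # no allocation on a write miss
    ("I", "snoop"): "I",               # any snoop
    ("S", "read hit"): "S",
    ("S", "write hit"): "E",           # write to memory; target state assumed
    ("S", "snoop read"): "S",
    ("S", "snoop write"): "I",
    ("E", "read hit"): "E",
    ("E", "write hit"): "M",           # setup for late write
    ("E", "snoop read"): "S",
    ("E", "snoop write"): "I",
    ("M", "read hit"): "M",
    ("M", "write hit"): "M",
    ("M", "snoop read"): "S",          # (*) local data copied to memory
    ("M", "snoop write"): "I",
}


def next_state(state: str, event: str) -> str:
    if state == "I" and event.startswith("snoop"):
        event = "snoop"                # Invalid ignores every snoop
    return TRANSITIONS[(state, event)]


def copies_back(state: str, event: str) -> bool:
    """(*): a snoop read of a Modified line copies the local data to memory."""
    return state == "M" and event == "snoop read"


def run(events: list[str], state: str = "I") -> list[str]:
    """Apply a sequence of events and return the visited states."""
    history = [state]
    for event in events:
        state = next_state(state, event)
        history.append(state)
    return history
```

For example, `run(["read miss, only here", "write hit", "snoop read", "snoop write"])` walks Invalid, Exclusive, Modified, Shared, Invalid, with a copy to memory on the snoop read.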

15 Final remarks on caches (1)
High-performance processors rely on caches. The cache, standing in for main memory, must be accessed in a single clock cycle: at 1 GHz, this means the cache must be on the CPU chip. But a large and fast cache takes a lot of chip space!
Memory hierarchy (figure): the CPU and a small & fast first-level cache on the CPU chip; an off-chip, large(r) & slow(er) second-level cache; and a huge & very slow main memory.
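The benefit of a second level can be estimated by extending the simple hit/miss average to a hierarchy: each access is served by the first level that hits, and accesses that miss everywhere go to main memory. The model and the name `hierarchy_access_ns` are assumptions for illustration; the second-level figures used below (10 ns, 80% of first-level misses) are hypothetical.

```python
def hierarchy_access_ns(levels: list[tuple[float, float]], t_memory: float) -> float:
    """Average read time for a cache hierarchy.

    levels: (access_time_ns, hit_rate) per cache level, first level first; each
    hit_rate is the fraction of the accesses reaching that level which hit there.
    """
    total = 0.0
    reach = 1.0                                  # fraction of accesses not yet served
    for t_access, hit_rate in levels:
        total += reach * hit_rate * t_access     # accesses served by this level
        reach *= 1.0 - hit_rate
    return total + reach * t_memory              # the rest go to main memory
```

With a 1 ns, 95% first-level cache and 100 ns memory the average is about 5.95 ns; adding the hypothetical second level brings it to about 2.35 ns.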

16 Final remarks on caches (2)
The off-chip cache is becoming as slow as main memory was some time ago, so the second-level cache is placed on the CPU chip too. Examples: PowerPC, Crusoe (both > 256 KB!). The external cache then becomes a third-level cache. Data transfer between on-chip caches can move a complete cache line in parallel: a huge speedup.
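The speedup from line-wide transfers is simple arithmetic. This sketch assumes one transfer per clock cycle by default; the 32-byte line matches the 82385's 8 words of 32 bits, and the path widths are illustrative.

```python
import math


def line_transfer_cycles(line_bytes: int, path_bytes: int, cycles_per_transfer: int = 1) -> int:
    """Cycles needed to move one cache line over a data path path_bytes wide."""
    return math.ceil(line_bytes / path_bytes) * cycles_per_transfer
```

A 32-byte line takes 8 cycles over a 4-byte path but a single cycle over a 32-byte on-chip path, an 8-fold speedup for every line moved between the cache levels.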

