Chapter 5 Part I: Shared Memory Multiprocessors
Small multiprocessors
  Typically use an SMP (symmetric multiprocessor) architecture
  Shared address space directly supported by the hardware
Common memory hierarchy configurations (Figure 5.2)
  Shared cache
  Bus-based SMP: the most common SMP architecture
  Dancehall: typically uses a MIN (multistage interconnection network)
  Distributed memory (asymmetric): shared memory supported through “directory” methods
EECE 550
Cache Coherence
When a memory location is read, memory should provide the latest value written to that location
Uniprocessor systems use a memory hierarchy
  There is no cache coherence problem
Multiprocessor systems typically have multiple caches
  Copies of the same data may reside in different caches
  Potential cache coherence problem
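Example of Cache Coherence Problem
[Figure: processors P1, P2, P3, each with a private cache, share a memory holding U: 5. (1) P1 reads U; (2) P3 reads U (both caches now hold U: 5); (3) P3 writes U = 7; (4) P1 reads U = ?; (5) P2 reads U = ?]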
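The event sequence in this example can be replayed with a toy model. This is a minimal sketch, assuming (hypothetically) that each processor has a private cache with no coherence protocol at all, modeled as a dictionary; the class `PrivateCache` and the function `run` are illustrative names, not from the text. It shows why the reads in events (4) and (5) are uncertain: with write-back caches both reads miss the new value, and with write-through caches P1 still reads its stale copy.

```python
class PrivateCache:
    """Toy per-processor cache with NO coherence protocol (hypothetical model)."""

    def __init__(self, memory, write_back=True):
        self.memory = memory          # shared main memory: address -> value
        self.lines = {}               # this cache's copies: address -> value
        self.write_back = write_back

    def read(self, addr):
        # Miss: fetch from memory. Hit: return the cached copy, stale or not.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # The new value always goes into this cache (write-allocate assumed).
        self.lines[addr] = value
        if not self.write_back:
            # Write-through also updates memory, but other caches are untouched.
            self.memory[addr] = value


def run(write_back):
    memory = {"U": 5}
    p1, p2, p3 = (PrivateCache(memory, write_back) for _ in range(3))
    p1.read("U")            # (1) P1 caches U = 5
    p3.read("U")            # (2) P3 caches U = 5
    p3.write("U", 7)        # (3) P3 writes U = 7
    r4 = p1.read("U")       # (4) P1 hits on its own stale copy
    r5 = p2.read("U")       # (5) P2 misses and reads memory
    return r4, r5


print(run(write_back=True))    # (5, 5): both reads miss the new value
print(run(write_back=False))   # (5, 7): P1 still reads its stale copy
```

Neither result matches the informal requirement that a read return the latest value written, which is what a coherence protocol must fix.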
Cache Coherency
  Formal definition (bottom of p. 276)
  Informal definition: the memory system should “behave” as if all processors obtain all of their data from a single memory store
Properties required for cache coherence
  Write propagation: writes must become visible to all other processes
  Write serialization: all writes to a location (by one or more processes) are seen in the same order by ALL processes
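Write serialization can be checked mechanically once the order in which writes to a location were serialized is known (on a bus, that order is simply the bus order). This is a minimal sketch, assuming a hypothetical log in which each process records, for one location, which write's value it observed on each read; the function names and the write labels `W1`, `W2`, `W3` are illustrative only.

```python
def respects_serial_order(serial_order, observed):
    """True if `observed` never goes back to an older write than one it has
    already seen, relative to `serial_order`. Skipping writes is allowed, and
    so is observing the same write more than once."""
    position = {w: i for i, w in enumerate(serial_order)}
    indices = [position[w] for w in observed]
    return all(a <= b for a, b in zip(indices, indices[1:]))


def write_serialization_holds(serial_order, observations):
    # Every process must agree with the single serialized order of writes.
    return all(respects_serial_order(serial_order, seq)
               for seq in observations.values())


bus_order = ["W1", "W2", "W3"]                      # hypothetical serialized order
ok = {"P1": ["W1", "W3"], "P2": ["W1", "W2", "W3"]}
bad = {"P1": ["W1", "W2"], "P2": ["W2", "W1"]}      # P2 sees W2 before W1
print(write_serialization_holds(bus_order, ok))     # True
print(write_serialization_holds(bus_order, bad))    # False
```

P1 in the `ok` log never observes W2, which is allowed: serialization constrains the order in which writes are seen, not whether every intermediate value is read.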
Bus Snooping
Concept shown in Figure 5.4
A snooping protocol requires
  Continuous monitoring of the bus by each cache’s controller
  A set of states associated with memory blocks in local caches
  A state transition diagram showing the required state changes for a matching block
  Actions associated with each state transition
Uniprocessor Cache Concepts
Write-through
  Information is written to BOTH the cache AND main memory
Write-back
  Information is written to the cache only
  A modified cache block is tagged as “dirty” and written to main memory later
  The dirty block is written back when it must be evicted due to block replacement
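The difference in memory traffic between the two policies can be counted directly. This is a minimal sketch under hypothetical assumptions: a direct-mapped cache with one word per block, write-allocate on a store miss, and a stream made only of stores; `memory_writes` is an illustrative name. It returns the number of main-memory writes so far, plus the number of dirty blocks that a write-back cache would still have to write later.

```python
def memory_writes(stores, num_lines, policy):
    """Count main-memory writes for a sequence of store addresses.

    Hypothetical model: direct-mapped, one word per block, write-allocate.
    policy is "write-through" or "write-back". Returns
    (memory writes so far, dirty blocks still waiting to be written back).
    """
    lines = {}                     # line index -> (address, dirty)
    writes = 0
    for addr in stores:
        index = addr % num_lines
        victim = lines.get(index)
        if victim is not None and victim[0] != addr and victim[1]:
            writes += 1            # write-back: flush the dirty block being replaced
        if policy == "write-through":
            writes += 1            # every store also goes to main memory
            lines[index] = (addr, False)
        else:
            lines[index] = (addr, True)    # write-back: just mark the block dirty
    return writes, sum(dirty for _, dirty in lines.values())


stores = [0, 0, 0, 4, 0]           # addresses 0 and 4 map to the same line
print(memory_writes(stores, 4, "write-through"))   # (5, 0)
print(memory_writes(stores, 4, "write-back"))      # (2, 1)
```

Repeated stores to the same block cost one memory write each under write-through but nothing extra under write-back; only replacements of dirty blocks (and the final dirty block) reach memory.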
Possible write miss policies
  Write-allocate: transfer the block to the cache, then update the value
  Write-no-allocate: the block is modified in main memory only
Cache block placement strategies
  Direct-mapped: only one possible location for each memory address
  Fully associative: data for a given memory address can be stored anywhere in the cache
  Set-associative: data for a given memory address can be stored in a limited set of locations in the cache
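The three placement strategies differ only in how many blocks form a set. This is a minimal sketch, assuming a byte-addressed cache of `num_blocks` blocks of `block_size` bytes with `num_blocks` divisible by `ways`; the function `locate` and the numbering of blocks within a set are hypothetical. Direct-mapped is the case `ways == 1`, and fully associative is `ways == num_blocks`.

```python
def locate(addr, block_size, num_blocks, ways):
    """Split a byte address into (tag, set index, offset) and list the
    cache blocks it may occupy.

    ways == 1           -> direct-mapped
    ways == num_blocks  -> fully associative
    otherwise           -> `ways`-way set-associative
    """
    num_sets = num_blocks // ways
    block_number = addr // block_size
    offset = addr % block_size
    set_index = block_number % num_sets
    tag = block_number // num_sets
    # Hypothetical layout: set s occupies blocks s*ways .. s*ways + ways - 1.
    candidates = [set_index * ways + w for w in range(ways)]
    return tag, set_index, offset, candidates


for ways, name in [(1, "direct-mapped"), (2, "2-way set-associative"),
                   (8, "fully associative")]:
    print(name, locate(100, 16, 8, ways))
```

For address 100 in an 8-block cache with 16-byte blocks, the direct-mapped cache allows exactly one block (block 6), the 2-way cache allows two (blocks 4 and 5), and the fully associative cache allows all eight, with a correspondingly longer tag.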
Bus Snooping
Write-through cache (Figure 5.5)
  Snooping is simpler since all writes can be seen on the bus
  Problem with scaling: all writes generate bus traffic
Figure 5.5: Bus snooping with a write-through, write-no-allocate policy
  Suppose that a write-through, write-allocate policy is used instead. How should Figure 5.5 be modified?
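A snooping controller combines the pieces listed for bus snooping: per-block states, transitions, and actions, driven by continuous monitoring of the bus. This is a minimal sketch of a two-state (valid/invalid) write-through, write-no-allocate invalidation protocol of the kind shown in Figure 5.5; the classes `Bus` and `WriteThroughCache` and the dictionary-based memory are hypothetical. It replays the U example from the cache coherence problem slide and counts bus transactions, making the scaling problem visible: every write is a bus transaction.

```python
class Bus:
    """Shared bus: each transaction is observed by every other cache controller."""

    def __init__(self, memory):
        self.memory = memory          # shared main memory: address -> value
        self.controllers = []         # cache controllers attached to the bus
        self.transactions = 0         # count of bus transactions (BusRd + BusWr)

    def broadcast(self, source, kind, addr, value=None):
        self.transactions += 1
        if kind == "BusWr":
            self.memory[addr] = value            # write-through: memory is always updated
        for ctrl in self.controllers:
            if ctrl is not source:
                ctrl.snoop(kind, addr)           # every other controller snoops the bus
        return self.memory[addr]


class WriteThroughCache:
    """Per-block state is 'V' (valid) or 'I' (invalid); a missing entry means 'I'."""

    def __init__(self, bus):
        self.bus = bus
        self.state = {}
        self.data = {}
        bus.controllers.append(self)

    def read(self, addr):
        if self.state.get(addr) != "V":          # PrRd in I: issue BusRd, move to V
            self.data[addr] = self.bus.broadcast(self, "BusRd", addr)
            self.state[addr] = "V"
        return self.data[addr]                   # PrRd in V: hit, no bus transaction

    def write(self, addr, value):
        self.bus.broadcast(self, "BusWr", addr, value)   # every PrWr issues a BusWr
        if self.state.get(addr) == "V":
            self.data[addr] = value              # PrWr in V: update local copy, stay in V
        # PrWr in I: write-no-allocate, so the block is not brought in and stays I

    def snoop(self, kind, addr):
        if kind == "BusWr" and self.state.get(addr) == "V":
            self.state[addr] = "I"               # another processor's write invalidates our copy


# Replay the U example: P1 and P3 read U, P3 writes U = 7, then P1 and P2 read U.
bus = Bus({"U": 5})
p1, p2, p3 = (WriteThroughCache(bus) for _ in range(3))
p1.read("U")
p3.read("U")
p3.write("U", 7)
r4, r5 = p1.read("U"), p2.read("U")
print(r4, r5, bus.transactions)   # 7 7 5
```

Both later reads now return 7: P3's BusWr invalidated P1's copy, so P1 misses and refetches the updated value from memory.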
Partial Order for Cache Coherence
A total ordering can be based on partial orders
  Refer to middle of p. 282
  Example: Figure 5.6, partial order with a write-through invalidation protocol
Example 5.3
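One way to see how a total order follows from a partial order is to build the partial order explicitly and topologically sort it. This is a minimal sketch, assuming a hypothetical trace for one location under a write-through invalidation protocol: writes are ordered by the bus, each process's operations are ordered by program order, and each read is ordered after the write whose value it returned and before the next bus write. Reads by different processors between the same two writes remain unordered, and any topological sort is a valid total order. It uses Python's `graphlib.TopologicalSorter` (Python 3.9 or later); `total_order` and the operation names are illustrative.

```python
from graphlib import TopologicalSorter


def total_order(programs, bus_writes, read_sees):
    """Build the partial order and return one total order consistent with it.

    programs:   processor -> list of operation names in program order
    bus_writes: writes to the location, in the order the bus serialized them
    read_sees:  read name -> index in bus_writes of the write whose value it
                returned (-1 if it returned the initial value)
    """
    preds = {op: set() for ops in programs.values() for op in ops}
    for ops in programs.values():                         # program order
        for earlier, later in zip(ops, ops[1:]):
            preds[later].add(earlier)
    for earlier, later in zip(bus_writes, bus_writes[1:]):  # bus order of writes
        preds[later].add(earlier)
    for read, k in read_sees.items():
        if k >= 0:
            preds[read].add(bus_writes[k])                # after the write it observed
        if k + 1 < len(bus_writes):
            preds[bus_writes[k + 1]].add(read)            # before the next bus write
    return list(TopologicalSorter(preds).static_order())


# Hypothetical trace: W1 and W2 are serialized on the bus in that order.
programs = {"P1": ["W1", "R1"], "P2": ["R2", "W2"], "P3": ["R3"]}
bus_writes = ["W1", "W2"]
read_sees = {"R1": 0, "R2": 0, "R3": 0}    # all three reads returned W1's value
order = total_order(programs, bus_writes, read_sees)
print(order)   # W1 first, W2 last; R1, R2, R3 in between in some order
```

If the constraints contained a cycle (for example, a read that claims to have seen a write it precedes in program order), `static_order()` would raise `graphlib.CycleError`, meaning no coherent total order exists.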
Memory Consistency
“A memory consistency model … specifies constraints on the order in which memory operations must appear to be performed … with respect to one another.” [Culler et al. 1999, p. 285]
Event synchronization through flags (Figure 5.7)
Explicit synchronization using barriers (Figure 5.8)
Order among accesses without synchronization (Figure 5.9)
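Barrier synchronization can be sketched with a library barrier. This is a minimal sketch, not the code of Figure 5.8: Python's `threading.Barrier` stands in for a BARRIER primitive, and the phase structure (each thread writes its own slot, then reads a neighbour's slot) is a hypothetical example. Python's interpreter does not expose hardware reordering, so this only illustrates the synchronization pattern: no read in phase 2 can happen before every write in phase 1.

```python
import threading

NUM_THREADS = 4
shared = [0] * NUM_THREADS
seen = [None] * NUM_THREADS
barrier = threading.Barrier(NUM_THREADS)


def worker(i):
    shared[i] = (i + 1) * 10                      # phase 1: each thread writes its own slot
    barrier.wait()                                # nobody proceeds until all threads arrive
    seen[i] = shared[(i + 1) % NUM_THREADS]       # phase 2: read a neighbour's slot


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)   # [20, 30, 40, 10]
```

Without the barrier, a fast thread could read its neighbour's slot while it still held 0.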
Sequential Consistency
Values become visible to a process according to some sequential interleaving of the memory accesses of all processes
  Formal definition on p. 286 (from [Lamport 1979])
  Figure 5.10: Programmer’s view of sequential consistency
  Note: inter-process synchronization is still required
Write atomicity (Example 5.4)
  All writes (to any location) should appear to all processors to have occurred in the same order
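The definition can be applied literally: enumerate every interleaving that preserves each process's program order, run it against a single memory, and collect the results. This is a minimal sketch, assuming a small unsynchronized program of the kind discussed with Figure 5.9 (A = B = 0 initially; P1 writes A = 1 then B = 2; P2 reads B then A); the operation tuples and the functions `interleavings` and `sc_outcomes` are hypothetical. Under sequential consistency, P2 can never read B = 2 together with A = 0.

```python
from itertools import combinations


def interleavings(p, q):
    """Yield every merge of sequences p and q that keeps each one's program order."""
    n = len(p) + len(q)
    for positions in combinations(range(n), len(p)):
        slots = set(positions)              # indices in the merged order taken by p
        merged, i, j = [], 0, 0
        for k in range(n):
            if k in slots:
                merged.append(p[i])
                i += 1
            else:
                merged.append(q[j])
                j += 1
        yield merged


def sc_outcomes(p1, p2):
    """Run every interleaving on one memory and collect the values read."""
    results = set()
    for order in interleavings(p1, p2):
        memory = {"A": 0, "B": 0}
        reads = []
        for op, var, value in order:
            if op == "write":
                memory[var] = value
            else:
                reads.append(memory[var])
        results.add(tuple(reads))
    return results


P1 = [("write", "A", 1), ("write", "B", 2)]
P2 = [("read", "B", None), ("read", "A", None)]
print(sorted(sc_outcomes(P1, P2)))   # [(0, 0), (0, 1), (2, 1)]
```

Each tuple is (value read for B, value read for A). The missing outcome (2, 0) would require P2 to see the write to B without the earlier write to A, which no sequential interleaving allows.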
Sufficient conditions for preserving sequential consistency (p. 289)
  Every process issues memory operations in program order
  After a write is issued, the issuing process waits for the write to complete before issuing its next operation
  After a read is issued, if the write whose value is being returned has performed with respect to this processor, the processor waits until that write has performed with respect to all processors
Example 5.5: Reordering of memory operations (Figure 5.7)
  Creates problems for parallel or multithreaded programs
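The effect of violating the first condition can be shown by enumeration. This is a minimal sketch, assuming the flag idiom has the usual form (A = flag = 0 initially; P1: A = 1; flag = 1; P2: while (flag == 0); print A) and modeling P2's spin loop as a single read of flag, keeping only runs in which that read returned 1. The functions `interleavings` and `printed_values` are hypothetical. Swapping P1's two writes, as a compiler or write buffer might, lets P2 print the stale value 0.

```python
def interleavings(p, q):
    """Yield every merge of p and q that preserves the order within each."""
    if not p or not q:
        yield list(p) + list(q)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


def printed_values(p1, p2):
    """Values P2 prints, over all interleavings in which P2 saw flag == 1
    (i.e. its spin loop would have exited)."""
    printed = set()
    for order in interleavings(p1, p2):
        memory = {"A": 0, "flag": 0}
        reads = {}
        for op, var, value in order:
            if op == "write":
                memory[var] = value
            else:
                reads[var] = memory[var]
        if reads["flag"] == 1:
            printed.add(reads["A"])
    return printed


P2 = [("read", "flag", None), ("read", "A", None)]
in_order = [("write", "A", 1), ("write", "flag", 1)]
reordered = [("write", "flag", 1), ("write", "A", 1)]   # P1's writes swapped
print(printed_values(in_order, P2))    # {1}
print(printed_values(reordered, P2))   # {0, 1}
```

When P1 issues its writes in program order, seeing flag == 1 guarantees that A = 1 is already visible; once the writes are reordered, that guarantee disappears.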