Shared Memory Multiprocessors
Symmetric Multiprocessors (SMPs)
- The most prevalent form of parallel architecture
- Provide uniform access to a global physical address space
- Dominate the server market; on their way to dominating the desktop
- "Throughput engines" for multiple sequential jobs
- Also attractive for parallel programming
  - Uniform access via ordinary loads/stores
  - Automatic movement/replication of shared data in local caches
- Can support the message-passing programming model
  - No operating system involvement needed; address translation and buffer protection are provided by hardware
Extended Memory Hierarchies
[Figure: four organizations — shared cache, bus-based shared memory SMP, dancehall (UMA), and distributed memory (NUMA)]
Shared Cache
- An interconnect sits between the processors and a shared first-level cache
- Useful for connecting a small number of processors (2-8) on a board or chip
- Scalability: limited
  - The interconnect is in the critical path of every cache access
  - The cache is required to have tremendous bandwidth
Bus-Based Symmetric Shared Memory
- Small to medium scale (20-30 processors)
- Dominates the parallel machine market
- Scalability: bus bandwidth is the bottleneck
Dancehall
- Symmetry still holds: any processor is uniformly far away from any memory block
- Scalability: limited by the interconnection network
  - The distance between a processor and a memory block is several hops
Distributed Memory
- Asymmetric: processors are closer to their local memory blocks
- Exploits data locality to handle cache misses locally
- Scalability: the most scalable of these hierarchies
Cache Coherence Problem (1)
- Caches play a key role
  - Reduce average data access time
  - Reduce bandwidth demands on the shared interconnect
- Problem with private processor caches
  - Copies of a variable can be present in multiple caches
  - Writes may not become visible to other processors
  - They'll keep accessing the stale value in their caches
  - Frequent and unacceptable!
Cache Coherence Problem (2)
Example (u is initially 5 in memory):
1. P1 reads u and caches the value 5
2. P3 reads u and caches the value 5
3. P3 writes u = 7 (only its own cached copy is updated)
4. P1 reads u and gets the stale value 5 from its cache
5. P2 reads u
Processors see different values for u after event 3
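The event sequence above can be sketched as a toy simulation. This is a hypothetical model (names `read`, `write`, and the per-processor cache dictionaries are illustrative, not from the source): three private write-back caches over one memory, with no coherence protocol.

```python
# Hypothetical model: private write-back caches with no coherence protocol.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(p, addr):
    if addr not in caches[p]:              # miss: fill from memory
        caches[p][addr] = memory[addr]
    return caches[p][addr]

def write(p, addr, value):
    read(p, addr)                          # fetch the block on a write miss
    caches[p][addr] = value                # write-back: memory is NOT updated

read("P1", "u")         # event 1: P1 caches u = 5
read("P3", "u")         # event 2: P3 caches u = 5
write("P3", "u", 7)     # event 3: P3 writes u = 7 in its cache only
print(read("P1", "u"))  # event 4: P1 still sees the stale 5
print(read("P2", "u"))  # event 5: P2 misses and loads the stale 5 from memory
```

After event 3, P3 sees 7 while P1 and P2 see 5 — exactly the incoherence the slide describes.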
Cache Memory: Background
Background
- Block placement
- Block identification
- Block replacement
- Write policy
- Performance
Block Placement
Three categories:
- Direct mapped
- Fully associative
- Set associative
Direct Mapped Cache
- A memory block has only one place to go in the cache:
  (block address) MOD (#cache blocks)
- Can also be called one-way set associative (we will see why soon)
[Figure: in an 8-block cache, memory block 31 maps to cache block 31 MOD 8 = 7]
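The mapping can be written in a couple of lines. A minimal sketch, assuming the 8-block cache from the figure (the function name is illustrative):

```python
NUM_BLOCKS = 8  # 8-block cache, as in the figure

def direct_mapped_slot(block_addr):
    # A memory block can live in exactly one cache block.
    return block_addr % NUM_BLOCKS

print(direct_mapped_slot(0))   # memory block 0  -> cache block 0
print(direct_mapped_slot(31))  # memory block 31 -> cache block 7
```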
Fully Associative Cache
- Any memory block can be placed anywhere in the cache
[Figure: any of memory blocks 0-31 can go into any of cache blocks 0-7]
Set Associative Cache
- In a w-way set associative cache
  - Divide the cache into sets; each set has w blocks
  - A memory block has w places to go in the cache, all within set
    (block address) MOD (#sets)
- Example: 2-way set associative
  - #sets = (#cache blocks)/w = 8/2 = 4
[Figure: memory block 31 maps to set 31 MOD 4 = 3 in an 8-block, 2-way cache]
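The set mapping for the 2-way example can be sketched the same way (the helper name is illustrative):

```python
NUM_BLOCKS, W = 8, 2         # 8 cache blocks, 2-way set associative
NUM_SETS = NUM_BLOCKS // W   # = 4 sets

def set_index(block_addr):
    # The block may occupy either of the W ways within this set.
    return block_addr % NUM_SETS

print(set_index(31))  # memory block 31 -> set 3
```

With w = 1 each set holds a single block and this degenerates to the direct-mapped case — which is why direct mapped is also called one-way set associative.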
Block Identification (1)
- A physical address is divided into a block address and a block offset
- The block address is further divided into a tag and an index
  - Index: log2(#sets) bits
  - Offset: log2(block size) bits
- Layout: | Tag | Index | Offset |
Block Identification (2)
- The index identifies the set
- All stored tags in the set are compared against the address tag
- At most one should match; a match is a hit
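The tag/index/offset split above can be sketched with bit arithmetic. A minimal sketch, assuming a 64-byte block and the 4-set cache from the earlier example (both sizes are illustrative):

```python
import math

BLOCK_SIZE = 64  # bytes (assumed)
NUM_SETS = 4     # as in the 2-way example (assumed)

OFFSET_BITS = int(math.log2(BLOCK_SIZE))  # log2(block size)
INDEX_BITS = int(math.log2(NUM_SETS))     # log2(#sets)

def split_address(paddr):
    offset = paddr & (BLOCK_SIZE - 1)                 # low bits
    index = (paddr >> OFFSET_BITS) & (NUM_SETS - 1)   # middle bits pick the set
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)         # remaining high bits
    return tag, index, offset

print(split_address(0x1234))  # tag/index/offset for one sample address
```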
Block Replacement
- FIFO: first in, first out
- LRU: least recently used
- Random
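LRU is the policy that needs the most bookkeeping, so it is worth sketching. A minimal sketch of one w-way set with LRU replacement (the `LRUSet` class is a hypothetical helper, not from the source):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` blocks and LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # order = recency, oldest first

    def access(self, tag):
        """Return True on hit, False on miss (filling the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: now most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None              # fill the block
        return False

s = LRUSet(2)
s.access("A"); s.access("B"); s.access("A")  # set holds {B, A}; A is newest
s.access("C")   # miss: evicts B (least recently used), not A
print(s.access("A"))  # True  (A survived)
print(s.access("B"))  # False (B was evicted)
```

FIFO differs only in that a hit does not call `move_to_end`, so eviction order is fill order regardless of reuse.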
Write Policy
- Write-through caches
  - Values are updated in main memory immediately
  - Main memory always has up-to-date values
  - Slower performance
  - Easier to implement
- Write-back caches
  - Values are not updated in main memory until the block is evicted
  - Main memory may contain outdated values
  - Faster performance
  - Harder to implement
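The difference in what memory sees can be sketched in a few lines (a hypothetical single-level model; the `store` helper is illustrative):

```python
def store(policy, cache, memory, addr, value):
    cache[addr] = value
    if policy == "write-through":
        memory[addr] = value  # memory updated on every store
    # write-back: memory is updated only when the block is later evicted

mem_wt, mem_wb = {"u": 5}, {"u": 5}
store("write-through", {}, mem_wt, "u", 7)
store("write-back", {}, mem_wb, "u", 7)
print(mem_wt["u"])  # 7: memory always up to date
print(mem_wb["u"])  # 5: memory stale until the dirty block is written back
```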
Cache Performance
- Average access time = hit time + miss rate x miss penalty
- Miss penalty
  - Time taken to access main memory
  - On the order of 100s of times the hit time
- Miss rate
  - Depends on several factors: cache design and program behavior
- If the miss rate is very small, the average access time approaches the hit time
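The formula is a one-liner; the numbers below are illustrative, not from the source:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.02, 100))   # 1-cycle hit, 2% misses, 100-cycle penalty -> 3.0
print(amat(1, 0.001, 100))  # tiny miss rate -> approaches the 1-cycle hit time
```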
More on Cache Memory For more, read Sections 5.1 through 5.3 of J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., Palo Alto, CA, third edition, 2002.