Shared Memory Multiprocessors — Presentation transcript

1 Shared Memory Multiprocessors

2 Symmetric Multiprocessors
SMPs are the most prevalent form of parallel architectures
- Provide uniform access to a global physical address space
- Dominate the server market
- On their way to dominating the desktop
- "Throughput engines" for multiple sequential jobs
Also attractive for parallel programming
- Uniform access via ordinary loads/stores
- Automatic movement/replication of shared data in local caches
- Can support the message-passing programming model
- No operating system involvement needed; address translation and buffer protection are provided by hardware

3 Extended Memory Hierarchies
[Diagrams: four extended memory hierarchies —
- Shared cache: processors P1..Pn connect through a switch to a shared first-level cache, backed by main memory and I/O devices
- Bus-based shared memory SMP: processors P1..Pn with private caches share a bus with memory and I/O devices
- Dancehall (UMA): processors P1..Pn with private caches reach memory modules through an interconnection network
- Distributed memory (NUMA): each processor has a private cache and a local memory module, with nodes joined by an interconnection network]

4 Shared Cache
[Diagram: processors P1..Pn connect through a switch to a shared first-level cache backed by main memory]
Interconnect between processors and shared cache
Useful for connecting a small number of processors (2-8) on a board or chip
Scalability: limited
- The interconnect is in the path of every cache access
- The cache is required to have tremendous bandwidth

5 Bus-Based Symmetric Shared Memory
[Diagram: processors P1..Pn with private caches on a shared bus, together with memory and I/O devices]
Small to medium scale (20-30 processors)
Dominating the parallel machine market
Scalability: bus bandwidth is the bottleneck

6 Dancehall
[Diagram: processors P1..Pn with private caches connect through an interconnection network to memory modules]
Symmetry still holds
- Any processor is uniformly far away from any memory block
Scalability: limited by the interconnection network
- The distance between a processor and a memory block is several hops

7 Distributed Memory
[Diagram: each processor P1..Pn has a private cache and a local memory module; nodes connect through an interconnection network]
Asymmetric
- Processors are closer to their local memory blocks
- Exploits data locality to handle cache misses locally
Scalability: most scalable among all the hierarchies

8 Cache Coherence Problem (1)
Caches play a key role
- Reduce average data access time
- Reduce bandwidth demands on the shared interconnect
Problem with private processor caches
- Copies of a variable can be present in multiple caches
- Writes may not become visible to other processors, which keep accessing the stale value in their caches
- Frequent and unacceptable!

9 Cache Coherence Problem (2)
Example: processors see different values for u after event 3
[Diagram: P1, P2, P3 with private caches on a bus; memory and I/O devices; memory initially holds u = 5]
1. P1 reads u, caching u = 5
2. P3 reads u, caching u = 5
3. P3 writes u = 7
4. P1 reads u: u = ?
5. P2 reads u: u = ?
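The stale-value problem on this slide can be sketched in a few lines of Python. This is a hypothetical toy model (the `Cache` class and its write-back behavior are illustrative assumptions, not part of the slides): each processor's private cache is a dictionary, and writes stay in the writer's cache.

```python
# Hypothetical sketch: private write-back caches holding copies of u,
# showing how one processor's write is invisible to another.

class Cache:
    """A trivial private cache: a dict of address -> value (write-back)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: may return a stale copy

    def write(self, addr, value):
        self.lines[addr] = value          # write-back: memory not updated yet

memory = {"u": 5}
p1, p3 = Cache(memory), Cache(memory)

p1.read("u")        # event 1: P1 caches u = 5
p3.read("u")        # event 2: P3 caches u = 5
p3.write("u", 7)    # event 3: P3 writes u = 7 (only in its own cache)

print(p1.read("u"))   # event 4: P1 still sees the stale value 5
print(memory["u"])    # memory is stale too: 5
```

With write-through caches P2 would at least see the new value in memory, but P1's cached copy would still be stale, which is exactly why a coherence protocol is needed.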

10 Cache Memory: Background

11 Background
- Block placement
- Block identification
- Block replacement
- Write policy
- Performance

12 Block Placement
Three categories:
- Direct mapped
- Fully associative
- Set associative

13 Direct Mapped Cache
A memory block has only one place to go in the cache
- Placement: (block address) MOD (#blocks in cache)
Can also be called one-way set associative (we will see why soon)
[Diagram: a 32-block memory mapping onto an 8-block direct-mapped cache; e.g. memory blocks 0 and 31 map to cache blocks 0 and 7]
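The placement rule on this slide can be sketched directly. The sizes below match the slide's diagram (8 cache blocks, 32 memory blocks); the function name is an illustrative assumption.

```python
# Sketch of direct-mapped placement: each memory block maps to
# exactly one cache block, (block address) MOD (#cache blocks).
NUM_CACHE_BLOCKS = 8

def direct_mapped_slot(block_number):
    return block_number % NUM_CACHE_BLOCKS

print(direct_mapped_slot(0))   # 0
print(direct_mapped_slot(31))  # 7 (blocks 7, 15, 23, 31 all compete for slot 7)
```

Because several memory blocks share the same slot, a direct-mapped cache can thrash when a program alternates between two such blocks.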

14 Fully Associative Cache
Any memory block can be placed anywhere in the cache
[Diagram: any of the 32 memory blocks may occupy any of the cache's 8 blocks]

15 Set Associative Cache
In a w-way set associative cache:
- Divide the cache into sets; each set has w blocks
- A memory block has w places to go in the cache
- Placement: (block address) MOD (#sets)
Example: 2-way set associative
- #sets = (#cache blocks)/w = 8/2 = 4
[Diagram: a 32-block memory mapping onto an 8-block, 2-way set associative cache with sets 0-3; e.g. block 0 maps to set 0 and block 31 to set 3]
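The slide's 2-way example can be sketched the same way as the direct-mapped case; the block chooses a set, and within the set it may occupy any of the w ways (function and constant names are illustrative).

```python
# Sketch of set-associative placement for the slide's example:
# 8 cache blocks, w = 2 ways, so 8/2 = 4 sets.
NUM_CACHE_BLOCKS = 8
W = 2
NUM_SETS = NUM_CACHE_BLOCKS // W

def set_index(block_number):
    # The block may occupy either of the W blocks in this set.
    return block_number % NUM_SETS

print(set_index(0))   # set 0
print(set_index(31))  # set 3
```

Note that w = 1 reduces this to the direct-mapped rule, which is why direct mapped is "one-way set associative"; w = #cache blocks gives one set, i.e. fully associative.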

16 Block Identification (1)
The physical address splits into a block address plus an offset, and the block address further splits into a tag and an index:
| Tag | Index | Offset |
- #index bits = log2(#sets)
- #offset bits = log2(block size)
(base-2 logarithms)

17 Block Identification (2)
- The index identifies the set
- All stored tags in the set are compared against the address's tag
- At most one should match
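The tag/index/offset split from the previous slide can be written as a few shifts and masks. The geometry below (64-byte blocks, 128 sets) is an illustrative assumption, not taken from the slides.

```python
# Sketch: splitting a physical address into tag / index / offset.
from math import log2

BLOCK_SIZE = 64        # bytes -> log2(64) = 6 offset bits
NUM_SETS = 128         # -> log2(128) = 7 index bits

OFFSET_BITS = int(log2(BLOCK_SIZE))
INDEX_BITS = int(log2(NUM_SETS))

def split_address(addr):
    offset = addr & (BLOCK_SIZE - 1)                 # low 6 bits
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # next 7 bits: the set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
print(hex(tag), index, offset)
```

On a lookup, `index` selects the set, and `tag` is compared against every stored tag in that set.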

18 Block Replacement
- FIFO: first in, first out
- LRU: least recently used
- Random

19 Write Policy
Write-through caches
- Values are updated in main memory immediately
- Main memory always has up-to-date values
- Slower, but easier to implement
Write-back caches
- Values are not updated in main memory until the block is evicted
- Main memory may contain outdated values
- Faster, but harder to implement
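The two policies above differ only in when memory is updated, which a minimal sketch can make concrete (the `write` helper and its parameters are hypothetical, not from the slides):

```python
# Hypothetical sketch contrasting the two write policies.
def write(cache, memory, addr, value, write_through):
    cache[addr] = value          # the cache is always updated
    if write_through:
        memory[addr] = value     # write-through: memory updated immediately
    # write-back: memory is updated only when the block is later evicted

mem_wt, mem_wb = {"x": 0}, {"x": 0}
write({}, mem_wt, "x", 42, write_through=True)
write({}, mem_wb, "x", 42, write_through=False)
print(mem_wt["x"], mem_wb["x"])  # 42 0
```

The write-back case is exactly the situation in the earlier coherence example: the cache holds the new value while memory still holds the old one.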

20 Cache Performance
Average access time = hit time + miss rate x miss penalty
Miss penalty
- Time taken to access memory
- On the order of 100x the hit time
Miss rate
- Depends on several factors: the cache design and the program
If the miss rate is very small, the average access time approaches the hit time
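A quick worked example of the formula above; the numbers (1 ns hit time, 2% miss rate, 100 ns miss penalty) are illustrative assumptions, chosen so the penalty is about 100x the hit time as the slide suggests.

```python
# Average (memory) access time = hit time + miss rate * miss penalty.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# 1 ns hit, 2% miss rate, 100 ns miss penalty -> about 3.0 ns
print(amat(1.0, 0.02, 100.0))

# As the miss rate shrinks, the average approaches the hit time:
print(amat(1.0, 0.001, 100.0))   # about 1.1 ns
```

Even a 2% miss rate triples the average access time here, which is why caches (and miss rates) dominate memory-system performance.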

21 More on Cache Memory
For more, read Sections 5.1 through 5.3 of J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., Morgan Kaufmann Publishers, 2002.

