Introduction to MIMD Architectures
Sima, Fountain and Kacsuk, Chapter 15
CSE462
David Abramson, 2004. Material from Sima, Fountain and Kacsuk, Addison Wesley 1997.

Architectural Concepts

- Distributed Memory MIMD
  - Replicate the processor/memory pairs
  - Connect them via an interconnection network
- Shared Memory MIMD
  - Replicate the processors
  - Replicate the memories
  - Connect them via an interconnection network
Distributed Memory Machine

- Access to a local memory module is much faster than to a remote one
- Remote accesses are performed in hardware via
  - a load/store primitive, or
  - a message-passing layer
- Cache memory serves local memory traffic
- Messages may be transferred
  - memory to memory
  - cache to cache

[Figure: processors 1..p, each with its own local memory, connected by an interconnection network]
Advantages of Distributed Memory

- Local memory traffic suffers less contention than in shared memory
- Highly scalable
- No need for sophisticated synchronization features such as monitors or semaphores; message passing serves a dual purpose:
  - it sends the data
  - it provides synchronization
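The dual purpose of message passing can be sketched in Python (an illustration, not from the slides: the channel and thread names are hypothetical). A blocking receive both delivers the data and synchronizes the consumer with the producer, with no separate semaphore:

```python
import threading
import queue

channel = queue.Queue()   # stands in for a message-passing channel
results = []

def producer():
    channel.put(42)       # sends the data and signals "data is ready"

def consumer():
    value = channel.get() # blocks until the message arrives: synchronization for free
    results.append(value * 2)

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [84]
```

The consumer never needs to poll or take a lock: the arrival of the message itself orders the two computations.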
Problems of Distributed Memory

- Load balancing
- Message passing can lead to synchronization failures, including deadlock
  - BlockingSend -> BlockingReceive
  - BlockingReceive -> BlockingSend
  - each blocking operation waits on its partner in the other process, so mismatched orderings can close a waiting cycle
- Intensive data copying of whole structures
- High overheads for small messages
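The BlockingSend/BlockingReceive cycle on this slide can be modelled as a wait-for graph: each process is a node, with an edge to the process whose operation it is blocked on, and a cycle in the graph means deadlock. A minimal Python sketch (the names `has_cycle` and `waits_for` are illustrative, not from the slides):

```python
def has_cycle(waits_for):
    """Detect a cycle in a wait-for graph given as {node: [nodes it waits on]}."""
    visited, on_path = set(), set()

    def visit(node):
        if node in on_path:       # back edge: we are already waiting on this node
            return True
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        found = any(visit(n) for n in waits_for.get(node, ()))
        on_path.discard(node)
        return found

    return any(visit(n) for n in waits_for)

# P0's blocking send waits on P1, whose blocking send waits on P0: deadlock.
deadlocked = {"P0": ["P1"], "P1": ["P0"]}
# If P1 receives before it sends, the cycle is broken.
fixed = {"P0": ["P1"], "P1": []}
print(has_cycle(deadlocked), has_cycle(fixed))  # True False
```

This is how deadlock-detection tools reason about blocking message-passing programs; the practical fix is to order sends and receives so no such cycle can form.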
Shared Memory Architecture

- All processors have equal access to shared memory modules
- Local caches reduce
  - memory traffic
  - network traffic
  - memory access time
- Interprocessor synchronization via indivisible load/store operations

[Figure: processors 1..p and memory modules 1..m connected by an interconnection network]
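Synchronization built on indivisible operations can be illustrated in Python (a sketch, not from the slides: the counter and thread counts are arbitrary). A lock, itself implemented on an indivisible read-modify-write primitive, makes a shared-memory update atomic:

```python
import threading

counter = 0                  # shared memory location
lock = threading.Lock()      # built on an indivisible test-and-set style primitive

def add_many(n):
    global counter
    for _ in range(n):
        with lock:           # the read-increment-write is now indivisible
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the lock, interleaved read-modify-write sequences from different processors could lose updates; the indivisible operation is what makes the final count deterministic.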
Advantages of Shared Memory

- No need to partition code or data: it happens on the fly
- No need to move data explicitly
- No need for new programming languages or compilers
Disadvantages of Shared Memory

- Synchronization is difficult
- Lack of scalability: interprocessor communication becomes the bottleneck
- Scalability can be addressed by
  - a high-throughput, low-latency network
  - cache memories (which introduce the cache coherence problem)
  - a distributed shared memory architecture
Distributed Shared Memory

- Three design choices
  - Non-uniform memory access (NUMA), e.g. Cray T3D
  - Cache-coherent non-uniform memory access (CC-NUMA), e.g. Convex SPP, Stanford DASH
  - Cache-only memory access (COMA), e.g. KSR-1
Non-uniform memory access (NUMA)

[Figure: processing elements PE 0..n, each pairing a processor Pi with a local memory Mi, connected by an interconnection network]
Cache-coherent non-uniform memory access (CC-NUMA)

[Figure: processing elements PE 0..n, each with a processor Pi, a cache Ci and a local memory Mi, connected by an interconnection network]
Cache-only memory access (COMA)

[Figure: processing elements PE 0..n, each with a processor Pi and a cache Ci only, with no separate memory modules, connected by an interconnection network]
Classification of MIMD Computers

- MIMD computers
  - Process-level architectures
    - Single address space (shared memory)
      - Physical shared memory (UMA)
      - Virtual (distributed) shared memory: NUMA, CC-NUMA, COMA
    - Multiple address space (distributed memory)
  - Thread-level architectures
    - Single address space (shared memory)
      - Physical shared memory (UMA)
      - Virtual (distributed) shared memory: NUMA, CC-NUMA
Problems of Scalable Computers

- Tolerate and hide the latency of remote loads
  - worse when the output of one computation depends on another completing
- Tolerate and hide idling due to synchronization among processors
Tolerating Remote Loads

[Figure: PE 0 computes Result := A + B, where A resides in PE 1's memory and B in PE n's memory; the remote loads bring A and B across the interconnection network into registers rA and rB before the addition]
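The slide's Result := A + B example can be sketched in Python (an illustration, not from the slides: `remote_load` and its latency are stand-ins for a network round trip). Issuing both remote loads before waiting on either overlaps the two latencies instead of paying them back to back:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_load(value, latency=0.05):
    """Stand-in for fetching a value from a remote memory module."""
    time.sleep(latency)   # simulated interconnection-network round trip
    return value

with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(remote_load, 3)   # Load A into rA
    future_b = pool.submit(remote_load, 4)   # Load B into rB, overlapping with A
    result = future_a.result() + future_b.result()  # Result := A + B
print(result)  # 7
```

Both loads are in flight at once, so the total stall is roughly one round trip rather than two: the essence of hiding remote-load latency.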
Tolerating Latency

- Cache memory
  - simply lowers the cost of remote access
  - introduces the cache coherence problem
- Prefetching
  - the data is already present when needed, so the access cost is low
  - increases network load
- Threads plus fast context switching
  - accept that the access will take a long time and cover the overhead with other work
- These solutions do not solve synchronization-related idling
  - that requires latency-tolerant algorithms
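Prefetching can be sketched the same way (hypothetical names, not from the slides): issue the remote load early, do independent local work while it is in flight, and block only at the point of use:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_load(value, latency=0.05):
    """Stand-in for fetching a value from a remote memory module."""
    time.sleep(latency)
    return value

with ThreadPoolExecutor(max_workers=1) as pool:
    prefetched = pool.submit(remote_load, 10)  # issue the load well in advance
    local_work = sum(range(100))               # independent work hides the latency
    total = local_work + prefetched.result()   # block only at the use point
print(total)  # 4960
```

If the local work takes at least as long as the round trip, the remote value is already present when needed and the access appears to cost nothing, at the price of extra network traffic when prefetches are issued speculatively.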
Design Issues of Scalable MIMD

- Processor design
  - pipelining, parallel instruction issue
  - atomic data access, prefetching, cache memory, message passing, etc.
- Interconnection network design
  - scalable, high bandwidth, low latency
- Memory design
  - shared memory design
  - cache coherence
- I/O subsystem
  - parallel I/O