Published by Jordan Bridges. Modified over 5 years ago.
1
CSL718: Multiprocessors
Introduction
13th April, 2006
Anshul Kumar, CSE IITD
2
Parallel Architectures
Flynn’s Classification [1966]
Architecture categories: SISD, SIMD, MISD, MIMD
3
MIMD
[Figure: MIMD organization. Block labels: C, P, M. Stream labels: IS (instruction stream), DS (data stream).]
4
Parallel Architectures
Sima’s Classification
Parallel architectures (PAs)
    Data-parallel architectures
    Function-parallel architectures
5
Function Parallel Architectures
Instruction level PAs (ILPs)
    Pipelined processors
    VLIWs
    Superscalar processors
Thread level PAs and Process level PAs (MIMDs), built using general purpose processors
    Shared Memory MIMD
    Distributed Memory MIMD
6
Issues from user’s perspective
Specification / program design
    explicit parallelism, or
    implicit parallelism + parallelizing compiler
Partitioning / mapping to processors
Scheduling / mapping to time instants
    static or dynamic
Communication and synchronization
7
Parallelizing example
    for (i=0; i<n; i++) {
        m = m+3;
        a[i] = (a[m]+a[m+1]+a[m+2])/3;
    }
Can all iterations be done in parallel?
Dependence 1 (each iteration needs the previous value of m):
    m = m + 3
Dependence 2 (an element read by one iteration is overwritten by a later one, here a[4]):
    a[1] = (a[3]+a[4]+a[5])/3
    a[4] = (a[12]+a[13]+a[14])/3
8
Parallelizing example - contd.
Eliminate dependence based on induction variable
    for (i=0; i<n; i++) {
        m = i*3;
        a[i] = (a[m]+a[m+1]+a[m+2])/3;
    }
9
Parallelizing example - contd.
Eliminate forward dependency using double buffer
    for (i=0; i<n; i++) {
        m = i*3;
        aa[i] = (a[m]+a[m+1]+a[m+2])/3;
    }
    barrier();
    for (i=0; i<n; i++)
        a[i] = aa[i];
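The double-buffer idea can be checked with a small C sketch. The function names and the use of double are assumptions made for illustration, and the reverse loop order only stands in for an arbitrary parallel schedule; the array a must hold at least 3n elements.

```c
#include <assert.h>
#include <stddef.h>

/* Reference: the sequential in-place loop from the example (m = 3*i). */
void average_in_place(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t m = 3 * i;
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3.0;
    }
}

/* Double-buffered version. Phase 1 only reads a[] and only writes aa[],
 * so its iterations are independent of each other. They run in reverse
 * order here to stand in for an arbitrary parallel schedule. */
void average_double_buffered(double *a, double *aa, size_t n)
{
    assert(a != aa);
    for (size_t i = n; i-- > 0; ) {
        size_t m = 3 * i;
        aa[i] = (a[m] + a[m + 1] + a[m + 2]) / 3.0;
    }
    /* Position of barrier(): every aa[i] is complete before any a[i]
     * is overwritten. */
    for (size_t i = 0; i < n; i++)
        a[i] = aa[i];
}
```

Both functions should leave the same contents in a, because every read in the sequential loop already sees a value that has not yet been overwritten; the second buffer makes that property independent of the execution order.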
10
Parallelizing example - contd.
Parallelization using dynamic thread creation and scheduling
    schedule(0);
    for (i=0; i<n; i++) {
        wait_till_scheduled(i);
        m = i*3;
        a[i] = (a[m]+a[m+1]+a[m+2])/3;
        if (i>0) schedule(3*i);
        schedule(3*i+1);
        schedule(3*i+2);
    }
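A single-threaded C sketch of this scheduling rule follows. It is an illustration under assumptions, not the lecture's implementation: average_dynamic is a hypothetical name, a LIFO ready list stands in for the thread scheduler, and the bound check j < n is added so that only existing iterations are scheduled. The array a must hold at least 3n elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Runs the n iterations in an order allowed by the scheduling rule:
 * iteration j becomes ready only after the iteration that reads a[j]
 * has finished reading it.
 * Returns 0 on success, -1 if the ready list cannot be allocated. */
int average_dynamic(double *a, size_t n)
{
    if (n == 0)
        return 0;
    size_t *ready = malloc(n * sizeof *ready);
    if (ready == NULL)
        return -1;
    size_t top = 0, done = 0;

    ready[top++] = 0;                       /* schedule(0) */
    while (top > 0) {
        size_t i = ready[--top];            /* iteration i has been scheduled */
        size_t m = 3 * i;
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3.0;
        done++;
        /* a[m..m+2] have been read, so the iterations that write them may run. */
        for (size_t j = m; j <= m + 2; j++)
            if (j > 0 && j < n)             /* iteration 0 is already scheduled */
                ready[top++] = j;
    }
    assert(done == n);                      /* every iteration ran exactly once */
    free(ready);
    return 0;
}
```

With the LIFO list the iterations run in an order other than 0, 1, 2, ..., yet the result should match the sequential loop, since no element is overwritten before its reader has run.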
11
Grain size and performance
[Figure: speed-up versus grain size. Fine grain: overhead limited. Coarse grain: load imbalance and parallelism limited. The optimum grain size lies in between.]
12
Speed up and efficiency
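For reference, the usual definitions of speed-up and efficiency are given below. They are assumed here rather than taken from the slide; T_1 and T_p are symbols introduced for illustration and denote the execution time on one processor and on p processors.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```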
13
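Amdahl’s Law
[Figure: speed-up Sp plotted against the serial fraction s, from s = 0 to s = 1 (tick at 0.5); end points Sp = p and Sp = 1.]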
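Amdahl's law is usually written Sp = 1 / (s + (1 - s)/p), where s is the serial fraction of the work and p the number of processors. A minimal C sketch of that formula follows; the function name amdahl_speedup is an assumption made for illustration.

```c
#include <assert.h>

/* Speed-up on p processors when a fraction s of the work is serial
 * and the remaining (1 - s) is perfectly parallel. */
double amdahl_speedup(double s, unsigned p)
{
    assert(s >= 0.0 && s <= 1.0 && p >= 1);
    return 1.0 / (s + (1.0 - s) / p);
}
```

The two end points match the plot: amdahl_speedup(0.0, p) gives p and amdahl_speedup(1.0, p) gives 1. For s = 0.5 the speed-up stays below 2 however large p becomes.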
14
Generalization
[Figure: speed-up Sp plotted against the number of processors p; one curve is labelled "actual".]
15
Shared Memory Architecture
16
Design Space of Shared Memory Architectures
Extent of address space sharing
Location of memory modules
Uniformity of memory access
17
Address Space
[Figure: address spaces seen by processors P1, P2, P3, P4 in three cases]
    Each processor sees an exclusive address space
    Each processor sees partly exclusive and partly shared address space
    Each processor sees same shared address space
18
Location of Memory
[Figure: three organizations, Centralized, Mixed and Distributed, each connecting processors (P) and memory modules (M) through an interconnection network.]
19
Clustered Architecture
[Figure: clusters of processors (P) and memory modules (M), each cluster with its own interconnection network; the clusters and further memory modules are joined by a global interconnection network.]
20
Uniformity of Access
UMA (Uniform Memory Access)
    Uniformity across memory address space
    Uniformity across processors
NUMA (Non-Uniform Memory Access)
CC-NUMA (Cache Coherent NUMA)
COMA (Cache Only Memory Architecture)
UMA: Symmetrical Shared Memory Multiprocessor (SMP)
NUMA: Distributed Shared Memory Multiprocessor
21
Location and Sharing
[Table: memory location (centralized, mixed, distributed) against extent of sharing (full, partial, none); cell entries shown: UMA, NUMA.]
22
Shared Memory with Caches
Multiple copies of data may exist
Problem of cache coherence
Cache coherence protocols
    What action is taken?
    Which processors/caches communicate?
    Status of each block?
23
What action is taken?
Invalidate other caches and/or memory
    send a signal/message immediately, copy information only when unavoidable
    similar to write back policy
Update other caches and/or memory
    write simultaneously at all places (send modifications immediately)
    similar to write through policy
24
Which procs/caches communicate?
Snoopy protocol
    broadcast invalidate or update messages
    all processors snoop on the bus
Directory based protocol
    maintain directory - list of copies
    communicate selectively
    directory - centralized (memory) or distributed (caches)
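The "list of copies" kept by a directory can be sketched in C as one bit per cache. Everything named here (dir_entry, dir_add_sharer, dir_invalidate_others, the limit of 32 caches) is a hypothetical illustration, not a description of any particular machine.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CACHES 32u

/* One directory entry per memory block: bit c is set if cache c holds a copy. */
typedef struct {
    uint32_t sharers;
} dir_entry;

/* Record that `cache` has obtained a copy of the block. */
void dir_add_sharer(dir_entry *e, unsigned cache)
{
    assert(cache < MAX_CACHES);
    e->sharers |= UINT32_C(1) << cache;
}

/* Cache `writer` wants exclusive access. Invalidations go only to the other
 * caches recorded in the entry; the return value is how many were sent. */
unsigned dir_invalidate_others(dir_entry *e, unsigned writer)
{
    assert(writer < MAX_CACHES);
    uint32_t others = e->sharers & ~(UINT32_C(1) << writer);
    unsigned messages = 0;
    for (uint32_t bits = others; bits != 0; bits &= bits - 1)
        messages++;                         /* one message per recorded copy */
    e->sharers = UINT32_C(1) << writer;     /* the writer is now the only holder */
    return messages;
}
```

With copies in caches 1, 4 and 7, a write by cache 4 sends two invalidations, whereas a snoopy protocol would broadcast the invalidate to every cache on the bus.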
25
Status of each cache block?
valid/invalid
private/shared
clean/dirty
Simplest protocol (3 states)
    Invalid, (shared) clean, private dirty
Berkeley protocol (4 states)
    Invalid, (shared) clean, private dirty, shared dirty
Illinois, Firefly protocols (4 states)
    Invalid, shared clean, private clean, private dirty
Dragon protocols (5 states)
    Invalid, shared clean/dirty, private clean/dirty
26
Simplest invalidation protocol
Use 3 states: Invalid, shared clean, private dirty
[Figure: state diagram with states invalid, clean (shared?) and dirty. Legend: CPU event, BUS event.]
27
Simplest invalidation protocol
Use 3 states: Invalid, shared clean, private dirty
[Figure: state diagram with states invalid, clean (shared?) and dirty. CPU-event arcs labelled: RD miss, WR, RD miss, WR miss. Legend: CPU event, BUS event.]
28
Simplest invalidation protocol
Use 3 states: Invalid, shared clean, private dirty
[Figure: state diagram with states invalid, clean (shared?) and dirty. BUS-event arcs labelled: WR miss, INV; RD miss; WR miss, INV. Legend: CPU event, BUS event.]
29
Simplest invalidation protocol
Use 3 states: Invalid, shared clean, private dirty
[Figure: complete state diagram with states invalid, clean (shared?) and dirty. CPU-event arcs labelled: RD miss, WR, RD miss, WR miss. BUS-event arcs labelled: WR miss, INV; RD miss; WR miss, INV. Legend: CPU event, BUS event.]
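The three-state protocol can be sketched as a transition function in C. The arc labels come from the diagram, but their end points are an assumption: they are taken to follow the usual three-state write-invalidate protocol. The type and function names are hypothetical.

```c
#include <assert.h>

/* State of one cache block: CLEAN means shared clean, DIRTY means private dirty. */
typedef enum { INVALID, CLEAN, DIRTY } block_state;

typedef enum {
    CPU_RD_MISS,   /* local read that misses in this cache */
    CPU_WR_MISS,   /* local write that misses in this cache */
    CPU_WR,        /* local write that hits */
    BUS_RD_MISS,   /* read miss by another cache, seen on the bus */
    BUS_WR_MISS    /* write miss or invalidate (INV) by another cache */
} event;

/* Next state of the block after event e (assumed arc end points). */
block_state next_state(block_state s, event e)
{
    switch (e) {
    case CPU_RD_MISS:
        return CLEAN;                       /* block is fetched for reading */
    case CPU_WR_MISS:
        return DIRTY;                       /* block is fetched for writing */
    case CPU_WR:
        assert(s != INVALID);               /* a write to an invalid block is a miss */
        return DIRTY;                       /* from CLEAN, INV goes out on the bus */
    case BUS_RD_MISS:
        return s == DIRTY ? CLEAN : s;      /* a dirty copy is supplied and kept clean */
    case BUS_WR_MISS:
        return INVALID;                     /* another cache takes ownership */
    }
    return s;
}
```

Starting from INVALID, a local read miss gives CLEAN, a local write then gives DIRTY, a read miss by another cache brings the block back to CLEAN, and a write miss or INV from another cache returns it to INVALID.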