Parallel Computing Overview
CS 524 - High-Performance Computing
Slide 2: Parallel Computing
- Multiple processors work cooperatively to solve a computational problem.
- Examples range from specially designed parallel computers and algorithms to geographically distributed networks of workstations cooperating on a task.
- Some problems cannot be solved by present-day serial computers, or take an impractically long time to solve.
- Parallel computing exploits the concurrency and parallelism inherent in the problem domain:
  - Task parallelism
  - Data parallelism
  (both are sketched in the code below)
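To make the distinction concrete, here is a minimal sketch of my own (not from the slides), in the fixed-form Fortran/OpenMP style the deck uses later; the array values and the two "tasks" are arbitrary illustrations:

      program parstyles
      integer i, n
      parameter (n = 8)
      real x(n), y(n)
      do i = 1, n
         x(i) = i
      end do
c Data parallelism: the same operation applied to different pieces
c of an array; the iterations are independent, so they can be
c divided among processors (here, OpenMP threads).
c$omp parallel do
      do i = 1, n
         y(i) = 2.0*x(i)
      end do
c Task parallelism: different, independent computations run
c concurrently on different processors.
c$omp parallel sections
c$omp section
      print *, 'task A: sum  =', sum(y)
c$omp section
      print *, 'task B: peak =', maxval(y)
c$omp end parallel sections
      end

In the data-parallel loop the same statement runs on different array sections; in the task-parallel sections two unrelated computations run at the same time.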
Slide 3: Development Trends
- Advances in IC technology and processor design:
  - CPU performance has doubled roughly every 18 months for the past 20+ years (Moore's Law; a quick arithmetic check follows this list)
  - Clock rates have increased from 4.77 MHz for the 8088 (1979) to 3.2 GHz for the Pentium 4 (2003)
  - FLOPS have grown from a handful (1945) to 35.86 TFLOPS (NEC's Earth Simulator, 2002 to date)
  - Cost and size have decreased
- Advances in computer networking:
  - Bandwidth has increased from a few bits per second to more than 10 Gb/s
  - Size and cost have decreased, and reliability has increased
- The need: solution of larger and more complex problems
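As a sanity check on those numbers (my arithmetic, not on the slide): doubling every 18 months over the 24 years from 1979 to 2003 predicts a performance factor of

$$2^{24/1.5} = 2^{16} \approx 65{,}500,$$

while clock rate alone grew by only $3.2\,\text{GHz} / 4.77\,\text{MHz} \approx 671 \approx 2^{9.4}$; the rest of the gain is commonly attributed to architectural advances such as pipelining, superscalar execution, and caches.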
Slide 4: Issues in Parallel Computing
- Parallel architectures: design of bottleneck-free hardware components
- Parallel programming models: a parallel view of the problem domain for effective partitioning and distribution of work among processors
- Parallel algorithms: efficient algorithms that take advantage of parallel architectures
- Parallel programming environments: programming languages, compilers, portable libraries, development tools, etc.
Slide 5: Two Key Algorithm Design Issues
- Load balancing
  - The execution time of a parallel program is the time elapsed from the start of processing by the first processor to the end of processing by the last processor
  - Hence the computational load must be partitioned evenly among processors (formalized below)
- Communication overhead
  - Processors are much faster than communication links
  - Hence data must be partitioned among processors so as to minimize communication
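The load-balancing point can be stated precisely (standard definitions, my formulation rather than the slide's): if processor p finishes its share of the work at time $t_p$, then

$$T_{\text{par}} = \max_{0 \le p < P} t_p, \qquad S = \frac{T_{\text{serial}}}{T_{\text{par}}}, \qquad E = \frac{S}{P},$$

so a single overloaded processor determines the run time of the whole program, and the speedup $S$ and efficiency $E$ are maximized when the $t_p$ are equal.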
Slide 6: Parallel MVM: Row-Block Partition

      do i = 1, N
         do j = 1, N
            y(i) = y(i) + A(i,j)*x(j)
         end do
      end do

[Figure: A is divided into four horizontal row blocks owned by P0-P3; the vectors x and y are divided into four corresponding blocks, one per processor.]
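A per-processor view of this partition (a sketch under stated assumptions: P processors, N divisible by P, and hypothetical local arrays ALoc and yLoc, named after the aLoc/bLoc convention the deck uses later):

c Processor me owns Nloc = N/P rows of A, stored in ALoc(1:Nloc,1:N),
c and the matching block of y in yLoc(1:Nloc). Every processor needs
c ALL of x, so x is replicated (or gathered) before the loop.
      Nloc = N/P
      do i = 1, Nloc
         yLoc(i) = 0.0
         do j = 1, N
            yLoc(i) = yLoc(i) + ALoc(i,j)*x(j)
         end do
      end do
c No communication is needed afterwards: the yLoc blocks, one per
c processor, together form y.

The cost of this layout is that the entire x vector must reach every processor before the computation starts.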
Slide 7: Parallel MVM: Column-Block Partition

      do j = 1, N
         do i = 1, N
            y(i) = y(i) + A(i,j)*x(j)
         end do
      end do

[Figure: A is divided into four vertical column blocks owned by P0-P3; the vectors x and y are divided into four corresponding blocks, one per processor.]
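The per-processor view here (same assumptions as before; xLoc and ypart are hypothetical names of mine) trades the gather of x for a reduction of y:

c Processor me owns Nloc = N/P columns of A, stored in ALoc(1:N,1:Nloc),
c and the matching block of x in xLoc(1:Nloc).
      Nloc = N/P
      do i = 1, N
         ypart(i) = 0.0
      end do
      do j = 1, Nloc
         do i = 1, N
            ypart(i) = ypart(i) + ALoc(i,j)*xLoc(j)
         end do
      end do
c Each processor now holds a full-length PARTIAL result; the P
c partial vectors must be summed (a reduction) to obtain y.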
Slide 8: Parallel MVM: Block Partition
- Can we do any better?
- Assume the same distribution of x and y
- Can A be partitioned to reduce communication?

[Figure: A is divided into a 2x2 grid of square blocks, owned by P0 and P1 on top and P2 and P3 below; x and y remain distributed in four blocks across P0-P3.]
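A rough communication count shows why a block partition can do better (standard analysis, not worked out on the slide): under a 1D row- or column-block partition every processor must gather or reduce a full-length vector, whereas under a 2D partition on a $\sqrt{P} \times \sqrt{P}$ grid each processor only exchanges the vector pieces for its block row and block column:

$$\text{1D: } \Theta(N) \ \text{words per processor}, \qquad \text{2D: } \Theta\!\left(\tfrac{N}{\sqrt{P}}\right) \ \text{words per processor},$$

ignoring logarithmic factors from the collective operations, i.e. roughly a $\sqrt{P}$-fold reduction in communication.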
Slide 9: Parallel Architecture Models
- Bus-based shared memory, i.e. symmetric multiprocessor (SMP) (e.g. suraj, dual/quad-processor Xeon machines)
- Network-based distributed memory (e.g. Cray T3E, our Linux cluster)
- Network-based distributed-shared memory, with hardware support for a global address space (e.g. SGI Origin 2000)
- Network-based distributed shared memory, i.e. networks of SMPs (e.g. SMP clusters)
Slide 10: Bus-Based Shared Memory (SMP)
- Any processor can access any memory location at equal cost (hence "symmetric" multiprocessor)
- Tasks "communicate" by writing to and reading from commonly accessible locations
- Easier to program
- Cannot scale beyond roughly 30 processors: the shared bus becomes a bottleneck
- Examples: most workstation vendors make SMPs (Sun, IBM, Intel-based SMPs); Cray T90 and SV1 (which use a crossbar instead of a bus)

[Figure: four processors P connected by a bus to a single shared memory.]
Slide 11: Network-Connected Distributed Memory
- Each processor can access only its own memory
- Explicit communication by sending and receiving messages
- More tedious to program
- Can scale to thousands of processors
- Examples: Cray T3E, clusters

[Figure: four processors P, each with its own memory M, connected by an interconnection network.]
Slide 12: Network-Connected Distributed-Shared Memory
- Each processor can directly access any memory location
- Memory is physically distributed
- Non-uniform memory access costs
- Example: SGI Origin 2000

[Figure: four processors P, each with a local memory M, connected by an interconnection network that presents a single address space.]
Slide 13: Network-Connected Distributed Shared Memory (SMP Clusters)
- A network of SMPs
- Each SMP can access only its own memory
- Explicit communication between SMPs
- Can take advantage of both the shared-memory and the distributed-memory programming models (a hybrid sketch follows this list)
- Can scale to hundreds of processors
- Examples: SMP clusters

[Figure: two SMP nodes, each with processors on a bus sharing a local memory, connected by an interconnection network.]
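One common way to program such a machine (a minimal sketch of my own, not from the slides) is hybrid: MPI for the explicit communication between SMP nodes, OpenMP threads within each node. MPI_THREAD_FUNNELED requests a threading level where only the main thread makes MPI calls:

      program hybrid
      include 'mpif.h'
      integer ierr, provided, rank, nthreads
      integer omp_get_num_threads
c MPI carries messages between SMP nodes; OpenMP threads share
c memory within a node.
      call mpi_init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
c$omp parallel shared(nthreads)
c$omp master
      nthreads = omp_get_num_threads()
c$omp end master
c$omp end parallel
      print *, 'MPI process', rank, 'uses', nthreads, 'threads'
      call mpi_finalize(ierr)
      end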
Slide 14: Parallel Programming Models
- Global-address-space (shared-memory) model
  - POSIX threads (Pthreads)
  - OpenMP
- Message-passing (distributed-memory) model
  - MPI (Message Passing Interface; a minimal program is sketched after this list)
  - PVM (Parallel Virtual Machine)
- Higher-level programming environments
  - High Performance Fortran (HPF)
  - PETSc (Portable, Extensible Toolkit for Scientific Computation)
  - POOMA (Parallel Object-Oriented Methods and Applications)
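For orientation, the smallest useful MPI program (a sketch of my own, not from the slides) looks like this:

      program hello
      include 'mpif.h'
      integer ierr, rank, nprocs
c Every MPI program brackets its work with init/finalize, and
c usually asks for its rank (process id) and the process count.
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
      print *, 'process', rank, 'of', nprocs
      call mpi_finalize(ierr)
      end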
Slide 15: Other Parallel Programming Models
- Task and channel
  - Similar to message passing
  - Instead of communicating between named tasks (as in the message-passing model), tasks communicate through named channels
- SPMD (single program, multiple data)
  - Each processor executes the same program code, operating on different data
  - Most message-passing programs are SPMD
- Data parallel
  - Operations on chunks of data (e.g. arrays) are parallelized
- Grid
  - The problem domain is viewed as parcels, with the processing for one or more parcels allocated to different processors
Slide 16: Example

A serial kernel used in the next two slides: each interior point of a is replaced by the average of its four neighbors in b (a Jacobi-style sweep), and b is then updated for the next iteration.

      real a(n,n), b(n,n)

      do k = 1, NumIter
         do i = 2, n-1
            do j = 2, n-1
               a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
            end do
         end do
         do i = 2, n-1
            do j = 2, n-1
               b(i,j) = a(i,j)
            end do
         end do
      end do
Slide 17: Global-Address-Space Model: OpenMP

The same kernel with OpenMP directives. Note that k must be private, not shared: every thread executes the sequential k loop redundantly, while the c$omp do directives share out the i iterations; the implied barrier at the end of each c$omp do keeps the update and copy phases in step.

      real a(n,n), b(n,n)

c$omp parallel shared(a,b) private(i,j,k)
      do k = 1, NumIter
c$omp do
         do i = 2, n-1
            do j = 2, n-1
               a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
            end do
         end do
c$omp do
         do i = 2, n-1
            do j = 2, n-1
               b(i,j) = a(i,j)
            end do
         end do
      end do
c$omp end parallel
Slide 18: Message Passing Pseudo-code

The same kernel in SPMD message-passing style: each of P processors owns NdivP = n/P rows (stored in aLoc and bLoc, with ghost rows 0 and NdivP+1 in bLoc), exchanges boundary rows with its neighbors each iteration, and then sweeps only its own rows.

      real aLoc(NdivP,n), bLoc(0:NdivP+1,n)

      me = get_my_procnum()
      do k = 1, NumIter
         ! halo exchange: swap boundary rows with neighbors
         if (me .ne. P-1) send(me+1, bLoc(NdivP, 1:n))
         if (me .ne. 0)   recv(me-1, bLoc(0, 1:n))
         if (me .ne. 0)   send(me-1, bLoc(1, 1:n))
         if (me .ne. P-1) recv(me+1, bLoc(NdivP+1, 1:n))
         ! skip the global boundary rows on the first and last processor
         if (me .eq. 0) then
            ibeg = 2
         else
            ibeg = 1
         endif
         if (me .eq. P-1) then
            iend = NdivP-1
         else
            iend = NdivP
         endif
         do i = ibeg, iend
            do j = 2, n-1
               aLoc(i,j) = (bLoc(i-1,j) + bLoc(i,j-1) + bLoc(i+1,j) + bLoc(i,j+1))/4
            end do
         end do
         do i = ibeg, iend
            do j = 2, n-1
               bLoc(i,j) = aLoc(i,j)
            end do
         end do
      end do
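For reference, one way the send/recv pseudocode maps onto actual MPI calls (my mapping, not from the slides; it assumes the declarations above, that MPI is already initialized, that me and P hold the rank and process count, and it introduces the names rowtype, up, and dn):

      include 'mpif.h'
      integer ierr, rowtype, up, dn, stat(MPI_STATUS_SIZE)
c A row bLoc(i,1:n) is strided in memory (Fortran arrays are
c column-major; the leading dimension of bLoc is NdivP+2), so
c describe one row with a vector datatype instead of copying it.
      call mpi_type_vector(n, 1, NdivP+2, MPI_REAL, rowtype, ierr)
      call mpi_type_commit(rowtype, ierr)
c MPI_PROC_NULL turns the edge sends/recvs into no-ops, replacing
c the four "if (me .ne. ...)" guards of the pseudocode.
      up = me - 1
      dn = me + 1
      if (me .eq. 0)   up = MPI_PROC_NULL
      if (me .eq. P-1) dn = MPI_PROC_NULL
c Each sendrecv pairs a send with the matching receive, so the
c halo exchange cannot deadlock the way carelessly ordered
c blocking send/recv pairs can.
      call mpi_sendrecv(bLoc(NdivP,1),   1, rowtype, dn, 0,
     &                  bLoc(0,1),       1, rowtype, up, 0,
     &                  MPI_COMM_WORLD, stat, ierr)
      call mpi_sendrecv(bLoc(1,1),       1, rowtype, up, 1,
     &                  bLoc(NdivP+1,1), 1, rowtype, dn, 1,
     &                  MPI_COMM_WORLD, stat, ierr)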