Published by Ariel Caldwell. Modified over 9 years ago.
1 Message Passing Models
CEG 4131 Computer Architecture III
Miodrag Bolic
2 Overview
– Hardware model
– Programming model
– Message Passing Interface
3 Generic Model of a Message-Passing Multicomputer [5] Gyula Fehér
[Figure: nodes connected by a message-passing direct interconnection network]
4 Generic Node Architecture [5] Gyula Fehér
[Figure: a node contains a node processor (processor + local memory + ...) and a router (communication processor + switch unit + ...), linked by internal channel(s); the router connects to several external channels]
– Thin node: small processor, small memory, one or a few chips, cheap per node, high parallelism
– Fat node: powerful processor, large memory, many chips, costly per node, moderate parallelism
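5 Generic Organization Model [5] Gyula Fehér
[Figure: (b) Decentralized: each node (P+M, CP) has its own switch S. (c) Centralized: the nodes (P+M, CP) all attach to a common switching network. P+M = processor + memory, CP = communication processor, S = switch]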
6 Message Passing Properties [1]
– A complete computer, including I/O, is the building block
– Programming model: each processor directly accesses only its private address space (local memory)
– Communication via explicit messages (send/receive)
– Communication is integrated at the I/O level, not in the memory system, so no special hardware is needed
– Resembles a network of workstations (which can in fact be used as a multiprocessor system)
7 Message Passing Program [1]
Problem: sum all of the elements of an array of size n.

    INITIALIZE;  // assign proc_num and num_procs
    if (proc_num == 0)  // the processor with proc_num 0 is the master,
                        // which sends out messages and sums the result
    {
        read_array(array_to_sum, size);  // read the array and its size from a file
        size_to_sum = size / num_procs;  // assumes size is a multiple of num_procs
        for (current_proc = 1; current_proc < num_procs; current_proc++) {
            lower_ind = size_to_sum * current_proc;
            upper_ind = size_to_sum * (current_proc + 1);  // exclusive bound
            SEND(current_proc, size_to_sum);
            SEND(current_proc, array_to_sum[lower_ind : upper_ind]);
        }
        // the master node sums its own part of the array
        sum = 0;
        for (k = 0; k < size_to_sum; k++)
            sum += array_to_sum[k];
        global_sum = sum;
        for (current_proc = 1; current_proc < num_procs; current_proc++) {
            RECEIVE(current_proc, local_sum);
            global_sum += local_sum;
        }
        printf("sum is %d\n", global_sum);
    }
    else  // any processor other than proc_num = 0 is a slave
    {
        sum = 0;
        RECEIVE(0, size_to_sum);
        RECEIVE(0, array_to_sum[0 : size_to_sum]);
        for (k = 0; k < size_to_sum; k++)
            sum += array_to_sum[k];
        SEND(0, sum);
    }
    END;
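As written, the pseudocode works only when size is a multiple of num_procs. Otherwise the last size % num_procs elements are never sent or summed. The C sketch below shows one common fix: the first size % num_procs processors each get one extra element. The helper name block_range is hypothetical and is not part of the original program or any library.

```c
/* Hypothetical helper: compute the half-open index range [*lo, *hi)
 * owned by processor `rank` when `n` elements are split across `nprocs`
 * processors. The first n % nprocs processors get one extra element,
 * so every element is assigned exactly once even when n is not a
 * multiple of nprocs. */
void block_range(int n, int nprocs, int rank, int *lo, int *hi)
{
    int base = n / nprocs;   /* elements every processor gets */
    int extra = n % nprocs;  /* leftover elements to spread out */

    /* Number of processors before `rank` that received an extra element. */
    int before = rank < extra ? rank : extra;

    *lo = rank * base + before;
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

With n = 10 and three processors, the ranges are [0, 4), [4, 7) and [7, 10). Each slave would then RECEIVE hi - lo elements instead of a fixed size_to_sum, and the master would sum the range for rank 0.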
8 Message Passing Program (cont.) [1]
Multiprocessor software functions provided:
– INITIALIZE: assigns a number (proc_num) to each processor in the system and sets the total number of processors (num_procs).
– SEND(receiving_processor_number, data): sends data to another processor.
– RECEIVE(sending_processor_number, data): receives data from another processor, waiting until it arrives.
– BARRIER(n_procs): when a processor encounters a BARRIER, it waits there until n_procs processors have reached the BARRIER; then execution can proceed.
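To make SEND and RECEIVE concrete, here is a sketch of the same master/slave sum on a single Unix machine. It is not MPI. It swaps in POSIX fork() for INITIALIZE and pipes for the message channel. The names send_msg, recv_msg and pipe_sum are hypothetical. Each worker is a separate process with its own address space, so data moves only through explicit messages, and read() blocks just as RECEIVE does. BARRIER is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* SEND: write exactly len bytes to a pipe. Returns 0 on success, -1 on error. */
static int send_msg(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) return -1;
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* RECEIVE: block until exactly len bytes have arrived. Returns 0 or -1. */
static int recv_msg(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r <= 0) return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Master/slave sum of a[0..n) using nprocs processes. Like the pseudocode,
 * assumes n is a multiple of nprocs. Stores the total in *result and
 * returns 0, or returns -1 on error (error handling is deliberately thin). */
int pipe_sum(const int *a, int n, int nprocs, long *result)
{
    if (nprocs < 1) return -1;
    int size_to_sum = n / nprocs;
    int *reply_fd = malloc(sizeof *reply_fd * (size_t)nprocs);
    if (reply_fd == NULL) return -1;

    for (int w = 1; w < nprocs; w++) {
        int down[2], up[2];                  /* master->worker, worker->master */
        if (pipe(down) < 0 || pipe(up) < 0) { free(reply_fd); return -1; }
        pid_t pid = fork();
        if (pid < 0) { free(reply_fd); return -1; }
        if (pid == 0) {                      /* worker ("slave") process */
            int size;
            long sum = 0;
            close(down[1]);
            close(up[0]);
            if (recv_msg(down[0], &size, sizeof size) < 0) _exit(1);
            int *part = malloc(sizeof *part * (size_t)size);
            if ((part == NULL && size > 0) ||
                recv_msg(down[0], part, sizeof *part * (size_t)size) < 0)
                _exit(1);
            for (int k = 0; k < size; k++) sum += part[k];
            _exit(send_msg(up[1], &sum, sizeof sum) < 0 ? 1 : 0);
        }
        close(down[0]);                      /* master keeps the other ends */
        close(up[1]);
        send_msg(down[1], &size_to_sum, sizeof size_to_sum);
        send_msg(down[1], a + w * size_to_sum, sizeof *a * (size_t)size_to_sum);
        close(down[1]);
        reply_fd[w] = up[0];
    }

    long global_sum = 0;                     /* master sums its own slice */
    for (int k = 0; k < size_to_sum; k++) global_sum += a[k];

    int rc = 0;
    for (int w = 1; w < nprocs; w++) {       /* RECEIVE each partial sum */
        long local_sum;
        if (recv_msg(reply_fd[w], &local_sum, sizeof local_sum) < 0) rc = -1;
        else global_sum += local_sum;
        close(reply_fd[w]);
        waitpid(-1, NULL, 0);                /* reap one finished worker */
    }
    free(reply_fd);
    *result = global_sum;
    return rc;
}
```

For a = {1, 2, ..., 12}, pipe_sum(a, 12, 4, &total) uses one master and three workers and sets total to 78.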
9 Advantages and Disadvantages [1]
Advantages
– Easier to build than scalable shared-memory machines
– Easy to scale (but the topology is important)
– Programming model more removed from basic hardware operations
– Coherency and synchronization are the responsibility of the user, so the system designer need not worry about them
Disadvantages
– Large overhead: copying buffers requires large data transfers (if not kept to a minimum, this can cancel out the benefits of multiprocessing)
– Programming is more difficult
– The blocking nature of SEND/RECEIVE can increase latency and cause deadlock
10 Message Passing Interface (MPI) [3]
– Standardization: MPI is the only message-passing library that can be considered a standard. It is supported on virtually all HPC platforms and has, in practice, replaced all previous message-passing libraries.
– Portability: there is no need to modify your source code when you port your application to a different platform that supports the MPI standard.
– Performance opportunities: vendor implementations should be able to exploit native hardware features to optimize performance.
– Functionality: over 115 routines are defined.
– Availability: a variety of implementations are available, both vendor-supplied and public domain.
11 MPI Basics [3]
– Start processes
– Send messages
– Receive messages
– Synchronize
With these four capabilities, you can construct any message-passing program. MPI offers over 125 functions.
12 Communicators [3]
Communicators provide a named set of processes for communication:
– The system assigns each process a unique identifier (its rank)
– The processes are numbered from 0 to n-1
– They allow libraries to be constructed: the application creates communicators
MPI_COMM_WORLD
– MPI uses objects called communicators and groups to define which collections of processes may communicate with each other; MPI_COMM_WORLD is the predefined communicator that contains all processes
– MPI provides functions (split, duplicate, ...) for creating communicators from other communicators
– Functions (size, rank, ...) return information about the processes within a communicator
Blocking vs. non-blocking
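The split function above is MPI_Comm_split(comm, color, key, &newcomm). Processes that pass the same color end up in the same new communicator. Within it they are ranked by key, with ties broken by their rank in the old communicator. The sketch below is a sequential model of that ranking rule only. split_rank is a hypothetical helper, not an MPI routine, and the MPI_UNDEFINED color is not modeled.

```c
/* Hypothetical sequential model of MPI_Comm_split's rank assignment.
 * color[r] and key[r] are the arguments process r (of n) would pass.
 * The new rank of `rank` is the number of processes with the same color
 * that precede it in (key, old rank) order. */
int split_rank(int rank, int n, const int *color, const int *key)
{
    int new_rank = 0;
    for (int r = 0; r < n; r++) {
        if (r == rank || color[r] != color[rank])
            continue;                       /* different communicator */
        /* r precedes `rank` if it has a smaller key, or an equal key
         * and a smaller rank in the old communicator */
        if (key[r] < key[rank] || (key[r] == key[rank] && r < rank))
            new_rank++;
    }
    return new_rank;
}
```

For example, splitting six processes with color = rank % 2 and equal keys puts ranks 0, 2, 4 into one communicator as new ranks 0, 1, 2.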
13 Hello World Example [3]

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int my_PE_num;

        MPI_Init(&argc, &argv);                     /* start MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);  /* rank of this process */
        printf("Hello from %d.\n", my_PE_num);
        MPI_Finalize();                             /* shut MPI down */
        return 0;
    }
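14 Hello World Example: Output [3]
Sample output from a run with 8 processes (the order may differ from run to run, because each process prints independently):

    Hello from 5.
    Hello from 3.
    Hello from 1.
    Hello from 2.
    Hello from 7.
    Hello from 0.
    Hello from 6.
    Hello from 4.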
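15 MPMD (Multiple Program, Multiple Data) [3]
Use the rank returned by MPI_Comm_rank to run a different routine on each process:

    if (my_PE_num == 0)
        Routine1();
    else if (my_PE_num == 1)
        Routine2();
    else if (my_PE_num == 2)
        Routine3();
    ...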
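For more than a handful of routines, a table of function pointers indexed by rank is one alternative to the if/else-if chain. This is a plain C sketch. The names routine1 to routine3, default_routine and run_for_rank are hypothetical, and each routine just returns an ID so the selection can be checked. In an MPI program, my_PE_num would come from MPI_Comm_rank.

```c
/* Hypothetical per-rank routines; in a real program each would hold a
 * different part of the application. Here they just return an ID. */
static int routine1(void) { return 1; }
static int routine2(void) { return 2; }
static int routine3(void) { return 3; }
static int default_routine(void) { return 0; }  /* ranks with no entry */

typedef int (*routine_fn)(void);

/* Run the routine assigned to this rank and return its result. */
int run_for_rank(int my_PE_num)
{
    static const routine_fn table[] = { routine1, routine2, routine3 };
    int ntable = (int)(sizeof table / sizeof table[0]);

    routine_fn fn = (my_PE_num >= 0 && my_PE_num < ntable)
                        ? table[my_PE_num]
                        : default_routine;
    return fn();
}
```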
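16 Blocking Sending and Receiving Messages [3]

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int my_PE_num, num_PEs, numbertoreceive, numbertosend = 42;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);
        MPI_Comm_size(MPI_COMM_WORLD, &num_PEs);

        if (my_PE_num == 0) {
            /* Rank 0 receives one message from every other rank, in arrival order */
            for (int i = 1; i < num_PEs; i++) {
                MPI_Recv(&numbertoreceive, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                printf("Number received is: %d (from rank %d)\n",
                       numbertoreceive, status.MPI_SOURCE);
            }
        } else {
            /* Every other rank sends 42 to rank 0 with tag 10 */
            MPI_Send(&numbertosend, 1, MPI_INT, 0, 10, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }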
17 Non-Blocking Message Passing Routines [4]

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int numtasks, rank, next, prev, buf[2], tag1 = 1, tag2 = 2;
        MPI_Request reqs[4];
        MPI_Status stats[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Neighbours in a ring: rank 0 wraps to the last rank and vice versa */
        prev = rank - 1;
        next = rank + 1;
        if (rank == 0) prev = numtasks - 1;
        if (rank == (numtasks - 1)) next = 0;

        /* Post both receives, then both sends; none of these calls blocks */
        MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);

        /* ... do some work that does not touch buf or rank ... */

        /* Block until all four operations have completed */
        MPI_Waitall(4, reqs, stats);
        printf("Rank %d received %d from prev and %d from next\n", rank, buf[0], buf[1]);

        MPI_Finalize();
        return 0;
    }
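In this ring exchange, each rank's Isend to next with tag1 is matched by next's Irecv from prev with tag1, and likewise for tag2. After MPI_Waitall, buf[0] therefore holds prev's rank and buf[1] holds next's rank. The two if statements that wrap the ring around can also be written with modular arithmetic. The plain C sketch below uses a hypothetical helper, ring_neighbors.

```c
/* Hypothetical helper: ring neighbours of `rank` among `numtasks`
 * processes. Adding numtasks before taking % keeps the result
 * non-negative when rank == 0. */
void ring_neighbors(int rank, int numtasks, int *prev, int *next)
{
    *prev = (rank + numtasks - 1) % numtasks;
    *next = (rank + 1) % numtasks;
}
```

This gives the same neighbours as the if statements, including the single-process case, where both prev and next are 0.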
18 Collective Communications [3]
– The communicator specifies the process group that participates in a collective communication
– MPI implements various optimized collective functions:
– Barrier synchronization
– Broadcast
– Reduction operations, with the result delivered either to one destination process or to every process in the group
– Collective operations may or may not synchronize the participating processes
19 Comparison: MPI vs. OpenMP

    Feature                             | OpenMP              | MPI
    Apply parallelism in steps          | Yes                 | No
    Scale to large number of processors | Maybe               | Yes
    Code complexity                     | Small increase      | Major increase
    Runtime environment                 | Expensive compilers | Free
    Cost of hardware                    | Very expensive      | Cheap
    Ease of modification                | Easy                | Hard
20 References
1. J. Kowalczyk, "Multiprocessor Systems," Xilinx, 2003.
2. D. Culler and J. P. Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1999.
3. MPI Basics.
4. Message Passing Interface (MPI).
5. D. Sima, T. Fountain and P. Kacsuk, Advanced Computer Architectures: A Design Space Approach, Pearson, 1997.