Basic examples with MPI
M. Garbey

Plan:
I. Introduction: Programming Model
II. Basic MPI Commands
III. Examples
IV. Collective Communications
V. More on Communication Modes
VI. References on MPI

I. Introduction: Definition. Model for a sequential program: the program is executed by one and only one processor, and all variables and constants of the program are allocated in central memory. (Figure: a single processing element (PE) connected to memory, running the program.)

I. Introduction: Message Passing Programming Model. The program is written in a classical language (Fortran, C, C++, ...). The computer is an ensemble of processors with an arbitrary interconnection topology. Each processor has its own medium-size local memory. Each processor executes its own program. Processors communicate by message passing. Any processor can send a message to any other processor. There are no shared resources (CPU, memory, ...).

I. Introduction: Message Passing Programming Model. (Figure: several processes, each with its own program and memory, connected by a network.)

I. Introduction: Message Passing Programming Model. (Figure: Single Program Multiple Data — the same program runs in every process, each with its own memory, connected by a network.)

I. Introduction: Execution model: SPMD, Single Program Multiple Data. The same program is executed by all the processors. Most computers can run this model. It is a particular case of MPMD, but SPMD can emulate MPMD, as sketched below: if the processor is in set A then do piece of code A; if the processor is in set B then do piece of code B; ...
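A minimal sketch of this idea, assuming the lower half of the ranks forms set A and the rest forms set B (the sets and the "work" they do are illustrative only):

program spmd_branches
  implicit none
  include 'mpif.h'
  integer :: rank, nb_procs, code
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  ! every process runs the same program; the rank selects the work
  if (rank < nb_procs/2) then
     ! "piece of code A"
     print *, 'process', rank, 'executes code A'
  else
     ! "piece of code B"
     print *, 'process', rank, 'executes code B'
  end if
  call MPI_FINALIZE(code)
end program spmd_branches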

I. Introduction: Process = Basic Unit of Computation. A program written in a "standard" sequential language with library calls to implement message passing. A process executes on a node; other processes may execute simultaneously on other nodes. A process communicates and synchronizes with other processes via messages. A process is uniquely identified by its label. A process does not migrate.

I. Introduction. Processes communicate and synchronize with each other by sending and receiving messages (no global variables or shared memory). Processes execute independently and asynchronously (no global synchronizing clock). Processes may be unique and work on their own data sets. Any process may communicate with any other process (a priori no limitation on message passing).

I. Introduction: Common Communication Patterns.
- One processor to one processor
- One processor to many processors (e.g. input data)
- Many processors to one processor (e.g. printing results, global operations)
- Many processors to many processors (e.g. an algorithm step such as FFT, ...)

II. 6 basic functions of MPI.

   integer code
c -- start MPI
   call MPI_INIT(code)
   call MPI_FINALIZE(code)
c -- end MPI

MPI_Init        initialize MPI
MPI_Comm_Size   gives the number of processes
MPI_Comm_Rank   gives the rank of the process
MPI_Send        send a message
MPI_Recv        receive a message
MPI_Finalize    end the MPI environment

II. 6 basic functions of MPI: MPI_Comm_Size gives the number of processes, MPI_Comm_Rank gives the rank of the process.

   integer nb_procs, rank, code
c -- gives the number of processes running in the code:
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
c -- gives the rank of the process running this function:
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)

NOTE: 0 <= rank <= nb_procs - 1
NOTE: MPI_COMM_WORLD is the communicator for the set of all processes running in the code

II. 6 basic functions of MPI.

   program who_i_am
   implicit none
   include 'mpif.h'
   integer nb_procs, rank, code
   call MPI_INIT(code)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
   print *, 'I am the process', rank, 'among', nb_procs
   call MPI_FINALIZE(code)
   end program who_i_am

> mpirun -np 4 who_i_am
I am the process 3 among 4
I am the process 0 among 4
I am the process 2 among 4
I am the process 1 among 4

II. 6 basic functions of MPI: MPI_Send sends a message, MPI_Recv receives a message.

II. 6 basic functions of MPI: MPI_Send sends a message, MPI_Recv receives a message.

   program node_to_node
   implicit none
   include 'mpif.h'
   integer status(MPI_STATUS_SIZE)
   integer code, rank, value, tag
   parameter(tag=100)
c -- run with at least 6 processes, since ranks 1 and 5 are used
   call MPI_INIT(code)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
   if (rank .eq. 1) then
      value = 1000
      call MPI_SEND(value, 1, MPI_INTEGER, 5, tag, MPI_COMM_WORLD, code)
   elseif (rank .eq. 5) then
      call MPI_RECV(value, 1, MPI_INTEGER, 1, tag, MPI_COMM_WORLD, status, code)
   end if
   call MPI_FINALIZE(code)
   end program node_to_node

II. 6 basic functions of MPI: MPI_Send sends a message, MPI_Recv receives a message. In these calls, value is the data of type MPI_INTEGER that is sent (here one element), and each message should have a tag. This is a blocking send paired with a blocking receive. MPI_SEND(value, 1, MPI_INTEGER, 5, tag, MPI_COMM_WORLD, code) blocks the execution of the code until the send is completed: value can then be reused, but there is no guarantee that the message has been received. MPI_RECV(value, 1, MPI_INTEGER, 1, tag, MPI_COMM_WORLD, status, code) blocks the execution of the code until the receive is completed. NOTE: at the beginning, use print commands to check that things are OK!

II. 6 basic functions of MPI. For a communication to succeed:
- the sender must specify a valid destination rank
- the receiver must specify a valid source rank (may use the wildcard MPI_ANY_SOURCE)
- the communicator must be the same
- the tags must match (the receiver may use the wildcard MPI_ANY_TAG)
- the message datatypes must match
- the receiver's buffer must be large enough
(A wildcard sketch is given below.)
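A minimal sketch of receiving with the wildcards and then inspecting the status object for the actual source and tag (the data sent and the use of the rank as tag are illustrative assumptions):

program any_source
  implicit none
  include 'mpif.h'
  integer :: status(MPI_STATUS_SIZE)
  integer :: code, rank, nb_procs, value, i
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  if (rank > 0) then
     ! every process except 0 sends its rank, using its rank as the tag
     call MPI_SEND(rank, 1, MPI_INTEGER, 0, rank, MPI_COMM_WORLD, code)
  else
     ! process 0 receives the messages in whatever order they arrive
     do i = 1, nb_procs - 1
        call MPI_RECV(value, 1, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, &
                      MPI_COMM_WORLD, status, code)
        ! the status array records the actual source and tag
        print *, 'received', value, 'from', status(MPI_SOURCE), &
                 'with tag', status(MPI_TAG)
     end do
  end if
  call MPI_FINALIZE(code)
end program any_source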

II. 6 basic functions of MPI. MPI basic datatypes in Fortran:

MPI datatype            Fortran datatype
MPI_INTEGER             INTEGER
MPI_REAL                REAL
MPI_DOUBLE_PRECISION    DOUBLE PRECISION
MPI_COMPLEX             COMPLEX
MPI_LOGICAL             LOGICAL
MPI_CHARACTER           CHARACTER(1)

III. The matrix multiply example. Preliminary: TIMER. In Fortran: double precision MPI_WTIME(). Time is measured in seconds. The time to perform a task is measured by consulting the timer before and after. Modify your program to measure its execution time and print it out. Example:

   tstart = MPI_WTIME()
   ... (work to be timed) ...
   tend = MPI_WTIME()
   print *, 'node', myid, ', time =', tend - tstart, 'seconds'
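A minimal self-contained sketch of this timing pattern (the "work" here is just an illustrative local sum):

program timing
  implicit none
  include 'mpif.h'
  integer :: code, myid, i
  double precision :: tstart, tend, s
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, code)
  tstart = MPI_WTIME()
  ! work to be timed (here: a dummy local computation)
  s = 0.0d0
  do i = 1, 10000000
     s = s + 1.0d0 / dble(i)
  end do
  tend = MPI_WTIME()
  print *, 'node', myid, ', time =', tend - tstart, 'seconds (sum =', s, ')'
  call MPI_FINALIZE(code)
end program timing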

III. The matrix multiply example: simple matrix multiply algorithm. Matrix A is copied to every processor. Matrix B is divided into blocks of columns j = 1..np and the blocks are distributed to the processors. Each processor performs the matrix multiply between A and its block of B simultaneously. Output the solution. (Figure: A * B = C, with B and C split into column blocks 1..np.)

III. The matrix multiply example. Master: distribute the work to the workers, collect the results, and output the solution. The master sends a copy of A to every worker:

   do dest = 1, numworkers
      call MPI_SEND(a, nra*nca, mpi_double_precision, dest, mtype, mpi_comm_world, ierr)
   end do

Each worker receives its copy of A from the master:

   call MPI_RECV(a, nra*nca, mpi_double_precision, master, mtype, mpi_comm_world, status, ierr)

III. The matrix multiply example. Master: distribute the blocks of columns of B to the workers. The master sends the column count (cols) and the column identifier (offset), which differ from worker to worker:

   do dest = 1, numworkers
      call MPI_SEND(offset, 1, mpi_integer, dest, mtype, mpi_comm_world, ierr)
      call MPI_SEND(cols, 1, mpi_integer, dest, mtype, mpi_comm_world, ierr)
   end do

The master sends the corresponding values of B to the workers:

   do dest = 1, numworkers
      call MPI_SEND(b(1,offset), cols*nca, mpi_double_precision, dest, mtype, mpi_comm_world, ierr)
   end do

III. The matrix multiply example. The workers receive the data:

   call MPI_RECV(offset, 1, mpi_integer, master, mtype, mpi_comm_world, status, ierr)
   call MPI_RECV(cols, 1, mpi_integer, master, mtype, mpi_comm_world, status, ierr)
   call MPI_RECV(b, cols*nca, mpi_double_precision, master, mtype, mpi_comm_world, status, ierr)

The workers do the matrix multiply on their block:

   do k = 1, cols
      do i = 1, nra
         c(i,k) = 0.0d0
         do j = 1, nca
            c(i,k) = c(i,k) + a(i,j) * b(j,k)
         end do
      end do
   end do

III. The matrix multiply example. Each worker sends the results for its block back to the master:

   call MPI_SEND(c, cols*nca, mpi_double_precision, master, mtype, mpi_comm_world, ierr)

The master receives the results from each worker i:

   do i = 1, numworkers
      call MPI_RECV(c(1,offset), cols*nca, mpi_double_precision, i, mtype, mpi_comm_world, status, ierr)
   end do

Remark: Fortran is not case sensitive.

IV. Collective Communications. Collective calls substitute for a more complex sequence of calls. They involve all the processes in a process group and are called by all processes in a communicator. All routines block until they are locally complete. Receive buffers must be exactly the right size. No message tags are needed. Collective calls are divided into three subsets: synchronization, data movement, and global computation.

IV. Collective Communications: Barrier Synchronization Routines. call MPI_BARRIER(comm, ierr) synchronizes all processes within a communicator. A communicator is a group of processes and a context of communication. The base group is the group that contains all processes; it is associated with the MPI_COMM_WORLD communicator. A node calling the barrier is blocked until all nodes within the group have called it. (A timing sketch using the barrier is given below.)
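One common use of the barrier is to line the processes up before timing a phase, so that they all start the clock together. A minimal sketch, where the timed phase is only a placeholder:

program barrier_timing
  implicit none
  include 'mpif.h'
  integer :: code, rank
  double precision :: tstart, tend
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  ! make sure every process reaches this point before the clock starts
  call MPI_BARRIER(MPI_COMM_WORLD, code)
  tstart = MPI_WTIME()
  ! ... phase to be timed would go here ...
  call MPI_BARRIER(MPI_COMM_WORLD, code)
  tend = MPI_WTIME()
  if (rank == 0) print *, 'elapsed time =', tend - tstart, 'seconds'
  call MPI_FINALIZE(code)
end program barrier_timing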

IV. Collective Communications: Broadcast. One processor sends some data to all processors in a group: call MPI_BCAST(buffer, count, datatype, root, comm, ierr). MPI_BCAST must be called by each node in the group, specifying the same communicator and root. The message is sent from the root process to all processes in the group, including the root process. Scatter: data are distributed into n equal segments, where the i-th segment is sent to the i-th process in a group of n processes: call MPI_SCATTER(sbuf, scount, sdatatype, rbuf, rcount, rdatatype, root, comm, ierr). (A small sketch of both calls follows.)
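A minimal sketch of both calls, assuming a run with 4 processes (mpirun -np 4) and an illustrative array of 8 reals scattered 2 per process:

program bcast_scatter
  implicit none
  include 'mpif.h'
  integer, parameter :: root = 0
  integer :: code, rank, nb_procs, n
  real :: sbuf(8), rbuf(2)
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  n = 0
  if (rank == root) then
     n = 8
     sbuf = (/ 1., 2., 3., 4., 5., 6., 7., 8. /)
  end if
  ! every process gets the value of n held by the root
  call MPI_BCAST(n, 1, MPI_INTEGER, root, MPI_COMM_WORLD, code)
  ! the root's 8 values are split into 4 segments of 2; segment i goes to rank i
  call MPI_SCATTER(sbuf, 2, MPI_REAL, rbuf, 2, MPI_REAL, root, &
                   MPI_COMM_WORLD, code)
  print *, 'process', rank, 'has n =', n, 'and segment', rbuf
  call MPI_FINALIZE(code)
end program bcast_scatter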

IV. Collective Communications: Gather. Data are collected into a specified process in the order of process rank; gather is the reverse of scatter. Call MPI_GATHER(sbuf, scount, sdatatype, rbuf, rcount, rdatatype, root, comm, ierr). Example: the data in Proc. 0 are {1,2}, in Proc. 1 {3,4}, in Proc. 2 {5,6}, ..., in Proc. 5 {11,12}. Then, with

   integer sbuf(2), rbuf(12)
   call MPI_GATHER(sbuf, 2, MPI_INTEGER, rbuf, 2, MPI_INTEGER, 3, MPI_COMM_WORLD, ierr)

the values {1,2,3,4,5,6,...,11,12} are brought into rbuf on Proc. 3 (the receive buffer on the root must hold the data from all 6 processes). Similarly, the inverse transfer is

   call MPI_SCATTER(rbuf, 2, MPI_INTEGER, sbuf, 2, MPI_INTEGER, 3, MPI_COMM_WORLD, ierr)

IV. Collective Communications. Two more MPI functions: MPI_Allgather and MPI_Alltoall. MPI_ALLTOALL(sbuf, scount, stype, rbuf, rcount, rtype, comm, ierr):
- sbuf: starting address of the send buffer
- scount: number of elements sent to each process
- stype: data type of the send buffer elements
- rbuf: starting address of the receive buffer
- rcount: number of elements received from any process
- rtype: data type of the receive buffer elements
- comm: communicator
To summarize: (Figure: data movement diagrams for Broadcast, Scatter, Gather, Allgather, and Alltoall between processes p0, p1, ...). A small Allgather sketch follows.
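A minimal sketch of MPI_ALLGATHER, where every process contributes its rank and every process ends up with the full list (the array size assumes a run with 4 processes):

program allgather_ranks
  implicit none
  include 'mpif.h'
  integer :: code, rank, nb_procs
  integer :: all_ranks(4)      ! sized for a run with 4 processes
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nb_procs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  ! like MPI_GATHER, but the gathered array ends up on every process
  call MPI_ALLGATHER(rank, 1, MPI_INTEGER, all_ranks, 1, MPI_INTEGER, &
                     MPI_COMM_WORLD, code)
  print *, 'process', rank, 'sees', all_ranks
  call MPI_FINALIZE(code)
end program allgather_ranks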

IV. Collective Communications: Global Reduction Routines. The partial result in each process of the group is combined together using some desired function. The operation passed to a global computation routine is either a predefined MPI operation or a user-supplied function. Examples: global sum or product, global maximum or minimum, global user-defined operation. MPI_REDUCE(sbuf, rbuf, count, stype, op, root, comm, ierr); MPI_ALLREDUCE(sbuf, rbuf, count, stype, op, comm, ierr).

IV. Collective Communications: Global Reduction Routines.
- sbuf: address of the send buffer
- rbuf: address of the receive buffer
- count: the number of elements in the send buffer
- stype: the data type of the elements of the send buffer
- op: the reduce operation, predefined or user-defined
- root: the rank of the root process
- comm: communicator
MPI_REDUCE returns the result to a single process; MPI_ALLREDUCE returns the result to all processes in the group (see the sketch below).
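A minimal sketch of both routines, computing the maximum rank over all processes (the choice of MPI_MAX and the buffers are illustrative):

program reduce_max
  implicit none
  include 'mpif.h'
  integer :: code, rank, maxrank
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  ! only process 0 gets the result with MPI_REDUCE
  call MPI_REDUCE(rank, maxrank, 1, MPI_INTEGER, MPI_MAX, 0, &
                  MPI_COMM_WORLD, code)
  if (rank == 0) print *, 'maximum rank (on root only):', maxrank
  ! every process gets the result with MPI_ALLREDUCE
  call MPI_ALLREDUCE(rank, maxrank, 1, MPI_INTEGER, MPI_MAX, &
                     MPI_COMM_WORLD, code)
  print *, 'process', rank, 'sees maximum rank', maxrank
  call MPI_FINALIZE(code)
end program reduce_max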

IV. Collective Communications: Global Reduction Routines. (Figure: data movement among p0, p1, p2 for MPI_Reduce(sendbuf, recvbuf, 4, MPI_INT, MPI_MAX, 0, comm), which leaves the element-wise maximum on p0 only, and for MPI_Allreduce(sendbuf, recvbuf, 4, MPI_INT, MPI_SUM, comm), which leaves the element-wise sum on all of p0, p1, p2.)

IV. Collective Communications: Global Reduction Routines, example.

c A subroutine that computes the dot product of two vectors that are distributed across
c a group of processes and returns the answer at node zero:
   subroutine PAR_BLAS1(N, a, b, scalar_product, comm)
   include 'mpif.h'
   integer N, comm, ierr, I
   real a(N), b(N), sum, scalar_product
   sum = 0.0
   do I = 1, N
      sum = sum + a(I) * b(I)
   end do
c -- note the argument order: the operation (MPI_SUM) comes before the root (0)
   call MPI_Reduce(sum, scalar_product, 1, MPI_REAL, MPI_SUM, 0, comm, ierr)
   return
   end

IV. Collective Communications: Global Reduction Routines. Predefined reduce operations:

MPI name    Function          MPI name    Function
MPI_MAX     Maximum           MPI_LOR     Logical OR
MPI_MIN     Minimum           MPI_LAND    Logical AND
MPI_SUM     Sum               MPI_PROD    Product

V. More on Communication Modes. So far we have seen the standard SEND and RECEIVE functions; however, we need to know more in order to overlap communications with computations, and more generally to optimize the code. Blocking calls: a blocking send or receive call suspends execution of the user's program until the message buffer being sent/received is safe to use. In the case of a blocking send, this means the data to be sent have been copied out of the send buffer, but they have not necessarily been received by the receiving task. The contents of the send buffer can then be modified without affecting the message that was sent. A blocking receive implies that the data in the receive buffer are valid.

V. More on Communication Modes. Blocking communication modes:
- Synchronous send, MPI_SSEND: returns when the message buffer can be safely reused. The sending task tells the receiver that a message is ready for it and waits for the receiver to acknowledge. System overhead: copy from buffer to network and vice versa. Synchronization overhead: handshake + waiting. Safe and portable.
- Buffered send, MPI_BSEND: returns when the message has been copied to the buffer (attached by the user with MPI_BUFFER_ATTACH).
- Standard send, MPI_SEND: either synchronous or buffered, implemented by the vendor to give good performance for most programs. In MPICH we do have a buffered send.
(A buffered-send sketch follows.)
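A minimal sketch of the buffered mode, assuming two processes and a generously sized attach buffer (the message contents, tag, and the 4-bytes-per-real sizing are illustrative assumptions):

program bsend_sketch
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 1000
  real :: msg(n)
  ! attach buffer: room for the message (4 bytes per real assumed) plus MPI overhead
  real :: attach_buf(n + 1000)
  integer :: attach_bytes, code, rank, status(MPI_STATUS_SIZE)
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  msg = real(rank)
  if (rank == 0) then
     attach_bytes = 4*(n + 1000)
     call MPI_BUFFER_ATTACH(attach_buf, attach_bytes, code)
     ! the buffered send copies msg into the attached buffer and returns
     call MPI_BSEND(msg, n, MPI_REAL, 1, 10, MPI_COMM_WORLD, code)
     ! detach waits until the buffered message has been transmitted
     call MPI_BUFFER_DETACH(attach_buf, attach_bytes, code)
  else if (rank == 1) then
     call MPI_RECV(msg, n, MPI_REAL, 0, 10, MPI_COMM_WORLD, status, code)
     print *, 'rank 1 received first value', msg(1)
  end if
  call MPI_FINALIZE(code)
end program bsend_sketch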

V. More on Communication Modes. Non-blocking calls return immediately after initiating the communication. In order to reuse the send message buffer, the programmer must check its status: he can choose either to block until the message buffer is safe to use or to test the status of the message buffer. A blocking or non-blocking send can be paired with a blocking or non-blocking receive. Syntax:

   call MPI_Isend(buf, count, datatype, dest, tag, comm, handle, ierr)
   call MPI_Irecv(buf, count, datatype, src, tag, comm, handle, ierr)

(An exchange sketch using these calls is given below.)
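A minimal sketch of a two-process exchange in which each process posts a non-blocking receive and send, does some local work, and then waits for both requests (the "local work" is a placeholder and the tag is illustrative):

program nonblocking_exchange
  implicit none
  include 'mpif.h'
  integer :: code, rank, other, sendval, recvval
  integer :: requests(2), statuses(MPI_STATUS_SIZE, 2)
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  other = mod(rank + 1, 2)        ! assume exactly two processes
  sendval = rank + 1000
  ! post the receive and the send; both return immediately
  call MPI_IRECV(recvval, 1, MPI_INTEGER, other, 20, MPI_COMM_WORLD, &
                 requests(1), code)
  call MPI_ISEND(sendval, 1, MPI_INTEGER, other, 20, MPI_COMM_WORLD, &
                 requests(2), code)
  ! ... local work overlapping the communication would go here ...
  call MPI_WAITALL(2, requests, statuses, code)
  print *, 'process', rank, 'received', recvval
  call MPI_FINALIZE(code)
end program nonblocking_exchange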

V. More on Communication Modes. Non-blocking calls: the programmer can block on, or check, the status of the message buffer. MPI_Wait(request, status): this routine blocks until the communication associated with the handle request has completed; it is useful when the data in the communication buffer are about to be re-used. MPI_Test(request, flag, status): this routine does not block; it queries whether the communication specified by the handle request has completed and returns the result (true or false) in flag. In both cases the request handle will have been returned by an earlier call to a non-blocking communication routine. (A polling sketch with MPI_Test follows.)
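A minimal sketch of polling with MPI_TEST, where the receiver keeps doing small chunks of placeholder work until the message has arrived (the ranks, tag, and message value are illustrative):

program test_polling
  implicit none
  include 'mpif.h'
  integer :: code, rank, value, request, status(MPI_STATUS_SIZE)
  logical :: flag
  double precision :: dummy
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  if (rank == 0) then
     value = 42
     call MPI_SEND(value, 1, MPI_INTEGER, 1, 30, MPI_COMM_WORLD, code)
  else if (rank == 1) then
     call MPI_IRECV(value, 1, MPI_INTEGER, 0, 30, MPI_COMM_WORLD, &
                    request, code)
     flag = .false.
     dummy = 0.0d0
     do while (.not. flag)
        ! a chunk of local work overlapping the communication
        dummy = dummy + 1.0d0
        ! returns immediately; flag becomes .true. once the receive is done
        call MPI_TEST(request, flag, status, code)
     end do
     print *, 'rank 1 received', value, 'after', dummy, 'work chunks'
  end if
  call MPI_FINALIZE(code)
end program test_polling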

Deadlock
- All tasks are waiting for events that haven't been initiated
- Common in SPMD programs with blocking communication, e.g. every task sends but none receives
- Occurs when insufficient system buffer space is available
Remedies:
- Arrange for one task to receive first
- Use MPI_Sendrecv
- Use non-blocking communication

V. More on Communication Modes. Example: deadlock.

c Improper use of blocking calls results in deadlock; run on two nodes
c author: Roslyn Leibensperger (CTC)
   program deadlock
   implicit none
   include 'mpif.h'
   integer MSGLEN, ITAG_A, ITAG_B
   parameter (MSGLEN = 2048, ITAG_A = 100, ITAG_B = 200)
   real rmsg1(MSGLEN), rmsg2(MSGLEN)
   integer irank, idest, isrc, istag, iretag, istatus(MPI_STATUS_SIZE), ierr, I
   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)
   do I = 1, MSGLEN
      rmsg1(I) = 100
      rmsg2(I) = -100
   end do

V. More on Communication Modes. Example: deadlock (cont'd).

   if (irank .eq. 0) then
      idest  = 1
      isrc   = 1
      istag  = ITAG_A
      iretag = ITAG_B
   else if (irank .eq. 1) then
      idest  = 0
      isrc   = 0
      istag  = ITAG_B
      iretag = ITAG_A
   end if
   print *, 'Task ', irank, ' has sent the message'
   call MPI_Ssend(rmsg1, MSGLEN, MPI_REAL, idest, istag, MPI_COMM_WORLD, ierr)
   call MPI_Recv(rmsg2, MSGLEN, MPI_REAL, isrc, iretag, MPI_COMM_WORLD, istatus, ierr)
   print *, 'Task ', irank, ' has received the message'
   call MPI_Finalize(ierr)
   end

V. More on Communication Modes. Example: deadlock (fixed).

c Solution program showing the use of a non-blocking send to eliminate deadlock
c author: Roslyn Leibensperger (CTC)
   program fixed
   implicit none
   include 'mpif.h'
c  ... declarations and initialization as in the deadlock example,
c  plus the request handle irequest and the status arrays ...
   print *, 'Task ', irank, ' has started the send'
   call MPI_Isend(rmsg1, MSGLEN, MPI_REAL, idest, istag, MPI_COMM_WORLD, irequest, ierr)
   call MPI_Recv(rmsg2, MSGLEN, MPI_REAL, isrc, iretag, MPI_COMM_WORLD, irstatus, ierr)
   call MPI_Wait(irequest, isstatus, ierr)
   print *, 'Task ', irank, ' has completed the send'
   call MPI_Finalize(ierr)
   end

Sendrecv
- Useful for executing a shift operation across a chain of processes.
- The system takes care of the possible deadlock due to the blocking calls.
call MPI_Sendrecv(sbuf, scount, stype, dest, stag, rbuf, rcount, rtype, source, rtag, comm, status, ierr)
- sbuf (rbuf): initial address of the send (receive) buffer.
- scount (rcount): number of elements in the send (receive) buffer.
- stype (rtype): type of the elements in the send (receive) buffer.
- stag (rtag): send (receive) tag.
- dest: rank of the destination.
- source: rank of the source.
- comm: communicator.
- status: status object.

program sendrecv
  implicit none
  include 'mpif.h'
  integer, dimension(MPI_STATUS_SIZE) :: status
  integer, parameter :: tag=100
  integer :: rank, value, num_proc, code

  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)

  ! we suppose that we have only two processes
  num_proc = mod(rank+1, 2)

  call MPI_SENDRECV(rank+1000, 1, MPI_INTEGER, num_proc, tag, value, 1, MPI_INTEGER, num_proc, tag, MPI_COMM_WORLD, status, code)

  print *, 'me, process', rank, ', I have received', value, 'from process', num_proc
  call MPI_FINALIZE(code)
end program sendrecv

> mpirun -np 2 sendrecv
me, process 1, I have received 1000 from process 0
me, process 0, I have received 1001 from process 1

Remark: if each process instead did a blocking send followed by a receive in this code, we could get a deadlock (and would with a synchronous send), because each process would be waiting for a receive that is never posted!

V. More on Communication Modes: Optimizations. Optimization must be a main concern when communication time becomes a significant part of the total compared to computation time. Optimization of communications may be accomplished at different levels; the main ones are:
1. Overlap communication with computation
2. Avoid, if possible, copying the message into temporary memory (buffering)
3. Minimize the additional costs induced by calling communication subroutines too often, e.g. by grouping data into fewer messages, as sketched below
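A minimal sketch of point 3, assuming three scalar results that can be sent as one message instead of three (the values, ranks, and tag are illustrative):

program aggregate_messages
  implicit none
  include 'mpif.h'
  integer :: code, rank, status(MPI_STATUS_SIZE)
  double precision :: vals(3)
  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  if (rank == 1) then
     ! pack the three results into one array and pay the message overhead once,
     ! instead of calling MPI_SEND three times with one value each
     vals(1) = 1.5d0
     vals(2) = 2.5d0
     vals(3) = 3.5d0
     call MPI_SEND(vals, 3, MPI_DOUBLE_PRECISION, 0, 40, MPI_COMM_WORLD, code)
  else if (rank == 0) then
     call MPI_RECV(vals, 3, MPI_DOUBLE_PRECISION, 1, 40, MPI_COMM_WORLD, &
                   status, code)
     print *, 'received', vals
  end if
  call MPI_FINALIZE(code)
end program aggregate_messages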