An introduction to parallel computing. Description of the JDL for MPI-type jobs.
Giuseppe La Rocca, INFN – Sezione di Catania
Corso Introduttivo di Grid Computing, Catania
Overview
Execution of parallel jobs is an essential requirement for modern computing and its applications:
- solve large problems
- solve problems with greater speed
The most widely used library for supporting parallel jobs is MPI (Message Passing Interface):
- based on send() and receive() primitives
- a "master" node starts a number of "slave" processes by establishing SSH sessions
- all processes can share a common workspace and/or exchange data
Goals of the MPI standard
The primary goals of MPI are:
- to provide source-code portability
- to allow efficient implementations
- to free the user from dealing with communication failures: such failures are handled by the underlying communication subsystem
- to offer a great deal of functionality
- to support heterogeneous parallel architectures
A bit of history ...
The Message Passing Interface (MPI) is a standard developed by the Message Passing Interface Forum (MPIF). It specifies portable APIs for writing message-passing programs in Fortran, C and C++.
MPIF, with the participation of more than 40 organizations, started working on the standard in 1992. The first draft (Version 1.0), published in 1994, was strongly influenced by the work at the IBM T. J. Watson Research Center. MPIF later enhanced the first version to produce a second version (MPI-2) in 1997. The latest release of the first version (Version 1.2) is offered as an update to the previous release and is contained in the MPI-2 document.
... some basic concepts
An MPI process consists of a C/C++ or Fortran 77 program which communicates with other MPI processes by calling MPI routines. The MPI routines provide the programmer with a consistent interface across a wide variety of different platforms.
All names of MPI routines and constants, in both C and Fortran, begin with the prefix MPI_ to avoid name collisions. Fortran routine names are all upper case, while C routine names are mixed case. In general, C MPI routines return an int and Fortran MPI routines have an IERROR argument.
The default action on detection of an error by MPI is to abort the parallel computation, rather than return with an error code, but this behaviour can be changed.
Basic Structures of MPI Programs
1. Header files
2. Initializing MPI
3. MPI Communicator
4. MPI Function format
5. Communicator Size
6. Process Rank
7. Finalizing MPI
1. Header files
All sub-programs that contain calls to MPI subroutines MUST include the MPI header file:
C: #include <mpi.h>
Fortran: include 'mpif.h'
The header file contains the definitions of MPI constants, MPI types and functions.
2. Initializing MPI
The first MPI routine called in any MPI program must be the initialisation routine MPI_INIT. Every MPI program must call this routine once, before any other MPI routine; making multiple calls to MPI_INIT is erroneous.
The C version of the routine accepts pointers to argc and argv as arguments:
int MPI_Init(int *argc, char ***argv);
The Fortran version takes no arguments other than the error code:
MPI_INIT(IERROR)
3. MPI Communicator
The communicator is a variable identifying a group of processes that are allowed to communicate with each other.
There is a default communicator, MPI_COMM_WORLD, which identifies the group of all the processes. The processes are ordered and numbered consecutively from 0 (in both Fortran and C); the number of each process is known as its rank. The rank identifies each process within the communicator.
[Figure: the predefined communicator MPI_COMM_WORLD for 7 processes; the numbers indicate the rank of each process.]
Size is the number of processes associated with the communicator.
Rank is the index of a generic process (rank = 0, ..., SIZE-1). The rank is used to identify the source and the destination processes in a communication.
4. MPI Function format
C: Error = MPI_Xxx (parameter, ...);
Fortran: CALL MPI_XXX (parameter, IERROR)
5. Communicator Size
How many processes are associated with a communicator?
C: MPI_Comm_size (MPI_Comm comm, int *size);
Fortran: INTEGER COMM, SIZE, IERR
         CALL MPI_COMM_SIZE (COMM, SIZE, IERR)
Output: SIZE
6. Process Rank
What is the ID of a process in a group?
C: MPI_Comm_rank (MPI_Comm comm, int *rank);
Fortran: INTEGER COMM, RANK, IERR
         CALL MPI_COMM_RANK (COMM, RANK, IERR)
Output: RANK
7. Finalizing MPI
An MPI program should call the MPI routine MPI_FINALIZE when all communications have completed. This routine cleans up all MPI data structures, etc. Once this routine has been called, no other calls can be made to MPI routines.
Finalizing the MPI environment:
C: int MPI_Finalize (void);
Fortran: INTEGER IERR
         CALL MPI_FINALIZE (IERR)
A template of Fortran MPI program
A template of C MPI program
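The original slides showed the templates as images. As a minimal sketch (not the original slide code), a C MPI program using the routines introduced above might look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);                 /* initialise the MPI environment   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in the group  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my rank, 0 .. size-1             */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* clean up all MPI data structures */
    return 0;
}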
Point-to-point communication
A point-to-point communication always involves exactly two processes: one process sends a MESSAGE to the other. This distinguishes it from collective communication, which involves a whole group of processes at the same time.
Blocking and Non-Blocking Communication /1
Even when a single message is sent from process 0 to process 1, there are several steps involved in the communication.
At the sending process, the following events occur one after another:
1. The user copies the data into the user buffer (a scalar variable or an array used in the program).
2. The user calls one of the MPI send subroutines.
3. The system copies the data from the user buffer to the system buffer.
4. The system sends the data from the system buffer to the destination process.
At the receiving process:
1. The user calls one of the MPI receive subroutines.
2. The system receives the data from the source process and copies it to the system buffer.
3. The system copies the data from the system buffer to the user buffer.
4. The user uses the data in the user buffer.
Blocking and Non-Blocking Communication /2
Blocking and Non-Blocking Communication /3
When you send data, you cannot (or should not) reuse your buffer until the system has copied the data from the user buffer to the system buffer. Likewise, when you receive data, the data is not ready until the system has finished copying it from the system buffer to the user buffer.
In MPI there are two modes of communication: blocking and non-blocking.
- With blocking communication subroutines, the program does not return from the subroutine call until the copy to/from the system buffer has finished.
- With non-blocking communication subroutines, the program returns from the subroutine call immediately.
Blocking and Non-Blocking Communication /4
A call to a non-blocking subroutine only indicates that the copy to/from the system buffer has been initiated; it is not guaranteed that the copy has completed. Therefore, you have to make sure the copy has completed (MPI_WAIT, MPI_TEST).
Blocking communication: slow and simple.
Non-blocking communication: fast, but complex and unsafe if completion is not checked.
Communication Modes and MPI subroutines
The Standard Send completes once the message has been sent, which may or may not imply that the message has arrived at its destination; the message may instead lie "in the communication network" for some time.
MPI_SEND (buf, count, datatype, dest, tag, comm)
Synchronous Send
If the user needs to know whether the message has been received by the receiver, both processes may use synchronous communication. An acknowledgement is sent by the receiver to the sender (a "handshake"); if the ack is properly received by the sender, the send is considered complete.
MPI_SSEND (buf, count, datatype, dest, tag, comm)
Advantage: safer.
Disadvantage: slower.
Buffered Send
A Buffered Send is guaranteed to complete immediately, copying the message to a system buffer for later transmission if necessary. The programmer has to allocate enough buffer space for the program with a call to MPI_BUFFER_ATTACH (buffer, size); the buffer space is detached with a call to MPI_BUFFER_DETACH (buffer, size).
Advantage: sender and receiver are not synchronised.
Disadvantage: the buffer space cannot be pre-allocated by MPI; the programmer must manage it explicitly.
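A minimal sketch of a buffered send in C, assumed to sit between MPI_Init and MPI_Finalize of the template above (the buffer size here is illustrative; real code should size it with MPI_Pack_size plus MPI_BSEND_OVERHEAD):

int    bufsize = 4096 + MPI_BSEND_OVERHEAD;   /* illustrative size; malloc needs <stdlib.h> */
char  *buffer  = malloc(bufsize);
double value   = 3.14;

MPI_Buffer_attach(buffer, bufsize);           /* hand the buffer over to MPI                */
MPI_Bsend(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* dest = 1, tag = 0             */
MPI_Buffer_detach(&buffer, &bufsize);         /* blocks until the buffered message is sent  */
free(buffer);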
Ready Send
Like the Buffered Send, a Ready Send completes immediately: the sending process simply throws the message out onto the communication network and hopes that the receiver is waiting to catch it. If the receiver has already posted the matching receive, the message will be received; otherwise the send is erroneous and the message may be dropped.
MPI_RSEND (buf, count, datatype, dest, tag, comm)
Advantage: high performance.
Disadvantage: complex to debug in case of failures.
The standard blocking receive
The format of the standard blocking receive is:
MPI_RECV (buf, count, datatype, source, tag, comm, status)
where:
- buf is the address where the data should be placed once received (the receive buffer)
- count is the maximum number of elements that buf can contain
- datatype is the MPI datatype of the message
- source is the rank of the source of the message in the group associated with the communicator comm
- tag is used by the receiving process to specify which message it is waiting for
- comm is the communicator
- status contains information about the received message (source, tag, error code)
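As an illustration, a minimal sketch of a blocking exchange in C (a fragment assumed to run between MPI_Init and MPI_Finalize of the template above):

int rank, value;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    value = 42;
    /* standard blocking send: one int to rank 1, tag 0 */
    MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* blocking receive: one int from rank 0, tag 0 */
    MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    printf("Rank 1 received %d\n", value);   /* requires <stdio.h> */
}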
Non-Blocking Communication /1
The non-blocking routines have identical arguments to their blocking counterparts, except for one extra argument, request. This argument is important: it provides a handle which is used to test whether the communication has completed.
Non-Blocking Communication /2
Non-blocking communications allow the initiation of a communication to be separated from its completion.
Advantage: between initiation and completion the program can do useful computation (latency hiding).
Disadvantage: the programmer has to insert code to check for completion.
Waiting and Testing for Completion /1
Fortran: MPI_WAIT (req, status, ierr)
C: MPI_Wait (MPI_Request *req, MPI_Status *status);
A call to this subroutine causes the program to wait until the communication referenced by req has completed.
Waiting and Testing for Completion /2
Fortran: MPI_TEST (req, flag, status, ierr)
C: MPI_Test (MPI_Request *req, int *flag, MPI_Status *status);
A call to this subroutine sets flag to true if the communication referenced by req has completed, and to false otherwise.
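A minimal sketch of latency hiding with MPI_Test in C (a fragment; do_useful_work() is a hypothetical placeholder for computation that does not touch the receive buffer):

MPI_Request req;
MPI_Status  status;
int    flag = 0;
double data;

/* post a non-blocking receive from rank 0, tag 0, then overlap it with work */
MPI_Irecv(&data, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

while (!flag) {
    do_useful_work();                  /* hypothetical: computation that does not use 'data' */
    MPI_Test(&req, &flag, &status);    /* has the receive completed yet?                     */
}
/* only here is 'data' safe to read */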
MPI_ISSEND (buf, count, datatype, dest, tag, comm, request)
After starting the send, the program can continue with other computations which do not alter the send buffer. Before the sending process can update the send buffer, it must check that the send has completed.
MPI_IRECV (buf, count, datatype, source, tag, comm, request)
The receiving process can carry on with other computations until it needs the received data; it then checks whether the communication has completed before using the receive buffer.
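A minimal sketch of a non-blocking exchange completed with MPI_WAIT, in C (again a fragment between MPI_Init and MPI_Finalize):

int rank, out = 7, in;
MPI_Request req;
MPI_Status  status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    /* non-blocking synchronous-mode send to rank 1, tag 0 */
    MPI_Issend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    /* ... other work that does not modify 'out' ... */
    MPI_Wait(&req, &status);          /* 'out' may be reused only after this */
} else if (rank == 1) {
    /* non-blocking receive from rank 0, tag 0 */
    MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    /* ... other work that does not read 'in' ... */
    MPI_Wait(&req, &status);          /* 'in' is valid only after this */
}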
Fortran – MPI Data types
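The original slide showed the full mapping as a table; a representative subset of the standard Fortran handles is: MPI_INTEGER (INTEGER), MPI_REAL (REAL), MPI_DOUBLE_PRECISION (DOUBLE PRECISION), MPI_COMPLEX (COMPLEX), MPI_LOGICAL (LOGICAL), MPI_CHARACTER (CHARACTER), MPI_BYTE, MPI_PACKED.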
C - MPI Data types
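Likewise for C, the original slide showed a table; a representative subset of the standard handles is: MPI_CHAR (char), MPI_INT (int), MPI_LONG (long), MPI_FLOAT (float), MPI_DOUBLE (double), MPI_UNSIGNED (unsigned int), MPI_BYTE, MPI_PACKED.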
JDL attributes
From the user's point of view, MPI jobs are specified by setting the JDL JobType attribute to MPICH and specifying the NodeNumber attribute as well:
JobType = "MPICH";
NodeNumber = 2;
The NodeNumber attribute defines the required number of CPU cores (PEs).
Matchmaking: the Resource Broker (RB) chooses a CE (if any!) with enough free Processing Elements (PE = CPU cores), i.e. free PEs >= NodeNumber; otherwise the job waits.
When these two attributes are included in a JDL script, the following expression:
(other.GlueCEInfoTotalCPUs >= NodeNumber) &&
Member ("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
is automatically added to the JDL Requirements expression in order to find the best resource where the job can be executed.
Requirements & Settings
In order to guarantee that an MPI job can run, the following requirements MUST be satisfied:
- the MPICH software must be installed, and reachable via the PATH environment variable, on all the WNs of the CE
- some MPI applications require a shared file system among the WNs in order to run (there is no shared file system in GILDA)
Parallel jobs can currently run inside a single Computing Element only; several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.
Requirements & Settings
- The Job Wrapper copies all the files indicated in the InputSandbox onto ALL of the "slave" nodes.
- Host-based SSH authentication MUST be properly configured between all the WNs.
The Pearson product-moment correlation coefficient (r) is the most widely used measure of correlation or association. It is named after Karl Pearson, who developed the correlation method to do agricultural research. The "product moment" part of the name comes from the way in which it is calculated: by summing up the products of the deviations of the scores from the mean. (The example source file is pearson.c.)
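In formula form (the standard definition, for paired samples $x_i, y_i$ with means $\bar{x}, \bar{y}$):

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}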
pearson.jdl
[
  Type = "Job";
  JobType = "MPICH";
  Executable = "pearson";
  NodeNumber = 3;
  StdOutput = "pearson.out";
  StdError = "pearson.err";
  InputSandbox = {"pearson"};
  OutputSandbox = {"pearson.err", "pearson.out"};
  Requirements = other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF";
]
The Executable is the "pearson" binary shipped via the InputSandbox; the Requirements expression restricts the match to CEs whose Local Resource Manager (LRMS) is PBS or LSF only.
MPI on the web...
https://edms.cern.ch/file/454439/LCG-2-UserGuide.pdf
Hands-on
cd ${HOME}/MPI
01/ 02/ 03/ 04/ 05/
ssh
OS passwd: GridCIGXX
PassPhrase: CIGC
where XX = 01, ..., 25