
MPI Advanced edition
Jakub Yaghob

Initializing MPI – threading
int MPI_Init_thread(int *argc, char ***argv, int required, int *provided);
Must be called as the first MPI routine
Establishes the MPI environment for multithreaded execution
  MPI_THREAD_SINGLE – only one thread
  MPI_THREAD_FUNNELED – only the main thread will make MPI calls
  MPI_THREAD_SERIALIZED – only one MPI call at a time
  MPI_THREAD_MULTIPLE – multiple threads may make MPI calls concurrently
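A minimal sketch of thread-aware initialization (not part of the original slide): MPI_THREAD_MULTIPLE is requested and the level actually granted is checked.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full multithreading; the library reports what it can offer. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "Warning: only thread level %d is supported\n", provided);

    /* ... application code, possibly spawning threads ... */

    MPI_Finalize();
    return 0;
}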

Communication modes
Most sending functions come in four modes
Standard – MPI_Send
  MPI decides whether outgoing messages will be buffered
  Non-local – a successful completion may depend on the occurrence of a matching receive
Buffered – MPI_Bsend
  Can be started whether or not a matching receive has been posted
  It may complete before a matching receive is posted
  Local – a successful completion does not depend on the occurrence of a matching receive
Synchronous – MPI_Ssend
  Can be started whether or not a matching receive has been posted
  It completes successfully only if a matching receive is posted and the receive operation has started to receive the message – rendezvous
  Non-local
Ready – MPI_Rsend
  May be started only if the matching receive is already posted
  The same semantics as the standard mode; the additional information can save a handshake
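A hedged sketch of buffered-mode sending (the helper name and buffer size are illustrative); MPI_Buffer_attach, MPI_Bsend and MPI_Buffer_detach are the standard calls involved.

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical fragment: send one int in buffered mode. */
void buffered_send_example(int data, int dest, MPI_Comm comm)
{
    int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
    void *buf = malloc(bufsize);

    MPI_Buffer_attach(buf, bufsize);               /* give MPI explicit buffer space */
    MPI_Bsend(&data, 1, MPI_INT, dest, 0, comm);   /* completes locally */
    MPI_Buffer_detach(&buf, &bufsize);             /* blocks until buffered data is sent */
    free(buf);
}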

Non-blocking point-to-point – send
Posting a send – non-blocking operation
int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request);
The send buffer must not be read or written until the send is completed

Non-blocking point-to-point – receive
Posting a receive – non-blocking operation
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request);
The receive buffer must not be read or written until the receive is completed

Completion
Posted sends and receives must be completed
Waiting – blocking completion
int MPI_Wait(MPI_Request *request, MPI_Status *status);
Testing – non-blocking completion
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
  Returns immediately
  If flag is true, the posted operation is complete and status contains the corresponding information
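A small sketch of a non-blocking ring exchange tying the previous three slides together; it also uses MPI_Waitall, a completion variant not listed above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sendval, recvval;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;
    sendval = rank;

    /* Post both operations, then complete them; no deadlock regardless of order. */
    MPI_Irecv(&recvval, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, recvval, left);
    MPI_Finalize();
    return 0;
}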

Probe
Checking for the presence of a message
int MPI_Probe(int source, int tag, MPI_Comm comm, MPI_Status *status);
int MPI_Iprobe(int source, int tag, MPI_Comm comm, int *flag, MPI_Status *status);
Blocking/non-blocking versions
flag==true – a valid message awaits
Allows allocating the necessary memory for the message
  Size and type are available in status
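A hedged sketch of the usual probe-then-allocate pattern (the helper name is illustrative).

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical fragment: receive a message of unknown length. */
void recv_unknown_size(int source, int tag, MPI_Comm comm)
{
    MPI_Status status;
    int count;

    MPI_Probe(source, tag, comm, &status);        /* wait until a message is pending */
    MPI_Get_count(&status, MPI_INT, &count);      /* how many ints does it carry? */

    int *buf = malloc(count * sizeof(int));
    MPI_Recv(buf, count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG, comm,
             MPI_STATUS_IGNORE);

    /* ... use buf ... */
    free(buf);
}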

Matching probe/receive
Solves the probe/receive race in a multithreaded environment
int MPI_Mprobe(int source, int tag, MPI_Comm comm, MPI_Message *message, MPI_Status *status);
int MPI_Improbe(int source, int tag, MPI_Comm comm, int *flag, MPI_Message *message, MPI_Status *status);
int MPI_Mrecv(void *buf, int count, MPI_Datatype datatype, MPI_Message *message, MPI_Status *status);
int MPI_Imrecv(void *buf, int count, MPI_Datatype datatype, MPI_Message *message, MPI_Request *request);
Blocking/non-blocking versions
Mprobe removes the message from the matching queue
Mrecv receives the probed message
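A minimal sketch using the matched probe (the helper name is illustrative).

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical fragment: thread-safe probe-then-receive using MPI_Mprobe. */
void mrecv_unknown_size(int source, int tag, MPI_Comm comm)
{
    MPI_Message msg;
    MPI_Status status;
    int count;

    /* The probed message is removed from the queue and bound to 'msg',
       so no other thread can intercept it between probe and receive. */
    MPI_Mprobe(source, tag, comm, &msg, &status);
    MPI_Get_count(&status, MPI_DOUBLE, &count);

    double *buf = malloc(count * sizeof(double));
    MPI_Mrecv(buf, count, MPI_DOUBLE, &msg, MPI_STATUS_IGNORE);

    /* ... use buf ... */
    free(buf);
}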

Packing and unpacking
Noncontiguous or heterogeneous data
int MPI_Pack(void *inbuf, int incount, MPI_Datatype datatype, void *outbuf, int outsize, int *position, MPI_Comm comm);
int MPI_Unpack(void *inbuf, int insize, int *position, void *outbuf, int outcount, MPI_Datatype datatype, MPI_Comm comm);
outsize/insize are given in bytes
position
  Input value – the first free location in the output buffer
  Output value – the first location following the packed data
Packed messages use the MPI datatype MPI_PACKED
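A short sketch of packing an int and a double into a single MPI_PACKED message (helper names and buffer size are illustrative).

#include <mpi.h>

/* Hypothetical fragment: pack an int and a double into one message. */
void send_packed(int i, double d, int dest, MPI_Comm comm)
{
    char buffer[64];
    int position = 0;

    MPI_Pack(&i, 1, MPI_INT,    buffer, sizeof(buffer), &position, comm);
    MPI_Pack(&d, 1, MPI_DOUBLE, buffer, sizeof(buffer), &position, comm);

    /* 'position' now holds the number of packed bytes. */
    MPI_Send(buffer, position, MPI_PACKED, dest, 0, comm);
}

void recv_packed(int *i, double *d, int source, MPI_Comm comm)
{
    char buffer[64];
    int position = 0;

    MPI_Recv(buffer, sizeof(buffer), MPI_PACKED, source, 0, comm, MPI_STATUS_IGNORE);
    MPI_Unpack(buffer, sizeof(buffer), &position, i, 1, MPI_INT,    comm);
    MPI_Unpack(buffer, sizeof(buffer), &position, d, 1, MPI_DOUBLE, comm);
}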

Non-blocking collective operations
Only for MPI-3 conforming implementations
Solve some interesting synchronization problems
int MPI_Ibarrier(MPI_Comm comm, MPI_Request *request);
And many others: MPI_Ibcast, MPI_Igather, MPI_Igatherv, MPI_Iscatter, MPI_Iscatterv, MPI_Iallgather, MPI_Iallgatherv, MPI_Ialltoall, MPI_Ialltoallv, MPI_Ialltoallw, MPI_Ireduce, MPI_Iallreduce, MPI_Ireduce_scatter, MPI_Iscan, MPI_Iexscan
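A sketch of overlapping a non-blocking reduction with computation (the helper name is illustrative).

#include <mpi.h>

/* Hypothetical fragment: overlap a reduction with local work. */
void overlapped_allreduce(double *local, double *global, int n, MPI_Comm comm)
{
    MPI_Request req;

    MPI_Iallreduce(local, global, n, MPI_DOUBLE, MPI_SUM, comm, &req);

    /* ... do computation that does not need 'global' yet ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* 'global' is valid from here on */
}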

Communicators
Support for communication among a selected subgroup of processes and for virtual topologies
Group
  An ordered set of process identifiers
  Used to describe the participants in a communication
Intra-communicator
  Contains a group of valid participants (including the local process)
  The source and destination are identified by process rank within that group
Inter-communicator
  Application with internal user-level servers, where each server is a process group
  Clients are a process group as well
  Communication between processes in different groups

Group constructors
Determining the group handle from a communicator
int MPI_Comm_group(MPI_Comm comm, MPI_Group *group);
Inclusion
int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);
  An empty group is denoted by the handle MPI_GROUP_EMPTY
Exclusion
int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);

Group accessors and destructors
Querying a process’s rank in a group
int MPI_Group_rank(MPI_Group group, int *rank);
Size of a group
int MPI_Group_size(MPI_Group group, int *size);
Destructor
int MPI_Group_free(MPI_Group *group);

Intra-communicator constructors
Creating a communicator from a group
int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm);
  Returns MPI_COMM_NULL for processes not in the group
Splitting a communicator
int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm);
  Creates disjoint groups, one for each value of color
  Ranks within the new groups are assigned according to key
  Collective call; each process may provide different values for color and key
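A minimal sketch splitting MPI_COMM_WORLD into "even" and "odd" communicators.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm evenodd;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split MPI_COMM_WORLD by parity; ranks inside each new communicator
       follow the original rank order because key = rank. */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &evenodd);

    int newrank, newsize;
    MPI_Comm_rank(evenodd, &newrank);
    MPI_Comm_size(evenodd, &newsize);
    printf("world rank %d -> rank %d of %d in the %s communicator\n",
           rank, newrank, newsize, (rank % 2) ? "odd" : "even");

    MPI_Comm_free(&evenodd);
    MPI_Finalize();
    return 0;
}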

Intra-communicator accessors and destructor
Rank
int MPI_Comm_rank(MPI_Comm comm, int *rank);
  Shortcut for MPI_Group_rank on the communicator’s group
Size
int MPI_Comm_size(MPI_Comm comm, int *size);
  Shortcut for MPI_Group_size on the communicator’s group
Destructor
int MPI_Comm_free(MPI_Comm *comm);

Virtual topologies
Example: partitioning of matrices
  An M x N matrix decomposed into P submatrices of size Q x N, each assigned to be worked on by one of the P processes
Mapping of the linear process rank to a 2D virtual rank
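The slide does not name specific routines; a hedged sketch using the standard Cartesian topology calls (MPI_Dims_create, MPI_Cart_create, MPI_Cart_coords) could look like this.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Let MPI choose a balanced 2D process grid, then create a Cartesian
       communicator with non-periodic boundaries and no reordering. */
    int dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
    MPI_Dims_create(size, 2, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    /* Map the linear rank to its 2D virtual coordinates. */
    MPI_Cart_coords(cart, rank, 2, coords);
    printf("rank %d -> grid position (%d,%d) of %dx%d\n",
           rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}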

MPI-2 parallel I/O
Parallel I/O similar to message sending
Not all implementations support the full MPI-2 I/O
A physical decomposition with a certain number of I/O nodes can be configured
Blocking and nonblocking I/O
Collective and non-collective I/O

MPI-2 file structure
Characteristics
  MPI datatypes are written and read
  Partitioning of the file among processes
  Sequential and random access
  Each process has its own view of the file
A view defines the current set of data visible and accessible by a process; it is defined by three quantities
  Displacement – where in the file to start
  Etype – the elementary type of data
  Filetype – the pattern of how the data is partitioned in the file
Default view: displacement = 0, etype = MPI_BYTE, filetype = MPI_BYTE
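A hedged sketch of writing per-rank blocks through a file view (the helper name and block layout are illustrative).

#include <mpi.h>

/* Hypothetical fragment: each process writes its own block of doubles
   into a shared file at an offset derived from its rank. */
void write_block(const char *filename, double *data, int n, MPI_Comm comm)
{
    int rank;
    MPI_File fh;
    MPI_Offset disp;

    MPI_Comm_rank(comm, &rank);
    MPI_File_open(comm, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* View: skip this rank's displacement, then see the file as doubles. */
    disp = (MPI_Offset)rank * n * sizeof(double);
    MPI_File_set_view(fh, disp, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);

    /* Collective write of the local block. */
    MPI_File_write_all(fh, data, n, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}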

One-sided communication
RMA (Remote Memory Access)
One process specifies all communication parameters
Two memory models
  Separate
    No assumption about consistency
    Highly portable
  Unified
    Exploits cache coherency
    Hardware-accelerated one-sided operations
  The model in effect is reported by the window attribute MPI_WIN_MODEL
Two categories of communication
  Active target
    Both sides are involved in the communication
  Passive target
    Only the originator is involved; the target is passive

One-sided communication – initialization
Windows
  Part of memory exposed for RMA to the group
  Created by collective calls
int MPI_Win_create(void *base, MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, MPI_Win *win);
  Creates a window for existing memory
int MPI_Win_allocate(MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, void **baseptr, MPI_Win *win);
  Creates a window and allocates its memory
int MPI_Win_allocate_shared(MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, void **baseptr, MPI_Win *win);
  Creates a window and allocates its memory in shared memory, which can be accessed by all processes in the group by direct load/store operations
int MPI_Win_free(MPI_Win *win);
  Destroys the window
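A minimal sketch of allocating an RMA window (the helper name and window size are illustrative).

#include <mpi.h>
#include <string.h>

/* Hypothetical fragment: expose an array of 1024 ints for RMA. */
void create_window(MPI_Comm comm, MPI_Win *win, int **base)
{
    MPI_Aint size = 1024 * sizeof(int);

    /* Let MPI allocate the memory together with the window; this often
       enables faster RMA than MPI_Win_create on user memory. */
    MPI_Win_allocate(size, sizeof(int), MPI_INFO_NULL, comm, base, win);
    memset(*base, 0, size);
}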

One-sided communication – transfers
Communication calls
From the caller’s memory to the target’s memory
int MPI_Put(const void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);
From the target’s memory to the caller’s memory
int MPI_Get(void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Win win);

One-sided communication – accumulate
Accumulate
  f(a,b) = a OP b
  MPI_REPLACE – f(a,b) = b
  MPI_NO_OP – f(a,b) = a
int MPI_Accumulate(const void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
  Accumulates into the target buffer

One-sided communication – accumulate (cont.)
int MPI_Get_accumulate(const void *origin_addr, int origin_count, MPI_Datatype origin_datatype, void *result_addr, int result_count, MPI_Datatype result_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
  Fetches the target buffer before the accumulation
int MPI_Fetch_and_op(const void *origin_addr, void *result_addr, MPI_Datatype datatype, int target_rank, MPI_Aint target_disp, MPI_Op op, MPI_Win win);
  A faster, specialized single-element version
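A hedged sketch of a fetch-and-add "ticket" counter; it uses the passive-target locks described later in this section, and the counter location (displacement 0 in the window of process 0) is an assumption.

#include <mpi.h>

/* Hypothetical fragment: fetch-and-add on a shared counter that lives in
   the window of process 0 at displacement 0. */
long next_ticket(MPI_Win win)
{
    const long one = 1;
    long old;

    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);        /* passive-target epoch */
    MPI_Fetch_and_op(&one, &old, MPI_LONG, 0, 0, MPI_SUM, win);
    MPI_Win_unlock(0, win);                          /* completes the operation */

    return old;   /* value of the counter before the increment */
}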

One-sided communication – compare and swap
Compare and swap
int MPI_Compare_and_swap(const void *origin_addr, const void *compare_addr, void *result_addr, MPI_Datatype datatype, int target_rank, MPI_Aint target_disp, MPI_Win win);
  Compares the value at the target with compare_addr; if they are equal, replaces the target value with origin_addr
  The original target value is returned in result_addr
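A hedged sketch of a try-lock built on compare-and-swap; the flag layout (one int at displacement 0 in the window of process 0, 0 = free, 1 = taken) is an assumption.

#include <mpi.h>

/* Hypothetical fragment: try to acquire a spin-lock flag stored as an int
   at displacement 0 in the window of process 0. */
int try_acquire(MPI_Win win)
{
    const int locked = 1, expected = 0;
    int previous;

    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Compare_and_swap(&locked, &expected, &previous, MPI_INT, 0, 0, win);
    MPI_Win_unlock(0, win);

    return previous == 0;   /* true if we swapped the flag from 0 to 1 */
}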

One-sided communication – request-based operations
Request-based operations
  The functions return an MPI_Request that can be waited on
  Only for passive target epochs
  MPI_Rput, MPI_Rget, MPI_Raccumulate, MPI_Rget_accumulate

One-sided communication – synchronization calls
Three mechanisms
Fence
  Collective synchronization MPI_Win_fence
  Loosely synchronous model
  Only for active targets
  An access epoch at an origin and an exposure epoch at a target are started and completed by MPI_Win_fence
General active target synchronization
  The originator calls MPI_Win_start to start an access epoch and MPI_Win_complete to end it
  The target calls MPI_Win_post to start an exposure epoch and MPI_Win_wait to wait for its end
General passive target synchronization
  Locking and unlocking a window at a target by MPI_Win_lock, MPI_Win_lock_all, MPI_Win_unlock, MPI_Win_unlock_all

One-sided communication – active/passive targets

One-sided communication – fence
Fence
int MPI_Win_fence(int assert, MPI_Win win);
  Collective call
  All RMA operations started at the given process before the fence will finish before the fence call returns
  The operations will be completed at the target before the fence call returns at the target
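A minimal sketch of puts bracketed by two fences; the window layout (one int slot per rank on process 0) is an assumption.

#include <mpi.h>

/* Hypothetical fragment: every process writes its rank into slot 'rank'
   of the window on process 0, synchronized by fences. */
void put_with_fence(int rank, MPI_Win win)
{
    MPI_Win_fence(0, win);                 /* opens the epoch on all processes */

    if (rank != 0)
        MPI_Put(&rank, 1, MPI_INT,         /* origin buffer */
                0, rank, 1, MPI_INT,       /* target rank, displacement, count */
                win);

    MPI_Win_fence(0, win);                 /* closes the epoch; puts are complete */
}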

One-sided communication – general active target sync
General active target synchronization
int MPI_Win_start(MPI_Group group, int assert, MPI_Win win);
  Starts an access epoch
  Accesses only windows at processes in the group
  Each process in the group must issue MPI_Win_post
  RMA calls may be delayed until the corresponding MPI_Win_post is issued
int MPI_Win_complete(MPI_Win win);
  Completes the access epoch
  All RMA operations must complete at the origin (not at the target) before the call returns

One-sided communication – general active target sync (cont.)
int MPI_Win_post(MPI_Group group, int assert, MPI_Win win);
  Starts an exposure epoch for the local window
  Only processes in the group can access the window
  Does not block
int MPI_Win_wait(MPI_Win win);
  Completes the exposure epoch
  Blocks until the matching calls to MPI_Win_complete have occurred
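A hedged sketch of the post-start-complete-wait pattern between ranks 0 and 1; the window layout (at least one int at displacement 0 on rank 1) is an assumption.

#include <mpi.h>

/* Hypothetical fragment: rank 0 (origin) puts one int into rank 1's window
   using general active target synchronization. */
void pscw_example(int rank, MPI_Win win, MPI_Comm comm)
{
    MPI_Group world_group, peer;
    int value = 42;

    MPI_Comm_group(comm, &world_group);

    if (rank == 0) {
        int target = 1;
        MPI_Group_incl(world_group, 1, &target, &peer);
        MPI_Win_start(peer, 0, win);               /* access epoch towards rank 1 */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_complete(win);                     /* put is complete at the origin */
        MPI_Group_free(&peer);
    } else if (rank == 1) {
        int origin = 0;
        MPI_Group_incl(world_group, 1, &origin, &peer);
        MPI_Win_post(peer, 0, win);                /* exposure epoch for rank 0 */
        MPI_Win_wait(win);                         /* data has arrived after this */
        MPI_Group_free(&peer);
    }

    MPI_Group_free(&world_group);
}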

One-sided communication – multiple active targets

One-sided communication – general passive target sync
General passive target synchronization
int MPI_Win_lock(int lock_type, int rank, int assert, MPI_Win win);
  Starts an access epoch at process rank
  lock_type: MPI_LOCK_EXCLUSIVE, MPI_LOCK_SHARED
int MPI_Win_lock_all(int assert, MPI_Win win);
  Starts an access epoch to all processes in win with lock type MPI_LOCK_SHARED
  Must be unlocked by MPI_Win_unlock_all
  Not collective; locks all processes in win

One-sided communication – general passive target sync (cont.)
int MPI_Win_unlock(int rank, MPI_Win win);
  Completes an access epoch
  All RMA operations issued during the epoch are finished both at the origin and at the target before the call returns
int MPI_Win_unlock_all(MPI_Win win);
  Completes an access epoch started by MPI_Win_lock_all
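A minimal sketch of a passive-target read; the helper name and window layout (an array of ints on the target) are assumptions.

#include <mpi.h>

/* Hypothetical fragment: read one int from slot 'slot' of the window on
   process 'target' without any action on the target side. */
int passive_read(MPI_Win win, int target, int slot)
{
    int value;

    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);   /* start passive epoch */
    MPI_Get(&value, 1, MPI_INT, target, slot, 1, MPI_INT, win);
    MPI_Win_unlock(target, win);                     /* value is valid from here */

    return value;
}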