Slide 1: MPI – An introduction
by Jeroen van Hunen
Outline:
- What is MPI and why should we use it?
- Simple example + some basic MPI functions
- Other frequently used MPI functions
- Compiling and running code with MPI
- Domain decomposition
- Stokes solver
- Tracers/markers
- Performance
- Documentation
Slide 2: What is MPI?
- Mainly a data communication tool: the "Message-Passing Interface"
- Allows parallel calculation on distributed-memory machines
- Usually the Single-Program-Multiple-Data principle is used: all processors have similar tasks (e.g. in domain decomposition)
- Alternative: OpenMP for shared-memory machines
Why should we use MPI?
- If sequential calculations take too long
- If sequential calculations use too much memory
Slide 3: Simple MPI example
The code:
- includes mpi.h, which contains definitions, macros, and function prototypes
- initializes MPI
- asks for the processor 'rank'
- asks for the number of processors p
- stops MPI
The output is shown for 4 processors. (A sketch of such a program follows below.)
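The program itself appeared only as a screenshot on the original slide; the following is a minimal sketch in C of a program with the steps listed above (the printed message is illustrative):

    #include <stdio.h>
    #include <mpi.h>    /* definitions, macros, function prototypes */

    int main(int argc, char *argv[])
    {
        int rank, p;
        MPI_Init(&argc, &argv);                  /* initialize MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* ask processor 'rank' */
        MPI_Comm_size(MPI_COMM_WORLD, &p);       /* ask # processors p */
        printf("Hello from processor %d of %d\n", rank, p);
        MPI_Finalize();                          /* stop MPI */
        return 0;
    }

Run with 4 processors, this prints one "Hello" line per processor, in arbitrary order.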
Slide 4: MPI calls for sending/receiving data
Slide 5: MPI_SEND and MPI_RECV syntax (in C and in Fortran)
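The calling sequences were shown as images on the slide. The standard C prototypes are given below; the Fortran versions take the same arguments plus a final integer error argument (IERROR):

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm, MPI_Status *status);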
Slide 6: MPI data types (in C and in Fortran)
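The type table itself was an image; the basic data types defined by the MPI standard include:

    in C:       MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED,
                MPI_FLOAT, MPI_DOUBLE, MPI_BYTE, MPI_PACKED
    in Fortran: MPI_CHARACTER, MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION,
                MPI_LOGICAL, MPI_COMPLEX, MPI_BYTE, MPI_PACKED

The type passed to MPI_SEND/MPI_RECV must match the type of the buffer, e.g. MPI_DOUBLE for a C double, MPI_DOUBLE_PRECISION for a Fortran double precision variable.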
Slide 7: Other frequently used MPI calls
- Sending and receiving at the same time, with no risk of deadlocks: MPI_SENDRECV
- ... or overwriting the send buffer with the received information: MPI_SENDRECV_REPLACE
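The slide showed the calls as images; their standard C prototypes are:

    int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                     int dest, int sendtag,
                     void *recvbuf, int recvcount, MPI_Datatype recvtype,
                     int source, int recvtag,
                     MPI_Comm comm, MPI_Status *status);
    int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                             int dest, int sendtag, int source, int recvtag,
                             MPI_Comm comm, MPI_Status *status);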
Slide 8: Other frequently used MPI calls
- Synchronizing the processors, i.e. waiting for each other at the barrier: MPI_BARRIER
- Broadcasting a message from one processor to all the others: MPI_BCAST. Both the sending and the receiving processors use the same call to MPI_BCAST.
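The standard C prototypes (the calls were shown as images on the slide):

    int MPI_Barrier(MPI_Comm comm);
    int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                  int root, MPI_Comm comm);

    /* Example (illustrative): processor 0 distributes n to everyone;
       every processor, root included, makes the identical call. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);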
Slide 9: Other frequently used MPI calls
- "Reducing" (combining) data from all processors with MPI_REDUCE: add, find maximum/minimum, etc.
- OP can be one of the predefined operations: MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_MAXLOC, MPI_MINLOC, MPI_LAND, MPI_LOR, MPI_LXOR, MPI_BAND, MPI_BOR, MPI_BXOR.
- For the results to be available at all processors, use MPI_ALLREDUCE.
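The standard C prototypes, with an illustrative use of MPI_Allreduce:

    int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
                   MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);
    int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                      MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

    /* Example: the global maximum of a local value, available everywhere */
    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);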
Slide 10: Additional comments
- 'Wildcards' are allowed in MPI calls:
  for the source: MPI_ANY_SOURCE
  for the tag: MPI_ANY_TAG
- MPI_SEND and MPI_RECV are 'blocking': each call waits until its job is done. MPI_SEND returns once the send buffer can safely be reused; MPI_RECV returns once the message has arrived.
Slide 11: Deadlocks
- Non-matching send/receive calls may block the code. For two processors exchanging messages there are three patterns:
  both call MPI_RECV first: deadlock
  both call MPI_SEND first: whether this works depends on the buffer
  the calls are ordered so that every send meets a matching receive: safe
- Don't let a processor send a message to itself; in this case use MPI_SENDRECV.
(A sketch of the patterns follows below.)
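A minimal sketch of a safe exchange between two processors (illustrative, not the code from the slide):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, other;
        double out, in;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;               /* run with exactly 2 processors */
        out = (double)rank;

        /* Deadlock: both processors call MPI_Recv first.
           Depending on buffer: both call MPI_Send first; this works only
           while the MPI implementation can buffer the messages.
           Safe: the calls are ordered so every send meets a receive: */
        if (rank == 0) {
            MPI_Send(&out, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
            MPI_Recv(&in,  1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&in,  1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&out, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
        }
        printf("processor %d received %g\n", rank, in);
        MPI_Finalize();
        return 0;
    }

An equivalent safe version replaces the if/else by a single MPI_Sendrecv call on both processors.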
Slide 12: Compiling and running code with MPI
Compiling:
  Fortran: mpif77 -o binary code.f
           mpif90 -o binary code.f
  C:       mpicc -o binary code.c
Running in general, no queueing system:
  mpirun -np 4 binary
  mpirun -np 4 -nolocal -machinefile mach binary
Running on Gonzales, with queueing system:
  bsub -n 4 -W 8:00 prun binary
Slide 13: Domain decomposition
- The total computational domain (in x, y, and z) is divided into 'equal size' blocks.
- Each processor only deals with its own block.
- At block boundaries some information exchange is necessary.
- The block division matters: it determines the surface/volume ratio and the number of processor boundaries. (One way to set up the division is sketched below.)
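The slide did not show code for the block division; one common way to set it up, as a sketch, is with MPI's Cartesian topology routines:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int p, rank;
        int dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0}, coords[3];
        MPI_Comm cart;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Dims_create(p, 3, dims);            /* factor p into nx*ny*nz blocks */
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);
        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, 3, coords); /* this processor's block indices */
        printf("processor %d owns block (%d,%d,%d) of a %dx%dx%d division\n",
               rank, coords[0], coords[1], coords[2],
               dims[0], dims[1], dims[2]);
        MPI_Finalize();
        return 0;
    }

MPI_Dims_create tries to keep the blocks as close to cubic as possible, which keeps the surface/volume ratio, and hence the communication, low.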
Slide 14: Stokes equation: Jacobi iterative solver
- In the block interior no MPI is needed: each point M is updated from its four neighbours,
      M = 0.25*(N + S + E + W)
- At a block boundary MPI is needed: the local processor computes the partial sum from its own points,
      M1 = 0.25*(N1 + S1 + W)
  the neighbouring processor computes the missing contribution,
      M2 = 0.25*(E)
  and the two are combined (using MPI_SENDRECV) so that both copies hold the full value:
      M = M1 + M2,   M1 = M2 = M
- A Gauss-Seidel solver performs better, but is also slightly more difficult to implement.
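A sketch of how the two partial sums could be combined with MPI_SENDRECV (variable names are illustrative; the original code was not shown):

    #include <mpi.h>

    /* Combine partial sums across one block boundary: m1 holds this
       processor's partial sums for n boundary points, nb is the rank
       of the neighbouring processor. After the call, m1 holds the
       full values M = M1 + M2 on both sides of the boundary. */
    void combine_boundary(double *m1, int n, int nb, MPI_Comm comm)
    {
        double m2[n];   /* neighbour's partial sums (C99 variable-length array) */
        int i;
        MPI_Sendrecv(m1, n, MPI_DOUBLE, nb, 0,
                     m2, n, MPI_DOUBLE, nb, 0,
                     comm, MPI_STATUS_IGNORE);
        for (i = 0; i < n; i++)
            m1[i] += m2[i];
    }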
Slide 15: Tracers/Markers
2nd-order Runge-Kutta scheme:
    k1 = dt * v( t, x(t) )
    k2 = dt * v( t + dt/2, x(t) + k1/2 )
    x(t+dt) = x(t) + k2
Procedure for a tracer near the boundary between proc n and proc n+1:
- Calculate x(t+dt/2). If it lies in proc n+1:
  proc n sends the tracer coordinates to proc n+1
  proc n+1 reports the tracer velocity back to proc n
- Calculate x(t+dt). If it lies in proc n+1:
  proc n sends the tracer coordinates + function values permanently to proc n+1
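A serial sketch of one RK2 step for a single 2-D tracer (the velocity interpolation v() is assumed to exist; where the midpoint or endpoint falls in a neighbouring block, the MPI exchange described above replaces the local lookup):

    /* One midpoint Runge-Kutta step for a tracer at position x. */
    typedef struct { double x, y; } Vec2;

    Vec2 velocity(double t, Vec2 pos);   /* interpolated flow velocity,
                                            assumed provided elsewhere */

    Vec2 rk2_step(double t, double dt, Vec2 x)
    {
        Vec2 v1  = velocity(t, x);
        Vec2 mid = { x.x + 0.5*dt*v1.x, x.y + 0.5*dt*v1.y };  /* x(t) + k1/2 */
        Vec2 v2  = velocity(t + 0.5*dt, mid);
        Vec2 xn  = { x.x + dt*v2.x, x.y + dt*v2.y };          /* x(t) + k2   */
        return xn;
    }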
Slide 16: Performance
- For too-small jobs, communication quickly becomes the bottleneck.
- Test problem: Rayleigh-Bénard convection (Ra = 10^6)
  2-D: 64x64 finite elements, 10^4 time steps
  3-D: 64x64x64 finite elements, 100 time steps
- Calculations performed on Gonzales.
Slide 17: Documentation
PDF: www.hpc.unimelb.edu.au/software/mpi-docs/mpi-book.pdf
Books: