
1 CSCI-4320/6360: Parallel Programming & Computing Tues./Fri. 10-11:30 a.m. MPI File I/O
Prof. Chris Carothers, Computer Science Department, MRC 309a. Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt

2 Common Ways of Doing I/O in Parallel Programs
Sequential I/O: all processes send their data to rank 0, and rank 0 writes it to the file.
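A minimal sketch of this rank-0 approach, assuming a small per-process buffer (the name localbuf, the element count N, and the file name datafile are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define N 100   /* illustrative number of ints contributed by each process */

int main(int argc, char **argv)
{
    int rank, nprocs, localbuf[N];
    int *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < N; i++) localbuf[i] = rank * N + i;

    /* everyone sends its data to rank 0 ... */
    if (rank == 0) all = (int *) malloc((size_t)nprocs * N * sizeof(int));
    MPI_Gather(localbuf, N, MPI_INT, all, N, MPI_INT, 0, MPI_COMM_WORLD);

    /* ... and rank 0 alone writes the single file with ordinary serial I/O */
    if (rank == 0) {
        FILE *fp = fopen("datafile", "wb");
        fwrite(all, sizeof(int), (size_t)nprocs * N, fp);
        fclose(fp);
        free(all);
    }

    MPI_Finalize();
    return 0;
}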

3 Pros and Cons of Sequential I/O
Pros: the parallel machine may support I/O from only one process (e.g., no common file system); some I/O libraries (e.g., HDF-4, netCDF, PMPIO) are not parallel; the resulting single file is handy for ftp, mv; big blocks improve performance; short distance from the original serial code. Cons: lack of parallelism limits scalability and performance (single-node bottleneck).

4 Another Way
Each process writes to a separate file. Pros: parallelism, high performance. Cons: lots of small files to manage; lots of metadata stresses the parallel file system; difficult to read the data back from a different number of processes.
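A minimal sketch of the file-per-process approach, assuming the per-rank file name pattern datafile.<rank> (illustrative); opening with MPI_COMM_SELF means no coordination between ranks:

#include <stdio.h>
#include "mpi.h"

#define N 100   /* illustrative number of ints written by each process */

int main(int argc, char **argv)
{
    int rank, buf[N];
    char fname[64];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < N; i++) buf[i] = rank * N + i;

    /* one file per process: the communicator is MPI_COMM_SELF */
    snprintf(fname, sizeof(fname), "datafile.%d", rank);
    MPI_File_open(MPI_COMM_SELF, fname,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write(fh, buf, N, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}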

5 What is Parallel I/O?
Multiple processes of a parallel program accessing data (reading or writing) from a common file. [Figure: processes P0, P1, P2, …, P(n-1) all accessing one FILE]

6 Why Parallel I/O?
Non-parallel I/O is simple, but it gives poor performance (a single process writes to one file) or is awkward and not interoperable with other tools (each process writes a separate file). Parallel I/O provides high performance and can produce a single file that can be used with other tools (such as visualization programs).

7 Why is MPI a Good Setting for Parallel I/O?
Writing is like sending a message and reading is like receiving. Any parallel I/O system will need a mechanism to: define collective operations (MPI communicators); define noncontiguous data layouts in memory and in the file (MPI datatypes); test completion of nonblocking operations (MPI request objects). In other words, lots of MPI-like machinery.

8 MPI-IO Background
Marc Snir et al. (IBM Watson) paper exploring MPI as a context for parallel I/O (1994). MPI-IO discussion group led by J.-P. Prost (IBM) and Bill Nitzberg (NASA), 1994. The MPI-IO group joined the MPI Forum in June 1996. The MPI-2 standard was released in July 1997; MPI-IO is Chapter 9 of MPI-2.

9 Using MPI for Simple I/O
Each process needs to read a chunk of data from a common file. [Figure: P0, P1, P2, …, P(n-1) each reading a chunk of FILE]

10 Using Individual File Pointers
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;       /* bytes per process */
    nints = bufsize/sizeof(int);     /* ints per process  */

    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);   /* byte offset */
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

11 Using Explicit Offsets
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    nints = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);  /* explicit byte offset */
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

12 Function Details
MPI_File_open(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh)
(note: amode = MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY, MPI_MODE_CREATE, MPI_MODE_EXCL, MPI_MODE_DELETE_ON_CLOSE, MPI_MODE_UNIQUE_OPEN, MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)
MPI_File_close(MPI_File *fh)
MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
(note: whence = MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)
MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_write_at( …same arguments as MPI_File_read_at… )
(Note: many other functions get and set properties; see Gropp et al.)

13 Writing to a File
Use MPI_File_write or MPI_File_write_at. Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open. If the file doesn't exist already, the flag MPI_MODE_CREATE must also be passed to MPI_File_open. Multiple flags are combined with bitwise-or '|' in C, or addition '+' in Fortran.
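A minimal sketch combining these flags, mirroring the read examples on slides 10-11 (the element count N and file name datafile are illustrative):

#include "mpi.h"

#define N 100   /* illustrative number of ints written by each process */

int main(int argc, char **argv)
{
    int rank, buf[N];
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < N; i++) buf[i] = rank;

    /* MPI_MODE_CREATE is OR'ed in because the file may not exist yet */
    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* each rank writes its own block at an explicit byte offset */
    MPI_File_write_at(fh, (MPI_Offset)rank * N * sizeof(int),
                      buf, N, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}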

14 MPI Datatype Interlude
Datatypes in MPI: Elementary: MPI_INT, MPI_DOUBLE, etc. (everything we've used to this point). Contiguous: the next easiest, sequences of elementary types. Vector: sequences separated by a constant "stride".

15 MPI Datatypes, cont.
Indexed: more general, does not assume a constant stride. Struct: general mixed types (like C structs).

16 Creating simple datatypes
Let's just look at the simplest types: contiguous and vector datatypes. Contiguous example: let's create a new datatype which is two ints side by side. The calling sequence is
MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);

MPI_Datatype newtype;
MPI_Type_contiguous(2, MPI_INT, &newtype);
MPI_Type_commit(&newtype);   /* required before the type can be used */
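For the vector case mentioned on slide 14, a short sketch (the count, block length, and stride values are illustrative; nprocs is assumed to come from MPI_Comm_size):

MPI_Datatype vtype;
/* 4 blocks of 16 ints each, with block starts 16*nprocs ints apart */
MPI_Type_vector(4, 16, 16 * nprocs, MPI_INT, &vtype);
MPI_Type_commit(&vtype);   /* required before the type can be used */
/* ... use vtype, e.g. as the filetype argument of MPI_File_set_view ... */
MPI_Type_free(&vtype);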

17 Using File Views
Processes write to a shared file. MPI_File_set_view assigns regions of the file to separate processes.

18 File Views
Specified by a triplet (displacement, etype, filetype) passed to MPI_File_set_view: displacement = number of bytes to be skipped from the start of the file; etype = basic unit of data access (can be any basic or derived datatype); filetype = specifies which portion of the file is visible to the process. MPI_File_set_view is a collective operation: all processes/ranks in the group determined when the file was opened must use the same data representation and etypes with the same extent.

19 File Interoperability
Users can optionally create files with a portable binary data representation via the "datarep" parameter to MPI_File_set_view: native - the default, same as the in-memory representation, not portable; internal - an implementation-defined representation providing an implementation-defined level of portability; external32 - a specific representation defined in MPI (basically 32-bit big-endian IEEE format), portable across machines and MPI implementations.
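A hedged sketch of requesting the portable representation, assuming an already-open file handle fh and a buffer buf of count ints:

/* request external32 so the file can be read on machines with a different
   byte order or by a different MPI implementation */
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "external32", MPI_INFO_NULL);
MPI_File_write(fh, buf, count, MPI_INT, MPI_STATUS_IGNORE);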

20 File View Example
MPI_File thefile;
for (i = 0; i < BUFSIZE; i++)
    buf[i] = myrank * BUFSIZE + i;
MPI_File_open(MPI_COMM_WORLD, "testfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &thefile);
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),   /* displacement is in bytes */
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
MPI_File_close(&thefile);
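A sketch of reading the data back through the same per-rank view, reusing BUFSIZE, myrank, and buf from the example above:

MPI_File thefile;
MPI_File_open(MPI_COMM_WORLD, "testfile", MPI_MODE_RDONLY,
              MPI_INFO_NULL, &thefile);
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
/* with the view in place, offset 0 for this rank is its own region */
MPI_File_read(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
MPI_File_close(&thefile);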

21 Ways to Write to a Shared File
MPI_File_seek: like Unix seek. MPI_File_read_at / MPI_File_write_at: combine seek and I/O, for thread safety. MPI_File_read_shared / MPI_File_write_shared: use the shared file pointer; good when order doesn't matter. Collective operations are also available (later slides).
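A hedged sketch of the shared-file-pointer route, assuming <stdio.h> is included, a file handle fh opened on MPI_COMM_WORLD, and the caller's rank in rank (useful for log-style output where record order does not matter):

char line[64];
int len = snprintf(line, sizeof(line), "hello from rank %d\n", rank);
/* each call appends at the shared file pointer; records from different
   ranks appear in whatever order the calls happen to arrive */
MPI_File_write_shared(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);

When a deterministic rank order is required, the collective variant MPI_File_write_ordered can be used instead; it writes the contributions in rank order at the shared file pointer.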

22 Collective I/O in MPI
A critical optimization in parallel I/O. Allows communication of the "big picture" to the file system. Provides a framework for two-phase I/O, in which communication precedes I/O (and can use MPI machinery). Basic idea: build large blocks, so that reads/writes in the I/O system will be large. [Figure: many small individual requests merged into one large collective access]

23 Collective I/O
MPI_File_read_all, MPI_File_read_at_all, etc. The _all suffix indicates that all processes in the group specified by the communicator passed to MPI_File_open will call this function. Each process specifies only its own access information; the argument list is the same as for the non-collective functions.

24 Collective I/O
By calling the collective I/O functions, the user allows the implementation to optimize the request based on the combined requests of all processes. The implementation can merge the requests of different processes and service the merged request efficiently. This is particularly effective when the accesses of different processes are noncontiguous and interleaved.

25 Collective non-contiguous MPI-IO examples
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1048576   /* file size in bytes (example value) */
#define INTS_PER_BLK 16

int main(int argc, char **argv)
{
    int *buf, rank, nprocs, nints, bufsize;
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    buf = (int *) malloc(bufsize);
    nints = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* filetype: blocks of INTS_PER_BLK ints, strided by INTS_PER_BLK*nprocs ints */
    MPI_Type_vector(nints/INTS_PER_BLK, INTS_PER_BLK, INTS_PER_BLK*nprocs,
                    MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, INTS_PER_BLK*sizeof(int)*rank, MPI_INT, filetype,
                      "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}

26 More on MPI_File_read_all
Note that the _all version has the same argument list. The difference is that all processes involved in the MPI_File_open must call the read; contrast with the non-all version, where any subset may or may not call it. This allows for many optimizations.

27 Split Collective I/O
A restricted form of nonblocking collective I/O: only one active nonblocking collective operation is allowed at a time on a file handle, therefore no request object is necessary.
MPI_File_write_all_begin(fh, buf, count, datatype);
/* available on Blue Gene/L, but may not improve performance */
for (i = 0; i < 1000; i++) {
    /* perform computation */
}
MPI_File_write_all_end(fh, buf, &status);

28 Passing Hints to the Implementation
MPI_Info info;
MPI_Info_create(&info);
/* number of I/O devices to be used for file striping */
MPI_Info_set(info, "striping_factor", "4");
/* the striping unit in bytes */
MPI_Info_set(info, "striping_unit", "65536");
MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);
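Hints are advisory, so the implementation may ignore or adjust them; a sketch of querying what was actually applied to an open file handle fh (assumes <stdio.h>):

MPI_Info used;
int nkeys, flag;
char key[MPI_MAX_INFO_KEY], value[256];

MPI_File_get_info(fh, &used);      /* hints currently associated with fh */
MPI_Info_get_nkeys(used, &nkeys);
for (int i = 0; i < nkeys; i++) {
    MPI_Info_get_nthkey(used, i, key);
    MPI_Info_get(used, key, sizeof(value) - 1, value, &flag);
    if (flag) printf("%s = %s\n", key, value);
}
MPI_Info_free(&used);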

29 Examples of Hints (used in ROMIO)
MPI-2 predefined hints: striping_unit, striping_factor, cb_buffer_size, cb_nodes. New algorithm parameters: ind_rd_buffer_size, ind_wr_buffer_size. Platform-specific hints: start_iodevice, pfs_svr_buf, direct_read, direct_write.

30 I/O Consistency Semantics
The consistency semantics specify the results when multiple processes access a common file and one or more processes write to the file. MPI guarantees stronger consistency semantics if the communicator used to open the file accurately specifies all the processes that are accessing the file, and weaker semantics if not. The user can take steps to ensure consistency when MPI does not automatically do so.

31 Example 1
File opened with MPI_COMM_WORLD. Each process writes to a separate region of the file and reads back only what it wrote.
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_read_at(off=0,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_File_read_at(off=100,cnt=100)
MPI guarantees that the data will be read correctly.

32 Example 2
Same as Example 1, except that each process wants to read what the other process wrote (overlapping accesses). In this case, MPI does not guarantee that the data will automatically be read correctly.
/* incorrect program */
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_Barrier; MPI_File_read_at(off=100,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_Barrier; MPI_File_read_at(off=0,cnt=100)
In this program, the read on each process is not guaranteed to get the data written by the other process!

33 Example 2, contd.
The user must take extra steps to ensure correctness. There are three choices: set atomicity to true; close the file and reopen it; ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process/MPI rank. Any of these can hurt performance.

34 Example 2, Option 1 Set atomicity to true
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_set_atomicity(fh1,1); MPI_File_write_at(off=0,cnt=100); MPI_Barrier; MPI_File_read_at(off=100,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_set_atomicity(fh2,1); MPI_File_write_at(off=100,cnt=100); MPI_Barrier; MPI_File_read_at(off=0,cnt=100)

35 Example 2, Option 2 Close and reopen file
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_close; MPI_Barrier; MPI_File_open(MPI_COMM_WORLD,…); MPI_File_read_at(off=100,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_File_close; MPI_Barrier; MPI_File_open(MPI_COMM_WORLD,…); MPI_File_read_at(off=0,cnt=100)

36 Example 2, Option 3
Ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process. A sequence is a set of operations between any pair of open, close, or file_sync functions. A write sequence is a sequence in which any of the functions is a write operation.

37 Example 2, Option 3
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_sync; MPI_Barrier; MPI_File_sync /*collective*/; MPI_File_read_at(off=100,cnt=100); MPI_File_close
Process 1: the same pattern, but with MPI_File_write_at(off=100,cnt=100) and MPI_File_read_at(off=0,cnt=100).
The sync-barrier-sync construct separates the write sequence on each process from the read sequence that follows.

38 General Guidelines for Achieving High I/O Performance
Buy sufficient I/O hardware for the machine. Use fast file systems, not NFS-mounted home directories. Do not perform I/O from one process only. Make large requests wherever possible. For noncontiguous requests, use derived datatypes and a single collective I/O call.

39 Optimizations
Given complete access information, an implementation can perform optimizations such as: data sieving (read large chunks and extract what is really needed); collective I/O (merge requests of different processes into larger requests); improved prefetching and caching.
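In ROMIO these optimizations can be steered through hints; a hedged sketch follows (the romio_* keys are ROMIO-specific and may be ignored by other MPI-IO implementations; fh is assumed declared):

MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "romio_ds_read",  "enable");    /* data sieving on reads  */
MPI_Info_set(info, "romio_ds_write", "disable");   /* no data sieving writes */
MPI_Info_set(info, "romio_cb_read",  "automatic"); /* collective buffering   */
MPI_Info_set(info, "romio_cb_write", "enable");
MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);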

40 Summary
MPI-IO has many features that can help users achieve high performance. The most important of these are the ability to specify noncontiguous accesses, the collective I/O functions, and the ability to pass hints to the implementation. Users must actually use these features! In particular, when accesses are noncontiguous, users must create derived datatypes, define file views, and use the collective I/O functions.

