
1 PPC 2011 - MPI Parallel File I/O1 CSCI-4320/6360: Parallel Programming & Computing Tues./Fri. 12-1:20 p.m. MPI File I/O Prof. Chris Carothers Computer Science Department MRC 309a chrisc@cs.rpi.edu www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2010 Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt

2 PPC 2011 - MPI Parallel File I/O2 Common Ways of Doing I/O in Parallel Programs Sequential I/O: –All processes send data to rank 0, and 0 writes it to the file
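Below is a minimal C sketch (not from the original slides; the file name "datafile" and block size are assumptions) of this sequential pattern: every rank sends its block to rank 0, which performs the only write.

    #include <stdio.h>
    #include <stdlib.h>
    #include "mpi.h"
    #define BLOCK 1024                 /* ints per process (assumed) */

    int main(int argc, char **argv)
    {
        int rank, nprocs, i;
        int local[BLOCK], *all = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        for (i = 0; i < BLOCK; i++) local[i] = rank * BLOCK + i;

        if (rank == 0) all = malloc((size_t)nprocs * BLOCK * sizeof(int));

        /* funnel all data to rank 0 ... */
        MPI_Gather(local, BLOCK, MPI_INT, all, BLOCK, MPI_INT, 0, MPI_COMM_WORLD);

        /* ... which performs the only write */
        if (rank == 0) {
            FILE *fp = fopen("datafile", "wb");
            fwrite(all, sizeof(int), (size_t)nprocs * BLOCK, fp);
            fclose(fp);
            free(all);
        }
        MPI_Finalize();
        return 0;
    }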

3 PPC 2011 - MPI Parallel File I/O3 Pros and Cons of Sequential I/O Pros: –parallel machine may support I/O from only one process (e.g., no common file system) –Some I/O libraries (e.g. HDF-4, NetCDF, PMPIO) not parallel –resulting single file is handy for ftp, mv –big blocks improve performance –short distance from original, serial code Cons: –lack of parallelism limits scalability, performance (single node bottleneck)

4 PPC 2011 - MPI Parallel File I/O4 Another Way Each process writes to a separate file Pros: –parallelism, high performance Cons: –lots of small files to manage –LOTS OF METADATA – stress parallel filesystem –difficult to read back data from different number of processes
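A minimal C sketch (not from the slides; the file name pattern is an assumption) of the file-per-process pattern, which is easy to code but produces one file and one set of metadata per rank:

    #include <stdio.h>
    #include "mpi.h"
    #define BLOCK 1024                 /* ints per process (assumed) */

    int main(int argc, char **argv)
    {
        int rank, i, local[BLOCK];
        char fname[64];
        FILE *fp;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < BLOCK; i++) local[i] = rank * BLOCK + i;

        /* every rank opens and writes its own file: simple, but nprocs
           files and nprocs sets of metadata to manage later */
        sprintf(fname, "datafile.%d", rank);
        fp = fopen(fname, "wb");
        fwrite(local, sizeof(int), BLOCK, fp);
        fclose(fp);

        MPI_Finalize();
        return 0;
    }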

5 PPC 2011 - MPI Parallel File I/O5 What is Parallel I/O? Multiple processes of a parallel program accessing data (reading or writing) from a common file [Diagram: processes P0, P1, P2, …, P(n-1) all accessing one shared FILE]

6 PPC 2011 - MPI Parallel File I/O6 Why Parallel I/O? Non-parallel I/O is simple but –Poor performance (single process writes to one file) or –Awkward and not interoperable with other tools (each process writes a separate file) Parallel I/O –Provides high performance –Can provide a single file that can be used with other tools (such as visualization programs)

7 PPC 2011 - MPI Parallel File I/O7 Why is MPI a Good Setting for Parallel I/O? Writing is like sending a message and reading is like receiving. Any parallel I/O system will need a mechanism to –define collective operations (MPI communicators) –define noncontiguous data layout in memory and file (MPI datatypes) –Test completion of nonblocking operations (MPI request objects) i.e., lots of MPI-like machinery

8 PPC 2011 - MPI Parallel File I/O8 MPI-IO Background Marc Snir et al (IBM Watson) paper exploring MPI as context for parallel I/O (1994) MPI-IO email discussion group led by J.-P. Prost (IBM) and Bill Nitzberg (NASA), 1994 MPI-IO group joins MPI Forum in June 1996 MPI-2 standard released in July 1997 MPI-IO is Chapter 9 of MPI-2

9 PPC 2011 - MPI Parallel File I/O9 Using MPI for Simple I/O [Diagram: processes P0, P1, P2, …, P(n-1) each reading from one shared FILE] Each process needs to read a chunk of data from a common file

10 PPC 2011 - MPI Parallel File I/O10 Using Individual File Pointers
#include <stdio.h>
#include "mpi.h"
#define FILESIZE 1000

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE/nprocs;      /* bytes per process */
    nints = bufsize/sizeof(int);    /* ints per process  */
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

11 PPC 2011 - MPI Parallel File I/O11 Using Explicit Offsets
#include <stdio.h>
#include "mpi.h"
#define FILESIZE 1000

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE/nprocs;
    nints = bufsize/sizeof(int);
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* no separate seek: the byte offset is an explicit argument */
    MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

12 PPC 2011 - MPI Parallel File I/O12 Function Details
MPI_File_open(MPI_Comm comm, char *file, int mode, MPI_Info info, MPI_File *fh)
(note: mode = MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY, MPI_MODE_CREATE, MPI_MODE_EXCL, MPI_MODE_DELETE_ON_CLOSE, MPI_MODE_UNIQUE_OPEN, MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)
MPI_File_close(MPI_File *fh)
MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Status *status)
MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype type, MPI_Status *status)
MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
(note: whence = MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)
MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_write_at( …same as read_at… )
(Note: many other functions to get/set properties; see Gropp et al.)

13 PPC 2011 - MPI Parallel File I/O13 Writing to a File Use MPI_File_write or MPI_File_write_at Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open If the file doesn’t already exist, the flag MPI_MODE_CREATE must also be passed to MPI_File_open We can pass multiple flags by using bitwise-or ‘|’ in C, or addition ‘+’ in Fortran, as in the sketch below
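A minimal C sketch (the file name "outfile" and a count of 100 ints per rank are assumptions, not from the slides): each rank opens the shared file with combined flags and writes its own block at a byte offset derived from its rank.

    #include "mpi.h"
    #define N 100

    int main(int argc, char **argv)
    {
        int rank, i, buf[N];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < N; i++) buf[i] = rank * N + i;

        /* flags are combined with bitwise-or; MPI_MODE_CREATE is needed
           because the file may not exist yet */
        MPI_File_open(MPI_COMM_WORLD, "outfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        MPI_File_write_at(fh, (MPI_Offset)rank * N * sizeof(int),
                          buf, N, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }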

14 PPC 2011 - MPI Parallel File I/O14 MPI Datatype Interlude Datatypes in MPI –Elementary: MPI_INT, MPI_DOUBLE, etc.; everything we’ve used up to this point Contiguous –Next easiest: sequences of elementary types Vector –Sequences separated by a constant “stride”

15 PPC 2011 - MPI Parallel File I/O15 MPI Datatypes, cont Indexed: more general –does not assume a constant stride Struct –General mixed types (like C structs)

16 PPC 2011 - MPI Parallel File I/O16 Creating simple datatypes Let’s just look at the simplest types: contiguous and vector datatypes. Contiguous example –Let’s create a new datatype which is two ints side by side. The calling sequence is
MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);
MPI_Datatype newtype;
MPI_Type_contiguous(2, MPI_INT, &newtype);
MPI_Type_commit(&newtype); /* required before the type can be used */
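The slides describe vector datatypes but show only the contiguous case; here is a minimal sketch (block counts chosen purely for illustration) of MPI_Type_vector, which the collective file-view example later in the deck also relies on:

    MPI_Datatype vtype;
    /* 4 blocks, each of 2 consecutive ints, with block starts 6 ints apart */
    MPI_Type_vector(4, 2, 6, MPI_INT, &vtype);
    MPI_Type_commit(&vtype);   /* required before use */
    /* ... use vtype as a message datatype or as a filetype in a file view ... */
    MPI_Type_free(&vtype);     /* free when no longer needed */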

17 PPC 2011 - MPI Parallel File I/O17 Using File Views Processes write to a shared file; MPI_File_set_view assigns regions of the file to separate processes

18 PPC 2011 - MPI Parallel File I/O18 File Views Specified by a triplet (displacement, etype, and filetype) passed to MPI_File_set_view displacement = number of bytes to be skipped from the start of the file etype = basic unit of data access (can be any basic or derived datatype) filetype = specifies which portion of the file is visible to the process This is a collective operation over the group of processes determined when the file was opened, so all processes/ranks must use the same data representation and etype extents.

19 PPC 2011 - MPI Parallel File I/O19 File Interoperability Users can optionally create files with a portable binary data representation “datarep” parameter to MPI_File_set_view native - default, same as in memory, not portable internal - impl. defined representation providing an impl. defined level of portability external32 - a specific representation defined in MPI (basically 32-bit big-endian IEEE format), portable across machines and MPI implementations
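For example, a minimal sketch (assuming fh is an already-open file handle) of requesting the portable representation instead of the default:

    /* data written through this view uses the portable external32 format,
       so it can be read back on a different machine or MPI implementation */
    MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "external32", MPI_INFO_NULL);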

20 PPC 2011 - MPI Parallel File I/O20 File View Example
MPI_File thefile;
for (i = 0; i < BUFSIZE; i++)
    buf[i] = myrank * BUFSIZE + i;
MPI_File_open(MPI_COMM_WORLD, "testfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &thefile);
/* the displacement is in bytes, so scale by sizeof(int) */
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
MPI_File_close(&thefile);

21 PPC 2011 - MPI Parallel File I/O21 Ways to Write to a Shared File –MPI_File_seek: like Unix seek, used with the individual file pointer –MPI_File_read_at / MPI_File_write_at: combine seek and I/O in one call, for thread safety –MPI_File_read_shared / MPI_File_write_shared: use the shared file pointer; good when order doesn’t matter (see the sketch below) –Collective operations
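A minimal sketch (assuming fh was opened by all ranks on MPI_COMM_WORLD with write access, and <stdio.h> is included for sprintf) of the shared file pointer: each call appends after whatever was written last by any rank, so the order of blocks in the file is not deterministic.

    char line[64];
    int rank, len;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    len = sprintf(line, "log message from rank %d\n", rank);
    /* all ranks advance the single shared file pointer */
    MPI_File_write_shared(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);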

22 PPC 2011 - MPI Parallel File I/O22 Collective I/O in MPI A critical optimization in parallel I/O Allows communication of “big picture” to file system Framework for 2-phase I/O, in which communication precedes I/O (can use MPI machinery) Basic idea: build large blocks, so that reads/writes in I/O system will be large [Diagram: many small individual requests vs. one large collective access]

23 PPC 2011 - MPI Parallel File I/O23 Collective I/O MPI_File_read_all, MPI_File_read_at_all, etc _all indicates that all processes in the group specified by the communicator passed to MPI_File_open will call this function Each process specifies only its own access information -- the argument list is the same as for the non-collective functions

24 PPC 2011 - MPI Parallel File I/O24 Collective I/O By calling the collective I/O functions, the user allows an implementation to optimize the request based on the combined request of all processes The implementation can merge the requests of different processes and service the merged request efficiently Particularly effective when the accesses of different processes are noncontiguous and interleaved

25 PPC 2011 - MPI Parallel File I/O25 Collective non-contiguous MPI-IO example
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE      1048576
#define INTS_PER_BLK  16

int main(int argc, char **argv) {
    int *buf, rank, nprocs, nints, bufsize;
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE/nprocs;
    buf = (int *) malloc(bufsize);
    nints = bufsize/sizeof(int);
    MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* each process sees every nprocs-th block of INTS_PER_BLK ints */
    MPI_Type_vector(nints/INTS_PER_BLK, INTS_PER_BLK, INTS_PER_BLK*nprocs,
                    MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, INTS_PER_BLK*sizeof(int)*rank, MPI_INT, filetype,
                      "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}

26 PPC 2011 - MPI Parallel File I/O26 More on MPI_File_read_all Note that the _all version has the same argument list Difference is that all processes in the communicator passed to MPI_File_open must call the read Contrast with the non-_all version, where any subset may or may not call it Allows for many optimizations

27 PPC 2011 - MPI Parallel File I/O27 Split Collective I/O
MPI_File_write_all_begin(fh, buf, count, datatype);
// available on Blue Gene/L, but may not improve performance
for (i = 0; i < 1000; i++) {
    /* perform computation */
}
MPI_File_write_all_end(fh, buf, &status);
A restricted form of nonblocking collective I/O Only one active nonblocking collective operation allowed at a time on a file handle Therefore, no request object necessary

28 PPC 2011 - MPI Parallel File I/O28 Passing Hints to the Implementation
MPI_Info info;
MPI_Info_create(&info);
/* no. of I/O devices to be used for file striping */
MPI_Info_set(info, "striping_factor", "4");
/* the striping unit in bytes */
MPI_Info_set(info, "striping_unit", "65536");
MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);

29 PPC 2011 - MPI Parallel File I/O29 Examples of Hints (used in ROMIO) –MPI-2 predefined hints: striping_unit, striping_factor, cb_buffer_size, cb_nodes –New algorithm parameters: ind_rd_buffer_size, ind_wr_buffer_size –Platform-specific hints: start_iodevice, pfs_svr_buf, direct_read, direct_write

30 PPC 2011 - MPI Parallel File I/O30 I/O Consistency Semantics The consistency semantics specify the results when multiple processes access a common file and one or more processes write to the file MPI guarantees stronger consistency semantics if the communicator used to open the file accurately specifies all the processes that are accessing the file, and weaker semantics if not The user can take steps to ensure consistency when MPI does not automatically do so

31 PPC 2011 - MPI Parallel File I/O31 Example 1 File opened with MPI_COMM_WORLD. Each process writes to a separate region of the file and reads back only what it wrote.
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_read_at(off=0,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_File_read_at(off=100,cnt=100)
MPI guarantees that the data will be read correctly

32 PPC 2011 - MPI Parallel File I/O32 Example 2 Same as Example 1, except that each process wants to read what the other process wrote (overlapping accesses) In this case, MPI does not guarantee that the data will automatically be read correctly
Process 0: /* incorrect program */ MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_Barrier; MPI_File_read_at(off=100,cnt=100)
Process 1: /* incorrect program */ MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_Barrier; MPI_File_read_at(off=0,cnt=100)
In the above program, the read on each process is not guaranteed to get the data written by the other process!

33 PPC 2011 - MPI Parallel File I/O33 Example 2 contd. The user must take extra steps to ensure correctness There are three choices: –set atomicity to true –close the file and reopen it –ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process/MPI rank Can hurt performance….

34 PPC 2011 - MPI Parallel File I/O34 Example 2, Option 1 Set atomicity to true (see the C sketch below)
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_set_atomicity(fh1,1); MPI_File_write_at(off=0,cnt=100); MPI_Barrier; MPI_File_read_at(off=100,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_set_atomicity(fh2,1); MPI_File_write_at(off=100,cnt=100); MPI_Barrier; MPI_File_read_at(off=0,cnt=100)
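A fuller C sketch of Option 1 (assuming exactly two ranks and a file named "datafile"; buffer contents are omitted): enabling atomic mode makes each rank's write visible to the other rank's later read.

    MPI_File fh;
    int wbuf[100], rbuf[100], rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_set_atomicity(fh, 1);   /* collective: both ranks enable atomic mode */

    /* rank 0 writes the first 100 ints, rank 1 the next 100 */
    MPI_File_write_at(fh, (MPI_Offset)(rank * 100 * sizeof(int)),
                      wbuf, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_Barrier(MPI_COMM_WORLD);
    /* each rank reads the region the other rank wrote */
    MPI_File_read_at(fh, (MPI_Offset)((1 - rank) * 100 * sizeof(int)),
                     rbuf, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);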

35 PPC 2011 - MPI Parallel File I/O35 Example 2, Option 2 Close and reopen the file
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_close; MPI_Barrier; MPI_File_open(MPI_COMM_WORLD,…); MPI_File_read_at(off=100,cnt=100)
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=100,cnt=100); MPI_File_close; MPI_Barrier; MPI_File_open(MPI_COMM_WORLD,…); MPI_File_read_at(off=0,cnt=100)

36 PPC 2011 - MPI Parallel File I/O36 Example 2, Option 3 Ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process a sequence is a set of operations between any pair of open, close, or file_sync functions a write sequence is a sequence in which any of the functions is a write operation

37 PPC 2011 - MPI Parallel File I/O37 Example 2, Option 3
Process 0: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_write_at(off=0,cnt=100); MPI_File_sync; MPI_Barrier; MPI_File_sync /*collective*/; MPI_Barrier; MPI_File_sync; MPI_File_read_at(off=100,cnt=100); MPI_File_close
Process 1: MPI_File_open(MPI_COMM_WORLD,…); MPI_File_sync /*collective*/; MPI_Barrier; MPI_File_sync; MPI_File_write_at(off=100,cnt=100); MPI_File_sync; MPI_Barrier; MPI_File_sync /*collective*/; MPI_File_read_at(off=0,cnt=100); MPI_File_close

38 PPC 2011 - MPI Parallel File I/O38 General Guidelines for Achieving High I/O Performance Buy sufficient I/O hardware for the machine Use fast file systems, not NFS-mounted home directories Do not perform I/O from one process only Make large requests wherever possible For noncontiguous requests, use derived datatypes and a single collective I/O call

39 PPC 2011 - MPI Parallel File I/O39 Optimizations Given complete access information, an implementation can perform optimizations such as: –Data Sieving: Read large chunks and extract what is really needed –Collective I/O: Merge requests of different processes into larger requests –Improved prefetching and caching

40 PPC 2011 - MPI Parallel File I/O40 Summary MPI-IO has many features that can help users achieve high performance The most important of these features are the ability to specify noncontiguous accesses, the collective I/O functions, and the ability to pass hints to the implementation Users must use the above features! In particular, when accesses are noncontiguous, users must create derived datatypes, define file views, and use the collective I/O functions

