
CSCI-4320/6360: Parallel Programming & Computing
Tues./Fri. 10-11:30 a.m.
MPI File I/O
Prof. Chris Carothers
Computer Science Department, MRC 309a
chrisc@cs.rpi.edu
www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2019
Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt

Common Ways of Doing I/O in Parallel Programs
Sequential I/O: all processes send data to rank 0, and rank 0 writes it to the file.
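
A minimal sketch of this pattern (not from the slides; the file name out.dat and the one-int-per-rank payload are illustrative assumptions): every rank hands its data to rank 0 with MPI_Gather, and only rank 0 touches the file.

/* Sequential I/O sketch: gather everything to rank 0, which alone writes the file. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, nprocs, myval;
    int *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    myval = rank * 100;                        /* each rank's contribution */
    if (rank == 0)
        all = (int *) malloc(nprocs * sizeof(int));

    /* collect every rank's data on rank 0 */
    MPI_Gather(&myval, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {                           /* only rank 0 performs I/O */
        FILE *fp = fopen("out.dat", "wb");
        fwrite(all, sizeof(int), nprocs, fp);
        fclose(fp);
        free(all);
    }
    MPI_Finalize();
    return 0;
}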

Pros and Cons of Sequential I/O
Pros:
- the parallel machine may support I/O from only one process (e.g., no common file system)
- some I/O libraries (e.g. HDF-4, NetCDF, PMPIO) are not parallel
- the resulting single file is handy for ftp, mv
- big blocks improve performance
- short distance from the original, serial code
Cons:
- lack of parallelism limits scalability and performance (single-node bottleneck)

Another Way
Each process writes to a separate file.
Pros:
- parallelism, high performance
Cons:
- lots of small files to manage
- LOTS OF METADATA - stresses the parallel filesystem
- difficult to read the data back from a different number of processes
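
A minimal sketch of the file-per-process pattern (the out.<rank>.dat naming scheme is an illustrative assumption): each rank opens and writes its own file with ordinary stdio, with no coordination.

/* One-file-per-process sketch: each rank writes its own file. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, data[4] = {0, 1, 2, 3};
    char fname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    snprintf(fname, sizeof(fname), "out.%d.dat", rank);  /* per-rank file name */
    FILE *fp = fopen(fname, "wb");
    fwrite(data, sizeof(int), 4, fp);
    fclose(fp);

    MPI_Finalize();
    return 0;
}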

What is Parallel I/O?
Multiple processes of a parallel program accessing data (reading or writing) from a common file.
(figure: processes P0, P1, P2, …, P(n-1) all accessing one FILE)

Why Parallel I/O?
Non-parallel I/O is simple but:
- poor performance (a single process writes to one file), or
- awkward and not interoperable with other tools (each process writes a separate file)
Parallel I/O:
- provides high performance
- can provide a single file that can be used with other tools (such as visualization programs)

Why is MPI a Good Setting for Parallel I/O?
Writing is like sending a message and reading is like receiving.
Any parallel I/O system will need a mechanism to:
- define collective operations (MPI communicators)
- define noncontiguous data layouts in memory and in the file (MPI datatypes)
- test completion of nonblocking operations (MPI request objects)
i.e., lots of MPI-like machinery.
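
To illustrate the request-object point, here is a hedged sketch (not from the slides) of a nonblocking MPI-IO read: MPI_File_iread returns an MPI_Request that is completed with the same MPI_Wait used for messages. It assumes a file named datafile already exists and holds at least 100 ints.

/* Nonblocking read sketch: completion uses the ordinary request machinery. */
#include "mpi.h"

int main(int argc, char **argv)
{
    int buf[100];
    MPI_File fh;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_iread(fh, buf, 100, MPI_INT, &req);   /* start the read */
    /* ... overlap with computation here ... */
    MPI_Wait(&req, &status);                       /* complete it like a recv */
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}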

MPI-IO Background
- Marc Snir et al. (IBM Watson) paper exploring MPI as a context for parallel I/O (1994)
- MPI-IO email discussion group led by J.-P. Prost (IBM) and Bill Nitzberg (NASA), 1994
- MPI-IO group joins the MPI Forum in June 1996
- MPI-2 standard released in July 1997
- MPI-IO is Chapter 9 of MPI-2

Using MPI for Simple I/O
Each process needs to read a chunk of data from a common file.
(figure: FILE divided among processes P0, P1, P2, …, P(n-1))

Using Individual File Pointers

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;       /* each rank's share, in bytes */
    nints   = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);   /* byte offset */
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Using Explicit Offsets

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    nints   = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Function Details
MPI_File_open(MPI_Comm comm, char *file, int mode, MPI_Info info, MPI_File *fh)
  (mode = MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY, MPI_MODE_CREATE, MPI_MODE_EXCL, MPI_MODE_DELETE_ON_CLOSE, MPI_MODE_UNIQUE_OPEN, MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)
MPI_File_close(MPI_File *fh)
MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
  (whence = MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)
MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
MPI_File_write_at( …same arguments as MPI_File_read_at… )
(Many other functions get/set file properties; see Gropp et al.)

Writing to a File
- Use MPI_File_write or MPI_File_write_at
- Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open
- If the file doesn't exist previously, the flag MPI_MODE_CREATE must also be passed to MPI_File_open
- We can pass multiple flags by using bitwise-or '|' in C, or addition '+' in Fortran
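
Putting those flags together, a minimal write sketch (the file name outfile is an illustrative assumption): each rank writes 100 ints at a rank-dependent byte offset.

/* Write sketch: open with CREATE|WRONLY and write at a per-rank byte offset. */
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, buf[100];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 100; i++) buf[i] = rank * 100 + i;

    MPI_File_open(MPI_COMM_WORLD, "outfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* explicit byte offset: rank * 100 ints */
    MPI_File_write_at(fh, (MPI_Offset)rank * 100 * sizeof(int),
                      buf, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}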

MPI Datatype Interlude
Datatypes in MPI:
- Elementary: MPI_INT, MPI_DOUBLE, etc. -- everything we've used to this point
- Contiguous: next easiest; sequences of elementary types
- Vector: sequences separated by a constant "stride"

MPI Datatypes, cont.
- Indexed: more general; does not assume a constant stride
- Struct: general mixed types (like C structs)

Creating Simple Datatypes
Let's just look at the simplest types: contiguous and vector datatypes.
Contiguous example: create a new datatype which is two ints side by side. The calling sequence is
  MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);

  MPI_Datatype newtype;
  MPI_Type_contiguous(2, MPI_INT, &newtype);
  MPI_Type_commit(&newtype);   /* required */
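
For the vector case, a comparable sketch (assuming a send buffer buf, a destination rank dest, and a tag are already defined elsewhere): 4 blocks of 1 int, with a stride of 2 ints, i.e., every other element of an 8-int buffer. Note that the stride is measured in units of the oldtype's extent.

/* Vector datatype sketch: every other int of an 8-int buffer. */
MPI_Datatype every_other;
MPI_Type_vector(4 /*count*/, 1 /*blocklength*/, 2 /*stride*/,
                MPI_INT, &every_other);
MPI_Type_commit(&every_other);   /* required before use */

/* one "every_other" now describes ints 0, 2, 4, 6 of the send buffer */
MPI_Send(buf, 1, every_other, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&every_other);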

Using File Views
- Processes write to a shared file
- MPI_File_set_view assigns regions of the file to separate processes

File Views
Specified by a triplet (displacement, etype, filetype) passed to MPI_File_set_view:
- displacement = number of bytes to be skipped from the start of the file
- etype = basic unit of data access (can be any basic or derived datatype)
- filetype = specifies which portion of the file is visible to the process
This is a collective operation: all processes/ranks in the group determined when the file was opened must use the same data representation and etype extents.

File Interoperability
Users can optionally create files with a portable binary data representation ("datarep" parameter to MPI_File_set_view):
- native: the default; same as in memory, not portable
- internal: implementation-defined representation providing an implementation-defined level of portability
- external32: a specific representation defined in MPI (basically 32-bit big-endian IEEE format), portable across machines and MPI implementations
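
As a hedged sketch of the datarep parameter (assuming fh is already open for writing and buf holds 100 ints), swapping "native" for "external32" in the view makes the resulting file portable across machines and MPI implementations:

/* Portable-file sketch: same view call, but with the "external32" datarep. */
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "external32", MPI_INFO_NULL);
MPI_File_write(fh, buf, 100, MPI_INT, MPI_STATUS_IGNORE);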

File View Example

MPI_File thefile;
for (i = 0; i < BUFSIZE; i++)
    buf[i] = myrank * BUFSIZE + i;
MPI_File_open(MPI_COMM_WORLD, "testfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &thefile);
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),   /* displacement is in bytes */
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
MPI_File_close(&thefile);

Ways to Write to a Shared File
- MPI_File_seek + MPI_File_read/write: like Unix seek
- MPI_File_read_at / MPI_File_write_at: combine seek and I/O for thread safety
- MPI_File_read_shared / MPI_File_write_shared: use the shared file pointer; good when order doesn't matter
- Collective operations
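
A small sketch of the shared-file-pointer style (assuming rank and fh are declared and <stdio.h> is included; the file name log.txt is illustrative): each rank appends a record wherever the shared pointer currently is, so the ordering across ranks is left to the implementation.

/* Shared-file-pointer sketch: unordered, log-style records. */
char line[64];
int len = snprintf(line, sizeof(line), "hello from rank %d\n", rank);
MPI_File_open(MPI_COMM_WORLD, "log.txt",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
MPI_File_write_shared(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);
MPI_File_close(&fh);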

Collective I/O in MPI
- A critical optimization in parallel I/O
- Allows communication of the "big picture" to the file system
- Framework for two-phase I/O, in which communication precedes I/O (can use MPI machinery)
- Basic idea: build large blocks, so that reads/writes in the I/O system will be large
  (many small individual requests become one large collective access)

Collective I/O
- MPI_File_read_all, MPI_File_read_at_all, etc.
- _all indicates that all processes in the group specified by the communicator passed to MPI_File_open will call this function
- Each process specifies only its own access information -- the argument list is the same as for the non-collective functions
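
For example (a sketch assuming fh, rank, nints, and buf are set up as in the earlier listings), the collective counterpart of MPI_File_write_at takes exactly the same arguments; the only difference is that every rank in the opening communicator must make the call:

/* Collective write sketch: same argument list as MPI_File_write_at. */
MPI_File_write_at_all(fh, (MPI_Offset)rank * nints * sizeof(int),
                      buf, nints, MPI_INT, MPI_STATUS_IGNORE);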

Collective I/O
- By calling the collective I/O functions, the user allows an implementation to optimize the request based on the combined request of all processes
- The implementation can merge the requests of different processes and service the merged request efficiently
- Particularly effective when the accesses of different processes are noncontiguous and interleaved

Collective Noncontiguous MPI-IO Example

#include <stdlib.h>
#include "mpi.h"
#define FILESIZE     1048576
#define INTS_PER_BLK 16

int main(int argc, char **argv)
{
    int *buf, rank, nprocs, nints, bufsize;
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    buf     = (int *) malloc(bufsize);
    nints   = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* each rank sees blocks of INTS_PER_BLK ints, strided by nprocs blocks */
    MPI_Type_vector(nints/INTS_PER_BLK, INTS_PER_BLK, INTS_PER_BLK*nprocs,
                    MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, INTS_PER_BLK*sizeof(int)*rank, MPI_INT, filetype,
                      "native", MPI_INFO_NULL);

    MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}

More on MPI_File_read_all
- Note that the _all version has the same argument list
- Difference is that all processes involved in the MPI_File_open must call the read
- Contrast with the non-_all version, where any subset may or may not call it
- Allows for many optimizations

Split Collective I/O
- A restricted form of nonblocking collective I/O
- Only one active nonblocking collective operation allowed at a time on a file handle
- Therefore, no request object necessary

MPI_File_write_all_begin(fh, buf, count, datatype);
/* available on Blue Gene/L, but may not improve performance */
for (i = 0; i < 1000; i++) {
    /* perform computation */
}
MPI_File_write_all_end(fh, buf, &status);

Passing Hints to the Implementation

MPI_Info info;
MPI_Info_create(&info);

/* no. of I/O devices to be used for file striping */
MPI_Info_set(info, "striping_factor", "4");

/* the striping unit in bytes */
MPI_Info_set(info, "striping_unit", "65536");

MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);

Examples of Hints (used in ROMIO)
MPI-2 predefined hints:
- striping_unit
- striping_factor
- cb_buffer_size
- cb_nodes
New algorithm parameters:
- ind_rd_buffer_size
- ind_wr_buffer_size
Platform-specific hints:
- start_iodevice
- pfs_svr_buf
- direct_read
- direct_write
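
Hints are advisory, so an implementation may silently ignore them. A hedged sketch (assuming an open file handle fh and <stdio.h>) of how to query which hints were actually applied, using MPI_File_get_info:

/* Hint-inspection sketch: list the hints the implementation actually uses. */
MPI_Info info_used;
int nkeys, flag;
char key[MPI_MAX_INFO_KEY], value[MPI_MAX_INFO_VAL];

MPI_File_get_info(fh, &info_used);
MPI_Info_get_nkeys(info_used, &nkeys);
for (int i = 0; i < nkeys; i++) {
    MPI_Info_get_nthkey(info_used, i, key);
    MPI_Info_get(info_used, key, MPI_MAX_INFO_VAL, value, &flag);
    printf("hint %s = %s\n", key, value);
}
MPI_Info_free(&info_used);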

I/O Consistency Semantics
- The consistency semantics specify the results when multiple processes access a common file and one or more processes write to the file
- MPI guarantees stronger consistency semantics if the communicator used to open the file accurately specifies all the processes that are accessing the file, and weaker semantics if not
- The user can take steps to ensure consistency when MPI does not automatically do so

Example 1
File opened with MPI_COMM_WORLD. Each process writes to a separate region of the file and reads back only what it wrote.

Process 0:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=0,cnt=100)
  MPI_File_read_at(off=0,cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=100,cnt=100)
  MPI_File_read_at(off=100,cnt=100)

MPI guarantees that the data will be read correctly.

Example 2
Same as Example 1, except that each process wants to read what the other process wrote (overlapping accesses). In this case, MPI does not guarantee that the data will automatically be read correctly.

/* incorrect program */
Process 0:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=0,cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=100,cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=100,cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=0,cnt=100)

In the above program, the read on each process is not guaranteed to get the data written by the other process!

Example 2, contd.
The user must take extra steps to ensure correctness. There are three choices:
1. set atomicity to true
2. close the file and reopen it
3. ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process/MPI rank
Can hurt performance….

Example 2, Option 1: Set atomicity to true

Process 0:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_set_atomicity(fh1,1)
  MPI_File_write_at(off=0,cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=100,cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_set_atomicity(fh2,1)
  MPI_File_write_at(off=100,cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=0,cnt=100)

Example 2, Option 2: Close and reopen the file

Process 0:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=0,cnt=100)
  MPI_File_close
  MPI_Barrier
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_read_at(off=100,cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=100,cnt=100)
  MPI_File_close
  MPI_Barrier
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_read_at(off=0,cnt=100)

Example 2, Option 3
Ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process:
- a "sequence" is a set of operations between any pair of open, close, or file_sync functions
- a "write sequence" is a sequence in which any of the functions is a write operation

Example 2, Option 3

Process 0:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=0,cnt=100)
  MPI_File_sync
  MPI_Barrier
  MPI_File_sync   /* collective */
  MPI_File_read_at(off=100,cnt=100)
  MPI_File_close

Process 1:
  MPI_File_open(MPI_COMM_WORLD,…)
  MPI_File_write_at(off=100,cnt=100)
  MPI_File_sync
  MPI_Barrier
  MPI_File_sync   /* collective */
  MPI_File_read_at(off=0,cnt=100)
  MPI_File_close

General Guidelines for Achieving High I/O Performance
- Buy sufficient I/O hardware for the machine
- Use fast file systems, not NFS-mounted home directories
- Do not perform I/O from one process only
- Make large requests wherever possible
- For noncontiguous requests, use derived datatypes and a single collective I/O call

Optimizations
Given complete access information, an implementation can perform optimizations such as:
- Data sieving: read large chunks and extract what is really needed
- Collective I/O: merge requests of different processes into larger requests
- Improved prefetching and caching

Summary
- MPI-IO has many features that can help users achieve high performance
- The most important of these features are the ability to specify noncontiguous accesses, the collective I/O functions, and the ability to pass hints to the implementation
- Users must use these features! In particular, when accesses are noncontiguous, users must create derived datatypes, define file views, and use the collective I/O functions