
1 Parallel and Grid I/O Infrastructure
W. Gropp, R. Ross, R. Thakur (Argonne National Lab)
A. Choudhary, W. Liao (Northwestern University)
G. Abdulla, T. Eliassi-Rad (Lawrence Livermore National Lab)

2 Outline
Introduction
PVFS and ROMIO
Parallel NetCDF
Query Pattern Analysis
Please interrupt at any point for questions!

3 What is this project doing?
Extending existing infrastructure work
–PVFS parallel file system
–ROMIO MPI-IO implementation
Helping match application I/O needs to underlying capabilities
–Parallel NetCDF
–Query Pattern Analysis
Linking with Grid I/O resources
–PVFS backend for GridFTP striped server
–ROMIO on top of Grid I/O API

4 What Are All These Names?
MPI - Message Passing Interface Standard
–Also known as MPI-1
MPI-2 - Extensions to the MPI standard
–I/O, RDMA, dynamic processes
MPI-IO - I/O part of the MPI-2 extensions
ROMIO - Implementation of MPI-IO
–Handles mapping MPI-IO calls into communication (MPI) and file I/O
PVFS - Parallel Virtual File System
–An implementation of a file system for Linux clusters
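
As a point of reference for the layers named above, the fragment below sketches what a minimal MPI-IO access looks like from an application's point of view. It is an illustrative example only (the file name and buffer size are invented), not code from ROMIO or from the slides.

    /* Minimal MPI-IO sketch: each process writes its own block of a shared
     * file. Illustrative only; "output.dat" and the sizes are arbitrary. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Status status;
        int rank, buf[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 1024; i++) buf[i] = rank;

        /* Collective open; an MPI-IO implementation such as ROMIO dispatches
         * this to the underlying file system (e.g. PVFS). */
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes 1024 ints at its own offset. */
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(buf),
                          buf, 1024, MPI_INT, &status);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }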

5 Fitting the Pieces Together
Query Pattern Analysis (QPA) and Parallel NetCDF both written in terms of MPI-IO calls
–QPA tools pass information down through MPI-IO hints
–Parallel NetCDF written using MPI-IO for data read/write
ROMIO implementation uses PVFS as storage medium on Linux clusters or could hook to Grid I/O resources
[Layer diagram: Query Pattern Analysis and Parallel NetCDF over any MPI-IO implementation; the ROMIO MPI-IO implementation over the PVFS parallel file system and Grid I/O resources.]
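
To make the top of this stack concrete, here is a rough sketch of writing one variable through Parallel netCDF, which performs its data movement via MPI-IO underneath. The ncmpi_* calls follow the Parallel netCDF C interface, but the file, dimension, and variable names are invented for illustration.

    /* Sketch: collectively writing a 1-D float variable through Parallel
     * netCDF. File/dimension/variable names are invented for this example. */
    #include <mpi.h>
    #include <pnetcdf.h>

    void write_field(MPI_Comm comm, const float *local, MPI_Offset nlocal,
                     MPI_Offset nglobal, MPI_Offset offset)
    {
        int ncid, dimid, varid;

        ncmpi_create(comm, "field.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
        ncmpi_def_dim(ncid, "x", nglobal, &dimid);
        ncmpi_def_var(ncid, "temperature", NC_FLOAT, 1, &dimid, &varid);
        ncmpi_enddef(ncid);

        /* Each process writes its own slab; the _all variant is collective,
         * so the library can turn this into efficient MPI-IO requests. */
        MPI_Offset start[1] = { offset }, count[1] = { nlocal };
        ncmpi_put_vara_float_all(ncid, varid, start, count, local);

        ncmpi_close(ncid);
    }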

6 PVFS and ROMIO
Provide a little background on the two
–What they are, example to set context, status
Motivate the work
Discuss current research and development
–I/O interfaces
–MPI-IO hints
–PVFS2
Our work with these two is closely tied together.

7 Parallel Virtual File System
Parallel file system for Linux clusters
–Global name space
–Distributed file data
–Builds on TCP, local file systems
Tuned for high performance concurrent access
Mountable like NFS file systems
User-level interface library (used by ROMIO)
200+ users on mailing list, 100+ downloads/month
–Up from 160+ users in March
Installations at OSC, Univ. of Utah, Phillips Petroleum, ANL, Clemson Univ., etc.

8 PVFS Architecture
Client-server architecture
Two server types
–Metadata server (mgr) - keeps track of file metadata (permissions, owner) and directory structure
–I/O servers (iod) - orchestrate movement of data between clients and local I/O devices
Clients access PVFS one of two ways
–MPI-IO (using the ROMIO implementation)
–Mount through the Linux kernel (loadable module)

9 PVFS Performance
Ohio Supercomputer Center cluster
16 I/O servers (IA32), 70+ clients (IA64), IDE disks
Block partitioned data, accessed through ROMIO

10 ROMIO
Implementation of the MPI-2 I/O specification
–Operates on a wide variety of platforms
–Abstract Device Interface for I/O (ADIO) aids in porting to new file systems
–Fortran and C bindings
Successes
–Adopted by industry (e.g. Compaq, HP, SGI)
–Used at ASCI sites (e.g. LANL Blue Mountain)
[Layer diagram: MPI-IO interface, over the ADIO interface, over FS-specific code (e.g. AD_PVFS, AD_NFS).]
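
The ADIO idea is essentially a table of file-system-specific operations that the common MPI-IO code dispatches through. The fragment below is a simplified, hypothetical illustration of that pattern; the structure and names are invented and are not ROMIO's actual ADIO code.

    /* Hypothetical sketch of ADIO-style dispatch: the portable MPI-IO layer
     * calls through a per-file-system table of operations, so supporting a
     * new file system means providing one such table. Names are invented. */
    #include <stdio.h>

    typedef struct adio_ops {
        int (*write_contig)(void *handle, const void *buf,
                            long long offset, long long len);
        int (*read_contig)(void *handle, void *buf,
                           long long offset, long long len);
    } adio_ops;

    /* A trivial "file system" that just logs requests, standing in for an
     * AD_PVFS or AD_NFS style module. */
    static int log_write(void *h, const void *buf, long long off, long long len)
    {
        (void)h; (void)buf;
        printf("write: offset=%lld length=%lld\n", off, len);
        return 0;
    }

    static const adio_ops demo_ops = { log_write, 0 };

    int main(void)
    {
        char data[64] = {0};
        /* The common layer never needs to know which file system is below. */
        demo_ops.write_contig(0, data, 0, sizeof data);
        return 0;
    }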

11 Example of Software Layers
FLASH Astrophysics application stores checkpoints and visualization data using HDF5
HDF5 in turn uses MPI-IO (ROMIO) to write out its data files
PVFS client library is used by ROMIO to write data to the PVFS file system
PVFS client library interacts with PVFS servers over the network
[Stack diagram: FLASH Astrophysics Code, over HDF5 I/O Library, over ROMIO MPI-IO Library, over PVFS Client Library, over PVFS Servers.]
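
The fragment below sketches how an application like FLASH ends up on MPI-IO: it asks HDF5 for the MPI-IO file driver and then performs a collective write. This is a generic parallel-HDF5 sketch (names and sizes invented), not FLASH's actual I/O code, and exact HDF5 call signatures vary by release.

    /* Sketch: a collective HDF5 write carried out via MPI-IO underneath.
     * Generic example, not FLASH's checkpoint code; names are made up. */
    #include <mpi.h>
    #include <hdf5.h>

    void write_checkpoint(MPI_Comm comm, const double *local, hsize_t nlocal,
                          hsize_t nglobal, hsize_t offset)
    {
        /* File access property list: route HDF5 through the MPI-IO driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);

        hid_t file   = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        hid_t fspace = H5Screate_simple(1, &nglobal, NULL);
        hid_t dset   = H5Dcreate(file, "density", H5T_NATIVE_DOUBLE, fspace,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Each rank selects its slab of the file and describes its buffer. */
        hid_t mspace = H5Screate_simple(1, &nlocal, NULL);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &offset, NULL, &nlocal, NULL);

        /* Transfer property list: ask for a collective MPI-IO write. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, local);

        H5Pclose(dxpl); H5Sclose(mspace); H5Dclose(dset);
        H5Sclose(fspace); H5Fclose(file); H5Pclose(fapl);
    }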

12 Example of Software Layers (2)
FLASH Astrophysics application stores checkpoints and visualization data using HDF5
HDF5 in turn uses MPI-IO (IBM) to write out its data files
GPFS file system stores data to disks
[Stack diagram: FLASH Astrophysics Code, over HDF5 I/O Library, over IBM MPI-IO Library, over GPFS.]

13 Status of PVFS and ROMIO
Both are freely available, widely distributed, documented, and supported products
Current work focuses on:
–Higher performance through richer file system interfaces
–Hint mechanisms for optimizing the behavior of both ROMIO and PVFS
–Scalability
–Fault tolerance

14 Why Does This Work Matter?
Much of the I/O on big machines goes through MPI-IO
–Direct use of MPI-IO (visualization)
–Indirect use through HDF5 or NetCDF (fusion, climate, astrophysics)
–Hopefully soon through Parallel NetCDF!
On clusters, PVFS is currently the most widely deployed parallel file system
Optimizations in these layers are of direct benefit to those users
Providing guidance to vendors for possible future improvements

15 I/O Interfaces
Scientific applications keep structured data sets in memory and in files
For highest performance, the description of the structure must be maintained through the software layers
–Allow the scientist to describe the data layout in memory and file
–Avoid packing into buffers in intermediate layers
–Minimize the number of file system operations needed to perform I/O
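
MPI-IO already lets an application describe both sides of the transfer with MPI datatypes, so the structure survives down the stack instead of being packed by hand. The sketch below illustrates this for a block-decomposed 2-D array; the dimensions and file name are invented.

    /* Sketch: describing structured data on both ends of an MPI-IO transfer.
     * Each process owns a 100x100 block of a 400x400 array of doubles; the
     * file view carries that structure to the I/O layer in one collective
     * call. Sizes and the file name are invented for illustration. */
    #include <mpi.h>

    void write_block(MPI_Comm comm, const double *block, int prow, int pcol)
    {
        int gsizes[2] = {400, 400};           /* whole array in the file */
        int lsizes[2] = {100, 100};           /* this process's block    */
        int starts[2] = {prow * 100, pcol * 100};
        MPI_Datatype filetype;
        MPI_File fh;
        MPI_Status status;

        /* Describe where this block lives inside the global array. */
        MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File_open(comm, "array.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

        /* One collective write moves the whole noncontiguous block. */
        MPI_File_write_all(fh, block, 100 * 100, MPI_DOUBLE, &status);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
    }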

16 File System Interfaces
MPI-IO is a great starting point
Most underlying file systems only provide POSIX-like contiguous access
List I/O work was a first step in the right direction
–Proposed FS interface
–Allows movement of lists of data regions in memory and file with one call
[Diagram: lists of regions in memory mapped to lists of regions in the file.]
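
The sketch below shows the shape of such an interface: one call that takes a list of memory regions and a list of file regions. The function name and signature are hypothetical, for illustration only; they are not the exact PVFS list I/O API.

    /* Hypothetical list I/O interface: move many noncontiguous regions
     * between memory and file in one call. Prototype only, for illustration;
     * not the exact PVFS function signature. */
    #include <stdint.h>
    #include <stddef.h>

    int read_list(int fd,
                  int mem_count,  void   *mem_addrs[],  size_t mem_lens[],
                  int file_count, int64_t file_offs[],  size_t file_lens[]);

    /* Example use: gather three file regions into two memory regions. */
    void example(int fd, char *a, char *b)
    {
        void   *mem_addrs[2] = { a, b };
        size_t  mem_lens[2]  = { 6, 3 };
        int64_t file_offs[3] = { 0, 8, 20 };
        size_t  file_lens[3] = { 4, 2, 3 };

        /* One request replaces what would otherwise be three separate reads. */
        read_list(fd, 2, mem_addrs, mem_lens, 3, file_offs, file_lens);
    }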

17 List I/O
Implemented in PVFS
Transparent to the user through ROMIO
Distributed in the latest releases

18 List I/O Example
Simple datatype repeated over the file
Desire to read the first 9 bytes
This is converted into four [offset, length] pairs
One can see how this process could result in a very large list of offsets and lengths
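
The fragment below works through a conversion of this kind for an assumed tiled pattern (one byte, a gap, then two bytes, repeating every four bytes); the small request shown flattens into four pairs. The exact datatype in the slide's figure is not recoverable from the transcript, so treat the numbers as illustrative.

    /* Illustrative flattening of a tiled datatype into [offset, length]
     * pairs. Assumed pattern (may differ from the slide's figure): each
     * 4-byte tile selects byte 0 and bytes 2-3. A request covering the
     * first two tiles needs four noncontiguous file regions. */
    #include <stdio.h>

    int main(void)
    {
        long offsets[16], lengths[16];
        int  npairs = 0;

        /* Two blocks per tile: (relative offset, length) = (0,1) and (2,2). */
        const long blk_off[2] = { 0, 2 };
        const long blk_len[2] = { 1, 2 };
        const long extent     = 4;   /* tile size in the file            */
        const int  ntiles     = 2;   /* request spans the first two tiles */

        for (int t = 0; t < ntiles; t++)
            for (int b = 0; b < 2; b++) {
                offsets[npairs] = t * extent + blk_off[b];
                lengths[npairs] = blk_len[b];
                npairs++;
            }

        /* Prints: (0,1) (2,2) (4,1) (6,2) -- four pairs for this small
         * request; a large request produces a proportionally large list. */
        for (int i = 0; i < npairs; i++)
            printf("(%ld,%ld) ", offsets[i], lengths[i]);
        printf("\n");
        return 0;
    }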

19 Describing Regular Patterns
List I/O can't describe regular patterns (e.g. a column of a 2D matrix) in an efficient manner
MPI datatypes can do this easily
Datatype I/O is our solution to this problem
–Concise set of datatype constructors used to describe types
–API for passing these descriptions to a file system
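
For instance, a single column of a row-major matrix is just a strided pattern, which one MPI_Type_vector call captures; in list I/O the same access needs one [offset, length] pair per row. The sketch below is a generic illustration with invented sizes.

    /* Sketch: one MPI datatype describes a column of a row-major 2D matrix
     * of doubles. Matrix dimensions and the column index are invented. */
    #include <mpi.h>

    MPI_Datatype column_type(int nrows, int ncols)
    {
        MPI_Datatype col;
        /* nrows blocks of 1 double each, separated by a stride of ncols. */
        MPI_Type_vector(nrows, 1, ncols, MPI_DOUBLE, &col);
        MPI_Type_commit(&col);
        return col;
    }

    /* Usage: set the column as the file view, then read it in one call.
     * The displacement selects the column (here, column 3 of 1000x1000). */
    void read_column(MPI_File fh, double *buf)
    {
        MPI_Datatype col = column_type(1000, 1000);
        MPI_Status status;
        MPI_File_set_view(fh, 3 * sizeof(double), MPI_DOUBLE, col,
                          "native", MPI_INFO_NULL);
        MPI_File_read(fh, buf, 1000, MPI_DOUBLE, &status);
        MPI_Type_free(&col);
    }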

20 Datatype I/O
Built using a generic datatype processing component (also used in MPICH2)
–Optimizing for performance
Prototype for PVFS in progress
–API and server support
Prototype of support in ROMIO in progress
–Maps MPI datatypes to PVFS datatypes
–Passes through the new API
This same generic datatype component could be used in other projects as well

21 Datatype I/O Example
Same datatype as in the previous example
Describe the datatype with one construct:
–index {(0,1), (2,2)} describes a pattern of one short block and one longer one
–automatically tiled (as with MPI types for files)
The linear relationship between the number of contiguous pieces and the size of the request is removed
[Figure: the datatype tiled over file bytes 0-11, with the number of datatypes (1, 2, 3) plotted against the number of bytes requested.]
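
A minimal sketch of how this example would be expressed through MPI-IO: the index construct {(0,1), (2,2)} corresponds to an MPI indexed type of two blocks, and setting it as the file view lets one read call cover an arbitrarily large request. The 9-byte read and the file name are illustrative.

    /* Sketch: the index {(0,1),(2,2)} pattern as an MPI indexed type used as
     * a file view. The type's extent is 4 bytes, so it tiles over the file
     * as in the slide; one read then covers any number of tiles. */
    #include <mpi.h>

    void read_pattern(MPI_Comm comm, char *buf)
    {
        int blocklens[2] = { 1, 2 };   /* one short block, one longer one   */
        int displs[2]    = { 0, 2 };   /* at byte offsets 0 and 2 of a tile */
        MPI_Datatype filetype;
        MPI_File fh;
        MPI_Status status;

        MPI_Type_indexed(2, blocklens, displs, MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File_open(comm, "data.dat", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);

        /* Read 9 selected bytes; the request size no longer dictates how
         * many offset/length pairs must be shipped to the file system. */
        MPI_File_read_all(fh, buf, 9, MPI_BYTE, &status);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
    }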

22 MPI Hints for Performance
ROMIO has a number of performance optimizations built in
The optimizations are somewhat general, but there are tuning parameters that are very specific
–buffer sizes
–number and location of processes to perform I/O
–data sieving and two-phase techniques
Hints may be used to tune ROMIO to match the system

23 ROMIO Hints
Currently all of ROMIO's optimizations may be controlled with hints
–data sieving
–two-phase I/O
–list I/O
–datatype I/O
Additional hints are being considered to allow ROMIO to adapt to access patterns
–collective-only I/O
–sequential vs. random access
–inter-file dependencies
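
Hints travel through the standard MPI_Info mechanism. The sketch below sets a few keys that ROMIO documents for collective buffering and data sieving; treat the specific key names and values as examples to check against your MPI implementation's documentation rather than a definitive list.

    /* Sketch: passing tuning hints to ROMIO through an MPI_Info object at
     * open time. Availability and defaults of these keys vary by MPI
     * implementation, so verify them for your library. */
    #include <mpi.h>

    MPI_File open_with_hints(MPI_Comm comm, const char *path)
    {
        MPI_Info info;
        MPI_File fh;

        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_buffer_size", "8388608"); /* two-phase buffer    */
        MPI_Info_set(info, "cb_nodes", "16");            /* # of I/O aggregators */
        MPI_Info_set(info, "romio_ds_read", "enable");   /* data sieving reads   */
        MPI_Info_set(info, "romio_cb_write", "enable");  /* collective buffering */

        MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }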

24 PVFS2
PVFS (version 1.x.x) plays an important role as a fast scratch file system for use today
PVFS2 will supersede this version, adding
–More comprehensive system management
–Fault tolerance through lazy redundancy
–Distributed metadata
–Component-based approach for supporting new storage and network resources
Distributed metadata and fault tolerance will extend scalability to thousands and tens of thousands of clients and hundreds of servers
PVFS2 implementation is underway

25 Summary
ROMIO and PVFS are a mature foundation on which to make additional improvements
New, rich I/O descriptions allow for higher performance access
Addition of new hints to ROMIO allows for fine-tuning its operation
PVFS2 focuses on the next generation of clusters

