Parallel netCDF: Enabling High Performance Application I/O
A. Choudhary, W. Liao (Northwestern University)
W. Gropp, R. Ross, R. Thakur (Argonne National Lab)
Project 4: SciDAC All Hands Meeting, September 11-13, 2002
Outline
NetCDF overview
Parallel netCDF and MPI-IO
Progress on API implementation
Preliminary performance evaluation using the LBNL test suite
NetCDF Overview
NetCDF (network Common Data Form) is an API for reading/writing multi-dimensional data arrays
Self-describing file format
–A netCDF file includes information about the data it contains
Machine independent
–Portable file format
Popular in both the fusion and climate communities

netCDF example:
netcdf example {       // CDL notation for netCDF dataset
dimensions:            // dimension names and lengths
  lat = 5, lon = 10, level = 4, time = unlimited;
variables:             // var types, names, shapes, attributes
  float temp(time,level,lat,lon);
    temp:long_name = "temperature";
    temp:units = "celsius";
  float rh(time,lat,lon);
    rh:long_name = "relative humidity";
    rh:valid_range = 0.0, 1.0;       // min and max
  int lat(lat), lon(lon), level(level), time(time);
    lat:units = "degrees_north";
    lon:units = "degrees_east";
    level:units = "millibars";
    time:units = "hours since 1996-1-1";
  // global attributes:
  :source = "Fictional Model Output";
data:                  // optional data assignments
  level = 1000, 850, 700, 500;
  lat = 20, 30, 40, 50, 60;
  lon = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
  time = 12;
  rh = .5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
       .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
       .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
       .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
        0,.1,.2,.4,.4,.4,.4,.7,.9,.9;  // 1 record allocated
}
NetCDF File Format
File header
–Stores metadata for fixed-size arrays: number of arrays, dimension lists, global attribute list, etc.
Array data
–Fixed-size arrays: stored contiguously in the file
–Variable-size arrays: records from all variable-size arrays are stored interleaved
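As a minimal sketch of this layout (assuming the variable's starting offset and dimension lengths have already been read from the header), the byte offset of one element of a fixed-size 3D float array can be computed like this:

#include <stddef.h>

/* Hypothetical helper: byte offset of element (t0, t1, t2) of a fixed-size
 * 3D float array stored contiguously in row-major order. var_begin, dim1,
 * and dim2 are assumed to come from the file header.                      */
size_t element_offset(size_t var_begin, size_t dim1, size_t dim2,
                      size_t t0, size_t t1, size_t t2)
{
    size_t linear = (t0 * dim1 + t1) * dim2 + t2;   /* row-major index     */
    return var_begin + linear * sizeof(float);      /* byte offset in file */
}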
NetCDF APIs
Dataset APIs
–Create/open/close a dataset, set the dataset to define/data mode, and synchronize dataset changes to disk
Define mode APIs
–Define dataset: add dimensions, variables
Attribute APIs
–Add, change, and read attributes of datasets
Inquiry APIs
–Inquire dataset metadata: dim(id, name, len), var(name, ndims, shape, id)
Data mode APIs
–Read/write variable (access methods: single value, whole array, subarray, strided subarray, sampled subarray)
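As a minimal sketch of how these API families fit together in the serial library (file name, dimensions, and attribute values are made up), a small dataset can be created like this:

#include <netcdf.h>

int main(void)
{
    int ncid, lat_dim, lon_dim, rh_id;
    int dimids[2];
    float rh[5][10] = {{0.5f}};                 /* made-up sample data */

    /* Dataset API: create the file and enter define mode */
    nc_create("example.nc", NC_CLOBBER, &ncid);

    /* Define mode APIs: add dimensions and a variable */
    nc_def_dim(ncid, "lat", 5, &lat_dim);
    nc_def_dim(ncid, "lon", 10, &lon_dim);
    dimids[0] = lat_dim;
    dimids[1] = lon_dim;
    nc_def_var(ncid, "rh", NC_FLOAT, 2, dimids, &rh_id);

    /* Attribute API: attach a text attribute to the variable */
    nc_put_att_text(ncid, rh_id, "long_name", 17, "relative humidity");

    /* Data mode API: leave define mode and write the whole array */
    nc_enddef(ncid);
    nc_put_var_float(ncid, rh_id, &rh[0][0]);

    nc_close(ncid);
    return 0;
}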
Serial vs. Parallel netCDF
Serial netCDF
–Parallel read: implemented by simply having all processors read the file independently; does NOT utilize the native I/O provided by the parallel file system and misses its parallel optimizations
–Sequential write: parallel writes are carried out by shipping data to a single process, which can overwhelm its memory capacity
Parallel netCDF
–Parallel read/write to a shared netCDF file
–Built on top of MPI-IO, which utilizes the optimal I/O facilities provided by the parallel file system
–Can pass high-level access hints down to the file system for further optimization
[Diagram: with serial netCDF, P0-P3 funnel data through a single netCDF process to the parallel file system; with parallel netCDF, P0-P3 all access the parallel file system through the parallel netCDF library]
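A rough sketch of the two write paths, with assumed variable names and the serial write call only indicated in a comment:

#include <mpi.h>
#include <pnetcdf.h>

/* Serial netCDF pattern: ship every block to a single writer (rank 0),
 * whose memory must hold the whole gathered array.                     */
void write_through_rank0(float *local, float *global, int n, int rank)
{
    MPI_Gather(local, n, MPI_FLOAT, global, n, MPI_FLOAT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        /* rank 0 would now call the serial netCDF write routine,
         * e.g. nc_put_var_float(ncid, varid, global);              */
    }
}

/* Parallel netCDF pattern: every process writes its own subarray
 * collectively to the shared file.                                 */
void write_in_parallel(int ncid, int varid, const MPI_Offset start[],
                       const MPI_Offset count[], const float *local)
{
    ncmpi_put_vara_float_all(ncid, varid, start, count, local);
}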
Design of the Parallel netCDF APIs
Goals
–Retain the original file format: applications using the original netCDF API can access the same files
–A new set of parallel APIs: prefixed with "ncmpi_" (C) and "nfmpi_" (Fortran)
–Similar APIs: minimal changes from the original APIs for easy migration
–Portable across machines
–High performance: tune the APIs to provide better performance in today's computing environments
Parallel File System
A parallel file system consists of multiple I/O nodes
–Increases the bandwidth between compute nodes and I/O nodes
Each I/O node may contain more than one disk
–Increases the bandwidth between disks and I/O nodes
A file is striped across all disks in a round-robin fashion
–Maximizes the possibility of parallel access
[Diagram: compute nodes connected through a switch network to multiple I/O servers, with a file striped across their disks]
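A minimal sketch of round-robin striping arithmetic (stripe size and server count are parameters, not values from the slide), mapping a byte offset in the file to the I/O server that holds it:

/* Hypothetical round-robin striping math: which I/O server holds a byte? */
typedef struct {
    int  server;          /* index of the I/O server/disk          */
    long local_offset;    /* byte offset within that server's data */
} StripeLocation;

StripeLocation locate(long file_offset, long stripe_size, int num_servers)
{
    long stripe_index = file_offset / stripe_size;            /* which stripe */
    StripeLocation loc;
    loc.server       = (int)(stripe_index % num_servers);     /* round-robin  */
    loc.local_offset = (stripe_index / num_servers) * stripe_size
                       + file_offset % stripe_size;
    return loc;
}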
Parallel netCDF and MPI-IO
Parallel netCDF APIs are the applications' interface to the parallel file system
Parallel netCDF is implemented on top of MPI-IO
ROMIO is an implementation of the MPI-IO standard
ROMIO is built on top of ADIO
ADIO has implementations for various file systems, using the optimal native I/O calls
[Diagram: compute nodes run the application, parallel netCDF, ROMIO, and ADIO in user space; the switch network connects them to the I/O servers in file-system space]
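A minimal sketch of how access hints can be passed down this stack through the MPI_Info object at open time (the hint names are common ROMIO hints; whether they apply depends on the underlying file system):

#include <mpi.h>
#include <pnetcdf.h>

/* Sketch: open a dataset read-only, passing MPI-IO hints via MPI_Info. */
int open_with_hints(MPI_Comm comm, const char *path, int *ncid)
{
    MPI_Info info;
    int err;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_read", "enable");    /* collective buffering */
    MPI_Info_set(info, "cb_buffer_size", "4194304");  /* 4 MB buffer          */

    err = ncmpi_open(comm, path, NC_NOWRITE, info, ncid);

    MPI_Info_free(&info);
    return err;
}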
Parallel API Implementations
Dataset APIs
–Collective calls
–Add an MPI communicator to define the scope of the I/O processes
–Add an MPI_Info to pass access hints for further optimization
Define mode APIs
–Collective calls
Attribute APIs
–Collective calls
Inquiry APIs
–Collective calls
Data mode APIs
–Collective mode (default): ensures file consistency
–Independent mode

File open:
ncmpi_create/open(MPI_Comm comm,
                  const char *path,
                  int cmode,
                  MPI_Info info,
                  int *ncidp);

Switch in/out of independent data mode:
ncmpi_begin_indep_data(int ncid);
ncmpi_end_indep_data(int ncid);
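A minimal sketch putting the collective dataset and define mode calls together (file and variable names are made up; error checking omitted):

#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int ncid, dimids[3], varid;

    MPI_Init(&argc, &argv);

    /* Dataset API (collective): every process opens the same file, with a
     * communicator defining the I/O scope and an MPI_Info for hints.      */
    ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);

    /* Define mode APIs (collective) */
    ncmpi_def_dim(ncid, "x", 256, &dimids[0]);
    ncmpi_def_dim(ncid, "y", 256, &dimids[1]);
    ncmpi_def_dim(ncid, "z", 256, &dimids[2]);
    ncmpi_def_var(ncid, "temp", NC_FLOAT, 3, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Data mode is collective by default; switch to independent mode
     * only around the calls that need it.                             */
    ncmpi_begin_indep_data(ncid);
    /* ... independent ncmpi_put/get calls ... */
    ncmpi_end_indep_data(ncid);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}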
Data Mode APIs
Collective and independent calls
–Distinguished by the "_all" suffix (or its absence)
High-level APIs
–Mimic the original APIs
–Easy migration path to the parallel interface
–Map netCDF access types to MPI derived datatypes
Flexible APIs
–Better handling of internal data representations
–More fully expose the capabilities of MPI-IO to the programmer

High-level APIs:
ncmpi_put/get_vars_<type>_all(int ncid,
                              const MPI_Offset start[],
                              const MPI_Offset count[],
                              const MPI_Offset stride[],
                              const unsigned char *buf);

Flexible APIs:
ncmpi_put/get_vars(int ncid,
                   const MPI_Offset start[],
                   const MPI_Offset count[],
                   const MPI_Offset stride[],
                   void *buf,
                   int bufcount,
                   MPI_Datatype datatype);
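A minimal sketch of the same strided write issued through either flavor; it follows the PnetCDF calling convention with an explicit varid argument, so treat the exact parameter lists as an assumption:

#include <pnetcdf.h>

/* Illustration only: write a 64 x 256 x 256 block two ways. ncid and varid
 * are assumed to come from an already-defined dataset; buf holds the data. */
void write_block(int ncid, int varid, const float *buf)
{
    MPI_Offset start[3]  = {0, 0, 0};
    MPI_Offset count[3]  = {64, 256, 256};
    MPI_Offset stride[3] = {1, 1, 1};

    /* High-level API: the element type is encoded in the function name */
    ncmpi_put_vars_float_all(ncid, varid, start, count, stride, buf);

    /* Flexible API: the in-memory buffer is described with an MPI
     * derived datatype (here simply MPI_FLOAT), exposing more of
     * MPI-IO's capability to the programmer.                        */
    ncmpi_put_vars_all(ncid, varid, start, count, stride,
                       buf, 64L * 256 * 256, MPI_FLOAT);
}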
LBNL Benchmark
Test suite
–Developed by Chris Ding et al. at LBNL
–Written in Fortran
–Simple block partition patterns: access to a 3D array stored in a single netCDF file
Running on the IBM SP2 at NERSC, LBNL
–Each compute node is an SMP with 16 processors
–I/O is performed using all processors
[Diagram: the block-partition patterns across 8 processors - X, Y, Z, XY, XZ, YZ, and XYZ partitions of the 3D array]
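A minimal sketch, in C rather than the Fortran of the actual test suite, of how each process can derive its start/count for a partition along one dimension of the 256^3 array and read its block collectively:

#include <pnetcdf.h>

/* Slice the slowest-varying dimension of a 256 x 256 x 256 array among
 * nprocs processes (assumes 256 is divisible by nprocs, as in the
 * 8-processor runs) and read this process's block collectively.       */
void read_my_slice(int ncid, int varid, int rank, int nprocs, float *buf)
{
    MPI_Offset start[3], count[3];

    count[0] = 256 / nprocs;        /* my share of the sliced dimension */
    count[1] = 256;
    count[2] = 256;
    start[0] = rank * count[0];
    start[1] = 0;
    start[2] = 0;

    ncmpi_get_vara_float_all(ncid, varid, start, count, buf);
}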
LBNL Results – 64 MB
Array size: 256 x 256 x 256, real*4
Read
–In some cases, performance improves over a single processor
–8-processor parallel reads are 2-3 times faster than serial netCDF
Write
–Performance is not better than serial netCDF; it is 7-8 times slower
Our Results – 64 MB
Array size: 256 x 256 x 256, real*4
Run on the IBM SP2 at SDSC
I/O is performed using one processor per node
LBNL Results – 1 GB
Array size: 512 x 512 x 512, real*8
Read
–No better performance is observed
Write
–4-8 processor writes result in 2-3 times higher bandwidth than a single processor
Our Results – 1 GB
Array size: 512 x 512 x 512, real*8
Run on the IBM SP2 at SDSC
I/O is performed using one processor per node
Summary
Complete the parallel C APIs
Identify friendly users
–ORNL, LBNL
User reference manual
Preliminary performance results
–Using the LBNL test suite: typical access patterns
–Obtained scalable results