
Intro to Parallel HDF5
ICALEPCS 2015, 10/17/15

Outline
- Overview of Parallel HDF5 design
- Parallel environment requirements
- Performance analysis
- Parallel tools
- PHDF5 programming model

Overview of Parallel HDF5 Design

PHDF5 Requirements
- Support Message Passing Interface (MPI) programming
- PHDF5 files compatible with serial HDF5 files; shareable between different serial and parallel platforms
- Single file image to all processes
  - A one-file-per-process design is undesirable: expensive post-processing, and the files cannot be used by a different number of processes
- Standard parallel I/O interface; must be portable to different platforms

PHDF5 Implementation Layers
- Application
- I/O library (HDF5)
- Parallel I/O library (MPI-IO)
- Parallel file system (GPFS)
- Underneath: a parallel computing system (e.g., a Linux cluster) with compute nodes, a switch network, I/O servers, and the disk architecture and layout of data on disk
- PHDF5 is built on top of the standard MPI-IO API

MPI-IO vs. HDF5
- MPI-IO is an input/output API. It treats the data file as a "linear byte stream", and each MPI application needs to provide its own file view and data representations to interpret those bytes.
- All data stored are machine dependent except for the "external32" representation:
  - External32 is defined as big-endian, so little-endian machines have to convert the data on both read and write operations.
  - 64-bit data types may lose information.

MPI-IO vs. HDF5 (cont.)
- HDF5 is data management software. It stores data and metadata according to the HDF5 file format definition.
- An HDF5 file is self-describing.
- Each machine can store data in its own native representation for efficient I/O without loss of data precision.
- Any necessary data representation conversion is done automatically by the HDF5 library.
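
For illustration (not from the original slides): a dataset can be given a fixed big-endian on-disk type while the application writes from buffers described with the native memory type; the library performs any byte-order conversion. A minimal serial sketch; the file and dataset names are illustrative, and the 1.6-style five-argument H5Dcreate used throughout these slides is assumed:

    /* Use the HDF5 1.6-style API names (e.g., 5-argument H5Dcreate), matching
     * the style used throughout these slides. */
    #define H5_USE_16_API 1
    #include "hdf5.h"

    int main(void)
    {
        int     data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        hsize_t dims[1] = {8};

        hid_t file  = H5Fcreate("endian.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);

        /* On-disk type is big-endian 32-bit integer, regardless of the host CPU. */
        hid_t dset = H5Dcreate(file, "IntArray", H5T_STD_I32BE, space, H5P_DEFAULT);

        /* The memory buffer is described with the native int type; HDF5 converts
         * between the native and big-endian representations as needed. */
        H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }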

Performance Analysis
- Some common causes of poor performance, and possible solutions:
  - Use larger I/O sizes
  - Use specific I/O system hints (see the MPI_Info sketch below)
  - Choose between independent and collective access
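
I/O system hints reach the MPI-IO layer through an MPI_Info object attached to the file access property list. A hedged fragment (assumes MPI is initialized and mpi.h/hdf5.h are included); the hint keys shown are ROMIO collective-buffering hints and are only examples, since recognized keys and useful values depend on the MPI library and file system:

    MPI_Info info;
    MPI_Info_create(&info);

    /* Example ROMIO hints: enable collective buffering for writes and set its
     * buffer size. Adjust or drop these for your MPI-IO implementation. */
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
    hid_t file_id = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    /* ... I/O ... */
    H5Fclose(file_id);
    H5Pclose(fapl);
    MPI_Info_free(&info);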

Independent vs. Collective Access
- A user reported that independent data transfer mode was much slower than collective data transfer mode.
- The data array was tall and thin: 230,000 rows by 6 columns.

Collective vs. Independent Calls
- MPI definition of collective calls: all processes of the communicator must participate, in the right order. E.g.:

      Process 1                Process 2
      call A(); call B();      call A(); call B();     **right**
      call A(); call B();      call B(); call A();     **wrong**

- Independent means not collective.
- Collective is not necessarily synchronous.
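
As a concrete illustration (not from the original slides), with two MPI collectives standing in for A() and B(): every rank must issue them in the same order, otherwise the collectives are mismatched across the communicator and the program can deadlock. A minimal sketch:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value = 1, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Right: every rank calls the collectives in the same order. */
        MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);  /* "A" */
        MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);                    /* "B" */

        /* Wrong (do not do this): reversing the order on some ranks, e.g.
         *   if (rank == 0) { B; A; } else { A; B; }
         * mismatches the collectives across the communicator and typically hangs. */

        MPI_Finalize();
        return 0;
    }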

Debug Slow Parallel I/O Speed (1)
- Writing to one dataset; using 4 processes == 4 columns; the data type is 8-byte doubles.
- 4 processes, 1000 rows == 4 x 1000 x 8 = 32,000 bytes.

      % mpirun -np 4 ./a.out i t 1000
      Execution time: ... s.
      % mpirun -np 4 ./a.out i t 2000
      Execution time: ... s.
      # Difference of 2 seconds for 1000 more rows = 32,000 bytes.
      # A speed of 16 KB/sec!!! Way too slow.

Debug Slow Parallel I/O Speed (2)
- Build a version of PHDF5 with:

      ./configure --enable-debug --enable-parallel …

- This allows tracing of the MPI-IO calls inside the HDF5 library, e.g., to trace MPI_File_read_xx and MPI_File_write_xx calls:

      % setenv H5FD_mpio_Debug "rw"

Debug Slow Parallel I/O Speed (3)

    % setenv H5FD_mpio_Debug 'rw'
    % mpirun -np 4 ./a.out i t 1000    # Indep.; contiguous.
    in H5FD_mpio_write mpi_off=0    size_i=96
    in H5FD_mpio_write mpi_off=2056 size_i=8
    in H5FD_mpio_write mpi_off=2048 size_i=8
    in H5FD_mpio_write mpi_off=2072 size_i=8
    in H5FD_mpio_write mpi_off=2064 size_i=8
    in H5FD_mpio_write mpi_off=2088 size_i=8
    in H5FD_mpio_write mpi_off=2080 size_i=8
    ...
    # A total of 4000 of these little 8-byte writes == 32,000 bytes.

Independent Calls Are Many and Small
- Each process writes one element of one row, skips to the next row, writes one element, and so on.
- Each process issues 230,000 writes of 8 bytes each.
- Not good: just like many independent cars driving to work, it wastes gas and time and creates a total traffic jam.

Debug Slow Parallel I/O Speed (4)

    % setenv H5FD_mpio_Debug 'rw'
    % mpirun -np 4 ./a.out i h 1000    # Indep., chunked by column.
    in H5FD_mpio_write mpi_off=0     size_i=96
    in H5FD_mpio_write mpi_off=3688  size_i=8000
    in H5FD_mpio_write mpi_off=11688 size_i=8000
    in H5FD_mpio_write mpi_off=27688 size_i=8000
    in H5FD_mpio_write mpi_off=19688 size_i=8000
    in H5FD_mpio_write mpi_off=96    size_i=40
    in H5FD_mpio_write mpi_off=136   size_i=544
    in H5FD_mpio_write mpi_off=680   size_i=120
    in H5FD_mpio_write mpi_off=800   size_i=272
    ...
    Execution time: ... s.

Use Collective Mode or Chunked Storage
- Collective mode combines many small independent calls into a few larger calls.
- Chunking by column speeds things up too (see the sketch below).
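
A hedged sketch of how column-wise chunking might be set up when the dataset is created, so that a process writing one column issues a few large writes instead of 230,000 eight-byte ones. The variable file_id is assumed to be an open file, the dataset name is illustrative, and the 1.6-style H5Dcreate signature from the later slides is used:

    hsize_t dims[2]  = {230000, 6};   /* 230,000 rows by 6 columns */
    hsize_t chunk[2] = {230000, 1};   /* one chunk per column      */

    hid_t filespace = H5Screate_simple(2, dims, NULL);

    /* The dataset creation property list carries the chunk layout. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    hid_t dset_id = H5Dcreate(file_id, "tall_thin", H5T_NATIVE_DOUBLE,
                              filespace, dcpl);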

Independent vs. Collective Write
- 6 processes, IBM p-690, AIX, GPFS
- The table compares # of rows, data size (MB), independent time (sec.), and collective time (sec.).

Independent vs. Collective Write (cont.)

Parallel Tools
- h5perf: performance measurement tool showing I/O performance for different I/O APIs

h5perf
- An I/O performance measurement tool
- Tests 3 file I/O APIs:
  - POSIX I/O (open/write/read/close…)
  - MPI-IO (MPI_File_{open,write,read,close})
  - PHDF5 with H5Pset_fapl_mpio (using MPI-IO)
- Gives an indication of I/O speed upper limits

h5perf: Some Features
- -c: check/verify data correctness
- 2-D chunk patterns added in v1.8
- -h shows the help page

Useful Parallel HDF Links
- Parallel HDF information site
- Parallel HDF5 tutorial
- Parallel FAQ (…quest.html#PARALLEL)
- HDF Help address

Questions?

How to Compile PHDF5 Applications
- h5pcc – HDF5 C compiler command (similar to mpicc)
- h5pfc – HDF5 F90 compiler command (similar to mpif90)
- To compile:

      % h5pcc h5prog.c
      % h5pfc h5prog.f90

h5pcc/h5pfc -show Option
- -show displays the compiler commands and options without executing them, i.e., a dry run:

      % h5pcc -show Sample_mpio.c
      mpicc -I/home/packages/phdf5/include \
            -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE \
            -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE \
            -D_BSD_SOURCE -std=c99 -c Sample_mpio.c
      mpicc -std=c99 Sample_mpio.o \
            -L/home/packages/phdf5/lib \
            /home/packages/phdf5/lib/libhdf5_hl.a \
            /home/packages/phdf5/lib/libhdf5.a -lz -lm -Wl,-rpath \
            -Wl,/home/packages/phdf5/lib

Programming Restrictions
- Most PHDF5 APIs are collective.
- PHDF5 opens a parallel file with a communicator:
  - Returns a file handle
  - Future access to the file is via the file handle
- All processes must participate in collective PHDF5 APIs.
- Different files can be opened via different communicators (see the sketch below).
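
A hedged sketch of opening different files via different communicators (not from the original slides): the communicator stored in the file access property list determines which processes must take part in collective calls on that file. The even/odd split and file names are purely illustrative:

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split MPI_COMM_WORLD into two sub-communicators: even and odd ranks. */
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);

    /* Each sub-communicator collectively creates its own file; only the
     * processes in that sub-communicator participate in collective calls
     * on that file. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, subcomm, MPI_INFO_NULL);
    hid_t file_id = H5Fcreate(rank % 2 ? "odd.h5" : "even.h5",
                              H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    /* ... collective operations among the sub-communicator's processes ... */
    H5Fclose(file_id);
    H5Pclose(fapl);
    MPI_Comm_free(&subcomm);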

Examples of PHDF5 API
- Examples of PHDF5 collective APIs:
  - File operations: H5Fcreate, H5Fopen, H5Fclose
  - Object creation: H5Dcreate, H5Dclose
  - Object structure: H5Dextend (increase dimension sizes)
- Array data transfer can be collective or independent:
  - Dataset operations: H5Dwrite, H5Dread
- Collectiveness is indicated by function parameters, not by function names as in the MPI API (contrast sketched below).
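
To illustrate the contrast (an illustrative fragment, not from the original slides; fh, buf, count, status, dset_id, memspace, and filespace are assumed to exist): MPI-IO selects independent vs. collective mode by function name, while HDF5 uses the same H5Dwrite call and carries the mode in the transfer property list:

    /* MPI-IO: the function name determines collectiveness. */
    MPI_File_write(fh, buf, count, MPI_DOUBLE, &status);       /* independent */
    MPI_File_write_all(fh, buf, count, MPI_DOUBLE, &status);   /* collective  */

    /* HDF5: same call; the mode is a property of the transfer property list. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* or H5FD_MPIO_INDEPENDENT */
    H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);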

What Does PHDF5 Support?
- After a file is opened by the processes of a communicator:
  - All parts of the file are accessible by all processes
  - All objects in the file are accessible by all processes
  - Multiple processes may write to the same data array
  - Each process may write to an individual data array

PHDF5 API Languages and Platforms
- C and F90 language interfaces
- Platforms supported: most platforms with MPI-IO support, e.g., IBM AIX, Linux clusters, Crays
- For performance, a parallel file system is needed, e.g., Lustre or GPFS

Programming Model for Creating and Accessing a File
- HDF5 uses an access template object (property list) to control the file access mechanism.
- General model to access an HDF5 file in parallel:
  - Set up the MPI-IO driver (via the access property list)
  - Open the file
  - Access data
  - Close the file

Setup MPI-IO Access Template
- Each process of the MPI communicator creates an access template and sets it up with MPI parallel access information.

  C:

      herr_t H5Pset_fapl_mpio(hid_t plist_id, MPI_Comm comm, MPI_Info info);

  F90:

      h5pset_fapl_mpio_f(plist_id, comm, info)
      integer(hid_t) :: plist_id
      integer        :: comm, info

- plist_id is a file access property list identifier.

C Example: Parallel File Create

    comm = MPI_COMM_WORLD;
    info = MPI_INFO_NULL;

    /*
     * Initialize MPI
     */
    MPI_Init(&argc, &argv);

    /*
     * Set up file access property list for MPI-IO access
     */
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);

    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);

    /*
     * Close the file.
     */
    H5Fclose(file_id);

    MPI_Finalize();

Creating and Opening a Dataset
- All processes of the communicator open/close a dataset with a collective call:
  - C: H5Dcreate or H5Dopen; H5Dclose
  - F90: h5dcreate_f or h5dopen_f; h5dclose_f
- All processes of the communicator must extend an unlimited-dimension dataset before writing to it (see the sketch below):
  - C: H5Dextend
  - F90: h5dextend_f
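
A minimal sketch of the extend-before-write pattern for a dataset with one unlimited dimension (not from the original slides; file_id and NY are assumed, names and sizes are illustrative, and the 1.6-style H5Dcreate/H5Dextend calls from the rest of the deck are used). Extending is collective: every process must call H5Dextend before any process writes into the new region:

    hsize_t dims[2]    = {0, NY};               /* start with zero rows         */
    hsize_t maxdims[2] = {H5S_UNLIMITED, NY};   /* the row dimension can grow   */
    hsize_t chunk[2]   = {100, NY};             /* chunking is required to grow */

    hid_t space = H5Screate_simple(2, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    /* Collective: every process creates the dataset. */
    hid_t dset_id = H5Dcreate(file_id, "growing", H5T_NATIVE_INT, space, dcpl);

    /* Collective: every process extends the dataset to the new size. */
    hsize_t new_dims[2] = {1000, NY};
    H5Dextend(dset_id, new_dims);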

C Example: Create Dataset

    file_id = H5Fcreate(…);

    /*
     * Create the dataspace for the dataset.
     */
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL);

    /*
     * Create the dataset with default properties (collective).
     */
    dset_id = H5Dcreate(file_id, "dataset1", H5T_NATIVE_INT,
                        filespace, H5P_DEFAULT);

    H5Dclose(dset_id);

    /*
     * Close the file.
     */
    H5Fclose(file_id);

Accessing a Dataset
- All processes that have opened the dataset may do collective I/O.
- Each process may also make an arbitrary number of independent data I/O calls.
  - C: H5Dwrite and H5Dread
  - F90: h5dwrite_f and h5dread_f

Programming Model for Dataset Access
- Create and set the dataset transfer property:
  - C: H5Pset_dxpl_mpio with H5FD_MPIO_COLLECTIVE or H5FD_MPIO_INDEPENDENT (default)
  - F90: h5pset_dxpl_mpio_f with H5FD_MPIO_COLLECTIVE_F or H5FD_MPIO_INDEPENDENT_F (default)
- Access the dataset with the defined transfer property.

C Example: Collective Write

    /*
     * Create property list for collective dataset write.
     */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);

    status = H5Dwrite(dset_id, H5T_NATIVE_INT,
                      memspace, filespace, plist_id, data);
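
For comparison (a short sketch using the same variables, not from the original slides), an independent write is the default mode and needs no special transfer property list:

    /* Independent write: pass H5P_DEFAULT as the transfer property list ... */
    status = H5Dwrite(dset_id, H5T_NATIVE_INT,
                      memspace, filespace, H5P_DEFAULT, data);

    /* ... or set the mode explicitly. */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);
    status = H5Dwrite(dset_id, H5T_NATIVE_INT,
                      memspace, filespace, plist_id, data);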

Writing and Reading Hyperslabs
- Distributed memory model: data is split among processes.
- PHDF5 uses the HDF5 hyperslab model:
  - Each process defines memory and file hyperslabs
  - Each process executes a partial write/read call, either collective or independent

Example: Writing a Dataset by Rows
- Each process (P0, P1, P2, P3) writes its own contiguous block of rows of the file.
- This pattern is similar to JPSS aggregation.

Writing by Rows: Output of h5dump

    HDF5 "SDS_row.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
          DATA {
             10, 10, 10, 10, 10,
             10, 10, 10, 10, 10,
             11, 11, 11, 11, 11,
             11, 11, 11, 11, 11,
             12, 12, 12, 12, 12,
             12, 12, 12, 12, 12,
             13, 13, 13, 13, 13,
             13, 13, 13, 13, 13
          }
       }
    }
    }

Example: Writing a Dataset by Rows (count and offset for process P1)

    count[0]  = dimsf[0] / mpi_size;
    count[1]  = dimsf[1];
    offset[0] = mpi_rank * count[0];   /* = 2 for P1 */
    offset[1] = 0;

(The original slide shows a diagram of the memory and file hyperslabs selected by count[] and offset[].)

Example: Writing a Dataset by Rows (code)

    /*
     * Each process defines a dataset in memory and writes it to the hyperslab
     * in the file.
     */
    count[0]  = dimsf[0] / mpi_size;
    count[1]  = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace  = H5Screate_simple(RANK, count, NULL);

    /*
     * Select hyperslab in the file.
     */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    H5Dwrite(dset_id, …, memspace, filespace, buf);
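
The read side follows the same pattern. A hedged sketch (not from the original slides) of each process collectively reading back its own rows, reusing the selections defined above and a collective transfer property list as on the earlier collective-write slide:

    /* Same memory and file hyperslab selections as for the write. */
    memspace  = H5Screate_simple(RANK, count, NULL);
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /* Collective read: each process receives only its selected rows in buf. */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
    H5Dread(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, buf);

    H5Pclose(plist_id);
    H5Sclose(filespace);
    H5Sclose(memspace);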

Questions?

Thank You! Questions?