April 28, 2008, LCI Tutorial: Parallel HDF5 Tutorial (Tutorial Part IV)



Parallel HDF5 Introductory Tutorial

Outline
- Overview of Parallel HDF5 design
- Setting up the parallel environment
- Programming model for:
  - Creating and accessing a file
  - Creating and accessing a dataset
  - Writing and reading hyperslabs
- Parallel tutorial available at

Overview of Parallel HDF5 Design

PHDF5 Requirements
- Support MPI programming
- PHDF5 files compatible with serial HDF5 files
  - Shareable between different serial or parallel platforms
- Single file image to all processes
  - A one-file-per-process design is undesirable:
    - Expensive post-processing
    - Not usable by a different number of processes
- Standard parallel I/O interface
  - Must be portable to different platforms

PHDF5 Implementation Layers
[Figure: layered stack, top to bottom: application; parallel computing system (Linux cluster) with compute nodes; I/O library (HDF5); parallel I/O library (MPI-IO); parallel file system (GPFS); switch network and I/O servers; disk architecture and layout of data on disk]
- PHDF5 is built on top of the standard MPI-IO API

Parallel Environment Requirements
- MPI with MPI-IO, e.g.:
  - MPICH2 ROMIO
  - Vendor's MPI-IO
- Parallel file system, e.g.:
  - GPFS
  - Lustre
  - Specially configured NFS

How to Compile PHDF5 Applications
- h5pcc: HDF5 C compiler command (similar to mpicc)
- h5pfc: HDF5 F90 compiler command (similar to mpif90)
- To compile:
    % h5pcc h5prog.c
    % h5pfc h5prog.f90

h5pcc/h5pfc -show option
-show displays the compiler commands and options without executing them (i.e., a dry run):
    % h5pcc -show Sample_mpio.c
    mpicc -I/home/packages/phdf5/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -c Sample_mpio.c
    mpicc -std=c99 Sample_mpio.o -L/home/packages/phdf5/lib /home/packages/phdf5/lib/libhdf5_hl.a /home/packages/phdf5/lib/libhdf5.a -lz -lm -Wl,-rpath -Wl,/home/packages/phdf5/lib

Collective vs. Independent Calls
- MPI definition of a collective call: all processes of the communicator must participate, in the right order
- Independent means not collective
- Collective is not necessarily synchronous

Programming Restrictions
- Most PHDF5 APIs are collective
- PHDF5 opens a parallel file with a communicator
  - Returns a file handle
  - Future access to the file is via the file handle
  - All processes must participate in collective PHDF5 APIs
  - Different files can be opened via different communicators

Examples of PHDF5 API
- Examples of PHDF5 collective API:
  - File operations: H5Fcreate, H5Fopen, H5Fclose
  - Object creation: H5Dcreate, H5Dopen, H5Dclose
  - Object structure: H5Dextend (increase dimension sizes; superseded by H5Dset_extent in HDF5 1.8)
- Array data transfer can be collective or independent
  - Dataset operations: H5Dwrite, H5Dread

What Does PHDF5 Support?
- After a file is opened by the processes of a communicator:
  - All parts of the file are accessible by all processes
  - All objects in the file are accessible by all processes
  - Multiple processes can write to the same data array
  - Each process can write to an individual data array

PHDF5 API Languages
- C and F90 language interfaces
- Platforms supported: most platforms with MPI-IO support, e.g. IBM SP, Linux clusters, HP Alpha clusters, SGI IRIX64/Altix, Red Storm (Cray XT3), …

Creating and Accessing a File: Programming Model
- HDF5 uses an access template object (property list) to control the file access mechanism
- General model to access an HDF5 file in parallel:
  1. Set up the MPI-IO access template (file access property list)
  2. Open the file
  3. Access the data
  4. Close the file

Setting Up the Access Template
Each process of the MPI communicator creates an access template and sets it up with MPI parallel access information.
C:
    herr_t H5Pset_fapl_mpio(hid_t plist_id, MPI_Comm comm, MPI_Info info);
F90:
    h5pset_fapl_mpio_f(plist_id, comm, info, hdferr)
    integer(hid_t) :: plist_id
    integer :: comm, info, hdferr
plist_id is a file access property list identifier.

C Example: Parallel File Create
    comm = MPI_COMM_WORLD;
    info = MPI_INFO_NULL;

    /* Initialize MPI */
    MPI_Init(&argc, &argv);

    /* Set up file access property list for MPI-IO access */
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);

    /* Create the file collectively, then release the property list */
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);

    /* Close the file. */
    H5Fclose(file_id);

    MPI_Finalize();

F90 Example: Parallel File Create
    comm = MPI_COMM_WORLD
    info = MPI_INFO_NULL
    CALL MPI_INIT(mpierror)
    !
    ! Initialize FORTRAN predefined datatypes
    CALL h5open_f(error)
    !
    ! Set up file access property list for MPI-IO access.
    CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
    CALL h5pset_fapl_mpio_f(plist_id, comm, info, error)
    !
    ! Create the file collectively, then release the property list.
    CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
    CALL h5pclose_f(plist_id, error)
    !
    ! Close the file.
    CALL h5fclose_f(file_id, error)
    !
    ! Close FORTRAN interface
    CALL h5close_f(error)
    CALL MPI_FINALIZE(mpierror)

Creating and Opening a Dataset
- All processes of the communicator open/close a dataset with a collective call
  - C: H5Dcreate or H5Dopen; H5Dclose
  - F90: h5dcreate_f or h5dopen_f; h5dclose_f
- All processes of the communicator must extend an unlimited-dimension dataset before writing to it
  - C: H5Dextend
  - F90: h5dextend_f

C Example: Create Dataset
    file_id = H5Fcreate(…);

    /* Create the dataspace for the dataset. */
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL);

    /* Create the dataset with default properties (collective call).
     * Five-argument HDF5 1.6 form; in 1.8 and later this is H5Dcreate1,
     * and H5Dcreate2 takes extra link/access property list arguments. */
    dset_id = H5Dcreate(file_id, "dataset1", H5T_NATIVE_INT, filespace, H5P_DEFAULT);

    H5Dclose(dset_id);

    /* Close the file. */
    H5Fclose(file_id);

F90 Example: Create Dataset
    CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
    CALL h5screate_simple_f(rank, dimsf, filespace, error)
    !
    ! Create the dataset with default properties.
    !
    CALL h5dcreate_f(file_id, "dataset1", H5T_NATIVE_INTEGER, filespace, dset_id, error)
    !
    ! Close the dataset.
    CALL h5dclose_f(dset_id, error)
    !
    ! Close the file.
    CALL h5fclose_f(file_id, error)

Accessing a Dataset
- All processes that have opened a dataset may do collective I/O
- Each process may do an independent and arbitrary number of data I/O access calls
  - C: H5Dwrite and H5Dread
  - F90: h5dwrite_f and h5dread_f

Programming Model for Dataset Access
- Create and set the dataset transfer property:
  - C: H5Pset_dxpl_mpio
    - H5FD_MPIO_COLLECTIVE
    - H5FD_MPIO_INDEPENDENT (default)
  - F90: h5pset_dxpl_mpio_f
    - H5FD_MPIO_COLLECTIVE_F
    - H5FD_MPIO_INDEPENDENT_F (default)
- Access the dataset with the defined transfer property

C Example: Collective Write
    /* Create property list for collective dataset write. */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);

    /* Each process writes its memory hyperslab to its file hyperslab. */
    status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);

F90 Example: Collective Write
    ! Create property list for collective dataset write
    !
    CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
    CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
    !
    ! Write the dataset collectively.
    !
    CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, error, &
                    file_space_id = filespace, &
                    mem_space_id = memspace, &
                    xfer_prp = plist_id)

Writing and Reading Hyperslabs
- Distributed memory model: data is split among processes
- PHDF5 uses the HDF5 hyperslab model
- Each process defines memory and file hyperslabs
- Each process executes a partial write/read call, using either:
  - Collective calls
  - Independent calls
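The hyperslab model can be tried out without MPI or HDF5. The sketch below is a serial stand-in built on a few assumptions: the helper names `in_slab_1d` and `scatter_slab_2d` are hypothetical; `offset`, `stride`, `count`, and `block` carry the same meaning as the arrays passed to `H5Sselect_hyperslab` (pass 1 wherever the examples pass NULL); and the selected file elements are filled from a contiguous memory buffer in row-major order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if coordinate x lies in a 1-D hyperslab (offset, stride, count,
 * block), mirroring the meaning of H5Sselect_hyperslab's arguments. */
static bool in_slab_1d(size_t x, size_t offset, size_t stride,
                       size_t count, size_t block)
{
    assert(stride >= 1);              /* NULL stride in HDF5 means 1 */
    if (count == 0 || x < offset)
        return false;
    size_t d = x - offset;
    size_t k = d / stride;            /* last block starting at or before x */
    if (k > count - 1)
        k = count - 1;                /* only `count` blocks exist */
    return d - k * stride < block;
}

/* Copy consecutive elements of `mem` into the selected elements of a
 * rows x cols row-major `file` array, visiting the selection in row-major
 * order. Returns the number of elements copied. */
static size_t scatter_slab_2d(int *file, size_t rows, size_t cols,
                              const size_t offset[2], const size_t stride[2],
                              const size_t count[2], const size_t block[2],
                              const int *mem)
{
    size_t n = 0;
    for (size_t r = 0; r < rows; r++) {
        if (!in_slab_1d(r, offset[0], stride[0], count[0], block[0]))
            continue;
        for (size_t c = 0; c < cols; c++)
            if (in_slab_1d(c, offset[1], stride[1], count[1], block[1]))
                file[r * cols + c] = mem[n++];
    }
    return n;
}
```

For instance, process 1 in the row example would use offset {2, 0}, stride {1, 1}, count {2, 5}, block {1, 1} on an 8 x 5 file array and write 10 elements.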

Example 1: Writing dataset by rows
[Figure: processes P0, P1, P2, and P3 each write a band of consecutive rows to the file]

Writing by rows: Output of h5dump
    HDF5 "SDS_row.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
          DATA {
             10, 10, 10, 10, 10,
             10, 10, 10, 10, 10,
             11, 11, 11, 11, 11,
             11, 11, 11, 11, 11,
             12, 12, 12, 12, 12,
             12, 12, 12, 12, 12,
             13, 13, 13, 13, 13,
             13, 13, 13, 13, 13
          }
       }
    }
    }

Example 1: Writing dataset by rows (Process 1)
    count[0] = dimsf[0] / mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];   /* = 2 */
    offset[1] = 0;
[Figure: Process 1's count[0] x count[1] memory buffer is written to the file starting at (offset[0], offset[1])]

Example 1: Writing dataset by rows
    /*
     * Each process defines a dataset in memory and writes it
     * to the hyperslab in the file.
     */
    count[0] = dimsf[0] / mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace = H5Screate_simple(RANK, count, NULL);

    /* Select hyperslab in the file. */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
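To see how the row decomposition produces the h5dump output, the sketch below replays the four processes serially on an 8 x 5 array. Assumptions: the helper names are hypothetical, and each process's buffer holds mpi_rank + 10, which is what the dumped values 10 to 13 suggest. Like the slide's code, it requires the row count to divide evenly by the number of processes.

```c
#include <assert.h>
#include <stddef.h>

#define NX 8   /* dataset rows, as in the h5dump output */
#define NY 5   /* dataset columns */

/* Per-rank hyperslab for the row decomposition on the slide. */
static void row_slab(int mpi_rank, int mpi_size, size_t count[2], size_t offset[2])
{
    assert(NX % mpi_size == 0);         /* the slide's code assumes an even split */
    count[0]  = NX / mpi_size;
    count[1]  = NY;
    offset[0] = (size_t)mpi_rank * count[0];
    offset[1] = 0;
}

/* Serial stand-in for the parallel write: each rank's rows get rank + 10. */
static void simulate_write_by_rows(int file[NX][NY], int mpi_size)
{
    for (int rank = 0; rank < mpi_size; rank++) {
        size_t count[2], offset[2];
        row_slab(rank, mpi_size, count, offset);
        for (size_t i = 0; i < count[0]; i++)
            for (size_t j = 0; j < count[1]; j++)
                file[offset[0] + i][offset[1] + j] = rank + 10;
    }
}
```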

Example 2: Writing dataset by columns
[Figure: processes P0 and P1 write alternating columns of the file]

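Writing by columns: Output of h5dump
    HDF5 "SDS_col.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 8, 6 ) / ( 8, 6 ) }
          DATA {
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200,
             1, 2, 10, 20, 100, 200
          }
       }
    }
    }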

Example 2: Writing dataset by column
[Figure: Process 0 and Process 1 each hold a dimsm[0] x dimsm[1] buffer in memory; in the file, each writes columns of height block[0] and width block[1], starting at its own offset[1] and separated by stride[1]]

Example 2: Writing dataset by column
    /*
     * Each process defines a hyperslab in the file.
     */
    count[0] = 1;
    count[1] = dimsm[1];
    offset[0] = 0;
    offset[1] = mpi_rank;
    stride[0] = 1;
    stride[1] = 2;
    block[0] = dimsf[0];
    block[1] = 1;

    /*
     * Each process selects a hyperslab.
     */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride, count, block);
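A serial replay of the column example makes the effect of stride[1] = 2 concrete. This is a sketch under assumptions: two processes, an 8 x 6 file, dimsm = {8, 3}, and memory column j of process p holding (p + 1) * 10^j, which is what the dumped rows 1, 2, 10, 20, 100, 200 imply; the function name is hypothetical.

```c
#include <assert.h>

#define NX 8                /* file rows (block[0] = dimsf[0]) */
#define NY 6                /* file columns */
#define NPROCS 2
#define MY (NY / NPROCS)    /* columns per process in memory (dimsm[1]) */

/* Process p writes its NX x MY buffer to file columns p, p + 2, p + 4:
 * offset[1] = p, stride[1] = NPROCS, count[1] = MY, block = {NX, 1}. */
static void simulate_write_by_columns(int file[NX][NY])
{
    assert(NY % NPROCS == 0);
    for (int p = 0; p < NPROCS; p++) {
        int mem[NX][MY];
        for (int i = 0; i < NX; i++) {
            int v = p + 1;
            for (int j = 0; j < MY; j++, v *= 10)
                mem[i][j] = v;          /* (p + 1) * 10^j */
        }
        /* memory column j lands in file column offset[1] + j * stride[1] */
        for (int i = 0; i < NX; i++)
            for (int j = 0; j < MY; j++)
                file[i][p + j * NPROCS] = mem[i][j];
    }
}
```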

Example 3: Writing dataset by pattern
[Figure: Processes 0, 1, 2, and 3 each write an interleaved pattern of elements from memory to the file]

Writing by Pattern: Output of h5dump
    HDF5 "SDS_pat.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 8, 4 ) / ( 8, 4 ) }
          DATA {
             1, 3, 1, 3,
             2, 4, 2, 4,
             1, 3, 1, 3,
             2, 4, 2, 4,
             1, 3, 1, 3,
             2, 4, 2, 4,
             1, 3, 1, 3,
             2, 4, 2, 4
          }
       }
    }
    }

Example 3: Writing dataset by pattern (Process 2)
    offset[0] = 0;
    offset[1] = 1;
    count[0] = 4;
    count[1] = 2;
    stride[0] = 2;
    stride[1] = 2;
[Figure: Process 2's memory buffer is written to every other row and every other column of the file, starting at column offset[1]; stride[0], stride[1], and count[1] are marked]

Example 3: Writing by pattern
    /* Each process defines a dataset in memory and
     * writes it to the hyperslab in the file.
     */
    count[0] = 4;
    count[1] = 2;
    stride[0] = 2;
    stride[1] = 2;
    if (mpi_rank == 0) {
        offset[0] = 0;
        offset[1] = 0;
    }
    if (mpi_rank == 1) {
        offset[0] = 1;
        offset[1] = 0;
    }
    if (mpi_rank == 2) {
        offset[0] = 0;
        offset[1] = 1;
    }
    if (mpi_rank == 3) {
        offset[0] = 1;
        offset[1] = 1;
    }
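The if-chain above is equivalent to computing offset[0] = mpi_rank % 2 and offset[1] = mpi_rank / 2 for ranks 0 to 3. The sketch below uses that shortcut and replays the pattern write serially; it assumes, based on the h5dump output, that each rank writes the value mpi_rank + 1, and the function names are hypothetical.

```c
#include <assert.h>

#define NX 8
#define NY 4

/* Same offsets as the slide's if-chain, for ranks 0-3. */
static void pattern_offset(int rank, int offset[2])
{
    assert(rank >= 0 && rank < 4);
    offset[0] = rank % 2;
    offset[1] = rank / 2;
}

/* Each rank writes rank + 1 into a count = {4, 2}, stride = {2, 2}
 * hyperslab (block = 1) starting at its offset. */
static void simulate_write_by_pattern(int file[NX][NY])
{
    const int count[2] = {4, 2}, stride[2] = {2, 2};
    for (int rank = 0; rank < 4; rank++) {
        int offset[2];
        pattern_offset(rank, offset);
        for (int i = 0; i < count[0]; i++)
            for (int j = 0; j < count[1]; j++)
                file[offset[0] + i * stride[0]][offset[1] + j * stride[1]] = rank + 1;
    }
}
```

Deriving the offsets arithmetically keeps the code correct if more ranks follow the same 2 x 2 interleaving, at the cost of being less literal than the slide.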

Example 4: Writing dataset by chunks
[Figure: processes P0, P1, P2, and P3 each write one chunk of the file]

Writing by Chunks: Output of h5dump utility
    HDF5 "SDS_chnk.h5" {
    GROUP "/" {
       DATASET "IntArray" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 8, 4 ) / ( 8, 4 ) }
          DATA {
             1, 1, 2, 2,
             1, 1, 2, 2,
             1, 1, 2, 2,
             1, 1, 2, 2,
             3, 3, 4, 4,
             3, 3, 4, 4,
             3, 3, 4, 4,
             3, 3, 4, 4
          }
       }
    }
    }

Example 4: Writing dataset by chunks (Process 2)
    block[0] = chunk_dims[0];
    block[1] = chunk_dims[1];
    offset[0] = chunk_dims[0];
    offset[1] = 0;
[Figure: Process 2's memory buffer, sized chunk_dims[0] x chunk_dims[1], is written as one block of block[0] x block[1] elements at (offset[0], offset[1]) in the file]

Example 4: Writing by chunks
    count[0] = 1;
    count[1] = 1;
    stride[0] = 1;
    stride[1] = 1;
    block[0] = chunk_dims[0];
    block[1] = chunk_dims[1];
    if (mpi_rank == 0) {
        offset[0] = 0;
        offset[1] = 0;
    }
    if (mpi_rank == 1) {
        offset[0] = 0;
        offset[1] = chunk_dims[1];
    }
    if (mpi_rank == 2) {
        offset[0] = chunk_dims[0];
        offset[1] = 0;
    }
    if (mpi_rank == 3) {
        offset[0] = chunk_dims[0];
        offset[1] = chunk_dims[1];
    }
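The same offsets can be computed as offset[0] = (mpi_rank / 2) * chunk_dims[0] and offset[1] = (mpi_rank % 2) * chunk_dims[1]. The sketch below replays the chunk write serially, assuming chunk_dims = {4, 2} and per-rank values mpi_rank + 1 (both inferred from the 8 x 4 h5dump output, not stated on the slides); the names are hypothetical.

```c
#include <assert.h>

#define NX 8
#define NY 4
#define CX 4   /* chunk_dims[0], inferred from the h5dump output */
#define CY 2   /* chunk_dims[1] */

/* Same offsets as the slide's if-chain, for ranks 0-3. */
static void chunk_offset(int rank, int offset[2])
{
    assert(rank >= 0 && rank < 4);
    offset[0] = (rank / 2) * CX;
    offset[1] = (rank % 2) * CY;
}

/* count = {1, 1}, block = chunk_dims: each rank writes one chunk-shaped block. */
static void simulate_write_by_chunks(int file[NX][NY])
{
    for (int rank = 0; rank < 4; rank++) {
        int offset[2];
        chunk_offset(rank, offset);
        for (int i = 0; i < CX; i++)
            for (int j = 0; j < CY; j++)
                file[offset[0] + i][offset[1] + j] = rank + 1;
    }
}
```

Because each selection matches a whole chunk, every process touches exactly one chunk in the file, which avoids the read-modify-write work described later for partial chunk I/O.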

Parallel HDF5 Intermediate Tutorial

Outline
- Performance
- Parallel tools

My PHDF5 Application I/O "inhales"
If my application's I/O performance is bad, what can I do?
- Use larger I/O data sizes
- Independent vs. collective I/O
- Specific I/O system hints
- Parallel file system limits

Independent vs. Collective Access
- A user reported that the independent data transfer mode was much slower than the collective data transfer mode
- The data array was tall and thin: 230,000 rows by 6 columns

Independent vs. Collective write
[Table: 6 processes, IBM p-690, AIX, GPFS. Columns: # of rows, data size (MB), independent (sec.), collective (sec.)]

Independent vs. Collective write (cont.)
[Chart: independent vs. collective write times]

Effects of I/O Hints: IBM_largeblock_io
- GPFS at LLNL Blue
- 4 nodes, 16 tasks
- Total data size 1024 MB
- I/O buffer size 1 MB

Effects of I/O Hints: IBM_largeblock_io
[Chart: GPFS at LLNL ASCI Blue machine; 4 nodes, 16 tasks; total data size 1024 MB; I/O buffer size 1 MB]

Parallel Tools
- ph5diff: parallel version of the h5diff tool
- h5perf: performance measuring tool showing I/O performance for different I/O APIs

ph5diff
- A parallel version of the h5diff tool
- Supports all features of h5diff
- An MPI parallel tool
  - The manager process (proc 0) coordinates the remaining processes (workers) to "diff" one dataset at a time
  - It collects any output from each worker and prints it out
- Works best if there are many datasets in the files with few differences
- Available in v1.8

h5perf
- An I/O performance measurement tool
- Tests three file I/O APIs:
  - POSIX I/O (open/write/read/close…)
  - MPI-IO (MPI_File_{open,write,read,close})
  - PHDF5
    - H5Pset_fapl_mpio (using MPI-IO)
    - H5Pset_fapl_mpiposix (using POSIX I/O)

h5perf: Some features
- Check (-c): verify data correctness
- Added 2-D chunk patterns in v1.8

Useful Parallel HDF Links
- Parallel HDF information site
- Parallel HDF5 tutorial available at
- HDF Help address

Questions?
End of Part IV

Caching and Buffering in HDF5 (Tutorial Part V)

Software stack and the "magic box"
- Life cycle: what happens to data when it is transferred from the application buffer to the HDF5 file?
[Figure: software stack from the application's data buffer and H5Dwrite, through the object API, library internals, and virtual file I/O, down to the file or other "storage"; the layers between H5Dwrite and the file are shown as a "magic box", with unbuffered I/O delivering the data to the file]

Inside the magic box
- Understanding what happens to data inside the magic box helps in writing efficient applications
- The HDF5 library has mechanisms to control behavior inside the magic box
- Goals of this talk:
  - Describe some basic operations and data structures, and explain how they affect performance and storage sizes
  - Give some "recipes" for improving performance

Topics
- Dataset metadata and array data storage layouts
- Types of dataset storage layouts
- Factors affecting I/O performance
  - I/O with compact datasets
  - I/O with contiguous datasets
  - I/O with chunked datasets
  - Variable-length data and I/O

HDF5 dataset metadata and array data storage layouts

HDF5 Dataset
- Data array
  - Ordered collection of identically typed data items distinguished by their indices
- Metadata
  - Dataspace: rank and dimensions of the dataset array
  - Datatype: information on how to interpret the data
  - Storage properties: how the array is organized on disk
  - Attributes: user-defined metadata (optional)

HDF5 Dataset
[Figure: a dataset as data plus metadata: dataspace (rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), datatype (IEEE 32-bit float), attributes (Time = 32.4, Pressure = 987, Temp = 56), storage info (chunked, compressed)]

Metadata cache and array data
- Dataset array data is typically kept in application memory
- The dataset header is kept in a separate space, the metadata cache
[Figure: application memory holds the dataset array data; the metadata cache holds the dataset header (datatype, dataspace, attributes, …); the file holds the HDF5 metadata and the dataset array data]

Metadata and metadata cache
- HDF5 metadata
  - Information about HDF5 objects used by the library
  - Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, superblock, etc.
  - Usually small compared to raw data sizes (KB vs. MB-GB)
- Metadata cache
  - Space allocated to handle pieces of the HDF5 metadata
  - Allocated by the HDF5 library in the application's memory space
  - Cache behavior affects overall performance
  - The metadata cache implementation prior to HDF could cause performance degradation for some applications

Types of data storage layouts

HDF5 dataset storage layouts
- Contiguous
- Chunked
- Compact

Contiguous storage layout
- The metadata header is separate from the raw data
- Raw data is stored in one contiguous block on disk
[Figure: the dataset header (datatype, dataspace, attributes, …) sits in the metadata cache; the dataset array data goes from application memory to one contiguous block in the file]

Chunked storage
- Chunking: a storage layout where a dataset is partitioned into fixed-size multidimensional tiles, or chunks
- Used for extendible datasets and datasets with filters applied (checksum, compression)
- The HDF5 library treats each chunk as an atomic object
- Greatly affects performance and file sizes

Chunked storage layout
- Raw data is divided into equal-sized blocks (chunks)
- Each chunk is stored separately as a contiguous block on disk
[Figure: chunks A, B, C, D of the array in application memory are stored in the file in a different order (A, D, C, B) and located through a chunk index; the dataset header sits in the metadata cache]

Compact storage layout
- The data array and metadata are stored together in the header
[Figure: the dataset header (datatype, dataspace, attributes, …) and the array data are kept together in the metadata cache and written together to the file*]
* "File" may in fact be a collection of files, memory, or another storage destination.

Factors affecting I/O performance

What goes on inside the magic box?
- Operations on data inside the magic box
  - Copying to/from internal buffers
  - Datatype conversion
  - Scattering/gathering
  - Data transformation (filters, compression)
- Data structures used
  - B-trees (groups, dataset chunks)
  - Hash tables
  - Local and global heaps (variable-length data: link names, strings, etc.)
- Other concepts
  - HDF5 metadata, metadata cache
  - Chunking, chunk cache

Operations on data inside the magic box
- Copying to/from internal buffers
- Datatype conversion, such as:
  - float to integer
  - little-endian (LE) to big-endian (BE)
  - 64-bit integer to 16-bit integer
- Scattering/gathering
  - Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
- Data transformation (filters, compression)
  - Checksum on raw data and metadata (in 1.8.0)
  - Algebraic transform
  - GZIP and SZIP compression
  - User-defined filters
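As an illustration of the LE to BE case, the sketch below shows the byte reversal that such a conversion amounts to for 32-bit integers, for example when native little-endian memory data is stored as H5T_STD_I32BE (the file type in the h5dump outputs earlier). It is plain C, not HDF5's own conversion routine, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit integer (LE <-> BE). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Convert a buffer in place, one element at a time, as a conversion
 * pass over an internal buffer would. */
static void swap32_array(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

Each element is touched once per pass, which is part of why avoiding conversion (matching memory and file types) helps performance.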

I/O performance depends on
- Storage layouts
- Dataset storage properties
- Chunking strategy
- Metadata cache performance
- Datatype conversion performance
- Other filters, such as compression
- Access patterns

I/O with different storage layouts

Writing compact dataset
- One write stores both the header and the data array
[Figure: the dataset header (datatype, dataspace, attributes, …) and the array data move together from the metadata cache to the file]

Writing contiguous dataset: no conversion
[Figure: the dataset array data goes directly from application memory to the file; the dataset header (datatype, dataspace, attributes, …) goes through the metadata cache]

Writing a contiguous dataset with datatype conversion
[Figure: the dataset array data passes from application memory through a 1 MB conversion buffer to the file; the dataset header (datatype, dataspace, attributes) goes through the metadata cache]
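The size of the conversion buffer bounds how many elements are converted per pass. The sketch below is a simplified model, an assumption rather than the library's exact logic: a buffer of buf_bytes holds buf_bytes / max(source size, destination size) elements, so larger buffers (set with H5Pset_buffer) mean fewer fill-convert-drain passes. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: number of passes through a conversion buffer of
 * buf_bytes when converting n elements from src_size to dst_size bytes. */
static size_t conversion_passes(size_t n, size_t src_size, size_t dst_size,
                                size_t buf_bytes)
{
    size_t elem = src_size > dst_size ? src_size : dst_size;
    size_t per_pass = buf_bytes / elem;   /* elements that fit per pass */
    assert(per_pass > 0);
    return (n + per_pass - 1) / per_pass; /* ceiling division */
}
```

For example, converting 1,000,000 doubles to floats takes 8 passes with a 1 MB buffer in this model, and 1 pass with an 8 MB buffer.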

Partial I/O with contiguous datasets

Writing whole dataset: contiguous rows
- Data is contiguous in the file
- One I/O operation
[Figure: an M x N array in application memory is written to the file as M contiguous rows]

Sub-setting of contiguous dataset: series of adjacent rows
- The entire dataset is contiguous in the file, and so is the subset
- One I/O operation
[Figure: a subset of M adjacent full rows of an N-column dataset]

Sub-setting of contiguous dataset: adjacent, partial rows
- Data is scattered in the file in M contiguous blocks of N elements each
- Several small I/O operations
[Figure: M partial rows of N elements each, taken from application memory and written to separate places in the file]

Sub-setting of contiguous dataset: extreme case, writing a column
- The subset is scattered in the file in M different locations, 1 element each
- Several small I/O operations
[Figure: one column of M elements from application memory written to M separate places in the file]
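The four cases above follow from row-major storage: a rectangular block of a contiguous dataset is one run of bytes in the file only if it spans full rows; otherwise it is one run per row. The sketch below counts those runs, which (without a sieve buffer) is a lower bound on the number of I/O operations. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of separate contiguous file regions touched when writing an
 * nr x nc block at (r0, c0) into an M x N row-major contiguous dataset. */
static size_t contiguous_regions(size_t M, size_t N,
                                 size_t r0, size_t c0, size_t nr, size_t nc)
{
    assert(r0 + nr <= M && c0 + nc <= N);
    if (nr == 0 || nc == 0)
        return 0;
    if (nc == N)          /* full rows: adjacent rows form one run */
        return 1;
    return nr;            /* partial rows: one run per row */
}
```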

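Sub-setting of contiguous dataset: data sieve buffer
- Data is scattered in the file in M locations, 1 element each
- The data is gathered in a sieve buffer in memory (64 KB) and moved with memcpy, so the file sees fewer, larger I/O operations
[Figure: a column of M elements from application memory passes through the 64 KB sieve buffer on its way to the file]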
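A rough count shows why the sieve buffer helps the column case. The sketch below is a simplified greedy model, an assumption and not HDF5's exact sieve algorithm: each buffer load covers sieve_bytes of the file starting at the first element not yet covered, and elements inside that window are served by memcpy. The function name is hypothetical; 64 KB matches the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: buffer loads needed to access column `col` of an
 * M x N row-major dataset with elem_size-byte elements. */
static size_t sieve_loads(size_t M, size_t N, size_t elem_size,
                          size_t col, size_t sieve_bytes)
{
    assert(col < N && elem_size <= sieve_bytes);
    size_t loads = 0, window_end = 0;
    for (size_t r = 0; r < M; r++) {
        size_t off = (r * N + col) * elem_size;   /* element's file offset */
        if (loads == 0 || off + elem_size > window_end) {
            window_end = off + sieve_bytes;       /* start a new buffer load */
            loads++;
        }
    }
    return loads;
}
```

For a 1000 x 100 dataset of 4-byte integers, one column costs 1000 separate accesses without a sieve, but only 7 buffer loads of 64 KB in this model, since each load covers 164 rows.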

Performance tuning for contiguous datasets
- Datatype conversion
  - Avoid it for better performance
  - Use the H5Pset_buffer function to customize the conversion buffer size
- Partial I/O
  - Write/read in big contiguous blocks
  - Use H5Pset_sieve_buf_size to improve performance for complex subsetting

I/O with Chunking

Reminder: chunked storage layout
[Figure: chunks A, B, C, D of the array in application memory are stored in the file in a different order (A, D, C, B) and located through a chunk index; the dataset header sits in the metadata cache]

Information about chunking
- The HDF5 library treats each chunk as an atomic object
  - Compression is applied to each chunk
  - Datatype conversion and other filters are applied per chunk
- Chunk size greatly affects performance
  - Chunk overhead adds to file size
  - Chunk processing involves many steps
- Chunk cache
  - Caches chunks for better performance
  - Created for each chunked dataset
  - The size of the chunk cache is set for the file (default size 1 MB)
  - Each chunked dataset has its own chunk cache
  - A chunk may be too big to fit into the cache
  - Memory may grow if the application keeps opening datasets
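Two of the points above are simple arithmetic: whether an uncompressed chunk fits in the chunk cache, and how much memory the per-dataset caches can reach when many chunked datasets stay open. The sketch below computes both, assuming the 1 MB default from the slide; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_CHUNK_CACHE_BYTES (1024u * 1024u)  /* 1 MB, per the slide */

/* Bytes in one uncompressed chunk. */
static size_t chunk_bytes(const size_t *chunk_dims, int rank, size_t elem_size)
{
    assert(rank > 0);
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= chunk_dims[i];
    return n;
}

/* A chunk larger than the cache cannot be kept in it. */
static bool chunk_fits_cache(const size_t *chunk_dims, int rank,
                             size_t elem_size, size_t cache_bytes)
{
    return chunk_bytes(chunk_dims, rank, elem_size) <= cache_bytes;
}

/* Each open chunked dataset has its own cache: upper bound on their total. */
static size_t worst_case_cache_memory(size_t open_datasets, size_t cache_bytes)
{
    return open_datasets * cache_bytes;
}
```

For example, a 100 x 100 chunk of doubles (80,000 bytes) fits in the default cache, while a 512 x 512 chunk of doubles (2 MB) does not.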

Chunk cache
[Figure: application memory holds the metadata cache (headers for Dataset_1 … Dataset_N and chunking B-tree nodes) and a separate chunk cache per dataset; default chunk cache size is 1 MB]

Writing chunked dataset
- Compression is performed when a chunk is evicted from the chunk cache
- Other filters are applied as data goes through the filter pipeline
[Figure: chunks A, B, C of the chunked dataset pass from the chunk cache through the filter pipeline to the file]
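Because compression happens on eviction, a chunk that is evicted and touched again is compressed more than once. The sketch below counts compressions for a sequence of chunk writes under a simplified model: the cache holds a fixed number of chunks with LRU replacement (HDF5's actual replacement policy differs), every cached chunk is dirty, and the cache is flushed at the end. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 16

/* Simplified LRU model: compressions for writes to chunks[0..n-1]. */
static size_t count_compressions(const int *chunks, size_t n, size_t slots)
{
    assert(slots >= 1 && slots <= MAX_SLOTS);
    int id[MAX_SLOTS];
    size_t last_use[MAX_SLOTS];
    size_t used = 0, compressions = 0;

    for (size_t t = 0; t < n; t++) {
        size_t hit = used;
        for (size_t s = 0; s < used; s++)
            if (id[s] == chunks[t])
                hit = s;
        if (hit == used) {                  /* miss: find a slot */
            if (used < slots) {
                used++;                     /* free slot at index `hit` */
            } else {                        /* evict least recently used */
                hit = 0;
                for (size_t s = 1; s < used; s++)
                    if (last_use[s] < last_use[hit])
                        hit = s;
                compressions++;             /* evicted chunk is filtered and written */
            }
            id[hit] = chunks[t];
        }
        last_use[hit] = t;
    }
    return compressions + used;             /* final flush of cached chunks */
}
```

With room for 2 chunks, the access sequence 0, 1, 0, 2, 1, 0 causes 5 compressions for only 3 distinct chunks; with room for 3, each chunk is compressed once.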

Partial I/O with Chunking

Partial I/O for chunked dataset
- Example: write the green subset of the dataset, converting the data
- The dataset is stored as six chunks in the file
- The subset spans four chunks, numbered 1-4 in the figure, so four chunks must be written to the file
- But first, the four chunks must be read from the file, to preserve the parts of each chunk that are not to be overwritten

Partial I/O for chunked dataset
For each of the four chunks:
1. Read the chunk from the file into the chunk cache, unless it is already there.
2. Determine which part of the chunk will be replaced by the selection.
3. Replace that part of the chunk in the cache with the corresponding elements from the application's array.
4. Move those elements to the conversion buffer and perform the conversion.
5. Move those elements back from the conversion buffer to the chunk cache.
6. Apply filters (compression) when the chunk is flushed from the chunk cache.
For each element, three memcpy operations are performed.
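How many chunks a selection touches determines how many chunks must be read, updated, and rewritten. The sketch below counts them from the selection's start and size and the chunk dimensions: in each dimension, the touched chunk indices run from start / chunk to (start + size - 1) / chunk. The 20 x 30 dataset with 10 x 10 chunks in the usage note is a hypothetical layout chosen to give six chunks, like the slide's example; the function name is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks overlapped by a rectangular selection. */
static size_t chunks_touched(int rank, const size_t *start, const size_t *count,
                             const size_t *chunk)
{
    size_t total = 1;
    for (int d = 0; d < rank; d++) {
        assert(count[d] > 0 && chunk[d] > 0);
        size_t first = start[d] / chunk[d];
        size_t last = (start[d] + count[d] - 1) / chunk[d];
        total *= last - first + 1;
    }
    return total;
}
```

In a 20 x 30 dataset with 10 x 10 chunks (six chunks), a 10 x 10 selection starting at (5, 5) touches 4 chunks, so 4 chunks are read and rewritten even though the selection is only one chunk's worth of data.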

Partial I/O for chunked dataset
[Figure: elements participating in I/O are gathered with memcpy from the application buffer into the corresponding chunk (chunk 3) in the chunk cache]

Partial I/O for chunked dataset
[Figure: the elements of chunk 3 are moved with memcpy through the conversion buffer and back to the chunk cache; the chunk is then compressed and written to the file]

Variable-length data and I/O

Examples of variable-length data
- String
  - A[0] "the first string we want to write"
  - …
  - A[N-1] "the N-th string we want to write"
- Each element is a record of variable length
  - A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  - A[1] (0,0,110,2005) [length = 4]
  - …
  - A[N-1] (1,2,3,4,5,6,7,8,9,10,11,12,…,M) [length = M]

Variable-length data in HDF5
- Variable-length description in an HDF5 application:
    typedef struct {
        size_t len;   /* number of base-type elements */
        void  *p;     /* pointer to the elements */
    } hvl_t;
- The base type can be any HDF5 type: H5Tvlen_create(base_type)
- ~20 bytes of overhead for each element
- Data cannot be compressed
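The sketch below builds the two integer records from the previous slide in the hvl_t form: a length plus a pointer to separately allocated elements. To stay self-contained it uses a local struct, `vl_record`, with the same fields as HDF5's hvl_t; in a real program one would include hdf5.h and use hvl_t directly. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in with the same fields as HDF5's hvl_t. */
typedef struct {
    size_t len;   /* number of base-type elements */
    void  *p;     /* pointer to the elements */
} vl_record;

/* Build one variable-length record by copying n ints. Returns 0 on success. */
static int vl_make_int(vl_record *rec, const int *values, size_t n)
{
    rec->len = n;
    rec->p = NULL;
    if (n == 0)
        return 0;
    rec->p = malloc(n * sizeof(int));
    if (rec->p == NULL)
        return -1;
    memcpy(rec->p, values, n * sizeof(int));
    return 0;
}

/* Release a record built by vl_make_int. */
static void vl_free(vl_record *rec)
{
    free(rec->p);
    rec->p = NULL;
    rec->len = 0;
}
```

Each record's elements live in their own allocation, which mirrors why HDF5 stores VL data out of line in a heap rather than inside the dataset's raw data.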

How variable-length data is stored in HDF5
[Figure: in the file, the dataset with variable-length elements (described by its dataset header) stores pointers into a global heap, which holds the actual variable-length data]

Variable-length datasets and I/O
- When writing variable-length data, elements in the application buffer point to global heaps in the metadata cache, where the actual data is stored
[Figure: raw data in the application buffer pointing into a global heap]

There may be more than one global heap
[Figure: raw data in the application buffer pointing into two global heaps]

Variable-length datasets and I/O
[Figure: the raw data and the global heap are written to the file]

VL chunked dataset in a file
[Figure: the file contains the dataset header, the chunk B-tree, the dataset chunks, and heaps with VL data]

Writing chunked VL datasets
[Figure: for a VL chunked dataset with a selected region, raw data moves from application memory through the chunk cache, conversion buffer, and filter pipeline to the file, while the VL data goes to a global heap; the dataset header and B-tree nodes sit in the metadata cache]

Hints for variable-length data I/O
- Avoid closing/opening a file while writing VL datasets
  - Global heap information is lost
  - Global heaps may have unused space
- Avoid alternately writing different VL datasets
  - Data from different datasets will go into the same heap
- If the maximum length of a record is known, consider using fixed-length records and compression
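The last hint can be checked with a back-of-the-envelope estimate before compression is even considered. The sketch below compares storing records padded to the longest length against variable-length storage at roughly 20 bytes of overhead per element, the approximate figure quoted on the earlier slide. Both formulas are rough estimates, not exact HDF5 file sizes, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VL_OVERHEAD_BYTES 20  /* approximate per-element overhead quoted on the slide */

/* Bytes for fixed-length records padded to the longest record. */
static size_t fixed_bytes(const size_t *lens, size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    size_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (lens[i] > max)
            max = lens[i];
    return n * max * elem_size;
}

/* Rough VL estimate: payload plus per-element overhead, uncompressed. */
static size_t vl_bytes(const size_t *lens, size_t n, size_t elem_size)
{
    size_t total = n * VL_OVERHEAD_BYTES;
    for (size_t i = 0; i < n; i++)
        total += lens[i] * elem_size;
    return total;
}
```

For the two integer records A[0] (length 10) and A[1] (length 4), padding costs 80 bytes while the VL estimate is 96 bytes; and, unlike VL data, the padded fixed-length records can still be compressed.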

Questions?
End of Part V