HDF5 Metadata and Page Buffering
Improving HDF5 metadata handling with L2 cache
May 30-31, 2012, HDF5 Workshop at PSI

HDF5 metadata
- Metadata: data about data
- HDF5 metadata
  - Structural metadata (describes HDF5 objects: groups, datasets, chunks, etc.)
    - Group header
    - B-tree (to index objects, chunks)
    - Local heap (to store link names)
  - User-defined metadata (HDF5 attributes)
    - Created via the H5A calls
- Usually small: less than 1 KB
- Accessed frequently
- Small disk accesses are expensive

Current handling of HDF5 metadata
- HDF5 implements metadata aggregators to allocate space in a file and to avoid small I/O
- Aggregator minimum size can be controlled by the application (default is 2K; 0 disables aggregation)
  - H5Pset_meta_block_size
- Size of a metadata block is limited only by the order of space allocations
- Aggregator will go beyond the minimum aggregation size if the current allocation block is at the end of the file

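The aggregation behavior described on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HDF5 library's code: the class `ToyAggregator`, its method names, and the growth step used when a block at the end of the file is extended are all hypothetical. Only the 2K default, the "0 disables aggregation" rule, and the "grows past the minimum when at end of file" rule come from the slide.

```python
class ToyAggregator:
    """Toy model of a metadata aggregator (not the HDF5 implementation)."""

    def __init__(self, min_block_size=2048):
        self.min_block_size = min_block_size  # 0 disables aggregation
        self.eof = 0        # end of the simulated file
        self.blocks = []    # [start, end] of every metadata block created
        self.next_free = 0  # next unused byte in the current block

    def alloc_raw(self, size):
        """Raw data is placed directly at the end of the file."""
        addr = self.eof
        self.eof += size
        return addr

    def alloc_meta(self, size):
        """Return the file address for a metadata allocation of `size` bytes."""
        if self.min_block_size == 0:
            return self.alloc_raw(size)  # aggregation disabled
        block = self.blocks[-1] if self.blocks else None
        if block is not None and self.next_free + size <= block[1]:
            pass  # the request fits in the current block
        elif block is not None and block[1] == self.eof:
            # Current block ends at EOF: grow it in place, so it can exceed
            # the minimum size. The growth step is an assumption of this toy.
            shortfall = self.next_free + size - block[1]
            block[1] += max(self.min_block_size, shortfall)
            self.eof = block[1]
        else:
            # Start a new block at EOF; the toy abandons the old block's tail.
            block = [self.eof, self.eof + max(self.min_block_size, size)]
            self.blocks.append(block)
            self.next_free = block[0]
            self.eof = block[1]
        addr = self.next_free
        self.next_free += size
        return addr


# Small demonstration: three metadata allocations with no raw data in between.
agg = ToyAggregator()
addresses = [agg.alloc_meta(500), agg.alloc_meta(500), agg.alloc_meta(1500)]
print(addresses, agg.blocks)
```

In this model, block lengths depend on whether raw data was written between metadata allocations, which is one way to read "limited only by the order of space allocations".
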
HDF5 metadata allocation
[Figure: layout of an HDF5 file with dataset array data and metadata blocks]
- Metadata is mixed with raw data in the HDF5 file
- 2K metadata block; may be partially filled
- Metadata blocks of different lengths

Current handling of HDF5 metadata
- Problems that affect metadata I/O
  - Size of aggregation varies and is not stored in the file
    - Library cannot take advantage of reading a whole metadata block since it doesn't know the length of the block
  - Metadata blocks are not aligned to the block size of the underlying file system, and their sizes are not multiples of the file system block size

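The cost of misalignment can be shown with simple arithmetic: a read that starts in the middle of a file system block may touch more blocks than an aligned read of the same length. The helper name `fs_blocks_touched` and the 4096-byte file system block size below are assumptions made for illustration only.

```python
def fs_blocks_touched(offset, length, fs_block_size=4096):
    """Count the file system blocks covered by a read of `length` bytes
    starting at byte `offset`. The 4096-byte default is an assumption."""
    if length <= 0:
        return 0
    first = offset // fs_block_size
    last = (offset + length - 1) // fs_block_size
    return last - first + 1


# A 2K metadata block at an unaligned offset versus an aligned one.
print(fs_blocks_touched(3000, 2048))
print(fs_blocks_touched(4096, 2048))
```
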
Page buffering (L2 cache)
- Implement metadata (MD) aggregation in 64K pages
  - MD pages are aligned in the file
  - Perform all I/O in page-sized blocks or greater
- File format change
  - Store MD allocation parameters in the HDF5 superblock extension message; can be ignored by readers
  - Put a flag to indicate that some MD blocks are not aligned
- Implement page buffering (L2 cache)
- Currently in design stage

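Since the page buffer is described as being in the design stage, the following is only a sketch of the general idea, not the planned implementation: every read is served from whole 64K pages, and recently used pages are kept in memory. The class `ToyPageBuffer`, the least-recently-used eviction policy, and the `max_pages` limit are assumptions; only the 64K page size and the page-sized I/O come from the slide.

```python
import io
from collections import OrderedDict

PAGE_SIZE = 64 * 1024  # 64K pages, as proposed on the slide


class ToyPageBuffer:
    """Toy read-only page buffer with LRU eviction (an assumed policy)."""

    def __init__(self, fileobj, page_size=PAGE_SIZE, max_pages=16):
        self.f = fileobj
        self.page_size = page_size
        self.max_pages = max_pages
        self.pages = OrderedDict()  # page index -> bytes, oldest first
        self.io_count = 0           # number of page-sized reads issued

    def _load(self, index):
        """Return page `index`, reading it from the file only on a miss."""
        if index in self.pages:
            self.pages.move_to_end(index)  # mark as most recently used
            return self.pages[index]
        self.f.seek(index * self.page_size)
        data = self.f.read(self.page_size)
        self.io_count += 1
        self.pages[index] = data
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict least recently used page
        return data

    def read(self, offset, length):
        """Read `length` bytes at `offset`, touching the file page by page."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.page_size)
            chunk = self._load(index)[start:start + (end - offset)]
            if not chunk:
                break  # request runs past the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Two small reads inside the same page cost a single page-sized I/O.
buf = ToyPageBuffer(io.BytesIO(bytes(4 * PAGE_SIZE)))
buf.read(10, 100)
buf.read(500, 200)
print(buf.io_count)
```
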
New aggregator API calls
- Can be set in file creation property lists
- Only set on file creation
- Permanent; stored in the superblock when set
- H5Pget/set_aggregator_block_size

HDF5 page buffering
[Figure: page buffer and HDF5 file]
- Page buffer contains MD pages (L2 cache)
- Metadata blocks are aligned
- Metadata blocks are multiples of 64K

Data and metadata aggregators
- The new aggregators pack small raw data and metadata allocations into aligned blocks, which work with the page buffer.
[Figure: small data and metadata allocations packed into blocks in the HDF5 file]

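One way to picture the packing is a pair of aggregators, one per kind of allocation, each handing out space from its own page-aligned block. The sketch below is hypothetical: the class `ToyPagedAllocator`, the `kind` labels, and the rule that an allocation never straddles a page are assumptions, and the slide does not say how allocations larger than a page would be handled.

```python
PAGE_SIZE = 64 * 1024  # aligned block size taken from the 64K page proposal


class ToyPagedAllocator:
    """Toy allocator that packs small allocations into aligned pages,
    keeping raw data and metadata in separate pages (an assumption)."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.eof = 0       # end of the simulated file, always page aligned
        self.current = {}  # kind -> [page_start, next_free]

    def alloc(self, kind, size):
        """Return an address for a small allocation of the given kind."""
        if size <= 0 or size > self.page_size:
            raise ValueError("this toy only handles small allocations")
        page = self.current.get(kind)
        if page is None or page[1] + size > page[0] + self.page_size:
            # Open a fresh aligned page for this kind at the end of the file.
            page = [self.eof, self.eof]
            self.current[kind] = page
            self.eof += self.page_size
        addr = page[1]
        page[1] += size
        return addr


# Metadata and raw data land in different aligned pages.
alloc = ToyPagedAllocator()
print(alloc.alloc("metadata", 300), alloc.alloc("data", 1000))
```

Because every page starts on a multiple of the page size, a page buffer can fetch any of these allocations with page-sized, aligned I/O.
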
Thank You! Questions?