May 30-31, 2012, HDF5 Workshop at PSI
HDF5 at a Glance
  Quick overview of known topics
Outline
  Overview of HDF5
  Topics not covered by PSI HDF5 Tutorial
  Groups and Links
HDF5 File
  An HDF5 file is a container that holds data objects.
  [Figure: example file contents: a table with columns lat, lon, temp, and a text block "Experiment Notes: Serial Number: Date: 3/13/09 Configuration: Standard 3"]
HDF5 File
  HDF5 groups and links organize data objects.
  Every HDF5 file has a root group ("/").
  Groups are similar to UNIX directories.
  [Figure: root group "/" with groups SimOut and Viz, a table with columns lat, lon, temp, and the values Parameters 10;100;1000 and Timestep 36,000]
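The directory analogy can be sketched with h5py. This is a minimal illustration, not the workshop's own example: the file layout, the dataset name `temp`, and the choice to store Parameters and Timestep as attributes of /SimOut are assumptions, since the slide does not say where those values live.

```python
import os
import tempfile

import h5py
import numpy as np


def build_example(path):
    """Create a file with a root group, two subgroups, and one dataset."""
    with h5py.File(path, "w") as f:          # f is the root group "/"
        simout = f.create_group("SimOut")    # becomes /SimOut
        f.create_group("Viz")                # becomes /Viz
        # Assumed placement: values stored as attributes of /SimOut.
        simout.attrs["Parameters"] = [10, 100, 1000]
        simout.attrs["Timestep"] = 36000
        # Hypothetical dataset name and values, for illustration only.
        simout.create_dataset("temp", data=np.array([23, 24, 21]))


def list_paths(path):
    """Return every object path below the root, similar to `find /`."""
    paths = []
    with h5py.File(path, "r") as f:
        # visit() keeps walking as long as the callback returns None.
        f.visit(lambda name: paths.append("/" + name))
    return sorted(paths)
```

Calling `build_example(p)` and then `list_paths(p)` walks the file the way one would walk a directory tree.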
HDF5 Software Layers & Storage
  API
    Tools: h5dump tool, h5repack tool, HDFview tool, …
    High Level APIs
    Java Interface
    Language Interfaces: C, Fortran, C++
  HDF5 Library
    HDF5 Data Model Objects: Groups, Datasets, Attributes, …
    Tunable Properties: Chunk Size, I/O Driver, …
    Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
    Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
  Storage (HDF5 File Format)
    File, Split Files, File on Parallel Filesystem, Other
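Two of the tunable properties named above, chunk size and I/O driver, can be set from h5py. The sketch below assumes h5py and NumPy are available and uses the in-memory "core" driver as one example of a Virtual File Layer driver; the dataset name and values are illustrative.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip_in_memory(path):
    """Write and read a chunked dataset through the "core" driver."""
    # backing_store=False: the file image lives in memory only and
    # nothing is written to `path` when the file is closed.
    with h5py.File(path, "w", driver="core", backing_store=False) as f:
        dset = f.create_dataset("temp", data=np.array([23, 24, 21]),
                                chunks=(2,))  # chunk size is tunable
        return dset.chunks, dset[...].tolist()
```

Swapping the `driver` argument changes how bytes reach storage without changing the code that works with groups and datasets.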
GROUPS AND LINKS
Groups and Links
  Groups are containers for links (graph edges)
  Links were added in
  Warning: Many APIs in the H5G interface are obsolete; use the H5L interfaces to discover and manipulate file structure
Example h5_links.py
  Different kinds of links
  [Figure: file links.h5 with root group “/”, groups A and B, and links a, soft, dangling, and External; file dset.h5 holds the externally linked dataset]
  Dataset can be “reached” using three paths: /A/a, /a, /soft
  External: dataset is in a different file
Links
  Name
    Example: “A”, “B”, “a”, “dangling”, “soft”
    Unique within a group; “/” is not allowed in names
  Type
    Hard Link
      Value is the object’s address in a file
      Created automatically when an object is created
      Can be added to point to an existing object
    Soft Link
      Value is a string, for example, “/A/a”, but can be anything
      Use to create aliases
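Hard and soft links can be sketched with h5py as follows. The link names follow the slide, but this is not the h5_links.py script itself; the dataset contents and the dangling target path are made up for illustration.

```python
import os
import tempfile

import h5py


def build_links(path):
    """Create one dataset reachable through hard and soft links."""
    with h5py.File(path, "w") as f:
        grp_a = f.create_group("A")
        f.create_group("B")
        # The hard link /A/a is created automatically with the dataset.
        dset = grp_a.create_dataset("a", data=[1, 2, 3])
        f["a"] = dset                      # second hard link to the same object
        f["soft"] = h5py.SoftLink("/A/a")  # alias stored as a path string
        # A soft link value can be anything, including a missing target.
        f["dangling"] = h5py.SoftLink("/no/such/object")


def describe_links(path):
    """Report link classes, data read via each path, and the dangling case."""
    with h5py.File(path, "r") as f:
        classes = {name: f.get(name, getclass=True, getlink=True).__name__
                   for name in ("a", "soft")}
        values = [f[p][...].tolist() for p in ("/A/a", "/a", "/soft")]
        try:
            f["dangling"]
            dangling_resolves = True
        except KeyError:                   # target does not exist
            dangling_resolves = False
    return classes, values, dangling_resolves
```

All three paths return the same data because they resolve to one object; only the soft link stores a path that is looked up at access time.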
Links (cont.)
  Type
    External Link
      Value is a pair of strings, for example, (“dset.h5”, “dset”)
      Use to access data in other HDF5 files
      HDF introduced caching of files opened via external links
        H5Pset_elink_file_cache_size
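An external link can be sketched in h5py with `h5py.ExternalLink`. The file and link names mirror the slide; the dataset contents are illustrative, and the sketch relies on HDF5 resolving the relative file name against the directory of the file that holds the link. The file cache set by H5Pset_elink_file_cache_size is not used here.

```python
import os
import tempfile

import h5py


def build_external(dirpath):
    """Create dset.h5 and a links.h5 that points into it."""
    dset_file = os.path.join(dirpath, "dset.h5")
    links_file = os.path.join(dirpath, "links.h5")
    with h5py.File(dset_file, "w") as f:
        f.create_dataset("dset", data=[10, 20, 30])
    with h5py.File(links_file, "w") as f:
        # Link value is the pair (file name, object path).
        f["External"] = h5py.ExternalLink("dset.h5", "/dset")
    return links_file


def read_through_link(links_file):
    """Return the stored link value and the data read through the link."""
    with h5py.File(links_file, "r") as f:
        link = f.get("External", getlink=True)
        return link.filename, link.path, f["External"][...].tolist()
```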
Links Properties
  ASCII or UTF-8 encoding for names
  Create intermediate groups
    Saves programming effort
  C example
    lcpl_id = H5Pcreate(H5P_LINK_CREATE);
    /* Ask the library to create missing intermediate groups. */
    H5Pset_create_intermediate_group(lcpl_id, 1);
    /* Creates group "B" under "A". */
    H5Gcreate(fid, "A/B", lcpl_id, H5P_DEFAULT, H5P_DEFAULT);
  Group “A” will be created if it doesn’t exist
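The same two link creation properties can be set through h5py's low-level interface, which wraps the C calls closely. This is a sketch that assumes the `h5py.h5p` and `h5py.h5g` modules behave as in recent h5py releases; note that h5py's high-level `create_group` already creates intermediate groups by default.

```python
import os
import tempfile

import h5py


def create_nested_group(path):
    """Create /A/B in one call, letting HDF5 create /A on the way."""
    with h5py.File(path, "w") as f:
        lcpl = h5py.h5p.create(h5py.h5p.LINK_CREATE)
        lcpl.set_create_intermediate_group(True)     # create "A" if missing
        lcpl.set_char_encoding(h5py.h5t.CSET_UTF8)   # store link names as UTF-8
        h5py.h5g.create(f.id, b"A/B", lcpl=lcpl)
        return "A" in f, "A/B" in f
```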
Operations on Links
  See the H5L interface in the Reference Manual
    Create
    Delete
    Copy
    Iterate
    Check if exists
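Most of these operations have h5py counterparts. The sketch below uses made-up names; one caveat is that `Group.copy` copies the object itself rather than only the link, so it is an approximation of the "Copy" item, and `Group.move` is shown as the way to rename a link.

```python
import os
import tempfile

import h5py


def link_operations(path):
    """Create, copy, move, delete, iterate over, and test links."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("data", data=[1, 2, 3])
        f["alias"] = dset                        # create a hard link
        f["shortcut"] = h5py.SoftLink("/data")   # create a soft link
        f.copy("data", "data_copy")              # copy the object to a new name
        f.move("alias", "renamed")               # rename a link
        del f["shortcut"]                        # delete a link
        names = sorted(f)                        # iterate over link names
        return names, "renamed" in f, "alias" in f
```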
Groups Properties
  Creation properties
    Type of links storage
      Compact (in 1.8.* versions)
        Used with a few members (default under 8)
      Dense (default behavior)
        Used with many (>16) members (default)
    Tunable size for a local heap
      Save space by providing an estimate of the storage required for link names
    Can be compressed (in and later)
      Many links with similar names (XXX-abc, XXX-d, XXX-efgh, etc.)
      Requires more time to compress/uncompress data
Groups Properties
  Creation properties
    Links may have creation order tracked and indexed
      Indexing by name (default)
        A, B, a, dangling, soft
      Indexing by creation order (has to be enabled)
        A, B, a, soft, dangling
  ples-by-api/api18-c.html
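The two orderings can be reproduced with h5py's `track_order` option. This is a sketch assuming h5py 2.9 or later, where `create_group` accepts `track_order` and iteration follows creation order once tracking is enabled; the parent group name `links` and the link targets are hypothetical.

```python
import os
import tempfile

import h5py


def link_names(path, track_order):
    """Create five links in a fixed order and return the iteration order."""
    with h5py.File(path, "w") as f:
        grp = f.create_group("links", track_order=track_order)
        grp.create_group("A")
        grp.create_group("B")
        grp.create_dataset("a", data=[1])
        grp["soft"] = h5py.SoftLink("/links/a")
        grp["dangling"] = h5py.SoftLink("/no/such/object")
        return list(grp)
```

With `track_order=False` the names come back via the name index; with `track_order=True` they come back in the order the links were created.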
Discovering HDF5 file’s structure
  HDF5 provides C and Fortran 2003 APIs for recursive and non-recursive iteration over groups and attributes
    H5Ovisit and H5Literate (H5Giterate)
    H5Aiterate
  Life is much easier with h5py (h5_visita.py)
    import h5py

    def print_info(name, obj):
        # Print the object's path, then each of its attributes.
        print(name)
        for attr_name, value in obj.attrs.items():
            print(attr_name + ":", value)

    f = h5py.File('GATMO-SATMS-npp.h5', 'r+')
    f.visititems(print_info)  # calls print_info for every object in the file
    f.close()
Checking a path in HDF5
  HDF provides HL C and Fortran 2003 APIs for checking if a path exists
    H5LTvalid_path (h5ltvalid_path_f)
  Example: Is there an object with a path /A/B/C/d?
    TRUE if there is a path, FALSE otherwise
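In h5py a comparable check is the `in` operator on a file or group. The sketch below is not H5LTvalid_path itself; it builds the /A/B/C/d path from the example with illustrative data and tests a few candidate paths.

```python
import os
import tempfile

import h5py


def build_nested(path):
    """Create the dataset /A/B/C/d; h5py creates the intermediate groups."""
    with h5py.File(path, "w") as f:
        f.create_dataset("/A/B/C/d", data=[0])


def path_exists(path, object_path):
    """Return True if object_path can be followed from the root group."""
    with h5py.File(path, "r") as f:
        return object_path in f
```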
Hints
  Use the latest file format (see the H5Pset_libver_bound function in the RM)
    Save space when creating a lot of groups in a file
    Save time when accessing many objects (>1000)
  Caution: Tools built with the HDF5 versions prior to will not work on the files created with this property
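From h5py, the latest file format can be requested with the `libver` argument of `h5py.File`. A minimal sketch with hypothetical group names; the caution above about older tools applies to files written this way.

```python
import os
import tempfile

import h5py


def write_with_latest_format(path, n_groups):
    """Create n_groups groups in a file that uses the latest format."""
    with h5py.File(path, "w", libver="latest") as f:
        for i in range(n_groups):
            f.create_group(f"group_{i:04d}")


def count_groups(path):
    """Reopen the file and count the links in the root group."""
    with h5py.File(path, "r") as f:
        return len(f)
```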
Informal Benchmark
  Create a file and a group in a file
  Create up to 10^6 groups with one dataset in each group
  Compare file sizes and performance of HDF using the latest group format with the performance of HDF (default, old format) and
  Note: Default and became very slow after groups
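A scaled-down version of such a benchmark can be sketched with h5py. This is not the script behind the workshop's measurements: the group and dataset names, the dataset contents, and the use of `libver` to select the old or latest format within one library build are assumptions.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def run_case(path, n_groups, libver):
    """Create n_groups groups with one dataset each, then time the reads."""
    start = time.perf_counter()
    with h5py.File(path, "w", libver=libver) as f:
        for i in range(n_groups):
            grp = f.create_group(f"g{i:07d}")
            grp.create_dataset("d", data=np.arange(4))
    create_s = time.perf_counter() - start

    start = time.perf_counter()
    f = h5py.File(path, "r")
    last = f[f"g{n_groups - 1:07d}/d"][...].tolist()  # open and read a dataset
    read_s = time.perf_counter() - start

    start = time.perf_counter()
    f.close()                                          # time to close the file
    close_s = time.perf_counter() - start

    return {"libver": libver, "create_s": create_s, "read_s": read_s,
            "close_s": close_s, "file_size": os.path.getsize(path),
            "last": last}
```

Running `run_case` with `libver="earliest"` and `libver="latest"` for increasing `n_groups` gives the three quantities compared on the following slides: time to open and read a dataset, time to close the file, and file size.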
[Figure: Time to Open and Read a Dataset]
[Figure: Time to Close the File]
[Figure: File Size]
DATATYPES
Datatypes
  See Tutorial examples
Thank You!
  Questions?