April 28, 2008, LCI Tutorial
Introduction to HDF5 Tools
Tutorial Part II
Outline
  Overview of HDF5 tools
  Using tools for troubleshooting problems
HDF5 command-line tools
  Readers: h5dump, h5diff, h5ls
    1.8 tools: h5check, h5stat
  Writers: h5repack, h5repart, h5import, h5jam/h5unjam
    1.8 tools: h5copy, h5mkgrp
  Converters: h4toh5, h5toh4, gif2h5, h52gif
h5dump
Dumps the content of an HDF5 file to standard output and, optionally, to one of the following types of files:
  1. ASCII text file
  2. XML file
  3. Binary file
Flags to remember:
  -H   print header information only
  -p   print objects’ properties
  -b   export data in binary form
  -o   export data to a file (text by default)
  -y   skip printing array indices
  -w   set the line width
h5dump -H SDS.h5
HDF5 "SDS.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
      }
   }
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
   }
}
}
h5dump -d /Floats/FloatArray SDS.h5
HDF5 "SDS.h5" {
DATASET "/Floats/FloatArray" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
   DATA {
   (0,0): 0.01, 0.02, 0.03,
   (1,0): 0.1, 0.2, 0.3,
   (2,0): 1, 2, 3,
   (3,0): 10, 20, 30
   }
}
}
h5dump -x SDS.h5
Prints the contents of SDS.h5 in XML format.
h5dump binary output
-b F, --binary=F   form of the binary output (F):
  MEMORY   data in the output file has the same datatype as in memory
  FILE     data in the output file has the same datatype as the corresponding dataset in the HDF5 file
  LE       data is written with a predefined little-endian type (e.g., H5T_IEEE_F64LE)
  BE       data is written with a predefined big-endian type (e.g., H5T_STD_I32BE)
h5dump -d /IntArray -o out_le.bin -b LE SDS.h5
od --width=24 -t x4 out_le.bin
Dumps the 32-bit integer dataset IntArray from SDS.h5 to the little-endian binary file out_le.bin, then displays the file as 4-byte hexadecimal words with od.
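A file written with -b contains only the raw elements, with no header, so reading it back requires the shape and element type from elsewhere: here the ( 5, 6 ) dataspace and H5T_STD_I32LE type that h5dump -H reported for IntArray. (The od width of 24 bytes is 6 × 4 bytes, so each od line shows one row.) A minimal standard-library Python sketch, assuming row-major (C) element order; the helper names decode_le_int32 and read_le_int32 are hypothetical:

```python
import struct

def decode_le_int32(raw, rows, cols):
    """Decode raw little-endian 32-bit signed integers into a rows x cols list of lists."""
    count = rows * cols
    if len(raw) != 4 * count:
        raise ValueError(f"expected {4 * count} bytes, got {len(raw)}")
    # '<' selects little-endian byte order, 'i' a 4-byte signed integer
    flat = struct.unpack(f"<{count}i", raw)
    # Assumes row-major (C) order: the last index varies fastest
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

def read_le_int32(path, rows, cols):
    """Read a whole raw binary file and decode it (hypothetical helper)."""
    with open(path, "rb") as f:
        return decode_le_int32(f.read(), rows, cols)

# Usage for the file produced above (5 x 6 elements, as reported by h5dump -H):
# data = read_le_int32("out_le.bin", 5, 6)
```

If the file had been written with -b BE instead, only the '<' in the format string would change to '>'.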
h5diff
Using h5diff, you can:
  compare two objects in the same file
  compare two objects in two different files
  compare all objects in two files
h5diff SDS.h5 SDS2.h5
Dataset: </IntArray> and </IntArray>
5 differences found
h5diff SDS.h5 SDS2.h5 -r /IntArray
Dataset: </IntArray> and </IntArray>
position        IntArray    IntArray    difference
[ 0 0 ]         0           10          10
[ 1 0 ]
[ 2 0 ]
[ 3 0 ]
[ 4 0 ]
5 differences found
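The report lists, for each differing element, its position, the two values, and the difference. The idea can be sketched in a few lines of Python on plain nested lists. This is a toy model of an element-wise comparison, not h5diff's implementation; the delta parameter is a hypothetical tolerance, and the sample data only reproduces the one row that survives in the report above:

```python
def compare_arrays(a, b, delta=0):
    """Element-wise comparison of two equal-shape 2-D lists.

    Returns (position, value_in_a, value_in_b, abs_difference) for every element
    whose absolute difference is greater than delta.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("arrays must have the same shape")
    diffs = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            d = abs(x - y)
            if d > delta:
                diffs.append(((i, j), x, y, d))
    return diffs

def format_report(name, diffs):
    """Lay the result out like the report above: one line per differing element."""
    lines = [f"position        {name}    {name}    difference"]
    for (i, j), x, y, d in diffs:
        lines.append(f"[ {i} {j} ]         {x:<11} {y:<11} {d}")
    lines.append(f"{len(diffs)} differences found")
    return "\n".join(lines)

# Hypothetical data: only column 0 differs, in all five rows
a = [[0] * 6 for _ in range(5)]
b = [[10] + [0] * 5 for _ in range(5)]
print(format_report("IntArray", compare_arrays(a, b)))
```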
h5repack
Copies an HDF5 file to a new file, with or without compression and chunking, in order to:
  remove unused space
  apply a compression filter
  apply a storage layout
h5repack: Applying filters
-f FILTER
  GZIP   apply GZIP compression
  SZIP   apply SZIP compression
  SHUF   apply the HDF5 shuffle filter
  FLET   apply the HDF5 Fletcher32 checksum filter
  NBIT   apply N-bit compression
  SOFF   apply the HDF5 scale-offset filter
  NONE   remove all filters
For example:
  h5repack -i SDS2.h5 -o SDS2_compressed.h5 -f /IntArray:GZIP=9
Remember that compression is not applied to data smaller than 1K; see the -m flag.
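The 1K rule is easy to trip over: the raw size of a dataset is its number of elements times the element size, and the /IntArray used in the example is only 5 × 6 × 4 = 120 bytes. A small Python sketch of that check, assuming the 1024-byte threshold stated above (the default may differ between h5repack versions, so check h5repack --help); the function names are hypothetical:

```python
from math import prod

def dataset_nbytes(dims, element_size):
    """Raw data size of a dataset: number of elements times bytes per element."""
    return prod(dims) * element_size

def compression_applied(nbytes, threshold=1024):
    """Rule stated above: datasets smaller than the threshold are not compressed."""
    return nbytes >= threshold

int_array = dataset_nbytes((5, 6), 4)  # /IntArray: 5 x 6 elements of H5T_STD_I32LE (4 bytes)
print(int_array, compression_applied(int_array))  # 120 False
```

Under this rule, the GZIP=9 example above would leave /IntArray uncompressed unless the threshold is lowered with -m.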
h5repack: Data layout
-l LAYOUT
  CHUNK   apply chunked layout
  COMPA   apply compact layout
  CONTI   apply contiguous layout
For example:
  h5repack -i SDS.h5 -o SDS_chunk.h5 -l /Floats/FloatArray,/IntArray:CHUNK=2x3
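Before choosing a chunk shape, it helps to count how many chunks it produces. With CHUNK=2x3, FloatArray (4 × 3) divides evenly into 2 chunks, while IntArray (5 × 6) needs 6 chunks and its last row of chunks covers only one dataset row out of two. A short Python sketch of the arithmetic (chunk_grid is a hypothetical helper name):

```python
def chunk_grid(dims, chunk):
    """Chunks needed along each dimension, and in total, to cover a dataset."""
    if len(dims) != len(chunk):
        raise ValueError("chunk rank must match dataset rank")
    # Ceiling division: a partially filled edge chunk still counts as a chunk
    per_dim = [-(-d // c) for d, c in zip(dims, chunk)]
    total = 1
    for n in per_dim:
        total *= n
    return per_dim, total

print(chunk_grid((4, 3), (2, 3)))  # ([2, 1], 2)  /Floats/FloatArray
print(chunk_grid((5, 6), (2, 3)))  # ([3, 2], 6)  /IntArray
```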
h5repart
Repartitions a file or a family of files.
For example:
  h5repart -m 200m int16kx16k.h5 part200m%d.h5
splits the 977 MB file int16kx16k.h5 into:
  200 MB  part200m0.h5
  200 MB  part200m1.h5
  200 MB  part200m2.h5
  200 MB  part200m3.h5
  177 MB  part200m4.h5
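The member layout follows from simple arithmetic: 977 MB split into 200 MB members gives four full members plus a 177 MB remainder, and %d in the output name is replaced by the member index starting at 0. A Python sketch working in whole MB as the slide does (the tool itself works in bytes); both helper names are hypothetical:

```python
def member_sizes(total, member):
    """Split a total size into full members plus one smaller remainder member."""
    full, rest = divmod(total, member)
    return [member] * full + ([rest] if rest else [])

def member_names(pattern, count):
    """Expand a printf-style family pattern such as part200m%d.h5."""
    return [pattern % i for i in range(count)]

sizes = member_sizes(977, 200)
for name, size in zip(member_names("part200m%d.h5", len(sizes)), sizes):
    print(f"{size} MB  {name}")
```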
h5import
Imports binary or ASCII data into an HDF5 file:
  h5import infile -c config_file [infile -c config_file2 ...] -outfile outfile
Example:
  h5import float5x4x2.txt -c First_set.conf -o First_set.h5
Configuration file First_set.conf:
  PATH work/First-set
  INPUT-CLASS TEXTFP
  RANK 3
  DIMENSION-SIZES 5 2 4
  OUTPUT-CLASS FP
  OUTPUT-SIZE 64
  OUTPUT-ARCHITECTURE IEEE
  OUTPUT-BYTE-ORDER LE
  CHUNKED-DIMENSION-SIZES
  MAXIMUM-DIMENSIONS
Result:
GROUP "/" {
   GROUP "work" {
      DATASET "First-set" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 5, 2, 4 ) / ( 8, 8, H5S_UNLIMITED ) }
         DATA {
         (0,0,0): 1.01, 1.02, 1.03, 1.04,
         (0,1,0): 1.11, 1.12, 1.13, 1.14,
         (1,0,0): 1.21, 1.22, 1.23, 1.24,
         (1,1,0): 1.31, 1.32, 1.33, 1.34,
         (2,0,0): 1.41, 1.42, 1.43, 1.44,
         (2,1,0): 1.51, 1.52, 1.53, 1.54,
         (3,0,0): 2.01, 2.02, 2.03, 2.04,
         (3,1,0): 2.11, 2.12, 2.13, 2.14,
         (4,0,0): 2.21, 2.22, 2.23, 2.24,
         (4,1,0): 2.31, 2.32, 2.33, 2.34
         }
      }
   }
}
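The configuration file is a list of keyword/value lines, so when many datasets are imported it can be generated rather than typed. A Python sketch with a hypothetical helper, h5import_config, that covers only the keywords shown on this slide; the chunk and maximum dimension values are left to the caller because the slide does not preserve them:

```python
def h5import_config(path, dims, *, input_class="TEXTFP", output_class="FP",
                    output_size=64, architecture="IEEE", byte_order="LE",
                    chunk_dims=None, max_dims=None):
    """Render h5import configuration text, one 'KEYWORD value' line per setting."""
    lines = [
        f"PATH {path}",
        f"INPUT-CLASS {input_class}",
        f"RANK {len(dims)}",
        "DIMENSION-SIZES " + " ".join(str(d) for d in dims),
        f"OUTPUT-CLASS {output_class}",
        f"OUTPUT-SIZE {output_size}",
        f"OUTPUT-ARCHITECTURE {architecture}",
        f"OUTPUT-BYTE-ORDER {byte_order}",
    ]
    # Optional keywords are emitted only when the caller supplies values
    if chunk_dims is not None:
        lines.append("CHUNKED-DIMENSION-SIZES " + " ".join(map(str, chunk_dims)))
    if max_dims is not None:
        lines.append("MAXIMUM-DIMENSIONS " + " ".join(map(str, max_dims)))
    return "\n".join(lines) + "\n"

# Hypothetical usage: write the configuration for the example above
# with open("First_set.conf", "w") as f:
#     f.write(h5import_config("work/First-set", (5, 2, 4)))
print(h5import_config("work/First-set", (5, 2, 4)))
```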
h5jam/h5unjam
Add a file to, or remove it from, the beginning (user block) of an HDF5 file.
Example:
  h5jam adds text to the user block:
    h5jam -u test_ub.txt -i test_ub.h5
  h5unjam removes text from the user block:
    h5unjam -i test_ub.h5 -u out_ub.txt -o out_ub.h5
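HDF5 requires a non-empty user block to be a power of two of at least 512 bytes, so the space taken in front of the HDF5 data is the text length rounded up to such a size. A Python sketch of that rounding, assuming h5jam pads to the smallest valid size that holds the text (userblock_size is a hypothetical name):

```python
def userblock_size(text_len):
    """Smallest valid HDF5 user block size (a power of two, >= 512) holding text_len bytes."""
    size = 512
    while size < text_len:
        size *= 2
    return size

print(userblock_size(100))   # 512
print(userblock_size(600))   # 1024
print(userblock_size(2048))  # 2048
```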
h5ls
Lists selected information about file objects in the specified format.
Example:
  h5ls -r SDS2.h5
  /Floats                  Group
  /Floats/DoubleArray      Dataset {10, 5}
  /Floats/FloatArray       Dataset {4, 3}
  /Floats/subs             Group
  /IntArray                Dataset {5, 6}
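Because h5ls prints one object per line, its output is easy to post-process in scripts, for example to find every dataset and its shape. A Python sketch, assuming paths contain no spaces and that a maximum size, if present, is written as cur/max inside the braces; parse_h5ls is a hypothetical helper, not part of the HDF5 tools:

```python
import re

# One object per line: path, kind, and optional dimensions in braces
LINE = re.compile(r"^(\S+)\s+(Group|Dataset)(?:\s+\{([^}]*)\})?\s*$")

def parse_h5ls(text):
    """Turn h5ls -r style output into (path, kind, dims) tuples."""
    objects = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip lines this simple pattern does not recognize
        path, kind, dims = m.groups()
        shape = None
        if dims:
            # Keep only the current size when a maximum is given as "cur/max"
            shape = tuple(int(d.split("/")[0]) for d in dims.split(","))
        objects.append((path, kind, shape))
    return objects

listing = """\
/Floats                  Group
/Floats/DoubleArray      Dataset {10, 5}
/Floats/FloatArray       Dataset {4, 3}
/Floats/subs             Group
/IntArray                Dataset {5, 6}
"""
for obj in parse_h5ls(listing):
    print(obj)
```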
gif2h5 / h52gif
gif2h5 converts a GIF file into HDF5:
  gif2h5 apollo17_earth.gif apollo17_earth.h5
h52gif converts an HDF5 file into GIF:
  h52gif apollo17_earth.h5 apollo17_earth2.gif -i /apollo17_earth.gif/Image0 -p "/apollo17_earth.gif/Global Palette"
h5copy
Copies an object from one location to another, within a file or across files.
Available in HDF5 1.8 and later.
(Diagram: dataset FloatArray, a member of group Floats, copied to the root group as /FloatArray.)
h5copy
usage: h5copy [OPTIONS] [OBJECTS...]
  -i, --input         input file name
  -o, --output        output file name
  -s, --source        source object name
  -d, --destination   destination object name
  -f, --flag          flag type, one of:
    shallow   copy only immediate members for groups
    soft      expand soft links into new objects
    ext       expand external links into new objects
    ref       copy objects that are pointed to by references
    noattr    copy objects without copying attributes
h5copy example
  h5copy -i SDS.h5 -o SDS_cp.h5 -s /Floats/FloatArray -d /FloatArray
(Diagram: SDS.h5 contains group Floats with dataset FloatArray, and dataset IntArray; SDS_cp.h5 contains the copy /FloatArray.)
h5copy -f shallow
(Diagram: copying group floats, whose members 64-bit and f32 have objects f1 and f2 below them; a default copy reproduces the whole hierarchy, while -f shallow copies only the immediate members 64-bit and f32.)
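The difference between a default copy and -f shallow can be modeled with nested Python dicts, where a dict stands for a group and any other value for a dataset. This is only a toy model of the semantics described for the shallow flag (immediate members are created, but subgroups are not descended into), not the HDF5 library's copy code; the layout below is hypothetical, since the diagram does not preserve which group holds f1 and f2:

```python
import copy

def copy_group(group, shallow=False):
    """Toy model of copying a group stored as nested dicts.

    A default copy recurses through the whole hierarchy; a shallow copy
    creates the immediate members but leaves any subgroups empty.
    """
    result = {}
    for name, member in group.items():
        if isinstance(member, dict):
            result[name] = {} if shallow else copy_group(member)
        else:
            result[name] = copy.deepcopy(member)
    return result

# Hypothetical layout loosely based on the diagram above
floats = {"64-bit": {"f1": [1.0]}, "f32": {"f2": [2.0]}}
print(copy_group(floats))                # {'64-bit': {'f1': [1.0]}, 'f32': {'f2': [2.0]}}
print(copy_group(floats, shallow=True))  # {'64-bit': {}, 'f32': {}}
```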
h5copy -f soft
(Diagram: a soft link dset_SL pointing to dataset f1; with -f soft the link is expanded into a new object in the copy.)
h5copy -f ref
(Diagram: dataset dset_ref holds references to objects d1 and d; with -f ref the referenced objects are copied along with dset_ref.)
h5stat
Prints various statistics about an HDF5 file.
Helps:
  to troubleshoot size overhead in HDF5 files
  to choose specific object properties and storage strategies
Available in HDF5 1.8 and later.
h5check
Verifies whether an HDF5 file is encoded according to the HDF5 File Format Specification.
Does not use the HDF5 library.
Serves as a watchdog that the HDF5 library implementation complies with the HDF5 File Format Specification.
The tool is NOT part of the HDF5 source code distribution.
How to use it?
h5check [-vn]
  -vn   verbosity mode
    n=0   Terse: only prints whether the file is compliant
    n=1   Default: prints its progress and all errors found
    n=2   Verbose: prints everything it knows, usually for debugging
Example: a compliant file
% h5check example1.h5
VALIDATING example1.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at
VALIDATING the btree at
FOUND btree signature.
VALIDATING the local heap at
FOUND local heap signature.
…
Result: File is in compliance.
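The first line of each report, FOUND super block signature, reflects the first thing any reader of the format must do: locate the 8-byte HDF5 signature, which the file format allows at offset 0, 512, 1024, 2048, and so on (leaving room for a user block in front). A Python sketch of just that step, reading bytes directly without the HDF5 library; it is not h5check's code and does not validate anything beyond the signature:

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def find_superblock(data):
    """Return the byte offset of the HDF5 superblock signature in data, or None.

    Only the offsets allowed by the file format are checked:
    0, then 512 and each following power of two.
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None

# Usage (reads the whole file; fine for a sketch, not for very large files):
# with open("example1.h5", "rb") as f:
#     print(find_superblock(f.read()))
```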
Example: a non-compliant file
h5check invalid2.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at
VALIDATING the btree at
FOUND btree signature.
VALIDATING the SNOD at
FOUND SNOD signature.
VALIDATING the object header at
check_sym(at 1248): Errors from check_obj_header()
decode_validate_messages(): Failure in type->decode().
H5O_sdspace_decode(): Bad version number in simple dataspace message.
VALIDATING the local heap at
FOUND local heap signature.
Main(): Errors from check_obj_header().
decode_validate_messages(): Failure in type->decode().
H5O_attr_decode(): Can't decode attribute dataspace.
H5O_sdspace_decode(): Bad version number in simple dataspace message.
…
Result: File is not in compliance.
Using HDF5 Tools for Performance Tuning and Troubleshooting
Introduction
HDF5 tools can be very useful for performance tuning and troubleshooting:
  discover objects and their properties in HDF5 files: h5dump -p
  get file size overhead information: h5stat
  get the locations of objects in a file: h5ls
  discover differences: h5diff, h5ls
  find the location of raw data: h5ls -var
h5stat
Prints various statistics about an HDF5 file.
Helps:
  to troubleshoot size overhead in HDF5 files
  to choose specific object properties and storage strategies
To use:
  h5stat --help
  h5stat file.h5
Full spec can be found
Let us know if you need some “special” type of statistics.
h5stat
Reports two types of statistics:
  High-level information about objects, for example:
    number of different objects (groups, datasets, datatypes) in a file
    number of unique datatypes
    size of raw data in a file
  Information about objects’ structural metadata:
    sizes of structural metadata (total/free)
      object headers, local and global heaps
      sizes of B-trees
    object header fragmentation
h5stat
Examples of high-level information:
File information
  # of unique groups:
  # of unique datasets: 30
  # of unique named datatypes: 0
  ……………………
  Max. # of links to object: 1
  Max. depth of hierarchy: 4
  Max. # of objects in group: 19
  ……………………
Group bins:
  # of groups of size 0:
  # of groups of size 1 - 9: 7
  # of groups of size : 1
  ……………………
Max. dimension size of 1-D datasets: 1643
  ……………………
Dataset filters information:
  Number of datasets with
  ………………
    SZIP filter: 2
  ………………
    NBIT filter: 10
    USER-DEFINED filter: 1
h5stat
Conclusions:
  There are a lot of empty groups in the file; it is a good candidate for the compact group feature (h5repack -l ….)
  Some datasets use “user-defined” filters and may not be readable by the standard HDF5 library.
  SZIP compression is needed to read some datasets.
  Oh… my application uses buffers of size 1024 to read data, but the largest 1-D dataset has 1643 elements. No wonder it crashes on reading…
  Do I have all the filters needed to read the data?
h5stat
Examples of structural metadata information:
Object header size (total/unused):
  Groups: 1808/72
  Datasets: 15792/832
  ………
Dataset storage information:
  Total raw data size:
  ………
Dataset datatype #3:
  Count (total/named) = (2/0)
  Size (desc./elmt) = (10/65535)
Dataset datatype #4:
  Count (total/named) = (1/0)
  Size (desc./elmt) = (10/32000)
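The total/unused object header figures translate directly into a fragmentation estimate: the share of header bytes that hold no information. A short Python sketch of that arithmetic on the numbers above (unused_percent is a hypothetical helper name):

```python
def unused_percent(total, unused):
    """Percentage of object header bytes that are allocated but unused."""
    return 100.0 * unused / total

for kind, total, unused in [("Groups", 1808, 72), ("Datasets", 15792, 832)]:
    print(f"{kind}: {unused_percent(total, unused):.2f}% of object header bytes unused")
# Groups: 3.98% ..., Datasets: 5.27% ...
```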
h5stat
Conclusions:
  File size: % overhead (not bad at all!)
  There are some elements of size 65535 and 32000 bytes.
  Oh… Is it really what I want? Should I use another datatype and take advantage of compression?
Case study: Using HDF5 tools to debug a problem
My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and shows no differences. What am I doing wrong?
h5diff good.h5 bad.h5
  Datatype: </Definitions/timespec> and </Definitions/timespec>
  1 differences found
h5ls -var good.h5
  /Definitions/timespec    Type
    Location: 0:1:0:900
h5debug good.h5 900
  Message Information:
    Type class: compound
    Size: 8 bytes
h5debug bad.h5 900
  Message Information:
    Type class: compound
    Size: 16 bytes
Case study: Using HDF5 tools to debug a problem
Conclusions:
  The compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2x8 bytes) and on VS2003 (8 bytes; 2x4 bytes).
  Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005.
  I need the H5Tget_native_type function to find the type of my data in memory.
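The size mismatch that h5debug revealed can be reproduced outside HDF5 with Python's ctypes, which lays out structures the way the platform's C compiler would. The class and field names below are hypothetical stand-ins for the two fields of the application's timespec struct (2x4 bytes versus 2x8 bytes); the point is that a memory datatype hard-coded for one layout cannot describe the other:

```python
import ctypes

class Timespec32(ctypes.Structure):
    # Two 4-byte fields: the 8-byte layout seen in good.h5
    _fields_ = [("tv_sec", ctypes.c_int32), ("tv_nsec", ctypes.c_int32)]

class Timespec64(ctypes.Structure):
    # Two 8-byte fields: the 16-byte layout seen in bad.h5
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_int64)]

print(ctypes.sizeof(Timespec32), ctypes.sizeof(Timespec64))  # 8 16
```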
Questions?
End of Part II