HDF5
Overview
What is HDF5?
Why use HDF5?
Example of HDF5
HDF5?
Hierarchical Data Format
Versatile, completely portable & no size limit
Official support: C, C++, Fortran & Java
Third-party support: Python (h5py), MATLAB, R and IDL
Free!!!
Hierarchical Data Format?
Data is stored like files on a Unix system.
Inside the file, three basic object types (groups, datasets and attributes) are used to organize and store the data.
Opening a file puts you at the root directory.
Groups are like directories and can be used to collect related information.
Datasets are like the files in our system and store the vast majority of the data.
Attributes store individual pieces of information.
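The file-system analogy can be sketched with h5py. This is a minimal illustration, not part of the original slides: the file name, the group name "measurements", the dataset name "temperature" and the attribute "units" are all hypothetical. Every object has a path-like name, and the file object itself sits at the root, "/".

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, written to a scratch directory
path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.hdf5")

with h5py.File(path, "w") as f:
    root_name = f.name                      # the file object is the root, "/"
    grp = f.create_group("measurements")    # a group acts like a directory
    dset = grp.create_dataset("temperature", data=np.arange(5.0))  # a dataset acts like a file
    dset.attrs["units"] = "K"               # an attribute holds a small piece of metadata

    group_path = grp.name
    dataset_path = dset.name
    # The same dataset can be reached through its full path from the root
    values = f["/measurements/temperature"][...].tolist()
    units = dset.attrs["units"]
```

Here group_path and dataset_path hold the absolute paths of the two objects, which is what makes the directory comparison useful in practice.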
What was wrong with binary?
Binary data:
+ efficient storage
- lacks portability (endianness, IDL!)
- you need to know what is in it to read it
- not human readable
- read the whole file to read the last value
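The endianness problem can be seen with just four bytes. The following is a small sketch using Python's standard struct module; the value 1 and the variable names are illustrative and not from the slides. The same bytes decode to different integers depending on the byte order the reader assumes.

```python
import struct

value = 1

# Write the same 32-bit integer with two different byte orders
big = struct.pack(">i", value)      # big-endian bytes:    00 00 00 01
little = struct.pack("<i", value)   # little-endian bytes: 01 00 00 00

# Reading big-endian bytes as if they were little-endian gives the wrong number
misread = struct.unpack("<i", big)[0]

# Reading with the byte order that was used for writing recovers the value
correct = struct.unpack(">i", big)[0]

print(misread, correct)  # 16777216 1
```

A raw binary file carries no record of which byte order was used, so the reader has to know it in advance.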
What was wrong with ASCII?
ASCII data:
+ human readable
- poor storage efficiency
What's the catch?
HDF5:
+ efficient binary storage
+ portable format
+ printable structure
+ read any attribute or dataset independently
+ human readable output
- small overhead, a download, some learning
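Reading one value or one attribute independently might look like the sketch below. The file name, the dataset name "series" and the attribute "description" are hypothetical choices made for this illustration. Indexing the dataset selects only the requested element instead of loading the whole array into memory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, written to a scratch directory
path = os.path.join(tempfile.mkdtemp(), "partial_read_demo.hdf5")

# Write a dataset of 1000 integers plus one attribute
with h5py.File(path, "w") as f:
    dset = f.create_dataset("series", data=np.arange(1000, dtype="i"))
    dset.attrs["description"] = "integers 0..999"

# Reopen read-only and fetch just the last element and the attribute
with h5py.File(path, "r") as f:
    dset = f["series"]
    last_value = int(dset[dset.shape[0] - 1])   # selects a single element
    description = dset.attrs["description"]     # read without touching the data
```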
HDF5 example (Python) – Open file
This example creates an HDF5 file, builds its structure, and then looks at the file created.
First create a file. In Python everything now works off the file object, f:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
HDF5 ex. – Group creation
Groups are explicitly created via create_group:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
>>> grp = f.create_group("myfirstgroup")
HDF5 ex. – Dataset creation
Datasets can then be stored in the group:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
>>> grp = f.create_group("myfirstgroup")
>>> dset1 = grp.create_dataset("myfirstdataset", (50,), dtype='i')
HDF5 ex. – More creation
Groups do not need to be explicitly created; any missing group in a dataset path (here grp2) is created automatically:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
>>> grp = f.create_group("myfirstgroup")
>>> dset1 = grp.create_dataset("myfirstdataset", (50,), dtype='i')
>>> dset2 = f.create_dataset("grp2/dataset2", (50,), dtype='f')
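A quick way to check the implicit group creation is sketched here. It reuses the names grp2 and dataset2 from the slide, but writes to a hypothetical scratch file so it can run on its own.

```python
import os
import tempfile

import h5py

# Hypothetical file name, written to a scratch directory
path = os.path.join(tempfile.mkdtemp(), "implicit_group_demo.hdf5")

with h5py.File(path, "w") as f:
    # No create_group("grp2") call: the intermediate group comes from the path
    dset2 = f.create_dataset("grp2/dataset2", (50,), dtype="f")

    grp2_is_group = isinstance(f["grp2"], h5py.Group)
    parent_name = dset2.parent.name   # the group that contains the dataset
```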
HDF5 ex. – Attribute creation
Attributes work in a similar way; they are set through the attrs interface of a group or dataset:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
>>> grp = f.create_group("myfirstgroup")
>>> dset1 = grp.create_dataset("myfirstdataset", (50,), dtype='i')
>>> dset2 = f.create_dataset("grp2/dataset2", (50,), dtype='f')
>>> dset1.attrs['Number'] = 50
HDF5 ex. – Simple File
Closing the file writes everything to disk:
>>> import h5py
>>> import numpy as np
>>>
>>> f = h5py.File("mytestfile.hdf5", "w")
>>> grp = f.create_group("myfirstgroup")
>>> dset1 = grp.create_dataset("myfirstdataset", (50,), dtype='i')
>>> dset2 = f.create_dataset("grp2/dataset2", (50,), dtype='f')
>>> dset1.attrs['Number'] = 50
>>> f.close()
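Reading the file back could look like the following sketch. It rebuilds the same structure in a hypothetical scratch file so that it is self-contained, and it uses a with block, which closes the file automatically, in place of the explicit f.close() on the slide.

```python
import os
import tempfile

import h5py

# Hypothetical scratch location; the slides write mytestfile.hdf5 to the current directory
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")

# Rebuild the structure from the slides
with h5py.File(path, "w") as f:
    grp = f.create_group("myfirstgroup")
    dset1 = grp.create_dataset("myfirstdataset", (50,), dtype="i")
    dset2 = f.create_dataset("grp2/dataset2", (50,), dtype="f")
    dset1.attrs["Number"] = 50

# Reopen read-only and inspect what was stored
with h5py.File(path, "r") as f:
    top_level = sorted(f.keys())               # names directly under the root
    dset1 = f["myfirstgroup/myfirstdataset"]
    shape = dset1.shape
    number = int(dset1.attrs["Number"])
    values = dset1[...]                        # load the full dataset as a NumPy array
    all_zero = bool((values == 0).all())       # nothing was written, so the default fill value 0 is returned
```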
h5dump – Viewing your file
h5dump allows you to look at your file from the command line.
View the file structure:
> h5dump -n mytestfile.hdf5
Look at a dataset:
> h5dump -d /grp2/dataset2 mytestfile.hdf5
There are many more uses of h5dump . . .
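When h5dump is not at hand, a similar listing of the file structure can be produced from Python. The sketch below is a rough analogue of h5dump -n and does not reproduce its output format. It rebuilds the example file in a hypothetical scratch location, and the helper name record is made up for this illustration.

```python
import os
import tempfile

import h5py

# Hypothetical scratch location for a copy of the example file
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")

with h5py.File(path, "w") as f:
    grp = f.create_group("myfirstgroup")
    grp.create_dataset("myfirstdataset", (50,), dtype="i")
    f.create_dataset("grp2/dataset2", (50,), dtype="f")

entries = []

def record(name, obj):
    # visititems passes each object's path relative to the root, without a leading "/"
    kind = "group" if isinstance(obj, h5py.Group) else "dataset"
    entries.append((kind, "/" + name))

with h5py.File(path, "r") as f:
    f.visititems(record)   # walks every group and dataset below the root

listing = sorted(entries)
for kind, name in listing:
    print(kind, name)
```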