Using Compression Filters in HDF5
HDF5's new external filter interface in action
Euge Wintersberger
ICALEPCS 2017, 8.10.2017
Motivation
Applying different compression algorithms to individual datasets is one of the key features of HDF5.
    Apply compression only where feasible
    Other data can be read and written without any performance penalty
    We can pick the optimum algorithm for each dataset
Performance key figures for a compression algorithm:
    Throughput (Mbyte/sec)
    Compression ratio
Both depend on the nature of the data passed to the algorithm!
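To illustrate how both key figures depend on the data, here is a minimal sketch that measures compression ratio and throughput for two inputs of equal size. It uses zlib from the Python standard library only as a stand-in; the helper name `benchmark` and the two test inputs are assumptions made for this illustration, not part of the filter plugins.

```python
import os
import time
import zlib


def benchmark(data, level=6):
    """Return (compression ratio, throughput in MB/s) for zlib on `data`."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    # Guard against a zero reading on very coarse timers.
    throughput = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


# Two inputs of identical size but very different structure.
regular = bytes(range(256)) * 3906   # highly repetitive pattern
noisy = os.urandom(len(regular))     # random bytes, essentially incompressible

for label, data in (("regular", regular), ("noisy", noisy)):
    ratio, mb_per_s = benchmark(data)
    print(f"{label:8s} ratio={ratio:8.2f} throughput={mb_per_s:8.1f} MB/s")
```

The absolute numbers depend on the machine; the point is only that the same algorithm yields very different ratios on the two inputs.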
The situation before HDF5 1.8.11
Custom filter algorithms could be used for reading and writing

    #include <hdf5.h>

    #define H5Z_FILTER_BZIP2 305

    /* declare a filter function */
    size_t H5Z_filter_bzip2(unsigned flags, size_t cd_nelmts,
                            const unsigned cd_values[], size_t nbytes,
                            size_t *buf_size, void **buf);

    const H5Z_class2_t H5Z_BZIP2[1] = {{
        H5Z_CLASS_T_VERS,                /* H5Z_class_t version */
        (H5Z_filter_t)H5Z_FILTER_BZIP2,  /* Filter id number */
        1,                               /* encoder_present flag (set to true) */
        1,                               /* decoder_present flag (set to true) */
        "bzip2",                         /* Filter name for debugging */
        NULL,                            /* The "can apply" callback */
        NULL,                            /* The "set local" callback */
        (H5Z_func_t)H5Z_filter_bzip2,    /* The actual filter function */
    }};

    /* somewhere in the code */
    status = H5Zregister(H5Z_BZIP2);

Two issues:
    Need to change the source code
    Not possible for commercial applications!
Currently used: Eiger detector, PyTables, h5py
New approach since HDF5 1.8.11
[Diagram: the application passes a filter ID to the HDF5 library, which loads the matching plugin (libLZ4.so, libbitshuffle.so, libBZ2.so) from the directories given by HDF5_PLUGIN_PATH=...]
The library looks for the appropriate filter by itself using the ID of the filter!
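To check what the library could find, the following sketch lists the shared libraries in the directories named by HDF5_PLUGIN_PATH. It only inspects the file system and does not reproduce the library's own lookup by filter ID. The helper name `plugin_candidates`, the suffix list, and the fallback directory (taken here to be the library's Unix default) are assumptions for illustration.

```python
import os
from pathlib import Path

# Assumption: Unix default used when HDF5_PLUGIN_PATH is not set.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"
# Assumption: file suffixes of shared libraries on Linux, macOS and Windows.
PLUGIN_SUFFIXES = (".so", ".dylib", ".dll")


def plugin_candidates(environ=os.environ):
    """List shared libraries in the directories named by HDF5_PLUGIN_PATH."""
    raw = environ.get("HDF5_PLUGIN_PATH", DEFAULT_PLUGIN_DIR)
    found = []
    # Several directories may be given, separated like entries in PATH.
    for entry in raw.split(os.pathsep):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue  # skip empty entries and missing directories
        found.extend(sorted(p for p in directory.iterdir()
                            if p.suffix in PLUGIN_SUFFIXES))
    return found


for plugin in plugin_candidates():
    print(plugin)
```

An empty listing usually means the variable points to the wrong directory or the plugins are not installed.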
Where to get the filter plugins?
Supported platforms: Windows, Linux, macOS
Installing the filters – on Windows
Installing the filters – on Linux (Debian)
Add repository key and sources list
    $ wget -q -O - http://repos.pni-hdri.de/debian_repo.pub.gpg | apt-key add -
    $ cd /etc/apt/sources.list.d
    $ wget http://repos.pni-hdri.de/jessie-pni-hdri.list
Install the package
    $ apt-get update
    $ apt-get install hdf5-plugin-lz4
Installing the filters – on Linux (Ubuntu)
Supported versions:
    Ubuntu 14.04 (Trusty Tahr)
    Ubuntu 16.04 (Xenial Xerus)
Installing the filters – on macOS
Installing the dependencies
    $ brew install cmake
    $ brew install git
    $ brew install hdf5
    $ brew install lz4
Build the code
    $ git clone https://github.com/nexusformat/HDF5-External-Filter-Plugins.git
    $ cd HDF5-External-Filter-Plugins
    $ git checkout new_build
    $ cmake -DENABLE_LZ4_PLUGIN=ON -DENABLE_BITSHUFFLE_PLUGIN=ON \
        -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/opt/hdf5 .
    $ make
    $ make test
Make installation available
    $ make install
Using the filter plugins (from Python)
Reading – there is nothing you have to do
Writing

    import h5py

    f = h5py.File("bitshuffle_file.h5", "w")
    filter_id = 32008  # ID of the bitshuffle filter

    # compression_opts=(0, 2): default block size, LZ4 applied after bitshuffle
    d1 = f.create_dataset("with_lz4", (100, 100), compression=filter_id,
                          compression_opts=(0, 2))
    # no options: bitshuffle only
    d2 = f.create_dataset("without_lz4", (100, 100), compression=filter_id)
    f.close()

No additional packages need to be imported
You need to know
    The filter's ID
    The compression options accepted by the filter
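Since the filter ID and its options are the two things the user has to know, one option is to keep them in a small lookup table. The sketch below builds the keyword arguments for `create_dataset` from a filter name. The helper `filter_kwargs` is hypothetical, and the IDs are the values from the HDF Group filter registry as far as known here (307 for BZIP2, 32004 for LZ4, 32008 for bitshuffle), not values taken from the slides.

```python
# Assumption: registered filter IDs; verify against the HDF Group registry.
FILTER_IDS = {"bzip2": 307, "lz4": 32004, "bitshuffle": 32008}


def filter_kwargs(name, opts=None):
    """Build the compression keyword arguments for h5py's create_dataset."""
    if name not in FILTER_IDS:
        raise ValueError(f"unknown filter: {name!r}")
    kwargs = {"compression": FILTER_IDS[name]}
    if opts is not None:
        # Options are passed through unchanged; their meaning is filter specific.
        kwargs["compression_opts"] = tuple(opts)
    return kwargs


print(filter_kwargs("bitshuffle", (0, 2)))
print(filter_kwargs("lz4"))
```

With h5py this would be used as f.create_dataset("with_lz4", (100, 100), **filter_kwargs("bitshuffle", (0, 2))), which matches the first dataset in the writing example.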
Current status
Included filters: BZIP2, LZ4, LZ4+bitshuffle
Installation packages for: Windows (VS2015), Linux (Debian, Ubuntu)
Simplified build for Windows using Conan
Todos
    Create GitHub pages
    Update the documentation
    Review of the LZ4 API calls for the new LZ4 1.4 version
    BLOSC filter is still missing
    Installation packages for
        macOS
        RPM-based Linux distributions (RedHat, CentOS, …)
    Update Debian packages
Thank you for your attention! Questions?