Introduction to HDF5
Barbara Jones, The HDF Group
The 13th HDF & HDF-EOS Workshop (HDF/HDF-EOS Workshop XIII), November 3-5, 2009
Before We Begin …
- HDF-EOS Home Page:
- Workshop Info:
- The HDF Group Page:
- HDF5 Home Page:
- HDF Helpdesk:
- HDF Mailing Lists:
HDF = Hierarchical Data Format
- HDF5 is the second HDF format
  - Development started in 1996
  - First release was in 1998
- HDF4 is the first HDF format
  - Originally called HDF
  - Development started in 1987
  - Still supported by The HDF Group
HDF5 is like … [image]
HDF5 is designed …
- for high volume and/or complex data
- for every size and type of system (portable)
- for flexible, efficient storage and I/O
- to enable applications to evolve in their use of HDF5 and to accommodate new models
- to support long-term data preservation
HDF5 Technology
HDF5 is a data model, library and file format for managing data.
HDF5 Technology
- HDF5 (Abstract) Data Model
  - Defines the “building blocks” for data organization and specification
  - Files, Groups, Datasets, Attributes, Datatypes, Dataspaces, …
- HDF5 Library (C, Fortran 90, C++ APIs)
  - Also Java Language Interface and High Level Libraries
- HDF5 Binary File Format
  - Bit-level organization of HDF5 file
  - Defined by HDF5 File Format Specification
- Tools For Accessing Data in HDF5 Format
  - h5dump, h5repack, HDFView, …
HDF5 Abstract Data Model
(a.k.a. HDF5 Logical Data Model, a.k.a. HDF5 Data Model)
HDF5 File
An HDF5 file is a container that holds data objects.
[Figure: a file containing a table with columns lat, lon, temp and a block of Experiment Notes (Serial Number; Date: 3/13/09; Configuration: Standard 3)]
HDF5 Groups and Links
HDF5 groups and links organize data objects.
[Figure: the same tables and Experiment Notes arranged under the root group "/" and the groups SimOut and Viz]
HDF5 Objects
The two primary HDF5 objects are:
- HDF5 Group: A grouping structure containing zero or more HDF5 objects
- HDF5 Dataset: Raw data elements, together with information that describes them
(There are other HDF5 objects that help support Groups and Datasets.)
HDF5 Groups
- Used to organize collections
- Every file starts with a root group
- Similar to UNIX directories
- Path to object defines it
- Objects can be shared: /A/k and /B/l are the same
[Figure: root group "/" containing A, B, C, k, l, and temp; legend: Group, Dataset]
HDF5 Datasets
HDF5 Datasets organize and contain your “raw data values”. They consist of:
- Your raw data
- Metadata describing the data:
  - The information to interpret the data (Datatype)
  - The information to describe the logical layout of the data elements (Dataspace)
  - Characteristics of the data (Properties)
  - Additional optional information that describes the data (Attributes)
HDF5 Dataset
- Data
- Metadata
  - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  - Datatype: Integer
  - Properties: Chunked, Compressed
  - Attributes (optional): Time = 32.4, Pressure = 987, Temp = 56
HDF5 Dataspaces
An HDF5 Dataspace describes the logical layout for the data elements:
- Array
  - multiple elements in dataset organized in a multi-dimensional (rectangular) array
  - maximum number of elements in each dimension may be fixed or unlimited
- NULL
  - no elements in dataset
- Scalar
  - single element in dataset
HDF5 Dataspaces
Two roles:
- Dataspace contains spatial information (logical layout) about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of dataset definition
- Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O
[Figure: two dataspaces, one with Rank = 2, Dimensions = 4x6, and one with Rank = 1, Dimension = 10]
HDF5 Datatypes
The HDF5 datatype describes how to interpret individual data elements.
HDF5 datatypes include:
- integer, float, unsigned, bitfield, …
- user-definable (e.g., 13-bit integer)
- variable length types (e.g., strings)
- references to objects/dataset regions
- enumerations: names mapped to integers
- opaque
- compound (similar to C structs)
HDF5 Dataset with Compound Datatype
- Compound Datatype: int8, int4, int16, 2x3x2 array of float32
- Dataspace: Rank = 2, Dimensions = 5 x
[Figure: grid of elements, each marked V and holding one compound value]
HDF5 Dataset
- Datatype: 16-byte integer
- Dataspace: Rank = 2, Dimensions = 5 x 3
[Figure: 5 x 3 grid of elements, one marked V]
HDF5 Properties
- Properties (also known as Property Lists) are characteristics of HDF5 objects that can be modified
- Default properties handle most needs
- By changing properties one can take advantage of the more powerful features in HDF5
Storage Properties
- Contiguous (default): Data elements stored physically adjacent to each other
- Chunked: Better access time for subsets; extensible
- Chunked & Compressed: Improves storage efficiency, transmission speed
HDF5 Attributes (optional)
- An HDF5 attribute has a name and a value
- Attributes typically contain user metadata
- Attributes may be associated with
  - HDF5 groups
  - HDF5 datasets
  - HDF5 named datatypes
- An attribute’s value is described by a datatype and a dataspace
- Attributes are analogous to datasets except…
  - they are NOT extensible
  - they do NOT support compression or partial I/O
HDF5 Abstract Data Model Summary
- The Objects in the Data Model are the “building blocks” for data organization and specification
- Files, Groups, Links, Datasets, Datatypes, Dataspaces, Attributes, …
- Projects using HDF5 “map” their data concepts to these HDF5 Objects
HDF5 Software
HDF5 Software Layers & Storage
- Tools: h5dump tool, h5repack tool, HDFview tool, High Level APIs, Java Interface, …
- API (HDF5 Library)
  - Language Interfaces: C, Fortran, C++
  - HDF5 Data Model Objects: Groups, Datasets, Attributes, …
  - Tunable Properties: Chunk Size, I/O Driver, …
- Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
- Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
- Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other
HDF5 API and Applications
[Figure: layers, top to bottom: Applications (a Climate Model, MATLAB, EOS library, …); Domain Data Objects; HDF5 Library; Storage]
HDF5 Home Page
- HDF5 home page:
  - Two releases: HDF5 1.8 and HDF5 1.6
- HDF5 source code:
  - Written in C, and includes optional C++, Fortran 90 APIs, and High Level APIs
  - Contains command-line utilities (h5dump, h5repack, h5diff, ...) and compile scripts
- HDF pre-built binaries:
  - When possible, include C, C++, F90, and High Level libraries. Check the ./lib/libhdf5.settings file.
  - Built with and require the SZIP and ZLIB external libraries
Useful Tools For New Users
- h5dump: Tool to “dump” or display contents of HDF5 files
- h5cc, h5c++, h5fc: Scripts to compile applications
- HDFView: Java browser to view HDF4 and HDF5 files
h5dump Utility

    h5dump [options] [file]

    -H, --header   Display header only, no data
    -d <names>     Display the specified dataset(s).
    -g <names>     Display the specified group(s) and all members.
    -p             Display properties.

<names> is one or more appropriate object names.
Example of h5dump Output

    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }

[Figure: root group "/" containing the dataset 'dset']
HDF5 Compile Scripts
- h5cc: HDF5 C compiler command
- h5fc: HDF5 F90 compiler command
- h5c++: HDF5 C++ compiler command

To compile:

    % h5cc h5prog.c
    % h5fc h5prog.f90
Compile option: -show
-show: displays the compiler commands and options without executing them

    % h5cc -show Sample_c.c

- Will show the correct paths and libraries used by the installed HDF5 library.
- Will show the correct flags to specify when building an application with that HDF5 library.
Browsing HDF5 Files with HDFView
HDFView
[Screenshot: Structure of File; Contents of Dataset]
HDFView File Menu
[Screenshot]
HDF-EOS5 File in HDFView
[Screenshot]
Introduction to HDF5 Programming Model and APIs
Operations Supported by the API
- Create objects (groups, datasets, attributes, complex data types, …)
- Assign storage and I/O properties to objects
- Perform complex subsetting during read/write
- Use variety of I/O “devices” (parallel, remote, etc.)
- Transform data during I/O
- Make inquiries on file and object structure, content, properties
General Programming Paradigm
- Properties of object are optionally defined
  - Creation properties
  - Access properties
- Object is opened or created
- Object is accessed, possibly many times
- Object is closed
Order of Operations
- An order is imposed on operations by argument dependencies
  - For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
- Objects can be closed in any order.
The General HDF5 API
- Currently C, Fortran 90, Java, and C++ bindings.
- C routines begin with prefix H5?
  - ? is a character corresponding to the type of object the function acts on
Example Functions:
- H5D: Dataset interface, e.g., H5Dread
- H5F: File interface, e.g., H5Fopen
- H5S: dataSpace interface, e.g., H5Sclose
HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
- hid_t: object identifiers (native integer)
- hsize_t: size used for dimensions (unsigned long or unsigned long long)
- herr_t: function return value
- hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.
The HDF5 API
- For flexibility, the API is extensive
  - 300+ functions
- This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed
[Image: Victorinox Swiss Army CyberTool 34]
Basic Functions

    H5Fcreate (H5Fopen)           create (open) File
    H5Screate_simple/H5Screate    create dataSpace
    H5Dcreate (H5Dopen)           create (open) Dataset
    H5Dread, H5Dwrite             access Dataset
    H5Dclose                      close Dataset
    H5Sclose                      close dataSpace
    H5Fclose                      close File

NOTE: The order specified above is not required.
Other Common Functions
- DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
- Groups: H5Gcreate, H5Gopen, H5Gclose
- Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
- Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate
High Level APIs
- Included along with the HDF5 library
- Simplify steps for creating, writing, and reading objects.
- Do not entirely ‘wrap’ HDF5 library
Example HDF5 Code
Steps to Create a File
1. Decide on properties the file should have and create them if necessary:
   - Creation properties, like size of user block
   - Access properties (improve performance)
   - Use default properties (H5P_DEFAULT)
2. Create the file
3. Close the file and the property lists, as needed
Code: Create a File

    hid_t  file_id;
    herr_t status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status  = H5Fclose(file_id);

Note: Return codes are not checked for errors in the code samples.
[Figure: file.h5 containing the root group "/"]
Dataset Components
- Data
- Metadata
  - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  - Datatype: Integer
  - Properties: Chunked, Compressed
Steps to Create a Dataset
1. Define dataset characteristics
   a) Datatype: integer
   b) Dataspace: 4x6
   c) Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
   - Obtain location ID:
     - Group ID puts it in a Group
     - File ID puts it in Root Group
3. Create dataset in file
4. Close everything
[Figure: dataset A under the root group "/"]
HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of Datatype Identifiers per HDF5 session. For example:

    C Type    HDF5 File Type                    HDF5 Memory Type
    int       H5T_STD_I32BE, H5T_STD_I32LE      H5T_NATIVE_INT
    float     H5T_IEEE_F32BE, H5T_IEEE_F32LE    H5T_NATIVE_FLOAT
    double    H5T_IEEE_F64BE, H5T_IEEE_F64LE    H5T_NATIVE_DOUBLE

* Value of datatype is NOT fixed
Pre-defined File Datatype Identifiers
Examples:
- H5T_IEEE_F64LE: Eight-byte, little-endian, IEEE floating-point
- H5T_STD_I32LE: Four-byte, little-endian, signed two's complement integer
Naming: H5T_<Architecture*>_<Programming Type>
NOTE: What you see in the file. Name is the same everywhere and explicitly defines a datatype.
* STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
Pre-defined Native Datatypes
Examples of predefined native types in C:
- H5T_NATIVE_INT (int)
- H5T_NATIVE_FLOAT (float)
- H5T_NATIVE_UINT (unsigned int)
- H5T_NATIVE_LONG (long)
- H5T_NATIVE_CHAR (char)
NOTE: Memory types. Different for each machine. Used for reading/writing.
Dataset Creation Property List
Dataset creation property list: information on how to organize data in storage.
- H5P_DEFAULT: contiguous
- Chunked: Better access time for subsets; extensible
- Chunked & Compressed: Improves storage efficiency, transmission speed
Link Creation/Dataset Access Properties
- Link Creation: Creating intermediate groups
- Dataset Access: Retrieve the raw data chunk cache parameters
Code: Create a Dataset

    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Define a dataspace: rank and current dims */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);

    /* Arguments: where to put it, name, datatype, size & shape,
       properties (link creation, dataset creation, dataset access) */
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Terminate access to dataset, dataspace, file */
    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);
Example Code - H5Dwrite

    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

- dataset_id: Dataset ID from H5Dcreate/H5Dopen
- H5T_NATIVE_INT: Memory Datatype
Partial I/O

    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

- First H5S_ALL: Memory Dataspace
- Second H5S_ALL: File Dataspace (disk)
- To Modify Dataspace: H5Sselect_hyperslab, H5Sselect_elements
Example Code - H5Dwrite

    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

- H5P_DEFAULT: Data Transfer Property List (MPI I/O, Transformations, …)
Example Code - H5Dread

    status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
High Level APIs: HDF5 Lite (H5LT)

    #include "hdf5_hl.h"
    ...
    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status  = H5LTmake_dataset(file_id, "A", 2, dims, H5T_STD_I32BE, data);
    status  = H5Fclose(file_id);
High Level APIs
- HDF5 Lite
- HDF5 Image
- HDF5 Table
- HDF5 Dimension Scales
- HDF5 Packet Table
Steps to Create a Group
1. Decide where to put it: “root group”
   - Obtain location ID
2. Define properties or use H5P_DEFAULT
3. Create group in file.
4. Close the group.
Example: Create a Group
[Figure: file.h5 with root group "/" containing A (4x6 array of integers) and B]
Group Properties
- Link Creation
  - Creating intermediate groups
- Group Creation
  - Creation order tracking and indexing for links in a group.
  - Set number of links and length of link names in a group.
- Group Access (not used)
Code: Create a Group

    hid_t file_id, group_id;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. */
    group_id = H5Gcreate(file_id, "B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);
HDF5 Tutorial and Examples
- HDF5 Tutorial:
- HDF5 Example Code:
Thank You!
Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Questions/comments?