Introduction to HDF5
HDF & HDF-EOS Workshop XII, October 15, 2008

Topics Covered
- Introduce HDF5
- Describe HDF5 Data and Programming Models
- Walk Through Example Code

For More Information …
- All workshop slides will be available from:

What is HDF5?
- HDF = Hierarchical Data Format
- Data model, library and file format for managing data
- Tools for accessing data in the HDF5 format

Brief History of HDF
- 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format). This became HDF.
- Early 1990s: NASA adopted HDF for the Earth Observing System project.
- 1996: DOE's ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create "Big HDF". The increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files. "Big HDF" became HDF5.
- HDF5 was released with support from the National Labs, NASA and NCSA.
- 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation.

Why HDF5? In one sentence...

Answering big questions …
- Matter and the universe
- Weather and climate [figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]
- Life and nature

… involves big data …

… varied data … [figure: thanks to Mark Miller, LLNL]

… and complex relationships … [figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, Trace, SNP Score]

… on big computers … and small computers …

How do we…
- Describe our data?
- Read it? Store it? Find it? Share it? Mine it?
- Move it into, out of, and between computers and repositories?
- Achieve storage and I/O efficiency?
- Give applications and tools easy access to our data?

Solution: HDF5!
- Can store all kinds of data in a variety of ways
- Runs on most systems
- Lots of tools to access data
- Emphasis on standards (HDF-EOS, CGNS)
- Library and format emphasize I/O efficiency and storage

Structure of HDF5 Library (layers, top to bottom)
- Applications
- Object API (C, F90, C++, Java)
- Library internals
- Virtual file I/O
- File or other "storage"

HDF Tools
- HDFView and Java Products
- Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)

HDF5 Applications & Domains (layers, top to bottom)
- Applications: simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
- Communities: HDF-EOS, CGNS, ASC
- HDF5 Data Model & API
- Virtual File Layer (I/O Drivers): Stdio, Custom, Split Files, MPI I/O
- Storage (HDF5 format): File, Split metadata and raw data files, File on parallel file system, User-defined device ?

Lots of Layers in HDF5!
- "Ogres are like onions." (Shrek)
- HDF5 Monster??
- Just like Shrek, once you get to know HDF5 you will really like it!!

The HDF5 Format

An HDF5 file is a container… into which you can put your data objects.
[figure: a container holding a table with columns lat, lon, temp, and a palette]

HDF5 Structures for Organizing Objects
[figure: root group "/" and group "foo" holding raster images, a palette, a 2-D array, a 3-D array, and a table with columns lat, lon, temp]

HDF5 Data Model
- Primary Objects
  - Groups
  - Datasets
- Additional ways to organize and annotate data
  - Attributes
  - Storage and access properties
- Everything else is built from these parts.

HDF5 Dataset
- Data
- Metadata
  - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  - Datatype: Integer
  - Attributes: Time = 32.4, Pressure = 987, Temp = 56
  - Storage info: Chunked, Compressed

Dataspaces
Two roles:
- A dataspace contains spatial info about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of dataset definition
- Partial I/O: a dataspace describes the application's data buffer and the data elements participating in I/O
[figure: Rank = 2, Dimensions = 4x6; Rank = 1, Dimension = 10]

Write: from memory to disk
[figure: data moving from memory to disk]

Partial I/O
- Move just part of a dataset
- (a) Slab from a 2D array to the corner of a smaller 2D array
- (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
- The number of elements selected in memory and on disk must be the same.

Datatypes (array elements)
- Datatype: how to interpret a data element
- Permanent part of the dataset definition
- Two classes: atomic and compound

Datatypes
- HDF5 atomic types include:
  - integer & float
  - user-definable (e.g., 13-bit integer)
  - variable length types (e.g., strings)
  - references to objects/dataset regions
  - enumeration: names mapped to integers
- HDF5 compound types
  - Comparable to C structs ("records")
  - Members can be atomic or compound types

HDF5 dataset: array of records
- Dimensionality: 5 x 3
- Datatype: record with members int8, int4, int16, and a 2x3x2 array of float32

Properties
- Properties are characteristics of HDF5 objects that can be modified
- Default properties handle most needs
- Changing properties lets you take advantage of the more powerful features in HDF5

Special Storage Properties
- chunked: Better subsetting access time; extensible
- compressed: Improves storage efficiency, transmission speed
- extensible: Arrays can be extended in any direction
- split file: Metadata in one file, raw data in another
[figure: Dataset "Fred" with metadata for Fred in File A and data for Fred in File B]

Attributes (optional)
- Attribute: data of the form "name = value", attached to an object
- Operations similar to dataset operations, but …
  - Not extensible
  - No compression or partial I/O
- Can be overwritten, deleted, added during the "life" of a dataset

HDF5 Dataset (again)
- Data
- Metadata
  - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  - Datatype: Integer
  - Attributes: Time = 32.4, Pressure = 987, Temp = 56
  - Storage info: Chunked, Compressed

Groups
- A mechanism for organizing collections
- Every file starts with a root group
- Similar to UNIX directories
- Can have attributes
[figure: root group "/" with groups A, B, C and objects k, l, m]

Path to HDF5 Object in a File
- / (root)
- /x
- /foo
- /foo/temp
- /foo/bar/temp
[figure: root group "/" containing x and foo; foo contains temp and bar; bar contains temp]

Shared Objects
- /A/P
- /B/R
- /C/P
[figure: root group "/" with groups A, B, C; object P is reachable from both A and C, object R from B]

Questions So Far?

Useful Tools For New Users
- h5dump: Tool to "dump" or display contents of HDF5 files
- h5cc, h5c++, h5fc: Scripts to compile applications
- HDFView: Java browser to view HDF4 and HDF5 files

H5dump Command-line Utility To View HDF5 File
    h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>
- --header: Display header only; no data is displayed.
- -a <names>: Display the specified attribute(s).
- -d <names>: Display the specified dataset(s).
- -g <names>: Display the specified group(s) and all the members.
- -l <names>: Display the value(s) of the specified soft link(s).
- -t <names>: Display the specified named datatype(s).
- -p: Display properties.
<names> is one or more appropriate object names.

Example of h5dump Output
[figure: root group "/" containing dataset 'dset']
    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }

HDF5 Compile Scripts
- h5cc: HDF5 C compiler command
- h5fc: HDF5 F90 compiler command
- h5c++: HDF5 C++ compiler command
To compile:
    % h5cc h5prog.c
    % h5fc h5prog.f90

Compile option: -show
-show: displays the compiler commands and options without executing them
    % h5cc -show Sample_c.c
    gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
    gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib

Browsing HDF5 Files with HDFView

HDFView
[screenshot: structure of file on the left, contents of dataset on the right]

HDFView File Menu
[screenshot]

Simple HDF5 File in HDFView
- Right-click and select "Open" with mouse
- Right-click and select "Show Properties" with mouse
[screenshots]

HDF-EOS5 File in HDFView
- Right-click and select "Open As" with mouse
[screenshots]

What you can't see with slides:
- Picture displayed instantly
- File size is 906,229,176

Introduction to HDF5 Programming Model and APIs

Operations Supported by the API
- Create objects (groups, datasets, attributes, complex data types, …)
- Assign storage and I/O properties to objects
- Perform complex subsetting during read/write
- Use variety of I/O "devices" (parallel, remote, etc.)
- Transform data during I/O
- Make inquiries on file and object structure, content, properties

General Programming Paradigm
- Properties of object are optionally defined
  - Creation properties
  - Access property lists
- Object is opened or created
- Object is accessed, possibly many times
- Object is closed

Order of Operations
- An order is imposed on operations by argument dependencies
- For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
- Objects can be closed in any order.

The General HDF5 API
- Currently C, Fortran 90, Java, and C++ bindings.
- C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
- Example functions:
  - H5D: Dataset interface, e.g., H5Dread
  - H5F: File interface, e.g., H5Fopen
  - H5S: dataSpace interface, e.g., H5Sclose

HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
- hid_t: object identifiers (native integer)
- hsize_t: size used for dimensions (unsigned long or unsigned long long)
- hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)
- herr_t: function return value
- hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.

The HDF5 API
- For flexibility, the API is extensive: 300+ functions
- This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed
[figure: Victorinox Swiss Army CyberTool 34]

Basic Functions
    H5Fcreate (H5Fopen)     create (open) File
    H5Screate_simple        create dataSpace
    H5Dcreate (H5Dopen)     create (open) Dataset
    H5Dread, H5Dwrite       access Dataset
    H5Dclose                close Dataset
    H5Sclose                close dataSpace
    H5Fclose                close File

Other Common Functions
- DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O)
- Groups: H5Gcreate, H5Gopen, H5Gclose
- Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
- Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

High Level APIs
- Included along with the HDF5 library
- Simplify steps for creating, writing, and reading objects
- Do not entirely "wrap" HDF5 library

Example HDF5 Code

Steps to Create a File
1. Decide on special properties the file should have
   - Creation properties, like size of user block
   - Access properties, such as metadata cache size
   - Or use default properties (H5P_DEFAULT)
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed

Code: Create a File
    hid_t  file_id;
    herr_t status;

    file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status = H5Fclose (file_id);
Note: Return codes not checked for errors in code samples.
[figure: file containing only the root group "/"]

Dataset Components
- Data
- Metadata
  - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  - Datatype: Integer
  - Attributes: Time = 32.4, Pressure = 987, Temp = 56
  - Storage info: Chunked, Compressed

Steps to Create a Dataset
1. Define dataset characteristics
   - Dataspace: 4x6
   - Datatype: integer
   - Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
   - Obtain location ID:
     - Group ID puts it in a Group
     - File ID puts it in Root Group
3. Create dataset in file
4. Close everything
[figure: root group "/" containing dataset A]

HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of Datatype Identifiers per HDF5 session. For example:
    C Type    HDF5 File Type                    HDF5 Memory Type
    int       H5T_STD_I32BE, H5T_STD_I32LE      H5T_NATIVE_INT
    float     H5T_IEEE_F32BE, H5T_IEEE_F32LE    H5T_NATIVE_FLOAT
    double    H5T_IEEE_F64BE, H5T_IEEE_F64LE    H5T_NATIVE_DOUBLE
* Value of datatype is NOT fixed

Pre-defined File Datatype Identifiers
Names have the form H5T_<Architecture*>_<Programming Type>. Examples:
- H5T_IEEE_F64LE: Eight-byte, little-endian, IEEE floating-point
- H5T_STD_I32LE: Four-byte, little-endian, signed two's complement integer
NOTE: What you see in the file. Name is the same everywhere and explicitly defines a datatype.
* STD = "An architecture with a semi-standard type like 2's complement integer, unsigned integer…"

Pre-defined Native Datatypes
Examples of predefined native types in C:
- H5T_NATIVE_INT (int)
- H5T_NATIVE_FLOAT (float)
- H5T_NATIVE_UINT (unsigned int)
- H5T_NATIVE_LONG (long)
- H5T_NATIVE_CHAR (char)
NOTE: Memory types. Different for each machine. Used for reading/writing.

Dataset Creation Property List
Dataset creation property list: information on how to organize data in storage.
- H5P_DEFAULT: contiguous
- Chunked
- Chunked & compressed

Code: Create a Dataset
    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Create a dataspace: rank, current dims */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple (2, dims, NULL);

    /* Create a dataset: pathname, datatype, dataspace, property list (default).
       This five-argument form is the HDF5 1.6 API; in 1.8 and later it is
       available as H5Dcreate1. */
    dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

    /* Terminate access to dataset, dataspace, file */
    status = H5Dclose (dataset_id);
    status = H5Sclose (dataspace_id);
    status = H5Fclose (file_id);

Example Code: H5Dwrite
    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
Arguments, in order:
- dataset_id: Dataset identifier from H5Dcreate or H5Dopen
- H5T_NATIVE_INT: Memory datatype
- H5S_ALL (first): Memory dataspace
- H5S_ALL (second): File dataspace
- H5P_DEFAULT: Data transfer property list (MPI I/O, transformations, …)
- dset_data: the application's data buffer
H5S_ALL selects the entire dataspace.

Partial I/O
- Replace H5S_ALL with a memory dataspace and a file dataspace (disk) that carry selections
- Get a dataspace: H5Screate_simple, H5Dget_space
- Modify a dataspace: H5Sselect_hyperslab, H5Sselect_elements

Example Code: H5Dread
    status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);

High Level APIs: HDF5 Lite (H5LT)
    #include "H5LT.h"
    …
    file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* H5LTmake_dataset takes one datatype, used both for the dataset in the
       file and for the buffer, so it must describe `data` (an int array). */
    status = H5LTmake_dataset (file_id, "A", 2, dims, H5T_NATIVE_INT, data);

    status = H5Fclose (file_id);

High Level APIs
- HDF5 Lite
- HDF5 Image
- HDF5 Table
- HDF5 Dimension Scales
- HDF5 Packet Table

Example: Create a Group
[figure: file.h5 with root group "/" containing dataset A (4x6 array of integers) and group B]

Steps to Create a Group
1. Decide where to put it: "root group"
   - Obtain location ID
2. Decide name: "B"
3. Create group in file
4. (Eventually) close the group.

Code: Create a Group
    hid_t  file_id, group_id;
    herr_t status;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen ("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. The last argument is a size hint for the
       number of bytes to store names of objects; 0 = default.
       This three-argument form is the HDF5 1.6 API; in 1.8 and later it is
       available as H5Gcreate1. */
    group_id = H5Gcreate (file_id, "B", 0);

    /* Close group and file. */
    status = H5Gclose (group_id);
    status = H5Fclose (file_id);

Thank you!
This work was supported by the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNX06AC83A and NNX08A077A. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.