Download presentation
Presentation is loading. Please wait.
Published bySamuel Burke Modified over 9 years ago
1
April 28, 2008LCI Tutorial1 HDF5 Tutorial LCI April 28, 2008
2
LCI Tutorial2 Outline Why HDF5? Introduction to HDF5 data and programming models HDF5 tools and utilities HDF5 advanced topics Introduction to parallel HDF5 HDF5 features that affect performance (or caching and buffering in HDF5)
3
April 28, 2008LCI Tutorial3 Why HDF5?
4
April 28, 2008LCI Tutorial4 Matter & the universe Weather and climate August 24, 2001 August 24, 2002 Total Column Ozone (Dobson) 60 385 610 Life and nature Answering big questions …
5
April 28, 2008LCI Tutorial5 … involves big data …
6
April 28, 2008LCI Tutorial6 … varied data … Thanks to Mark Miller, LLNL
7
April 28, 2008LCI Tutorial7 … and complex relationships … Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match Trace SNP Score
8
April 28, 2008LCI Tutorial8 … on big computers …
9
April 28, 2008LCI Tutorial9 … and on little computers …
10
April 28, 2008LCI Tutorial10 How do we… Describe our data? Read it? Store it? Find it? Share it? Mine it? Move it into, out of, and between computers and repositories? Achieve storage and I/O efficiency? Give applications and tools easy access our data?
11
April 28, 2008LCI Tutorial11 HDF started right here at NCSA
12
April 28, 2008LCI Tutorial12 HDF solution I/O software & tools Common Data models Standard APIs Scientific data file format Efficient storage, I/O
13
April 28, 2008LCI Tutorial13 The HDF5 Format
14
April 28, 2008LCI Tutorial14 An HDF5 file is a container… lat | lon | temp ----|-----|----- 12 | 23 | 3.1 15 | 24 | 4.2 17 | 21 | 3.6 palette …into which you can put your data objects.
15
April 28, 2008LCI Tutorial15 HDF5 structures for organizing objectspalette Raster image 3-D array 2-D array Raster image lat | lon | temp ----|-----|----- 12 | 23 | 3.1 12 | 23 | 3.1 15 | 24 | 4.2 15 | 24 | 4.2 17 | 21 | 3.6 17 | 21 | 3.6Table “/” (root) “/foo”
16
April 28, 2008LCI Tutorial16 Introduction to HDF5 Data and Programming Models Tutorial Part I
17
April 28, 2008LCI Tutorial17 Mesh Example, in HDFView
18
April 28, 2008LCI Tutorial18 HDF5 Data Model
19
April 28, 2008LCI Tutorial19 HDF5 data model HDF5 file – container for scientific data Primary Objects Groups Datasets Additional ways to organize data Attributes Sharable objects Storage and access properties Everything else is built from these parts.
20
April 28, 2008LCI Tutorial20 HDF5 Dataset DataMetadata Dataspace 3 Rank Dim_2 = 5 Dim_1 = 4 Dimensions Time = 32.4 Pressure = 987 Temp = 56 Attributes Chunked Compressed Dim_3 = 7 Storage info IEEE 32-bit float Datatype
21
April 28, 2008LCI Tutorial21 Dataspaces Two roles Dataspace contains spatial info about a dataset stored in a file Rank and dimensions Permanent part of dataset definition Dataspace describes application’s data buffer and data elements participating in I/O Rank = 2 Dimensions = 4x6 Rank = 1 Dimensions = 12
22
April 28, 2008LCI Tutorial22 Datatypes (array elements) Datatype – how to interpret a data element Permanent part of the dataset definition Two classes: atomic and compound
23
April 28, 2008LCI Tutorial23 Datatypes HDF5 atomic types normal integer & float user-definable (e.g. 13-bit integer) variable length types (e.g. strings) pointers - references to objects/dataset regions enumeration - names mapped to integers array HDF5 compound types Comparable to C structs Members can be atomic or compound types
24
April 28, 2008LCI Tutorial24 Record int8int4int16 2x3x2 array of float32 Datatype: HDF5 dataset: array of records Dimensionality: 5 x 3 3 5
25
April 28, 2008LCI Tutorial25 Attributes Attribute – data of the form “name = value”, attached to an object Operations scaled down versions of dataset operations Not extendible No compression No partial I/O Optional for the dataset definition Can be overwritten, deleted, added during the “life” of a dataset Size under 64K in releases before HDF5 1.8.0
26
April 28, 2008LCI Tutorial26 Groups A mechanism for collections of related objects Every file starts with a root group Similar to UNIX directories Can have attributes “/” A B C k l m
27
April 28, 2008LCI Tutorial27 “/” x temp / (root) /x /foo /foo/temp /foo/bar/temp Path to HDF5 object in a file foo bar
28
April 28, 2008LCI Tutorial28 Shared objects /A/P /B/R /C/P “/” A B C P R P
29
April 28, 2008LCI Tutorial29 Special Storage Options Better subsetting access time; extendable chunked Improves storage efficiency, transmission speed compressed Arrays can be extended in any direction extendable Metadata for Fred Dataset “Fred” File A File B Data for Fred Metadata in one file, raw data in another split file
30
April 28, 2008LCI Tutorial30 HDF5 Software
31
April 28, 2008LCI Tutorial31 HDF5 software stack Tools & Applications HDF File HDF I/O Library
32
April 28, 2008LCI Tutorial32 Virtual file I/O (C only) Perform byte-stream I/O operations (open/close, read/write, seek) User-implementable I/O (stdio, network, memory, etc.) Virtual file I/O (C only) Perform byte-stream I/O operations (open/close, read/write, seek) User-implementable I/O (stdio, network, memory, etc.) Library internals Performs data transformations and other prep for I/O Configurable transformations (compression, etc.) Library internals Performs data transformations and other prep for I/O Configurable transformations (compression, etc.) Structure of HDF5 Library Object API (C, Fortran 90, Java, C++) Specify objects and transformation properties Invoke data movement operations and data transformations Object API (C, Fortran 90, Java, C++) Specify objects and transformation properties Invoke data movement operations and data transformations
33
April 28, 2008LCI Tutorial33 Writing – move from memory to disk memorydisk
34
April 28, 2008LCI Tutorial34 Partial I/O (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array memorydisk (a) Hyperslab from a 2D array to the corner of a smaller 2D array memory disk Move just part of a dataset
35
April 28, 2008LCI Tutorial35 (c) A sequence of points from a 2D array to a sequence of points in a 3D array. memorydisk (d) Union of hyperslabs in file to union of hyperslabs in memory. Partial I/O memory disk Move just part of a dataset
36
April 28, 2008LCI Tutorial36 Layers – parallel example Application Parallel computing system (Linux cluster) Compute node I/O library (HDF5) Parallel I/O library (MPI-I/O) Parallel file system (GPFS) Switch network/I/O servers Compute node Disk architecture & layout of data on disk I/O flows through many layers from application to disk.
37
April 28, 2008LCI Tutorial37 Virtual file I/O (C only) Library internals Virtual I/O layer Object API (C, Fortran 90, Java, C++)
38
April 28, 2008LCI Tutorial38 Virtual file I/O layer A public API for writing I/O drivers Allows HDF5 to interface to disk, the network, memory, or a user-defined device Network File FamilyMPI I/OMemory Virtual file I/O drivers Memory Stdio File Family File “Storage”
39
April 28, 2008LCI Tutorial39 Storage File on parallel file system File Split metadata and raw data files User-defineddevice ? Across the network or to/from another application or library HDF5 format HDF5 data model & API Apps: simulation, visualization, remote sensing… Examples: Thermonuclear simulations Product modeling Data mining tools Visualization tools Climate models Common application-specific data models HDF5 virtual file layer (I/O drivers) MPI I/O Split Files Stdio Custom Stream HDF5 serial & parallel I/O UDMSAFhdf5meshHDF-EOSIDL appl-specific appl-specific APIs LANLLLNL, SNLGrids COTSNASA
40
April 28, 2008LCI Tutorial40 Other info Runs almost anywhere Most workstations Big ASC machines, Cray, Compaq TeraGrid and other clusters QA Daily regression tests on key platforms Meets NASA’s highest technology readiness level
41
April 28, 2008LCI Tutorial41 Other HDF Software THG HDF Java tools Command-line utilities Regression and performance testing software Commercial (IDL, Matlab, HDF Explorer, etc.) Community (EOS, ASCI, etc.) Integration with other software (SRB, etc.)
42
April 28, 2008LCI Tutorial42 Tools Utilities Parallel h5diff HDFView Web browser plug-in HDFView and SRB
43
April 28, 2008LCI Tutorial43 Creating an HDF5 file with HDF5 tools HDFView, h5mkgrp, h5import
44
April 28, 2008LCI Tutorial44 A 3-D array of floats B “/” (root) Example: create this HDF5 file 4x6 array of floats
45
April 28, 2008LCI Tutorial45 Example: create this HDF5 file HDFView h5mkgrp file.h5 /B h5import A.txt -c A.conf -o file.h5
46
April 28, 2008LCI Tutorial46 Introduction to HDF5 Programming model and APIs Programming model for sequential access
47
April 28, 2008LCI Tutorial47 HDF5 Software stack Tools & Applications HDF File HDF I/O Library
48
April 28, 2008LCI Tutorial48 Virtual file I/O (C only) Perform byte-stream I/O operations (open/close, read/write, seek) User-implementable I/O (stdio, network, memory, etc.) Virtual file I/O (C only) Perform byte-stream I/O operations (open/close, read/write, seek) User-implementable I/O (stdio, network, memory, etc.) Library internals Performs data transformations and other prep for I/O Configurable transformations (compression, etc.) Library internals Performs data transformations and other prep for I/O Configurable transformations (compression, etc.) Structure of HDF5 Library Object API (C, Fortran 90, Java, C++) Specify objects and transformation properties Invoke data movement operations and data transformations Object API (C, Fortran 90, Java, C++) Specify objects and transformation properties Invoke data movement operations and data transformations
49
April 28, 2008LCI Tutorial49 Goals of HDF5 Library Flexible API to support a wide range of operations on data High performance access in serial and parallel computing environments Compatibility with common data models and programming languages Because of these goals, the HDF5 API is rich and large
50
April 28, 2008LCI Tutorial50 Operations supported by the API Create groups, datasets, attributes, linkages Create complex data types Assign storage and I/O properties to objects Complex subsetting during read/write Flexible I/O (parallel, remote, etc.) Ability to transform data during I/O Query about file and structure and properties Query about object structure, content, properties
51
April 28, 2008LCI Tutorial51 Characteristics of the HDF5 API For flexibility, the API is extensive – 300+ functions This can be daunting, at first But there is hope You can do a lot with a just few functions So start simple, and build up your knowledge The library functions are categorized by object type Once you learn the system, it’s much less daunting And there is an “H5Lite” API if all you want to do are simple things.
52
April 28, 2008LCI Tutorial52 The General HDF5 API Currently has C, Fortran 90, Java and C++ bindings. C routines begin with prefix H5*, where * is a single letter indicating the object on which the operation is to be performed. Full functionality Example APIs: H5D :Dataset interface e.g.. H5Dread H5F : File interface e.g.. H5Fopen H5S : dataSpace interfacee.g.. H5Sclose
53
April 28, 2008LCI Tutorial53 How to Compile HDF5 Applications h5cc – HDF5 C compiler command Similar to mpicc h5fc – HDF5 F90 compiler command Similar to mpif90 h5c++ – HDF5 C++ compiler command To compile: % h5cc h5prog.c % h5fc h5prog.f90
54
April 28, 2008LCI Tutorial54 h5cc/h5fc/h5c++ -show option show displays the compiler commands and options without executing them, i.e., dry run % h5cc -show Sample_c.c gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API - DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include - D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE - D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE - std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions - L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o - L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath - Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
55
April 28, 2008LCI Tutorial55 The General Programming Paradigm Properties (called creation and access property lists) of objects are defined (optional) Objects are opened or created Objects then accessed Objects finally closed
56
April 28, 2008LCI Tutorial56 Order of Operations The library imposes an order on the operations by argument dependencies Example: A file must be opened before a dataset because the dataset open call requires a file handle as an argument Objects can be closed in any order, and reusing a closed object will result in an error
57
April 28, 2008LCI Tutorial57 HDF5 C Programming Issues For portability, HDF5 library has its own defined types: hid_t: object identifiers (native integer) hsize_t: size used for dimensions (unsigned long or unsigned long long) hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long) herr_t: function return value hvl_t: variable length datatype For C, include hdf5.h at the top of your HDF5 application.
58
April 28, 2008LCI Tutorial58 A 3-D array of floats B “/” (root) 4x6 array of floats Example: Create this HDF file
59
April 28, 2008LCI Tutorial59 A 4x6 array of floats 3-D array of floats B “/” (root) Steps: Example: Create this HDF file
60
April 28, 2008LCI Tutorial60 3-D array of floats “/” (root) Create and HDF5 file
61
April 28, 2008LCI Tutorial61 Steps to Create a File Decide any special properties the file should have Creation properties, like size of user block Access properties, such as metadata cache size Create property lists, if necessary Create the file Close the file and the property lists, as needed
62
April 28, 2008LCI Tutorial62 Create new file with default properties 1 hid_t file_id; 2 herr_t status; 3 file_id = H5Fcreate("file.h5",H5F_ACC_TRUNC, H5P_DEFAULT,H5P_DEFAULT); 4 status = H5Fclose (file_id);
63
April 28, 2008LCI Tutorial63 4x6 array of floats A 3-D array of floats “/” (root) Add a dataset
64
April 28, 2008LCI Tutorial64 Dataset components DataMetadata Dataspace 3 Rank Dim_2 = 5 Dim_1 = 4 Dimensions Time = 32.4 Pressure = 987 Temp = 56 Attributes Chunked Compressed Dim_3 = 7 Storage info IEEE 32-bit float Datatype
65
April 28, 2008LCI Tutorial65 Dataset Creation Property List Dataset creation property list: information on how to organize data in storage. Chunked Chunked & compressed
66
April 28, 2008LCI Tutorial66 3-D array of floats Steps to create a dataset in a file 1.Define dataset characteristics Dataspace – 4x6 Datatype – float Properties (if needed) 2.Decide where to put it – “root group” Obtain location ID 3.Decide link or path – “A” 4.Create link and dataset in file 5.Close everything A “/” (root)
67
April 28, 2008LCI Tutorial67 1 hid_t file_id, dataset_id, dataspace_id; 2 hsize_t dims[2]; 3 herr_t status; 4 file_id = H5Fcreate (”file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); 5 dims[0] = 4; 6 dims[1] = 6; 7 dataspace_id = H5Screate_simple (2, dims, NULL); 8 dataset_id = H5Dcreate(file_id,”A",H5T_STD_I32BE, dataspace_id, H5P_DEFAULT); 9 status = H5Dclose (dataset_id); 10 status = H5Sclose (dataspace_id); 11 status = H5Fclose (file_id); Example: create a dataset Terminate access to dataset, dataspace, & file Create a dataspace rank current dims Create a dataset Dataspace Datatype Property list (default) Pathname
68
April 28, 2008LCI Tutorial68 A 3-D array of floats B “/” (root) Example: create this HDF5 file 4x6 array of floats file.h5
69
April 28, 2008LCI Tutorial69 Creating a group To create a group, the calling program must: Obtain location identifier where group is to be created Create the group Close the group
70
April 28, 2008LCI Tutorial70 Creating a group hid_t file_id, group_id; /* identifiers */... /* Open “file.h5” */ file_id = H5Fopen(“file.h5”, H5F_ACC_RDWR, H5P_DEFAULT); /* Create a group "/B" in file. */ group_id = H5Gcreate(file_id,"/B",0); /* Close the group and file. */ status = H5Gclose(group_id); Status = H5Fclose(file_id);
71
April 28, 2008LCI Tutorial71 Questions? End of Part I
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.