Published by Morgan Ray. Modified over 9 years ago.
1 High-level view of HDF5: data structures and library
  HDF Summit, Boeing, Seattle, September 19, 2006
2 Mesh Example, in HDFView
3 HDF5 Data Model
4 HDF5 data model
  HDF5 file: a container for scientific data
  Primary objects
    - Groups
    - Datasets
  Additional ways to organize data
    - Attributes
    - Sharable objects
    - Storage and access properties
  Everything else is built from these parts.
5 HDF5 Dataset
  A dataset consists of data plus metadata. The metadata shown in the figure:
    - Dataspace: rank = 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
    - Datatype: IEEE 32-bit float
    - Attributes: time = 32.4, pressure = 987, temp = 56
    - Storage info: chunked, compressed
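To make the pieces of dataset metadata concrete, here is a minimal Python sketch that records the values from the slide in a plain data class and derives the rank, element count, and raw size. The class `DatasetDescription` and its field names are hypothetical, introduced only for illustration; this is a toy model and not an HDF5 API object.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class DatasetDescription:
    """Toy stand-in for dataset metadata; hypothetical, not part of HDF5."""
    dims: tuple                  # dataspace dimensions
    element_size: int            # bytes per element (4 for a 32-bit float)
    attributes: dict = field(default_factory=dict)
    storage: tuple = ()          # storage info labels

    @property
    def rank(self):
        # The rank is the number of dimensions in the dataspace.
        return len(self.dims)

    @property
    def num_elements(self):
        return prod(self.dims)

    @property
    def raw_size_bytes(self):
        # Size of the uncompressed data, ignoring any file overhead.
        return self.num_elements * self.element_size


# Values taken from the slide: 4 x 5 x 7 array of IEEE 32-bit floats.
slide_dataset = DatasetDescription(
    dims=(4, 5, 7),
    element_size=4,
    attributes={"time": 32.4, "pressure": 987, "temp": 56},
    storage=("chunked", "compressed"),
)

print(slide_dataset.rank, slide_dataset.num_elements, slide_dataset.raw_size_bytes)
```

The uncompressed size is simply the product of the dimensions times the element size; compression, listed under storage info, changes what is stored on disk but not the dataspace or datatype.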
6 Dataspaces
  Dataspace: spatial information about a dataset
  Rank and dimensions
    - Permanent part of the dataset definition
  Subset of points, for partial I/O
    - Needed only during I/O operations
  Apply to datasets in memory or in the file
  Figure: Rank = 2, Dimensions = 4x6
7 Datatypes (array elements)
  Datatype: how to interpret a data element
  Permanent part of the dataset definition
  Two classes: atomic and compound
8 Datatypes
  HDF5 atomic types
    - normal integer and float
    - user-definable (e.g. 13-bit integer)
    - variable-length types (e.g. strings)
    - pointers: references to objects or dataset regions
    - enumeration: names mapped to integers
    - array
  HDF5 compound types
    - Comparable to C structs
    - Members can be atomic or compound types
9 HDF5 dataset: array of records
  Dimensionality: 5 x 3
  Datatype: record with members int8, int4, int16, and a 2x3x2 array of float32
10 Attributes
  Attribute: data of the form “name = value”, attached to an object
  Operations are scaled-down versions of dataset operations
    - Not extendible
    - No compression
    - No partial I/O
  Optional for the dataset definition
  Can be overwritten, deleted, or added during the “life” of a dataset
11 “Groups”
  A mechanism for collections of related objects
  Every file starts with a root group
  Similar to UNIX directories
  Can have attributes
  Figure labels: “/”, tom, dick, harry, a, b, c
12 HDF5 objects are identified and located by their pathnames
  / (root)
  /x
  /foo
  /foo/temp
  /foo/bar/temp
  Figure labels: “/”, x, foo, bar, temp
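The pathname lookup on this slide can be sketched with nested dictionaries standing in for groups. This is a toy model under stated assumptions: the `resolve` function and the string placeholders are hypothetical, the slide does not say what kind of object x or temp is, and nothing here is the HDF5 API.

```python
def resolve(root, path):
    """Walk an absolute pathname one component at a time, starting at the root group."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute pathnames")
    node = root
    for name in path.split("/"):
        if not name:          # skip the empty pieces around "/" separators
            continue
        node = node[name]     # KeyError if the member does not exist
    return node


# Hierarchy from the slide; leaf values are placeholder strings (an assumption).
root = {
    "x": "object at /x",
    "foo": {
        "temp": "object at /foo/temp",
        "bar": {"temp": "object at /foo/bar/temp"},
    },
}

print(resolve(root, "/foo/bar/temp"))
```

Note that the two objects named temp are distinct: the name only has meaning relative to the group that contains it, which is why the full pathname is needed to locate an object.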
13 Groups and their members can be shared
  /tom/P
  /dick/R
  /harry/P
  Figure labels: “/”, tom, dick, harry, P, R, P
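Sharing can be illustrated with a small Python model in which two groups hold a reference to the same object. This sketch assumes that /tom/P and /harry/P in the figure name one shared object; the classes `Group` and `Dataset` and the helper `paths_to` are hypothetical and are not HDF5 calls.

```python
class Group(dict):
    """Toy group: maps member names to objects (hypothetical, not HDF5)."""


class Dataset:
    """Toy dataset carrying only a label."""
    def __init__(self, label):
        self.label = label


def paths_to(group, target, prefix=""):
    """List every pathname under `group` that leads to the object `target`."""
    found = []
    for name, member in group.items():
        path = f"{prefix}/{name}"
        if member is target:              # identity check: same object, not a copy
            found.append(path)
        elif isinstance(member, Group):
            found.extend(paths_to(member, target, path))
    return found


p = Dataset("P")
r = Dataset("R")
root = Group(tom=Group(P=p), dick=Group(R=r), harry=Group(P=p))

print(paths_to(root, p))

# A change made through one pathname is visible through the other.
root["tom"]["P"].label = "P (updated)"
print(root["harry"]["P"].label)
```

Because both groups refer to one object rather than to copies, the object has more than one valid pathname, and an update made through either name is seen through both.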
14 Special Storage Options
  chunked: better subsetting access time; extendable
  compressed: improves storage efficiency, transmission speed
  extendable: arrays can be extended in any direction
  split file: metadata in one file, raw data in another
    (figure: dataset “Fred”; metadata for Fred and data for Fred stored separately in File A and File B)
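One way to see why chunking helps subsetting is to count which chunks a rectangular selection overlaps: only those chunks need to be read. The following Python sketch does that arithmetic. The function `chunks_touched` and the example shapes are hypothetical choices for illustration, not values from the slide and not an HDF5 call.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the grid coordinates of every chunk that a rectangular
    selection (given by its start corner and element count per
    dimension) overlaps."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch                # chunk holding the first selected element
        last = (s + c - 1) // ch       # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Assumed example: chunks of 2 x 3 elements.
print(chunks_touched(start=(0, 0), count=(2, 3), chunk_shape=(2, 3)))
print(chunks_touched(start=(1, 2), count=(2, 2), chunk_shape=(2, 3)))
```

A selection aligned with one chunk touches a single chunk, while a selection of similar size that straddles chunk boundaries touches four, so chunk shape is usually chosen to match the expected access pattern.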
15 HDF5 Software
16 HDF5 Software stack
  Tools & Applications
  HDF I/O Library
  HDF File
17 Structure of HDF5 Library
  Object API (C, Fortran 90, Java, C++)
    - Specify objects and transformation properties
    - Invoke data movement operations and data transformations
  Library internals
    - Performs data transformations and other prep for I/O
    - Configurable transformations (compression, etc.)
  Virtual file I/O (C only)
    - Perform byte-stream I/O operations (open/close, read/write, seek)
    - User-implementable I/O (stdio, network, memory, etc.)
18 Writing – move from memory to disk
  Figure labels: memory, disk
19 Partial I/O – move just part of a dataset
  (a) Hyperslab from a 2D array to the corner of a smaller 2D array
  (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
  Figure labels: memory, disk
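Case (b) can be sketched in plain Python. A regular hyperslab is commonly described per dimension by a start, a stride, a count, and a block size; the helpers below expand that description into element coordinates and then copy the selected elements into a contiguous run of a 1D buffer at an offset. The function names, array sizes, and parameter values are hypothetical, chosen for illustration, and the code models the selection arithmetic only; it does not call HDF5.

```python
from itertools import product


def hyperslab_indices_1d(start, stride, count, block):
    """Indices selected along one dimension: `count` blocks of `block`
    consecutive elements, with block origins `stride` apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab_coords(start, stride, count, block):
    """Coordinates selected by a regular hyperslab in N dimensions."""
    per_dim = [hyperslab_indices_1d(*p) for p in zip(start, stride, count, block)]
    return list(product(*per_dim))


# Assumed 4 x 6 source array whose values encode their position (row*10 + col).
source = [[r * 10 + c for c in range(6)] for r in range(4)]

# Regular series of blocks: rows 0 and 2, column pairs (1, 2) and (4, 5).
coords = hyperslab_coords(start=(0, 1), stride=(2, 3), count=(2, 2), block=(1, 2))

# Copy the selection into a contiguous run of a 1D buffer, starting at offset 2.
dest = [None] * 12
offset = 2
for k, (r, c) in enumerate(coords):
    dest[offset + k] = source[r][c]

print(dest)
```

The source selection and the destination selection have different shapes but the same number of elements, which is the property a partial I/O transfer relies on.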
20 Partial I/O – move just part of a dataset
  (c) A sequence of points from a 2D array to a sequence of points in a 3D array
  (d) Union of hyperslabs in file to union of hyperslabs in memory
  Figure labels: memory, disk
21 Layers – parallel example
  I/O flows through many layers from application to disk:
    - Application, on the compute nodes of a parallel computing system (Linux cluster)
    - I/O library (HDF5)
    - Parallel I/O library (MPI-I/O)
    - Parallel file system (GPFS)
    - Switch network/I/O servers
    - Disk architecture & layout of data on disk
22 Virtual I/O layer
  Object API (C, Fortran 90, Java, C++)
  Library internals
  Virtual file I/O (C only)
23 Virtual file I/O layer
  A public API for writing I/O drivers
  Allows HDF5 to interface to disk, the network, memory, or a user-defined device
  Figure: virtual file I/O drivers and the “Storage” they target; labels include Network, File, File Family, MPI I/O, Memory, Stdio
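The idea of a driver layer can be sketched as a small interface that higher layers call without knowing where the bytes end up. The Python below is a toy analogy only: the class names `ByteStreamDriver`, `MemoryDriver`, and `StdioDriver` and the helper `store_record` are hypothetical and do not reflect the actual driver API, which is a C interface.

```python
import io
from abc import ABC, abstractmethod


class ByteStreamDriver(ABC):
    """Hypothetical driver interface: byte-stream reads and writes at an offset."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(ByteStreamDriver):
    """Keeps the whole 'file' in a growable in-memory buffer."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self._buf):
            # Grow the buffer with zero bytes so the write fits.
            self._buf.extend(b"\x00" * (end - len(self._buf)))
        self._buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self._buf[offset:offset + size])


class StdioDriver(ByteStreamDriver):
    """Delegates to any seekable binary file object."""

    def __init__(self, fileobj):
        self._f = fileobj

    def write(self, offset, data):
        self._f.seek(offset)
        self._f.write(data)

    def read(self, offset, size):
        self._f.seek(offset)
        return self._f.read(size)


def store_record(driver, offset, payload):
    """Code above the driver layer: identical for every driver."""
    driver.write(offset, payload)
    return driver.read(offset, len(payload))


memory = MemoryDriver()
print(store_record(memory, 4, b"HDF5"))
print(store_record(StdioDriver(io.BytesIO()), 4, b"HDF5"))
```

The caller, `store_record`, is unchanged when the driver is swapped, which is the design point of a driver layer: storage targets can be added without touching the code above it.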
24 Figure: layers from applications to storage
  Apps: simulation, visualization, remote sensing…
    Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
  Common application-specific data models and appl-specific APIs: UDM, SAF, hdf5mesh, HDF-EOS, IDL, appl-specific
    Labels: LANL; LLNL, SNL; Grids; COTS; NASA
  HDF5 data model & API
  HDF5 serial & parallel I/O
  HDF5 virtual file layer (I/O drivers): MPI I/O, Split Files, Stdio, Custom, Stream
  HDF5 format
  Storage: file on parallel file system; file; split metadata and raw data files; user-defined device (?); across the network or to/from another application or library
25 Other info
  Runs almost anywhere
    - Most workstations
    - Big ASCI machines, Cray, Compaq
    - TeraGrid and other clusters
  QA
    - Daily regression tests on key platforms
    - Meets NASA’s highest technology readiness level
26 Other HDF Software
  NCSA
    - HDF Java tools
    - Command-line utilities
    - Regression and performance testing software
  Commercial (IDL, Matlab, HDF Explorer, etc.)
  Community (EOS, ASCI, etc.)
  Integration with other software (SRB, etc.)
27 Tools
  Utilities
  Parallel h5diff
  HDFView
  Web browser plug-in
  HDFView and SRB
28 Thank you
29 HDF Information
  HDF Information Center: http://hdfgroup.org/
  HDF Help email address: hdfhelp@hdfgroup.org
  HDF users mailing list: hdfnews@hdfgroup.org