Published by Anabel Garrett. Modified over 9 years ago.
1
HDF: Hierarchical Data Format
Nancy Yeager, Mike Folk
NCSA, University of Illinois at Urbana-Champaign, USA
nyeager@ncsa.uiuc.edu
2
What is HDF?
A scientific data format and supporting software
Stores images, multidimensional arrays, tables, annotations
Free and commercial software support
Emphasis on standards
Users from many engineering and scientific fields
Biggest user: NASA Earth Observing System Data and Information System (EOSDIS)
HDF4 and HDF5
3
HDF file with a mixture of objects
[Figure: an HDF file containing a group, a 3-D array, a 2-D array, two raster images, a palette, an annotation, and a table.]
Annotation: "March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude..."
Table:
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
4
HDF software layers
Applications: utilities and applications for manipulating, viewing, and analyzing data in HDF files.
Application Programming Interfaces and Low-level Interface: a software library.
– High-level, object-specific APIs.
– Low-level I/O drivers.
HDF file: a physical file or other medium (network, memory, etc.).
5
A Sampling of HDF Visualization and Data Analysis Tools
MATLAB, IDL: Commercial
NOeSYS, Transform: Commercial
HDF Explorer: Commercial
JHV, Java Browser: NCSA shareware
Scientific Data Browser: NCSA shareware
DIAL: Raytheon
6
EOSDIS
Open standard for exchange of remote-sensed data
– Scores of instruments and datasets
– 2+ terabytes per day
– 1,000 primary users, 30,000 secondary
HDF Requirements
– Support for scientists, data producers, archiving, etc.
– Library and file structure optimization
– HDF tools, utilities, access software
– Software maintenance and QA
7
HDF Shortcomings Exposed by EOSDIS
Very Large Datasets
– Object and File Sizes > 2GB
– Number of Objects > 20 K
Concurrent Access: parallel I/O, threads
Richer, More Flexible Data Model
– complex data structures
– complex subsetting
8
HDF5
A successor to HDF (currently HDF4.1)
A new API, file structure, library
Addresses New Demands:
– Supports Large Data Models
– Parallel I/O (MPIO), threads (not implemented yet)
– Complex Data Structures
– Smaller, faster
9
HDF5 Data Model: Group
A UNIX-like directory structure containing groups, datasets, annotations
Directory is a graph, rather than a tree
[Figure: example object paths “/”, “/foo”, “/foo/bar”.]
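The "graph, rather than a tree" point can be illustrated with a minimal sketch. This is a plain-Python model, not the HDF5 library API: dictionaries stand in for groups, and the link name `shortcut` is a hypothetical second path to the same dataset, added only for illustration.

```python
# Illustrative model only: plain dicts stand in for HDF5 groups and datasets.
dataset_bar = {"kind": "dataset", "data": [1, 2, 3]}

root = {"kind": "group", "links": {}}
foo = {"kind": "group", "links": {}}

root["links"]["foo"] = foo               # "/foo"
foo["links"]["bar"] = dataset_bar        # "/foo/bar"
root["links"]["shortcut"] = dataset_bar  # hypothetical second link to the same object


def resolve(group, path):
    """Follow a UNIX-like path such as "/foo/bar" starting from the given root group."""
    node = group
    for name in path.strip("/").split("/"):
        if name:  # an empty name means the path was just "/"
            node = node["links"][name]
    return node


# Two different paths reach one object, which a strict tree would not allow.
same_object = resolve(root, "/foo/bar") is resolve(root, "/shortcut")
print(same_object)
```

Because an object can be linked from more than one group, a path names a route to an object rather than the object itself.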
10
HDF5 Data Model: Dataset
A multidimensional array of records
Dimensionality: 5 x 3
Number type: record (int8, int4, int16, float32)
11
Data Records can be:
Atomic Datatype (standard integer)
Compound Datatype (C structs)
Multidimensional
Pointer (reference to dataset, region)
[Figure: a record with fields int8, int4, int16, float32 as the number type.]
12
Dataset components
array of data elements
metadata
– datatype
– dataspace
– attributes
– storage info
[Figure: Dataset “Fred”, made up of a metadata header and data. Datatype: int16. Dataspace: rank and dimensions (Dim_1=5, Dim_2=4, Dim_3=2). Attributes: time = 32.4, pressure = 987, temp = 56. Storage info: chunked; compressed.]
13
Dataset elements (datatypes)
standard integer & float
user-definable scalars (e.g. 13-bit integer)
variable length types (e.g. strings)
pointers: references to objects/regions of datasets
enumeration: names mapped to integers
compound types
– Comparable to C structs
– Members can be atomic or compound types
– Members can be multidimensional
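To make "comparable to C structs" concrete, here is a small sketch using Python's standard `struct` module rather than the HDF5 library. The three-field record (int8, int16, float32), its little-endian packed layout, and the helper names are assumptions chosen for illustration, not the layout HDF5 itself uses.

```python
import struct

# Hypothetical compound record: int8 flag, int16 count, float32 value.
# "<" selects little-endian byte order with no padding between members.
RECORD = struct.Struct("<bhf")


def pack_records(records):
    """Serialize a sequence of (int8, int16, float32) tuples into one byte string."""
    return b"".join(RECORD.pack(*rec) for rec in records)


def unpack_records(buf):
    """Recover the tuples from a byte string produced by pack_records."""
    return [RECORD.unpack_from(buf, off) for off in range(0, len(buf), RECORD.size)]


# The float values are chosen to be exactly representable in float32.
records = [(1, 300, 1.5), (-2, -7, 0.25)]
buf = pack_records(records)
print(RECORD.size, len(buf), unpack_records(buf))
```

Each element of a dataset with a compound type is one such fixed-size record, so an array of N records occupies N times the record size before any compression.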
14
Attributes
Are small pieces of data
Attached to datasets or groups
Operations are scaled-down versions of the dataset operations
– Not extendible
– No compression
– No partial I/O
15
Dataset features
Extendible in any direction
Special storage options
– contiguous
– external, chunked, compressed
– users can add others
User-defined attribute list
16
Dataset Storage Options
chunked: better subsetting access time; extendible
compressed: improves storage efficiency, transmission speed
extendible: arrays can be extended in any direction
split file: metadata in one file, raw data in another
[Figure: Dataset “Fred” split across two files, File A and File B, with the metadata for Fred in one and the data for Fred in the other.]
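Why chunking improves subsetting access time can be sketched with simple index arithmetic. The functions below are illustrative only and are not part of the HDF5 API; the chunk shape and the region are made-up values.

```python
import itertools


def locate(coords, chunk_shape):
    """Return (chunk index, offset inside that chunk) for one element."""
    chunk_index = tuple(c // s for c, s in zip(coords, chunk_shape))
    within = tuple(c % s for c, s in zip(coords, chunk_shape))
    return chunk_index, within


def chunks_touched(start, count, chunk_shape):
    """List the chunk indices overlapped by a rectangular region.

    start: first coordinate of the region along each axis
    count: number of elements of the region along each axis
    """
    ranges = [
        range(s // cs, (s + n - 1) // cs + 1)
        for s, n, cs in zip(start, count, chunk_shape)
    ]
    return list(itertools.product(*ranges))


chunk_shape = (10, 25)  # assumed chunk shape for a 2-D dataset
print(locate((37, 60), chunk_shape))
print(chunks_touched((5, 20), (10, 10), chunk_shape))
```

Reading the 10 x 10 region in this example only requires the four chunks it overlaps, whereas a contiguous row-major layout would spread the same region across ten separate row segments of the file.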
17
Dataset Selection Options
Selection describes how data points are organized to form a dataset
Select a subset of points for partial I/O
Selection can be:
– a set of points
– a region within an array
Selections describe array in memory or in the file
18
Selection region in memory can be a different shape from the selection region in the file:
(a) A hyperslab from a 2D array to the corner of a smaller 2D array.
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of hyperslabs in file to union of hyperslabs in memory. Number of elements must be equal.
19
Sub-selection Options
Flexibility in mappings between data in memory and object in file
Selection regions can be
– points
– hyperslabs
– unions of hyperslabs
Selection region in memory can be different shape from selection in file
Supports parallel I/O via MPI-I/O
– hyperslab selections translated to MPI derived types before performing I/O
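A hyperslab selection, and the idea that the memory side may have a different shape from the file side, can be sketched in plain Python. The parameter names start, stride, count, and block follow the usual hyperslab vocabulary, but the functions, array sizes, and offset below are hypothetical and do not call the HDF5 library.

```python
import itertools


def hyperslab_1d(start, stride, count, block):
    """Indices selected along one axis: `count` blocks of `block` elements, `stride` apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab(start, stride, count, block):
    """Coordinates selected by an N-dimensional hyperslab, in row-major order."""
    axes = [hyperslab_1d(*params) for params in zip(start, stride, count, block)]
    return list(itertools.product(*axes))


# "File" side: a 6 x 8 array whose element values encode their own position.
file_array = [[r * 8 + c for c in range(8)] for r in range(6)]

# A regular series of 2 x 2 blocks from the 2-D array ...
points = hyperslab(start=(0, 1), stride=(3, 4), count=(2, 2), block=(2, 2))

# ... copied to a contiguous run at offset 3 of a 1-D "memory" buffer.
memory = [0] * 20
offset = 3
for k, (r, c) in enumerate(points):
    memory[offset + k] = file_array[r][c]

print(len(points), memory)
```

The only constraint the mapping relies on is that both selections contain the same number of elements (16 here); their shapes are otherwise independent.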
20
HDF5 Raw Data Pipeline
Handles data transformations between file and memory.
Deals with multiple storage options
– chunking, compression, number conversion,...
Optimized performance for common usage
Hooks for new filters
– compression schemes, encryption, checksum,...
– user-specified filters
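The idea of a pipeline with hooks for filters can be sketched as an ordered list of (forward, inverse) function pairs. This is a conceptual model built on Python's standard `zlib` and `struct` modules, not the HDF5 filter interface; the function names and the choice of CRC-32 as the checksum are assumptions made for illustration.

```python
import struct
import zlib


def deflate(data):
    return zlib.compress(data)


def inflate(data):
    return zlib.decompress(data)


def add_checksum(data):
    """Append a 4-byte CRC-32 of the payload."""
    return data + struct.pack("<I", zlib.crc32(data))


def verify_checksum(data):
    """Strip the trailing CRC-32 and raise ValueError if it does not match."""
    payload, stored = data[:-4], struct.unpack("<I", data[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


# Each filter is a (forward, inverse) pair; user filters would be appended here.
PIPELINE = [(deflate, inflate), (add_checksum, verify_checksum)]


def write_chunk(raw, pipeline=PIPELINE):
    """Apply filters in order on the way from memory to storage."""
    for forward, _ in pipeline:
        raw = forward(raw)
    return raw


def read_chunk(stored, pipeline=PIPELINE):
    """Undo the filters in reverse order on the way back to memory."""
    for _, inverse in reversed(pipeline):
        stored = inverse(stored)
    return stored


raw = b"\x00" * 1000
stored = write_chunk(raw)
print(len(raw), len(stored), read_chunk(stored) == raw)
```

Keeping each filter as a self-contained pair is what makes the pipeline extensible: a new compression or encryption scheme only has to supply its own forward and inverse step.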
21
Rich Framework for Building Search Applications
Data structures for building efficient external indexes and storing them in the data file
[Figure: ERBE cloud tracking. An index dataset alongside a radiance dataset and a surface temperature dataset; each index record contains a pointer to a region in a dataset.]
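An index record that points to a region of a dataset might be modeled as follows. This is a sketch under stated assumptions: the class names, the dataset path `/radiance`, the key `cloud-A`, and the data values are all hypothetical, and the code does not use HDF5's own reference types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionRef:
    """Hypothetical pointer to a rectangular region of a 2-D dataset."""
    dataset: str   # path of the target dataset
    start: tuple   # (row, column) of the first element
    count: tuple   # (rows, columns) in the region


@dataclass(frozen=True)
class IndexRecord:
    key: str
    region: RegionRef


def read_region(datasets, ref):
    """Dereference a region pointer and return the selected sub-array."""
    data = datasets[ref.dataset]
    (r0, c0), (nr, nc) = ref.start, ref.count
    return [row[c0:c0 + nc] for row in data[r0:r0 + nr]]


# Made-up data: a 4 x 5 grid whose values encode their own position.
datasets = {"/radiance": [[r * 10 + c for c in range(5)] for r in range(4)]}
index = [IndexRecord("cloud-A", RegionRef("/radiance", (1, 2), (2, 2)))]

hits = {rec.key: read_region(datasets, rec.region) for rec in index}
print(hits)
```

Because the index holds only pointers, a search scans the small index dataset first and reads just the referenced regions of the large datasets.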
22
Rich Framework for Building Search Applications
Efficient data structures for building external indexes and storing them in file with data
ERBE cloud system tracking
– Automation of ftp and tape storage on a PC for large data volumes (90GB in; 13,000 images out)
– Code to sort data from time-ordered basis to spatial time sequences for 100 million footprints per data mo..
23
HDF Information
HDF Information Center
– http://hdf.ncsa.uiuc.edu/
HDF Help email address
– hdfhelp@ncsa.uiuc.edu
HDF users mailing list
– hdfnews@ncsa.uiuc.edu