NP-EMD Profile of National Polar-orbiting Operational Environmental Satellite System (NPOESS) HDF5 Files
Kim Tomashosky, Ken Stone, Pat Purcell, Ron Andrews
NPOESS Program, Aurora, Colorado
NP-EMD Introduction Kim Tomashosky
NP-EMD About NPOESS
The National Polar-orbiting Operational Environmental Satellite System (NPOESS) is a satellite system that monitors global environmental conditions and collects and disseminates data related to:
–Weather
–Atmosphere
–Oceans
–Land
–Near-space environment
NPOESS will converge existing polar-orbiting satellite systems under a single national program.
Polar-orbiting satellites observe Earth from space.
–The polar orbiters are able to monitor the entire planet and provide data for long-range weather and climate forecasts
NP-EMD About NPOESS, Continued
NPOESS increases the timeliness and accuracy of severe weather event forecasts.
It will collect over 50 environmental measurements that are crucial to timely, accurate weather forecasts by military and civilian organizations. It will enable:
–Increased accuracy in severe storm warnings and forecasting
–Improved drought analysis and flood warnings
NPOESS is managed by the tri-agency Integrated Program Office (IPO), which draws personnel from the Department of Commerce, the Department of Defense, and NASA.
NP-EMD NPOESS Data Products
NPOESS Data Products are distributed in HDF5 format.
–Archived and made available to the community via the Comprehensive Large Array-data Stewardship System (CLASS), an electronic library of NOAA environmental data
–There is no “HDF-NPOESS” library; NPOESS Data Products have been designed using the native HDF5 library
NPOESS Data Products:
–Raw Data Records (RDR)
–Sensor Data Records (SDR) / Temperature Data Records (TDR)
–Intermediate Products (IP)
–Application Related Products (ARP)
–Environmental Data Records (EDR)
NP-EMD Data Organization
Data Product Granules
–A granule is a segment of data whose size is chosen to maximize efficiency for an algorithm class
–It is associated with an integer number of sensor scans, and its definition varies across sensors and data products
–Gaps in granules are filled using a pre-defined ‘missing data’ fill value
–Represented as a set of region reference pointers to sections of the respective dataset arrays
Data Product Aggregations
–An aggregation is a grouping of granules of the same kind, packaged in HDF5 and covering a temporal range
–May contain as few as one granule and as many as an orbit of granules
–Represented as a set of object reference pointers to the various groupings of data which make up a particular data product (one for each homogeneous dataset included in the granule)
NP-EMD NPOESS Documentation Documentation for the NPOESS Data Products –NPOESS Common Data Format Control Book – External Volume I – Overview Volume II – RDR Formats Volume III – SDR/TDR Formats Volume IV – EDR/IP/ARP Formats Volume V – Metadata Volume VI – Ancillary Data, Auxiliary Data, Messages, and Reports Volume VII – Application Packets
NP-EMD NPOESS HDF5 General Overview Ron Andrews
NP-EMD HDF5 Conceptual Diagram
NP-EMD HDF5 XML User Block The XML User Block for NPOESS Data Products provides a ‘quick-look’ into the metadata of the associated HDF5 file –The size of the HDF5 XML User Block will be a multiple of 512 bytes The XML User Blocks are defined in the following volumes of the CDFCB-X: –Volume V – Metadata Contains the XML User Block formats for: –Raw Data Records (RDR) –Sensor Data Records (SDR) / Temperature Data Records (TDR) –Intermediate Products (IP) –Application Related Products (ARP) –Environmental Data Records (EDR) –Volume VI – Ancillary, Auxiliary, Reports, and Messages Contains the XML User Block formats for the Ancillary and Auxiliary data files that are delivered in HDF5 Example elements: –Mission, Platform, and Instrument Names –Number_of_Data_Products –CollectionShortName(s) –Aggregation Information –Timestamps
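Because the user block sits in front of the HDF5 data, it can be read with ordinary file I/O. The following is a minimal sketch, not NPOESS code: it relies on the HDF5 file format rule that the format signature appears at byte 0 or, when a user block is present, at byte 512, 1024, 2048, and so on (doubling each time). The padding bytes (NUL or whitespace), the root element name `UserBlock`, and the `Mission` tag are assumptions made for illustration; only `Number_of_Data_Products` is an element name taken from the slide above.

```python
import io
import xml.etree.ElementTree as ET

# The 8-byte signature that starts the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_user_block(stream):
    """Return the raw bytes that precede the HDF5 signature in `stream`.

    Probes offsets 0, 512, 1024, 2048, ... until the signature is found.
    """
    offset = 0
    while True:
        stream.seek(offset)
        chunk = stream.read(len(HDF5_SIGNATURE))
        if chunk == HDF5_SIGNATURE:
            stream.seek(0)
            return stream.read(offset)
        if len(chunk) < len(HDF5_SIGNATURE):
            raise ValueError("no HDF5 signature found")
        offset = 512 if offset == 0 else offset * 2


def parse_user_block(raw):
    """Parse the XML in a user block, ignoring trailing padding.

    Assumption: the padding consists of NUL bytes or whitespace.
    """
    return ET.fromstring(raw.rstrip(b"\x00 \r\n\t"))


# Build a stand-in file in memory: 512-byte user block, then the signature.
# The element layout here is hypothetical, not the CDFCB-X schema.
xml_text = (b"<UserBlock><Mission>NPOESS</Mission>"
            b"<Number_of_Data_Products>1</Number_of_Data_Products></UserBlock>")
fake_file = io.BytesIO(xml_text.ljust(512, b"\x00") + HDF5_SIGNATURE)

root = parse_user_block(read_user_block(fake_file))
print(root.findtext("Number_of_Data_Products"))
```

For a real product file, the same two functions could be applied to a file opened with `open(path, "rb")`; the actual element names are defined in CDFCB-X Volumes V and VI.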
NP-EMD General HDF5 File Structure
NP-EMD NPOESS HDF5 Metadata Locations The NPOESS HDF5 Metadata is organized hierarchically, from the top down in order to reduce duplication of information and to take advantage of the hierarchical nature of HDF5 –Root Group Data Products Group –Data Product (indicated by the specific product’s identifier) »Product Aggregation Dataset »Product Granule Dataset
NP-EMD HDF5 Conceptual Diagram - Data
NP-EMD NPOESS Quality Flags Overview
The concept is to provide consistently stored, high-density quality information about the delivered data, simplifying usability while maintaining storage efficiency.
Each quality flag occupies one or more consecutive bits within a byte.
Quality flag arrays follow the structure of the data product.
–The size of each array is equal to or less than the size of the data to which the quality information applies (dimensions correspond to the data product arrays)
Quality flags are stored in the HDF5 files as one or more two- or three-dimensional, 1-byte arrays.
–The number of arrays depends on the quality flag definitions, which are specific to each data product
–Each byte may contain multiple bit-level flags
–Quality flags are ordered such that each flag is entirely contained within a single byte, occasionally resulting in a byte with reserved or meaningless bits
–Byte alignment is the same for every quality flag array
First bit (left-most) is the LSB
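The bit packing described above can be illustrated with a short sketch. It treats bit 0 as the least significant bit and extracts each flag with a shift and a mask. The flag names and bit positions in `LAYOUT` are hypothetical and are not taken from the CDFCB-X; the real layout of each quality flag byte is given in the product profile for the data product.

```python
def extract_flag(byte_value, start_bit, width):
    """Return the `width`-bit field beginning at `start_bit` (bit 0 = LSB)."""
    mask = (1 << width) - 1
    return (byte_value >> start_bit) & mask


# Hypothetical layout of one quality flag byte: name -> (start_bit, width).
# Bits 4-7 are treated as reserved in this illustration.
LAYOUT = {
    "overall_quality": (0, 2),  # bits 0-1
    "day_night": (2, 1),        # bit 2
    "exclusion": (3, 1),        # bit 3
}


def decode_byte(byte_value, layout=LAYOUT):
    """Split one quality flag byte into its named bit fields."""
    return {name: extract_flag(byte_value, start, width)
            for name, (start, width) in layout.items()}


print(decode_byte(0b00001110))
```

Because every flag is contained within a single byte, the same decoding can be applied element by element to a whole quality flag array.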
NP-EMD Dimensional Array Example
NP-EMD Detailed NPOESS UML Models Ken Stone
NP-EMD RDR UML Model
NP-EMD Common RDR Layout
NP-EMD SDR/TDR UML Model
NP-EMD EDR UML Model
NP-EMD Geolocation UML Model
NP-EMD Ancillary/Auxiliary UML Models
NP-EMD NPOESS Sample Data Reading the NPOESS HDF5 file with the HDF API Patrick Purcell
NP-EMD VIIRS Ice Surface Temperature (IST) Environmental Data Record (EDR) Example UML Model
NP-EMD The NPOESS Granule - Product Profile Ice Surface Temperature The Product Profile describes the NPOESS granule. For Ice Surface Temperature, the fields in the granule are: –IST_Array (Shown below) –QF1_VIIRSISTEDR (Shown below) –QF2_VIIRSISTEDR –QF3_VIIRSISTEDR –ISTFactors (Scale & Offset – Shown below)
NP-EMD The NPOESS Granule - Product Profile IST Quality Flag Byte 1
NP-EMD The NPOESS Granule - Product Profile IST Scale Factors
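As an illustration of how a scale and offset pair such as ISTFactors is typically applied, the sketch below converts stored integers to physical values. It assumes the common linear convention, physical = stored * scale + offset, and assumes that fill values should be skipped rather than scaled; the product profile in the CDFCB-X is the authority on both points. The numeric values are invented for the example.

```python
def unscale(stored, scale, offset, fill_values=()):
    """Convert stored integers to physical values.

    Assumed convention: physical = stored * scale + offset.
    Elements equal to a fill value are returned as None.
    """
    return [None if value in fill_values else value * scale + offset
            for value in stored]


# Invented sample values; 65535 stands in for a 'missing data' fill value.
stored_values = [0, 10, 65535]
print(unscale(stored_values, scale=0.5, offset=100.0, fill_values={65535}))
```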
NP-EMD VIIRS Ice Surface Temperature (IST) EDR – HDFView Screenshot
NP-EMD The NPOESS Granule – HDF View
The granule dataset array “VIIRS-IST-EDR_Gran_1” contains object IDs that “point” (dereference) to the second region of each dataset array under the “VIIRS-IST-EDR_All” group:
–The first object ID in the VIIRS-IST-EDR_Gran_1 array dereferences to the middle portion of the IST_Array
All of these “portions” share the same time effectivity and other granule-level metadata.
NP-EMD References to Regions
NP-EMD NPOESS Granules – Dereferencing to Datasets
Suggested Improvements to the HDF API
Problem: When dereferencing to a portion of a dataset, there is currently no way to learn the name of the dataset through the API.
–Solution: Add an API function that returns the name of the referenced dataset
–Note: This will be added to v1.8 beta
Problem: When dereferencing to a portion of a dataset, a copy of the entire dataset’s dataspace is returned. The requested selection is populated with data, but the other regions of the dataspace are filled.
–When the selection is a simple hyperslab, many users expect to retrieve only the hyperslab referenced, not a copy of the entire dataspace
–This leads to confusion: a novice user of references will tend to size buffers to the selection, not to the entire dataset’s dataspace
–Solution: Add an option to the API that allows only the selection to be returned from the H5Rdereference command for simple, contiguous hyperslab selections (as with NPOESS HDF5 granules)
Any other suggestions from users?
NP-EMD NPOESS HDF5 Files Summary
The NPOESS Program delivers the official deliverable data products (RDR, SDR/TDR, EDR/ARP/IP) and dynamic ancillary and auxiliary data in HDF5 files.
The HDF5 files have an XML User Block that can be accessed without HDF5 tools; it provides a “quick-look” into the metadata before the HDF5 file is opened.
Metadata within the HDF5 files are stored as attributes.
There are general UML models for the NPOESS official delivered data that provide a common framework.
Official deliverable data products are organized by reference objects (aggregations) which contain one or more reference regions (granules).
Although data may be accessed directly through the All Data group, the Data Products group provides integrated access:
–Allows the user to access both metadata and data through a common HDF5 group
Metadata is accessed directly by reading the attribute values.
Datasets may be accessed by dereferencing the object ID stored in the Data Products group for the aggregation or granule.
NPOESS HDF5 files provide flexibility for a variety of end users.
NP-EMD Backup Slides
NP-EMD NPOESS Granules – Dereferencing to Datasets Details
(See the HDF5 User’s Guide release 1.6.5, Chapter 2, “The HDF5 Library and Programming Model”, Section 2, “Dataspace Function Summaries”, for the H5S commands.)
Note that the H5S API commands fall into two broad categories:
1. Dataspace Management & Query Functions
These functions operate on the entire dataspace.
–The entire dataspace is equivalent to an entire (temporal) aggregated array’s dataspace in an NPOESS HDF5 file under the “All_Data” group
–Example: H5Sget_simple_extent_npoints
»Returns the number of elements in the entire array under “All_Data” for HDF5 NPOESS
»For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 768 x 3200 = 2,457,600 points
2. Dataspace Selection Functions (hyperslabs and points)
These functions operate on a hyperslab or a point selection.
–For NPOESS HDF5 files, the “selection” is equivalent to the granule (hyperslab) for a particular field (array)
–The “selection” is the portion of the data array the reference “points” to
–Example: H5Sget_select_npoints
»Determines the number of points in a dataspace selection
»For HDF5 NPOESS, this would be the number of points in a granule for a particular field
»For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 256 x 3200 = 819,200 points
–Note that the “select” in the API command is short for “selection”. It is not a redundant term for “get”.
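The two point counts quoted above can be checked with a small arithmetic model. This sketch does not call the HDF5 library; it only mirrors what H5Sget_simple_extent_npoints and H5Sget_select_npoints report for the IST_Array example. The helper names are hypothetical, and the assumption that each granule occupies a contiguous block of 256 rows is inferred from the 768-row array holding three granules.

```python
from math import prod


def extent_npoints(dims):
    """Number of elements in the whole dataspace (the aggregated array)."""
    return prod(dims)


def granule_hyperslab(granule_index, rows_per_granule, n_cols):
    """Return (start, count) of the hyperslab for one granule.

    Assumption: granules are stacked along the first dimension.
    """
    start = (granule_index * rows_per_granule, 0)
    count = (rows_per_granule, n_cols)
    return start, count


def selection_npoints(count):
    """Number of elements inside the hyperslab selection only."""
    return prod(count)


dims = (768, 3200)  # IST_Array dataspace holding three granules
start, count = granule_hyperslab(1, rows_per_granule=256, n_cols=3200)

print(extent_npoints(dims))      # 2457600, the entire dataspace
print(selection_npoints(count))  # 819200, the Gran_1 selection only
```

A read buffer sized from the second number holds exactly one granule, while one sized from the first holds the whole aggregation.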
NP-EMD Extract from HDF5 User’s Guide (1.6.5), Section The Programming Model Reading and Writing a Portion of a Dataset A “selection” may be: –A hyperslab (NPOESS uses this only) –A Union of hyperslabs –A list of independent points. –Note: These illustrations show a mapping procedure to another dataspace. The HDF5 API does not do this when you dereference... this would be user defined.
NP-EMD h5dump Screenshot – VIIRS Sea Surface Temperature HDF5 File Another way to view the arrays of references (Aggregation and Granule dataset arrays) is with the h5dump utility: –Granule: –Aggregation: –Note: Currently, the only way to match the object ID in the granule/aggregation datasets is to manually list the aggregation as shown above using h5dump or look up the order in the NPOESS Data Format Control Book - External. The HDF Group will add the ability to obtain the name of the dataset a reference points to in v1.8 beta.
NP-EMD Needed Improvements in HDF H5R – Dereference API
Suggestion: Allow the user to directly dereference and read only the hyperslab selection that the reference “points” to. The size of the returned dataset should be the size of the selection only, not the size of the entire dataspace for the object referenced.
–Currently, the H5Rdereference call returns a handle to the dataset referenced and therefore provides access to that dataset’s dataspace using H5Sget_ commands
Note that the reference can point to a very complex set of hyperslabs and/or individual points. The NPOESS selection is not complex: it is a simple hyperslab.
Example: We request Granule 1 (the second granule). The reference returns a handle to the entire dataset. Granule regions 0 and 2 contain fill data, while the requested Gran_1 contains the data selection defined by the reference. The data must be read into a new array in order to obtain an array with just the desired Gran_1’s data and size.
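The final copy step in the example above can be modeled without the HDF5 library. In this sketch a nested list stands in for the full-size buffer that comes back from the read: the rows of the requested granule hold data and every other row holds fill. The array sizes, the fill value of 0, and the helper name `extract_hyperslab` are assumptions chosen for illustration.

```python
def extract_hyperslab(full, start, count):
    """Copy a rectangular selection out of a full-size 2-D buffer.

    `start` is the (row, column) of the first selected element and
    `count` is the (rows, columns) extent of the selection.
    """
    row0, col0 = start
    n_rows, n_cols = count
    return [row[col0:col0 + n_cols] for row in full[row0:row0 + n_rows]]


FILL = 0  # assumed fill value for the regions outside the selection

# Stand-in for a 6 x 4 aggregated array with three 2-row granules.
# Only granule 1 (rows 2 and 3) was populated by the read.
full = [[FILL] * 4 for _ in range(6)]
full[2] = [11, 12, 13, 14]
full[3] = [21, 22, 23, 24]

granule = extract_hyperslab(full, start=(2, 0), count=(2, 4))
print(len(full) * len(full[0]), "elements in the full dataspace")
print(len(granule) * len(granule[0]), "elements in the granule")
print(granule)
```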
NP-EMD Needed Improvements in HDF H5R – Dereference API (cont) Screenshot of output from the ISTFactors Array –A handle to the dataset is returned with the corresponding dataspace (in this example, size = 6) –The selected region contains the valid data from VIIRS-IST-EDR_Gran_0 ( and ). Other regions are (approximately) filled to zero.
NP-EMD Sample Code (p1)– Reads a Multi-Granule HDF5 NPOESS File
NP-EMD Sample Code (p2)
NP-EMD Sample Code (p3)
NP-EMD Sample Code (p4) – Code Output
NP-EMD Sample Files & HDF5 Reference API Summary
NPOESS granules are made up of portions of one or more dataset arrays.
In order to access a granule, the granule dataset must be read and each object ID dereferenced using the HDF Reference API (H5R).
Use H5Sget_... commands to retrieve information about the entire dataspace of the array containing a reference’s selection (or hyperslab).
Use H5Sget_select_... commands to retrieve information about the selection only.
Suggested future enhancements to the HDF Reference API:
–Add the ability to retrieve the name of the dataset containing a particular selection (to be added with v1.8 beta)
–Add the ability to directly retrieve the hyperslab, sized to the dataspace of the hyperslab only, not to the size of the entire dataset referenced