1
NP-EMD.2006.580.0001
Profile of National Polar-Orbiting Operational Satellite System (NPOESS) HDF5 Files
Kim Tomashosky, Ken Stone, Pat Purcell, Ron Andrews
NPOESS Program, Aurora, Colorado
2
Introduction
Kim Tomashosky
3
About NPOESS
The National Polar-orbiting Operational Environmental Satellite System* (NPOESS) is a satellite system used to monitor global environmental conditions, and to collect and disseminate data related to:
–Weather
–Atmosphere
–Oceans
–Land
–Near-space environment
NPOESS will converge existing polar-orbiting satellite systems under a single national program
Polar-orbiting satellites observe Earth from space
–They collect and disseminate data on Earth's weather, atmosphere, oceans, land, and near-space environment
–The polar orbiters are able to monitor the entire planet and provide data for long-range weather and climate forecasts
* http://www.ipo.noaa.gov/
4
About NPOESS, Continued
Increases the timeliness and accuracy of severe weather event forecasts
Will collect over 50 environmental measurements that are crucial to timely, accurate weather forecasts by military and civilian organizations. It will enable:
–Increased accuracy in severe storm warnings and forecasting
–Improved drought analysis and flood warnings
Managed by the tri-agency Integrated Program Office* (IPO), utilizing personnel from the Department of Commerce, Department of Defense, and NASA
* http://www.ipo.noaa.gov/About/ipo_orgTXT.html
5
NPOESS Data Products
NPOESS Data Products are distributed in HDF5 format
–Archived and made available to the community via the Comprehensive Large Array-data Stewardship System* (CLASS), an electronic library of NOAA environmental data
–There is no “HDF-NPOESS” library; NPOESS Data Products have been designed using the native HDF5 library
NPOESS Data Products
–Raw Data Records (RDR)
–Sensor Data Records (SDR) / Temperature Data Records (TDR)
–Intermediate Products (IP)
–Application Related Products (ARP)
–Environmental Data Records (EDR)
* http://www.class.noaa.gov/
6
Data Organization
Data Product Granules
–A segment of data, with the size optimally determined to achieve maximum efficiency for an algorithm class
–A granule is associated with an integer number of sensor scans, and its definition varies for sensors and data products
–Gaps in granules are filled using a pre-defined ‘missing data’ fill value
–Represented as a set of region reference pointers to sections of the respective dataset arrays
Data Product Aggregations
–A grouping of the same kind of granules, packaged in HDF5 and covering a temporal range
–May contain as few as one granule and as many as an orbit of granules
–Represented as a set of object reference pointers to the various groupings of data that make up a particular data product (one for each homogeneous dataset included in the granule)
7
NPOESS Documentation
Documentation for the NPOESS Data Products
–NPOESS Common Data Format Control Book – External
»Volume I – Overview
»Volume II – RDR Formats
»Volume III – SDR/TDR Formats
»Volume IV – EDR/IP/ARP Formats
»Volume V – Metadata
»Volume VI – Ancillary Data, Auxiliary Data, Messages, and Reports
»Volume VII – Application Packets
8
NPOESS HDF5 General Overview
Ron Andrews
9
HDF5 Conceptual Diagram
10
HDF5 XML User Block
The XML User Block for NPOESS Data Products provides a ‘quick-look’ into the metadata of the associated HDF5 file
–The size of the HDF5 XML User Block will be a multiple of 512 bytes
The XML User Blocks are defined in the following volumes of the CDFCB-X:
–Volume V – Metadata
Contains the XML User Block formats for:
»Raw Data Records (RDR)
»Sensor Data Records (SDR) / Temperature Data Records (TDR)
»Intermediate Products (IP)
»Application Related Products (ARP)
»Environmental Data Records (EDR)
–Volume VI – Ancillary, Auxiliary, Reports, and Messages
Contains the XML User Block formats for the Ancillary and Auxiliary data files that are delivered in HDF5
Example elements:
–Mission, Platform, and Instrument Names
–Number_of_Data_Products
–CollectionShortName(s)
–Aggregation Information
–Timestamps
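Because the user block precedes the HDF5 superblock, it can in principle be read with ordinary file I/O. The following is a minimal Python sketch, not NPOESS software. It relies on the HDF5 file format rule that the 8-byte superblock signature sits at offset 0, 512, 1024, 2048, and so on. It also assumes that the block is null-padded UTF-8 text, which the slides do not state. The XML element names in the synthetic test data are hypothetical.

```python
import io

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 superblock signature


def find_user_block_size(stream):
    """Return the offset of the HDF5 superblock, i.e. the user block size."""
    offset = 0
    while True:
        stream.seek(offset)
        chunk = stream.read(len(HDF5_SIGNATURE))
        if len(chunk) < len(HDF5_SIGNATURE):
            raise ValueError("no HDF5 signature found")
        if chunk == HDF5_SIGNATURE:
            return offset
        # Candidate superblock offsets are 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2


def read_user_block_text(stream):
    """Return the user block as text, dropping trailing padding.

    Assumption: padding is null bytes and the content is UTF-8.
    """
    size = find_user_block_size(stream)
    stream.seek(0)
    return stream.read(size).rstrip(b"\x00").decode("utf-8")


# Synthetic stand-in for a file: hypothetical XML, padded to 512 bytes,
# followed by the signature and some dummy bytes.
xml = b"<Example><Mission>NPP</Mission></Example>"
fake_file = io.BytesIO(xml.ljust(512, b"\x00") + HDF5_SIGNATURE + b"\x00" * 64)

print(find_user_block_size(fake_file))
print(read_user_block_text(fake_file))
```

With a real file, the same functions would take a file object opened in binary mode, for example open(path, "rb").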
11
General HDF5 File Structure
12
NPOESS HDF5 Metadata Locations
The NPOESS HDF5 metadata is organized hierarchically, from the top down, in order to reduce duplication of information and to take advantage of the hierarchical nature of HDF5
–Root Group
  –Data Products Group
    –Data Product (indicated by the specific product’s identifier)
      »Product Aggregation Dataset
      »Product Granule Dataset
13
HDF5 Conceptual Diagram - Data
14
NPOESS Quality Flags Overview
The concept is to provide consistently stored, high-density quality information about the delivered data, simplifying usability while maintaining storage efficiency
Each quality flag occupies one or more consecutive bits within a byte
Quality flag arrays follow the structure of the data product
–The size of each array is equal to or less than the size of the data to which the quality information applies (dimensions correspond to the data product arrays)
Quality flags are stored in the HDF5 files as one or more two- or three-dimensional arrays of 1-byte elements
–The number of arrays is dependent on the quality flag definitions, specific to each data product
–Each byte may contain multiple bit-level flags
–Quality flags are ordered such that each flag is entirely contained within a single byte, occasionally resulting in a byte with reserved or meaningless bits
–Byte alignment is the same for every quality flag array
»First bit (left-most) is the LSB
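The bit packing described above can be illustrated with a short Python sketch. It assumes bit 0 is the least significant bit of each byte. The flag names, positions, and widths are hypothetical and are not taken from any NPOESS product profile. The real layouts are defined per data product in the CDFCB-X.

```python
def extract_flag(byte_value, start_bit, bit_width):
    """Return the flag held in bit_width consecutive bits of one byte.

    start_bit counts from the least significant bit (bit 0 = LSB).
    """
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("quality flag elements are single bytes")
    if start_bit < 0 or bit_width < 1 or start_bit + bit_width > 8:
        raise ValueError("a flag must fit entirely within one byte")
    mask = (1 << bit_width) - 1
    return (byte_value >> start_bit) & mask


# Hypothetical layout: flag name -> (start_bit, bit_width).
# Bits 5-7 are left unused, like the reserved bits mentioned above.
HYPOTHETICAL_LAYOUT = {
    "overall_quality": (0, 2),
    "day_night": (2, 1),
    "cloud_confidence": (3, 2),
}


def decode_byte(byte_value, layout):
    """Decode every flag in one quality flag byte into a dict."""
    return {
        name: extract_flag(byte_value, start, width)
        for name, (start, width) in layout.items()
    }


# Two elements of a made-up quality flag array.
qf_elements = bytes([0b00001101, 0b00010010])
for element in qf_elements:
    print(decode_byte(element, HYPOTHETICAL_LAYOUT))
```

Because every flag is confined to one byte, each element of a quality flag array can be decoded independently of its neighbours.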
15
2-Dimensional Array Example
16
Detailed NPOESS UML Models
Ken Stone
17
RDR UML Model
18
Common RDR Layout
19
SDR/TDR UML Model
20
EDR UML Model
21
Geolocation UML Model
22
Ancillary/Auxiliary UML Models
23
NPOESS Sample Data
Reading the NPOESS HDF5 file with the HDF API
Patrick Purcell
24
VIIRS Ice Surface Temperature (IST) Environmental Data Record (EDR) Example UML Model
25
The NPOESS Granule - Product Profile: Ice Surface Temperature
The Product Profile describes the NPOESS granule. For Ice Surface Temperature, the fields in the granule are:
–IST_Array (Shown below)
–QF1_VIIRSISTEDR (Shown below)
–QF2_VIIRSISTEDR
–QF3_VIIRSISTEDR
–ISTFactors (Scale & Offset – Shown below)
26
The NPOESS Granule - Product Profile: IST Quality Flag Byte 1
27
The NPOESS Granule - Product Profile: IST Scale Factors
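A scale and offset pair such as ISTFactors is typically applied linearly to stored integer values. The Python sketch below assumes the relation value = stored * scale + offset, which the slides do not spell out. It also assumes that the two ISTFactors values quoted later in this deck (0.0003 and 251.11) are the scale and the offset, in that order. The stored integers are made up for illustration.

```python
def unscale(stored_values, scale, offset):
    """Convert stored integers to physical values.

    Assumption: a linear mapping, value = stored * scale + offset.
    """
    return [value * scale + offset for value in stored_values]


# Assumed reading of the ISTFactors pair; the order is not confirmed here.
IST_SCALE = 0.0003
IST_OFFSET = 251.11

# Hypothetical stored integers, not taken from a real granule.
stored = [0, 10000, 65000]
temperatures = unscale(stored, IST_SCALE, IST_OFFSET)

for raw, temperature in zip(stored, temperatures):
    print(f"{raw:>6d} -> {temperature:.2f}")
```

A production reader would also need to skip fill values before unscaling. That step is omitted here because the fill conventions are product-specific.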
28
VIIRS Ice Surface Temperature (IST) EDR – HDFView Screenshot
29
The NPOESS Granule – HDF View
The granule dataset array “VIIRS-IST-EDR_Gran_1” contains object IDs that “point” (dereference) to the second region of each dataset array under the “VIIRS-IST-EDR_All” group:
–The first object ID in the VIIRS-IST-EDR_Gran_1 array dereferences to the middle portion of the IST_Array
–All of these “portions” share the same time effectivity and other granule-level metadata
30
References to Regions
31
NPOESS Granules – Dereferencing to Datasets: Suggested Improvements to the HDF API
Problem: When dereferencing to a portion of a dataset, there is currently no way to obtain the name of the dataset through the API
–Solution: Add an API function that will return the name of the referenced dataset
–Note: This will be added to v1.8 beta
Problem: When dereferencing to a portion of a dataset, a copy of the entire dataset’s dataspace is returned. The requested selection is populated with data, but the other regions of the dataspace hold fill data
–When the selection is a simple hyperslab, many users expect to retrieve only the hyperslab referenced, not a copy of the entire dataspace
–This leads to confusion: a novice user of references will tend to size the result to the selection, not to the entire dataset’s dataspace
–Solution: Add an option to the API to allow only the selection to be returned from the H5Rdereference command when choosing simple, contiguous hyperslab selections (as with NPOESS HDF5 granules)
Any other suggestions from users?
32
NPOESS HDF5 Files Summary
The NPOESS Program delivers the official deliverable data products (RDR, SDR/TDR, EDR/ARP/IP) and dynamic ancillary data and auxiliary data in HDF5 files
The HDF5 files have an XML User Block that can be accessed without HDF5 tools; it provides a “quick-look” into the metadata before opening the HDF5 file
Metadata within the HDF5 files are stored as attributes
There are general UML Models for the NPOESS official delivered data that provide a common framework
Official deliverable data products are organized by reference objects (aggregations), which contain one or more reference regions (granules)
Although data may be accessed directly through the All Data group, the Data Products group provides integrated access:
–Allows the user to access both metadata and data through a common HDF5 group
»Metadata is accessed directly by reading the attribute values
»Datasets may be accessed by dereferencing the object ID stored in the Data Products group for the aggregation or granule
NPOESS HDF5 files provide flexibility for a variety of end users
33
Backup Slides
34
NPOESS Granules – Dereferencing to Datasets: Details
(See the HDF5 User’s Guide release 1.6.5, Chapter 2, “The HDF5 Library and Programming Model”, Section 2, “Dataspace Function Summaries”, for the H5S commands)
Note that the H5S API commands fall into two broad categories:
1. Dataspace Management & Query Functions
These functions operate on the entire dataspace
–The entire dataspace is equivalent to an entire (temporal) aggregated array’s dataspace in an NPOESS HDF5 file under the “All_Data” group
–Example: H5Sget_simple_extent_npoints
»Returns the number of elements in the entire array under “All_Data” for HDF5 NPOESS
»For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 768 x 3200 = 2,457,600 points
2. Dataspace Selection Functions (hyperslabs and points)
These functions operate on a hyperslab or a point selection
For NPOESS HDF5 files, the “selection” is equivalent to the granule (hyperslab) for a particular field (array)
The “selection” is the portion of the data array the reference “points” to:
–Example: H5Sget_select_npoints
»Determines the number of points in a dataspace selection
»For HDF5 NPOESS, this would be the number of points in a granule for a particular field
»For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 256 x 3200 = 819,200 points
–Note that the “select” in the API command is short for “selection”. It is not a redundant term for “get”.
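The difference between the two counts can be checked with a small pure-Python model. This is a toy stand-in for a dataspace, not the HDF5 API. The method names only echo H5Sget_simple_extent_npoints and H5Sget_select_npoints. The start row of 256 is an assumption, based on Gran_1 being the middle third of the 768-row IST_Array.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class DataspaceModel:
    """Toy model of a dataspace with one simple hyperslab selected."""

    extent: tuple  # shape of the whole aggregated array
    start: tuple   # first element of the selection (assumed values below)
    count: tuple   # shape of the selection, i.e. of one granule

    def __post_init__(self):
        if not (len(self.extent) == len(self.start) == len(self.count)):
            raise ValueError("extent, start and count must have the same rank")
        for size, first, n in zip(self.extent, self.start, self.count):
            if first < 0 or n < 1 or first + n > size:
                raise ValueError("selection falls outside the extent")

    def simple_extent_npoints(self):
        """Number of elements in the entire array."""
        return prod(self.extent)

    def select_npoints(self):
        """Number of elements in the selection only."""
        return prod(self.count)


# IST_Array as seen through the first reference of VIIRS-IST-EDR_Gran_1.
ist_gran_1 = DataspaceModel(extent=(768, 3200), start=(256, 0), count=(256, 3200))

print(ist_gran_1.simple_extent_npoints())  # 2457600
print(ist_gran_1.select_npoints())         # 819200
```

A buffer sized from the second number holds exactly one granule, while the first number describes the full aggregation that the dereferenced dataset handle exposes.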
35
Extract from HDF5 User’s Guide (1.6.5), Section 4.2 - The Programming Model
Reading and Writing a Portion of a Dataset
A “selection” may be:
–A hyperslab (NPOESS uses this only)
–A union of hyperslabs
–A list of independent points
Note: These illustrations show a mapping procedure to another dataspace. The HDF5 API does not do this when you dereference; this would be user defined.
36
h5dump Screenshot – VIIRS Sea Surface Temperature HDF5 File
Another way to view the arrays of references (Aggregation and Granule dataset arrays) is with the h5dump utility:
–Granule:
–Aggregation:
Note: Currently, the only way to match an object ID in the granule/aggregation datasets to a dataset name is to manually list the aggregation as shown above using h5dump, or to look up the order in the NPOESS Common Data Format Control Book - External. The HDF Group will add the ability to obtain the name of the dataset a reference points to in v1.8 beta.
37
Needed Improvements in HDF H5R – Dereference API
Suggestion: Allow the user to directly dereference and read only the hyperslab selection that the reference “points” to. The size of the returned dataset should be the size of the selection only, not the size of the entire dataspace for the object referenced.
–Currently, the H5Rdereference call returns a handle to the dataset referenced and therefore provides access to that dataset’s dataspace using H5Sget_ commands
Note that the reference can point to a very complex set of hyperslabs and/or individual points. The NPOESS selection is not complex; it is a simple hyperslab.
Example: We request Granule 1 (the second granule). The reference returns a handle to the entire dataset. Granule regions 0 and 2 contain fill data, while the requested Gran_1 contains the data selection defined by the reference. The data must be read into a new array in order to obtain an array with just the desired Gran_1’s data and size.
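The copy step at the end of that example can be sketched in plain Python. This models only the final extraction from a full-size buffer into a granule-sized array. It does not call the HDF5 library. The fill value, the array dimensions, and the data values are all hypothetical.

```python
def extract_hyperslab(full_rows, start, count):
    """Copy a simple 2-D hyperslab out of a full-size buffer.

    full_rows: list of rows covering the entire dataset's dataspace
    start:     (row, column) of the first selected element
    count:     (rows, columns) in the selection
    """
    row0, col0 = start
    nrows, ncols = count
    return [row[col0:col0 + ncols] for row in full_rows[row0:row0 + nrows]]


FILL = -999.0  # hypothetical fill value

# Full-size buffer for three granules of two rows each. Only Gran_1
# (rows 2-3) holds data; regions 0 and 2 hold fill, as described above.
full_buffer = [[FILL] * 4 for _ in range(6)]
full_buffer[2] = [1.0, 2.0, 3.0, 4.0]
full_buffer[3] = [5.0, 6.0, 7.0, 8.0]

gran_1 = extract_hyperslab(full_buffer, start=(2, 0), count=(2, 4))
print(gran_1)  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
```

The start and count pairs play the role of the hyperslab description carried by the region reference. The enhancement suggested above would return an array of this size directly.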
38
Needed Improvements in HDF H5R – Dereference API (cont.)
Screenshot of output from the ISTFactors Array
–A handle to the dataset is returned with the corresponding dataspace (in this example, size = 6)
–The selected region contains the valid data from VIIRS-IST-EDR_Gran_0 (0.0003 and 251.11). The other regions are filled with (approximately) zero.
39
Sample Code (p1) – Reads a Multi-Granule HDF5 NPOESS File
40
Sample Code (p2)
41
Sample Code (p3)
42
Sample Code (p4) – Code Output
43
Sample Files & HDF5 Reference API Summary
NPOESS granules are made up of portions of one or more dataset arrays. In order to access a granule, the granule dataset must be read and each object ID dereferenced using the HDF Reference API (H5R)
Use H5Sget_... commands to retrieve information about the entire dataspace of the array containing a reference’s selection (or hyperslab)
Use H5Sget_select_... commands to retrieve information about the selection only
Suggested future enhancements to the HDF Reference API:
–Add the ability to retrieve the name of the dataset containing a particular selection (to be added with v1.8 beta)
–Add the ability to directly retrieve the hyperslab, sized to the dataspace of the hyperslab only, not to the size of the entire dataset referenced