NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering.

NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering richard.e.ullman@nasa.govrichard.e.ullman@nasa.gov  SDR  Sensor Data Record  TDR  Temperature Data Record  EDR  Environmental Data Record  IP  Intermediate Product  ARP  Application Related Product  GEO  Geolocation   NPOESS  National Polar-orbiting Operational Environmental Satellite System  NPP  NPOESS Preparatory Project  N A S A N P P S C I E N C E D A T A S E G M E N T I P O A L G O R I T H M D I V I S I O N Attributes extracted from XML Profile, Inserted into HDF as attributes of dataset field Attribute NameTypeComments Add_OffsetNumber Data type is the type of the resulting unscaled datum. To un-scale the elements, first multiply the scaled element by the Scale_Factor and then add the Add_Offset. If the dataset is not scaled, the Add_Offset will exist and have a value of zero. DataTypeString String format is: “%d-bit %s”, where %d is the number of bits and %s is one of: osigned integer ounsigned integer ofloating point o (number of significant bits in bitfield) DescriptionStringA descriptive text MeasurementUnitsStringConsistent with SI naming and Unidata’s “udunits” package NumberOfFillValuesIntegerIf zero, then no FillValue_Name and FillValue_Value attributes are attached. NumberOfLegendEntriesInteger If zero, then no LegendEntry_Name and LegendEntry_Value attributes are attached. Legend entries are used for quality fields only. RangeMaxNumber Maximum expected value of field elements in the product, not just this dataset instance. Data type matches type of dataset. RangeMinNumber Minimum expected value of field elements in the product, not just this instance. Data type matches type of dataset. Scale_FactorNumber Data type is the type of the resulting unscaled datum. To un-scale the elements, first multiply the scaled element by the Scale_Factor and then add the Add_Offset. If the dataset is not scaled, Scale_Factor will exist and have a value of one. Scaled Boolean True indicates that Scale_Factor and Add_Offset should be applied to the data to recover the un-scaled element values. Note that fill values are in the dataset type and so must be tested before un-scaling. FillValue_NameSet of string FillValue_Value Set of number Data type matches type of dataset. Dimension_GranuleBoundary Set of Boolean True (1) indicates that this is dimension extends when granules are appended. Dimension_Nameset of string Name match indicates that this dimension is congruent with the same dimension names in other datasets in this product group. Field_NameStringPotentially a place for the CF standard_name LegendEntry_NameSet of string LegendEntry_Value Set of number Data type matches type of dataset. NumberOfDimensionsIntegerInteger greater than zero. This duplicates the HDF attribute. What is “ned”? A “c” language demonstration exercise An exploration of using the HDF5 API and other tools to navigate the NPOESS product information model. –How hard is it to deal with the format challenges? The challenges marked with (see left) are addressed by “ned” –Are NPOESS products enough consistent, one to another that a general purpose reader is possible? –Can a computer program make detailed associations between the HDF5 file and the XML product profile? –Can the product profile be used to drive automated parsing of quality flags? To demonstrate the feasibility of implementation of suggested enhancements in the self-description of NPOESS products. –“ned” makes enhancements to the NPOESS format for challenge items marked by ned version 0.7 ANSI C code –HDF hdf5-1.6.5 –libxml 2.6.16 –udunits 1.4 Development –Apple Xcode –GNU C 4.0 SLOCCount.. –ansic=3707 –XML=1500 Submitted for NASA technology transfer open source release. Written against NPOESS IDPS build 1.4 sample data and profiles. Numerous discrepancies between profiles and sample data resolved by editing XML profiles (not affecting build 1.4 profile DTD). After editing profiles, ned successfully parses and acts upon 50 of 53 non- RDR sample files. Not complete! Format Strengths Straight HDF5.  No need for additional libraries. Consistent HDF5 group structure  Organization for each product is the same as all others.  Data “payload” is always in a product group within All_Data group. Allows for flexible temporal aggregation Granules are appended by extending dataset dimension. Format Challenges Geolocation appears in a separate product group and may be in separate HDF5 file. Field metadata, used to interpret data (similar to netCDF CF) are in separate product profile file. Quality flags must be parsed before they can be interpreted. Information needed for un-scaling scaled integers is not obvious. HDF5 indirect reference link API, used to link metadata to the data in NPOESS’ use is complex and not supported by all analysis COTS implementations. Information Model UML Diagram HDF5 for NPOESS Hierarchical Data Format 5 (HDF5) is the format for delivery of processed products from the National Polar- orbiting Operational Environmental Satellite System (NPOESS) and for the NPOESS Preparatory Project (NPP). HDF5 is a general purpose library and file format for storing scientific data. Two primary objects: Dataset, a multidimensional array of data elements Group, a structure for organizing objects Efficient storage and I/O, including parallel I/O. Free, open source software, multiple platforms. Data stored in HDF5 is used in many fields from computational fluid dynamics to film making. Data can be stored in HDF5 in an endless variety of ways, so it is important to standardize how NPOESS product data is organized in HDF5. ‘Enhanced” UML Diagram NED …. open attribute definition configuration file open the input HDF5 file create an output HDF5 file (make the "All_Data" and "Data_Product" paths) for each product in the "Data_Products" path of the input HDF5 file (H5Giterate) { Find and open the associated XML Product Profile create a product group in the "Data_Products" path of the output HDF5 file for the aggregate reference in the "Data_Products" path of the input HDF5 file { create an "aggregate reference stub" in the output HDF5 file for each attribute attached to the aggregate (H5Aiterate) validate type and range attach attribute to the aggregate reference stub of the output HDF5 file hold aggregate metadata for later } for each granule reference in the "Data_Products" path of the input HDF5 file { create a ”granule reference stub" in the output HDF5 file for each attribute attached to the granule (H5Aiterate) validate type and range attach attribute to the granule reference stub of the output HDF5 file "aggregate" the metadata according to attribute definition file (hold for later) } create a product group in the "All_Data" path of the output HDF5 file attach the aggregate metadata as HDF5 attributes of the product group in the "All_Data" path of output file. attach the aggregated granule metadata as HDF5 attributes of the product group in the "All_Data" path of output file. for each field in the associated product group in the "All_Data" path of the input HDF5 file { Find and open the associated "sub-tree" of the XML profile. read metadata from XML profile and validate type and range ** "aggregate" the field metadata to associate with the data aggregation for each NPOESS "datum" in the field { create a data aggregation field in the output HDF5 file copy the data from the input HDF5 file to the output HDF5 file attach the field metadata to the data aggregation field of the output HDF5 file } ** for each field in the "All_Data" path of the output HDF5 file create the object reference to the field populate the "aggregate reference stub” in the "Data_Products" path of the output HDF5 file ** for each granule for each field in the "All_Data" path of the output HDF5 file create the region reference to the portion of the field appropriate to this granule populate the "granule reference stub"in the "Data_Products" path of the output HDF5 file } What does (0.7)not do? Copy some of the attributes. Aggregate g-Rings. Aggregate percent attributes. Compute reference regions. Produce a user block. Discover shared dimensions and create appropriate aggregate attributes. Additional thoughts Modifications will be necessary to accommodate build 1.5 product profiles. Should the field attribute names follow the CF conventions? What does ned do? Check the profile against the HDF5 product. Validate attributes in HDF5 product (type and value). For each field, read 16 attributes from product profile and attach them as HDF attributes of the field dataset. Attach scale and offset values as “Scale_Factor” and “Add_Offset” attributes to field dataset. Parse bit-wise fields and create separate compressed datasets for each quality flag. Aggregate granule attributes and attach them as attributes of All_data product group.

NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering.

Similar presentations

Presentation on theme: "NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering.

Similar presentations

Presentation on theme: "NPOESS Enhanced Description Tool - “ned” Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPO Data / Information Architecture Algorithm / System Engineering."— Presentation transcript:

Similar presentations

About project

Feedback