Emerging Provenance/Context Content Standard
Discussion at the Data Preservation and Stewardship Cluster Session, ESIP Federation Meeting, July 14, 2011
H. K. "Rama" Ramapriyan and John Moses
ESDIS Project, NASA Goddard Space Flight Center
Rama.Ramapriyan@nasa.gov; John.F.Moses@nasa.gov
Outline
- Why do we need a standard? (reaffirm)
- Background
- Status
- Content Matrix
- Next steps
  – Draft introductory text material
  – Send matrix for broader review – NASA mission teams (satellite and aircraft investigations), MEaSUREs teams, DAAC UWGs, USGS, NOAA, EPA(?), ESA(?)
  – Decide on IEEE or ISO route to proceed further
Why a Standard?
- Need to understand and document what content is essential for long-term preservation of information from Earth observations
- A common approach and consistency across organizations (national and international) will help ensure that future long-term archives preserve the necessary content
  – Data needed for long-term science studies come from multiple organizations
- A standard will enable new missions to plan for preservation of the required content
  – It is less expensive to plan ahead than to retrofit (retrofitting may be impossible)
- Assessing compliance with the standard helps determine the utility of datasets for long-term science studies
  – The standard is worth having even if compliance is not 100% (depends on the capabilities, requirements, and budgets of organizations)
- The standard should provide priorities for content – e.g., critical, essential, desirable
- Other reasons??
Background
- Proposed initiating a standard activity at the January 2011 ESIP Meeting
  – Attendees were very supportive of the idea
  – Agreed we should develop a matrix – candidate columns were defined
  – We don't know of an existing standard that covers enumeration of all content that must be preserved
    - There may be some overlaps with ISO 19115, but they need to be identified item by item
- We are focused on "what", not "how"
- Current title of the standard: "Provenance and Context Content Standard for Data Supporting Global Change Research"
  – We propose simplifying it to: "Preservation Content Standard for Data Supporting Global Change Research"
Status (1 of 2)
- Received inputs from Ted Habermann et al. (NOAA)
- Merged with NASA's inputs based on the USGCRP Workshop (1998) Report and discussions (2010–2011) with EOS instrument teams (GLAS, HIRDLS) and the TOMS instrument PI
  – Note: the USGCRP workshop, jointly sponsored by NASA and NOAA, identified a number of scenarios from which content recommendations were derived
- Content matrix was developed and posted on the ESIP Data Stewardship and Preservation Cluster wiki
  – Initial version – March 1, 2011
  – Latest version (incorporates comments from cluster members) – June 8, 2011
  – Focused mostly on satellite remote sensing data; need to ensure we cover other types of data (aircraft, in situ)
Status (2 of 2)
- ESA/NASA Earth Science Framework for Cooperation
  – NASA and ESA are discussing collaboration and/or coordination in various areas, including data systems
  – Subgroup 3 – Ground Segments & Data. Goals: collaborate between ESA and NASA ground segments and data systems to enhance mission return and to enable efficient development and generation of multi-mission/multi-agency data products
- ESA is very interested in coordinating the Content Standard with us
  – ESA has provided us with their documents related to Long-Term Data Preservation (LTDP)
  – This provides an opportunity to compare notes, be more comprehensive, and avoid duplication – it could ease the way to an international standard
  – We have compared our matrix with the ESA documents and identified only a few differences in content
  – We will share the matrix with ESA after this meeting
Summary: Earth Science Content Information
- OAIS Reference Model terms: the Content Information Object includes a Content Data Object with Representation Information for the Designated Community – here, Earth Science researchers
  – Preservation Description Information types: Provenance (source of the information), Context (relation to other information), Reference (identifiers), Fixity (e.g., checksum)
  – PCCS does not cover Reference and Fixity explicitly – should it? (An illustrative fixity sketch follows this slide.)
- PCCS categories are defined to highlight the important information needed for Earth Science research
  – To identify the flow of information through an instrument's life cycle
  – To develop a user's sense of what should be included as provenance and context information about a particular instrument's dataset
  – Extendable to classes of instruments and measurement systems
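As a minimal illustration of what OAIS Fixity Information (e.g., a checksum) looks like in practice, here is a hedged Python sketch for computing a digest that an archive could record for each data granule; the granule file name in the usage comment is hypothetical.

```python
import hashlib

def granule_fixity(path: str, algorithm: str = "sha256") -> str:
    """Compute a checksum that could be recorded as OAIS Fixity Information."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large granules do not have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (hypothetical granule file name):
# print(granule_fixity("EXAMPLE_L1B_20110714.hdf"))
```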
Content Matrix – Introduction (1 of 3)
- Using the column headings discussed at the January 2011 ESIP meeting (mostly)
- Each row corresponds to a content item and provides details
- Content items are mapped into 8 categories (see later chart)
- One or more content items are defined in each of the categories
- Column headings:
  – Item Number (C.N – category and number within category)
  – Category
  – Content Item Name
  – Definition / Description
  – Rationale (why the content is needed)
  – Criteria (how good the content should be)
  – Priority (H, M, L, or critical, essential, desirable)
Content Matrix – Introduction (2 of 3)
- Column headings (continued):
  – Source (who should provide the content item)
  – Project phase for capture
  – User community (who would be most likely to need the content item – this column is mostly blank in this version; needs group inputs)
  – Representation (while the focus is on "what", brief comments are included here on whether items are word files, numeric files, pointers, etc.)
  – Distribution restrictions (potential proprietary or ITAR concerns associated with the content item)
  – Source identifying item (where the content item came from – NASA, NOAA, or both)
- (An illustrative sketch of one matrix row as a data record follows this slide.)
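To make the column structure concrete, here is a hedged sketch of one content-matrix row represented as a Python data record; the class name, field names, and sample values are illustrative assumptions, not part of the draft standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ContentItem:
    """One row of the PCCS content matrix, with fields mirroring the column headings above."""
    item_number: str                      # "C.N" - category number and item number within it
    category: str
    name: str                             # Content Item Name
    definition: str                       # Definition / Description
    rationale: str                        # why the content is needed
    criteria: str                         # how good the content should be
    priority: str                         # H, M, L (or critical / essential / desirable)
    source: str                           # who should provide the content item
    project_phase: str                    # project phase for capture
    user_community: List[str] = field(default_factory=list)
    representation: str = ""              # e.g., word files, numeric files, pointers
    distribution_restrictions: str = ""   # potential proprietary or ITAR concerns
    source_identifying_item: str = ""     # NASA, NOAA, or both

# Illustrative example row (values are hypothetical, not taken from the matrix):
example = ContentItem(
    item_number="1.1",
    category="Preflight/Pre-Operations",
    name="Instrument Description",
    definition="Documentation of instrument/sensor characteristics, including "
               "pre-flight or pre-operational performance measurements.",
    rationale="Needed to interpret and reprocess the long-term data record.",
    criteria="Complete enough for a researcher outside the project to use the data.",
    priority="H",
    source="Instrument team",
    project_phase="Pre-launch",
    source_identifying_item="NASA and NOAA",
)
```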
Content Matrix – Introduction (3 of 3)
- A further level of detail is possible – components within content items
  – Different subsets of components are needed under different circumstances
  – Discriminators that determine such subsets:
    - Platform type (satellite, aircraft, flux tower, buoy, etc.)
    - Instrument type
    - Measurement type
    - Product level
Categories
1. Preflight/Pre-Operations: Instrument/sensor characteristics, including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets
2. Products (Data): Raw instrument data, Level 0 through Level 4 data products, and associated metadata
3. Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information
4. Mission Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers
5. Product Software: Product generation software and software documentation
6. Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation
7. Validation: Validation record and data sets
8. Software Tools: Product access (reader) tools
(An illustrative encoding of the categories and item numbering follows this list.)
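As a hedged sketch only, the eight categories and the "C.N" item-numbering scheme from the matrix introduction could be encoded as follows; the enum and helper names are illustrative, not part of the standard.

```python
from enum import IntEnum

class PCCSCategory(IntEnum):
    """The eight PCCS content categories, numbered as on this chart."""
    PREFLIGHT_PRE_OPERATIONS = 1
    PRODUCTS_DATA = 2
    PRODUCT_DOCUMENTATION = 3
    MISSION_CALIBRATION = 4
    PRODUCT_SOFTWARE = 5
    ALGORITHM_INPUT = 6
    VALIDATION = 7
    SOFTWARE_TOOLS = 8

def item_number(category: PCCSCategory, n: int) -> str:
    """Format an item number in the 'C.N' style (category, then item within category)."""
    return f"{category.value}.{n}"

# e.g., the second content item under Products (Data):
print(item_number(PCCSCategory.PRODUCTS_DATA, 2))  # "2.2"
```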
1. Pre-Operational Content
- Instrument Description (Priority: H)
  – Description: Documentation of instrument/sensor characteristics, including pre-flight or pre-operational performance measurements.
  – Discriminator/Components: Instrument Type – Remote sensors (specifications, platform geometry, spectral response, radiometric response, noise characteristics); In-situ
- Calibration Data (Priority: L–M)
  – Description: Numeric (digital) files of instrument/sensor characteristics, including pre-flight or pre-operational performance measurements.
  – Discriminator/Components: Instrument Type – measured spectral response data, measured radiometric data, measured ambient noise
2. Earth Science Data Products (1)
- Raw data or Level 0 data products (Priority: L–M*)
  – Description: Raw data as measured by a spaceborne, airborne, or in situ instrument; Level 0 data are the reconstructed, unprocessed instrument and payload data at full resolution, with any and all communications artifacts removed.
  – Discriminator/Components: Instrument Type – observations, platform ephemeris, and metadata in multiple files
- Level 1A data products (Priority: L–H*)
  – Description: Reconstructed, unprocessed instrument data at full resolution, time-referenced and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters computed and appended but not applied to the Level 0 data.
  – Discriminator/Components: Instrument Type
- Level 1B data (Priority: H)
  – Description: Level 1A data that have been processed to sensor units.
  – Discriminator/Components: Instrument Type

* Priorities depend on the choice to keep Level 0 or Level 1A data.
2. Earth Science Data Products (2)
- Level 2 data (Priority: H)
  – Description: Derived geophysical variables at the same resolution and location as the Level 1 source data.
  – Discriminator/Components: Instrument Type – Remote sensor, In-situ
- Level 3 data (Priority: H)
  – Description: Variables mapped on uniform space-time grid scales, usually with some completeness and consistency.
  – Discriminator/Components: Assimilation Type – single Level 2 input, multiple Level 2 inputs
- Level 4 data (Priority: H)
  – Description: Model output or results from analyses of lower-level data.
  – Discriminator/Components: Model Input Type – single instrument, multiple instruments, remote sensor + in-situ
- Metadata (Priority: H)
  – Description: Information about the data to facilitate discovery, search, access, understanding, and usage, associated with each of the data products. Links the product to the algorithm version.
  – Discriminator/Components: Product Data Level, Instrument Type, Assimilation Type, Model Input Type
3. Product Documentation (1)
- Product Team (Priority: H)
  – Description: States the product team members' roles, contact information, and period of responsibility.
  – Discriminator/Components: Team Type – Investigator, Development, Help Desk, Operations
- Product Requirements (Priority: M)
  – Description: The project's requirements for each product, either explicitly or by reference to the project's requirements document, if available.
  – Discriminator/Components: Requirements Type – Content, Format, Latency, Accuracy, Quality
- Product Development History (Priority: M)
  – Description: Major product development steps and milestones, with links to other relevant items that are part of the preserved provenance and context contents.
  – Discriminator/Components: Instrument Type, Assimilation Type, Model Input Type
3. Product Documentation (2)
- Processing History (Priority: H)
  – Description: Documentation of processing history and production version history, indicating which versions were used when, why different versions came about, and what the improvements were from version to version.
  – Discriminator/Components: Instrument Type, Product Level
- Algorithm Version History (Priority: H)
  – Description: Processing history, including the versions of processing source code corresponding to the versions of the data set or derived product held in the archive. Granule-level metadata should indicate which version of the software was used to produce a given granule. Links the product to the algorithm version.
  – Discriminator/Components: Instrument Type, Product Level
3. Product Documentation (3)
- Maintenance History (Priority: M)
  – Description: Excerpts and/or references to maintenance documentation deemed of value to product users; maintenance reports.
- Operations History (Priority: H)
  – Description: Excerpts and/or references to operations documentation deemed of value to product users; operations event logs.
- Product Generation Algorithms (Priority: H)
  – Description: Processing algorithms and their scientific and mathematical basis, including a complete description of any geolocation, radiometric calibration, geophysical parameter, sampling, or mapping algorithms used in creation of the product; algorithm software documentation; ATBD; and high-level data flow diagrams.
  – Discriminator/Components: Product Level – Algorithm Output, Algorithm Performance Assumptions, Error Budget, Numerical Computation Considerations
3. Product Documentation (4)
- Product Quality (Priority: H)
  – Description: Documentation of product quality assessment (methods used, assessment summaries for each version of the datasets); description of data embedded at the granule level, including quality flags, product data uncertainty fields, data issues logs, etc.
  – Discriminator/Components: Instrument Type, Product Level – Product Accuracy, Sensor Effects
- Quality Assessment and Potential Algorithm Improvements (Priority: H)
  – Description: Describes potential future enhancements to the algorithm and the limitations they will mitigate, with all possible and useful related information and links.
  – Discriminator/Components: Instrument Type
- References (Priority: H)
  – Description: A bibliography of pertinent technical notes and articles, including refereed publications reporting on research using the data set.
- User Feedback (Priority: L)
  – Description: Information received back from users of the data set or product.
Product Generation Algorithms – Components
- Algorithm Output (Priority: H)
  – Describe the output data products – not the format – at a level of detail sufficient to determine whether the product meets user requirements.
- Algorithm Performance Assumptions (Priority: H)
  – Describe all assumptions that have been made concerning the algorithm performance estimates. Note any limitations that apply to the algorithms (e.g., conditions where retrievals cannot be made or where performance may be significantly degraded). To the extent possible, the potential for degraded performance should be explored, along with mitigating strategies.
- Error Budget (Priority: H)
  – Organize the various error estimates into an error budget. Error budget limitations should be explained. Describe prospects for overcoming error budget limitations with future maturation of the algorithm, test data, and error analysis methodology. (A minimal numeric sketch follows this list.)
- Numerical Computation Considerations (Priority: M)
  – Describe how the algorithm is numerically implemented, including possible issues with computationally intensive operations (e.g., large matrix inversions, truncation, and rounding).
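As a hedged illustration of what "organizing error estimates into an error budget" can look like, here is a minimal Python sketch that combines independent error components by root-sum-square; the component names and values are hypothetical, and the RSS combination is just one common convention, not one prescribed by the standard.

```python
import math

# Hypothetical, independent 1-sigma error components for a retrieved geophysical variable.
error_budget = {
    "sensor noise": 0.15,
    "calibration error": 0.10,
    "forward model error": 0.20,
    "ancillary data error": 0.05,
}

# Root-sum-square total, a common convention when the components are independent.
total_error = math.sqrt(sum(v ** 2 for v in error_budget.values()))

for component, value in error_budget.items():
    print(f"{component:>22s}: {value:.3f}")
print(f"{'total (RSS)':>22s}: {total_error:.3f}")  # ~0.274
```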
Product Quality – Components
- Product Accuracy (Priority: H)
  – Accuracy of products, as measured by validation testing and compared to accuracy requirements; references to relevant test reports.
- Sensor Effects (Priority: H)
  – Flowed-through effects of sensor noise, calibration errors, spatial and spectral errors, and/or un-modeled or neglected geophysical phenomena on the quality of products.
4. Mission Calibration (1)
- Instrument/Sensor Calibration During Mission (Priority: H)
  – Description: Instrument/sensor calibration method – radiometric calibration; spectral response/calibration; noise characteristics; geolocation.
  – Discriminator/Components: Instrument Type
- In-situ Measurement Environment (Priority: H)
  – Description: In the case of Earth-based data, the station location and any changes in location, instrumentation, controlling agency, surrounding land use, and other factors that could influence the long-term record.
  – Discriminator/Components: Platform Type – Aircraft, Balloon, Station/Tower/Buoy
- Mission Platform History (Priority: H)
  – Description: Instrument events and maneuvers; attitude and ephemeris; aircraft position; event logs.
  – Discriminator/Components: Platform Type – Satellite, Aircraft
4. Mission Calibration (2)
- Mission Calibration Data (Priority: M)
  – Description: Instrument/sensor calibration data – radiometric calibration; spectral response/calibration; noise characteristics; geolocation.
  – Discriminator/Components: Instrument Type
- Calibration Software (Priority: M)
  – Description: Source code used in applying calibration to generate look-up tables and/or parameters needed for producing calibrated products.
  – Discriminator/Components: Instrument Type
5. Product Software (1)
- Product Generation Algorithms (Priority: H)
  – Description: Source code used to generate products at all levels.
  – Discriminator/Components: Instrument Type
- Output Dataset Description (Priority: H)
  – Description: For each output data file, details on the data product's structure, format/type, range of values, and special error values; include data volume and file size. All information needed to verify that the required output data are created by a run and that all expected datasets are produced in the expected format.
- Programming & Procedural (Priority: M–H)
  – Description: Describe any important programming and procedural aspects related to implementing the algorithm into operating code.
- Exception Handling (Priority: H)
  – Description: List the complete set of expected exceptions and describe how they are identified, trapped, and handled.
5. Product Software (2)
- Test Data Description (Priority: L–M)
  – Description: Description of the data sets used for software verification and validation, including unit tests and system tests, either explicitly or by reference to the developer's test plans, if available. This will be updated during operations to describe test data for maintenance.
- Unit Test Plans (Priority: L–M)
  – Description: Description of all test plans that were produced during development, including links or references to the artifacts.
- Test Results (Priority: L–M)
  – Description: Description of the testing and test results from development, either explicitly or by reference to test reports. If test reports are not available to external users, provide a summary of the test results in sufficient detail to give external users a good sense of how the test results indicate that the products meet requirements.
6. Algorithm Input
- Algorithm Input Documentation (Priority: H)
  – Description: Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product, either explicitly or by reference to appropriate documents. The information should include a full description of the input data and their attributes, covering all input data used by the algorithm: primary sensor data, ancillary data, forward models (e.g., radiative transfer models, optical models, or other models that relate sensor observables to geophysical phenomena), and look-up tables.
- Algorithm Input Data (Priority: H)
  – Description: At the granule level, include information on all inputs (including ancillary or other data granules, calibration files, look-up tables, etc.) that were used to generate the product. At the appropriate level (granule or dataset), include calibration parameters; precision orbit and attitude data; climatological norms; geophysical masks or first-guess fields; spectrum and transmittance information; and numerical weather or climate model inputs. (An illustrative granule-level record follows this slide.)
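As one hedged illustration of granule-level input and provenance capture (combining this item with the Metadata and Algorithm Version History items earlier), here is a sketch of such a record as a Python dictionary; every field name and value is hypothetical and not prescribed by PCCS.

```python
# Hypothetical granule-level provenance/input record (all names and values are illustrative).
granule_record = {
    "product": "Level 2 geophysical product",        # product data level
    "granule_id": "EXAMPLE_L2_20110714T1200_v3.1",   # hypothetical identifier
    "algorithm_version": "3.1",                      # links the product to the algorithm version
    "software_version": "pge_v3.1.2",                # processing source code version
    "inputs": [                                      # all inputs used to generate the product
        {"type": "Level 1B granule", "id": "EXAMPLE_L1B_20110714T1200"},
        {"type": "calibration file", "id": "cal_lut_2011_195"},
        {"type": "look-up table", "id": "transmittance_lut_v7"},
        {"type": "ancillary data", "id": "nwp_forecast_2011071412"},
    ],
    "orbit_attitude": "ORB_ATT_20110714 (precision orbit and attitude file)",
    "first_guess_fields": "climatological norms, version 2",
}

print(granule_record["granule_id"], "produced with algorithm version", granule_record["algorithm_version"])
```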
7. Validation
- Validation Datasets (Priority: L–H)
  – Description: Description of the validation process, including identification of validation data sets; cal/val plans and status; detailed history of validation activities.
  – Discriminator/Components: Product Level
- Validation Record (Priority: H)
  – Description: Validation data sets along with their metadata.
  – Discriminator/Components: Product Level
8. Software Tools
- Tools for Users (Priority: H)
  – Description: Readers and data analysis tools.
  – Discriminator/Components: Product Level
Next Steps (1 of 2)
- Review the content matrix in this meeting – consider the appropriateness of the level of detail
- Form a small team (4 or 5 volunteers willing to review drafts in detail) to carry the work forward; incorporate results from this meeting on use cases
- Look at ISO 19115, SensorML, and policy changes needed (?)
- Draft introductory text material to include:
  – Need for the content standard
  – ESIP Data Stewardship and Preservation Cluster – definition/mission/charter
  – The Cluster's motivation to promote the standard
  – Scope of the standard
  – Description of the matrix
  – Request for review, including how review inputs should be provided
- Reviewers should look at the list from the points of view of both providing data and using someone else's data
Next Steps (2 of 2)
- Conduct a "user test" with a small group of data providers
- Send the matrix with introductory text for broader review
  – NASA mission teams – satellite and aircraft investigations, MEaSUREs teams, DAAC UWGs, USGS, NOAA Data Centers, NOAA CDR Program PIs, EPA(?), ESA(?)
- Provide the same material to the NASA ESDSWG for consideration with its "best practices" submission
- Decide on the IEEE or ISO route to proceed further – may need to augment the small team at this point
Back-up Charts
Proposal from NASA ESDSWG Technology Infusion Working Group (TIWG)
- The concept of a provenance and context standard was discussed at the ESDSWG meeting in October
- Tentatively agreed to develop a "Best Practice" Technical Note for consideration by the NASA Standards Process Group (SPG)
  – Contents to include:
    - Items from the USGCRP 1998 Workshop Report
    - A 1–2 paragraph justification and best practice for each item
  – Process:
    - Develop the Technical Note
    - Submit an RFC to the SPG
    - SPG initial screening
    - Public evaluation
  – The process is expected to complete in about 1 year
IEEE Standard Development Process (~2 years)*
- Form the nucleus of a working group (a core group of people interested in developing the standard)
- Submit a Project Authorization Request (PAR) to the New Standards Committee (NesCom)
  – A relatively short form specifying the scope, purpose, and contact points for the new project
  – NesCom meets 4 times a year, but can start the review of the PAR shortly after it is submitted
- Mobilize the working group
  – Individual-based IEEE working groups (one individual, one vote) are open to anyone to participate – participants don't have to be IEEE or IEEE-SA members
  – Entity-based IEEE working groups (one entity, one vote) have specific membership requirements for an entity to observe or attain membership and voting rights
  – There are legal issues related to patents and such that must be dealt with, but the IEEE Standards Association (SA) provides staff support to help with this
- Draft the standard
  – IEEE-SA staff provide many tools and direct support in drafting standards
  – Involvement of the Quantities, Units and Letter Symbols Committee and the IEEE-SA Staff Editor is mandatory
- Ballot the standard
  – When the working group has determined that a draft is mature enough, it submits the draft to the IEEE sponsor (SCC40, chaired by SJS Khalsa)
  – 75% affirmative ballots are needed to proceed to the approval stage
- Approve the standard
  – Final approval of an IEEE standard is achieved by submitting the document and supporting material to the IEEE-SA Standards Board Standards Review Committee (RevCom), which issues a recommendation to the IEEE-SA Standards Board
- Maintain the standard
  – A standard has a validity period of five years from the date of approval by the IEEE-SA Standards Board; at the end of the five-year validity period, one of three things has to happen: revision, reaffirmation, or withdrawal
  – During this five-year validity period, amendments and corrigenda may need to be developed that offer minor additions/revisions to the standard

*Thanks to Siri Jodha Singh Khalsa
ISO Standard Development Process (~4 years)*
- An INCITS/L1** member (or members) prepares a new work item proposal (NWIP)
- The NWIP is voted on by INCITS/L1 members over a one-month voting period, during which clarification questions need to be answered
- If the NWIP is approved by INCITS/L1, it is voted on by the INCITS Executive Board; as the U.S. contribution, it is sent to the ISO TC 211 secretariat by INCITS/L1
- TC 211 sends the proposal to member countries for a one-month informal review
- If more than half of the voting member countries support the project and at least 5 members agree to participate by nominating experts to the project team, the project is approved
- The project chair, editor, and the nominated experts form the project team to draft the working draft (WD) of the standard; there are normally multiple versions of the WD
- When the project team is satisfied with the WD, the final version of the WD is sent by TC 211 to the country members for Committee Draft (CD) voting
- If the CD vote passes, the project team is dissolved and an editing committee (EC) is formed; the project chair will be the chair of the EC, and the ISO TC 211 Working Group 6 convener will become the EC chair
- The edited CD is put up for vote again to advance to the Draft International Standard (DIS) stage; if approved, the EC holds a final meeting to edit the document to DIS based on the voting comments; if not approved, a new version of the CD needs to be produced
- After it becomes a DIS, the next step is the Final Draft International Standard (FDIS)
- FDIS to International Standard (IS): the document advances automatically from FDIS to IS after 6 months if there are no major objections; only minor editorial changes are allowed from FDIS to IS

**INCITS/L1 – International Committee for Information Technology Standards, Geographic Information Systems
*Thanks to Liping Di