NASA’s EOSDIS – Long Term Archive Infrastructure and Processes


1 NASA’s EOSDIS – Long Term Archive Infrastructure and Processes
Presented to the CEOS WGISS, September 28, 2017
Dawn R. Lowe, Hampapuram (Rama) Ramapriyan, Chris Lynnes
ESDIS Project, NASA/Goddard Space Flight Center

2 Topics
NASA Earth Science Data Archives: overview
Archive infrastructure and processes
NASA data stewardship approach to data preservation and access
EOSDIS Preservation Content Specification

3 Typical EOSDIS Archive Hardware Architecture
[Figure: typical EOSDIS archive hardware, with blade servers, a PostgreSQL server, metadata controllers, disk and tape Storage Area Networks, physical storage arrays, backup storage, small-file backup, and an automated tape library; logical storage comprises Data Pool file systems, a small-file archive, a browse archive, and an archive cache, holding science files, metadata, and staging areas]
This shows a typical EOSDIS archive hardware architecture as currently hosted on premises. File systems are often separated according to the purpose and size of the files. Small files are kept apart from large data files for performance reasons and are backed up on disk instead of tape; automated tape libraries are used to back up large data files. Disk systems are on one Storage Area Network (SAN), and the tape libraries are on a separate tape SAN. A PostgreSQL database server maintains information about the data entities stored on the system, while the metadata controllers maintain low-level (file system) information about the files stored in each SAN. Note that this shows only the hardware connectivity, not the software interfaces.
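The size-based separation of small and large files can be sketched as a simple routing rule. This is a minimal illustration, not EOSDIS code: the 1 MiB threshold and the labels `small_file_archive` and `science_archive` are hypothetical, since the slide says only that small files get their own file system with disk backup while large data files are backed up to tape.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the slide does not say where small files end and large files begin.
SMALL_FILE_THRESHOLD_BYTES = 1024 * 1024  # 1 MiB


@dataclass(frozen=True)
class Placement:
    file_system: str   # logical file system for the primary copy (hypothetical label)
    backup_media: str  # medium that holds the backup copy


def place_file(size_bytes: int) -> Placement:
    """Choose a logical file system and backup medium from a file's size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < SMALL_FILE_THRESHOLD_BYTES:
        # Small files are kept off the large-file system for performance
        # and are backed up on disk.
        return Placement("small_file_archive", "disk")
    # Large science files are backed up to the automated tape library.
    return Placement("science_archive", "tape")


if __name__ == "__main__":
    print(place_file(12_000))       # small file: disk backup
    print(place_file(850_000_000))  # large file: tape backup
```

Keeping the rule in one function makes it easy to adjust the threshold as media and performance characteristics change during a technology refresh.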

4 Data Stewardship
Ensure that the data and information content are reliable, of high quality, easily accessible, and usable for as long as they are considered to be of value.
Involves communication with data-producing projects throughout their life cycles.
Planning for long-term preservation is essential.

5 Preservation Implies…
Preserving the bits, readability, discoverability & accessibility, understandability, usability, and reproducibility of results.

6 Preserving Bits
Checksums are verified while transferring data between subsystems.
Regular media migration.
Raw (Level 0) data from satellites are held at a back-up archive physically distant from the DAACs, and product generation software is held at the DAACs and the Science Investigator-led Processing Systems (SIPSs); together these guard against catastrophic loss.
Raw data and higher-level products are also backed up at the DAACs, for efficiency.
Periodic assessment of the risk of data loss and its impact on users, given the back-up approach used at the DAACs for different datasets and the expected time for recovery if the primary copy is lost.

7 Risk Assessment Code Matrices
Study of data loss risk and user impact conducted in 2012; periodic updates are made by the DAACs.
Each DAAC assesses its holdings and enters counts that map to Data Loss Risk and User Impact.
Example of a RAC matrix:
[Figure: example RAC matrix of counts, with Data Loss Risk (5 to 1) on one axis and User Impact on the other]
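Tallying assessments into a RAC matrix is straightforward once each dataset has been assigned a pair of codes. The sketch below assumes codes run from 1 to 5 on both axes and orders rows from Data Loss Risk 5 down to 1, as in the slide's example; how a DAAC derives each code is not modeled, and `rac_matrix` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

CODES = range(1, 6)  # assumed 1-5 scale on both axes


def rac_matrix(assessments: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Count (data_loss_risk, user_impact) pairs into a 5x5 matrix.

    Rows run from Data Loss Risk 5 down to 1; columns run from User Impact 1 to 5.
    """
    counts = Counter()
    for risk, impact in assessments:
        if risk not in CODES or impact not in CODES:
            raise ValueError(f"codes must be 1-5, got {(risk, impact)}")
        counts[(risk, impact)] += 1
    return [[counts[(risk, impact)] for impact in CODES] for risk in reversed(CODES)]


if __name__ == "__main__":
    # Hypothetical assessments for four datasets
    for row in rac_matrix([(5, 1), (5, 1), (3, 4), (1, 5)]):
        print(row)
```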

8 DAAC Hardware: Technology Refresh
Refresh every 3-7 years with a “rolling wave” process, coordinated with capacity increases.
Refresh process:
1. Develop specifications based on projected data volumes, performance requirements, etc.
2. Procure hardware (quotes, purchase orders).
3. Take delivery (receive, tag, inventory).
4. Install hardware.*
5. Install and configure the system.*
6. Decommission and excess old hardware.
*Requires manufacturer or vendor training if refreshing with a new brand of hardware.

9 Discoverability, Accessibility, Readability
Standard metadata are critical for discoverability of data:
Processing software automatically generates metadata at the individual file level.
The metadata repository is continually populated.
Common Metadata Repository (CMR):
Unites collection-level and file-level metadata.
Provides a source of unified, high-quality, and reliable Earth science metadata across NASA’s Earth science data holdings.
Standard data formats, e.g., HDF, netCDF:
HDF is a self-documenting format.
Its flexible structure lets the data producer define a “profile”.
The HDF library facilitates writing and reading.
The library must be maintained for future users.
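As an illustration of file-level metadata generated at processing time, the sketch below builds a record that links a file back to its collection. The field names are hypothetical and deliberately simplified; they are not the CMR’s Unified Metadata Model schema, and the short name, version, and times in the usage example are invented.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_metadata(path: Path, collection_short_name: str, collection_version: str,
                  begin_time: str, end_time: str) -> dict:
    """Build an illustrative file-level metadata record.

    The collection short name and version tie the file to its
    collection-level record; all field names here are hypothetical.
    """
    data = path.read_bytes()
    return {
        "collection": {"short_name": collection_short_name,
                       "version": collection_version},
        "file_name": path.name,
        "temporal_extent": {"begin": begin_time, "end": end_time},
        "size_bytes": len(data),
        "checksum": {"algorithm": "SHA-256",
                     "value": hashlib.sha256(data).hexdigest()},
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        granule = Path(tmp, "EXAMPLE_L2_2017271.nc")
        granule.write_bytes(b"placeholder")
        record = file_metadata(granule, "EXAMPLE_L2", "001",
                               "2017-09-28T00:00:00Z", "2017-09-28T00:05:00Z")
        print(json.dumps(record, indent=2))
```

Because the record is emitted as JSON, it can be posted to a metadata repository as soon as the file is produced, which is what keeps the repository continually populated.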

10 Standards-based Packaging of Earth Observation Data
Dataset Interoperability Working Group (DIWG) Recommendations: EOSDIS community best practices
Climate and Forecast (CF) Conventions: recognizable coordinates
network Common Data Form (netCDF): Common Data Model, robust tool support
Hierarchical Data Format (HDF): self-contained metadata

11 Dataset Interoperability Working Group (DIWG) Recommendations
Detailed guidance on the use of CF and related conventions, including the rationale for each recommendation. Examples:
“We recommend that packing attributes (i.e., scale_factor and add_offset) be employed only when data are packed as integers.”
“We recommend that datasets in grid structures include a Time dimension, even if Time is degenerate (i.e., includes only one value) for the cases when the entire grid has one time range or time stamp.”
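The first DIWG recommendation refers to the CF packing convention, under which a reader recovers physical values as `unpacked = packed * scale_factor + add_offset`. The sketch below applies that formula in both directions using plain Python integers; the 16-bit target type and the example attribute values are assumptions, and fill values (`_FillValue`) are not handled.

```python
def pack(values, scale_factor, add_offset, bits=16):
    """Pack physical values into signed integers for storage.

    Inverts the CF relation unpacked = packed * scale_factor + add_offset,
    rounding to the nearest integer and checking the signed `bits`-bit range.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    packed = []
    for value in values:
        p = round((value - add_offset) / scale_factor)
        if not lo <= p <= hi:
            raise OverflowError(f"{value} does not fit in int{bits} with these attributes")
        packed.append(p)
    return packed


def unpack(packed, scale_factor, add_offset):
    """Recover physical values from packed integers (CF unpacking rule)."""
    return [p * scale_factor + add_offset for p in packed]


if __name__ == "__main__":
    temps_k = [273.15, 300.0]  # example temperatures in kelvin
    stored = pack(temps_k, scale_factor=0.01, add_offset=273.15)
    print(stored)                       # integer counts as stored on disk
    print(unpack(stored, 0.01, 273.15))  # originals, to within scale_factor / 2
```

Because the stored values are integers, the only loss is the rounding error of at most half of `scale_factor` per value, plus ordinary floating-point error on unpacking.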

12 HDF4 Maps for Long Term Preservation
Problem: Data formats that are accessible only via an Application Programming Interface (API) are vulnerable to eventual deprecation and de-support of the API.
Solution: Use the API to generate an external XML file that “maps” out the byte layout of the file; test the map by writing read programs that do not use the API.
[Figure: an XML map file describing the byte layout of an HDF data file]
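The idea behind the maps is that a reader needs only the offsets, lengths, data types, and byte order recorded in the XML to pull values out of the file with ordinary byte-level I/O. The sketch below uses Python’s standard `xml.etree.ElementTree` and `struct` modules; the element and attribute names in `MAP_XML` are hypothetical and far simpler than the actual HDF4 map schema, and the data file is simulated in memory.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical, simplified map; the real HDF4 map schema is much richer.
MAP_XML = """<map file="example.dat">
  <array name="temperature" dtype="float32" byteOrder="big" count="3">
    <byteStream offset="8" nBytes="12"/>
  </array>
</map>"""

# struct format codes for the (hypothetical) dtype names used in the map
_FORMATS = {"float32": "f", "int16": "h", "int32": "i"}


def read_array(map_xml: str, data: bytes, name: str) -> list:
    """Read a named array using only the byte layout described in the map."""
    root = ET.fromstring(map_xml)
    for arr in root.iter("array"):
        if arr.get("name") != name:
            continue
        stream = arr.find("byteStream")
        offset, nbytes = int(stream.get("offset")), int(stream.get("nBytes"))
        order = ">" if arr.get("byteOrder") == "big" else "<"
        fmt = f"{order}{arr.get('count')}{_FORMATS[arr.get('dtype')]}"
        # Cross-check the map: declared length must match dtype x count
        if struct.calcsize(fmt) != nbytes:
            raise ValueError("map byte count does not match dtype and count")
        return list(struct.unpack_from(fmt, data, offset))
    raise KeyError(name)


if __name__ == "__main__":
    # Simulated file: an 8-byte header followed by three big-endian float32 values
    data = b"HDRBYTES" + struct.pack(">3f", 1.5, 2.5, -4.0)
    print(read_array(MAP_XML, data, "temperature"))
```

In practice `data` would come from reading the archived file in binary mode; the point of the test described on the slide is that nothing here calls the HDF4 library.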

13 Understandability, Usability & Reproducibility
Understandability: Algorithm Theoretical Basis Documents (ATBDs); product information pages, guides, answers to frequently asked questions (FAQs), forums
Usability: information on fitness for purpose; accuracy assessments, validation, and data quality documentation
Reproducibility: source code and/or software specification documents; versions of datasets, or the means of regenerating them, that have resulted in peer-reviewed publications

14 Preservation Content Specification (PCS)
Covers 8 categories of content plus a checklist:
1. Preflight/Pre-Operations: instrument/sensor characteristics, including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets
2. Science Data Products: raw instrument data, Level 0 through Level 4 data products, and associated metadata
3. Science Data Product Documentation: structure and format, with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information
4. Mission Data Calibration: instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers

15 Preservation Content Categories (cont.)
5. Science Data Product Software: product generation software and software documentation
6. Science Data Product Algorithm Input: any ancillary data or other data sets used in generating or calibrating the data or derived product; ancillary data description and documentation
7. Science Data Product Validation: records, publications, and data sets
8. Science Data Software Tools: product access (reader) tools
Checklist: “metadata” about the above 8 categories, showing how and where items in each category are preserved
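One way to keep the checklist machine-readable is to record, for each preserved item, its PCS category and where it is preserved, and then report categories that have no entries yet. This is a hypothetical data structure, not a NASA tool; only the eight category names come from the PCS.

```python
from dataclasses import dataclass, field
from typing import List

# The eight PCS content categories, as listed on the slides
PCS_CATEGORIES = (
    "Preflight/Pre-Operations",
    "Science Data Products",
    "Science Data Product Documentation",
    "Mission Data Calibration",
    "Science Data Product Software",
    "Science Data Product Algorithm Input",
    "Science Data Product Validation",
    "Science Data Software Tools",
)


@dataclass
class ChecklistEntry:
    category: str
    item: str
    preserved_at: str  # where/how the item is preserved (free text, hypothetical)


@dataclass
class PcsChecklist:
    instrument: str
    entries: List[ChecklistEntry] = field(default_factory=list)

    def add(self, category: str, item: str, preserved_at: str) -> None:
        """Record one preserved item, rejecting unknown categories."""
        if category not in PCS_CATEGORIES:
            raise ValueError(f"unknown PCS category: {category}")
        self.entries.append(ChecklistEntry(category, item, preserved_at))

    def missing_categories(self) -> List[str]:
        """Categories with no preserved items recorded yet, in PCS order."""
        covered = {e.category for e in self.entries}
        return [c for c in PCS_CATEGORIES if c not in covered]


if __name__ == "__main__":
    checklist = PcsChecklist("HIRDLS")
    checklist.add("Science Data Products", "Level 2 products", "hypothetical DAAC archive")
    print(checklist.missing_categories())
```

A gap report like `missing_categories` is the kind of view that helps prioritize instruments at or near end of life.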

16 Sources of Content
[Figure: sources of preservation content (Instrument Teams/PIs; instrument developer/manufacturer; data gathering project, e.g., the flight project; Product Generation Support Teams (SIPSs); DAACs; Calibration Team; Mission Operations; Validation Team; ancillary data sources, e.g., NOAA) linked to the eight PCS categories (Preflight/Pre-Operations, Science Data Products, Science Data Product Documentation, Mission Data Calibration, Science Data Product Software, Science Data Product Algorithm Input, Science Data Product Validation, Science Data Software Tools), with Level 0 data and mission logs also shown]

17 Use of PCS in NASA to Date
DAACs work with Instrument Teams, giving higher priority to instruments at or near end of life, and use the PCS as a checklist:
UARS (Sept. 1991)
Earth Probe/TOMS (July 1996)
AMSR-E (EOS Aqua, May 2002)
ICESat-1 (Jan. 2003)
HIRDLS (EOS Aura, July 2004)
LIS (TRMM, Nov. 1997)
Artifacts called for in the PCS have been gathered for several of the above, organized by category, and archived; e.g., see

18 Standards
NASA would like to see a broad international standard identifying preservation content; NASA’s PCS is a good starting point, as are the CEOS Long Term Data Preservation documents.
ISO “Geographic Information - Preservation of digital data and metadata”: final edits completed in November 2016.
Ramapriyan participated in the ISO Project Team as a U.S. expert.
ISO has approved a New Work Item Proposal for developing a standard specific to Earth Observation data.

19 Back-Up

20 Backup Strategy vs Data Loss RAC
[Table: Data Loss Risk Assessment Code (2 to 5) by whether a backup exists (Yes/No) and where it is held (Offsite, Campus, Onsite)]

21 Restoration vs User Access RAC
[Table: User Access Risk Assessment Code (1 to 5) by time to restore user access from an alternate site and time to restore from backup at the primary site (bins: none, <2 days, <1 week, <2 weeks, <1 month, <3 months, >3 months)]

