NASA’s EOSDIS – Long Term Archive Infrastructure and Processes


NASA’s EOSDIS – Long Term Archive Infrastructure and Processes
Presented to the CEOS WGISS, September 28, 2017
Dawn R. Lowe, Hampapuram (Rama) Ramapriyan, Chris Lynnes
ESDIS Project, NASA/Goddard Space Flight Center

Topics NASA Earth Science Data Archives - overview Archive Infrastructure and processes NASA Data Stewardship Approach to Data Preservation & Access EOSDIS Content Preservation Specification

Typical EOSDIS Archive Hardware Architecture
[Diagram labels: Blade Server; Controller; PostgreSQL Server; Metadata Controller; Disk Storage Area Network; Tape Storage Area Network; Physical Storage Array; Backup Storage; Small File Backup; Logical Storage; Data Pool File Systems; Small File Archive; Browse Archive; Archive Cache; Automated Tape Library; science files; metadata; staging]
This shows a typical EOSDIS archive hardware architecture, currently hosted on premises. File systems are often separated according to the purpose and size of the files they hold. Small files are kept apart from large data files for performance reasons and are backed up on disk instead of tape. Automated tape libraries are used for backing up large data files. Disk systems are on one Storage Area Network (SAN), with tape libraries separated on the Tape SAN. A PostgreSQL database server maintains information about the data entities stored on the system. The metadata controllers maintain low-level (filesystem) information about the files stored in each SAN. Note that this shows only the hardware connectivity, not the software interfaces.

Data Stewardship
- Ensure that the data and information content are reliable, of high quality, easily accessible, and usable for as long as they are considered to be of value.
- Involves communications with data producing projects throughout their lifecycles
- Essential to plan for long term preservation

Preservation Implies…
[Diagram labels around PRESERVE: Bits; Discoverability & Accessibility; Readability; Understandability; Usability; Reproducibility of Results]

Preserving Bits
- Checksums while transferring between subsystems
- Regular media migration
- Raw (Level 0) data from satellites held at back-up archive physically distant from DAACs
- Product generation software held at the DAACs and Science Investigator-led Processing Systems
- Guards against catastrophic loss
- Raw data and higher level products are backed up at the DAACs as well, for efficiency
- Periodic assessment of risk of data loss and impact on users given the back-up approach being used at the DAACs for different datasets and the expected time for recovery in case of loss of the primary copy
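The checksum-on-transfer step can be sketched as follows. This is a minimal illustration, not the EOSDIS implementation: the slide does not name a checksum algorithm, so SHA-256 is an assumption, and the function names and file names (transfer_with_checksum, granule.dat, archive_copy.dat) are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checksum(source: Path, destination: Path) -> str:
    """Copy a file, then raise if the destination digest differs from the source."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch for {destination}")
    return actual


def demo_transfer() -> bool:
    """Move a small stand-in file between two hypothetical subsystem paths."""
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "granule.dat"        # hypothetical file name
        dst = Path(workdir) / "archive_copy.dat"   # hypothetical file name
        src.write_bytes(b"level-0 packet bytes" * 1000)
        return transfer_with_checksum(src, dst) == sha256_of(src)
```

Storing the digest returned by transfer_with_checksum alongside the file would also allow the same comparison to be repeated after a media migration.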

Risk Assessment Code Matrices
- Study of Data Loss Risk/User Impact conducted in 2012
- Periodic updates are made by DAACs
- Each DAAC assesses its holdings and enters counts that map to Data Loss Risk, User Impact
- Example of a RAC Matrix shown below (rows: Data Loss Risk; columns: User Impact, column positions not preserved)
  Data Loss Risk 5: 48, 28
  Data Loss Risk 4: 7
  Data Loss Risk 3: 463, 381
  Data Loss Risk 2: 300, 352, 1506, 2814
  Data Loss Risk 1: 719, 225, 193, 860
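Entering counts that map to Data Loss Risk and User Impact amounts to tallying pairs of codes into a 5x5 grid. The sketch below assumes both codes run from 1 to 5 and that rows are printed from risk 5 down to risk 1; the function name and the sample assessments are hypothetical and do not reproduce the slide's numbers.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_rac_matrix(assessments: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Tally (data_loss_risk, user_impact) pairs into a 5x5 matrix of counts.

    Row 0 holds Data Loss Risk 5 and row 4 holds Data Loss Risk 1.
    Columns run from User Impact 1 (left) to User Impact 5 (right).
    """
    counts: Counter = Counter()
    for risk, impact in assessments:
        if not (1 <= risk <= 5 and 1 <= impact <= 5):
            raise ValueError(f"codes must be 1-5, got ({risk}, {impact})")
        counts[(risk, impact)] += 1
    return [
        [counts[(risk, impact)] for impact in range(1, 6)]
        for risk in range(5, 0, -1)
    ]


# Hypothetical assessments for four datasets: (data_loss_risk, user_impact)
sample = [(5, 1), (5, 1), (2, 3), (1, 5)]
matrix = build_rac_matrix(sample)
for risk, row in zip(range(5, 0, -1), matrix):
    print(f"Data Loss Risk {risk}: {row}")
```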

DAAC Hardware: Technology Refresh
- Refresh every 3-7 years, with a “rolling wave” process, coordinated with capacity increases
- Refresh Process:
  - Develop specifications based on projected data volumes, performance requirements, etc.
  - Procure hardware (quotes, purchase orders)
  - Take delivery (receive, tag, inventory)
  - Install hardware*
  - Install and configure system*
  - Decommission and excess old hardware
*Requires manufacturer or vendor training if refreshing with a new brand of hardware

Discoverability, Accessibility, Readability
- Standard metadata are critical for discoverability of data
  - Processing software automatically generates metadata at individual file level
  - Metadata repository is constantly populated
- Common Metadata Repository (CMR)
  - Unites collection level and file level metadata
  - Provides a source of unified, high-quality and reliable Earth Science metadata across NASA’s Earth science data holdings
- Standard Data Formats, e.g. HDF, NetCDF
  - HDF is a self-documenting formatting system
  - Flexible structure for data producer to define “profile”
  - HDF library facilitates writing and reading
  - Need to maintain library for future users

Standards-based Packaging of Earth Observation Data
[Diagram labels: Dataset Interoperability Working Group Recommendations; EOSDIS Community Best Practices; Climate-Forecast Conventions (CF); Recognizable Coordinates; network Common Data Form (netCDF); Common Data Model; Robust Tool Support; Hierarchical Data Format (HDF); Self-contained Metadata]

Dataset Interoperability Working Group (DIWG) Recommendations
- Detailed guidance on use of CF and related conventions
- Includes rationale for each recommendation
- Examples:
  - “We recommend that packing attributes (i.e., scale_factor and add_offset) be employed only when data are packed as integers.”
  - “We recommend that datasets in grid structures include a Time dimension, even if Time is degenerate (i.e., includes only one value) for the cases when the entire grid has one time range or time stamp.”
https://earthdata.nasa.gov/standards/dataset-interoperability-recommendations-for-earth-science
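The first example recommendation concerns integer packing, which under the CF conventions is unpacked as packed * scale_factor + add_offset. The sketch below packs floating-point values into the 16-bit signed integer range using plain Python; the helper names and the sample brightness-temperature values are hypothetical, and the choice of int16 is an assumption for illustration.

```python
from typing import List, Sequence, Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def packing_parameters(values: Sequence[float]) -> Tuple[float, float]:
    """Choose scale_factor and add_offset so the data span the int16 range."""
    lo, hi = min(values), max(values)
    # Fall back to 1.0 for constant data, where hi == lo gives a zero scale.
    scale_factor = (hi - lo) / (INT16_MAX - INT16_MIN) or 1.0
    add_offset = lo - INT16_MIN * scale_factor
    return scale_factor, add_offset


def pack(values: Sequence[float], scale_factor: float, add_offset: float) -> List[int]:
    """Convert floats to integers: packed = round((value - add_offset) / scale_factor)."""
    return [
        max(INT16_MIN, min(INT16_MAX, round((v - add_offset) / scale_factor)))
        for v in values
    ]


def unpack(packed: Sequence[int], scale_factor: float, add_offset: float) -> List[float]:
    """Apply the CF rule: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Hypothetical sample values in kelvin
temperatures = [250.0, 273.15, 300.5, 320.0]
scale, offset = packing_parameters(temperatures)
stored = pack(temperatures, scale, offset)
restored = unpack(stored, scale, offset)
```

The packed values are integers, which is the case the recommendation says the two attributes should be reserved for; the round trip loses at most about half of scale_factor per value.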

HDF4 Maps for Long Term Preservation
- Problem: Data formats that are accessed only via an Application Program Interface (API) are vulnerable to eventual deprecation and de-support of the API.
- Solution: Use the API to generate an external XML file that “maps” out the byte layout of the file; test by writing read programs without using the API.
[Diagram labels: XML Map File; HDF Data File]
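The idea of reading a file from its byte-layout map, without the format's API, can be sketched with the Python standard library alone. The XML layout below is a hypothetical, heavily simplified stand-in and is not the actual HDF4 map schema; the stand-in binary file, the attribute names (offset, count, type, byteOrder) and the function names are all assumptions for illustration.

```python
import struct
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical map: one dataset of four big-endian int16 values at byte 16.
EXAMPLE_MAP = """
<map>
  <dataset name="temperature" offset="16" count="4" type="int16" byteOrder="big"/>
</map>
"""

# struct format characters for the element types this sketch understands
STRUCT_CODES = {"int16": "h", "int32": "i", "float32": "f"}


def read_with_map(data_path: Path, map_xml: str) -> Dict[str, Tuple]:
    """Read every dataset listed in the map using only offsets and types."""
    raw = data_path.read_bytes()
    result = {}
    for node in ET.fromstring(map_xml).iter("dataset"):
        prefix = ">" if node.get("byteOrder") == "big" else "<"
        fmt = f"{prefix}{node.get('count')}{STRUCT_CODES[node.get('type')]}"
        result[node.get("name")] = struct.unpack_from(fmt, raw, int(node.get("offset")))
    return result


def demo_map_read() -> Dict[str, Tuple]:
    """Write a stand-in binary file (16 header bytes plus data) and read it back."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "stand_in.bin"  # hypothetical file, not real HDF4
        path.write_bytes(b"\x00" * 16 + struct.pack(">4h", 271, 272, 273, 274))
        return read_with_map(path, EXAMPLE_MAP)
```

The reader depends only on the map and on byte-level I/O, which is the property the slide's test ("read programs without using the API") is meant to confirm.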

Understandability, Usability & Reproducibility
- Understandability
  - Algorithm Theoretical Basis Documents (ATBDs)
  - Product information pages, guides, answers to frequently asked questions (FAQs), forums
- Usability
  - Information on fitness for purpose
  - Accuracy assessments, validation and data quality documentation
- Reproducibility
  - Source code and/or software specification documents
  - Versions of datasets or the means of regenerating them when they result in peer-reviewed publications

Preservation Content Specification (PCS)
Covers 8 categories of content plus a checklist:
1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets
2. Science Data Products: Raw instrument data, Level 0 through Level 4 data products and associated metadata
3. Science Data Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information
4. Mission Data Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers

Preservation Content Categories (cont.)
5. Science Data Product Software: Product generation software and software documentation
6. Science Data Product Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation
7. Science Data Product Validation: Records, publications and data sets
8. Science Data Software Tools: Product access (reader) tools
Checklist: “metadata” about the above 8 categories showing how and where items in each category are preserved
https://earthdata.nasa.gov/standards/preservation-content-spec
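The checklist can be thought of as a small record, per category, of where the preserved items live. The sketch below uses the eight category names from the specification; the dictionary layout, the function name and the sample locations are hypothetical and are not taken from the PCS itself.

```python
from typing import Dict, List

PCS_CATEGORIES = [
    "Preflight/Pre-Operations",
    "Science Data Products",
    "Science Data Product Documentation",
    "Mission Data Calibration",
    "Science Data Product Software",
    "Science Data Product Algorithm Input",
    "Science Data Product Validation",
    "Science Data Software Tools",
]


def missing_categories(checklist: Dict[str, List[str]]) -> List[str]:
    """Return the PCS categories that have no recorded preservation location."""
    return [name for name in PCS_CATEGORIES if not checklist.get(name)]


# Hypothetical checklist for an instrument nearing end of life
checklist = {
    "Science Data Products": ["archive:/products/level0", "archive:/products/level2"],
    "Science Data Product Documentation": ["archive:/docs/atbd.pdf"],
    "Science Data Product Software": [],
}

for name in missing_categories(checklist):
    print(f"Not yet preserved: {name}")
```

A category with an empty list is reported as missing, so the output doubles as a to-do list when working through an instrument's artifacts.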

Sources of Content
[Diagram mapping content sources to the eight PCS categories]
- Sources: Instrument Teams / PI’s; Instrument Developer/Manufacturer; Data gathering project (e.g., flight project); Product Generation Support Teams (SIPSs); DAACs; Calibration Team; Mission Operations; Validation Team; Ancillary data sources (e.g., NOAA)
- Items shown: Level 0 Data; Mission logs
- Categories: Preflight/Pre-Operations; Science Data Products; Science Data Product Documentation; Mission Data Calibration; Science Data Product Software; Science Data Product Algorithm Input; Science Data Product Validation; Science Data Software Tools

Use of PCS in NASA to date
- DAACs work with Instrument Teams, with higher priority for instruments at or near end-of-life, using PCS as checklist:
  - UARS (Sept. 1991)
  - Earth Probe/TOMS (July 1996)
  - AMSR-E (EOS Aqua – May 2002)
  - ICESat-1 (Jan. 2003)
  - HIRDLS (EOS Aura – July 2004)
  - LIS (TRMM – Nov. 1997)
- Artifacts called for in PCS have been gathered for several of the above, organized by categories and archived (e.g., see http://disc.sci.gsfc.nasa.gov/Aura/additional/documentation/hirdls-preservation-documents)

Standards
- NASA would like to see a broad international standard identifying preservation content:
  - NASA’s PCS is a good starting point, as are CEOS Long Term Data Preservation documents
  - ISO 19165 - “Geographic Information - Preservation of digital data and metadata” – Final edits completed in November 2016
  - Ramapriyan participated on ISO 19165 Project Team as a U.S. expert
  - ISO has approved a New Work Item Proposal for developing 19165-2, a standard specific to Earth Observation data

Back-Up

Backup Strategy vs Data Loss RAC
Data Loss Risk Assessment Code by backup strategy:
- Backup: Yes, Offsite: 2
- Backup: Yes, Campus: 3
- Backup: Yes, Onsite: 4
- Backup: No: 5

Restoration vs User Access RAC
[Table: User Access Risk Assessment Code (1 to 5) as a function of “Time to Restore from Backup at your Primary site” and “Restore User Access Alternate Site”. Labels shown: none, <2 days, <1 week, <2 weeks, <1 month, <3 months, >3 months. Cell layout not recoverable.]