Evolving the Management and Dissemination of NASA Earth Observation Data in a Big Data World
Jeff Walter (1), Mark McInerney (2)
(1) NASA Langley Research Center; (2) NASA Goddard Space Flight Center
OGC Location Powers Workshop – Orlando, FL, September 20, 2016
Outline
– Brief Intro to NASA’s EOSDIS
– Interoperability and the White House Big Earth Data Initiative
– NASA Earth Science Data Systems Cloud Activities
Earth Observing System Data and Information System (EOSDIS)
A multi-petabyte archive of environmental data that supports Earth science research and applications
EOSDIS provides for
– Data ingest
– Data processing
– Data distribution
– Data access
– Metadata management
– Archive management
– Data stewardship
EOSDIS data collections are diverse
– Primary sources are NASA spacecraft
– Airborne, in-situ, ancillary, and socio-economic data
– Data from international partners
– Comprehensive approach to multi-discipline science
Figure: MODIS Aqua image showing Typhoon Hagupit approaching the Philippines, December 6, 2014.
Missions and Measurements
Interoperability
A complex and multi-faceted issue
– User workflows involve elements of data discovery, access, and use
– Interoperability challenges touch each one of these areas
NASA continues to make improvements in these areas
– Standardized data and metadata formats
– Digital Object Identifiers (DOIs)
– Common Metadata Repository (CMR)
  – Based on a modular Unified Metadata Model (UMM)
  – Can provide metadata in multiple standard formats including ISO and others
– Earthdata Search Client
– Global Imagery Browse Services (GIBS)
  – Centralized full resolution tiled image service
  – WMTS
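As a rough illustration of the CMR and GIBS bullets above, the sketch below builds a CMR collection search URL and a GIBS WMTS tile URL. It only constructs strings and makes no network calls. The endpoint roots, the `iso19115` format suffix, the layer name, and the `250m` tile matrix set are assumptions based on the public Earthdata services and are not taken from the slides; the helper names are hypothetical. Check the current service documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed public endpoint roots (not stated in the slides).
CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search"
GIBS_WMTS = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best"


def cmr_collection_query(keyword, fmt="json", page_size=10):
    """Build a CMR collection search URL.

    The file-style suffix (fmt) selects the metadata serialization,
    which is how one catalog can answer in several standard formats.
    """
    params = urlencode({"keyword": keyword, "page_size": page_size})
    return f"{CMR_SEARCH}/collections.{fmt}?{params}"


def gibs_tile_url(layer, date, matrix_set, zoom, row, col, ext="jpg"):
    """Build a RESTful WMTS tile URL.

    Path order: layer / style / time / tile matrix set / zoom / row / col.
    """
    return (
        f"{GIBS_WMTS}/{layer}/default/{date}/"
        f"{matrix_set}/{zoom}/{row}/{col}.{ext}"
    )


if __name__ == "__main__":
    # Same search, ISO 19115 serialization (assumed format name).
    print(cmr_collection_query("MODIS Aqua", fmt="iso19115"))
    # One tile of an assumed true-color layer for the Typhoon Hagupit date.
    print(gibs_tile_url(
        "MODIS_Aqua_CorrectedReflectance_TrueColor",
        "2014-12-06", "250m", 2, 1, 3,
    ))
```

Because every tile has a predictable address, a map client can fetch only the tiles covering its current view instead of downloading full-resolution granules.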
Interoperability
But challenges remain
– Standardized data and metadata formats
– Inherent mission, instrument, and measurement heterogeneity
– Discipline-oriented data centers that have evolved to serve specific communities
– Uneven instantiation of web services across the system as a whole
– Some inconsistencies or incompleteness in metadata and documentation
How we evolve to meet requirements for the future
Present NASA’s EOSDIS as an interoperable system of systems where users can select, view, interact with, and access the data they need transparently from all subsystems in support of interdisciplinary Earth Science research.
Supplement current data system capabilities with new interoperable technologies to create a foundation for future evolution.
– Support technology infusion of tools developed by internal programs and by industry
– Adopt a common framework for Earth Science, e.g. the Big Earth Data Initiative (BEDI)
Continuously improve infrastructure, data access, and processes
Diagram labels: Today, Tomorrow, Working Towards
Big Earth Data Initiative
The White House Office of Science and Technology Policy (OSTP) is also focused on earth observation interoperability
– U.S. federal government is the largest holder of civil earth observation data in the world
– Data is still sometimes difficult for the non-expert user to discover, access, and use
OSTP proposed the Big Earth Data Initiative (BEDI) focused on improving
– Interoperability of earth observation data and systems between U.S. federal agencies
– Discoverability, accessibility, and usability of earth observation data
– Earth observation data management practices
The U.S. Group on Earth Observations (USGEO) is tasked with interagency coordination and oversight of BEDI
– Primarily the USGEO Data Management Working Group (DMWG)
– Developed the Common Framework for Earth Observation Data
NASA, NOAA, and USGS received funding to implement the BEDI objectives
NASA’s BEDI Strategy
– Focus on the “enabling” pieces rather than end-user applications or creating new data products.
– Drive toward open, community-driven standards for data formats, interfaces, and protocols as the key to interoperability, both within EOSDIS and with other U.S. Federal agencies.
– To the maximum practicable extent, design and execute BEDI-related work activities so that the output and products of those activities are beneficial and useful to other U.S. Federal agencies as well as to NASA.
– Leverage current plans and priorities to accelerate things we were already doing, planning to do, or wanted to do.
NASA’s BEDI Implementation
Catalog and Data Discovery Improvements
– Metadata Guidance and Recommendations
– Digital Object Identifier Best Practices and Recommendations
– Data Discovery via Commercial Search Engines
Web Services for Direct Data Access
– OPeNDAP (Hyrax) performance and functionality improvements
– GIBS
– OGC Testbed Support
– Special Projects
  – GDAL enhancements
  – AppEARS
Implementation of Metadata and Data Services Improvements at the DAACs
– Metadata current and consistent
– DOIs assigned and registered
– Data available via OPeNDAP and/or some other standards-based API or web service (where applicable)
– Imagery available in GIBS (where applicable)
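To make the "direct data access" idea concrete, here is a minimal sketch of how an OPeNDAP (DAP2) request can ask a server for a subset of one variable rather than a whole file. It only builds the request URL. The server address, file name, and variable name are hypothetical placeholders, and the helper names are not part of any library; the `[start:stride:stop]` hyperslab syntax and the `.ascii` response suffix follow the DAP2 convention as commonly served by Hyrax.

```python
def hyperslab(start, stop, stride=1):
    """Format one array dimension as a DAP2 hyperslab: [start:stride:stop].

    Indices are zero-based and the stop index is inclusive.
    """
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, slabs, response="ascii"):
    """Append a response suffix and a constraint expression to a dataset URL.

    slabs is a list of (start, stop) or (start, stop, stride) tuples,
    one per dimension of the variable, in dimension order.
    """
    constraint = variable + "".join(hyperslab(*s) for s in slabs)
    return f"{dataset_url}.{response}?{constraint}"


if __name__ == "__main__":
    # Hypothetical dataset and variable: first time step, every second
    # latitude row between indices 100 and 199, all 360 longitude columns.
    print(subset_url(
        "https://example.invalid/opendap/sst.nc",
        "sst",
        [(0, 0), (100, 199, 2), (0, 359)],
    ))
```

The server evaluates the constraint and returns only the requested slab, which is what lets a client avoid the search/order/download cycle for a full granule.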
This sounds great, but…
Can we do more to…
– Change the search/order/download paradigm?
– Make it so users can more easily pull only their map/graph/analytical results as opposed to the data itself?
– Create a stack of incrementally more abstract services on the data (based on common interoperable standards) to facilitate both rapid application development and various “points of entry” that are appropriate to a data user’s expertise and needs?
Can Cloud Technology Help?
Potential cost improvements
– How do we realize these goals in a flat funding environment?
Potential functional improvements
– Centralization
  – Reduced hardware/facility footprint and associated overhead
– Elasticity of resources
– Containerization/virtualization to facilitate deployment
– Facilitate large scale analytics and interdisciplinary earth science
NASA’s Challenges with Cloud
– How do we rethink our business and operating model to take a more “cloud native” approach?
– How do we architect the enterprise to avoid the risk of vendor lock-in?
– How do we go about evaluating whether there are any functional and/or cost benefits to migrating all or some of EOSDIS functions/data to a cloud environment?
– If so, how do we evolve the architecture while not disrupting service to our current users?
Cloud Prototypes
Diagram: prototype categories are Archive Mgmt, Analytics Support, and App Hosting
Key: Better Inter-usability, Better Science, Better RMA
Archive & Distribution
1. ASF Web Object Storage Edge Server (ASF WOS)
2. GIBS Ingest in the Cloud (GITC)
3. Cumulus Ingest, Archive and Management
4. NISAR Preparatory Prototype (NPP)
5. OPeNDAP / HDF in the Cloud
Analytics Prototypes
Cloud Analysis Toolkit to Enable Earth Science
– Python / Jupyter software to help users learn to use data and cloud computing
NEXUS
– Spark-based system to analyze EO data
Application Hosting Prototypes
1. Earthdata Search Client
2. Common Metadata Repository
3. Earthdata Code Collaborative
4. NASA-compliant General Application Platform
EOSDIS Cloud Evolution (ExCEL) Project
ExCEL Project Management Plan: Goals
Primary Project Goals
“Evaluate” commercial cloud native technologies for core EOSDIS capabilities centered on
– Data ingest, processing, archive, management, distribution, performance
– IT Security
– Cost and business
For long-term consideration of EOSDIS in the cloud
ExCEL Project Management Plan: Success
(01) Full Scale Deployment (?): Full scale enterprise deployment of EOSDIS services and infrastructure to the cloud
(02) Partial Deployment (?): Select deployment of EOSDIS services and/or infrastructure to the cloud
(03) Cloud Stand-down (?): No EOSDIS services or infrastructure operationally migrated to the cloud
(04) Decision Point (?): More prototyping required, or cloud hybrid, or other next steps based on ExCEL prototyping and business analysis results
Determining Project Success
Project success is determined by viable outcomes of fully completed project prototypes and business analysis.
- or -
Technical and business results of the ExCEL project needed for strategic decision on EOSDIS and the cloud.
Summary
NASA is working on multiple fronts to evolve EOSDIS to improve interoperability and services
Supporting the White House OSTP Big Earth Data Initiative (BEDI) to realize a greater return on investment of the US federal government’s Earth observation capability and data portfolio
Moving aggressively to evaluate the potential of cloud technology
– Improve performance and functionality in all areas of the system
– Reduce costs
– Facilitate large-scale data analysis
– Enable the instantiation of new and varied services