Open Data Cubes Cloud Services Experiences and Lessons Learned

Future Data Architectures Big Data Workshop, April 26, 2018
Brian Killough
CEOS Systems Engineering Office
NASA Langley Research Center

Amazon Web Services (AWS)
Since 2016, the SEO has used annual AWS research credits (~$25,000 per year) to support global Data Cube prototypes and to investigate improved approaches for cloud-based analyses. The SEO uses a combination of S3 storage and EC2 processing instances to support our online Data Cube user interface tool (tinyurl.com/datacubeui), prepare datasets (ingestion), and run application analyses. The performance has been excellent.
The Data Cube initiative is interacting with over 40 countries. About 10 of those countries want to use cloud-based storage and computing to reduce operational costs and improve analysis performance.
Lessons learned:
- Storage is cheap (~$270/TB/year).
- Moving data, especially egress (transfer out of the cloud), can be costly for large volumes and frequent moves.
- Processing and analysis is the primary cost.
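To make the storage-versus-egress lesson concrete, here is a rough cost sketch. The ~$270/TB/year storage figure comes from the slide, and the 13 and 23 TB sizes are the Africa Regional Data Cube S3 range quoted in this presentation. The egress rate of $0.09/GB is a hypothetical placeholder, not a quoted AWS price; check current AWS pricing before relying on any number it produces.

```python
# Rough annual cost sketch for a cloud-hosted data cube.
# STORAGE_PER_TB_YEAR is taken from the slide (~$270/TB/year).
# EGRESS_PER_GB is a HYPOTHETICAL placeholder rate, not a quoted AWS price.

STORAGE_PER_TB_YEAR = 270.0  # USD per TB per year, from the slide
EGRESS_PER_GB = 0.09         # USD per GB transferred out (assumption)
GB_PER_TB = 1000             # decimal units, for simplicity


def annual_storage_cost(tb: float) -> float:
    """Cost of keeping `tb` terabytes in object storage for one year."""
    return tb * STORAGE_PER_TB_YEAR


def egress_cost(tb_moved: float) -> float:
    """Cost of transferring `tb_moved` terabytes out of the cloud once."""
    return tb_moved * GB_PER_TB * EGRESS_PER_GB


if __name__ == "__main__":
    for tb in (13, 23):  # Africa Regional Data Cube S3 range from the slides
        store = annual_storage_cost(tb)
        out = egress_cost(tb)
        print(f"{tb} TB: storage ${store:,.0f}/yr, one full download ${out:,.0f}")
```

Under these assumptions, a single full download of the archive already costs a sizable fraction of a year of storage, and every repeated move adds the egress term again, which is why keeping analysis next to the data pays off.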

Africa Regional Data Cube
Ghana, Kenya, Senegal, Sierra Leone, Tanzania

Africa Regional Data Cube Operational Model
Strathmore University (Nairobi, Kenya) operates the system in the Amazon Web Services (AWS) cloud:
- S3 data storage (13 to 23 TB)
- EC2 computing: Data Cube management, new data ingestion, and the web-based user interface, managed by Strathmore
- Datasets: Landsat 7/8, Sentinel-1, Sentinel-2 (year 2)
- Countries: Kenya, Sierra Leone, Senegal, Ghana, Tanzania
Each country has its own EC2 computing "instance" for analysis purposes (user interface and Jupyter Notebooks), but S3 data storage is shared among the countries in the AWS cloud.
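The shared-storage, per-country-compute layout can be sketched as a small data model. This is a minimal illustration, not the project's actual configuration: the class names, the bucket name, and the even split of the storage bill across countries are all hypothetical. Only the five countries, the 23 TB upper bound, and the ~$270/TB/year rate come from the slides.

```python
from dataclasses import dataclass

# Hypothetical model of the Africa Regional Data Cube layout:
# one shared S3 store, one EC2 analysis instance per country.
# Names and the even cost split are illustrative assumptions.


@dataclass(frozen=True)
class SharedStore:
    bucket: str      # hypothetical bucket name
    size_tb: float   # total data volume held in the shared store


@dataclass
class CountryInstance:
    country: str
    store: SharedStore
    services: tuple = ("user interface", "Jupyter Notebooks")


def build_deployment(countries, store):
    """Create one analysis instance per country, all reading the same store."""
    return {c: CountryInstance(country=c, store=store) for c in countries}


def storage_share(deployment, cost_per_tb_year=270.0):
    """Split the shared storage bill evenly across countries (an assumption)."""
    stores = {inst.store for inst in deployment.values()}
    if len(stores) != 1:
        raise ValueError("expected every instance to use a single shared store")
    (store,) = stores
    return store.size_tb * cost_per_tb_year / len(deployment)


if __name__ == "__main__":
    store = SharedStore(bucket="example-shared-bucket", size_tb=23)
    dep = build_deployment(
        ["Ghana", "Kenya", "Senegal", "Sierra Leone", "Tanzania"], store
    )
    print(f"per-country storage share: ${storage_share(dep):,.0f}/yr")
```

The design point the model captures is that compute scales with the number of countries while the archive is stored once, so adding a country adds an instance but not another copy of the data.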

AWS Plans for 2018
- Develop a "data cube on demand" function using hosted AWS datasets.
- Test the use of "spot" instances for as-needed processing and analysis.
- Test the use of Lambda functions to find new datasets to ingest into data cubes and to run rapid queries.
- Test how EC2 instance performance scales with multiple data cube users.
- Test elastic load balancing for horizontal scaling of EC2 instances.
- Test Amazon WorkSpaces to host QGIS and Jupyter Notebooks.
- Test the use of Docker containers for on-demand computing instances.
- Explore the use of QGIS to read data cube content directly from S3.
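One way the Lambda idea could work is to subscribe a function to S3 event notifications and filter new objects for ingestible scenes. The sketch below assumes that setup; it reads the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) and uses the standard Python Lambda handler signature. The `_MTL.txt` suffix used to recognize a complete scene is a hypothetical placeholder, since real archive layouts differ.

```python
from urllib.parse import unquote_plus

# Hypothetical marker for a complete, ingestible scene (placeholder only;
# real Landsat/Sentinel bucket layouts use their own conventions).
INGEST_SUFFIXES = ("_MTL.txt",)


def find_new_scenes(event):
    """Return (bucket, key) pairs from an S3 event that look ingestible."""
    scenes = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key.endswith(INGEST_SUFFIXES):
            scenes.append((bucket, key))
    return scenes


def lambda_handler(event, context):
    """Lambda entry point: report candidate scenes for data cube ingestion."""
    scenes = find_new_scenes(event)
    # A real pipeline would queue these for an ingestion worker;
    # this sketch only returns their S3 URIs.
    return {"new_scenes": [f"s3://{b}/{k}" for b, k in scenes]}
```

Keeping the filter in a plain function separate from the handler makes it easy to test locally with a synthetic event before wiring it to a bucket.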

Google Earth Engine (GEE)
The Google Earth Engine system is similar to a Data Cube and allows users to interact with satellite data without downloading it. The Data Cube team has been working with the GEE team to investigate approaches for using their satellite datasets and taking advantage of the large community of user algorithms. Google sees the Open Data Cube as the “ground-based” implementation of its cloud-based solution, since there is no good method to supply GEE data to users who want local deployments.
One example: we have developed a process to generate a Sentinel-1 data cube “on demand” for any location in the world using the GEE level-2 intensity data (gamma nought). GEE is the only known source of level-2 global pre-processed S1 data. Data Cubes can be created at country scale within 1 day. This has been demonstrated for Colombia, Switzerland, and Samoa.
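When building a Sentinel-1 cube from exported intensity data, one detail worth handling is the unit of the values. Assuming the exported backscatter is log-scaled in decibels (as GEE's Sentinel-1 GRD collection provides it), temporal composites should be averaged in linear power and converted back, because averaging dB values directly biases the result low. The helper names below are hypothetical; this is a sketch of the conversion, not the SEO's pipeline.

```python
import math

# Assumes backscatter values are in decibels: dB = 10 * log10(linear power).


def db_to_linear(db: float) -> float:
    """Convert a backscatter value from dB to linear power."""
    return 10.0 ** (db / 10.0)


def linear_to_db(x: float) -> float:
    """Convert linear power back to dB."""
    return 10.0 * math.log10(x)


def temporal_mean_db(values_db):
    """Average a pixel's time series in linear power and return the mean in dB.

    A plain mean of dB values is never higher than this result, so compositing
    in dB would systematically darken the composite.
    """
    linear = [db_to_linear(v) for v in values_db]
    return linear_to_db(sum(linear) / len(linear))


if __name__ == "__main__":
    series = [-10.0, -20.0]
    print(f"naive dB mean: {sum(series) / len(series):.2f} dB")
    print(f"linear mean:   {temporal_mean_db(series):.2f} dB")
```

For the two-value series above, the naive mean is -15 dB while the linear-power mean is about -12.6 dB, which shows how large the bias can be for strongly varying pixels.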

Why not use GEE instead of the Open Data Cube?
PROS
- Free, open, global datasets (e.g. Landsat, MODIS, Sentinel-1, Sentinel-2)
- Powerful analysis tools using JavaScript and Python
CONS
- Commercial dependency: will it be sustained in the future?
- Limited temporal and spatial scales for analyses
- Cloud-based computing only, with no options for local computing solutions. Users go to the data for analyses, and downloads are not encouraged.
- Google “owns” all of the data. Some users prefer control and ownership of source data and results.
- Missing datasets: you get what Google offers. Datasets such as ALOS, CBERS, and SPOT are not included.
- Difficulty using multiple interoperable datasets (space or ground), as they may not be available or may not be prepared in common grid formats.
- Does not allow commercial or private use

Conclusions and thoughts
- There is a significant global trend toward the use of cloud-based storage and processing.
- Because of large data volumes, we need to find ways to avoid downloads and perform analyses near the data.
- Cloud-based solutions can be significantly cheaper and perform better than local computing systems (but not always).
- We are currently using AWS and GEE for our cloud service testing, but we are open to other options, such as DIAS.
- The Open Data Cube can be deployed on any cloud platform. We DO NOT want to be “locked in” to specific cloud system vendors.
- We need to find solutions that produce ARD in the cloud and produce data cubes in the cloud. Egress of large data volumes is costly and inefficient.
- We are all comfortable with cloud computing, but much of the world still wants the data locally.