Data Flow & Data Services for MOSAiC Challenges from a large scale program Arndt Steinhage MOSAiC Implementation Workshop 13-16/11/2017 Stephan Frickenhaus, Hans Pfeiffenberger AWI Computing and Data Centre
Vision for MOSAiC Data Management Interdisciplinary Data Collection, Collective Data publications Data Policy Co-ordinated Data Flows Data Protocol, Data Management Plan Underway Data Science support/ Service, on-shore Data Scientist Taking care of the MOSAiC Data Collection
Challenges from the data point of view: harmonize Science and Infrastructure/Logistics/Services in interdisciplinary work. Need for a common data protocol. Need a commitment from the groups to help organize this.
The Data Protocol. MOSAiC [2019/20] is an observatory producing meaningful scientific data. Successful data management needs scientists' 100% commitment. The MOSAiC Data Legacy [2021] is the primary and lasting output, and the basis for science. Prerequisites for high impact publications: common meta-data and procedure standards; quality managed and published data; consortium agreements on publication strategy. Open by Default; group-specific embargo timing tbd.
Chapters of the Data Protocol: Objective; Definitions (raw data, primary data, data products, …); Data Policy, incl. fair publication rules; Data Standards; Data Management roles and responsibilities; common data pool; Sample Management; Data Archival; Amendments and resolution of conflicts.
Data Protocol Group AWI: Roland Neuber (Atmos), Thomas Krumpen (Ice/Snow), Allison Fong (Eco, Bio-Sampling), Ellen Damm (BGC), Ben Rabe (Ocean), Andreas Herber (airborne), Amelie Driemel (PANGAEA), Peter Gerchow, Angela Schäfer, Hans Pfeiffenberger, Ingo Schewe (Logistics, Data Flow/Data Services/Data Science). Further members tbd during break out! Gunnar Spreen (remote sensing) …
Expected Contributions from Groups. Initial work: one person per group for the Data Protocol, available for regular telcos. Device lists (until next spring meeting): responsible person; sensor type; transport through POLARSTERN Satcom | access through home institution, commercial processor, …; data volumes, data frequencies, near real time needs, data processing tools, shore-to-ship needs? Embargo periods … (input from breakouts)
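A device list entry of the kind requested above could be captured in a small record type. The following is only a sketch: the class name, field names, units and route labels are assumptions made for illustration and are not defined by the Data Protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceEntry:
    """One row of a group's device list (all field names are hypothetical)."""
    group: str
    responsible_person: str
    sensor_type: str
    transfer_route: str            # assumed labels, e.g. "polarstern_satcom" or "home_institution"
    volume_mb_per_day: float       # expected data volume
    samples_per_day: int           # data frequency
    near_real_time: bool           # does the group need near real time access?
    embargo_months: Optional[int] = None  # None while the embargo period is still tbd


def near_real_time_volume(entries: List[DeviceEntry]) -> float:
    """Sum the daily volume of all devices that request near real time delivery."""
    return sum(e.volume_mb_per_day for e in entries if e.near_real_time)


# Placeholder entries; the values are invented for the example.
entries = [
    DeviceEntry("Group A", "N.N.", "CTD", "polarstern_satcom", 40.0, 24, True),
    DeviceEntry("Group B", "N.N.", "camera", "home_institution", 5000.0, 1, False,
                embargo_months=12),
]
print(near_real_time_volume(entries))
```

Collecting the lists in one structured form would let the data volumes and near real time needs be added up across groups before the expedition.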
Data science services. Managed Data Base on board for core variables as multivariate time-series. Multivariate Visualization as an App: show full time series (PCA, MDS, …); allow for clustering; enable focussing on sub-sets/events; export selected data; allow for comparing events.
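To illustrate what a PCA view of a multivariate time series involves, here is a minimal sketch using only the Python standard library. It computes the first principal component by power iteration on the covariance matrix. The function names and the sample values are assumptions for illustration; the slide does not say how the app would compute PCA, and a real service would more likely use a numerical library.

```python
import math
import random


def center(rows):
    """Subtract each variable's mean so that every column has zero mean."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def first_component(cov, iterations=200, seed=0):
    """Power iteration: approximates the dominant eigenvector of cov."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all variables constant: no direction to find
            break
        v = [x / norm for x in w]
    return v


def scores(centered, component):
    """Project every time step onto the component (one value per time step)."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Invented sample: three variables over ten time steps.
rows = [[float(i), 2.0 * i, 5.0] for i in range(10)]
centered = center(rows)
pc1 = first_component(covariance(centered))
timeline = scores(centered, pc1)
```

The resulting `timeline` reduces the three variables to one value per time step, which is the kind of condensed series a viewer could plot, cluster, or use to select events.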
Offer to publish MOSAiC data through an ESSD Special Issue: Chief Eds offer support in structuring the issue; at least one guest editor from MOSAiC, at least one guest editor from outside.
O2A – Observations to Archive Data Flow Framework Arndt Steinhage
Current Use Case: FRAM (Frontiers in Arctic Marine Monitoring). Ice tethered platforms. Since 2014, 25 M Euro. Ice tethered: distributed buoys and networks. Water column: moorings, winched profiling, sampler, autonomous underwater vehicle (AUV). Deeper water column: particle cameras, light frame on-sight key species investigations (LOKI), acoustic recorders. Ocean floor: ocean floor observing system (OFOS), benthic lander system, autonomous crawler (Tramper). Measured: radiation, snow height, depth, ice thickness, temperature, salinity, oxygen, chlorophyll a, … Medieningenieure Bremen / Sabine Lüdeling
Current Use Case: FRAM. Water column. Ice tethered: distributed buoys and networks. Water column: moorings, winched profiling, sampler, autonomous underwater vehicle (AUV). Deeper water column: particle cameras, light frame on-sight key species investigations (LOKI), acoustic recorders. Ocean floor: ocean floor observing system (OFOS), benthic lander system, autonomous crawler (Tramper). Measured: fluorescence, nutrients, salinity, temperature, conductivity, depth, acoustic Doppler current profiler, water and phytoplankton samples, … Medieningenieure Bremen / Sabine Lüdeling
Next Use Case: MOSAiC? Multidisciplinary drifting Observatory for the Study of Arctic Climate 2019 - 2020 > 60 M Euro
Data Flow Framework
Objectives: generic infrastructure for data flows; sustainability and up-to-date services; interoperability and standards, e.g. Open Geospatial Consortium; seamless integration with existing infrastructure (Web GIS, Web Portals, Data Archive).
Challenges: heterogeneity of scientific needs and workflows; number of different instruments, data sources and formats; integration with existing solutions, e.g. for the data flow, but also administrative information; only limited additional effort is acceptable; multitude of standards.
Sensor Description: platform and device descriptions for provenance information and reduced data integration effort; versioning and citability; interoperability and standards; ~1200 descriptions available and counting.
Dashboard: user-customizable, flexible dashboards for data monitoring; automatic data streaming of near-real time and delayed-mode data; based on sensor descriptions and configurations. Since 2011: fast growing number of values and sensors, 350 M measurements, 460 sensors.
Maps and Portals
Data Flow Framework
Current work for FRAM: developing a science community workspace for data sharing and data analyses; state-of-the-art storage, replicated between Bremerhaven and Potsdam; user-friendly “one-click” compute solutions with virtual machines and containers; Hadoop big data analysis based on Hortonworks data flow and data platform; raster data management and analysis with rasdaman.
Use Case: MOSAiC Multidisciplinary drifting Observatory for the Study of Arctic Climate 2019 - 2020 > 60 M Euro
O2A: MOSAiC (diagram). Labels: Polarstern Satellite Link for Data Monitoring and Remote Service; only 2 x 100MB/day?; Polarstern Data Storage; MOSAiC Raw Data; Ship-to-shore Data Transfer; Onboard Data Transfer; “direct” satellite links to partner sites.
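The tentative "2 x 100MB/day" figure implies that ship-to-shore requests have to be prioritized. The sketch below assumes the figure means a total of 200 MB per day (the slide marks it with a question mark) and uses an invented greedy rule and invented request sizes; it is not the O2A transfer logic.

```python
# Assumption: "2 x 100MB/day" is read as a combined daily budget of 200 MB.
DAILY_BUDGET_MB = 2 * 100


def plan_transfer(requests, budget_mb=DAILY_BUDGET_MB):
    """Greedy plan: serve requests in priority order while they still fit.

    requests: list of (name, size_mb, priority); a lower priority number is served first.
    Returns (selected names, deferred names, remaining budget in MB).
    """
    selected, deferred = [], []
    remaining = budget_mb
    for name, size_mb, _priority in sorted(requests, key=lambda r: r[2]):
        if size_mb <= remaining:
            selected.append(name)
            remaining -= size_mb
        else:
            deferred.append(name)  # stays on Polarstern storage for later transfer
    return selected, deferred, remaining


# Invented example requests (name, size in MB, priority).
requests = [
    ("meteorology", 20, 1),
    ("ctd_profiles", 50, 2),
    ("camera_images", 500, 3),
    ("buoy_status", 30, 4),
]
selected, deferred, remaining = plan_transfer(requests)
```

With these invented numbers the image data would not fit and would stay in onboard storage, which is why the device lists ask groups for data volumes and near real time needs in advance.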
O2A: MOSAiC (diagram). Labels: ftp from 3rd party's sites; raw data; Aircraft data; ftp from partner's sites; ftp from EO (sat.) providers; Polarstern Data; MOSAiC Comprehensive onshore Data Collection; Primary data.
With contributions from: Roland Koppe, Peter Gerchow, Angela Schäfer, Ana Macario