Catalog-driven workflows using CSW Rich Signell, USGS, Woods Hole, MA, USA Filipe Fernandes, SECOORA, Brazil Kyle Wilcox, Axiom Data Science, Wickford, RI, USA ESIP Winter Meeting, Washington, DC
The 4th Network Layer: Data “We need an end-to-end, layer-by-layer, designed information technology … that are composed of no more than a stack of protocols” “We need open standards… and above all, we need to teach scientists to work in this new layer of data” From the essay “I have seen the Paradigm Shift, and It Is Us”, by John Wilbanks, in the book “The Fourth Paradigm”. Network layers: Data, Web, TCP/IP, Ethernet
US Integrated Ocean Observing System (IOOS®) Global Component Coastal Component 17 Federal Agencies 11 Regional Associations
IOOS Core Principles: Adopt open standards & practices. Avoid customer-specific stovepipes. Standardized access services implemented at data providers. Diagram: Customer, Web access service, Data Provider (Observations, Models)
Numerical model Output
Time Series, Trajectories Meteorology and Wave Buoy in the Gulf of Maine. Image courtesy of NOAA. Ocean Glider. Photo by Dave Fratantoni, Woods Hole Oceanographic Institution
IOOS Data Infrastructure Diagram. Nonstandard model output data files: ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM. Observed data (buoy, gauge, ADCP, glider): nonstandard data files. NcML: standardized (CF-1.6, SGRID-0.1, UGRID-0.9) virtual datasets. Common Data Model: Grid (Rectilinear, Sgrid, Ugrid), TimeSeries, Profile, Trajectory, TimeSeriesProfile. Servers: THREDDS Data Server, ERDDAP. Web services: OPeNDAP, NetCDF Subset, WMS, WCS, SOS, ncISO. Catalog services: Geoportal Server, GeoNetwork, CKAN, pycsw. Library or broker: NetCDF-Java, NetCDF4-Python. Clients: Matlab, Panoply, IDV, ArcGIS, Python, EDC, web portals.
Catalog Search
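A catalog search against a CSW endpoint comes down to posting a GetRecords request. The sketch below builds such a request body for CSW 2.0.2 using only the Python standard library, with a full-text `PropertyIsLike` filter. The function name `build_getrecords` and the search term are illustrative assumptions, not taken from the slides; a client library such as OWSLib would normally build and send this request for you.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OGC CSW 2.0.2 and Filter Encoding 1.1 standards.
CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)


def build_getrecords(text, max_records=10):
    """Return a GetRecords request (XML string) matching `text` anywhere in a record.

    Hypothetical helper for illustration only.
    """
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "maxRecords": str(max_records),
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "full"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    like = ET.SubElement(
        flt,
        f"{{{OGC}}}PropertyIsLike",
        {"wildCard": "*", "singleChar": "?", "escapeChar": "\\"},
    )
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "csw:AnyText"
    # Wrap the term in wildcards so it matches as a substring.
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"*{text}*"
    return ET.tostring(root, encoding="unicode")


# Illustrative search term (a CF standard name); swap in whatever you need.
request_xml = build_getrecords("sea_water_temperature")
```

The resulting string would be sent as the body of an HTTP POST (content type `application/xml`) to the catalog's CSW endpoint. Spatial and temporal constraints can be added as further filter elements combined under `ogc:And`.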
Interoperable Access in Python (Iris)
IOOS System Test
2015 Boston Light Swim: Aug 15, 2015, 7:00 am start. 8-mile swim. No wetsuit. How cold will the water be?
NECOFS Massbay Forecast
Reproducible Jupyter Notebook: go to https://github.com/ocefpaf/boston_light_swim and click on “launch binder” to run in the cloud
Final Result
pycsw
Workflow for the USGS CMG Portal
Workflow (3/3) Axiom Data Science –Runs a CSW search (in a cron job) on the modeling groups' pycsw services, filtering on datasets that contain a project called “CMG_Portal” –Datasets that have valid WMS services are added to the portal See for details of the workflow
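The selection step in this workflow can be sketched as a simple filter over catalog records. This is a minimal illustration, not Axiom's actual code: the record layout (dicts with `title`, `projects`, and a `references` list of `scheme`/`url` pairs), the sample records, and the `example.org` URLs are all made up for the example. It also checks only that a WMS endpoint is advertised; a real validity check would additionally request the service's GetCapabilities document.

```python
def select_portal_datasets(records, project="CMG_Portal"):
    """Return (title, wms_url) pairs for records in `project` that advertise WMS.

    Hypothetical helper; the record structure is an assumption for illustration.
    """
    selected = []
    for rec in records:
        # Skip datasets that are not tagged with the portal's project name.
        if project not in rec.get("projects", []):
            continue
        # Collect any service links whose scheme mentions WMS.
        wms_urls = [
            ref["url"]
            for ref in rec.get("references", [])
            if "wms" in ref.get("scheme", "").lower() and ref.get("url")
        ]
        if wms_urls:
            selected.append((rec["title"], wms_urls[0]))
    return selected


# Made-up sample records standing in for parsed CSW search results.
records = [
    {
        "title": "Model run A",
        "projects": ["CMG_Portal"],
        "references": [
            {"scheme": "OPeNDAP:OPeNDAP", "url": "https://example.org/dodsC/run_a"},
            {"scheme": "OGC:WMS", "url": "https://example.org/wms/run_a"},
        ],
    },
    {
        "title": "Model run B",
        "projects": ["CMG_Portal"],
        "references": [
            {"scheme": "OPeNDAP:OPeNDAP", "url": "https://example.org/dodsC/run_b"},
        ],
    },
    {
        "title": "Model run C",
        "projects": ["Other_Project"],
        "references": [
            {"scheme": "OGC:WMS", "url": "https://example.org/wms/run_c"},
        ],
    },
]

portal = select_portal_datasets(records)
for title, url in portal:
    print(title, url)
```

Only the first sample record passes both tests: the second has no WMS link, and the third belongs to a different project. Running such a filter on a schedule is what lets the portal pick up new datasets without manual edits.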
WMS-driven Model Viewing Portal
Interoperable access in Matlab (nctoolbox)
Catalog-driven dynamic portals
Benefits of catalog-driven applications: Dynamically adapt to new or changing data. Find the machine-to-machine issues –Easy problems that can be fixed in minutes to days –Harder problems to guide future work. Fixes for your workflow benefit everyone. Build success stories. Create reproducible workflows that others can learn from, expand on, or transform. Standardized workflows help develop the 4th network layer for data.