Ocean data dissemination
Jon Blower, University of Reading, UK
Steve Hankin, Bob Keeley, Sylvie Pouliquen, Jeff de la Beaujardière, Edward Vanden Berghe, Margarita Conkright Gregg, Janet Fredericks, Derrick Snowden … and many others
Context
Sylvie Pouliquen (Day 1):
  – Growth of online ocean databases in last 10 years
Bob Keeley (Day 4):
  – Good data management practices
Themes of this talk:
  – Integrating data
  – Exchanging data between communities
Steve Hankin (Day 5):
  – The way forward
Drivers
Modern science demands the ability to integrate different data streams
We need to communicate data outside our community
Theme 1: Integrating data
How do we disseminate data?
The Internet
Global Telecommunication System (GTS)
Paper records
Internet: where we are now (mostly)
We’re pretty good at providing data directly to scientists
But to integrate data we need to be able to share data between machines
Automation, automation and automation!
Achieving automation
1. Standardize data formats
Ocean modelling community has standardized around CF-NetCDF file format
Obs community currently more heterogeneous
But CF-NetCDF being adopted in many obs projects:
  – In situ, satellite, underway, radar
Biology community has rather different data
And ASCII file formats still have practical uses
Hankin, Pouliquen CWPs
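One reason a shared convention helps machines: CF-NetCDF files describe time axes with a units string such as "days since 1950-01-01 00:00:00", so any client can decode them the same way. The sketch below is illustrative only. It assumes the standard (Gregorian) calendar and a "days since" reference in exactly this layout; the function name `decode_cf_days` and the sample values are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def decode_cf_days(values, units):
    """Convert CF-style 'days since <reference>' offsets to UTC datetimes.

    Assumption: standard calendar, reference written as YYYY-MM-DD HH:MM:SS.
    """
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError("this sketch only handles 'days since ...' units")
    reference = datetime.strptime(units[len(prefix):], "%Y-%m-%d %H:%M:%S")
    reference = reference.replace(tzinfo=timezone.utc)
    return [reference + timedelta(days=v) for v in values]


# Hypothetical time-axis values, as they might be read from a file.
times = decode_cf_days([0.0, 21184.5], "days since 1950-01-01 00:00:00")
for t in times:
    print(t.isoformat())
```

In practice a NetCDF library would do this decoding; the point is that the units string alone is enough for a machine to interpret the axis.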
2. Improve metadata handling
Everyone means different things by the word “metadata”!
Aspects include:
  – Spatial and temporal referencing
  – Using standard terms for measured quantities / species
  – Describing measuring instruments
  – Describing context
  – Describing quality
Needs to be machine-understandable
If you ask for too much metadata, you may get none at all!
Gregg, Fredericks, Snowden CWPs
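As a rough illustration of "machine-understandable" metadata, the sketch below checks a variable's attributes for a short list of required fields. The attribute names `standard_name` and `units` follow the CF conventions; the choice of which fields to require, the function name `missing_metadata` and the sample records are assumptions made for this example.

```python
# Illustrative choice of required fields, not a rule taken from any standard.
REQUIRED_ATTRS = ("standard_name", "units")


def missing_metadata(attrs):
    """Return the required attribute names that are absent or empty."""
    return [
        name for name in REQUIRED_ATTRS
        if not str(attrs.get(name, "")).strip()
    ]


# Hypothetical variable descriptions, as they might be read from a file header.
good = {"standard_name": "sea_water_temperature", "units": "degree_Celsius"}
vague = {"long_name": "temp"}

print(missing_metadata(good))
print(missing_metadata(vague))
```

Keeping the required list short is deliberate: it reflects the point above that asking for too much metadata can mean getting none.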
Example: Ocean Biogeographic Information System (OBIS)
100,000 records viewed or downloaded per day
18.5 million records
633 distinct datasets
105,000 species
Simple data format attracted lots of input!
[Figures: species richness; potential spread of invasive species (lionfish)]
Vanden Berghe CWP
3. Direct communication between machines
Web Services, e.g. OPeNDAP
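To make "direct communication between machines" concrete: an OPeNDAP (DAP2) client asks for a subset of a remote variable by appending a constraint expression of the form `variable[start:stride:stop]` per dimension to the dataset URL. The sketch below only builds such a URL and does not contact any server. The server address, the variable name `temp` and the helper name `opendap_subset_url` are hypothetical.

```python
def opendap_subset_url(base_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    index_ranges holds one (start, stride, stop) index triple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset with dimensions (time, depth, lat, lon).
url = opendap_subset_url(
    "http://example.org/opendap/ocean_model.nc",
    "temp",
    [(0, 1, 0), (0, 1, 4), (10, 2, 20), (30, 2, 40)],
)
print(url)
```

Because the subset is described in the URL itself, the same request can be issued by a script, a plotting tool or another server without human intervention.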
Example: intercomparison of distributed data
[Figure: ECCO-JPL minus World Ocean Atlas]
4. Catalogues
Discovery of data is currently a problem
Solution is to create machine-readable (and human-searchable) catalogues
These can be aggregated
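A minimal sketch of catalogue aggregation, assuming each data centre publishes a THREDDS-style XML catalogue. The two catalogue snippets, the dataset IDs and the rule "first catalogue wins on duplicate IDs" are invented for illustration; only the XML namespace is the one used by THREDDS inventory catalogues.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalogues from two data centres.
CATALOGUE_A = f"""<catalog xmlns="{THREDDS_NS}" name="Centre A">
  <dataset name="Argo profiles" ID="argo-profiles" urlPath="argo/profiles.nc"/>
  <dataset name="SST analysis" ID="sst-analysis" urlPath="sst/analysis.nc"/>
</catalog>"""

CATALOGUE_B = f"""<catalog xmlns="{THREDDS_NS}" name="Centre B">
  <dataset name="SST analysis (mirror)" ID="sst-analysis" urlPath="mirror/sst.nc"/>
  <dataset name="HF radar currents" ID="hf-radar" urlPath="radar/currents.nc"/>
</catalog>"""


def list_datasets(xml_text):
    """Extract one record per accessible dataset in a catalogue document."""
    root = ET.fromstring(xml_text)
    source = root.get("name")
    return [
        {"id": d.get("ID"), "name": d.get("name"), "source": source}
        for d in root.iter(f"{{{THREDDS_NS}}}dataset")
        if d.get("urlPath")  # skip container datasets with no access path
    ]


def aggregate(*catalogues):
    """Merge catalogues into one index keyed by dataset ID (first one wins)."""
    merged = {}
    for xml_text in catalogues:
        for record in list_datasets(xml_text):
            merged.setdefault(record["id"], record)
    return merged


merged = aggregate(CATALOGUE_A, CATALOGUE_B)
for dataset_id in sorted(merged):
    print(dataset_id, "from", merged[dataset_id]["source"])
```

A real aggregator would also have to reconcile differing vocabularies and metadata, which is where the earlier points on standard terms come in.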
5. Access control
Access control can be a barrier to automation
But often exists for good reasons:
  – Data privacy (esp. in biological domains)
  – IT security
  – Audit trails
Strong trend towards open access to data
Single sign-on technologies can help
But problem will never completely go away
Example: SeaDataNet
Amazon-like discovery and delivery of data
Integrates different data sources
Harmonizes file formats and vocabularies
Single sign-on
Links to Ocean Data View
Theme 2: Exchanging data with other communities
Key issues
Our key technologies are not widely used in many other communities
We need technologies that are common across communities
Geographic Information Systems provide a promising approach
  – Commonly used by policymakers, decision-makers and terrestrial science groups
(Side note: Many other communities do not need to see the full complexity of ocean data)
Linking ocean data with GIS
Visualize ocean data using Web Map Service standard
Capability now built into THREDDS Data Server
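For readers unfamiliar with the Web Map Service standard, a map image is requested with a plain HTTP GET whose query parameters name the layer, area, time and image size. The sketch below assembles a WMS 1.3.0 GetMap URL and parses it back to show the parameters; it does not contact a server. The endpoint, layer name and helper name `wms_getmap_url` are hypothetical, and CRS:84 is chosen so that the bounding box is given in longitude, latitude order.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(endpoint, layer, bbox, time, width=512, height=256):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84 axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer for a North Atlantic temperature map.
url = wms_getmap_url(
    "http://example.org/thredds/wms/ocean_model.nc",
    "temp",
    (-80, 0, 0, 60),
    "2008-06-01T00:00:00Z",
)
params_back = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
print(params_back["BBOX"], params_back["TIME"])
```

The TIME parameter (and ELEVATION, not shown) is how WMS exposes the extra dimensions of ocean data to clients that otherwise think in 2D maps.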
Ocean Data Portal
Developed through International Oceanographic Data and Information Exchange (IODE) programme
Integrates data from National Oceanographic Data Centres
Uses and promotes standards
Reed CWP
Google Ocean: reaching the public
OpenGIS technologies: key points
Strong potential for integrating ocean data with other geospatial data
Visual integration of ocean data is popular and powerful
  – Google Earth and Web-GIS have lowered the barrier to entry
But integration of actual data is a key problem!
  – GIS concepts don’t map neatly to 4D data
de la Beaujardière CWP
Combining all the above: Some large integrating efforts
WMO Information System (WIS)
  – Combines GTS and internet-based delivery
MyOcean (European Marine Core Service)
  – Integrated catalogues
  – Data delivery in CF-NetCDF via FTP and OPeNDAP
  – Visualization based on Web Map Services
INSPIRE
  – European Spatial Data Infrastructure
  – Heavy use of GIS standards
GEOSS
  – Global Earth Observation System of Systems
Tools: The missing link we need to forge
  argos = getArgos('Atlantic', 2008, 'delayed')
  plot(argos)
  profiles = getProfiles( … )
Recommendations
To enable data integration and communicate with other communities:
1. Converge on a small number of file formats
2. Pursue GIS integration, but be aware of costs and benefits, and recognize the limitations
3. Promote our own proven technologies in the GIS community
4. Set up cross-community pilot projects
5. Invest in linking data systems with end-user tools