A Fleet-wide Approach to Optimizing Data Quality
Vicki Ferrini, Suzanne O’Hara (LDEO); Paul Johnson, Kevin Jerram (UNH)
MBES Data Workflow: Acquisition; Analysis & Interpretation; Products
Increasing Emphasis on Open Data Access
– Acquisition Costs
– Spatial & Temporal Change
– Scientific Reproducibility
– Federal Data Policy Compliance
– Data Syntheses & Big Data Enable New Analyses
*NSF-funded cruises can have 2-year proprietary restriction
A lot of this high-value data is acquired opportunistically!
How can we cost-effectively optimize data quality? The Economist, 2010
Multibeam Sonar Data Continuum: GMRT (1992), R2R (2009), MAC (2011)
GOAL: Well-documented, high-quality, publicly available data
Multibeam Advisory Committee
Optimize data quality at acquisition
– Encourage opportunistic data acquisition
Consolidate Tools, Resources & Expertise
US Academic Research Fleet
Technical Teams / Ship Visits
Data Resources
– BIST Database
– Reference Surfaces
– Patch Test Locations
Help Desk
More details tomorrow…
MAC-Supported Ships +2 new ships coming online 2016
Rolling Deck to Repository (R2R)
Data Stewardship of Underway Data
Unprocessed data from permanent sensors
Cruise Catalog
Cruise and data set metadata
Optimize delivery to National Data Centers
Programmatic Quality Assessment
R2R Quality Assessment: Goals
Identify potential problems in data
– No judgment on scientific utility
Provide Feedback
– Vessel Operators – address problems
– Down-stream data users (scientists/engineers) – facilitate data use/re-use
Enable evaluation of fleet-wide system performance over time
R2R: MB Quality Assessment
Lead: S. O’Hara (LDEO)
Leverage open source (MB System)
Programmatically introspect data files
Fully document tests, results, and ranking criteria/thresholds in I/O XML
Customizable test thresholds
Output includes QA Test Results, Ranking (R,Y,G) and other relevant info
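The idea of customizable thresholds feeding a Red/Yellow/Green ranking can be sketched as below. This is only an illustration, not the R2R implementation: the metric names (`pct_flagged_beams`, `pct_zero_nav`), the threshold values, and the "worst test wins" rule for the overall rank are all assumptions made for the example.

```python
from dataclasses import dataclass

# Ordering used to pick the worst rank across tests.
_SEVERITY = {"G": 0, "Y": 1, "R": 2}


@dataclass(frozen=True)
class Threshold:
    """Hypothetical thresholds for a metric where higher values are worse."""
    yellow: float  # at or above this value the test is ranked Y
    red: float     # at or above this value the test is ranked R


def rank_metric(value: float, threshold: Threshold) -> str:
    """Rank a single metric value as R, Y, or G."""
    if value >= threshold.red:
        return "R"
    if value >= threshold.yellow:
        return "Y"
    return "G"


def rank_file(metrics: dict, thresholds: dict) -> tuple:
    """Rank every configured test and return (per-test ranks, overall rank).

    The overall rank is the worst individual rank (an assumption for this sketch).
    """
    results = {name: rank_metric(metrics[name], t) for name, t in thresholds.items()}
    overall = max(results.values(), key=_SEVERITY.__getitem__, default="G")
    return results, overall


# Example with made-up metrics for one data file.
example_thresholds = {
    "pct_flagged_beams": Threshold(yellow=5.0, red=20.0),
    "pct_zero_nav": Threshold(yellow=1.0, red=10.0),
}
example_results, example_overall = rank_file(
    {"pct_flagged_beams": 7.5, "pct_zero_nav": 0.0}, example_thresholds
)
```

Keeping the thresholds in a separate structure from the ranking logic is what makes them customizable per ship or per sonar without touching the tests themselves.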
GMRT Synthesis: Overview
Open-access bathymetry product
Support specialists & non-specialists
Multi-resolutional synthesis
– GEBCO + MBES + land + grids
– Full-native resolution of MBES (100m+)
Tiled Global Compilation
– Images, grids, mask
– Mercator, South Polar, North Polar
– 2 scheduled releases / year (~80 cruises)
Attribution to data contributors
Access to source data
GMRT – Open Data Access
Java Applications (GeoMapApp, Virtual Ocean)
Web Application (GMRT MapTool)
iPhone App (Earth Observer)
Web Services
– Grid Server, Image Server, Attribution Service
– WMS (Mercator, SP, + NP 2016)
– Point Service + Profile Service (Dec 2015)
Broad distribution through collaborations
– GEBCO, Google, ESRI, NOAA NCEI
GMRT: MB QA/QC
Bad navigation
Noisy outer beams
Attitude problems
Bad soundings
Instrument problems
Bad weather
Sound velocity
Slow speed in turns
Quality assessment
– Grid weighting
– Grid resolution
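One way a quality assessment could feed grid weighting is a weighted mean per grid cell, where soundings from better-quality data sets count for more. The sketch below illustrates that general technique only; it is not a description of how GMRT or MB System grids data, and the function name, the `(x, y, depth, weight)` tuple layout, and the weights are hypothetical.

```python
from collections import defaultdict


def weighted_grid(soundings, cell_size):
    """Grid soundings by weighted mean depth per square cell.

    soundings: iterable of (x, y, depth, weight) in projected units.
    cell_size: cell edge length in the same units as x and y.
    Returns {(col, row): weighted mean depth} for cells with data.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum(w * depth), sum(w)]
    for x, y, depth, weight in soundings:
        if weight <= 0:
            continue  # zero-weight soundings are treated as rejected
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += weight * depth
        sums[key][1] += weight
    return {key: wd / w for key, (wd, w) in sums.items()}


# Made-up soundings: two in the first cell with different weights, one in the next.
example_grid = weighted_grid(
    [(10, 10, -1000.0, 1.0), (20, 20, -1100.0, 3.0), (150, 10, -2000.0, 1.0)],
    cell_size=100,
)
```

In this example the first cell resolves to -1075.0 because the higher-weighted sounding pulls the mean toward its depth.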
GMRT – MBES Workflow (diagram): Raw MB files*, MB QA/QC (MB System), processed MB files, rDB, tiled images/grids/mask, GMRT Services
*source data in public domain
GMRT – MBES Content
GMRT v3.1 released Nov 2015
>175,000 data files + metadata
~4.4 million ship-track km of data
875 cruises
26 Ships
21 Swath File Formats
15 Sonar Systems
– Most modern data acquired with Kongsberg systems
GMRT Metadata
Per Data Set
– Processing notes
– Make/Model/Ship
– Quality, resolution
– Contributor
Per Data File
– Metrics (from mbinfo)
– Track-line geometry
Under Development
– Polygon geometry
– Area mapped
– Processing metrics
Pulling the pieces together…
Consolidated access to resources
Fleet-wide Review of R2R MBQA Tests & Results
Which tests correspond with issues corrected in GMRT processing?
Are test thresholds correct?
Other tests needed?
Fleet-wide Review of GMRT Processed Data Statistics
Are there metrics we can programmatically assemble from processed data that can help improve data quality at acquisition?
Compare/Combine Results (Preliminary)
Next Steps…
Quantitatively compare R2R MBQA results with GMRT Processed Data Statistics
– Identify which tests are working
– Refine MBQA tests/parameters
Code-sharing between MAC-GMRT-R2R
Use fleet-wide data review (MBQA tests + GMRT results) to help improve MAC guidelines/best practices
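A first pass at the quantitative comparison might group a processing statistic by MBQA rank and check whether the ranks separate the data as expected. The sketch below assumes each data file has been joined to a pair of (MBQA rank, percent of soundings flagged during GMRT processing); that statistic, the record layout, and the function name are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_flagged_by_rank(records):
    """Average a processing statistic within each MBQA rank.

    records: iterable of (rank, pct_flagged) pairs, one per data file,
    where rank is "R", "Y", or "G".
    Returns {rank: mean pct_flagged} for the ranks that occur.
    """
    by_rank = defaultdict(list)
    for rank, pct_flagged in records:
        by_rank[rank].append(pct_flagged)
    return {rank: mean(values) for rank, values in by_rank.items()}


# Made-up per-file records for illustration only.
example_summary = mean_flagged_by_rank(
    [("G", 1.0), ("G", 3.0), ("Y", 8.0), ("R", 20.0), ("R", 30.0)]
)
```

If a test is working, files ranked R should on average need more editing than files ranked G; a test whose ranks show no such separation is a candidate for a revised threshold.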