Rolling Deck to Repository II: Getting Control of Provenance and Quality
AGU Poster IN43A-1169, AGU Fall Meeting, December 17, 2008
http://rvdata.us/
Stephen P. Miller (1), Dru Clark (1), Caryn Neiswender (1), Robert A. Arko (2), Cynthia L. Chandler (3)

I. A Paradigm Shift

Prior Epoch
1. Go to sea
2. Work up your own data
3. Publish in a journal
4. Exchange reprints

Current Epoch
1. Gather all existing data
2. Go to sea
3. Merge new data with old
4. Try to figure out why things don't agree (an iterative process)
5. Publish in an online journal
6. Exchange data

II. Provenance and Quality Control

Provenance: track events throughout the data life cycle: acquisition, QC, editing, merging, calibration, archiving.
Quality Control: check data (and metadata) values according to established criteria, flag or repair problems, and report findings.
Make provenance and quality control information readily available to a wide range of users, over decades, with an "Institutional Quality Control Certificate."

V. Current and Pending Development

Tools
To evaluate data against established criteria, the R2R project will include the development, testing, and deployment of standardized tools. One tool under active development by Scripps Institution of Oceanography Shipboard Technical Services is navd, which selects the "best" navigation data from multiple GPS data streams.

Quality Control Certificate
The Quality Control Certificate will include information about both quality and provenance. The certificate will use XML formatting, along with controlled vocabularies for quality tests and data processing activities. Once certified by the appropriate authority, the Quality Control Certificate will inform data consumers about the quality and history of a data object.

IV. Case Studies

Case Study 1: Navigation Data
Almost every data stream and sampling event depends on accurate navigation. Even in the era of modern GPS systems, artifacts arise from instrument and data-transfer errors, signal blockage, combining data from diverse receivers, semi-automatic or manual errors in intermediate file management, inappropriate resampling, and conversion to other formats. Improvements are needed to automatically detect and flag unrealistic values and outliers (a minimal illustrative sketch follows the case studies). Standard graphical tools would help avoid embarrassing track lines over land.

Case Study 2: Multibeam Seafloor Mapping Data
Multibeam systems depend on accurate navigation, vertical reference, and sound velocity data. NSF may support mandatory roll and pitch bias tests, at least on an annual basis. A major barrier to the interdisciplinary re-use of data is a lack of understanding of the quality of a data file as it is exchanged among users, institutions, and repositories. Current practice can lead to the propagation of artifacts, or at best wasteful duplication of QC effort. The R2R project is working toward standard MB-System-based QC tools and reporting methods, to be recorded in an Institutional Quality Certificate (XML) that may travel with the swath file throughout its life cycle.
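As a concrete illustration of the automatic flagging called for in Case Study 1, the sketch below checks a navigation stream for out-of-range coordinates and for fixes whose implied ship speed is physically unrealistic. This is a minimal sketch, not the R2R navd tool; the speed ceiling, flag names, and function names are assumptions made for illustration.

```python
# Minimal navigation QC sketch (illustrative only; not the R2R navd tool).
# Flags out-of-range coordinates and speed outliers between consecutive fixes.
from math import radians, sin, cos, asin, sqrt

MAX_SPEED_MS = 10.0  # ~19 knots; assumed ceiling for a research vessel


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))


def flag_fixes(fixes):
    """fixes: list of (epoch_seconds, lat, lon). Returns list of ((fix), [flags])."""
    flagged = []
    prev = None
    for t, lat, lon in fixes:
        flags = []
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            flags.append("out_of_range")
        elif prev is not None and t > prev[0]:
            speed = haversine_m(prev[1], prev[2], lat, lon) / (t - prev[0])
            if speed > MAX_SPEED_MS:
                flags.append("speed_outlier")
        flagged.append(((t, lat, lon), flags))
        if not flags:
            prev = (t, lat, lon)  # only clean fixes update the baseline
    return flagged
```

In a fuller tool the flags would be written into the provenance record rather than simply returned, so that downstream users can see which fixes were repaired or excluded.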
Sample Quality Control Certificate

File name: SB.19990202.edp.mb32
File size: 4032914 bytes
Checksum: 63758953c5a96e3f91d8f2049b9ed149
Date of current file: 2007-07-26

Record 1
  Type: provenance | Name: acquisition | Date: 1999-02-02
  Authority: Charters, James (SIO SOMTS)
  Description: Original acquisition of data that led to the version in this file.
  Discussion: Original SB 2000 data provided depths assuming 1500 m/sec.

Record 2
  Type: provenance | Name: processed | Date: 2001-10-12
  Authority: Peckman, Uta (SIO GDC)
  Description: This file contains values that have been transformed by some sort of algorithm or filtering.
  Discussion: Converted to true depths with the correct svp (sound velocity profile), and reprocessed with correct pitch, roll, and yaw biases removed.

Record 3
  Type: provenance | Name: access release | Date: 2002-03-09
  Authority: Peckman, Uta (SIO GDC)
  Description: This file may be released to the public, according to the right-to-use statement in the accompanying metadata or supporting archive web site.
  Discussion: Proprietary hold has now been released by the original data owner, chief scientist Hubert Staudigel.

Record 4
  Type: quality | Name: version | Date: 2002-05-16
  Authority: Clark, Dru (SIO GDC)
  Description: Certifies that this file is the best available version at the current time.
  Discussion: Part of standard multibeam QC review by GDC.

Record 5
  Type: quality | Name: metadata | Date: 2002-05-16
  Authority: Clark, Dru (SIO GDC)
  Description: Certifies that metadata associated with this file are free of errors.
  Discussion: Part of standard multibeam QC review by GDC.

Record 6
  Type: provenance | Name: published in SIOExplorer | Date: 2002-07-06
  Authority: Clark, Dru (SIO GDC)
  Description: This file has been archived in the SIOExplorer Digital Library, along with supporting metadata, http://SIOExplorer.ucsd.edu.
  Discussion: Published in the original version of the SIOExplorer digital library.

Record 7
  Type: provenance | Name: submitted to NGDC | Date: 2002-08-15
  Authority: Smith, Stuart (SIO GDC)
  Description: This file was submitted to the National Geophysical Data Center (NGDC) repository, www.ngdc.noaa.gov.
  Discussion: Part of a bulk transfer of multibeam cruises to NGDC.

Record 8
  Type: provenance | Name: published in SIOExplorer | Date: 2007-07-26
  Authority: Clark, Dru (SIO GDC)
  Description: This file has been archived in the SIOExplorer Digital Library, along with supporting metadata, http://SIOExplorer.ucsd.edu.
  Discussion: Re-published in the revised version of the SIOExplorer collection.
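The sketch below shows one way such a certificate could be serialized as XML, using only the Python standard library. The element and attribute names are assumptions modeled on the fields of the sample above; the actual R2R schema and controlled vocabularies may differ.

```python
# Sketch of writing a Quality Control Certificate as XML.
# Element/attribute names are assumptions modeled on the sample certificate.
import xml.etree.ElementTree as ET


def build_certificate(file_info, records):
    """Assemble a certificate tree from file metadata and provenance/quality records."""
    cert = ET.Element("qualityControlCertificate")
    f = ET.SubElement(cert, "file")
    for key in ("filename", "filesize", "checksum", "dateOfCurrentFile"):
        ET.SubElement(f, key).text = str(file_info[key])
    for r in records:
        rec = ET.SubElement(cert, "record",
                            type=r["type"], name=r["name"], date=r["date"])
        ET.SubElement(rec, "authority",
                      institution=r["institution"]).text = r["person"]
        ET.SubElement(rec, "description").text = r["description"]
        ET.SubElement(rec, "discussion").text = r["discussion"]
    return ET.ElementTree(cert)


# Example: the file metadata and first record from the sample certificate above.
tree = build_certificate(
    {"filename": "SB.19990202.edp.mb32", "filesize": 4032914,
     "checksum": "63758953c5a96e3f91d8f2049b9ed149", "dateOfCurrentFile": "2007-07-26"},
    [{"type": "provenance", "name": "acquisition", "date": "1999-02-02",
      "person": "Charters, James", "institution": "SIO SOMTS",
      "description": "Original acquisition of data that led to the version in this file",
      "discussion": "Original SB 2000 data provided depths assuming 1500 m/sec"}],
)
tree.write("SB.19990202.edp.mb32.qc.xml", encoding="utf-8", xml_declaration=True)
```

Because each record carries its own authority and date, new events (reprocessing, re-publication) can simply be appended without rewriting earlier history, which is what lets the certificate travel with the file over decades.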
III. Background

It Takes a Team: researchers, students, technicians, and data managers.

1. Preparation: cruise-level metadata (the who, when, and where of a research cruise); scientists gather data in preparation for the cruise.
2. Data Acquisition: start with correct metadata.
3. Processing at Sea. No processing: raw data submitted to the repository. Basic processing: data converted to a standard format, Quality Control (QC) tests performed, data issues annotated.
4. Processing on Shore. Basic processing: QC tests interpreted, data issues resolved, QC Certificate generated. Full processing: data converted to a standard format, QC tests run and interpreted, data issues resolved, metadata generated for each file, QC Certificate generated. Institutional and central repositories may synchronize with a web service.
5. Submission to Repository (an integrity-check sketch follows this list).
6. Retrieve for Research or Cruise Planning.
7. Expert Processing: an expert researcher or data manager applies special knowledge or technology to enhance published data.
8. Journal Publication: persistent citation for data in the repository.
9. Reprocessed Data Submitted to Repository.
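At the Submission to Repository step, the checksum recorded in the certificate can be recomputed on receipt to confirm the file arrived intact. The following is a minimal sketch under the assumption that the 32-hex-digit checksum in the sample certificate is an MD5 digest; the function names are illustrative, not part of any R2R tool.

```python
# Sketch of an integrity check before accepting a file into a repository:
# recompute the file's checksum and compare it to the value in its certificate.
# Assumes the certificate's 32-hex-digit checksum is MD5.
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Stream the file in chunks so large swath files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_checksum):
    """Raise if the recomputed checksum does not match the certified value."""
    actual = file_md5(path)
    if actual != expected_checksum:
        raise ValueError(f"checksum mismatch for {path}: {actual} != {expected_checksum}")


# verify("SB.19990202.edp.mb32", "63758953c5a96e3f91d8f2049b9ed149")
```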
Rolling Deck to Repository (R2R) Project Overview

NSF-supported research vessels collectively produce an enormous volume and diversity of scientific data. With today's rapidly rising ship costs, and the current trend toward greater re-use of shipboard data, it is imperative that the community take positive, cost-effective, systematic steps to ensure greater data access. The NSF Division of Ocean Sciences Data and Sample Policy (pub. NSF 04-004) states, "Principal Investigators are required to submit all environmental data collected to the designated National Data Centers as soon as possible, but no later than two (2) years after the data are collected. Inventories (metadata) of all marine environmental data collected should be submitted to the designated National Data Centers within sixty (60) days after the observational period/cruise." However, procedures for such submissions are poorly established, require lengthy follow-up with investigators, and yield documentation of variable quality. As the volume and diversity of data collected by the fleet increase, this problem will only grow worse.

This new approach provides a "direct pipeline" from operating institutions to a central shoreside facility. Working directly with ship operators, we will ensure more complete and consistent data collection, quality control, and reporting. This modernized system will transition the U.S. academic research fleet from a collection of independent expeditionary platforms into an integrated ocean observing system: a network of ships and submersibles around the world that routinely report a standard suite of underway data and documentation to a central repository. The streamlined R2R system will facilitate data discovery and integration, quality assessment, cruise planning, compliance with funding agency data policies, and long-term data preservation.

R2R Poster Series
Rolling Deck to Repository I: Designing a Database Infrastructure, AGU Poster # IN43A-1168
Rolling Deck to Repository II: Getting Control of Provenance and Quality, AGU Poster # IN43A-1169
Rolling Deck to Repository III: Shipboard Event Logging, AGU Poster # IN43A-1170

R2R Project Leads
(1) Scripps Institution of Oceanography: Stephen P. Miller, spmiller@ucsd.edu
(2) Lamont-Doherty Earth Observatory: Robert A. Arko, arko@ldeo.columbia.edu
(3) Woods Hole Oceanographic Institution: Cynthia L. Chandler, cchandler@whoi.edu

The Rolling Deck to Repository Project acknowledges support from the National Science Foundation (NSF), Oceanographic Instrumentation and Technical Services (OITS) Program.