Rolling Deck to Repository II: Getting Control of Provenance and Quality AGU Poster IN43A-1169 AGU Fall Meeting December 17, 2008 Stephen.

Presentation transcript:

Rolling Deck to Repository II: Getting Control of Provenance and Quality
AGU Poster IN43A-1169, AGU Fall Meeting, December 17, 2008

Stephen P. Miller (1), Dru Clark (1), Caryn Neiswender (1), Robert A. Arko (2), Cynthia L. Chandler (3)

I. A Paradigm Shift

Prior Epoch
1. Go to sea
2. Work up your own data
3. Publish in a journal
4. Exchange reprints

Current Epoch
1. Gather all existing data
2. Go to sea
3. Merge new data with old
4. Try to figure out why things don’t agree (iterative process)
5. Publish in online journal
6. Exchange data

II. Provenance and Quality Control

Provenance: Track events throughout the data life cycle: acquisition, QC, editing, merging, calibration, archiving.

Quality Control: Check data (and metadata) values according to established criteria, flag or repair, and report findings.

Make provenance and quality control information readily available for a wide range of users, over decades, with an “Institutional Quality Control Certificate”.

IV. Case Studies

Case Study 1: Navigation Data
Almost every data stream and sampling event depends on accurate navigation. Even in the era of modern GPS systems, artifacts arise from instrument and data transfer errors, signal blockage, combining data from diverse receivers, semi-automatic or manual errors in intermediate file management, inappropriate resampling, or conversions to other formats. Improvements are needed to automatically detect and flag unrealistic values and outliers. Standard graphical tools would help to avoid embarrassing track lines over land.

Case Study 2: Multibeam Seafloor Mapping Data
Multibeam systems depend on accurate navigation, vertical reference and sound velocity data. NSF may support mandatory roll and pitch bias tests, at least on an annual basis. A major barrier to interdisciplinary re-use of data comes from a lack of understanding of the quality of a data file as it is exchanged among users, institutions and repositories. Current practice can lead to the propagation of artifacts, or at best wasteful duplication of QC effort. The R2R project is working toward standard MB-System based QC tools and reporting methods, to be recorded in an Institutional Quality Certificate (xml) that may travel with the swath file throughout its life cycle.

V. Current and Pending Development

Quality Control Certificate
The Quality Control Certificate will include information about both quality and provenance. This certificate will use XML formatting, along with controlled vocabularies for quality tests and data processing activities. Once certified by the appropriate authority, the Quality Control Certificate will inform data consumers about the quality and history of a data object.

Tools
To evaluate data against established criteria, the R2R project will include the development, testing and deployment of standardized tools. One tool under active development by Scripps Institution of Oceanography Shipboard Technical Services is navd, which uses complex algorithms to select the “best” navigation data from multiple GPS data streams.
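Case Study 1 calls for tools that automatically detect and flag unrealistic navigation values and outliers. The sketch below is one simple way such a check might look, not the R2R tooling or the navd algorithm: it flags fixes whose coordinates are out of range, whose timestamps do not advance, or whose implied speed from the last accepted fix exceeds a threshold. The `Fix` record, the flag names and the 10 m/s default are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, adequate for a plausibility check


@dataclass
class Fix:
    time_s: float  # seconds since an arbitrary epoch (hypothetical field)
    lat: float     # decimal degrees, north positive
    lon: float     # decimal degrees, east positive


def haversine_m(a, b):
    """Great-circle distance in meters between two fixes."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def flag_fixes(fixes, max_speed_mps=10.0):
    """Return one flag per fix: 'good', 'bad_range', 'bad_time' or 'bad_speed'.

    Each fix is compared with the most recent fix flagged 'good', so a single
    outlier does not cause the fix that follows it to be rejected as well.
    """
    flags = []
    last_good = None
    for fix in fixes:
        if not (-90.0 <= fix.lat <= 90.0 and -180.0 <= fix.lon <= 180.0):
            flag = "bad_range"
        elif last_good is None:
            flag = "good"  # assumption: the first in-range fix is trusted
        elif fix.time_s <= last_good.time_s:
            flag = "bad_time"
        else:
            speed = haversine_m(last_good, fix) / (fix.time_s - last_good.time_s)
            flag = "bad_speed" if speed > max_speed_mps else "good"
        flags.append(flag)
        if flag == "good":
            last_good = fix
    return flags


# A short synthetic track: the third fix jumps about one degree north in a minute.
track = [
    Fix(0, 32.000, -117.0),
    Fix(60, 32.001, -117.0),
    Fix(120, 33.000, -117.0),
    Fix(180, 32.003, -117.0),
    Fix(240, 95.000, -117.0),
]
flags = flag_fixes(track)
```

Because the first in-range fix is trusted unconditionally, a bad first fix would cause later good fixes to be flagged; a production check would need a more robust way to seed the comparison. The flags are returned rather than applied so that the decision to repair or discard stays with the data manager.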
Sample Quality Control Certificate

filename: SB edp.mb32
filesize:
checksum: c5a96e3f91d8f2049b9ed149
date of current file:

Record Type | Record Name | Record Date | Authority Name | Authority Institution | Description | Discussion
provenance | acquisition | | Charters, James | SIO SOMTS | Original acquisition of data that led to version in this file | Original SB 2000 data provided depths assuming 1500 m/sec
provenance | processed | | Peckman, Uta | SIO GDC | This file contains values that have been transformed by some sort of algorithm or filtering | Converted to true depths with correct svp, and reprocessed with correct pitch-, roll- and yaw-biases removed
provenance | access release | | Peckman, Uta | SIO GDC | This file may be released to the public, according to the right-to-use statement in the accompanying metadata or supporting archive web site | Proprietary hold has now been released by original data owner, chief scientist Hubert Staudigel
quality | version | | Clark, Dru | SIO GDC | Certifies that this file is the best available version at current time | Part of standard multibeam QC review by GDC
quality | metadata | | Clark, Dru | SIO GDC | Certifies that metadata associated with this file are free of errors | Part of standard multibeam QC review by GDC
provenance | published in SIOExplorer | | Clark, Dru | SIO GDC | This file has been archived in the SIOExplorer Digital Library, along with supporting metadata | Published in original version of SIOExplorer digital library
provenance | submitted to NGDC | | Smith, Stuart | SIO GDC | This file submitted to National Geophysical Data Center (NGDC) repository | Part of bulk transfer of multibeam cruises to NGDC
provenance | published in SIOExplorer | | Clark, Dru | SIO GDC | This file has been archived in the SIOExplorer Digital Library, along with supporting metadata | Re-published in revised version of SIOExplorer collection
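The poster says the certificate will be XML but does not give a schema. As a rough illustration only, the sketch below serializes records shaped like the sample certificate using Python's standard library. The element names (`qc_certificate`, `file`, `record` and the field tags), the record dictionary keys, and the choice of SHA-256 for the checksum are all assumptions; the source does not name a checksum algorithm or a vocabulary.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical field names mirroring the columns of the sample certificate.
RECORD_FIELDS = (
    "date",
    "authority_name",
    "authority_institution",
    "description",
    "discussion",
)


def file_checksum(path, algorithm="sha256"):
    """Hash a file in chunks so large swath files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_certificate(filename, filesize, checksum, records):
    """Return an XML string describing one file and its provenance/quality records."""
    root = ET.Element("qc_certificate")
    file_el = ET.SubElement(root, "file")
    ET.SubElement(file_el, "filename").text = filename
    ET.SubElement(file_el, "filesize").text = str(filesize)
    ET.SubElement(file_el, "checksum").text = checksum
    for rec in records:
        # Record type ('provenance' or 'quality') and name become attributes.
        rec_el = ET.SubElement(root, "record", type=rec["type"], name=rec["name"])
        for field in RECORD_FIELDS:
            ET.SubElement(rec_el, field).text = rec.get(field, "")
    return ET.tostring(root, encoding="unicode")


# Example with placeholder file details and one record taken from the sample table.
certificate_xml = build_certificate(
    "example.mb32",  # placeholder filename, not the file in the sample
    1024,            # placeholder size in bytes
    "abc123",        # placeholder checksum
    [
        {
            "type": "provenance",
            "name": "acquisition",
            "authority_name": "Charters, James",
            "authority_institution": "SIO SOMTS",
            "description": "Original acquisition of data that led to version in this file",
            "discussion": "Original SB 2000 data provided depths assuming 1500 m/sec",
        }
    ],
)
```

Keeping the record type and name as attributes would make it easy to validate them against a controlled vocabulary later, which is the role the poster assigns to the vocabularies for quality tests and processing activities.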
III. Background

It Takes a Team
Researchers, Students, Technicians, Data Managers

Preparation
Cruise Level Metadata: The Who, When and Where of a research cruise
Data Gathering: Scientists gather data in preparation for this cruise

Data Acquisition
Start with correct metadata

Processing at Sea
No Processing: Raw data submitted to repository
Basic Processing: Data converted to standard format; Quality Control (QC) tests performed; Data issues annotated

Processing on Shore
Basic Processing: QC tests are interpreted; Data issues are resolved; QC Certificate generated
Full Processing: Data converted to standard format; QC tests are run and interpreted; Data issues are resolved; Metadata generated for each file; QC Certificate generated
Institutional and central repositories may synchronize with web service

Submission to Repository

Retrieve for Research or Cruise Planning

Expert Processing
Expert Researcher or Data Manager: Special knowledge or technology is used to enhance published data

Journal Publication
Persistent citation for data in repository

Reprocessed Data Submitted to Repository

Rolling Deck to Repository (R2R) Project Overview

NSF-supported research vessels collectively produce an enormous volume and diversity of scientific data. With today’s rapidly rising ship costs, and the current trend toward greater re-use of shipboard data, it is imperative that the community take positive, cost-effective, systematic steps to ensure greater data access.

The NSF Division of Ocean Sciences Data and Sample Policy (pub. NSF ) states, “Principal Investigators are required to submit all environmental data collected to the designated National Data Centers as soon as possible, but no later than two (2) years after the data are collected. Inventories (metadata) of all marine environmental data collected should be submitted to the designated National Data Centers within sixty (60) days after the observational period/cruise.” However, procedures for such submissions are poorly established, require lengthy follow-up with investigators, and yield documentation of variable quality. As the volume and diversity of data collected by the fleet increase, this problem will only grow worse.

This new approach provides a “direct pipeline” from operating institutions to a central shoreside facility. Working directly with ship operators, we will ensure more complete and consistent data collection, quality control, and reporting. This modernized system will transition the U.S. academic research fleet from a collection of independent expeditionary platforms into an integrated ocean observing system: a network of ships and submersibles around the world that routinely report a standard suite of underway data and documentation to a central repository. The streamlined R2R system will facilitate data discovery and integration, quality assessment, cruise planning, compliance with funding agency data policies, and long-term data preservation.
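The two deadlines in the quoted NSF policy (metadata inventories within sixty days, environmental data within two years) reduce to simple date arithmetic. The sketch below is a hypothetical helper, not part of R2R; it assumes the cruise end date is the reference point for both deadlines, which is a simplification of the policy wording, and the function and key names are invented for illustration.

```python
from datetime import date, timedelta


def add_years(day, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # Raised only when the source date is Feb 29 and the target year is not a leap year.
        return day.replace(year=day.year + years, day=28)


def submission_deadlines(cruise_end):
    """Return the latest submission dates implied by the quoted policy.

    Assumption: both periods are counted from the last day of the cruise.
    """
    return {
        "metadata_inventory": cruise_end + timedelta(days=60),
        "environmental_data": add_years(cruise_end, 2),
    }


# Example: a cruise ending on December 17, 2008.
deadlines = submission_deadlines(date(2008, 12, 17))
```

For a cruise ending on December 17, 2008, this gives a metadata deadline of February 15, 2009 and a data deadline of December 17, 2010.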
R2R Poster Series
Rolling Deck to Repository I: Designing a Database Infrastructure, AGU Poster # IN43A-1168
Rolling Deck to Repository II: Getting Control of Provenance and Quality, AGU Poster # IN43A-1169
Rolling Deck to Repository III: Shipboard Event Logging, AGU Poster # IN43A-1170

R2R Project Leads
(1) Scripps Institution of Oceanography: Stephen P. Miller
(2) Lamont-Doherty Earth Observatory: Robert A. Arko
(3) Woods Hole Oceanographic Institution: Cynthia L. Chandler

The Rolling Deck to Repository Project acknowledges support from the National Science Foundation (NSF), Oceanographic Instrumentation and Technical Services (OITS) Program.