Incorporating W3C’s DQV and PROV in CISER’s Data Quality Review and Reproduction of Results Service

William C. Block, Florio O. Arguillas, Jeremy Williams
Cornell Institute for Social and Economic Research (CISER)
Corresponding author e-mail: block@cornell.edu

CISER staff print the paper, run the code against the data, and compare the results with those cited (highlighted in yellow) in the paper.

Abstract

A year and a half after its implementation, CISER’s Research Data Quality Review and Reproduction of Results Service (R2, for short) continues to evolve and improve. R2 was developed to encourage the sharing of high-quality data, code, documentation, and metadata associated with a study, for the purpose of reproducible research. This poster discusses: a) the service in its current state; b) the improvements made to the service, including cost-reduction and buy-in strategies to encourage researchers to use it; c) pre- and post-reproduction services to improve data, code, documentation, and metadata quality; d) the inclusion of Cornell’s CED2AR software for assessing and generating complete DDI metadata; and e) the use of the CISER Data Archive as the free and permanent home for the study and its associated files.

Finally, our poster presents for the first time our efforts to incorporate the W3C’s Data Quality Vocabulary (DQV) and PROV metadata into the R2 service. DQV provides a means to describe the subjective measures applied to ensure the integrity of data sources and introduces a way of expressing guidelines for data quality that producers of data can abide by. The W3C PROV ontology defines entities, agents, and activities related to the origin of resources, which is particularly important in the common scenario of referencing multiple datasets for a given investigation.
By incorporating DQV and PROV into the R2 data management workflow, we hope to provide a foundation of support for improving the quality of research inputs and outputs, which in turn can have a chain effect on derivative data products.

Golden Rule of CISER’s Replication Service: Output produced by running the code against the data should be identical to the publication, down to the last decimal place. Even a slight deviation is not acceptable and must be investigated.

The researcher provides CISER staff with a copy of the article, the code, and the data; highlights the sections of the article containing figures derived from running the code against the data; and adds comments to the code describing what each section produces or does.

Common problems:
- Very long, complex code.
- Unnecessary or excess sections of code whose output does not appear in the paper. This delays replication because staff must go through the entire code and its output and work out where each result appears in the paper.
- Code points to subdirectories for retrieving or saving data, so staff must recreate the directory structure for the code to run correctly.
- Some code is inefficient. Staff will not modify it, but will suggest ways to make it more efficient.
- Code often comes as multiple files with no indication of sequence. The reproducer has to determine which file to run first, especially when files build on one another.

CISER uses CED2AR to check for completeness and create DDI metadata.

Common problems:
- No variable and value labels.

[Workflow diagram. Steps: Client Deposits; Receive Submission; Check Submission: Complete?; Run Code: Error-free?; Compare Results: Identical?; Assemble AIP; Build Metadata (DDI, DQV, PROV). Each of the three checks has a Yes branch and a No branch; the No branches lead to "Discuss with client and make necessary corrections." Other labels: Data Management, Access, Ingest, Archival Storage, Metadata Database, CED2AR, DIPs for CISER Data Archive, DIPs for Custom Website, GitHub, Bitbucket, Cornell’s E-Commons, CISER Data Archive, Dataverse.]

Cost-reduction strategies:
- Data curation and management training
- Code writing and organization training
- Code efficiency training, e.g., macro programming, SQL programming
- Version control software training, e.g., GitHub

Common problems:
- Some results do not match the article.
- Not all figures printed in the article are produced by the code. Some involved other software packages, such as Excel.
- The order of variables in a printed table does not match the order of variables in the output table for that model, which slows down the verification process.

International Association for Social Science Information Services & Technology
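As a rough illustration of the Golden Rule and the three checks named in the workflow (complete, error-free, identical), the Python sketch below compares a reproduced number with the figure printed in the article at the printed precision. The function names, the return strings, and the choice of half-up rounding are assumptions made for this example; the poster does not describe how CISER staff carry out the comparison in practice.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_published(published: str, reproduced: float) -> bool:
    """True if `reproduced`, rounded to the precision printed in the
    article, equals the published figure exactly.

    `published` is the number as printed (e.g. "0.142"), so the count
    of decimal places is taken from the article itself.
    Half-up rounding is an assumption of this sketch.
    """
    target = Decimal(published)
    # Exponent of the printed value, e.g. -3 for "0.142".
    quantum = Decimal(1).scaleb(target.as_tuple().exponent)
    rounded = Decimal(repr(reproduced)).quantize(quantum, rounding=ROUND_HALF_UP)
    return rounded == target


def review(submission_complete: bool, code_ran_cleanly: bool, results: dict) -> str:
    """Hypothetical walk through the three checks in the workflow.

    `results` maps a label (e.g. "Table 1, beta") to a pair of
    (published figure as printed, reproduced value).
    """
    if not submission_complete:
        return "discuss with client: submission incomplete"
    if not code_ran_cleanly:
        return "discuss with client: code did not run error-free"
    mismatches = [
        label
        for label, (published, reproduced) in results.items()
        if not matches_published(published, reproduced)
    ]
    if mismatches:
        return "discuss with client: mismatch in " + ", ".join(mismatches)
    return "assemble AIP"
```

Taking the published figure as a string keeps the precision printed in the article, so a reproduced 0.14249 matches a printed "0.142" while 0.1426 does not.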
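To make the DQV and PROV part of the abstract more concrete, here is a minimal sketch of how one reproduction run might be recorded as JSON-LD using only the Python standard library. The `prov:` and `dqv:` terms are taken from the W3C vocabularies; the `ex:` namespace, the identifiers, the `reproduction_record` function, and the `ex:shareOfResultsReproduced` metric are hypothetical and do not represent CISER’s actual metadata model.

```python
import json

# The prov and dqv namespace IRIs are the W3C ones; "ex" is a placeholder.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "ex": "http://example.org/r2/",
}


def reproduction_record(study_id: str, matched: int, total: int) -> dict:
    """Build a JSON-LD description of one reproduction run (illustrative only)."""
    data = f"ex:{study_id}/data"
    code = f"ex:{study_id}/code"
    run = f"ex:{study_id}/reproduction-run"
    output = f"ex:{study_id}/output"
    return {
        "@context": CONTEXT,
        "@graph": [
            # PROV: the inputs, the agent, the activity, and its output.
            {"@id": data, "@type": "prov:Entity"},
            {"@id": code, "@type": "prov:Entity"},
            {"@id": "ex:ciser-staff", "@type": "prov:Agent"},
            {
                "@id": run,
                "@type": "prov:Activity",
                "prov:used": [{"@id": data}, {"@id": code}],
                "prov:wasAssociatedWith": {"@id": "ex:ciser-staff"},
            },
            {
                "@id": output,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": run},
            },
            # DQV: one quality measurement computed on the output.
            {
                "@id": f"ex:{study_id}/measurement",
                "@type": "dqv:QualityMeasurement",
                "dqv:computedOn": {"@id": output},
                "dqv:isMeasurementOf": {"@id": "ex:shareOfResultsReproduced"},
                "dqv:value": matched / total,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(reproduction_record("study-001", 12, 12), indent=2))
```

Plain dictionaries are used here so the example has no dependencies; an RDF library could produce the same graph.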