The Reproducible Research Advantage: Why + how to make your research more reproducible. Presentation for the Center for Open Science, June 17, 2015, April Clyburne-Sherin.

Presentation transcript:

The Reproducible Research Advantage Why + how to make your research more reproducible Presentation for the Center for Open Science June 17, 2015 April Clyburne-Sherin

Objectives
- What is reproducibility?
- Why practice reproducibility?
- What is necessary for research to be reproducible?
- How can you make your research reproducible?
- What is literate programming?

What is reproducibility? Replicability
- Replication of findings is the highest standard for evaluating evidence
- Focuses on validating the scientific claim
- Many studies cannot be replicated
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Replication?

What is reproducibility? Reproducibility
- Reproduction of study findings using the study materials
- Requires transparency of methods, data, and code
- Focuses on the validity of the data analysis
- A limited type of replication
- Minimum standard for any scientific study
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Reproduction

Why practice reproducibility?
Study report (reported results): numerical summaries, figures, tables
A study report is enough to:
- Assess study justification
- Assess study design
- Understand how the experiment was conducted
- Assess the relevance of findings

Why practice reproducibility?
Study report (reported results): numerical summaries, figures, tables
A study report is not enough to:
- Assess errors in analyses
- Assess the sensitivity of findings to assumptions
- Reproduce the analyses
You cannot evaluate a study's analyses and findings using the study report alone.

Why practice reproducibility?
Raw data → (processing) → Analytic data → (analysing) → Raw results → (reporting) → Study report (reported results: numerical summaries, figures, tables)
To fully assess the analyses and findings of a study, we need more information.

Why practice reproducibility?
The idealist
- Shoulders of giants!
- Minimum scientific standard
- Allows others to build on your findings
- Improved transparency
- Increased transfer of knowledge
- Increased utility of your data + methods
The pragmatist
- Data sharing citation advantage (Piwowar 2013)
- “It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself.” - Schwab & Claerbout
- Improves capacity for complex and large datasets or analyses
- Increased productivity

What is necessary for research to be reproducible?
Raw data → (processing code) → Analytic data → (analytic code) → Raw results → (presentation code) → Study report (reported results: numerical summaries, figures, tables)
1. Data + metadata
2. Code
3. Documentation of data + code
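The chain from raw data to reported results can be sketched as three small functions, one per kind of code named on the slide. This is a minimal illustration in Python rather than the presenter's own material; the data, the function names, and the rule of dropping rows with missing values are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical raw data; the empty value stands for a missing measurement.
RAW_CSV = """sample,value
a,1.0
b,2.0
c,
d,4.0
"""


def process(raw_text):
    """Processing code: raw data -> analytic data (rows with missing values dropped)."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [float(row["value"]) for row in rows if row["value"].strip()]


def analyse(values):
    """Analytic code: analytic data -> raw results."""
    return {"n": len(values), "mean": statistics.mean(values)}


def present(results):
    """Presentation code: raw results -> reported results."""
    return f"n = {results['n']}, mean = {results['mean']:.2f}"


if __name__ == "__main__":
    print(present(analyse(process(RAW_CSV))))  # prints "n = 3, mean = 2.33"
```

Because every step is a function of the previous step's output, anyone holding the raw data and this file can regenerate the reported line and see exactly where the missing value was dropped.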

How can you make your research reproducible?
1. Plan for reproducibility before you start
  - Data management plan
  - Informative naming + location
  - Study plan + pre-analysis plan
2. Keep track of things
  - Version control
  - Documentation
3. Let your computer do the work
  - Use software that can be coded
  - Literate programming
4. Archive + share your materials

1. Plan for reproducibility before you start: Data management plan
Prepare to share
- Data that is well managed from the start is easier to prepare for sharing
- Smooths transitions between researchers
- Protects you if questions are raised about data validity
- Metadata provides context
- Document metadata while collecting to save time
How?
- Use open data formats rather than proprietary ones: .csv, .txt, .png
- Data: collected, stored, documented, managed
- Metadata: collected, documented / version control
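One way to act on "open formats" and "document metadata while collecting" is to write each dataset as a CSV file with a plain-text metadata file next to it. The sketch below assumes a JSON sidecar named `<stem>_metadata.json`; that naming, the helper `save_with_metadata`, and the example values are hypothetical choices, not something prescribed by the slides.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_metadata(rows, fieldnames, metadata, directory, stem):
    """Write rows to <stem>.csv and their metadata to <stem>_metadata.json."""
    directory = Path(directory)
    data_path = directory / f"{stem}.csv"
    meta_path = directory / f"{stem}_metadata.json"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with meta_path.open("w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
    return data_path, meta_path


if __name__ == "__main__":
    # Hypothetical readings and column descriptions.
    rows = [{"sample": "a", "value": 1.0}, {"sample": "b", "value": 2.5}]
    metadata = {
        "description": "Hypothetical assay readings",
        "columns": {"sample": "sample identifier", "value": "reading, arbitrary units"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        data_path, meta_path = save_with_metadata(
            rows, ["sample", "value"], metadata, tmp, "readings"
        )
        print(data_path.read_text(encoding="utf-8"))
        print(meta_path.read_text(encoding="utf-8"))
```

Both files are plain text, so they can be opened without special software and tracked by a version control system.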

1. Plan for reproducibility before you start: Informative name + location
Plan your file naming + location system a priori.
Names and locations should be distinctive, consistent, and informative:
- What it is
- Why it exists
- How it relates to other files

1. Plan for reproducibility before you start: Informative name + location
The rules don’t matter. That you have rules matters.
Make it machine readable:
- Default ordering
- Use of meaningful delimiters and tags
- Example: use “_” and “-” to store metadata in the name (e.g., YYYY-MM-DD_assay_sample-set_well)
Make it human readable:
- Choose self-explanatory names and locations
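A name that follows the YYYY-MM-DD_assay_sample-set_well pattern can be taken apart by a few lines of code, which is what "machine readable" buys you. The following is a small sketch; the function name `parse_name` and the example file names are made up for illustration.

```python
from datetime import datetime


def parse_name(filename):
    """Split a name like 2015-06-17_elisa_plate-2_B07.csv into its metadata fields."""
    stem = filename.rsplit(".", 1)[0]                # drop the extension
    day, assay, sample_set, well = stem.split("_")   # "_" separates the fields
    return {
        "date": datetime.strptime(day, "%Y-%m-%d").date(),
        "assay": assay,
        "sample_set": sample_set,                    # "-" joins words inside a field
        "well": well,
    }


if __name__ == "__main__":
    names = ["2015-06-17_elisa_plate-2_B07.csv", "2015-01-05_elisa_plate-1_A01.csv"]
    # ISO dates at the start of the name make the default sort chronological.
    for name in sorted(names):
        print(parse_name(name))
```

The split fails loudly if a name has the wrong number of fields, which is a cheap way to catch files that break the naming rules.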

1. Plan for reproducibility before you start: Study plan
Pre-register your study plan before you look at your data!
- Hypothesis
- Study design
  - Type of design
  - Sampling
  - Power and sample size
  - Randomization?
- Variables measured
  - Meaningful effect size
- Variables constructed
  - Data processing
- Etc…
Registries: Open Science Framework, ClinicalTrials.gov
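The "power and sample size" entry of a study plan can itself be written as code, so the registered number is reproducible. The sketch below uses the common normal-approximation formula for a two-sided comparison of two group means with equal group sizes, n per group = 2 * ((z_(1 - alpha/2) + z_power) / d)^2, where d is the standardized effect size. The formula choice and the defaults (alpha = 0.05, power = 0.80) are assumptions for illustration, not values from the slides; calculations based on the t distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Normal approximation; effect_size is the standardized mean difference (assumed known).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    print(n_per_group(0.5))  # prints 63
```

Recording the call and its inputs in the pre-registration makes it clear which effect size was considered meaningful before the data were seen.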

1. Plan for reproducibility before you start: Pre-analysis plan
- Define data analysis set
- Statistical analyses
  - Primary
  - Secondary
  - Exploratory
- Missing data
- Outliers
- Multiplicity
- Subgroups + covariates
(Adams-Huet and Ahn, 2009)
Raw data → (processing) → Analytic data → (analysing) → Raw results

2. Keep track of things: Version control
- Everything created manually should use version control
- Tracks changes to files, code, metadata
- Allows you to revert to old versions
- Make incremental changes: commit early, commit often
- Git / GitHub / BitBucket
Version control for data
- Metadata should be version controlled
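For data files, one lightweight complement to a version control system is a checksum manifest: a small text record of each file's hash that can itself be committed. This is a swapped-in technique offered as an illustration, not something the slides describe, and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file under directory (by relative path) to its checksum."""
    directory = Path(directory)
    return {
        path.relative_to(directory).as_posix(): sha256_of(path)
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """List files that were added, removed, or modified between two manifests."""
    return sorted(name for name in set(old) | set(new) if old.get(name) != new.get(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "readings.csv"   # hypothetical data file
        data_file.write_text("value\n1\n")
        before = build_manifest(tmp)
        data_file.write_text("value\n2\n")       # simulate an edit
        print(changed_files(before, build_manifest(tmp)))  # prints ['readings.csv']
```

Committing the manifest alongside the code records which version of the data each analysis commit was run against, even when the data files are too large to commit.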

2. Keep track of things: Documentation
- Document your software environment (e.g., dependencies, libraries, sessionInfo() in R)
- Everything done by hand or not automated from data and code should be precisely documented:
  - README files
- Make raw data read only
  - You won’t edit it by accident
  - Forces you to document or code data processing
- Document in code comments
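Two of these points translate directly into code. The sketch below shows a rough Python counterpart to recording the software environment (it is not R's sessionInfo(), only an analogue built from the standard library) and a helper that removes write permission from a raw data file. The function names and the package name in the demo are assumptions for illustration.

```python
import os
import platform
import stat
import sys
import tempfile
from importlib import metadata


def describe_environment(packages=()):
    """Collect the interpreter version, platform, and versions of the named packages."""
    info = {"python": sys.version.split()[0], "platform": platform.platform(), "packages": {}}
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = "not installed"
    return info


def make_read_only(path):
    """Clear the write permission bits so the file is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    print(describe_environment(["numpy"]))      # "numpy" is just an example package name
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "raw.csv")      # hypothetical raw data file
        with open(raw, "w", encoding="utf-8") as handle:
            handle.write("value\n1\n")
        make_read_only(raw)
        print(oct(stat.S_IMODE(os.stat(raw).st_mode)))
```

Saving the output of `describe_environment` next to the results gives readers the versions they need to rebuild the analysis.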

3. Let your computer do the work: Use software that can be coded
- Work done through graphical user interfaces is hard to reproduce.
- Telling a computer what to do maximizes reproducibility.
- Teaching a computer what to do is telling researchers who use your code what to do.

3. Let your computer do the work: Literate programming
- Links data, code, output, and documentation
- Combines code “chunks” with text and output
- Requires a documentation language + a programming language
- Produces documents in HTML, PDF, and more
- RStudio + R Notebook, Sweave, or knitr
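The core idea behind tools such as knitr can be shown with a toy "weave" step: run each code chunk found in a text document and place its output directly after it. The sketch below is a deliberately small Python illustration, not how knitr or Sweave are implemented; the `@code` / `@end` chunk markers and the sample document are invented for the example.

```python
import contextlib
import io

# Hypothetical source document: prose with one code chunk between @code and @end.
SOURCE = """\
Mean reading
The three hypothetical readings below are averaged.
@code
readings = [1.0, 2.0, 4.5]
print(sum(readings) / len(readings))
@end
The value above is recomputed every time the document is built.
"""


def weave(source):
    """Run each @code...@end chunk and insert its printed output after the code."""
    namespace = {}  # shared by all chunks, like a single session
    out, chunk, in_chunk = [], [], False
    for line in source.splitlines():
        if line == "@code":
            in_chunk, chunk = True, []
        elif line == "@end":
            in_chunk = False
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)  # only run documents you trust
            out.extend("    " + code_line for code_line in chunk)
            out.extend("    #> " + result for result in buffer.getvalue().splitlines())
        elif in_chunk:
            chunk.append(line)
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(weave(SOURCE))
```

In the woven text the number under the chunk comes from running the code, so the prose and the result cannot drift apart the way copied-and-pasted numbers can.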

4. Archive + share your materials
- Open Science Framework
- RPubs

How can you make your research reproducible?
1. Plan for reproducibility before you start
  - Data management plan: Prepare to share
  - Informative naming + location: The rules don’t matter. That you have rules matters.
  - Study plan + pre-analysis plan: Pre-register your plan
2. Keep track of things
  - Version control: Track your changes
  - Documentation: Everything done by hand
3. Let your computer do the work
  - Use software that can be coded: Teaching a computer is teaching others
  - Literate programming: Link data, code, output, and documentation
4. Archive + share your materials
  - Where doesn’t matter. That you share matters.

How to learn more
- Organizing a project for reproducibility: Reproducible Science Curriculum by Jenny Bryan (e-science-curriculum/)
- Data management: Data Management from Software Carpentry by Orion Buske (carpentry.org/v4/data/mgmt.html)
- Literate programming: Literate Statistical Programming by Roger Peng (ch?v=YcJb1HBc-1Q)
- Version control: Version Control by Software Carpentry (carpentry.org/v4/vc/)
- Sharing materials: Open Science Framework by Center for Open Science

An example of reproducible analyses using R + Open Science Framework
1. Pre-register analysis plan
2. Read-only dataset
3. Version control of analyses
4. Literate programming using knitr