The Reproducible Research Advantage: Why + how to make your research more reproducible. Presentation for the Center for Open Science, June 17, 2015, April Clyburne-Sherin.


1 The Reproducible Research Advantage
Why + how to make your research more reproducible
Presentation for the Center for Open Science, June 17, 2015
April Clyburne-Sherin

2 Objectives
– What is reproducibility?
– Why practice reproducibility?
– What is necessary for research to be reproducible?
– How can you make your research reproducible?
– What is literate programming?

3 What is reproducibility?
Replicability
– Replication of findings is the highest standard for evaluating evidence
– Focuses on validating the scientific claim
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Replication

4 What is reproducibility?
Replicability
– Replication of findings is the highest standard for evaluating evidence
– Focuses on validating the scientific claim
– Many studies cannot be replicated
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Replication

5 What is reproducibility?
Replicability
– Replication of findings is the highest standard for evaluating evidence
– Focuses on validating the scientific claim
– Many studies cannot be replicated
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → ?

6 What is reproducibility?
Reproducibility
– Reproduction of study findings using the study materials
– Requires transparency of methods, data, and code
– Focuses on the validity of the data analysis
– A limited type of replication
– The minimum standard for any scientific study
Scientific method: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Reproduction

7 Why practice reproducibility?
Study report: reported results (numerical summaries, figures, tables)
The study report is enough to:
– Assess the study justification
– Assess the study design
– Understand how the experiment was conducted
– Assess the relevance of the findings

8 Why practice reproducibility?
Study report: reported results (numerical summaries, figures, tables)
The study report is not enough to:
– Assess errors in the analyses
– Assess the sensitivity of findings to assumptions
– Reproduce the analyses
We cannot evaluate a study's analyses and findings from the study report alone.

9 Why practice reproducibility?
Raw data → (processing) → Analytic data → (analysing) → Raw results → (reporting) → Reported results: numerical summaries, figures, tables (study report)

10 Why practice reproducibility?
Raw data → (processing) → Analytic data → (analysing) → Raw results → (reporting) → Reported results: numerical summaries, figures, tables (study report)
To fully assess the analyses and findings of a study, we need more information.

11 Why practice reproducibility?
The idealist
– Shoulders of giants!
– Minimum scientific standard
– Allows others to build on your findings
– Improved transparency
– Increased transfer of knowledge
– Increased utility of your data + methods
The pragmatist
– Data sharing citation advantage (Piwowar 2013)
– “It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself.” (Schwab & Claerbout)
– Improves capacity for complex and large datasets or analyses
– Increased productivity

12 What is necessary for research to be reproducible?
Raw data → (processing) → Analytic data → (analysing) → Raw results → (reporting) → Reported results: numerical summaries, figures, tables (study report)

13 What is necessary for research to be reproducible?
Raw data → processing code → Analytic data → analytic code → Raw results → presentation code → Reported results: numerical summaries, figures, tables (study report)

15 What is necessary for research to be reproducible?
Raw data → processing code → Analytic data → analytic code → Raw results → presentation code → Reported results: numerical summaries, figures, tables (study report)
What you need:
1. Data + metadata
2. Code
3. Documentation of data + code
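The raw data → processing code → analytic code → presentation code chain can be sketched as three small functions, one per stage. Every number in the final table then traces back to the raw data through code. This is a minimal Python illustration, not the author's materials (the talk uses R). The column names (`group`, `score`), the data values, and the function names are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real project this would be a read-only file.
RAW_CSV = """id,group,score
1,control,4.0
2,control,
3,treatment,6.5
4,treatment,7.0
5,control,5.0
"""

def process(raw_text):
    """Processing code: raw data -> analytic data (drops rows with a missing score)."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [
        {"group": r["group"], "score": float(r["score"])}
        for r in rows
        if r["score"].strip()
    ]

def analyse(analytic_rows):
    """Analytic code: analytic data -> raw results (mean score per group)."""
    groups = {}
    for r in analytic_rows:
        groups.setdefault(r["group"], []).append(r["score"])
    return {g: statistics.mean(v) for g, v in groups.items()}

def present(raw_results):
    """Presentation code: raw results -> a plain-text table for the study report."""
    lines = ["group      mean"]
    for group in sorted(raw_results):
        lines.append(f"{group:<10} {raw_results[group]:.2f}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(present(analyse(process(RAW_CSV))))
```

Keeping the three stages separate mirrors the slide. A change to a processing decision, such as how missing scores are handled, then shows up in exactly one place.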

16 How can you make your research reproducible?
1. Plan for reproducibility before you start
– Data management plan
– Informative naming + location
– Study plan + pre-analysis plan
2. Keep track of things
– Version control
– Documentation
3. Let your computer do the work
– Use software that can be coded
– Literate programming
4. Archive + share your materials

17 1. Plan for reproducibility before you start
Data management plan
Prepare to share
– Data that is well managed from the start is easier to prepare for sharing
– Smooths transitions between researchers
– Protects you if questions are raised about data validity
Metadata provides context
– Document metadata while collecting to save time
How?
– Use open data formats rather than proprietary ones: .csv, .txt, .png
– Data: collected, stored, documented, managed
– Metadata: collected, documented, version controlled
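One way to follow "open formats + metadata documented while collecting" is sketched below in Python. It writes the data as a plain .csv file and writes a .json file of the same name beside it, which describes the columns and the collection. The sidecar layout, field names, and values are illustrative assumptions, not a standard required by the talk.

```python
import csv
import json
import tempfile
from pathlib import Path

def save_with_metadata(rows, fieldnames, csv_path, metadata):
    """Write rows to an open-format .csv file and metadata to a .json file beside it."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    sidecar = csv_path.with_suffix(".json")  # weights.csv -> weights.json
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar

if __name__ == "__main__":
    rows = [{"id": 1, "weight_g": 12.3}, {"id": 2, "weight_g": 11.8}]
    metadata = {  # hypothetical values, recorded at collection time
        "collected_on": "2015-06-17",
        "collected_by": "AB",
        "columns": {
            "id": "participant identifier",
            "weight_g": "body weight in grams",
        },
    }
    with tempfile.TemporaryDirectory() as d:
        sidecar = save_with_metadata(rows, ["id", "weight_g"], Path(d) / "weights.csv", metadata)
        print(sidecar.read_text(encoding="utf-8"))
```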

18 1. Plan for reproducibility before you start
Informative name + location
Plan your file naming + location system a priori. Names and locations should be distinctive, consistent, and informative about:
– What it is
– Why it exists
– How it relates to other files

19 1. Plan for reproducibility before you start
Informative name + location
The rules don’t matter. That you have rules matters.
Make it machine readable:
– Default ordering
– Use meaningful delimiters and tags
– Example: use “_” and “-” to store metadata in the name (e.g., YYYY-MM-DD_assay_sample-set_well)
Make it human readable:
– Choose self-explanatory names and locations
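The slide's example convention, YYYY-MM-DD_assay_sample-set_well, is machine readable because a script can both build and parse it. The Python sketch below assumes that "_" separates fields and "-" joins words inside a field, and it works on file stems without extensions. The assay, sample-set and well values in the demo are hypothetical.

```python
import re
from datetime import date

# Convention assumed from the slide's example YYYY-MM-DD_assay_sample-set_well:
# "_" separates fields, "-" joins words within a field, and the ISO date comes
# first so that default (alphabetical) ordering is also chronological.
FIELDS = ("date", "assay", "sample_set", "well")
NAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(_[A-Za-z0-9-]+){3}$")

def make_name(day, assay, sample_set, well):
    """Build a file stem such as 2015-06-17_elisa_plate-1_A01 (no extension)."""
    name = "_".join([day.isoformat(), assay, sample_set, well])
    if not NAME_RE.match(name):
        raise ValueError(f"fields may only use letters, digits and '-': {name!r}")
    return name

def parse_name(stem):
    """Recover the metadata stored in a file stem, or fail loudly if it breaks the rules."""
    if not NAME_RE.match(stem):
        raise ValueError(f"name does not follow the convention: {stem!r}")
    return dict(zip(FIELDS, stem.split("_")))

if __name__ == "__main__":
    names = [
        make_name(date(2015, 6, 17), "elisa", "plate-1", "A01"),
        make_name(date(2015, 1, 5), "elisa", "plate-2", "B12"),
    ]
    for stem in sorted(names):  # chronological, because the date comes first
        print(stem, parse_name(stem))
```

Putting the ISO date first means a plain directory listing is already in time order. This is the "default ordering" point on the slide.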

20 1. Plan for reproducibility before you start
Study plan
Pre-register your study plan before you look at your data!
– Hypothesis
– Study design: type of design, sampling, power and sample size, randomization?
– Variables measured: meaningful effect size
– Variables constructed: data processing
– Etc.
Pre-register at: Open Science Framework, ClinicalTrials.gov
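"Power and sample size" and "meaningful effect size" belong in the plan because they fix how much data to collect before any is seen. The Python sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. The defaults (α = 0.05, power = 0.80) are common conventions assumed here, not values from the talk. A t-test-based calculation gives a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference d = (mu1 - mu2) / sigma.
    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: {n_per_group(d)} per group")
```

For d = 0.5 this gives 63 per group: 2 × (1.960 + 0.842)² / 0.25 ≈ 62.8, rounded up.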

21 1. Plan for reproducibility before you start
Pre-analysis plan
– Define the data analysis set
– Statistical analyses: primary, secondary, exploratory
– Missing data
– Outliers
– Multiplicity
– Subgroups + covariates
(Adams-Huet and Ahn, 2009)
Raw data → (processing) → Analytic data → (analysing) → Raw results
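A pre-analysis plan should state how multiplicity will be handled before the results are known. One common choice is Holm's step-down adjustment, used here as an example rather than a method the talk prescribes. A minimal Python sketch follows, with hypothetical p-values in the demo.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order.

    Sort p-values ascending; multiply the i-th smallest (0-based rank i) by (m - i),
    cap at 1, and carry a running maximum so adjusted values never decrease.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # hypothetical p-values for three pre-specified outcomes
    for p, adj in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}  Holm-adjusted p = {adj:.3f}")
```

With these inputs, the adjusted values are 0.03, 0.06 and 0.06. At α = 0.05, only the first outcome remains significant. Writing the procedure down in advance stops the choice of correction from depending on which result looks best.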

22 2. Keep track of things
Version control
– Everything created manually should use version control
– Tracks changes to files, code, metadata
– Allows you to revert to old versions
– Make incremental changes: commit early, commit often
– Tools: Git / GitHub / Bitbucket
– Version control for data: metadata should be version controlled
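Large data files are often kept out of Git, but a small manifest of checksums can be committed alongside the code. Any later change to the data then becomes visible. The Python sketch below assumes this manifest approach and uses a hypothetical file layout. It is one lightweight option, not the only way to version data.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir, manifest_path):
    """Record a checksum for every file under data_dir; commit the manifest with your code."""
    data_dir = Path(data_dir)
    manifest = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest

def changed_files(data_dir, manifest_path):
    """List recorded files whose contents differ from the manifest, or that have gone missing."""
    data_dir = Path(data_dir)
    recorded = json.loads(Path(manifest_path).read_text())
    return [
        name
        for name, digest in recorded.items()
        if not (data_dir / name).is_file() or sha256_of(data_dir / name) != digest
    ]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = Path(d) / "data"
        data.mkdir()
        (data / "raw.csv").write_bytes(b"id,score\n1,4.0\n")
        manifest = Path(d) / "data_manifest.json"  # outside data/ so it is not hashed itself
        write_manifest(data, manifest)
        (data / "raw.csv").write_bytes(b"id,score\n1,4.5\n")  # simulate an untracked edit
        print(changed_files(data, manifest))
```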

23 2. Keep track of things
Documentation
– Document your software environment (e.g., dependencies, libraries, sessionInfo() in R)
– Everything done by hand or not automated from data and code should be precisely documented: README files
– Make raw data read-only: you won’t edit it by accident, and it forces you to document or code any data processing
– Document in code comments
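Two items on this slide can be scripted: recording the software environment and making raw data read-only. The Python sketch below assumes `describe_environment` as a rough stand-in for R's `sessionInfo()`, listing the interpreter, the OS and installed package versions. Its `make_read_only` helper clears the file's write permission bits. Both helper names are hypothetical, and the demo file is a throwaway example.

```python
import os
import platform
import stat
import sys
import tempfile
from importlib import metadata
from pathlib import Path

def describe_environment():
    """Summarize interpreter, OS and installed packages (a rough analogue of R's sessionInfo())."""
    packages = sorted(
        f"{dist.metadata['Name'] or 'unknown'}=={dist.version}"
        for dist in metadata.distributions()
    )
    lines = [
        f"Python {platform.python_version()} ({sys.implementation.name})",
        f"Platform: {platform.platform()}",
        "Packages:",
    ]
    lines += [f"  {p}" for p in packages]
    return "\n".join(lines)

def make_read_only(path):
    """Clear the write bits for user, group and others so the file is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

if __name__ == "__main__":
    # Save this output next to your results, e.g. in the project README.
    print(describe_environment())
    with tempfile.TemporaryDirectory() as d:
        raw = Path(d) / "raw_data.csv"  # hypothetical raw data file
        raw.write_text("id,score\n1,4.0\n")
        make_read_only(raw)
        print(oct(stat.S_IMODE(os.stat(raw).st_mode)))
```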

24 3. Let your computer do the work
Use software that can be coded
– Graphical user interfaces are hard to reproduce
– Telling a computer what to do maximizes reproducibility
– Teaching a computer what to do also tells researchers using your code what to do

25 3. Let your computer do the work
Literate programming
– Links data, code, output, and documentation
– Combines code “chunks” with text and output
– Requires a documentation language + a programming language
– Produces documents in HTML, PDF, and more
– RStudio + R Notebook, Sweave, or knitr
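To show what "combines code chunks with text and output" means mechanically, here is a toy Python "weave" step. It runs code chunks in order in one shared namespace and interleaves the text, the code and the captured output. It only illustrates the idea and is not how knitr or Sweave are implemented. The `weave` function and the chunk format are hypothetical. The `##` output prefix imitates knitr's default output comment.

```python
import contextlib
import io

def weave(chunks):
    """Toy weave: run code chunks in order and interleave text, code and printed output.

    chunks is a list of ("text", str) or ("code", str) pairs. All code chunks share
    one namespace, so later chunks can use variables defined in earlier ones.
    """
    namespace = {}
    out = []
    for kind, body in chunks:
        if kind == "text":
            out.append(body)
        elif kind == "code":
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):  # capture what the chunk prints
                exec(body, namespace)
            out.append("    " + body.replace("\n", "\n    "))  # show the code itself
            result = buffer.getvalue().rstrip("\n")
            if result:
                out.append("## " + result.replace("\n", "\n## "))
        else:
            raise ValueError(f"unknown chunk type: {kind!r}")
    return "\n\n".join(out)

if __name__ == "__main__":
    print(weave([
        ("text", "We compute the mean of the pilot scores."),
        ("code", "scores = [4, 5, 6]\nprint(sum(scores) / len(scores))"),
        ("text", "The number above was produced by the code, not typed by hand."),
    ]))
```

Because the reported number is regenerated from the code every time the document is built, the text cannot drift out of sync with the analysis.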

26 4. Archive + share your materials
– Open Science Framework
– RPubs
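Before uploading to a repository such as the Open Science Framework, it can help to bundle the materials the earlier slides call for into one archive: data, code and documentation. The Python sketch below assumes a hypothetical project layout with `README.txt`, `data/` and `code/`. It fails loudly if any listed item is missing, so an incomplete bundle is not shared by accident.

```python
import tempfile
import zipfile
from pathlib import Path

def archive_project(project_dir, archive_path, include=("README.txt", "data", "code")):
    """Bundle the listed files and folders of a project into one .zip for deposit."""
    project_dir = Path(project_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for entry in include:
            path = project_dir / entry
            if path.is_file():
                zf.write(path, arcname=entry)
            elif path.is_dir():
                for p in sorted(path.rglob("*")):
                    if p.is_file():
                        zf.write(p, arcname=p.relative_to(project_dir).as_posix())
            else:
                raise FileNotFoundError(f"missing from project: {path}")
    return Path(archive_path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        project = Path(d) / "project"  # hypothetical project layout
        (project / "data").mkdir(parents=True)
        (project / "code").mkdir()
        (project / "README.txt").write_text("How to rerun the analysis.\n")
        (project / "data" / "raw.csv").write_text("id,score\n1,4.0\n")
        (project / "code" / "analysis.py").write_text("print('analysis')\n")
        archive = archive_project(project, Path(d) / "project.zip")
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```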

27 How can you make your research reproducible?
1. Plan for reproducibility before you start
– Data management plan: prepare to share
– Informative naming + location: the rules don’t matter; that you have rules matters
– Study plan + pre-analysis plan: pre-register your plan
2. Keep track of things
– Version control: track your changes
– Documentation: everything done by hand
3. Let your computer do the work
– Use software that can be coded: teaching a computer is teaching others
– Literate programming: link data, code, output, and documentation
4. Archive + share your materials
– Where doesn’t matter. That you share matters.

28 How to learn more
– Organizing a project for reproducibility: Reproducible Science Curriculum by Jenny Bryan, https://github.com/reproducible-science-curriculum/
– Data management: Data Management from Software Carpentry by Orion Buske, http://software-carpentry.org/v4/data/mgmt.html
– Literate programming: Literate Statistical Programming by Roger Peng, https://www.youtube.com/watch?v=YcJb1HBc-1Q
– Version control: Version Control by Software Carpentry, http://software-carpentry.org/v4/vc/
– Sharing materials: Open Science Framework by Center for Open Science, https://osf.io/

29 An example of reproducible analyses using R + Open Science Framework
1. Pre-register the analysis plan
2. Read-only dataset
3. Version control of analyses
4. Literate programming using knitr

