The Reproducible Research Advantage


1 The Reproducible Research Advantage
Why + how to make your research more reproducible
Presentation for Research Week, February 22, 2016
April Clyburne-Sherin

2 Objectives
What is reproducibility?
Why practice reproducibility?
What materials are necessary for reproducibility?
How can you make your research reproducible?

3 What is reproducibility?
Replication of findings is the highest standard for evaluating evidence in the scientific method. It focuses on validating the scientific claim, requires transparency of methods, data, and code, and is the minimum standard for any scientific study.
[Diagram: the scientific method cycle: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Replication]
Replicability: how well scientific findings can be replicated. A study is considered replicable when the original findings are produced again when the study is completely repeated.

4 Why practice reproducibility?
Research that produces novel, statistically significant, clean results is more likely to be published (Fanelli, D. Negative results are disappearing from most disciplines and countries. Scientometrics 90, 891–904 (2012); Greenwald, A. G. Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20 (1975)). As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true effect. Such practices include using flexible study designs, using flexible statistical analyses, and running small studies with low statistical power.

5 Why practice reproducibility?
There is evidence that our published literature is too good to be true. Across disciplines, most published studies demonstrate positive results: results that indicate an expected association between variables or a difference between experimental conditions (Fanelli, 2010, 2012; Sterling, 1959). Daniele Fanelli analysed what gets published across scientific disciplines and found that all disciplines had positive result rates of 70% or higher; from physics through psychology, the rates were 85-92%, including 85% in neuroscience (Fanelli, 2010).
Consider psychology's 92% positive result rate against the average power of published studies. Estimates suggest that the average psychology study has a power of somewhere around .5 to .6 to detect its effects. So, if all published results were true, we would expect somewhere between 50-60% of the critical tests to reject the null hypothesis. But we get 92%. That does not compute. Something is askew in the accumulating evidence, and it suggests an alarming degree of mis-estimation. [It is not in my interest to write up negative results, even if they are true, because they are less likely to be published: the file-drawer effect.]
Fanelli D (2010) "Positive" Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068.

6 Why practice reproducibility?
A study report alone is not enough to: assess the sensitivity of findings to assumptions; replicate the findings; distinguish confirmatory from exploratory analyses; identify protocol deviations. You cannot evaluate the study analyses and findings using a study report alone. Even the best-reported study does not contain enough information to completely evaluate the analyses and findings; some describe scientific publications as simply "advertising" the actual research.
[Diagram: a study report contains only the reported results: figures, tables, numerical summaries]

7 Why practice reproducibility?
[Diagram: the research pipeline: Collecting → raw data → Processing → analytic data → Analysing → raw results → Reporting → study report (figures, tables, numerical summaries)]
Following data collection, many materials are created that are not routinely published with the study report, and many steps are taken in processing and analysing that are not routinely published in the study report.

8 Why practice reproducibility?
[Diagram: the same pipeline, from collecting through reporting]
To fully assess the analyses and findings of a study, we need more information from scientists than the study report alone. Why should scientists care? Why should they put in the time to prepare and share additional materials and information?

9 What materials are necessary for reproducibility?
[Diagram: the pipeline annotated with the materials to share at each stage: data + metadata, code, documentation of methods]
For research to be reproducible, the methods, the data, and the code must be shared. Although the raw data can be shared if desired or appropriate, the minimum requirement for reproducibility is:
Documentation of your methods
Processing, analytic, and presentation code
Analytic data

10 Why practice reproducibility?
The idealist: Shoulders of giants! Reproducibility is a minimum scientific standard. It allows others to build on your findings, improves transparency, increases the transfer of knowledge, and increases the utility of your data + methods. Reproducible research practices are good for science!
The pragmatist: "It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself." - Schwab & Claerbout. There is a data sharing citation advantage (Piwowar 2013): by sharing your research materials, you may bump up your citations. Reproducibility also improves your capacity for complex and large datasets or analyses, and increases productivity. Reproducible research practices are good for YOU.
You will NOT REMEMBER what you did. You will NOT REMEMBER why you did it. By documenting what you did clearly, you are communicating with your future self: you can defend what you did, build on it with your next project, and pass your work on to the next lab personnel. Reproducible research practices involve research planning, organization, and automation, which can improve your research capacity and productivity.

11 How can you make your research reproducible?
1. Plan for reproducibility before you start: power; data management plan; informative naming + location; study plan + pre-analysis plan
2. Keep track of things: version control; documentation
3. Contain bias: reporting; confirmatory vs. exploratory analyses
4. Archive + share your materials

12 1. Plan for reproducibility before you start: Power
How? Calculate your power.
Low power means: a low probability of finding true effects; a low probability that a positive is a true positive (low positive predictive value); an exaggerated estimate of the magnitude of an effect when a true effect is discovered; greater vibration of effects.
Estimate the size of the effect you are studying and design your study with sufficient power to detect it. If you need more power, consider collaborating. If your study is underpowered, report this and acknowledge the limitation when interpreting your results. A minimal power calculation is sketched below.
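To make this concrete, here is a minimal sketch of an a priori power calculation in R, assuming the pwr package and an illustrative medium effect size (Cohen's d = 0.5); substitute the effect size you actually expect.

# How many participants per group does a two-sample t-test need
# to detect d = 0.5 with 80% power at the usual 5% alpha?
library(pwr)  # install.packages("pwr") if needed
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")
# The printed n is the required sample size per group (about 64 here).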

13 1. Plan for reproducibility before you start: Data management plan
How? Prepare to share.
Data that is well managed from the start is easier to prepare for sharing, smooths transitions between researchers, and protects you if questions are raised about data validity. Plan how your data will be collected, stored, documented, and managed.
Metadata provides context: how and when the data were collected, and what the experimental conditions were. Document metadata while collecting to save time, and keep it under version control.
Use open data formats rather than proprietary ones: .csv, .txt, .png. A sketch of saving data and metadata this way follows.
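As one illustration, here is a minimal sketch in R of saving data and metadata in open formats; the data frame assay_data and the metadata values are hypothetical.

# Save the data in an open format (.csv) rather than a proprietary one.
write.csv(assay_data, "2016-02-22_assay-data.csv", row.names = FALSE)
# Record metadata in plain text (.txt) alongside the data.
writeLines(c("collected: 2016-02-22",
             "conditions: 37C, 5% CO2",   # hypothetical experimental context
             "instrument: plate reader"),
           "2016-02-22_assay-data_README.txt")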

14 1. Plan for reproducibility before you start: Informative naming + location
Plan your file naming + location system a priori. Names and locations should be distinctive, consistent, and informative, conveying: what it is; why it exists; how it relates to other files.

15 1. Plan for reproducibility before you start: Informative naming + location
The rules don't matter. That you have rules matters.
Make it machine readable: default ordering should work; use meaningful delimiters and tags. Example: use "_" and "-" to store metadata in the name (e.g., YYYY-MM-DD_assay_sample-set_well).
Make it human readable: choose self-explanatory names and locations. A small sketch of building such names follows.
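As a small sketch, hypothetical R code that builds names following this rule; the assay and sample identifiers are made up.

# Store metadata in the file name: "_" separates fields, "-" stays
# within a field (YYYY-MM-DD_assay_sample-set_well).
name_file <- function(assay, sample_set, well) {
  sprintf("%s_%s_%s_%s.csv",
          format(Sys.Date(), "%Y-%m-%d"), assay, sample_set, well)
}
name_file("elisa", "plate-01", "A01")
# e.g. "2016-02-22_elisa_plate-01_A01.csv"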

16 1. Plan for reproducibility before you start: Study plan
How? Pre-register your study plan before you look at your data, for example on the Open Science Framework or ClinicalTrials.gov.
Public registration of all studies counters publication bias, selective reporting, and outcome reporting bias. It distinguishes a priori design decisions from post hoc ones and corroborates the rigor of your findings.
A study plan covers: hypothesis; study design (type of design, sampling, power and sample size, randomization); variables measured; meaningful effect size; variables constructed; data processing.

17 1. Plan for reproducibility before you start: Pre-analysis plan
How? Pre-register your analysis plan before you look at your data. This defines your confirmatory analyses and corroborates the rigor of your findings.
Define: the data analysis set; the statistical analyses (primary, secondary, exploratory); and the handling of missing data, outliers, multiplicity, and subgroups + covariates (Adams-Huet and Ahn, 2009).
Your analysis plan should cover the whole pipeline: from raw data through processing to analytic data, and from analysis of the analytic data to presentation of the raw results. A sketch of scripting that pipeline follows.
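Here is a minimal sketch of that pipeline as an R script; the file paths, variable names, and exclusion rule are hypothetical stand-ins for whatever your pre-analysis plan specifies.

# Processing: derive the analytic data set from the raw data.
raw <- read.csv("data/raw/2016-02-22_assay.csv")
analytic <- subset(raw, !is.na(outcome))   # pre-registered exclusion rule
write.csv(analytic, "data/analytic/assay-analytic.csv", row.names = FALSE)
# Analysing: the pre-registered primary analysis produces the raw results.
fit <- t.test(outcome ~ group, data = analytic)
print(fit)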

18 2. Keep track of things: Version control
Track your changes. Everything created manually, including data and metadata, should be under version control. Version control tracks changes to files, code, and metadata, and lets you revert to old versions. Make incremental changes: commit early, commit often. Tools: Git / GitHub / BitBucket. A sketch of committing from within R follows.
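As a minimal sketch, committing from within R, assuming the gert package; plain command-line Git works just as well, and the file names are illustrative.

library(gert)
git_init(".")                                  # once per project
git_add(c("analysis.R", "data/metadata.txt"))  # stage small, related changes
git_commit("Add primary analysis script and metadata")
git_log(max = 5)                               # review recent commits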

19 2. Keep track of things: Documentation
Document everything done by hand. Everything done by hand or not automated from data and code should be precisely documented, in README files and in code comments.
Make raw data read only: you won't edit it by accident, and it forces you to document or script your data processing.
Track your software environment (e.g., dependencies, libraries, sessionInfo() in R): computer architecture, operating system, software toolchain, supporting software + infrastructure (libraries, R packages, dependencies), external dependencies (data repositories, web sites), and version numbers for everything. A sketch follows.
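A minimal sketch in R of snapshotting the environment and protecting the raw data; the file path is hypothetical.

# Snapshot the software environment (R version, OS, loaded packages).
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")
# Make the raw data read-only so you cannot edit it by accident.
Sys.chmod("data/raw/2016-02-22_assay.csv", mode = "0444")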

20 3. Contain bias: Reporting
How? Report transparently + completely, using reporting guidelines.
Transparently means: readers can use the findings; replication is possible; users are not misled; findings can be pooled in meta-analyses.
Completely means: all results are reported, no matter their direction or statistical significance.
Guidelines: CONSORT (Consolidated Standards of Reporting Trials); SAMPL (Statistical Analyses and Methods in the Published Literature).

21 3. Contain bias: Confirmatory vs. exploratory
How? Distinguish confirmatory from exploratory analyses. Provide access to your pre-registered analysis plan and avoid HARKing (Hypothesizing After the Results are Known). Report all deviations from your study plan, and report which decisions were made after looking at the data. Share your analyses, for example with RPubs.

22 4. Archive + share your materials
Where doesn't matter. That you share matters. The Open Science Framework is one option.
Get credit for your code, your data, and your methods, and increase the impact of your research.

23 How can you make your research reproducible?
1. Plan for reproducibility before you start: Power (calculate your power); Data management plan (prepare to share); Informative naming + location (the rules don't matter, that you have rules matters); Study plan + pre-analysis plan (pre-register your plans)
2. Keep track of things: Version control (track your changes); Documentation (document everything done by hand)
3. Contain bias: Reporting (report transparently + completely); Confirmatory vs. exploratory analyses (distinguish confirmatory from exploratory)
4. Archive + share your materials: Where doesn't matter. That you share matters.

24 How to learn more
Organizing a project for reproducibility: Reproducible Science Curriculum by Jenny Bryan
Data management: Data Management from Software Carpentry by Orion Buske
Literate programming: Literate Statistical Programming by Roger Peng
Version control: Version Control by Software Carpentry
Sharing materials: Open Science Framework by the Center for Open Science

