1
The Reproducible Research Advantage
Why + how to make your research more reproducible
Presentation for Research Week, February 22, 2016
April Clyburne-Sherin
2
Objectives:
What is reproducibility?
Why practice reproducibility?
What materials are necessary for reproducibility?
How can you make your research reproducible?
3
What is reproducibility?
Scientific method
Replication of findings is the highest standard for evaluating evidence
Focuses on validating the scientific claim
Requires transparency of methods, data, and code
Minimum standard for any scientific study
[Diagram: Observation → Question → Hypothesis → Prediction → Testing → Analysis → Replication]
Replicability: how well scientific findings can be replicated. A study is considered replicable when the original findings are produced again when the study is completely repeated.
4
Why practice reproducibility?
Research that produces novel, statistically significant, clean results is more likely to be published (Fanelli, D. Negative results are disappearing from most disciplines and countries. Scientometrics 90, 891–904 (2012); Greenwald, A. G. Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20 (1975)). As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true effect. Such practices include using flexible study designs and flexible statistical analyses, and running small studies with low statistical power.
5
Why practice reproducibility?
There is evidence that our published literature is too good to be true: false positives heavily contaminate the literature. Daniele Fanelli analysed what gets published across scientific disciplines and found that all disciplines had positive result rates of 70% or higher; from physics through psychology, the rates were 85–92%.

Consider our field's 92% positive result rate in comparison to the average power of published studies. Estimates suggest that the average psychology study has a power of somewhere around .5 to .6 to detect its effects. So, if all published results were true, we would expect somewhere between 50% and 60% of the critical tests to reject the null hypothesis. But we get 92%. That does not compute. Something is askew in the accumulating evidence. [It is not in a researcher's interest to write up negative results, even if they are true, because negative results are less likely to be published – the file-drawer effect.]

The accumulating evidence suggests an alarming degree of mis-estimation. Across disciplines, most published studies demonstrate positive results – results that indicate an expected association between variables or a difference between experimental conditions (Fanelli, 2010, 2012; Sterling, 1959). Fanelli observed a positive result rate of 85% in neuroscience (Fanelli, D. (2010) "Positive" Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068).
6
Why practice reproducibility?
A study report alone is not enough to:
Assess the sensitivity of findings to assumptions
Replicate the findings
Distinguish confirmatory from exploratory analyses
Identify protocol deviations
You cannot evaluate the study analyses and findings using a study report alone. Even the best-reported study does not contain enough information to completely evaluate its analyses and findings. Some describe scientific publications as simply "advertising" the actual research.
[Diagram: Study report → Reported results: figures, tables, numerical summaries]
7
Why practice reproducibility?
[Diagram: Collecting → Processing → Analysing → Reporting; Raw data → Analytic data → Raw results → Reported results (figures, tables, numerical summaries)]
Following data collection, there are:
Many materials created that are not routinely published with the study report
Many steps taken in processing and analysing that are not routinely published in the study report
8
Why practice reproducibility?
[Diagram: Collecting → Processing → Analysing → Reporting; Raw data → Analytic data → Raw results → Reported results (figures, tables, numerical summaries)]
To fully assess the analyses and findings of a study, we need more information from scientists than the study report alone. Why should scientists care? Why should scientists put in the time to prepare and share additional materials and information?
9
What materials are necessary for reproducibility?
[Diagram: Collecting → Processing → Analysing → Reporting; Raw data → Analytic data → Raw results → Reported results (figures, tables, numerical summaries)]
For research to be reproducible, the methods, the data, and the code must be shared. Although the raw data can be shared if desired or appropriate, the minimum requirement for reproducibility is:
Documentation of your methods
Processing, analytic, and presentation code
Analytic data + metadata
10
Why practice reproducibility?
The idealist: Shoulders of giants!
Minimum scientific standard
Allows others to build on your findings
Improved transparency
Increased transfer of knowledge
Increased utility of your data + methods
Data sharing citation advantage (Piwowar 2013)
Reproducible research practices are good for science! By sharing your research materials, you may bump up your citations.

The pragmatist: "It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself." – Schwab & Claerbout
Improves capacity for complex and large datasets or analyses
Increased productivity
Reproducible research practices are good for YOU. You will NOT REMEMBER what you did. You will NOT REMEMBER why you did it. By documenting what you did clearly, you are communicating with your future self:
Defend what you did
Build on it with your next project
Pass your work on to the next lab personnel
Reproducible research practices involve research planning, organization, and automation. This can improve your research capacity and productivity.
11
How can you make your research reproducible?
1. Plan for reproducibility before you start: Power; Data management plan; Informative naming + location; Study plan + pre-analysis plan
2. Keep track of things: Version control; Documentation; Literate programming
3. Contain bias: Reporting; Confirmatory vs. exploratory analyses
4. Archive + share your materials
12
1. Plan for reproducibility before you start
Power
Calculate your power. Low power means:
Low probability of finding true effects
Low probability that a positive is a true positive (positive predictive value)
Exaggerated estimate of the magnitude of an effect when a true effect is discovered
Greater vibration of effects
Low-powered studies produce more false negatives than high-powered studies: if there are 100 true effects in a field, 20% power means only 20 of them will be discovered.
13
1. Plan for reproducibility before you start
Power
Low power means a low positive predictive value (PPV): the lower the power of a study, the lower the probability that an observed effect that passes the required threshold for claiming its discovery (that is, reaching nominal statistical significance, such as p < 0.05) reflects a true effect.
PPV = ([1 − β] × R) / ([1 − β] × R + α)
where (1 − β) is the power, β is the type II error, α is the type I error, and R is the pre-study odds. The PPV is the probability that a 'positive' research finding reflects a true effect (that is, the finding is a true positive). This probability depends on the prior probability of the finding being true (before doing the study), the statistical power of the study, and the level of statistical significance.
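A minimal sketch of this calculation in R; the power, alpha, and pre-study odds values below are illustrative, not from the slides:

ppv <- function(power, alpha, R) {
  # PPV = ((1 - beta) * R) / ((1 - beta) * R + alpha),
  # where power = 1 - beta and R is the pre-study odds
  (power * R) / (power * R + alpha)
}

# An adequately powered vs. a low-powered study, both testing
# hypotheses with 1:4 pre-study odds (R = 0.25)
ppv(power = 0.80, alpha = 0.05, R = 0.25)  # 0.80
ppv(power = 0.20, alpha = 0.05, R = 0.25)  # 0.50

Halving and then halving again the power drops the chance that a 'significant' finding is real from 80% to a coin flip, even with nothing else going wrong.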
14
1. Plan for reproducibility before you start
Power: The Winner's Curse
The winner's curse refers to the phenomenon that studies that find evidence of an effect often provide inflated estimates of the size of that effect. Such inflation is expected when an effect has to pass a certain threshold – such as reaching statistical significance – in order for it to have been 'discovered'. Effect inflation is worst for small, low-powered studies, which can only detect effects that happen to be large. If, for example, the true effect is medium-sized, only those small studies that, by chance, estimate the effect to be large will pass the threshold for discovery (that is, the threshold for statistical significance, which is typically set at p < 0.05). In practice, this means that research findings of small studies are biased in favour of inflated effects. By contrast, large, high-powered studies can readily detect both small and large effects and so are less biased, as both over- and underestimations of the true effect size will pass the threshold for 'discovery'.
The median statistical power of studies in the neuroscience field is optimistically estimated to be between ~8% and ~31%. Simulations of the winner's curse (expressed as relative bias of research findings) suggest that initial effect estimates from studies powered between ~8% and ~31% are likely to be inflated by 25% to 50%. Inflated effect estimates make it difficult to determine an adequate sample size for replication studies, increasing the probability of type II errors.
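A small simulation of the winner's curse, sketched in R; the per-group sample size and true effect below are illustrative assumptions, not figures from the slides:

set.seed(1)
true_effect <- 0.5   # assumed true standardized effect
n <- 15              # per-group sample size, giving low power (~25%)

sims <- replicate(10000, {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = true_effect)
  c(p = t.test(y, x)$p.value, est = mean(y) - mean(x))
})

sig <- sims["p", ] < 0.05
mean(sig)              # empirical power of each study
mean(sims["est", sig]) # mean 'discovered' effect: inflated well above 0.5

Only the runs that happened to draw a large sample difference cross p < 0.05, so conditioning on significance inflates the average published estimate.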
15
1. Plan for reproducibility before you start
Power
Greater vibration of effects: a low-powered study is more likely to obtain different estimates of the magnitude of the effect depending on the analytical options it implements. A manipulation affecting only three observations could change the odds ratio from 1.00 to 1.50 in a small study, but might change it only from 1.00 to 1.01 in a large study.
16
1. Plan for reproducibility before you start
Power: How?
Estimate the size of the effect you are studying
Design your study with sufficient power to detect that effect
If you need more power, consider collaborating
If your study is underpowered, report this and acknowledge the limitation when interpreting your results
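A quick sketch using base R's power.t.test; the effect size and power target are illustrative assumptions:

# Per-group sample size needed to detect an assumed standardized
# effect of 0.5 with 80% power at alpha = 0.05
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
# n is about 64 per group

# Power actually achieved with only 15 per group
power.t.test(n = 15, delta = 0.5, sd = 1, sig.level = 0.05)$power
# about 0.26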
17
1. Plan for reproducibility before you start
Data management plan: How? Prepare to share.
Data that is well-managed from the start is easier to prepare for sharing
Smooths transitions between researchers
Protects you if questions are raised about data validity
Metadata provides context: how and when the data were collected, and what the experimental conditions were
Document metadata while collecting to save time
Use open data formats rather than proprietary ones: .csv, .txt, .png
Data should be collected, stored, documented, and managed; metadata should be documented and under version control
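A minimal sketch in R of saving data in an open format with simple metadata alongside it; the file names, variables, and metadata fields are hypothetical:

# Toy analytic dataset standing in for your real data
analytic_data <- data.frame(sample_id = 1:3, response = c(2.1, 1.8, 2.4))

# Save in an open format (.csv) rather than a proprietary one
write.csv(analytic_data, "2016-02-22_assay_analytic-data.csv", row.names = FALSE)

# Record metadata next to the data while it is still fresh
writeLines(c(
  "collected: 2016-02-20",
  "conditions: room temperature, assay protocol v2",
  "units: response in mg/mL"
), "2016-02-22_assay_analytic-data_README.txt")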
18
1. Plan for reproducibility before you start
Informative name + location
Plan your file naming + location system a priori. Names and locations should be distinctive, consistent, and informative, conveying:
What it is
Why it exists
How it relates to other files
19
1. Plan for reproducibility before you start
Informative name + location
The rules don't matter. That you have rules matters.
Make it machine readable: default ordering; use of meaningful delimiters and tags. Example: use "_" and "-" to store metadata in the name (e.g., YYYY-MM-DD_assay_sample-set_well), as in the sketch below.
Make it human readable: choose self-explanatory names and locations.
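A small sketch in R of building machine-readable names under that convention; the assay, sample set, and well names are hypothetical:

# Store metadata in the file name: date _ assay _ sample-set _ well
name_file <- function(date, assay, sample_set, well) {
  paste(format(date, "%Y-%m-%d"), assay, sample_set, well, sep = "_")
}

name_file(as.Date("2016-02-22"), "elisa", "plate-01", "A01")
# "2016-02-22_elisa_plate-01_A01"

# ISO dates make the default (alphabetical) ordering chronological
sort(c("2016-02-22_elisa_plate-01_A01", "2015-11-03_elisa_plate-01_A01"))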
20
1. Plan for reproducibility before you start
Study plan: How?
Pre-register your study plan before you look at your data.
Public registration of all studies counters publication bias
Counters selective reporting and outcome reporting bias
Distinguishes a priori design decisions from post hoc ones
Corroborates the rigor of your findings
A study plan covers: hypothesis; study design (type of design, sampling, power and sample size, randomization); variables measured; meaningful effect size; variables constructed; data processing
Registries: Open Science Framework, ClinicalTrials.gov
21
1. Plan for reproducibility before you start
Pre-analysis plan: How?
Pre-register your analysis plan before you look at your data.
Defines your confirmatory analyses
Corroborates the rigor of your findings
A pre-analysis plan defines the data analysis set and the statistical analyses: primary, secondary, and exploratory analyses, plus handling of missing data, outliers, multiplicity, and subgroups + covariates (Adams-Huet and Ahn, 2009)
Your analysis plan should include the planning all the way from raw to processed data, and from analysis to presentation.
22
1. Plan for reproducibility before you start
Pre-registration
Pre-register your study + analysis plan with Registered Reports. Your analysis plan should include the planning all the way from raw to processed data, and from analysis to presentation.
23
2. Keep track of things: Version control
Track your changes. Everything created manually should use version control, and metadata should be version controlled too.
Tracks changes to files, code, and metadata
Allows you to revert to old versions
Make incremental changes: commit early, commit often
Tools: Git / GitHub / Bitbucket
24
2. Keep track of things: Documentation – Document everything done by hand
Everything done by hand or not automated from data and code should be precisely documented:
Use README files
Make raw data read-only: you won't edit it by accident, and it forces you to document or code your data processing
Document in code comments
Document your software environment (e.g., dependencies, libraries, sessionInfo() in R): computer architecture, operating system, software toolchain, supporting software + infrastructure (libraries, R packages, dependencies), external dependencies (data repositories, web sites), and version numbers for everything
A sketch of two of these habits follows.
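A minimal sketch in R; the file names are hypothetical:

# Make the raw data read-only so it cannot be edited by accident
Sys.chmod("raw-data_2016-02-22.csv", mode = "0444")

# Record the software environment alongside the analysis
writeLines(capture.output(sessionInfo()), "session-info.txt")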
25
2. Let your computer do the work
Use software that can be coded. Graphical user interfaces are hard to reproduce; telling a computer what to do maximizes reproducibility. Teaching a computer what to do also tells any researcher using your code exactly what was done, as in the sketch below.
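A minimal sketch in R of a scripted, rather than point-and-click, processing step; the file names and the cleaning rule are hypothetical:

# process-data.R: every step from raw to analytic data is recorded in code
raw <- read.csv("raw-data_2016-02-22.csv")

# A cleaning rule written down in code instead of applied by hand in a GUI:
# drop records with missing responses, then convert units
analytic <- raw[!is.na(raw$response), ]
analytic$response_mg_ml <- analytic$response / 1000

write.csv(analytic, "analytic-data_2016-02-22.csv", row.names = FALSE)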
26
2. Let your computer do the work
Literate programming
Links data, code, output, and documentation
Combines code "chunks" with text and output
Requires a documentation language + a programming language
Produces documents in HTML, PDF, and more
Tools: RStudio + R Notebook, Sweave, or knitr; share with RPubs
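A minimal R Markdown/knitr sketch; the title, file name, and analysis inside the chunk are placeholders:

---
title: "Assay analysis"
output: html_document
---

The mean response across samples was:

```{r mean-response}
analytic <- read.csv("analytic-data_2016-02-22.csv")
mean(analytic$response_mg_ml)
```

Knitting this document runs the chunk and weaves its output into the rendered HTML, so the text, code, and results cannot drift apart.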
27
3. Contain bias: Reporting – How?
Report transparently + completely.
Transparently means:
Readers can use the findings
Replication is possible
Users are not misled
Findings can be pooled in meta-analyses
Completely means:
All results are reported, no matter their direction or statistical significance
Use reporting guidelines: CONSORT (Consolidated Standards of Reporting Trials) and SAMPL (Statistical Analyses and Methods in the Published Literature)
28
3. Contain bias: Confirmatory vs. exploratory – How?
Distinguish confirmatory from exploratory analyses.
Provide access to your pre-registered analysis plan
Avoid HARKing: Hypothesizing After the Results are Known
Report all deviations from your study plan
Report which decisions were made after looking at the data
29
4. Archive + share your materials
Where doesn't matter. That you share matters.
Get credit for your code, your data, and your methods
Increase the impact of your research
Example venue: Open Science Framework
30
How can you make your research reproducible?
1. Plan for reproducibility before you start: Power – Calculate your power; Data management plan – Prepare to share; Informative naming + location – The rules don't matter, that you have rules matters; Study plan + pre-analysis plan – Pre-register your plans
2. Keep track of things: Version control – Track your changes; Documentation – Document everything done by hand; Literate programming – Let your computer do the work
3. Contain bias: Reporting – Report transparently + completely; Confirmatory vs. exploratory analyses – Distinguish confirmatory from exploratory
4. Archive + share your materials – Where doesn't matter. That you share matters.
31
How to learn more
Organizing a project for reproducibility: Reproducible Science Curriculum by Jenny Bryan
Data management: Data Management from Software Carpentry by Orion Buske
Literate programming: Literate Statistical Programming by Roger Peng
Version control: Version Control by Software Carpentry
Sharing materials: Open Science Framework by Center for Open Science