Slide 1
April Clyburne-Sherin (@april_cs)
Center for Open Science (@OSFramework) http://cos.io/
Fostering openness, integrity, and reproducibility of scientific research
Slide 2
Technology to enable change
Training to enact change
Incentives to embrace change
Slide 4
Reproducible statistics in the health sciences
April Clyburne-Sherin, Reproducible Research Evangelist
april@cos.io
Slide 5
Reproducible statistics in the health sciences
● The problem with the published literature
o Reproducibility
o Power
o Reporting bias
o Researcher degrees of freedom
● The solution
o Preregistration
● How to evaluate the published literature
o p-values
o Effect sizes and confidence intervals
● How to preregister
o Open Science Framework
Slide 6
Reproducible statistics in the health sciences: Learning objectives
● The findings of many studies cannot be reproduced
● Low powered studies produce inflated effect sizes
● Low powered studies produce a low chance of finding true positives
● Researcher degrees of freedom lead to inflated false positive rates
● Selective reporting biases the literature
● Preregistration is a simple solution for reproducible statistics
● A p-value is not enough to establish clinical significance
● Effect sizes plus confidence intervals work better together
Slide 8
Button, K. S., et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Slide 9
Figure 1. Positive results by discipline. From Fanelli D (2010). “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068. doi:10.1371/journal.pone.0010068
Slide 10
The findings of many studies cannot be reproduced. Why should you care?
● To increase the efficiency of your own work: it is hard to build on our own work, or on the work of others in our lab, when it does not reproduce
● We may not have the knowledge we think we have, and when reproducibility is low it is hard to even check this
Slide 11
Current barriers to reproducibility
● Statistical
o Low power
o Researcher degrees of freedom
o Ignoring null results
● Transparency
o Poor documentation
o Loss of materials and data
o Infrequent sharing
Slide 12
Low powered studies mean a low chance of finding a true positive
● Low reproducibility due to low power: at 40% power, for example, there is only a 0.4 × 0.4 = 16% chance of finding a true effect twice in a row (see the simulation below)
● Inflated effect size estimates
● Decreased likelihood that a positive finding is a true positive
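A minimal simulation sketch of that arithmetic (my code, not the deck's; the sample size and effect size are invented to give roughly 40% power for a two-sample t-test):

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group=26, effect_size=0.5, alpha=0.05, n_sims=10_000):
    """Estimate two-sample t-test power by simulating many identical studies."""
    rng = np.random.default_rng(42)
    treat = rng.normal(effect_size, 1.0, size=(n_sims, n_per_group))
    ctrl = rng.normal(0.0, 1.0, size=(n_sims, n_per_group))
    pvals = stats.ttest_ind(treat, ctrl, axis=1).pvalue
    return np.mean(pvals < alpha)

power = simulated_power()
print(f"single-study power ≈ {power:.2f}")                     # ≈ 0.4 here
print(f"chance of finding the effect twice ≈ {power**2:.2f}")  # ≈ 0.16
```

Because the two replications are independent, the probability that both succeed is simply the single-study power squared, which is the slide's 16%.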
Slide 13
Researcher degrees of freedom lead to inflated false positive rates
Simmons, Nelson, & Simonsohn (2012)
Slide 14
Selective reporting biases the literature
● 62% of trials had at least one primary outcome changed, introduced, or omitted [1]
● More than 50% of pre-specified outcomes were not reported [2]
1. Chan, An-Wen, et al. “Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles.” JAMA 291.20 (2004): 2457–2465.
2. Macleod, Malcolm R., et al. “Biomedical research: increasing value, reducing waste.” The Lancet 383.9912 (2014): 101–104.
Slide 15
Why does selective reporting matter?
Response from a trialist who had analysed data on a prespecified outcome but not reported it:
“When we looked at that data, it actually showed an increase in harm amongst those who got the active treatment, and we ditched it because we weren’t expecting it and we were concerned that the presentation of these data would have an impact on people’s understanding of the study findings. … The argument was, look, this intervention appears to help people, but if the paper says it may increase harm, that will, it will, be understood differently by, you know, service providers. So we buried it.”
Smyth, R. M. D., et al. “Frequency and reasons for outcome reporting bias in clinical trials: interviews with trialists.” BMJ 342 (2011): c7153.
Slide 16
Solution: Preregistration
Before data are collected, specify:
● The what of the study
o Research question
o Population
o Primary outcome
o General design
● A pre-analysis plan: the exact analyses that will be conducted
o Sample size
o Data processing and cleaning procedures
o Exclusion criteria
o Statistical analyses
● The plan is registered in a read-only, time-stamped format
A hypothetical pre-analysis plan is sketched below.
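To make the ingredients concrete, here is an entirely hypothetical pre-analysis plan written as a plain data structure; real preregistrations (e.g., on the OSF) are free-form documents, and every value below is invented for illustration:

```python
# A hypothetical pre-analysis plan; all values are invented for illustration.
preanalysis_plan = {
    "research_question": "Does drug X lower systolic blood pressure vs. placebo?",
    "population": "Adults aged 40-65 with stage 1 hypertension",
    "primary_outcome": "Change in systolic blood pressure at 12 weeks",
    "design": "Two-arm parallel randomized controlled trial, 1:1 allocation",
    "sample_size": 120,  # fixed in advance, e.g. from an a priori power analysis
    "data_cleaning": "Exclude readings flagged as device errors, before unblinding",
    "exclusion_criteria": ["missing baseline measurement", "dropout before week 4"],
    "statistical_analysis": "Two-sample t-test on change scores, two-sided alpha = 0.05",
}
```

The point of writing all of this down first is that none of it can quietly change after the results are known.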
Slide 17
The positive result rate dropped from 57% to 8% after preregistration was required (Kaplan & Irvin, 2015).
Slide 18
Pre-registration in the health sciences
Slide 19
Evaluating the literature: a p-value is not enough to establish clinical significance
● A p-value is missing clinical insight such as the treatment effect size, the magnitude of change, or the direction of the outcome
● Clinically significant differences can be statistically insignificant
● Clinically unimportant differences can be statistically significant
Slide 20
P-values: What is a p-value?
● The probability of getting data at least as extreme as yours if there is no treatment effect
● Testing at α = 0.05 means there is a 95% probability that the researcher will correctly conclude that there is no treatment effect when there really is no treatment effect (checked in the simulation below)
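That second bullet can be verified directly. A minimal sketch (my code, not the deck's; it assumes a two-sample t-test with 30 subjects per group): simulate many studies in which the null is exactly true and count how often the test "finds" an effect anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 10_000, 30
treat = rng.normal(0.0, 1.0, size=(n_sims, n))  # no treatment effect at all
ctrl = rng.normal(0.0, 1.0, size=(n_sims, n))
pvals = stats.ttest_ind(treat, ctrl, axis=1).pvalue
# About 5% of null studies reach p < .05, so about 95% are correctly negative.
print(f"false positive rate ≈ {np.mean(pvals < 0.05):.3f}")
```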
Slide 21
P-values: What is a p-value?
● Generally leads to dichotomous thinking: either something is significant or it is not
● Influenced by the number and variability of subjects
● Changes from one sample to the next
Slide 22
The dance of the p-values
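"The dance of the p-values" is Geoff Cumming's name for how much p-values jump around across exact replications. A small sketch of the same idea (my code, assuming a true effect of d = 0.5 and 32 subjects per group):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
for i in range(10):  # ten exact replications of the same true effect
    treat = rng.normal(0.5, 1.0, 32)
    ctrl = rng.normal(0.0, 1.0, 32)
    p = stats.ttest_ind(treat, ctrl).pvalue
    print(f"replication {i + 1:2d}: p = {p:.3f}")
```

Some replications land well below .05 and others far above it, even though the underlying effect never changes.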
Slide 23
P-values: a p-value is not enough to establish clinical significance
● P-values should be considered along with:
o Effect size
o Confidence intervals
o Power
o Study design
Slide 24
Effect Size
● A measure of the magnitude of interest; tells us ‘how much’
● Generally leads to thinking about estimation, rather than a dichotomous decision about significance
● Often combined with confidence intervals (CIs) to give us a sense of how much uncertainty there is around our estimate
(A sketch of one common effect size, Cohen’s d, follows below.)
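For concreteness, a minimal sketch of one common effect size, Cohen's d (my helper function and made-up numbers, not the deck's): the difference between two group means expressed in units of their pooled standard deviation.

```python
import numpy as np

def cohens_d(treat, ctrl):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    nt, nc = len(treat), len(ctrl)
    pooled_var = ((nt - 1) * np.var(treat, ddof=1) +
                  (nc - 1) * np.var(ctrl, ddof=1)) / (nt + nc - 2)
    return (np.mean(treat) - np.mean(ctrl)) / np.sqrt(pooled_var)

# Conventionally d ≈ 0.2 is "small", 0.5 "medium", 0.8 "large".
print(f"d = {cohens_d([5.1, 6.2, 5.8, 6.5], [4.9, 5.0, 5.6, 4.4]):.2f}")
```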
Slide 25
Confidence Intervals
● Provide a ‘plausible’ range for the effect size in the population
o In 95% of the samples you draw from a population, the interval will contain the true population effect (checked in the simulation below)
o This is not the same thing as saying that 95% of sample effect sizes will fall within the interval
● Can also be used for NHST
o If 0 falls outside the 95% CI, then your test will be statistically significant at the .05 level
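The coverage claim is easy to check by simulation. A sketch (my code; it assumes equal group sizes and a pooled-degrees-of-freedom t interval for the mean difference):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true_diff, n, n_sims, covered = 0.5, 40, 2_000, 0
for _ in range(n_sims):
    treat = rng.normal(true_diff, 1.0, n)
    ctrl = rng.normal(0.0, 1.0, n)
    diff = treat.mean() - ctrl.mean()
    se = np.sqrt(treat.var(ddof=1) / n + ctrl.var(ddof=1) / n)
    t_crit = stats.t.ppf(0.975, df=2 * n - 2)
    if diff - t_crit * se <= true_diff <= diff + t_crit * se:
        covered += 1
print(f"CIs containing the true effect: {covered / n_sims:.1%}")  # ≈ 95%
```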
Slide 26
Better together
● Why should you always report both effect sizes and CIs?
o Effect sizes, like p-values, are bouncy
o A point estimate alone can convey an invalid sense of certainty about your effect size
● CIs give you additional information about the plausible upper and lower bounds of bouncing effect sizes
Slide 27
Better together
Slide 28
So why use ESs + CIs?
● They give you more fine-grained information about your data: point estimates, plausible values, and uncertainty
● They give more information for replication attempts
● They are used for meta-analytic calculations, so they are more helpful for accumulating knowledge across studies
Slide 29
Low powered studies still produce inflated effect sizes
● If I use ESs and CIs rather than p-values, do I still have to worry about sample size? Yes:
o Underpowered studies tend to over-estimate the ES (simulated below)
o Larger samples will lead to better estimation of the ES and smaller CIs, i.e. higher levels of precision
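A simulation sketch of this inflation, sometimes called the winner's curse (my code; the true effect is fixed at d = 0.3 with only 20 subjects per group): averaged over all studies the estimate is roughly unbiased, but averaged over only the studies that happened to reach p < .05, it is badly inflated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
true_d, n, n_sims = 0.3, 20, 5_000
ds, ps = np.empty(n_sims), np.empty(n_sims)
for i in range(n_sims):
    treat = rng.normal(true_d, 1.0, n)
    ctrl = rng.normal(0.0, 1.0, n)
    pooled_sd = np.sqrt((treat.var(ddof=1) + ctrl.var(ddof=1)) / 2)
    ds[i] = (treat.mean() - ctrl.mean()) / pooled_sd
    ps[i] = stats.ttest_ind(treat, ctrl).pvalue
print(f"true d = {true_d}")
print(f"mean estimated d, all studies:   {ds.mean():.2f}")
print(f"mean estimated d, p < .05 only:  {ds[ps < 0.05].mean():.2f}")  # inflated
```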
Slide 30
Precision isn’t cheap
● To get high precision (narrow CIs) in any one study, you need large samples
o Example: you need about 250 people to get an accurate, stable estimate of the ES in psychology (illustrated below)
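A rough illustration of why the number is that large (my sketch; it uses the standard large-sample approximation for the standard error of Cohen's d, so the exact widths are indicative only): the 95% CI half-width shrinks only with the square root of the sample size.

```python
import numpy as np

def ci_halfwidth_d(n_per_group, d=0.4):
    """Approximate 95% CI half-width for Cohen's d with equal group sizes."""
    se = np.sqrt(2 / n_per_group + d**2 / (4 * n_per_group))
    return 1.96 * se

for n in (20, 50, 100, 250):
    print(f"n per group = {n:>3}: d = 0.40 ± {ci_halfwidth_d(n):.2f}")
```

Around 250 per group the interval finally becomes narrow enough (roughly ±0.18 here) to pin the effect size down usefully.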
Slide 31
Precision isn’t cheap
Slide 32
Free training on how to make research more reproducible http://cos.io/stats_consulting
Slide 33
Find this presentation at https://osf.io/rwtyf/
Questions: contact@cos.io