Fostering openness, integrity, and reproducibility of scientific research
April Clyburne-Sherin, Center for Open Science

● Technology to enable change
● Training to enact change
● Incentives to embrace change

Reproducible statistics in the health sciences
April Clyburne-Sherin, Reproducible Research Evangelist

Reproducible statistics in the health sciences
● The problem with the published literature
  o Reproducibility
  o Power
  o Reporting bias
  o Researcher degrees of freedom
● The solution
  o Preregistration
● How to evaluate the published literature
  o p-values
  o Effect sizes and confidence intervals
● How to preregister
  o Open Science Framework

Reproducible statistics in the health sciences: Learning objectives
● The findings of many studies cannot be reproduced
● Low-powered studies produce inflated effect sizes
● Low-powered studies produce a low chance of finding true positives
● Researcher degrees of freedom lead to false-positive inflation
● Selective reporting biases the literature
● Preregistration is a simple solution for reproducible statistics
● A p-value is not enough to establish clinical significance
● Effect sizes plus confidence intervals work better together

Button et al. (2013) Power in Neuroscience

Figure 1. Positive results by discipline. Fanelli D (2010) "Positive" Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068. doi:10.1371/journal.pone.0010068

The findings of many studies cannot be reproduced
Why should you care?
● To increase the efficiency of your own work
  o It is hard to build on our own work, or the work of others in our lab, if it cannot be reproduced
● We may not have the knowledge we think we have
  o It is hard to even check this if reproducibility is low

Current barriers to reproducibility
● Statistical
  o Low power
  o Researcher degrees of freedom
  o Ignoring null results
● Transparency
  o Poor documentation
  o Loss of materials and data
  o Infrequent sharing

Low-powered studies mean a low chance of finding a true positive
● Low reproducibility due to low power
  o e.g., with roughly 40% power, only a 16% chance of finding the effect twice
● Inflated effect size estimates
● Decreased likelihood that positive findings are true positives
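A minimal simulation sketch of these two points, under illustrative assumptions not taken from the slides (a two-sample t-test, a true standardized effect of d = 0.5, and 25 subjects per group, which gives roughly 40% power): the chance that two independent studies both reach p < .05 is roughly power squared, and the studies that do reach significance over-estimate the effect.

```python
# Sketch: low power means few successful replications and inflated effects.
# Illustrative assumptions (not from the slides): true effect d = 0.5,
# n = 25 per group, two-sample t-test at alpha = .05 (~40% power).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d_true, n, n_sims = 0.5, 25, 10_000

p_values, observed_d = [], []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(d_true, 1.0, n)
    _, p = stats.ttest_ind(treated, control)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    p_values.append(p)
    observed_d.append((treated.mean() - control.mean()) / pooled_sd)

p_values, observed_d = np.array(p_values), np.array(observed_d)
power = (p_values < 0.05).mean()
print(f"Estimated power: {power:.2f}")                          # roughly 0.4
print(f"Chance two studies both hit p < .05: {power**2:.2f}")   # roughly 0.16
print(f"Mean d among significant studies: {observed_d[p_values < 0.05].mean():.2f}")
# The last number is noticeably larger than the true 0.5: filtering on
# significance selects over-estimates when power is low.
```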

Researcher degrees of freedom lead to false-positive inflation
Simmons, Nelson, & Simonsohn (2012)
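The kind of inflation Simmons, Nelson, and Simonsohn describe can be illustrated with a rough simulation. The sketch below assumes a true null effect and just two common flexibilities, testing two correlated outcomes and adding subjects after a first look at the data; the specific numbers (n = 20 plus 10 extra, r = .5) are illustrative assumptions, not values from their paper.

```python
# Sketch: two common "researcher degrees of freedom" under a true null effect:
# (1) testing two correlated outcomes and counting a hit if either is significant,
# (2) peeking at n = 20 per group and adding 10 more subjects if p is not yet < .05.
# All numbers here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n1, n_extra, r = 10_000, 20, 10, 0.5
cov = [[1, r], [r, 1]]  # two outcomes correlated at r = .5

false_positives = 0
for _ in range(n_sims):
    a = rng.multivariate_normal([0, 0], cov, n1 + n_extra)
    b = rng.multivariate_normal([0, 0], cov, n1 + n_extra)
    significant = False
    for n in (n1, n1 + n_extra):          # optional stopping: test, then add subjects
        for outcome in (0, 1):            # flexible choice of dependent variable
            _, p = stats.ttest_ind(a[:n, outcome], b[:n, outcome])
            if p < 0.05:
                significant = True
    false_positives += significant

print(f"Nominal alpha: 0.05; actual false-positive rate: {false_positives / n_sims:.2f}")
# Typically well above the nominal 0.05, even with only these two flexibilities.
```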

Selective reporting biases the literature
Selective reporting → outcome reporting bias
● 62% of trials had at least one primary outcome changed, introduced, or omitted [1]
● More than 50% of pre-specified outcomes were not reported [2]
1. Chan A-W, et al. "Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles." JAMA (2004).
2. Macleod MR, et al. "Biomedical research: increasing value, reducing waste." The Lancet (2014).

Why does selective reporting matter?
Selective reporting → outcome reporting bias
Response from a trialist who had analysed data on a prespecified outcome but not reported them:
"When we looked at that data, it actually showed an increase in harm amongst those who got the active treatment, and we ditched it because we weren't expecting it and we were concerned that the presentation of these data would have an impact on people's understanding of the study findings. … The argument was, look, this intervention appears to help people, but if the paper says it may increase harm, that will, it will, be understood differently by, you know, service providers. So we buried it."
Smyth RMD, et al. "Frequency and reasons for outcome reporting bias in clinical trials: interviews with trialists." BMJ 342 (2011): c7153.

Solution: Pre-registration
Before data are collected, specify:
● The what of the study
  o Research question
  o Population
  o Primary outcome
  o General design
● Pre-analysis plan: information on the exact analysis that will be conducted
  o Sample size
  o Data processing and cleaning procedures
  o Exclusion criteria
  o Statistical analyses
● Registered in a read-only, time-stamped format

Positive result rate dropped from 57% to 8% after preregistration was required.

Pre-registration in the health sciences

Evaluating the literature
A p-value is not enough to establish clinical significance
● A p-value alone is missing clinical insight such as the treatment effect size, the magnitude of change, or the direction of the outcome
● Clinically significant differences can be statistically non-significant
● Clinically unimportant differences can be statistically significant
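A small sketch of the last two bullets, using hypothetical blood-pressure data (all numbers are made up for illustration): a clinically trivial difference tends to reach p < .05 when the sample is huge, while a clinically meaningful difference often misses it when the sample is small.

```python
# Sketch: statistical significance is not clinical significance.
# Hypothetical blood-pressure reductions (mmHg); all numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Trivial 0.5 mmHg benefit, but n = 5,000 per arm -> usually a small p-value
big_trial_tx = rng.normal(0.5, 10, 5_000)
big_trial_ctl = rng.normal(0.0, 10, 5_000)
print("Huge trial, tiny effect: p =", stats.ttest_ind(big_trial_tx, big_trial_ctl).pvalue)

# Clinically meaningful 8 mmHg benefit, but n = 10 per arm -> often "non-significant"
small_trial_tx = rng.normal(8.0, 10, 10)
small_trial_ctl = rng.normal(0.0, 10, 10)
print("Tiny trial, large effect: p =", stats.ttest_ind(small_trial_tx, small_trial_ctl).pvalue)
```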

P-values
What is a p-value?
● The probability of getting data at least as extreme as yours if there is no treatment effect
● A significance level of α = 0.05 means there is a 95% probability that the researcher will correctly conclude that there is no treatment effect when there really is no treatment effect

P-values
What is a p-value?
● Generally leads to dichotomous thinking
  o Either something is significant or it is not
● Influenced by the number and variability of subjects
● Changes from one sample to the next

The dance of the p-values
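A rough sketch of this "dance": drawing repeated samples from the same population, with the same true effect and the same design, still produces very different p-values from one "study" to the next. The effect size and sample size below are illustrative assumptions.

```python
# Sketch of the "dance of the p-values": repeated samples from the SAME population
# give very different p-values. Assumed true effect d = 0.5, n = 32 per group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
d_true, n = 0.5, 32

for study in range(10):
    treated = rng.normal(d_true, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    p = stats.ttest_ind(treated, control).pvalue
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"Study {study + 1:2d}: p = {p:.3f} ({verdict})")
# The identical design typically yields p-values ranging from well below .05
# to well above it.
```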

P-values
A p-value is not enough to establish clinical significance
● P-values should be considered along with:
  o Effect size
  o Confidence intervals
  o Power
  o Study design

Effect Size
● A measure of the magnitude of the effect of interest; tells us "how much"
● Generally leads to thinking about estimation, rather than a dichotomous decision about significance
● Often combined with confidence intervals (CIs) to give us a sense of how much uncertainty there is around our estimate

Confidence Intervals
● Provide a "plausible" range for the effect size in the population
  o In 95% of the samples you draw from a population, the interval will contain the true population effect
  o Not the same thing as saying that 95% of the sample ESs will fall within the interval
● Can also be used for NHST
  o If 0 falls outside of the CI, then your test will be statistically significant
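A sketch of what the 95% coverage statement means in practice, assuming normally distributed data with a true mean difference of 0.5 and 30 subjects per group (illustrative assumptions): across many repeated samples, about 95% of the intervals contain the true value, even though any single interval either does or does not.

```python
# Sketch: "95% confidence" describes the procedure, not any single interval.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true_diff, n, n_sims = 0.5, 30, 10_000

covered = 0
for _ in range(n_sims):
    treated = rng.normal(true_diff, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    diff = treated.mean() - control.mean()
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    se = pooled_sd * np.sqrt(2.0 / n)
    t_crit = stats.t.ppf(0.975, df=2 * n - 2)
    lower, upper = diff - t_crit * se, diff + t_crit * se
    covered += (lower <= true_diff <= upper)

print(f"Fraction of 95% CIs containing the true difference: {covered / n_sims:.3f}")  # ~0.95
```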

Better together
● Why should you always report both effect sizes and CIs?
  o Effect sizes, like p-values, are bouncy (they vary from sample to sample)
  o A point estimate alone can convey a false sense of certainty about your ES
● CIs give you additional information about the plausible upper and lower bounds of those bouncing ESs

Better together

So why use ESs + CIs?
● They give you more fine-grained information about your data
  o Point estimates, plausible values, and uncertainty
● They give more information for replication attempts
● They are used for meta-analytic calculations, so they are more helpful for accumulating knowledge across studies
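One way to report both, sketched below: compute Cohen's d and wrap it in an approximate 95% CI. The normal-approximation standard error used here is a common choice but an assumption of this sketch, not the only way to build the interval; the data and the helper function name are hypothetical.

```python
# Sketch: report the point estimate (Cohen's d) together with a 95% CI.
# Uses a common normal approximation for the standard error of d.
import numpy as np
from scipy import stats

def cohens_d_with_ci(group1, group2, confidence=0.95):
    n1, n2 = len(group1), len(group2)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(group1, ddof=1) +
                         (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(group1) - np.mean(group2)) / pooled_sd
    # Approximate standard error of d (normal approximation)
    se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(0.5 + confidence / 2)
    return d, (d - z * se_d, d + z * se_d)

# Hypothetical example data
rng = np.random.default_rng(5)
treated, control = rng.normal(0.5, 1.0, 40), rng.normal(0.0, 1.0, 40)
d, (lo, hi) = cohens_d_with_ci(treated, control)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # the interval conveys the uncertainty
```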

Low-powered studies still produce inflated effect sizes
● If I use ESs and CIs rather than p-values, do I still have to worry about sample size? Yes:
  o Underpowered studies tend to over-estimate the ES
  o Larger samples will lead to better estimation of the ES and smaller CIs
  o That is, larger samples have higher levels of precision

Precision isn't cheap
● To get high precision (narrow CIs) in any one study, you need large samples
  o Example: you need about 250 people to get an accurate, stable estimate of the ES in psychology
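A back-of-the-envelope sketch of why precision is expensive: the half-width of a 95% CI for a mean shrinks roughly with the square root of n, so halving the width takes about four times as many subjects. The standard deviation of 1 (a standardized outcome) is an illustrative assumption.

```python
# Sketch: CI half-width for a mean shrinks roughly as 1/sqrt(n).
import numpy as np
from scipy import stats

sd = 1.0  # assumed population SD of a standardized outcome
for n in (25, 100, 250, 1000):
    half_width = stats.t.ppf(0.975, df=n - 1) * sd / np.sqrt(n)
    print(f"n = {n:4d}: 95% CI is roughly the mean +/- {half_width:.2f} SD units")
# n = 25 -> +/-0.41, n = 100 -> +/-0.20, n = 250 -> +/-0.12, n = 1000 -> +/-0.06
```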

Precision isn’t cheap

Free training on how to make research more reproducible

Find this presentation at:
Questions: