Why do so many researchers misreport p-values?

Why do so many researchers misreport p-values? Jelte M. Wicherts

Misreporting of statistical results in psychology
Of 142 high-impact papers, 53.5% contained errors and 17.6% contained gross errors.
38% Nature
25% BMJ
36% Psychiatry journals
REPLICATION ALERT
Source: Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666-678.
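A reporting error of this kind can be detected by recomputing the p-value from the reported test statistic. The sketch below is an illustration only, not the procedure used in the cited study. It assumes a two-sided z-test so that it needs nothing beyond the Python standard library (t, F and chi-square tests would need their own distributions). The function names, and the rule that a "gross error" is a mismatch that flips the significance decision at alpha = .05, are assumptions made for this example.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(z, reported_p, decimals=2, alpha=0.05):
    """Classify a reported (z, p) pair.

    Returns "consistent" if the recomputed p-value rounds to the reported one,
    "gross error" if the mismatch changes the significance decision at alpha
    (assumed definition), and "error" for any other mismatch.
    """
    recomputed = two_sided_p_from_z(z)
    if round(recomputed, decimals) == round(reported_p, decimals):
        return "consistent"
    if (recomputed < alpha) != (reported_p < alpha):
        return "gross error"
    return "error"


if __name__ == "__main__":
    # Hypothetical reported results: (z statistic, reported p-value).
    for z, p in [(1.96, 0.05), (3.00, 0.01), (1.70, 0.04)]:
        print(z, p, check_reported_p(z, p))
```

The third hypothetical result is the kind the slide calls a gross error: the recomputed p-value is above .05 while the reported one is below it.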

Reporting errors are related to failure to share data
[Figure: panels labelled DATA NOT SHARED (N=28) and DATA SHARED (N=21)]
REPLICATION ALERT
Source: Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6, e26828.

Misreported results are common
Source: Nuijten, M. B., Hartgerink, C. H. J., Van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods.

Misreported results have been around for decades
Source: Nuijten, M. B., Hartgerink, C. H. J., Van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods.

Misreported results indicate other p-hacking tricks
Questionable research practice: prevalence
In a paper, failing to report all of a study’s dependent measures: 78%
Deciding whether to collect more data after looking to see whether the results were significant: 72%
In a paper, selectively reporting studies that “worked”: 67%
Deciding whether to exclude data after looking at the impact of doing so on the results: 62%
In a paper, reporting an unexpected finding as having been predicted from the start: 54%
In a paper, failing to report all of a study’s conditions: 42%
In a paper, “rounding off” a p value (e.g., reporting that a p value of .054 is less than .05): 39%
Stopping collecting data earlier than planned because one found the result that one had been looking for: 36%
In a paper, claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do): 13%
Falsifying data: 9%
Source: John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science.

P-hacking
[Flowchart with branches labelled P<.05 and P>.05. Nodes: Planned analysis; Redo analysis with adapted dependent var.; Add 10 cases; Remove outliers (|Z| > 2); Call this a “failed study”; Perform new study; Misreport the p-value as being <.05; Effect!; Write paper.]
Percentages shown in the flowchart: 65% 2nd dependent, 57% sequential testing, 41% removed outliers, 23% misreported p value, 48% publication bias.
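The "Add 10 cases" step in the flowchart can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the talk: two groups are drawn from the same normal distribution (so there is no true effect), the test is a two-sample z-test with known unit variance, and the starting sample size of 20 per group is hypothetical. All function and parameter names are made up for this example.

```python
import math
import random


def z_test_p(group_a, group_b):
    """Two-sided p-value of a two-sample z-test, assuming known unit variance."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=10000, n_start=20, n_extra=10, alpha=0.05, seed=1):
    """Return (false-positive rate of the planned test,
               false-positive rate when cases are added after a non-significant result)."""
    rng = random.Random(seed)
    planned_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same population: any "effect" is a false positive.
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        significant = z_test_p(a, b) < alpha
        planned_hits += significant
        if not significant:
            # The p-hacking step: add cases per group and test again.
            a += [rng.gauss(0, 1) for _ in range(n_extra)]
            b += [rng.gauss(0, 1) for _ in range(n_extra)]
            significant = z_test_p(a, b) < alpha
        peeking_hits += significant
    return planned_hits / n_sims, peeking_hits / n_sims


if __name__ == "__main__":
    planned, peeking = simulate()
    print(f"planned analysis only: {planned:.3f}")
    print(f"with 'add 10 cases':   {peeking:.3f}")
```

The first rate should sit near the nominal 5%, and the second is higher because the data get a second chance to cross the threshold. Chaining further steps from the flowchart would raise it more.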

But why?

Design
Implement multiple flexible independent variables
Measure multiple flexible dependent variables
Measure additional variables that enable data selection (e.g., background variables, awareness checks)
Use a vague and flexible theory and non-falsifiable hypotheses
Use scales that show ceiling or floor effects (creating artefactual interactions)
Run multiple small (underpowered) studies
Create confounds in the design (e.g., different questions, lack of blinding)
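To make "small (underpowered) studies" concrete, the sketch below approximates the power of a two-group comparison and the chance that at least one of several independent studies comes out significant. It is an illustration under assumptions: it uses a normal approximation rather than the exact t distribution, and the effect size (d = 0.5) and sample sizes are hypothetical values chosen for the example, not figures from the talk.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the standardized mean difference; n_per_group is the size of each group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - noncentrality)
    lower_tail = std_normal.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


def prob_at_least_one_hit(power, n_studies):
    """Chance that at least one of n_studies independent studies is significant."""
    return 1 - (1 - power) ** n_studies


if __name__ == "__main__":
    small = approx_power(d=0.5, n_per_group=20)
    large = approx_power(d=0.5, n_per_group=64)
    print(f"power with 20 per group: {small:.2f}")
    print(f"power with 64 per group: {large:.2f}")
    print(f"at least one hit in 5 small studies: {prob_at_least_one_hit(small, 5):.2f}")
```

Each small study is unlikely to detect the effect on its own, yet running several and reporting only the one that "worked" makes a significant result likely. This is how the design choice feeds the selective reporting described elsewhere in the slides.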

During data collection
Not using random assignment to conditions
Incomplete blinding of experimenters and/or participants (experimenter effects, disclosing hypotheses to participants, using non-naïve participants)
Discarding data depending on outcomes or observed behavior
Quitting data collection after having received a "hit" rather than a "miss"
Adding data or quitting data collection earlier based on intermediate significance testing
Filling in of missing values or decisions in coding in an unblinded manner

Analyses
Various ways to deal with violated assumptions of statistical tests (non-parametric analysis, transformations)
Use ad hoc scales by deleting, recoding, combining, weighting, or transforming item scores
Choices on how to deal with missingness and outliers
Use of alternative inclusion and exclusion criteria
Choice among multiple independent variables, conditions, and covariates
Choice among multiple dependent variables
Choice among statistical models, estimation methods, and inference criteria (Bayes factors, alpha, one-sided testing)

Reporting
Report only a subset of many analyses
Failure to report sensitivity analyses in the report
Failure to report data exclusions, missingness, transformations
Not adequately testing interactions (i.e., only comparing simple effects across conditions)
Failure to correct for multiple testing
HARKing or explorative studies presented as confirmatory
Not reporting so-called “failed studies”
Misreport p-values
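For the "failure to correct for multiple testing" item, the sketch below shows two standard corrections, Bonferroni and Holm's step-down procedure, applied to a set of p-values. It is offered as an illustration; the talk does not prescribe a particular correction, and the p-values in the usage example are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans in the original order: True where the
    hypothesis is rejected while controlling the familywise error rate at alpha.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Once one test fails, all larger p-values are retained too.
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four dependent variables in one study.
    p_values = [0.01, 0.04, 0.03, 0.005]
    print("uncorrected:", [p < 0.05 for p in p_values])
    print("Bonferroni: ", bonferroni_reject(p_values))
    print("Holm:       ", holm_reject(p_values))
```

In this example all four results look significant without correction, only one survives Bonferroni, and two survive Holm, which is never less powerful than Bonferroni.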

P-value
[Figure: the items from the Design and Analyses slides, repeated around a "p-value" label.]

Thanks!
@JelteWicherts
http://metaresearch.nl
J.M.Wicherts@uvt.nl
Marcel van Assen, Coosje Veldkamp, Chris Hartgerink, Marjan Bakker, Paulette Flore, Robbie van Aert, Michèle Nuijten, Hilde Augusteijn + Sacha Epskamp

Overarching
Insufficient detail in pre-registration
Combining various degrees of freedom (DFs)
Pre-registering a study multiple times

Cognitive effects of alcohol and caffeine Or: does it help to drink coffee on the “morning after”?