GETTING PAPERS ACCEPTED IN SOCIAL/PERSONALITY JOURNALS POST-REPLICABILITY CRISIS
Simine Vazire, UC Davis. SPSP, 1.28.16.


PRE-2011
- Counterintuitive is cool
- Three-way interactions are "sophisticated"
- p = .04 is just as good as p = .01
- 6 small-sample studies all showing significant results is impressive

2016
We now know some things that predict replication:
- Low p-value (close to 0)
- Larger samples
- Main effects/low-order interactions
- Pre-registration
- Internal replication (direct or pre-registered)
We kinda always knew this, but now we know it more.

HOW DOES THIS AFFECT JOURNALS?
How much do we care about replicability? How do we balance it with other values? Different journals are trying different approaches.

CHOOSING A JOURNAL
- Society/publisher that owns the journal: SPSP, ARP, EASP, SESP
- Publication committee/board: SPPS Consortium
- Editor in Chief: me
- Associate editors: 10 AEs
- Reviewers: you

SPPS POLICIES
- Replications accepted
- Effect sizes, 95% CIs, exact p-values
- Tables and figures embedded; they don't count toward the 5,000-word limit
- Handling editor's name will be published with each article

SPPS POLICIES
Upon submission:
- Confirm that you have reported how sample size was determined for each study and a discussion of statistical power.
- Confirm that you have reported all data exclusions (e.g., dropped outliers) and how decisions about data exclusions were made.
- Confirm that you have reported all measures or conditions for variables of interest to the research question(s), whether they were included in the analyses or not.
- Confirm that all key results are accompanied by exact p-values, effect sizes, and 95% confidence intervals, or an explanation of why this is not possible.

SPPS SINCE JULY 2015
- About 300 submissions
- 38% desk rejected
- 40% rejected after review
- 22% revise & resubmit, of which we anticipate 80% will get accepted
- → 17% acceptance rate
- Average number of days to decision: 30 (46 excluding desk rejections)
- Impact factor: 2.56

POP QUIZ
What proportion of the desk rejections at SPPS have something to do with power/sample size?
A. 25%
B. 50%
C. 75%
D. 100%

SPPS DESK REJECTIONS
[Chart: reasons for the N = 92 desk rejections, broken down by power, self-report, design issues, and other. "Unimportant": 0%. "Brick in the wall": 0%.]

SPPS DESK REJECTIONS
[Chart: the N = 92 desk rejections by reason (power, self-report, design issues, other), marked as single, double, or triple whammy depending on how many problems co-occurred.]

COMMON PROBLEMS
- Power is ignored
- Power analysis uses an unrealistic effect size:
  - A priori expectation of a huge effect is not justified
  - Justification is based on one or a few underpowered studies (imprecise estimates)
  - Justification is based on a selective slice of the literature (ignores failed replications, controversy)
  - Justification is based on a meta-analysis that doesn't adequately take into account publication bias & p-hacking
- Power analysis uses observed or post-hoc power (see the sketch after this list)
- Authors cite Simmons et al., 2011 to justify n = 20
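Why "observed power" fails as a justification is easy to show: it is a deterministic re-expression of the p-value, so it cannot add evidence beyond the test itself. A minimal normal-approximation sketch (my illustration, not from the talk):

```python
# "Observed" (post-hoc) power is a function of the p-value alone: a
# just-significant two-sided result always has observed power of ~50%,
# whatever the true effect. Normal approximation; illustrative only.
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    z_obs = norm.ppf(1 - p / 2)          # |z| implied by the two-sided p-value
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - z_obs)  # power, pretending the estimate is the truth

for p in (0.05, 0.01, 0.20):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
# p = 0.05 -> 0.50, p = 0.01 -> 0.73, p = 0.20 -> 0.25
```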

YOU CAN'T WIN
- I encouraged people to conduct power analyses
- I was wrong
- Don't conduct a power analysis for your specific effect unless there is a large, unbiased meta-analysis
- So, don't conduct a power analysis

THE ONLY POWER ANALYSIS YOU'LL EVER NEED*
Here is a power analysis for the field:
- Average published effect size: d = .43 (r = .21)
- 80% power = 90 people per condition (N = 180 for a correlational study)
- Maybe total sample size matters more than sample size per condition, but it's complicated

THE ONLY POWER ANALYSIS YOU'LL EVER NEED*
N = 180 (90/condition)
- THIS IS FOR THE AVERAGE PUBLISHED EFFECT SIZE!
- Average: half of published effect sizes are smaller than this
- Published: definitely inflated
- Consider planning for a smaller effect: d = .25 (r = .12) → 250 people per condition (500 total); see the sketch below
- If you are looking for a two-way interaction, mediation, or a partial correlation, assume a much smaller effect!
- If you are looking for a three-way interaction, pre-register and replicate.
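For readers who want to check these rules of thumb, here is a sketch using statsmodels' TTestIndPower (alpha = .05, two-sided, equal groups are my assumptions); the exact answers land slightly under the slide's rounded figures:

```python
# Verifying the slide's rule-of-thumb sample sizes (alpha = .05, two-sided).
from statsmodels.stats.power import TTestIndPower
import numpy as np
from scipy.stats import norm

analysis = TTestIndPower()

# Average published effect size: d = .43 at 80% power
n_avg = analysis.solve_power(effect_size=0.43, power=0.80, alpha=0.05)
print(f"d = .43: {n_avg:.0f} per condition")    # ~86; the slide rounds up to 90

# Planning for a smaller, less inflated effect: d = .25
n_small = analysis.solve_power(effect_size=0.25, power=0.80, alpha=0.05)
print(f"d = .25: {n_small:.0f} per condition")  # ~253; the slide says 250

# Correlational version (r = .21), via the Fisher z approximation
z_a, z_b = norm.ppf(1 - 0.05 / 2), norm.ppf(0.80)
n_corr = ((z_a + z_b) / np.arctanh(0.21)) ** 2 + 3
print(f"r = .21: total N = {n_corr:.0f}")       # ~176; the slide says 180
```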

 You’re not doing a traditional, between-person study  Then, do a power analysis but be conservative in your effect size estimate  And report all assumptions you’re making (e.g., correlation between repeated measures)  If you want to interpret null effects, use Cumming’s planning for precision (need Very Large Sample)  If you aren’t concerned about effect size, you can use sequential analysis *UNLESS

WHAT IF YOU CAN'T?
Hard to collect data:
- Unusual sample/population
- Intensive procedure
- Intensive coding
- Expensive
- Extraordinary event
- High risk
In that case:
- Consider sequential analysis
- Pay attention to the confidence intervals
- Adjust your conclusions
- Definitely don't interpret null results

ALSO
- Pre-registration will save your butt
- Truly large effects and p-hacked small-N studies look the same to the observer (see the simulation below)
- You can prove it's the former and not the latter with pre-registration
- I'll believe almost anything if it's pre-registered
- Direct replications have many of the benefits of pre-registration
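One way to see why undisclosed flexibility is so corrosive: a small simulation (my illustration; the parameters are arbitrary) of a single flexible practice, peeking at the data every 10 participants per group and stopping as soon as p < .05, with the null true throughout:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, n_max, step = 5000, 100, 10
false_pos = 0
for _ in range(n_sims):
    a = rng.normal(size=n_max)  # both groups drawn from the same
    b = rng.normal(size=n_max)  # distribution, so the null is true
    for n in range(step, n_max + 1, step):
        if ttest_ind(a[:n], b[:n]).pvalue < 0.05:  # peek; stop if "significant"
            false_pos += 1
            break
print(f"False-positive rate with peeking: {false_pos / n_sims:.1%}")
# Roughly 15-20%, not the nominal 5% -- from just one undisclosed choice.
```

Pre-registering the stopping rule (or using a proper sequential design with adjusted thresholds) is what removes this inflation.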

 They’re nice, but they don’t help with the problem of false positives/low power because of:  Possibility of file drawer  Undisclosed flexibility in data collection and analysis  HARKing is still possible  So, readers can’t tell if conceptual replication was truly a strong test  Direct replication eliminates most, but not all, of these problems  Pre-registered direct replication is great WHAT ABOUT CONCEPTUAL REPLICATIONS?

We can’t all do pre-registered direct replications… How else can I convince you that my study isn’t likely to be a false positive?

 If you aren’t p-hacking, show us by being open and transparent  21-word solution: disclose all flexibility in data collection and analysis  Tell us what’s in your file drawer  Tell us what predictions were a priori and what was HARKed  Make your data and materials publicly available  I’ll forgive a lot if you show me you’re being extra open GIVE YOURSELF CREDIT

IT'S NOT THAT I'M ASSUMING THAT YOU'RE P-HACKING
- I'm assuming everyone is p-hacking
- Even without p-hacking, small samples are flukey; the risk of a false positive is high
- If I tell you I'm not sure your result will replicate, you're in good company

ARE JOURNALS GOING TO BE FULL OF BORING, OBVIOUS STUDIES?
- Only if the only things that are true are boring, obvious things
- Maybe that's the case
- The fact that something would be really important if true is not a good reason to publish preliminary evidence when it wouldn't be that hard to collect more conclusive evidence
- When collecting more conclusive evidence would be hard, publishing preliminary evidence makes sense, but conclusions still have to be calibrated to the strength of the evidence

CONCLUSIONS
- Power is almost necessary
- Often you don't need a power analysis; just get a large sample
- Pre-registration is very helpful
- Direct replication is great
- Conceptual replication doesn't address replicability unless pre-registered
- Transparency always helps
- What gets published might look quite different than in the past
- If your effect is real, you should still be able to get it in
- If you're willing to be extra open, you'll have a better chance*
- Submit your work to journals that reward your practices

WHAT CAN YOU DO TO MAKE THESE PRACTICES MORE COMMON?
- Do them
- Submit your papers to journals that have explicitly expressed these values
- As a reviewer, use these considerations when evaluating manuscripts

THE END