Life After P-hacking
(APS, May 2013, Washington DC). With minor edits for posting.
Uri Simonsohn, Penn (gave the talk)
Leif Nelson, UC Berkeley
Joe Simmons, Penn also
Photo not necessary
Definition
P-hacking: exploiting researcher degrees of freedom in pursuit of p < .05
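To make the definition concrete, here is a minimal simulation sketch of one researcher degree of freedom, data peeking (testing, then adding subjects if the result is not yet significant). Nothing in it comes from the slides: the look schedule (n = 20, 30, 40, 50 per group), the known-variance z-test, the number of simulations, and the function names are all illustrative assumptions. Both groups are drawn from the same distribution, so every "significant" result is a false positive.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def significant(a, b, alpha=0.05):
    """Two-sided z-test for equal means, assuming a known SD of 1 (illustrative)."""
    n = len(a)  # both groups have the same size in this sketch
    z = (fmean(a) - fmean(b)) / sqrt(2 / n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha


def false_positive_rate(looks, sims=4000, seed=1):
    """Share of null studies declared significant at ANY of the given looks.

    `looks` is an increasing list of per-group sample sizes at which the
    hypothetical researcher tests and stops as soon as p < .05.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a, b = [], []
        for n in looks:
            while len(a) < n:  # collect more subjects up to the next look
                a.append(rng.gauss(0, 1))
                b.append(rng.gauss(0, 1))
            if significant(a, b):
                hits += 1
                break  # stop collecting once "it worked"
    return hits / sims


if __name__ == "__main__":
    print("one test at n=50:      ", false_positive_rate([50]))
    print("peeking at 20/30/40/50:", false_positive_rate([20, 30, 40, 50]))
```

Under these assumptions the single-test rate should sit near the nominal 5%, while the peeking rate comes out clearly higher; the exact figures depend on the seed and the look schedule.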
Life after p-hacking n>50 Direct replications 21 words Compromise writing Who to hire What about Bayesian?
Median study: n ≈ 20
False-Positive Psych: n > 20
What can you reliably detect with n = 20?
MTurk study
– N = 674
– Why not use published effect sizes (ds)?
n=20 is enough for:
– Men taller than women: n = 6
– People above median age closer to retirement: n = 10
– Women, more shoes than men: n = 15
n=20 is not enough for:
– People who like spicy food are more likely to like Indian food: n = 27
– Liberals rate social equality as more important than do conservatives: n = 34
– People who like eggs report eating egg salad more often: n = 47
– Men weigh more than women: n = 47
– Smokers think smoking is less likely to kill someone than do non-smokers: n = 146
Are you studying a bigger effect than: Men weigh more than women? If not, use n>50
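The link between sample size and detectable effect size can be sketched with the standard normal-approximation formula for comparing two group means. The slides do not state the power level, test, or effect sizes behind their n values, so everything here is an assumption for illustration: two-sided α = .05, 80% power, equal group sizes, effects expressed as Cohen's d, and hypothetical function names.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """z_(1 - alpha/2) + z_(power): the constant in the normal-approximation formulas."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n needed to detect a standardized difference d."""
    return ceil(2 * (z_sum(alpha, power) / d) ** 2)


def detectable_d(n, alpha=0.05, power=0.80):
    """Approximate smallest d detectable with the given per-group n."""
    return z_sum(alpha, power) * sqrt(2 / n)


if __name__ == "__main__":
    for n in (20, 50):
        print(f"n = {n} per group -> detectable d is about {detectable_d(n):.2f}")
    print("d = 0.5 needs about", n_per_group(0.5), "per group")
```

Under these assumptions, n = 20 per group detects only effects of roughly d = 0.89 or larger, and n = 50 roughly d = 0.56 or larger. An exact t-based calculation gives slightly larger sample sizes than this approximation.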
Estimates are way off Subjects confused? Big outliers
p <.03 Estimates are way off Subjects confused? Big outliers
p <.03 Study 1?
Run calories study again. Same exclusion rule.
Why not just conceptual replication?
– It restarts the p-hacking clock
– Failures do not count
Replications
Conceptual
– Rule out confounds
– Rule in generalizability
Direct
– Rule out false-positive
Life after p-hacking n>50 Direct replications 21 words (Google it) Compromise writing Who to hire What about Bayesian?
How can an organic farmer compete?
How can an organic researcher compete?
If you determined sample size in advance, say it.
If you did not drop variables, say it.
If you did not drop conditions, say it.
21 Word Solution
Footnote 1: "We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study."
Organic Farmer, Organic Researcher
Compromise writing
While reviewers are still in the dark ages, have it both ways.
Clean version in main text
– All studies worked & < 2500 words
Supplement/footnote
– n=100 n=150
– p=.08 w/o exclusion
– Data and materials online
Only reformers read the small print. The organic 21 words still apply. Everybody likes the paper.
If you hire based on quantity you pass on these guys
What's the alternative to counting papers?
Rookies: Best 1
Tenure: Best 3
Full: Best 5
Try it. It is a powerful question. What's her best paper?
Life after p-hacking n>50 Direct replications 21 words Compromise writing Who to hire What about Bayesian?
Only speak for myself here. My prior: Bayesians will be unhappy in 3, 2, 1
P-hacking also invalidates Bayesian results
Let me say that again
Bayesian proposals for Psych
1) Bayesian t-test
Replications use it sometimes.
Turns out – α = 5%
2) Bayesian estimation
Latest JEP:G.
Turns out – Changes nothing
1%
t-test vs. Bayesian estimation: changes nothing.
How similar? Results change by less than if we dropped 1 observation at random.
But! Isn't data-peeking OK for Bayes?
– Not when used for hypothesis testing
Also:
– Dropped subjects, measures, conditions invalidate all inference.
P-hacking is to Bayesian stats as drunk driving is to leather seats.
Good reasons to go Bayesian do not include p-hacking.
Next slide is the last.
Life after p-hacking n>50 Direct replications 21 words Compromise writing Who to hire What about Bayesian? Only speak for myself here. Leif Nelson UC Berkeley Joe Simmons Penn