High Expectations - R.J. Watt & D.I. Donaldson, University of Stirling
A typical population: we start with a simulated population of known effect size, r=0.3.
A typical design: we sample from the population, 42 participants, random sampling.
A random sample: n=42, sample r=0.34, population r=0.3. p=0.03 (2-tailed); the null hypothesis is rejected.
Another random sample: n=42, sample r=0.30, population r=0.3. p=0.055 (2-tailed); the null hypothesis is not rejected. This is a miss (because we know that the population has an effect).
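As an illustration of the sampling step above, here is a minimal sketch (Python with NumPy/SciPy, not the authors' original code) that draws one sample of n=42 from a bivariate normal population with true correlation r=0.3 and tests it; the seed is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed

def draw_sample(pop_r, n):
    """Return the sample correlation and two-tailed p value for one sample."""
    cov = [[1.0, pop_r], [pop_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return stats.pearsonr(x, y)

r, p = draw_sample(pop_r=0.3, n=42)
print(f"sample r = {r:.2f}, two-tailed p = {p:.3f}")
```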
Step 1: how common are misses? [Figure: distribution of p values across repeated samples, with reference lines at p=0.05, p=0.01, and p=0.001.] NB: 10% of samples give p<0.0016 and 10% give p>0.47. In this design, 50% of samples generate a miss.
Step 1: miss rates are high. In this design, 50% of samples generate a miss. Two implications: the effect will be found by someone and reported; and the effect has a 50% chance of subsequent "failure to replicate". NB: this is a privileged viewpoint, because we know that the effect exists.
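A brute-force way to check the 50% miss rate quoted above is to repeat the sampling many times and count samples with p >= 0.05. This is a sketch under the same assumptions (bivariate normal population, r=0.3, n=42); the iteration count and seed are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pop_r, n, n_sims = 0.3, 42, 20_000
cov = [[1.0, pop_r], [pop_r, 1.0]]

misses = 0
for _ in range(n_sims):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    _, p = stats.pearsonr(x, y)
    misses += p >= 0.05  # a "miss": the real effect is not detected

print(f"estimated miss rate = {misses / n_sims:.2f}")  # close to 0.5 for this design
```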
Back to our 2 samples
[Figure: p value for replicate alone against p value for combined data, with the original study marked.]
[Figure: the same axes, divided into regions: replications individually significant; replications individually fail but combined data significant; replications fail and the combination also fails.]
The two samples we started with. [Figure: original study and replication plotted as p value for replicate alone against p value for combined data.]
The replication failed, yet the combined data is more significant than the original alone.
Many further attempts at replication. [Figure: the original study and many replications on the same axes.] Failure to replicate: 51%.
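The scatter above can be approximated with a short simulation: fix one "published" original study (n=42, significant at p<0.05, drawn from the r=0.3 population), then draw many exact replications and compute the p value for each replicate alone and for the pooled (original + replicate) data. This is a sketch, not the authors' code; the seed and replication count are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
pop_r, n = 0.3, 42
cov = [[1.0, pop_r], [pop_r, 1.0]]

def sample(n):
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Find an original study that would be published (p < 0.05).
while True:
    original = sample(n)
    r_orig, p_orig = stats.pearsonr(*original.T)
    if p_orig < 0.05:
        break

n_reps = 5_000
fail_alone = fail_combined = 0
for _ in range(n_reps):
    replicate = sample(n)
    _, p_rep = stats.pearsonr(*replicate.T)
    _, p_comb = stats.pearsonr(*np.vstack([original, replicate]).T)
    fail_alone += p_rep >= 0.05       # the replication "fails" on its own
    fail_combined += p_comb >= 0.05   # combined data not significant

print(f"original: r = {r_orig:.2f}, p = {p_orig:.3f}")
print(f"failure to replicate (replicate alone): {fail_alone / n_reps:.0%}")
print(f"combined data not significant:          {fail_combined / n_reps:.0%}")
```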
Step 1 summary: known population. For our sample (n=42, r=0.34, p=0.03), the probability of a failure to replicate is 50%. What if we don't know the population it came from?
Step 2: unknown population. Same original sample, but now suppose the population is unknown. Constraints on the population: (1) the distribution of effect sizes that could have produced the current sample; (2) an a priori distribution of effect sizes.
Constraint 1: the density function for populations that our sample could have arisen from.
Constraint 2: the simulated density function for effect sizes in Psychology.
Constraint 2: the simulated density function for effect sizes in Psychology. [Figure annotations: we use this; we then see this.]
Constraint 2: the simulated density function for effect sizes in Psychology matches the Science paper.
Step 2: unknown population. We draw replication samples from anywhere in this distribution of possible source populations, and ask how often they are significant.
[Figure: p value for replicate alone against p value for combined data, showing the original study and the replications.] Failure to replicate: 60%.
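A sketch of Step 2 under stated assumptions: each replication first draws a population effect size from the distribution of populations consistent with the original sample (approximated here with the Fisher-z likelihood for r=0.34, n=42), weighted by an a priori distribution of effect sizes. The half-normal prior below is an illustrative stand-in for the authors' fitted Psychology distribution, so the exact failure rate it yields will differ from the slide's 60%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
r_obs, n = 0.34, 42
se_z = 1.0 / np.sqrt(n - 3)

# Candidate population effect sizes, weighted by likelihood x prior.
rho = np.linspace(0.0, 0.95, 400)
likelihood = stats.norm.pdf(np.arctanh(r_obs), loc=np.arctanh(rho), scale=se_z)
prior = stats.halfnorm.pdf(rho, scale=0.3)   # assumed prior, see lead-in
weights = likelihood * prior
weights /= weights.sum()

n_reps, failures = 5_000, 0
for pop_r in rng.choice(rho, size=n_reps, p=weights):
    cov = [[1.0, pop_r], [pop_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    _, p = stats.pearsonr(x, y)
    failures += p >= 0.05

print(f"failure to replicate, unknown population: {failures / n_reps:.0%}")
```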
Step 2 summary: unknown population. For our sample (n=42, r=0.34, p=0.03), given an a priori distribution of population effect sizes, the probability of a failure to replicate is 60%. What about other combinations of r and n?
Final step: other r & n. r: we already have an a priori Psychology distribution.
Final step: other r & n. n: from 90 years of journal articles, the median n is 30 (JEP) and 25 (QJEP).
The simulation process: generate >1000 original studies, with r chosen at random from the a priori population and n chosen at random from [10, 20, 40, 80]; keep only those with p<0.05 (these are the "published" studies); for each published study, run 1000 exact replications, with r chosen at random from all possible source populations and n kept as in the original; count the replications with p<0.05 to give the % success.
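The pipeline just described might be sketched as follows. The a priori distribution of effect sizes is again an illustrative half-normal stand-in (the authors fit their own distribution), while the n values [10, 20, 40, 80], the p<0.05 publication filter, and the re-drawing of the replication's population follow the slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def draw(pop_r, n):
    """One study: sample n pairs at population correlation pop_r, return (r, p)."""
    cov = [[1.0, pop_r], [pop_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return stats.pearsonr(x, y)

def mean_replication_rate(n_originals=100, n_replicates=200):
    rates = []
    while len(rates) < n_originals:
        # Original study: r from the (assumed) a priori distribution, n at random.
        pop_r = min(abs(rng.normal(0.0, 0.3)), 0.95)
        n = int(rng.choice([10, 20, 40, 80]))
        r_obs, p_obs = draw(pop_r, n)
        if p_obs >= 0.05:
            continue                      # not significant, so not "published"
        # Replications: population effect size re-drawn from all source
        # populations consistent with the published result; n kept as original.
        rho = np.linspace(0.0, 0.95, 300)
        like = stats.norm.pdf(np.arctanh(abs(r_obs)),
                              loc=np.arctanh(rho), scale=1 / np.sqrt(n - 3))
        w = like * stats.halfnorm.pdf(rho, scale=0.3)
        w /= w.sum()
        successes = sum(draw(rep_r, n)[1] < 0.05
                        for rep_r in rng.choice(rho, size=n_replicates, p=w))
        rates.append(successes / n_replicates)
    return float(np.mean(rates))

print(f"mean replication rate = {mean_replication_rate():.0%}")
```

With the authors' fitted prior the slides report a mean replication rate of 40-45%; the assumed prior here will give a broadly similar, but not identical, figure.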
Final step: other r & n. [Figure: proportion of successful replications against the p value of the original sample (1e-05 to 1.0), for n = 10, 20, 40, 80; each dot is one study, with r random and n as indicated.] Mean replication rate = 40-45%.
Discussion 1: the only free parameter in this is the a priori distribution of effect sizes in Psychology. The result also depends on the distribution of n in Psychology.
Discussion 1: if you accept this, then it necessarily follows that either: if everyone is behaving impeccably, then p(replication) is 40-45% in psychology; or: if p(replication) is 40-45% in psychology, then everyone is behaving impeccably. What about other a priori distributions?
Overall summary: effects of the a priori assumption on the outcome. [Figure: we thought %replication should be up here; actually, %replication should be down here.]
Discussion 1(a): if everyone is behaving impeccably, then p(replication) is typically 40-45%. This is inescapable.
The End. Very much inspired by: Cumming, G. (2011) Understanding the New Statistics.
Start with a population that generates a power of 42%, to match the typical value for Psychology. Only the green studies are published.
This is the distribution of all expected effect sizes. [Figure: all measured effect sizes; mean(r) = 0.3.]
But this is the distribution of published effect sizes (i.e. those where p<0.05): mean(r) = 0.45.
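A sketch of how that inflation arises: simulate many studies from a population with r=0.3 and keep only those reaching p<0.05. The sample size n=35 is an assumption chosen to give roughly the 42% power quoted above; it is not stated on the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
pop_r, n, n_sims = 0.3, 35, 20_000   # n = 35 is an assumed value (~42% power)
cov = [[1.0, pop_r], [pop_r, 1.0]]

all_r, published_r = [], []
for _ in range(n_sims):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r, p = stats.pearsonr(x, y)
    all_r.append(r)
    if p < 0.05:                      # publication bias: only significant results
        published_r.append(r)

print(f"mean of all measured effect sizes: {np.mean(all_r):.2f}")        # ~0.30
print(f"mean of published effect sizes:    {np.mean(published_r):.2f}")  # ~0.45
```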
Now we ask what power is calculated from the published sample effect size of 0.45. The answer is 80%, compared with the actual power of 42%.
Power calculations: actual power = 42%; power calculated from published effect sizes = 80%. This difference arises because published effect sizes are over-estimates caused by publication bias.
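Those two power figures can be checked with the standard Fisher-z approximation for the power of a two-tailed correlation test. This is a sketch, and n=35 is the same assumed sample size as above rather than a value given on the slides.

```python
import numpy as np
from scipy import stats

def correlation_power(rho, n, alpha=0.05):
    """Approximate power of a two-tailed test of H0: rho = 0 (Fisher-z)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lam = np.arctanh(rho) * np.sqrt(n - 3)
    return stats.norm.sf(z_crit - lam) + stats.norm.cdf(-z_crit - lam)

print(f"actual power   (rho = 0.30): {correlation_power(0.30, 35):.0%}")  # ~42%
print(f"apparent power (rho = 0.45): {correlation_power(0.45, 35):.0%}")  # close to 80%
```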
This graph shows how much power is over-estimated. [Figure: Apparent (Sample) Power against Real (Population) Power.]
What can we do? Anything that increases the Type II error rate will make matters worse, e.g. reducing alpha (0.05). Anything that decreases the Type II error rate might make matters better**, e.g. adaptive sampling. ** except that 42% power may be maximally reinforcing for the researcher.
Null hypothesis testing - the outcomes:
              Population: no effect | Population: has effect
  p<0.05      Type I error          | correct
  p>0.05      correct               | Type II error
A pair of studies with different outcomes occupy the same column, but it cannot be known which column. So a failure to replicate means either: the 1st study made a Type I error and the 2nd study is correct; or the 1st study is correct and the 2nd study made a Type II error. It cannot be known which.
We get: p(Type I error): given the null hypothesis is true, the probability of obtaining your result or better. We want: p(Type II error): given the null hypothesis is not true.
Think about: given it is a weekday, what is the probability of dying in hospital? Compared with: given it is not a weekday, what is the probability of dying in hospital? These are unrelated.
The effects of publication bias on the effect sizes that are seen. [Figure legend: red n=10, green n=20, blue n=40, yellow n=80.]
Finding the best-fit value for the prior effect size: minimum chi-square at 0.28. [Figure: frequency of effect sizes in the Science paper, in the whole simulated population (with effect size 0.28), and in the simulated published studies.]
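The fitting step might look something like the sketch below: grid-search a single prior effect-size parameter, simulate the published effect sizes it produces, and keep the value with the minimum chi-square against an observed histogram (e.g. the Science paper's, which the user must supply as observed_counts and bin_edges). The half-normal form of the prior is an assumption; the authors' own parameterisation is not given here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)

def simulate_published_r(prior_scale, n_studies=1_000):
    """Simulate |r| for 'published' (p < .05) studies under an assumed
    half-normal prior on the population effect size."""
    out = []
    while len(out) < n_studies:
        pop_r = min(abs(rng.normal(0.0, prior_scale)), 0.95)
        n = int(rng.choice([10, 20, 40, 80]))
        cov = [[1.0, pop_r], [pop_r, 1.0]]
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        r, p = stats.pearsonr(x, y)
        if p < 0.05:
            out.append(abs(r))
    return np.array(out)

def best_fit_scale(observed_counts, bin_edges, grid):
    """Return the grid value minimising chi-square between the observed and
    simulated published effect-size histograms."""
    best, best_chi2 = None, np.inf
    total = observed_counts.sum()
    for scale in grid:
        sim = simulate_published_r(scale)
        expected, _ = np.histogram(sim, bins=bin_edges)
        expected = expected / expected.sum() * total
        chi2 = np.sum((observed_counts - expected) ** 2 / np.maximum(expected, 1e-9))
        if chi2 < best_chi2:
            best, best_chi2 = scale, chi2
    return best, best_chi2

# Usage, once observed_counts and bin_edges are loaded from the data being fitted:
# scale, chi2 = best_fit_scale(observed_counts, bin_edges, np.linspace(0.1, 0.5, 9))
```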