
1 High Expectations
R.J.Watt, D.I.Donaldson, University of Stirling

2 A typical population
We start with a simulated population of known effect size: r = 0.3.
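A minimal sketch of this setup in Python, assuming a bivariate-normal population (the deck does not state the generating distribution):

```python
# Minimal sketch of slide 2 (assumes a bivariate-normal population;
# the deck does not specify the generating distribution).
import numpy as np

rng = np.random.default_rng(1)
POP_R = 0.3                                # known population effect size
cov = [[1.0, POP_R], [POP_R, 1.0]]
population = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)

# The realised correlation in the simulated population is ~0.30.
print(np.corrcoef(population[:, 0], population[:, 1])[0, 1])
```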

3 A typical design
We draw a random sample from the population: 42 participants.

4 A random sample
n = 42, sample r = 0.34 (population r = 0.3): p = 0.03 (2-tailed), so the null hypothesis is rejected.

5 Another random sample
n = 42, sample r = 0.30 (population r = 0.3): p = 0.055 (2-tailed), so the null hypothesis is not rejected. This is a miss (because we know that the population has an effect).

6 Step 1: how common are misses?
[Figure: the distribution of p-values across repeated samples, with markers at p = 0.05, 0.01, and 0.001. NB: 10% of samples give p < 0.0016 and 10% give p > 0.47.]
In this design, 50% of samples generate a miss.
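That 50% figure can be checked by simulation. A sketch, assuming bivariate-normal data and a two-tailed Pearson test (my choices, not stated in the deck):

```python
# Estimate the miss rate for n = 42 samples from an r = 0.3 population.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
cov = [[1.0, 0.3], [0.3, 1.0]]
N, TRIALS = 42, 10_000

misses = 0
for _ in range(TRIALS):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    _, p = pearsonr(x, y)                   # two-tailed by default
    if p >= 0.05:
        misses += 1                         # a miss: real effect, n.s. result

print(f"miss rate: {misses / TRIALS:.2f}")  # comes out close to 0.50
```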

7 Step 1: miss rates are high
In this design, 50% of samples generate a miss. Two implications:
the effect will be found by someone and reported
the effect has a 50% chance of subsequent “failure to replicate”
NB privileged viewpoint: we know that the effect exists.

8 Back to our 2 samples

9 [Figure: p-value for the replicate alone plotted against the p-value for the combined data; the original study is marked.]

10 [Figure: the same axes, divided into regions: replications fail; combination fails; replications individually significant; replications individually fail but combined data significant.]

11 The two samples we started with
[Figure: p-value for the replicate alone vs p-value for the combined data, with the original study and the replication marked.]

12 The two samples we started with
[Figure: as before, with the original study and the replication marked.]
The replication failed, yet the combined data is more significant than the original alone.

13 Many further attempts at replication
[Figure: p-value for the replicate alone vs p-value for the combined data, for many replications of the original study.]
Failure to replicate ≈ 51%.
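The pattern on slides 9-13 can be reproduced with a sketch like this (again assuming bivariate-normal data and a two-tailed Pearson test; this simplified version lands close to the deck's ~51%):

```python
# For many replications of one original study (n = 42, population r = 0.3),
# compute p for the replicate alone and for the combined data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
cov = [[1.0, 0.3], [0.3, 1.0]]
N, TRIALS = 42, 10_000

original = rng.multivariate_normal([0.0, 0.0], cov, size=N)

rep_fails = comb_sig = 0
for _ in range(TRIALS):
    rep = rng.multivariate_normal([0.0, 0.0], cov, size=N)
    _, p_rep = pearsonr(rep[:, 0], rep[:, 1])
    combined = np.vstack([original, rep])
    _, p_comb = pearsonr(combined[:, 0], combined[:, 1])
    rep_fails += p_rep >= 0.05              # "failure to replicate"
    comb_sig += p_comb < 0.05               # combined data still significant

print(f"failure to replicate: {rep_fails / TRIALS:.0%}")   # ~50%
print(f"combined significant: {comb_sig / TRIALS:.0%}")
```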

14 Step 1 summary: known population
For our sample (n = 42, r = 0.34, p = 0.04), the probability of a failure to replicate is 50%.
What if we don’t know the population it came from?

15 Step 2: unknown population
Same original sample, but now suppose the population is unknown. Two constraints on the population:
the distribution of effect sizes that could produce the current sample
the a-priori distribution of effect sizes

16 Constraint 1
The density function for the populations that our sample could have arisen from.
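One way to compute such a density, sketched via the Fisher z approximation (my choice of method; the deck does not say how the curve was obtained):

```python
# Likelihood of each candidate population r, given the observed
# sample r = 0.34 with n = 42, via the Fisher z approximation.
import numpy as np
from scipy.stats import norm

r_obs, n = 0.34, 42
se = 1.0 / np.sqrt(n - 3)                 # SE of Fisher-transformed r

rho = np.linspace(-0.99, 0.99, 199)       # candidate population effect sizes
density = norm.pdf(np.arctanh(r_obs), loc=np.arctanh(rho), scale=se)

print(rho[np.argmax(density)])            # peaks near the observed r = 0.34
```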

17 Constraint 2
The simulated density function for effect sizes in Psychology.

18 Constraint 2
The simulated density function for effect sizes in Psychology.

19 Constraint 2
The simulated density function for effect sizes in Psychology. [Figure annotations: “We use this” and “We then see this”.]

20 Constraint 2
The simulated density function for effect sizes in Psychology matches the Science paper.

21 Step 2: unknown population
We draw replication samples from anywhere in this distribution of possible source populations, and ask how often they are significant.

22 [Figure: p-value for the replicate alone vs p-value for the combined data; the original study and its replications are marked.]
Failure to replicate ≈ 60%.

23 Step 2 summary: unknown population
For our sample (n = 42, r = 0.34, p = 0.04), given an a-priori distribution of population effect sizes, the probability of a failure to replicate is 60%.
What about other combinations of r and n?

24 Final step: other r & n
r: we already have an a-priori Psychology distribution.

25 Final step: other r & n
r: we already have an a-priori distribution.
n: from 90 years of journal articles, the median n is 30 in JEP and 25 in QJEP.

26 The Simulation Process
>1000 original studies: r chosen at random from the a-priori population, n chosen at random from [ ]; keep those with p < 0.05 (“published”).
1000 exact replications of each: r chosen at random from all source populations, n kept as the original; count those with p < 0.05 to give the % success.
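A compact reconstruction of this pipeline, for illustration only. All of the following are my assumptions: bivariate-normal data, a Normal(0, 0.3) stand-in for the a-priori distribution, an n range of 10-80 standing in for the unspecified “[ ]” (bracketing the n values on slide 27), and far fewer studies and replications than the slide's counts, to keep it fast:

```python
import numpy as np
from scipy.stats import norm, pearsonr

rng = np.random.default_rng(4)
GRID = np.linspace(-0.95, 0.95, 381)       # candidate population effect sizes
PRIOR = norm.pdf(GRID, 0.0, 0.3)           # assumed a-priori distribution

def sample_r_p(rho, n):
    """One bivariate-normal sample: its Pearson r and two-tailed p."""
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return pearsonr(x, y)

def draw_source_rho(r_obs, n):
    # "r chosen at random from all source populations": Fisher-z likelihood
    # of the observed r, weighted by the a-priori distribution.
    like = norm.pdf(np.arctanh(r_obs), np.arctanh(GRID), 1 / np.sqrt(n - 3))
    w = like * PRIOR
    return rng.choice(GRID, p=w / w.sum())

published = []                             # (observed r, n) of p<0.05 studies
while len(published) < 200:                # slide: >1000 original studies
    rho = rng.choice(GRID, p=PRIOR / PRIOR.sum())
    n = int(rng.integers(10, 81))          # placeholder for the slide's "[ ]"
    r, p = sample_r_p(rho, n)
    if p < 0.05:                           # keep p<0.05: "published"
        published.append((r, n))

wins = trials = 0
for r_obs, n in published:                 # slide: 1000 replications each
    for _ in range(50):
        _, p = sample_r_p(draw_source_rho(r_obs, n), n)
        wins += p < 0.05
        trials += 1

print(f"% success: {wins / trials:.0%}")   # the talk reports 40-45%
```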

27 Final step: other r & n
[Figure: proportion of successful replications against the p-value for the original sample (log scale, 1e-05 to 1.0), for n = 10, 20, 40, 80. Each dot is one study: r is random, n as indicated.]
Mean replication rate = 40-45%.

28 Discussion 1
The only free parameter in this is the a-priori distribution of effect sizes in Psychology. The result also depends on the distribution of n in Psychology.

29 Discussion 1
If you accept this, then it necessarily follows:
if everyone is behaving impeccably, then p(replication) is 40-45% in psychology
or
if p(replication) is 40-45% in psychology, then everyone is behaving impeccably
What about other a-priori distributions?

30 Overall Summary
Effects of the a-priori assumption on the outcome:

31 Overall Summary
Effects of the a-priori assumption on the outcome:
[Figure annotations: “we thought: %replication should be up here”; “actually: %replication should be down here”.]

32 Discussion 1(a)
If everyone is behaving impeccably, then p(replication) is typically 40-45%. This is inescapable.

33 The End
Very much inspired by: Cumming, G. (2011), Understanding the New Statistics.

34 Links

35 Start with a population that generates a power of 42%, to match the typical value for Psychology.
Only the green studies are published.

36 This is the distribution of all expected effect sizes.
All measured effect sizes: mean(r) = 0.3.

37 But this is the distribution of published effect sizes
(i.e. those where p < 0.05): mean(r) = 0.45.

38 Now we ask what power is calculated from the published sample effect size of 0.45.
The answer is 80%, compared with the actual power of 42%.

39 Power Calculations
Actual power = 42%.
Power calculated from published effect sizes = 80%.
This difference arises because published effect sizes are over-estimates caused by publication bias.
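A sketch of where those two numbers come from. Bivariate-normal data, the Fisher-z power formula, and n = 35 are my assumptions; n was picked so the true power lands near the quoted 42%:

```python
# Run many studies at ~42% true power, keep the significant ones, and see
# what power the published (inflated) effect sizes would suggest.
import numpy as np
from scipy.stats import norm, pearsonr

def power(rho, n, alpha=0.05):
    # Two-tailed Pearson-test power via the Fisher z approximation
    # (opposite-tail rejections are negligible here and ignored).
    z = np.arctanh(rho) * np.sqrt(n - 3)
    return norm.sf(norm.isf(alpha / 2) - z)

rng = np.random.default_rng(5)
RHO, N = 0.3, 35                           # N chosen so power is ~0.42

all_r, pub_r = [], []
cov = [[1.0, RHO], [RHO, 1.0]]
for _ in range(20_000):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    r, p = pearsonr(x, y)
    all_r.append(r)
    if p < 0.05:
        pub_r.append(r)                    # "published": significant only

print(f"actual power:          {power(RHO, N):.0%}")             # ~42%
print(f"mean r (all studies):  {np.mean(all_r):.2f}")            # ~0.30
print(f"mean r (published):    {np.mean(pub_r):.2f}")            # ~0.45
print(f"power implied by it:   {power(np.mean(pub_r), N):.0%}")  # ~80%
```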

40 This graph shows how much power is over-estimated.
[Figure: Apparent (Sample) Power against Real (Population) Power, 0.2 to 1.]

41 (return to Slide 32)

42 What can we do?
Anything that increases the Type II error rate will make matters worse, e.g. reducing alpha (0.05).
Anything that decreases the Type II error rate might make matters better**, e.g. adaptive sampling.
** except that 42% power may be maximally reinforcing for the researcher

43

44 (return to Slide 32)

45 Null hypothesis testing: the outcomes

            Population
            no effect       has effect
p < 0.05    Type I error    correct
p > 0.05    correct         Type II error

A pair of studies with different outcomes occupy the same column, but it cannot be known which column. So a failure to replicate implies either:
the 1st study made a Type I error & the 2nd study is correct, or
the 1st study is correct & the 2nd study made a Type II error.
It cannot be known which.

46 We get: p(Type I error): given the null hypothesis is true, the probability of obtaining your result or better.
We want: p(Type II error): given the null hypothesis is not true.

47 Think about: given it is a weekday, what is the probability of dying in hospital?
Compared with: given it is not a weekday, what is the probability of dying in hospital?
These are unrelated.

48 (return to Slide 32)

49 The effects of publication bias on the effect sizes that are seen.
[Figure legend: red n = 10; green n = 20; blue n = 40; yellow n = 80.]

50 (return to Slide 32)

51 Finding the best-fit value for the prior effect size.
Minimum chi-square at 0.28.
Frequency of effect sizes: in the Science paper; in the whole simulated population (with effect size 0.28); in the simulated published studies.
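A skeleton of that search, demonstrated on synthetic data since the Science paper's frequencies are not reproduced here. It treats the prior as a single effect-size value, and the bin edges, n, and study counts are all my placeholders:

```python
# Grid-search sketch: simulate the published effect-size distribution for
# each candidate prior effect size and keep the minimum-chi-square fit.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(6)
BINS = np.linspace(0.0, 1.0, 11)           # placeholder bin edges

def published_hist(ef, n=30, studies=3000):
    """Histogram of |sample r| over studies that reach p < 0.05."""
    kept = []
    for _ in range(studies):
        cov = [[1.0, ef], [ef, 1.0]]
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        r, p = pearsonr(x, y)
        if p < 0.05:
            kept.append(abs(r))
    hist, _ = np.histogram(kept, bins=BINS)
    return hist

observed = published_hist(0.28)            # synthetic stand-in for real data

best_ef, best_chi = None, np.inf
for ef in np.arange(0.10, 0.50, 0.02):
    expected = published_hist(ef).astype(float)
    expected *= observed.sum() / max(expected.sum(), 1.0)
    chi = np.sum((observed - expected) ** 2 / (expected + 1.0))  # crude: +1
    if chi < best_chi:
        best_ef, best_chi = ef, chi

print(f"minimum chi-square at ef = {best_ef:.2f}")  # the talk's fit: 0.28
```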

