R.J. Watt & D.I. Donaldson, University of Stirling

Presentation transcript:

High Expectations. R.J. Watt & D.I. Donaldson, University of Stirling

A typical population: we start with a simulated population of known effect size, r = 0.3.

A typical design: we sample from the population, 42 participants, random sampling.

A random sample: n = 42, sample r = 0.34 (population r = 0.3); p = 0.03 (2-tailed), so the null hypothesis is rejected.

Another random sample: n = 42, sample r = 0.30 (population r = 0.3); p = 0.055 (2-tailed), so the null hypothesis is not rejected. This is a miss (because we know that the population has an effect).

Step 1: how common are misses? [Histogram of p values across repeated samples, with reference lines at p = 0.05, 0.01 and 0.001.] NB: 10% of samples give p < 0.0016 and 10% give p > 0.47. In this design, 50% of samples generate a miss.
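
Where the 50% figure comes from can be checked with a short simulation. The sketch below is a reconstruction, not the authors' code: it assumes a bivariate-normal population with r = 0.3 and repeated samples of n = 42 tested two-tailed at the 0.05 level.

```python
# Minimal sketch (not the authors' code): estimate how often a sample of
# n=42 from a bivariate-normal population with r=0.3 misses significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, r_pop, trials = 42, 0.3, 10_000
cov = [[1.0, r_pop], [r_pop, 1.0]]

misses = 0
for _ in range(trials):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    _, p = stats.pearsonr(x, y)          # two-tailed p value
    misses += p >= 0.05
print(f"miss rate ~ {misses / trials:.2f}")   # close to 0.5 for this design
```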

Step 1: miss rates are high. In this design, 50% of samples generate a miss. Two implications: (1) the effect will be found by someone and reported; (2) the effect has a 50% chance of a subsequent "failure to replicate". NB: this is a privileged viewpoint, since we know that the effect exists.

Back to our 2 samples

[Scatter plot: p value for the replicate alone against p value for the combined data; the original study is marked.]

[The same axes, with regions labelled: replications individually significant; replications individually fail but the combined data are significant; both the replication and the combination fail.]

The two samples we started with. [Same axes: original study and replication marked.]

The two samples we started with. [Same axes: original study and replication marked.] The replication failed, yet the combined data are more significant than the original alone.
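
As a sketch of what "combined data" is taken to mean here (my reading of the slides, not confirmed code): pool the raw scores from the original and the replication and recompute the correlation on the pooled sample.

```python
# Pool two correlation studies and recompute r and its p value on the
# combined raw data. Illustrative sketch only; the study parameters below
# are stand-ins for the original (r=0.34) and the replication (r=0.30).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def draw(n, r):
    """One simulated study: n observations from a bivariate-normal population."""
    cov = [[1.0, r], [r, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n).T

def combined_test(x1, y1, x2, y2):
    """Pearson r and two-tailed p for the pooled data of two studies."""
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    return stats.pearsonr(x, y)

x1, y1 = draw(42, 0.34)              # stand-in for the original study
x2, y2 = draw(42, 0.30)              # stand-in for the replication
print(stats.pearsonr(x1, y1))        # original alone
print(combined_test(x1, y1, x2, y2)) # pooled original + replication
```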

Many further attempts at replication. [Same axes: original study plus many replications.] Failure to replicate ≈ 51%.

Step 1 summary (known population): for our sample with n = 42, r = 0.34 (p = 0.04), the probability of a failure to replicate is 50%. What if we don't know the population it came from?

Step 2: unknown population. Same original sample, but now suppose the population is unknown. Constraints on the population: (1) the distribution of effect sizes that could produce the current sample; (2) the a priori distribution of effect sizes.

Constraint 1: the density function for the populations that our sample could have arisen from.
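
One standard way to approximate this density is the Fisher z transform, under which the observed z = atanh(r) is roughly normal with standard error 1/sqrt(n-3). The slides do not say how it was actually computed, so the sketch below is an assumed reconstruction for the observed sample (r = 0.34, n = 42).

```python
# Approximate density over candidate population effect sizes, given the
# observed sample r=0.34 with n=42, using the Fisher z approximation.
import numpy as np
from scipy import stats

r_obs, n = 0.34, 42
r_grid = np.linspace(-0.99, 0.99, 397)
se = 1.0 / np.sqrt(n - 3)
density = stats.norm.pdf(np.arctanh(r_obs), loc=np.arctanh(r_grid), scale=se)
density /= np.trapz(density, r_grid)     # normalise over the grid
print(r_grid[np.argmax(density)])        # peaks near the observed r
```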

Constraint 2: the simulated density function for effect sizes in Psychology.

Constraint 2: the simulated density function for effect sizes in Psychology. [Plot annotations: "we use this"; "we then see this".]

Constraint 2: the simulated density function for effect sizes in Psychology matches the Science paper.

Step 2: unknown population. We draw replication samples from anywhere in this distribution of possible source populations and ask how often they are significant.

[Same axes: original study plus replications drawn from the possible source populations.] Failure to replicate ≈ 60%.

Step 2 summary (unknown population): for our sample with n = 42, r = 0.34 (p = 0.04), given an a priori distribution of population effect sizes, the probability of a failure to replicate is 60%. What about other combinations of r and n?

Final step: other r & n. r: we already have an a priori Psychology distribution.

Final step: other r & n. r: we already have an a priori distribution. n: from 90 years of journal articles, the median n is 30 (JEP) and 25 (QJEP).

The simulation process: (1) more than 1000 original studies, each with r chosen at random from the a priori population and n chosen at random from [10, 20, 40, 80]; only those with p < 0.05 are kept as "published". (2) For each published study, 1000 exact replications, with r chosen at random from all possible source populations and n kept as in the original. (3) Count the proportion of replications with p < 0.05 (% success).
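
A compact sketch of that loop, reconstructed from the slide text rather than taken from the authors' code. The a priori effect-size distribution (a half-normal with scale 0.28), the restriction to positive effect sizes, and the reduced study counts are placeholder assumptions made here for brevity.

```python
# Reconstruction of the simulation loop (assumptions: half-normal a-priori
# effect-size distribution, positive effect sizes only, reduced counts).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r_grid = np.linspace(0.0, 0.95, 96)
prior = stats.halfnorm.pdf(r_grid, scale=0.28)     # placeholder a-priori distribution
prior /= prior.sum()

def run_study(n, r_pop):
    """One simulated study: sample r and two-tailed p for n participants."""
    cov = [[1.0, r_pop], [r_pop, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return stats.pearsonr(x, y)

published, successes, total_reps = 0, 0, 0
while published < 200:                             # ">1000 original studies" on the slide
    r_pop = rng.choice(r_grid, p=prior)
    n = rng.choice([10, 20, 40, 80])
    r_obs, p = run_study(n, r_pop)
    if p >= 0.05:
        continue                                   # only significant originals are "published"
    published += 1
    # Possible source populations for the replication: prior times the
    # likelihood of the observed r (Fisher z approximation).
    lik = stats.norm.pdf(np.arctanh(r_obs), np.arctanh(r_grid), 1.0 / np.sqrt(n - 3))
    post = prior * lik
    post /= post.sum()
    for _ in range(100):                           # "1000 exact replications" on the slide
        r_src = rng.choice(r_grid, p=post)
        _, p_rep = run_study(n, r_src)             # n kept as in the original
        successes += p_rep < 0.05
        total_reps += 1

print(f"replication rate ~ {successes / total_reps:.2f}")
```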

Final step: other r & n. [Scatter plot: p value for the original sample (x axis, 1e-05 to 1.0) against the proportion of successful replications (y axis, 0.2 to 1); each dot is one study, with r random and n = 10, 20, 40 or 80 as indicated.] Mean replication rate = 40-45%.

Discussion 1: the only free parameter in this is the a priori distribution of effect sizes in Psychology. The result also depends on the distribution of n in Psychology.

Discussion 1: if you accept this, then it necessarily follows that: if everyone is behaving impeccably, then p(replication) is 40-45% in psychology; or, conversely, if p(replication) is 40-45% in psychology, then everyone is behaving impeccably. What about other a priori distributions?

Overall summary: effects of the a priori assumption on the outcome.

Overall summary: effects of the a priori assumption on the outcome. [Plot annotations: we thought %replication should be up here; actually %replication should be down here.]

Discussion 1(a): if everyone is behaving impeccably, then p(replication) is typically 40-45%. This is inescapable.

The End. Very much inspired by: Cumming, G. (2011), Understanding the New Statistics.

Start with a population that generates a power of 42%, to match the typical value for Psychology. Only the green studies are published.

This is the distribution of all expected effect sizes. [Histogram: all measured effect sizes, mean(r) = 0.3.]

But this is the distribution of published effect sizes (i.e. those where p < 0.05): mean(r) = 0.45.

Now we ask what power is calculated from the published sample effect size of 0.45. The answer is 80%, compared with the actual power of 42%.

Power calculations: actual power = 42%; power calculated from published effect sizes = 80%. This difference arises because published effect sizes are over-estimates caused by publication bias.
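
For reference, power for a two-tailed test of a correlation can be approximated with the Fisher z transform. The sketch below uses a hypothetical n = 30 (the median JEP n from the earlier slide), so its numbers illustrate the inflation but will not reproduce the 42% and 80% figures exactly.

```python
# Approximate power of a two-tailed test of H0: r = 0, via Fisher's z.
import numpy as np
from scipy import stats

def correlation_power(r, n, alpha=0.05):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    delta = np.arctanh(r) * np.sqrt(n - 3)       # effect size in standard-error units
    return stats.norm.sf(z_crit - delta) + stats.norm.cdf(-z_crit - delta)

n = 30                                           # hypothetical sample size
print(correlation_power(0.30, n))                # power at the true effect size
print(correlation_power(0.45, n))                # apparent power at the inflated published effect size
```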

This graph shows how much power is over-estimated. [Plot: Real (Population) Power on the x axis against Apparent (Sample) Power on the y axis, 0.2 to 1.]

What can we do? Anything that increases the Type II error rate will make matters worse, e.g. reducing alpha (0.05). Anything that decreases the Type II error rate might make matters better**, e.g. adaptive sampling. ** Except that 42% power may be maximally reinforcing for the researcher.

Null hypothesis testing: the outcomes. If the population has no effect: p < 0.05 is a Type I error, p > 0.05 is correct. If the population has an effect: p < 0.05 is correct, p > 0.05 is a Type II error. A pair of studies with different outcomes occupy the same column, and it cannot be known which column. So a failure to replicate implies either: the 1st study made a Type I error and the 2nd study is correct; or the 1st study is correct and the 2nd study made a Type II error. It cannot be known which.

We get: p(Type I error), i.e. given that the null hypothesis is true, the probability of obtaining your result or better. We want: p(Type II error), i.e. given that the null hypothesis is not true.

Think about: given that it is a weekday, the probability of dying in hospital; compared with: given that it is not a weekday, what is the probability of dying in hospital? These two are unrelated.

The effects of publication bias on the effect sizes that are seen. [Colours: red n = 10; green n = 20; blue n = 40; yellow n = 80.]

Finding the best-fit value for the prior effect size: minimum chi-square at 0.28. [Plot: frequency of effect sizes in the Science paper, in the whole simulated population (with effect size 0.28), and in simulated published studies.]
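
A sketch of how such a minimum-chi-square search could be set up. The observed bin counts below are uniform placeholders (the real ones come from the Science paper data, which are not reproduced here), and the simulation settings are simplified, so this only illustrates the shape of the procedure rather than reproducing the 0.28 result.

```python
# Illustrative minimum-chi-square search for the prior effect-size scale.
# `observed` is a placeholder histogram, NOT the real Science-paper counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
bins = np.linspace(0.0, 1.0, 11)
observed = np.full(10, 10.0)                     # placeholder bin counts

def published_effect_sizes(scale, n=40, studies=1000):
    """Absolute sample r of studies reaching p < 0.05, for a half-normal prior."""
    kept = []
    for _ in range(studies):
        r_pop = min(abs(rng.normal(0.0, scale)), 0.95)
        cov = [[1.0, r_pop], [r_pop, 1.0]]
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        r_obs, p = stats.pearsonr(x, y)
        if p < 0.05:
            kept.append(abs(r_obs))
    return np.array(kept)

def chi_square(scale):
    sim = published_effect_sizes(scale)
    expected = np.histogram(sim, bins)[0] / len(sim) * observed.sum()
    return np.sum((observed - expected) ** 2 / np.maximum(expected, 1e-9))

scales = np.linspace(0.1, 0.5, 9)
best = min(scales, key=chi_square)
print(f"best-fit prior scale ~ {best:.2f}")      # the slides report a minimum at 0.28
```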