1 Hypothesis Testing and Confidence Intervals (Part 2): Cohen's d, Logic of Testing, and Confidence Intervals
Lecture 9, Justin Kern, April 9, 2018

2 Measuring Effect Size: Cohen’s d
Simply finding that a hypothesis test is significant tells us very little. It tells us only whether an effect exists in a population; it does not tell us how much of an effect there actually is! Definition: Effect size is a statistical measure of the magnitude of an effect (how far the sample statistic is from the population parameter) in a population. It allows researchers to describe how far scores have shifted in the population, or the percent of variability in scores that can be explained by a given variable.

3 Measuring Effect Size: Cohen’s d
One common measure of effect size is Cohen's d:

$d = \dfrac{\bar{x} - \mu}{\sigma}$

The direction of the shift is determined by the sign of d: $d < 0$ means the scores are shifted to the left of $\mu$, and $d > 0$ means the scores are shifted to the right of $\mu$. Larger absolute values of d mean there is a larger effect. There are standard conventions for describing effect size, but take these with a grain of salt:
Small effect: $|d| < 0.2$
Medium effect: $0.2 < |d| < 0.8$
Large effect: $|d| > 0.8$
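As an illustration (not part of the original slides), here is a minimal Python sketch of this formula together with the conventional labels above; the function names are our own:

```python
def cohens_d(sample_mean, pop_mean, pop_sd):
    """Cohen's d for a one-sample comparison: (x-bar - mu) / sigma."""
    return (sample_mean - pop_mean) / pop_sd

def effect_label(d):
    """Map |d| to the rough conventional size labels."""
    size = abs(d)
    if size < 0.2:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

d = cohens_d(82, 80, 10)
print(d, effect_label(d))  # 0.2 small
```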

4 Example
A developmental psychologist claims that a training program he developed according to a theory should improve problem-solving ability. For a population of 7-year-olds, the mean score $\mu$ on a standard problem-solving test is known to be 80, with a standard deviation of 10. To test the training program, 26 7-year-olds are selected at random, and their mean score is found to be 82. Assume the population of scores is normally distributed, and assume the standard deviation of scores after the training program is also 10. Can we conclude, at an $\alpha = .05$ level of significance, that the program works?

We hypothesize that the training improves problem-solving. What are the null and alternative hypotheses? $H_0: \mu = 80$ vs. $H_1: \mu > 80$.

The data are normally distributed, so $\bar{X} \sim N(\mu, \sigma^2/n)$. A z-statistic can then be formed:

$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{82 - 80}{10/\sqrt{26}} \approx 1.0198$

Critical value method: $\alpha = .05 \Rightarrow z_\alpha = 1.645$. Since $z = 1.0198 < 1.645 = z_\alpha$, we cannot reject $H_0$. Substantively, this means that a mean score of 82 is likely to occur by chance, so we cannot say that the training program improved problem-solving skills in 7-year-olds.

What is the effect size as measured by Cohen's d?

$d = \dfrac{\bar{x} - \mu}{\sigma} = \dfrac{82 - 80}{10} = 0.20$, a small effect size.

(Presenter note: show on the board the duality of using p-values and using the critical value method.)
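This example can be checked numerically. A sketch in Python (assuming scipy is available; variable names are ours) that reproduces the z-statistic, the critical value, and Cohen's d, and also illustrates the p-value/critical-value duality mentioned in the note:

```python
import math
from scipy.stats import norm

mu0, sigma, n, x_bar, alpha = 80, 10, 26, 82, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))  # ~ 1.0198
z_crit = norm.ppf(1 - alpha)                # ~ 1.645 (one-sided test)
p_value = norm.sf(z)                        # ~ 0.154, P(Z > z) under H0
d = (x_bar - mu0) / sigma                   # 0.20, a small effect

# Duality: z < z_crit exactly when p_value > alpha, so both methods
# lead to the same decision (here: fail to reject H0).
print(f"z = {z:.4f}, z_crit = {z_crit:.3f}, p = {p_value:.4f}, d = {d:.2f}")
```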

5 Basic Hypothesis Testing
Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be $\mu = 5000$ and the standard deviation is known to be $\sigma = 1000$. Suppose the game developers have just created a one-week tutorial to help players increase their performance. To find out whether it actually works, they administer the tutorial to a random sample of $n = 100$ players, whose average score after one week is $\bar{x} = 5200$. Does the tutorial actually work, or did those players happen to get an average score as high as 5200 just by chance?

$H_0: \mu = 5000$ vs. $H_1: \mu > 5000$

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{5200 - 5000}{1000/\sqrt{100}} = 2$

p-value $= P(Z > z) = P(Z > 2) = 0.0228$

Interpretation of the p-value: given that the tutorial actually doesn't work ($H_0$), there is only a 2.28% chance that a random sample of players gets an average score as high as 5200 (or higher).

How low should the p-value be before we are convinced that the tutorial does work (reject $H_0$ and accept $H_1$)? This choice is somewhat arbitrary, but common standards are $\alpha = 0.05$ or $\alpha = 0.01$. Note that $\alpha$ is the cutoff p-value: below it we reject $H_0$, above it we fail to reject $H_0$.
If we use $\alpha = 0.05$, then (p-value $= 0.0228$) < ($\alpha = 0.05$), so we reject $H_0$ and accept $H_1$ (the tutorial works).
If we use $\alpha = 0.01$, then (p-value $= 0.0228$) > ($\alpha = 0.01$), so we fail to reject $H_0$ (we cannot conclude that the tutorial works).
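A minimal sketch of this test in Python (assuming scipy is available), comparing the p-value against both conventional cutoffs:

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, x_bar = 5000, 1000, 100, 5200

z = (x_bar - mu0) / (sigma / sqrt(n))  # (5200 - 5000) / 100 = 2.0
p_value = norm.sf(z)                   # P(Z > 2) ~ 0.0228

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p = {p_value:.4f} -> {decision}")
```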

6 Type I Error
Let's say that in reality $H_0$ is true (i.e., $H_1$ is false: the tutorial doesn't work). Suppose we repeat the experiment over and over again (repeatedly draw random samples of 100 players and put them through the tutorial) and conduct a hypothesis test each time using $\alpha = 0.05$. We would falsely reject $H_0$ (mistakenly decide that the tutorial works) 5% of the time just due to chance. In other words, $\alpha$ is the probability of rejecting $H_0$ when in reality it is true (incorrectly deciding that the tutorial works when it actually doesn't).

Type I Error: rejecting $H_0$ when, in fact, it is true. $\alpha = P(\text{Type I Error})$. Ultimately, $\alpha$ is the probability of Type I error we are willing to live with.

The rejection cutoff on the scale of the sample mean follows from standardizing:

$z_\alpha = \dfrac{\bar{x}_\alpha - \mu}{\sigma/\sqrt{n}} \;\Rightarrow\; \bar{x}_\alpha = z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu$

With $\alpha = 0.05$: $z_\alpha = z_{0.05} = 1.645$, so $\bar{x}_\alpha = 1.645 \times \dfrac{1000}{\sqrt{100}} + 5000 = 5164.5$.

[Figure: sampling distribution under $H_0$, with a rejection region of area $\alpha = 0.05$ to the right of $\bar{x}_\alpha = 5164.5$.]
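A short sketch of this cutoff calculation in Python (variable names are ours):

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 5000, 1000, 100, 0.05

z_alpha = norm.ppf(1 - alpha)                  # ~ 1.645
x_bar_alpha = z_alpha * sigma / sqrt(n) + mu0  # ~ 5164.5

print(f"reject H0 whenever the sample mean exceeds {x_bar_alpha:.1f}")
```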

7 Type II Error
Let's say that in reality $H_0$ is false (i.e., $H_1$ is true: the tutorial does work). Under the false premise of $H_0$, the population mean of players who take the tutorial is $\mu_0 = 5000$ (tutorial doesn't work), and the standard deviation of the sampling distribution is $\sigma/\sqrt{n} = 1000/\sqrt{100} = 100$. Suppose that, in fact, the tutorial actually increases a player's score by 300 points on average. Under the true premise of $H_1$, the population mean of players who take the tutorial is $\mu_1 = 5300$ (the tutorial increases the mean score by 300), but assume the standard error stays the same at $\sigma/\sqrt{n} = 100$.

[Figure: two sampling distributions, one under $H_0$ centered at $\mu_0 = 5000$ and one under $H_1$ centered at $\mu_1 = 5300$, each with $\sigma/\sqrt{n} = 100$.]

8 Type II Error
Type II Error: failing to reject $H_0$ when, in fact, it is false (incorrectly deciding the tutorial doesn't work when it actually does). $\beta = P(\text{Type II Error})$.

The rejection cutoff is the same one derived under $H_0$:

$\bar{x}_\alpha = z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu_0 = 1.645 \times 100 + 5000 = 5164.5$

Standardizing that cutoff against the true ($H_1$) distribution:

$z_\beta = \dfrac{\bar{x}_\alpha - \mu_1}{\sigma/\sqrt{n}} = \dfrac{5164.5 - 5300}{100} = -1.355$

$\beta = P(Z < z_\beta) = P(Z < -1.355) = 0.0877$

[Figure: the two sampling distributions again; $\beta = 0.0877$ is the area under the $H_1$ curve to the left of $\bar{x}_\alpha = 5164.5$, and $\alpha = 0.05$ is the area under the $H_0$ curve to its right.]
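The same calculation as a Python sketch (assuming scipy; variable names are ours):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 5000, 5300, 1000, 100, 0.05
se = sigma / sqrt(n)                          # 100

x_bar_alpha = norm.ppf(1 - alpha) * se + mu0  # ~ 5164.5, cutoff from the H0 test
z_beta = (x_bar_alpha - mu1) / se             # ~ -1.355
beta = norm.cdf(z_beta)                       # P(Z < z_beta) ~ 0.0877

print(f"beta = {beta:.4f}")
```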

9 Power
Power: the probability of rejecting $H_0$ when, in fact, it is false (correctly deciding the tutorial works when it actually does). Power is the complement of $\beta$:

Power $= 1 - \beta = 1 - 0.0877 = 0.9123$

[Figure: power $= 0.9123$ is the area under the $H_1$ curve to the right of $\bar{x}_\alpha = 5164.5$.]
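Putting the whole chain together, a self-contained Python sketch that computes power both as $1 - \beta$ and directly as the area under the $H_1$ curve beyond the cutoff (the two agree, mirroring the figure):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 5000, 5300, 1000, 100, 0.05
se = sigma / sqrt(n)

x_bar_alpha = norm.ppf(1 - alpha) * se + mu0  # ~ 5164.5
beta = norm.cdf((x_bar_alpha - mu1) / se)     # ~ 0.0877
power = 1 - beta                              # ~ 0.9123

# Equivalently: power is the H1-probability of landing in the rejection region.
assert abs(power - norm.sf((x_bar_alpha - mu1) / se)) < 1e-12
print(f"power = {power:.4f}")
```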

