Hypothesis Testing and Confidence Intervals (Part 2): Cohen’s d, Logic of Testing, and Confidence Intervals Lecture 9 Justin Kern October 17 and 19, 2017
Measuring Effect Size: Cohen’s d
Simply finding whether a hypothesis test is significant or not tells us very little. It only tells us whether or not an effect exists in a population. It does not tell us how large the effect actually is!
Definition: Effect size is a statistical measure of the size of an effect in a population: how far a sample statistic is from a population parameter. This allows researchers to describe how far scores have shifted in the population, or the percent of variability in scores that can be explained by a given variable.
Measuring Effect Size: Cohen’s d
One common measure of effect size is Cohen’s d:
$d = \dfrac{\bar{x} - \mu}{\sigma}$
The direction of the shift is determined by the sign of d.
$d < 0$ means that the scores are shifted to the left of $\mu$.
$d > 0$ means that the scores are shifted to the right of $\mu$.
Larger absolute values of d mean that there is a larger effect.
There are standard conventions for describing effect size, but take these with a grain of salt.
Small effect: $|d| < 0.2$
Medium effect: $0.2 < |d| < 0.8$
Large effect: $|d| > 0.8$
Example
A developmental psychologist claims that a training program he developed according to a theory should improve problem-solving ability. For a population of 7-year-olds, the mean score $\mu$ on a standard problem-solving test is known to be 80 with a standard deviation of 10. To test the training program, 26 7-year-olds are selected at random, and their mean score is found to be 82. Let’s assume the population of scores is normally distributed. Can we conclude, at an $\alpha = .05$ level of significance, that the program works? Assume the sd of scores after the training program is also 10.
We hypothesize that the training program improves problem-solving. What are the null and alternative hypotheses?
$H_0: \mu = 80$ vs. $H_1: \mu > 80$
The data are normally distributed, so $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
A z-statistic can then be formed:
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{82 - 80}{10/\sqrt{26}} \approx 1.0198$
Critical value method: $\alpha = .05 \rightarrow z_\alpha = 1.644854$
Since $z = 1.0198 < 1.644854 = z_\alpha$, we cannot reject $H_0$.
Substantively, this means that a sample mean of 82 could plausibly occur by chance when the true mean is 80, so we cannot say that the training program improved problem-solving skills in 7-year-olds.
What is the effect size as measured by Cohen’s d?
$d = \dfrac{\bar{x} - \mu}{\sigma} = \dfrac{82 - 80}{10} = 0.20$
Small effect size
Show on the board the duality of using p-values and using the critical value method.
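The arithmetic in this example can be checked with a short script. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function names `one_sample_z_upper` and `cohens_d` are illustrative and not part of the lecture. It also shows the duality mentioned above: comparing z to the critical value and comparing the p-value to $\alpha$ lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_upper(xbar, mu0, sigma, n, alpha):
    """Upper-tailed one-sample z-test with known sigma (H1: mu > mu0)."""
    z = (xbar - mu0) / (sigma / sqrt(n))          # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)      # critical value z_alpha
    p_value = 1 - NormalDist().cdf(z)             # P(Z > z) under H0
    return z, z_crit, p_value


def cohens_d(xbar, mu, sigma):
    """Cohen's d: shift of the sample mean in population-sd units."""
    return (xbar - mu) / sigma


z, z_crit, p_value = one_sample_z_upper(xbar=82, mu0=80, sigma=10, n=26, alpha=0.05)
d = cohens_d(xbar=82, mu=80, sigma=10)

print(f"z = {z:.4f}, critical value = {z_crit:.4f}, p-value = {p_value:.4f}")
print(f"reject H0 by critical value: {z > z_crit}")
print(f"reject H0 by p-value:        {p_value < 0.05}")
print(f"Cohen's d = {d:.2f}")
```

Both decision rules agree here (neither rejects $H_0$), which is the point of the board demonstration.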
Basic Hypothesis Testing
Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be $\mu = 5000$ and the standard deviation is known to be $\sigma = 1000$.
Suppose the game developers have just created a one-week tutorial to help players increase their performance. In order to find out if it actually works, they administer the tutorial to a random sample of $n = 100$ players, whose average score is calculated to be $\bar{x} = 5200$ after one week.
Does the tutorial actually work, or did those players happen to get an average score as high as 5200 just by chance?
$H_0: \mu = 5000$
$H_1: \mu > 5000$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{5200 - 5000}{1000/\sqrt{100}} = 2$
p-value $= P(Z > z) = P(Z > 2) = 0.0228$
Interpretation of p-value: Given that the tutorial actually doesn’t work ($H_0$), there is only a 2.28% chance that a random sample of 100 players gets an average score as high as 5200 (or higher).
How low should the p-value be before we are convinced that the tutorial does work (reject $H_0$ and accept $H_1$)?
This consideration is somewhat arbitrary, but common standards are $\alpha = 0.05$ or $\alpha = 0.01$.
Note that $\alpha$ is the cutoff p-value (below which we reject $H_0$, above which we fail to reject $H_0$).
If we use $\alpha = 0.05$, then (p-value $= 0.0228$) < ($\alpha = 0.05$), so reject $H_0$ and accept $H_1$ (tutorial does work).
If we use $\alpha = 0.01$, then (p-value $= 0.0228$) > ($\alpha = 0.01$), so fail to reject $H_0$ (not enough evidence to conclude that the tutorial works).
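The p-value and the two decisions can be reproduced in a few lines. This is a sketch using the standard library only; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the tutorial example
mu0, sigma, n, xbar = 5000, 1000, 100, 5200

z = (xbar - mu0) / (sigma / sqrt(n))     # observed z-statistic
p_value = 1 - NormalDist().cdf(z)        # upper-tail probability P(Z > z)

# Decision at each significance level: reject H0 when p-value < alpha
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01)}

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject H0' if reject else 'fail to reject H0'}")
```

The same data lead to different decisions at the two levels, which is why $\alpha$ should be chosen before looking at the results.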
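Type I Error
Let’s say that in reality $H_0$ is true (i.e., $H_1$ is false, or the tutorial doesn’t work).
Suppose we repeat the experiment over and over again (repeatedly draw random samples of 100 players and put them through the tutorial) and conduct a hypothesis test each time using $\alpha = 0.05$.
We would falsely reject $H_0$ (mistakenly decide that the tutorial works) 5% of the time just due to chance.
In other words, $\alpha$ is the probability of rejecting $H_0$ when in reality it is true (incorrectly deciding that the tutorial works when it actually doesn’t).
Type I Error: rejecting $H_0$ when, in fact, it is true
$\alpha = P(\text{Type I Error})$
Ultimately, $\alpha$ is the probability of Type I Error we are willing to live with.
The critical value $z_\alpha$ can be converted into the smallest sample mean $\bar{x}_\alpha$ that leads to rejection:
$z_\alpha = \dfrac{\bar{x}_\alpha - \mu}{\sigma/\sqrt{n}} \rightarrow \bar{x}_\alpha = z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu$
$\alpha = 0.05 \rightarrow z_\alpha = z_{0.05} = 1.645 \rightarrow 1.645 = \dfrac{\bar{x}_{0.05} - 5000}{1000/\sqrt{100}} \rightarrow \bar{x}_{0.05} = 1.645 \dfrac{1000}{\sqrt{100}} + 5000 = 5164.5$
[Figure: sampling distribution of the mean under $H_0$, with area $\alpha = 0.05$ in the upper tail beyond $\bar{x}_{0.05} = 5164.5$]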
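The "repeat the experiment over and over" idea can be imitated by simulation. This is a rough sketch under one simplifying assumption: each repetition draws the sample mean directly from its sampling distribution $N(\mu_0, \sigma/\sqrt{n})$ rather than simulating 100 individual players. The function name, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type1_rate(mu0=5000, sigma=1000, n=100, alpha=0.05,
                        reps=20000, seed=0):
    """Estimate how often an upper-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)                                   # standard error
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # x-bar_alpha
    # Assumption: sample means are drawn straight from N(mu0, se)
    rejections = sum(rng.gauss(mu0, se) > cutoff for _ in range(reps))
    return cutoff, rejections / reps


cutoff, rate = simulate_type1_rate()
print(f"cutoff sample mean = {cutoff:.1f}")
print(f"simulated Type I error rate = {rate:.3f}")
```

The simulated rejection rate should land close to the nominal 5%, with some run-to-run noise from the finite number of repetitions.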
Type II Error
Let’s say that in reality $H_0$ is false (i.e., $H_1$ is true, or the tutorial does work).
Under the false premise of $H_0$, the population mean of players who take the tutorial is $\mu_0 = 5000$ (tutorial doesn’t work) and the standard deviation of the sampling distribution is $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1000}{\sqrt{100}} = 100$.
Suppose that, in fact, the tutorial actually increases a player’s score by 300 points on average.
Under the true premise of $H_1$, the population mean of players who take the tutorial is $\mu_1 = 5300$ (tutorial increases mean score by 300), but assume the standard error stays the same at $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1000}{\sqrt{100}} = 100$.
[Figure: sampling distributions of the mean under $H_0$ ($\mu_0 = 5000$) and $H_1$ ($\mu_1 = 5300$), each with standard error $\sigma/\sqrt{n} = 100$]
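Type II Error
Type II Error: failing to reject $H_0$ when, in fact, it is false (incorrectly deciding the tutorial doesn’t work when it actually does)
$\beta = P(\text{Type II Error})$
$\bar{x}_\alpha = z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu_0 \rightarrow \bar{x}_{0.05} = 1.645(100) + 5000 = 5164.5$
$z_\beta = \dfrac{\bar{x}_\alpha - \mu_1}{\sigma/\sqrt{n}} = \dfrac{5164.5 - 5300}{100} = -1.355$
$\beta = P(Z < z_\beta) = P(Z < -1.355) = 0.0877$
[Figure: sampling distributions under $H_0$ ($\mu_0 = 5000$) and $H_1$ ($\mu_1 = 5300$), each with standard error 100; $\alpha = 0.05$ is the area under $H_0$ above $\bar{x}_{0.05} = 5164.5$, and $\beta = 0.0877$ is the area under $H_1$ below it]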
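Power
Power: probability of rejecting $H_0$ when, in fact, it is false (correctly deciding the tutorial works when it actually does)
Power is the complement of $\beta$:
Power $= 1 - \beta = 1 - 0.0877 = 0.9123$
[Figure: sampling distributions under $H_0$ ($\mu_0 = 5000$) and $H_1$ ($\mu_1 = 5300$), each with standard error 100; under $H_1$, $\beta = 0.0877$ lies below $\bar{x}_{0.05} = 5164.5$ and Power $= 0.9123$ lies above it]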
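The three steps above (find the cutoff under $H_0$, evaluate it under $H_1$, take the complement) can be sketched as one function. The name `power_upper` is illustrative, and the code assumes a known $\sigma$ and an upper-tailed test, as in the tutorial example.

```python
from math import sqrt
from statistics import NormalDist


def power_upper(mu0, mu1, sigma, n, alpha):
    """Return (cutoff, beta, power) for an upper-tailed z-test (H1: mu > mu0)."""
    se = sigma / sqrt(n)                                   # standard error
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # reject H0 if x-bar > cutoff
    # beta: probability the sample mean falls below the cutoff when mu = mu1
    beta = NormalDist(mu1, se).cdf(cutoff)
    return cutoff, beta, 1 - beta


cutoff, beta, power = power_upper(mu0=5000, mu1=5300, sigma=1000, n=100, alpha=0.05)
print(f"cutoff = {cutoff:.1f}, beta = {beta:.4f}, power = {power:.4f}")
```

Because the code uses the unrounded critical value (about 1.6449 rather than 1.645), its results can differ from the slide values in the fourth decimal place.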
Another Example
Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be $\mu = 5000$ and the standard deviation is known to be $\sigma = 1000$.
Suppose the game developers have just created an updated version and want to know if it is more difficult than the original version. They plan to conduct a trial of the updated version on a random sample of $n = 100$ players and perform a hypothesis test on the sample mean using $\alpha = 0.05$.
$H_0: \mu = 5000$
$H_1: \mu < 5000$
What is the probability of a Type I error (rejecting $H_0$ when, in fact, it is true)?
$P(\text{Type I Error}) = \alpha = 0.05$
Suppose the updated version actually is more difficult, and the true mean is $\mu_1 = 4700$. What is the probability of a Type II error (failing to reject $H_0$ when, in fact, it is false)?
$\bar{x}_\alpha = -z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu_0 \rightarrow \bar{x}_{0.05} = -1.645(100) + 5000 = 4835.5$
$z_\beta = \dfrac{\bar{x}_\alpha - \mu_1}{\sigma/\sqrt{n}} = \dfrac{4835.5 - 4700}{100} = 1.355$
$\beta = P(Z > z_\beta) = P(Z > 1.355) = 0.0877$
Power $= 1 - \beta = 1 - 0.0877 = 0.9123$
[Figure: sampling distributions under $H_1$ ($\mu_1 = 4700$) and $H_0$ ($\mu_0 = 5000$), each with standard error 100; $\alpha = 0.05$ is the area under $H_0$ below $\bar{x}_{0.05} = 4835.5$; under $H_1$, Power $= 0.9123$ lies below the cutoff and $\beta = 0.0877$ above it]
Same Example, Smaller $\alpha$
Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be $\mu = 5000$ and the standard deviation is known to be $\sigma = 1000$.
Suppose the game developers have just created an updated version and want to know if it is more difficult than the original version. They plan to conduct a trial of the updated version on a random sample of $n = 100$ players and perform a hypothesis test on the sample mean using $\alpha = 0.01$.
$H_0: \mu = 5000$
$H_1: \mu < 5000$
What is the probability of a Type I error (rejecting $H_0$ when, in fact, it is true)?
$P(\text{Type I Error}) = \alpha = 0.01$
Suppose the updated version actually is more difficult, and the true mean is $\mu_1 = 4700$. What is the probability of a Type II error (failing to reject $H_0$ when, in fact, it is false)?
$\bar{x}_\alpha = -z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu_0 \rightarrow \bar{x}_{0.01} = -2.326(100) + 5000 = 4767.4$
$z_\beta = \dfrac{\bar{x}_\alpha - \mu_1}{\sigma/\sqrt{n}} = \dfrac{4767.4 - 4700}{100} = 0.674$
$\beta = P(Z > z_\beta) = P(Z > 0.674) = 0.2502$
Power $= 1 - \beta = 1 - 0.2502 = 0.7498$
[Figure: sampling distributions under $H_1$ ($\mu_1 = 4700$) and $H_0$ ($\mu_0 = 5000$), each with standard error 100; $\alpha = 0.01$ is the area under $H_0$ below $\bar{x}_{0.01} = 4767.4$; under $H_1$, Power $= 0.7498$ lies below the cutoff and $\beta = 0.2502$ above it]
Same Example, Larger n
Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be $\mu = 5000$ and the standard deviation is known to be $\sigma = 1000$.
Suppose the game developers have just created an updated version and want to know if it is more difficult than the original version. They plan to conduct a trial of the updated version on a random sample of $n = 200$ players and perform a hypothesis test on the sample mean using $\alpha = 0.01$.
$H_0: \mu = 5000$
$H_1: \mu < 5000$
What is the probability of a Type I error (rejecting $H_0$ when, in fact, it is true)?
$P(\text{Type I Error}) = \alpha = 0.01$
Suppose the updated version actually is more difficult, and the true mean is $\mu_1 = 4700$. What is the probability of a Type II error (failing to reject $H_0$ when, in fact, it is false)?
$\bar{x}_\alpha = -z_\alpha \dfrac{\sigma}{\sqrt{n}} + \mu_0 \rightarrow \bar{x}_{0.01} = -2.326 \dfrac{1000}{\sqrt{200}} + 5000 = 4835.5$
$z_\beta = \dfrac{\bar{x}_\alpha - \mu_1}{\sigma/\sqrt{n}} = \dfrac{4835.5 - 4700}{1000/\sqrt{200}} = 1.916$
$\beta = P(Z > z_\beta) = P(Z > 1.916) = 0.0277$
Power $= 1 - \beta = 1 - 0.0277 = 0.9723$
[Figure: sampling distributions under $H_1$ ($\mu_1 = 4700$) and $H_0$ ($\mu_0 = 5000$), each with standard error $\sigma/\sqrt{n} = 70.71$; $\alpha = 0.01$ is the area under $H_0$ below $\bar{x}_{0.01} = 4835.5$; under $H_1$, Power $= 0.9723$ lies below the cutoff and $\beta = 0.0277$ above it]
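The three versions of this example differ only in $\alpha$ and $n$, so they can be compared side by side. This is a minimal sketch for the lower-tailed test; `power_lower` and the `scenarios` list are illustrative names. Results are computed with unrounded critical values, so they may differ from the slide values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def power_lower(mu0, mu1, sigma, n, alpha):
    """Return (cutoff, beta, power) for a lower-tailed z-test (H1: mu < mu0)."""
    se = sigma / sqrt(n)
    cutoff = mu0 - NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if x-bar < cutoff
    # beta: probability the sample mean stays above the cutoff when mu = mu1
    beta = 1 - NormalDist(mu1, se).cdf(cutoff)
    return cutoff, beta, 1 - beta


# (alpha, n) for the three versions of the example
scenarios = [(0.05, 100), (0.01, 100), (0.01, 200)]
results = {}
for alpha, n in scenarios:
    cutoff, beta, power = power_lower(5000, 4700, 1000, n, alpha)
    results[(alpha, n)] = (cutoff, beta, power)
    print(f"alpha={alpha:<5} n={n:<4} cutoff={cutoff:7.1f} "
          f"beta={beta:.4f} power={power:.4f}")
```

Lowering $\alpha$ from 0.05 to 0.01 at the same $n$ raises $\beta$, and doubling $n$ brings $\beta$ back down while keeping $\alpha$ at 0.01.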
Summary
Practical Implications
Type I Error, Type II Error, and Power are crucial considerations in any hypothesis testing study.
For a more consequential example, consider a new medical treatment developed to cure a life-threatening disease. The treatment is tested on a random sample of ill patients.
Type I Error: Deciding that the treatment works when, in fact, it doesn’t. This would be a dangerous mistake to make, because patients would be given this ineffective treatment.
Type II Error: Deciding that the treatment does not work when, in fact, it does. This would be a tragic mistake, since this life-saving treatment would not be given to patients.
Power: Probability of deciding that the treatment works when, in fact, it does. This would be the correct decision that we want to make.
Ideally, we want to minimize both Type I error and Type II error (and maximize power). However, as explained earlier, for a fixed sample size the probabilities of Type I error and Type II error are inversely related: as one goes down, the other goes up.
Therefore, a compromise must be made in choosing a low enough probability of Type I error ($\alpha$) while keeping the probability of Type II error ($\beta$) in check.
For any given study, the sample size should be large enough to have adequate power (or low $\beta$). If the sample size is too small, we would most likely not be able to detect a significant effect even if it does exist in reality.
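To illustrate what "large enough" can mean, the sketch below searches for the smallest sample size that reaches a target power in the earlier tutorial example ($\mu_0 = 5000$, $\mu_1 = 5300$, $\sigma = 1000$, $\alpha = 0.05$). The target power of 0.80 is a hypothetical choice made for this illustration and does not come from the lecture; `power_upper` and `smallest_n` are likewise illustrative names.

```python
from math import sqrt
from statistics import NormalDist


def power_upper(mu0, mu1, sigma, n, alpha):
    """Power of an upper-tailed z-test (H1: mu > mu0) when the true mean is mu1."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu1, se).cdf(cutoff)


def smallest_n(mu0, mu1, sigma, alpha, target_power, n_max=100_000):
    """Smallest n whose power reaches target_power, or None if n_max is too small."""
    for n in range(1, n_max + 1):
        if power_upper(mu0, mu1, sigma, n, alpha) >= target_power:
            return n
    return None


# Assumption: a target power of 0.80 (hypothetical, for illustration only)
n_needed = smallest_n(mu0=5000, mu1=5300, sigma=1000, alpha=0.05, target_power=0.80)
print(f"smallest n with power >= 0.80: {n_needed}")
print(f"power at n = 100: {power_upper(5000, 5300, 1000, 100, 0.05):.4f}")
```

A linear search is used for clarity; since power increases with $n$ when $\mu_1 > \mu_0$, the first $n$ that meets the target is the minimum.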
Confidence Intervals
Another way to make inferences about population parameters, closely related to hypothesis testing, is to use the methods of point and interval estimation.
Point estimation is a statistical procedure that involves the use of a sample statistic to estimate a population parameter.
Example: A sample mean $\bar{x}$ is an estimate of a population mean $\mu$. A sample variance $s^2$ is an estimate of a population variance $\sigma^2$.
Interval estimation is a statistical procedure in which a sample of data is used to find the interval or range of possible values within which a population parameter is likely to be contained.
The estimated interval with a given level of confidence is called a confidence interval.
Motivation
Suppose we are interested in the average high score of the millions of people all over the world who play a very popular computer game. Unfortunately, the server does not keep a record of high scores, so we cannot simply determine the average score of the entire population (true mean $\mu$). However, all individuals do know their own high score, and we also happen to know (albeit unrealistically) that the population standard deviation is 1,000 ($\sigma = 1000$).
Since it would be infeasible to collect the scores of everyone, could we at least get an idea of the true mean?
Say we choose a random sample of 100 gamers ($n = 100$).
Their average high score turns out to be 5,000 ($\bar{x} = 5000$).
$\bar{x} = 5000$ is a point estimate of the true population mean high score $\mu$ based on our sample.
This sample estimate certainly gives us a fair idea of $\mu$, but also note that if we took another random sample of 100 gamers, we would almost surely get a slightly different sample mean.
Could we get a better idea of $\mu$ than a single point estimate $\bar{x}$?
As the name suggests, we can find an interval around $\bar{x}$ in which we believe $\mu$ falls with a certain level of confidence.
For instance, maybe we can be 95% confident that $\mu$ is between 4800 and 5200.
How do we actually determine such an interval for a given level of confidence?
First Look
How can we be 95% sure that $\mu$ (population average high score) is between two high scores?
In other words, how can we determine an interval of high scores such that we are 95% confident that $\mu$ is within that interval? This is a 95% confidence interval for $\mu$.
We start with our estimate of $\bar{x} = 5000$ based on our sample of 100 players from a population with $\sigma = 1000$.
According to the central limit theorem (CLT), $\bar{X}$ is approximately normally distributed with an unknown mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1000}{\sqrt{100}} = 100$.
Note that since $n = 100 \geq 30$, $\bar{X}$ is approximately normally distributed regardless of the population distribution of high scores.
It turns out that constructing a 95% confidence interval for $\mu$ is equivalent to determining all values of $\mu$ such that:
$P\left(-z < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z\right) = 0.95 \rightarrow P\left(-z < \dfrac{5000 - \mu}{100} < z\right) = 0.95$
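First Look
$P\left(-z < \dfrac{5000 - \mu}{100} < z\right) = 0.95$
So now we have a goal. How do we find z?
$P(-z < Z < z) + P(Z < -z) + P(Z > z) = 1$
$0.95 + P(Z < -z) + P(Z > z) = 1$
$P(Z < -z) = P(Z > z)$
$0.95 + 2P(Z < -z) = 1$
$P(Z < -z) = 0.025$ and $P(Z > z) = 0.025$
$z = z_{0.025} = 1.96$ and $-z = -1.96$
[Figure: standard normal curve with central area $P(-z < Z < z) = 0.95$ between $-z = -1.96$ and $z = 1.96$, and tail areas $P(Z < -z) = 0.025$ and $P(Z > z) = 0.025$]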
First Look
$P\left(-z < \dfrac{5000 - \mu}{100} < z\right) = 0.95$
$P\left(-1.96 < \dfrac{5000 - \mu}{100} < 1.96\right) = 0.95$
$-1.96 < \dfrac{5000 - \mu}{100} < 1.96$
$5000 + (-1.96)(100) < \mu < 5000 + (1.96)(100)$
$4804 < \mu < 5196$
95% Confidence Interval: We are 95% confident that the true average high score $\mu$ is between 4804 and 5196.
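Confidence Interval for $\mu$ ($z$)
$c\%$ Confidence Level $\rightarrow p_c = \dfrac{c}{100} \rightarrow \alpha = 1 - p_c = 1 - \dfrac{c}{100}$
$p_c = P\left(-z_{\alpha/2} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \rightarrow -z_{\alpha/2} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \rightarrow \bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
$c\%$ Confidence Interval: $\mu_L < \mu < \mu_U$, where $[\mu_L, \mu_U] = \bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
$z_{\alpha/2}$: critical value
$\dfrac{\sigma}{\sqrt{n}}$: standard error
$z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$: margin of error
[Figure: standard normal curve with central area $p_c$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]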
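95% Confidence Interval for $\mu$ ($z$)
95% Confidence Level $\rightarrow p_c = \dfrac{95}{100} = 0.95 \rightarrow \alpha = 1 - p_c = 0.05$
$p_c = P\left(-z_{0.05/2} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{0.05/2}\right) \rightarrow -z_{0.025} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{0.025} \rightarrow \bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
95% Confidence Interval: $\mu_L < \mu < \mu_U$, where $[\mu_L, \mu_U] = \bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve with central area $p_c = 0.95$ between $-z_{0.025} = -1.96$ and $z_{0.025} = 1.96$, and area $\alpha/2 = 0.025$ in each tail]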
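90% Confidence Interval for $\mu$ ($z$)
90% Confidence Level $\rightarrow p_c = \dfrac{90}{100} = 0.90$
$z_{\alpha/2} = z_{0.05} = 1.645$
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \rightarrow \bar{x} - 1.645\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.645\dfrac{\sigma}{\sqrt{n}}$
$[\mu_L, \mu_U] = \bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \rightarrow [\mu_L, \mu_U] = \bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}}$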
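99% Confidence Interval for $\mu$ ($z$)
99% Confidence Level $\rightarrow p_c = \dfrac{99}{100} = 0.99$
$z_{\alpha/2} = z_{0.005} = 2.576$
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \rightarrow \bar{x} - 2.576\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 2.576\dfrac{\sigma}{\sqrt{n}}$
$[\mu_L, \mu_U] = \bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \rightarrow [\mu_L, \mu_U] = \bar{x} \pm 2.576\dfrac{\sigma}{\sqrt{n}}$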
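The critical values 1.645, 1.960, and 2.576 need not be read from a table. As a small sketch, the standard-library `NormalDist.inv_cdf` returns $z_{\alpha/2}$ for any confidence level; the helper name `critical_value` is illustrative.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level p_c."""
    alpha = 1 - confidence
    # z_{alpha/2} leaves area alpha/2 in the upper tail of the standard normal
    return NormalDist().inv_cdf(1 - alpha / 2)


z_values = {c: critical_value(c) for c in (0.90, 0.95, 0.99)}
for c, z in z_values.items():
    print(f"{c:.0%} confidence: z = {z:.3f}")
```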
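Confidence Interval for $\mu$ ($z$)
90% Confidence Interval for $\mu$
$\bar{x} - 1.645\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.645\dfrac{\sigma}{\sqrt{n}} \rightarrow [\mu_L, \mu_U] = \bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}}$
95% Confidence Interval for $\mu$
$\bar{x} - 1.960\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.960\dfrac{\sigma}{\sqrt{n}} \rightarrow [\mu_L, \mu_U] = \bar{x} \pm 1.960\dfrac{\sigma}{\sqrt{n}}$
99% Confidence Interval for $\mu$
$\bar{x} - 2.576\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 2.576\dfrac{\sigma}{\sqrt{n}} \rightarrow [\mu_L, \mu_U] = \bar{x} \pm 2.576\dfrac{\sigma}{\sqrt{n}}$
100% Confidence Interval for $\mu$? $\rightarrow (-\infty, \infty) \rightarrow$ 100% confident that $\mu$ is any value!
0% Confidence Interval for $\mu$? $\rightarrow [\bar{x}, \bar{x}] \rightarrow$ 0% confident that $\mu$ is exactly $\bar{x}$!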
Confidence Interval for $\mu$ ($z$)
Suppose we are interested in the average high score of millions all over the world who play a very popular computer game. Unfortunately, the server does not keep a record of high scores, so we cannot simply determine the average score of the entire population (true mean $\mu$). However, all individuals do know their own high score, and we also happen to know (albeit unrealistically) that the population standard deviation is 1,000. We take a random sample of 100 players and calculate their mean high score to be 5000. What are the 90%, 95%, and 99% confidence intervals for $\mu$?
Population distribution not known to be normal
$\sigma = 1000$ ($\sigma$ known)
$n = 100$ ($n \geq 30$)
$\bar{x} = 5000$
$\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is standard normal ($z$) according to the CLT
90% CI: $[\mu_L, \mu_U] = \bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 1.645\dfrac{1000}{\sqrt{100}} = [4835.5, 5164.5]$
We are 90% confident that the true average high score $\mu$ is between 4835.5 and 5164.5.
95% CI: $[\mu_L, \mu_U] = \bar{x} \pm 1.960\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 1.960\dfrac{1000}{\sqrt{100}} = [4804, 5196]$
We are 95% confident that the true average high score $\mu$ is between 4804 and 5196.
99% CI: $[\mu_L, \mu_U] = \bar{x} \pm 2.576\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 2.576\dfrac{1000}{\sqrt{100}} = [4742.4, 5257.6]$
We are 99% confident that the true average high score $\mu$ is between 4742.4 and 5257.6.
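The three intervals follow the same recipe, so they can be produced by one helper. This is a sketch assuming a known $\sigma$; `z_confidence_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Return (lower, upper) bounds of a z-based confidence interval for mu."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error
    return xbar - margin, xbar + margin


intervals = {c: z_confidence_interval(5000, 1000, 100, c) for c in (0.90, 0.95, 0.99)}
for c, (lo, hi) in intervals.items():
    print(f"{c:.0%} CI: [{lo:.1f}, {hi:.1f}]")
```

The interval widens as the confidence level increases, which is the price paid for being more confident with the same data.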
Confidence Interval for $\mu$ ($z$)
Suppose we are interested in the average high score of millions all over the world who play a very popular computer game. Unfortunately, the server does not keep a record of high scores, so we cannot simply determine the average score of the entire population (true mean $\mu$). However, all individuals do know their own high score, and we also happen to know that the population high scores are normally distributed with a standard deviation of 1,000. We take a random sample of 25 players and calculate their mean high score to be 5000. What are the 90%, 95%, and 99% confidence intervals for $\mu$?
Population distribution normal
$\sigma = 1000$ ($\sigma$ known)
$n = 25$ ($n < 30$)
$\bar{x} = 5000$
$\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is standard normal ($z$)
90% CI: $[\mu_L, \mu_U] = \bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 1.645\dfrac{1000}{\sqrt{25}} = [4671, 5329]$
We are 90% confident that the true average high score $\mu$ is between 4671 and 5329.
95% CI: $[\mu_L, \mu_U] = \bar{x} \pm 1.960\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 1.960\dfrac{1000}{\sqrt{25}} = [4608, 5392]$
We are 95% confident that the true average high score $\mu$ is between 4608 and 5392.
99% CI: $[\mu_L, \mu_U] = \bar{x} \pm 2.576\dfrac{\sigma}{\sqrt{n}} = 5000 \pm 2.576\dfrac{1000}{\sqrt{25}} = [4484.8, 5515.2]$
We are 99% confident that the true average high score $\mu$ is between 4484.8 and 5515.2.
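One way to see what "95% confident" means is to repeat the sampling many times and count how often the interval captures the true mean. The sketch below does this for $n = 25$ draws from a normal population with $\sigma = 1000$. The true mean of 5100, the number of repetitions, and the seed are hypothetical values chosen only for the simulation; in practice $\mu$ is unknown.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(true_mu, sigma, n, confidence, reps=5000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw n individual scores from a normal population and average them
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if xbar - margin < true_mu < xbar + margin:
            hits += 1
    return hits / reps


# Assumption: true mean 5100 (hypothetical; unknown in a real study)
rate = coverage_rate(true_mu=5100, sigma=1000, n=25, confidence=0.95)
print(f"fraction of 95% intervals containing the true mean: {rate:.3f}")
```

The fraction should come out near 0.95: the confidence level describes the long-run success rate of the procedure, not the probability for any single computed interval.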
Link between hypothesis testing and confidence intervals
In a real sense, hypothesis testing and confidence intervals are two sides of the same coin. How can we see this?
Consider a two-sided hypothesis test of the mean.
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
A test with a given level of $\alpha$ does not reject the null hypothesis if:
$-z_{\alpha/2} < z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}$
Therefore, the test does not reject the null hypothesis if:
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu_0 < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
That is, if the $100(1 - \alpha)\%$ confidence interval contains the null hypothesized value, then you do not reject the null hypothesis.
Conversely, this means that if the confidence interval does not contain the null hypothesized value, then you reject the null hypothesis!
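The equivalence can be checked numerically. In this sketch the sample values ($\bar{x} = 5000$, $\sigma = 1000$, $n = 100$) are borrowed from the earlier game example, and the list of candidate null values is arbitrary; both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_test_rejects(xbar, mu0, sigma, n, alpha):
    """True if a two-sided z-test of H0: mu = mu0 rejects at level alpha."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z_obs) >= NormalDist().inv_cdf(1 - alpha / 2)


def interval_contains(xbar, mu0, sigma, n, alpha):
    """True if the 100(1 - alpha)% z-interval around xbar contains mu0."""
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - margin < mu0 < xbar + margin


xbar, sigma, n, alpha = 5000, 1000, 100, 0.05
candidates = [4700, 4850, 5000, 5150, 5300]   # arbitrary null values to try
agreement = []
for mu0 in candidates:
    rejects = two_sided_test_rejects(xbar, mu0, sigma, n, alpha)
    inside = interval_contains(xbar, mu0, sigma, n, alpha)
    agreement.append(rejects == (not inside))
    print(f"mu0 = {mu0}: reject H0 = {rejects}, inside 95% CI = {inside}")
```

For every candidate, the test rejects exactly when the value falls outside the interval [4804, 5196].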
Confidence Intervals in General
In general, a confidence interval always follows this pattern:
point estimate ± margin of error
The margin of error is a measure of how much error we are going to allow, given a level of confidence:
margin of error = appropriate multiplier × standard error of estimate
The standard error of the estimate is the standard deviation of the distribution of the point estimate.
The appropriate multiplier is determined by the distribution of the point estimate and the given level of confidence. It is essentially the number of standard deviations from the point estimate that is necessary to reach a given level of confidence.