Statistics 200 Objectives: Lecture #16
Thursday, October 13, 2016
Textbook: Sections 9.3, 9.4, 10.1, 10.2
Objectives:
• Define standard error, relate it to both standard deviation and sampling distribution ideas.
• Describe the sampling distribution of a sample proportion.
• Reformulate confidence interval formula using general idea of estimate plus/minus (multiplier × standard error).
• Interpret confidence level as a relative frequency.
• Calculate new values of the multiplier for confidence levels other than 95%.
We now begin a strong focus on Inference
Means:
• One population mean
• Difference between Means
• Mean difference
Proportions:
• One population proportion (This week)
• Two population proportions
Motivation
Eventual Goal: Use statistical inference to answer the question “What is the percentage of Creamery customers who prefer chocolate ice cream over vanilla?”
Strategy: Get a random sample of 90 individuals and ask them this question. Use the answers to perform a hypothesis test to answer the question.
Comparison of Binomial-based statistics
Variable | Notation | Mean | St. Dev. | Covered in
Count of successes | X | np | square root of [np(1-p)] | Chapter 8
Proportion of successes | p-hat | p | square root of [p(1-p)/n] | Chapter 9 and beyond
Binomial Distribution vs. approximate p-hat sampling distribution: n = 100 & p = 0.70
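
As a rough numerical companion to this comparison, the sketch below computes the mean and standard deviation of the count of successes and of the sample proportion for n = 100 and p = 0.70, using the standard binomial formulas. The function names are illustrative and do not come from the lecture.

```python
import math

def count_mean_sd(n, p):
    """Mean and standard deviation of X, the binomial count of successes."""
    return n * p, math.sqrt(n * p * (1 - p))

def proportion_mean_sd(n, p):
    """Mean and standard deviation of p-hat = X / n."""
    return p, math.sqrt(p * (1 - p) / n)

n, p = 100, 0.70  # values from the slide title
count_mean, count_sd = count_mean_sd(n, p)
phat_mean, phat_sd = proportion_mean_sd(n, p)

print(f"Count X: mean = {count_mean:.1f}, st. dev. = {count_sd:.3f}")
print(f"p-hat:   mean = {phat_mean:.2f}, st. dev. = {phat_sd:.4f}")
```

Dividing the count's mean and standard deviation by n gives the corresponding values for p-hat, which is why the two distributions have the same shape on different scales.
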
A better confidence interval
Conservative margin of error:
OLD: ME = 1/(square root of n)
NEW: ME = (multiplier) × (standard error)
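
To see why the new formula is an improvement, the sketch below compares the two margins of error. It assumes the conservative margin of error is the usual 1/sqrt(n) and uses a multiplier of 2; the sample size n = 400 and the function names are illustrative choices, not values from the lecture.

```python
import math

def conservative_me(n):
    """Conservative margin of error, assumed here to be 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def standard_error_me(p_hat, n, multiplier=2.0):
    """Margin of error as multiplier times the standard error of p-hat."""
    return multiplier * math.sqrt(p_hat * (1 - p_hat) / n)

n = 400  # illustrative sample size
for p_hat in (0.5, 0.3, 0.1):
    old = conservative_me(n)
    new = standard_error_me(p_hat, n)
    print(f"p-hat = {p_hat:.1f}: conservative ME = {old:.4f}, new ME = {new:.4f}")
```

Under these assumptions the two formulas agree when p-hat = 0.5, and the new margin of error is smaller for any other value of p-hat.
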
New formula for margin of error
ME = (multiplier) × (standard error)
• Standard error: Estimate of the standard deviation of the sampling distribution of p-hat.
• Multiplier (Z*): Related to the Empirical rule. Expresses level of confidence that the interval includes the parameter.
Z*-multiplier
Use when the normal approximation is appropriate, i.e. when n*p > 10 and n*(1-p) > 10.
Confidence level | Multiplier (z*)
90% | 1.65
95% | 1.96 (≈ 2)
98% | 2.33
99% | 2.58
The z-multiplier for a 68% confidence level would be 1, because we must go 1 standard deviation from the mean to capture 68% of the area.
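
Multipliers for other confidence levels can be found from the standard normal curve. The sketch below is one way to do this with Python's standard library; the function name z_multiplier is hypothetical. It looks up the point that leaves half of the leftover area in each tail.

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Return z* such that the area under the standard normal curve
    between -z* and +z* equals confidence_level."""
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.68, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {z_multiplier(level):.3f}")
```

The computed values agree with the table up to rounding. For 68% the result is just under 1, consistent with the Empirical rule.
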
Three Factors affect the width of a confidence interval (Page 382 textbook)
1. Level of confidence: a higher level of confidence means a larger Z*, and so a larger ME.
2. Sample size: a larger sample size means a smaller ME.
The scatterplot shows the variation is…
• largest when p-hat = 1.0
• largest when p-hat = 0.5
• largest when p-hat = 0.25
• smallest when p-hat = 0.9
• smallest when p-hat = 0.2
Factor 3: Value of p-hat impacts width of C.I.
At a given level of confidence and sample size, the confidence interval is the widest when p-hat equals 0.5 and it becomes narrower as p-hat moves away from 0.5 in either direction.
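
The three factors can be checked numerically. The sketch below computes the width of a normal-approximation interval, 2 × z* × square root of [p-hat(1-p-hat)/n], while varying one factor at a time. The function name ci_width and the specific values tried are illustrative assumptions.

```python
import math
from statistics import NormalDist

def ci_width(p_hat, n, confidence_level=0.95):
    """Width of the normal-approximation CI for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Factor 1: a higher level of confidence widens the interval.
for level in (0.90, 0.95, 0.99):
    print(f"level = {level:.2f}: width = {ci_width(0.5, 100, level):.3f}")

# Factor 2: a larger sample size narrows the interval.
for n in (100, 400, 1600):
    print(f"n = {n}: width = {ci_width(0.5, n):.3f}")

# Factor 3: the interval is widest when p-hat = 0.5.
for p_hat in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"p-hat = {p_hat:.2f}: width = {ci_width(p_hat, 100):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width.
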
Confidence Intervals: Population Proportion
• Conservative Method: Chapter 1 & 5
• Normal Approximation: Chapter 10
• Exact (Binomial): When normal conditions aren’t met, use this option. Need a computer to calculate the interval. Does not include a M.E.
Minitab: provides both options (Pages 389 & 390 in the textbook)
Binomial distributions
• n fixed at 10, p increasing
• p fixed at 0.02, n increasing
Values of n and p determine whether the binomial distribution is approximately normal in shape.
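
The rule of thumb from the Z*-multiplier slide (n*p > 10 and n*(1-p) > 10) can be applied to the two scenarios above. In the sketch below, the function name and the particular values of p and n are illustrative; the slide's plots do not state which values were used.

```python
def normal_approx_ok(n, p, threshold=10):
    """Lecture rule of thumb: n*p > 10 and n*(1-p) > 10."""
    return n * p > threshold and n * (1 - p) > threshold

# n fixed at 10, p increasing (illustrative values of p)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"n = 10, p = {p}: normal approximation OK? {normal_approx_ok(10, p)}")

# p fixed at 0.02, n increasing (illustrative values of n)
for n in (10, 100, 1000, 5000):
    print(f"n = {n}, p = 0.02: normal approximation OK? {normal_approx_ok(n, 0.02)}")
```

With n = 10 the condition can never hold, since n*p is at most 10. With p = 0.02 it holds only once n is large enough that n*p exceeds 10.
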
What does it mean to be 95% confident? Before the sample is drawn: We can say that P(conf. int. contains the true parameter) = 0.95. After the sample is drawn: There is no more randomness! (Both the CI and the parameter are now fixed.) So we cannot talk of “probability” any longer.
Interpreting 95% confidence: An example
Suppose we have a sample of 200 students in STAT 100 and find that 28 of them are left-handed. Our sample proportion is: p-hat = 28/200 = 0.14. We now find the ME and construct a 95% CI.
Find the standard error: That is, estimate the standard deviation of the sample proportion based on a sample of size 200: S.E. = square root of [0.14(1-0.14)/200] ≈ 0.025. Hence, z* times the standard error = 2×.025 = .05. On the following two slides, we'll pretend that the true population proportion is 0.12.
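
The arithmetic for this example can be reproduced in a few lines. The sketch below uses the slide's numbers (28 left-handed students out of 200, multiplier 2); the variable names are illustrative.

```python
import math

successes, n = 28, 200
p_hat = successes / n                                # sample proportion
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated SD of p-hat
multiplier = 2                                       # approximate z* for 95%
margin_of_error = multiplier * standard_error
lower, upper = p_hat - margin_of_error, p_hat + margin_of_error

print(f"p-hat = {p_hat:.2f}")
print(f"S.E.  = {standard_error:.4f}")
print(f"ME    = {margin_of_error:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The unrounded values are slightly smaller than the slide's 0.025 and 0.05. Either way, the interval runs from roughly 9% to 19% and includes the pretend true proportion of 0.12.
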
The green curve is the true distribution of p-hat. Of course, ordinarily we don't know where it lies, but at least we know its approximate standard deviation. Thus, we can build a confidence interval around our 14% estimate (in red). If we take another sample, the red line will move but the green curve will not!
If we repeat the sampling over and over, 95% of our confidence intervals will contain the true proportion of 0.12. This is why we use the term "95% confidence interval".
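
The repeated-sampling idea can be tried out by simulation. The sketch below pretends, as the slides do, that the true proportion is 0.12, draws many samples of size 200, builds an interval of p-hat plus or minus 2 × S.E. from each, and reports the fraction that contain 0.12. The function name, the number of repetitions, and the random seed are arbitrary choices made for illustration.

```python
import math
import random

def simulate_coverage(true_p=0.12, n=200, num_samples=2000,
                      multiplier=2.0, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(num_samples):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        me = multiplier * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= true_p <= p_hat + me:
            covered += 1
    return covered / num_samples

coverage = simulate_coverage()
print(f"Fraction of intervals containing 0.12: {coverage:.3f}")
```

The fraction should land close to 0.95 without hitting it exactly, both because the simulation is random and because the normal approximation is only approximate.
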
Definition of "95% confidence interval for the true population proportion": An interval of values computed from a sample that will cover the true but unknown population proportion for 95% of the possible samples.
To find a 95% CI:
• The center is at p-hat.
• The margin of error is 2 times the S.E., where…
• …the S.E. is the square root of [p-hat(1-p-hat)/n].
What does it mean to be 95% confident?
• There is a 95% probability that the one interval that I calculated contains the true value for the parameter.
• If I get 100 such intervals, about 95 of them will contain the true value for the parameter.
• The sample estimate has a 95% chance of being inside the calculated interval.
• The p-value has a 95% chance of being inside the interval.
If you understand today’s lecture…
9.25, 9.33, 9.35, 9.37, 10.1, 10.3, 10.7, 10.9, 10.11, 10.13, 10.15, 10.19, 10.21, 10.23, 10.25, 10.27, 10.33, 10.45
Objectives:
• Define standard error, relate it to both standard deviation and sampling distribution ideas.
• Describe the sampling distribution of a sample proportion.
• Reformulate confidence interval formula using general idea of estimate plus/minus (multiplier × standard error).
• Interpret confidence level as a relative frequency.
• Calculate new values of the multiplier for confidence levels other than 95%.