AP Statistics Confidence intervals for Proportions Chapter 19
Objectives: Standard Error, Confidence Interval, One-Proportion z-Interval, Margin of Error, Critical Value
Introduction Statistical Inference Involves methods of using information from a sample to draw conclusions regarding the population. In formal statistical inference, we use probability to express the strength of our conclusions.
Two Most Common Types of Formal Statistical Inference Confidence Intervals: estimate the value of a population parameter. Tests of Significance: assess the evidence for a claim about a population.
Confidence Intervals and Tests of Significance Both types of inference are based on the sampling distribution of a sample statistic. That is, both report probabilities that state what would happen if we used the inference method many times.
Confidence Interval
The sample proportion We now study categorical data and draw inferences about the proportion, or percentage, of the population with a specific characteristic. If we call a given categorical characteristic in the population “success,” then the sample proportion of successes, $\hat{p}$, is: $\hat{p} = \dfrac{\text{number of successes in the sample}}{\text{number of observations in the sample}}$. We choose 50 people in an undergrad class, and find that 10 of them are Hispanic: $\hat{p} = 10/50 = 0.2$ (proportion of Hispanics in sample). You treat a group of 120 herpes patients with a new drug; 30 get better: $\hat{p} = 30/120 = 0.25$ (proportion of patients improving in sample).
Sampling distribution of $\hat{p}$ The sampling distribution of $\hat{p}$ is never exactly normal. But as the sample size increases, the sampling distribution of $\hat{p}$ becomes approximately normal.
Concept of Confidence Intervals As we discussed in sampling distributions, the existence of sampling variation affects the accuracy of a sample statistic as an estimator of a population parameter. The unbiased estimators calculated using a sampling distribution can be described as point estimators – specific numbers that are estimates of the parameter. In this section, we will develop the idea of a different type of estimate, an interval estimate, which incorporates the sampling variability of the point estimators.
Standard Error Both of the sampling distributions we’ve looked at are Normal. For proportions: $SD(\hat{p}) = \sqrt{pq/n}$, where $q = 1 - p$. For means: $SD(\bar{y}) = \sigma/\sqrt{n}$.
Standard Error When we don’t know p or σ (which we normally don’t, because they are population parameters), we’re stuck, right? Nope. We will use sample statistics to estimate these population parameters. Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
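Standard Error For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$. For the sample mean, the standard error is $SE(\bar{y}) = s/\sqrt{n}$.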
A Confidence Interval Recall that the sampling distribution model of $\hat{p}$ is centered at p, with standard deviation $\sqrt{pq/n}$. Since we don’t know p, we can’t find the true standard deviation of the sampling distribution model, so we need to find the standard error: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$.
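As a minimal sketch of the standard error calculation, the Python snippet below estimates the standard deviation of the sample proportion from the sample alone. The function name `standard_error` is hypothetical, and the data (30 improved out of 120 patients) is borrowed from the earlier drug example purely for illustration.

```python
import math

def standard_error(successes: int, n: int) -> float:
    """Estimate SD(p-hat) from the sample: sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n      # sample proportion
    q_hat = 1 - p_hat          # sample proportion of failures
    return math.sqrt(p_hat * q_hat / n)

# Illustrative data: 30 of 120 patients improved, so p-hat = 0.25.
se = standard_error(30, 120)
print(round(se, 4))  # 0.0395
```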
Confidence Interval Definition A confidence interval is a range of values used to estimate the true value of a population parameter.
What Does “95% Confidence” Really Mean? Each confidence interval uses a sample statistic to estimate a population parameter. But, since samples vary, the statistics we use, and thus the confidence intervals we construct, vary as well.
What Does “95% Confidence” Really Mean? The figure to the right shows that some of our confidence intervals (from 20 random samples) capture the true proportion (the green horizontal line), while others do not:
What Does “95% Confidence” Really Mean? Our confidence is in the process of constructing the interval, not in any one interval itself. Thus, we expect 95% of all 95% confidence intervals to contain the true parameter that they are estimating.
What Does “95% Confidence” Really Mean? Returning to our previous example: 20 samples from the same population gave these 95% confidence intervals. In the long run, 95% of all samples give an interval that contains the population proportion p.
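The long-run meaning of "95% confidence" can be checked by simulation. The sketch below is only an illustration under assumed values: the true proportion p = 0.3, the sample size n = 200, and the function name `coverage` are all hypothetical choices, not taken from the slides. It draws many random samples, builds a 95% interval from each, and reports the fraction of intervals that capture p.

```python
import math
import random
from statistics import NormalDist

def coverage(p: float, n: int, level: float, trials: int, seed: int = 1) -> float:
    """Fraction of simulated one-proportion z-intervals that contain the true p."""
    rng = random.Random(seed)                        # seeded for repeatable runs
    z_star = NormalDist().inv_cdf((1 + level) / 2)   # critical value for this level
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= p <= p_hat + me:
            hits += 1
    return hits / trials

# Assumed true proportion 0.3 and sample size 200 (illustrative values only).
print(coverage(p=0.3, n=200, level=0.95, trials=2000))
```

The printed fraction should land near 0.95 but will rarely equal it exactly, since the interval relies on a Normal approximation and the simulation itself is random.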
A Level C Confidence Interval has Two Parts: An interval calculated from the data, usually of the form estimate ± margin of error. Example: $\hat{p} \pm ME$, where the estimate is the sample proportion $\hat{p}$ and the margin of error shows how accurate we believe our estimate is, based on the variability of the estimate. For a 95% confidence interval the margin of error would be about $1.96 \cdot SE(\hat{p})$, or roughly two standard errors.
Explaining a _% Confidence Interval We are _% confident that the true proportion (in context of the problem) is between (lower limit) and (upper limit).
A Confidence Level C, which gives the probability that the interval will capture the true parameter value in repeated samples. Example: a 95% confidence level. We normally use a confidence level of 90% or higher because we want to be quite sure of our conclusions.
Explaining a _% Confidence Level _% confidence means that, in the long run, __ out of 100 intervals calculated using the same procedure would capture the true population proportion.
Margin of Error: Certainty vs. Precision We can claim, with 95% confidence, that the interval contains the true population proportion. The extent of the interval on either side of $\hat{p}$ is called the margin of error (ME). In general, confidence intervals have the form estimate ± ME. The more confident we want to be, the larger our ME needs to be, making the interval wider.
Margin of Error: Certainty vs. Precision To be more confident, we wind up being less precise. We need more values in our confidence interval to be more certain. Because of this, every confidence interval is a balance between certainty and precision. The tension between certainty and precision is always there. Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements.
Margin of Error: Certainty vs. Precision The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level. The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used).
Critical Values - z* The critical value z* is the number (z-score) on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur, for a given confidence level. For any confidence level, we can find the corresponding critical value (the number of SEs that corresponds to that confidence level).
Example: Confidence Level / Lower Critical Value / Upper Critical Value: 90% / -1.645 / 1.645; 95% / -1.960 / 1.960; 99% / -2.576 / 2.576.
Critical Values Example: For a 90% confidence interval, the critical value is 1.645:
z* is the same for any normal distribution for a given confidence level
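Problem: Find the critical value z* for a confidence level of 94%. The middle 94% leaves 3% in each tail, so the area to the left of z* is 0.94 + 0.03 = 0.97: invNorm(.97) = 1.881. (invNorm(.94) = 1.555 would leave the entire 6% in the upper tail, which is not what a confidence interval needs.)
Your Turn: Find the critical value z* for a confidence level of 73%.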
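For readers working without a TI calculator, here is a minimal Python sketch of the same lookup. It uses `NormalDist().inv_cdf` from the standard library in place of invNorm; the function name `critical_value` is a hypothetical helper. The key step is converting the confidence level into the area to the left of z*, which is the level plus one tail.

```python
from statistics import NormalDist

def critical_value(level: float) -> float:
    """z* for a two-sided interval: leaves (1 - level) / 2 in each tail."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)   # area to the left of z*

for level in (0.90, 0.94, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
# 90%: z* = 1.645
# 94%: z* = 1.881
# 95%: z* = 1.960
# 99%: z* = 2.576
```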
Assumptions and Conditions Here are the assumptions and the corresponding conditions you must check before creating a confidence interval for a proportion: Independence Assumption: We first need to Think about whether the Independence Assumption is plausible. It’s not one you can check by looking at the data. Instead, we check two conditions to decide whether independence is reasonable.
Assumptions and Conditions Randomization Condition: Were the data sampled at random or generated from a properly randomized experiment? Proper randomization can help ensure independence. 10% Condition: Is the sample size no more than 10% of the population? Sample Size Assumption: The sample needs to be large enough for us to be able to use the CLT. Success/Failure Condition: We must expect at least 10 “successes” and at least 10 “failures.”
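The two numeric conditions can be checked mechanically, as in the sketch below. The helper name `check_conditions` and the population size of 5,000 are hypothetical. Note that the Randomization Condition cannot be verified from the numbers; it depends on how the data were collected.

```python
from typing import Optional

def check_conditions(successes: int, n: int,
                     population_size: Optional[int] = None) -> dict:
    """Check the Success/Failure and (if a population size is given) 10% conditions."""
    failures = n - successes
    checks = {"success_failure": successes >= 10 and failures >= 10}
    if population_size is not None:
        # The sample should be no more than 10% of the population.
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# Illustrative values: 30 successes in a sample of 120, assumed population of 5,000.
print(check_conditions(30, 120, population_size=5000))
# {'success_failure': True, 'ten_percent': True}
```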
One-Proportion z-Interval When the conditions are met, we are ready to find the confidence interval for the population proportion, p. The confidence interval is $\hat{p} \pm z^* \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$. The critical value, z*, depends on the particular confidence level, C, that you specify.
Procedure: Confidence Interval for a Population Proportion P A N I C P – parameter A – assumptions N – name the interval I – interval C – conclusion
Procedure: Confidence Interval for a Population Proportion Parameter: Identify the population of interest and the parameter you want to draw conclusions about (population proportion p). Assumptions: Verify the conditions for using the selected procedure. Conditions for a population proportion: Random condition, 10% condition, Success/Failure condition.
Name: One-proportion z-interval. Confidence interval (CI): CI = estimate ± margin of error. In general, CI = estimate ± z* • SE. Interval: Calculate the confidence interval. For a population proportion p: estimate = $\hat{p}$; $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$; z* is determined by the confidence level.
Conclusion in context: Answer the question in the context of the problem. “We are _% confident that the true proportion (in context of the problem) is between (lower limit) and (upper limit).”
Example - Medication side effects Arthritis is a painful, chronic inflammation of the joints. An experiment on the side effects of pain relievers examined arthritis patients to find the proportion of patients who suffer side effects. It was found that 23 out of 440 arthritis patients suffered side effects. Calculate a 90% confidence interval for the population proportion of arthritis patients who suffer some “adverse symptoms.”
Solution - PANIC Parameter: p = the proportion of arthritis patients who suffer some “adverse symptoms.” Assumptions (check conditions): Randomization Condition: assume the 440 arthritis patients were randomly selected. 10% Condition: it is reasonable to assume there are more than 4,400 arthritis patients in total. Success/Failure Condition: $n\hat{p} = (440)(23/440) = 23$ and $n\hat{q} = (440)(417/440) = 417$; both are at least 10.
Name the interval: One-proportion z-interval; a 90% confidence interval for the population proportion of arthritis patients who suffer some “adverse symptoms.” Interval: $\hat{p} = 23/440 \approx 0.0523$. For a 90% confidence level, z* = 1.645. The standard error is $SE(\hat{p}) = \sqrt{(0.0523)(0.9477)/440} \approx 0.01061$, so the margin of error is $1.645 \times 0.01061 \approx 0.0175$ and the interval is $0.0523 \pm 0.0175$, or about (0.035, 0.070). Conclusion in context: We are 90% confident that between 3.5% and 7.0% of arthritis patients taking this pain medication experience some adverse symptoms.
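The Interval step can be reproduced with a few lines of Python. This is a sketch, not the TI routine: the function name `one_prop_z_interval` is hypothetical, and it computes the interval directly from the count of successes (23), the sample size (440), and the confidence level (0.90) without intermediate rounding.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes: int, n: int, level: float) -> tuple:
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)          # standard error of p-hat
    z_star = NormalDist().inv_cdf((1 + level) / 2)   # critical value
    me = z_star * se                                 # margin of error
    return p_hat - me, p_hat + me

low, high = one_prop_z_interval(23, 440, 0.90)
print(f"({low:.3f}, {high:.3f})")  # (0.035, 0.070)
```

The conditions still have to be checked separately; the function only does the arithmetic.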
Your Turn: In a random sample of 50 Philadelphia families with children of preschool age, 35 had children enrolled in preschool. Find a 95% confidence interval for the true proportion of Philadelphia families with children enrolled in preschool.
Confidence Intervals on the TI-83/84 Press the STAT key, choose TESTS, and then choose A: 1-PropZInt… Adjust the settings: x: the number of successes (a count, not $\hat{p}$); n: the sample size; C-Level: the confidence level. Then choose “Calculate.”
Solve Using the TI-84 In a random sample of 50 Philadelphia families with children of preschool age, 35 had children enrolled in preschool. Find a 95% confidence interval for the true proportion of Philadelphia families with children enrolled in preschool. 1-PropZInt: x: 35; n: 50; C-Level: .95; Calculate.
Solve Using the TI-84 Solution: The calculator reports the interval (0.573, 0.827), with $\hat{p} = 0.7$ and n = 50. Conclusion: We are 95% confident that the true proportion of Philadelphia families with children enrolled in preschool is between 57.3% and 82.7%.
Your Turn: Using TI -84 In 2000, the GSS asked subjects if they would be willing to pay much higher prices in order to protect the environment. Of n = 1154 respondents, 518 indicated a willingness to do so. Find a 95% confidence interval for the population proportion of adult Americans willing to do so at the time of that survey. Interpret the results.
How Confidence Intervals Behave The user chooses the confidence level, and the margin of error follows from this choice. The higher the confidence level, the greater the margin of error and hence the wider the confidence interval. The lower the confidence level, the smaller the margin of error and hence the narrower the confidence interval. We would like high confidence and a small margin of error. High confidence says that our method almost always gives correct answers. A small margin of error says that we know the parameter more precisely.
Margin of Error The margin of error gets smaller when: z* gets smaller. There is a trade-off between confidence level and margin of error: to obtain a smaller margin of error from the same data, you must be willing to accept a lower confidence level. $\hat{p}\hat{q}$ gets smaller. The product $\hat{p}\hat{q}$ estimates the variation in the population; it is easier to pin down p when $\hat{p}\hat{q}$ is small (less variation). n gets larger. Increasing the sample size n reduces the margin of error for a fixed confidence level (a larger sample means less sampling variation). Because n appears under the radical, we must increase the sample size by a factor of four to cut the margin of error in half.
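A short numerical check can make these trade-offs concrete. In the sketch below, the value $\hat{p} = 0.5$ and the sample sizes are hypothetical, chosen only for illustration, and `margin_of_error` is a hypothetical helper name. The first loop varies the confidence level at a fixed n; the second quadruples n at a fixed level.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, level: float) -> float:
    """ME = z* * sqrt(p-hat * q-hat / n)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.5  # assumed sample proportion (illustrative)

# Higher confidence level -> larger margin of error (n fixed at 400).
for level in (0.90, 0.95, 0.99):
    print(f"level {level:.0%}: ME = {margin_of_error(p_hat, 400, level):.4f}")

# Quadrupling n halves the margin of error (level fixed at 95%).
for n in (100, 400, 1600):
    print(f"n = {n}: ME = {margin_of_error(p_hat, n, 0.95):.4f}")
```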
Choosing the Sample Size We can arrange to have both high confidence and a small margin of error by taking a large enough sample. To determine the sample size n that will yield a confidence interval for a population proportion with a specified margin of error ME (the margin of error wanted), start from $ME = z^*\sqrt{\hat{p}\hat{q}/n}$ and solve for n: $n = \dfrac{(z^*)^2\,\hat{p}\hat{q}}{ME^2}$. Always round n up to the next whole number.
Example: At the end of every school year, the state administers a reading test to a SRS drawn from a population of 100,000 third graders. Over the last five years, students who took the test correctly answered 75% of the test questions. What sample size should you use to achieve a margin of error equal to 4%, with a confidence level of 95%?
Solution: With $\hat{p} = 0.75$, $ME = 0.04$, and $z^* = 1.96$: $n = \dfrac{(1.96)^2(0.75)(0.25)}{(0.04)^2} \approx 450.2$. Rounding up, the sample size should be 451 third graders.
Choosing Your Sample Size To determine the sample size, choose a margin of error (ME) and a confidence level. The formula requires $\hat{p}$, which we don’t have yet because we have not taken the sample. A safe guess for $\hat{p}$, which yields the largest value of $\hat{p}\hat{q}$ (and therefore of n), is 0.50. Solve the formula for n.
Example: At the end of every school year, the state administers a reading test to a SRS drawn from a population of 100,000 third graders. What sample size should you use to achieve a margin of error equal to 4%, with a confidence level of 95%?
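Solution: With no prior estimate, use $\hat{p} = 0.50$: $n = \dfrac{(1.96)^2(0.50)(0.50)}{(0.04)^2} \approx 600.3$. Rounding up, the sample size should be 601 third graders.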
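Both sample-size examples follow the same recipe, sketched below in Python. The function name `required_sample_size` and the parameter `p_guess` are hypothetical; `p_guess` defaults to 0.5, the conservative choice when no earlier estimate is available. The result is always rounded up, because rounding down would give a margin of error slightly larger than the one requested.

```python
import math
from statistics import NormalDist

def required_sample_size(me: float, level: float, p_guess: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p * q / n) <= me, using p_guess for p."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / me ** 2
    return math.ceil(n)   # always round up

print(required_sample_size(0.04, 0.95, p_guess=0.75))  # 451
print(required_sample_size(0.04, 0.95))                # 601
```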
Your Turn: Suppose the U.S. President wants an estimate of the proportion of the population who support his current policy toward revisions in the Social Security System. The president wants the estimate to be within .04 of the true proportion. Assume a 90 percent level confidence. How large a sample is required?
Sample Size In practice, taking samples costs time and money. The required sample size may be impossibly expensive. Notice once again that it is the size of the sample that determines the margin of error. The size of the population (as long as the population is much larger than the sample) does not influence the sample size we need.
The Perfect Confidence Interval