Confidence Intervals Copyright 2008 by The McGraw-Hill Companies. This material is intended for educational purposes by licensed users of LearningStats. It may not be copied or resold for profit.
What Is a Confidence Interval? Population: μ = 70, σ = 2, N = 23,000. Sample: x̄ = 72.36, s = 2.179, n = 13. When we estimate a population parameter from a sample, we should surround the estimate with an interval. The width of the interval reflects our uncertainty due to sampling error. We set the desired confidence level (e.g., 95%) and use statistical methods to construct interval limits with the desired confidence level.
What Does “Confidence” Mean? The confidence level is your chance of success. For example, a 95% confidence interval's lower and upper bounds have a 95% chance of enclosing the unknown parameter. If you took 100 samples and used the same procedure each time, you would expect approximately 95 of your confidence intervals to enclose the parameter. But since you only take one sample, your confidence interval either does or doesn’t enclose the true parameter of interest. You will never know if it did or not.
What Is a Confidence Level? Typical confidence levels are 90%, 95%, and 99%. Shouldn't we always use a very high confidence level like 99%? Not necessarily. To gain confidence, we must make the interval wider, so it may be less useful.
What Can Affect the Width of a C.I.?
You can usually choose these two: sample size and confidence level.
But you can't control these two: sample data and population size!
Example of a Confidence Interval An FAA-mandated inspection of a sample of 341 Boeing 737s revealed 236 with dangerously chafed wiring. With 95% confidence, we conclude that the proportion of all Boeing 737s with chafed wiring is 0.692 ± 0.049, or between 0.643 and 0.741. That is, the true percent with chafed wiring may be anywhere from 64.3% to 74.1%. Source: Detroit Free Press, May 27, 1998, p. 6E.
Common Confidence Intervals
Parameter | Distribution | Assumptions | Comments
Mean (μ) with known σ or large sample | Normal | Population normal (or large sample) | σ typically is unknown
Mean (μ) with unknown σ and small sample | Student’s t | Population normal (or at least not too skewed) | Student’s t resembles z for large samples
Proportion (p) | Normal | Large sample: np > 5 and n(1 − p) > 5 | Binomial would be used if sample is small
Variance (σ²) | Chi-square | Normal population | Sensitive to non-normality
Confidence Interval for μ If σ is known: Infinite population Finite population Assume an infinite population if n/N < 0.05. If σ is unknown: Infinite population Finite population
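For reference, the four cases named on this slide are usually written as follows. This is a sketch of the standard textbook forms, assuming the usual finite population correction factor; z and t denote the two-sided critical values for the chosen confidence level, with t taken at n − 1 degrees of freedom.

```latex
\begin{align*}
\text{$\sigma$ known, infinite population:} \quad
  & \bar{x} \pm z \,\frac{\sigma}{\sqrt{n}} \\
\text{$\sigma$ known, finite population:} \quad
  & \bar{x} \pm z \,\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \\
\text{$\sigma$ unknown, infinite population:} \quad
  & \bar{x} \pm t \,\frac{s}{\sqrt{n}} \\
\text{$\sigma$ unknown, finite population:} \quad
  & \bar{x} \pm t \,\frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
\end{align*}
```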
Examples: 95% Confidence Interval for μ If σ is given (rare): use z. If s is given (typical): use t. When n is large, the z and t values are similar. We ignore the finite population correction when N is not given but may be assumed large.
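The interval arithmetic for a mean can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `mean_ci`, the sample figures (x̄ = 50, s = 4, n = 21, N = 200), and the choice to pass the critical value in by hand are all assumptions. The value 2.086 is the standard t-table entry for 95% confidence with 20 degrees of freedom.

```python
import math
from statistics import NormalDist


def mean_ci(xbar, sd, n, crit, N=None):
    """Return (lower, upper) for xbar +/- crit * sd / sqrt(n).

    sd is sigma (paired with a z critical value) or the sample s
    (paired with a t critical value). If a population size N is given,
    the standard error is multiplied by the finite population
    correction sqrt((N - n) / (N - 1)).
    """
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    half_width = crit * se
    return xbar - half_width, xbar + half_width


# Hypothetical sample, chosen only for illustration.
xbar, s, n = 50.0, 4.0, 21

z95 = NormalDist().inv_cdf(0.975)  # about 1.960
t95 = 2.086                        # t table, 95% confidence, d.f. = 20

z_interval = mean_ci(xbar, s, n, z95)
t_interval = mean_ci(xbar, s, n, t95)
fpc_interval = mean_ci(xbar, s, n, t95, N=200)

print("z interval:", z_interval)
print("t interval:", t_interval)
print("t with FPC:", fpc_interval)
```

The t interval comes out slightly wider than the z interval, and applying the finite population correction narrows it again.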
Common z Values
90% confidence: z = 1.645
95% confidence: z = 1.960
98% confidence: z = 2.326
99% confidence: z = 2.576
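These critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_value` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical z for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Put alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_value(level):.3f}")
```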
Student’s t Distribution
d.f. | 90% CI | 95% CI | 99% CI
5 | 2.015 | 2.571 | 4.032
10 | 1.812 | 2.228 | 3.169
20 | 1.725 | 2.086 | 2.845
60 | 1.671 | 2.000 | 2.660
∞ | 1.645 | 1.960 | 2.576
Tip 1: The abbreviation d.f. stands for “degrees of freedom.” For a mean, d.f. = n − 1.
Tip 2: As d.f. increases, the t value approaches the corresponding z value.
Student’s t Distribution Tip 1 Student’s t is bell-shaped and symmetric, but is flatter than a standard normal. Tip 2 Student’s t resembles a normal curve for large d.f. Visual Statistics screens © 2001 by McGraw-Hill. Used with permission.
Should I Use z or t? Just use t if you don't know σ. But if we used z = 1.96 for 95% confidence, it would be about the same, right? Right. That's why some books say it's O.K. to use z when n exceeds 30. But it's not the conservative thing to do, because it makes your interval slightly narrower than is justified. When n is large, the z and t values are similar. But it’s safer to use t.
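How much narrower the z interval is can be checked directly. The sketch below compares 95% t values from a standard t table against z for a few degrees of freedom; the dictionary `T95` and the helper `width_ratio` are names assumed for this illustration. Because both intervals share the factor s/√n, the ratio t/z is also the ratio of their widths.

```python
from statistics import NormalDist

# 95% two-sided t values copied from a standard t table, keyed by d.f.
T95 = {5: 2.571, 10: 2.228, 20: 2.086, 60: 2.000}

z95 = NormalDist().inv_cdf(0.975)  # about 1.960


def width_ratio(df):
    """Width of the t interval divided by the width of the z interval."""
    return T95[df] / z95


for df in sorted(T95):
    print(f"d.f. = {df:>2}: t = {T95[df]:.3f}, t/z = {width_ratio(df):.3f}")
```

The ratio is always above 1 and shrinks toward 1 as d.f. grows, which is why using z in place of t understates the width most for small samples.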
The Bottom Line Since the population standard deviation is generally unknown, we mostly use the Student's t when constructing confidence intervals for μ.
Sample Size Generally speaking, larger n leads to narrower confidence intervals. But the width also depends on the sample standard deviation s when σ is unknown.
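The effect of n on the width follows from the √n in the denominator of the standard error. A minimal sketch, assuming a known standard deviation of 10 (a hypothetical figure) and the hypothetical helper name `half_width`:

```python
import math
from statistics import NormalDist


def half_width(sd, n, confidence=0.95):
    """Half-width z * sd / sqrt(n) of a confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


sd = 10.0  # hypothetical standard deviation
for n in (25, 100, 400):
    print(f"n = {n:>3}: half-width = {half_width(sd, n):.3f}")
```

Quadrupling the sample size halves the half-width, while doubling the standard deviation doubles it.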
Confidence Interval for p Infinite population: Finite population: Tip 1 It is safe to assume normality if np > 10 and n(1 − p) > 10. If not, use the binomial (Minitab). Tip 2 Assume an infinite population if n/N < 0.05, i.e. if you can reasonably assume the population is 20 times as large as the sample.
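For reference, the two cases are usually written as follows. This is a sketch of the standard large-sample forms, assuming the usual finite population correction factor; p is the sample proportion x/n and z is the two-sided critical value for the chosen confidence level.

```latex
\begin{align*}
\text{Infinite population:} \quad
  & p \pm z \sqrt{\frac{p(1-p)}{n}} \\
\text{Finite population:} \quad
  & p \pm z \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}
\end{align*}
```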
Example: 95% Confidence Interval for p Problem: A clinic saw 12,186 patients last year. In a random sample of 52 clinic patients, 12 indicated that they had to wait too long. Find the 95% confidence interval for the true proportion of patients who waited too long. Solution: The sample proportion is p = x/n = 12/52 = 0.23077. Since n/N = 52/12186 = 0.004 < 0.05, we can ignore the finite population correction factor. Since np = 12 and n(1 − p) = 40 (both exceed 10), we can assume normality. The interval is 0.23077 ± 1.96 √(0.23077 × 0.76923 / 52) = 0.23077 ± 0.11452. The confidence interval says that the true percentage who waited too long is between 11.6% and 34.5%.
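The clinic calculation can be reproduced with a short Python sketch. The function name `proportion_ci` and its parameters are assumptions made for illustration; it uses the large-sample normal interval and applies the finite population correction only when n/N is at least 0.05, following the rule of thumb in these slides.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95, N=None):
    """Large-sample normal confidence interval for a proportion.

    x successes in a sample of n. If a population size N is given and
    n/N >= 0.05, the standard error is multiplied by the finite
    population correction sqrt((N - n) / (N - 1)).
    """
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return p - z * se, p + z * se


# Clinic example: 12 of 52 sampled patients, population of 12,186.
lower, upper = proportion_ci(12, 52, N=12186)
print(f"clinic: {lower:.3f} to {upper:.3f}")   # 0.116 to 0.345

# Boeing 737 wiring example: 236 of 341 aircraft, N not given.
faa_lower, faa_upper = proportion_ci(236, 341)
print(f"737s:   {faa_lower:.3f} to {faa_upper:.3f}")   # 0.643 to 0.741
```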
Confidence Interval for σ General formula: Tip 1 If you have a sample of raw data x1, x2, …, xn you can use Minitab’s Stat > Descriptive Statistics > Graphical Summary to make a confidence interval without using this formula. Tip 2 If you have s but no raw data, use this formula and find χ² in a chi-square table for d.f. = n − 1.
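For reference, the general formula is usually written as follows. This is a sketch of the standard form for a normal population; the subscripts L and U are notation assumed here for the lower-tail and upper-tail chi-square values with n − 1 degrees of freedom (for 95% confidence, the 2.5th and 97.5th percentiles).

```latex
\[
\sqrt{\frac{(n-1)\,s^2}{\chi^2_U}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^2}{\chi^2_L}}
\]
```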
Example: 95% Confidence Interval for σ Problem: A sample of 24 male patients over age 65 at a heart clinic showed a mean cholesterol reading of 200 with a standard deviation of 19. Find the 95% confidence interval for the standard deviation for all male patients over age 65. Solution: With 95% confidence, the population standard deviation is between 14.8 and 26.7.
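The cholesterol result can be checked with a short Python sketch. The function name `sigma_ci` is an assumption made for illustration, and the chi-square values 11.689 and 38.076 are the standard table entries for d.f. = 23 at the 2.5th and 97.5th percentiles, entered by hand because the Python standard library has no chi-square quantile function.

```python
import math


def sigma_ci(s, n, chi2_lower, chi2_upper):
    """Confidence interval for a population standard deviation.

    chi2_lower and chi2_upper are the lower-tail and upper-tail
    chi-square table values for d.f. = n - 1. The larger chi-square
    value produces the lower limit, and vice versa.
    """
    df = n - 1
    lower = math.sqrt(df * s ** 2 / chi2_upper)
    upper = math.sqrt(df * s ** 2 / chi2_lower)
    return lower, upper


# Heart clinic example: s = 19, n = 24, so d.f. = 23.
lower, upper = sigma_ci(s=19, n=24, chi2_lower=11.689, chi2_upper=38.076)
print(f"{lower:.1f} to {upper:.1f}")   # 14.8 to 26.7
```

The interval is not symmetric around s = 19 because the chi-square distribution is skewed.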
Confidence Intervals Vary Example: Here are 100 samples of n = 15 items from a population with μ = 794 and σ = 3. From each sample, we constructed a 95% confidence interval for μ. We would expect that about 95 of these confidence intervals would enclose μ, while about 5 would not. In this simulation, 7 of the 100 intervals didn’t enclose μ = 794 (shown in red).
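A simulation of this kind is easy to sketch in Python. The version below is an illustration under stated assumptions, not the software used for the slide: it draws from a normal population, builds each interval with Student's t (2.145 is the standard table value for 95% confidence with d.f. = 14), and uses an arbitrary random seed. The names `interval_covers_mu` and `coverage` are hypothetical.

```python
import math
import random
import statistics

MU, SIGMA, N_ITEMS = 794.0, 3.0, 15
T95_DF14 = 2.145  # t table value, 95% confidence, d.f. = 14


def interval_covers_mu(rng):
    """Draw one sample and report whether its 95% CI encloses MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N_ITEMS)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    half_width = T95_DF14 * s / math.sqrt(N_ITEMS)
    return xbar - half_width <= MU <= xbar + half_width


def coverage(replications, seed=1):
    """Fraction of simulated intervals that enclose MU."""
    rng = random.Random(seed)
    hits = sum(interval_covers_mu(rng) for _ in range(replications))
    return hits / replications


print("100 samples: ", coverage(100))
print("5000 samples:", coverage(5000))
```

With only 100 samples the number of misses varies noticeably from run to run, so 7 misses instead of 5 is unremarkable; with thousands of samples the observed fraction settles close to 0.95.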
You May Be Wrong Example: This is a quality control application in which spaghetti sauce jars are to be filled to exactly 794 grams. In 5 samples (confidence interval to the right of μ = 794) we would conclude the jars were being overfilled, while in 2 samples (confidence interval to the left of μ = 794) we would conclude the jars were being underfilled. The observed variation is random.
Confidence Level Quantifies Your Risk Tradeoffs In a 90% confidence interval, we have a good chance of being “right.” If we increase this to 99%, we have a better chance of being “right,” but the resulting confidence interval may be too wide to be useful. The 95% confidence interval is used often, as a “middle ground” choice.
Assumptions
Mean: The formulas shown for the sample mean assume a normal population, but are robust to departures from normality. Only if the sample is small or badly skewed will there be reason for concern. In such cases, a bootstrap method would be appropriate.
Proportion: The formulas shown for the sample proportion assume that the sample is large enough to assume a normal distribution for the sample proportion p. More exact limits may be constructed using the binomial distribution (Minitab does this).
Further Reading: Michael Henderson and Mary C. Meyer, “Exploring the Confidence Interval for a Binomial Parameter in a First Course in Statistical Computing,” The American Statistician, Vol. 55, No. 4, November 2001, pp. 337-344.
Other Issues The scientific aura of confidence intervals may divert attention from more fundamental issues such as the goals of the proposed study, the quality of the underlying data, and the interpretation and use of the intervals. However, the formulas can also help structure the dialogue between statistician and client in planning a study.