10.2 ESTIMATING A POPULATION MEAN
QUESTION: How do we construct a confidence interval for an unknown population mean when we don’t know the population standard deviation? In the previous section, we made the unrealistic assumption that we knew the standard deviation of the population. Now we must estimate σ from the data. This changes our computations but not the interpretation of a confidence interval.
CONDITIONS FOR INFERENCE ABOUT A POPULATION MEAN
SRS: Our data are an SRS from the population or come from a randomized experiment.
Normality: The population has a Normal distribution. In practice, it is enough that the distribution is single-peaked and symmetric unless the sample is very small.
Independence: When we sample without replacement, our calculations will be reasonably accurate if the population size is at least 10 times the sample size (N ≥ 10n).
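The independence condition is a simple arithmetic check. Here is a minimal sketch; the function name `independence_ok` is hypothetical, chosen only for illustration.

```python
def independence_ok(population_size: int, sample_size: int) -> bool:
    """10% condition: sampling without replacement is treated as
    approximately independent when N >= 10n."""
    return population_size >= 10 * sample_size


print(independence_ok(5000, 40))  # True, since 5000 >= 400
print(independence_ok(300, 40))   # False, since 300 < 400
```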
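STANDARD ERROR When the standard deviation of a statistic is estimated from the data, the result is called the standard error of the statistic. The standard error of the sample mean x̄ is s/√n, where s is the sample standard deviation and n is the sample size.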
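As a quick illustration of s/√n, the sketch below computes the standard error of a sample mean with Python's `statistics` module. The data values are hypothetical, made up only to show the arithmetic, and `standard_error` is a hypothetical helper name.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))


# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error(data), 3))  # → 0.756
```

Here s = √(32/7) and n = 8, so the standard error is √(4/7), about 0.756.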
T DISTRIBUTIONS When we substitute the standard error s/√n for the standard deviation σ/√n of the sampling distribution of x̄, the resulting statistic t = (x̄ − μ)/(s/√n) does not have a Normal distribution. It has a t distribution. There is a different t distribution for each sample size n. We specify a particular t distribution by giving its degrees of freedom (df). For this statistic, the appropriate degrees of freedom are df = n − 1.
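One way to see that t is not Normal is a small simulation, sketched below under stated assumptions: a hypothetical standard Normal population (μ = 0, σ = 1), repeated samples of size n, and a count of how often |t| exceeds 1.96, the Normal cutoff for the middle 95%. If t were Normal, that fraction would be about 0.05; for small n it is noticeably larger. The helper names are hypothetical.

```python
import math
import random


def t_statistic(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n))."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))


def tail_fraction(n, reps=10000, cutoff=1.96, seed=1):
    """Fraction of simulated |t| values beyond the Normal 95% cutoff."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # hypothetical Normal population
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            count += 1
    return count / reps


for n in (4, 10, 30):
    print(f"n = {n:>2} (df = {n - 1:>2}): fraction with |t| > 1.96 = {tail_fraction(n):.3f}")
```

The fractions shrink toward 0.05 as n grows, which matches the idea that each sample size has its own t distribution.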
T DISTRIBUTIONS VS. Z DISTRIBUTIONS The density curves of the t distributions are similar in shape to the standard Normal curve. They are symmetric about zero, single-peaked, and bell-shaped. The spread of the t distributions is a bit greater than that of the standard Normal distribution: a t density curve has a lower peak and more area in its tails. As the degrees of freedom increase, the t density curve approaches the Normal curve ever more closely.
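The comparison can be checked numerically from the density formulas. The sketch below evaluates the t density (using `math.lgamma` for the gamma-function constant) and the standard Normal density at 0 (the peak) and at 3 (out in the tail). The function names `t_pdf` and `normal_pdf` are hypothetical helpers, not a library API.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for df in (1, 5, 30, 1000):
    print(f"df = {df:>4}: height at 0 = {t_pdf(0, df):.4f}, height at 3 = {t_pdf(3, df):.5f}")
print(f"Normal    : height at 0 = {normal_pdf(0):.4f}, height at 3 = {normal_pdf(3):.5f}")
```

Every t curve is lower at 0 and higher at 3 than the Normal curve, and the gap closes as df grows.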
FINDING t* Table C in the back of the book gives critical values t* for the t distributions. Each row in the table contains critical values for one of the t distributions; the degrees of freedom appear at the left of the row. Example Suppose you wanted to construct a 95% confidence interval for the mean, μ, of a population based on an SRS of size n = 12. What critical value t* should you use?
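Table C values can also be reproduced numerically. The sketch below assumes only the Python standard library: it integrates the t density with Simpson's rule and uses bisection to find the t* that leaves area C between −t* and t*. The function names are hypothetical; in practice a statistics package or Table C would be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area_0_to(x, df, steps=2000):
    """Area under the t density between 0 and x (x >= 0), by Simpson's rule."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def t_critical(confidence, df, lo=0.0, hi=1000.0, tol=1e-6):
    """Critical value t* with area `confidence` between -t* and t*."""
    target = confidence / 2  # by symmetry, area between 0 and t*
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 12
print(f"{t_critical(0.95, n - 1):.3f}")  # → 2.201, the Table C entry for df = 11
```

The key step is df = n − 1 = 11; the confidence level then picks the column of Table C.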
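THE ONE-SAMPLE t CONFIDENCE INTERVAL Draw an SRS of size n from a population having unknown mean μ. A level C confidence interval for μ is x̄ ± t* s/√n, where t* is the critical value for the t distribution with n − 1 degrees of freedom that has area C between −t* and t*. This interval is exactly correct when the population distribution is Normal and approximately correct for large n in other cases.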
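The interval x̄ ± t* s/√n translates directly into code. In this minimal sketch, t* is passed in by the caller (for example, read from Table C) rather than computed, and both the helper name `one_sample_t_interval` and the data are hypothetical; these are not the NOX measurements.

```python
import math
import statistics


def one_sample_t_interval(sample, t_star):
    """One-sample t interval: xbar +/- t* * s / sqrt(n).

    t_star must come from the t distribution with n - 1 degrees of freedom.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of xbar
    margin = t_star * se                          # margin of error
    return xbar - margin, xbar + margin


# Hypothetical sample of n = 8 values, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
# Table C, df = 7, 95% confidence: t* = 2.365
low, high = one_sample_t_interval(data, 2.365)
print(f"({low:.3f}, {high:.3f})")  # → (3.212, 6.788)
```

Keeping t* as a parameter mirrors the slide workflow: find df, look up t*, then compute the interval.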
Example Environmentalists, government officials, and vehicle manufacturers are all interested in studying the auto exhaust emissions produced by motor vehicles. The major pollutants in auto exhaust from gasoline engines are hydrocarbons, carbon monoxide, and nitrogen oxides (NOX). The table below gives the NOX levels (in grams per mile) for a random sample of light-duty engines of the same type.
ASSIGNMENT Page 648 – 650, problems – 10.32
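PAIRED t PROCEDURES When a study produces paired data, such as two measurements on the same subject, we analyze the differences within each pair. We apply the one-sample t procedures to these differences, treating the mean difference in the population as the parameter of interest.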
IS CAFFEINE DEPENDENCE REAL? Our subjects are 11 people diagnosed as being dependent on caffeine. Each subject was barred from coffee, colas, and other substances containing caffeine. Instead, they took capsules containing their normal caffeine intake. During a different time period, they took placebo capsules. The order in which the subjects took caffeine and the placebo was randomized. Table 10.3 contains data on two of several tests given to the subjects. On the depression test, higher scores show more signs of depression. “Beats” is the number of beats per minute the subject achieved when asked to press a button 200 times as quickly as possible. We are interested in whether being deprived of caffeine affects these outcomes. Construct and interpret a 90% confidence interval for the mean change in depression score.
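A paired t interval is a one-sample t interval computed on the differences. The sketch below shows the mechanics only: the 11 pairs of scores are hypothetical (Table 10.3 is not reproduced here), the difference is taken as placebo minus caffeine, and `paired_t_interval` is a hypothetical helper. For 90% confidence with df = 11 − 1 = 10, Table C gives t* = 1.812.

```python
import math
import statistics


def paired_t_interval(caffeine, placebo, t_star):
    """One-sample t interval on the within-subject differences (placebo - caffeine)."""
    if len(caffeine) != len(placebo):
        raise ValueError("paired data need the same number of observations")
    diffs = [p - c for c, p in zip(caffeine, placebo)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff - t_star * se, mean_diff + t_star * se


# Hypothetical depression scores for 11 subjects (NOT the Table 10.3 data).
caffeine = [5, 3, 4, 2, 6, 1, 8, 0, 3, 7, 2]
placebo = [8, 8, 5, 6, 12, 3, 13, 3, 7, 14, 6]
low, high = paired_t_interval(caffeine, placebo, t_star=1.812)  # df = 10, 90%
print(f"({low:.3f}, {high:.3f})")  # → (3.054, 4.946)
```

Because each subject serves as their own control, the analysis works only with the 11 differences, not with the two columns separately.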
NOTES Many studies that require the use of paired t procedures involve individuals who are not chosen at random from the population of interest. In such cases, we may not be able to generalize our findings to the population of interest. Random selection of individuals for a statistical study allows us to generalize the results of that study to a larger population. By randomly assigning treatments, however, we can help ensure that the mean difference in measurements can be attributed to the treatment. Random assignment of treatments to subjects in an experiment allows us to compare treatments to investigate whether there is evidence of a treatment effect, which might suggest that the treatment caused the observed difference.
ROBUSTNESS OF t PROCEDURES An inference procedure is called robust if the probability calculations involved in that procedure remain fairly accurate when a condition for using the procedure is violated. For confidence intervals, this means that the stated confidence level is still approximately correct. If outliers are present in the sample data, then the population may not be Normal. The t procedures are not robust against outliers, because x̄ and s are strongly influenced by outliers.
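Robustness is a statement about the actual capture rate, so it can be explored by simulation. This sketch makes several assumptions: samples of size n = 10, a nominal 95% level with t* = 2.262 (Table C, df = 9), and two hypothetical populations, a Normal one and a strongly right-skewed exponential one with mean 1. It reports the fraction of intervals that capture the true mean. The helper names are hypothetical.

```python
import math
import random


def t_interval_covers(sample, mu, t_star):
    """True if the one-sample t interval from `sample` contains mu."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    margin = t_star * s / math.sqrt(n)
    return xbar - margin <= mu <= xbar + margin


def coverage(draw, mu, n=10, t_star=2.262, reps=10000, seed=2):
    """Fraction of simulated t intervals that capture the true mean mu.

    t_star = 2.262 is the Table C value for df = 9 at 95% confidence.
    """
    rng = random.Random(seed)
    hits = sum(t_interval_covers([draw(rng) for _ in range(n)], mu, t_star)
               for _ in range(reps))
    return hits / reps


normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), mu=0.0)
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), mu=1.0)  # mean 1, right-skewed
print(f"Normal population:      {normal_cov:.3f}")
print(f"Exponential population: {skewed_cov:.3f}")
```

The Normal case lands close to the stated 95%, while the skewed case with this small sample falls short, which is why the sample-size guidelines matter.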
USING THE t PROCEDURES Except in the case of small samples, the assumption that the data are an SRS from the population of interest is more important than the assumption that the population distribution is Normal.
Sample size less than 15: Use t procedures if the data are close to Normal. If the data are clearly non-Normal or if outliers are present, do not use t procedures.
Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, say n > 30.
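The guidelines above amount to a small decision rule. The sketch below encodes them under two assumptions: the data are already an SRS, and outliers rule out t procedures at every sample size (following the warning that t procedures are not robust against outliers). The function name and its boolean inputs are hypothetical; judging "close to Normal" or "strong skewness" still requires looking at a graph of the data.

```python
def t_procedures_ok(n, close_to_normal, outliers, strong_skew):
    """Apply the sample-size guidelines for t procedures (assumes an SRS)."""
    if outliers:
        return False             # t procedures are not robust against outliers
    if n < 15:
        return close_to_normal   # small samples need data close to Normal
    if n > 30:
        return True              # large samples: even clearly skewed data are acceptable
    return not strong_skew       # 15 <= n <= 30: avoid strong skewness


print(t_procedures_ok(10, close_to_normal=True, outliers=False, strong_skew=False))  # True
print(t_procedures_ok(20, close_to_normal=False, outliers=False, strong_skew=True))  # False
print(t_procedures_ok(40, close_to_normal=False, outliers=False, strong_skew=True))  # True
```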
ASSIGNMENT Page 657 – 658, problems – Page 659 – 661, problems – 10.44