Warmup To check the accuracy of a scale, a weight is weighed repeatedly. The scale readings are normally distributed with a standard deviation of .0002 grams. How many weighings must be done to get a margin of error of ±.0001 with a 99% confidence level?
When is Unknown: The t Distributions The t distribution is a probability distribution that is used to estimate population parameters when the sample size is small and/or when the population variance is unknown. It does not have a normal distribution.
So why use the t-distribution? Recall that according to the Central Limit Theorem, a sampling distribution will follow a normal distribution as long as the sample size is sufficiently large. So, when we know the standard deviation of the population, we can compute the z-score and use the normal distribution to evaluate probabilities with the sample mean. Now, that is all well and good, however what happens when the sample size is small and or we do not know the standard deviation of the population? When either of these issues occurs, we can use the t-distribution (t-score). The t distribution allows us to conduct statistical analyses on certain data sets that are not appropriate for analysis using the normal distribution.
So what are degrees of freedom? The particular form of the t-distribution is determined by its degrees of freedom. The degrees of freedom is equal to the sample size minus one. For example, if the sample size is 8, what are the degrees of freedom?
Estimating a Population Mean The t Distributions; Degrees of Freedom When comparing the density curves of the standard Normal distribution and t distributions, several facts are apparent: Estimating a Population Mean The density curves of the t distributions are similar in shape to the standard Normal curve. The spread of the t distributions is a bit greater than that of the standard Normal distribution. The t distributions have more probability in the tails and less in the center than does the standard Normal. As the degrees of freedom increase, the t density curve approaches the standard Normal curve ever more closely. We can use Table B in the back of the book to determine critical values t* for t distributions with different degrees of freedom.
Estimating a Population Mean Using Table B to Find Critical t* Values Suppose you want to construct a 95% confidence interval for the mean µ of a Normal population based on an SRS of size n = 12. What critical t* should you use? Estimating a Population Mean Upper-tail probability p df .05 .025 .02 .01 10 1.812 2.228 2.359 2.764 11 1.796 2.201 2.328 2.718 12 1.782 2.179 2.303 2.681 z* 1.645 1.960 2.054 2.326 90% 95% 96% 98% Confidence level C In Table B, we consult the row corresponding to df = n – 1 = 11. We move across that row to the entry that is directly above 95% confidence level. The desired critical value is t * = 2.201.
Examples Find the following t* values: 90% confidence level based on an SRS of size n = 18. 99% confidence level based on an SRS of size n = 6. 98% confidence level based on an SRS of size n = 50.
Estimating a Population Mean Constructing a Confidence Interval for µ Estimating a Population Mean Definition: To construct a confidence interval for µ, Use critical values from the t distribution with n - 1 degrees of freedom in place of the z critical values. That is,
Estimating a Population Mean One-Sample t Interval for a Population Mean The one-sample t interval for a population mean is similar in both reasoning and computational detail to the one-sample z interval for a population proportion. As before, we have to verify three important conditions before we estimate a population mean. Estimating a Population Mean •Random: The data come from a random sample of size n from the population of interest or a randomized experiment. • Normal: The population has a Normal distribution or the sample size is large (n ≥ 30). • Independent: The method for calculating a confidence interval assumes that individual observations are independent. To keep the calculations reasonably accurate when we sample without replacement from a finite population, we should check the 10% condition: verify that the sample size is no more than 1/10 of the population size. Conditions for Inference about a Population Mean Choose an SRS of size n from a population having unknown mean µ. A level C confidence interval for µ is where t* is the critical value for the tn – 1 distribution. Use this interval only when: the population distribution is Normal or the sample size is large (n ≥ 30), the population is at least 10 times as large as the sample. The One-Sample t Interval for a Population Mean
Estimating a Population Mean Example #1 Suppose that we form a simple random sample of 40 third graders from a large school district of several thousand and have each of them take a basic reading test. The sample has mean score of 27 with standard deviation of 5. Determine with 95% confidence the mean score for the population of all third grade students in the district. Use the 4-step process. Estimating a Population Mean
Estimating a Population Mean Example #2 Construct a 98% Confidence Interval based on the following data: 45, 55, 67, 45, 68, 79, 98, 87, 84, 82. Estimating a Population Mean
Estimating a Population Mean Using t Procedures Wisely The stated confidence level of a one-sample t interval for µ is exactly correct when the population distribution is exactly Normal. No population of real data is exactly Normal. The usefulness of the t procedures in practice therefore depends on how strongly they are affected by lack of Normality. Estimating a Population Mean Definition: An inference procedure is called robust if the probability calculations involved in the procedure remain fairly accurate when a condition for using the procedures is violated. Fortunately, the t procedures are quite robust against non-Normality of the population except when outliers or strong skewness are present. Larger samples improve the accuracy of critical values from the t distributions when the population is not Normal.
Estimating a Population Mean Using t Procedures Wisely Except in the case of small samples, the condition that the data come from a random sample or randomized experiment is more important than the condition that the population distribution is Normal. Here are practical guidelines for the Normal condition when performing inference about a population mean. Estimating a Population Mean Using One-Sample t Procedures: The Normal Condition • Sample size less than 15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do not use t. • Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness. • Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 30.