Chapter 10 Estimating Means and Proportions Stat-Slide-Show, Copyright 1994-95 by Quant Systems Inc.
The Problem Process or Population: $\mu$ = ?, $\sigma^2$ = ?, p = ? How do you estimate these unknown parameters?
Definition Using properly drawn sample data to draw conclusions about the population is called statistical inference. Process or Population: $\mu$ = ? Sample: $\bar{x}$ is a sample estimate of $\mu$.
Definitions An estimator is a strategy or rule that is used to estimate a population parameter. For example, use $\bar{x}$ to estimate $\mu$ and $s^2$ to estimate $\sigma^2$. If the rule is applied to a specific set of data, the result is an estimate. Example: $\bar{x}$ and $s^2$ are estimators; $\bar{x} = 33.2$ is an estimate of the population mean $\mu$.
Statistical Inference Statistical inference permits the estimation of statistical measures (means, proportions, etc.) with a known degree of confidence. This ability to assess the quality (confidence) of estimates is one of the significant benefits statistics brings to decision making and problem solving.
Randomly Selected Samples If samples are selected randomly, the uncertainty of the inference is measurable. The ability to measure the confidence associated with a statistical inference is the value received for drawing random samples. If samples are not selected randomly, there will be no known relationship between the sample results and the population.
The One-sample Problem This chapter is devoted to the one-sample problem. That is, a sample consisting of n measurements, $x_1, x_2, \ldots, x_n$, of some population characteristic will be analyzed with the objective of making inferences about the population or process: $\mu$ = ?, $\sigma^2$ = ?, p = ?
Estimation
Judgment Estimates Many estimates are subjective, that is, a person with experience in the field is utilized to estimate an unknown population value. The problem with judgment estimates is that their degree of accuracy or inaccuracy cannot be determined. Even if experts exist, statistics offers estimates with known reliability.
Point Estimation of the Population Mean
How can you tell a good estimator from a bad one? Good estimators conform to the rules of horseshoes: the closer to the true population measure, the better. Since the objective in this instance is to estimate the population mean, closeness is measured in terms of the distance between the estimate and the actual population mean.
Estimate Accuracy How can you judge how accurate your estimate is without knowing the true value of the population parameter? It’s similar to shooting an arrow at the bull's-eye without being able to see the bull’s-eye. If you can’t see the bull's-eye, how do you know how close you were?
Mean Squared Error An estimator's average squared distance from the true parameter is referred to as its mean squared error (MSE). The mean squared error for the sample mean is given by: $\mathrm{MSE}(\bar{x}) = E[(\bar{x} - \mu)^2]$
Finding an Estimator A perfect estimator would have a mean squared error of zero, but there is no such thing as a perfect estimator. Since statistical estimators depend on data which is randomly drawn, they are random variables and cannot always be equal to the true population characteristic. The goal is to find an estimator whose average squared error is the smallest.
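The mean squared error of an estimator can be approximated by simulation when the true mean is known. The sketch below is only an illustration: the normal population with mean 50 and standard deviation 10, the sample size of 25, and the helper name `simulated_mse` are all assumptions, not part of the original slides.

```python
import random
import statistics

def simulated_mse(estimator, mu, sigma, n, trials=20000, seed=1):
    """Approximate E[(estimate - mu)^2] by repeated sampling from N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw one random sample of size n and apply the estimator to it.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += (estimator(sample) - mu) ** 2
    return total / trials

# Hypothetical population: mu = 50, sigma = 10; sample size n = 25.
mse_mean = simulated_mse(statistics.fmean, mu=50.0, sigma=10.0, n=25)
print(round(mse_mean, 2))  # should land near sigma^2 / n = 4.0
```

Because the sample mean is unbiased, its simulated MSE should be close to its variance, $\sigma^2/n$, but never exactly zero.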
Restricting Estimators There are an infinite number of possible estimators and without restricting the kinds of estimators that will be considered, very little progress can be made.
Unbiasedness One desirable restriction is unbiasedness. To be an unbiased estimator, the expected value of the estimator must be equal to the parameter that is being estimated. For example, $\bar{x}$ is an unbiased estimator of the population mean since $E(\bar{x}) = \mu$.
Unbiased Estimators There are many unbiased estimators of the population mean, including the sample mean, any single sample value, and (for symmetric populations) the sample median. Among unbiased estimators the mean squared error is equal to the variance of the estimator. For a normal population, the sample mean has the smallest mean squared error among unbiased estimators. Consequently, in that setting there is no other unbiased estimator that can consistently do a better job of estimating the population mean.
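A small simulation can make the comparison among unbiased estimators concrete. This is a sketch under stated assumptions: the population is taken to be normal with mean 50 and standard deviation 10 (illustrative values), and `bias_and_mse` is a hypothetical helper, not something from the slides.

```python
import random
import statistics

def bias_and_mse(estimator, mu, sigma, n, trials=5000, seed=7):
    """Return (simulated bias, simulated MSE) of an estimator for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - mu
    mse = statistics.fmean([(e - mu) ** 2 for e in estimates])
    return bias, mse

estimators = {
    "sample mean": statistics.fmean,
    "sample median": statistics.median,
    "single value": lambda sample: sample[0],
}

# The same seed is reused, so every estimator sees the same simulated samples.
results = {name: bias_and_mse(f, mu=50.0, sigma=10.0, n=25)
           for name, f in estimators.items()}

for name, (bias, mse) in results.items():
    print(f"{name:13s} bias={bias:6.3f}  MSE={mse:7.2f}")
```

All three simulated biases should be near zero, while the MSE of the sample mean should be the smallest of the three for this normal population.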
Interval Estimation of the Population Mean
Precision of the Estimate One of the limitations of simply reporting a point estimate is the lack of information concerning the estimator’s accuracy. Example: If 33.2 is a point estimate of the population mean, how good is this estimate? Interval estimates, however, are constructed to provide additional information about the precision of the estimate.
Constructing an Interval estimator An interval estimator is made by developing an upper and a lower boundary for an interval that will hopefully contain the population parameter. It would be easy to construct an interval estimator that would definitely contain a population parameter, namely minus infinity to positive infinity.
Constructing an Interval estimator However, this particular interval estimator would not contain any useful information about the location of the population parameter. In interval estimation, the smaller the interval for a given amount of confidence, the better.
Central Limit Theorem Recall that if the sample size is reasonably large (n > 30), the central limit theorem ensures that $\bar{x}$ has an approximately normal distribution with mean $\mu$ and variance $\sigma^2/n$.
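The central limit theorem can be checked empirically. The sketch below assumes, purely for illustration, a skewed exponential population with mean 1 and variance 1 and a sample size of n = 40; none of these values come from the slides.

```python
import math
import random
import statistics

rng = random.Random(3)
n = 40          # sample size (n > 30)
trials = 20000  # number of simulated samples

# Exponential population with rate 1: mean 1, variance 1, clearly non-normal.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Share of sample means within 1.96 standard errors of the population mean.
std_error = math.sqrt(1.0 / n)
within = sum(abs(m - 1.0) < 1.96 * std_error for m in sample_means) / trials

print(round(mean_of_means, 3), round(var_of_means, 4), round(within, 3))
```

The simulated mean and variance of the sample means should sit close to $\mu = 1$ and $\sigma^2/n = 0.025$, and roughly 95% of the sample means should fall within 1.96 standard errors of $\mu$, as a normal distribution would predict.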
Example 1 The sampling distribution can be used to develop an interval estimator. For the standard normal random variable, P(-2.17 < z < 2.17) = .97.
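Example 1 Since $\bar{x}$ can be transformed into the standard normal random variable by using the z-transform, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, then by substitution, $P\left(-2.17 < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 2.17\right) = .97$, and with some algebraic manipulation we obtain $P\left(\bar{x} - 2.17\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 2.17\,\dfrac{\sigma}{\sqrt{n}}\right) = .97$.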
Example 1 The expression above suggests a specific form for the interval. The population mean will fall within the interval $\bar{x} \pm 2.17\,\dfrac{\sigma}{\sqrt{n}}$ 97% of the time.
Example 1 After the sample is selected, the sample mean is no longer a random variable. Suppose a sample has been drawn from a population with a standard deviation of 200, and the following characteristics have been observed: n = 100 and $\bar{x}$ = 150. Note: $\bar{X}$ is a random variable, but $\bar{x} = 33.2$ is the sample mean for a particular sample.
Example 1 The resulting interval would be $150 \pm 2.17\,\dfrac{200}{\sqrt{100}} = 150 \pm 43.4$. That is, [106.6, 193.4].
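The arithmetic for this interval can be reproduced in a few lines. This is a minimal sketch: the function name `z_interval` is hypothetical, and the z-value is computed from the standard normal distribution rather than read from a table, so it differs from 2.17 only in later decimal places.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper) bounds of xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z-value with area alpha/2 above it
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Values from Example 1: sigma = 200, n = 100, sample mean 150, 97% confidence.
lower, upper = z_interval(xbar=150.0, sigma=200.0, n=100, confidence=0.97)
print(round(lower, 1), round(upper, 1))  # 106.6 193.4
```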
Example 1 Is the population mean ($\mu$) inside the interval [106.6, 193.4]?
Example 1 Even though the interval is calculated using a technique that captures the population mean 97% of the time, it would not be appropriate, from a relative frequency point of view, to state that $P(106.6 < \mu < 193.4) = .97$, since the population mean is an unknown but constant quantity.
Example 1 Either $\mu$ will always be inside the interval or it will always be outside the interval. What information do we have about the interval?
Example 1 Since it was constructed from a technique that will include the true population mean in the interval .97 of the time, we are 97% confident in the technique. Confidence is one way of expressing a subjective probability. Hence, the term confidence interval is used to describe the method of construction rather than a particular interval.
Example 1 A 97% confidence interval can be interpreted to mean that if all possible samples of a given size are taken from a population, 97% of the samples would produce intervals that captured the true population mean and 3% would not. The idea of the confidence of a confidence interval is a general one and can be extended to any specified degree of confidence. 80%, 85%, 88%, 95%, 98%, ...
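The repeated-sampling interpretation can be illustrated by simulation. In the sketch below the true mean of 120 is an arbitrary assumption made only so that coverage can be checked; $\sigma = 200$ and n = 100 follow Example 1, and `coverage` is a hypothetical helper name.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(mu, sigma, n, confidence, trials=5000, seed=11):
    """Fraction of simulated samples whose z-interval contains the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # Each sample yields a different interval; mu itself never changes.
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=120.0, sigma=200.0, n=100, confidence=0.97)
print(rate)  # expected to be close to 0.97
```

Any single interval either contains $\mu$ or does not; the 97% describes how often the procedure succeeds across many samples.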
Confidence Interval for the Population Mean Definition: If n > 30, or if $\sigma$ is known and the population being sampled is normal, a $(1 - \alpha)$ confidence interval for the population mean is given by $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$. If $\sigma$ is unknown and n > 30, s can be used as an approximation for $\sigma$.
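Confidence Interval for the Population Mean The expression $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ creates the interval $\left[\bar{x} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right]$. The term $z_{\alpha/2}$ represents the z-value required to obtain an area of $1 - \alpha$ centered under the standard normal curve.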
Various Z-values The z-values for obtaining various (1 - a) areas centered under the standard normal curve are given in the table below.
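Graphs of the Various Z-values [Figure: standard normal curves with the central area $(1 - \alpha)$ shaded. $(1 - \alpha)$ = .80 lies between -1.28 and 1.28; $(1 - \alpha)$ = .90 between -1.645 and 1.645; $(1 - \alpha)$ = .95 between -1.96 and 1.96; $(1 - \alpha)$ = .99 between -2.58 and 2.58.]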
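The z-values for common confidence levels can also be computed rather than looked up. A minimal sketch using Python's standard library; the helper name `z_for_confidence` is an assumption introduced here.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """z-value that leaves an area of `confidence` centered under the standard normal curve."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"(1 - alpha) = {level:.2f}   z = {z_for_confidence(level):.3f}")
```

Rounded to the precision of a printed table, these agree with the values 1.28, 1.645, 1.96, and 2.58.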
To achieve more confidence we must pay a price. For a fixed sample size, the larger the desired confidence, the greater the number of standard deviations that must be used to form the boundary points for the confidence interval. When the interval becomes wider, the resulting information provides a less precise location of the population mean.
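Error of Estimation We can also think about the confidence interval as a means of describing the quality of a point estimate: $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the point estimate and $z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ is the maximum error of estimation with a specific level of confidence $(1 - \alpha)$.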
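The price of extra confidence shows up directly in the maximum error of estimation. The sketch below reuses $\sigma = 200$ and n = 100 from Example 1 as illustrative inputs; `max_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def max_error(sigma, n, confidence):
    """Maximum error of estimation: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)

levels = [0.80, 0.90, 0.95, 0.99]
margins = [max_error(sigma=200.0, n=100, confidence=c) for c in levels]

for c, m in zip(levels, margins):
    print(f"confidence {c:.2f}: xbar +/- {m:.1f}")
```

With the sample size fixed, each increase in confidence widens the interval, so the location of the population mean is pinned down less precisely.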
Example 2 Find $z_{\alpha/2}$ for the following levels of $\alpha$. 1. $\alpha$ = .02 2. $\alpha$ = .08
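Example 2 - Solution 1. $\alpha$ = .02, so $\alpha/2$ = .01. The area between 0 and $z_{\alpha/2}$ is .49, which gives $z_{.01} \approx 2.33$.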
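Example 2 - Solution 2. $\alpha$ = .08, so $\alpha/2$ = .04. The area between 0 and $z_{\alpha/2}$ is .46, which gives $z_{.04} \approx 1.75$.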
Example 3 Find $z_{\alpha/2}$ for the following confidence levels: 1. 96% 2. 88%
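Example 3 - Solution 1. $1 - \alpha$ = .96, so $\alpha$ = .04 and $\alpha/2$ = .02. The area between 0 and $z_{\alpha/2}$ is .48, which gives $z_{.02} \approx 2.05$.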
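Example 3 - Solution 2. $1 - \alpha$ = .88, so $\alpha$ = .12 and $\alpha/2$ = .06. The area between 0 and $z_{\alpha/2}$ is .44, which gives $z_{.06} \approx 1.555$.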
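The table lookups in Examples 2 and 3 can be double-checked numerically. This sketch computes $z_{\alpha/2}$ from $\alpha$ and then confirms the area between 0 and that z-value; the helper name `z_alpha_over_2` is an assumption, and computed values may differ from table values in the third decimal place.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_alpha_over_2(alpha):
    """z-value with an upper-tail area of alpha / 2 under the standard normal curve."""
    return std_normal.inv_cdf(1.0 - alpha / 2.0)

# alpha values from Example 2 (.02, .08) and Example 3 (.04, .12).
for alpha in (0.02, 0.08, 0.04, 0.12):
    z = z_alpha_over_2(alpha)
    area_0_to_z = std_normal.cdf(z) - 0.5  # should equal .5 - alpha/2
    print(f"alpha = {alpha:.2f}  area(0, z) = {area_0_to_z:.2f}  z = {z:.3f}")
```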
Example 4 A paint manufacturer is developing a new type of paint. Thirty panels were exposed to various corrosive conditions to measure the protective ability of the paint. The mean life for the samples was 168 hours before corrosive failure.
Example 4 The life of paint samples is assumed to be normally distributed with population standard deviation of 30 hours. Find the 95% confidence interval for the mean life of the paint.
Example 4 - Solution We are given: X = time before corrosive failure of the paint has a normal distribution, $\sigma$ = 30, n = 30, $\bar{x}$ = 168, and the confidence level = .95.
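Example 4 - Solution $1 - \alpha$ = .95, so $\alpha$ = .05 and $\alpha/2$ = .025. The area between 0 and $z_{\alpha/2}$ is .475, which gives $z_{.025} = 1.96$.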
Example 4 - Solution We want to determine a 95% confidence interval for the true mean life before corrosive failure. Since X is normal and $\sigma$ is known, the confidence interval is given by $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.
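As a check on Example 4, the interval can be evaluated numerically from the given values ($\bar{x}$ = 168, $\sigma$ = 30, n = 30, 95% confidence). This is a sketch, not the slides' own worked solution; the variable names are illustrative, and the z-value is computed rather than taken from a table.

```python
import math
from statistics import NormalDist

xbar = 168.0       # sample mean life in hours
sigma = 30.0       # known population standard deviation in hours
n = 30             # number of panels
confidence = 0.95

alpha = 1.0 - confidence
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # approximately 1.96
margin = z * sigma / math.sqrt(n)            # maximum error of estimation

paint_lower = xbar - margin
paint_upper = xbar + margin
print(round(paint_lower, 2), round(paint_upper, 2))  # 157.26 178.74
```

Under these assumptions, the procedure gives an interval of roughly 157.3 to 178.7 hours for the mean life of the paint.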