Basic Properties of Confidence Intervals
The basic concepts and properties of confidence intervals (CIs) are most easily introduced by first focusing on a simple, albeit somewhat unrealistic, problem situation. Suppose that the parameter of interest is a population mean μ and that
1. The population distribution is normal
2. The value of the population standard deviation σ is known
Normality of the population distribution is often a reasonable assumption.
However, if the value of μ is unknown, it is typically implausible that the value of σ would be available (knowledge of a population's center typically precedes information concerning spread). The actual sample observations x₁, x₂, …, xₙ are assumed to be the result of a random sample X₁, …, Xₙ from a normal distribution with mean value μ and standard deviation σ.
Irrespective of the sample size n, the sample mean X̄ is normally distributed with expected value μ and standard deviation σ/√n. Standardizing X̄ by first subtracting its expected value and then dividing by its standard deviation yields the standard normal variable

Z = (X̄ − μ) / (σ/√n)          (7.1)
Because the area under the standard normal curve between −1.96 and 1.96 is .95,

P(−1.96 < (X̄ − μ)/(σ/√n) < 1.96) = .95          (7.2)

Now let's manipulate the inequalities inside the parentheses in (7.2) so that they appear in the equivalent form l < μ < u, where the endpoints l and u involve X̄ and σ/√n. This is achieved through the following sequence of operations, each yielding inequalities equivalent to the original ones.
1. Multiply through by σ/√n:

−1.96·σ/√n < X̄ − μ < 1.96·σ/√n

2. Subtract X̄ from each term:

−X̄ − 1.96·σ/√n < −μ < −X̄ + 1.96·σ/√n
3. Multiply through by −1 to eliminate the minus sign in front of μ (which reverses the direction of each inequality):

X̄ − 1.96·σ/√n < μ < X̄ + 1.96·σ/√n

that is,

P(X̄ − 1.96·σ/√n < μ < X̄ + 1.96·σ/√n) = .95          (7.3)
This CI can be expressed either as

(x̄ − 1.96·σ/√n, x̄ + 1.96·σ/√n)

or as

x̄ ± 1.96·σ/√n
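As a quick numerical sketch of this interval (all values hypothetical, chosen for illustration: x̄ = 80.0, known σ = 2.0, n = 31):

```python
from math import sqrt

def z_interval_95(xbar, sigma, n):
    # 95% CI for mu when sigma is known: xbar +/- 1.96 * sigma / sqrt(n)
    half_width = 1.96 * sigma / sqrt(n)
    return (xbar - half_width, xbar + half_width)

# Hypothetical values: sample mean 80.0, known sigma 2.0, sample size 31
lo, hi = z_interval_95(80.0, 2.0, 31)
print(round(lo, 1), round(hi, 1))  # -> 79.3 80.7
```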
Interpreting a Confidence Level
But by substituting x̄ = 80.0 for X̄, all randomness disappears; the interval (79.3, 80.7) is not a random interval, and μ is a constant (unfortunately unknown to us). It is therefore incorrect to write the statement P(μ lies in (79.3, 80.7)) = .95. A correct interpretation of "95% confidence" relies on the long-run relative frequency interpretation of probability: to say that an event A has probability .95 is to say that if the experiment on which A is defined is performed over and over again, in the long run A will occur 95% of the time.
This is illustrated in Figure 7.3, where the vertical line cuts the measurement axis at the true (but unknown) value of μ.

Figure 7.3: One hundred 95% CIs (asterisks identify intervals that do not include μ).
Notice that 7 of the 100 intervals shown fail to contain μ. In the long run, only 5% of the intervals so constructed would fail to contain μ. According to this interpretation, the confidence level 95% is not so much a statement about any particular interval such as (79.3, 80.7). Instead it pertains to what would happen if a very large number of like intervals were to be constructed using the same CI formula.
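The long-run interpretation can be checked by simulation: repeatedly draw samples from a normal population whose mean is known to the simulator, build the 95% interval each time, and count how often the interval captures that mean. A minimal sketch, with hypothetical parameter values:

```python
import random
from math import sqrt

random.seed(1)
mu, sigma, n = 80.0, 2.0, 31           # hypothetical "true" population values
trials = 10_000
half = 1.96 * sigma / sqrt(n)          # sigma is treated as known
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if xbar - half < mu < xbar + half:  # did this interval capture mu?
        hits += 1
coverage = hits / trials               # long-run proportion; close to .95
print(coverage)
```

The proportion of intervals containing μ settles near .95, which is exactly what the confidence level asserts.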
Although this may seem unsatisfactory, the root of the difficulty lies with our interpretation of probability—it applies to a long sequence of replications of an experiment rather than just a single replication. There is another approach to the construction and interpretation of CIs that uses the notion of subjective probability and Bayes’ theorem, but the technical details are beyond the scope of this text; the book by DeGroot, et al. is a good source.
Other Levels of Confidence
As Figure 7.4 shows, a probability of 1 − α is achieved by using z_{α/2} in place of 1.96, since

P(−z_{α/2} < Z < z_{α/2}) = 1 − α          (Figure 7.4)
Definition: A 100(1 − α)% confidence interval for the mean μ of a normal population when the value of σ is known is given by

(x̄ − z_{α/2}·σ/√n, x̄ + z_{α/2}·σ/√n)          (7.5)

or, equivalently, by x̄ ± z_{α/2}·σ/√n. The formula (7.5) for the CI can also be expressed in words as

point estimate of μ ± (z critical value)(standard error of the mean).
Confidence Level, Precision, and Sample Size
Why settle for a confidence level of 95% when a level of 99% is achievable? Because the price paid for the higher confidence level is a wider interval. Since the 95% interval extends 1.96·σ/√n to each side of x̄, the width of the interval is 2(1.96)·σ/√n = 3.92·σ/√n. Similarly, the width of the 99% interval is 2(2.58)·σ/√n = 5.16·σ/√n. That is, we have more confidence in the 99% interval precisely because it is wider. The higher the desired degree of confidence, the wider the resulting interval will be.
If we think of the width of the interval as specifying its precision or accuracy, then the confidence level (or reliability) of the interval is inversely related to its precision. A highly reliable interval estimate may be imprecise in that the endpoints of the interval may be far apart, whereas a precise interval may entail relatively low reliability. Thus it cannot be said unequivocally that a 99% interval is to be preferred to a 95% interval; the gain in reliability entails a loss in precision.
A general formula for the sample size n necessary to ensure an interval width w is obtained by equating w to 2·z_{α/2}·σ/√n and solving for n. The sample size necessary for the CI (7.5) to have a width w is

n = (2·z_{α/2}·σ / w)²

The smaller the desired width w, the larger n must be. In addition, n is an increasing function of σ (more population variability necessitates a larger sample size) and of the confidence level 100(1 − α)% (as α decreases, z_{α/2} increases).
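The formula translates directly to code; rounding up guarantees the width requirement is met (the numbers below are hypothetical):

```python
from math import ceil, sqrt

def required_n(sigma, width, z_half_alpha):
    # n = (2 * z_{alpha/2} * sigma / w)^2, rounded up to the next integer
    return ceil((2 * z_half_alpha * sigma / width) ** 2)

# Hypothetical: sigma = 2.0, desired total width w = 1.0, 95% confidence
n_needed = required_n(2.0, 1.0, 1.96)
print(n_needed)  # -> 62
```

With n = 62 the realized width 2(1.96)(2.0)/√62 ≈ 0.996 is just under the target of 1.0, while n = 61 would slightly overshoot it.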
The half-width 1.96·σ/√n of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level. That is, with 95% confidence, the point estimate x̄ will be no farther than this from μ. Before obtaining data, an investigator may wish to determine a sample size for which a particular value of the bound is achieved.
Large-Sample Confidence Intervals for a Population Mean and Proportion
Earlier we came across the CI for μ, which assumed that the population distribution is normal with the value of σ known. We now present a large-sample CI whose validity does not require these assumptions. After showing how the argument leading to this interval generalizes to yield other large-sample intervals, we focus on an interval for a population proportion p.
A Large-Sample Interval for μ
Let X₁, X₂, …, Xₙ be a random sample from a population having a mean μ and standard deviation σ. Provided that n is large, the Central Limit Theorem (CLT) implies that X̄ has approximately a normal distribution whatever the nature of the population distribution. It then follows that Z = (X̄ − μ)/(σ/√n) has approximately a standard normal distribution, so that

P(−z_{α/2} < (X̄ − μ)/(σ/√n) < z_{α/2}) ≈ 1 − α
An argument parallel to the earlier one yields

x̄ ± z_{α/2}·σ/√n

as a large-sample CI for μ with a confidence level of approximately 100(1 − α)%. That is, when n is large, the CI for μ given previously remains valid whatever the population distribution, provided that the qualifier "approximately" is inserted in front of the confidence level. A practical difficulty with this development is that computation of the CI requires the value of σ, which will rarely be known. Consider the standardized variable (X̄ − μ)/(S/√n), in which the sample standard deviation S has replaced σ.
Previously, there was randomness only in the numerator of Z, by virtue of X̄. In the new standardized variable, both X̄ and S vary in value from one sample to another. So it might seem that the distribution of the new variable should be more spread out than the z curve to reflect the extra variation in the denominator. This is indeed true when n is small. However, for large n the substitution of S for σ adds little extra variability, so this variable also has approximately a standard normal distribution. Manipulation of the variable in a probability statement, as in the case of known σ, gives a general large-sample CI for μ.
Proposition: If n is sufficiently large, the standardized variable

Z = (X̄ − μ) / (S/√n)

has approximately a standard normal distribution. This implies that

x̄ ± z_{α/2}·s/√n          (7.8)

is a large-sample confidence interval for μ with confidence level approximately 100(1 − α)%. This formula is valid regardless of the shape of the population distribution.
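A minimal sketch of (7.8) using only the standard library; the data here are randomly generated placeholders, with n = 50 chosen to satisfy the usual n > 40 guideline:

```python
import random
from math import sqrt

def large_sample_ci(data, z_half_alpha=1.96):
    # xbar +/- z_{alpha/2} * s / sqrt(n); valid for large n, any population shape
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample std deviation
    half = z_half_alpha * s / sqrt(n)
    return (xbar - half, xbar + half)

random.seed(0)
data = [random.gauss(10.0, 3.0) for _ in range(50)]  # hypothetical sample, n = 50
lo, hi = large_sample_ci(data)
print(lo, hi)
```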
In words, the CI (7.8) is

point estimate of μ ± (z critical value)(estimated standard error of the mean).

Generally speaking, n > 40 will be sufficient to justify the use of this interval. This is somewhat more conservative than the rule of thumb for the CLT because of the additional variability introduced by using S in place of σ.
A General Large-Sample Confidence Interval
The large-sample intervals x̄ ± z_{α/2}·σ/√n and x̄ ± z_{α/2}·s/√n are special cases of a general large-sample CI for a parameter θ. Suppose that θ̂ is an estimator satisfying the following properties: (1) it has approximately a normal distribution; (2) it is (at least approximately) unbiased; and (3) an expression for σ_θ̂, the standard deviation of θ̂, is available.
For example, in the case θ = μ, θ̂ = X̄ is an unbiased estimator whose distribution is approximately normal when n is large, and σ_θ̂ = σ/√n. Standardizing θ̂ yields the rv Z = (θ̂ − θ)/σ_θ̂, which has approximately a standard normal distribution. This justifies the probability statement

P(−z_{α/2} < (θ̂ − θ)/σ_θ̂ < z_{α/2}) ≈ 1 − α          (7.9)

Suppose first that σ_θ̂ does not involve any unknown parameters (e.g., σ known in the case θ = μ).
Then replacing each < in (7.9) by = results in θ = θ̂ ± z_{α/2}·σ_θ̂, so the lower and upper confidence limits are θ̂ − z_{α/2}·σ_θ̂ and θ̂ + z_{α/2}·σ_θ̂, respectively. Now suppose that σ_θ̂ does not involve θ but does involve at least one other unknown parameter. Let s_θ̂ be the estimate of σ_θ̂ obtained by using estimates in place of the unknown parameters (e.g., s/√n estimates σ/√n). Under general conditions (essentially that s_θ̂ be close to σ_θ̂ for most samples), a valid CI is

θ̂ ± z_{α/2}·s_θ̂

The large-sample interval x̄ ± z_{α/2}·s/√n is an example.
A Confidence Interval for a Population Proportion
Let p denote the proportion of "successes" in a population, where a success identifies an individual or object that has a specified property (e.g., individuals who graduated from college, computers that do not need warranty service, etc.). A random sample of n individuals is to be selected, and X is the number of successes in the sample. Provided that n is small compared to the population size, X can be regarded as a binomial rv with E(X) = np and σ_X = √(npq). Furthermore, if both np ≥ 10 and nq ≥ 10 (q = 1 − p), X has approximately a normal distribution.
The natural estimator of p is p̂ = X/n, the sample fraction of successes. Since p̂ is just X multiplied by the constant 1/n, p̂ also has approximately a normal distribution. As we know, E(p̂) = p (unbiasedness) and σ_p̂ = √(pq/n). The standard deviation σ_p̂ involves the unknown parameter p. Standardizing p̂ by subtracting p and dividing by σ_p̂ then implies that

P(−z_{α/2} < (p̂ − p)/√(pq/n) < z_{α/2}) ≈ 1 − α
If the sample size n is very large, then z²/(2n) is generally quite negligible (small) compared to p̂, and z²/n is quite negligible compared to 1, so the midpoint of the score interval is approximately p̂. In this case z²/(4n²) is also negligible compared to p̂q̂/n (n² is a much larger divisor than is n); as a result, the dominant term in the ± expression is z·√(p̂q̂/n), and the score interval is approximately

p̂ ± z_{α/2}·√(p̂q̂/n)          (7.11)

This latter interval has the general form θ̂ ± z_{α/2}·s_θ̂ of a large-sample interval suggested in the last subsection.
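A sketch of the approximate (traditional) interval in (7.11); the success count and sample size below are hypothetical:

```python
from math import sqrt

def prop_ci(x, n, z_half_alpha=1.96):
    # phat +/- z_{alpha/2} * sqrt(phat*qhat/n)
    # adequate when n*phat >= 10 and n*qhat >= 10
    phat = x / n
    qhat = 1 - phat
    half = z_half_alpha * sqrt(phat * qhat / n)
    return (phat - half, phat + half)

# Hypothetical: 112 successes in a sample of n = 200
lo, hi = prop_ci(112, 200)
print(round(lo, 3), round(hi, 3))  # -> 0.491 0.629
```

Here np̂ = 112 and nq̂ = 88, so the normal-approximation condition is comfortably met.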
Intervals Based on a Normal Population Distribution
The CI for μ presented in the earlier section is valid provided that n is large. The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked, however, when n is small. In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption. For example, we could develop a CI for μ when the population is described by a gamma distribution, another interval for the case of a Weibull distribution, and so on.
Statisticians have indeed carried out this program for a number of different distributional families. Because the normal distribution is more frequently appropriate as a population model than is any other type of distribution, we will focus here on a CI for this situation.

Assumption: The population of interest is normal, so that X₁, …, Xₙ constitutes a random sample from a normal distribution with both μ and σ unknown.
The key result underlying the interval in the earlier section was that for large n, the rv Z = (X̄ − μ)/(S/√n) has approximately a standard normal distribution. When n is small, S is no longer likely to be close to σ, so the variability in the distribution of Z arises from randomness in both the numerator and the denominator. This implies that the probability distribution of (X̄ − μ)/(S/√n) will be more spread out than the standard normal distribution.
The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem: When X̄ is the mean of a random sample of size n from a normal distribution with mean μ, the rv

T = (X̄ − μ) / (S/√n)          (7.13)

has a probability distribution called a t distribution with n − 1 degrees of freedom (df).
Properties of t Distributions
Before applying this theorem, a discussion of properties of t distributions is in order. Although the variable of interest is still (X̄ − μ)/(S/√n), we now denote it by T to emphasize that it does not have a standard normal distribution when n is small. We know that a normal distribution is governed by two parameters μ and σ; each different choice of μ in combination with σ gives a particular normal distribution. Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.
We'll denote this parameter by the Greek letter ν. Possible values of ν are the positive integers 1, 2, 3, …. So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on. For any fixed value of ν, the density function that specifies the associated t curve is even more complicated than the normal density function. Fortunately, we need concern ourselves only with several of the more important features of these curves.
Let t_ν denote the t distribution with ν df.
1. Each t_ν curve is bell-shaped and centered at 0.
2. Each t_ν curve is more spread out than the standard normal (z) curve.
3. As ν increases, the spread of the corresponding t_ν curve decreases.
4. As ν → ∞, the sequence of t_ν curves approaches the standard normal curve (so the z curve is often called the t curve with df = ∞).
Figure 7.7 illustrates several of these properties for selected values of ν.

Figure 7.7: t_ν and z curves
The number of df for T in (7.13) is n − 1 because, although S is based on the n deviations X₁ − X̄, …, Xₙ − X̄, the fact that Σ(Xᵢ − X̄) = 0 implies that only n − 1 of these are "freely determined." The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of T is based. The use of t distributions in making inferences requires notation for capturing t-curve tail areas analogous to z_α for the z curve. You might think that t_α would do the trick. However, the desired value depends not only on the tail area captured but also on df.
Notation: Let t_{α,ν} = the number on the measurement axis for which the area under the t curve with ν df to the right of t_{α,ν} is α; t_{α,ν} is called a t critical value. For example, t_{.05,6} is the t critical value that captures an upper-tail area of .05 under the t curve with 6 df. The general notation is illustrated in Figure 7.8.

Figure 7.8: Illustration of a t critical value
Because t curves are symmetric about zero, −t_{α,ν} captures a lower-tail area of α. Appendix Table A.5 gives t_{α,ν} for selected values of α and ν. This table also appears inside the back cover. The columns of the table correspond to different values of α. To obtain t_{.05,15}, go to the α = .05 column, look down to the ν = 15 row, and read t_{.05,15} = 1.753. Similarly, t_{.05,22} = 1.717 (α = .05 column, ν = 22 row), and t_{.01,22} = 2.508.
The values of t_{α,ν} exhibit regular behavior as we move across a row or down a column. For fixed ν, t_{α,ν} increases as α decreases, since we must move farther to the right of zero to capture area α in the tail. For fixed α, as ν is increased (i.e., as we look down any particular column of the t table) the value of t_{α,ν} decreases. This is because a larger value of ν implies a t distribution with smaller spread, so it is not necessary to go so far from zero to capture tail area α.
Furthermore, t_{α,ν} decreases more slowly as ν increases. Consequently, the table values are shown in increments of 2 between 30 df and 40 df and then jump to ν = 50, 60, 120, and finally ∞. Because t_∞ is the standard normal curve, the familiar z_α values appear in the last row of the table. The rule of thumb suggested earlier for use of the large-sample CI (if n > 40) comes from the approximate equality of the standard normal and t distributions for ν ≥ 40.
The One-Sample t Confidence Interval
The standardized variable T = (X̄ − μ)/(S/√n) has a t distribution with n − 1 df, and the area under the corresponding t density curve between −t_{α/2,n−1} and t_{α/2,n−1} is 1 − α (area α/2 lies in each tail), so

P(−t_{α/2,n−1} < T < t_{α/2,n−1}) = 1 − α          (7.14)

Expression (7.14) differs from expressions in previous sections in that T and t_{α/2,n−1} are used in place of Z and z_{α/2}, but it can be manipulated in the same manner to obtain a confidence interval for μ.
Proposition: Let x̄ and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean μ. Then a 100(1 − α)% confidence interval for μ is

(x̄ − t_{α/2,n−1}·s/√n, x̄ + t_{α/2,n−1}·s/√n)          (7.15)

or, more compactly, x̄ ± t_{α/2,n−1}·s/√n.
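A sketch of (7.15) for a hypothetical sample of n = 10; the critical value t_{.025,9} = 2.262 comes from a standard t table (95% confidence, 9 df):

```python
from math import sqrt

def t_interval(data, t_crit):
    # xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    half = t_crit * s / sqrt(n)
    return (xbar - half, xbar + half)

# Hypothetical data; t_{.025,9} = 2.262 for a 95% CI with 9 df
data = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0, 10.3, 9.6]
lo, hi = t_interval(data, 2.262)
print(round(lo, 2), round(hi, 2))  # -> 9.83 10.27
```

Note that 2.262 exceeds z_{.025} = 1.96, reflecting the extra spread of the t curve at small n.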
An upper confidence bound for μ is x̄ + t_{α,n−1}·s/√n, and replacing + by − in this latter expression gives a lower confidence bound for μ, both with confidence level 100(1 − α)%.
A Prediction Interval for a Single Future Value
Proposition: A prediction interval (PI) for a single observation to be selected from a normal population distribution is

x̄ ± t_{α/2,n−1}·s·√(1 + 1/n)          (7.16)

The prediction level is 100(1 − α)%. A lower prediction bound results from replacing t_{α/2} by t_α and discarding the + part of (7.16); a similar modification gives an upper prediction bound.

(We skip Sec 7.4.)
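A sketch of (7.16); the extra √(1 + 1/n) factor makes a PI wider than a CI for μ at the same level, since it must absorb the variability of a single future observation, not just of x̄. (Hypothetical data, n = 10, with the tabled value t_{.025,9} = 2.262.)

```python
from math import sqrt

def prediction_interval(data, t_crit):
    # xbar +/- t_{alpha/2, n-1} * s * sqrt(1 + 1/n)
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    half = t_crit * s * sqrt(1 + 1 / n)
    return (xbar - half, xbar + half)

# Hypothetical data (n = 10); t_{.025,9} = 2.262 for a 95% prediction level
data = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0, 10.3, 9.6]
pi_lo, pi_hi = prediction_interval(data, 2.262)
print(round(pi_lo, 2), round(pi_hi, 2))  # -> 9.33 10.77
```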