Copyright © Cengage Learning. All rights reserved. 7 Statistical Intervals Based on a Single Sample.


7.1 Basic Properties of Confidence Intervals

3 The basic concepts and properties of confidence intervals (CIs) are most easily introduced by first focusing on a simple, albeit somewhat unrealistic, problem situation. Suppose that the parameter of interest is a population mean $\mu$ and that

1. The population distribution is normal
2. The value of the population standard deviation $\sigma$ is known

Normality of the population distribution is often a reasonable assumption.

4 Basic Properties of Confidence Intervals However, if the value of $\mu$ is unknown, it is typically implausible that the value of $\sigma$ would be available (knowledge of a population's center typically precedes information concerning spread). The actual sample observations $x_1, x_2, \ldots, x_n$ are assumed to be the result of a random sample $X_1, \ldots, X_n$ from a normal distribution with mean value $\mu$ and standard deviation $\sigma$.

5 Basic Properties of Confidence Intervals Irrespective of the sample size $n$, the sample mean $\bar{X}$ is normally distributed with expected value $\mu$ and standard deviation $\sigma/\sqrt{n}$. Standardizing $\bar{X}$ by first subtracting its expected value and then dividing by its standard deviation yields the standard normal variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (7.1)$$

6 Basic Properties of Confidence Intervals Because the area under the standard normal curve between $-1.96$ and $1.96$ is .95,

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \qquad (7.2)$$

Now let's manipulate the inequalities inside the parentheses in (7.2) so that they appear in the equivalent form $l < \mu < u$, where the endpoints $l$ and $u$ involve $\bar{X}$ and $\sigma/\sqrt{n}$. This is achieved through the following sequence of operations, each yielding inequalities equivalent to the original ones.

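7 Basic Properties of Confidence Intervals

1. Multiply through by $\sigma/\sqrt{n}$:

$$-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}$$

2. Subtract $\bar{X}$ from each term:

$$-\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$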

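8 Basic Properties of Confidence Intervals

3. Multiply through by $-1$ to eliminate the minus sign in front of $\mu$ (which reverses the direction of each inequality):

$$\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}$$

that is,

$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$

The equivalence of each set of inequalities to the original set implies that

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95 \qquad (7.3)$$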

9 Basic Properties of Confidence Intervals The event inside the parentheses in (7.3) has a somewhat unfamiliar appearance; previously, the random quantity has appeared in the middle with constants on both ends, as in $a \le Y \le b$. In (7.3) the random quantity appears on the two ends, whereas the unknown constant $\mu$ appears in the middle. To interpret (7.3), think of a random interval having left endpoint $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and right endpoint $\bar{X} + 1.96\,\sigma/\sqrt{n}$. In interval notation, this becomes

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \qquad (7.4)$$

10 Basic Properties of Confidence Intervals The interval (7.4) is random because the two endpoints of the interval involve a random variable. It is centered at the sample mean $\bar{X}$ and extends $1.96\,\sigma/\sqrt{n}$ to each side of $\bar{X}$. Thus the interval's width is $2 \cdot (1.96) \cdot \sigma/\sqrt{n}$, which is not random; only the location of the interval (its midpoint $\bar{X}$) is random (Figure 7.2).

Figure 7.2: The random interval (7.4) centered at $\bar{X}$

11 Basic Properties of Confidence Intervals Now (7.3) can be paraphrased as "the probability is .95 that the random interval (7.4) includes or covers the true value of $\mu$." Before any experiment is performed and any data is gathered, it is quite likely that $\mu$ will lie inside the interval (7.4).

Definition: If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (7.4) in place of $\bar{X}$, the resulting fixed interval is called a 95% confidence interval for $\mu$.

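12 Basic Properties of Confidence Intervals This CI can be expressed either as

$$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \text{ is a 95\% CI for } \mu$$

or as

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \text{ with 95\% confidence.}$$

A concise expression for the interval is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$, where $-$ gives the left endpoint (lower limit) and $+$ gives the right endpoint (upper limit).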

13 Example 2 The quantities needed for computation of the 95% CI for true average preferred height are $\sigma = 2.0$, $n = 31$, and $\bar{x} = 80.0$. The resulting interval is

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 80.0 \pm (1.96)\,\frac{2.0}{\sqrt{31}} = 80.0 \pm .7 = (79.3,\ 80.7)$$

That is, we can be highly confident, at the 95% confidence level, that $79.3 < \mu < 80.7$. This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated.
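
The arithmetic of Example 2 can be checked with a few lines of Python. This is only a minimal sketch: the function name `z_interval_95` is made up for illustration, and the inputs are the values $\sigma = 2.0$, $n = 31$, $\bar{x} = 80.0$ quoted in the slides.

```python
import math

def z_interval_95(xbar, sigma, n):
    """95% CI for a normal mean with known sigma: xbar +/- 1.96*sigma/sqrt(n)."""
    half_width = 1.96 * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from Example 2 (typists' preferred keyboard height).
lower, upper = z_interval_95(xbar=80.0, sigma=2.0, n=31)
print(f"({lower:.1f}, {upper:.1f})")  # prints "(79.3, 80.7)"
```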

14 Interpreting a Confidence Level

15 Interpreting a Confidence Level The confidence level 95% for the interval just defined was inherited from the probability .95 for the random interval (7.4). Intervals having other levels of confidence will be introduced shortly. For now, though, consider how 95% confidence can be interpreted. Because we started with an event whose probability was .95 (that the random interval (7.4) would capture the true value of $\mu$) and then used the data to compute the CI (79.3, 80.7), it is tempting to conclude that $\mu$ is within this fixed interval with probability .95.

16 Interpreting a Confidence Level But by substituting $\bar{x} = 80.0$ for $\bar{X}$, all randomness disappears; the interval (79.3, 80.7) is not a random interval, and $\mu$ is a constant (unfortunately unknown to us). It is therefore incorrect to write the statement $P(\mu \text{ lies in } (79.3, 80.7)) = .95$. A correct interpretation of "95% confidence" relies on the long-run relative frequency interpretation of probability: To say that an event $A$ has probability .95 is to say that if the experiment on which $A$ is defined is performed over and over again, in the long run $A$ will occur 95% of the time.

17 Interpreting a Confidence Level Suppose we obtain another sample of typists' preferred heights and compute another 95% interval. Then we consider repeating this for a third sample, a fourth sample, a fifth sample, and so on. Let $A$ be the event that $\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}$. Since $P(A) = .95$, in the long run 95% of our computed CIs will contain $\mu$.

18 Interpreting a Confidence Level This is illustrated in Figure 7.3, where the vertical line cuts the measurement axis at the true (but unknown) value of $\mu$.

Figure 7.3: One hundred 95% CIs (asterisks identify intervals that do not include $\mu$)

19 Interpreting a Confidence Level Notice that 7 of the 100 intervals shown fail to contain $\mu$. In the long run, only 5% of the intervals so constructed would fail to contain $\mu$. According to this interpretation, the confidence level 95% is not so much a statement about any particular interval such as (79.3, 80.7). Instead it pertains to what would happen if a very large number of like intervals were to be constructed using the same CI formula.
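
The long-run interpretation can be illustrated by simulation. The sketch below is not from the slides: it assumes a hypothetical true mean of 80.0 (in practice $\mu$ is unknown), reuses $\sigma = 2.0$ and $n = 31$ from Example 2, and counts how often the 95% interval covers the assumed $\mu$ over many repeated samples. The function name `coverage_fraction` and the seed are illustrative choices.

```python
import math
import random

def coverage_fraction(mu, sigma, n, replications, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(replications):
        # Draw one random sample of size n and compute its sample mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / replications

# mu = 80.0 is an assumed "true" value used only to drive the simulation.
frac = coverage_fraction(mu=80.0, sigma=2.0, n=31, replications=10_000)
print(frac)  # should be close to 0.95
```

Any single run of 100 intervals may miss more or fewer than 5 times, as with the 7 misses in Figure 7.3; only the long-run fraction settles near .95.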

20 Interpreting a Confidence Level Although this may seem unsatisfactory, the root of the difficulty lies with our interpretation of probability—it applies to a long sequence of replications of an experiment rather than just a single replication. There is another approach to the construction and interpretation of CIs that uses the notion of subjective probability and Bayes’ theorem, but the technical details are beyond the scope of this text; the book by DeGroot, et al. is a good source.

21 Interpreting a Confidence Level The interval presented here (as well as each interval presented subsequently) is called a “classical” CI because its interpretation rests on the classical notion of probability.

22 Other Levels of Confidence

23 Other Levels of Confidence The confidence level of 95% was inherited from the probability .95 for the initial inequalities in (7.2). If a confidence level of 99% is desired, the initial probability of .95 must be replaced by .99, which necessitates changing the z critical value from 1.96 to 2.58. A 99% CI then results from using 2.58 in place of 1.96 in the formula for the 95% CI. In fact, any desired level of confidence can be achieved by replacing 1.96 or 2.58 with the appropriate standard normal critical value.

24 Other Levels of Confidence As Figure 7.4 shows, a probability of $1 - \alpha$ is achieved by using $z_{\alpha/2}$ in place of 1.96.

Figure 7.4: $P(-z_{\alpha/2} \le Z < z_{\alpha/2}) = 1 - \alpha$

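25 Other Levels of Confidence Definition: A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \qquad (7.5)$$

or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The formula (7.5) for the CI can also be expressed in words as

point estimate of $\mu$ $\pm$ (z critical value)(standard error of the mean).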
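
As a rough sketch of how (7.5) can be evaluated for an arbitrary confidence level, the standard library's `statistics.NormalDist` can supply the critical value $z_{\alpha/2}$ instead of a table. The helper names `z_critical` and `z_interval` are illustrative, not part of the slides.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Standard normal critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_interval(xbar, sigma, n, confidence=0.95):
    """100(1 - alpha)% CI for a normal mean with known sigma, as in (7.5)."""
    half_width = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The 99% critical value comes out near 2.576, which the slides round to 2.58.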

26 Example 3 The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed. A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter $\bar{x}$ (in mm).

27 Example 3 Let's calculate a confidence interval for true average hole diameter using a confidence level of 90%. This requires that $100(1 - \alpha) = 90$, from which $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$ (corresponding to a cumulative z-curve area of .9500). With $\bar{x}$ denoting the observed sample mean diameter, the desired interval is then

$$\bar{x} \pm (1.645)\,\frac{.100}{\sqrt{40}} = \bar{x} \pm .026$$

With a reasonably high degree of confidence, we can say that $\bar{x} - .026 < \mu < \bar{x} + .026$. This interval is rather narrow because of the small amount of variability in hole diameter ($\sigma = .100$).

28 Confidence Level, Precision, and Sample Size

29 Confidence Level, Precision, and Sample Size Why settle for a confidence level of 95% when a level of 99% is achievable? Because the price paid for the higher confidence level is a wider interval. Since the 95% interval extends $1.96\,\sigma/\sqrt{n}$ to each side of $\bar{x}$, the width of the interval is $2(1.96)\,\sigma/\sqrt{n} = 3.92\,\sigma/\sqrt{n}$. Similarly, the width of the 99% interval is $2(2.58)\,\sigma/\sqrt{n} = 5.16\,\sigma/\sqrt{n}$. That is, we have more confidence in the 99% interval precisely because it is wider. The higher the desired degree of confidence, the wider the resulting interval will be.

30 Confidence Level, Precision, and Sample Size If we think of the width of the interval as specifying its precision or accuracy, then the confidence level (or reliability) of the interval is inversely related to its precision. A highly reliable interval estimate may be imprecise in that the endpoints of the interval may be far apart, whereas a precise interval may entail relatively low reliability. Thus it cannot be said unequivocally that a 99% interval is to be preferred to a 95% interval; the gain in reliability entails a loss in precision.

31 Confidence Level, Precision, and Sample Size An appealing strategy is to specify both the desired confidence level and interval width and then determine the necessary sample size.

32 Example 4 Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 millisec. A new operating system has been installed, and we wish to estimate the true average response time $\mu$ for the new environment. Assuming that response times are still normally distributed with $\sigma = 25$, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10?

33 Example 4 The sample size $n$ must satisfy

$$10 = 2 \cdot (1.96)\,\frac{25}{\sqrt{n}}$$

Rearranging this equation gives

$$\sqrt{n} = 2 \cdot (1.96)(25)/10 = 9.80$$

So $n = (9.80)^2 = 96.04$. Since $n$ must be an integer, a sample size of 97 is required.

34 Confidence Level, Precision, and Sample Size A general formula for the sample size $n$ necessary to ensure an interval width $w$ is obtained from equating $w$ to $2 \cdot z_{\alpha/2} \cdot \sigma/\sqrt{n}$ and solving for $n$. The sample size necessary for the CI (7.5) to have a width $w$ is

$$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$$

The smaller the desired width $w$, the larger $n$ must be. In addition, $n$ is an increasing function of $\sigma$ (more population variability necessitates a larger sample size) and of the confidence level $100(1 - \alpha)$ (as $\alpha$ decreases, $z_{\alpha/2}$ increases).
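
The sample-size formula is easy to turn into a small helper. The following is a minimal sketch (the name `sample_size_for_width` is illustrative); it rounds up because $n$ must be an integer and a smaller $n$ would give an interval wider than $w$.

```python
import math

def sample_size_for_width(sigma, width, z=1.96):
    """Smallest integer n for which the z-interval has width at most `width`.

    Implements n = (2 * z * sigma / width)**2, rounded up.
    """
    return math.ceil((2.0 * z * sigma / width) ** 2)

# Example 4: sigma = 25, desired 95% CI width of at most 10.
n = sample_size_for_width(sigma=25.0, width=10.0)
print(n)  # prints 97
```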

35 Confidence Level, Precision, and Sample Size The half-width $1.96\,\sigma/\sqrt{n}$ of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level. That is, with 95% confidence, the point estimate $\bar{x}$ will be no farther than this from $\mu$. Before obtaining data, an investigator may wish to determine a sample size for which a particular value of the bound is achieved.

36 Confidence Level, Precision, and Sample Size For example, with $\mu$ representing the average fuel efficiency (mpg) for all cars of a certain type, the objective of an investigation may be to estimate $\mu$ to within 1 mpg with 95% confidence. More generally, if we wish to estimate $\mu$ to within an amount $B$ (the specified bound on the error of estimation) with $100(1 - \alpha)\%$ confidence, the necessary sample size results from replacing $2/w$ by $1/B$ in the formula in the preceding box.

37 Deriving a Confidence Interval

38 Deriving a Confidence Interval Let $X_1, X_2, \ldots, X_n$ denote the sample on which the CI for a parameter $\theta$ is to be based. Suppose a random variable satisfying the following two properties can be found:

1. The variable depends functionally on both $X_1, \ldots, X_n$ and $\theta$.
2. The probability distribution of the variable does not depend on $\theta$ or on any other unknown parameters.

39 Deriving a Confidence Interval Let $h(X_1, X_2, \ldots, X_n; \theta)$ denote this random variable. For example, if the population distribution is normal with known $\sigma$ and $\theta = \mu$, the variable $h(X_1, \ldots, X_n; \mu) = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ satisfies both properties; it clearly depends functionally on $\mu$, yet has the standard normal probability distribution, which does not depend on $\mu$. In general, the form of the $h$ function is usually suggested by examining the distribution of an appropriate estimator $\hat{\theta}$.

40 Deriving a Confidence Interval For any $\alpha$ between 0 and 1, constants $a$ and $b$ can be found to satisfy

$$P(a < h(X_1, \ldots, X_n; \theta) < b) = 1 - \alpha \qquad (7.6)$$

Because of the second property, $a$ and $b$ do not depend on $\theta$. In the normal example, $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$. Now suppose that the inequalities in (7.6) can be manipulated to isolate $\theta$, giving the equivalent probability statement

$$P(l(X_1, X_2, \ldots, X_n) < \theta < u(X_1, X_2, \ldots, X_n)) = 1 - \alpha$$

41 Deriving a Confidence Interval Then $l(x_1, x_2, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$ are the lower and upper confidence limits, respectively, for a $100(1 - \alpha)\%$ CI. In the normal example, we saw that $l(x_1, \ldots, x_n) = \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $u(x_1, \ldots, x_n) = \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$.

42 Example 5 A theoretical model suggests that the time to breakdown of an insulating fluid between electrodes at a particular voltage has an exponential distribution with parameter $\lambda$. A random sample of $n = 10$ breakdown times yields the following sample data (in min): $x_1 = 41.53$, $x_2 = 18.73$, $x_3 = 2.99$, $x_4 = 30.34$, $x_5 = 12.33$, $x_6 =$ …, $x_7 = 73.02$, $x_8 =$ …, $x_9 = 4.00$, $x_{10} =$ …. A 95% CI for $\lambda$ and for the true average breakdown time are desired.

43 Example 5 Let $h(X_1, X_2, \ldots, X_n; \lambda) = 2\lambda \sum X_i$. It can be shown that this random variable has a probability distribution called a chi-squared distribution with $2n$ degrees of freedom (df) ($\nu = 2n$, where $\nu$ is the parameter of a chi-squared distribution). Appendix Table A.7 pictures a typical chi-squared density curve and tabulates critical values that capture specified tail areas. The relevant number of df here is $2(10) = 20$. The $\nu = 20$ row of the table shows that 34.170 captures upper-tail area .025 and 9.591 captures lower-tail area .025 (upper-tail area .975).

44 Example 5 Thus for $n = 10$,

$$P\left(9.591 < 2\lambda \sum X_i < 34.170\right) = .95$$

Division by $2 \sum X_i$ isolates $\lambda$, yielding

$$P\left(\frac{9.591}{2 \sum X_i} < \lambda < \frac{34.170}{2 \sum X_i}\right) = .95$$

The lower limit of the 95% CI for $\lambda$ is $9.591/(2 \sum x_i)$, and the upper limit is $34.170/(2 \sum x_i)$.

45 Example 5 For the given data, $\sum x_i =$ …, giving the interval (.00871, .03101) for $\lambda$. The expected value of an exponential rv is $\mu = 1/\lambda$. Since

$$P\left(\frac{2 \sum X_i}{34.170} < \frac{1}{\lambda} < \frac{2 \sum X_i}{9.591}\right) = .95$$

the 95% CI for true average breakdown time is $(2 \sum x_i/34.170,\ 2 \sum x_i/9.591) = (32.24,\ 1/.00871 \approx 114.8)$. This interval is obviously quite wide, reflecting substantial variability in breakdown times and a small sample size.
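
The two intervals in Example 5 follow mechanically from the chi-squared pivot, which the sketch below illustrates. Because the full data set and its sum are not legible in this transcript, the code uses a hypothetical total of 500.0 min purely for illustration; the critical values 9.591 and 34.170 are the ones quoted for 20 df. The function names are assumptions, not part of the slides.

```python
def exponential_rate_ci(total, chi2_lower, chi2_upper):
    """CI for lambda from the pivot 2*lambda*sum(X_i) ~ chi-squared(2n).

    total       -- observed sum of the sample values
    chi2_lower  -- critical value capturing lower-tail area alpha/2
    chi2_upper  -- critical value capturing upper-tail area alpha/2
    """
    return chi2_lower / (2.0 * total), chi2_upper / (2.0 * total)

def exponential_mean_ci(total, chi2_lower, chi2_upper):
    """CI for the mean 1/lambda: reciprocals of the rate limits, order swapped."""
    return 2.0 * total / chi2_upper, 2.0 * total / chi2_lower

total = 500.0  # hypothetical sum of n = 10 breakdown times, NOT the slide's data
rate_ci = exponential_rate_ci(total, 9.591, 34.170)
mean_ci = exponential_mean_ci(total, 9.591, 34.170)
print(rate_ci)
print(mean_ci)
```

Note that the interval for the mean is not symmetric about the sample mean, unlike the z-interval, because the chi-squared distribution is skewed.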

46 Bootstrap Confidence Intervals

47 Bootstrap Confidence Intervals We have learned the bootstrap technique as a way of estimating $\sigma_{\hat{\theta}}$. It can also be applied to obtain a CI for $\theta$. Consider again estimating the mean $\mu$ of a normal distribution when $\sigma$ is known. Let's replace $\mu$ by $\theta$ and use $\hat{\theta} = \bar{X}$ as the point estimator. Notice that $1.96\,\sigma/\sqrt{n}$ is the 97.5th percentile of the distribution of $\hat{\theta} - \theta$ [that is, $P(\bar{X} - \mu < 1.96\,\sigma/\sqrt{n}) = P(Z < 1.96) = .9750$].

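48 Bootstrap Confidence Intervals Similarly, $-1.96\,\sigma/\sqrt{n}$ is the 2.5th percentile, so

$$.95 = P(\text{2.5th percentile} < \hat{\theta} - \theta < \text{97.5th percentile}) = P(\hat{\theta} - \text{2.5th percentile} > \theta > \hat{\theta} - \text{97.5th percentile})$$

That is, with

$$l = \hat{\theta} - \text{97.5th percentile of } \hat{\theta} - \theta, \qquad u = \hat{\theta} - \text{2.5th percentile of } \hat{\theta} - \theta \qquad (7.7)$$

the CI for $\theta$ is $(l, u)$.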

49 Bootstrap Confidence Intervals In many cases, the percentiles in (7.7) cannot be calculated, but they can be estimated from bootstrap samples. Suppose we obtain $B = 1000$ bootstrap samples and calculate $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_{1000}$ and $\bar{\theta}^*$, followed by the 1000 differences $\hat{\theta}^*_1 - \bar{\theta}^*, \ldots, \hat{\theta}^*_{1000} - \bar{\theta}^*$. The 25th largest and 25th smallest of these differences are estimates of the unknown percentiles in (7.7).
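
The bootstrap recipe just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the textbook's own code: the estimator is taken to be the sample mean, resampling is with replacement from the observed data, the sample values and the seed are hypothetical, and the function name `bootstrap_ci_95` is made up.

```python
import random
from statistics import mean

def bootstrap_ci_95(data, n_boot=1000, seed=0):
    """Approximate 95% bootstrap CI for the mean, following (7.7)."""
    rng = random.Random(seed)
    theta_hat = mean(data)
    # Estimate theta on each bootstrap sample (drawn with replacement).
    boot_estimates = [mean(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    center = mean(boot_estimates)
    # Differences between each bootstrap estimate and their average, sorted.
    diffs = sorted(est - center for est in boot_estimates)
    k = round(0.025 * n_boot)   # 25 when n_boot = 1000
    pct_2_5 = diffs[k - 1]      # 25th smallest difference
    pct_97_5 = diffs[-k]        # 25th largest difference
    # l = theta_hat - 97.5th percentile, u = theta_hat - 2.5th percentile.
    return theta_hat - pct_97_5, theta_hat - pct_2_5

# Hypothetical sample, used only to exercise the function.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]
l, u = bootstrap_ci_95(sample)
print(l, u)
```

Subtracting the upper percentile to get the lower limit (and vice versa) mirrors the sign reversal in the derivation of (7.7).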