Engineering Statistics Chapter 3 Distribution of Samples Distribution of sample statistics 3B - Variance.


Sample Variance It is easy to estimate the mean of a sample, because we expect the sample mean to be close to the population mean, even for small samples. The sample variance, however, changes quite a lot depending on the size of the sample. If a sample of size n is drawn from a normal population, then the ratio of the sample variance $s^2$ to the population variance $\sigma^2$, multiplied by n-1, follows the $\chi^2$-distribution with n-1 degrees of freedom, i.e. $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
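As an illustrative check that is not part of the original slides, the sketch below simulates many samples from a normal population and computes $(n-1)s^2/\sigma^2$ for each one. The function name `simulate_scaled_variances`, the seed and the number of repetitions are arbitrary assumptions; n = 7 and $\sigma$ = 5.8 are borrowed from Example 1, and 1.237 and 14.449 are the table values for 6 degrees of freedom. If the relation holds, the average should be close to n-1 = 6 and about 95% of the values should fall between the two table values.

```python
import random

def simulate_scaled_variances(n, sigma, reps, seed=0):
    """Return (n-1)*s^2/sigma^2 for `reps` simulated normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # sample variance
        values.append((n - 1) * s2 / sigma ** 2)
    return values

values = simulate_scaled_variances(n=7, sigma=5.8, reps=5000)
average = sum(values) / len(values)
# Share of simulated values between the 0.975 and 0.025 table values for 6 d.f.
inside = sum(1.237 <= v <= 14.449 for v in values) / len(values)
print(f"average = {average:.2f}, share inside table limits = {inside:.3f}")
```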

Introducing $\chi^2$ Statisticians found that the variance of a sample follows a certain distribution, called the $\chi^2$-distribution. This distribution is highly skewed to the right, and its shape depends very much on n, the size of the sample. We are seldom interested in the probability of a sample variance taking a certain value. Rather, we are usually more interested in, say, the lower and upper limits of the variance.

The Graph of $\chi^2$

$\chi^2$ tables The UTM $\chi^2$ tables list the values of $\chi^2$ at $\alpha$ = 0.001, 0.005, 0.010, 0.025, 0.05, 0.10, 0.25, 0.5, 0.70, 0.90, 0.95, 0.975, 0.99, 0.995 for degrees of freedom $\nu$ = 1, 2, …, 120. Unlike the t-distributions, the $\chi^2$ distributions are highly skewed, so the table gives values for both the left and right ends of the distribution.

$\chi^2$ table: $P(\chi^2 \ge k) = \alpha$. Case $\alpha < 0.5$: the value of k is large. Case $\alpha > 0.5$: the value of k is small.

Interpreting $\chi^2$ values
$\alpha$    0.001   0.005   0.010   0.025   …   0.975   0.990   0.995
$\nu = 1$  10.827   7.879   6.635   5.024   …   0.001   0.000   0.000
$\nu = 3$  16.266  12.838  11.345   9.348   …   0.216   0.115   0.072
$\nu = 6$  22.457  18.548  16.812  14.449   …   1.237   0.872   0.676
Example: The values of $\chi^2$ are read separately for $\alpha$ near 0 and $\alpha$ near 1.
$\nu = 1$: $P(\chi^2 > 7.879) = 0.005$; $P(\chi^2 > 0.000) = 0.995$.
$\nu = 3$: $P(\chi^2 > 12.838) = 0.005$; $P(\chi^2 > 0.072) = 0.995$.
$\nu = 6$: $P(\chi^2 > 16.812) = 0.010$; $P(\chi^2 > 0.872) = 0.990$.

Example 1 The standard deviation of the life of a certain car battery is 5.8 months. A repair shop has just received a sample of 7 batteries. Find the 95% confidence interval of the standard deviation of the sample. Solution: $(7-1)s^2/\sigma^2 \sim \chi^2_6$. At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the table, we read $\chi^2_{0.025,6} = 14.449$ and $\chi^2_{0.975,6} = 1.237$. So $1.237 \le 6s^2/5.8^2 \le 14.449 \Rightarrow 2.63 \le s \le 9.00$.
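The arithmetic in Example 1 can be sketched as a small helper. The function name `sample_sd_interval` and its argument names are illustrative assumptions; the two $\chi^2$ values are typed in from the table rather than computed.

```python
from math import sqrt

def sample_sd_interval(sigma, n, chi2_lower, chi2_upper):
    """Range of the sample SD s when the population SD sigma is known.

    chi2_lower is the table value at 1 - alpha/2 and chi2_upper the value
    at alpha/2, both for n - 1 degrees of freedom.
    """
    df = n - 1
    # chi2_lower <= df * s^2 / sigma^2 <= chi2_upper, solved for s
    return sigma * sqrt(chi2_lower / df), sigma * sqrt(chi2_upper / df)

low, high = sample_sd_interval(sigma=5.8, n=7, chi2_lower=1.237, chi2_upper=14.449)
print(f"{low:.2f} <= s <= {high:.2f}")
```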

Example 2 A pharmaceutical company claims that the standard deviation of its 200 mg Vitamin C tablets is 24 mg or less. If we check 21 such tablets, what is the 90% confidence interval of the standard deviation? Solution: $(21-1)s^2/\sigma^2 \sim \chi^2_{20}$. At 90% confidence, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, with 20 degrees of freedom, we read $\chi^2_{0.05,20} = 31.410$ and $\chi^2_{0.95,20} = 10.851$. So $10.851 \le 20s^2/24^2 \le 31.410 \Rightarrow 17.68 \text{ mg} \le s \le 30.08 \text{ mg}$.

Example 3 The annual report of HHH Restaurant shows that the mean sales of its franchises for the last quarter is RM 5.6 m, with a standard deviation of RM 1.25 m. Estimate the 95% confidence intervals for (i) the mean and (ii) the standard deviation for the 20 restaurants managed by Ali & Co.

Example 3 (i) Solution Since the mean and standard deviation of the population are given, we use the normal distribution $\bar{X} \sim N(5.6, 1.25^2/20)$ to model the sample mean. At 95% confidence, $\alpha/2 = 0.025$ and $Z_{0.025} = 1.96$. So the interval for the sample mean is $5.6 - 1.96 \times \sqrt{1.25^2/20} \le \bar{X} \le 5.6 + 1.96 \times \sqrt{1.25^2/20} \Rightarrow 5.05 \le \bar{X} \le 6.15$. The range is from RM 5.05 m to RM 6.15 m.
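A minimal sketch of the same calculation, using `NormalDist` from the Python standard library in place of the Z-table. The function name `mean_interval_known_sigma` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval_known_sigma(mu, sigma, n, confidence):
    """Interval for the sample mean when population mean and SD are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

low, high = mean_interval_known_sigma(mu=5.6, sigma=1.25, n=20, confidence=0.95)
print(f"RM {low:.2f} m to RM {high:.2f} m")
```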

Example 3 (ii) Solution We model the variance using $(n-1)s^2/\sigma^2 \sim \chi^2_{19}$. At 95% confidence, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the $\chi^2$-table, we have $\chi^2_{0.025,19} = 32.852$ and $\chi^2_{0.975,19} = 8.907$. So the inequality is $8.907 \le 19 \times s^2/1.25^2 \le 32.852 \Rightarrow 0.7325 \le s^2 \le 2.7016 \Rightarrow 0.856 \le s \le 1.644$. The range for the standard deviation is RM 0.856 m to RM 1.644 m.

Population variance from Sample variance As with the mean, we usually need to estimate the variance of the population from a sample. In this case, we use the same $\chi^2_{n-1}$ distribution for $(n-1)s^2/\sigma^2$. The range of $\sigma^2$ can be obtained directly from the inequality $\chi^2_{1-\alpha/2,n-1} \le (n-1)s^2/\sigma^2 \le \chi^2_{\alpha/2,n-1}$; or we can use the inverse inequality $1/\chi^2_{\alpha/2,n-1} \le \sigma^2/((n-1)s^2) \le 1/\chi^2_{1-\alpha/2,n-1}$. The result is the same.
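Written out, solving the inequality for $\sigma$ gives the following steps. This is a sketch of the algebra, assuming the table convention used earlier in which $\chi^2_{p,n-1}$ is the value exceeded with probability p, so that $\chi^2_{1-\alpha/2,n-1}$ is the smaller of the two table values.

```latex
\begin{aligned}
\chi^2_{1-\alpha/2,\,n-1} \;\le\; \frac{(n-1)s^2}{\sigma^2} \;\le\; \chi^2_{\alpha/2,\,n-1}
&\;\Longrightarrow\;
\frac{1}{\chi^2_{\alpha/2,\,n-1}} \;\le\; \frac{\sigma^2}{(n-1)s^2} \;\le\; \frac{1}{\chi^2_{1-\alpha/2,\,n-1}} \\
&\;\Longrightarrow\;
\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \\
&\;\Longrightarrow\;
s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}} \;\le\; \sigma \;\le\; s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}}
\end{aligned}
```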

Example 4 – Using $(n-1)s^2/\sigma^2$ From 10 food samples, it was found that the mean content of a certain poison is 13.5 μg with standard deviation 3.7 μg. At the 90% level, find the confidence intervals of the mean and SD of the poison content of the food. Solution: For the mean, we use the t-distribution with 9 (= 10-1) degrees of freedom since the sample size is small.

Mean: At the 90% level, $\alpha = 0.1$ and $\alpha/2 = 0.05$. Referring to Table 7, $t_{0.05,9} = 1.833$. Hence the mean should lie between $13.5 - 1.833 \times 3.7/\sqrt{10}$ and $13.5 + 1.833 \times 3.7/\sqrt{10}$. So we conclude that the mean is between 11.36 μg and 15.64 μg. Variance: Using the standard symbols, we have $(n-1)s^2/\sigma^2 \sim \chi^2_9$. At the 90% level, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, we find $\chi^2_{0.05,9} = 16.919$ and $\chi^2_{0.95,9} = 3.325$. So $3.325 \le 9 \times 3.7^2/\sigma^2 \le 16.919$. From this, we obtain $2.70 \text{ μg} \le \sigma \le 6.09 \text{ μg}$.

Example 4 – Using $\sigma^2/((n-1)s^2)$ Instead of using the distribution for $(n-1)s^2/\sigma^2$, we can use the form $\sigma^2/((n-1)s^2) \sim 1/\chi^2_9$. At the 90% level, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. The relation for the interval is $1/\chi^2_{0.05,9} \le \sigma^2/((n-1)s^2) \le 1/\chi^2_{0.95,9}$. From the table, we have $1/16.919 \le \sigma^2/(9 \times 3.7^2) \le 1/3.325$. From the inequality, we obtain the same range $2.70 \text{ μg} \le \sigma \le 6.09 \text{ μg}$.
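The interval for $\sigma$ in Example 4 can be checked with a short sketch. The function name `population_sd_interval` is an assumption for illustration, and the $\chi^2$ values are the table values for 9 degrees of freedom quoted in the example.

```python
from math import sqrt

def population_sd_interval(s, n, chi2_lower, chi2_upper):
    """Confidence interval for the population SD from a sample SD s.

    chi2_lower is the table value at 1 - alpha/2 and chi2_upper the value
    at alpha/2, both for n - 1 degrees of freedom.
    """
    sum_of_squares = (n - 1) * s ** 2
    # Dividing by the larger table value gives the lower limit, and vice versa.
    return sqrt(sum_of_squares / chi2_upper), sqrt(sum_of_squares / chi2_lower)

low, high = population_sd_interval(s=3.7, n=10, chi2_lower=3.325, chi2_upper=16.919)
print(f"{low:.2f} <= sigma <= {high:.2f}")
```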

Example 5 Consumers complain that the price of food varies a lot depending on where you live. In order to ascertain the variation in the price of a plate of fried rice, a survey is made at 75 stalls across the country. The standard deviation turns out to be 68 sen. Based on this survey, estimate the standard deviation of the price of a plate of fried rice for the whole country at the 95% level.

Example 5 (Solution) We model the standard deviation using the $\chi^2_{74}$ distribution: $(n-1)s^2/\sigma^2 \sim \chi^2_{74}$. However, the table does not provide values for $\chi^2_{74}$. The alternative is to use the nearest entry, which is $\chi^2_{75}$. At 95%, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the table, we read $\chi^2_{0.025,75}$ = … and $\chi^2_{0.975,75}$ = … Note: When $\nu$ is large, there is little difference between the $\chi^2$ values for adjacent degrees of freedom. Using $\chi^2_{75}$ instead of $\chi^2_{74}$ will cause only a small discrepancy.
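To illustrate the note that neighbouring degrees of freedom give similar values when $\nu$ is large, the sketch below uses the Wilson-Hilferty approximation to the $\chi^2$ quantile. This is a technique swapped in here for illustration, not the method of the slides, and its results are approximations rather than table entries. The function name `chi2_upper_tail_approx` is an assumption.

```python
from statistics import NormalDist

def chi2_upper_tail_approx(p, df):
    """Approximate k such that P(chi2 with df d.f. >= k) = p (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - p)  # standard normal value with upper-tail area p
    a = 2.0 / (9.0 * df)
    return df * (1 - a + z * a ** 0.5) ** 3

for p in (0.025, 0.975):
    k74 = chi2_upper_tail_approx(p, 74)
    k75 = chi2_upper_tail_approx(p, 75)
    # Relative change from moving between 74 and 75 degrees of freedom
    print(f"p = {p}: df 74 -> {k74:.2f}, df 75 -> {k75:.2f}, "
          f"relative difference {abs(k75 - k74) / k74:.1%}")
```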

Example 6 In a health screening, 14 students have their weights and heights taken. Their BMIs are calculated as follows: 32 25 26 22 18 28 35 25 18 22 29 33 20 26. Based on this set of data, find the 90% confidence intervals for the mean and standard deviation of the BMI of all students. Note: In this case, the raw sample data are given. We are to infer the population parameters from the sample data.

Example 6 (Solution) We first find the mean and SD using the calculator as follows: First put your calculator in SD mode. Enter each number using the M+ (called DATA) key. After that, tap SHIFT, 2. You see three displays: $\bar{x}$, $x\sigma_n$ and $x\sigma_{n-1}$. The two SDs are called the population and sample SD respectively. Since the data come from a sample, you need to choose the sample SD. Thus: mean $\bar{x} = 25.64$, standard deviation $s = 5.37$.
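The calculator steps can be mirrored with Python's `statistics` module, where `pstdev` corresponds to the population SD and `stdev` to the sample SD. The list below assumes the data string in Example 6 is read as fourteen two-digit values; that reading reproduces the mean and SD quoted above.

```python
from statistics import mean, pstdev, stdev

# BMI values from Example 6, read two digits at a time (assumed split)
bmi = [32, 25, 26, 22, 18, 28, 35, 25, 18, 22, 29, 33, 20, 26]

x_bar = mean(bmi)
sd_population = pstdev(bmi)  # divides by n
sd_sample = stdev(bmi)       # divides by n - 1, the one to use here

print(f"n = {len(bmi)}, mean = {x_bar:.2f}, "
      f"population SD = {sd_population:.2f}, sample SD = {sd_sample:.2f}")
```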

Example 6 (Solution) I. Confidence interval for the mean: The sample size of 14 is small, so we model the population mean $\mu$ using the t-distribution with 13 degrees of freedom. At 90% confidence, $\alpha = 0.1$, $\alpha/2 = 0.05$ and $t_{0.05,13} = 1.771$. So the confidence interval for the mean is $25.64 - 1.771 \times \sqrt{5.37^2/14}$ to $25.64 + 1.771 \times \sqrt{5.37^2/14}$, i.e. 23.10 to 28.18.
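A minimal sketch of this t-interval, with the t value typed in from the table. The function name `t_interval` is an assumption for illustration.

```python
from math import sqrt

def t_interval(sample_mean, s, n, t_value):
    """Confidence interval for the population mean from a small sample.

    t_value is the table value t_{alpha/2, n-1}.
    """
    half_width = t_value * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = t_interval(sample_mean=25.64, s=5.37, n=14, t_value=1.771)
print(f"{low:.2f} to {high:.2f}")
```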

Example 6 (contd) II. Confidence interval for the standard deviation (using the variance): For the variance, the model is $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$, which in this case is $13 \times 5.37^2/\sigma^2 \sim \chi^2_{13}$. At 90% confidence, $\alpha = 0.1$, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, $\chi^2_{0.05,13} = 22.362$ and $\chi^2_{0.95,13} = 5.892$. This means that $5.892 \le 13 \times 5.37^2/\sigma^2 \le 22.362 \Rightarrow 16.764 \le \sigma^2 \le 63.625 \Rightarrow 4.094 \le \sigma \le 7.977$.

Ratio of two variances When variances $s_1^2$ and $s_2^2$ are obtained from two samples, either from the same population or from two comparable populations, the ratio of the variances $s_1^2/s_2^2$ follows the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom, where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ for samples of sizes $n_1$ and $n_2$: $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$.

F-distribution The F-distribution has two parameters, $\nu_1$ called the numerator and $\nu_2$ the denominator degrees of freedom. Because the F-distributions are strongly skewed, to an extent that depends on the degrees of freedom, tables are given only for the right tail at $\alpha$ = 0.001, 0.01, 0.025 and 0.05. We need to determine the F values for 0.999, 0.99, 0.975 and 0.95 ourselves, using the fact that if $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2}$, then $s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$.

Reading the F-distribution table To find the value of $F_{0.05,5,7}$, say, you first look for the 0.05 table. Next you read the top row, which shows the numerator values. Locate 5. On the left, the first column shows the denominators. Locate 7. On this row, under the numerator 5, we see 3.97. So $F_{0.05,5,7} = 3.97$.

Obtaining F-value of $1-\alpha$ We note that $s_1^2/s_2^2 \sim F_{\nu_1,\nu_2} \Rightarrow s_2^2/s_1^2 \sim F_{\nu_2,\nu_1}$. From this relation, we obtain $F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}$. For example, $F_{0.01,7,5} = 10.46$, so $F_{0.99,5,7} = 1/10.46 = 0.0956$. Conversely, if you need $F_{0.95,4,8}$, then read $F_{0.05,8,4} = 6.04$, so $F_{0.95,4,8} = 1/6.04 = 0.166$.
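The reciprocal rule can be sketched as a lookup that swaps the degrees of freedom whenever a left-tail value is requested. The dictionary `F_TABLE` and the function `f_value` are illustrative assumptions; the four entries are right-tail values quoted in Examples 7 and 8, not a full table.

```python
# Right-tail F values keyed by (alpha, numerator d.f., denominator d.f.)
F_TABLE = {
    (0.025, 15, 25): 2.41,
    (0.025, 25, 15): 2.69,
    (0.05, 7, 9): 3.29,
    (0.05, 9, 7): 3.68,
}

def f_value(alpha, num, den):
    """Return F_{alpha, num, den}, using the reciprocal rule for alpha > 0.5."""
    if alpha <= 0.5:
        return F_TABLE[(alpha, num, den)]
    # F_{1-a, num, den} = 1 / F_{a, den, num}; round to avoid float noise in the key
    return 1.0 / F_TABLE[(round(1 - alpha, 3), den, num)]

print(f"F(0.95; 7, 9)    = {f_value(0.95, 7, 9):.3f}")
print(f"F(0.975; 15, 25) = {f_value(0.975, 15, 25):.3f}")
```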

Example 7 (i) The standard deviation of sugar levels among men is 1.23 units. Find the 95% confidence interval of the standard deviation of the sugar levels for a sample of 24 men. (ii) The standard deviation of sugar levels among a sample of 16 men is 1.23 units. Find the 95% confidence interval of the standard deviation of the sugar levels for another sample of 26 men.

7 (i) Solution This is a revision example on the $\chi^2$-distribution. We note that the population variance $\sigma^2$ is $1.23^2$. By theory, $(24-1)s^2/\sigma^2 \sim \chi^2_{23}$. At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. With 23 degrees of freedom, $\chi^2_{0.025,23} = 38.076$ and $\chi^2_{0.975,23} = 11.689$. So $11.689 \le 23 \times s^2/1.23^2 \le 38.076 \Rightarrow 0.769 \le s^2 \le 2.505 \Rightarrow 0.877 \le s \le 1.583$.

7 (ii) Solution Here $1.23^2$ is the first sample variance $s_1^2$. By theory, $s_1^2/s_2^2 \sim F_{15,25}$. At 95% confidence, $\alpha = 0.05$, $\alpha/2 = 0.025$ and $1-\alpha/2 = 0.975$. From the table, $F_{0.025,15,25} = 2.41$. To calculate $F_{0.975,15,25}$, we first read $F_{0.025,25,15} = 2.69$. Hence $F_{0.975,15,25} = 1/2.69 = 0.372$. So $0.372 \le 1.23^2/s_2^2 \le 2.41 \Rightarrow 0.792 \le s_2 \le 2.017$.
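The last step of Example 7 (ii) can be sketched as follows. The function name `second_sample_sd_interval` is an assumption for illustration, and the F values are the table values quoted in the solution.

```python
from math import sqrt

def second_sample_sd_interval(s1, f_lower, f_upper):
    """Range of s2 given s1, from f_lower <= s1^2 / s2^2 <= f_upper."""
    # A larger ratio s1^2/s2^2 means a smaller s2, so the limits swap.
    return s1 / sqrt(f_upper), s1 / sqrt(f_lower)

f_upper = 2.41        # F_{0.025,15,25} from the table
f_lower = 1 / 2.69    # F_{0.975,15,25} = 1 / F_{0.025,25,15}
low, high = second_sample_sd_interval(1.23, f_lower, f_upper)
print(f"{low:.3f} <= s2 <= {high:.3f}")
```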

Example 8 (i) The standard deviation of the monthly pay of a group of 10 workers in a factory is RM 115.65. What is the 90% confidence interval of the standard deviation of the monthly pay of 8 workers in a similar factory? (ii) The Tourism Council finds the standard deviation of the spending among 15 tourists at a resort to be RM 223.45. Find the 98% confidence interval for the standard deviation of spending among 20 tourists in a similar resort.

8 (i) Solution We take $115.65^2$ as the second sample variance $s_2^2$, and we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{7,9}$. At 90% confidence, $\alpha = 0.10$, $\alpha/2 = 0.05$ and $1-\alpha/2 = 0.95$. From the table, $F_{0.05,7,9} = 3.29$. For $F_{0.95,7,9}$ we read $F_{0.05,9,7} = 3.68 \Rightarrow F_{0.95,7,9} = 1/3.68 = 0.272$. So $0.272 \le s_1^2/115.65^2 \le 3.29 \Rightarrow 60.32 \le s_1 \le 209.77$.

8 (ii) Solution Again we take $223.45^2$ as the second sample variance $s_2^2$, and we seek $s_1$. By theory, $s_1^2/s_2^2 \sim F_{19,14}$. At 98% confidence, $\alpha = 0.02$, $\alpha/2 = 0.01$ and $1-\alpha/2 = 0.99$. Unfortunately, the F-table gives values for neither $F_{19,14}$ nor $F_{14,19}$. In this case, we take the nearest entries, i.e. $F_{0.01,20,15} = 3.37$, and for $F_{0.99,19,14}$ we read $F_{0.01,15,20} = 3.09 \Rightarrow F_{0.99,19,14} = 1/3.09 = 0.3236$. Hence $0.3236 \le s_1^2/223.45^2 \le 3.37 \Rightarrow 127.11 \le s_1 \le 410.20$. At 98% confidence, the range of the standard deviation is RM 127.11 to RM 410.20.

Wide range for Variance We note that, unlike those for the mean, the confidence intervals for the variance are rather wide. Increasing the sample size does not reduce the range of the variance very quickly. This is in the nature of things: deviations above and below the centre cancel each other, bringing the sample mean closer to the expected value, but the spread of the values themselves remains, which keeps the variance large. In fact, when we find an unexpectedly narrow range for the variance, we should suspect that some unusual factor has caused the values to converge. This means the data are not natural and are suspect.
