Modeling with the normal distribution

The normal distribution is often used as a model for practical situations. In the example that follows, you need to translate the given information into the language of the normal distribution in order to solve the problem.
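The table below gives the grouped data for the coffee-tree leaf example on page 134 of your textbook.

| Length (mm) | Frequency $f$ | Relative frequency | Class boundaries | Class width | Relative frequency density |
|---|---|---|---|---|---|
| 30-39 | 3 | 0.060 | 29.5-39.5 | 10 | 0.006 |
| 40-49 | 9 | 0.180 | 39.5-49.5 | 10 | 0.018 |
| 50-59 | 15 | 0.300 | 49.5-59.5 | 10 | 0.030 |
| 60-69 | 9 | 0.180 | 59.5-69.5 | 10 | 0.018 |
| 70-79 | 6 | 0.120 | 69.5-79.5 | 10 | 0.012 |
| 80-89 | 4 | 0.080 | 79.5-89.5 | 10 | 0.008 |
| 90-99 | 3 | 0.060 | 89.5-99.5 | 10 | 0.006 |
| 100-109 | 1 | 0.020 | 99.5-109.5 | 10 | 0.002 |
| Total | 50 | 1.000 | | | |

Calculate the mean and standard deviation for these data, using the original (ungrouped) data on page 133. The results are $\mu = 61.4$, $\sigma^2 = 282.16$ and $\sigma \approx 16.8$.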
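As a quick arithmetic check on the grouped table, the minimal Python sketch below recomputes the relative frequency ($f/n$) and the relative frequency density (relative frequency divided by class width) for each class. The list `classes` is an illustrative transcription of the table, not something taken from the textbook.

```python
# Grouped coffee-tree leaf lengths (mm), transcribed from the table above.
# Each entry: (lower class boundary, upper class boundary, frequency f).
classes = [
    (29.5, 39.5, 3),
    (39.5, 49.5, 9),
    (49.5, 59.5, 15),
    (59.5, 69.5, 9),
    (69.5, 79.5, 6),
    (79.5, 89.5, 4),
    (89.5, 99.5, 3),
    (99.5, 109.5, 1),
]

n = sum(f for _, _, f in classes)  # total number of leaves

rows = []
for lower, upper, f in classes:
    width = upper - lower          # class width
    rel_freq = f / n               # relative frequency = f / n
    density = rel_freq / width     # relative frequency density = relative frequency / width
    rows.append((lower, upper, f, rel_freq, width, density))
    print(f"{lower:6.1f}-{upper:<6.1f} f={f:2d}  rel={rel_freq:.3f}  "
          f"width={width:.0f}  density={density:.3f}")

print("n =", n, " total relative frequency =", round(sum(r[3] for r in rows), 3))
```

The relative frequencies must add up to 1, which is a useful check that no class has been mistyped.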
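Working from the ungrouped data on page 133 ($n = 50$, $p_i = f/n$):

| $f$ | $x_i$ | $p_i = f/n$ | $x_i \times p_i$ | $x_i^2$ | $x_i^2 \times p_i$ |
|---|---|---|---|---|---|
| 1 | 31 | 0.02 | 0.62 | 961 | 19.22 |
| 1 | 34 | 0.02 | 0.68 | 1156 | 23.12 |
| 1 | 39 | 0.02 | 0.78 | 1521 | 30.42 |
| 1 | 40 | 0.02 | 0.80 | 1600 | 32.00 |
| 2 | 42 | 0.04 | 1.68 | 1764 | 70.56 |
| 1 | 43 | 0.02 | 0.86 | 1849 | 36.98 |
| 1 | 44 | 0.02 | 0.88 | 1936 | 38.72 |
| 1 | 46 | 0.02 | 0.92 | 2116 | 42.32 |
| 2 | 47 | 0.04 | 1.88 | 2209 | 88.36 |
| 1 | 48 | 0.02 | 0.96 | 2304 | 46.08 |
| 2 | 50 | 0.04 | 2.00 | 2500 | 100.00 |
| 2 | 52 | 0.04 | 2.08 | 2704 | 108.16 |
| 2 | 53 | 0.04 | 2.12 | 2809 | 112.36 |
| 2 | 54 | 0.04 | 2.16 | 2916 | 116.64 |
| 1 | 56 | 0.02 | 1.12 | 3136 | 62.72 |
| 5 | 57 | 0.10 | 5.70 | 3249 | 324.90 |
| 1 | 58 | 0.02 | 1.16 | 3364 | 67.28 |
| 2 | 60 | 0.04 | 2.40 | 3600 | 144.00 |
| 2 | 63 | 0.04 | 2.52 | 3969 | 158.76 |
| 1 | 66 | 0.02 | 1.32 | 4356 | 87.12 |
| 2 | 67 | 0.04 | 2.68 | 4489 | 179.56 |
| 2 | 68 | 0.04 | 2.72 | 4624 | 184.96 |
| 2 | 70 | 0.04 | 2.80 | 4900 | 196.00 |
| 2 | 72 | 0.04 | 2.88 | 5184 | 207.36 |
| 1 | 78 | 0.02 | 1.56 | 6084 | 121.68 |
| 1 | 79 | 0.02 | 1.58 | 6241 | 124.82 |
| 1 | 83 | 0.02 | 1.66 | 6889 | 137.78 |
| 2 | 85 | 0.04 | 3.40 | 7225 | 289.00 |
| 1 | 89 | 0.02 | 1.78 | 7921 | 158.42 |
| 1 | 90 | 0.02 | 1.80 | 8100 | 162.00 |
| 1 | 94 | 0.02 | 1.88 | 8836 | 176.72 |
| 1 | 99 | 0.02 | 1.98 | 9801 | 196.02 |
| 1 | 102 | 0.02 | 2.04 | 10404 | 208.08 |
| Total: 50 | | 1.00 | 61.4 (mean) | | 4052.12 |

Variance: $\sigma^2 = \sum x_i^2 p_i - \mu^2 = 4052.12 - 61.4^2 = 282.16$. Standard deviation: $\sigma = \sqrt{282.16} \approx 16.798$.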
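The working in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the dictionary `freq` is an illustrative transcription of the $x_i$ and $f$ columns above, and, like the table, it divides by $n$.

```python
import math

# Ungrouped leaf lengths x_i (mm) mapped to their frequencies f, read off the working table.
freq = {
    31: 1, 34: 1, 39: 1,
    40: 1, 42: 2, 43: 1, 44: 1, 46: 1, 47: 2, 48: 1,
    50: 2, 52: 2, 53: 2, 54: 2, 56: 1, 57: 5, 58: 1,
    60: 2, 63: 2, 66: 1, 67: 2, 68: 2,
    70: 2, 72: 2, 78: 1, 79: 1,
    83: 1, 85: 2, 89: 1,
    90: 1, 94: 1, 99: 1,
    102: 1,
}

n = sum(freq.values())                               # sample size
p = {x: f / n for x, f in freq.items()}              # p_i = f / n
mean = sum(x * p_i for x, p_i in p.items())          # mu = sum of x_i * p_i
mean_sq = sum(x * x * p_i for x, p_i in p.items())   # sum of x_i^2 * p_i
variance = mean_sq - mean ** 2                       # sigma^2 = sum x_i^2 p_i - mu^2
sd = math.sqrt(variance)                             # sigma

print(f"n = {n}, mean = {mean:.2f}, sum x^2 p = {mean_sq:.2f}, "
      f"var = {variance:.2f}, sd = {sd:.3f}")
```

Dividing by $n$ gives the population variance used in the table; `statistics.pvariance` on the expanded list of 50 values would agree, whereas `statistics.variance` divides by $n - 1$ and would give a slightly larger value.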
Suppose that a random variable $L$ with an $N(61.4, 16.8^2)$ distribution is a model for the grouped frequency table (page 134). How can we verify that the normal distribution is a good model for the coffee-tree leaf lengths?

One way is to calculate the theoretical probability of each class under the normal model (multiplying it by 50 gives the expected frequency), add these probabilities up, and compare the total with the total relative frequency (empirical probability). If the difference between the two totals is small, then the $N(61.4, 16.8^2)$ distribution is a good model. Writing the difference as

$$\alpha = \left| \sum p_i - \sum f_i \right|,$$

where $p_i$ is the theoretical probability and $f_i$ is the relative frequency (empirical probability) of class $i$, a difference of $\alpha < 0.05$ is usually taken as acceptable. Thus, if $\alpha < 0.05$, we can assume that the normal distribution is a good model.
The expected frequency in the interval $59.5 \le l \le 69.5$ can be calculated as follows. Given that $L \sim N(61.4, 16.8^2)$, let $Z = \dfrac{L - 61.4}{16.8}$, so that $Z \sim N(0, 1)$. Then

$$P(59.5 \le L \le 69.5) = P\left(\frac{59.5 - 61.4}{16.8} \le Z \le \frac{69.5 - 61.4}{16.8}\right) = P(-0.113 \le Z \le 0.482)$$

$$= \Phi(0.482) - \Phi(-0.113) = \Phi(0.482) - \left(1 - \Phi(0.113)\right) = 0.6851 - (1 - 0.5450) = 0.2301.$$

This means that the expected frequency for the class $59.5 \le l \le 69.5$ is $50 \times 0.2301 = 11.5$ (correct to 1 decimal place). Therefore, in a group of 50 leaves you would expect about 11 or 12 leaves to have lengths in this class. The observed frequency was actually 9. Does this mean that the $N(61.4, 16.8^2)$ distribution is a poor model for these data? To answer this question sensibly, you need to calculate the expected frequencies for all eight classes.
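The standardisation above can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch: `cdf` evaluates $\Phi$ directly rather than reading a $z$-table, so results may differ from the table lookups in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma = 61.4, 16.8          # model: L ~ N(61.4, 16.8^2)
Z = NormalDist()                # standard normal: Z ~ N(0, 1)

lower, upper = 59.5, 69.5       # class boundaries
z_lo = (lower - mu) / sigma     # standardise: z = (l - mu) / sigma
z_hi = (upper - mu) / sigma

prob = Z.cdf(z_hi) - Z.cdf(z_lo)   # P(59.5 <= L <= 69.5) = Phi(z_hi) - Phi(z_lo)
expected = 50 * prob               # expected frequency in a sample of 50 leaves

print(f"z from {z_lo:.4f} to {z_hi:.4f}: P = {prob:.4f}, expected = {expected:.1f}")
```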
Using the method shown on the previous slide, find the expected frequencies for the remaining seven classes, and then review the results to consider whether $N(61.4, 16.8^2)$ is a suitable distribution model for these data. In each line below, the probability is given in bold first, followed by the expected frequency ($50 \times$ probability).

$$P(29.5 \le L \le 39.5) = P(-1.8988 \le Z \le -1.3036) = \Phi(-1.3036) - \Phi(-1.8988) = [1 - \Phi(1.3036)] - [1 - \Phi(1.8988)] = 0.0962 - 0.0288 = \mathbf{0.0674}, \quad 50 \times 0.0674 = \mathbf{3.37}$$

$$P(39.5 \le L \le 49.5) = P(-1.3036 \le Z \le -0.7083) = \Phi(-0.7083) - \Phi(-1.3036) = [1 - \Phi(0.7083)] - [1 - \Phi(1.3036)] = 0.2394 - 0.0962 = \mathbf{0.1432}, \quad 50 \times 0.1432 = \mathbf{7.16}$$

$$P(49.5 \le L \le 59.5) = P(-0.7083 \le Z \le -0.1131) = \Phi(-0.1131) - \Phi(-0.7083) = [1 - \Phi(0.1131)] - [1 - \Phi(0.7083)] = 0.4550 - 0.2394 = \mathbf{0.2156}, \quad 50 \times 0.2156 = \mathbf{10.78}$$

$$P(59.5 \le L \le 69.5) = P(-0.1131 \le Z \le 0.4821) = \Phi(0.4821) - \Phi(-0.1131) = \Phi(0.4821) - [1 - \Phi(0.1131)] = 0.6851 - 0.4550 = \mathbf{0.2301}, \quad 50 \times 0.2301 = \mathbf{11.5}$$

$$P(69.5 \le L \le 79.5) = P(0.4821 \le Z \le 1.0774) = \Phi(1.0774) - \Phi(0.4821) = 0.8594 - 0.6851 = \mathbf{0.1743}, \quad 50 \times 0.1743 = \mathbf{8.715}$$

$$P(79.5 \le L \le 89.5) = P(1.0774 \le Z \le 1.6726) = \Phi(1.6726) - \Phi(1.0774) = 0.9528 - 0.8594 = \mathbf{0.0934}, \quad 50 \times 0.0934 = \mathbf{4.67}$$

$$P(89.5 \le L \le 99.5) = P(1.6726 \le Z \le 2.2679) = \Phi(2.2679) - \Phi(1.6726) = 0.9883 - 0.9528 = \mathbf{0.0355}, \quad 50 \times 0.0355 = \mathbf{1.78}$$

$$P(99.5 \le L \le 109.5) = P(2.2679 \le Z \le 2.8631) = \Phi(2.8631) - \Phi(2.2679) = 0.9979 - 0.9883 = \mathbf{0.0096}, \quad 50 \times 0.0096 = \mathbf{0.48}$$
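The eight class calculations above follow one pattern, so they can be sketched as a loop. This assumes Python's `statistics.NormalDist`, whose `cdf` method standardises internally, so `model.cdf(x)` equals $\Phi\left(\frac{x - 61.4}{16.8}\right)$. The list `boundaries` is an illustrative name for the class boundaries in the table.

```python
from statistics import NormalDist

model = NormalDist(mu=61.4, sigma=16.8)   # L ~ N(61.4, 16.8^2)
n = 50                                    # number of leaves in the sample
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5]

probs = []
for lower, upper in zip(boundaries, boundaries[1:]):
    # P(lower <= L <= upper) = Phi((upper - mu)/sigma) - Phi((lower - mu)/sigma)
    p = model.cdf(upper) - model.cdf(lower)
    probs.append(p)
    print(f"{lower:5.1f}-{upper:<5.1f}  P = {p:.4f}  expected = {n * p:.2f}")

print(f"total theoretical probability = {sum(probs):.4f}")
```

Because the exact $\Phi$ is used, a few expected frequencies may differ from the hand-worked values in the last decimal place.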
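| Class | Observed frequency | Relative frequency (empirical probability) | Theoretical frequency | Theoretical probability |
|---|---|---|---|---|
| 29.5-39.5 | 3 | 0.06 | 3 or 4 | 0.0674 |
| 39.5-49.5 | 9 | 0.18 | 7 or 8 | 0.1432 |
| 49.5-59.5 | 15 | 0.30 | 10 or 11 | 0.2156 |
| 59.5-69.5 | 9 | 0.18 | 11 or 12 | 0.2301 |
| 69.5-79.5 | 6 | 0.12 | 8 or 9 | 0.1743 |
| 79.5-89.5 | 4 | 0.08 | 4 or 5 | 0.0934 |
| 89.5-99.5 | 3 | 0.06 | 1 or 2 | 0.0355 |
| 99.5-109.5 | 1 | 0.02 | 0 or 1 | 0.0096 |
| Total | 50 | 1.00 | | 0.97 |

As you can see, $\alpha = |0.97 - 1| = 0.03 < 0.05$, and therefore, by this criterion, the normal distribution is a good model. In S2, you will learn about hypothesis testing and more reliable methods of deciding whether or not a model is good, based on a statistic such as the mean. The following slide is a foretaste.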
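The $\alpha$ check from the comparison table amounts to a single subtraction. The sketch below simply mirrors the informal $\alpha < 0.05$ rule used in these notes; it is not a formal goodness-of-fit test. The lists are transcribed from the table, and the names are illustrative.

```python
# Theoretical probabilities p_i and observed frequencies, transcribed from the comparison table.
theoretical = [0.0674, 0.1432, 0.2156, 0.2301, 0.1743, 0.0934, 0.0355, 0.0096]
observed = [3, 9, 15, 9, 6, 4, 3, 1]

n = sum(observed)
relative = [f / n for f in observed]   # empirical probabilities f_i = frequency / n

# alpha = |sum of theoretical probabilities - sum of relative frequencies|
alpha = abs(sum(theoretical) - sum(relative))

print(f"sum p_i = {sum(theoretical):.4f}, sum f_i = {sum(relative):.2f}, alpha = {alpha:.4f}")
print("good model by the alpha < 0.05 rule" if alpha < 0.05
      else "poor model by the alpha < 0.05 rule")
```

Without rounding the total to 0.97 first, $\alpha = 0.0309$, which leads to the same conclusion.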
In our coffee leaf example, the population mean can be estimated by summing the products of the theoretical probability and the class midpoint for each class:

$$\begin{aligned}
0.0674 \times 34.5 &= 2.3253\\
0.1432 \times 44.5 &= 6.3724\\
0.2156 \times 54.5 &= 11.7502\\
0.2301 \times 64.5 &= 14.8415\\
0.1743 \times 74.5 &= 12.9854\\
0.0934 \times 84.5 &= 7.8923\\
0.0355 \times 94.5 &= 3.3548\\
0.0096 \times 104.5 &= 1.0032\\
\text{Total} &\approx 60.525
\end{aligned}$$

A simple $z$ hypothesis test involves seeing how far the sample mean $\bar{x}$ is from the population mean $\mu$:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}},$$

where $\sigma$ is the standard deviation (here estimated by the standard deviation of the sample) and $n$ is the size of the sample. The quantity $\sigma / \sqrt{n}$ is the standard error of the mean. $z$ is called a test statistic: it measures the difference between the two means in units of the standard error.

In our coffee leaf example, $\bar{x} = 61.4$, $\mu \approx 60.525$, $\sigma = 16.8$ and $n = 50$, so

$$z = \frac{61.4 - 60.525}{16.8 / \sqrt{50}} = \frac{0.875}{2.376} \approx 0.37.$$

So $z$ is close to 0, the mean of the standard normal distribution. The $p$-value is the probability, if the two means were equal, of getting a value of $z$ at least this far from 0. For a two-sided test,

$$p = 2\left(1 - \Phi(|z|)\right) = 2\left(1 - \Phi(0.37)\right) = 2(1 - 0.6443) \approx 0.71.$$

In hypothesis testing, a result is significant only if the $p$-value is less than the significance level. Suppose the significance level is 5%. Then, in the example, $p \approx 0.71 > 0.05$, so the difference is not significant. Therefore we can assume that the sample mean is not much different from the population mean, and consequently the data can reasonably be modelled by a normal distribution.
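The $z$ test above can be sketched in Python as follows. It follows the notes' approach of estimating $\mu$ from the rounded theoretical probabilities and class midpoints, uses the sample standard deviation 16.8 for $\sigma$, and computes a two-sided $p$-value as `2 * (1 - cdf(|z|))`. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Theoretical class probabilities (rounded, from the comparison table) and class midpoints.
probs = [0.0674, 0.1432, 0.2156, 0.2301, 0.1743, 0.0934, 0.0355, 0.0096]
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5, 104.5]

mu = sum(p * m for p, m in zip(probs, midpoints))   # estimated population mean
x_bar, sigma, n = 61.4, 16.8, 50                    # sample mean, sd and size

se = sigma / math.sqrt(n)                           # standard error of the mean
z = (x_bar - mu) / se                               # z test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value

print(f"mu = {mu:.3f}, se = {se:.3f}, z = {z:.3f}, p = {p_value:.3f}")
print("not significant at 5%" if p_value >= 0.05 else "significant at 5%")
```

A one-sided test would use `1 - NormalDist().cdf(z)` instead; either way, the $p$-value here is far above 0.05.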
Example 1. Two friends, Sarah and Hannah, often go to the Post Office together. They travel on Sarah’s scooter. Sarah always drives Hannah to the Post Office and drops her off there. Sarah then drives around until she is ready to pick Hannah up some time later. Their experience has been that the time Hannah takes in the Post Office can be approximated by a normal distribution with mean 6 minutes and standard deviation 1.3 minutes. How many minutes after having dropped Hannah off should Sarah return if she wants to be at least 95% certain that Hannah will not keep her waiting?

Let $T$ be the time Hannah takes in the Post Office on a randomly chosen trip. Then $T \sim N(6, 1.3^2)$. Let $t$ be the number of minutes after dropping Hannah off when Sarah returns; we then need to find $t$ such that $P(T \le t) \ge 0.95$. After standardising, the expression $P(T \le t) \ge 0.95$ becomes

$$P\left(Z \le \frac{t - 6}{1.3}\right) \ge 0.95, \quad \text{that is,} \quad \Phi\left(\frac{t - 6}{1.3}\right) \ge 0.95.$$

So

$$\frac{t - 6}{1.3} \ge \Phi^{-1}(0.95) = 1.645 \quad \Rightarrow \quad t \ge 6 + 1.3 \times 1.645 = 8.1385.$$

Thus, Sarah should return at least 8.14 minutes after dropping Hannah off to be at least 95% sure that Hannah will not keep her waiting.
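In Example 1, the smallest suitable $t$ is the 95th percentile of $T$, that is, $t = 6 + 1.3\,\Phi^{-1}(0.95)$. A minimal sketch using the standard library's `NormalDist.inv_cdf` (the inverse of $\Phi$ for the given distribution) is:

```python
from statistics import NormalDist

T = NormalDist(mu=6, sigma=1.3)   # time in the Post Office: T ~ N(6, 1.3^2)

# Smallest t with P(T <= t) >= 0.95 is the 95th percentile of T,
# equivalent to t = 6 + 1.3 * Phi^{-1}(0.95).
t = T.inv_cdf(0.95)
z95 = NormalDist().inv_cdf(0.95)  # Phi^{-1}(0.95) for the standard normal

print(f"Phi^-1(0.95) = {z95:.4f}, t = {t:.4f} minutes")
```

The exact percentile is about 8.1383 minutes; the worked answer of 8.1385 comes from the rounded table value 1.645. Both round to 8.14 minutes.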
Example 2. A biologist has been collecting data on the heights of a particular species of cactus. He has observed that 34.2% of the cacti are below 12 cm in height and 18.4% of the cacti are above 16 cm in height. He assumes that the heights are normally distributed. Find the mean and standard deviation of the distribution.

Let the mean and standard deviation of the distribution be $\mu$ and $\sigma$ respectively. Then if $H$ is the height of a randomly chosen cactus of this species, $H \sim N(\mu, \sigma^2)$. The biologist’s observations can now be written as $P(H < 12) = 0.342$ and $P(H > 16) = 0.184$. After standardising using $Z = \dfrac{H - \mu}{\sigma}$, these equations become

$$P\left(Z < \frac{12 - \mu}{\sigma}\right) = 0.342 \quad \text{and} \quad P\left(Z > \frac{16 - \mu}{\sigma}\right) = 0.184.$$

Letting $\dfrac{12 - \mu}{\sigma} = s$ and $\dfrac{16 - \mu}{\sigma} = t$, we have

$$P(Z < s) = 0.342 \quad \text{and} \quad P(Z > t) = 0.184.$$

So $\Phi(s) = 0.342$ and $1 - \Phi(t) = 0.184$.
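Since $\Phi(s) = 0.342 < 0.5$, $s$ is negative. Writing $s = -v$ gives $\Phi(v) = 1 - 0.342 = 0.658$, and the $z$-table then gives $v = 0.407$, so $s = -0.407$. Since $1 - \Phi(t) = 0.184$, $\Phi(t) = 0.816$, giving $t = 0.9$. Therefore

$$s = \frac{12 - \mu}{\sigma} = -0.407 \quad \text{and} \quad t = \frac{16 - \mu}{\sigma} = 0.9.$$

So we have two equations:

$$12 - \mu = -0.407\sigma \quad \text{and} \quad 16 - \mu = 0.9\sigma,$$

which we can solve simultaneously. Subtracting the first equation from the second gives $4 = 1.307\sigma$, so $\sigma = 3.06$; substituting back gives $\mu = 16 - 0.9 \times 3.06 = 13.2$.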
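The same solution can be sketched in Python: `NormalDist().inv_cdf` replaces the $z$-table lookups, and the two linear equations are solved by the subtraction step shown above. The names `s` and `t` follow the worked example.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

# P(H < 12) = 0.342  ->  s = (12 - mu)/sigma = Phi^{-1}(0.342)
# P(H > 16) = 0.184  ->  t = (16 - mu)/sigma = Phi^{-1}(1 - 0.184)
s = Z.inv_cdf(0.342)
t = Z.inv_cdf(1 - 0.184)

# 12 - mu = s*sigma and 16 - mu = t*sigma; subtracting gives 4 = (t - s)*sigma.
sigma = (16 - 12) / (t - s)
mu = 12 - s * sigma

print(f"s = {s:.4f}, t = {t:.4f}, mu = {mu:.2f}, sigma = {sigma:.2f}")
```

As a check, `NormalDist(mu, sigma).cdf(12)` returns 0.342 and `1 - NormalDist(mu, sigma).cdf(16)` returns 0.184, up to floating-point rounding.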
Now do Exercise 9C.