1
Modeling with the normal distribution
The normal distribution is often used as a model for practical situations. In the example that follows, you need to translate the given information into the language of the normal distribution in order to solve the problem.
2
The table provides data on the leaf example on page 134 of your textbook:
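| Length (mm) | Frequency $f$ | Relative frequency | Class boundaries | Class width | Relative frequency density |
|---|---|---|---|---|---|
| 30-39 | 3 | 0.060 | 29.5-39.5 | 10 | 0.006 |
| 40-49 | 9 | 0.180 | 39.5-49.5 | 10 | 0.018 |
| 50-59 | 15 | 0.300 | 49.5-59.5 | 10 | 0.030 |
| 60-69 | 9 | 0.180 | 59.5-69.5 | 10 | 0.018 |
| 70-79 | 6 | 0.120 | 69.5-79.5 | 10 | 0.012 |
| 80-89 | 4 | 0.080 | 79.5-89.5 | 10 | 0.008 |
| 90-99 | 3 | 0.060 | 89.5-99.5 | 10 | 0.006 |
| 100-109 | 1 | 0.020 | 99.5-109.5 | 10 | 0.002 |
| Total: | 50 | 1.000 | | | |

Calculate the mean and standard deviation for the above data. Use the original data on page 133.

$\mu = 61.4$, $\sigma = 16.798$, $\sigma^2 = 282.16$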
3
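| $f$ | $x_i$ | $p_i = f/n$ | $x_i \times p_i$ | $x_i^2$ | $x_i^2 \times p_i$ |
|---|---|---|---|---|---|
| 1 | 31 | 0.020 | 0.62 | 961 | 19.22 |
| 1 | 34 | 0.020 | 0.68 | 1156 | 23.12 |
| 1 | 39 | 0.020 | 0.78 | 1521 | 30.42 |
| 1 | 40 | 0.020 | 0.80 | 1600 | 32.00 |
| 2 | 42 | 0.040 | 1.68 | 1764 | 70.56 |
| 1 | 43 | 0.020 | 0.86 | 1849 | 36.98 |
| 1 | 44 | 0.020 | 0.88 | 1936 | 38.72 |
| 1 | 46 | 0.020 | 0.92 | 2116 | 42.32 |
| 2 | 47 | 0.040 | 1.88 | 2209 | 88.36 |
| 1 | 48 | 0.020 | 0.96 | 2304 | 46.08 |
| 2 | 50 | 0.040 | 2.00 | 2500 | 100.00 |
| 2 | 52 | 0.040 | 2.08 | 2704 | 108.16 |
| 2 | 53 | 0.040 | 2.12 | 2809 | 112.36 |
| 2 | 54 | 0.040 | 2.16 | 2916 | 116.64 |
| 1 | 56 | 0.020 | 1.12 | 3136 | 62.72 |
| 5 | 57 | 0.100 | 5.70 | 3249 | 324.90 |
| 1 | 58 | 0.020 | 1.16 | 3364 | 67.28 |
| 2 | 60 | 0.040 | 2.40 | 3600 | 144.00 |
| 2 | 63 | 0.040 | 2.52 | 3969 | 158.76 |
| 1 | 66 | 0.020 | 1.32 | 4356 | 87.12 |
| 2 | 67 | 0.040 | 2.68 | 4489 | 179.56 |
| 2 | 68 | 0.040 | 2.72 | 4624 | 184.96 |
| 2 | 70 | 0.040 | 2.80 | 4900 | 196.00 |
| 2 | 72 | 0.040 | 2.88 | 5184 | 207.36 |
| 1 | 78 | 0.020 | 1.56 | 6084 | 121.68 |
| 1 | 79 | 0.020 | 1.58 | 6241 | 124.82 |
| 1 | 83 | 0.020 | 1.66 | 6889 | 137.78 |
| 2 | 85 | 0.040 | 3.40 | 7225 | 289.00 |
| 1 | 89 | 0.020 | 1.78 | 7921 | 158.42 |
| 1 | 90 | 0.020 | 1.80 | 8100 | 162.00 |
| 1 | 94 | 0.020 | 1.88 | 8836 | 176.72 |
| 1 | 99 | 0.020 | 1.98 | 9801 | 196.02 |
| 1 | 102 | 0.020 | 2.04 | 10404 | 208.08 |
| 50 | Total | 1.000 | 61.4 (Mean) | | 4052.12 |

Mean $= \sum x_i p_i = 61.4$

Var. $= \sum x_i^2 p_i - \left(\sum x_i p_i\right)^2 = 4052.12 - 61.4^2 = 282.16$

Std. Dev. $= \sqrt{282.16} = 16.798$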
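The mean and variance calculation in the table above can be reproduced with a short script. This is a minimal sketch: the dictionary name `freq` and the other variable names are illustrative, and the last length (102 mm) is an assumption inferred from its products 2.04 and 208.08, since the value itself is not legible in the table.

```python
import math

# Leaf lengths (mm) mapped to their frequencies, read from the table.
# Assumption: the final entry, 102, is inferred from its products 2.04 and 208.08.
freq = {
    31: 1, 34: 1, 39: 1,
    40: 1, 42: 2, 43: 1, 44: 1, 46: 1, 47: 2, 48: 1,
    50: 2, 52: 2, 53: 2, 54: 2, 56: 1, 57: 5, 58: 1,
    60: 2, 63: 2, 66: 1, 67: 2, 68: 2,
    70: 2, 72: 2, 78: 1, 79: 1,
    83: 1, 85: 2, 89: 1,
    90: 1, 94: 1, 99: 1,
    102: 1,
}

n = sum(freq.values())                                  # sample size
mean = sum(x * f / n for x, f in freq.items())          # sum of x_i * p_i
mean_sq = sum(x * x * f / n for x, f in freq.items())   # sum of x_i^2 * p_i
variance = mean_sq - mean ** 2                          # E[X^2] - (E[X])^2
std_dev = math.sqrt(variance)

print(n, round(mean, 1), round(variance, 2), round(std_dev, 3))
```

Working with $p_i = f/n$ rather than raw frequencies mirrors the columns of the table, so each intermediate sum can be checked against it.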
4
Suppose that a random variable $L$ with the $N(61.4, 16.8^2)$ distribution is a model for the grouped frequency table (pg. 134). How can we verify that the normal distribution is a good model for the coffee tree leaf lengths?

One way to do this is to calculate the total theoretical probability (from the expected frequencies in a normal distribution) and compare it with the total relative frequency. If there is only a small difference between the theoretical probability and the relative frequency (empirical probability), then the $N(61.4, 16.8^2)$ distribution is a good model.

Let $\alpha = \left|\sum p_i - \sum f_i\right|$, where $p_i$ is the theoretical probability and $f_i$ is the relative frequency (empirical probability) of each class. A difference of $\alpha < 0.05$ is usually small enough. Thus, if $\alpha < 0.05$, we can assume that the normal distribution is a good model.
5
The expected frequency in the interval $59.5 \le l \le 69.5$ can be calculated as follows.

Given that $L \sim N(61.4, 16.8^2)$, let $Z = \dfrac{L - 61.4}{16.8}$ so that $Z \sim N(0, 1)$. Then:

$$P(59.5 \le L \le 69.5) = P\left(\frac{59.5 - 61.4}{16.8} \le Z \le \frac{69.5 - 61.4}{16.8}\right) = P(-0.113 \le Z \le 0.482)$$

$$P(59.5 \le L \le 69.5) = \phi(0.482) - \phi(-0.113) = \phi(0.482) - \left(1 - \phi(0.113)\right)$$

$$\phi(0.482) - \left(1 - \phi(0.113)\right) = 0.6851 - (1 - 0.545) = 0.2301$$

This means that the expected frequency for the class $59.5 \le l \le 69.5$ is $50 \times 0.2301 = 11.5$ (correct to 1 decimal place). Therefore, in a group of 50 leaves you would expect about 11 or 12 leaves to have lengths in the class $59.5 \le l \le 69.5$. The observed frequency was actually 9.

Does this mean that the $N(61.4, 16.8^2)$ distribution is a poor model for these data? To answer this question sensibly, you really need to calculate the expected frequencies for all eight classes.
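The standardising step can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a $z$-table; the function name `expected_frequency` and the constants are illustrative, and the standard deviation is taken as 16.8.

```python
from statistics import NormalDist

MEAN, SD, N_LEAVES = 61.4, 16.8, 50      # model parameters and sample size
model = NormalDist(mu=MEAN, sigma=SD)

def expected_frequency(lower, upper):
    """Return (probability, expected count) for lower <= L <= upper."""
    prob = model.cdf(upper) - model.cdf(lower)
    return prob, N_LEAVES * prob

prob, expected = expected_frequency(59.5, 69.5)
print(round(prob, 4), round(expected, 1))
```

Because `model.cdf` works directly on leaf lengths, the conversion to $Z$ happens implicitly; the result should agree with the table-based value to about three decimal places.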
6
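Using the method shown in the previous slide, find the expected frequencies for the remaining seven classes and then review the results to consider whether $N(61.4, 16.8^2)$ is a suitable distribution model for these data. In each line, the probability is found first and then multiplied by 50 to give the expected frequency.

$P(29.5 \le L \le 39.5) = P(-1.8988 \le Z \le -1.3036) = \phi(-1.3036) - \phi(-1.8988) = \left(1 - \phi(1.3036)\right) - \left(1 - \phi(1.8988)\right) = 0.0962 - 0.0288 = 0.0674$; expected frequency $0.0674 \times 50 = 3.37$

$P(39.5 \le L \le 49.5) = P(-1.3036 \le Z \le -0.7083) = \phi(-0.7083) - \phi(-1.3036) = \left(1 - \phi(0.7083)\right) - \left(1 - \phi(1.3036)\right) = 0.2394 - 0.0962 = 0.1432$; expected frequency $0.1432 \times 50 = 7.16$

$P(49.5 \le L \le 59.5) = P(-0.7083 \le Z \le -0.1131) = \phi(-0.1131) - \phi(-0.7083) = \left(1 - \phi(0.1131)\right) - \left(1 - \phi(0.7083)\right) = 0.455 - 0.2394 = 0.2156$; expected frequency $0.2156 \times 50 = 10.78$

$P(59.5 \le L \le 69.5) = P(-0.1131 \le Z \le 0.4821) = \phi(0.4821) - \phi(-0.1131) = \phi(0.4821) - \left(1 - \phi(0.1131)\right) = 0.6851 - 0.455 = 0.2301$; expected frequency $0.2301 \times 50 = 11.5$

$P(69.5 \le L \le 79.5) = P(0.4821 \le Z \le 1.0774) = \phi(1.0774) - \phi(0.4821) = 0.8594 - 0.6851 = 0.1743$; expected frequency $0.1743 \times 50 = 8.715$

$P(79.5 \le L \le 89.5) = P(1.0774 \le Z \le 1.6726) = \phi(1.6726) - \phi(1.0774) = 0.9528 - 0.8594 = 0.0934$; expected frequency $0.0934 \times 50 = 4.67$

$P(89.5 \le L \le 99.5) = P(1.6726 \le Z \le 2.2679) = \phi(2.2679) - \phi(1.6726) = 0.9883 - 0.9528 = 0.0355$; expected frequency $0.0355 \times 50 = 1.78$

$P(99.5 \le L \le 109.5) = P(2.2679 \le Z \le 2.8631) = \phi(2.8631) - \phi(2.2679) = 0.9979 - 0.9883 = 0.0096$; expected frequency $0.0096 \times 50 = 0.48$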
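The eight calculations follow the same pattern, so they can be generated in a loop. This is a sketch under the assumption that the model is $N(61.4, 16.8^2)$ and that the class boundaries run from 29.5 to 109.5 in steps of 10; the names `boundaries` and `rows` are illustrative.

```python
from statistics import NormalDist

model = NormalDist(mu=61.4, sigma=16.8)
n_leaves = 50

# Class boundaries 29.5, 39.5, ..., 109.5 (eight classes of width 10).
boundaries = [29.5 + 10 * k for k in range(9)]

rows = []
for lower, upper in zip(boundaries, boundaries[1:]):
    prob = model.cdf(upper) - model.cdf(lower)   # theoretical probability
    rows.append((lower, upper, prob, n_leaves * prob))

for lower, upper, prob, expected in rows:
    print(f"{lower:5.1f}-{upper:5.1f}  p = {prob:.4f}  expected = {expected:5.2f}")
```

Small differences in the fourth decimal place are to be expected, since the slide values come from a rounded $z$-table.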
7
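| Class | Observed frequency | Relative freq. (empirical prob.) | Theoretical frequency | Theoretical probability |
|---|---|---|---|---|
| 29.5-39.5 | 3 | 0.06 | 3 or 4 | 0.0674 |
| 39.5-49.5 | 9 | 0.18 | 7 or 8 | 0.1432 |
| 49.5-59.5 | 15 | 0.30 | 10 or 11 | 0.2156 |
| 59.5-69.5 | 9 | 0.18 | 11 or 12 | 0.2301 |
| 69.5-79.5 | 6 | 0.12 | 8 or 9 | 0.1743 |
| 79.5-89.5 | 4 | 0.08 | 4 or 5 | 0.0934 |
| 89.5-99.5 | 3 | 0.06 | 1 or 2 | 0.0355 |
| 99.5-109.5 | 1 | 0.02 | 0 or 1 | 0.0096 |
| Total | 50 | 1.00 | | 0.97 |

As you can see, $\alpha = |0.97 - 1| = 0.03 < 0.05$ and therefore the normal distribution is a good model.

In S2, you will learn about hypothesis testing and more reliable methods of determining whether or not a model is good, based on a statistic such as the mean. The following slide is a foretaste.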
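The comparison of totals is simple arithmetic and can be sketched as follows. The list holds the eight theoretical probabilities from the table; the variable names and the constant `THRESHOLD` are illustrative, with 0.05 taken from the rule of thumb stated earlier.

```python
# Theoretical probabilities for the eight classes, copied from the table.
theoretical = [0.0674, 0.1432, 0.2156, 0.2301, 0.1743, 0.0934, 0.0355, 0.0096]

total_theoretical = sum(theoretical)   # about 0.97
total_relative = 1.00                  # relative frequencies always sum to 1

alpha = abs(total_theoretical - total_relative)
THRESHOLD = 0.05                       # rule-of-thumb cut-off used on the slides

print(round(total_theoretical, 2), round(alpha, 2), alpha < THRESHOLD)
```

Note that the total falls short of 1 only because the normal model places some probability below 29.5 mm and above 109.5 mm.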
8
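In our coffee leaf example, the population mean can be estimated by multiplying the theoretical probability of each group by a representative length for that group and summing the products:

| Theoretical probability | Length (mm) | Product |
|---|---|---|
| 0.0674 | 34.5 | 2.325 |
| 0.1432 | 44.5 | 6.372 |
| 0.2156 | 55.5 | 11.966 |
| 0.2301 | 65.5 | 15.072 |
| 0.1743 | 75.5 | 13.160 |
| 0.0934 | 85.5 | 7.986 |
| 0.0355 | 95.5 | 3.390 |
| 0.0096 | 105.5 | 1.013 |
| Total | | 61.28 |

A simple $z$ hypothesis test involves seeing how far the sample mean $\bar{x}$ is from the population mean $\mu$:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Here $z$ is called a test statistic and it indicates the difference between the two means, $\sigma$ is the standard deviation of the sample, and $n$ is the size of the sample. The quantity $\sigma / \sqrt{n}$ is the standard error of the mean.

In our coffee leaf example, $\bar{x} = 61.4$, $\mu = 61.28$, $\sigma = 16.8$ and $n = 50$:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{61.4 - 61.28}{16.8 / \sqrt{50}} = 0.0505$$

So $\phi(0.0505) = 0.5201$ (used here as the $p$-value).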
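The weighted sum and the test statistic can be verified with a few lines of code. This is a sketch only: the probability and length values are copied from the slide as given, the variable names are illustrative, and `NormalDist().cdf` stands in for the $z$-table lookup of $\phi$.

```python
import math
from statistics import NormalDist

# Theoretical probabilities and the lengths (mm) paired with them on the slide.
probs = [0.0674, 0.1432, 0.2156, 0.2301, 0.1743, 0.0934, 0.0355, 0.0096]
lengths = [34.5, 44.5, 55.5, 65.5, 75.5, 85.5, 95.5, 105.5]

mu_est = sum(p * x for p, x in zip(probs, lengths))   # estimated population mean

x_bar, sigma, n = 61.4, 16.8, 50
mu = round(mu_est, 2)                                  # 61.28, as used on the slide

standard_error = sigma / math.sqrt(n)                  # standard error of the mean
z = (x_bar - mu) / standard_error                      # test statistic
phi_z = NormalDist().cdf(z)                            # standard normal CDF at z

print(round(mu_est, 2), round(z, 4), round(phi_z, 4))
```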
9
So $z$ is very close to 0, the mean of the standard normal distribution, and the corresponding probability is very close to 0.5.

In hypothesis testing, a $p$-value generated from the statistic $z$ is significant only if it is less than the significance level. Suppose the significance level is 5%. Then, in the example, $p = 0.52 > 0.05$. Therefore we can assume the sample mean is not much different from the population mean, and consequently the data can be modelled by a normal distribution.
10
Example 1. Two friends, Sarah and Hannah, often go to the Post Office together. They travel on Sarah's scooter. Sarah always drives Hannah to the Post Office and drops her off there. Sarah then drives around until she is ready to pick Hannah up some time later. Their experience has been that the time Hannah takes in the Post Office can be approximated by a normal distribution with mean 6 minutes and standard deviation 1.3 minutes. How many minutes after having dropped Hannah off should Sarah return if she wants to be at least 95% certain that Hannah will not keep her waiting?

Let $T$ be the time Hannah takes in the Post Office on a randomly chosen trip. Then $T \sim N(6, 1.3^2)$. Let $t$ be the number of minutes after dropping Hannah off when Sarah returns; we then need to find $t$ such that $P(T \le t) \ge 0.95$.

After standardising, the expression $P(T \le t) \ge 0.95$ becomes

$$P\left(Z \le \frac{t - 6}{1.3}\right) \ge 0.95 \quad\text{or}\quad \phi\left(\frac{t - 6}{1.3}\right) \ge 0.95$$

So,

$$\frac{t - 6}{1.3} \ge \phi^{-1}(0.95) = 1.645 \;\rightarrow\; t \ge 8.1385$$

Thus, Sarah should not return for at least 8.14 minutes to be at least 95% sure Hannah will not keep her waiting.
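The inverse lookup in Example 1 can be checked with the `inv_cdf` method of `statistics.NormalDist`, which plays the role of reading the $z$-table backwards. The variable names below are illustrative.

```python
from statistics import NormalDist

# Time Hannah spends in the Post Office: mean 6 minutes, s.d. 1.3 minutes.
post_office_time = NormalDist(mu=6, sigma=1.3)

# Smallest t with P(T <= t) >= 0.95, i.e. the 95th percentile of T.
t_min = post_office_time.inv_cdf(0.95)

# Same result via the standard normal: t = 6 + 1.3 * z, z the 95th percentile of Z.
z_95 = NormalDist().inv_cdf(0.95)
t_from_z = 6 + 1.3 * z_95

print(round(z_95, 3), round(t_min, 2), round(t_from_z, 2))
```

The small difference from 8.1385 arises because the table value 1.645 is itself rounded.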
11
Example 2. A biologist has been collecting data on the heights of a particular species of cactus. He has observed that 34.2% of the cacti are below 12 cm in height and 18.4% of the cacti are above 16 cm in height. He assumes that the heights are normally distributed. Find the mean and standard deviation of the distribution.

Let the mean and standard deviation of the distribution be $\mu$ and $\sigma$ respectively. Then if $H$ is the height of a randomly chosen cactus of this species, $H \sim N(\mu, \sigma^2)$. The biologist's observations can now be written $P(H < 12) = 0.342$ and $P(H > 16) = 0.184$.

After standardising using $Z = \dfrac{H - \mu}{\sigma}$, these equations become:

$$P\left(Z < \frac{12 - \mu}{\sigma}\right) = 0.342 \quad\text{and}\quad P\left(Z > \frac{16 - \mu}{\sigma}\right) = 0.184$$

Letting $\dfrac{12 - \mu}{\sigma} = s$ and $\dfrac{16 - \mu}{\sigma} = t$, we have:

$$P(Z < s) = 0.342 \quad\text{and}\quad P(Z > t) = 0.184$$

So, $\phi(s) = 0.342$ and $1 - \phi(t) = 0.184$.
12
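Since $\phi(s) = 0.342 < 0.5$, $s$ is negative. Writing $s = -v$ gives $\phi(v) = 1 - 0.342 = 0.658$, and the $z$-table gives $v = 0.407$, so $s = -0.407$. Since $1 - \phi(t) = 0.184$, $\phi(t) = 0.816$, giving $t = 0.9$.

Therefore,

$$s = \frac{12 - \mu}{\sigma} = -0.407 \quad\text{and}\quad t = \frac{16 - \mu}{\sigma} = 0.9$$

So we have two equations:

$$12 - \mu = -0.407\sigma \quad\text{and}\quad 16 - \mu = 0.9\sigma$$

Subtracting the first from the second gives $4 = 1.307\sigma$, so $\sigma = 3.06$, and then $\mu = 12 + 0.407\sigma = 13.2$ (both correct to 3 significant figures).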
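The two-equation solution can be checked numerically. In this sketch, `inv_cdf` from `statistics.NormalDist` replaces the $z$-table, and the simultaneous equations are solved by subtraction as in the working above; variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()                # Z ~ N(0, 1)

# P(H < 12) = 0.342  ->  s = (12 - mu) / sigma
s = std_normal.inv_cdf(0.342)
# P(H > 16) = 0.184  ->  phi(t) = 1 - 0.184, with t = (16 - mu) / sigma
t = std_normal.inv_cdf(1 - 0.184)

# Subtracting 12 - mu = s*sigma from 16 - mu = t*sigma gives 4 = (t - s)*sigma.
sigma = (16 - 12) / (t - s)
mu = 12 - s * sigma

print(round(s, 3), round(t, 3), round(mu, 2), round(sigma, 2))

# Sanity check: the fitted model should reproduce the observed percentages.
cactus = NormalDist(mu=mu, sigma=sigma)
below_12 = cactus.cdf(12)
above_16 = 1 - cactus.cdf(16)
```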
13
Now do Exercise 9C.