Modeling with the normal distribution

The normal distribution is often used as a model for practical situations. In the example that follows, you need to translate the given information into the language of the normal distribution in order to solve the problem.

The table provides data on the leaf example on page 134 of your textbook:

| Length (mm) | Frequency f | Relative frequency | Class boundaries | Class width | Relative frequency density |
|---|---|---|---|---|---|
| 30-39 | 3 | 0.060 | 29.5-39.5 | 10 | 0.006 |
| 40-49 | 9 | 0.180 | 39.5-49.5 | 10 | 0.018 |
| 50-59 | 15 | 0.300 | 49.5-59.5 | 10 | 0.030 |
| 60-69 | 9 | 0.180 | 59.5-69.5 | 10 | 0.018 |
| 70-79 | 6 | 0.120 | 69.5-79.5 | 10 | 0.012 |
| 80-89 | 4 | 0.080 | 79.5-89.5 | 10 | 0.008 |
| 90-99 | 3 | 0.060 | 89.5-99.5 | 10 | 0.006 |
| 100-109 | 1 | 0.020 | 99.5-109.5 | 10 | 0.002 |
| Total | 50 | 1.000 | | | |

Calculate the mean and standard deviation for the above data. Use the original data on page 133.

$\mu = 61.4$, $\sigma = 16.8$, $\sigma^2 = 282.16$

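| $f$ | $x_i$ | $p_i = f/n$ | $x_i \times p_i$ | $x_i^2$ | $x_i^2 \times p_i$ |
|---|---|---|---|---|---|
| 1 | 31 | 0.020 | 0.62 | 961 | 19.22 |
| 1 | 34 | 0.020 | 0.68 | 1156 | 23.12 |
| 1 | 39 | 0.020 | 0.78 | 1521 | 30.42 |
| 1 | 40 | 0.020 | 0.80 | 1600 | 32.00 |
| 2 | 42 | 0.040 | 1.68 | 1764 | 70.56 |
| 1 | 43 | 0.020 | 0.86 | 1849 | 36.98 |
| 1 | 44 | 0.020 | 0.88 | 1936 | 38.72 |
| 1 | 46 | 0.020 | 0.92 | 2116 | 42.32 |
| 2 | 47 | 0.040 | 1.88 | 2209 | 88.36 |
| 1 | 48 | 0.020 | 0.96 | 2304 | 46.08 |
| 2 | 50 | 0.040 | 2.00 | 2500 | 100.00 |
| 2 | 52 | 0.040 | 2.08 | 2704 | 108.16 |
| 2 | 53 | 0.040 | 2.12 | 2809 | 112.36 |
| 2 | 54 | 0.040 | 2.16 | 2916 | 116.64 |
| 1 | 56 | 0.020 | 1.12 | 3136 | 62.72 |
| 5 | 57 | 0.100 | 5.70 | 3249 | 324.90 |
| 1 | 58 | 0.020 | 1.16 | 3364 | 67.28 |
| 2 | 60 | 0.040 | 2.40 | 3600 | 144.00 |
| 2 | 63 | 0.040 | 2.52 | 3969 | 158.76 |
| 1 | 66 | 0.020 | 1.32 | 4356 | 87.12 |
| 2 | 67 | 0.040 | 2.68 | 4489 | 179.56 |
| 2 | 68 | 0.040 | 2.72 | 4624 | 184.96 |
| 2 | 70 | 0.040 | 2.80 | 4900 | 196.00 |
| 2 | 72 | 0.040 | 2.88 | 5184 | 207.36 |
| 1 | 78 | 0.020 | 1.56 | 6084 | 121.68 |
| 1 | 79 | 0.020 | 1.58 | 6241 | 124.82 |
| 1 | 83 | 0.020 | 1.66 | 6889 | 137.78 |
| 2 | 85 | 0.040 | 3.40 | 7225 | 289.00 |
| 1 | 89 | 0.020 | 1.78 | 7921 | 158.42 |
| 1 | 90 | 0.020 | 1.80 | 8100 | 162.00 |
| 1 | 94 | 0.020 | 1.88 | 8836 | 176.72 |
| 1 | 99 | 0.020 | 1.98 | 9801 | 196.02 |
| 1 | 102 | 0.020 | 2.04 | 10404 | 208.08 |
| 50 | | Total | 61.4 (mean) | | 4052.120 |

Variance: $4052.12 - 61.4^2 = 282.160$

Standard deviation: $\sqrt{282.160} = 16.798$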
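The arithmetic in the table above can be checked with a short script. This is a minimal sketch: the names `leaf_data` and `summarise` are illustrative, and the (length, frequency) pairs are transcribed from the table.

```python
from math import sqrt

# (length in mm, frequency) pairs transcribed from the table above
leaf_data = [
    (31, 1), (34, 1), (39, 1), (40, 1), (42, 2), (43, 1), (44, 1), (46, 1),
    (47, 2), (48, 1), (50, 2), (52, 2), (53, 2), (54, 2), (56, 1), (57, 5),
    (58, 1), (60, 2), (63, 2), (66, 1), (67, 2), (68, 2), (70, 2), (72, 2),
    (78, 1), (79, 1), (83, 1), (85, 2), (89, 1), (90, 1), (94, 1), (99, 1),
    (102, 1),
]


def summarise(data):
    """Return (n, mean, variance, standard deviation) for (value, frequency) pairs."""
    n = sum(f for _, f in data)
    mean = sum(x * f for x, f in data) / n          # sum of x_i * p_i
    mean_sq = sum(x * x * f for x, f in data) / n   # sum of x_i^2 * p_i
    variance = mean_sq - mean ** 2                  # population variance
    return n, mean, variance, sqrt(variance)


n, mean, variance, std_dev = summarise(leaf_data)
print(n, round(mean, 1), round(variance, 2), round(std_dev, 3))
```

The variance here divides by $n$ (not $n-1$), which matches the $\sum x_i^2 p_i - \mu^2$ calculation in the table.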

Suppose that a random variable $L$ with an $N(61.4, 16.8^2)$ distribution is a model for the grouped frequency table (pg. 134). How can we verify that the normal distribution is a good model for the coffee tree leaf lengths?

The way to do this is to calculate the total theoretical probability (from the expected frequencies in a normal distribution) and compare it with the total relative frequency. If the difference between the theoretical probability and the relative frequency (empirical probability) is small, then the $N(61.4, 16.8^2)$ distribution is a good model.

Here a small difference means $\alpha < 0.05$, where $\alpha = \left|\sum p_i - \sum f_i\right|$, $p_i$ is the theoretical probability and $f_i$ is the relative frequency (empirical probability) of class $i$; 0.05 is the threshold usually used. Thus, if $\alpha < 0.05$, we can assume that the normal distribution is a good model.

The expected frequency in the interval $59.5 \le L \le 69.5$ can be calculated as follows. Given that $L \sim N(61.4, 16.8^2)$, let $Z = \dfrac{L - 61.4}{16.8}$ so that $Z \sim N(0, 1)$.

$P(59.5 \le L \le 69.5) = P\left(\dfrac{59.5 - 61.4}{16.8} \le Z \le \dfrac{69.5 - 61.4}{16.8}\right)$

$= P(-0.113 \le Z \le 0.482)$

$= \phi(0.482) - \phi(-0.113)$

$= \phi(0.482) - [1 - \phi(0.113)]$

$= 0.6851 - (1 - 0.545) = 0.2301$

This means that the expected frequency for the class $59.5 \le L \le 69.5$ is $50 \times 0.2301 = 11.5$ (correct to 1 decimal place). Therefore, in a group of 50 leaves you would expect about 11 or 12 leaves to have lengths in the class $59.5 \le L \le 69.5$. The observed frequency was actually 9.

Does this mean that the $N(61.4, 16.8^2)$ distribution is a poor model for these data? To answer this question sensibly, you really need to calculate the expected frequencies for all eight classes.
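The same calculation can be sketched in Python using `statistics.NormalDist` from the standard library (Python 3.8 or later) instead of a $z$-table. The helper name `expected_frequency` is illustrative. Because the software does not round the $z$-values, the result may differ from the table-based figure in the last decimal place.

```python
from statistics import NormalDist

# Model from the slide: mean 61.4 mm, standard deviation 16.8 mm
leaf_model = NormalDist(mu=61.4, sigma=16.8)


def expected_frequency(model, lower, upper, n):
    """Return (probability of the class, expected count out of n)."""
    probability = model.cdf(upper) - model.cdf(lower)
    return probability, n * probability


probability, expected = expected_frequency(leaf_model, 59.5, 69.5, 50)
print(round(probability, 4), round(expected, 1))
```

The probability should come out close to 0.2301 and the expected frequency close to 11.5, in line with the hand calculation.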

Using the method shown in the previous slide, find the expected frequencies for the remaining seven classes and then review the results to consider whether $N(61.4, 16.8^2)$ is a suitable distribution model for these data. For each class, the probability is found first and then multiplied by 50 to give the expected frequency.

$P(29.5 \le L \le 39.5) = P(-1.8988 \le Z \le -1.3036) = \phi(-1.3036) - \phi(-1.8988) = [1 - \phi(1.3036)] - [1 - \phi(1.8988)] = 0.0962 - 0.0288 = 0.0674$; expected frequency $50 \times 0.0674 = 3.37$

$P(39.5 \le L \le 49.5) = P(-1.3036 \le Z \le -0.7083) = \phi(-0.7083) - \phi(-1.3036) = [1 - \phi(0.7083)] - [1 - \phi(1.3036)] = 0.2394 - 0.0962 = 0.1432$; expected frequency $50 \times 0.1432 = 7.16$

$P(49.5 \le L \le 59.5) = P(-0.7083 \le Z \le -0.1131) = \phi(-0.1131) - \phi(-0.7083) = [1 - \phi(0.1131)] - [1 - \phi(0.7083)] = 0.455 - 0.2394 = 0.2156$; expected frequency $50 \times 0.2156 = 10.78$

$P(59.5 \le L \le 69.5) = P(-0.1131 \le Z \le 0.4821) = \phi(0.4821) - \phi(-0.1131) = \phi(0.4821) - [1 - \phi(0.1131)] = 0.6851 - 0.455 = 0.2301$; expected frequency $50 \times 0.2301 = 11.5$

$P(69.5 \le L \le 79.5) = P(0.4821 \le Z \le 1.0774) = \phi(1.0774) - \phi(0.4821) = 0.8594 - 0.6851 = 0.1743$; expected frequency $50 \times 0.1743 = 8.715$

$P(79.5 \le L \le 89.5) = P(1.0774 \le Z \le 1.6726) = \phi(1.6726) - \phi(1.0774) = 0.9528 - 0.8594 = 0.0934$; expected frequency $50 \times 0.0934 = 4.67$

$P(89.5 \le L \le 99.5) = P(1.6726 \le Z \le 2.2679) = \phi(2.2679) - \phi(1.6726) = 0.9883 - 0.9528 = 0.0355$; expected frequency $50 \times 0.0355 = 1.78$

$P(99.5 \le L \le 109.5) = P(2.2679 \le Z \le 2.8631) = \phi(2.8631) - \phi(2.2679) = 0.9979 - 0.9883 = 0.0096$; expected frequency $50 \times 0.0096 = 0.48$

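| Class boundaries | Observed frequency | Relative frequency (empirical probability) | Theoretical frequency | Theoretical probability |
|---|---|---|---|---|
| 29.5-39.5 | 3 | 0.06 | 3 or 4 | 0.0674 |
| 39.5-49.5 | 9 | 0.18 | 7 or 8 | 0.1432 |
| 49.5-59.5 | 15 | 0.30 | 10 or 11 | 0.2156 |
| 59.5-69.5 | 9 | 0.18 | 11 or 12 | 0.2301 |
| 69.5-79.5 | 6 | 0.12 | 8 or 9 | 0.1743 |
| 79.5-89.5 | 4 | 0.08 | 4 or 5 | 0.0934 |
| 89.5-99.5 | 3 | 0.06 | 1 or 2 | 0.0355 |
| 99.5-109.5 | 1 | 0.02 | 0 or 1 | 0.0096 |
| Total | 50 | 1.00 | | 0.97 |

As you can see, $\alpha = |0.97 - 1| = 0.03 < 0.05$, and therefore the normal distribution is a good model.

In S2, you will learn about hypothesis testing and more reliable methods of determining whether or not a model is good, based on a statistic such as the mean. The following slide is a foretaste.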
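The whole comparison can be reproduced in a few lines. This is a sketch under stated assumptions: the variable names are illustrative, and the observed frequencies 9 and 3 for the 60-69 and 90-99 classes are taken from the raw data on the earlier slide. Exact normal probabilities are used rather than table look-ups, so individual values may differ slightly in the fourth decimal place.

```python
from statistics import NormalDist

# Fitted model from the slides: mean 61.4 mm, standard deviation 16.8 mm
model = NormalDist(mu=61.4, sigma=16.8)

# Class boundaries (mm) and observed frequencies for the eight classes
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5]
observed = [3, 9, 15, 9, 6, 4, 3, 1]
n = sum(observed)

# Theoretical probability of each class under the model
theoretical = [
    model.cdf(upper) - model.cdf(lower)
    for lower, upper in zip(boundaries, boundaries[1:])
]

for lower, upper, f, p in zip(boundaries, boundaries[1:], observed, theoretical):
    print(f"{lower:5.1f}-{upper:5.1f}  observed {f:2d}  expected {n * p:5.2f}  p {p:.4f}")

# alpha as defined in the slides: |total theoretical - total empirical probability|
total_theoretical = sum(theoretical)
total_empirical = sum(f / n for f in observed)
alpha = abs(total_theoretical - total_empirical)
print(f"alpha = {alpha:.3f}")
```

The value of `alpha` should be about 0.03, matching the figure in the table.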

In our coffee leaf example, the population mean can be estimated by summing, over the classes, the product of the theoretical probability and a representative length for the class, using the sample data:

0.0674 × 34.5 = 2.3253
0.1432 × 44.5 = 6.3724
0.2156 × 55.5 = 11.9658
0.2301 × 65.5 = 15.0716
0.1743 × 75.5 = 13.1597
0.0934 × 85.5 = 7.9857
0.0355 × 95.5 = 3.3903
0.0096 × 105.5 = 1.0128
Total = 61.28

A simple $z$ hypothesis test involves seeing how far the sample mean $\bar{x}$ is from the population mean $\mu$:

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$z$ is called a test statistic and it indicates the difference between the two means. Here $\sigma$ is the standard deviation of the sample, $n$ is the size of the sample, and $\sigma / \sqrt{n}$ is the standard error of the mean.

In our coffee leaf example, $\bar{x} = 61.4$, $\mu = 61.28$, $\sigma = 16.8$ and $n = 50$:

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{61.4 - 61.28}{16.8 / \sqrt{50}} = 0.0505$

So $\phi(0.0505) = 0.5201$, which is used here as the $p$-value.

So $z$ is very close to 0, the mean of the standard normal distribution, and the corresponding probability is close to 0.5. In hypothesis testing, a $p$-value generated from the statistic $z$ is significant only if it is less than the significance level. Suppose the significance level is 5%. Then, in the example, $p \approx 0.52 > 0.05$. Therefore we can assume the sample mean is not much different from the population mean, and consequently the data can be modelled by a normal distribution.
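The test statistic can be computed as a quick check. This is a minimal sketch with an illustrative helper name, `z_statistic`. It reports $\phi(z)$, the quantity the slides compare with 0.05, and also the conventional two-sided $p$-value $2\,[1 - \phi(|z|)]$, which is an addition here and not something the slides calculate. Both are well above 0.05, so the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, population_mean, sigma, n):
    """Distance between the two means, in units of the standard error."""
    return (sample_mean - population_mean) / (sigma / sqrt(n))


z = z_statistic(61.4, 61.28, 16.8, 50)

std_normal = NormalDist()  # N(0, 1)
phi_z = std_normal.cdf(z)  # the value the slides compare with 0.05

# Assumption: conventional two-sided p-value, not computed in the slides
two_sided_p = 2 * (1 - std_normal.cdf(abs(z)))

print(round(z, 4), round(phi_z, 4), round(two_sided_p, 2))
```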

Example 1. Two friends, Sarah and Hannah, often go to the Post Office together. They travel on Sarah’s scooter. Sarah always drives Hannah to the Post Office and drops her off there. Sarah then drives around until she is ready to pick Hannah up some time later. Their experience has been that the time Hannah takes in the Post Office can be approximated by a normal distribution with mean 6 minutes and standard deviation 1.3 minutes. How many minutes after having dropped Hannah off should Sarah return if she wants to be at least 95% certain that Hannah will not keep her waiting?

Let $T$ be the time Hannah takes in the Post Office on a randomly chosen trip. Then $T \sim N(6, 1.3^2)$. Let $t$ be the number of minutes after dropping Hannah off when Sarah returns; we then need to find $t$ such that $P(T \le t) \ge 0.95$.

After standardising, the expression $P(T \le t) \ge 0.95$ becomes

$P\left(Z \le \dfrac{t - 6}{1.3}\right) \ge 0.95$ or $\phi\left(\dfrac{t - 6}{1.3}\right) \ge 0.95$

So, $\dfrac{t - 6}{1.3} \ge \phi^{-1}(0.95) = 1.645$, which gives $t \ge 8.1385$.

Thus, Sarah should not return for at least 8.14 minutes to be at least 95% sure Hannah will not keep her waiting.
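Example 1 amounts to an inverse normal look-up, which can be sketched with `NormalDist.inv_cdf` from the Python standard library. The variable names are illustrative. The software uses a more precise value of $\phi^{-1}(0.95)$ than the table value 1.645, so the answer differs slightly beyond the third decimal place.

```python
from statistics import NormalDist

# Time Hannah spends in the Post Office: mean 6 minutes, s.d. 1.3 minutes
post_office_time = NormalDist(mu=6, sigma=1.3)

# Smallest t with P(T <= t) >= 0.95
t = post_office_time.inv_cdf(0.95)
print(round(t, 2))

# Check: the probability that Hannah is finished by time t
print(round(post_office_time.cdf(t), 2))
```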

Example 2. A biologist has been collecting data on the heights of a particular species of cactus. He has observed that 34.2% of the cacti are below 12 cm in height and 18.4% of the cacti are above 16 cm in height. He assumes that the heights are normally distributed. Find the mean and standard deviation of the distribution.

Let the mean and standard deviation of the distribution be $\mu$ and $\sigma$ respectively. Then if $H$ is the height of a randomly chosen cactus of this species, $H \sim N(\mu, \sigma^2)$. The biologist’s observations can now be written $P(H < 12) = 0.342$ and $P(H > 16) = 0.184$.

After standardising using $Z = \dfrac{H - \mu}{\sigma}$, these equations become:

$P\left(Z < \dfrac{12 - \mu}{\sigma}\right) = 0.342$ and $P\left(Z > \dfrac{16 - \mu}{\sigma}\right) = 0.184$

Letting $\dfrac{12 - \mu}{\sigma} = s$ and $\dfrac{16 - \mu}{\sigma} = t$, we have:

$P(Z < s) = 0.342$ and $P(Z > t) = 0.184$

So, $\phi(s) = 0.342$ and $1 - \phi(t) = 0.184$.

Using the $z$-table, after writing $s = -v$, gives $\phi(v) = 0.658$, so $v = 0.407$ and $s = -0.407$. Since $1 - \phi(t) = 0.184$, $\phi(t) = 0.816$, giving $t = 0.9$.

Therefore, $s = \dfrac{12 - \mu}{\sigma} = -0.407$ and $t = \dfrac{16 - \mu}{\sigma} = 0.9$.

So we have two equations:

$12 - \mu = -0.407\sigma$ and $16 - \mu = 0.9\sigma$

which we can solve simultaneously to give $\mu = 13.2$ and $\sigma = 3.06$.
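Example 2 can be checked numerically. This is a minimal sketch using `NormalDist.inv_cdf` in place of the $z$-table; the variable names are illustrative. Subtracting the first equation from the second eliminates $\mu$, giving $4 = (t - s)\sigma$.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

s = std_normal.inv_cdf(0.342)      # z-value with 34.2% of the distribution below it
t = std_normal.inv_cdf(1 - 0.184)  # z-value with 18.4% of the distribution above it

# 12 - mu = s * sigma  and  16 - mu = t * sigma
# Subtracting the first from the second: 4 = (t - s) * sigma
sigma = (16 - 12) / (t - s)
mu = 12 - s * sigma

print(round(s, 3), round(t, 3))
print(round(mu, 1), round(sigma, 2))
```

The values of `mu` and `sigma` should agree with the table-based answers to three significant figures.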

Now do Exercise 9C. 