STT 315 This lecture is based on Chapter Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their slides.
Normal Distribution 2
Normal random variable A normal random variable X – is a continuous random variable – has a probability distribution that is bell-shaped, i.e., unimodal and symmetric. In many data sets, the histogram is bell-shaped. Such data sets can be modeled using a normal distribution. 3
Normal distribution 4
Computing normal probabilities 5
6 Approximately what percent of U.S. women do you expect to be between 66 in and 67 in tall? Heights of adult women are normally distributed with – mean of 63.6 in, – standard deviation of 2.5 in. Use a TI-83/84 Plus. – Press [2nd] and [VARS] (i.e. [DISTR]) – Select 2: normalcdf – Format of command: normalcdf(lower bound, upper bound, mean, std. dev.) For this problem: normalcdf(66, 67, 63.6, 2.5) ≈ 0.082, i.e. about 8.2% of adult U.S. women have heights between 66 in and 67 in.
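For readers without a TI calculator, the same interval probability can be sketched in Python. This assumes the standard-library `statistics.NormalDist` class (Python 3.8 or later). The helper name `normalcdf` is chosen here only to mirror the calculator command; it is not a library function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """P(lower < X < upper) for X ~ Normal(mean, sd); mirrors the TI command."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Women's heights: mean 63.6 in, standard deviation 2.5 in.
p_66_to_67 = normalcdf(66, 67, 63.6, 2.5)
print(round(p_66_to_67, 3))  # about 0.082, i.e. roughly 8.2%
```

The interval probability is computed as a difference of two lower-tail areas, which is what the calculator command does internally for a normal model.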
7 Approximately what percent of U.S. women do you expect to be less than 64 in tall? Heights of adult women are normally distributed with – mean of 63.6 in, – standard deviation of 2.5 in. Note that here the upper bound is 64, but no lower bound is given. So take a very small value for the lower bound, say -1000. For this problem: normalcdf(-1000, 64, 63.6, 2.5) ≈ 0.564, i.e. about 56.4% of adult U.S. women have heights less than 64 in.
8 Approximately what percent of U.S. women do you expect to be more than 58 in tall? Heights of adult women are normally distributed with – mean of 63.6 in, – standard deviation of 2.5 in. Note that here the lower bound is 58, but no upper bound is given. So take a very large value for the upper bound, say 1000. For this problem: normalcdf(58, 1000, 63.6, 2.5) ≈ 0.987, i.e. about 98.7% of adult U.S. women have heights more than 58 in.
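The one-sided questions about women's heights (less than 64 in, more than 58 in) can also be checked in software, where no artificial bound such as -1000 or 1000 is needed. The sketch below assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Women's heights: mean 63.6 in, standard deviation 2.5 in.
women = NormalDist(mu=63.6, sigma=2.5)

p_below_64 = women.cdf(64)       # lower tail: P(X < 64)
p_above_58 = 1 - women.cdf(58)   # upper tail: complement of P(X < 58)

print(round(p_below_64, 3), round(p_above_58, 3))
```

The upper-tail probability is obtained as one minus the lower-tail area, which matches the calculator result obtained with a very large upper bound.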
9 What about men’s heights? Heights of adult men are normally distributed with – mean of 69 in, – standard deviation of 2.8 in. normalcdf(60, 1000, 69, 2.8) ≈ 0.999. Hence 99.9% of adult men are taller than 60 in. normalcdf(64, 1000, 69, 2.8) ≈ 0.963. So 96.3% of adult men are taller than 64 in. Thus the U.S. Army height restriction is more restrictive for women than for men, whereas the U.S. Marine height restriction is more restrictive for men than for women.
10 Below what height do 80% of U.S. men fall? Heights of adult men are normally distributed with – mean of 69 in, – standard deviation of 2.8 in. The question is to find the height x such that {Percent of men with height < x} = 80% = 0.8. Use a TI-83/84 Plus. – Press [2nd] and [VARS] (i.e. [DISTR]) – Select 3: invNorm – Format of command: invNorm(fraction, mean, std. dev.) For this problem: invNorm(0.8, 69, 2.8) ≈ 71.36, i.e. 80% of U.S. men have heights less than about 71.36 in.
11 Remark: invNorm invNorm only takes the percentage or fraction in the lower tail of the normal distribution. For example, suppose the question is “Above what height do 10% of U.S. men fall?” Here the question is to find the height x such that {Percent of men with height > x} = 10% = 0.1. This means {Percent of men with height < x} = (100-10)% = 90% = 0.9. For this problem: invNorm(0.9, 69, 2.8) ≈ 72.59, i.e. 90% of U.S. men have heights less than about 72.59 in, i.e. 10% of U.S. men have heights more than about 72.59 in.
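A rough software counterpart of invNorm, assuming Python's standard-library `statistics.NormalDist`: its `inv_cdf` method also takes a lower-tail area, so an upper-tail question has to be converted first, exactly as in the remark above. Variable names are illustrative.

```python
from statistics import NormalDist

# Men's heights: mean 69 in, standard deviation 2.8 in.
men = NormalDist(mu=69, sigma=2.8)

# Height below which 80% of men fall (lower-tail area 0.8).
h80 = men.inv_cdf(0.8)

# Height above which 10% of men fall: convert the upper tail to a lower tail.
h_top10 = men.inv_cdf(1 - 0.10)

print(round(h80, 2), round(h_top10, 2))
```

Feeding the result back into `men.cdf` should return the lower-tail area that was asked for, which is a quick sanity check on the direction of the tail.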
Normal approximation of binomial distribution 12
How large n should be? 13
Example 14
Example 15
Sum of independent random variables 16
Combining Random Variables 17
Example 18
Example Suppose X and Y are two independent random variables with E(X) = 4, V(X) = 2, E(Y) = -3, V(Y) = 4. Then E(3X-2Y) = E(3X) - E(2Y) = 3E(X) - 2E(Y) = 3×4 - 2×(-3) = 12 + 6 = 18.
Example Suppose X and Y are two independent random variables with E(X) = 4, V(X) = 2, E(Y) = -3, V(Y) = 4. Then V(3X-2Y) = V(3X) + V(2Y) = 3² V(X) + 2² V(Y) = 9×2 + 4×4 = 18 + 16 = 34. σ(3X-2Y) = std. dev. of (3X-2Y) = √V(3X-2Y) = √34 ≈ 5.83.
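The expectation and variance rules used in these two examples can be sketched as a small Python helper. The function name `linear_combo_mean_var` is hypothetical, introduced only for illustration, and the variance line is valid only under the independence assumption stated in the example.

```python
from math import sqrt

def linear_combo_mean_var(a, mean_x, var_x, b, mean_y, var_y):
    """Mean and variance of aX + bY for independent X and Y."""
    mean = a * mean_x + b * mean_y
    # Coefficients are squared, so variances add even when b is negative.
    var = a**2 * var_x + b**2 * var_y
    return mean, var

# 3X - 2Y with E(X) = 4, V(X) = 2, E(Y) = -3, V(Y) = 4.
mean, var = linear_combo_mean_var(3, 4, 2, -2, -3, 4)
sd = sqrt(var)
print(mean, var, round(sd, 2))  # 18 34 5.83
```

Note that the standard deviation is taken only at the end: variances add for independent variables, standard deviations do not.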
Example 21
Random variables   Expectations   Variances
X                  -4             2
Y                  2              6
Z                  9              4
Another Example 22
Random variables   Expectations   Standard deviations
X                  0              1
Y                  0.2            3
Z                  2.4            5
Sum of independent normal random variables 23
Example 24
Example 25
Uniform distribution 26
Uniform distribution 27
Example 28
Example 29
Exponential distribution 30
Exponential distribution 31
Example 32
Sampling distributions 33
34 Remember Population is the complete set of all items that we are interested in studying. Parameters are values we calculate from the population data. Population mean (for quantitative variables), population proportion (for categorical variables), etc. are examples of parameters. A sample is a subset of the population. Statistics are values we compute from sample data. Sample mean, sample proportion, etc. are examples of statistics. Our goal is to make inferences about parameters based on relevant statistics.
35 An example Consider a population of 10 individuals with the following smoking habits, where S = smoker and N = non-smoker:
Individual #:    1  2  3  4  5  6  7  8  9  10
Smoking habit:   N  N  N  N  S  S  N  N  S  N
So 3 out of 10 people in the population are smokers. Here the population proportion of smokers is p = 3/10 = 0.3.
36 An example Suppose we decide to estimate the population proportion on the basis of a sample proportion. Suppose simple random samples of size 4 (with replacement) are considered.
Individuals selected   Smoking habit   Sample proportion
(2, 4, 4, 9)           (N, N, N, S)    1/4 = 0.25
(4, 7, 8, 10)          (N, N, N, N)    0/4 = 0
(5, 6, 8, 8)           (S, S, N, N)    2/4 = 0.5
Notice that the sample proportion’s value depends on the sample selected, but the population proportion’s value is fixed.
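To see how the sample proportion varies from sample to sample, the smoking example can be simulated. This is a minimal sketch using Python's standard-library `random` module. The seed, the number of repetitions, and the helper name `sample_proportion` are arbitrary choices made here, not part of the lecture.

```python
import random

# Individuals 1..10 in order; S = smoker, N = non-smoker.
population = list("NNNNSSNNSN")
p = population.count("S") / len(population)  # population proportion, fixed

def sample_proportion(rng, n=4):
    """Draw n individuals with replacement; return the proportion of smokers."""
    sample = rng.choices(population, k=n)
    return sample.count("S") / n

rng = random.Random(315)  # arbitrary seed so the run is reproducible
proportions = [sample_proportion(rng) for _ in range(10_000)]
average = sum(proportions) / len(proportions)

print(p, round(average, 2))
print(sorted(set(proportions)))  # the distinct values the statistic took
```

Each simulated sample gives one value of the statistic. Across many samples the values differ, while their average settles near the fixed population proportion.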
37 A few questions Can we justify the use of the sample proportion as an estimator of the population proportion? What can we expect about the value of the sample proportion when the population proportion (p) is 0.3? Does this behavior depend on the value of p? What is the “margin of error” if we estimate p with the sample proportion? (To be answered in a later lecture.) As the sample proportion is a variable, what is its distribution?
38 A few questions Does it matter how the sample is selected? Does the sample size matter? Is this a problem for the population proportion only, or do we face it for other parameters also? This is a problem for all parameters, which are fixed in value for a particular population. The value of any statistic changes with the sample selected.
39 Sampling Distribution Because any statistic’s value changes with the selected sample, a statistic is itself a random variable. The probability distribution of a sample statistic is called the sampling distribution of the statistic. In this course we shall study the sampling distributions of the sample proportion and the sample mean.
40 Sampling method and sample size Samples must be independent. Simple random sampling “with replacement” ensures independence. Independence also holds (approximately) for “without replacement” sampling as long as the sample size is smaller than 10% of the population size. The sample size must be “large enough”. What counts as “large enough” depends on the statistic we are considering, i.e. there are different “large enough” rules for the sample proportion and the sample mean. It is the sample size that is important, NOT the fraction of the population that is sampled.
41
42 Sampling distribution of sample proportion
43 Sampling distribution of sample proportion
44 Sampling distribution of sample proportion
45 Example One Of all the cars on the highway, about 80% exceed the speed limit. If we clock the next 50 cars that pass, what might we expect to find? Is the independence condition met? Most likely NO, because cars moving at the same time may influence each other’s behavior. Suppose instead we randomly select 50 cars that pass. Is the independence condition met? Yes.
46 Example One Of all the cars on the highway, about 80% exceed the speed limit. Suppose we randomly select 50 cars that pass. Is the “sample size large enough” condition met? a) Yes b) No Answer: a) Yes, because np = 50×0.8 = 40 > 9 and n(1-p) = 50×0.2 = 10 > 9.
47 Example One Of all the cars on the highway, about 80% exceed the speed limit. Suppose we randomly select 50 cars that pass. What is the expected proportion of cars in the sample to exceed the speed limit? A. 20% B. 80% C. 2.83% D %
48 Example One Of all the cars on the highway, about 80% exceed the speed limit. Suppose we randomly select 50 cars that pass. What is the standard deviation of the sample proportion of cars exceeding the speed limit? A. 20 B. 80 C D
49 Example One Of all the cars on the highway, about 80% exceed the speed limit. Suppose we randomly select 50 cars that pass. What is the chance that more than 90% of the cars in the sample exceed the speed limit? A B C D E normalcdf(0.9, 100, 0.8, 0.057) ≈ 0.04
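The steps of Example One can be collected into one short Python sketch. It assumes the usual normal model for a sample proportion, with mean p and standard deviation √(p(1-p)/n), which is consistent with the 0.057 used in the normalcdf call above. `statistics.NormalDist` is from the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.8

# "Large enough" check in the form used in this example: both counts exceed 9.
large_enough = n * p > 9 and n * (1 - p) > 9

sd_phat = sqrt(p * (1 - p) / n)          # std. dev. of the sample proportion
phat = NormalDist(mu=p, sigma=sd_phat)   # normal approximation
p_more_than_90 = 1 - phat.cdf(0.9)       # P(sample proportion > 0.9)

print(large_enough, round(sd_phat, 3), round(p_more_than_90, 3))
```

Because the code keeps the unrounded standard deviation, its probability differs slightly in the third decimal place from the calculator value obtained with the rounded 0.057. Both are about 4%.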
50
51 Sampling distribution of sample mean
52 Central Limit Theorem (CLT)
53 Example Two At birth, babies average 7.8 pounds, with a standard deviation of 2.1 pounds. A random sample of 34 babies born to mothers living near a factory that might be polluting the air and water shows a mean birth weight of only 7.2 pounds. What is the expected value of the sample mean? A. 34 lb B. 7.2 lb C. 7.8 lb D. 2.1 lb
54 Example Two At birth, babies average 7.8 pounds, with a standard deviation of 2.1 pounds. A random sample of 34 babies born to mothers living near a factory that might be polluting the air and water shows a mean birth weight of only 7.2 pounds. What is the standard deviation of the sample mean? A. 1.23 lb B. 7.2 lb C. 0.36 lb D. 2.1 lb
55 Example Two At birth, babies average 7.8 pounds, with a standard deviation of 2.1 pounds. A random sample of 34 babies born to mothers living near a factory that might be polluting the air and water shows a mean birth weight of only 7.2 pounds. What is the chance that the sample mean is lower than 7.2 lb? A B C D normalcdf(-100, 7.2, 7.8, 0.36) ≈ 0.048
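Example Two can be worked the same way in software. The sketch below assumes the Central Limit Theorem applies, so that the sample mean is approximately normal with mean μ and standard deviation σ/√n. It uses Python's standard-library `statistics.NormalDist`, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.8, 2.1, 34   # population mean and std. dev. (lb), sample size

se = sigma / sqrt(n)                  # std. dev. of the sample mean
xbar = NormalDist(mu=mu, sigma=se)    # approximate sampling distribution
p_below_7_2 = xbar.cdf(7.2)           # P(sample mean < 7.2 lb)

print(round(se, 2), round(p_below_7_2, 3))
```

The standard deviation of the sample mean is much smaller than the population standard deviation of 2.1 lb, which is why a sample mean of 7.2 lb is fairly unusual even though an individual weight of 7.2 lb is not.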
Example 56
Example 57