McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc.
Chapter 8: Sampling Distributions and Estimation (Part 1)
Sampling Variation
Estimators and Sampling Distributions
Sample Mean and the Central Limit Theorem
Confidence Interval for a Mean ($\mu$) with Known $\sigma$
Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$
Confidence Interval for a Proportion ($\pi$)
8A-2 Sampling Variation
Sample statistic: a random variable whose value depends on which population items happen to be included in the random sample.
Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
This sampling variation can easily be illustrated.
8A-3 Sampling Variation
Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
The sample means ($\bar{x}_i$) tend to be close to the population mean $\mu$.
8A-4 Sampling Variation
The dot plots show that the sample means have much less variation than the individual sample items.
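The claim that sample means vary less than individual items can be checked with a small simulation. The sketch below is only illustrative: the population (50,000 roughly normal scores with mean 520 and standard deviation 86) and the number of repeated samples are assumptions chosen for the demonstration, not the GMAT data from the slides.

```python
import random
import statistics

random.seed(8)  # fixed seed so the run is repeatable

# Hypothetical population of scores (assumed values, not the textbook data).
population = [random.gauss(520, 86) for _ in range(50_000)]

def draw_sample_means(pop, n, num_samples):
    """Return the mean of each of num_samples random samples of size n."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(num_samples)]

# Many samples of size n = 5, so the spread of the means is estimated reliably.
sample_means = draw_sample_means(population, n=5, num_samples=2_000)

sd_items = statistics.pstdev(population)    # spread of individual scores
sd_means = statistics.pstdev(sample_means)  # spread of the sample means

print(f"sd of individual scores:  {sd_items:.1f}")
print(f"sd of sample means (n=5): {sd_means:.1f}")
```

The spread of the means should come out near the spread of the individual scores divided by the square root of 5, which anticipates the standard error formula introduced later in the chapter.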
8A-5 Sampling Variation
8A-6 Estimators and Sampling Distributions: Some Terminology
Estimator: a statistic derived from a sample to infer the value of a population parameter.
Estimate: the value of the estimator in a particular sample.
Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.
8A-7 Estimators and Sampling Distributions: Examples of Estimators
8A-8 Estimators and Sampling Distributions: Sampling Distributions
The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
An estimator is a random variable since samples vary.
Sampling error $= \hat{\theta} - \theta$
8A-9 Estimators and Sampling Distributions: Bias
Bias is the difference between the expected value of the estimator and the true parameter.
Bias $= E(\hat{\theta}) - \theta$
An estimator is unbiased if $E(\hat{\theta}) = \theta$.
On average, an unbiased estimator neither overstates nor understates the true parameter.
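Bias can be computed exactly when the population is small enough to list every possible sample. The following sketch assumes a toy population {0, 1, 2, 3} and samples of size 2 drawn with replacement (both are illustrative choices), and compares two variance estimators: one dividing by n - 1 and one dividing by n.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]   # assumed toy population
n = 2                       # sample size, sampling with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # true variance

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

samples = list(product(population, repeat=n))  # all 16 equally likely samples

def expected(estimator):
    """Expected value of an estimator over all equally likely samples."""
    return sum(estimator(s) for s in samples) / len(samples)

# Bias = E(estimator) - true parameter
bias_n_minus_1 = expected(lambda s: variance(s, n - 1)) - sigma2
bias_n = expected(lambda s: variance(s, n)) - sigma2

print("bias when dividing by n - 1:", bias_n_minus_1)
print("bias when dividing by n:    ", bias_n)
```

Exact fractions are used so that an unbiased estimator shows a bias of exactly zero rather than a small rounding residue.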
8A-10 Estimators and Sampling Distributions: Bias
Sampling error is random whereas bias is systematic.
An unbiased estimator avoids systematic error.
Figure 8.4
8A-11 Estimators and Sampling Distributions
8A-12 Estimators and Sampling Distributions: Efficiency
Efficiency refers to the variance of the estimator's sampling distribution.
A more efficient estimator has smaller variance.
Figure 8.5
8A-13 Estimators and Sampling Distributions: Consistency
A consistent estimator converges toward the parameter being estimated as the sample size increases.
Figure 8.6
8A-14 Sample Mean and the Central Limit Theorem: Central Limit Theorem (CLT) for a Mean
If a random sample of size n is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the distribution of the sample mean $\bar{x}$ approaches a normal distribution with mean $\mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as the sample size increases.
If the population is normal, the distribution of the sample mean is normal regardless of sample size.
8A-15 Sample Mean and the Central Limit Theorem
If the population is exactly normal, then the sample mean follows a normal distribution.
8A-16 Sample Mean and the Central Limit Theorem
As the sample size n increases, the distribution of sample means narrows in on the population mean $\mu$.
8A-17 Sample Mean and the Central Limit Theorem
If the sample is large enough, the sample means will have approximately a normal distribution even if the population is not normal.
8A-18 Sample Mean and the Central Limit Theorem: Illustrations of Central Limit Theorem
8A-19 Sample Mean and the Central Limit Theorem: Illustrations of Central Limit Theorem (symmetric population)
8A-20 Sample Mean and the Central Limit Theorem: Illustrations of Central Limit Theorem (skewed population)
8A-21 Sample Mean and the Central Limit Theorem: Example: Bottle Filling, Variation in X
8A-22 Sample Mean and the Central Limit Theorem: Sample Size and Standard Error
The standard error declines as n increases, but at a decreasing rate.
Make the interval $\mu \pm z\,\sigma/\sqrt{n}$ small by increasing n.
The distribution of sample means collapses at the true population mean $\mu$ as n increases.
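The diminishing return from larger samples follows directly from the square root in the standard error formula. A minimal sketch, assuming a population standard deviation of 10 purely for illustration:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation (illustrative only)

# Each step quadruples n, yet the standard error is only cut in half.
for n in (4, 16, 64, 256):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

Halving the standard error therefore requires four times as many observations, which is why precision becomes expensive as n grows.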
8A-23 Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
The population parameters are: $\mu = 1.5$, $\sigma = 1.118$
8A-24 Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
All possible samples of size n = 2, with replacement, are given below along with their means.
8A-25 Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.
8A-26 Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
The CLT's predictions for the mean and standard error are
$\mu_{\bar{x}} = \mu = 1.5$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.118/\sqrt{2} \approx 0.79$
8A-27 Sample Mean and the Central Limit Theorem: Illustration: All Possible Samples from a Uniform Population
Weighting each of the 16 possible sample means by its frequency, the mean of means is
$\mu_{\bar{x}} = \frac{1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)}{16} = \frac{24}{16} = 1.5$
The standard deviation of the means is
$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{16}} \approx 0.79$
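The enumeration in this illustration is small enough to reproduce directly. The sketch below lists all 16 samples of size 2 drawn with replacement from {0, 1, 2, 3}, tabulates the sample means, and compares their mean and standard deviation with the CLT formulas; the variable names are illustrative.

```python
from collections import Counter
from itertools import product
import math

population = [0, 1, 2, 3]
n = 2

# Population parameters (dividing by N, since this is the whole population).
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Every possible ordered sample of size n, drawn with replacement.
means = [sum(sample) / n for sample in product(population, repeat=n)]
frequency = Counter(means)  # triangular: 1, 2, 3, 4, 3, 2, 1

mean_of_means = sum(means) / len(means)
sd_of_means = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

for value in sorted(frequency):
    print(f"sample mean {value:.1f}: {frequency[value]} of {len(means)} samples")
print(f"mean of means = {mean_of_means}, CLT prediction = {mu}")
print(f"sd of means   = {sd_of_means:.4f}, CLT prediction = {sigma / math.sqrt(n):.4f}")
```

The frequency table makes the peaked triangular shape visible even though the underlying population is flat.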
8A-28 Confidence Interval for a Mean ($\mu$) with Known $\sigma$: What is a Confidence Interval?
A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$.
A confidence interval for the mean is a range lower $< \mu <$ upper.
The confidence level is the probability that the confidence interval contains the true population mean.
The confidence level (usually expressed as a %) is the corresponding central area under the curve of the sampling distribution.
8A-29 Confidence Interval for a Mean ($\mu$) with Known $\sigma$: What is a Confidence Interval?
The confidence interval for $\mu$ with known $\sigma$ is:
$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$
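As a rough sketch of how the known-$\sigma$ interval is computed, the function below takes the sample mean, the known population standard deviation, the sample size, and the confidence level. The function name and the example inputs (mean 100, $\sigma$ = 15, n = 36) are hypothetical and chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    # Two-tailed critical value: e.g. 95% confidence leaves 2.5% in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical inputs, not taken from the slides.
lower, upper = z_interval(xbar=100, sigma=15, n=36, confidence=0.95)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Raising the confidence level increases z and therefore widens the interval, while a larger n shrinks it.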
8A-30 Confidence Interval for a Mean ($\mu$) with Known $\sigma$: Choosing a Confidence Level
A higher confidence level leads to a wider confidence interval.
Greater confidence implies loss of precision.
95% confidence is most often used.
8A-31 Confidence Interval for a Mean ($\mu$) with Known $\sigma$: Interpretation
A confidence interval either does or does not contain $\mu$.
The confidence level quantifies the risk.
Out of 100 confidence intervals, approximately 95% would contain $\mu$, while approximately 5% would not contain $\mu$.
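This long-run interpretation can be illustrated by simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the interval covers the true mean. The population values below ($\mu$ = 50, $\sigma$ = 8, n = 25) and the number of trials are assumptions made only for this sketch.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(31)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50.0, 8.0, 25        # hypothetical population mean, sd, sample size
z = NormalDist().inv_cdf(0.975)     # two-tailed critical value for 95% confidence

def interval_covers_mu():
    """Draw one sample and report whether its 95% interval contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    margin = z * SIGMA / math.sqrt(N)
    return abs(mean(sample) - MU) <= margin

trials = 2_000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains the mean or does not; the 95% describes the procedure across repeated samples, so the printed share should land close to 0.95 without matching it exactly.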
8A-32 Confidence Interval for a Mean ($\mu$) with Known $\sigma$: Is $\sigma$ Ever Known?
Yes, but not very often.
In quality control applications with ongoing manufacturing processes, assume $\sigma$ stays the same over time.
In this case, confidence intervals are used to construct control charts to track the mean of a process over time.
8A-33 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Student's t Distribution
Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation is unknown and the sample size is small.
The confidence interval for $\mu$ (unknown $\sigma$) is $\bar{x} \pm t\,\frac{s}{\sqrt{n}}$, that is,
$\bar{x} - t\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\,\frac{s}{\sqrt{n}}$
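A minimal sketch of the unknown-$\sigma$ interval follows. The ten data values are made up for illustration, and the critical value 2.262 is the standard t-table entry for 95% confidence with 9 degrees of freedom; it is passed in by hand here, mirroring a table lookup rather than computing it.

```python
import math
import statistics

def t_interval(data, t_value):
    """Confidence interval xbar +/- t * s / sqrt(n), with s estimated from the sample."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    margin = t_value * s / math.sqrt(n)
    return xbar - margin, xbar + margin

data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]  # hypothetical sample, n = 10
T_95_DF9 = 2.262                        # t-table value: 95% confidence, 9 d.f.

low, high = t_interval(data, T_95_DF9)
print(f"{low:.3f} < mu < {high:.3f}")
```

Using the z-value 1.960 in place of 2.262 would give a narrower interval that understates the uncertainty from estimating $\sigma$ with s.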
8A-34 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Student's t Distribution
8A-35 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Student's t Distribution
t distributions are symmetric and shaped like the standard normal distribution.
The t distribution is dependent on the size of the sample.
Figure 8.11
8A-36 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Degrees of Freedom
Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
Degrees of freedom tell how many observations are used to estimate $\sigma$, less the number of intermediate estimates used in the calculation.
$\nu = n - 1$
8A-37 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Degrees of Freedom
As n increases, the t distribution approaches the shape of the normal distribution.
For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
8A-38 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Comparison of z and t
For very small samples, t-values differ substantially from the normal.
As degrees of freedom increase, the t-values approach the normal z-values.
For example, for n = 31, the degrees of freedom are: $\nu = 31 - 1 = 30$
What would the t-value be for a 90% confidence interval?
8A-39 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Comparison of z and t
For $\nu = 30$ and 90% confidence, the corresponding z-value is 1.645.
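To see how t-values approach z as the degrees of freedom grow, the critical values can be computed rather than looked up. The sketch below is one possible standard-library approach, not the method behind Appendix D: it integrates the Student's t density with Simpson's rule and inverts the result by bisection. All function names are illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-tailed critical value found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_90 = NormalDist().inv_cdf(0.95)   # z for 90% confidence
for df in (5, 10, 30, 100):
    print(f"d.f. = {df:3d}  t = {t_critical(0.90, df):.3f}  z = {z_90:.3f}")
```

At every listed value of d.f. the t-value exceeds z, and the gap shrinks as d.f. increases.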
8A-40 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Example: GMAT Scores Again
Here are the GMAT scores from 20 applicants to an MBA program:
Figure 8.13
8A-41 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Example: GMAT Scores Again
Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.
Sample statistics: $\bar{x} = 510$ and sample standard deviation $s$.
Since $\sigma$ is unknown, use the Student's t for the confidence interval with $\nu = 20 - 1 = 19$ d.f.
First find the t-value for 90% confidence from Appendix D.
8A-42 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$
For a 90% confidence interval, use Appendix D to find $t_{0.05} = 1.729$.
8A-43 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Example: GMAT Scores Again
The 90% confidence interval is:
$\bar{x} - t\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\,\frac{s}{\sqrt{n}}$
With $\bar{x} = 510$, $t = 1.729$, and $n = 20$, this is $510 - 1.729\,\frac{s}{\sqrt{20}} < \mu < 510 + 1.729\,\frac{s}{\sqrt{20}}$.
We are 90% confident that the true mean GMAT score lies within this interval.
8A-44 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Confidence Interval Width
Confidence interval width reflects
- the sample size,
- the confidence level, and
- the standard deviation.
To obtain a narrower interval and more precision
- increase the sample size, or
- lower the confidence level (e.g., from 90% to 80% confidence).
8A-45 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: A "Good" Sample
Here are five different samples of 25 births from a population of N = 4,409 births and their 95% CIs.
8A-46 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: A "Good" Sample
An examination of the samples shows that sample 5 has an outlier.
The outlier is a warning that the resulting confidence interval may not be trustworthy.
In this case, a larger sample size is needed.
Figure 8.15
8A-47 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Using Appendix D
Beyond $\nu = 50$, Appendix D shows $\nu$ in steps of 5 or 10.
If the table does not give the exact degrees of freedom, use the t-value for the next lower $\nu$.
This is a conservative procedure since it causes the interval to be slightly wider.
For d.f. above 150, use the z-value.
8A-48 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Using Excel
Use Excel's function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, "probability" is 1 minus the confidence level.
Figure 8.17
8A-49 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Using MegaStat
MegaStat gives you a choice of z or t and does all calculations for you.
Figure 8.18
8A-50 Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$: Using MINITAB
MINITAB also gives confidence intervals for the median and standard deviation.
Figure 8.19
8A-51 Confidence Interval for a Proportion ($\pi$)
A proportion is a mean of data whose only values are 0 or 1.
The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean $\pi$ and standard deviation
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
p = x/n is a consistent estimator of $\pi$.
8A-52 Confidence Interval for a Proportion ($\pi$): Illustration: Internet Hotel Reservations
Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet.
The binary data are:
1 Reservation is made over the Internet
0 Reservation is not made over the Internet
After the data were collected, it was determined that the proportion of Internet reservations is $\pi = .20$.
8A-53 Confidence Interval for a Proportion ($\pi$): Illustration: Internet Hotel Reservations
Here are five random samples of n = 20. Each p is a point estimate of $\pi$.
Notice the sampling variation in the value of p.
8A-54 Confidence Interval for a Proportion ($\pi$): Applying the CLT
The distribution of a sample proportion p = x/n is symmetric if $\pi = .50$ and, regardless of $\pi$, approaches symmetry as n increases.
8A-55 Confidence Interval for a Proportion ($\pi$): Applying the CLT
As n increases, the statistic p = x/n more closely resembles a continuous random variable.
As n increases, the distribution becomes more symmetric and bell shaped.
As n increases, the range of the sample proportion p = x/n narrows.
The sampling variation can be reduced by increasing the sample size n.
8A-56 Confidence Interval for a Proportion ($\pi$): When is it Safe to Assume Normality?
Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both $n\pi > 10$ and $n(1-\pi) > 10$.
Sample size to assume normality: Table 8.9
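The rule of thumb translates into a one-line check. In this sketch the function name is hypothetical, and the threshold of 10 follows the slide.

```python
def normal_approx_ok(n, pi, threshold=10):
    """Rule of thumb: both n*pi and n*(1 - pi) must exceed the threshold."""
    return n * pi > threshold and n * (1 - pi) > threshold

# With pi = .20, a sample of n = 20 expects only 4 successes, so the rule fails.
print(normal_approx_ok(20, 0.20))    # False
# A sample of n = 75 with pi = .32 expects 24 successes and 51 failures.
print(normal_approx_ok(75, 0.32))    # True
```

When $\pi$ is unknown, the sample proportion p is substituted for $\pi$ in the same check.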
8A-57 Confidence Interval for a Proportion ($\pi$): Standard Error of the Proportion
The standard error of the proportion $\sigma_p$ depends on $\pi$, as well as n.
It is largest when $\pi$ is near .50 and smaller when $\pi$ is near 0 or 1.
8A-58 Confidence Interval for a Proportion ($\pi$): Standard Error of the Proportion
The formula for the standard error is symmetric.
Figure 8.22
8A-59 Confidence Interval for a Proportion ($\pi$): Standard Error of the Proportion
Enlarging n reduces the standard error $\sigma_p$ but at a diminishing rate.
Figure 8.23
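These three properties of the standard error (largest near .50, symmetric about .50, shrinking with n at a diminishing rate) can be verified numerically. The sample sizes and proportions below are arbitrary illustrative choices.

```python
import math

def se_proportion(pi, n):
    """Standard error of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# Largest at pi = .50 and symmetric about it (compare .20 with .80).
for pi in (0.05, 0.20, 0.50, 0.80, 0.95):
    print(f"pi = {pi:.2f}  se = {se_proportion(pi, 100):.4f}")

# Quadrupling n only halves the standard error.
for n in (100, 400, 1600):
    print(f"n = {n:4d}  se = {se_proportion(0.50, n):.4f}")
```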
8A-60 Confidence Interval for a Proportion ($\pi$): Confidence Interval for $\pi$
The confidence interval for $\pi$ is
$\pi \pm z\sqrt{\frac{\pi(1-\pi)}{n}}$
Since $\pi$ is unknown, the confidence interval based on p = x/n (assuming a large sample) is
$p \pm z\sqrt{\frac{p(1-p)}{n}}$
where z is based on the desired confidence.
8A-61 Confidence Interval for a Proportion ($\pi$): Confidence Interval for $\pi$
z can be chosen for any confidence level. For example,
8A-62 Confidence Interval for a Proportion ($\pi$): Example: Auditing
A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p?
$p = x/n = 24/75 = .32$
Is p normally distributed?
$np = (75)(.32) = 24$ and $n(1-p) = (75)(.68) = 51$
Both are > 10, so we may assume normality.
8A-63 Confidence Interval for a Proportion ($\pi$): Example: Auditing
The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:
$p \pm z\sqrt{\frac{p(1-p)}{n}} = .32 \pm 1.96\sqrt{\frac{.32(1-.32)}{75}} = .32 \pm .106$, or $.214 < \pi < .426$
We are 95% confident that this interval contains the true population proportion.
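The auditing calculation can be reproduced in a few lines. This is a sketch of the large-sample formula from the slides; the function name is illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Auditing example: 24 cash purchases in a sample of 75.
low, high = proportion_interval(24, 75, confidence=0.95)
print(f"{low:.3f} < pi < {high:.3f}")
```

The function applies the normal approximation without checking it, so the rule of thumb on sample size should be confirmed before relying on the result.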
8A-64 Confidence Interval for a Proportion ($\pi$): Narrowing the Interval
The width of the confidence interval for $\pi$ depends on
- the sample size
- the confidence level
- the sample proportion p
To obtain a narrower interval (i.e., more precision) either
- increase the sample size
- reduce the confidence level
8A-65 Confidence Interval for a Proportion ($\pi$): Using Excel and MegaStat
To find a confidence interval for a proportion in Excel, use (for example)
=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
These formulas give the lower and upper limits for p = .15 and n = 200. Because NORMSINV(.95) returns z = 1.645, the result is a 90% confidence interval.
8A-66 Confidence Interval for a Proportion ($\pi$): Using Excel and MegaStat
In MegaStat, enter p and n to obtain the confidence interval for a proportion.
MegaStat always assumes normality.
Figure 8.23
8A-67 Confidence Interval for a Proportion ($\pi$): Using Excel and MegaStat
If the sample is small, the distribution of p may not be well approximated by the normal.
Confidence limits around p can be constructed by using the binomial distribution.
Figure 8.24
8A-68 Confidence Interval for a Proportion ($\pi$): Polls and Margin of Error
In polls and surveys, the half-width of the confidence interval when $\pi = .50$ is called the margin of error.
Below are some margins of error for a 95% confidence interval assuming $\pi = .50$.
Each reduction in the margin of error requires a disproportionately larger sample size.
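The relationship between sample size and margin of error can be tabulated directly. The sketch below assumes 95% confidence (z = 1.96) and $\pi$ = .50 as stated on the slide; the sample sizes are arbitrary examples, not the values from the slide's table.

```python
import math

def margin_of_error(n, z=1.96, pi=0.5):
    """Half-width of the confidence interval for a proportion."""
    return z * math.sqrt(pi * (1 - pi) / n)

for n in (100, 400, 1600, 6400):
    print(f"n = {n:5d}  margin of error = +/- {100 * margin_of_error(n):.2f}%")
```

Each row uses four times the previous sample size, yet the margin of error is only cut in half.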
8A-69 Confidence Interval for a Proportion ($\pi$): Rule of Three and Very Quick Rule
Rule of Three: If in n independent trials no events occur, the upper 95% confidence bound is approximately 3/n.
Very Quick Rule: A Very Quick Rule (VQR) for a 95% confidence interval when p is near .50 is
$p \pm 1/\sqrt{n}$
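Both shortcuts can be compared with the calculations they approximate. In this sketch, the exact bound for the Rule of Three is taken to be the value of $\pi$ at which observing zero events in n trials has probability .05, and the Very Quick Rule is compared with the usual 95% margin at p = .50. The function names are illustrative.

```python
import math

def exact_upper_bound(n, confidence=0.95):
    """Solve (1 - pi)^n = 1 - confidence for pi: the exact bound after zero events."""
    return 1 - (1 - confidence) ** (1 / n)

def rule_of_three(n):
    return 3 / n

def very_quick_margin(n):
    return 1 / math.sqrt(n)

def usual_margin(n, z=1.96, p=0.5):
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 400):
    print(f"n = {n:3d}  exact bound = {exact_upper_bound(n):.4f}  3/n = {rule_of_three(n):.4f}  "
          f"VQR = {very_quick_margin(n):.4f}  usual margin = {usual_margin(n):.4f}")
```

The Very Quick Rule slightly overstates the margin because it rounds 1.96 × .5 = .98 up to 1.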
8A-70 Applied Statistics in Business & Economics: End of Chapter 8A