Economics 173 Business Statistics Lectures 5 & 6 Summer, 2001 Professor J. Petry
Chapter 11: Inference About the Description of a Single Population
11.1 Introduction
In this chapter we apply the approach developed earlier for making statistical inferences about populations:
- Identify the parameter to be estimated or tested.
- Specify the parameter’s estimator and its sampling distribution.
- Construct an interval estimator or perform a test.
We will develop techniques to estimate and test three population parameters:
- The expected value $\mu$
- The variance $\sigma^2$
- The population proportion $p$ (for qualitative data)
Examples:
- A bank conducts a survey to estimate the number of times customers will actually use ATMs.
- A random sample of processing times is taken to test the mean and the variance of production time on a production line.
11.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown
Recall that when $\sigma$ is known, the statistic
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is normally distributed if the sample is drawn from a normal population, and approximately normally distributed if the population is not normal but the sample is sufficiently large.
When $\sigma$ is unknown, we use its point estimator $s$, and the z-statistic is replaced by the t-statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
When the sampled population is normally distributed, the statistic t is Student t distributed. The “degrees of freedom”, a function of the sample size, determine how spread out the distribution is (compared to the normal distribution). The t distribution is mound-shaped and symmetrical around zero.
[Figure: two t distribution curves, one with d.f. = n1 and one with d.f. = n2, where n1 < n2]
Probability calculations for the t distribution
The t table provides critical values for various probabilities of interest. The probabilities that appear in Table 4 of Appendix B have the form $P(t > t_{A,\,\text{d.f.}}) = A$. For a given number of degrees of freedom and a predetermined right-hand tail probability A, the entry in the table is the corresponding $t_A$. These values are used in computing interval estimates and performing hypothesis tests.
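[Figure: t distribution with right-tail area A = .05 beyond $t_A$; the t table columns are $t_{.100}$, $t_{.05}$, $t_{.025}$, $t_{.01}$, $t_{.005}$]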
Testing the population mean when the population standard deviation is unknown
If the population is normally distributed, the test statistic for $\mu$ when $\sigma$ is unknown is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
This statistic is Student t distributed with n - 1 degrees of freedom.
Example 11.1 Trainee productivity
In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied. It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring. Based on productivity observations of 50 trainees, can we conclude that this belief is correct? See file XM11-01.
Solution
The problem objective is to describe the population of the number of packages processed in one hour. The data are quantitative.
H0: $\mu = 450$
H1: $\mu > 450$
The test statistic is the t statistic, with d.f. = n - 1 = 49.
Solving by hand
The rejection region is $t > t_{\alpha,\,n-1}$, where $t_{\alpha,\,n-1} = t_{.05,\,49}$ is approximately 1.676. From the data we compute the sample mean $\bar{x}$ and the sample standard deviation $s$.
The test statistic is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = 1.89$$
Since 1.89 > 1.676, the statistic falls in the rejection region and we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.
Using the p-value approach, the p-value of the test is .0323. Since .0323 < .05, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.
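The mechanics of the t test can be sketched in a few lines of Python. This is a minimal illustration, not the course's Excel procedure: the function name `one_sample_t` and the eight data values are hypothetical (they are not the XM11-01 data), and the critical value 1.895 is the standard table entry for $t_{.05,\,7}$.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0 when sigma is unknown."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)              # sample standard deviation (n - 1 divisor)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical packages-per-hour counts for eight trainees (NOT the XM11-01 data).
sample = [455, 470, 448, 462, 481, 439, 466, 459]
t_stat, df = one_sample_t(sample, mu0=450)

# Right-tail test (H1: mu > 450): reject H0 when t exceeds the table value.
T_CRIT_05_DF7 = 1.895                       # t_{.05,7} from a standard t table
reject = t_stat > T_CRIT_05_DF7
print(f"t = {t_stat:.3f}, d.f. = {df}, reject H0: {reject}")
```

The critical value is passed in from a table on purpose, mirroring the by-hand solution; a statistics package could supply it instead.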
Estimating the population mean when the population standard deviation is unknown
Confidence interval estimator of $\mu$ when $\sigma$ is unknown:
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
Example 11.2
An investor is trying to estimate the return on investment in companies that won quality awards last year. A random sample of 50 such companies is selected, and the return he would have earned by investing in each of them is calculated. Construct a 95% confidence interval for the mean return.
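Solution
The problem objective is to describe the population of annual returns from buying shares of quality award-winners. The data are quantitative.
Solving by hand
From the data we determine the sample mean $\bar{x}$ and the sample standard deviation $s$. With n = 50, the 95% interval is $\bar{x} \pm t_{.025,\,49}\, s/\sqrt{50}$.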
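The interval estimator can be sketched as a small Python helper. The summary statistics below are hypothetical placeholders, not the values from the returns data file, and `t_interval` is an illustrative name. The critical value 2.009 is the table value these notes use for roughly 50 degrees of freedom.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for mu: xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary statistics (percent return); NOT taken from the data file.
lcl, ucl = t_interval(xbar=12.0, s=8.0, n=50, t_crit=2.009)
print(f"95% CI for the mean return: ({lcl:.3f}, {ucl:.3f})")
```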
Checking the required conditions
We need to check that the population is normally distributed, or at least not extremely non-normal. There are statistical methods to test for normality (to be introduced later). For now, we can plot a histogram of the data set.
[Figure: histogram for XM11-01, Packages]
[Figure: histogram for XM11-02, Returns]
11.3 Inference About a Population Variance
Sometimes we are interested in making inferences about the variability of processes. Examples:
- The consistency of a production process, for quality control purposes.
- Investors use variance as a measure of risk.
To draw inferences about variability, the parameter of interest is $\sigma^2$.
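The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$. The statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
has a distribution called chi-squared, with n - 1 degrees of freedom, if the population is normally distributed.
[Figure: chi-squared density curves for d.f. = 1, d.f. = 5 and d.f. = 10]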
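The $\chi^2$ table
For a right-tail area A, the table gives $\chi^2_A$; the value with area 1 - A to its right is $\chi^2_{1-A}$. For example, with A = .01 and 10 degrees of freedom, $\chi^2_{.01,\,10} = 23.2093$. The table columns include $\chi^2_{.995}$, $\chi^2_{.990}$, $\chi^2_{.975}$, $\chi^2_{.010}$ and $\chi^2_{.005}$.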
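Estimating the population variance
From the probability statement $P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$ we have, by substituting $\chi^2 = (n-1)s^2/\sigma^2$,
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$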
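As a rough illustration of the variance interval, the sketch below divides $(n-1)s^2$ by the two chi-squared table values. The sample variance 0.9 and the function name `variance_interval` are hypothetical; the critical values 39.3641 and 12.4011 are the standard table entries for $\chi^2_{.025,\,24}$ and $\chi^2_{.975,\,24}$.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Interval for sigma^2.

    chi2_upper is chi2_{alpha/2, n-1} (the larger table value) and
    chi2_lower is chi2_{1-alpha/2, n-1} (the smaller one).
    """
    ss = (n - 1) * s2                       # (n - 1) s^2
    return ss / chi2_upper, ss / chi2_lower

# Hypothetical sample variance with n = 25; table values are for 24 d.f.
lcl, ucl = variance_interval(s2=0.9, n=25, chi2_upper=39.3641, chi2_lower=12.4011)
print(f"95% CI for the variance: ({lcl:.4f}, {ucl:.4f})")
```

Note that the larger table value produces the lower limit, which is why the interval is not symmetric around $s^2$.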
Example 11.3 (operations management application)
A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc (.001 liter). To test this belief, a random sample of 25 1-liter fills was taken and the results recorded. The data are provided in file XM11-03. Do these data support the belief that the variance is less than 1 cc at the 5% significance level?
Solution
The problem objective is to describe the population of 1-liter fills from a filling machine. The data are quantitative, and we are interested in the variability of the fills. The complete test is:
H0: $\sigma^2 = 1$
H1: $\sigma^2 < 1$ (we want to prove that the process is consistent)
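Solving by hand
Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n$. From the sample (the data are presented in units of cc minus 1000 to avoid rounding) we can calculate $\sum x_i = -3.6$ and $\sum x_i^2 = 21.3$. Then $(n-1)s^2 = 21.3 - (-3.6)^2/25 = 20.8$.
The test statistic is $\chi^2 = (n-1)s^2/\sigma^2 = 20.8/1 = 20.8$, and the rejection region is $\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,24} = 13.8484$. Since 20.8 > 13.8484, there is insufficient evidence to reject the hypothesis that the variance is equal to 1 cc in favor of the hypothesis that it is smaller.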
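[Figure: chi-squared distribution with 24 d.f.; the rejection region ($\alpha$ = .05) lies to the left of 13.8484, and the test statistic 20.8 falls in the region of area 1 - $\alpha$ = .95, so we do not reject the null hypothesis]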
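The arithmetic of Example 11.3 can be checked with a short Python sketch. It uses only the sums and the critical value quoted in the example; the helper name `sum_of_squares` is illustrative.

```python
def sum_of_squares(sum_x, sum_x2, n):
    """Shortcut formula: (n - 1) s^2 = sum(x^2) - (sum(x))^2 / n."""
    return sum_x2 - sum_x ** 2 / n

n = 25
ss = sum_of_squares(sum_x=-3.6, sum_x2=21.3, n=n)   # (n - 1) s^2

sigma2_0 = 1.0                    # variance under H0, in cc
chi2_stat = ss / sigma2_0         # test statistic (n - 1) s^2 / sigma^2

# Left-tail test (H1: sigma^2 < 1): reject H0 when the statistic is below
# chi2_{.95, 24}, the table value quoted in the example.
CHI2_95_DF24 = 13.8484
reject = chi2_stat < CHI2_95_DF24
print(f"chi-squared = {chi2_stat:.2f}, reject H0: {reject}")
```

Squaring the sum before dividing by n is the step that is easy to get wrong by hand; using $\sum x_i / n$ without the square would give 21.444 instead of 20.78.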
11.4 Inference About a Population Proportion When the population consists of qualitative or categorical data, the only inference we can make is about the proportion of occurrence of a certain value. The parameter “p” was used before to calculate probabilities using the binomial distribution.
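Statistic and sampling distribution
The statistic employed is
$$\hat{p} = \frac{x}{n}$$
where x is the number of successes in a sample of size n. Under certain conditions [np > 5 and n(1 - p) > 5], $\hat{p}$ is approximately normally distributed, with $\mu = p$ and $\sigma^2 = p(1 - p)/n$.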
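Test statistic for p:
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
Interval estimator for p (1 - $\alpha$ confidence level):
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$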
Example 11.5 (marketing application) For a new newspaper to be financially viable, it has to capture at least 12% of the Toronto market. In a survey conducted among 400 randomly selected prospective readers, 58 participants indicated they would subscribe to the newspaper if its cost did not exceed $20 a month. Can the publisher conclude that the proposed newspaper will be financially viable at 10% significance level?
Solution
The problem objective is to describe the population of newspaper readers in Toronto. The responses to the survey are qualitative. The parameter to be tested is p. The hypotheses are:
H0: p = .12
H1: p > .12 (we want to prove that the newspaper is financially viable)
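Solving by hand
The rejection region is $z > z_{\alpha} = z_{.10} = 1.28$.
The sample proportion is $\hat{p} = 58/400 = .145$.
The value of the test statistic is
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.145 - .12}{\sqrt{.12(1 - .12)/400}} = 1.54$$
The p-value is P(Z > 1.54) = .0618.
Since 1.54 > 1.28 (equivalently, .0618 < .10), there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 10% significance level we can infer that more than 12% of Toronto’s readers will subscribe to the new newspaper.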
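The calculation in Example 11.5 can be reproduced with the Python standard library. This is a sketch: the function name `proportion_z_test` is illustrative, and the upper-tail normal probability is obtained from `math.erfc` rather than from a table, so the p-value differs slightly from the table lookup at z = 1.54.

```python
import math

def proportion_z_test(x, n, p0):
    """Right-tail z test for H0: p = p0. Returns (p_hat, z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # P(Z > z) for a standard normal variable, via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Example 11.5: 58 of 400 prospective readers would subscribe; H0: p = .12.
p_hat, z, p_value = proportion_z_test(x=58, n=400, p0=0.12)

# Required conditions for the normal approximation: np > 5 and n(1 - p) > 5.
conditions_ok = 400 * 0.12 > 5 and 400 * (1 - 0.12) > 5
reject = p_value < 0.10
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```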
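Example 11.6 (marketing application)
In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched “The Tonight Show”. Estimate the number of TVs tuned to the Tonight Show on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.
Solution
The sample proportion is $\hat{p} = 226/2000 = .113$, so the interval estimate of p is
$$\hat{p} \pm z_{.025}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(.887)/2000} = .113 \pm .0139$$
that is, between .0991 and .1269. Multiplying by 100 million, we estimate that between about 9.91 million and 12.69 million television sets are tuned to the show.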
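Selecting the Sample Size to Estimate the Proportion
The interval estimator for the proportion is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
Thus, if we wish to estimate the proportion to within W, we can write
$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
The required sample size is
$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$$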
Example
Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence. Find the sample size needed to guarantee that this requirement is met.
Solution
W = .03; 1 - $\alpha$ = .95, therefore $\alpha/2$ = .025, so $z_{.025}$ = 1.96.
Since the sample has not yet been taken, the sample proportion is still unknown. We proceed using one of the following two methods:
Method 1: There is no knowledge about the value of $\hat{p}$. Let $\hat{p} = .5$, which results in the largest possible n needed for a 1 - $\alpha$ confidence interval. If the sample proportion does not equal .5, the actual W will be narrower than .03.
Method 2: There is some idea about the value of $\hat{p}$. Use that value to calculate the sample size.
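Both methods can be compared with a short Python sketch of the sample-size formula. The function name `sample_size_for_proportion` is illustrative, and the prior guess of .2 used for Method 2 is a hypothetical value chosen only to show the effect.

```python
import math

def sample_size_for_proportion(p_guess, width, z):
    """Smallest n with z * sqrt(p(1 - p) / n) <= width, rounded up."""
    n = (z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2
    return math.ceil(n)

# Method 1: no knowledge of the proportion, so use .5 (the worst case).
n_worst_case = sample_size_for_proportion(p_guess=0.5, width=0.03, z=1.96)

# Method 2: a prior guess is available; .2 is a hypothetical value.
n_with_guess = sample_size_for_proportion(p_guess=0.2, width=0.03, z=1.96)

print(n_worst_case, n_with_guess)
```

The product $\hat{p}(1-\hat{p})$ is largest at .5, which is why Method 1 always yields the larger (safer) sample size.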
Chapter 12: Inference about the Comparison of Two Populations
12.1 Introduction
A variety of techniques are presented whose objective is to compare two populations. We are interested in:
- The difference between two means.
- The ratio of two variances.
- The difference between two proportions.
12.2 Inference about the Difference between Two Means: Independent Samples
Two random samples are drawn from the two populations of interest. Because we are interested in the difference between the two means, we shall build the statistic $\bar{x}$ for each sample (and support the analysis by the statistic $S^2$ as well).
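The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
- $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
- $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.
- The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
- The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.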
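If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.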
In practice, the “Z” statistic is hardly used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known. Instead, we construct a “t” statistic using the sample variances ($S_1^2$ and $S_2^2$).
Two cases are considered when producing the t-statistic:
- The two unknown population variances are equal.
- The two unknown population variances are not equal.
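Case I: The two variances are equal
Calculate the pooled variance estimate by:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Example: $S_1^2 = 25$; $S_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,
$$S_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04$$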
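Construct the t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
Perform a hypothesis test
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$
or build an interval estimate
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$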
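The equal-variances case can be sketched in Python as follows. The sample variances and sizes are the ones from the pooled-variance example (25, 30, 10, 15); the two sample means and the function names are hypothetical, added only so that a t-statistic can be computed.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the two sample variances (weights are the d.f.)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t(xbar1, xbar2, s1_sq, n1, s2_sq, n2, delta0=0.0):
    """Equal-variances t statistic for H0: mu1 - mu2 = delta0, and its d.f."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / se, n1 + n2 - 2

sp_sq = pooled_variance(25, 10, 30, 15)     # (9*25 + 14*30) / 23

# Hypothetical sample means, for illustration only.
t_stat, df = pooled_t(xbar1=52.0, xbar2=48.0, s1_sq=25, n1=10, s2_sq=30, n2=15)
print(f"Sp^2 = {sp_sq:.2f}, t = {t_stat:.3f}, d.f. = {df}")
```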
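Case II: The two variances are unequal
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad \text{d.f.} = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$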
Run a hypothesis test as needed, or build an interval estimate:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$$
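For the unequal-variances case, the t-statistic and its approximate degrees of freedom can be sketched as below. The degrees-of-freedom expression is the standard one for this test; the function name `unequal_variances_t` and all input values are hypothetical.

```python
import math

def unequal_variances_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Unequal-variances t statistic for H0: mu1 - mu2 = delta0, and its d.f."""
    v1 = s1 ** 2 / n1                       # S1^2 / n1
    v2 = s2 ** 2 / n2                       # S2^2 / n2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two independent samples.
t_stat, df = unequal_variances_t(xbar1=100.0, s1=10.0, n1=20,
                                 xbar2=108.0, s2=20.0, n2=30)
print(f"t = {t_stat:.3f}, d.f. = {df:.1f}")
```

The degrees of freedom are usually not an integer; they always fall between the smaller of $n_1 - 1$ and $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$.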
Example 12.1 Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast? A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. For each person the number of calories consumed at lunch was recorded.
[Data: calories consumed at lunch]
Solution: The data are quantitative. The parameter to be tested is the difference between two means. The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
Identifying the technique
The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) < 0$ (that is, $\mu_1 < \mu_2$)
To check the relationship between the variances, we use computer output to find the samples’ standard deviations. We have $S_1 = 64.05$ and $S_2 = 103.29$. It appears that the variances are unequal, so we run the t-test for unequal variances.
[Computer output: t-test for calories consumed at lunch]
At the 5% significance level there is sufficient evidence to reject the null hypothesis.
Solving by hand
The interval estimator for the difference between two means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$$
Example 12.2
Does job design (referring to worker movements) affect workers’ productivity? Two job designs are being considered for the production of a new computer desk. Two samples are randomly and independently selected:
- A sample of 25 workers assembled a desk using design A.
- A sample of 25 workers assembled the desk using design B.
The assembly times were recorded. Do the assembly times of the two designs differ?
[Data: assembly times in minutes]
Solution
The data are quantitative. The parameter of interest is the difference between two population means. The claim to be tested is whether a difference between the two designs exists.
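Solving by hand
The hypothesis test is:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) \neq 0$
To check the relationship between the two variances, calculate the values of $S_1$ and $S_2$. We have $S_1 = 0.92$ and $S_2 = 1.14$. These are close enough that we proceed as if the two variances are equal.
To calculate the t-statistic we use the pooled variance estimate:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2 = 48$$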
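For $\alpha$ = 0.05, the rejection region is $|t| > t_{\alpha/2,\,n_1+n_2-2}$, which the t table gives as approximately 2.009. Notice the absolute value: this is a two-tail test.
The test: since t = 0.93 < 2.009, there is insufficient evidence to reject the null hypothesis.
[Figure: two-tail rejection region with area .025 in each tail beyond ±2.009; the test statistic 0.93 lies outside the rejection region]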
Conclusion: From this experiment, it is unclear at the 5% significance level whether the two job designs differ in terms of workers’ productivity.
[Figure: the Excel printout, with callouts marking the degrees of freedom, the t-statistic, the p-value of the one-tail test and the p-value of the two-tail test]
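A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$. Notice that zero is included in the interval.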
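The figures reported for Example 12.2 can be cross-checked with a short Python sketch. Two assumptions are made: the difference between the sample means is not stated in these notes, so it is inferred here as the midpoint of the reported interval, and the standard deviations are the rounded values 0.92 and 1.14, so the results match the reported ones only approximately.

```python
import math

# Values quoted in Example 12.2.
s1, s2 = 0.92, 1.14
n1, n2 = 25, 25
T_CRIT = 2.009                               # critical value used in the notes

# ASSUMPTION: xbar1 - xbar2 is taken as the midpoint of the reported interval.
diff = (-0.3176 + 0.8616) / 2

sp_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))

t_stat = diff / se                           # test statistic under H0: mu1 - mu2 = 0
lcl, ucl = diff - T_CRIT * se, diff + T_CRIT * se
print(f"t = {t_stat:.2f}, 95% CI = ({lcl:.4f}, {ucl:.4f})")
```

The interval contains zero exactly when |t| is below the critical value, so the interval and the two-tail test lead to the same conclusion.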
Checking the required conditions for the equal variances case (Example 12.2)
[Figures: histograms of assembly times for Design A and Design B]
The distributions are not bell shaped, but they seem to be approximately normal. Since the technique is robust, we can be confident about the results.
12.4 Matched Pairs Experiment
- What is a matched pairs experiment?
- Why are matched pairs experiments needed?
- How do we deal with data produced in this way?
The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.
Example 12.3
To determine whether a new steel-belted radial tire lasts longer than a current model, the manufacturer designs the following experiment. A pair of newly designed tires is installed on the rear wheels of 20 randomly selected cars. A pair of currently used tires is installed on the rear wheels of another 20 cars. Drivers drive in their usual way until the tires wear out. The number of miles driven by each driver was recorded. See data next.
Solution
Compare two populations of quantitative data. The parameter is $\mu_1 - \mu_2$, where
$\mu_1$ = mean distance driven before the new-design tires wear out
$\mu_2$ = mean distance driven before the existing-design tires wear out
The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) > 0$
The hypotheses are
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$
The test statistic is the t statistic for the difference between two means from independent samples. We run the t test in Excel.
[Figure: Excel output of the t test]
We conclude that there is insufficient evidence to reject H0 in favor of H1.
[Figures: histograms of distance driven for the new design and for the existing design, with bins 45, 60, 75, 90, 105 and More]
While the sample mean of the new design is larger than the sample mean of the existing design, the variability within each sample is large enough for the sample distributions to overlap and cover about the same range. It is therefore difficult to argue that one expected value is different from the other.