1 Inference about Comparing Two Populations Chapter 13.


Introduction
A variety of techniques are presented whose objective is to compare two populations. We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.

3 13.2 Inference about the Difference between Two Means: Independent Samples
Two random samples are drawn from the two populations of interest. Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

4 The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
– $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
– $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal but both sample sizes are sufficiently large (greater than 30).
– The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
– The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

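5 Making an inference about $\mu_1 - \mu_2$
If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.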
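As a rough illustration of the known-variance case above, the sketch below computes the Z statistic and a confidence interval for the difference between two means. The function names and every numeric input are hypothetical and chosen only for the example; nothing here comes from the slides' data files.

```python
import math
from statistics import NormalDist


def z_statistic(xbar1, xbar2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for mu1 - mu2 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (xbar1 - xbar2 - hypothesized_diff) / standard_error


def z_interval(xbar1, xbar2, var1, var2, n1, n2, confidence=0.95):
    """Two-sided confidence interval for mu1 - mu2 with known variances."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical inputs (assumptions for illustration only).
z = z_statistic(10.0, 8.0, 4.0, 9.0, 16, 36)
low, high = z_interval(10.0, 8.0, 4.0, 9.0, 16, 36)
print(f"z = {z:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

In practice the population variances are rarely known, which is why the slides move on to the t statistic.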

6 Making an inference about $\mu_1 - \mu_2$
In practice, the Z statistic is rarely used, because the population variances are not known. Instead, we construct a t statistic using the sample variances ($s_1^2$ and $s_2^2$).

7 Making an inference about $\mu_1 - \mu_2$
Two cases are considered when producing the t statistic:
– The two unknown population variances are equal.
– The two unknown population variances are not equal.

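8 Inference about $\mu_1 - \mu_2$: Equal variances
The pooled variance estimator: calculate the pooled variance estimate by
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then $s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} \approx 28.04$.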
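The pooled variance calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `pooled_variance` is an assumption made for the example, while the inputs are the ones quoted on the slide.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Slide example: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15.
sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 4))  # 28.0435
```

Because the second sample is larger, the pooled estimate sits closer to 30 than to 25.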

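10 Inference about $\mu_1 - \mu_2$: Equal variances
Construct the t statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
– Perform a hypothesis test: $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$.
– Build a confidence interval: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$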
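A short sketch of the equal-variances procedure follows. It reuses the variances and sample sizes from the pooled-variance example (25, 30, 10, 15); the two sample means (50 and 45) are hypothetical, and the critical value 2.069 is the standard table value of t for a two-tail 5% test with 23 degrees of freedom, supplied by hand because the Python standard library has no t distribution.

```python
import math


def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic, its degrees of freedom, and standard error."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    standard_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (xbar1 - xbar2 - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2, standard_error


def t_interval(xbar1, xbar2, standard_error, t_crit):
    """Confidence interval for mu1 - mu2 given a critical value from a t table."""
    diff = xbar1 - xbar2
    return diff - t_crit * standard_error, diff + t_crit * standard_error


# Sample means 50 and 45 are hypothetical; the rest is from the slide example.
t, df, se = pooled_t(50.0, 45.0, 25, 30, 10, 15)
low, high = t_interval(50.0, 45.0, se, t_crit=2.069)  # t_{.025,23} from a table
print(f"t = {t:.4f}, d.f. = {df}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these made-up means the interval excludes zero, which agrees with the two-tail test rejecting the null hypothesis at the 5% level.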

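11 Inference about $\mu_1 - \mu_2$: Unequal variances
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$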

12 Inference about $\mu_1 - \mu_2$: Unequal variances
Run a hypothesis test as needed, or build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
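The unequal-variances statistic and its degrees of freedom can be sketched as below. The sample variances 4103 and 10,670 are the ones quoted for Example 13.1 later in the deck. The split of the 150 respondents into 43 and 107 is an assumption, chosen because it gives roughly the 123 degrees of freedom that the slides use. The sample means 600 and 630 are purely hypothetical.

```python
import math


def unequal_variances_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, hypothesized_diff=0.0):
    """t statistic and approximate degrees of freedom when variances differ."""
    a = s1_sq / n1
    b = s2_sq / n2
    t = (xbar1 - xbar2 - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# n1 = 43 and n2 = 107 are assumed; the means 600 and 630 are hypothetical.
t, df = unequal_variances_t(600.0, 630.0, 4103, 10670, 43, 107)
print(f"t = {t:.4f}, d.f. = {df:.1f}")
```

The degrees of freedom are generally not an integer, so they are rounded before a t table is consulted.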

13 Which case to use: equal variances or unequal variances?
Whenever there is insufficient evidence that the variances are unequal, it is preferable to run the equal-variances t-test. This is so because, for any two given samples, the number of degrees of freedom for the equal-variances case is greater than or equal to the number of degrees of freedom for the unequal-variances case.

15 Example: Making an inference about $\mu_1 - \mu_2$
Example 13.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.

16 Example 13.1: Solution
– The data are quantitative.
– The parameter to be tested is the difference between two means.
– The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

17 Example 13.1 (continued)
– The hypotheses are: $H_0: (\mu_1 - \mu_2) = 0$ and $H_1: (\mu_1 - \mu_2) < 0$.
– To check the relationship between the variances, we use computer output to find the sample variances (Xm13-1.xls). We have $s_1^2 = 4103$ and $s_2^2 = 10{,}670$.
– It appears that the variances are unequal.

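18 Example 13.1: Solving by hand
– From the data we compute the sample means, the sample variances ($s_1^2 = 4103$, $s_2^2 = 10{,}670$), and the degrees of freedom of the unequal-variances t statistic (approximately 123). [The numerical computation shown on the slide is not preserved.]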

19 Example 13.1: Solving by hand
– The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,123}$.

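20 Example 13.1: Excel output (Xm13-1.xls)
The t statistic falls in the rejection region and the p-value is less than .05, so at the 5% significance level there is sufficient evidence to reject the null hypothesis. [The numerical output is not preserved.]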

21 Example 13.1: Solving by hand
The confidence interval estimator for the difference between two means (unequal variances) is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

23 Example: Making an inference about $\mu_1 - \mu_2$
Example 13.2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.

24 Example 13.2 (continued)
– Two samples are randomly and independently selected:
  A sample of 25 workers assembled the chair using design A.
  A sample of 25 workers assembled the chair using design B.
– The assembly times were recorded.
– Do the assembly times of the two methods differ?

25 Example 13.2: Solution
[Table: assembly times in minutes, not preserved]
– The data are quantitative.
– The parameter of interest is the difference between two population means.
– The claim to be tested is whether a difference between the two designs exists.

26 Example 13.2: Solving by hand
– The hypotheses are: $H_0: (\mu_1 - \mu_2) = 0$ and $H_1: (\mu_1 - \mu_2) \neq 0$.
– To check the relationship between the two variances we calculate the values of $s_1^2$ and $s_2^2$ (Xm13-02.xls). [The two values are not preserved.]
– The two variances appear to be equal.

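27 Example 13.2: Solving by hand
– To calculate the t statistic we use the equal-variances formula with $n_1 + n_2 - 2 = 48$ degrees of freedom. [The numerical computation is not preserved; the resulting value, quoted on the next slide, is approximately 0.93.]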

28 Example 13.2: Rejection region
– For $\alpha = .05$ the rejection region is $t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$, where $t_{.025,48} \approx 2.009$.
– The test: since $-2.009 < 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.

29 Example 13.2: Excel output (Xm13-02.xls)
The t statistic is .9273, which lies between the two critical values. [The rest of the output is not preserved.]

30 Example 13.2: Conclusion
From this experiment, there is insufficient evidence at the 5% significance level to conclude that the two assembly methods differ in assembly time.

31 Example 13.2: Confidence interval
A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
[The interval endpoints are not preserved.] Notice: zero is included in the confidence interval.

32 Checking the required conditions for the equal variances case (Example 13.2)
The data appear to be approximately normal. [Histograms: Design A, Design B]

13.4 Matched Pairs Experiment
– What is a matched pairs experiment?
– Why are matched pairs experiments needed?
– How do we deal with data produced in this way?
The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

35 Matched Pairs Experiment
Example 13.3
– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
– Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one.
– From the data, can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

36 Matched Pairs Experiment: Solution
– Compare two populations of quantitative data.
– The parameter tested is $\mu_1 - \mu_2$, where $\mu_1$ is the mean of the highest salary offered to Finance MBAs and $\mu_2$ is the mean of the highest salary offered to Marketing MBAs.
– $H_0: (\mu_1 - \mu_2) = 0$ and $H_1: (\mu_1 - \mu_2) > 0$.

37 Matched Pairs Experiment: Solution – continued
From Xm13-3.xls, assuming equal variances [output not preserved]: there is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

38 The effect of a large sample variability
Question
– The difference between the sample means is $\bar{x}_1 - \bar{x}_2 = 5{,}201$.
– So why could we not reject $H_0$ in favor of $H_1$ (where $\mu_1 - \mu_2 > 0$)?

39 The effect of a large sample variability
Answer:
– $s_p^2$ is large (because the sample variances are large): $s_p^2 = 311{,}330{,}926$.
– A large variance reduces the value of the t statistic, and it becomes more difficult to reject $H_0$.

40 Reducing the variability
The values each sample consists of might vary markedly...
[Figure: the range of observations in sample A and the range of observations in sample B]

41 Reducing the variability
...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.
[Figure: the range of the differences]

42 The matched pairs experiment
Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$. This formulation has the benefit of a smaller variability.
[Table: Group 1, Group 2, Difference; individual values not preserved] Mean1 = 12.5, Mean2 = 11.5, Mean1 – Mean2 = 1, mean of the differences = 1.
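The identity used here (difference of the means equals mean of the differences) and the reduction in variability can be seen on a small made-up data set. All values below are hypothetical; they are constructed so that each sample varies widely while the paired differences stay close together.

```python
from statistics import mean, variance

# Hypothetical paired observations (assumptions for illustration only).
sample_a = [20, 40, 60, 80]
sample_b = [18, 39, 57, 78]
differences = [a - b for a, b in zip(sample_a, sample_b)]

difference_of_means = mean(sample_a) - mean(sample_b)
mean_of_differences = mean(differences)

print(difference_of_means, mean_of_differences)   # the two values coincide
print(variance(sample_a), variance(sample_b))     # large within-sample variances
print(variance(differences))                      # much smaller variance
```

The pairing removes the variation shared by both members of a pair, which is what makes the matched pairs test statistic larger for the same difference in means.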

43 The matched pairs experiment
Example 13.4 (13.3 part II)
– It was suspected that salary offers were affected by students' GPA (which caused $s_1^2$ and $s_2^2$ to increase).
– To reduce this variability, the following procedure was used:
  25 ranges of GPAs were predetermined.
  Students from each major were randomly selected, one from each GPA range.
  The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?

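44 The matched pairs hypothesis test: Solution (by hand)
– The parameter tested is $\mu_D$ ($= \mu_1 - \mu_2$), where population 1 is Finance and population 2 is Marketing.
– The hypotheses: $H_0: \mu_D = 0$ and $H_1: \mu_D > 0$.
– The t statistic: $t = \dfrac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$, with degrees of freedom $= n_D - 1$.
– The rejection region is $t > t_{.05,25-1} = 1.711$.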
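A minimal sketch of the matched pairs t statistic is shown below. The salary-like figures are hypothetical (they are not the Xm13-4.xls data), and the function name `paired_t` is an assumption made for the example.

```python
import math
from statistics import mean, stdev


def paired_t(sample1, sample2, hypothesized_mean_diff=0.0):
    """Matched pairs t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(differences)
    standard_error = stdev(differences) / math.sqrt(n_d)
    t = (mean(differences) - hypothesized_mean_diff) / standard_error
    return t, n_d - 1


# Hypothetical paired offers in $1000s (assumptions for illustration only).
finance = [61, 58, 70, 66, 75]
marketing = [60, 56, 67, 62, 70]
t, df = paired_t(finance, marketing)
print(f"t = {t:.4f}, d.f. = {df}")
```

The statistic is then compared with the one-tail critical value for `df` degrees of freedom from a t table, exactly as the slide does with 24 degrees of freedom.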

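45 The matched pairs hypothesis test: Solution (by hand) – continued
– From the data (Xm13-4.xls) calculate the paired differences, their mean $\bar{x}_D$, and their standard deviation $s_D$. [The computed values are not preserved.]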

46 The matched pairs hypothesis test: Solution (by hand) – continued
– Calculate $t = \dfrac{\bar{x}_D - 0}{s_D/\sqrt{n_D}}$. [The numerical result is not preserved.]

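47 The matched pairs hypothesis test: Excel output (Xm13-4.xls)
Recall: the rejection region is $t > t_{\alpha}$. Indeed, the t statistic exceeds the critical value and the p-value is less than .05. [The numerical output is not preserved.]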

48 The matched pairs hypothesis test: Conclusion
There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.

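49 The matched pairs mean difference estimation
The confidence interval estimator of $\mu_D$ is
$$\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\,\frac{s_D}{\sqrt{n_D}}$$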
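The estimation step for matched pairs can be sketched as follows. The differences are hypothetical, and the critical value 2.776 is the standard table value of t for a 95% interval with 4 degrees of freedom, passed in by hand because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev


def paired_interval(differences, t_crit):
    """Confidence interval for the mean difference, given a t-table critical value."""
    n_d = len(differences)
    margin = t_crit * stdev(differences) / math.sqrt(n_d)
    center = mean(differences)
    return center - margin, center + margin


# Hypothetical paired differences (assumptions for illustration only).
differences = [1, 2, 3, 4, 5]
low, high = paired_interval(differences, t_crit=2.776)  # t_{.025,4} from a table
print(f"95% CI for the mean difference: ({low:.4f}, {high:.4f})")
```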

50 The matched pairs mean difference estimation
Using Data Analysis Plus (Xm13-4.xls): first calculate the differences, then run the confidence interval procedure in Data Analysis Plus.

51 Checking the required conditions for the paired observations case
The validity of the results depends on the normality of the differences.

Inferences about the ratio of two variances
In this section we draw inference about the ratio of two population variances. This question is interesting because:
– Variances can be used to evaluate the consistency of processes.
– The relationship between the variances determines the technique used to test relationships between mean values.

53 Parameter tested and statistic
– The parameter tested is $\sigma_1^2/\sigma_2^2$.
– The statistic used is $F = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$.
– Sampling distribution: the statistic $[s_1^2/\sigma_1^2]/[s_2^2/\sigma_2^2]$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

54 Parameter tested and statistic
– Our null hypothesis is always $H_0: \sigma_1^2/\sigma_2^2 = 1$.
– Under this null hypothesis the F statistic becomes $F = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \dfrac{s_1^2}{s_2^2}$.

56 Testing the ratio of two population variances
Example 13.6 (revisiting 13.1): calorie intake at lunch
– In order to perform a test regarding the average consumption of calories at lunch in relation to the inclusion of high-fiber cereal at breakfast (see Example 13.1), the ratio of the two variances has to be tested first.
– The hypotheses are: $H_0: \sigma_1^2/\sigma_2^2 = 1$ and $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

57 Testing the ratio of two population variances: Solving by hand
– The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$.
– The F statistic value is $F = s_1^2/s_2^2 = .3845$.
– Conclusion: because .3845 < .63, we reject the null hypothesis in favor of the alternative hypothesis, and conclude that there is sufficient evidence at the 5% significance level that the variances of the two groups differ.
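The decision rule for the variance ratio can be written out as a small sketch. The sample variances are those quoted for Example 13.1. The upper critical value 1.61 and the lower bound .63 are taken from the slides as given; they are passed in as arguments rather than computed, since the Python standard library has no F distribution. The function name is an assumption.

```python
def variance_ratio_test(s1_sq, s2_sq, upper_crit, lower_crit):
    """Two-tail F test of H0: sigma1^2 / sigma2^2 = 1, given table critical values."""
    f_stat = s1_sq / s2_sq
    reject = f_stat > upper_crit or f_stat < lower_crit
    return f_stat, reject


# Sample variances from Example 13.1; critical values as quoted on the slides.
f_stat, reject = variance_ratio_test(4103, 10670, upper_crit=1.61, lower_crit=0.63)
print(round(f_stat, 4), reject)  # 0.3845 True
```

Rejecting equality of the variances is what justifies the unequal-variances t-test used for Example 13.1.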

58 Testing the ratio of two population variances: Using Excel (see Xm13.1)
Example 13.6 (revisiting 13.1): the same hypotheses, $H_0: \sigma_1^2/\sigma_2^2 = 1$ and $H_1: \sigma_1^2/\sigma_2^2 \neq 1$, are tested before the test on the average consumption of calories at lunch is performed. [The Excel output is not preserved.]

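59 Estimating the Ratio of Two Population Variances
From the statistic $F = [s_1^2/\sigma_1^2]/[s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following confidence interval:
$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$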

60 Estimating the Ratio of Two Population Variances
Example 13.7
– Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1.
– Solution: we find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately).
– LCL $= (s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}] = (4103/10{,}670)[1/1.61] = .2388$
– UCL $= (s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}] = (4103/10{,}670)[1.72] = .6614$
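The interval limits in Example 13.7 can be reproduced with the short sketch below. It uses the sample variances 4103 and 10,670 quoted for Example 13.1 and the two approximate F values from the slide; the function name is an assumption.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_v1_v2, f_v2_v1):
    """Confidence limits for sigma1^2 / sigma2^2 from two F-table values."""
    ratio = s1_sq / s2_sq
    lcl = ratio / f_v1_v2   # lower limit divides by F(alpha/2, v1, v2)
    ucl = ratio * f_v2_v1   # upper limit multiplies by F(alpha/2, v2, v1)
    return lcl, ucl


lcl, ucl = variance_ratio_interval(4103, 10670, f_v1_v2=1.61, f_v2_v1=1.72)
print(round(lcl, 4), round(ucl, 4))  # 0.2388 0.6614
```

The interval lies entirely below 1, which agrees with the test's conclusion that the two variances differ.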

Inference about the difference between two population proportions
In this section we deal with two populations whose data are nominal. For nominal data we compare the population proportions of the occurrence of a certain event.
Examples
– Comparing the effectiveness of a new drug vs. an old one
– Comparing market share before and after an advertising campaign
– Comparing defective rates between two machines

62 Parameter tested and statistic
Parameter
– When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions.
– The parameter tested is therefore $p_1 - p_2$.
Statistic
– An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).

63 Sampling distribution of $\hat{p}_1 - \hat{p}_2$
– Two random samples are drawn from two populations.
– The number of successes in each sample is recorded.
– The sample proportions are computed.
– Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$.
– Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$.

64 Sampling distribution of $\hat{p}_1 - \hat{p}_2$
– The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, $n_2(1 - p_2)$ are all equal to or greater than 5.
– The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
– The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.

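65 The z-statistic
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
Because $p_1$ and $p_2$ are unknown, we use their estimates instead. Thus, $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ should all be equal to or greater than 5.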

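66 Testing $p_1 - p_2$
There are two cases to consider:
– Case 1: $H_0: p_1 - p_2 = 0$. Calculate the pooled proportion $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$. Then $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$.
– Case 2: $H_0: p_1 - p_2 = D$ ($D$ is not equal to 0). Do not pool the data: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$.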
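The two cases can be sketched side by side in Python. The counts used below (60 of 100 versus 50 of 100) are hypothetical and are not the packaging data from the examples that follow; the function names are assumptions as well.

```python
import math
from statistics import NormalDist


def z_case1(x1, n1, x2, n2):
    """Case 1 (H0: p1 - p2 = 0): pool the two samples to estimate the common p."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


def z_case2(x1, n1, x2, n2, d):
    """Case 2 (H0: p1 - p2 = D, D != 0): keep the two sample proportions separate."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d) / standard_error


# Hypothetical counts (assumptions for illustration only).
z1 = z_case1(60, 100, 50, 100)
z2 = z_case2(60, 100, 50, 100, d=0.03)
z_crit = NormalDist().inv_cdf(0.95)  # one-tail critical value at the 5% level
print(f"case 1 z = {z1:.4f}, case 2 z = {z2:.4f}, critical value = {z_crit:.3f}")
```

For a right-tail test, the null hypothesis is rejected when the computed z exceeds the critical value; with these made-up counts neither case does.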

67 Testing $p_1 - p_2$ (Case I)
Example 13.8
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two communities:
  Design A is distributed in Community 1.
  Design B is distributed in Community 2.
  The old design package is still offered in both communities.
– Design A is more expensive; therefore, to be financially viable it has to outsell design B.

68 Testing $p_1 - p_2$ (Case I): Summary of the experiment results
– Community 1: packages with new design A sold [count not preserved]; 324 packages with old design sold.
– Community 2: packages with new design B sold [count not preserved]; 442 packages with old design sold.
– Use a 5% significance level and perform a test to find which type of packaging to use.

69 Testing $p_1 - p_2$ (Case I): Solution
– The problem objective is to compare the populations of sales of the two packaging designs.
– The data are qualitative (yes/no for the purchase of the new design per customer).
– Population 1: purchases of Design A. Population 2: purchases of Design B.
– The hypotheses are $H_0: p_1 - p_2 = 0$ and $H_1: p_1 - p_2 > 0$.
– We identify here case 1.

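70 Testing $p_1 - p_2$ (Case I): Solving by hand
– For a 5% significance level the rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
– From Xm13-08.xls we compute the sample proportions, the pooled proportion, and the z statistic. [The computed values are not preserved.]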

71 Testing $p_1 - p_2$ (Case I): Excel (Data Analysis Plus), Xm13-08.xls
Conclusion: since 2.89 > 1.645, there is sufficient evidence in the data to conclude at the 5% significance level that design A will outsell design B.

72 Testing $p_1 - p_2$ (Case II)
Example 13.9 (revisiting Example 13.8)
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two communities:
  Design A is distributed in Community 1.
  Design B is distributed in Community 2.
  The old design package is still offered in both communities.
– For design A to be financially viable it has to outsell design B by at least 3%.

73 Testing $p_1 - p_2$ (Case II): Summary of the experiment results
– Community 1: packages with new design A sold [count not preserved]; 324 packages with old design sold.
– Community 2: packages with new design B sold [count not preserved]; 442 packages with old design sold.
– Use a 5% significance level and perform a test to find which type of packaging to use.

74 Testing $p_1 - p_2$ (Case II): Solution
– The hypotheses to test are $H_0: p_1 - p_2 = .03$ and $H_1: p_1 - p_2 > .03$.
– We identify case 2 of the test for difference in proportions (the difference is not equal to zero).

75 Testing $p_1 - p_2$ (Case II): Solving by hand
– The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
– Conclusion: since 1.58 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that packaging with Design A will outsell that of Design B by 3% or more.

76 Testing $p_1 - p_2$ (Case II): Using Excel (Data Analysis Plus), Xm13-08.xls
[The Excel output is not preserved.]

77 Estimating p1 – p2
Example (estimating the cost of life saved)
– Two drugs are used to treat heart attack victims:
  Streptokinase (available since 1959, costs $460)
  t-PA (genetically engineered, costs $2900).
– The maker of t-PA claims that its drug outperforms Streptokinase.
– An experiment was conducted in 15 countries:
  20,500 patients were given t-PA.
  20,500 patients were given Streptokinase.
  The number of deaths by heart attack was recorded.

78 Estimating p1 – p2: Experiment results
– A total of 1497 patients treated with Streptokinase died.
– A total of 1292 patients treated with t-PA died.
Estimate the cost per life saved by using t-PA instead of Streptokinase.

79 Estimating p1 – p2: Solution
– The problem objective: compare the outcomes of two treatments.
– The data are nominal (a patient lived/died).
– The parameter estimated is p1 – p2, where p1 = death rate with t-PA and p2 = death rate with Streptokinase.

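80 Estimating p1 – p2: Solving by hand
– Sample proportions: $\hat{p}_1 = 1292/20{,}500 = .0630$ and $\hat{p}_2 = 1497/20{,}500 = .0730$.
– The 95% confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = -.0100 \pm .0049$, that is, LCL = -.0149 and UCL = -.0051.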

81 Estimating p1 – p2: Interpretation
– We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
– The difference in cost per treatment is $2900 – $460 = $2440.
– The cost per life saved by switching to t-PA is estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
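The estimate above can be reproduced from the figures given in the example (1292 and 1497 deaths among 20,500 patients each, drug costs of 2900 and 460 dollars). The sketch below is one way to do it; the function name is an assumption, and the endpoints are rounded to four decimals before dividing because that appears to be how the slide arrives at its dollar figures.

```python
import math
from statistics import NormalDist


def proportion_diff_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# p1 = death rate with t-PA, p2 = death rate with Streptokinase.
low, high = proportion_diff_interval(1292, 20500, 1497, 20500)
extra_cost = 2900 - 460  # additional cost of t-PA per patient treated

# Round the endpoints first (an assumption about the slide's arithmetic).
cost_low = extra_cost / abs(round(low, 4))
cost_high = extra_cost / abs(round(high, 4))
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
print(f"cost per life saved: {cost_low:,.0f} to {cost_high:,.0f} dollars")
```

Both endpoints are negative because the death rate with t-PA is the lower one; their absolute values are the estimated gain in survival.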