Slides by John Loucks, St. Edward’s University
Chapter 11 Comparisons Involving Proportions and a Test of Independence Inferences About the Difference Between Two Population Proportions Hypothesis Test for Proportions of a Multinomial Population Test of Independence
Inferences About the Difference Between Two Population Proportions Interval Estimation of p1 - p2 Hypothesis Tests About p1 - p2
Inferences About the Difference Between Two Population Proportions Let: $p_1$ denote the proportion for population 1 and $p_2$ denote the proportion for population 2. To make an inference about $p_1 - p_2$, we will select two independent random samples consisting of $n_1$ units from population 1 and $n_2$ units from population 2. Let: $\bar{p}_1$ denote the sample proportion for population 1 and $\bar{p}_2$ denote the sample proportion for population 2.
Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
Expected Value: $E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$
Standard Deviation (Standard Error): $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
where: $n_1$ = size of sample taken from population 1, $n_2$ = size of sample taken from population 2
Sampling Distribution of $\bar{p}_1 - \bar{p}_2$ If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution. The sample sizes are sufficiently large if all of these conditions are met: $n_1 p_1 > 5$, $n_1(1 - p_1) > 5$, $n_2 p_2 > 5$, $n_2(1 - p_2) > 5$
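The standard error formula and the large-sample conditions can be sketched in Python. This is an illustration only, not part of the original slides: the function names are hypothetical, and the inputs (.48 with n = 250, .40 with n = 150) are the sample values from the Market Research Associates example, used here as stand-ins for the unknown population proportions.

```python
import math

def std_error_diff(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def large_sample_ok(p1, n1, p2, n2, threshold=5):
    """True if n1*p1, n1*(1-p1), n2*p2 and n2*(1-p2) all exceed the threshold."""
    checks = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(value > threshold for value in checks)

# Assumption: sample proportions are plugged in for the population proportions.
se = std_error_diff(0.48, 250, 0.40, 150)
ok = large_sample_ok(0.48, 250, 0.40, 150)
print(f"standard error = {se:.4f}, normal approximation reasonable: {ok}")
```

The threshold is kept as a parameter because the cutoff is a rule of thumb rather than an exact requirement.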
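Sampling Distribution of $\bar{p}_1 - \bar{p}_2$ [Figure: sampling distribution of $\bar{p}_1 - \bar{p}_2$, with $p_1 - p_2$ marked on the horizontal axis]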
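Interval Estimation of p1 - p2
Interval Estimate: $\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$
where: $1 - \alpha$ is the confidence coefficient
Point Estimate is $\bar{p}_1 - \bar{p}_2$
Margin of Error is $z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$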
Interval Estimation of p1 - p2 Example: Market Research Associates Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product. The new campaign has been initiated with TV and newspaper advertisements running for three weeks.
Interval Estimation of p1 - p2 Example: Market Research Associates A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product. Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?
Point Estimator of the Difference Between Two Population Proportions
$p_1$ = proportion of the population of households “aware” of the product after the new campaign
$p_2$ = proportion of the population of households “aware” of the product before the new campaign
$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign
$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign
$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$
Interval Estimation of p1 - p2 For $\alpha = .05$, $z_{.025} = 1.96$:
$.48 - .40 \pm 1.96\sqrt{\frac{.48(.52)}{250} + \frac{.40(.60)}{150}}$
$.08 \pm 1.96(.0510)$
$.08 \pm .10$
Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.
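The interval calculation above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the slides' own method: the function name `diff_proportion_interval` is hypothetical, and the z value comes from `statistics.NormalDist` in the Python standard library instead of a printed table.

```python
import math
from statistics import NormalDist

def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    point_estimate = p1_bar - p2_bar
    # z value with upper-tail area alpha/2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    margin = z * std_error
    return point_estimate, margin, point_estimate - margin, point_estimate + margin

# Market Research Associates: 120 of 250 aware after, 60 of 150 aware before.
point, margin, lower, upper = diff_proportion_interval(120, 250, 60, 150)
print(f"{point:.2f} +/- {margin:.4f} -> ({lower:.3f}, {upper:.3f})")
```

Because the interval contains 0, it does not by itself rule out "no change" in awareness at the 95% confidence level.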
Interval Estimation of p1 - p2 Excel Formula Worksheet
Row | A | B | C | D | E
1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2)
2 | No | Yes | Sample Size | 250 | 150
3 | | | No. of "Yes" | =COUNTIF(A2:A251,"Yes") | =COUNTIF(B2:B151,"Yes")
4 | | | Samp. Propor. | =D3/D2 | =E3/E2
5 | | | | |
6 | | | Confid. Coeff. | 0.95 |
7 | | | Lev. of Signif. | =1-D6 |
8 | | | z Value | =NORM.S.INV(1-D7/2) |
9 | | | | |
10 | | | Std. Error | =SQRT(D4*(1-D4)/D2+E4*(1-E4)/E2) |
11 | | | Marg. of Error | =D8*D10 |
12 | | | | |
13 | | | Pt. Est. of Diff. | =D4-E4 |
14 | | | Lower Limit | =D13-D11 |
15 | | | Upper Limit | =D13+D11 |
Note: Rows 16-251 are not shown. NORM.S.INV takes a single probability argument.
Interval Estimation of p1 - p2 Excel Value Worksheet
Row | A | B | C | D | E
1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2)
2 | No | Yes | Sample Size | 250 | 150
3 | | | No. of "Yes" | 120 | 60
4 | | | Samp. Propor. | 0.48 | 0.40
5 | | | | |
6 | | | Confid. Coeff. | 0.95 |
7 | | | Lev. of Signif. | 0.05 |
8 | | | z Value | 1.960 |
9 | | | | |
10 | | | Std. Error | 0.0510 |
11 | | | Marg. of Error | 0.0999 |
12 | | | | |
13 | | | Pt. Est. of Diff. | 0.080 |
14 | | | Lower Limit | -0.020 |
15 | | | Upper Limit | 0.180 |
Note: Rows 16-251 are not shown.
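Hypothesis Tests about p1 - p2 Hypotheses We focus on tests involving no difference between the two population proportions (i.e., $p_1 = p_2$).
Left-tailed: $H_0: p_1 - p_2 \ge 0$, $H_a: p_1 - p_2 < 0$
Right-tailed: $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$
Two-tailed: $H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 \ne 0$
Hypothesis Tests about p1 - p2 Pooled Estimate of Standard Error of $\bar{p}_1 - \bar{p}_2$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
where: $\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$
Hypothesis Tests about p1 - p2 Test Statistic
$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$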
Hypothesis Tests about p1 - p2 Example: Market Research Associates Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?
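Hypothesis Tests about p1 - p2 p-Value and Critical Value Approaches 1. Develop the hypotheses. $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$, where $p_1$ = proportion of the population of households “aware” of the product after the new campaign and $p_2$ = proportion of the population of households “aware” of the product before the new campaign
Hypothesis Tests about p1 - p2 p-Value and Critical Value Approaches 2. Specify the level of significance. $\alpha = .05$ 3. Compute the value of the test statistic.
Pooled estimate of p: $\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$
Pooled standard error: $\sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$
Test statistic: $z = \frac{(.48 - .40) - 0}{.0514} = \frac{.08}{.0514} = 1.56$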
Hypothesis Tests about p1 - p2 p-Value Approach 4. Compute the p-value. For z = 1.56, the p-value = .0594 5. Determine whether to reject H0. Because p-value > $\alpha$ = .05, we cannot reject H0. We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
Hypothesis Tests about p1 - p2 Critical Value Approach 4. Determine the critical value and rejection rule. For $\alpha$ = .05, $z_{.05}$ = 1.645 Reject H0 if z > 1.645 5. Determine whether to reject H0. Because 1.56 < 1.645, we cannot reject H0. We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
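The pooled test can be checked with a minimal Python sketch. The function name `pooled_z_test` is hypothetical, and the upper-tail p-value uses `statistics.NormalDist` from the standard library. Note that the unrounded z (about 1.557) gives a p-value near .060, while the .0594 above comes from rounding z to 1.56 before the table lookup.

```python
import math
from statistics import NormalDist

def pooled_z_test(x1, n1, x2, n2):
    """Upper-tail z test of H0: p1 - p2 <= 0 using the pooled estimate of p."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    # Pooled estimate: equals (n1*p1_bar + n2*p2_bar) / (n1 + n2)
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    p_upper = 1 - NormalDist().cdf(z)
    return pooled, std_error, z, p_upper

# Market Research Associates: 120 of 250 aware after, 60 of 150 aware before.
pooled, std_error, z, p_value = pooled_z_test(120, 250, 60, 150)
alpha = 0.05
reject = p_value < alpha
print(f"pooled = {pooled:.3f}, SE = {std_error:.4f}, z = {z:.3f}, "
      f"p-value = {p_value:.3f}, reject H0: {reject}")
```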
Hypothesis Tests about p1 - p2 Excel Formula Worksheet
Row | A | B | C | D | E
1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2)
2 | No | Yes | Sample Size | =COUNTA(A2:A251) | =COUNTA(B2:B151)
3 | | | Resp. of Interest | Yes | Yes
4 | | | Count for Resp. | =COUNTIF(A2:A251,D3) | =COUNTIF(B2:B151,E3)
5 | | | Sample Propor. | =D4/D2 | =E4/E2
6 | | | | |
7 | | | Hypoth. Value | 0 |
8 | | | Point Est. of Diff. | =D5-E5 |
9 | | | | |
10 | | | Pooled Est. of p | =(D2*D5+E2*E5)/(D2+E2) |
11 | | | Standard Error | =SQRT(D10*(1-D10)*(1/D2+1/E2)) |
12 | | | Test Statistic | =(D8-D7)/D11 |
13 | | | | |
14 | | | p-Value (lower tail) | =NORM.S.DIST(D12,TRUE) |
15 | | | p-Value (upper tail) | =1-NORM.S.DIST(D12,TRUE) |
16 | | | p-Value (two tail) | =2*MIN(D14,D15) |
Note: Rows 17-251 are not shown.
Hypothesis Tests about p1 - p2 Excel Value Worksheet
Row | A | B | C | D | E
1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2)
2 | No | Yes | Sample Size | 250 | 150
3 | | | Resp. of Interest | Yes | Yes
4 | | | Count for Resp. | 120 | 60
5 | | | Sample Propor. | 0.48 | 0.40
6 | | | | |
7 | | | Hypoth. Value | 0 |
8 | | | Point Est. of Diff. | 0.08 |
9 | | | | |
10 | | | Pooled Est. of p | 0.450 |
11 | | | Standard Error | 0.0514 |
12 | | | Test Statistic | 1.557 |
13 | | | | |
14 | | | p-Value (lower tail) | 0.940 |
15 | | | p-Value (upper tail) | 0.060 |
16 | | | p-Value (two tail) | 0.120 |
Note: Rows 17-251 are not shown.
Hypothesis Test for Proportions of a Multinomial Population In this case, each element of a population is assigned to one and only one of several classes or categories. Such a population is a multinomial population. The multinomial distribution can be thought of as an extension of the binomial distribution. On each trial of a multinomial experiment: One and only one of the outcomes occurs Each trial is assumed to be independent The probabilities of the outcomes remain the same for each trial
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population 1. State the null and alternative hypotheses. H0: The population follows a multinomial distribution with specified probabilities for each of the k categories Ha: The population does not follow a multinomial distribution with specified probabilities for each of the k categories
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population 2. Select a random sample and record the observed frequency, fi , for each of the k categories. 3. Assuming H0 is true, compute the expected frequency, ei , in each category by multiplying the category probability by the sample size.
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population 4. Compute the value of the test statistic.
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where: $f_i$ = observed frequency for category i, $e_i$ = expected frequency for category i, k = number of categories
Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population 5. Rejection rule: p-value approach: Reject H0 if p-value < $\alpha$ Critical value approach: Reject H0 if $\chi^2 > \chi^2_{\alpha}$, where $\alpha$ is the significance level and there are k - 1 degrees of freedom
Multinomial Distribution Goodness of Fit Test Example: Finger Lakes Homes (A) Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.
Multinomial Distribution Goodness of Fit Test Example: Finger Lakes Homes (A) The number of homes sold of each model for 100 sales over the past two years is shown below.
Model | Colonial | Log | Split-Level | A-Frame
# Sold | 30 | 20 | 35 | 15
Multinomial Distribution Goodness of Fit Test Hypotheses H0: pC = pL = pS = pA = .25 Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25 where: pC = population proportion that purchase a colonial pL = population proportion that purchase a log cabin pS = population proportion that purchase a split-level pA = population proportion that purchase an A-frame
Multinomial Distribution Goodness of Fit Test Rejection Rule Reject H0 if p-value < .05 or $\chi^2 > 7.815$, with $\alpha = .05$ and k - 1 = 4 - 1 = 3 degrees of freedom. [Figure: chi-square distribution with the region below 7.815 labeled Do Not Reject H0 and the region above 7.815 labeled Reject H0]
Multinomial Distribution Goodness of Fit Test Expected Frequencies: $e_1 = .25(100) = 25$, $e_2 = .25(100) = 25$, $e_3 = .25(100) = 25$, $e_4 = .25(100) = 25$
Test Statistic: $\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$
Multinomial Distribution Goodness of Fit Test Conclusion Using the p-Value Approach
Area in Upper Tail | .10 | .05 | .025 | .01 | .005
$\chi^2$ Value (df = 3) | 6.251 | 7.815 | 9.348 | 11.345 | 12.838
Because $\chi^2 = 10$ is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01. The p-value < $\alpha$. We can reject the null hypothesis.
Multinomial Distribution Goodness of Fit Test Conclusion Using the Critical Value Approach $\chi^2 = 10 > 7.815$ We reject, at the .05 level of significance, the assumption that there is no home style preference.
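The goodness of fit calculation for Finger Lakes Homes can be sketched in Python using only the standard library. The function names are hypothetical. The p-value helper is an assumption-laden shortcut: it uses a closed form that holds only for a chi-square distribution with exactly 3 degrees of freedom, which is the case in this example; a general-purpose routine would be needed for other df. The critical value 7.815 is taken from the table above.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all categories."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def chi_square_upper_tail_df3(x):
    """Upper-tail area for a chi-square variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = [30, 20, 35, 15]               # Colonial, Log, Split-Level, A-Frame
hypothesized = [0.25, 0.25, 0.25, 0.25]   # H0: no style preference
n = sum(observed)
expected = [p * n for p in hypothesized]

chi_sq = chi_square_statistic(observed, expected)
p_value = chi_square_upper_tail_df3(chi_sq)
critical_value = 7.815                    # alpha = .05, df = 3 (from the table)
reject = chi_sq > critical_value
print(f"chi-square = {chi_sq:.1f}, p-value = {p_value:.4f}, reject H0: {reject}")
```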
Multinomial Distribution Goodness of Fit Test Excel Worksheet (showing data) Note: Rows 13-101 are not shown.
Multinomial Distribution Goodness of Fit Test Excel Formula Worksheet
Row | C | D | E | F | G | H | I
1-2 | Categ. | Hyp. Prop. | Observed Frequency | Expect. Freq. | Diff. | Sq'd. Diff. | Sq.Diff./Exp.Freq.
3 | Col. | 0.25 | =COUNTIF(B2:B101,"Col") | =D3*$E$7 | =E3-F3 | =G3^2 | =H3/F3
4 | Log | 0.25 | =COUNTIF(B2:B101,"Log") | =D4*$E$7 | =E4-F4 | =G4^2 | =H4/F4
5 | Split-L | 0.25 | =COUNTIF(B2:B101,"Spl") | =D5*$E$7 | =E5-F5 | =G5^2 | =H5/F5
6 | A-Fr. | 0.25 | =COUNTIF(B2:B101,"Afr") | =D6*$E$7 | =E6-F6 | =G6^2 | =H6/F6
7 | Total | | =SUM(E3:E6) | | | | =SUM(I3:I6)
8 | | | | | | |
9 | | Categories | (entry not shown) | | | |
10 | | Test Statistic | =I7 | | | |
11 | | Degr. of Free. | =E9-1 | | | |
12 | | p-Value | =CHISQ.DIST.RT(E10,E11) | | | |
Note: Columns A-B and rows 13-101 are not shown.
Multinomial Distribution Goodness of Fit Test Excel Value Worksheet
Row | C | D | E | F | G | H | I
1-2 | Categ. | Hyp. Prop. | Observed Frequency | Expect. Freq. | Diff. | Sq'd. Diff. | Sq.Diff./Exp.Freq.
3 | Col. | 0.25 | 30 | 25 | 5 | 25 | 1
4 | Log | 0.25 | 20 | 25 | -5 | 25 | 1
5 | Split-L | 0.25 | 35 | 25 | 10 | 100 | 4
6 | A-Fr. | 0.25 | 15 | 25 | -10 | 100 | 4
7 | Total | | 100 | | | | 10
8 | | | | | | |
9 | | Categories | 4 | | | |
10 | | Test Statistic | 10 | | | |
11 | | Degr. of Free. | 3 | | | |
12 | | p-Value | 0.0186 | | | |
Note: Columns A-B and rows 13-101 are not shown.
Test of Independence Another important application of the chi-square distribution involves using sample data to test for the independence of two variables. To test whether two variables are independent, one sample is selected and crosstabulation is used to summarize the data for the two variables simultaneously.
Test of Independence 1. Set up the null and alternative hypotheses. H0: The column variable is independent of the row variable Ha: The column variable is not independent of the row variable 2. Select a random sample and record the observed frequency, $f_{ij}$, for each cell of the contingency table. 3. Compute the expected frequency, $e_{ij}$, for each cell.
$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$
Test of Independence 4. Compute the test statistic.
$\chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$
5. Determine the rejection rule. Reject H0 if p-value < $\alpha$ or $\chi^2 > \chi^2_{\alpha}$, where $\alpha$ is the significance level and, with n rows and m columns, there are (n - 1)(m - 1) degrees of freedom.
Test of Independence Example: Finger Lakes Homes (B) Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.
Test of Independence Example: Finger Lakes Homes (B) The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $200,000 or less or more than $200,000.
Price | Colonial | Log | Split-Level | A-Frame
≤ $200,000 | 18 | 6 | 19 | 12
> $200,000 | 12 | 14 | 16 | 3
Test of Independence Hypotheses H0: Price of the home is independent of the style of the home that is purchased Ha: Price of the home is not independent of the style of the home that is purchased
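Test of Independence Expected Frequencies The observed frequencies with their row and column totals are shown below. Each expected frequency is the row total times the column total, divided by the sample size of 100.
Price | Colonial | Log | Split-Level | A-Frame | Total
≤ $200K | 18 | 6 | 19 | 12 | 55
> $200K | 12 | 14 | 16 | 3 | 45
Total | 30 | 20 | 35 | 15 | 100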
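The expected frequencies for the Finger Lakes Homes table can be computed from the row and column totals with a few lines of Python. This is an illustrative sketch with hypothetical variable names; the rule it applies (row total times column total, divided by the sample size) matches the worksheet formulas shown later in the chapter, such as =F5*J3/J5.

```python
# Observed counts: rows are price (<= $200K, > $200K);
# columns are Colonial, Log, Split-Level, A-Frame.
observed = [
    [18, 6, 19, 12],
    [12, 14, 16, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["<= $200K", ">  $200K"], expected):
    print(label, [f"{value:.2f}" for value in row])
```

All eight expected frequencies are at least 5, so the chi-square approximation is reasonable for this table.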
Test of Independence Rejection Rule With $\alpha = .05$ and (2 - 1)(4 - 1) = 3 d.f., Reject H0 if p-value < .05 or $\chi^2 > 7.815$
Test Statistic: $\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149$
Test of Independence Conclusion Using the p-Value Approach
Area in Upper Tail | .10 | .05 | .025 | .01 | .005
$\chi^2$ Value (df = 3) | 6.251 | 7.815 | 9.348 | 11.345 | 12.838
Because $\chi^2 = 9.149$ is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025. The p-value < $\alpha$. We can reject the null hypothesis.
Test of Independence Conclusion Using the Critical Value Approach We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.
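The whole test of independence for this example can be sketched in Python with the standard library. The function name `independence_chi_square` is hypothetical, and the p-value line is a shortcut that is valid only because this table has exactly 3 degrees of freedom; it is an assumption of the sketch, not a general method.

```python
import math

def independence_chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            statistic += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

observed = [
    [18, 6, 19, 12],   # <= $200K: Colonial, Log, Split-Level, A-Frame
    [12, 14, 16, 3],   # >  $200K
]
chi_sq, df = independence_chi_square(observed)

# Closed-form upper-tail area; holds only for df == 3.
p_value = math.erfc(math.sqrt(chi_sq / 2)) + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2)
reject = chi_sq > 7.815   # critical value for alpha = .05, df = 3
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
```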
Test of Independence Excel Worksheet (showing data) A B C D E 1 Home Price ($) Style 2 >200K Colonial 3 <=200K Log 4 5 A-Frame 6 7 Split-Level 8 9 10 Note: Rows 11-101 are not shown.
Test of Independence Excel Worksheet (showing Pivot Table)
Row | E | F | G | H | I | J
1 | Count of Home | Preference | | | |
2 | Price ($) | Colonial | Log | Split-Lev. | A-Frame | Grand Tot.
3 | <=200K | 18 | 6 | 19 | 12 | 55
4 | >200K | 12 | 14 | 16 | 3 | 45
5 | Grand Total | 30 | 20 | 35 | 15 | 100
Note: Columns A-D are not shown.
Test of Independence Excel Formula Worksheet
Row | E | F | G | H | I | J
1 | Count of Home | Preference | | | |
2 | Price ($) | Colonial | Log | Split-Lev. | A-Frame | Grand Tot.
3 | <=200K | 18 | 6 | 19 | 12 | 55
4 | >200K | 12 | 14 | 16 | 3 | 45
5 | Grand Total | 30 | 20 | 35 | 15 | 100
6 | | | | | |
7 | Expected Frequencies | | | | |
8 | | | | | |
9 | | =F5*J3/J5 | =G5*J3/J5 | =H5*J3/J5 | =I5*J3/J5 |
10 | | =F5*J4/J5 | =G5*J4/J5 | =H5*J4/J5 | =I5*J4/J5 |
11 | p-Value | =CHISQ.TEST(F3:I4,F9:I10) | | | |
Note: Columns A-D are not shown.
Test of Independence Excel Value Worksheet
Row | E | F | G | H | I | J
1 | Count of Home | Preference | | | |
2 | Price ($) | Colonial | Log | Split-Lev. | A-Frame | Grand Tot.
3 | <=200K | 18 | 6 | 19 | 12 | 55
4 | >200K | 12 | 14 | 16 | 3 | 45
5 | Grand Total | 30 | 20 | 35 | 15 | 100
6 | | | | | |
7 | Expected Frequencies | | | | |
8 | | | | | |
9 | | 16.50 | 11.00 | 19.25 | 8.25 |
10 | | 13.50 | 9.00 | 15.75 | 6.75 |
11 | p-Value | 0.0274 | | | |
Note: Columns A-D are not shown.
End of Chapter 11