Hypothesis Testing and Confidence Intervals: Two Population Means
Hypothesis Testing and Confidence Intervals for Differences in Proportions
SITUATION: 2 Populations
Each response is coded Yes = 1, No = 0.
  Men vs. women: Republican?
  TV in Cuba vs. TV in China: Good?
  HS vs. College: > $100K?
  Traditional course vs. Internet course: "A"?
Notation
Individual Responses and the Proportion of Responses
Each individual response is 1 = YES or 0 = NO.
  Bernoulli distribution: mean = p; variance = pq, where q = 1 - p
By the Central Limit Theorem, the average (the proportion) of n responses, for n large, is:
  distributed approximately normal
  with mean = p
  and variance = pq/n
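The mean and variance stated for the proportion can be checked numerically. The following is a minimal sketch, not part of the original slides: it sums over the exact binomial distribution of the number of 1's, and the values n = 200 and p = 0.8 are illustrative assumptions only.

```python
from math import comb

def sample_proportion_moments(n, p):
    """Exact mean and variance of the proportion of 1's among
    n independent Bernoulli(p) responses."""
    q = 1 - p
    # Probability of observing exactly k ones out of n responses.
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

# Illustrative (assumed) values, not taken from the slides.
n, p = 200, 0.8
mean, var = sample_proportion_moments(n, p)
print(mean, p)                  # exact mean next to p
print(var, p * (1 - p) / n)     # exact variance next to pq/n
```

The two printed pairs should agree up to floating-point rounding, which is what "mean = p, variance = pq/n" claims. Approximate normality is a separate statement and is not checked here.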
Differences in Proportions
The proportion of 1's from Population 1 has:
  a normal distribution (approximately)
  true mean: p1 (which we don't know)
  true variance: p1q1/n1
The proportion of 1's from Population 2 likewise has:
  a normal distribution (approximately)
  true mean: p2 (which we don't know)
  true variance: p2q2/n2
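Distribution of the Difference in Proportions
  True mean: p1 - p2
  True variance: p1q1/n1 + p2q2/n2
  True standard deviation: SQRT(p1q1/n1 + p2q2/n2)
But since p1 and p2 are unknown, what should we use for the standard deviation in confidence intervals and hypothesis tests?
For all confidence intervals and for all hypothesis tests except H0: p1 - p2 = 0, substitute the sample proportions $\hat{p}_1$ and $\hat{p}_2$ (with $\hat{q}_i = 1 - \hat{p}_i$):
$$\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
For hypothesis tests of the form H0: p1 - p2 = 0, the two proportions are equal under H0, so use the pooled proportion $\hat{p}$ of 1's in both samples combined (with $\hat{q} = 1 - \hat{p}$):
$$\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
where $x_i$ is the number of 1's in sample $i$.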
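The two estimated standard deviations can be sketched in code. This is a minimal illustration consistent with the Excel formulas later in this deck: the function names `se_unpooled` and `se_pooled` are hypothetical, and the counts are those of the Midas example that follows (350 of 400 in LA, 160 of 200 in NY).

```python
from math import sqrt

def se_unpooled(x1, n1, x2, n2):
    """Standard error using each sample's own proportion.
    For confidence intervals and for tests of H0: p1 - p2 = v, v != 0."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_pooled(x1, n1, x2, n2):
    """Standard error using the pooled proportion.
    For tests of H0: p1 - p2 = 0, where both populations share one p."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Counts from the Midas example: LA is sample 1, NY is sample 2.
se_u = se_unpooled(350, 400, 160, 200)
se_p = se_pooled(350, 400, 160, 200)
print(se_u, se_p)
```

The two values differ only slightly here, but the choice determines which z-statistic is computed.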
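Hypothesis Tests and Confidence Intervals
Z-Statistics for Difference in Proportions
Here $\hat{p}_i$ is the sample proportion from population $i$, $\hat{q}_i = 1 - \hat{p}_i$, and $\hat{p}$ is the pooled proportion of 1's in both samples combined, with $\hat{q} = 1 - \hat{p}$.
H0: p1 - p2 = v, where v ≠ 0:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - v}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}}$$
H0: p1 - p2 = 0:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Confidence Interval:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$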
Example
Midas wants to compare customer satisfaction between its NY and LA operations.
  (1) Can it conclude that a greater proportion in LA are satisfied?
  (2) Can it conclude that customer satisfaction in LA exceeds that in NY by more than 2%?
  (3) Give a 95% confidence interval for the difference in customer satisfaction.
Results:
  350 out of 400 in LA were satisfied
  160 out of 200 in NY were satisfied
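(1) Is a greater proportion satisfied in LA?
Let population 1 be LA and population 2 be NY.
  H0: p1 - p2 = 0
  HA: p1 - p2 > 0
Select α = .05. Reject H0 (accept HA) if z > z.05 = 1.645.
This is a hypothesis test with v = 0, so use the pooled proportion:
$$\hat{p}_1 = \frac{350}{400} = .875, \qquad \hat{p}_2 = \frac{160}{200} = .80, \qquad \hat{p} = \frac{350 + 160}{400 + 200} = .85$$
$$z = \frac{(.875 - .80) - 0}{\sqrt{(.85)(.15)\left(\dfrac{1}{400} + \dfrac{1}{200}\right)}} = 2.425$$
2.425 > 1.645, so it can be concluded that a greater proportion of Midas customers are satisfied in LA than in New York.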
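(2) Is customer satisfaction more than 2% greater in LA?
Let population 1 be LA and population 2 be NY.
  H0: p1 - p2 = .02
  HA: p1 - p2 > .02
Select α = .05. Reject H0 (accept HA) if z > z.05 = 1.645.
This is a hypothesis test with v ≠ 0, so use the individual sample proportions:
$$z = \frac{(.875 - .80) - .02}{\sqrt{\dfrac{(.875)(.125)}{400} + \dfrac{(.80)(.20)}{200}}} = 1.679$$
1.679 > 1.645, so it can be concluded that customer satisfaction in LA exceeds that in New York by more than 2%.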
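Both tests can be reproduced with a short script. This is a sketch under stated assumptions: the function name `z_test_diff_proportions` is hypothetical, LA is treated as sample 1 and NY as sample 2, and the p-value is the upper-tail area because both alternatives are of the form HA: p1 - p2 > v.

```python
from math import sqrt
from statistics import NormalDist

def z_test_diff_proportions(x1, n1, x2, n2, v=0.0):
    """Upper-tail z test of H0: p1 - p2 = v against HA: p1 - p2 > v.
    Uses the pooled standard error when v == 0, the unpooled one otherwise."""
    p1, p2 = x1 / n1, x2 / n2
    if v == 0:
        p = (x1 + x2) / (n1 + n2)          # pooled proportion
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - v) / se
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value

# Midas data: 350 of 400 satisfied in LA, 160 of 200 in NY.
z1, pval1 = z_test_diff_proportions(350, 400, 160, 200)          # test (1)
z2, pval2 = z_test_diff_proportions(350, 400, 160, 200, v=0.02)  # test (2)
print(f"{z1:.3f} {z2:.3f}")  # prints "2.425 1.679"
```

Each p-value falls below α = .05, which is the same decision as comparing z with 1.645.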
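95% Confidence Interval for the Difference in Proportions
With 350 out of 400 satisfied in LA (.875) and 160 out of 200 in NY (.80):
$$(.875 - .80) \pm 1.96\sqrt{\frac{(.875)(.125)}{400} + \frac{(.80)(.20)}{200}} = .075 \pm .064$$
That is, from about .011 to .139.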
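The interval can also be computed directly from the counts. This is a minimal sketch: `diff_proportion_ci` is a hypothetical helper name, and it uses the unpooled standard error, as the Excel formulas in this deck do for the confidence limits.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Two-sided confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z_{alpha/2}; about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * se
    return (p1 - p2) - margin, (p1 - p2) + margin

# Midas data: LA is sample 1, NY is sample 2.
lower, upper = diff_proportion_ci(350, 400, 160, 200)
print(f"{lower:.4f} to {upper:.4f}")
```

Because the whole interval lies above 0, it is consistent with the conclusion of test (1) that satisfaction is higher in LA.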
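Excel – Differences in Proportions
Column A holds the 400 LA responses (A2:A401) and column B holds the 200 NY responses (B2:B201). Cell addresses below are those implied by the references in the formulas.
  F1, n1: =COUNTA(A2:A401)
  F2, n2: =COUNTA(B2:B201)
  F3, n1 + n2: =F1+F2
  F5, number of YES in LA: =COUNTIF(A2:A401,"YES")
  F6, number of YES in NY: =COUNTIF(B2:B201,"YES")
  F7, total number of YES: =F5+F6
  F9, LA sample proportion: =F5/F1
  F10, NY sample proportion: =F6/F2
  F11, pooled proportion: =F7/F3
Test of H0: p1 - p2 = 0
  z (the cell referenced as E14): =(F9-F10-0)/SQRT(F11*(1-F11)*(1/F1+1/F2))
  p-value: =1-NORMSDIST(E14)
Test of H0: p1 - p2 = .02
  z (the cell referenced as E19): =(F9-F10-.02)/SQRT((F9*(1-F9)/F1)+(F10*(1-F10)/F2))
  p-value: =1-NORMSDIST(E19)
95% confidence interval
  Lower limit: =(F9-F10)-NORMSINV(.975)*SQRT((F9*(1-F9)/F1)+(F10*(1-F10)/F2))
  Upper limit: =(F9-F10)+NORMSINV(.975)*SQRT((F9*(1-F9)/F1)+(F10*(1-F10)/F2))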
Determining Sample Sizes
Usual assumptions:
  Sample sizes are equal: n1 = n2 = n
  Take the "worst case scenario" for the standard deviation, when p1 = p2 = .5 (unless you have reason to believe otherwise)
  Use the "±" part of the confidence interval
How many people need to be surveyed in each city to estimate the difference in customer satisfaction to within ± 4%?
No Idea of the Proportions
In a Recent Survey 90% in LA and 75% in NY Were Satisfied
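Both sample-size cases follow from setting the "±" part of the confidence interval equal to the desired margin and solving for n. The sketch below is illustrative only. The slides do not state a confidence level for this question, so 95% is an assumption here, and `n_per_group` is a hypothetical helper name. With n1 = n2 = n, solving z·SQRT((p1q1 + p2q2)/n) = E gives n = z²(p1q1 + p2q2)/E², rounded up.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(margin, p1=0.5, p2=0.5, confidence=0.95):
    """Equal sample size per population so that the +/- part of the
    confidence interval for p1 - p2 is at most `margin`.
    Defaults p1 = p2 = 0.5 are the worst case; 95% is an assumed level."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_crit**2 * variance_sum / margin**2)

# No idea of the proportions: worst case p1 = p2 = .5.
n_worst = n_per_group(0.04)
# Recent survey: 90% satisfied in LA, 75% in NY.
n_prior = n_per_group(0.04, p1=0.90, p2=0.75)
print(n_worst, n_prior)
```

Under the assumed 95% level this gives 1201 people per city in the worst case and 667 per city when the earlier survey results are used, which shows how much prior information about the proportions can reduce the required sample.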
Review
  Hypothesis tests for difference in proportions of the form p1 - p2 = 0
    By hand and by Excel
  Hypothesis tests for difference in proportions of the form p1 - p2 = v
  Confidence intervals for difference in proportions
  Estimating sample sizes
    No idea of the proportions
    Some idea of the proportions