Goodness of Fit Multinomials
Multinomial Proportions
Thus far we have discussed proportions for situations where the result for the qualitative variable could be only “success” or “failure”.
Now we discuss the situation where there are multiple outcomes for the qualitative variable.
EXAMPLE
Suppose the 1000 people had 5 choices.
COLA          OBSERVED FREQUENCY
(1) Coke      f_1 = 410
(2) Pepsi     f_2 = 350
(3) RC        f_3 = 80
(4) Shasta    f_4 = 50
(5) Jolt      f_5 = 110
QUESTIONS
(1) Can we conclude that there are differences in cola preference?
(2) Last year 40% favored Coke, 35% Pepsi and 25% all other brands. Can we conclude these preferences have changed?
(3) Give a 95% confidence interval for the proportion who favor Coke.
(1) CAN IT BE CONCLUDED THAT THERE ARE DIFFERENCES IN COLA PREFERENCES?
The answer is NO unless we can conclusively show otherwise.
H_0: (NO) p_1 = p_2 = p_3 = p_4 = p_5 = .2
H_A: (YES) At least one p_j ≠ .2
α = .05
THIS IS A χ² (Chi-squared) TEST!
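THE χ² (Chi-squared) STATISTIC
The χ² (Chi-squared) statistic is defined as the sum, over all categories, of the squared differences between the observed values (f_i) and the expected values if H_0 were true (e_i), each divided by its expected value:
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$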
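As a minimal sketch of this definition in Python (the function name `chi_squared_statistic` is illustrative and not part of the slides), the statistic can be computed directly from the observed and expected counts. The data are the cola frequencies from the example, with equal expected counts as in question (1).

```python
def chi_squared_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Cola example: 1000 people, 5 colas, equal preference under H_0.
observed = [410, 350, 80, 50, 110]
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # 200 in each category

chi2 = chi_squared_statistic(observed, expected)
print(chi2)
```

The same function works for any hypothesized proportions: multiply each proportion by n to get the expected counts and pass them in.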
RULE OF 5
χ² (Chi-squared) is actually only an approximate distribution for the test statistic.
To be a “valid” approximation, ALL e_i’s should be ≥ 5.
If the rule of 5 is violated, combine some categories so that the condition is met.
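The check itself is easy to automate. The sketch below flags categories whose expected count falls under 5; the helper name `rule_of_5_violations` and the 20-respondent scenario are hypothetical, chosen only to show a case where the rule fails.

```python
def rule_of_5_violations(expected, minimum=5):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical illustration: 5 equally likely colas but only 20 respondents,
# so every expected count is 20 / 5 = 4 and every category is flagged.
small_sample_expected = [20 / 5] * 5
print(rule_of_5_violations(small_sample_expected))

# With the 1000 respondents of the example, each e_i is 200 and nothing is flagged.
print(rule_of_5_violations([1000 / 5] * 5))
```

Any flagged categories would then be merged with others until the list comes back empty.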
THE χ² (Chi-squared) TEST
Reject H_0 if χ² > χ²_{.05,DF}
DF = k-1, where k = # categories (= 5, here)
χ²_{.05,4} = a critical value found in a χ² table
χ²_{.05,4} = 9.488
If H_0 were true, p_1 = p_2 = p_3 = p_4 = p_5 = .2
We would expect to find:
e_1 = .2(1000) = 200; e_2 = .2(1000) = 200; e_3 = .2(1000) = 200; e_4 = .2(1000) = 200; e_5 = .2(1000) = 200
ALL e_i’s ARE ≥ 5
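CALCULATION OF χ²
THE MULTINOMIAL TABLE
i   Cola     Observed f_i   Expected e_i   Difference (f_i - e_i)   Mean Sq. Dif. (f_i - e_i)^2/e_i
1   Coke     410            200             210                     220.5
2   Pepsi    350            200             150                     112.5
3   RC        80            200            -120                      72.0
4   Shasta    50            200            -150                     112.5
5   Jolt     110            200             -90                      40.5
SUM = 558 = χ²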
RESULTS
χ² = 558 > χ²_{.05,4} = 9.488
There is strong evidence that differences exist in cola preferences.
(2) CAN IT BE CONCLUDED COLA PREFERENCES HAVE CHANGED SINCE LAST YEAR?
H_0: (NO) p_1 = .40; p_2 = .35; p_OTHER = .25
H_A: (YES) At least one p_j ≠ its hypothesized value
α = .05
There are now k = 3 categories.
Reject H_0 if χ² > χ²_{.05,2} = 5.991
CALCULATION OF χ²
i   Cola     Observed f_i   Expected e_i   Difference (f_i - e_i)   Mean Sq. Dif. (f_i - e_i)^2/e_i
1   Coke     410            400              10                      .25
2   Pepsi    350            350               0                      .00
3   Other    240            250             -10                      .40
SUM = .65 = χ²
All e_i’s > 5
χ² = .65 < 5.991
Cannot conclude preferences have changed.
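The whole of question (2), including the step of combining RC, Shasta and Jolt into a single "Other" category, can be sketched in a few lines of Python. The variable names are illustrative. The critical value is computed rather than looked up: for DF = 2 the upper-tail probability of a chi-squared variable is exp(-x/2), so the .05 critical value is -2 ln(.05). This shortcut is specific to DF = 2.

```python
import math

observed = {"Coke": 410, "Pepsi": 350, "RC": 80, "Shasta": 50, "Jolt": 110}
last_year = {"Coke": 0.40, "Pepsi": 0.35, "Other": 0.25}  # proportions under H_0

# Collapse every brand not named in H_0 into "Other".
grouped = {"Coke": 0, "Pepsi": 0, "Other": 0}
for brand, count in observed.items():
    key = brand if brand in ("Coke", "Pepsi") else "Other"
    grouped[key] += count

n = sum(grouped.values())
chi2 = 0.0
for brand, f in grouped.items():
    e = last_year[brand] * n          # expected count if H_0 were true
    chi2 += (f - e) ** 2 / e

alpha = 0.05
critical = -2 * math.log(alpha)       # closed form, valid for DF = 2 only
reject = chi2 > critical

print(f"chi2 = {chi2:.2f}, critical = {critical:.3f}, reject H_0: {reject}")
```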
(3) CONFIDENCE INTERVAL FOR PROPORTION WHO FAVOR COKE
This is now binomial: Coke and everything else.
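The interval can be sketched with the usual large-sample formula for a binomial proportion, p̂ ± z_{.025}·sqrt(p̂(1 - p̂)/n). That this is the formula the original slide displayed is an assumption, as is the rounded value z_{.025} = 1.96.

```python
import math

n = 1000            # people surveyed
coke = 410          # number who favor Coke
p_hat = coke / n    # sample proportion, .41

z = 1.96            # approximate z_{.025} for 95% confidence (assumed rounding)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - z * standard_error
upper = p_hat + z * standard_error
print(f"95% CI for the Coke proportion: {lower:.4f} to {upper:.4f}")
```

With these counts the interval runs from roughly .38 to .44.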
Excel
The Excel function CHITEST returns the p-value for the hypothesis test.
Its form is CHITEST(range of observed values, range of expected values)
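Question (1), expected values: =C3*$B$8 in D3, dragged down to D4:D7
Question (1), p-value: =CHITEST(B3:B7,D3:D7) gives a VERY LOW p-value
Question (2), expected values: =C12*$B$15 in D12, dragged down to D13:D14
Question (2), p-value: =CHITEST(B12:B14,D12:D14) gives a VERY HIGH p-value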
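Outside Excel, the same p-values can be approximated with the Python standard library. The sketch below relies on a closed form that holds only when DF is even, which happens to cover both tests here (DF = 4 and DF = 2). The function name is illustrative. The inputs are χ² = 558 for question (1), obtained by applying the χ² formula to the observed counts with e_i = 200, and χ² = .65 for question (2).

```python
import math


def chi_squared_p_value_even_df(chi2, df):
    """Upper-tail probability P(X > chi2) for a chi-squared variable with even df.

    Uses exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, which is exact for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = chi2 / 2
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


p_q1 = chi_squared_p_value_even_df(558.0, 4)   # question (1), DF = 5 - 1
p_q2 = chi_squared_p_value_even_df(0.65, 2)    # question (2), DF = 3 - 1

print(f"question (1) p-value: {p_q1:.3g}")     # far below .05
print(f"question (2) p-value: {p_q2:.4f}")     # well above .05
```

A p-value below α = .05 leads to rejecting H_0, which matches the conclusions reached by comparing χ² with the table values.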
Review
Multinomial problems exist when there are more than two possible outcomes for a qualitative variable.
Excel approach: compare observed values to expected values by using CHITEST to give the p-value.
Hand approach: compare the χ² statistic to χ²_{α,DF}, where DF = # categories - 1 = k-1.
The value of the χ² statistic is: $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$