Chapter 14 Chi Square - χ² 1/10/2019 Chi square
Chi Square
Chi Square is a non-parametric statistic used to test the null hypothesis. It is used for nominal data. It plays the same role for nominal data that the F test played in single factor and factorial analysis.
… Chi Square
Nominal data puts each participant in a category. Categories are best when mutually exclusive and exhaustive. This means that each and every participant fits in one and only one category. Chi Square looks at frequencies in the categories.
Expected frequencies and the null hypothesis ...
Chi Square compares the expected frequencies in categories to the observed frequencies in categories. "Expected frequencies" are the frequencies in each cell predicted by the null hypothesis.
… Expected frequencies and the null hypothesis ...
The null hypothesis: H0: fo = fe
There is no difference between the observed frequency and the frequency predicted (expected) by the null.
The experimental hypothesis: H1: fo ≠ fe
The observed frequency differs significantly from the frequency predicted (expected) by the null.
Calculating χ²
For each cell:
Calculate the deviation of the observed frequency from the expected frequency.
Square the deviation.
Divide the squared deviation by the expected value.
Calculating χ²
Add 'em up: χ² = Σ (fo - fe)² / fe, summed over all cells.
Then look up χ² in the Chi Square table.
df = k - 1 (one sample χ², where k is the number of categories)
OR
df = (Columns - 1) * (Rows - 1) (2 or more samples)
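The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the counts in the example are made up purely to show the arithmetic.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def df_one_sample(k):
    """Degrees of freedom for a one sample chi square with k categories."""
    return k - 1


def df_multi_sample(rows, columns):
    """Degrees of freedom for a rows x columns table (2 or more samples)."""
    return (columns - 1) * (rows - 1)


# Made-up counts, for illustration only (not from the chapter).
observed = [12, 8]
expected = [10, 10]

stat = chi_square(observed, expected)
print(round(stat, 2), df_one_sample(len(observed)))
```

Each cell here contributes (2)² / 10 = 0.40, so the total is 0.80 with df = 1.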
Critical values of χ²

df      1      2      3      4      5      6      7      8
.05    3.84   5.99   7.81   9.49  11.07  12.59  14.07  15.51
.01    6.63   9.21  11.34  13.28  15.09  16.81  18.48  20.09

df      9     10     11     12     13     14     15     16
.05   16.92  18.31  19.68  21.03  22.36  23.68  25.00  26.30
.01   21.67  23.21  24.72  26.22  27.69  29.14  30.58  32.00

df     17     18     19     20     21     22     23     24
.05   27.59  28.87  30.14  31.41  32.67  33.92  35.17  36.42
.01   33.41  34.81  36.19  37.57  38.93  40.29  41.64  42.98

df     25     26     27     28     29     30
.05   37.65  38.89  40.14  41.34  42.56  43.77
.01   44.31  45.64  46.96  48.28  49.59  50.89
Reading the table of critical values of χ²
Degrees of freedom: the rows labeled df.
Critical values at α = .05: the rows labeled .05.
Critical values at α = .01: the rows labeled .01.
Example
If there were 5 degrees of freedom, how big would χ² have to be for significance at the .05 level?
From the table of critical values: with df = 5, the critical value at α = .05 is 11.07.
Another example
If there were 2 degrees of freedom, how big would χ² have to be for significance at the .05 level?
Note: Unlike most other tables you have seen, the critical values for Chi Square get larger as df increases. This is because you are summing over more cells, each of which usually contributes to the total observed value of chi square.
From the table of critical values: with df = 2, the critical value at α = .05 is 5.99.
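The two lookups above can be sketched as a small table lookup. This is only an illustration: the dictionary holds standard chi square critical values for df 1 through 8, and the names CRITICAL, critical_value and is_significant are hypothetical helpers, not anything from the chapter.

```python
# Standard chi square critical values for df = 1..8 at alpha = .05 and .01.
CRITICAL = {
    0.05: {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49,
           5: 11.07, 6: 12.59, 7: 14.07, 8: 15.51},
    0.01: {1: 6.63, 2: 9.21, 3: 11.34, 4: 13.28,
           5: 15.09, 6: 16.81, 7: 18.48, 8: 20.09},
}


def critical_value(df, alpha=0.05):
    """Look up the tabled critical value for the given df and alpha."""
    return CRITICAL[alpha][df]


def is_significant(chi_sq, df, alpha=0.05):
    """True when the observed chi square exceeds the critical value."""
    return chi_sq > critical_value(df, alpha)


print(critical_value(5))  # df = 5 at the .05 level
print(critical_value(2))  # df = 2 at the .05 level
```

Note that the values grow as df grows, which matches the note above about summing over more cells.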
One sample example from the cpe:
Party: 75% male, 25% female. There are 40 swimmers. Since 75% of the people at the party are male, 75% of the swimmers should be male. So the expected value for males is .750 x 40 = 30.00. For females it is .250 x 40 = 10.00.

             Male    Female
Observed      20       20
Expected      30       10
O-E          -10       10
(O-E)²       100      100
(O-E)²/E     3.33    10.00

χ² = 13.33
df = k-1 = 2-1 = 1
χ²(1, n=40) = 13.33
Exceeds the critical value at α = .01 (6.63 for df = 1).
Reject the null hypothesis. Gender does affect who goes swimming. Women go swimming more than expected. Men go swimming less than expected.
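As a check on the swimmer example, here is a minimal sketch that builds the expected frequencies from the party proportions and then sums the cell contributions. The variable names are illustrative assumptions; the counts and proportions are the ones given in the example.

```python
# Observed swimmers and the proportions at the party (from the example).
observed = {"male": 20, "female": 20}
proportions = {"male": 0.75, "female": 0.25}

n = sum(observed.values())  # 40 swimmers

# Expected frequency = hypothesized proportion x n.
expected = {group: p * n for group, p in proportions.items()}

# Contribution of each cell: (O - E)^2 / E.
cells = {g: (observed[g] - expected[g]) ** 2 / expected[g] for g in observed}

chi_sq = sum(cells.values())
df = len(observed) - 1

print(expected)
print({g: round(v, 2) for g, v in cells.items()})
print(round(chi_sq, 2), df)
```

The female cell contributes 10.00 of the 13.33 total, which is why the conclusion focuses on women swimming more than expected.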
2 sample example
Freshmen and sophomores who like horror movies.

                        Freshmen   Sophomores
Likes horror films        150          50
Dislikes horror films     100         200
… CPE 15.2.1
Freshmen and sophomores and horror movies. There are 500 altogether: 250 freshmen and 250 sophomores. 200 (a proportion of .400) like horror films and 300 (.600) dislike them. (Proportions appear in parentheses in the margins.) Multiplying each row proportion by the column total yields the expected frequency for each cell; for the first cell, .400 x 250 = 100. (This time we use the formula: Proportion(row) x n(column) = Expected Frequency.) (EF appears in parentheses in each cell.)

                        Freshmen    Sophomores   Row n (prop.)
Likes horror films      150 (100)    50 (100)    200 (.400)
Dislikes horror films   100 (150)   200 (150)    300 (.600)
Column n                250         250          500
Computing χ²

                 Observed   Expected   O-E    (O-E)²   (O-E)²/E
Fresh Likes        150        100       50     2500     25.00
Fresh Dislikes     100        150      -50     2500     16.67
Soph Likes          50        100      -50     2500     25.00
Soph Dislikes      200        150       50     2500     16.67

χ² = 83.33
df = (C-1)(R-1) = (2-1)(2-1) = 1
χ²(1, n=500) = 83.33
Exceeds the critical value at α = .01 (6.63 for df = 1).
Reject the null hypothesis. The Fresh/Soph dimension does affect liking for horror movies. Proportionally, more freshmen than sophomores like horror movies.
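The expected frequencies in the horror movie example follow the rule "row proportion times column n". A minimal sketch of that rule, assuming the 2 x 2 table is stored as a list of rows (the layout and variable names are illustrative choices, not from the slides):

```python
# Rows: likes, dislikes. Columns: freshmen, sophomores.
observed = [
    [150, 50],
    [100, 200],
]

n = sum(sum(row) for row in observed)                # 500 students
row_props = [sum(row) / n for row in observed]       # .400 and .600
col_totals = [sum(col) for col in zip(*observed)]    # 250 and 250

# Expected frequency in each cell = row proportion x column n.
expected = [[p * c for c in col_totals] for p in row_props]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(col_totals) - 1) * (len(observed) - 1)

print(expected)
print(round(chi_sq, 2), df)
```

The result agrees with the hand computation: four cells contributing 25.00, 25.00, 16.67 and 16.67.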
The only (slightly) hard part is computing expected frequencies
In the one sample case, multiply n by the hypothetical proportion based on the random model. The random model says that the proportion in each category in the sample should be the same as the proportion in the population.
Simple Example - 100 teenagers listen to radio stations
H1: Some stations are more popular with teenagers than others.
H0: Radio stations do not differ in popularity with teenagers.
NOTE: YOU ALWAYS TEST H0
Expected frequencies are the frequencies predicted by the null hypothesis. In this case, the problem is simple because the null predicts an equal proportion of teenagers will prefer each of the four radio stations.

                   Station A   Station B   Station C   Station D
Expected Values       25          25          25          25
Observed Values       40          30          20          10

Is the observed significantly different from the expected?
Computing χ² for the radio stations

             Observed   Expected   O-E    (O-E)²   (O-E)²/E
Station A       40         25       15     225      9.00
Station B       30         25        5      25      1.00
Station C       20         25       -5      25      1.00
Station D       10         25      -15     225      9.00

χ² = 20.00
df = k-1 = (4-1) = 3
χ²(3, n=100) = 20.00, p<.01
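When the null hypothesis predicts equal proportions, the expected frequency is simply n divided by the number of categories. A minimal sketch under that assumption, using the radio station counts; equal_expected is a hypothetical helper name, and 11.34 is the tabled critical value for df = 3 at the .01 level.

```python
def equal_expected(observed):
    """Expected frequencies when the null predicts equal proportions."""
    n = sum(observed)
    k = len(observed)
    return [n / k] * k


observed = [40, 30, 20, 10]          # 100 teenagers, four stations
expected = equal_expected(observed)  # 25 per station

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_01_DF3 = 11.34  # tabled critical value, df = 3, alpha = .01

print(chi_sq, df, chi_sq > CRITICAL_01_DF3)  # 20.0 3 True
```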
Example - Admissions to Psychiatric Hospitals Close to a once/year final
H1: More students are admitted to psychiatric hospitals when it is near their final exam.
H0: Time from final exam does not have an effect on hospital admissions.
Category 1: Within 7 days of final. (11 admitted)
Category 2: Between 8 and 30 days. (24 admitted)
Category 3: Between 31 and 90 days. (69 admitted)
Category 4: More than 90 days. (96 admitted)
Psychiatric Admissions
Expected frequency = expected proportion of days * n
There are 365 days and 1 final and 200 patients admitted each year. Proportion of each kind of day computed below:
Number of days
Category 1 (within 7):
Category 2 (8-30):
Category 3 (31-90):
Category 4 (rest of year):
Expected Frequencies
To obtain expected frequencies with 200 admissions: multiply the proportion of days of each type by n=200. This time the proportions are not equal.
Days:
Category 1 (within 7):
Category 2 (8-30):
Category 3 (31-90):
Category 4 (rest of year):
Computing χ² for psychiatric admissions

Closeness to final exam   Observed   Expected   O-E   (O-E)²   (O-E)²/E
Category 1                   11          8        3      9       1.12
Category 2                   24         26       -2      4       0.15
Category 3                   69         66        3      9       0.14
Category 4                   96        100       -4     16       0.16

χ² = 1.57
df = k-1 = (4-1) = 3
χ²(3, n=200) = 1.57, n.s.
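The admissions example can be checked the same way, here taking the expected frequencies (8, 26, 66, 100) directly from the computation rather than rebuilding them from day counts. This is only a sketch: the variable names are illustrative, and 7.81 is the tabled critical value for df = 3 at the .05 level.

```python
observed = [11, 24, 69, 96]   # admissions in categories 1 to 4
expected = [8, 26, 66, 100]   # expected frequencies used in the example

# Contribution of each category: (O - E)^2 / E.
cells = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(cells)
df = len(observed) - 1

CRITICAL_05_DF3 = 7.81  # tabled critical value, df = 3, alpha = .05

print([round(c, 3) for c in cells])
print(round(chi_sq, 3), df)
print("significant" if chi_sq > CRITICAL_05_DF3 else "n.s.")
```

Summing the unrounded cells gives about 1.575, while adding the four rounded cell values gives the 1.57 reported above. Either way the statistic is far below 7.81, so the result is not significant.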
The only (slightly) hard part is computing expected frequencies
In the multi-sample case, multiply the proportion in each row by n in each column to obtain EF in each cell.
Vit C and flu study
Sixty randomly chosen participants.
Thirty get Vitamin C. Of that 30, 10 get the flu, 20 do not.
Thirty get placebo. Of that 30, 15 get the flu, 15 do not.
Expected frequency = proportion(row) x n(column)

              got flu   no flu   row n (prop.)
Vit C           10        20      30 (.500)
No Vit C        15        15      30 (.500)
Col. totals     25        35      n=60
Expected frequencies
Multiply the proportion in each row times the number in each column. Here the Vitamin C row has 30 research participants. Total n = 60. So the proportion in that row = 30/60 = .500. Same for the placebo group. Number in each column: twenty-five got influenza, so 25 x .500 = 12.50 should come from the Vitamin C group. Same for placebo. Thirty-five did not get influenza, so 35 x .500 = 17.50 of each group should not have gotten the flu.

Observed values (expected values in parentheses):
              Had influenza   No influenza
Vitamin C      10 (12.50)      20 (17.50)
Placebo        15 (12.50)      15 (17.50)

Are the observed significantly different from the expected?
Computing χ²

                   Observed   Expected   O-E     (O-E)²   (O-E)²/E
VitC-got flu          10       12.50    -2.50     6.25      .50
VitC-no flu           20       17.50     2.50     6.25      .36
Placebo-got flu       15       12.50     2.50     6.25      .50
Placebo-no flu        15       17.50    -2.50     6.25      .36

χ² = 1.72
df = (C-1)(R-1) = (2-1)(2-1) = 1
Differences are not significant
χ²(1, n=60) = 1.72, n.s.
Vit C consumption not significantly related to getting the flu in this study.
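A reusable version of the multi-sample calculation might look like the sketch below. It computes each expected frequency as row total x column total / n, which is the same quantity as row proportion x column n. The function names are hypothetical; the counts are the Vitamin C data.

```python
def expected_frequencies(table):
    """Expected count per cell: row total x column total / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Chi square and df for a rows x columns table of observed counts."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table[0]) - 1) * (len(table) - 1)
    return stat, df


# Rows: Vitamin C, placebo. Columns: got flu, no flu.
vit_c = [[10, 20], [15, 15]]

print(expected_frequencies(vit_c))
stat, df = chi_square_table(vit_c)
print(round(stat, 3), df)
```

The unrounded statistic is about 1.714; the 1.72 above comes from adding the four cell values after rounding each to two decimals. Both are well below 3.84, the critical value for df = 1 at the .05 level.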
A 3 x 4 Chi Square
Women, stress, and seating preferences (perimeter vs. interior, front vs. back).

                              Front   Front   Back    Back
                              Perim   Inter   Perim   Inter   Row n
Very Stressed Females           10      70      5       15     100
Moderately Stressed Females     15      50     10       25     100
Control Group Females           35      30     15       20     100
Column n                        60     150     30       60     n=300
Proportion in each row
nROW/n = 100/300 = .333
Expected frequencies
Women, stress, and perimeter versus interior seating preferences. Each row holds .333 of the participants, so the expected frequency in every cell of a column is .333 times that column's n.
Column 1 (Front Perim): .333 x 60 = 20
Column 2 (Front Inter): .333 x 150 = 50
Column 3 (Back Perim): .333 x 30 = 10
Column 4 (Back Inter): .333 x 60 = 20

All the expected frequencies (in parentheses):
                              Front     Front     Back      Back
                              Perim     Inter     Perim     Inter     Row n
Very Stressed Females         10 (20)   70 (50)    5 (10)   15 (20)    100
Moderately Stressed Females   15 (20)   50 (50)   10 (10)   25 (20)    100
Control Group Females         35 (20)   30 (50)   15 (10)   20 (20)    100
Column n                      60        150       30        60         300
Computing χ²

                              Observed   Expected   O-E   (O-E)²   (O-E)²/E
Very Stressed        FrontP      10         20      -10    100       5.00
                     FrontI      70         50       20    400       8.00
                     BackP        5         10       -5     25       2.50
                     BackI       15         20       -5     25       1.25
Moderately Stressed  FrontP      15         20       -5     25       1.25
                     FrontI      50         50        0      0       0.00
                     BackP       10         10        0      0       0.00
                     BackI       25         20        5     25       1.25
Control Group        FrontP      35         20       15    225      11.25
                     FrontI      30         50      -20    400       8.00
                     BackP       15         10        5     25       2.50
                     BackI       20         20        0      0       0.00

χ² = 41.00
df = (C-1)(R-1) = (4-1)(3-1) = 6
χ²(6, N=300) = 41.00
Exceeds the critical value at α = .01 (16.81 for df = 6).
Reject the null hypothesis. Stress level is related to seating position.
Interpreting the 3 x 4 result
Very stressed women avoid the perimeter and prefer the front interior. The control group prefers the perimeter and avoids the front interior.
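One way to see where the interpretation above comes from is to rank the cells by their contribution to χ². The sketch below does that for the 3 x 4 seating table; the labels and variable names are illustrative, and expected frequencies are computed as row total x column total / n.

```python
rows = ["Very stressed", "Moderately stressed", "Control"]
cols = ["FrontP", "FrontI", "BackP", "BackI"]
observed = [
    [10, 70, 5, 15],
    [15, 50, 10, 25],
    [35, 30, 15, 20],
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Contribution of each cell, keyed by (row label, column label).
contributions = {}
for i, r in enumerate(rows):
    for j, c in enumerate(cols):
        e = row_totals[i] * col_totals[j] / n
        contributions[(r, c)] = (observed[i][j] - e) ** 2 / e

chi_sq = sum(contributions.values())
df = (len(cols) - 1) * (len(rows) - 1)

# The three cells that contribute most to chi square.
top = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:3]

print(chi_sq, df)
for (r, c), value in top:
    print(r, c, value)
```

The largest contributions come from the control group's front perimeter cell and the two front interior cells for the very stressed and control groups, which is where the observed counts depart most from the expected ones.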
Summary: Different Ways of Computing the Frequencies Predicted by the Null Hypothesis
One sample
Expect subjects to be distributed equally in each cell.
OR
Expect subjects to be distributed proportionally in each cell.
OR
Expect subjects to be distributed in each cell based on prior knowledge, such as previous research.
Multi-sample
Expect subjects in different conditions to be distributed similarly to each other. To do so, find the proportion in each row and multiply by the number in each column.
Conclusion - Chi Square
Chi Square is a non-parametric statistic, used for nominal data. It plays the same role for nominal data that the F test played in single factor and factorial analysis. Chi Square compares the expected frequencies in categories to the observed frequencies in categories.
… Conclusion - Chi Square
The null hypothesis: H0: fo = fe
There is no difference between the observed frequency and the frequency predicted by the null hypothesis.
The experimental hypothesis: H1: fo ≠ fe
The observed frequency differs significantly from the frequency expected by the null hypothesis.
The end.