Chi-square: Comparing Observed and Expected Counts
Scientific Practice
Where We Are / Where We Are Going
Most of what we have done so far has looked at variables and how they vary in relation to something else:
  BP when a subject takes a drug (eg paired t-test): what was the effect of the drug on mean BP?
  Lung function in association with another variable (correlation/regression): how does carbon dioxide affect Minute Volume?
A different approach puts subjects into ‘bins’ based on an overall outcome, eg ‘the patient died’.
  We can then compare the size of each bin in the actual (observed) data with the size we expected.
Proportions: Observed vs Expected
Here’s an example from Intuitive Biostatistics:
  On average, 10% of patients die following a particularly risky operation.
  This month, 16 out of 75 died. Is this a ‘real change’ or just ‘coincidence’?
  It looks like a real change, but we cannot be certain; what we can do is estimate the probability of seeing a proportion of 16/75 (21.3%) if the background average were still 10%.
We need to compare what we observed (16/75) with what we expected (7.5/75, ie 10%), which we can show as a table:
Proportions: Observed vs Expected
            Observed   Expected (10%)
  Alive        59          67.5
  Dead         16           7.5
  Total        75          75
Step 1: The Null Hypothesis
  There has been no change in the proportion dying.
Step 2: Generate the Test Statistic, Chi-square (χ²)
  First work out what we would expect under the Null Hypo, then
  $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
  The less well the data fit the expected pattern, the bigger χ² gets.
Proportions: Observed vs Expected
Step 2: Generate the Test Statistic, Chi-square (χ²)
  $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
  $\chi^2 = \frac{(59 - 67.5)^2}{67.5} + \frac{(16 - 7.5)^2}{7.5} = 1.07 + 9.63 = 10.7$
Step 3: Calculate the probability
  Use a table of critical values of χ²:
    columns = levels of significance (p-value, or α)
    rows = degrees of freedom
  α = 0.05
  df = categories - 1 = 2 - 1 = 1
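The arithmetic in Step 2 can be checked with a few lines of Python. This is a minimal sketch, not part of the original material: `chi_square_statistic` is a hypothetical helper name, and the expected counts are built from the 10% background death rate assumed under the Null Hypo.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Risky-operation example: 75 patients, 10% expected to die under the Null Hypo
n_patients = 75
p_death_null = 0.10
observed = [59, 16]  # [alive, dead]
expected = [n_patients * (1 - p_death_null), n_patients * p_death_null]

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # categories - 1

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The helper works for any number of categories, as long as the observed and expected lists are in the same order.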
Proportions: Observed vs Expected
Critical value for χ² (α = 0.05, df = 1) = 3.84
Our value for χ² = 10.70
So p < 0.05. In fact, the critical value for p = 0.0025 is 9.14, and ours is 10.70, so p < 0.0025.
Step 4: Interpret the probability
  If the underlying rate of 10% still applied, the probability of seeing a proportion dying at least as far from 10% as ours (21.3%) would be less than 0.25% (p < 0.0025).
  So we reject the Null Hypo in favour of the Alt Hypo: that some factor other than chance is responsible.
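For df = 1 the critical values and the p-value can be reproduced without a printed table, because a χ² variable with one degree of freedom is the square of a standard normal Z. The sketch below relies on that fact and uses only the Python standard library; the function names are hypothetical. For other degrees of freedom a general routine, such as `scipy.stats.chi2.sf`, would be needed instead.

```python
import math
from statistics import NormalDist


def chi2_df1_p_value(chi_sq):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    Chi-square with df = 1 is Z squared, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_sq / 2))


def chi2_df1_critical_value(alpha):
    """Critical value c with P(chi2 > c) = alpha, for df = 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed normal cut-off
    return z ** 2


for alpha in (0.05, 0.0025, 0.001):
    print(f"alpha = {alpha}: critical chi-square = {chi2_df1_critical_value(alpha):.2f}")

chi_sq = 10.70  # value from the risky-operation example
print(f"p = {chi2_df1_p_value(chi_sq):.4f}")
```

This reproduces the 3.84 and 9.14 table values used above (and 10.83 for α = 0.001), and gives a p-value for χ² = 10.70 that lies between 0.001 and 0.0025.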
Comparing Two Proportions
The example above looked at just one category with two ‘states’ (dead and alive).
But what if we wanted to include male and female in our investigation, eg because we suspect more men than women are dying?
χ² can be used to do this via a 2 x 2 table.
Example: disease progression in AZT vs placebo patients
             Progression   No prog   Total
  AZT             76          399      475
  Placebo        129          332      461
  Total          205          731      936
Comparing Two Proportions
Step 1: The Null Hypothesis
  There is no difference in disease progression between AZT and placebo.
Step 2: Generate the Test Statistic
  First we need to work out what we would expect if the Null Hypo were true.
Comparing Two Proportions
Overall, disease progression is 205/936 = 21.9%, so if the Null Hypo is true we’d expect:
  21.9/100 x 475 = 104 AZT patients to progress
  21.9/100 x 461 = 101 Placebo patients to progress
That leaves:
  475 - 104 = 371 AZT patients not progressing
  461 - 101 = 360 Placebo patients not progressing
(Equivalently, each expected count = row total x column total / grand total, eg 475 x 205 / 936 = 104.)
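The ‘row total x column total / grand total’ shortcut can be written as a small function. This is a minimal Python sketch; `expected_counts` is a hypothetical name, and the table is laid out as rows = AZT, Placebo and columns = progression, no progression.

```python
def expected_counts(table):
    """Expected count for each cell under the Null Hypo:
    row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: AZT, Placebo; columns: progression, no progression
observed = [[76, 399],
            [129, 332]]

for label, row in zip(["AZT", "Placebo"], expected_counts(observed)):
    print(label, [round(x, 1) for x in row])
```

Rounding to whole numbers gives the 104, 371, 101 and 360 used on these slides; the unrounded values differ slightly (eg about 104.03 rather than 104).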
Comparing Two Proportions
OBSERVED
             Progression   No prog   Total
  AZT             76          399      475
  Placebo        129          332      461
  Total          205          731      936
EXPECTED
             Progression   No prog   Total
  AZT            104          371      475
  Placebo        101          360      461
Comparing Two Proportions
Step 2: Generate the Test Statistic, Chi-square (χ²)
  $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
  $\chi^2 = \frac{(76 - 104)^2}{104} + \frac{(399 - 371)^2}{371} + \frac{(129 - 101)^2}{101} + \frac{(332 - 360)^2}{360}$
  $\chi^2 = 7.538 + 2.113 + 7.762 + 2.178 = 19.59$
Comparing Two Proportions
Step 3: Calculate the probability
  Use a table of critical values of χ²:
    columns = levels of significance (p-value, or α)
    rows = degrees of freedom
  α = 0.05
  df = (rows - 1) x (cols - 1) = (2 - 1) x (2 - 1) = 1
  Critical value for χ² = 3.84
  Our value for χ² = 19.59
  So p < 0.05. In fact, the critical value for p = 0.001 is 10.83, and ours is 19.59, so p < 0.001.
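Putting Steps 2 and 3 together for the AZT table, a minimal standard-library sketch might look like this (`chi_square_2x2` is a hypothetical function name, and no continuity correction is applied). It uses unrounded expected counts, so the statistic comes out at about 19.64, a little higher than a hand calculation with expected counts rounded to whole numbers; the conclusion, p < 0.001, is unchanged. Some software applies Yates’ continuity correction to 2 x 2 tables by default (eg `scipy.stats.chi2_contingency` unless called with `correction=False`), which gives a smaller value.

```python
import math


def chi_square_2x2(table):
    """Chi-square test for a 2 x 2 table of counts, without any continuity
    correction. Returns (statistic, df, p_value)."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2 x 2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the Null Hypo: row total x col total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (2 - 1) * (2 - 1)  # (rows - 1) x (cols - 1)
    # With df = 1, chi-square is Z squared, so P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, df, p_value


# Rows: AZT, Placebo; columns: progression, no progression
azt_table = [[76, 399],
             [129, 332]]
stat, df, p = chi_square_2x2(azt_table)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.2g}")
```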
Comparing Two Proportions
Step 4: Interpret the probability
  If the Null Hypo is true (no difference in disease progression between AZT and placebo), there is a less than 0.1% chance of observing a pattern of progression at least as extreme as the one we actually saw.
  So we reject the Null Hypo in favour of the Alt Hypo: that AZT produces a different progression rate.
  And the difference is a reduction: 76/475 = 16.0% progressed on AZT vs 129/461 = 28.0% on placebo. Phew!
Chi-square in Minitab
  C1 = AZT, C2 = Placebo
  Row 1 = Progression, Row 2 = No prog
Summary
Where counts can be put into ‘bins’, tests of proportions can be carried out.
One such test is the chi-square (χ²) test.
It involves working out what we would expect to see if the Null Hypo applied, then calculating
  $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
The bigger χ² is, the less likely it is that data this far from the ‘expected’ pattern would arise by chance if the Null Hypo were true.
A common use of χ² is the 2 x 2 table.