Lesson #29: 2 × 2 Contingency Tables
In general, contingency tables are used to present data that have been “cross-classified” by two categorical variables. We begin with a 2 × 2 table, where both variables are dichotomous.
                     Variable 2
                                        Total
Variable 1       a          b           a+b
                 c          d           c+d
Total           a+c        b+d          n = a+b+c+d

In the table, we have observed frequencies (a, b, c, and d). These can also be denoted by $O_i$, $i = 1, 2, 3, 4$.
                      Arthritis
Exercise        Yes        No        Total
High             35        91         126
Low              82       115         197
Total           117       206         323

$$OR = \frac{(35)(115)}{(91)(82)} = 0.54$$
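The odds ratio arithmetic above can be checked with a short Python sketch. The function name `odds_ratio` is illustrative, and the assignment of the four counts to cells (a = 35, b = 91, c = 82, d = 115) is an assumption inferred from the products shown in the formula.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / bc for a 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)

# Cell counts assumed from the OR formula: (35)(115) / (91)(82)
or_value = odds_ratio(35, 91, 82, 115)
print(round(or_value, 2))  # 0.54
```

An odds ratio below 1 here means the odds of arthritis are lower in the first row than in the second.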
We can also test for an association between the two variables. The null hypothesis is:
- no association between the two variables, or
- the two variables are independent, or
- the distributions of one variable are homogeneous over levels of the other
This is called a test of independence, or a test of homogeneity.
To perform the test, we first need to calculate expected frequencies, $E_i$, in each cell. This indicates how many observations we expect to see, if the null hypothesis is true. Recall that if two events are independent,

$$P(A \text{ and } B) = P(A) \times P(B)$$
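Then, “under $H_0$”,

$$P(\text{an observation being in any cell}) = P(\text{being in that row and being in that column}) = P(\text{being in that row}) \times P(\text{being in that column})$$

Thus, under $H_0$, we can estimate this by

$$\frac{\text{row total}}{n} \times \frac{\text{column total}}{n}$$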
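To get the expected number in any cell, multiply the probability of being in that cell by n:

$$E_i = n \times \frac{\text{row total}}{n} \times \frac{\text{column total}}{n} = \frac{(\text{row total})(\text{column total})}{n}$$

This is done for all 4 cells in the 2 × 2 table.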
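As a minimal sketch of this step, the expected count for each cell can be computed from the row and column totals. The helper name `expected_counts` is hypothetical, and the observed counts (35, 91, 82, 115) are assumed to be laid out as in the lesson's arthritis and exercise example.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Assumed layout: rows and columns follow the arthritis/exercise example
observed = [[35, 91], [82, 115]]
expected = expected_counts(observed)
print(round(expected[0][0], 2))  # 45.64, i.e. (126)(117) / 323
```

The expected counts have the same row and column totals as the observed counts, which is a quick sanity check on the arithmetic.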
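The test statistic is then:

$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_1 \quad \text{under } H_0$$

Reject $H_0$ if $\chi^2$ exceeds the critical value $\chi^2_{1,\,1-\alpha}$ (3.84 for $\alpha = 0.05$).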
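Observed
                Yes        No        Total
High             35        91         126
Low              82       115         197
Total           117       206         323

Expected
                Yes        No        Total
High            45.64     80.36       126
Low             71.36    125.64       197
Total          117       206          323

$$E_1 = \frac{(126)(117)}{323} = 45.64$$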
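Reject $H_0$ if $\chi^2$ exceeds the critical value of the $\chi^2_1$ distribution (3.84 at $\alpha = 0.05$).

$$\chi^2 = \frac{(35 - 45.64)^2}{45.64} + \frac{(91 - 80.36)^2}{80.36} + \frac{(82 - 71.36)^2}{71.36} + \frac{(115 - 125.64)^2}{125.64} = 6.38$$

Since 6.38 > 3.84, reject $H_0$. Arthritis is less likely among those who exercised at the high level.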
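The whole test can be sketched in a few lines of standard-library Python. The function name `pearson_chi_square`, the 0.05 significance level, and the cell layout of the counts are assumptions made for illustration; the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math

def pearson_chi_square(table):
    """Sum of (O - E)^2 / E over all cells, with E = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

observed = [[35, 91], [82, 115]]      # assumed cell layout
stat = pearson_chi_square(observed)
critical = 3.841                       # chi-square(1 df) cutoff, assuming alpha = 0.05
p_value = math.erfc(math.sqrt(stat / 2))  # upper-tail probability, valid for 1 df only
print(round(stat, 2), stat > critical)    # 6.38 True
```

No continuity correction is applied in this sketch, which matches the uncorrected statistic of 6.38.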
For a 2 × 2 table, there is a “shortcut” method:

                     Variable 2
                                        Total
Variable 1       a          b           a+b
                 c          d           c+d
Total           a+c        b+d          n = a+b+c+d

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
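                      Arthritis
Exercise        Yes        No        Total
High             35        91         126
Low              82       115         197
Total           117       206         323

$$\chi^2 = \frac{323\,[(35)(115) - (91)(82)]^2}{(126)(197)(117)(206)} = 6.38$$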
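A brief sketch can confirm that the shortcut agrees with the cell-by-cell sum of $(O_i - E_i)^2 / E_i$. Both function names are hypothetical, and the counts are again assumed to sit in cells a = 35, b = 91, c = 82, d = 115.

```python
def chi_square_shortcut(a, b, c, d):
    """2 x 2 shortcut: n (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_long(a, b, c, d):
    """Cell-by-cell sum of (O - E)^2 / E, for comparison with the shortcut."""
    n = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d)]
    return sum((o - r * k / n) ** 2 / (r * k / n) for o, r, k in cells)

shortcut = chi_square_shortcut(35, 91, 82, 115)
long_form = chi_square_long(35, 91, 82, 115)
print(round(shortcut, 2), round(long_form, 2))  # 6.38 6.38
```

The two forms are algebraically identical for a 2 × 2 table, so any difference is floating-point rounding only.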