Two-dimensional Chi-square

Sometimes we want to classify cases on two dimensions at the same time. For example, we might classify newly qualified physicians by their choice of type of practice and by their sex. We can then ask whether there is any relationship between the two: are women and men equally likely to choose each type of practice?
If we classify a set of cases on two dimensions, and the two dimensions are independent of each other, then the proportions of cases in the categories of one dimension should be the same in every category of the other dimension. Thus, if choice of type of medical practice is independent of sex, then the proportions of men choosing the various types of practice should be the same as the proportions of women choosing them.
                       Specialty
Sex        Rural GP    City GP Specialist     Σ
Male              5         20         15    40
Female           20         80         60   160

In this data set, there are four times as many women in the sample as men, and there are also four times as many women as men in each specialty. Thus, choice of specialty appears to be independent of sex.
                       Specialty
Sex        Rural GP    City GP Specialist     Σ
Male             16        100         44   160
Female            4         25         11    40

In this data set, there are four times as many men as women, but again the proportions are constant across specialties. Again, choice of specialty appears to be independent of sex.
                       Specialty
Sex        Rural GP    City GP Specialist     Σ
Male             20         45         35   100
Female            5         65         30   100

In this data set, there are equal numbers of women and men, but the proportions vary across specialties. Thus, choice of specialty appears to depend on sex.
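The "same proportions" argument can be checked mechanically. The sketch below is a minimal Python example. The labels A, B and C for the three data sets are our own hypothetical names, not from the slides. It computes each row's proportions exactly with `fractions.Fraction`, so that equal proportions compare as exactly equal.

```python
from fractions import Fraction

def row_proportions(table):
    """Return each row's counts as exact fractions of that row's total."""
    return [[Fraction(count, sum(row)) for count in row] for row in table]

# Rows are Male, Female; columns are Rural GP, City GP, Specialist.
table_a = [[5, 20, 15], [20, 80, 60]]    # four times as many women
table_b = [[16, 100, 44], [4, 25, 11]]   # four times as many men
table_c = [[20, 45, 35], [5, 65, 30]]    # equal numbers of women and men

for name, table in [("A", table_a), ("B", table_b), ("C", table_c)]:
    male, female = row_proportions(table)
    print(name, "same row proportions" if male == female else "different row proportions")
```

Data sets A and B give identical male and female proportions. Data set C does not, which is the pattern the chi-square test formalizes.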
The null hypothesis in the two-dimensional chi-square test is that the two dimensions are not related (that is, they are independent). To test this hypothesis, we need to compute an expected value for each of the cells defined by the two dimensions. In the last data set, there were 25 rural GPs (20 men and 5 women). Because that sample contains equal numbers of men and women, if type of practice were independent of sex, then half of the rural GPs should be men and half women.
Our expected values reflect two proportions: the proportion of the sample in each sex category and the proportion in each practice category:

                       Specialty
Sex        Rural GP    City GP Specialist     Σ
Male           12.5         55       32.5   100
Female         12.5         55       32.5   100
Σ                25        110         65   200
We'll step through the calculation for one cell, male rural GPs. First, note the sum of all the observations, $n = 200$. Then note the number of males (the row total), 100, and the number of rural GPs (the column total), 25. The expected number of male rural GPs is calculated as:

$$\frac{100 \times 25}{200} = 12.5$$
Thus, expected values are computed as:

$$\text{Expected value} = \frac{\text{row total} \times \text{column total}}{\text{sum of observations}}$$

If you can do that, you can do the two-dimensional chi-square.
For the physicians example, we compute:

$$\chi^2 = \frac{(20-12.5)^2}{12.5} + \frac{(5-12.5)^2}{12.5} + \frac{(45-55)^2}{55} + \frac{(65-55)^2}{55} + \frac{(35-32.5)^2}{32.5} + \frac{(30-32.5)^2}{32.5} \approx 13.021$$
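The sum above can be reproduced with a short Python sketch. The observed counts come from the third data set, and the expected counts are copied from the expected-value table. The variable names are our own.

```python
# Observed counts (rows: Male, Female; columns: Rural GP, City GP, Specialist)
observed = [[20, 45, 35], [5, 65, 30]]
# Expected counts under independence, copied from the expected-value table
expected = [[12.5, 55, 32.5], [12.5, 55, 32.5]]

# Sum (O - E)^2 / E over every cell
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(round(chi_square, 4))  # 13.021
```

The exact value is 1862/143 ≈ 13.02098. This is why 13.021 is the correctly rounded figure.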
For the two-dimensional chi-square, the degrees of freedom are $(r-1)(c-1)$, where $r$ is the number of rows and $c$ is the number of columns. Here $r = 2$ and $c = 3$, so d.f. $= 1 \times 2 = 2$. Thus $\chi^2_{crit} = \chi^2_{(.05,\,2)} \approx 5.991$. Since $\chi^2_{obt} \approx 13.021$ exceeds $\chi^2_{crit}$, our decision is to reject the null hypothesis (that the two dimensions are independent).
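For exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form $P(X > x) = e^{-x/2}$. The critical value can therefore be checked without a table: it is $-2 \ln \alpha$. The sketch below assumes this special case only. For other degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` is the usual route.

```python
import math

alpha = 0.05
rows, cols = 2, 3
df = (rows - 1) * (cols - 1)  # (r - 1)(c - 1) = 2

# With 2 d.f., P(X > x) = exp(-x / 2), so solve exp(-x / 2) = alpha for x.
chi_square_crit = -2 * math.log(alpha)

chi_square_obt = 13.021  # statistic from the physicians example
print(df, round(chi_square_crit, 3))  # 2 5.991
print("reject H0" if chi_square_obt > chi_square_crit else "do not reject H0")  # reject H0
```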
Formula for computing expected values

More generally, the rule for working out expected values in two-dimensional classifications is:

$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$

where $r_i$ is the total of row $i$, $c_j$ is the total of column $j$, and $n$ is the total number of observations (cases in the sample).
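The general rule translates directly into code. Below is a minimal Python sketch; the function name `expected_counts` is our own. It computes the row totals $r_i$, the column totals $c_j$ and the grand total $n$, then fills every cell with $r_i c_j / n$.

```python
def expected_counts(observed):
    """Expected cell counts under independence: E(n_ij) = r_i * c_j / n."""
    row_totals = [sum(row) for row in observed]        # r_i
    col_totals = [sum(col) for col in zip(*observed)]  # c_j
    n = sum(row_totals)                                # total observations
    return [[r * c / n for c in col_totals] for r in row_totals]

# Physicians example: rows Male, Female; columns Rural GP, City GP, Specialist
physicians = [[20, 45, 35], [5, 65, 30]]
print(expected_counts(physicians))
# [[12.5, 55.0, 32.5], [12.5, 55.0, 32.5]]
```

The output reproduces the expected-value table for the physicians example.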
Chi-square – Example 1 (from last week)

At a recent meeting of the Coin Flippers Society, each member flipped three coins simultaneously, and the number of tails was recorded.

1b. Subsequently, the number of tails each member flipped was recorded separately for coins of different values. The data on the next slide show the number of members throwing each number of tails, for each coin value.
Chi-square – Example 1b

Coin value    Number of tails
                   0       1       2       3
.05               20      55      72      15
.10               24      70      70      24
.25               21      57      52      20

Is there evidence that the number of tails is affected by coin value? (α = .05)
$H_0$: The two classifications are independent.
$H_A$: The two classifications are dependent.
Test statistic:

$$\chi^2 = \sum_{i,j} \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$

Degrees of freedom: $(3-1)(4-1) = 6$.
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2_{(.05,\,6)} = 12.5916$
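The table value 12.5916 can also be recovered numerically. For an even number of degrees of freedom $2k$, the upper tail is $P(X > x) = e^{-x/2} \sum_{i=0}^{k-1} (x/2)^i / i!$. Because this tail probability decreases as $x$ grows, bisection finds the critical value. The sketch below makes several assumptions. The function names are hypothetical. It handles only even d.f., and it assumes the answer lies below 100.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of d.f."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def critical_value(alpha, df, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (the tail shrinks as x grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid  # tail still larger than alpha: critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2

df = (3 - 1) * (4 - 1)  # 3 coin values, 4 possible numbers of tails
print(df, round(critical_value(0.05, df), 4))  # 6 12.5916
```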
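The first step is to compute the expected value for each cell, using the formula:

$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$

For the top-left cell (.05 coins, 0 tails), the column total is 65, the row total is 162 and the grand total is 500, so we get:

$$\hat{E}(n_{11}) = \frac{(65)(162)}{500} = 21.06$$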
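The margins used in the top-left calculation are not shown in the data table. The Python sketch below computes them from the observed counts and then reproduces the 21.06. The variable names are our own.

```python
# Example 1b: rows are coin values (.05, .10, .25); columns are 0, 1, 2, 3 tails
observed = [
    [20, 55, 72, 15],
    [24, 70, 70, 24],
    [21, 57, 52, 20],
]

row_totals = [sum(row) for row in observed]        # r_i
col_totals = [sum(col) for col in zip(*observed)]  # c_j
n = sum(row_totals)                                # grand total

print(row_totals, col_totals, n)  # [162, 188, 150] [65, 182, 194, 59] 500
# Top-left cell: .05 coins with 0 tails
print(round(row_totals[0] * col_totals[0] / n, 2))  # 21.06
```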
Using the formula for all the other cells gives:

Coin value    Number of tails
                   0       1       2       3
.05            21.06   58.97   62.86   19.12
.10            24.44   68.43   72.94   22.18
.25            19.50   54.60   58.20   17.70

We are now ready to compute $\chi^2_{obt}$.
$$\chi^2_{obt} = \frac{(20-21.06)^2}{21.06} + \cdots + \frac{(20-17.70)^2}{17.70} \approx 4.028$$

Decision: since $4.028 < 12.5916$, do not reject $H_0$. There is no evidence that the number of tails is affected by coin value.
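The whole Example 1b test can be run end to end. The sketch below is a minimal Python example; `chi_square_independence` is a hypothetical helper of our own. It computes the expected table, the statistic and the degrees of freedom, and then compares the statistic with the critical value 12.5916 from the slide. Working from the exact expected values gives a statistic of about 4.028.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return stat, df, expected

coins = [[20, 55, 72, 15], [24, 70, 70, 24], [21, 57, 52, 20]]
stat, df, expected = chi_square_independence(coins)
print(round(expected[0][1], 2), round(stat, 3), df)  # 58.97 4.028 6
print("reject H0" if stat > 12.5916 else "do not reject H0")  # do not reject H0
```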
Chi-square – Example 2b

There is an “old wives’ tale” that babies are not born at random times of day but are more likely to be born in the middle of the night, specifically between 1 AM and 5 AM. To investigate this, a researcher collects birth-time data from a large maternity hospital. The day is divided into four parts: Morning (5 AM to 1 PM), Mid-day (1 PM to 5 PM), Evening (5 PM to 1 AM), and Mid-night (1 AM to 5 AM).
The numbers of births at these times for the last three months (January to March) are shown below:

    Morning    Mid-day    Evening  Mid-night
        110         50        100        100
One can reasonably ask whether the pattern reported above is peculiar to births in the winter months or also holds at other times of the year. The data obtained from the same hospital during the hottest summer months last year are shown on the next slide, along with the original data.
                    Morning    Mid-day    Evening  Mid-night     Σ
Cold (Jan–Mar)          110         50        100        100   360
Hot (summer)             90         40         80         70   280
Σ                       200         90        180        170   640

Are the two patterns different? (α = .05)
$H_0$: The two classifications are independent.
$H_A$: The two classifications are dependent.
Test statistic:

$$\chi^2 = \sum_{i,j} \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$

Degrees of freedom: $(2-1)(4-1) = 3$.
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2_{(.05,\,3)} = 7.81$
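The first step is to compute the expected value for each cell, using the formula:

$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$

For the top-left cell (cold months, morning), the column total is 200, the row total is 360 and the grand total is 640, so we get:

$$\hat{E}(n_{11}) = \frac{(200)(360)}{640} = 112.5$$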
Using the formula for the other cells, we get:

          Morning    Mid-day    Evening  Mid-night
Cold        112.5     50.625     101.25     95.625
Hot          87.5     39.375      78.75     74.375
$$\chi^2_{obt} = \frac{(110-112.5)^2}{112.5} + \cdots + \frac{(70-74.375)^2}{74.375} \approx 0.6374$$

Decision: since $0.6374 < 7.81$, do not reject $H_0$. There is no evidence that the pattern of births in the hot summer months differs from the pattern in the winter months (January to March).
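Instead of comparing the statistic with a critical value, one can compute a p-value. For exactly 3 degrees of freedom, the upper-tail probability has the closed form $P(X > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$. The Python sketch below assumes that special case; `chi2_upper_tail_df3` is a hypothetical name of our own. It takes the statistic 0.6374 from the slide and uses only the standard `math` module.

```python
import math

def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

chi_square_obt = 0.6374  # statistic from Example 2b
p_value = chi2_upper_tail_df3(chi_square_obt)
print(round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")  # do not reject H0
```

The p-value comes out near 0.89, far above α = .05. This agrees with the decision reached using the critical value 7.81.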