Objectives (PSLS Chapter 22)
The chi-square test for two-way tables
    Two-way tables
    Hypotheses for the chi-square test for two-way tables
    Expected counts in a two-way table
    Conditions for the chi-square test
    Chi-square test for two-way tables
    Simpson’s paradox
Two-way tables
Two-way tables organize data about two categorical variables (factors), each with a finite number of levels/treatments. The name "two-way" refers to the fact that there are two ways to group or summarize the data.
High school students were asked whether they smoke and whether their parents smoke:
    First factor: parental smoking status
    Second factor: student smoking status

                         Student smokes    Student does not smoke
Both parents smoke            400                  1380
One parent smokes             416                  1823
Neither parent smokes         188                  1168
Marginal distribution
The marginal distributions (in the “margins” of the table) summarize each factor independently.
Marginal distribution for parental smoking:
    P(both parents) = ??/?? = ??%
    P(one parent) = ??%
    P(neither parent) = ??%
With two factors, there are two marginal distributions.
Marginal distribution for student smoking:
    P(student smokes) = ??/?? = ??%
    P(student doesn’t) = ??/?? = ??%
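As a rough sketch of how the two marginal distributions can be computed, the snippet below uses the six counts from the smoking table. The variable names (`counts`, `parent_marginal`, `student_marginal`) are illustrative choices, not part of the slides.

```python
# Counts from the two-way table: rows are parental smoking status,
# each tuple is (student smokes, student does not smoke).
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

total = sum(sum(row) for row in counts.values())

# Marginal distribution for parental smoking: row total / table total.
parent_marginal = {label: sum(row) / total for label, row in counts.items()}

# Marginal distribution for student smoking: column total / table total.
col_totals = [sum(col) for col in zip(*counts.values())]
student_marginal = {
    "student smokes": col_totals[0] / total,
    "student doesn't": col_totals[1] / total,
}

for label, p in {**parent_marginal, **student_marginal}.items():
    print(f"P({label}) = {p:.1%}")
```

Each marginal distribution sums to 100%, which is a quick check on the arithmetic.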
Conditional distribution
The cells of the two-way table represent the intersection of a given level of one factor with a given level of the other factor. They can be used to compute the conditional distributions.
Conditional distribution of student smoking for different parental smoking statuses:
    P(student smokes | both parents) = ??/?? = ??%
    P(student smokes | one parent) = ??/?? = ??%
    P(student smokes | neither parent) = ??/?? = ??%
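A minimal sketch of the conditional distributions, assuming the same six counts from the smoking table: each conditional probability is a cell count divided by its row total. The names used here are illustrative only.

```python
# Rows are parental smoking status; each tuple is
# (student smokes, student does not smoke).
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

# P(student smokes | parental status) = cell count / row total.
p_smokes_given = {
    label: smokes / (smokes + does_not)
    for label, (smokes, does_not) in counts.items()
}

for label, p in p_smokes_given.items():
    print(f"P(student smokes | {label}) = {p:.1%}")
```

If the two factors were unrelated, these three conditional percentages would be roughly equal.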
Hypotheses A two-way table has r rows and c columns. H0: There is no association between the row and column variables. Ha: There is an association/relationship between the two variables. The null hypothesis can also be stated as, “The row and column variables are independent.” We will compare actual counts from the sample data with expected counts given the null hypothesis of no relationship.
Expected counts in a two-way table
A two-way table has r rows and c columns. H0 states that there is no association between the row and column variables (factors) in the table.
The expected count in any cell of a two-way table when H0 is true is:

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Compute an expected count for each cell INSIDE the table (excluding the margin totals).
Do not round the expected counts beyond reasonable rounding; the expected counts do not need to be whole numbers because they are only theoretical expectations.
Conditions for the chi-square test
The chi-square test for two-way tables looks for evidence of association between two categorical variables (factors) in sample data. The samples can be drawn either:
    By randomly selecting SRSs from different populations (or from a population subjected to different treatments)
        girls vaccinated for HPV (human papillomavirus) or not, among 8th-graders and 12th-graders
        remission or no remission for different treatments
    Or by taking one SRS and classifying the individuals according to two categorical variables (factors)
        obesity and ethnicity among high school students
We can safely use the chi-square test when:
    very few (no more than 1 in 5) expected counts are < 5.0
    all expected counts are ≥ 1.0
[Note: If one factor has many levels and too many expected counts are too low, you might be able to “collapse” some of the levels (regroup them) and thus have large enough expected counts.]
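The two conditions above can be checked mechanically once the expected counts are known. The helper below is a sketch; the function name `chi_square_conditions_ok` and the small example tables are hypothetical, made up only to exercise the rule.

```python
def chi_square_conditions_ok(expected):
    """Return True if a table of expected counts satisfies both conditions:
    no more than 1 in 5 expected counts below 5.0, and all at least 1.0."""
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < 5.0)
    return min(cells) >= 1.0 and n_small <= len(cells) / 5


# Hypothetical expected counts, for illustration only.
one_small_cell = [[12.0, 8.0], [4.5, 9.0], [7.0, 6.2]]    # 1 of 6 cells below 5
two_small_cells = [[12.0, 8.0], [4.5, 3.0], [7.0, 6.2]]   # 2 of 6 cells below 5
cell_below_one = [[12.0, 8.0], [0.8, 9.0], [7.0, 6.2]]    # one cell below 1.0

print(chi_square_conditions_ok(one_small_cell))
print(chi_square_conditions_ok(two_small_cells))
print(chi_square_conditions_ok(cell_below_one))
```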
The chi-square test for two-way tables
H0: there is no association between the row and column variables
Ha: H0 is not true
The χ² statistic is summed over all r × c cells in the table:

$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

When H0 is true, the χ² statistic follows approximately a χ² distribution with (r − 1)(c − 1) degrees of freedom. Use Table D or technology to obtain the P-value.
P-value: P(χ² variable ≥ calculated χ²)
Expected counts computation
Each cell shows the observed count, with the expected count computed below it.

                      Student smokes               Student does not smoke        Row total
Both parents smoke    400                          1380                          1780
                      1780*1004/5375 = 332.49      1780*4371/5375 = 1447.51
One                   416                          1823                          2239
                      2239*1004/5375 = 418.22      2239*4371/5375 = 1820.78
Neither               188                          1168                          1356
                      1356*1004/5375 = 253.29      ??
Column total          1004                         4371                          5375
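The same computation can be sketched in a few lines: every expected count is row total × column total / table total. The variable names are illustrative, and the loop also prints the cell left as an exercise in the table.

```python
# Observed counts: rows are both / one / neither parent smoking,
# columns are (student smokes, student does not smoke).
observed = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under H0 = row total * column total / table total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print("  ".join(f"{e:8.2f}" for e in row))
```

The expected counts in each row add up to the observed row total, and likewise for columns, which is a useful sanity check.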
Chi-square Stat computation
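One possible way to carry out this computation for the smoking data is sketched below. It sums (observed − expected)²/expected over the six cells. For the P-value it relies on the fact that, with exactly 2 degrees of freedom, the upper tail probability of a χ² distribution is exp(−χ²/2); that shortcut applies only to df = 2 and stands in for Table D or statistical software.

```python
import math

observed = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all r x c cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp_count = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp_count) ** 2 / exp_count

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed-form upper tail, valid only because df == 2 here.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p_value:.2e}")
```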
Influence of parental smoking
Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits; columns are the students’ smoking habits). What does it tell you?
    Sample size?
    Hypotheses?
    Are the data okay for a χ² test?
    Interpretation?
Sample size: N = 5375.
Hypotheses: H0: there is no association between parental smoking habits and student smoking habits; Ha: H0 is not true.
Conditions: the data are an SRS and the expected counts are all greater than 5, so a chi-square test is okay.
Interpretation: there is strong evidence of an association between parental and student smoking habits (P < 0.001). In particular, students whose parents do not smoke are less likely to smoke, whereas students with both parents smoking are more likely to smoke, than we would expect if there were no relationship.
Table D
Example: for df = 6, if χ² = 15.9, the P-value is between 0.01 and 0.02.
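The Table D bracket can be checked without a table when the degrees of freedom are even, because the χ² upper tail then has a finite closed form: P(χ² ≥ x) = exp(−x/2) · Σ (x/2)^i / i! for i from 0 to df/2 − 1. The function below is a sketch of that formula; its name, `chi2_upper_tail_even_df`, is a hypothetical choice, and it deliberately rejects odd df, where this formula does not apply.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square variable >= x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_upper_tail_even_df(15.9, 6)
print(f"P-value for chi-square = 15.9 with df = 6: {p:.4f}")
```

The result falls inside the 0.01 to 0.02 bracket read from Table D.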
Cocaine addiction
Cocaine produces short-term feelings of physical and mental well-being. To maintain the effect, the drug may have to be taken more frequently and at higher doses. After stopping use, users will feel tired, sleepy, and depressed.
A study compares the rates of successful rehabilitation for cocaine addicts following one of three treatment options:
    1. antidepressant treatment (desipramine)
    2. standard treatment (lithium)
    3. placebo (“sugar pill”)
Art from http://www.drugabuse.gov/PDF/RRCocaine.pdf
Expected relapse counts
Under H0, the expected percent with no relapse is the same in every group: 26/74 ≈ 35%.

               Relapse: No                    Relapse: Yes
Desipramine    25*26/74 ≈ 8.78 (25*0.351)     16.22 (25*0.649)
Lithium        9.13 (26*0.351)                16.87 (26*0.649)
Placebo        8.07 (23*0.351)                14.93 (23*0.649)
Table of counts: “actual/expected,” with three rows and two columns:

               No relapse      Relapse
Desipramine    15 / 8.78       10 / 16.22
Lithium         7 / 9.13       19 / 16.87
Placebo         4 / 8.07       19 / 14.93

df = (3 − 1)(2 − 1) = 2
We compute the χ² statistic as the sum of (observed − expected)²/expected over the six cells:

$$\chi^2 = \frac{(15-8.78)^2}{8.78} + \frac{(10-16.22)^2}{16.22} + \frac{(7-9.13)^2}{9.13} + \frac{(19-16.87)^2}{16.87} + \frac{(4-8.07)^2}{8.07} + \frac{(19-14.93)^2}{14.93} \approx 10.7$$

Using Table D: 10.60 < χ² < 11.98, so ?? > P > ??
The P-value is very small (software gives P = 0.0047) and we reject H0. There is a significant relationship between treatment type (desipramine, lithium, placebo) and outcome (relapse or not).
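The test for the cocaine study can be reproduced with a short sketch like the one below. It computes the expected counts from the margins without rounding, so the statistic may differ slightly in the second decimal from a hand calculation with rounded expected counts. The P-value again uses the df = 2 shortcut exp(−χ²/2), which is an assumption tied to this table's dimensions rather than a general method.

```python
import math

# Observed counts: rows are desipramine, lithium, placebo;
# columns are (no relapse, relapse).
observed = [
    [15, 10],
    [7, 19],
    [4, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (obs - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, obs in enumerate(row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper tail of the chi-square distribution; this closed form needs df == 2.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
```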
Interpreting the χ² output
When the χ² test is statistically significant:
    The largest components indicate which condition(s) are most different from H0.
    You can also compare the observed and expected counts, or compare the computed proportions in a graph.
[Table of χ² components: rows Desipramine, Lithium, Placebo; columns No relapse, Relapse]
The largest χ² component, 4.41, is for desipramine/no relapse. Desipramine has the highest success rate (see graph).
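To see where the χ² total comes from, the individual components can be listed cell by cell, as in this sketch. With unrounded expected counts the desipramine/no relapse component comes out near 4.40; the 4.41 quoted above is what results when the expected count is first rounded to 8.78. Names such as `components` are illustrative.

```python
treatments = ["Desipramine", "Lithium", "Placebo"]
outcomes = ["No relapse", "Relapse"]
observed = [
    [15, 10],
    [7, 19],
    [4, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# One component per cell: (observed - expected)^2 / expected.
components = {}
for i, treatment in enumerate(treatments):
    for j, outcome in enumerate(outcomes):
        expected = row_totals[i] * col_totals[j] / n
        components[(treatment, outcome)] = (observed[i][j] - expected) ** 2 / expected

largest_cell = max(components, key=components.get)

for cell, value in components.items():
    print(f"{cell[0]:12s} {cell[1]:11s} {value:.2f}")
print("Largest component:", largest_cell)
```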
A 2013 Gallup study investigated how phrasing affects the opinions of Americans regarding physician-assisted suicide. Telephone interviews were conducted with a random sample of 1,535 national adults. Using random assignment, 719 heard the question in Form A (“end the patient’s life by some painless means”) and 816 heard the question in Form B (“assist the patient to commit suicide”). The chi-square test statistic for these data is χ² = 57.88. Conclude using α = 0.05.
    A. There is significant evidence of a relationship between opinions and question wording.
    B. We failed to find significant evidence of a relationship between opinions and question wording.
    C. The test assumptions are not met.
Answer: A. With such a large test statistic, the P-value is very small (software gives P = 3E-13) and highly significant: we reject the null hypothesis of no relationship between phrasing and opinions. This is a comparative randomized experiment and the expected counts are all greater than 5.0 (the smallest is 719×55/1535 = 25.76), so the test assumptions are met.
We found that phrasing significantly (P < 0.0005) influences opinions about physician-assisted suicide. Specifically, the phrasing of “painless means” resulted in a substantially higher approval (70% in favor) than the phrasing of “commit suicide” (51% in favor). This is a comparative randomized experiment, therefore a causal conclusion is appropriate.
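The two numbers quoted in the answer can be checked with a short sketch. The full table of opinions is not reproduced in the text, so the code ASSUMES the table has 2 degrees of freedom (for example, two question forms by three opinion categories); that assumption is consistent with the reported P = 3E-13 but is not stated in the source. Under it, the upper tail is exp(−χ²/2).

```python
import math

chi2 = 57.88
alpha = 0.05

# ASSUMPTION: df = 2 (the table dimensions are not given in the text).
assumed_df = 2
p_value = math.exp(-chi2 / 2)   # closed form valid only for df == 2

# Smallest expected count quoted in the answer:
# row total * column total / table total.
smallest_expected = 719 * 55 / 1535

print(f"P-value (assuming df = {assumed_df}): {p_value:.1e}")
print(f"Smallest expected count: {smallest_expected:.2f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```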
Caution with categorical data
Beware of lurking variables! An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.
Kidney stones
A study compared the success rates of two different procedures for removing kidney stones: open surgery and percutaneous nephrolithotomy (PCNL), a minimally invasive technique.

              Open surgery    PCNL
Success           273          289
Failure            77           61
% failure         22%          17%

Can you think of a possible lurking variable here?
The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is most likely used for smaller stones (with a good chance of success) whereas open surgery is likely used for more problematic conditions.
For both small stones and large stones, open surgery has a lower failure rate. This is Simpson’s paradox. The more challenging cases with large stones tend to be treated more often with open surgery, making it appear as if the procedure were less reliable overall.
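The reversal can be made concrete with a small sketch. The breakdown by stone size is not reproduced in the text above, so the per-size counts below are an assumption: they are the figures commonly reported for this kidney stone study, and they do add up to the combined totals given earlier (273 successes and 77 failures for open surgery, 289 and 61 for PCNL).

```python
# (successes, failures) by stone size and procedure.
# ASSUMPTION: per-size counts are not in the slides; they are the
# commonly cited figures and match the combined totals in the text.
outcomes = {
    "small stones": {"open surgery": (81, 6), "PCNL": (234, 36)},
    "large stones": {"open surgery": (192, 71), "PCNL": (55, 25)},
}


def failure_rate(successes, failures):
    return failures / (successes + failures)


# Failure rates within each stone size.
for size, by_procedure in outcomes.items():
    for procedure, (s, f) in by_procedure.items():
        print(f"{size:13s} {procedure:13s} failure rate = {failure_rate(s, f):.1%}")

# Failure rates after combining the two stone sizes.
combined = {}
for procedure in ("open surgery", "PCNL"):
    s = sum(outcomes[size][procedure][0] for size in outcomes)
    f = sum(outcomes[size][procedure][1] for size in outcomes)
    combined[procedure] = (s, f)
    print(f"{'combined':13s} {procedure:13s} failure rate = {failure_rate(s, f):.1%}")
```

Within each stone size open surgery fails less often, yet in the combined table it fails more often, because most of its cases are the harder large stones.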