1
Two-Way Tables and The Chi-Square Test*
Basic Practice of Statistics - 3rd Edition
CHAPTER 24
Two-Way Tables and The Chi-Square Test*
STATISTICS CONCEPTS AND CONTROVERSIES Eighth Edition
David S. Moore, William I. Notz
Lecture Presentation
Chapter 5
2
Chapter 24 Concepts
Two-Way Tables
Inference for a Two-Way Table
The Chi-Square Test
Using the Chi-Square Test
Simpson’s Paradox
3
Two-Way Tables
4
Two-Way Tables
To display relationships between two categorical variables, use a two-way table like the table of admission status and gender of applicants. Admission status is the row variable because each row in the table describes one of the possible admission decisions for an applicant. Gender is the column variable because each column describes one gender. The entries in the table are the counts of applicants in each admission status–by–gender class.
5
Two-Way Tables
To describe relationships among categorical variables, calculate appropriate percentages from the counts given.
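As a minimal sketch of that calculation, the Python below turns a two-way table of counts into column percentages. The counts are an assumption: they are reconstructed from the applicant numbers and admission fractions quoted in the Simpson’s paradox slides, because the table itself is not reproduced in this transcript. The function name `column_percentages` is hypothetical.

```python
# Two-way table of counts: rows = admission status, columns = gender.
# Assumed counts, reconstructed from the figures in the Simpson's paradox slides.
counts = {
    "Admit": {"Male": 35, "Female": 20},
    "Deny": {"Male": 45, "Female": 40},
}

def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * row[c] / col_totals[c] for c in columns}
        for r, row in table.items()
    }

percentages = column_percentages(counts)
for status, row in percentages.items():
    print(status, {gender: round(p, 1) for gender, p in row.items()})
```

Percentages are taken within each gender column because the question of interest is how admission status differs between males and females.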
6
Inference for a Two-Way Table
We often gather data and arrange them in a two-way table to see if two categorical variables are related to each other. The sample data are easy to investigate: turn them into percentages and look for an association between the row and column variables. Is the association in the sample evidence of an association between these variables in the entire population? Or could the sample association easily arise just from the luck of random sampling? This is a question for a significance test.
7
Inference for a Two-Way Table
Cocaine addicts need the drug to feel pleasure. Perhaps giving them a medication that fights depression will help them resist cocaine. A three-year study compared an antidepressant called desipramine, lithium (a standard treatment for cocaine addiction), and a placebo. The subjects were 72 chronic users of cocaine who wanted to break their drug habit. An equal number of the subjects were randomly assigned to each treatment. Here are the counts and percentages of the subjects who succeeded in not using cocaine during the study:
8
Inference for a Two-Way Table
The sample proportions of subjects who did not use cocaine are quite different. In particular, the percentage of subjects in the desipramine group who did not use cocaine was much higher than for the lithium or placebo group. Are these data good evidence that there is a relationship between treatment and outcome in the population of all cocaine addicts?
9
Inference for a Two-Way Table
H0: There is no association between the treatment an addict receives and whether or not there is success in not using cocaine in the population of all cocaine addicts.
To test H0, we compare the observed counts in a two-way table with the expected counts, the counts we would expect, except for random variation, if H0 were true. If the observed counts are far from the expected counts, that is evidence against H0.
Expected Counts
The expected count in any cell of a two-way table when H0 is true is
expected count = (row total × column total) / table total
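The expected count for a cell is its row total times its column total, divided by the table total. A short Python sketch of that rule follows. The study's own table is not reproduced in this transcript, so the observed counts here are an assumption: the admission-by-gender counts reconstructed from the Simpson’s paradox slides. The function name `expected_counts` is hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Assumed counts: rows = (admit, deny), columns = (male, female).
observed = [[35, 20],
            [45, 40]]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The expected counts keep the same row and column totals as the observed table; only the distribution of counts within the table changes.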
10
Inference for a Two-Way Table
11
The Chi-Square Test
To see if the data give evidence against the null hypothesis of “no relationship,” compare the counts in the two-way table with the counts we would expect if there really were no relationship. The significance test uses a statistic that measures how far apart the observed and expected counts are.
Chi-Square Statistic
The chi-square statistic, denoted X², is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is
X² = Σ (observed count − expected count)² / expected count
where the sum runs over all cells in the table.
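The statistic adds up, over all cells, the squared gap between observed and expected count divided by the expected count. The Python sketch below computes it directly. The counts are an assumption (the admission-by-gender counts reconstructed from the Simpson’s paradox slides, not the data behind this slide), and `chi_square_statistic` is a hypothetical name.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under "no association".
            exp = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

# Assumed counts: rows = (admit, deny), columns = (male, female).
observed = [[35, 20],
            [45, 40]]

x2 = chi_square_statistic(observed)
print(round(x2, 3))
```

Larger values of X² mean the observed counts sit farther from what “no association” predicts; a value of 0 would mean the observed and expected counts agree exactly.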
12
The Chi-Square Test
13
The Chi-Square Test
14
The Chi-Square Test
The Chi-Square Distributions
The sampling distribution of the chi-square statistic X² when the null hypothesis of no association is true is called a chi-square distribution. The chi-square distributions are a family of distributions that take only nonnegative values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom. The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom.
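To make the degrees-of-freedom rule concrete, here is a hedged Python sketch that computes (r − 1)(c − 1) and the upper-tail probability of a chi-square distribution. It uses only the standard library and a closed-form recurrence that holds for whole-number degrees of freedom. The slides use a table of critical values instead, so this is a swapped-in technique, and the function names `degrees_of_freedom` and `chi2_upper_tail` are hypothetical.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a two-way table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)              # exact tail for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))   # exact tail for df = 1
    # Each step raises the degrees of freedom by 2.
    while a < df / 2:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return tail

# Three treatments by two outcomes (success or not), as in the cocaine study.
df = degrees_of_freedom(3, 2)
print(df, round(chi2_upper_tail(5.99, df), 3))
```

With 2 degrees of freedom, a statistic near 5.99 leaves about 5% of the distribution to its right, which matches the usual 0.05 critical value.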
15
The Chi-Square Test
16
The Chi-Square Test
17
The Chi-Square Test
18
Using the Chi-Square Test
Like our test for a population proportion, the chi-square test uses some approximations that become more accurate as we take more observations. Here is a rough rule for when it is safe to use this test.
Cell Counts Required for the Chi-Square Test
You can safely use the chi-square test when no more than 20% of the expected counts are less than 5, and all individual expected counts are 1 or greater.
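The rule above can be checked mechanically once the expected counts are known. The Python sketch below is one way to do it; the function name `safe_for_chi_square` and the example tables are hypothetical illustrations, not data from the slides.

```python
def safe_for_chi_square(expected):
    """Apply the rough rule: at most 20% of expected counts below 5,
    and every expected count at least 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

# Hypothetical expected-count tables for illustration only.
ok_table = [[3.0, 6.0, 7.0], [8.0, 9.0, 10.0]]       # 1 of 6 cells below 5
too_many_small = [[3.0, 6.0], [7.0, 9.0]]            # 1 of 4 cells below 5
cell_below_one = [[0.8, 6.0, 7.0], [8.0, 9.0, 10.0]]  # one cell below 1

print(safe_for_chi_square(ok_table))
print(safe_for_chi_square(too_many_small))
print(safe_for_chi_square(cell_below_one))
```

Note that the rule applies to the expected counts, not the observed counts.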
19
Using the Chi-Square Test
There is a clear trend: as the anger score increases, so does the percentage who suffer heart disease. Is this relationship between anger and heart disease statistically significant?
20
Using the Chi-Square Test
21
Using the Chi-Square Test
22
Using the Chi-Square Test
23
Using the Chi-Square Test
Can we conclude that proneness to anger causes heart disease? This is an observational study, not an experiment. It isn’t surprising to find that some lurking variables are confounded with anger. The study report used advanced statistics to adjust for many differences among the three anger groups. The adjustments raised the P-value from P = to P = 0.02 because the lurking variables explain some of the heart disease. This is still good evidence for a relationship if a significance level of 0.05 is used. Because the study started with a random sample of people who had no CHD and followed them forward in time, and because many lurking variables were measured and accounted for, it does give some evidence for causation.
24
Simpson’s Paradox
As is the case with quantitative variables, the effects of lurking variables can change or even reverse relationships between two categorical variables.
25
Simpson’s Paradox
Almost half of the males but only one-third of the females who applied were admitted. Isn’t this proof of discrimination against women?
26
Simpson’s Paradox
In its defense, the university produces a three-way table that classifies applicants by gender, admission decision, and the program to which they applied. We now see that engineering admitted exactly half of all applicants, both male and female, and that English admitted one-fourth of both males and females. There is no association between sex and admission decision in either program.
27
Simpson’s Paradox
How can no association in either program produce strong association when the two are combined? Look at the data: English is hard to get into, and mainly females apply to that program. Electrical engineering is easier to get into and attracts mainly male applicants. English had 40 female and 20 male applicants, while engineering had 60 male and only 20 female applicants. The original two-way table, which did not take account of the difference between programs, was misleading.
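The arithmetic behind this reversal can be traced in a few lines of Python. The applicant counts come from the text above. The admitted counts are derived from the stated fractions (half in engineering, one-fourth in English), so treat them as a reconstruction rather than the slide's own table. The variable names are hypothetical.

```python
# (applicants, admitted) by program and gender.
# Admitted counts are derived from the stated admission fractions.
programs = {
    "Engineering": {"Male": (60, 30), "Female": (20, 10)},  # half admitted
    "English": {"Male": (20, 5), "Female": (40, 10)},       # one-fourth admitted
}

# Admission rate within each program, by gender.
for name, by_gender in programs.items():
    rates = {g: admitted / applied for g, (applied, admitted) in by_gender.items()}
    print(name, rates)

# Combine the two programs and recompute the rate for each gender.
combined = {}
for gender in ("Male", "Female"):
    applied = sum(p[gender][0] for p in programs.values())
    admitted = sum(p[gender][1] for p in programs.values())
    combined[gender] = admitted / applied

print("Combined", {g: round(r, 3) for g, r in combined.items()})
```

Within each program the male and female rates are equal, yet the combined rates differ because most female applicants chose the program with the lower admission rate.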
28
Simpson’s Paradox Simpson’s Paradox
An association or comparison that holds for all of several groups can disappear or even reverse direction when the data are combined to form a single group. This situation is called Simpson’s paradox. Simpson’s paradox is just an extreme form of the fact that observed associations can be misleading when there are lurking variables. Remember the caution from Chapter 15: beware the lurking variable.
29
Chapter 24 Concepts
Two-Way Tables
Inference for a Two-Way Table
The Chi-Square Test
Using the Chi-Square Test
Simpson’s Paradox