Published by Ethel Moody; modified over 9 years ago.
Slide 1: Chapter 24, Two-Way Tables and the Chi-Square Test
Slide 2: Thought Question 1
A random sample of registered voters was asked whether they preferred balancing the budget or cutting taxes. Each voter was also classified as either a Democrat or a Republican. Of the 30 Democrats, 12 preferred cutting taxes; of the 40 Republicans, 24 preferred cutting taxes. How would you display these data in a table?
Slide 3: Categorical Variables
• In this chapter we study the relationship between two categorical variables (variables whose values fall into groups or categories).
• To analyze categorical data, use the counts or percents of individuals that fall into the various categories.
Slide 4: Two-Way Table
• When there are two categorical variables, the data are summarized in a two-way table:
  – each row represents a value of the row variable
  – each column represents a value of the column variable
• Each cell of the table holds the number of observations that fall into that combination of categories.
  – Relationships between categorical variables are described by calculating appropriate percents from the counts in the table.
    · Percents prevent misleading comparisons caused by unequal sample sizes in different groups.
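The percent calculation described above can be sketched in a few lines. The sketch uses the Thought Question 1 counts. It assumes party (the explanatory variable) forms the rows, and it derives the "Balance budget" counts by subtraction (30 − 12 = 18, 40 − 24 = 16). The helper name `row_percents` is hypothetical.

```python
# Thought Question 1 (Slide 2). Rows = party (explanatory), columns = preference.
# The "Balance budget" counts are 30 - 12 = 18 and 40 - 24 = 16.
counts = {
    "Democrat": {"Cut taxes": 12, "Balance budget": 18},
    "Republican": {"Cut taxes": 24, "Balance budget": 16},
}

def row_percents(table):
    """Convert each row of counts to percents of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

for party, pct in row_percents(counts).items():
    print(party, {col: f"{p:.0f}%" for col, p in pct.items()})
```

Comparing 40% of Democrats with 60% of Republicans who prefer cutting taxes is fairer than comparing the raw counts 12 and 24, because the two groups have different sizes.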
Slide 5: Case Study: Helped Pick Up Pencils? Which Is More Likely, Females or Males?
Source: D. C. Howell, Statistical Methods for Psychology, 3rd edition, Belmont, CA: Duxbury Press, 1992, p. 154.
Slide 6: Case Study: Drop the Pencils
In an elevator, the researcher dropped a handful of pencils, apparently by accident, in the presence of either a female subject or a male subject. The subject's response was observed: did the subject help pick up the pencils or not?
Slide 7: Case Study: The Question (Categorical Data)
The question was whether males or females who observed this mishap would be more likely to help pick up the pencils.
• Explanatory variable: gender
• Response variable: "pick up" action (Y/N)
Slide 8: Case Study: Display the Results: Contingency (Two-Way) Table
Slide 9: Case Study: Display the Results: Percentages
Slide 10: Case Study: Statistical Significance
Is the difference between the percentages for males and females statistically significant? One of the following must be true:
• The percentages are really the same in the population, and the observed difference is due to chance; or
• The percentages are really different in the population, and the observed difference reflects this.
Slide 11: Assessing Statistical Significance for a Two-Way Table
• Strength of the relationship
  – measured by the difference in the sample percentages
• Sample size
  – it is much easier to rule out chance with large samples
Slide 12: Measuring the Difference with the Chi-Square Statistic
The chi-square statistic measures the magnitude of the difference in the sample percentages, and it incorporates the sample size in its calculation.
• If the percentages in the population are the same, the chi-square statistic tends to be small (near 0).
• If the percentages in the population are different, the chi-square statistic tends to be large.
Slide 13: Make the Decision: Is the Relationship Statistically Significant?*
• The "critical value" for 2 × 2 tables is 3.84.
• If the chi-square value for a 2 × 2 table is larger than 3.84, the relationship is considered statistically significant.
  · Note: for 2 × 2 tables, |Z| = the square root of the chi-square statistic.
  · The critical value 3.84 is (1.96)², roughly 2².
*The procedure given here applies specifically to 2 × 2 tables (2 rows and 2 columns). The general procedure for any two-way table (with any number of rows and columns) is given later in this chapter (see Slide 32).
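The link between 3.84 and 1.96 can be checked numerically. This minimal sketch uses Python's standard `statistics.NormalDist`. It computes the two-sided 5% z cut-off and squares it. It also applies a pooled two-proportion z statistic to the Thought Question 1 counts to show that z² matches the chi-square value for that 2 × 2 table. The helper `two_proportion_z` is a hypothetical name for illustration.

```python
from statistics import NormalDist

# Two-sided 5% cut-off for a standard normal z statistic, and its square.
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
chi2_crit = z_crit ** 2                 # about 3.84, the 2 x 2 critical value

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing two proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = (p * (1 - p) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1 - p2) / se

# Thought Question 1: 12 of 30 Democrats vs. 24 of 40 Republicans prefer cutting taxes.
z = two_proportion_z(12, 30, 24, 40)
print(f"critical: {z_crit:.2f}^2 = {chi2_crit:.2f}; z = {z:.3f}, z^2 = {z * z:.3f}")
```

Here z² comes out near 2.75, the chi-square value quoted for Thought Question 1. Because |z| < 1.96, or equivalently z² < 3.84, the two routes to the decision always agree for a 2 × 2 table.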
Slide 14: Case Study: Statistical Significance
Is the difference between the percentages for males and females statistically significant?
Chi-square statistic = 8.65. Since 8.65 > 3.84, we conclude that there is a statistically significant relationship between gender and helping to pick up the pencils.
Slide 15: Case Study: Thought Question 1, Agenda versus Political Party
Chi-square = 2.75 (significant?)
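The 2.75 can be reproduced from the Thought Question 1 counts. This sketch computes each cell's expected count as row total × column total / table total. It then sums (observed − expected)² / expected over the four cells. This is the Pearson chi-square statistic without a continuity correction, and the text may use a different software setting. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Rows: Democrats, Republicans; columns: cut taxes, balance budget.
chi2 = chi_square_2x2(12, 18, 24, 16)
print(round(chi2, 2), chi2 > 3.84)
```

The statistic is about 2.75, which is below 3.84. So the relationship between party and agenda is not statistically significant at the usual 5% level.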
Slide 16: Case Study: Quitting Smoking with Nicotine Patches (JAMA, Feb. 23, 1994, pp. 595-600)
• Two categorical variables:
  – Explanatory: treatment assignment
    · Nicotine patch
    · Control patch
  – Response: still smoking after 8 weeks?
    · Yes
    · No
Slide 17: Case Study: Display the Results: Contingency (Two-Way) Table
Slide 18: Case Study: Statistically Significant Relationship?
• Chi-square = 19.2
• Since 19.2 > 3.84, there is a statistically significant relationship between the type of patch used and the cessation of smoking for at least 8 weeks.
Slide 19: Case Study: Popular Ad: Seldane-D Allergy Tablets
• Double-blind study of side effects
  – Seldane-D: 374 subjects
    · 27 (7.2%) reported drowsiness
    · 347 did not
  – Placebo: 193 subjects
    · 22 (11.4%) reported drowsiness
    · 171 did not
Source: Time, 27 March 1995, p. 18
Slide 20: Case Study: Summaries
• Chi-square = 2.82 (Pearson statistic computed from the counts on Slide 19)
• Baseline risk of drowsiness using placebo = 11.4%
• Risk of drowsiness using Seldane-D = 7.2%
• Relative risk of drowsiness using Seldane-D versus placebo = 7.2% / 11.4% ≈ 0.63
Slide 21: Case Study: Conclusion
• Randomized, controlled experiment
• The relationship between use of Seldane-D allergy tablets and drowsiness is not statistically significant (2.82 < 3.84).
• The evidence does not support the claim that Seldane-D causes drowsiness in some people.
Slide 22: A Caution About Sample Size, Statistical Significance, and Chi-Square
• The chi-square statistic depends on the sample size, even when the table percentages stay the same.
• For example, if n = 850 (instead of 567) and all percentages remain the same, the chi-square statistic would be about 4.22 (instead of 2.82), because it grows in proportion to n. Would the conclusion change? (demo)
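The Seldane-D summaries and the sample-size caution can be checked with a short sketch. It uses the shortcut formula n(ad − bc)² / (row1 × row2 × col1 × col2), which equals the Pearson chi-square for a 2 × 2 table without continuity correction. Computed this way from the counts on Slide 19, X² is about 2.82. The sketch then multiplies every count by 850/567. That keeps all percentages fixed, and the statistic scales by the same factor. The name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut Pearson chi-square: n(ad - bc)^2 / (row1 * row2 * col1 * col2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Slide 19 counts, as (drowsy, not drowsy).
seldane = (27, 347)
placebo = (22, 171)

risk_seldane = seldane[0] / sum(seldane)      # about 0.072
risk_placebo = placebo[0] / sum(placebo)      # about 0.114
relative_risk = risk_seldane / risk_placebo   # about 0.63

chi2 = chi_square_2x2(*seldane, *placebo)     # n = 567

# Same percentages, larger sample: scale every count by 850/567.
k = 850 / 567
chi2_big = chi_square_2x2(*(k * x for x in seldane + placebo))

print(f"RR = {relative_risk:.2f}, X2 = {chi2:.2f}, X2 with n = 850: {chi2_big:.2f}")
```

With n = 567 the statistic falls below 3.84. With n = 850 and identical percentages it rises above 3.84, so the same effect size would be declared significant.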
Slide 23: Inference for Relative Risk
• Confidence intervals
  – if the two risks are the same, the relative risk is 1
  – check whether the confidence interval contains 1
• Hypothesis tests
  – to test whether two individual risks are equal, test whether the relative risk is 1
  – use the chi-square value to find the P-value
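To illustrate the "does the interval contain 1?" check, here is a sketch of one common large-sample method. It builds a normal-theory interval for log(RR) and exponentiates the endpoints; this is often called the Katz log method. The text does not say how its intervals are computed, so treat this method as an assumption. It is applied to the Seldane-D counts from Slide 19, and the name `relative_risk_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(x1, n1, x2, n2, level=0.95):
    """Relative risk (x1/n1)/(x2/n2) with a large-sample CI computed on the log scale."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Seldane-D: 27 of 374 drowsy; placebo: 22 of 193 drowsy.
rr, lo, hi = relative_risk_ci(27, 374, 22, 193)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), contains 1: {lo <= 1 <= hi}")
```

For these counts the interval runs from roughly 0.37 to 1.08. It contains 1, which agrees with the non-significant chi-square result for the same data.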
Slide 24: Case Study: Relationship Between Breast Cancer and Induced Abortion
Source: Daling et al. (1994), "Risk of breast cancer among young women: relationship to induced abortion," Journal of the National Cancer Institute, Vol. 86, No. 21, pp. 1584-1592.
Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?
Slide 25: Relationship Between Breast Cancer and Induced Abortion, Case Study: Sample
• 845 breast cancer cases were identified in Washington State from 1983 to 1990.
• 910 control women were identified using random-digit dialing in the same area.
• Women born before 1944 were excluded.
Slide 26: Relationship Between Breast Cancer and Induced Abortion, Case Study: C.I. Results
• The relative risk for breast cancer was 1.5, with the higher risk for women who had had an induced abortion.
• A 95% confidence interval for the relative risk was 1.2 to 1.9 (given in the study).
• Note that the confidence interval does not contain 1 (⇒ the risks are different).
Slide 27: Relationship Between Breast Cancer and Induced Abortion, Case Study: C.I. Results
• No increased risk was found for women who had had spontaneous abortions; the relative risk was 0.9.
• A 95% confidence interval for the relative risk was 0.7 to 1.2 (given in the study).
• Note that the confidence interval does contain 1 (⇒ no evidence that the risks are different).
Slide 28: Relationship Between Breast Cancer and Induced Abortion, Case Study (continued; Daling et al., 1994)
Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?
Slide 29: Case Study: The Hypotheses
Null: The risk of developing breast cancer for women who have had an induced abortion is the same as the risk for women who have not had an induced abortion. [RR = 1]
Alt: The risk of developing breast cancer for women who have had an induced abortion is different from the risk for women who have not had an induced abortion. [RR ≠ 1]
Slide 30: Case Study: Test Statistic and P-value
• Relative risk = 1.5
• We could also display the data in a 2 × 2 table and compute the chi-square value (9.75).
• The P-value is 0.002. (We will not compute this one; we take it from the study.)
  · Recall: for 2 × 2 tables, |Z| = the square root of the chi-square statistic.
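The "recall" note suggests how a P-value can be recovered from a 2 × 2 chi-square value. For df = 1, the area to the right of X² under the chi-square curve equals the two-sided normal tail area beyond |z| = √X². This sketch applies that identity to the reported 9.75 using `statistics.NormalDist`. The function name `p_value_2x2` is hypothetical.

```python
from statistics import NormalDist

def p_value_2x2(chi2):
    """P-value for a 2 x 2 chi-square statistic (df = 1) via |z| = sqrt(chi2), two-sided."""
    z = chi2 ** 0.5
    return 2 * (1 - NormalDist().cdf(z))

p = p_value_2x2(9.75)
print(f"z = {9.75 ** 0.5:.2f}, P-value = {p:.4f}")
```

The result is about 0.0018, which rounds to the 0.002 reported in the study.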
Slide 31: Case Study: Decision
• Since the P-value is small, we reject chance as the reason the relative risk (1.5) differs from 1.0.
• The result is statistically significant.
• We reject the null hypothesis: the data provide evidence that the two population risks of developing breast cancer are not the same.
Slide 32: Two-Way Table: General Procedure
• The remainder of this chapter presents the general procedure for determining whether a significant relationship exists between two categorical variables with any number of levels:
  – how to analyze two-way tables with any number of rows and columns
  – the results also apply to the special case of 2 × 2 tables
Slide 33: Case Study: Health Care: Canada and U.S.
Data from patients' own assessment of their quality of life relative to what it had been before their heart attack (data from patients who survived at least a year).
Source: D. B. Mark et al., "Use of medical resources and quality of life after acute myocardial infarction in Canada and the United States," New England Journal of Medicine, 331 (1994), pp. 1130-1135.
Slide 34: Case Study: Health Care: Canada and U.S.
Quality of life    Canada   United States
Much better            75             541
Somewhat better        71             498
About the same         96             779
Somewhat worse         50             282
Much worse             19              65
Total                 311            2165
Slide 35: Case Study: Health Care: Canada and U.S.
(Observed counts as in the Slide 34 table.)
Compare the Canadian group to the U.S. group in terms of feeling much better: 75 Canadians reported feeling much better, compared to 541 Americans. The groups appear very different, but look at the group totals (311 Canadians versus 2165 Americans).
Slide 36: Case Study: Health Care: Canada and U.S.
(Observed counts as in the Slide 34 table.)
Compare the Canadian group to the U.S. group in terms of feeling much better: change the counts to percents.
Quality of life    Canada   United States
Much better           24%             25%
Somewhat better       23%             23%
About the same        31%             36%
Somewhat worse        16%             13%
Much worse             6%              3%
Total                100%            100%
Now, with a fairer comparison using percents, the groups appear very similar in terms of feeling much better.
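The conversion from counts to column percents can be sketched as follows. Each country's counts are divided by that country's total, 311 or 2165. The dictionary layout is an illustrative choice, not something the slides specify.

```python
# Observed counts from the Slide 34 table: (Canada, United States).
counts = {
    "Much better":     (75, 541),
    "Somewhat better": (71, 498),
    "About the same":  (96, 779),
    "Somewhat worse":  (50, 282),
    "Much worse":      (19, 65),
}
canada_total = sum(c for c, _ in counts.values())  # 311
us_total = sum(u for _, u in counts.values())      # 2165

# Percent of each country's patients at each quality-of-life level.
for level, (c, u) in counts.items():
    print(f"{level:<16} {100 * c / canada_total:5.1f}%  {100 * u / us_total:5.1f}%")
```

Rounded to whole percents, the output reproduces the percent table above, for example 24% vs. 25% for "Much better".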
Slide 37: Case Study: Health Care: Canada and U.S.
Is there a relationship between the explanatory variable (Country) and the response variable (Quality of life)?
(Percents as in the Slide 36 table.)
For each level of the explanatory variable (Country), look at the percents across all levels of the response variable (Quality of life). Conclude that a relationship exists if these distributions differ by more than chance alone would explain.
Slide 38: Hypothesis Test
• In tests for two categorical variables, we are interested in whether a relationship observed in a single sample reflects a real relationship in the population.
• Hypotheses:
  – Null: the percentages for one variable are the same for every level of the other variable (no real relationship).
  – Alt: the percentages for one variable vary over levels of the other variable (a real relationship).
Slide 39: Case Study: Health Care: Canada and U.S.
Null hypothesis: the percentages for one variable are the same for every level of the other variable (no real relationship).
(Percents as in the Slide 36 table.)
For example, we could look at the differences in percentages between Canada and the U.S. for each level of "Quality of life": 24% vs. 25% for those who felt "Much better," 23% vs. 23% for "Somewhat better," etc.
* We want to do all of these comparisons as one overall test.
Slide 40: Hypothesis Test
• H₀: there is no real relationship between the two categorical variables that make up the rows and columns of a two-way table.
• To test H₀, compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H₀ were true).
  – If the observed counts are far from the expected counts, that is evidence against H₀ in favor of a real relationship between the two variables.
Slide 41: Expected Counts
• The expected count in any cell of a two-way table (when H₀ is true) is
  expected count = (row total × column total) / table total
Slide 42: Case Study: Health Care: Canada and U.S.
For the observed data below, find the expected value for each cell.
Quality of life    Canada   United States   Total
Much better            75             541     616
Somewhat better        71             498     569
About the same         96             779     875
Somewhat worse         50             282     332
Much worse             19              65      84
Total                 311            2165    2476
For the expected count of Canadians who feel "Much better" (Row 1, Column 1): (616 × 311) / 2476 ≈ 77.37.
Slide 43: Case Study: Health Care: Canada and U.S.
Observed and expected counts:
Quality of life    Canada (obs.)   Canada (exp.)   U.S. (obs.)   U.S. (exp.)
Much better                   75           77.37           541        538.63
Somewhat better               71           71.47           498        497.53
About the same                96          109.91           779        765.09
Somewhat worse                50           41.70           282        290.30
Much worse                    19           10.55            65         73.45
Compare the observed and expected counts to see whether the data support the null hypothesis.
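The expected-count rule (row total × column total / table total) can be applied to every cell at once. This sketch stores the table as a list of rows in the order of the Canada/U.S. table, with the Canada column first. The function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Much better, Somewhat better, About the same, Somewhat worse, Much worse.
# Columns: Canada, United States.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
for row in expected_counts(observed):
    print([round(e, 2) for e in row])
```

The printed rows match the expected counts in the table above, starting with 77.37 and 538.63.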
Slide 44: Chi-Square Statistic
• To determine whether the differences between the observed counts and the expected counts are statistically significant (showing a real relationship between the two categorical variables), we use the chi-square statistic:
  X² = Σ (observed count − expected count)² / expected count
  where the sum is over all cells in the table.
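The X² formula translates directly into a double loop over the cells. This sketch recomputes each expected count inside the loop and accumulates (observed − expected)² / expected. It is applied to the Canada/U.S. observed counts. The name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """Pearson X^2 = sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2

# Canada / United States observed counts, one row per quality-of-life level.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
x2 = chi_square_statistic(observed)
print(round(x2, 2))
```

The result is X² ≈ 11.725, the value used later for the Canada/U.S. test. The largest single term comes from the "Much worse" cell for Canada (19 observed vs. 10.55 expected).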
Slide 45: Chi-Square Statistic
• The chi-square statistic is a measure of the distance of the observed counts from the expected counts. It
  – is always zero or positive
  – is zero only when the observed counts exactly equal the expected counts
  – gives evidence against H₀ when it is large, because large values of X² show that the observed counts are far from what would be expected if H₀ were true
  – leads to a one-sided test: any violation of H₀ produces a large value of X², so only the upper tail is used
Slide 46: Case Study: Health Care: Canada and U.S.
(Observed counts and expected counts as in the Slide 43 table.)
Slide 47: Chi-Square Test
• Calculate the value of the chi-square statistic
  – by hand (cumbersome)
  – using technology (computer software, etc.)
• Find the P-value in order to reject or fail to reject H₀
  – use a chi-square table for the chi-square distribution (next few slides)
  – or read it from computer output
• If a significant relationship exists (small P-value):
  – compare appropriate percents in the data table
  – compare individual observed and expected cell counts
  – look at the individual terms in the chi-square statistic
Slide 48: Case Study: Health Care: Canada and U.S., Using Technology
Slide 49: Chi-Square Distributions
• A family of distributions that take only positive values and are skewed to the right
• A specific chi-square distribution is specified by giving its degrees of freedom (formula on next slide)
Slide 50: Chi-Square Distributions
• The chi-square test for a two-way table with r rows and c columns uses critical values from a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
• The P-value is the area to the right of X² under the density curve of the chi-square distribution.
  – use a chi-square table
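Software computes this area to the right of X² directly rather than bracketing it from a table. Python's standard library has no chi-square distribution, so this sketch builds the upper-tail area from the power series for the regularized lower incomplete gamma function, using `math.lgamma`. It also computes df = (r − 1)(c − 1) for the 5 × 2 Canada/U.S. table. The function `chi2_sf` is a hypothetical helper, not a library API, and a real analysis would normally use a statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(X^2 > x) for a chi-square distribution with df degrees of freedom.

    Uses the series P(s, z) = z^s e^(-z) * sum_n z^n / Gamma(s + n + 1), with s = df/2 and
    z = x/2, for the lower regularized incomplete gamma, then returns 1 - P(s, z).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1))
    if term == 0.0:                 # underflow: x lies far out in one tail
        return 0.0 if z > s else 1.0
    total, n = term, 0
    while term > 1e-16 * total:     # add terms until they stop mattering
        n += 1
        term *= z / (s + n)
        total += term
    return max(0.0, 1.0 - total)

r, c = 5, 2                         # Canada/U.S. table: 5 rows, 2 columns
df = (r - 1) * (c - 1)
p = chi2_sf(11.725, df)
print(df, round(p, 4))
```

For df = 4 and X² = 11.725 this gives a P-value of about 0.0195, between 0.01 and 0.05. For df = 1 and X² = 3.84 it gives about 0.05, matching the 2 × 2 critical value.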
Slide 51: Chi-Square Table
• See Chapter 24 in the text for Table 24.1.
Slide 52: Chi-Square P-value
• Finding the P-value for a chi-square test:
  – use the row for the appropriate degrees of freedom (df) in the left margin of the table
  – locate the X² value in the body of the table
  – the probability of lying to the right of this value is found in the top margin of the table (this is the P-value)
  – usually, the observed X² value falls between two values in the table, and the P-value is reported as lying between the two corresponding probabilities
Slide 53: Case Study: Health Care: Canada and U.S.
(Observed counts as in the Slide 42 table.)
X² = 11.725
df = (r − 1)(c − 1) = (5 − 1)(2 − 1) = 4
Look in the df = 4 row of Table 24.1: the value X² = 11.725 falls between the 0.05 and 0.01 critical values. Thus the P-value for this chi-square test is between 0.01 and 0.05 (it is actually about 0.0195, roughly 0.02).
** P-value < .05, so we conclude that there is a significant relationship. **
Slide 54: Chi-Square Critical Values
• To be significant at level α (the cut-off value), the chi-square statistic must be larger than the table entry for α.
  – Example: for α = .05 (the typical cut-off) and df = 4, the test is significant if X² > 9.49.
    · The previous case study had X² = 11.725 > 9.49.
  – Note: for 2 × 2 tables, df = (2 − 1)(2 − 1) = 1, and the test is significant if X² > 3.84.
Slide 55: Uses of the Chi-Square Test
• Tests the null hypothesis H₀: no relationship between two categorical variables, when you have a two-way table from either of these situations:
  – Independent SRSs from each of several populations, with each individual classified according to one categorical variable
    [Example: Health Care case study: two samples (Canadians and Americans); each individual classified according to "Quality of life"]
  – A single SRS with each individual classified according to both of two categorical variables
    [Example: a sample of 8235 subjects, each classified according to "Job Grade" (1, 2, 3, or 4) and "Marital Status" (Single, Married, Divorced, or Widowed)]
Slide 56: Chi-Square Test: Requirements
• The chi-square test is an approximate method, and it becomes more accurate as the counts in the cells of the table get larger.
• The following must be satisfied for the approximation to be accurate:
  – no more than 20% of the expected counts are less than 5
  – all individual expected counts are 1 or greater
• If these requirements fail, two or more categories (rows or columns) must be combined to form a new, smaller two-way table.
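The two requirements can be checked mechanically on a table of expected counts. This sketch encodes the two rules above exactly as stated and tests them on the Canada/U.S. expected counts. The function name `check_expected_counts` is hypothetical.

```python
def check_expected_counts(expected):
    """True if no more than 20% of expected counts are below 5 and all are at least 1."""
    cells = [e for row in expected for e in row]
    below_5 = sum(1 for e in cells if e < 5)
    return below_5 <= 0.2 * len(cells) and min(cells) >= 1

# Expected counts for the Canada/U.S. table (Canada, United States).
canada_us_expected = [[77.37, 538.63], [71.47, 497.53], [109.91, 765.09],
                      [41.70, 290.30], [10.55, 73.45]]
print(check_expected_counts(canada_us_expected))
```

The smallest expected count is 10.55, so the Canada/U.S. table passes both checks and the chi-square approximation is reasonable there.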
Slide 57: Simpson's Paradox
• When studying the relationship between two variables, a lurking variable may reverse the direction of the relationship: the direction seen when the lurking variable is ignored is opposite to the direction seen when the lurking variable is taken into account.
• The lurking variable creates subgroups, and failing to take these subgroups into account can lead to misleading conclusions about the association between the two variables.
Slide 58: Discrimination? (Simpson's Paradox)
Consider the acceptance rates for the following group of men and women who applied to college.
Counts     Accepted   Not accepted   Total
Men             198            162     360
Women            88            112     200
Total           286            274     560
Percents   Accepted   Not accepted
Men             55%            45%
Women           44%            56%
A higher percentage of men were accepted: discrimination?
Slide 59: Discrimination? (Simpson's Paradox), BUSINESS SCHOOL
Lurking variable: applications were split between the Business School (240) and the Art School (320).
Counts     Accepted   Not accepted   Total
Men              18            102     120
Women            24             96     120
Total            42            198     240
Percents   Accepted   Not accepted
Men             15%            85%
Women           20%            80%
A higher percentage of women were accepted in Business.
Slide 60: Discrimination? (Simpson's Paradox), ART SCHOOL
Lurking variable: applications were split between the Business School (240) and the Art School (320).
Counts     Accepted   Not accepted   Total
Men             180             60     240
Women            64             16      80
Total           244             76     320
Percents   Accepted   Not accepted
Men             75%            25%
Women           80%            20%
A higher percentage of women were also accepted in Art.
Slide 61: Discrimination? (Simpson's Paradox)
• So within each school a higher percentage of women than men were accepted. There is no evidence of discrimination against women.
• This is an example of Simpson's Paradox. When the lurking variable (school applied to: Business or Art) is ignored, the data seem to suggest discrimination against women. However, when the school is taken into account, the association is reversed and suggests discrimination against men.
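The reversal can be reproduced from the school-level counts on Slides 59 and 60. This sketch computes acceptance rates within each school. It then adds the counts across schools, ignoring the lurking variable, and recomputes the rates. The data layout and the helper `rate` are illustrative choices.

```python
# (accepted, applied) by school and sex, from the Business and Art School tables.
data = {
    "Business": {"Men": (18, 120), "Women": (24, 120)},
    "Art":      {"Men": (180, 240), "Women": (64, 80)},
}

def rate(accepted, applied):
    """Acceptance proportion."""
    return accepted / applied

# Within each school, women have the higher acceptance rate.
for school, groups in data.items():
    print(school, {g: f"{100 * rate(*v):.0f}%" for g, v in groups.items()})

# Ignoring the lurking variable: add the counts over the two schools.
combined = {}
for groups in data.values():
    for g, (acc, app) in groups.items():
        a0, n0 = combined.get(g, (0, 0))
        combined[g] = (a0 + acc, n0 + app)
print("Combined", {g: f"{100 * rate(*v):.0f}%" for g, v in combined.items()})
```

Combining gives 198 of 360 men (55%) and 88 of 200 women (44%), the reversal on Slide 58. The reversal arises because most women applied to the Business School, which had the lower acceptance rate.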
Slide 62: Key Concepts
• Displaying data in a contingency (two-way) table
• Concept of statistical significance for a 2 × 2 table (chi-square critical value)
• Interpreting confidence intervals for relative risk
• Interpreting hypothesis tests for relative risk
• Chi-square statistic and a statistically significant relationship for any two-way table
• Simpson's Paradox