Chapter 24: Two-Way Tables and the Chi-Square Test

Slide 2: Thought Question 1
A random sample of registered voters was asked whether they preferred balancing the budget or cutting taxes. Each voter was then classified as either a Democrat or a Republican. Of the 30 Democrats, 12 preferred cutting taxes, while of the 40 Republicans, 24 preferred cutting taxes. How would you display the data in a table?

Slide 3: Categorical Variables
• In this chapter we study the relationship between two categorical variables (variables whose values fall into groups or categories).
• To analyze categorical data, use the counts or percents of individuals that fall into the various categories.

Slide 4: Two-Way Table
• When there are two categorical variables, the data are summarized in a two-way table:
  – each row represents a value of the row variable
  – each column represents a value of the column variable
• The number of observations falling into each combination of categories is entered in the corresponding cell of the table.
  – Relationships between categorical variables are described by calculating appropriate percents from the counts in the table.
    ◦ Using percents prevents misleading comparisons due to unequal sample sizes for different groups.

Slide 5: Case Study
Helped Pick Up Pencils? Which Is More Likely: Females or Males?
Source: D. C. Howell, Statistical Methods for Psychology, 3rd edition, 1992, Belmont, CA: Duxbury Press, p

Slide 6: Case Study: Drop the Pencils
The researcher dropped a handful of pencils in an elevator, apparently by accident, in the presence of either a female subject or a male subject. The subject's response was observed: did the subject help pick up the pencils or not?

Slide 7: Case Study: The Question
The question was whether males or females who observed this mishap would be more likely to help pick up the pencils.
• Explanatory variable: gender
• Response variable: "pick up" action (Y/N)
Both variables are categorical.

Slide 8: Case Study: Display the Results: Contingency (Two-Way) Table

Slide 9: Case Study: Display the Results: Percentages

Slide 10: Case Study: Statistical Significance
Is the difference between the percentages for males and females statistically significant? One of the following must be true:
• The percentages are really the same in the population; the observed difference is due to chance.
• The percentages are really different in the population; the observed difference reflects this.

Slide 11: Assessing Statistical Significance for a Two-Way Table
• Strength of the relationship
  – measured by the difference in the sample percentages
• It is much easier to rule out chance with large samples.

Slide 12: Measuring the Difference with the Chi-Square Statistic
The chi-square statistic measures the magnitude of the difference in the sample percentages, incorporating the sample size in its calculation.
• If the percentages in the population are the same, the chi-square statistic tends to be small (near 0).
• If the percentages in the population are different, the chi-square statistic tends to be large.

Slide 13: Make the Decision: Is the Relationship Statistically Significant?*
• The "critical value" for 2 × 2 tables is 3.84.
• If the chi-square value for a 2 × 2 table is larger than 3.84, the relationship is considered statistically significant (at the 5% level).
  ◦ Note: for 2 × 2 tables, Z = square root of the chi-square statistic.
  ◦ The critical value 3.84 is (1.96)² [≈ 2²].
*This procedure is specifically for 2 × 2 tables (2 rows and 2 columns); the general procedure for any two-way table (with any number of rows and columns) is given later in this chapter (see slide 32).
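The link between 3.84 and 1.96 can be checked numerically. The Python sketch below uses only the standard library; the function name `chi2_upper_tail_1df` is our own. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal Z, so its upper-tail area equals the two-sided normal tail area.

```python
import math

def chi2_upper_tail_1df(x: float) -> float:
    """P(X^2 > x) for a chi-square distribution with 1 degree of freedom.

    With 1 df, X^2 = Z^2 for a standard normal Z, so this equals the
    two-sided normal tail P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

critical_value = 1.96 ** 2
print(round(critical_value, 4))             # 3.8416, quoted as 3.84
print(round(chi2_upper_tail_1df(3.84), 3))  # about 0.05
print(chi2_upper_tail_1df(8.65) < 0.05)     # pencil study (slide 14): True
```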

Slide 14: Case Study: Statistical Significance
Is the difference between the percentages for males and females statistically significant?
Chi-square statistic = 8.65
Since 8.65 > 3.84, we conclude that there is a statistically significant relationship between gender and helping to pick up the pencils.

Slide 15: Case Study: Thought Question 1
Agenda versus Political Party
Chi-square = 2.75 (significant?)
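One way to display the Thought Question 1 data is a 2 × 2 table with party as rows and preference as columns (Democrats: 12 cut taxes, 18 balance the budget; Republicans: 24 cut taxes, 16 balance the budget). The Python sketch below (helper names are illustrative) computes the chi-square statistic from that table and also checks that it equals the square of the pooled two-proportion z statistic.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table: sum of (O - E)^2 / E over the cells.

    Expected count = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def two_proportion_z(count1, n1, count2, n2):
    """z statistic comparing two proportions, using the pooled proportion."""
    pooled = (count1 + count2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (count1 / n1 - count2 / n2) / se

# Rows: Democrats, Republicans; columns: cut taxes, balance budget.
party = [[12, 18],
         [24, 16]]
chi_sq = chi_square_2x2(party)
z = two_proportion_z(12, 30, 24, 40)
print(round(chi_sq, 2))       # 2.75
print(round(z ** 2, 2))       # 2.75, the same value
print(chi_sq > 3.84)          # False
```

Since 2.75 < 3.84, the relationship between party and agenda is not statistically significant at the 5% level.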

Slide 16: Case Study: Quitting Smoking with Nicotine Patches (JAMA, Feb. 23, 1994, pp)
• Two categorical variables:
  – Explanatory: treatment assignment
    ◦ Nicotine patch
    ◦ Control patch
  – Response: still smoking after 8 weeks?
    ◦ Yes
    ◦ No

Slide 17: Case Study: Display the Results: Contingency (Two-Way) Table

Slide 18: Case Study: Statistically Significant Relationship?
• Chi-square = 19.2
• There is a statistically significant relationship between the type of patch used and the cessation of smoking for at least 8 weeks.

Slide 19: Case Study: Popular Ad: Seldane-D Allergy Tablets
• Double-blind study of side effects
  – Seldane-D: 374 subjects
    ◦ 27 (7.2%) reported drowsiness
    ◦ 347 did not
  – Placebo: 193 subjects
    ◦ 22 (11.4%) reported drowsiness
    ◦ 171 did not
Source: Time, 27 March 1995, p. 18

Slide 20: Case Study: Summaries
• Chi-square = 2.58
• Baseline risk of drowsiness (placebo) = 11.4%
• Risk of drowsiness using Seldane-D = 7.2%
• Relative risk of drowsiness, Seldane-D versus placebo = 7.2% / 11.4% = 0.63
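The Seldane-D summaries can be recomputed from the counts on slide 19. The sketch below (function name illustrative) uses the shortcut formula X² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], which is algebraically the same as summing (observed − expected)²/expected over the four cells. With these counts it reproduces the risks and the relative risk of 0.63, but gives X² ≈ 2.82 rather than 2.58; the source may have used slightly different counts or a variant of the statistic, and we cannot tell which. Both values are below 3.84, so the conclusion is the same.

```python
def two_by_two_summary(a, b, c, d):
    """Risks, relative risk, and Pearson chi-square for a 2 x 2 table.

    Layout assumed:      outcome   no outcome
               group 1      a          b
               group 2      c          d
    """
    n = a + b + c + d
    risk1 = a / (a + b)
    risk2 = c / (c + d)
    # Shortcut form of sum((O - E)^2 / E) over the four cells.
    chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return risk1, risk2, risk1 / risk2, chi_sq

# Group 1 = Seldane-D, group 2 = placebo; outcome = reported drowsiness.
risk_s, risk_p, rr, chi_sq = two_by_two_summary(27, 347, 22, 171)
print(f"Seldane-D risk {risk_s:.1%}, placebo risk {risk_p:.1%}")  # 7.2%, 11.4%
print(f"relative risk {rr:.2f}, chi-square {chi_sq:.2f}")       # 0.63, 2.82
```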

Slide 21: Case Study: Conclusion
• Randomized, controlled experiment
• No statistically significant relationship between use of Seldane-D allergy tablets and reported drowsiness (2.58 < 3.84).
• The evidence does not support the claim that Seldane-D causes drowsiness in some people.

Slide 22: A Caution About Sample Size, Statistical Significance, and Chi-Square
• The chi-square statistic depends on the sample size even when the table percentages stay the same.
• For example, if n = 850 (instead of 567) and all percentages remain the same, the chi-square statistic would be 3.87 (instead of 2.58). Would the conclusion change?
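Why the chi-square value grows with n: if every cell count is multiplied by k, every expected count is also multiplied by k, so each term (O − E)²/E is multiplied by k. The Python sketch below (names illustrative) checks this on the Seldane-D counts and then rescales the slide's value of 2.58 from n = 567 to n = 850.

```python
def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(table, row_totals):
        for observed, c in zip(row, col_totals):
            expected = r * c / n
            total += (observed - expected) ** 2 / expected
    return total

def rescale_chi_square(chi_sq, n_old, n_new):
    """Chi-square for the same cell percentages with n_new instead of n_old subjects."""
    return chi_sq * n_new / n_old

seldane = [[27, 347], [22, 171]]
tripled = [[3 * count for count in row] for row in seldane]
ratio = pearson_chi_square(tripled) / pearson_chi_square(seldane)
rescaled = rescale_chi_square(2.58, 567, 850)
print(f"ratio after tripling the counts: {ratio:.3f}")  # 3.000
print(f"rescaled chi-square: {rescaled:.2f}")         # 3.87
```

Because 3.87 is just above 3.84, the same percentages would become "statistically significant" with the larger sample.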

Slide 23: Inference for Relative Risk
• Confidence intervals
  – If the two risks are the same, the relative risk is 1.
  – Check whether the confidence interval contains 1.
• Hypothesis tests
  – To test whether two risks are equal, test whether the relative risk is 1.
  – Use the chi-square value to find the P-value.
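A common large-sample way to build a confidence interval for a relative risk works on the log scale, with standard error sqrt(1/a − 1/n₁ + 1/c − 1/n₂). This is one standard method offered as an illustration; the chapter does not say how the intervals it quotes were computed. The sketch below (function name illustrative) applies it to the Seldane-D counts from slide 19.

```python
import math

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk (group 1 vs group 2) with a large-sample CI on the log scale.

    a of n1 subjects in group 1 had the outcome; c of n2 in group 2 did.
    Uses SE(log RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2), which needs a > 0 and c > 0.
    """
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se), math.exp(log_rr + z * se)

rr, lo, hi = relative_risk_ci(27, 374, 22, 193)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # RR = 0.63, 95% CI 0.37 to 1.08
print("interval contains 1:", lo <= 1 <= hi)           # True
```

The interval contains 1, which agrees with the non-significant chi-square result for these data.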

Slide 24: Case Study: Relationship Between Breast Cancer and Induced Abortion
Daling et al. (1994), "Risk of breast cancer among young women: relationship to induced abortion," Journal of the National Cancer Institute, Vol. 86, No. 21, pp
Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?

Slide 25: Relationship Between Breast Cancer and Induced Abortion: Case Study: Sample
• 845 breast cancer cases were identified in Washington State from 1983 to
• 910 control women were identified using random-digit dialing in the same area.
• Women born before 1944 were excluded.

Slide 26: Relationship Between Breast Cancer and Induced Abortion: Case Study: C.I. Results
• The relative risk for breast cancer was 1.5, with the higher risk for women who had had an induced abortion.
• A 95% confidence interval for the relative risk was 1.2 to 1.9 (given).
• The confidence interval does not contain the value 1 (⇒ the risks are different).

Slide 27: Relationship Between Breast Cancer and Induced Abortion: Case Study: C.I. Results
• No increased risk was found for women who had had spontaneous abortions; the relative risk was 0.9.
• A 95% confidence interval for the relative risk was 0.7 to 1.2 (given).
• The confidence interval does contain the value 1 (⇒ no evidence that the risks differ).

Slide 28: Relationship Between Breast Cancer and Induced Abortion: Case Study (continued)
Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?

Slide 29: Case Study: The Hypotheses
• Null: The risk of developing breast cancer for women who have had an induced abortion is the same as the risk for women who have not had an induced abortion. [RR = 1]
• Alt: The risk of developing breast cancer for women who have had an induced abortion is different from the risk for women who have not had an induced abortion. [RR ≠ 1]

Slide 30: Case Study: Test Statistic and P-value
• Relative risk = 1.5
• We could also display the data in a 2 × 2 table and compute the chi-square value (9.75).
• The P-value (we will not compute this one; just take it from the study) is …
  ◦ Recall: for 2 × 2 tables, Z = square root of the chi-square statistic.
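For a 2 × 2 table the chi-square statistic has 1 degree of freedom and Z = sqrt(X²), so the upper-tail area of the chi-square distribution equals the two-sided normal P-value. The sketch below computes the P-value implied by X² = 9.75 this way; it is a check on the order of magnitude, not necessarily the value reported by the study.

```python
import math

chi_sq = 9.75               # chi-square value quoted for the 2 x 2 table
z = math.sqrt(chi_sq)       # for 2 x 2 tables, Z = sqrt(chi-square)
# Upper tail of chi-square(1 df) = two-sided normal tail = erfc(z / sqrt(2)).
p_value = math.erfc(z / math.sqrt(2))
print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
```

This gives Z ≈ 3.12 and a P-value of roughly 0.002, well below 0.05.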

Slide 31: Case Study: Decision
• Since the P-value is small, we reject chance as the explanation for the relative risk (1.5) being different from 1.0.
• The result is statistically significant.
• We reject the null hypothesis: the data provide evidence that the two population risks (of developing breast cancer) are not the same.

Slide 32: Two-Way Table: General Procedure
• The remainder of this chapter presents the general procedure for determining whether a significant relationship exists between two categorical variables with any number of levels:
  – how to analyze two-way tables with any number of rows and columns
  – the results also apply to the special case of 2 × 2 tables

Slide 33: Case Study: Health Care: Canada and U.S.
Data from patients' own assessment of their quality of life relative to what it had been before their heart attack (data from patients who survived at least a year).
Source: Mark, D. B. et al., "Use of medical resources and quality of life after acute myocardial infarction in Canada and the United States," New England Journal of Medicine, 331 (1994), pp

Slide 34: Case Study: Health Care: Canada and U.S.
Quality of life    Canada    United States
Much better            75              541
Somewhat better        71              498
About the same         96              779
Somewhat worse         50              282
Much worse             19               65
Total                 311             2165

Slide 35: Case Study: Health Care: Canada and U.S.
(Observed counts as on slide 34.)
Compare the Canadian group to the U.S. group in terms of feeling much better: 75 Canadians reported feeling much better, compared with 541 Americans. The groups appear very different, but look at the group totals (311 Canadians versus 2165 Americans).

Slide 36: Case Study: Health Care: Canada and U.S.
Compare the Canadian group to the U.S. group in terms of feeling much better: change the counts to percents of each country's total.
Quality of life    Canada    United States
Much better           24%              25%
Somewhat better       23%              23%
About the same        31%              36%
Somewhat worse        16%              13%
Much worse             6%               3%
Total                100%             100%
With this fairer comparison using percents, the groups appear very similar in terms of feeling much better (24% versus 25%).
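The counts-to-percents step is easy to automate. This sketch (names illustrative) divides each country's counts by that country's total and reproduces the rounded percents on this slide.

```python
# Observed counts from the Canada / U.S. quality-of-life table.
levels = ["Much better", "Somewhat better", "About the same",
          "Somewhat worse", "Much worse"]
canada = [75, 71, 96, 50, 19]
us = [541, 498, 779, 282, 65]

def column_percents(counts):
    """Convert one column of counts into percents of that column's total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

for level, pc, pu in zip(levels, column_percents(canada), column_percents(us)):
    print(f"{level:<16} {pc:5.1f}% {pu:5.1f}%")
```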

Slide 37: Case Study: Health Care: Canada and U.S.
Is there a relationship between the explanatory variable (Country) and the response variable (Quality of life)?
Using the column percents on slide 36: for each level of the explanatory variable (Country), look at the percents across all levels of the response variable (Quality of life). Conclude that a relationship exists if these distributions differ significantly.

Slide 38: Hypothesis Test
• In tests for two categorical variables, we are interested in whether a relationship observed in a single sample reflects a real relationship in the population.
• Hypotheses:
  – Null: the percentages for one variable are the same for every level of the other variable (no real relationship).
  – Alt: the percentages for one variable vary over levels of the other variable (a real relationship).

Slide 39: Case Study: Health Care: Canada and U.S.
Null hypothesis: the percentages for one variable are the same for every level of the other variable (no real relationship).
For example, we could look at the differences in percentages between Canada and the U.S. for each level of "Quality of life" (see the percents on slide 36): 24% vs. 25% for those who felt "Much better", 23% vs. 23% for "Somewhat better", and so on.
* We want to do all of these comparisons as one overall test.

Slide 40: Hypothesis Test
• H₀: there is no real relationship between the two categorical variables that make up the rows and columns of a two-way table.
• To test H₀, compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H₀ were true).
  – If the observed counts are far from the expected counts, that is evidence against H₀ in favor of a real relationship between the two variables.

Slide 41: Expected Counts
• The expected count in any cell of a two-way table (when H₀ is true) is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Slide 42: Case Study: Health Care: Canada and U.S.
For the observed data below, find the expected count for each cell.
Quality of life    Canada    United States    Total
Much better            75              541      616
Somewhat better        71              498      569
About the same         96              779      875
Somewhat worse         50              282      332
Much worse             19               65       84
Total                 311             2165     2476
For example, the expected count of Canadians who feel "Much better" (Row 1, Column 1) is (616 × 311) / 2476 ≈ 77.37.
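Here is the expected-count formula (row total × column total / table total) applied to every cell at once, as a minimal Python sketch with an illustrative function name. The expected counts keep the same row and column totals as the observed table.

```python
def expected_counts(table):
    """Expected count for each cell under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Much better, Somewhat better, About the same, Somewhat worse, Much worse.
# Columns: Canada, United States.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [f"{e:.2f}" for e in exp_row])
# First line printed: [75, 541] ['77.37', '538.63']
```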

Slide 43: Case Study: Health Care: Canada and U.S.
Observed counts: as on slide 34.
Expected counts (row total × column total / table total):
Quality of life    Canada    United States
Much better         77.37           538.63
Somewhat better     71.47           497.53
About the same     109.91           765.09
Somewhat worse      41.70           290.30
Much worse          10.55            73.45
Compare the observed and expected counts to see whether the data support the null hypothesis.

Slide 44: Chi-Square Statistic
• To determine whether the differences between the observed and expected counts are statistically significant (showing a real relationship between the two categorical variables), we use the chi-square statistic:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

where the sum is over all cells in the table.

Slide 45: Chi-Square Statistic
• The chi-square statistic is a measure of the distance of the observed counts from the expected counts:
  – it is always zero or positive
  – it is zero only when the observed counts are exactly equal to the expected counts
  – large values of X² are evidence against H₀, because they show that the observed counts are far from what would be expected if H₀ were true
  – the chi-square test is one-sided (any violation of H₀ tends to produce a large value of X²)
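A sketch of the full calculation for the Canada/U.S. table: compute each cell's expected count, then its (O − E)²/E term, and add the terms. Keeping the individual terms also shows which cells contribute most (here the "Much worse" cell for Canada, where 19 were observed but only about 10.55 expected). Function names are illustrative.

```python
def chi_square_terms(table):
    """(observed - expected)^2 / expected for every cell, expected = row * col / total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    terms = []
    for row, r in zip(table, row_totals):
        terms.append([(obs - r * c / n) ** 2 / (r * c / n)
                      for obs, c in zip(row, col_totals)])
    return terms

observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
terms = chi_square_terms(observed)
chi_sq = sum(sum(row) for row in terms)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"X^2 = {chi_sq:.2f} with df = {df}")  # X^2 = 11.73 with df = 4
for row in terms:
    print([round(t, 2) for t in row])        # per-cell contributions
```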

Slide 46: Case Study: Health Care: Canada and U.S.
Observed counts and expected counts:
Quality of life    Observed (Canada / U.S.)    Expected (Canada / U.S.)
Much better        75 / 541                    77.37 / 538.63
Somewhat better    71 / 498                    71.47 / 497.53
About the same     96 / 779                    109.91 / 765.09
Somewhat worse     50 / 282                    41.70 / 290.30
Much worse         19 / 65                     10.55 / 73.45

Slide 47: Chi-Square Test
• Calculate the value of the chi-square statistic:
  – by hand (cumbersome)
  – using technology (computer software, etc.)
• Find the P-value in order to reject or fail to reject H₀:
  – use a chi-square table for the chi-square distribution (next few slides)
  – or read it from computer output
• If a significant relationship exists (small P-value):
  – compare appropriate percents in the data table
  – compare individual observed and expected cell counts
  – look at the individual terms in the chi-square statistic

Slide 48: Case Study: Health Care: Canada and U.S.
Using Technology:

Slide 49: Chi-Square Distributions
• A family of distributions that take only positive values and are skewed to the right
• A specific chi-square distribution is specified by giving its degrees of freedom (formula on next slide)

Slide 50: Chi-Square Distributions
• The chi-square test for a two-way table with r rows and c columns uses critical values from a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
• The P-value is the area to the right of X² under the density curve of the chi-square distribution.
  – Use the chi-square table.

Slide 51: Chi-Square Table
• See Chapter 24 in the text for Table 24.1.

Slide 52: Chi-Square P-value
• Finding the P-value for a chi-square test:
  – use the row for the appropriate degrees of freedom (df) in the left margin of the table
  – locate the X² value in the body of the table
  – the probability of lying to the right of this value is found in the top margin of the table (this is the P-value)
  – usually the observed X² value falls between two values in the table, and the P-value is reported as lying between the two corresponding probabilities

Slide 53: Case Study: Health Care: Canada and U.S.
Using the observed counts on slide 34:
X² ≈ 11.73, df = (r − 1)(c − 1) = (5 − 1)(2 − 1) = 4
Look in the df = 4 row of Table 24.1: the value X² ≈ 11.73 falls between the critical values for 0.05 and 0.01. Thus the P-value for this chi-square test is between 0.01 and 0.05 (it is actually about 0.02).
** P-value < 0.05, so we conclude that there is a significant relationship. **

Slide 54: Chi-Square Critical Values
• To be significant at level α (the cut-off value), the chi-square statistic must be larger than the table entry for α.
  – Example: for α = 0.05 (the typical cut-off) and df = 4, the test is significant if X² > 9.49.
    ◦ The previous case study had X² ≈ 11.73 > 9.49.
  – Note: for 2 × 2 tables, df = (2 − 1)(2 − 1) = 1, and the test is significant if X² > 3.84.
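The table lookups on the last few slides can be reproduced without a printed table. The sketch below is our own construction using only Python's standard library: it computes the upper-tail area of a chi-square distribution with integer degrees of freedom from the closed forms for df = 1 and df = 2 plus the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), and it finds critical values by bisection. For serious work a statistics library is the better choice.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """P(X^2 > x) for a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    k = 1 if df % 2 else 2
    # Closed forms: df = 1 -> erfc(sqrt(x/2)); df = 2 -> exp(-x/2).
    q = math.erfc(math.sqrt(half)) if k == 1 else math.exp(-half)
    while k < df:
        # Step from k to k + 2 df; term computed on the log scale to avoid overflow.
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_critical_value(alpha: float, df: int) -> float:
    """Value c with P(X^2 > c) = alpha, found by bisection (the tail area decreases in c)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:  # widen the bracket until the tail is below alpha
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_upper_tail(11.73, 4), 3))     # Canada/U.S. P-value, about 0.02
print(round(chi2_critical_value(0.05, 4), 2))  # 9.49
print(round(chi2_critical_value(0.05, 1), 2))  # 3.84
print(round(chi2_critical_value(0.01, 4), 2))  # 13.28
```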

Slide 55: Uses of the Chi-Square Test
• Tests the null hypothesis H₀: no relationship between two categorical variables, when you have a two-way table from either of these situations:
  – Independent SRSs from each of several populations, with each individual classified according to one categorical variable. [Example: Health Care case study: two samples (Canadians and Americans); each individual classified according to "Quality of life".]
  – A single SRS with each individual classified according to both of two categorical variables. [Example: a sample of 8235 subjects, each classified according to "Job Grade" (1, 2, 3, or 4) and "Marital Status" (Single, Married, Divorced, or Widowed).]

Slide 56: Chi-Square Test: Requirements
• The chi-square test is an approximate method, and it becomes more accurate as the counts in the cells of the table get larger.
• The following must be satisfied for the approximation to be accurate:
  – no more than 20% of the expected counts are less than 5
  – all individual expected counts are 1 or greater
• If these requirements fail, two or more groups must be combined to form a new (smaller) two-way table.
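The two rules of thumb on this slide can be checked mechanically from the expected counts. A minimal sketch (function names illustrative), run on the Canada/U.S. table and on a deliberately tiny table:

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def approximation_ok(table):
    """True if at most 20% of expected counts are below 5 and none is below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    return small <= 0.2 * len(cells) and min(cells) >= 1

canada_us = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
tiny = [[2, 3], [1, 4]]
print(approximation_ok(canada_us))  # True: smallest expected count is about 10.55
print(approximation_ok(tiny))       # False: every expected count is below 5
```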

Slide 57: Simpson's Paradox
• When studying the relationship between two variables, there may be a lurking variable that reverses the direction of the relationship: the direction seen when the lurking variable is ignored is the opposite of the direction seen when it is taken into account.
• The lurking variable creates subgroups, and failing to take these subgroups into consideration can lead to misleading conclusions about the association between the two variables.

Slide 58: Discrimination? (Simpson's Paradox)
Consider the acceptance rates for the following group of men and women who applied to college.
(Table of counts: Accepted / Not accepted / Total for Men, Women, and Total.)
Percents:
           Accepted    Not accepted
Men             55%             45%
Women           44%             56%
A higher percentage of men were accepted: discrimination?

Slide 59: Discrimination? (Simpson's Paradox): Business School
Lurking variable: applications were split between the Business School (240) and the Art School (320).
(Table of counts for the Business School: Accepted / Not accepted / Total for Men, Women, and Total.)
Percents:
           Accepted    Not accepted
Men             15%             85%
Women           20%             80%
A higher percentage of women were accepted in Business.

Slide 60: Discrimination? (Simpson's Paradox): Art School
Lurking variable: applications were split between the Business School (240) and the Art School (320).
(Table of counts for the Art School: Accepted / Not accepted / Total for Men, Women, and Total.)
Percents:
           Accepted    Not accepted
Men             75%             25%
Women           80%             20%
A higher percentage of women were also accepted in Art.

Slide 61: Discrimination? (Simpson's Paradox)
• Within each school, a higher percentage of women than men were accepted. The data do not show discrimination against women.
• This is an example of Simpson's Paradox. When the lurking variable (school applied to: Business or Art) is ignored, the data seem to suggest discrimination against women. However, when the school is taken into account, the association is reversed and suggests discrimination against men.
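The reversal can be reproduced with counts. The slides' count tables are not preserved here, so the counts below are hypothetical: they were chosen to match the slides' percentages and school totals (240 Business and 320 Art applications). The sketch computes acceptance rates within each school and then for the two schools combined.

```python
# Hypothetical counts (accepted, applied), chosen to match the slide percentages.
applications = {
    "Business": {"Men": (18, 120), "Women": (24, 120)},
    "Art": {"Men": (180, 240), "Women": (64, 80)},
}

def percent(accepted, applied):
    """Acceptance rate as a percent."""
    return 100 * accepted / applied

# Within each school, women have the higher acceptance rate.
for school, groups in applications.items():
    men, women = percent(*groups["Men"]), percent(*groups["Women"])
    print(f"{school}: men {men:.0f}%, women {women:.0f}%")

# Ignoring the school (the lurking variable) reverses the comparison,
# because most men applied to Art, which accepts a much larger share of applicants.
overall = {}
for sex in ("Men", "Women"):
    accepted = sum(groups[sex][0] for groups in applications.values())
    applied = sum(groups[sex][1] for groups in applications.values())
    overall[sex] = percent(accepted, applied)
print(f"Overall: men {overall['Men']:.0f}%, women {overall['Women']:.0f}%")  # 55%, 44%
```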

Slide 62: Key Concepts
• Displaying data in a contingency (two-way) table
• Concept of statistical significance for a 2 × 2 table (chi-square critical value)
• Interpreting confidence intervals for relative risk
• Interpreting hypothesis tests for relative risk
• Chi-square statistic and a statistically significant relationship for any two-way table
• Simpson's Paradox