Two Categorical Variables: The Chi-Square Test BPS 7e Chapter 25 © 2015 W. H. Freeman and Company
Two-Way Tables: For which of the following pairs of variables would a two-way table of counts be a useful data analysis tool? (a) age and blood pressure; (b) age and opinion on presidential performance; (c) party affiliation and opinion on presidential performance; (d) diet and growth rate.
Two-Way Tables (answer): (c) party affiliation and opinion on presidential performance. Both variables are categorical, which is what a two-way table of counts summarizes.
Two-Way Tables: This table shows the results of a simple random sample asking respondents what types of cars they drive and whether their commuting distance is less than 20 miles. What proportion of the people in the sample drive less than 20 miles to work? (a) 33 / 175 = 0.189; (b) 33 / 250 = 0.132; (c) 175 / 250 = 0.700; (d) 75 / 250 = 0.300; (e) 52 / 250 = 0.208.
Two-Way Tables (answer): (c) 175 / 250 = 0.700, the number of respondents commuting less than 20 miles divided by the sample size. This reading takes 175 as the total for the less-than-20-miles row, which the table itself would confirm.
Multiple Comparisons: What is the “problem of multiple comparisons” in statistics? (a) how to sort out the general meaning of a set of comparisons; (b) how to do many comparisons with an overall measure of confidence; (c) how to decide which of a large set of comparisons to do; (d) how to compute a large number of comparisons efficiently.
Multiple Comparisons (answer): (b) how to do many comparisons with an overall measure of confidence.
Multiple Comparisons: What is the usual solution to the “problem of multiple comparisons”? (a) Ignore the problem. (b) Choose the one most important comparison to actually carry out. (c) Carry out all comparisons but pay attention only to the one with the largest difference. (d) Carry out all comparisons but pay attention only to the one with the smallest difference. (e) Carry out a single overall test followed by a detailed follow-up analysis.
Multiple Comparisons (answer): (e) Carry out a single overall test followed by a detailed follow-up analysis.
Expected Counts: Expected counts are calculated assuming: (a) Ha is true; (b) H0 is true; (c) neither hypothesis is true; (d) both hypotheses are true.
Expected Counts (answer): (b) H0 is true. The expected counts are the counts we would expect if there were no relationship between the two variables.
Expected Counts: How is the expected count for a cell of a two-way table calculated? (a) (row total × column total) / table total; (b) row total / column total; (c) (row total + column total) / table total; (d) (row total / table total) × (column total / table total).
Expected Counts (answer): (a) (row total × column total) / table total.
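The rule (row total × column total) / table total can be sketched in a few lines of Python. This is a minimal illustration only: the function name `expected_counts` and the 2 × 2 table are hypothetical and do not come from the slides.

```python
def expected_counts(observed):
    """Return expected counts under H0 for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    # Each cell: (row total * column total) / table total
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of counts (not the commuter data).
observed = [[10, 20],
            [30, 40]]
expected = expected_counts(observed)
print(expected)
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on any hand calculation.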
Expected Counts: This table shows the results of a simple random sample asking respondents what type of car they drive and whether their commuting distance is less than 20 miles. What is the expected count for the Compact<20 cell? (a) 44 × 61 / 250; (b) 61 × 250 / 175; (c) 44 × 17 / 250; (d) 61 × 175 / 250; (e) (61 + 175) / 250.
Expected Counts (answer): (d) 61 × 175 / 250, the Compact column total times the less-than-20-miles row total, divided by the table total.
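As a worked instance of the expected-count rule, the arithmetic is shown below. It assumes that 61 is the Compact column total and 175 is the total for commutes under 20 miles; the table is not reproduced in this text, so both are inferred from the answer options.

```latex
\[
E_{\text{Compact},\,<20}
  = \frac{\text{column total} \times \text{row total}}{\text{table total}}
  = \frac{61 \times 175}{250}
  = \frac{10675}{250}
  = 42.7
\]
```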
Chi-Square Test Statistic: True or False: The χ² (chi-square) statistic is similar to all other test statistics we have discussed in that it compares observed values with values that would be expected if H0 were true. (a) True; (b) False.
Chi-Square Test Statistic (answer): (a) True.
Chi-Square Test Statistic: Which statement is NOT correct about the χ² test statistic? (a) The test statistic is the sum of positive numbers and therefore must be positive. (b) A small value of the test statistic would indicate evidence supporting the null hypothesis. (c) A large value of the test statistic would be in support of the alternative hypothesis. (d) A value close to 0 would indicate that expected counts are much different from observed counts.
Chi-Square Test Statistic (answer): (d) is NOT correct. A value close to 0 indicates that the observed counts are close to the expected counts.
Chi-Square Test Statistic Which of the following is the formula for the chi-square statistic?
Chi-Square Test Statistic (answer) Which of the following is the formula for the chi-square statistic?
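The answer choices for this slide are not reproduced in the text. For reference, the chi-square statistic in its standard form sums one term per cell of the two-way table:

```latex
\[
\chi^2 = \sum_{\text{all cells}}
  \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}
\]
```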
Chi-Square Test Statistic: True or False: In calculating the chi-square statistic, each term in the sum represents the “contribution” to the chi-square statistic for one of the cells in the table. In this regard, even rounding each term to two decimal places can still create round-off error in the sum. (a) True; (b) False.
Chi-Square Test Statistic (answer): (a) True.
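The round-off effect is easy to see in a small sketch. The 2 × 2 table below is hypothetical, chosen only so that rounding each cell's contribution before summing changes the second decimal place of the total.

```python
# Hypothetical 2 x 2 table, flattened cell by cell (not data from the slides).
observed = [11, 19, 19, 11]
# Every row and column total is 30 and the table total is 60,
# so each expected count is 30 * 30 / 60 = 15.
expected = [15.0, 15.0, 15.0, 15.0]

# One contribution per cell: (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Sum the unrounded terms, then round once at the end.
sum_then_round = round(sum(contributions), 2)
# Round every term to two decimals first, then sum.
round_then_sum = round(sum(round(term, 2) for term in contributions), 2)

print(sum_then_round, round_then_sum)
```

Here each contribution is 16/15 ≈ 1.0667, so summing first gives 4.27 while rounding each term first gives 4.28. Carrying extra decimals in the individual terms and rounding only the final sum avoids the discrepancy.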
Chi-Square Test Statistic: We can safely use the chi-square test with critical values from the chi-square distribution when no more than _____ of the expected counts are less than ___ and all individual expected counts are ___ or greater. (a) 20%; 1; 5. (b) 15%; 1; 5. (c) 20%; 5; 1. (d) 15%; 5; 1.
Chi-Square Test Statistic (answer): (c) 20%; 5; 1. No more than 20% of the expected counts may be less than 5, and every expected count must be 1 or greater.
Cell Counts Required: Here is a two-way table of counts, listed cell by cell with each observed count followed by its expected count in parentheses: 50 (51.35), 36 (31.65), 12 (14.99), 16 (18.34), 13 (11.31), 6 (5.36), 54 (50.30), 25 (31.01), 17 (14.69). The data are based on an SRS with two categorical variables. Is it safe to use the chi-square test with critical values from the chi-square distribution? (a) yes, because all of the observed counts are 5 or greater; (b) yes, because all of the expected counts are 5 or greater; (c) no, because the expected counts are not whole numbers; (d) yes, because the sum of observed counts = sum of expected.
Cell Counts Required (answer): (b) yes, because all of the expected counts are 5 or greater. The smallest expected count is 5.36.
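The cell-count guideline (no more than 20% of expected counts below 5, and every expected count at least 1) can be checked mechanically. The sketch below applies it to the nine expected counts listed on the slide; the function name `chi_square_is_safe` is a hypothetical helper, not part of any library.

```python
def chi_square_is_safe(expected_counts):
    """Apply the guideline: at most 20% of expected counts below 5, all at least 1."""
    below_five = sum(1 for e in expected_counts if e < 5)
    share_below_five = below_five / len(expected_counts)
    return share_below_five <= 0.20 and min(expected_counts) >= 1


# Expected counts as listed on the slide.
expected = [51.35, 31.65, 14.99, 18.34, 11.31, 5.36, 50.30, 31.01, 14.69]
safe = chi_square_is_safe(expected)
print(safe)
```

Note that the check uses only the expected counts. The observed counts, and whether the expected counts are whole numbers, play no part in the guideline.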
Chi-Square Tests: Using a computer program, the chi-square test statistic for the commuter data is χ² = 5.37 with P-value < 0.01. If we use α = 0.05, what conclusion is appropriate? (a) Don’t reject H0; the results are significant. (b) Don’t reject H0; the results are not significant. (c) Reject H0; the results are significant. (d) Reject H0; the results are not significant.
Chi-Square Tests (answer): (c) Reject H0; the results are significant. The P-value is below α = 0.05.
Chi-Square Tests: If the chi-square test is significant, what should we do as a follow-up? (a) Compare expected counts with observed counts in each cell. (b) Examine the components of the chi-square test for each cell. (c) Informally compare the conditional distributions. (d) All of the above.
Chi-Square Tests (answer): (d) All of the above.
Chi-Square Tests: Using a computer program, the chi-square test statistic for the commuter data is χ² = 15.2 with P-value < 0.08. If we use α = 0.05, what conclusion is appropriate? (a) Don’t reject H0; the results are significant. (b) Don’t reject H0; the results are not significant. (c) Reject H0; the results are significant. (d) Reject H0; the results are not significant.
Chi-Square Tests (answer): (b) Don’t reject H0; the results are not significant. This takes the reported P-value to be about 0.08, which exceeds α = 0.05.
Uses of the Chi-Square Test: Students in a simple random sample of sixth-graders were asked which of three choices is most important to them: making good grades, being popular, or participating in sports. Also recorded was the students’ gender. What type of data-collection situation was this? (a) independent SRSs with individuals classified by one categorical variable; (b) single SRS with individuals classified by two categorical variables.
Uses of the Chi-Square Test (answer): (b) single SRS with individuals classified by two categorical variables (gender and choice).
Uses of the Chi-Square Test: What are the hypotheses for the chi-square test for two-way tables of counts? (a) H0: no relationship, Ha: relationship; (b) H0: relationship, Ha: no relationship; (c) H0: strong relationship, Ha: weak relationship.
Uses of the Chi-Square Test (answer): (a) H0: no relationship, Ha: relationship.
Uses of the Chi-Square Test: Students in a simple random sample of sixth-graders were asked which of three choices is most important to them: making good grades, being popular, or participating in sports. Also recorded was the students’ gender. The chi-square test is used to test the null hypothesis: (a) H0: no linear relationship between gender and choice; (b) H0: no relationship between gender and choice; (c) H0: no difference between the means of gender and choice.
Uses of the Chi-Square Test (answer): (b) H0: no relationship between gender and choice. Both variables are categorical, so neither a linear relationship nor a mean is meaningful here.
Uses of the Chi-Square Test: To see if percentages of high school graduates in seven states in New England are equal, researchers took samples from each state and recorded whether the respondent was a high school graduate. What type of data-collection system was this? (a) independent SRSs with individuals classified by one categorical variable; (b) single SRS with individuals classified by two categorical variables.
Uses of the Chi-Square Test (answer): (a) independent SRSs with individuals classified by one categorical variable. Each state was sampled separately, and each respondent was classified only by graduation status.
Uses of the Chi-Square Test: When police respond to spousal-abuse calls, they can arrest the offender, issue a citation to the offender, and/or separate the couple. A group of researchers studied whether one method was better than the others in deterring further instances of spousal abuse. When presented with an eligible case, dispatchers randomly selected the method the officers should use when responding to the call. The following data were collected: What type of data-collection system was this? (a) independent SRSs with individuals classified by one categorical variable; (b) single SRS with individuals classified by two categorical variables.
Uses of the Chi-Square Test (answer): (a) independent SRSs with individuals classified by one categorical variable. Random assignment of the response method creates separate treatment groups, which are handled like independent samples, with each case classified by a single categorical outcome.
Chi-Square Distributions: Which statement is NOT correct about the χ² distribution? (a) The area under the curve is equal to 1, or 100%. (b) It has a left-skewed shape. (c) The values are always positive. (d) It has different curves depending on the degrees of freedom.
Chi-Square Distributions (answer): (b) is NOT correct. Chi-square distributions are right-skewed.
Chi-Square Distributions: A researcher wanted to see if there was a relationship between education-attainment level and whether someone smokes or does not smoke. She classified education attainment into five categories (did not graduate from high school, high school graduate, some college, college graduate, or graduate degree) and then performed a chi-square test on the data. What were the degrees of freedom? (a) (5) (2) = 10; (b) (4) (2) = 8; (c) (4) (1) = 4; (d) It cannot be determined without the data.
Chi-Square Distributions (answer): (c) (4) (1) = 4. With 5 education categories and 2 smoking categories, the degrees of freedom are (5 − 1)(2 − 1) = 4.
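The degrees of freedom for a chi-square test on a two-way table depend only on the table's dimensions, (rows − 1) × (columns − 1), not on the counts. A minimal sketch, with `chi_square_df` as a hypothetical helper name:

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an n_rows x n_cols table."""
    return (n_rows - 1) * (n_cols - 1)


# Five education categories by two smoking categories, as in the question.
df = chi_square_df(5, 2)
print(df)
```

Swapping which variable is treated as the rows gives the same result, since the product is symmetric.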