Two Categorical Variables: The Chi-Square Test

Two Categorical Variables: The Chi-Square Test
BPS 7e, Chapter 25
© 2015 W. H. Freeman and Company

Two-Way Tables
For which of the following pairs of variables would a two-way table of counts be a useful data analysis tool?
(a) age and blood pressure
(b) age and opinion on presidential performance
(c) party affiliation and opinion on presidential performance
(d) diet and growth rate

Two-Way Tables (answer)
(c) party affiliation and opinion on presidential performance. Both variables are categorical, so the data can be summarized as counts in a two-way table.

Two-Way Tables
This table shows the results of a simple random sample asking respondents what types of cars they drive and whether their commuting distance is less than 20 miles. [Table not reproduced in the transcript.] What proportion of the people in the sample drive less than 20 miles to work?
(a) 33 / 175 = 0.189
(b) 33 / 250 = 0.132
(c) 175 / 250 = 0.700
(d) 75 / 250 = 0.300
(e) 52 / 250 = 0.208

Two-Way Tables (answer)
(c) 175 / 250 = 0.700. This reads 175 as the total number of respondents commuting less than 20 miles and 250 as the table total, consistent with the totals used in the expected-count question on the same table.

Multiple Comparisons
What is the “problem of multiple comparisons” in statistics?
(a) how to sort out the general meaning of a set of comparisons
(b) how to do many comparisons with an overall measure of confidence
(c) how to decide which of a large set of comparisons to do
(d) how to compute a large number of comparisons efficiently

Multiple Comparisons (answer)
(b) how to do many comparisons with an overall measure of confidence.
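To see why many separate comparisons need an overall measure of confidence, the sketch below computes the chance of at least one false rejection across k tests. It assumes, purely for illustration, that the k tests are independent and all use the same level alpha; the function name familywise_error_rate is hypothetical and does not come from the slides.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false rejection among k tests.

    Assumes the k tests are independent and each uses level alpha
    (an illustrative simplification, not a claim from the slides).
    """
    return 1.0 - (1.0 - alpha) ** k


# The chance of some false rejection grows quickly with the number of tests.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, ten comparisons at the 0.05 level already carry roughly a 40% chance of at least one false rejection, which is the motivation for a single overall test.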

Multiple Comparisons
What is the usual solution to the “problem of multiple comparisons”?
(a) Ignore the problem.
(b) Choose the one most important comparison to actually carry out.
(c) Carry out all comparisons but pay attention only to the one with the largest difference.
(d) Carry out all comparisons but pay attention only to the one with the smallest difference.
(e) Carry out a single overall test followed by a detailed follow-up analysis.

Multiple Comparisons (answer)
(e) Carry out a single overall test followed by a detailed follow-up analysis.

Expected Counts
Expected counts are calculated assuming:
(a) Ha is true.
(b) H0 is true.
(c) neither hypothesis is true.
(d) both hypotheses are true.

Expected Counts (answer)
(b) H0 is true.

Expected Counts
How is the expected count for a cell of a two-way table calculated?
(a) (row total × column total) / table total
(b) row total / column total
(c) (row total + column total) / table total
(d) (row total / table total) × (column total / table total)

Expected Counts (answer)
(a) (row total × column total) / table total.
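The expected-count rule can be sketched in a few lines of Python. The table below is hypothetical (it is not from the slides), and the function name expected_counts is likewise an illustrative choice.

```python
def expected_counts(observed):
    """Expected counts under H0: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table of counts (not from the slides).
observed = [[30, 20],
            [10, 40]]
expected = expected_counts(observed)
print(expected)
```

Each expected count uses only the margins of the table, which is why the expected counts in any row or column add up to the same total as the observed counts.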

Expected Counts
This table shows the results of a simple random sample asking respondents what type of car they drive and whether their commuting distance is less than 20 miles. [Table not reproduced in the transcript.] What is the expected count for the Compact<20 cell?
(a) 44 × 61 / 250
(b) 61 × 250 / 175
(c) 44 × 17 / 250
(d) 61 × 175 / 250
(e) (61 + 175) / 250

Expected Counts (answer)
(d) 61 × 175 / 250, that is, (Compact row total × "<20" column total) / table total.
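As a worked instance of the expected-count rule, and reading 61 as the Compact total, 175 as the "<20" total, and 250 as the table total (an interpretation of the answer choices, since the table itself is not reproduced here):

\[
\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}} = \frac{61 \times 175}{250} = 42.7
\]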

Chi-Square Test Statistic
True or False: The χ² (chi-square) statistic is similar to all other test statistics we have discussed in that it compares observed values with values that would be expected if H0 were true.
(a) True
(b) False

Chi-Square Test Statistic (answer)
(a) True.

Chi-Square Test Statistic
Which statement is NOT correct about the χ² test statistic?
(a) The test statistic is the sum of positive numbers and therefore must be positive.
(b) A small value of the test statistic would indicate evidence supporting the null hypothesis.
(c) A large value of the test statistic would be in support of the alternative hypothesis.
(d) A value close to 0 would indicate that expected counts are much different from observed counts.

Chi-Square Test Statistic (answer)
(d) is NOT correct. A value close to 0 indicates that the observed counts are close to the expected counts.

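Chi-Square Test Statistic
Which of the following is the formula for the chi-square statistic?
[Answer choices were formulas and are not reproduced in the transcript.]

Chi-Square Test Statistic (answer)
χ² = Σ (observed count − expected count)² / expected count, where the sum is taken over all cells of the table.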
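The chi-square statistic sums (observed − expected)² / expected over all cells of the table. A minimal sketch follows; the counts are hypothetical and the function name chi_square_statistic is an illustrative choice, not something from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts and their expected counts under H0 (not from the slides).
observed = [[30, 20], [10, 40]]
expected = [[20.0, 30.0], [20.0, 30.0]]
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))
```

Every term is a squared difference divided by a positive expected count, so the statistic can never be negative, and it is 0 only when every observed count equals its expected count.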

Chi-Square Test Statistic
True or False: In calculating the chi-square statistic, each term in the sum represents the “contribution” of one cell of the table. Because of this, even rounding each term to two decimal places can create round-off error in the sum.
(a) True
(b) False

Chi-Square Test Statistic (answer)
(a) True.

Chi-Square Test Statistic
We can safely use the chi-square test with critical values from the chi-square distribution when no more than _____ of the expected counts are less than ___ and all individual expected counts are ___ or greater.
(a) 20%; 1; 5
(b) 15%; 1; 5
(c) 20%; 5; 1
(d) 15%; 5; 1

Chi-Square Test Statistic (answer)
(c) 20%; 5; 1. No more than 20% of the expected counts may be less than 5, and all individual expected counts must be 1 or greater.

Cell Counts Required
Here is a two-way table of counts, with the expected count in parentheses after each observed count. The data are based on an SRS with two categorical variables.

50 (51.35)   36 (31.65)   12 (14.99)
16 (18.34)   13 (11.31)    6 (5.36)
54 (50.30)   25 (31.01)   17 (14.69)

Is it safe to use the chi-square test with critical values from the chi-square distribution?
(a) yes, because all of the observed counts are 5 or greater
(b) yes, because all of the expected counts are 5 or greater
(c) no, because the expected counts are not whole numbers
(d) yes, because the sum of observed counts = sum of expected

Cell Counts Required (answer)
(b) yes, because all of the expected counts are 5 or greater. The smallest expected count is 5.36.
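The cell-count check can be sketched as a small function. It applies the usual guideline (no more than 20% of expected counts below 5, and every expected count at least 1) to the expected counts printed on the slide; the function name safe_for_chi_square and the second list of counts are hypothetical.

```python
def safe_for_chi_square(expected_counts):
    """Guideline: at most 20% of expected counts below 5, and all at least 1."""
    n_cells = len(expected_counts)
    n_small = sum(1 for e in expected_counts if e < 5)
    return n_small / n_cells <= 0.20 and min(expected_counts) >= 1


# Expected counts as printed on the slide.
slide_expected = [51.35, 31.65, 14.99, 18.34, 11.31, 5.36, 50.30, 31.01, 14.69]
print(safe_for_chi_square(slide_expected))

# Hypothetical expected counts: 2 of 4 cells are below 5, so the guideline fails.
print(safe_for_chi_square([12.0, 3.0, 4.5, 20.0]))
```

Note that the check uses expected counts only; the observed counts play no part in deciding whether the chi-square approximation is safe.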

Chi-Square Tests
Using a computer program, the chi-square test statistic for the commuter data is χ² = 5.37 with P-value < 0.01. If we use α = 0.05, what conclusion is appropriate?
(a) Don’t reject H0; the results are significant.
(b) Don’t reject H0; the results are not significant.
(c) Reject H0; the results are significant.
(d) Reject H0; the results are not significant.

Chi-Square Tests (answer)
(c) Reject H0; the results are significant. The P-value is below α = 0.05.

Chi-Square Tests
If the chi-square test is significant, what should we do as a follow-up?
(a) Compare expected counts with observed counts in each cell.
(b) Examine the components of the chi-square test for each cell.
(c) Informally compare the conditional distributions.
(d) All of the above

Chi-Square Tests (answer)
(d) All of the above.
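One way to examine the components cell by cell is sketched below. For illustration only, it reuses the observed and expected counts printed on the "Cell Counts Required" slide, arranged as three rows of three (an assumed layout); the slides do not say whether the test on that table is significant, and the helper name cell_components is hypothetical.

```python
# Counts from the "Cell Counts Required" slide, in an assumed 3x3 layout.
observed = [[50, 36, 12], [16, 13, 6], [54, 25, 17]]
expected = [[51.35, 31.65, 14.99], [18.34, 11.31, 5.36], [50.30, 31.01, 14.69]]


def cell_components(observed, expected):
    """Per-cell contributions (observed - expected)^2 / expected."""
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


components = cell_components(observed, expected)

# Find the cell that contributes most to the chi-square statistic.
largest = max(
    (value, i, j)
    for i, row in enumerate(components)
    for j, value in enumerate(row)
)

for row in components:
    print([round(value, 2) for value in row])
print("largest component: row", largest[1] + 1, "column", largest[2] + 1)
```

Cells with the largest components are where observed and expected counts disagree most, so they are the natural starting point for comparing the conditional distributions.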

Chi-Square Tests
Using a computer program, the chi-square test statistic for the commuter data is χ² = 15.2 with P-value < 0.08. If we use α = 0.05, what conclusion is appropriate?
(a) Don’t reject H0; the results are significant.
(b) Don’t reject H0; the results are not significant.
(c) Reject H0; the results are significant.
(d) Reject H0; the results are not significant.

Chi-Square Tests (answer)
(b) Don’t reject H0; the results are not significant. Taking the reported P-value to be about 0.08, it exceeds α = 0.05.

Uses of the Chi-Square Test
Students in a simple random sample of sixth-graders were asked which of three choices is most important to them: making good grades, being popular, or participating in sports. Also recorded was the students’ gender. What type of data-collection situation was this?
(a) independent SRSs with individuals classified by one categorical variable
(b) single SRS with individuals classified by two categorical variables

Uses of the Chi-Square Test (answer)
(b) single SRS with individuals classified by two categorical variables (gender and choice).

Uses of the Chi-Square Test
What are the hypotheses for the chi-square test for two-way tables of counts?
(a) H0: no relationship, Ha: relationship
(b) H0: relationship, Ha: no relationship
(c) H0: strong relationship, Ha: weak relationship

Uses of the Chi-Square Test (answer)
(a) H0: no relationship, Ha: relationship.

Uses of the Chi-Square Test
Students in a simple random sample of sixth-graders were asked which of three choices is most important to them: making good grades, being popular, or participating in sports. Also recorded was the students’ gender. The chi-square test is used to test the null hypothesis:
(a) H0: no linear relationship between gender and choice.
(b) H0: no relationship between gender and choice.
(c) H0: no difference between the means of gender and choice.

Uses of the Chi-Square Test (answer)
(b) H0: no relationship between gender and choice.

Uses of the Chi-Square Test
To see if percentages of high school graduates in seven states in New England are equal, researchers took samples from each state and recorded whether the respondent was a high school graduate. What type of data-collection system was this?
(a) independent SRSs with individuals classified by one categorical variable
(b) single SRS with individuals classified by two categorical variables

Uses of the Chi-Square Test (answer)
(a) independent SRSs with individuals classified by one categorical variable. A separate sample was taken from each state, and each respondent was classified only by graduation status.

Uses of the Chi-Square Test
When police respond to spousal-abuse calls, they can arrest the offender, issue a citation to the offender, or separate the couple. A group of researchers studied whether one method was better than the others in deterring further instances of spousal abuse. When presented with an eligible case, dispatchers randomly selected the method the officers should use when responding to the call. The following data were collected: [Data table not reproduced in the transcript.] What type of data-collection system was this?
(a) independent SRSs with individuals classified by one categorical variable
(b) single SRS with individuals classified by two categorical variables

Uses of the Chi-Square Test (answer)
(a) independent SRSs with individuals classified by one categorical variable. The randomly assigned treatment groups play the role of the separate samples, and each case is classified by a single categorical outcome.

Chi-Square Distributions
Which statement is NOT correct about the χ² distribution?
(a) The area under the curve is equal to 1, or 100%.
(b) It has a left-skewed shape.
(c) The values are always positive.
(d) It has different curves depending on the degrees of freedom.

Chi-Square Distributions (answer)
(b) is NOT correct. Chi-square distributions are right-skewed.

Chi-Square Distributions
A researcher wanted to see if there was a relationship between education-attainment level and whether someone smokes or does not smoke. She classified education attainment into five categories (did not graduate from high school, high school graduate, some college, college graduate, or graduate degree) and then performed a chi-square test on the data. What were the degrees of freedom?
(a) (5)(2) = 10
(b) (4)(2) = 8
(c) (4)(1) = 4
(d) It cannot be determined without the data.

Chi-Square Distributions (answer)
(c) (4)(1) = 4. With r = 5 education categories and c = 2 smoking categories, the degrees of freedom are (r − 1)(c − 1) = (5 − 1)(2 − 1) = 4.
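The degrees-of-freedom rule (r − 1)(c − 1), and its use in finding a P-value, can be sketched as follows. The function names are hypothetical, and the upper-tail formula is a closed form that holds only when the degrees of freedom are even; it stands in here for the software or tables that would normally be used. The statistic 9.488 is an illustrative value, not a result from the slides.

```python
import math


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a two-way table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Five education categories by two smoking categories.
df = degrees_of_freedom(5, 2)
print(df)

# Illustrative statistic (not from the slides): upper-tail area for df = 4.
print(round(chi2_upper_tail_even_df(9.488, df), 3))
```

The degrees of freedom depend only on the number of rows and columns, not on the counts, which is why they can be found without seeing the data.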