Chi-Squared Test of Homogeneity: Are different populations the same across some characteristic?


χ² test for homogeneity: Used with a single categorical variable from two (or more) independent samples. Used to see if the two (or more) populations are the same (homogeneous). Several groups but STILL ONE VARIABLE.

Assumptions & formula remain the same! Samples are random samples. All expected counts are greater than 5.

Hypotheses (written in words)
H0: the proportions for the two (or more) distributions are the same
Ha: at least one of the proportions for the distributions is different
Be sure to write in context!

Expected Counts: Assuming H0 is true, expected count = (row total × column total) / table total.

Degrees of freedom: df = (number of rows - 1)(number of columns - 1). Or cover up one row & one column & count the number of cells remaining!

Should Dentists Advertise? It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights. Why do you think professional organizations sought to prohibit their members from advertising?

Should Dentists Advertise? The paper “Should Dentists Advertise?” (J. of Advertising Research (June 1982): 33-38) compared the attitudes of consumers and dentists toward the advertising of dental services. Separate samples of 101 consumers and 124 dentists were asked to respond to the following statement: “I favor the use of advertising by dentists to attract new patients.”

Should Dentists Advertise? Possible responses were: strongly agree, agree, neutral, disagree, strongly disagree. The authors were interested in determining whether the two groups (dentists and consumers) differed in their attitudes toward advertising.

Should Dentists Advertise? This is done with a chi-squared test of homogeneity; that is, we are testing the claim that different populations have the same category proportions for some characteristic. So how should we state the null and alternative hypotheses for this test?

Should Dentists Advertise?
H0: The true category proportions for all responses are the same for both populations of consumers and dentists.
Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.

Observed Data: How do we determine the expected cell count under the assumption of homogeneity? The expected cell counts are estimated from the sample data (assuming that H0 is true) by using: expected cell count = (row marginal total)(column marginal total) / (grand total).

Expected Values So the calculation for the first cell is …

Observed Data: Students on the right side of the classroom finish the first row, and students on the left side find the expected values for the dentists.

Conditions: So now we can consider the conditions of our analysis.
– We will assume the data were randomly selected.
– The sample was large enough because every cell in the contingency table had an expected frequency of at least 5.

Test Statistic: Now we can calculate the χ² test statistic, the sum over all cells of (observed - expected)² / expected.

Sampling Distribution: The two-way table for this situation has 2 rows and 5 columns, so the appropriate degrees of freedom is (2 - 1)(5 - 1) = 4. Since the likelihood of seeing such a large difference between the observed frequencies and what we would have expected to see if the two populations were homogeneous is so small (approx. 0), there is strong evidence against the assumption that the proportions in the response categories are the same for the populations of consumers and dentists.

Post-graduation activities of graduates from an upstate NY high school

                             1980   1990   2000   Total
College/post HS education     320    245    288     853
Employment                     98     24     17     139
Military                       18     19      5      42
Travel                         17      2      5      24
Total                         453    290    315    1058

Has what kids do after graduation changed across three graduating classes?

Could test whether two proportions are the same using a two-proportion z test... but we have 3 groups. Chi-square goodness-of-fit tests compare against given proportions (theoretical models)... but we want to know if choices have changed. So we’ll use a chi-square test of homogeneity. Homogeneity means that things are the same, so we have a built-in null hypothesis: the distribution does not change from group to group. This test looks for differences larger than what we might expect from random sample-to-sample variation.

Observed counts (expected counts in parentheses):

                             1980          1990          2000          Total
College/post HS education    320 (365.2)   245 (233.8)   288 (253.9)    853
Employment                    98 (59.5)     24 (38.1)     17 (41.4)     139
Military                      18 (17.98)    19 (11.5)      5 (12.5)      42
Travel                        17 (10.3)      2 (6.6)       5 (7.1)       24
Total                        453           290           315           1058
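As a minimal sketch (not part of the original slides), the expected counts in parentheses can be reproduced in plain Python from the observed counts. The column order 1980, 1990, 2000 and the helper name `expected_counts` are assumptions made for illustration.

```python
# Observed counts from the slide. Rows are activities; columns are assumed
# to be the graduating classes in the order 1980, 1990, 2000.
observed = {
    "College/post HS education": [320, 245, 288],
    "Employment": [98, 24, 17],
    "Military": [18, 19, 5],
    "Travel": [17, 2, 5],
}


def expected_counts(table):
    """Expected count per cell under H0: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return {
        label: [row_total * col_total / grand_total for col_total in col_totals]
        for label, row_total in zip(table, row_totals)
    }


expected = expected_counts(observed)
for label, row in expected.items():
    print(f"{label:28s}", [round(value, 2) for value in row])

# Large-sample condition: every expected count should be at least 5.
print("all expected counts >= 5:", all(v >= 5 for row in expected.values() for v in row))
```

Every expected count here comes out above 5, which is the condition the next slide relies on.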

H0: The post-high school choices made by the classes of 1980, 1990, and 2000 have the same distribution.
Ha: The post-high school choices made by the classes of 1980, 1990, and 2000 do not have the same distribution.
Conditions:
* categorical data with counts
* expected values are all at least 5
Degrees of freedom: (R - 1)(C - 1) = 3 * 2 = 6
Test statistic: χ² = Σ (Obs - Exp)² / Exp

χ² = 72.77; P-value = P(χ² > 72.77). The P-value is very small, so I reject the null hypothesis and conclude there is evidence that the choices made by high-school graduates have changed over the three classes examined.
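The calculation can be sketched in plain Python as follows. This is an illustrative sketch, not the slide author's method: the helper names are hypothetical, the column order (1980, 1990, 2000) is assumed, and the closed-form tail probability used for the P-value is valid only when the degrees of freedom are even, which happens to hold for df = 6.

```python
import math

# Observed counts from the slide (rows: college, employment, military, travel;
# columns assumed to be the classes of 1980, 1990, 2000).
observed = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]


def chi_square_statistic(table):
    """Return (statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for obs, col_total in zip(row, col_totals):
            exp = row_total * col_total / grand_total
            statistic += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form holds only when df is even."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


statistic, df = chi_square_statistic(observed)
p_value = upper_tail_even_df(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The statistic should land close to the 72.77 reported on the slide; small differences can arise because the slide works from rounded expected counts.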

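When we reject the null hypothesis, it’s a good idea to examine residuals. To standardize a residual, divide it by the square root of the expected count: (Obs - Exp) / √Exp. (Table of standardized residuals for College/post HS education, Employment, Military, and Travel.) What can this show us?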
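The standardized residuals can be recomputed from the observed counts given earlier. The sketch below is illustrative only; the column order (1980, 1990, 2000) is an assumption, and the variable names are not from the slides.

```python
import math

labels = ["College/post HS education", "Employment", "Military", "Travel"]
# Observed counts; columns assumed to be the classes of 1980, 1990, 2000.
observed = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Standardized residual for each cell: (observed - expected) / sqrt(expected).
residuals = []
for row, row_total in zip(observed, row_totals):
    res_row = []
    for obs, col_total in zip(row, col_totals):
        exp = row_total * col_total / grand_total
        res_row.append((obs - exp) / math.sqrt(exp))
    residuals.append(res_row)

for label, res_row in zip(labels, residuals):
    print(f"{label:28s}", "  ".join(f"{r:+6.2f}" for r in res_row))
```

A positive value means more graduates were observed in that cell than homogeneity would predict, a negative value fewer; the cells with the largest absolute values contribute most to the χ² statistic.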

The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

Assumptions: Have 2 random samples of students. All expected counts are greater than 5.
H0: the proportions of drinking behaviors are the same for female & male students
Ha: at least one of the proportions of drinking behavior is different for female & male students
P-value = .000, df = 3, α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.
Expected Counts: [table with columns M, F and rows L, M, H; values not recovered]

χ² test for Independence: Used with categorical, bivariate data from ONE sample. Used to see if the two categorical variables are associated (dependent) or not associated (independent). One sample but two variables.

Hypotheses (written in words)
H0: the two variables are independent
Ha: the two variables are dependent
Be sure to write in context!

Assumptions & formula remain the same! Expected counts & df are found the same way as in the test for homogeneity. The only change is the hypotheses!

A study from the University of Texas Southwestern Medical Center examined whether the risk of hepatitis C was related to whether people had tattoos and to where they got their tattoos.

                     Hepatitis C   No Hepatitis C   Total
Tattoo, parlor                17               35      52
Tattoo, elsewhere              8               53      61
None                          22              491     513
Total                         47              579     626

These data differ from other kinds because they categorize subjects from a single group on two categorical variables rather than on only one.

Is the chance of having hepatitis C independent of tattoo status? If hepatitis status is independent of tattoos, we expect the proportion of people testing positive for hepatitis to be the same for the three levels of tattoo status. Are the categorical variables tattoo status and hepatitis status statistically independent? This calls for a chi-square test for independence.

H0: Tattoo status and hepatitis status are independent
Ha: Tattoo status and hepatitis status are not independent
Conditions:
* categorical data with counts
* expected values should all be at least 5
Degrees of freedom: (R - 1)(C - 1) = 2 * 1 = 2
Test statistic: χ² = Σ (Obs - Exp)² / Exp

Observed counts (expected counts in parentheses):

                     Hepatitis C    No Hepatitis C    Total
Tattoo, parlor       17 (3.904)      35 (48.096)        52
Tattoo, elsewhere     8 (4.580)      53 (56.420)        61
None                 22 (38.516)    491 (474.484)      513
Total                47             579                626

χ² = 57.91; P-value = P(χ² > 57.91)

The p-value is very small, so I reject the null hypothesis and conclude that hepatitis status is not independent of tattoo status. Because the expected cell frequency condition was violated, I need to check that the two cells with small expected counts did not influence this result too greatly.
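A short sketch of this test in plain Python, which also flags the cells whose expected count falls below 5. The variable names are illustrative rather than taken from the slides, and the tail probability `exp(-x/2)` is exact only for df = 2, which is the case here.

```python
import math

# Observed counts: (Hepatitis C, No Hepatitis C) for each tattoo status.
observed = {
    "Tattoo, parlor": (17, 35),
    "Tattoo, elsewhere": (8, 53),
    "None": (22, 491),
}
columns = ("Hepatitis C", "No Hepatitis C")

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

chi_sq = 0.0
small_cells = []  # cells that violate the "expected count at least 5" condition
for status, row in observed.items():
    row_total = sum(row)
    for name, obs, col_total in zip(columns, row, col_totals):
        exp = row_total * col_total / grand_total
        chi_sq += (obs - exp) ** 2 / exp
        if exp < 5:
            small_cells.append((status, name, round(exp, 3)))

df = (len(observed) - 1) * (len(columns) - 1)
p_value = math.exp(-chi_sq / 2)  # upper-tail probability, valid for df = 2 only

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3g}")
print("cells with expected count below 5:", small_cells)
```

The two flagged cells are the hepatitis C counts for the two tattoo groups, which are the cells the slide says need a closer look.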

Whenever we reject the null hypothesis, it’s a good idea to examine the residuals. Since counts may be different for cells, we are better off standardizing the residuals. To standardize a cell’s residual, divide it by the square root of the cell’s expected value. The + and the - signs indicate whether we observed more cases than we expected, or fewer.

Examining the residuals (table of standardized residuals by tattoo status and hepatitis status):
largest component: Hepatitis C / Tattoo, parlor. This suggests that a principal source of infection may be tattoo parlors.
second largest component: Hepatitis C / no tattoo. Those who have no tattoos are less likely to be infected with hepatitis C than we might expect if the two variables were independent.
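The standardized residuals behind these two observations can be recomputed from the observed counts. This is a sketch for illustration; the variable names are assumptions, not part of the slides.

```python
import math

# Observed counts: (Hepatitis C, No Hepatitis C) for each tattoo status.
observed = {
    "Tattoo, parlor": (17, 35),
    "Tattoo, elsewhere": (8, 53),
    "None": (22, 491),
}
columns = ("Hepatitis C", "No Hepatitis C")

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Standardized residual per cell: (observed - expected) / sqrt(expected).
residuals = {}
for status, row in observed.items():
    row_total = sum(row)
    for name, obs, col_total in zip(columns, row, col_totals):
        exp = row_total * col_total / grand_total
        residuals[(status, name)] = (obs - exp) / math.sqrt(exp)

# Rank cells by the size of their residual, largest first.
ranked = sorted(residuals, key=lambda cell: abs(residuals[cell]), reverse=True)
for cell in ranked:
    print(f"{cell[0]:18s} {cell[1]:15s} {residuals[cell]:+.2f}")
```

The top two cells in the ranking are the ones singled out on the slide: a large positive residual for hepatitis C among people tattooed in parlors, and a negative residual for hepatitis C among people with no tattoo.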

A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

If beef preference is independent of geographic region, how would we expect this table to be filled in?

         North   South   Total
Cut A                      150
Cut B                      275
Cut C                       75
Total      300     200     500

Now suppose that in the actual sample of 500 consumers the observed numbers were as follows: (on your paper) Is there sufficient evidence to suggest that geographic regions and beef preference are not independent? (Is there a difference between the expected and observed counts?)

Assumptions: Have a random sample of people. All expected counts are greater than 5.
H0: geographic region and beef preference are independent
Ha: geographic region and beef preference are dependent
P-value = .0226, df = 2, α = .05
Since p-value < α, I reject H0. There is sufficient evidence to suggest that geographic region and beef preference are dependent.
Expected Counts:
      N     S
A    90    60
B   165   110
C    45    30
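The expected counts under independence depend only on the marginal totals stated in the problem (300 North, 200 South; 150 cut A, 275 cut B, 75 cut C). A minimal sketch, with variable names chosen for illustration:

```python
# Marginal totals from the problem statement.
region_totals = {"North": 300, "South": 200}
cut_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}

n = sum(region_totals.values())
if n != sum(cut_totals.values()):
    raise ValueError("row and column totals must describe the same sample")

# Under independence: expected count = row total * column total / sample size.
expected = {
    cut: {region: cut_total * region_total / n
          for region, region_total in region_totals.items()}
    for cut, cut_total in cut_totals.items()
}

for cut, row in expected.items():
    print(cut, row)
```

The observed counts themselves are not given in the transcript, so the χ² statistic and the P-value of .0226 cannot be recomputed here.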