Chi-square test or χ² test

Chi-square test
Used to test the counts of categorical data.
Three types:
- Goodness of fit
- Independence
- Homogeneity

χ² distribution (figure: density curves for df = 3, df = 5 and df = 10)

χ² distribution
- Different df have different curves
- Skewed right
- Only positive values
- As df increases, the curve shifts toward the right and becomes more like a normal curve
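
The shape claims above can be checked numerically. The sketch below relies on standard properties of the chi-square distribution that the slides do not state (mean = df, variance = 2·df, skewness = sqrt(8/df)). The function name `chi2_shape` is only illustrative.

```python
import math


def chi2_shape(df):
    """Return (mean, variance, skewness) of a chi-square distribution.

    Uses the standard closed forms: mean = df, variance = 2*df,
    skewness = sqrt(8/df). These are textbook facts, not taken from the slides.
    """
    mean = df
    variance = 2 * df
    skewness = math.sqrt(8 / df)
    return mean, variance, skewness


# The three df values shown in the figure slide.
for df in (3, 5, 10):
    mean, variance, skewness = chi2_shape(df)
    print(f"df = {df:2d}: mean = {mean}, variance = {variance}, skewness = {skewness:.3f}")
```

The mean grows with df (the curve shifts right), while the skewness stays positive but shrinks toward 0, which is the sense in which the curve becomes more like a normal curve.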

χ² Goodness of fit test
Want to see how well the observed counts “fit” what we expect the counts to be.

χ² Goodness of fit test
Explain the parameters.
State the hypotheses.
Null hypothesis: H0: p1 = hypothesized proportion for category 1, p2 = hypothesized proportion for category 2, …
i.e., the actual population distribution is equal to the expected distribution.
Alternative hypothesis: Ha: H0 is not true.
i.e., the actual population distribution is different from the expected distribution.
Conditions:
1) Observed cell counts are based on a random sample.
2) The sample size is large. The sample size is large enough for the chi-square test to be appropriate as long as every expected count is at least 5.
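
χ² Goodness of fit test
Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all categories
Degrees of freedom = number of categories - 1
Write the decision and conclusion.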
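
The test statistic and degrees of freedom above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `goodness_of_fit_statistic`, its arguments, and the die-rolling data are all hypothetical. Raising an error when an expected count falls below 5 is one possible way to enforce the large-sample condition.

```python
def goodness_of_fit_statistic(observed, hypothesized):
    """Return (chi_sq, df, expected) for a chi-square goodness of fit test.

    observed     -- observed counts, one per category
    hypothesized -- H0 proportions, one per category, summing to 1
    """
    if len(observed) != len(hypothesized):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(hypothesized) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")

    n = sum(observed)
    # Expected count = sample size * hypothesized proportion.
    expected = [n * p for p in hypothesized]
    if min(expected) < 5:
        raise ValueError("large-sample condition fails: an expected count is below 5")

    # Sum of (observed - expected)^2 / expected over all categories.
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, expected


# Hypothetical data (not from the slides): 60 rolls of a die,
# with H0 saying each face has probability 1/6.
rolls = [8, 12, 9, 11, 10, 10]
chi_sq, df, expected = goodness_of_fit_statistic(rolls, [1 / 6] * 6)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```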

χ² Goodness of fit test: Example
Last year, at the 6 pm time slot, television channels 2, 11, 13 and 26 captured the entire audience with 30%, 25%, 20% and 25%, respectively. During the first week of the new season, 500 viewers are interviewed, with the results below. Has the preference changed from last season?

Channel    2      11     13     26
Viewers    129    148    112    111

Parameters:
p1 = true proportion of channel 2 viewers
p2 = true proportion of channel 11 viewers
p3 = true proportion of channel 13 viewers
p4 = true proportion of channel 26 viewers

χ² Goodness of fit test: Example
Hypotheses:
H0: p1 = 0.30, p2 = 0.25, p3 = 0.20, p4 = 0.25
Ha: At least one of the proportions is not as expected.

χ² Goodness of fit test: Example
Conditions:
1) The sample should be random, which I will assume.
2) The sample size should be large.

Channel     2      11     13     26
Expected    150    125    100    125
Observed    129    148    112    111

Since all expected counts are greater than 5, the sample is large enough.
χ² ≈ 10.18, df = 3, p-value ≈ 0.017, α = .05
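
The arithmetic for this example can be reproduced with the Python standard library. The sketch below is an illustration only: the variable names are made up, and the p-value line uses a closed-form upper-tail expression that holds for df = 3 specifically, not for other degrees of freedom.

```python
import math

observed = [129, 148, 112, 111]           # channels 2, 11, 13, 26
hypothesized = [0.30, 0.25, 0.20, 0.25]   # last season's shares (H0)
alpha = 0.05

n = sum(observed)                          # 500 viewers
expected = [n * p for p in hypothesized]   # expected count = n * p under H0

# Sum of (observed - expected)^2 / expected over the four channels.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail probability for df = 3 only:
# P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Channel 11 contributes the most to the statistic ((148 - 125)² / 125 ≈ 4.23), followed by channel 2 ((129 - 150)² / 150 = 2.94).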

χ² Goodness of fit test: Example
Decision: Since the p-value < α, I reject the null hypothesis at the .05 level.
Conclusion: There is evidence to conclude that the viewing preference for the 6 pm news has changed.

χ² test for independence
Used to see whether two categorical variables are associated or not associated (independent).

χ² test for independence
State the hypotheses.
Null hypothesis: H0: The two variables are independent (or not associated).
Alternative hypothesis: Ha: The two variables are not independent (or associated).
Conditions:
1) A random sample is taken from one large population.
2) The sample size is large: all expected cell counts are at least 5.
3) Each outcome can be classified into one of several categories on one variable and into one of several categories on a second variable.
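
χ² test for independence
Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells
Expected cell count = $\frac{(\text{row total})(\text{column total})}{\text{table total}}$
df = (# of rows - 1)(# of columns - 1)
Write the decision and conclusion.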

χ² test for independence: Example
A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 are from the South, and their preferences were as in the table. Is beef preference independent of geographic region?

Beef preference    North    South
Cut A              100      50
Cut B              150      125
Cut C              50       25

Hypotheses:
H0: Beef preference is independent of geographic region.
Ha: Beef preference is not independent of geographic region.

χ² test for independence: Example
Remember: expected count = (row total)(column total) / (table total)

Observed counts, with expected counts in parentheses:
Beef preference    North        South
Cut A              100 (90)     50 (60)
Cut B              150 (165)    125 (110)
Cut C              50 (45)      25 (30)

Conditions:
1) The sample is random, which is stated in the problem.
2) The sample size should be large. All expected cell counts are at least 5, as shown in the table.
3) Each outcome can be classified by region and cut.

χ² test for independence: Example
Enter the observed counts in Matrix A.
χ² ≈ 7.58, df = 2, p-value ≈ 0.023, α = .05
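
For readers without a calculator matrix, the same computation can be sketched in plain Python. This is an illustration under two assumptions: the North count for Cut C is taken to be 50 (inferred from the stated North total of 300), and the p-value line uses the closed form exp(-x/2), which is valid for df = 2 only. Variable names are made up.

```python
import math

# Rows: Cut A, Cut B, Cut C; columns: North, South.
# The Cut C / North count (50) is inferred from the North total of 300.
observed = [
    [100, 50],
    [150, 125],
    [50, 25],
]
alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected cell count = (row total)(column total) / (table total).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over every cell.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability for df = 2 only: P(X >= x) = exp(-x/2).
p_value = math.exp(-chi_sq / 2)

print(f"expected counts = {expected}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```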

Decision: Since the p-value < α, I reject the null hypothesis at the .05 level.
Conclusion: There is evidence to conclude that beef preference is not independent of geographic region.

χ² test for homogeneity
Used to see whether two populations are the same (homogeneous).
Are the proportions of the different outcomes in one population equal to those in another population?

χ² test for homogeneity
State the hypotheses.
Null hypothesis: H0: The true category proportions are the same for all the populations.
Alternative hypothesis: Ha: The true category proportions are not the same for all the populations.
Conditions:
1) Independent random samples of fixed sizes are taken from two or more large populations, OR two or more treatments are randomly assigned to two or more types of available subjects.
2) Each outcome falls into exactly one of several categories, with the categories being the same in all populations.
3) The sample size is large: all expected cell counts are at least 5.
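
χ² test for homogeneity
Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells
Expected cell count = $\frac{(\text{row total})(\text{column total})}{\text{table total}}$
df = (# of rows - 1)(# of columns - 1)
Write the decision and conclusion.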

In July 1991 and again in April 2001, the Gallup Poll asked random samples of 1015 adults about their opinions on working parents. The table summarizes responses to the question, “Considering the needs of both parents and children, which of the following do you see as the ideal family in today’s society?” Based on these results, do you think there was a change in people’s attitudes during the 10 years between these polls? Use α = 0.02.

Response                                 1991    2001
Both work full time                      142     131
One works full time, other part time     274     244
One works, other works at home           152     173
One works, other stays home for kids     396     416
No opinion                               51      51

Hypotheses:
H0: The proportion of adults who believe which type of family is “ideal” was not different in 1991 and 2001.
Ha: The proportion of adults who believe which type of family is “ideal” was different in 1991 and 2001.

Observed counts, with expected counts in parentheses:
Response                                 1991           2001
Both work full time                      142 (136.5)    131 (136.5)
One works full time, other part time     274 (259)      244 (259)
One works, other works at home           152 (162.5)    173 (162.5)
One works, other stays home for kids     396 (406)      416 (406)
No opinion                               51 (51)        51 (51)

Conditions:
1) The samples should be random, which is stated, and independent, which I will assume.
2) Each opinion falls into one type of “ideal family” category for both 1991 and 2001.
3) The sample size should be large. All expected cell counts are at least 5, as shown in the table.

χ² ≈ 4.03, df = 4, p-value ≈ 0.40, α = .02
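
The Gallup comparison can be checked the same way. The sketch below also prints each category's contribution to the statistic, which shows where the two years differ most. It rests on two assumptions: the 2001 "No opinion" count is taken to be 51 (inferred from the sample size of 1015), and the p-value line uses a closed form that holds for df = 4 only. Variable names are illustrative.

```python
import math

categories = [
    "Both work full time",
    "One works full time, other part time",
    "One works, other works at home",
    "One works, other stays home for kids",
    "No opinion",
]
counts_1991 = [142, 274, 152, 396, 51]
counts_2001 = [131, 244, 173, 416, 51]   # last entry inferred from n = 1015
alpha = 0.02

n_1991, n_2001 = sum(counts_1991), sum(counts_2001)
n_total = n_1991 + n_2001

chi_sq = 0.0
for name, o91, o01 in zip(categories, counts_1991, counts_2001):
    row_total = o91 + o01
    # Expected cell count = (row total)(column total) / (table total).
    e91 = row_total * n_1991 / n_total
    e01 = row_total * n_2001 / n_total
    contribution = (o91 - e91) ** 2 / e91 + (o01 - e01) ** 2 / e01
    chi_sq += contribution
    print(f"{name:38s} expected = {e91:6.1f}  contribution = {contribution:.3f}")

df = (len(categories) - 1) * (2 - 1)     # (rows - 1)(columns - 1)

# Upper-tail probability for df = 4 only: P(X >= x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because both samples have the same size, each expected count is simply the average of the two observed counts in its row.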

Decision: Since the p-value > α, I fail to reject the null hypothesis at the .02 level.
Conclusion: There is not sufficient evidence to conclude that the proportion of adults who believed in what type of family is “ideal” was different in 1991 and 2001.