Published by Scarlett Gibson. Modified over 9 years ago.
1
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used with any pair of variables that can be treated as categorical. The test treats both variables as categorical and asks whether the distribution of cases across the categories of the dependent variable differs among the categories of the independent variable. Two variables are independent if the classification of a case into a particular category of the independent variable has no effect on the probability that the case will fall into any particular category of the dependent variable. When two variables are independent, there is no relationship between them, and we would expect the frequency breakdowns of the dependent variable to be similar for all groups. Suppose we are interested in the relationship between gender and attending college.
2
12/23/2015Slide 2 When the variables are independent, the proportion in each group is close to the proportion for the total sample. When group membership makes a difference, the dependent relationship is indicated by one group having a higher proportion than the total sample and another group having a lower one. If there is no relationship between gender and attending college, and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college. If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.
3
12/23/2015Slide 3 Expected frequencies are computed as if there is no difference between the groups, i.e. both groups have the same proportion as the total sample in each category of the dependent variable. Since the proportion of subjects in each category of the independent variable can differ, we take group category into account in computing expected frequencies as well. To summarize, the expected frequencies for each cell are computed to be proportional to both the breakdown for the dependent variable and the breakdown for the independent variable. The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula that measures how much the pattern of observed frequencies differs from the pattern of expected frequencies. Probabilities for the test statistic can be obtained from the chi-square probability distribution so that we can test hypotheses. The chi-square test of independence tests whether a subject’s category on the independent variable is related to the subject’s category on the dependent variable.
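The computation described on this slide can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the slides: O_ij is the observed count in row i and column j, R_i is the total for row i, C_j is the total for column j, N is the number of valid cases, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

Each expected count is the total N scaled by both the row proportion and the column proportion, which is what "proportional to both breakdowns" means here.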
4
12/23/2015Slide 4 The data from “Observed Frequencies for Sample Data” are the source of the information needed to compute the expected frequencies. Percentages are computed for the column of all students and for the row of all GPAs. For each cell, the corresponding row percentage and column percentage are then multiplied together and by the total number of students in the sample (453) to compute the expected frequency for that cell.
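A minimal Python sketch of this expected-frequency computation follows. The slide's table of 453 students is not reproduced in this transcript, so the counts below are hypothetical: they assume 100 males and 100 females, of whom 60 and 20 attend college, echoing the 60% to 20% example used earlier. The function name is also an assumption.

```python
def expected_frequencies(observed):
    """Return the table of expected counts under independence.

    observed: list of rows, each a list of counts. Rows are categories of
    the dependent variable, columns are categories of the independent one.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # (row proportion) x (column proportion) x N simplifies to
    # row_total * col_total / N.
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts, not taken from the slides.
observed = [
    [60, 20],  # attends college: males, females
    [40, 80],  # does not attend: males, females
]
print(expected_frequencies(observed))  # [[40.0, 40.0], [60.0, 60.0]]
```

With 40% of the whole sample attending college, each group of 100 is expected to contribute 40 attendees, which matches the reasoning on slide 2.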
5
12/23/2015Slide 5 The chi-square test of independence can be used for variables at any level of measurement, including interval level variables grouped in a frequency distribution. It is most useful for nominal variables, for which we do not have another option. Sample size requirement: no cell has an expected frequency less than 5. If the sample size requirement is violated, the chi-square distribution will give us misleading probabilities. The research hypothesis states that the two variables are dependent or related in the population represented by the sample data. The sample supports this hypothesis when the observed counts for the categories of the variables differ from the expected counts by more than chance alone would explain. The null hypothesis is that the two variables are independent or unrelated in the population represented by the sample data. The sample is consistent with this hypothesis when the observed counts are close to the expected counts. The decision rule for the chi-square test of independence is the same as our other statistical tests: reject the null hypothesis if the probability of the test statistic is less than or equal to alpha.
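The sample size check and the decision rule can be sketched in Python using only the standard library. This is an illustrative sketch, not the SPSS procedure: the function names, the dictionary keys, and the counts are assumptions, and the p-value comes from the closed-form series for a chi-square distribution with integer degrees of freedom.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum over k < m of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1: erfc(sqrt(x/2)) plus
    # sqrt(2x/pi) * exp(-x/2) * sum over k = 1..m of x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def chi_square_test(observed, alpha=0.05):
    """Pearson chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    p = chi_square_sf(chi2, df)
    # Sample size requirement: count cells with expected frequency below 5.
    cells_below_5 = sum(e < 5 for row in expected for e in row)
    return {
        "chi2": chi2,
        "df": df,
        "p": p,
        "cells_below_5": cells_below_5,
        "sample_size_ok": cells_below_5 == 0,
        "reject_null": p <= alpha,  # decision rule: reject if p <= alpha
    }


# Hypothetical counts, not taken from the slides.
result = chi_square_test([[60, 20], [40, 80]])
print(result["df"], result["sample_size_ok"], result["reject_null"])
```

The sketch reports the sample size check separately from the decision so that a caller can refuse to interpret the p-value when any expected frequency falls below 5.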
6
12/23/2015Slide 6 If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data supports the research hypothesis. We conclude that there is a relationship between the variables. If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We conclude that the data do not show a relationship between the variables, i.e. we treat them as independent. One of the problems in interpreting chi-square tests is the determination of which cell or cells produced the statistically significant difference. Examination of percentages in the contingency table and expected frequency table can be misleading. Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages.
7
12/23/2015Slide 7 Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2% on the married row (50.9% - 42.7%) and 9.0% on the widowed row (13.1% - 4.1%). However, the residual, or the difference between the observed frequency and the expected frequency, is a more reliable indicator, especially if the residual is converted to a z-score and compared to a critical value equivalent to the alpha for the problem. SPSS prints out the standardized residual (converted to a z-score) computed for each cell. It does not produce the probability or significance.
8
12/23/2015Slide 8 We will use the rule of thumb that we interpret standardized residuals that are greater than 2.0 (or less than -2.0 if the residual is negative). Returning to the relationship between sex and marital status, the table of standardized residuals would look like the following: Using standardized residuals, we would find that only the cells on the widowed row are the important contributors to the chi-square relationship between sex and marital status. If we interpreted the contribution of the married marital status, our finding would be open to question. Basing the interpretation on column percentages can be misleading.
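As a sketch of how standardized residuals are obtained, the Python below computes (observed - expected) / sqrt(expected) for each cell and flags cells at or beyond the cutoff of 2.0 in either direction. The sex by marital status table is not reproduced in this transcript, so the counts are hypothetical, and the function names are assumptions.

```python
import math


def standardized_residuals(observed):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for row, r_total in zip(observed, row_totals):
        out_row = []
        for o, c_total in zip(row, col_totals):
            e = r_total * c_total / n  # expected count under independence
            out_row.append((o - e) / math.sqrt(e))
        residuals.append(out_row)
    return residuals


def major_contributors(residuals, cutoff=2.0):
    """List (row, column, residual) for cells with |residual| >= cutoff."""
    return [
        (i, j, z)
        for i, row in enumerate(residuals)
        for j, z in enumerate(row)
        if abs(z) >= cutoff
    ]


# Hypothetical 3 x 2 table, not taken from the slides.
observed = [
    [50, 50],
    [30, 30],
    [4, 24],
]
for i, j, z in major_contributors(standardized_residuals(observed)):
    direction = "over-represented" if z > 0 else "under-represented"
    print(f"row {i}, column {j}: {z:.2f} ({direction})")
```

In this made-up table only the two cells in the last row are flagged, one as under-represented and one as over-represented, while the first two rows stay well inside the cutoff.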
9
12/23/2015Slide 9 There can be 0, 1, 2, or more cells with standardized residuals to be interpreted. Standardized residuals that have a positive value mean that the cell was over-represented in the actual sample, compared to the expected frequency, i.e. there were more subjects in this category than we expected. Standardized residuals that have a negative value mean that the cell was under-represented in the actual sample, compared to the expected frequency, i.e. there were fewer subjects in this category than we expected. Finally, we note that, similar to our other hypothesis tests, the interpretation of the results of the statistical test is applied to the population from which our sample is drawn. The chi-square test of homogeneity is similar to the chi-square test of independence, but is used to test for similarities or differences over time. We will not have this kind of problem for homework because I have been unable to find suitable data.
10
12/23/2015Slide 10 This is the prototype for a chi-square test of independence, with the answers to the problem filled in.
11
12/23/2015Slide 11 The first paragraph introduces the problem and points to the data in Table 1.
12
12/23/2015Slide 12 The second paragraph asks about the number of valid cases and the sample size requirement for the chi-square test. To answer these questions, we will compute the chi-square test of independence in SPSS.
13
12/23/2015Slide 13 To compute the chi-square test of independence, select the Descriptive Statistics > Crosstabs command from the Analyze menu.
14
12/23/2015Slide 14 First, move the dependent variable degree to the Row list box. Second, move the independent variable sex to the Column(s) list box. Third, click on the Statistics button to request the chi-square test.
15
12/23/2015Slide 15 First, mark the check box for the Chi-square statistic. Second, click on the Continue button to close the dialog box.
16
12/23/2015Slide 16 Click on the Cells button to request percents and standardized residuals.
17
12/23/2015Slide 17 First, mark the check box for Column percentages, which we will use in the table for the problem. Second, mark the check box for Standardized residuals, which we will use to evaluate individual relationships. Third, click on the Continue button to close the dialog box.
18
12/23/2015Slide 18 Click on the OK button to produce the output.
19
12/23/2015Slide 19 This is the SPSS output for a chi-square test of independence. The case processing summary tells us the number of cases that had valid data for both variables. Each cell of the contingency table contains the count of cases with the combined characteristics, the percent of cases within each category of the independent variable, and the standardized residual for that cell.
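To make the contents of the contingency table concrete, the sketch below builds a small cross-tabulation in Python: it drops cases with missing data, counts each combination, and computes the percentage within each column. It is an illustration rather than a reproduction of SPSS. The variable names degree and sex come from the slides, but the cases, category labels, and function name are hypothetical.

```python
from collections import Counter


def crosstab(cases):
    """Cross-tabulate (row_value, column_value) pairs.

    Pairs containing None are treated as missing and dropped. Returns the
    number of valid cases and a nested dict:
    table[row][column] = (count, column percentage).
    """
    valid = [(r, c) for r, c in cases if r is not None and c is not None]
    counts = Counter(valid)
    rows = sorted({r for r, _ in valid})
    cols = sorted({c for _, c in valid})
    col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    table = {
        r: {
            c: (counts[(r, c)], 100.0 * counts[(r, c)] / col_totals[c])
            for c in cols
        }
        for r in rows
    }
    return len(valid), table


# Hypothetical cases as (degree, sex) pairs; None marks missing data.
cases = [
    ("graduate", "male"), ("graduate", "male"),
    ("bachelor", "male"), ("bachelor", "male"),
    ("graduate", "female"),
    ("bachelor", "female"), ("bachelor", "female"), ("bachelor", "female"),
    (None, "male"), ("bachelor", None),
]
n_valid, table = crosstab(cases)
print(n_valid)                      # 8
print(table["graduate"]["male"])    # (2, 50.0)
print(table["graduate"]["female"])  # (1, 25.0)
```

The count of valid cases plays the role of the case processing summary, and the percentages are computed within each category of the independent variable, as in the column percentages requested in the Cells dialog.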
20
12/23/2015Slide 20 SPSS labels the chi-square test that we interpret as the Pearson Chi-Square test. The information that we need for evaluating sample size is contained in a footnote to the table of Chi-Square Tests.
21
12/23/2015Slide 21 Before answering the sample size questions, we complete Table 1 by extracting the counts and percents from the Crosstabulation table.
22
12/23/2015Slide 22 Returning to the first question about the number of valid cases, we enter the number from the Case Processing Summary.
23
12/23/2015Slide 23 The critical piece of information about sample size is found in the footnote to the Chi-Square Tests table. 0 cells had an expected frequency less than 5.
24
12/23/2015Slide 24
25
12/23/2015Slide 25 The first sentence in the next paragraph addresses the statistical significance of the chi-square test of the relationship between the two variables.
26
12/23/2015Slide 26 The APA style for reporting a chi-square test is: χ²(degrees of freedom, N = number of cases) = statistical value, p = p value. Enter the number of valid cases in the narrative. Transfer the degrees of freedom to the first blank after χ².
27
12/23/2015Slide 27 Transfer the value for the chi-square statistic to the narrative. Transfer the p-value to the last blank. Select the “=” symbol from the drop-down menu.
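The reporting template can be sketched as a small Python formatting function. The function name is an assumption, and two conventions go beyond what the slides state: the leading zero of the p-value is dropped, and very small p-values are reported as "p < .001" instead of with "=". Of the example values, only the p-value of 0.019 appears in the slides; the statistic, degrees of freedom, and N are hypothetical.

```python
def apa_chi_square(chi2, df, n, p):
    """Format a chi-square result in the pattern described on the slides."""
    if p < 0.001:
        # Assumed convention: report very small p-values with "<".
        p_text = "p < .001"
    else:
        # Assumed convention: drop the leading zero of the p-value.
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {p_text}"


# Hypothetical statistic, df, and N; only p = 0.019 comes from the slides.
print(apa_chi_square(7.90, 2, 250, 0.019))
# χ²(2, N = 250) = 7.90, p = .019
```

Choosing between "=" and "<" inside the function mirrors the drop-down choice in the problem narrative.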
28
12/23/2015Slide 28 Since the p of 0.019 was less than alpha of 0.05, the relationship between the variables was statistically significant.
29
12/23/2015Slide 29 The final question asks about a specific relationship: how one category of the independent variable (males) compares to the other category or categories of the independent variable (females) on one category of the dependent variable (graduate degrees). The validity of this statement is based on two requirements: 1) the standardized residual in the target cell (males with graduate degrees) must be a major contributor to the chi-square relationship, i.e. it must be less than or equal to -2.0 or greater than or equal to +2.0. If the standardized residual is between -2.0 and +2.0, the answer is that they were equally likely (no statistical difference). 2) the direction of the difference (more likely or less likely) must match the percentages in Table 1.
30
12/23/2015Slide 30 Survey respondents who were male and who had a graduate degree were over-represented in the table. The standardized residual for the cell for survey respondents who were male and who had a graduate degree was 2.3, which was greater than the criterion of 2.0 commonly used to identify the major contributors to the differences between actual and expected frequencies in the contingency table. Since the sign of the standardized residual was a plus, characterizing the contribution as an over-representation (more than expected) is correct.
31
12/23/2015Slide 31 The percentages are copied from Table 1 into the problem narrative. Males were more likely to have graduate degrees (14.7%) compared to females (3.8%).
32
12/23/2015Slide 32 The answer “were more likely” satisfies both requirements for asserting a finding. The standardized residual for the males/graduate degree cell was larger than 2.0 and the percentage of males was larger than the percentage of females.
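The two requirements can be sketched as a small Python decision function. The residual of 2.3 and the percentages of 14.7% and 3.8% are the values given in the slides. The function name and the "inconsistent" fallback label are assumptions added for the case where the sign of the residual and the percentages disagree, which the slides do not discuss.

```python
def describe_cell(std_residual, target_pct, comparison_pct, cutoff=2.0):
    """Choose the wording for one target cell of the contingency table."""
    # Requirement 1: the cell must be a major contributor.
    if abs(std_residual) < cutoff:
        return "equally likely"
    # Requirement 2: the direction must match the percentages.
    if std_residual > 0 and target_pct > comparison_pct:
        return "more likely"
    if std_residual < 0 and target_pct < comparison_pct:
        return "less likely"
    return "inconsistent"  # hypothetical label, not used in the slides


# Values from the slides: males with graduate degrees versus females.
print(describe_cell(2.3, 14.7, 3.8))  # more likely
```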
33
12/23/2015Slide 33 The green shading of the answers when we submit the problem indicates that our answers are correct.
34
12/23/2015Slide 34 Example of using na: violating the sample size requirement. If we violate the sample size requirement that every cell have an expected frequency of 5 or more, all of the questions in the following paragraph are na.
35
12/23/2015Slide 35 Example of using na: statistical test not significant. If the p-value is greater than alpha, we do not reject the null hypothesis and all of the remaining answers are na.
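The two na rules can be sketched as a gate that is checked before anything is interpreted. The function name and dictionary keys are hypothetical labels for the parts of the problem narrative, not terms from the slides.

```python
def answer_status(cells_below_5, p_value, alpha=0.05):
    """Decide which parts of the problem can be answered and which are na."""
    if cells_below_5 > 0:
        # Sample size requirement violated: nothing further is interpreted.
        return {"significance": "na", "specific_relationship": "na"}
    if p_value > alpha:
        # Not significant: the remaining answers are na.
        return {"significance": "not significant", "specific_relationship": "na"}
    return {"significance": "significant", "specific_relationship": "interpret"}


# From the slides: 0 cells below 5 and p = 0.019 with alpha = 0.05.
print(answer_status(0, 0.019))
```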