Chi-square (χ²) Fenster
Chi-Square (χ²): Tests of Statistical Significance for Nominal Level Data (Note: can also be used for ordinal level data).
Chi-square is an elegant and beautiful test. The assumptions required to use the test are very weak. That is to say, we do not have to make many assumptions about how the data are distributed.
We ask the following question: are the frequencies empirically obtained (by this we mean OBSERVED) significantly different from those that would have been EXPECTED under some general set of assumptions?
Assumptions to use the Chi-Square Test:
(1) Samples are randomly selected from the population.
(2) EXPECTED frequencies (to be defined later) are greater than 5 in every cell. Even this assumption can be relaxed with the use of Yates' correction.
WE DO NOT NEED TO ASSUME NORMALITY!
This may not surprise you. After all, the concept of a normal distribution has no meaning for nominal level data, and chi-square is a test for nominal level data. Chi-square is so popular because of this weak set of assumptions.
H₀ for chi-square: If there is no relationship between the dependent and independent variables, the column percentages will not change as we move across levels of the INDEPENDENT variable. Note: We covered this earlier in the course. We said there is no relationship between two variables if the column percentages do not change across categories of the independent variable.
We can compute an EXPECTED set of frequencies from the MARGINAL totals of the dependent and independent variables. To calculate the EXPECTED frequency for a cell, we multiply its row total by its column total and divide by the grand total:
$$\text{Expected frequency} = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$
OBSERVED frequencies are those frequencies that are empirically obtained. Those are the frequencies that are given to us.
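One way to see where the expected-frequency formula comes from is the short derivation below. The symbols are introduced here only for illustration and do not appear in the slides: R_i is the total of row i, C_j is the total of column j, and N is the grand total. If the two variables are unrelated, the proportion of all cases falling in a cell should equal its row proportion times its column proportion, so the expected count is N times that product.

```latex
E_{ij} \;=\; N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} \;=\; \frac{R_i \, C_j}{N}
```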
$$\chi^2 = \sum \frac{(\text{Observed frequencies} - \text{Expected frequencies})^2}{\text{Expected frequencies}}$$
Usually this formula is written
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The sum runs over every cell in the table. The larger the difference between observed and expected frequencies, the larger the value of χ².
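The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the small table of counts are hypothetical, chosen so that the arithmetic is easy to check by hand.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of counts (invented for illustration only).
toy_table = [[30, 20],
             [20, 30]]

print(expected_frequencies(toy_table))  # every expected count is 25.0
print(chi_square(toy_table))            # 4 cells * (5^2 / 25) = 4.0
```

A table whose observed counts equal the expected counts in every cell gives a χ² of exactly zero, which is the smallest value the statistic can take.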
If you look at a chi-square table, you will see many different χ² distributions. Which one should you use? You use the χ² distribution with the appropriate number of degrees of freedom. For χ², degrees of freedom are given by the following formula: df = (r-1) X (c-1), where r is the number of rows and c is the number of columns.
That is to say: (1) we take the number of rows we have and subtract one; (2) we take the number of columns we have and subtract one; (3) we then multiply the numbers we get for the first two parts.
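The three steps amount to a one-line calculation. The helper below is a hypothetical sketch (the name `degrees_of_freedom` is not from the slides) that also guards against tables too small for the test.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c table of counts."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 3))  # 4
print(degrees_of_freedom(2, 5))  # 4
```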
Logic of the χ² test: We do not expect observed and expected frequencies to be EXACTLY the same. Observed and expected values can vary simply because of sampling variability. However, if the value of χ² turns out to be larger than that expected by chance, we shall be in a position to reject the null hypothesis.
EXAMPLE: Let us say one was interested in investigating the relationship between gender and opinions on accountability. Our null hypothesis is that gender makes no difference in attitudes towards accountability. Our research hypothesis is that gender makes a difference in attitudes towards accountability.
                                                   Gender
Opinion on Accountability                      Male    Female    Row Totals
Accountability good for educational system                          225
Accountability bad for educational system                           233
Col. Totals                                     197     261         458
It is important to note that the numbers in each cell are actual frequencies rather than percentages. Let us go through our six-step hypothesis testing method in this case.
Step 1: State the null hypothesis. H₀: Gender does not make a difference in predicting attitudes towards educational accountability.
Step 2: State the research hypothesis. H₁: Gender does make a difference in predicting attitudes towards educational accountability.
Step 3: Select a significance level: Let's choose α = .01.
Step 4: Collect and summarize the sample data. Calculation of χ²: compute the EXPECTED FREQUENCIES for EACH CELL.
Computing EXPECTED FREQUENCIES:
Cell a, males who believe that accountability is good for the educational system: (197) (225) / 458 = 96.8
Cell b, females who believe that accountability is good for the educational system: (261) (225) / 458 = 128.2
Cell c, males who believe that accountability is bad for the educational system: (197) (233) / 458 = 100.2
Cell d, females who believe that accountability is bad for the educational system: (261) (233) / 458 = 132.8
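The four expected frequencies can be checked with a short script. This is a sketch that assumes the marginal totals are the ones appearing in the arithmetic above (197 males, 261 females, 225 "good", 233 "bad"); the grand total of 458 is not stated separately but is the sum of either pair of totals. The dictionary keys are labels chosen here for illustration.

```python
# Marginal totals as they appear in the expected-frequency arithmetic.
col_totals = {"male": 197, "female": 261}
row_totals = {"good": 225, "bad": 233}
grand_total = sum(col_totals.values())  # 458, also equal to 225 + 233

# Expected frequency = (row total * column total) / grand total.
expected = {
    (opinion, gender): row_totals[opinion] * col_totals[gender] / grand_total
    for opinion in row_totals
    for gender in col_totals
}

for (opinion, gender), value in expected.items():
    print(f"{gender:6s} / {opinion:4s}: {value:.1f}")
```

The expected frequencies always add back up to the grand total, which is a quick check on the arithmetic.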
Set up a chi-square table

Cell     f observed    f expected    f observed - f expected    (f obs - f exp)²    (f obs - f exp)² / f exp
A
B
C
D
Total
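The table can be filled in programmatically. The sketch below rests on a loud assumption: the observed counts did not survive in this copy of the slides, so the counts used here (126, 99, 71, 162) are reconstructed as the whole numbers consistent with the marginal totals and with the residual of ±29.2 quoted later in the slides. Treat them as illustrative rather than as the author's data.

```python
# ASSUMPTION: observed counts are reconstructed from the marginal totals
# (197, 261, 225, 233) and the quoted residual of +/-29.2; they are not
# legible in the source.
observed = {"A": 126, "B": 99, "C": 71, "D": 162}

# Expected counts: (column total * row total) / grand total.
grand_total = 458
expected = {
    "A": 197 * 225 / grand_total,
    "B": 261 * 225 / grand_total,
    "C": 197 * 233 / grand_total,
    "D": 261 * 233 / grand_total,
}

rows = []
for cell in "ABCD":
    diff = observed[cell] - expected[cell]
    rows.append((cell, observed[cell], expected[cell], diff,
                 diff ** 2, diff ** 2 / expected[cell]))

# The chi-square statistic is the total of the last column.
chi_square_observed = sum(row[5] for row in rows)

print(f"{'Cell':<5}{'f obs':>7}{'f exp':>8}{'diff':>8}{'diff^2':>9}{'diff^2/f exp':>14}")
for cell, f_obs, f_exp, diff, squared, contribution in rows:
    print(f"{cell:<5}{f_obs:>7}{f_exp:>8.1f}{diff:>8.1f}{squared:>9.1f}{contribution:>14.2f}")
print(f"Total chi-square: {chi_square_observed:.2f}")
```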
Step 5: Obtain the sampling distribution. Look at a chi-square table. We will use the chi-square distribution with 1 degree of freedom. Why one? df = (r-1) X (c-1). We have 2 rows and 2 columns, so we get df = (2-1) X (2-1) = 1 X 1 = 1. With our choice of α = .01, we get a χ² critical of 6.635 (found in chi-square table, p. 566).
If we find a χ² greater than or equal to 6.635, we reject the null hypothesis and conclude that gender does make a difference in predicting attitudes towards educational accountability. If we find a χ² less than 6.635, we fail to reject the null hypothesis and cannot conclude that gender makes a difference in predicting attitudes towards educational accountability.
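The decision rule can be written as a small function. This sketch uses 6.635, the standard table value for df = 1 and α = .01, and checks it with only the Python standard library: with one degree of freedom, χ² is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The function names and the two χ² values passed to `decide` are hypothetical.

```python
import math

ALPHA = 0.01
CHI2_CRITICAL = 6.635  # standard table value for df = 1, alpha = .01


def p_value_df1(chi2_value):
    """Upper-tail probability of a chi-square variable with 1 df.

    With 1 df, chi-square is the square of a standard normal variable,
    so P(X >= x) = erfc(sqrt(x / 2)). This shortcut is valid for df = 1 only.
    """
    return math.erfc(math.sqrt(chi2_value / 2))


def decide(chi2_observed, critical=CHI2_CRITICAL):
    """Apply the decision rule: reject H0 when observed >= critical."""
    return "reject H0" if chi2_observed >= critical else "fail to reject H0"


print(round(p_value_df1(CHI2_CRITICAL), 4))  # 0.01
print(decide(7.2))  # hypothetical value: reject H0
print(decide(3.1))  # hypothetical value: fail to reject H0
```

For tables with more than one degree of freedom the erfc shortcut no longer applies, and the critical value has to come from a chi-square table or a statistics package.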
Note: All χ² tests are one-tailed tests. Chi-square can only tell you whether a relationship is statistically significant. Chi-square cannot tell you anything about the DIRECTIONALITY of the relationship. You must inspect the column percentages as you move across categories of the independent variable to determine DIRECTIONALITY.
Another way to determine DIRECTIONALITY is to look at the RESIDUALS (you can instruct SPSS to present the residuals in your output file). RESIDUALS are simply the OBSERVED cell count minus the EXPECTED value. If a RESIDUAL is NEGATIVE, you are getting fewer OBSERVED cases than EXPECTED in that CELL. If a RESIDUAL is POSITIVE, you are getting more OBSERVED cases than EXPECTED in that CELL.
To determine DIRECTIONALITY, look at the SIGN changes of the RESIDUALS as you move across categories of the independent variable. Let us assume that the RESIDUALS start out NEGATIVE and end up POSITIVE. This would imply that the independent variable is related to the dependent variable.
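Both ways of reading direction, column percentages and residual signs, can be sketched together. As a loudly stated assumption, the observed counts below are reconstructed from the marginal totals and the quoted residual of ±29.2, because the original cell counts are not legible in this copy; the variable names are also chosen here for illustration.

```python
# ASSUMPTION: reconstructed counts, not taken directly from the slides.
observed = {
    "good": {"male": 126, "female": 99},
    "bad": {"male": 71, "female": 162},
}
genders = ("male", "female")

col_totals = {g: sum(row[g] for row in observed.values()) for g in genders}
row_totals = {opinion: sum(row.values()) for opinion, row in observed.items()}
grand_total = sum(row_totals.values())

column_pct = {}
residuals = {}
for opinion, row in observed.items():
    for gender in genders:
        expected = row_totals[opinion] * col_totals[gender] / grand_total
        # Residual = observed count minus expected count.
        residuals[(opinion, gender)] = row[gender] - expected
        # Column percentage = share of that gender giving this opinion.
        column_pct[(opinion, gender)] = 100 * row[gender] / col_totals[gender]

for key in residuals:
    opinion, gender = key
    print(f"{opinion:4s} {gender:6s}  column % = {column_pct[key]:5.1f}"
          f"  residual = {residuals[key]:+.1f}")
```

Under these assumed counts, the residual in the "good" row is positive for males and negative for females, and the column percentage for "good" falls as we move from males to females. Both readings point in the same direction.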
Step 6: Make a decision. χ² observed ≈ 30.4 and χ² critical = 6.635. Decision: REJECT H₀, because χ² observed is greater than χ² critical. We easily reject the null hypothesis and conclude that gender does make a difference in predicting attitudes towards educational accountability.
Two points to note in this example. We had one degree of freedom. By one degree of freedom we mean that only one number in the table is actually free to vary. Assume we know the row and column totals. Once we know one number in a 2 X 2 table, we can find the other three.
If I know the row and column totals, there is only one cell that is free to vary.

a        b        a+b
c        d        c+d
a+c      b+d      a+b+c+d
WE ONLY NEED TO KNOW ONE CELL TO KNOW THE ENTIRE TABLE. THIS IS WHY WE HAD ONE DEGREE OF FREEDOM.
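The point can be made concrete with a short sketch that rebuilds a 2 X 2 table from cell a and the marginal totals. The function name is hypothetical, the totals are the ones from the example, and the value used for cell a is purely illustrative.

```python
def fill_2x2(a, row_totals, col_totals):
    """Recover cells b, c, d of a 2 x 2 table from cell a and the marginals."""
    b = row_totals[0] - a  # a + b = first row total
    c = col_totals[0] - a  # a + c = first column total
    d = row_totals[1] - c  # c + d = second row total
    return [[a, b], [c, d]]


# Marginal totals from the example; the value for cell a is illustrative.
table = fill_2x2(a=100, row_totals=(225, 233), col_totals=(197, 261))
print(table)  # [[100, 125], [97, 136]]
```

Changing a by any amount changes b and c by the opposite amount and d by the same amount, so the whole table moves together once a is chosen.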
In our example, f observed - f expected has the same size in every cell (-29.2 or 29.2) because the table had only one degree of freedom. If a table has more than one degree of freedom, f observed - f expected will not generally be the same size in every cell.
How many cells do we need to know in a 3 X 3 table? The formula tells us the answer is (r-1) X (c-1) = (3-1) X (3-1) = 2 X 2 = 4.
Let us see how we get df to equal 4.

a        b        c        a+b+c
d        e        f        d+e+f
g        h        i        g+h+i
a+d+g    b+e+h    c+f+i    a+b+c+d+e+f+g+h+i
Let us say we knew cell a. Could we know all the other cells in the table? Not this time. Let's say we know cells a and b. If we knew cells a and b, then we could find cell c, but we would not know any other cells. Only if we know four cells (a, b, d, and e) would we be able to find the other five cells. This is why we have four degrees of freedom in a 3 X 3 table.
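The same reasoning can be sketched for the 3 X 3 case: given cells a, b, d, and e plus the marginal totals, the remaining five cells follow by subtraction. The function name and all the numbers below are hypothetical, made up only to show the mechanics.

```python
def fill_3x3(a, b, d, e, row_totals, col_totals):
    """Recover the other five cells of a 3 x 3 table from a, b, d, e and the marginals."""
    c = row_totals[0] - a - b  # first row:     a + b + c
    f = row_totals[1] - d - e  # second row:    d + e + f
    g = col_totals[0] - a - d  # first column:  a + d + g
    h = col_totals[1] - b - e  # second column: b + e + h
    i = row_totals[2] - g - h  # third row:     g + h + i
    return [[a, b, c], [d, e, f], [g, h, i]]


# Hypothetical cells and totals (invented for illustration only).
table = fill_3x3(a=10, b=20, d=15, e=25,
                 row_totals=(60, 70, 70), col_totals=(50, 80, 70))
for row in table:
    print(row)
```

With fewer than four known cells, at least one of these subtractions has two unknowns and cannot be carried out.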
One other point about chi-square. Chi-square can tell you whether a relationship is significant. Chi-square can also tell you which cells are most important in determining the significance of the relationship. In our example we find that all cells contribute to the significance of the relationship.
Set up a chi-square table

Cell     f observed    f expected    f observed - f expected    (f obs - f exp)²    (f obs - f exp)² / f exp
A
B
C
D
Total
Three of our cells have individual χ² contributions greater than the critical value needed to establish statistical significance for the entire relationship. Since no cell's contribution to χ² can be negative, we can determine whether part of the table drives the entire relationship to statistical significance.
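Comparing each cell's contribution with the critical value can be sketched as follows. Two assumptions are flagged loudly: the observed counts are reconstructed (from the marginal totals and the quoted residual of ±29.2) because they are not legible in this copy, and 6.635 is the standard table value for df = 1 at α = .01.

```python
CHI2_CRITICAL = 6.635  # standard table value for df = 1, alpha = .01

# ASSUMPTION: reconstructed observed counts, not taken directly from the slides.
observed = {"A": 126, "B": 99, "C": 71, "D": 162}
expected = {
    "A": 197 * 225 / 458,
    "B": 261 * 225 / 458,
    "C": 197 * 233 / 458,
    "D": 261 * 233 / 458,
}

# Each cell's contribution to chi-square: (O - E)^2 / E, never negative.
contributions = {
    cell: (observed[cell] - expected[cell]) ** 2 / expected[cell]
    for cell in observed
}

# Cells whose contribution alone exceeds the critical value.
large_cells = [cell for cell, value in contributions.items() if value > CHI2_CRITICAL]

for cell, value in contributions.items():
    print(f"Cell {cell}: {value:.2f}")
print("Cells exceeding the critical value on their own:", large_cells)
```

Under these assumed counts, cells A, B, and C each exceed the critical value on their own, which agrees with the statement that three of the four cells do so.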
SPSS Command Syntax for Crosstabs
Note: You can get EXPECTED frequencies in SPSS as follows:
(1) Go into ANALYZE, then DESCRIPTIVE STATISTICS, then CROSSTABS, and click on CROSSTABS.
(2) The dependent variable goes into the row box.
(3) The independent variable goes into the column box.
(4) Click on Cells.
(5) Click on EXPECTED. Also click on UNSTANDARDIZED under residuals. Click Continue.
(6) Click on the STATISTICS box.
(7) Click on Chi-Square. You may also want to click on the Contingency Coefficient and lambda.