The χ² (chi-squared) test for independence
One way of finding out whether two variables are related is to perform a χ² (chi-squared) test for independence. For example, we might want to find out whether or not there is an association between ‘age-group taught’ and ‘gender’.

To set up the test, we first state a null hypothesis, H₀, and an alternative hypothesis, H₁. H₀ always states that the two variables are independent, and H₁ always states that they are related.

A random sample of 200 teachers in higher education, secondary schools and primary schools gave the numbers of men and women in each sector. The counts are arranged with one row for each gender (Male, Female) and one column for each sector (Higher Education, Secondary Education, Primary Education).

In this case, H₀ could be “The age-group taught is independent of gender”, and H₁ could be “There is an association between age-group taught and gender.”
We put the data into a table. The elements in the table are our observed data, and the table is known as a contingency table.
We then add a TOTAL column and a TOTAL row to the contingency table.
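Adding the totals is mechanical, so it can be sketched in a few lines. This is a minimal illustration, assuming the table is stored as a list of rows (lists of counts); the function name `add_totals` is hypothetical, and because the original cell counts are not given here, any data passed to it should be treated as made-up.

```python
def add_totals(observed):
    """Return a copy of a contingency table with a TOTAL column and a TOTAL row appended.

    Hypothetical helper: `observed` is a list of rows, each a list of counts.
    """
    # Append each row's total as an extra (TOTAL) column.
    with_row_totals = [row + [sum(row)] for row in observed]
    # Sum every column, including the new TOTAL column, to build the TOTAL row.
    total_row = [sum(col) for col in zip(*with_row_totals)]
    return with_row_totals + [total_row]


# Made-up 2 x 2 counts, purely for illustration.
print(add_totals([[10, 20], [30, 40]]))
```

The bottom-right cell of the result is the total sample size.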
From the observed data we can calculate the expected frequencies. Here the row totals are 80 (Male) and 120 (Female), and the total sample size is 200. The expected frequency for each cell is

$$f_e = \frac{\text{row total} \times \text{column total}}{\text{total sample size}}$$
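The expected-frequency formula can be applied to every cell at once. The sketch below assumes the observed table is a list of rows without totals; `expected_frequencies` is a hypothetical name, and the example counts are invented because the teachers' cell counts are not reproduced here.

```python
def expected_frequencies(observed):
    """Expected frequency for each cell: row total x column total / total sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up 2 x 3 counts, purely for illustration.
print(expected_frequencies([[10, 20, 30], [20, 20, 20]]))
```

For the teachers' data, the Male row total is 80 out of 200, so each expected Male frequency is 80/200 = 0.4 of its column total.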
In fact, for this table we only need to work out two of the expected values; the rest follow from the totals. This gives us the number of degrees of freedom (df) for this table: df = 2.
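The claim that the remaining expected values "follow from the totals" works because the expected table has the same row and column totals as the observed table. A minimal sketch for a table with two rows (as in both examples here): given the first (columns − 1) expected values of the first row, every other cell is forced. The helper name `complete_expected` and the example totals are hypothetical.

```python
def complete_expected(first_row_free, row_totals, col_totals):
    """Fill in a 2-row expected-frequency table from its totals.

    Hypothetical helper: first_row_free holds the first (columns - 1) expected
    values of row 1; every other cell is determined by the row and column totals.
    """
    # Last cell of row 1: whatever is left of the first row total.
    first_row = first_row_free + [row_totals[0] - sum(first_row_free)]
    # Row 2: each column total minus the row-1 entry in that column.
    second_row = [c - f for c, f in zip(col_totals, first_row)]
    return [first_row, second_row]


# Made-up totals for a 2 x 3 table: rows (30, 70), columns (20, 30, 50).
# Only two values (6 and 9) are worked out; the other four follow.
print(complete_expected([6, 9], [30, 70], [20, 30, 50]))
```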
You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom. Equivalently,

$$df = (\text{no. of columns} - 1) \times (\text{no. of rows} - 1) = (3 - 1) \times (2 - 1) = 2$$
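Both ways of finding the degrees of freedom can be checked against each other in code. This sketch assumes the table (without totals) is a list of equal-length rows; `degrees_of_freedom` is a hypothetical name, and the cell values are irrelevant, so zeros are used.

```python
def degrees_of_freedom(observed):
    """Degrees of freedom of a contingency table given without its totals."""
    # Cross off one row and one column, then count the cells that are left.
    remaining = [row[:-1] for row in observed[:-1]]
    cells_left = sum(len(row) for row in remaining)
    # This always agrees with (no. of columns - 1) x (no. of rows - 1).
    assert cells_left == (len(observed[0]) - 1) * (len(observed) - 1)
    return cells_left


print(degrees_of_freedom([[0, 0, 0], [0, 0, 0]]))  # 2 x 3 table, like the teachers' data
```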
Using the contingency table of observed data and the table of expected frequencies, we are now ready to calculate the χ² value with the formula

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed value and $f_e$ is the expected value.

Finally, use the table of critical values at the back of your formulae booklet. If the χ²calc value is less than the critical value, we accept H₀, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.

In this case the χ²calc value is 11.3, and the critical value at the 5% significance level (df = 2) is 5.991. Since 11.3 > 5.991, we do not accept H₀, the null hypothesis: there is an association between age-group taught and gender.
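The calculation and the critical-value comparison can be sketched together. This is a minimal illustration, assuming observed and expected tables are lists of rows; the function names are hypothetical, and only the 5% critical values for df = 1 to 4 (as printed in standard χ² tables) are hard-coded, so other df values would need the booklet.

```python
# 5% critical values of the chi-squared distribution (standard table values), df = 1..4 only.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_squared_calc(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


def decide(chi2_calc, df):
    """Compare chi2_calc with the 5% critical value for the given df.

    Assumption: a value exactly equal to the critical value is treated as
    'accept H0'; the notes only cover 'less than' and 'more than'.
    """
    if chi2_calc > CRITICAL_5_PERCENT[df]:
        return "do not accept H0, accept H1"
    return "accept H0"


print(decide(11.3, 2))  # teachers' example
```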
Alternatively, we can compare the p-value with the significance level. If the p-value is less than the significance level, we do not accept H₀, the null hypothesis, so we accept H₁. If the p-value is more than the significance level, we accept H₀, the null hypothesis.
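The p-value is the probability of a χ² value at least as large as χ²calc when H₀ is true. The sketch below computes it with only Python's standard library, using the regularized upper incomplete gamma function and the recurrence Q(s + 1, y) = Q(s, y) + yˢe⁻ʸ/Γ(s + 1); this is one standard method, not necessarily how a GDC does it, and the function names are hypothetical.

```python
import math


def chi2_p_value(chi2_calc, df):
    """Upper-tail probability P(X >= chi2_calc) for a chi-squared variable with df degrees of freedom.

    Equals Q(df/2, chi2_calc/2), built up from Q(1, y) = exp(-y) (even df)
    or Q(1/2, y) = erfc(sqrt(y)) (odd df) by the recurrence above.
    """
    y = chi2_calc / 2
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2:
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1
    return q


def decide_by_p(p, significance=0.05):
    """Assumption: p exactly equal to the significance level is treated as 'accept H0'."""
    if p < significance:
        return "do not accept H0, accept H1"
    return "accept H0"


print(decide_by_p(chi2_p_value(11.3, 2)))  # teachers' example
print(decide_by_p(chi2_p_value(6.13, 3)))  # car-colour example
```

At exactly the 5% critical values (5.991 for df = 2, 7.815 for df = 3) the p-value comes out at about 0.05, which is why the two decision rules agree.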
You can do all of this on the GDC:

1. Enter the data into a matrix: press MATRIX, go to [EDIT] and press ENTER.
2. Enter the size of your matrix; in this case 2 × 3 (2 rows, 3 columns).
3. Enter your data, pressing ENTER after every value.
4. Press STAT, go to [TESTS], scroll up to find the χ²-Test and press ENTER.
5. You will now see where your table of expected values will be stored; change it if you wish. Otherwise, scroll down to Calculate and press ENTER.
6. The results screen gives χ² (the calculated value), p (the probability, i.e. the p-value) and df (the degrees of freedom).
7. To see the table of expected values, press MATRIX, select the matrix in which they were stored and press ENTER.
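The GDC steps above bundle the whole test into one command. As a rough analogue, the hypothetical function below takes a matrix (a list of rows, without totals) and returns the same four items: χ², p, df and the table of expected values. It is a self-contained sketch assuming at least two rows and two columns, not a description of the calculator's internals, and the counts in the example are made up.

```python
import math


def chi2_test(observed):
    """Return (chi2, p, df, expected) for a contingency table given as a list of rows."""
    # Expected frequencies: row total x column total / total sample size.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-squared statistic: sum of (f_o - f_e)^2 / f_e over all cells.
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (no. of columns - 1) x (no. of rows - 1).
    df = (len(col_totals) - 1) * (len(row_totals) - 1)

    # p-value: upper tail of the chi-squared distribution, via the
    # incomplete-gamma recurrence Q(s + 1, y) = Q(s, y) + y**s * e**(-y) / gamma(s + 1).
    y = chi2 / 2
    if df % 2 == 0:
        s, p = 1.0, math.exp(-y)
    else:
        s, p = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2:
        p += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1
    return chi2, p, df, expected


# Hypothetical 2 x 3 matrix (made-up counts, not the teachers' data).
chi2, p, df, expected = chi2_test([[10, 20, 30], [20, 20, 20]])
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, df = {df}")
print(expected)
```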
A χ² (chi-squared) test for independence can also be used to find out whether favourite colour of car and gender are independent or related. We first set up a null hypothesis, H₀, and an alternative hypothesis, H₁. H₀ always states that the variables are independent, and H₁ always states that they are related.

To set up the test, suppose we collect data on the favourite colour of car (Black, White, Red or Blue) of men and women. In this case, H₀ could be “The favourite colour of car is independent of gender”, and H₁ could be “There is an association between favourite colour of car and gender.”
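We put the data into a contingency table with columns Black, White, Red and Blue and rows Male and Female, and add a TOTAL column and a TOTAL row.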
Here the row totals are 130 (Male) and 130 (Female), so the total sample size is 260. From the observed data we can calculate the expected frequencies. The expected frequency for each cell is

$$f_e = \frac{\text{row total} \times \text{column total}}{\text{total sample size}}$$
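Because both row totals are 130 out of 260, the expected-frequency formula simplifies for every cell in this table: each expected frequency is half of its column total.

$$f_e = \frac{130 \times \text{column total}}{260} = \frac{\text{column total}}{2}$$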
In fact, for this table we only need to work out three of the expected values; the rest follow from the totals. This gives us the degrees of freedom for this table: df = 3.
You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom. Equivalently,

$$df = (\text{no. of columns} - 1) \times (\text{no. of rows} - 1) = (4 - 1) \times (2 - 1) = 3$$
Using the contingency table of observed data and the table of expected frequencies, we are now ready to calculate the χ² value with the formula

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed value and $f_e$ is the expected value.

Finally, use the table of critical values at the back of your formulae booklet. If the χ²calc value is less than the critical value, we accept H₀, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.

In this case the χ²calc value is 6.13, and the critical value at the 5% significance level (df = 3) is 7.815. Since 6.13 < 7.815, we accept H₀, the null hypothesis: there is no association between favourite colour of car and gender.