Published by Coral Cummings. Modified over 8 years ago.
1
The χ² (chi-squared) test for independence
2
We might want to find out whether or not there is an association between 'age-group taught' and 'gender'. One way of finding out is to perform a χ² (chi-squared) test for independence.

To set up the test, we first state a null hypothesis, H₀, and an alternative hypothesis, H₁. H₀ always states that the data sets are independent, and H₁ always states that they are related.

A random sample of 200 teachers in higher education, secondary schools and primary schools gave the following numbers of men and women in each sector:

          Higher Education   Secondary Education   Primary Education
Male            21                   39                    20
Female          13                   55                    52

In this case, H₀ could be "The age-group taught is independent of gender". H₁ could be "There is an association between age-group taught and gender."
3
We put the data into a table. The elements in the table are our observed data, and the table is known as a contingency table.

          Higher Education   Secondary Education   Primary Education
Male            21                   39                    20
Female          13                   55                    52
4
We then add the row and column totals to the contingency table.

          Higher Education   Secondary Education   Primary Education   TOTAL
Male            21                   39                    20             80
Female          13                   55                    52            120
TOTAL           34                   94                    72            200
5
Observed data:

          Higher Education   Secondary Education   Primary Education   TOTAL
Male            21                   39                    20             80
Female          13                   55                    52            120
TOTAL           34                   94                    72            200

From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be:

expected frequency = (row total × column total) / total sample size

For example, the expected frequency for Male / Higher Education is 80 × 34 / 200 = 13.6, and for Male / Secondary Education it is 80 × 94 / 200 = 37.6.

Expected frequencies (first two cells):

          Higher Education   Secondary Education   Primary Education   TOTAL
Male           13.6                 37.6                                  80
Female                                                                   120
TOTAL           34                   94                    72            200
6
The expected frequency for each cell will be:

expected frequency = (row total × column total) / total sample size

          Higher Education   Secondary Education   Primary Education   TOTAL
Male           13.6                 37.6                  28.8            80
Female         20.4                 56.4                  43.2           120
TOTAL           34                   94                    72            200

In fact, for this table we only need to work out two of the expected values, and the rest will follow from the totals. This gives us the degrees of freedom for this table: it is 2.
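The expected-frequency rule above can be sketched in a few lines of Python. This is only an illustration using the teacher data from the slides; the function name `expected_frequencies` is a hypothetical helper, not part of the original presentation.

```python
# Observed counts from the teacher example
# (rows: Male, Female; columns: Higher, Secondary, Primary Education).
observed = [
    [21, 39, 20],
    [13, 55, 52],
]


def expected_frequencies(table):
    """Return expected counts: row total x column total / total sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 1) for value in row])
# [13.6, 37.6, 28.8]
# [20.4, 56.4, 43.2]
```

The printed rows match the table of expected frequencies for men and women respectively.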
7
          Higher Education   Secondary Education   Primary Education   TOTAL
Male           13.6                 37.6                  28.8            80
Female         20.4                 56.4                  43.2           120
TOTAL           34                   94                    72            200

In fact, for this table we only need to work out two of the expected values, and the rest will follow from the totals. This tells us that the degrees of freedom for this table is 2.

You can always find the degrees of freedom by going back to the original table (without the totals). Cross off one column and one row, and the number of cells left is the degrees of freedom:

df = (No. of columns - 1) × (No. of rows - 1)

          Higher Education   Secondary Education   Primary Education
Male            21                   39                    20
Female          13                   55                    52

df = (3 - 1) × (2 - 1) = 2
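As a small illustration, the degrees-of-freedom rule can be written as a one-line calculation. The helper name `degrees_of_freedom` is an assumption made for this sketch, and the table must be passed without its totals row and column.

```python
def degrees_of_freedom(table):
    """(No. of rows - 1) x (No. of columns - 1) for a table without totals."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Teacher data: 2 rows (Male, Female) and 3 columns (sectors).
teachers = [[21, 39, 20], [13, 55, 52]]
print(degrees_of_freedom(teachers))  # 2
```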
8
Contingency table (observed data):

          Higher Education   Secondary Education   Primary Education
Male            21                   39                    20
Female          13                   55                    52

Expected frequencies:

          Higher Education   Secondary Education   Primary Education
Male           13.6                 37.6                  28.8
Female         20.4                 56.4                  43.2

Now we are ready to calculate the χ² value using the formula:

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed value and f_e is the expected value.

Finally, use the table of critical values at the back of your formulae booklet. If the χ² calc value is less than the critical value, we accept H₀, the null hypothesis. If the χ² calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.

In this case the χ² calc value is 11.3, and the critical value at 5% (df = 2) is 5.991. Since 11.3 is more than 5.991, we do not accept H₀, the null hypothesis. There is an association between age-group taught and gender.
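The calculation of χ² calc for the teacher data can be checked with a short Python sketch. The observed and expected values and the 5% critical value are taken from the slides; the function name `chi_squared` is hypothetical.

```python
# Observed and expected frequencies for the teacher example.
observed = [[21, 39, 20], [13, 55, 52]]
expected = [[13.6, 37.6, 28.8], [20.4, 56.4, 43.2]]


def chi_squared(obs, exp):
    """Sum (f_o - f_e)^2 / f_e over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(obs, exp)
        for fo, fe in zip(obs_row, exp_row)
    )


chi2_calc = chi_squared(observed, expected)
critical_value = 5.991  # 5% level, df = 2, as quoted in the text

print(round(chi2_calc, 1))  # 11.3
reject_h0 = chi2_calc > critical_value
print("do not accept H0" if reject_h0 else "accept H0")
```

Because the calculated value exceeds the critical value, the sketch reaches the same decision as the slide.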
9
Using the critical value: if the χ² calc value is less than the critical value, we accept H₀, the null hypothesis. If the χ² calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.

Using the p-value: if the p-value is less than the significance level, we do not accept H₀, the null hypothesis, so we accept H₁. If the p-value is more than the significance level, we accept H₀, the null hypothesis.
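For readers who want to see where a p-value comes from, here is a minimal sketch using only the Python standard library. It relies on the standard finite-series form of the χ² upper-tail probability for whole-number degrees of freedom; the function name `chi2_p_value` is hypothetical, and in practice a calculator or statistics library would normally be used instead.

```python
import math


def chi2_p_value(chi2_calc, df):
    """Upper-tail probability P(X > chi2_calc) for a chi-squared
    distribution with a whole number of degrees of freedom."""
    half = chi2_calc / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


significance_level = 0.05
p = chi2_p_value(11.3, 2)  # teacher example: chi2 calc = 11.3, df = 2
print(p)
print("do not accept H0" if p < significance_level else "accept H0")
```

For the teacher example the p-value is well below 0.05, which agrees with the decision reached from the critical value.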
10
You can do all this on the GDC:

Enter the data into a matrix: press MATRIX, choose [EDIT] and press ENTER.
Enter the size of your matrix; in this case 2 x 3 (2 rows, 3 columns).
Enter your data, pressing ENTER after every value.
Press STAT, choose [TESTS], scroll up to find the χ² test and press ENTER.
You will now see where your table of expected values will be stored; change it if you wish. Otherwise scroll down to Calculate and press ENTER.

In the results, χ² is given to you, p is the probability (the p-value) and df is the degrees of freedom.

To see the table of expected values: press MATRIX, select the matrix that holds the expected values and press ENTER.

Finally, use the table of critical values at the back of your formulae booklet. If the χ² calc value is less than the critical value, we accept the null hypothesis. If the χ² calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.
11
We may want to find out whether favourite colour of car and gender are independent or related. One way of finding out is to perform a χ² (chi-squared) test for independence.

To set up the test, we first state a null hypothesis, H₀, and an alternative hypothesis, H₁. H₀ always states that the data sets are independent, and H₁ always states that they are related.

Suppose we collect data on the favourite colour of car for men and women.

          Black   White   Red    Blue
Male        51      22     33     24
Female      45      36     22     27

In this case, H₀ could be "The favourite colour of car is independent of gender". H₁ could be "There is an association between favourite colour of car and gender."
12
          Black   White   Red    Blue   TOTAL
Male        51      22     33     24     130
Female      45      36     22     27     130
TOTAL       96      58     55     51     260
13
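Observed data:

          Black   White   Red    Blue   TOTAL
Male        51      22     33     24     130
Female      45      36     22     27     130
TOTAL       96      58     55     51     260

From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be:

expected frequency = (row total × column total) / total sample size

For example, the expected frequency for Male / Black is 130 × 96 / 260 = 48, for Male / White it is 130 × 58 / 260 = 29, and for Male / Red it is 130 × 55 / 260 = 27.5.

Expected frequencies (first three cells):

          Black   White   Red    Blue   TOTAL
Male        48      29    27.5           130
Female                                   130
TOTAL       96      58     55     51     260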
14
The expected frequency for each cell will be:

expected frequency = (row total × column total) / total sample size

          Black   White   Red    Blue   TOTAL
Male        48      29    27.5   25.5    130
Female                                   130
TOTAL       96      58     55     51     260

In fact, for this table we only need to work out three of the expected values, and the rest will follow from the totals; for example, the last value in the Male row is 130 - (48 + 29 + 27.5) = 25.5. This gives us the degrees of freedom for this table: it is 3.
15
          Black   White   Red    Blue   TOTAL
Male        48      29    27.5   25.5    130
Female      48      29    27.5   25.5    130
TOTAL       96      58     55     51     260

In fact, for this table we only need to work out three of the expected values, and the rest will follow from the totals. This tells us that the degrees of freedom for this table is 3.

You can always find the degrees of freedom by going back to the original table (without the totals). Cross off one column and one row, and the number of cells left is the degrees of freedom:

df = (No. of columns - 1) × (No. of rows - 1)

          Black   White   Red    Blue
Male        51      22     33     24
Female      45      36     22     27

df = (4 - 1) × (2 - 1) = 3
16
Contingency table (observed data):

          Black   White   Red    Blue
Male        51      22     33     24
Female      45      36     22     27

Expected frequencies:

          Black   White   Red    Blue
Male        48      29    27.5   25.5
Female      48      29    27.5   25.5

Now we are ready to calculate the χ² value using the formula:

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed value and f_e is the expected value.

Finally, use the table of critical values at the back of your formulae booklet. If the χ² calc value is less than the critical value, we accept H₀, the null hypothesis. If the χ² calc value is more than the critical value, we do not accept the null hypothesis, so we accept H₁.

In this case the χ² calc value is 6.13, and the critical value at 5% (df = 3) is 7.815. Since 6.13 is less than 7.815, we accept H₀, the null hypothesis. There is no association between favourite colour of car and gender.
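The whole procedure for the car-colour data can be put together in one short Python sketch. The counts and the two 5% critical values are the ones quoted in the slides; the names `chi_squared_test` and `CRITICAL_5_PERCENT` are assumptions made for this illustration.

```python
# Observed counts from the car-colour example
# (rows: Male, Female; columns: Black, White, Red, Blue).
observed = [
    [51, 22, 33, 24],
    [45, 36, 22, 27],
]

# 5% critical values quoted in the text, keyed by degrees of freedom.
CRITICAL_5_PERCENT = {2: 5.991, 3: 7.815}


def chi_squared_test(table):
    """Return (chi2_calc, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency: row total x column total / total sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


chi2_calc, df, expected = chi_squared_test(observed)
reject_h0 = chi2_calc > CRITICAL_5_PERCENT[df]

print(expected)
print(round(chi2_calc, 2), df)  # 6.13 3
print("do not accept H0" if reject_h0 else "accept H0")
```

Here the calculated value is below the critical value, so the sketch also ends with H₀ being accepted.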