Chi-squared: Association
The other variants of chi-squared (looking for a difference, and goodness of fit) are covered separately.

What does it do?
There are two ways to use this test:
- Looking for association between two factors. Eg: Are snail shell colour and habitat choice associated?
- Looking for a difference in population/employment structures. Eg: Is the population structure the same in two villages?
You do the calculations the same way in both cases!
The structures version is strictly for comparing two sets of data that are "on a level". This method should not be used to compare, say, local and national data; that is covered in Chi-squared goodness of fit.

Planning to use it?
Make sure that:
- You are working with numbers of things (counts), not, eg, area, weight, length, %…
- You have an average of at least 5 things (people/plants/species…) in each category

How does it work?
- For Association, you assume (null hypothesis) there is no association
- For Difference in Structures, you assume (null hypothesis) there is no difference in structures
The test compares:
- observed values: the data you collected
- expected values: what you'd get if there was really no association or no difference in structures

Doing the test
These are the stages in doing the test:
1. Write down your hypotheses
2. Work out the expected values
3. Use the chi-squared formula to get a chi-squared value
4. Work out your degrees of freedom
5. Look at the tables
6. Make a decision
Examples: Association; Difference in structures

Hypotheses
Association:
- H0: There is no association
- H1: There is some association
Difference in structures:
- H0: There is no difference in structures
- H1: There is some difference in structures
This is the standard form for hypotheses for this type of chi-squared test. The hypotheses for the other types are not the same.

Expected Values
Your data here will be in a table. To work out the expected values:
- Add up the totals of all the rows, and the totals of all the columns. Also find the overall total of all the data
- Work out each expected value using
  $E = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$
Eg, to work out the expected value for the cell in the 2nd row, 3rd column, multiply the total of the 2nd row by the total of the 3rd column and divide by the overall total.

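The row total times column total rule can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the function name `expected_values` is made up here, and the counts are the snail data from the worked example later in this presentation.

```python
def expected_values(observed):
    """Expected count for every cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)
    return [[r * c / overall_total for c in col_totals] for r in row_totals]


# Snail counts from the worked example.
# Rows = Pavement, Woodland; columns = Light, Dark
observed = [[115, 76],
            [69, 106]]

expected = expected_values(observed)
for row in expected:
    print([round(e, 3) for e in row])
```

For instance, the expected value for the 2nd row, 1st column (Woodland, Light) is 175 × 184 / 366, which is about 87.978.
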
Chi-Squared Formula
For each cell in your table, work out
  $\frac{(O - E)^2}{E}$
- O = Observed value: your data
- E = Expected value: which you've calculated
Then add all your values up. This gives the chi-squared value:
  $\chi^2 = \sum \frac{(O - E)^2}{E}$
where $\sum$ means "Sum of".

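As a worked instance of the formula for a single cell, take the Pavement/Light cell from the snail example later in this presentation, where O = 115 and E ≈ 96.022 (from 191 × 184 / 366). This only illustrates the arithmetic; the full test adds up one such term for every cell in the table.

```latex
\frac{(O - E)^2}{E}
  = \frac{(115 - 96.022)^2}{96.022}
  = \frac{18.978^2}{96.022}
  \approx \frac{360.16}{96.022}
  \approx 3.751
```
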
Degrees of freedom
The formula here for degrees of freedom is
  degrees of freedom = (rows - 1)(columns - 1)
You do not need to worry about what this means; just make sure you know the formula!
But in case you're interested: the larger your table, the more likely you are to get a "strange" result in one or more cells. The degrees of freedom is a way of allowing for this in the test.
Worth emphasising that this is a different formula to the other types of chi-squared.

Tables
[Slide image: a chi-squared table, with the significance levels (eg 0.05 = 5%) and the degrees of freedom (df) labelled]
Perhaps a good time to check they're OK on reading tables?

Make a decision
- If the value you calculated is bigger than the value from the tables, you reject your null hypothesis
- If the value you calculated is smaller than the value from the tables, you accept (strictly speaking, fail to reject) your null hypothesis
Remember in each case to refer back to the actual example!
In all tests except Mann-Whitney & Wilcoxon, they'll be rejecting if their value is bigger.

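The last three stages (degrees of freedom, tables, decision) could be sketched as below. This is a hedged illustration rather than the presentation's own method: the names `CRITICAL_5_PERCENT`, `degrees_of_freedom` and `decide` are invented for this sketch. The critical values for 1 and 5 degrees of freedom appear in the examples later on; those for 2, 3 and 4 are taken from a standard chi-squared table. The second calculated value (2.5) is made up purely to show the other outcome.

```python
# 5% critical values from a standard chi-squared table, keyed by degrees of freedom.
# Only df 1 to 5 are listed, so larger tables would raise a KeyError in this sketch.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom(rows, columns):
    """(rows - 1)(columns - 1), counting data rows and columns only (no totals)."""
    return (rows - 1) * (columns - 1)


def decide(chi_squared, rows, columns):
    """Compare a calculated chi-squared value with the 5% critical value."""
    df = degrees_of_freedom(rows, columns)
    critical = CRITICAL_5_PERCENT[df]
    if chi_squared > critical:
        return f"{chi_squared} > {critical} (df = {df}): reject H0"
    return f"{chi_squared} <= {critical} (df = {df}): do not reject H0"


print(decide(15.776, rows=2, columns=2))  # value from the snail example
print(decide(2.5, rows=3, columns=2))     # hypothetical value, for illustration only
```
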
Example: Snail shell colour & habitat
Samples were taken from limestone woodland and limestone pavement, and the numbers of light- and dark-shelled snails noted.
Hypotheses:
- H0: Shell colour and habitat preference are not associated
- H1: Shell colour and habitat preference are associated

The data
            Light   Dark
Pavement      115     76
Woodland       69    106

Totals
            Light   Dark   Totals
Pavement      115     76      191
Woodland       69    106      175
Totals        184    182      366
If you have any keen mathematicians, they could also look up Yates' correction (for 2 x 2 tables).

Expected Values
  $E = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$
Expected values:
            Light     Dark
Pavement   96.022   94.978
Woodland   87.978   87.022
As a check, they could see that the row & column totals still add up to the same thing.

The calculations: (O - E)²/E
            Light    Dark
Pavement    3.751   3.792
Woodland    4.094   4.139

The test
χ² = 15.776
Degrees of freedom = (2 - 1)(2 - 1) = 1
Critical value (5%) = 3.841
15.776 is bigger than 3.841, so reject H0: there is some association between snail shell colour and habitat preference.

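The whole snail calculation can be checked with a short script. This is a minimal sketch under the assumptions of the slides (no Yates' correction, 5% significance level); the function name `chi_squared_test` is made up here.

```python
def chi_squared_test(observed):
    """Return (chi-squared value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)

    chi_squared = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / overall_total  # expected value
            chi_squared += (o - e) ** 2 / e                     # this cell's term

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_squared, df


snails = [[115, 76],    # Pavement: Light, Dark
          [69, 106]]    # Woodland: Light, Dark

chi_squared, df = chi_squared_test(snails)
critical_value = 3.841  # 5% critical value for df = 1, as on the slide

print(round(chi_squared, 3), df)
print("Reject H0" if chi_squared > critical_value else "Do not reject H0")
```

The script reproduces the slide's figures: a chi-squared value of 15.776 with 1 degree of freedom, so H0 is rejected.
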
Example: Comparing population structures
Data on the population age structures of two villages were obtained. The aim is to assess whether there is any difference in the age structures.
Hypotheses:
- H0: There is no difference in the villages' population structures
- H1: There is a difference in the villages' population structures

The data
Age     Village A   Village B
0-10           16          25
11-20          12          32
21-30          32          50
31-40          40          68
41-50          60          70
51+            40          25

Totals
Age       A     B   Total
0-10     16    25      41
11-20    12    32      44
21-30    32    50      82
31-40    40    68     108
41-50    60    70     130
51+      40    25      65
Total   200   270     470

Expected Values
  $E = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$
Age     Village A   Village B
0-10       17.447      23.553
11-20      18.723      25.277
21-30      34.894      47.106
31-40      45.957      62.043
41-50      55.319      74.681
51+        27.660      37.340
At this stage, check whether any expected value is less than 5. If so, amalgamate some age categories.

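The "any expected value less than 5?" check can be done mechanically. The following is a rough sketch using the village data from the slides; the variable names are invented for illustration.

```python
# Observed counts from the slides: age group -> (Village A, Village B)
observed = {
    "0-10":  (16, 25),
    "11-20": (12, 32),
    "21-30": (32, 50),
    "31-40": (40, 68),
    "41-50": (60, 70),
    "51+":   (40, 25),
}

col_totals = [sum(col) for col in zip(*observed.values())]  # totals for A and B
overall_total = sum(col_totals)

too_small = []  # age groups with an expected value below 5
for age, counts in observed.items():
    row_total = sum(counts)
    expected = [row_total * c / overall_total for c in col_totals]
    print(age, [round(e, 3) for e in expected])
    if min(expected) < 5:
        too_small.append(age)

print("Age groups to amalgamate:", too_small)
```

With this data the smallest expected value is about 17.447, so the list of age groups to amalgamate comes out empty.
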
The calculations: (O - E)²/E
Age     Village A   Village B
0-10        0.120       0.089
11-20       2.414       1.788
21-30       0.240       0.178
31-40       0.772       0.572
41-50       0.396       0.293
51+         5.505       4.078

The test
χ² = 0.120 + 0.089 + 2.414 + 1.788 + 0.240 + 0.178 + 0.772 + 0.572 + 0.396 + 0.293 + 5.505 + 4.078
χ² = 16.445
Degrees of freedom = (6 - 1)(2 - 1) = 5
Critical value (5%) = 11.070
16.445 is bigger than 11.070, so reject H0: the population structures of the two villages are different.

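To see where the chi-squared value comes from, the per-cell terms can be kept and inspected. The sketch below is illustrative only, with made-up variable names. It carries full precision through the calculation, so its total differs from the slide's 16.445 in the third decimal place, because the slide works from expected values rounded to three decimal places. The decision is the same either way.

```python
ages = ["0-10", "11-20", "21-30", "31-40", "41-50", "51+"]
observed = [[16, 25], [12, 32], [32, 50], [40, 68], [60, 70], [40, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)

# One (O - E)^2 / E term per cell, keyed by (age group, village)
terms = {}
for age, row, r in zip(ages, observed, row_totals):
    for village, o, c in zip("AB", row, col_totals):
        e = r * c / overall_total
        terms[(age, village)] = (o - e) ** 2 / e

chi_squared = sum(terms.values())
df = (len(observed) - 1) * (len(col_totals) - 1)
largest = max(terms, key=terms.get)

print(round(chi_squared, 2), df)
print("Largest contribution:", largest, round(terms[largest], 3))
print("Reject H0" if chi_squared > 11.070 else "Do not reject H0")
```

The largest single term comes from the 51+ age group in Village A, which helps when referring the decision back to the actual example: the villages differ most in their oldest age group.
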