Inferential Statistics 3: The Chi Square Test Advanced Higher Geography Statistics Ollie Bray – Knox Academy, East Lothian
Introduction (1) We often need to compare two characteristics of something to see whether they are linked or related to each other. One way to do this is to work out what we would expect to find if there was no relationship between them (the usual null hypothesis) and compare it with what we actually observe.
Introduction (2) The test we use to measure the differences between what is observed and what is expected according to an assumed hypothesis is called the chi-square test.
For Example Some null hypotheses may be:
– There is no relationship between the height of the land and the vegetation cover.
– There is no difference in the location of superstores and small grocers' shops.
– There is no connection between the size of farm and the type of farm.
Important The chi square test can only be used on data that have the following characteristics:
– The data must be in the form of frequencies.
– The frequency data must have a precise numerical value and must be organised into categories or groups.
– The total number of observations must be greater than 20.
– The expected frequency in any one cell of the table must be greater than 5.
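The two numerical conditions above can be checked before starting the test. The sketch below is only an illustration: the function name `chi_square_conditions_met` and the example figures are invented here and do not come from the Leicester data.

```python
def chi_square_conditions_met(expected, total_observations):
    """Check the two numerical conditions for using the chi square test.

    expected: table (list of rows) of expected frequencies.
    total_observations: total number of observations in the table.
    """
    # The total number of observations must be greater than 20.
    enough_observations = total_observations > 20
    # Every expected frequency must be greater than 5.
    all_cells_large_enough = all(e > 5 for row in expected for e in row)
    return enough_observations and all_cells_large_enough


# Hypothetical expected frequencies, made up purely for illustration.
example_expected = [[12.0, 8.0], [9.0, 6.0]]
print(chi_square_conditions_met(example_expected, total_observations=35))
```

The check only covers the two conditions that can be counted. Whether the data are frequencies organised into categories still has to be judged by eye.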
Formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\chi^2$ = The value of chi square
$O$ = The observed value
$E$ = The expected value
$\sum \frac{(O - E)^2}{E}$ = the values of $(O - E)^2 / E$ for every cell of the table, added together
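As a rough sketch, the formula can be written as a short Python function. The name `chi_square` and the observed and expected values below are made up for illustration and are not taken from the worked example.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every pair of observed and expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies, invented only to show the arithmetic.
observed = [10, 20, 30]
expected = [20.0, 20.0, 20.0]

# (10-20)^2/20 + (20-20)^2/20 + (30-20)^2/20 = 5 + 0 + 5
print(chi_square(observed, expected))  # 10.0
```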
Write down the NULL HYPOTHESIS and ALTERNATIVE HYPOTHESIS and set the LEVEL OF SIGNIFICANCE.
NH: There is no difference in the distribution of old established industries and food processing industries in the postal district of Leicester.
AH: There is a difference in the distribution of old established industries and food processing industries in the postal district of Leicester.
We will set the level of significance at 0.05.
Construct a table with the information you have observed or obtained.
Observed Frequencies (O)
[Table: columns are post codes LE1, LE2, LE3, LE4, LE5 & LE6 and Row Total; rows are Old Industry, Food Industry and Column Total]
(Note: although there are 3 cells in the table that are not greater than 5, these are observed frequencies. It is only the expected frequencies that have to be greater than 5.)
Work out the expected frequency.
$$\text{Expected frequency} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
[Table: columns are post codes LE1, LE2, LE3, LE4, LE5 & LE6 and Row Total; rows are Old Industry, Food Industry and Column Total; the Old Industry cell for LE1 holds 7.07]
Eg: expected frequency for old industry in LE1 = (50 × 13) / 92 = 7.07
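The expected frequency calculation can be sketched in Python as below. The first call uses the totals quoted in the example (row total 50, column total 13, grand total 92). The helper `expected_table` and the small 2 x 2 table fed to it are assumptions added for illustration and are not the Leicester figures.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected frequency for one cell = row total x column total / grand total."""
    return row_total * column_total / grand_total


def expected_table(observed):
    """Build the full table of expected frequencies from an observed table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [expected_frequency(r, c, grand_total) for c in column_totals]
        for r in row_totals
    ]


# Old industry in LE1, using the totals from the example.
print(round(expected_frequency(50, 13, 92), 2))  # 7.07

# Hypothetical 2 x 2 observed table, made up to show the whole-table version.
print(expected_table([[10, 20], [30, 40]]))
```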
[Table: columns are post codes LE1, LE2, LE3, LE4, LE5 & LE6 and Row Total; rows are Old Industry, Food Industry and Column Total]
For each of the cells calculate $\frac{(O - E)^2}{E}$.
[Table: columns are post codes LE1, LE2, LE3, LE4, LE5 & LE6 and Row Total; rows are Old Industry, Food Industry and Column Total; the Old Industry cell for LE1 holds 0.53]
Eg: Old industry in LE1 is $(9 - 7.07)^2 / 7.07 = 0.53$
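The single-cell calculation can be checked with a couple of lines of Python. The function name `cell_contribution` is made up here. The numbers are the ones from the example: observed 9, expected 7.07.

```python
def cell_contribution(observed, expected):
    """Value of (O - E)^2 / E for a single cell of the table."""
    return (observed - expected) ** 2 / expected


# Old industry in LE1: observed 9, expected 7.07.
print(round(cell_contribution(9, 7.07), 2))  # 0.53
```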
[Table: columns are post codes LE1, LE2, LE3, LE4, LE5 & LE6; rows are Old Industry and Food Industry]
Add up all of the above numbers to obtain the value for chi square: $\chi^2$ =
Look up the significance tables. These will tell you whether to accept the null hypothesis or reject it. The number of degrees of freedom to use is the number of rows in the table minus 1, multiplied by the number of columns minus 1. This is (2 - 1) × (5 - 1) = 1 × 4 = 4 degrees of freedom. We find that our calculated value of chi square is greater than the critical value of 9.49 (for 4 degrees of freedom and a significance level of 0.05) and so we reject the null hypothesis.
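The degrees of freedom and the comparison with the critical value can be sketched as follows. The function names are invented for illustration. The critical value 9.49 is the one quoted above from the significance tables. The chi square value passed in (12.0) is a hypothetical figure and is not the result of the Leicester example.

```python
def degrees_of_freedom(n_rows, n_columns):
    """(number of rows - 1) multiplied by (number of columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)


def reject_null_hypothesis(chi_square_value, critical_value):
    """Reject the null hypothesis when chi square exceeds the critical value."""
    return chi_square_value > critical_value


# Critical value for 4 degrees of freedom at the 0.05 level, as quoted above.
CRITICAL_VALUE_DF4_005 = 9.49

df = degrees_of_freedom(n_rows=2, n_columns=5)
print(df)  # 4

# Hypothetical chi square value, used only to show the comparison.
print(reject_null_hypothesis(12.0, CRITICAL_VALUE_DF4_005))  # True
```

For a table of a different size, the critical value has to be looked up again for the new number of degrees of freedom.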
The distribution of old established industries and food processing industries in Leicester is significantly different. Now you have to look for geographical factors to explain your findings.
Your Turn Read pages 46, 47 and 48 of Geographical Measurements and Techniques: Statistical Awareness, by LT Scotland, June
Answer Task 1 on page 48.