1
Correlation
2
Correlation This Chapter is on Correlation
We will look at patterns in data on a scatter graph.
We will be looking at how to calculate the variance and co-variance of variables.
We will see how to numerically measure the strength of correlation between two variables.
3
Correlation Scatter Graphs
Scatter graphs are a way of representing 2 sets of data. It is then possible to see whether they are related.
Positive correlation: as one variable increases, so does the other.
Negative correlation: as one variable increases, the other decreases.
No correlation: there seems to be no pattern linking the two variables.
[Three example scatter graphs: Positive, Negative, None]
6A
4
Correlation Scatter Graphs
In a study of a city, the population density, in people/hectare, and the distance from the city centre, in km, were investigated by choosing sample areas. The results are as follows:

Area                            A     B     C     D     E     F     G     H     I     J
Distance from centre (km)       0.6   3.8   2.4   3.0   2.0   1.5   1.8   3.4   4.0   0.9
Pop. density (people/hectare)   50    22    14    20    33    47    25    8     16    38

Plot a scatter graph and describe the correlation. Interpret what the correlation means.
[Scatter graph: Pop. Density (people/hectare) against Distance from centre (km)]
The correlation is negative, which means that as we get further from the city centre, the population density decreases.
5
Teachings for Exercise 6B and 6C
6
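Correlation Variability of Bivariate Data
We learnt in chapter 3 that the variance of x is $\frac{\sum (x - \bar{x})^2}{n}$.
In correlation we use $S_{xx} = \sum (x - \bar{x})^2$ ('how x varies').
Similarly for y: $S_{yy} = \sum (y - \bar{y})^2$ ('how y varies').
You can also calculate the co-variance of both variables: $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ ('how x and y vary together').
(Remember that the variance formula was changed to make it easier to use.)
6B/C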
7
Correlation Variability of Bivariate Data
Like in chapter 3, we can use a formula which will make calculations easier, BUT note that the variance is $S_{xx}$ divided by n, so $S_{xx}$ is n times the variance. 6B/C
8
Correlation Variability of Bivariate Data
1. Start from the easier formula for variance from chapter 3.
2. For the second fraction, square the top and bottom separately.
3. Multiply both sides by 'n'. Multiplying both fractions by 'n' will cancel a 'divide by n' from each of them.
6B/C
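The steps on this slide can be written out as a short derivation. This is a sketch that assumes the standard notation, with the variance equal to $S_{xx}/n$ and the "easier" variance formula from chapter 3 taken to be $\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$.

```latex
\begin{align*}
% The easier formula for variance, with variance written as S_xx / n
\frac{S_{xx}}{n} &= \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 \\
% Square the top and bottom of the second fraction separately
\frac{S_{xx}}{n} &= \frac{\sum x^2}{n} - \frac{\left(\sum x\right)^2}{n^2} \\
% Multiply both sides by n, cancelling one "divide by n" from each fraction
S_{xx} &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}
\end{align*}
```

The same argument, applied to y and to the product terms, gives the matching forms for Syy and Sxy.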
9
Correlation Variability of Bivariate Data
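These are the formulae for Sxx, Syy and Sxy:
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
You are given these in the formula booklet. You do not need to know how to derive them (like we just did!) 6B/C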
10
Correlation Variability of Bivariate Data 6B/C
Calculate Sxx, Syy and Sxy, based on the following information. 6B/C
11
Correlation Variability of Bivariate Data 6B/C
The following table shows babies' head circumferences (cm) and the gestation period (weeks) for 6 newborn babies. Calculate Sxx, Syy and Sxy.
We need the values of x², y² and xy:

Baby                   A      B      C      D      E      F
Head size (x)          31     33     30     31     35     30
Gestation period (y)   36     37     38     38     40     40
x²                     961    1089   900    961    1225   900
y²                     1296   1369   1444   1444   1600   1600
xy                     1116   1221   1140   1178   1400   1200

6B/C
12
Correlation Variability of Bivariate Data 6B/C
The following table shows babies' head circumferences (cm) and the gestation period (weeks) for 6 newborn babies. Calculate Sxx, Syy and Sxy. We need the sums Σx, Σy, Σx², Σy² and Σxy. 6B/C
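As a check on the arithmetic, the calculation can be sketched in Python. The function name `sums_of_squares` is made up for this illustration, and the formulas are assumed to be the standard booklet forms (for example, Sxx = Σx² - (Σx)²/n). The data are the six babies from the table.

```python
def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy) using the summary-statistic formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum_x2 - sum_x ** 2 / n       # how x varies
    s_yy = sum_y2 - sum_y ** 2 / n       # how y varies
    s_xy = sum_xy - sum_x * sum_y / n    # how x and y vary together
    return s_xx, s_yy, s_xy


head_size = [31, 33, 30, 31, 35, 30]   # x, in cm
gestation = [36, 37, 38, 38, 40, 40]   # y, in weeks

s_xx, s_yy, s_xy = sums_of_squares(head_size, gestation)
print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 3))  # 19.333 12.833 3.333
```

The intermediate totals here are Σx = 190, Σy = 229, Σx² = 6036, Σy² = 8753 and Σxy = 7255.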
13
Correlation Product Moment Correlation Coefficient
We can test the correlation of data by calculating the Product Moment Correlation Coefficient (PMCC). This uses Sxx, Syy and Sxy: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
The value of this number tells you what the correlation is and how strong it is.
The closer to 1, the stronger the positive correlation. The same applies for -1 and negative correlation.
A value close to 0 implies no linear correlation.
[Number line from -1 to 1: Negative Correlation towards -1, No Linear Correlation at 0, Positive Correlation towards 1]
6B/C
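A minimal Python sketch of the PMCC follows, assuming the standard definition r = Sxy / √(Sxx × Syy). The function name `pmcc` is chosen for this illustration, and the babies' data from the earlier slide are reused purely as an example.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient from raw paired data."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)


head_size = [31, 33, 30, 31, 35, 30]
gestation = [36, 37, 38, 38, 40, 40]

r = pmcc(head_size, gestation)
print(round(r, 3))  # 0.212
```

A value of about 0.21 is positive but much closer to 0 than to 1, so for these six babies the linear correlation is weak.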
15
Product Moment Correlation Coefficient
Given the following data, calculate the Product Moment Correlation Coefficient.
There is positive correlation: as x increases, y does as well. 6B/C
16
Correlation Limitations of the Product Moment Correlation Coefficient
Sometimes it may indicate correlation between unrelated variables.
Example 1: The number of cars on a particular street has increased, as have the sales of DVDs in town. The PMCC would indicate positive correlation where the two are most likely not linked.
Example 2: The speed of computers has increased, as has life expectancy amongst people. These are not directly linked, but are both due to scientific developments.
6B/C
17
Correlation Using Coding with the PMCC
Calculating the PMCC from this table.

x     102      103      102      103      104      103
y     320      335      345      355      360      380
x²    10404    10609    10404    10609    10816    10609
y²    102400   112225   119025   126025   129600   144400
xy    32640    34505    35190    36565    37440    39140

6D
19
Correlation Using Coding with the PMCC
Calculating the PMCC from this table, using coding. Here the coding is p = x - 100 and q = (y - 300) / 5.

x     102   103   102   103   104   103
y     320   335   345   355   360   380
p     2     3     2     3     4     3
q     4     7     9     11    12    16
p²    4     9     4     9     16    9
q²    16    49    81    121   144   256
pq    8     21    18    33    48    48

6D
20
Correlation Using Coding with the PMCC
Calculating the PMCC from this table.

x     102   103   102   103   104   103
y     320   335   345   355   360   380
p     2     3     2     3     4     3
q     4     7     9     11    12    16
p²    4     9     4     9     16    9
q²    16    49    81    121   144   256
pq    8     21    18    33    48    48

So coding will not affect the PMCC! 6D
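The claim can be checked numerically with a short sketch. The coding p = x - 100 and q = (y - 300) / 5 is read off from the table values, and the helper name `pmcc` is chosen for this illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient from raw paired data."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)


x = [102, 103, 102, 103, 104, 103]
y = [320, 335, 345, 355, 360, 380]

# Coded variables: subtract a constant, then divide by a positive constant
p = [xi - 100 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_raw = pmcc(x, y)
r_coded = pmcc(p, q)
print(round(r_raw, 3), round(r_coded, 3))  # 0.563 0.563
```

Subtracting a constant leaves Sxx, Syy and Sxy unchanged, and dividing by a positive constant scales the top and bottom of the PMCC fraction by the same factor, which is why the two results agree. The coded sums are also far smaller, so the working by hand is easier.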
21
Summary
We have looked at plotting scatter graphs.
We have looked at calculating measures of variance, Sxx, Syy and Sxy.
We have also seen types of correlation and how to recognise them on a graph.
We have calculated the Product Moment Correlation Coefficient, and interpreted it. It is a numerical measure of correlation.
22
Spearman’s Rank Correlation
CEV
23
Correlation
So far, we have considered relationships in bivariate data which can be plotted directly, e.g. the results of a set of pupils in two tests, or the price vs age of cars. Sometimes we do not have data which suits this ideal, but we might still want to compare two sets of related data.
24
Example
Consider a manufacturer experimenting with different flavours of a drink. Two tasters put eight flavours, labelled A-H, in order of preference (starting with their favourite). The results are given in the table below:
Taster 1: D C G B A E F H
Taster 2:
25
Example
Taster 1: D C G B A E F H
Taster 2:
Unsurprisingly the two tasters do not agree exactly. However, there is clearly some consensus between them on the more pleasant flavours (C and D), and the least pleasant (e.g. F). It would be useful to measure how well the tasters agree. This is where we use the idea of “ranking” the flavours for each taster, according to where each flavour appears in their list.
26
Example
Taster 1: D C G B A E F H
Taster 2:

Flavour   Rank for Taster 1, x   Rank for Taster 2, y
A
B
C
D
E
F
G
H
27
Example
Taster 1: D C G B A E F H
Taster 2:

Flavour   Rank for Taster 1, x   Rank for Taster 2, y
A         5
B         4
C         2
D         1
E         6
F         7
G         3
H         8
28
Example
Taster 1: D C G B A E F H
Taster 2:

Flavour   Rank for Taster 1, x   Rank for Taster 2, y
A         5                      7
B         4                      3
C         2                      1
D         1
E         6
F         7                      8
G         3
H         8
29
Example If we plotted the scatter diagram for this information, we would see some positive correlation. It would seem reasonable to measure the degree of agreement by calculating the PMCC for the two sets of ranks. However, to emphasise that x and y are ranks rather than continuous variables, the coefficient is given a new symbol, rs. We can calculate rs using the same formula as for the PMCC. However, there is a simpler formula...
30
Example

Flavour   Rank for Taster 1 (x)   Rank for Taster 2 (y)   di    di²
A         5                       7
B         4                       3
C         2                       1
D         1
E         6
F         7                       8
G         3
H         8

Spearman’s Rank Correlation Coefficient is: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i = x_i - y_i$ for the ith item
31
Example

Flavour   Rank for Taster 1 (x)   Rank for Taster 2 (y)   di    di²
A         5                       7                       -2
B         4                       3                       1
C         2                       1
D         1                                               -1
E         6
F         7                       8
G         3
H         8

Spearman’s Rank Correlation Coefficient is: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i = x_i - y_i$ for the ith item
32
Example
Note: Σdi = 0 always!

Flavour   Rank for Taster 1 (x)   Rank for Taster 2 (y)   di    di²
A         5                       7                       -2    4
B         4                       3                       1
C         2                       1
D         1                                               -1
E         6
F         7                       8
G         3
H         8                                                     9

Spearman’s Rank Correlation Coefficient is: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i = x_i - y_i$ for the ith item
33
Example
Note: Σdi = 0 always!

Flavour   Rank for Taster 1 (x)   Rank for Taster 2 (y)   di    di²
A         5                       7                       -2    4
B         4                       3                       1
C         2                       1
D         1                                               -1
E         6
F         7                       8
G         3
H         8                                                     9
34
Interpreting rs
When interpreting Spearman’s Rank Correlation Coefficient, it is important to remember that this value is the PMCC of the ranks.
If the rank orders for the two tasters had been identical, then the points on the scatter diagram would fall exactly on the line y = x, and rs would equal 1.
Similarly, if one rank order was the exact reverse of the other, then all the points would fall on a straight line with gradient -1 (the line y = n + 1 - x for n items) and rs = -1.
If there was little agreement between the two rank orders, rs would be close to 0.
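These three cases can be illustrated with a small sketch of the usual formula for untied ranks, rs = 1 - 6Σd² / (n(n² - 1)). The two rank lists below are hypothetical, made up for this illustration and not the tasters' data. The function name `spearman_from_ranks` is also an assumption.

```python
def spearman_from_ranks(x_ranks, y_ranks):
    """Spearman's rank correlation coefficient for two lists of untied ranks."""
    n = len(x_ranks)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five items
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

identical = spearman_from_ranks(judge_1, judge_1)        # same order
reverse = spearman_from_ranks(judge_1, judge_1[::-1])    # exact reverse
partial = spearman_from_ranks(judge_1, judge_2)          # broad agreement

print(identical, reverse, partial)  # 1.0 -1.0 0.8
```

In the last case the differences are -1, 1, -1, 1 and 0, so Σd = 0 and Σd² = 4, giving rs = 1 - 24/120 = 0.8.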
35
Comparing PMCC and Spearman’s
For some sets of data it would be reasonable to calculate both r and rs, for example the table below.
Student:     A    B    C    D    E    F    G    H    I    J
Pure:        42   21   25   32   34   27   23   40   20   16
Stats:       41 36 29 35 24 22 47 30
Pure Rank:   1    8    6    4    3    5    7    2    9    10
Stats Rank:
r = rs = 0.745
36
Comparing PMCC and Spearman’s
These two values are not the same, because they are measuring different things. r is measuring how close the points on the scatter diagram are to a straight line, which is the strength of the linear relationship between x and y, whereas rs is measuring the tendency for y to increase as x increases, not necessarily in a linear way. NOTE: For negative correlation, rs measures the tendency for y to decrease as x increases.
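The difference can be seen with a sketch on made-up data where y always increases with x, but not in a straight line (here y = x³). The helper names `pmcc` and `ranks` are assumptions for this illustration, and `ranks` assumes there are no tied values.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient from raw paired data."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)


def ranks(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # increasing, but not linear

r = pmcc(x, y)                    # PMCC of the raw values
r_s = pmcc(ranks(x), ranks(y))    # Spearman's: PMCC of the ranks

print(round(r, 3), round(r_s, 3))  # 0.943 1.0
```

Here rs is 1 because the two rank orders agree exactly, while r is below 1 because the points do not lie on a straight line.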