Correlation – Pearson’s
What does it do?
Measures straight-line (linear) correlation: how close the plotted points are to a straight line. Takes values between −1 and +1: −1 means perfect negative correlation, 0 means no correlation, and +1 means perfect positive correlation.
Planning to use it?
Make sure that you have continuous data (e.g. lengths, weights); the test isn't valid otherwise. You need at least 5 data pairs (more is better). Check that you want Pearson's rather than rank correlation: does the scatter diagram look close to a straight line?
How does it work?
You assume (the null hypothesis) that there is no correlation. The test involves calculating totals from your data and substituting them into a formula, which measures how far your points are from a straight line. The calculation can be done automatically in a spreadsheet and on many graphic calculators.
Doing the test
These are the stages in doing the test:
1. Write down your hypotheses
2. Work out the totals needed for the formula
3. Use the formula to get a value for the correlation
4. Look at the tables
5. Make a decision
Hypotheses
H₀: r = 0 (there is no correlation)
For H₁, you have a choice, depending on what alternative you are looking for:
H₁: r > 0 (positive correlation), or
H₁: r < 0 (negative correlation), or
H₁: r ≠ 0 (some correlation).
If you have a good scientific reason for expecting a particular kind of correlation, use one of the first two. If not, use H₁: r ≠ 0.
Totals
Get your data in table form with columns x, y, x², y² and xy, and complete the extra columns (x², y² and xy) for each data pair. Total each column. This gives you Σx, Σy, Σx², Σy² and Σxy.
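The column totals can be sketched in Python as below. The function name `column_totals` and the data are hypothetical, chosen only for illustration:

```python
def column_totals(xs, ys):
    """Return n and the column totals Σx, Σy, Σx², Σy², Σxy for paired data."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),              # square each x, then add
        "sum_y2": sum(y * y for y in ys),              # square each y, then add
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),  # multiply each pair, then add
    }


# Hypothetical data (illustration only)
totals = column_totals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(totals)
# {'n': 5, 'sum_x': 15, 'sum_y': 20, 'sum_x2': 55, 'sum_y2': 86, 'sum_xy': 66}
```

Note that Σy² (86 here) is not the same as (Σy)² (20² = 400): each y-value has to be squared before adding.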
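Formula
$$ r = \frac{\sum xy - \frac{1}{n}\sum x\sum y}{\sqrt{\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right)\left(\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right)}} $$
where n = number of data pairs, Σx = sum of the x-values, Σy = sum of the y-values, Σx² = sum of the squared x-values, and so on.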
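As a sketch, the formula translates directly into Python. The function name `pearson_r_from_totals` is hypothetical, and the totals in the usage line come from made-up data (x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5), not from the soil-salinity example:

```python
import math


def pearson_r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from the column totals."""
    top = sum_xy - sum_x * sum_y / n    # Σxy − (1/n)ΣxΣy
    x_part = sum_x2 - sum_x ** 2 / n    # Σx² − (1/n)(Σx)²
    y_part = sum_y2 - sum_y ** 2 / n    # Σy² − (1/n)(Σy)²
    if x_part <= 0 or y_part <= 0:
        raise ValueError("r is undefined if all x-values (or all y-values) are equal")
    return top / math.sqrt(x_part * y_part)


r = pearson_r_from_totals(n=5, sum_x=15, sum_y=20, sum_x2=55, sum_y2=86, sum_xy=66)
print(round(r, 3))  # 0.775
```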
Tables
Use a table of critical values for Pearson's correlation coefficient. Each row is for a number of data pairs, n, and each column is for a significance level (e.g. 0.05 = 5%).
Make a decision
If your value is bigger than the tables value (ignoring signs), you can reject the null hypothesis. Otherwise you cannot reject it, so you accept H₀.
Make sure you choose the right tables value; it depends on whether your test is 1-tailed or 2-tailed:
If you are using H₁: r > 0 or H₁: r < 0, you are doing a 1-tailed test (and your r must also have the sign you predicted).
If you are using H₁: r ≠ 0, you are doing a 2-tailed test.
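Printed tables are the usual route, but as a sketch the tables value can also be computed. This assumes the standard model behind such tables (normally distributed data with no correlation), under which r has density proportional to (1 − r²)^((n − 4)/2); the code integrates that numerically with Simpson's rule and finds the critical value by bisection. The function names `upper_tail`, `critical_r` and `reject_h0` are hypothetical, not from any library:

```python
import math


def upper_tail(c, n, steps=2000):
    """P(r >= c) when H0 is true, for 0 <= c <= 1 and n data pairs.

    Assumes normally distributed data. Substituting r = cos(t) turns the
    integral of (1 - r^2)^((n - 4) / 2) into the smooth integral of
    sin(t)^(n - 3) from 0 to acos(c), which Simpson's rule handles well.
    """
    upper = math.acos(c)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.sin(i * h) ** (n - 3)
    integral = total * h / 3
    # Normalising constant B(1/2, (n - 2)/2), computed via log-gamma
    beta = math.exp(math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2))
    return integral / beta


def critical_r(n, alpha=0.05, tails=2):
    """Tables value: the smallest |r| that is significant at level alpha."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    target = alpha / tails                 # probability allowed in each tail
    lo, hi = 0.0, 1.0
    for _ in range(50):                    # bisection: upper_tail falls as c rises
        mid = (lo + hi) / 2
        if upper_tail(mid, n) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(r, n, alpha=0.05, alternative="two-sided"):
    """alternative: 'two-sided' (r ≠ 0), 'greater' (r > 0) or 'less' (r < 0)."""
    if alternative == "two-sided":
        return abs(r) > critical_r(n, alpha, tails=2)
    table_value = critical_r(n, alpha, tails=1)
    if alternative == "greater":
        return r > table_value             # 1-tailed: the sign must match H1
    if alternative == "less":
        return r < -table_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(round(critical_r(6), 3))            # 0.811 (n = 6, 5%, 2-tailed)
print(round(critical_r(6, tails=1), 3))   # 0.729 (n = 6, 5%, 1-tailed)
```

The `reject_h0` helper follows the decision rule on this slide, including the extra sign check needed for a 1-tailed test.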
Soil Salinity & Plant Height
The data below were collected on soil salinity and plant height.
Hypotheses: H₀: r = 0 (no correlation); H₁: r ≠ 0 (some correlation)
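Totals
The data table has columns Soil Salinity (x), Plant Height (y), x², y² and xy. There are n = 6 data pairs, and the column totals are:
Σx = 78, Σy = 265, Σx² = 1438, Σy² = …, Σxy = 2582
NB: You HAVE to work out Σy² by squaring all the y-values and then adding them up. You CAN'T work out the sum of y and then square it.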
Formula
We now put all the totals into the formula:
$$ r = \frac{2582 - \frac{1}{6}(78)(265)}{\sqrt{\left(1438 - \frac{1}{6}(78)^2\right)\left(\sum y^2 - \frac{1}{6}(265)^2\right)}} $$
Pearson's on the Calculator
First check whether the calculator is "scientific", that is, whether it automatically does multiplication before addition. Try a calculation such as 2 + 4 × 3: if you get 14, it does multiplication first; if you get 18, it doesn't.
1. Work out the top of the fraction. For a scientific calculator, enter it exactly as shown ((78)(265) means 78 × 265). For a non-scientific calculator, put in brackets: 2582 − (1/6 × 78 × 265) = −863.
2. Work out each part of the bottom of the fraction. Non-scientific calculator: 1438 − (1/6 × 78²) = 424, and similarly Σy² − (1/6 × 265²).
3. Multiply the two parts from the bottom together.
4. Take the square root of the previous answer and keep it in the calculator's memory.
5. Divide the top of the fraction by the previous answer.
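To see why the brackets matter, the sketch below compares Python's own arithmetic, which (like a scientific calculator) does multiplication and division before subtraction, with a strictly left-to-right evaluation that mimics a non-scientific calculator. The helper `left_to_right` is hypothetical; the numbers are the totals from the soil-salinity example.

```python
import operator

# Maps each calculator key to the operation it performs
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def left_to_right(tokens):
    """Evaluate [number, op, number, op, number, ...] strictly from left to right,
    ignoring precedence, as a non-scientific calculator does."""
    result = tokens[0]
    for op, value in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, value)
    return result


# Top of the fraction: 2582 − (1/6)(78)(265)
with_precedence = 2582 - 1 / 6 * 78 * 265       # multiplication/division done first
keyed_in_order = left_to_right([2582, "-", 1, "/", 6, "*", 78, "*", 265])  # no brackets

print(round(with_precedence, 6))   # -863.0
print(round(keyed_in_order))       # 8891545: wrong, because 2582 - 1 was done first

# First part of the bottom of the fraction: 1438 − (1/6)(78²)
x_part = 1438 - 78 ** 2 / 6
print(x_part)                      # 424.0
```

The second part of the bottom needs Σy², so it is not reproduced here.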
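The test
We have used H₁: r ≠ 0, so it is a 2-tailed test.
Tables value (n = 6, 5% level, 2-tailed): 0.811
Our value: the r from the formula, which is negative because the top of the fraction (−863) is negative.
Ignoring the sign, our value is bigger than the tables value, so we can reject H₀: there is some (negative) correlation.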
Calculating a Best-Fit Line
If Pearson's is significant, it makes sense to calculate a best-fit (regression) line. The line has equation y = a + bx, where a and b are calculated from the same totals:
$$ b = \frac{\sum xy - \frac{1}{n}\sum x\sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n} $$
This lets you predict the height of a plant from the soil salinity by putting values of x into the equation.
Finding the Line
The line has equation y = a + bx. For the soil salinity data:
b = −863 / 424 ≈ −2.035
a = (265 − 78b) / 6 ≈ 70.63 (keeping b unrounded)
So the equation is: y = 70.63 − 2.035x
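A minimal Python sketch of the same calculation, working from the totals in the example. The function name `best_fit_line` is hypothetical; the formulas are the standard least-squares ones for y = a + bx:

```python
def best_fit_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares line y = a + bx from the column totals."""
    s_xy = sum_xy - sum_x * sum_y / n   # top of the fraction in Pearson's formula
    s_xx = sum_x2 - sum_x ** 2 / n      # x-part of the bottom of that fraction
    b = s_xy / s_xx                     # gradient
    a = (sum_y - b * sum_x) / n         # intercept: line passes through (mean x, mean y)
    return a, b


a, b = best_fit_line(n=6, sum_x=78, sum_y=265, sum_x2=1438, sum_xy=2582)
print(round(a, 2), round(b, 3))   # 70.63 -2.035
```

A predicted plant height for a given salinity x is then `a + b * x`.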