Spearman’s Rank: for relationship data
Using Spearman’s Rank
Measures the strength and direction of the relationship between two variables.
Non-parametric, i.e. no assumptions are made about the data fitting a normal distribution.
You must have more than 5 pairs of data (10+ is better).
[Figure: example scatter graphs showing a positive correlation (r_s = +1), a negative correlation (r_s = -1) and no correlation (r_s = 0). Axes: bedload particle size (cm) and velocity (m/s) against distance downstream (km); discharge (cumecs) against number of passing dog walkers.]
The value of r_s (Spearman’s rank) will be between +1 and -1:
+1 indicates a perfect positive correlation
-1 indicates a perfect negative correlation
0 indicates no correlation at all
The Equation
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Where:
r_s = Spearman’s rank correlation coefficient
Σd² = sum of the squared differences between ranks
n = number of pairs of observations in the sample
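The equation can be tried out directly. This is a minimal Python sketch, assuming Σd² and n have already been worked out by hand; the function name spearman_rs_from_sum_d2 is hypothetical, chosen for illustration.

```python
def spearman_rs_from_sum_d2(sum_d2: float, n: int) -> float:
    """Apply r_s = 1 - (6 * sum of d^2) / (n * (n^2 - 1))."""
    if n < 2:
        # With n = 1 the bottom line n * (n^2 - 1) would be 0.
        raise ValueError("need at least 2 pairs of data")
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Identical rankings: every d is 0, so r_s = +1.
print(spearman_rs_from_sum_d2(0, 5))   # 1.0
# Exactly reversed rankings of 5 items: d = 4, 2, 0, -2, -4, so sum of d^2 = 40.
print(spearman_rs_from_sum_d2(40, 5))  # -1.0
```

The two calls show the ends of the range: matching ranks give +1 and completely reversed ranks give -1.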
Method
1. Establish the null hypothesis, H0 (this is always the negative form, i.e. there is no significant correlation between the variables), and the alternative hypothesis, H1.
H0: There is no significant correlation between variable X and variable Y.
H1: There is a significant correlation between variable X and variable Y.
2. Copy your data into the table below as variable x and variable y, and label the data sets.
Distance from source (km) (variable x) | Rank R1 | PO4 (ppm) (variable y) | Rank R2 | d (R1 - R2) | d²
3. Rank each data set separately in increasing order (i.e. give the lowest data value the lowest rank).
Take each variable in turn: the lowest value gets a rank of 1.
Data values that are the same must have the same rank, and the same is done for every group of equal values.
For example, if two equal values would otherwise take ranks 5 and 6, add those ranks together (5 + 6 = 11) and divide the total equally between the tied values (there are 2, so divide 11 by 2): 11 / 2 = 5.5, so both data values are assigned a rank of 5.5. The next value up then takes rank 7.
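The ranking rule, including shared ranks for equal values, can be sketched in Python as below. This is a minimal illustration; the function name rank_with_ties and the sample numbers are hypothetical.

```python
def rank_with_ties(values):
    """Rank values in increasing order (lowest value = rank 1).

    Equal values share the mean of the ranks they would otherwise take,
    e.g. two values competing for ranks 5 and 6 both get 5.5.
    """
    # Positions of the values, sorted from lowest to highest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last position in this run of equal values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Ranks i+1 .. j+1 are divided equally among the tied values.
        shared_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


print(rank_with_ties([0.5, 1.2, 2.0, 2.4, 3.1, 3.1, 4.8]))
# [1.0, 2.0, 3.0, 4.0, 5.5, 5.5, 7.0]
```

The two values of 3.1 would take ranks 5 and 6, so each gets 5.5, and the next value takes rank 7.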
Record the assigned ranks in the table.
Calculate the difference between each pair of ranks, d = R1 - R2, taking each pair of values in turn, and record the differences in column d. If the ranking has been done correctly, the differences should add up to zero.
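The "add up to zero" check works because both rank columns contain the same total (1 + 2 + ... + n; sharing ranks between tied values does not change the total). A minimal Python sketch of this step, with the hypothetical function name rank_differences and made-up ranks:

```python
def rank_differences(r1, r2):
    """Return d = R1 - R2 for each pair of ranks, checking that the d values sum to zero."""
    if len(r1) != len(r2):
        raise ValueError("each x rank needs a matching y rank")
    d = [a - b for a, b in zip(r1, r2)]
    # Both columns total 1 + 2 + ... + n, so correct ranks give differences summing to 0.
    if abs(sum(d)) > 1e-9:
        raise ValueError("differences do not sum to zero: check the ranking")
    return d


print(rank_differences([1, 2, 3, 4.5, 4.5], [2, 1, 5, 3, 4]))
# [-1, 1, -2, 1.5, 0.5]
```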
Square each difference in column d and record the result in column d².
Calculate Σd²: add up all the values in the d² column.
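Squaring removes the minus signs, so every difference adds to the total. A minimal Python sketch of filling in the d² column and working out Σd²; the function name sum_squared_differences and the d values are hypothetical:

```python
def sum_squared_differences(d):
    """Square each difference in column d, then add up the d^2 column."""
    d_squared = [x ** 2 for x in d]
    return d_squared, sum(d_squared)


d2_column, total = sum_squared_differences([-1, 1, -2, 1.5, 0.5])
print(d2_column)  # [1, 1, 4, 2.25, 0.25]
print(total)      # 8.5
```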
6. Calculate the r_s value
Substitute the numbers calculated for the symbols in the equation:
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Work out each part in turn, e.g.:
1. Work out 6 × Σd²
2. Work out n²
3. Work out n² - 1
4. Work out n × the answer to step 3
5. Work out the answer to step 1 divided by the answer to step 4
6. Work out 1 - the answer to step 5
Worked example with Σd² = 69.5 and n = 6:
$$r_s = 1 - \frac{6 \times 69.5}{6(6^2 - 1)} = 1 - \frac{417}{210} = 1 - 1.986 = -0.986$$
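The six steps can be followed one at a time in Python, printing each intermediate answer so it can be checked against a hand calculation. This is a minimal sketch; the function name rs_step_by_step is hypothetical, and the call uses the worked-example values Σd² = 69.5 and n = 6 (so that n² - 1 = 35).

```python
def rs_step_by_step(sum_d2, n):
    """Work out r_s in the six steps listed above, printing each answer."""
    step1 = 6 * sum_d2      # 1. 6 x sum of d^2
    step2 = n ** 2          # 2. n^2
    step3 = step2 - 1       # 3. n^2 - 1
    step4 = n * step3       # 4. n x answer to step 3
    step5 = step1 / step4   # 5. answer to step 1 / answer to step 4
    step6 = 1 - step5       # 6. 1 - answer to step 5, which is r_s
    for number, value in enumerate([step1, step2, step3, step4, step5, step6], start=1):
        print(f"Step {number}: {round(value, 3)}")
    return step6


rs = rs_step_by_step(69.5, 6)
# Step 1: 417.0
# Step 2: 36
# Step 3: 35
# Step 4: 210
# Step 5: 1.986
# Step 6: -0.986
```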
Is your r_s value positive or negative?
If it is a positive number, you have a positive correlation. If it is a negative number, you have a negative correlation.
(You do not need to worry about + and - for the next bit: compare the size of r_s, ignoring its sign.)
Compare your r_s value against the table of critical values.
If r_s (ignoring its sign) is greater than or equal to the critical value, there is a significant correlation and the null hypothesis can be rejected.
Critical values for Spearman’s Rank Correlation Coefficient
Number of pairs of measurements (n) | Significance level p = 0.05 (95%) (+ or -) | Significance level p = 0.01 (99%) (+ or -)
Check the p = 0.05 (95%) confidence level first. This means we are 95% confident our results were not due to chance.
If we have a significant correlation at 95%, we can go back and check whether we also have a significant correlation at p = 0.01 (99%), so we can be 99% confident our results were not due to chance.
Is our r_s value smaller or larger than the critical value from the table of critical values?
If the r_s value is greater than or equal to the critical value, the null hypothesis can be rejected: there is a significant correlation.
If the r_s value is less than the critical value, the null hypothesis cannot be rejected: there is no significant correlation.
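The decision steps (sign first, then the 95% check, then the 99% check) can be sketched in Python. This is a minimal illustration: the function name interpret_rs is hypothetical, and the critical values are inputs that you must read from the table for your own n. The numbers 0.8 and 0.9 in the example call are placeholders, not real table values.

```python
def interpret_rs(rs, crit_95, crit_99):
    """Describe the direction of r_s and test it against two critical values.

    crit_95 and crit_99 must come from the table of critical values for the
    same n; they are not built in here.
    """
    direction = "positive" if rs > 0 else "negative" if rs < 0 else "no"
    size = abs(rs)  # the sign does not matter for the significance test
    # Check p = 0.05 (95%) first, then go back and check p = 0.01 (99%).
    if size < crit_95:
        verdict = "not significant: H0 cannot be rejected"
    elif size >= crit_99:
        verdict = "significant at p = 0.01 (99%): reject H0"
    else:
        verdict = "significant at p = 0.05 (95%) only: reject H0"
    return f"{direction} correlation, {verdict}"


# Placeholder critical values, for illustration only.
print(interpret_rs(-0.986, crit_95=0.8, crit_99=0.9))
# negative correlation, significant at p = 0.01 (99%): reject H0
```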
Use the following data to calculate r_s independently.
Light (lux) (variable x) | Rank R1 | Hedera helix leaf area (cm²) (variable y) | Rank R2 | d (R1 - R2) | d²
Key questions
Is there a significant correlation?
Which data value(s) would you consider to be anomalous, and why?
Which graph would you use to present this data?
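To check an answer worked out by hand, the whole method (rank, find d, square, sum, substitute) can be chained together. This is a minimal Python sketch using the same shared-rank rule for equal values; the function names average_ranks and spearman_rs are hypothetical, and the numbers in the example are invented, not the light and leaf-area data from the exercise.

```python
def average_ranks(values):
    """Rank from lowest (1) to highest; equal values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = R1 - R2."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Invented example: y rises whenever x rises, so r_s should be +1.
print(spearman_rs([1, 2, 3, 4, 5, 6], [10, 20, 25, 40, 41, 60]))  # 1.0
```

If you compare your answer with statistical software, note that some packages (for example SciPy's scipy.stats.spearmanr) calculate r_s as Pearson's correlation of the ranks. This matches the equation when there are no tied ranks but can differ slightly when there are ties.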