1
Correlation Stats Club Marnie Brennan
2
Have you used correlation before?
3
References
- Petrie and Sabin, Medical Statistics at a Glance: Chapter 26 (Good)
- Petrie and Watson, Statistics for Veterinary and Animal Science: Chapters 10 & 12 (12.7) (Good)
- Kirkwood and Sterne, Essential Medical Statistics: Chapters 10 & 30
- Thrusfield, Veterinary Epidemiology: Chapter 14 (pages )
4
Other reads
- Mathematics Learning Support Centre, Statistics: Correlation (Good)
- Bewick, V, Cheek, L and Ball, J (2003) Statistics review 7: Correlation and regression. Critical Care, Vol. 7
- Swinscow, TDV (1976) Statistics at square one: XVIII – Correlation. British Medical Journal, Vol. 2
- Swinscow, TDV (1976) Statistics at square one: XIX – Correlation (continued). British Medical Journal, Vol. 2
- Swinscow, TDV (1976) Statistics at square one: XIX – Correlation (concluded). British Medical Journal, Vol. 2
6
What is correlation?
- Use it to look at the relationship between two continuous (numerical) variables
- To see if you have a linear relationship between them
- If you interchanged the x and y axes, you would still have the same relationship
- To measure the degree of the relationship
7
How does it differ from other calculations?
- T-tests and ANOVAs
  - These measure the differences between subsets within your data/between groups
- Regression (this will be covered in a later session)
  - This describes the linear relationship between two variables
  - Describes how one variable (independent) predicts the other (dependent)
  - You cannot interchange the variables between the x and y axes
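The point about interchanging the axes can be checked numerically. The sketch below is an illustration only: the data are made up, and the helper names (`pearson_r`, `regression_slope`) are hypothetical, not taken from the slides. It shows that the correlation coefficient is the same whichever variable goes on the x axis, whereas the regression slope changes when the variables are swapped.

```python
import math

def _sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of products and squares about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    """Pearson correlation coefficient: symmetric in x and y."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def regression_slope(x, y):
    """Least-squares slope for predicting y (dependent) from x (independent)."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx

# Made-up illustrative data, not from the presentation.
weight_kg = [4.0, 5.5, 6.1, 7.3, 8.0, 9.2]
heart_rate_bpm = [180, 172, 160, 158, 149, 140]

r_xy = pearson_r(weight_kg, heart_rate_bpm)
r_yx = pearson_r(heart_rate_bpm, weight_kg)
slope_y_on_x = regression_slope(weight_kg, heart_rate_bpm)
slope_x_on_y = regression_slope(heart_rate_bpm, weight_kg)

print(f"r (x, y) = {r_xy:.3f}, r (y, x) = {r_yx:.3f}")
print(f"slope of y on x = {slope_y_on_x:.3f}, slope of x on y = {slope_x_on_y:.3f}")
```

The two slopes differ, but their product equals r squared, which is one way to see how the two calculations are linked.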
8
When would you use correlation?
- To see if there is a relationship between two variables, and how strong it is
- As a precursor to linear regression
  - If variables are highly correlated (or collinear), this will affect how they behave in a linear regression calculation
  - Therefore, you need to know whether variables are correlated or not before you do a linear regression
- Other reasons??
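One way to screen for collinearity before a regression is to calculate the correlation between every pair of candidate explanatory variables and flag the high ones. This is a minimal sketch: the data are made up, the variable names are hypothetical, and the cut-off of 0.8 is an assumption chosen for illustration, not a value given in the slides.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up candidate explanatory variables for a later regression.
predictors = {
    "weight_kg": [10, 12, 15, 18, 20, 25],
    "girth_cm": [40, 45, 50, 58, 61, 72],
    "age_years": [5, 2, 7, 3, 6, 4],
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated"; choose your own

flagged = []
for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
    r = pearson_r(a, b)
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))

print("Pairs to think about before regression:", flagged)
```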
9
When wouldn’t you use correlation?
- To compare two different methods of measurement on the same thing (reliability)
  - E.g. how many neutrophils are in a blood sample? Compare using an IDEXX machine versus counting on a blood smear
- To compare the same method of measurement used multiple times (reproducibility)
  - E.g. how many neutrophils are in a blood sample? Prepare two blood smears from the same sample, and compare the results
- Use Kappa (and associated) analysis instead, covered in a previous session
- With categorical data
10
First step – descriptive stats
Scatter plot: what does the relationship look like?
11
Shape and what does it mean?
- Does it increase or decrease as the values get higher?
- Does the relationship look linear?
- How ‘steep’ is the slope of the shape?
- Are there any outliers?
Taken from Petrie and Sabin
12
Scatter plot examples
13
Demonstration if required….
14
Demonstration if required….
GenStat: Graphics, 2D Scatter Plot (Y variate, X variate), Run
15
Measurement for correlation
- Correlation coefficient: a numerical representation of the degree of association between the variables
- Between -1 and +1
  - Positive: one variable increases as the other increases
  - Negative: one variable decreases as the other increases
- It is dimensionless (spooky….): no units of measurement
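The sketch below illustrates these properties with made-up data (the variable names and the `pearson_r` helper are hypothetical). Changing the units of one variable from kilograms to grams leaves the coefficient unchanged, a falling relationship gives a negative value, and a perfect straight-line rising relationship gives +1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not from the presentation.
weight_kg = [4.0, 5.5, 6.1, 7.3, 8.0, 9.2]
heart_rate_bpm = [180, 172, 160, 158, 149, 140]

# Same weights expressed in grams: only the units change.
weight_g = [w * 1000 for w in weight_kg]

r_kg = pearson_r(weight_kg, heart_rate_bpm)
r_g = pearson_r(weight_g, heart_rate_bpm)

# A perfect straight-line increasing relationship (y = 2x + 1).
r_perfect = pearson_r(weight_kg, [2 * w + 1 for w in weight_kg])

print(f"r using kg = {r_kg:.4f}")
print(f"r using g  = {r_g:.4f}")
print(f"r for a perfect increasing line = {r_perfect:.4f}")
```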
16
What are the assumptions that have to be satisfied?
- There is a linear relationship between the variables
- There are no outliers present
- There are no subgroups within the data that affect the relationship, e.g. sex
- There are no multiple data from the same subjects (each subject contributes one pair of observations)
Taken from Petrie and Sabin
17
Calculation of correlation coefficient
- Pearson’s coefficient, represented as r for a sample (the population value is written ρ, rho)
- Null hypothesis: there is no linear relationship, i.e. the correlation coefficient is zero
- To test this hypothesis, at least one variable has to be normally distributed
Taken from Petrie and Sabin
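The formula itself is not reproduced in this transcript. For reference, the standard textbook definition of Pearson's coefficient, and the usual test statistic for the null hypothesis of zero correlation, can be written as follows. The notation is assumed here: x_i and y_i are the paired observations, n is the number of pairs, and the bars denote sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}}
\quad \text{with } n - 2 \text{ degrees of freedom}
```

The value of t is compared with the t-distribution to obtain the p-value.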
18
P-values and confidence intervals
- P-values are reported as standard; confidence intervals can be calculated but are not normally included
- State your cut-off in advance, e.g. if p < 0.05, reject the null hypothesis and conclude that there is a relationship
- A significant result is not necessarily CAUSAL: it shows only that there is an ASSOCIATION
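As a rough illustration of how a confidence interval for a correlation coefficient can be obtained, the sketch below uses Fisher's z transformation, a common large-sample approximation. This is an assumption about method: the slides do not say how the interval is calculated, and the statistics packages may use a t-test for the p-value rather than the normal approximation shown here. The function name and the example values (r = 0.6, n = 30) are made up.

```python
import math
from statistics import NormalDist

def fisher_ci_and_p(r, n, level=0.95):
    """Approximate CI and two-sided p-value for a correlation coefficient.

    Uses Fisher's z transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    # Two-sided p-value for the null hypothesis that the true correlation is 0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return lower, upper, p_value

# Made-up example: r = 0.6 observed in 30 pairs of measurements.
lower, upper, p_value = fisher_ci_and_p(0.6, 30)
print(f"95% CI for r: {lower:.2f} to {upper:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

An interval that excludes zero corresponds to rejecting the null hypothesis at the matching cut-off; like the p-value, it describes an association and says nothing about cause.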
19
Minitab
20
Minitab – Pearsons
21
SPSS
22
SPSS - Pearsons
23
GenStat: Stat, summary, correlations
24
What if our assumptions cannot be met?
- Use Spearman’s rank correlation coefficient (or Kendall’s coefficient)
  - Rank each variable from lowest to highest, and then calculate the correlation coefficient on the ranks
  - Still between -1 and +1
- Remove outliers
  - Rule of thumb: if a value lies more than 2 standard deviations from the group mean, you can remove it
- Transform data into a linear relationship????
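The ranking step can be sketched as follows: rank each variable (tied values share the average of their ranks), then apply the Pearson calculation to the ranks. The helper names and the data are made up for illustration. The example uses a curved but steadily increasing relationship, where Spearman's coefficient is 1 even though Pearson's is below 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from lowest (1) to highest; tied values get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y rises with x, but along a curve (y = x cubed), not a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson r  = {pearson_r(x, y):.3f}")
print(f"Spearman r = {spearman_r(x, y):.3f}")
print("Ranks with a tie:", average_ranks([10, 20, 20, 30]))
```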
25
Minitab – Spearman’s
26
Minitab – Spearman’s
27
SPSS – Spearman’s
28
SPSS – Spearman’s
29
GenStat: Stat, summary, correlations. Rank??
30
How does this fit with what you do or have seen/experienced?
31
Summary
- Make sure you are using correlation in the correct way
- Eyeball first with a scatter plot: look for strength, slope, relationship, outliers
- If the data do not fit the assumptions, use a non-parametric equivalent, e.g. Spearman’s
32
Next time… Linear regression……