Use Pearson’s correlation
Let’s say you want to test the association between cortisol levels in the blood and hours per week spent studying statistics. Use Pearson’s correlation.
Pearson correlation coefficient
- Used to test for linear associations between two continuous (approximately normally distributed) variables
- Unitless
- Values range from –1 to +1:
  - 0 indicates no linear correlation
  - +1 indicates perfect positive linear correlation
  - –1 indicates perfect negative linear correlation
[Figure: a scale from –1 to +1. Values below 0 indicate a negative association and values above 0 a positive association; the association is weaker near 0 and stronger toward either end. 0 means no association, the value under H0.]
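For reference, the sample coefficient is usually defined as below (the slide itself does not give the formula). The units of x and y appear in both the numerator and the denominator and cancel, which is why r is unitless; the Cauchy–Schwarz inequality is what bounds it between –1 and +1.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```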
Same line, different correlation
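The idea that the same best-fit line can come with very different correlations can be sketched in Python. The data are hypothetical: the residual pattern `e` is chosen so that it sums to zero and is uncorrelated with `x`, so the least-squares line stays exactly y = 2x while the scatter around it grows and r falls.

```python
import numpy as np

# Hypothetical data. The residual pattern e sums to zero and has
# sum(x * e) = 0, so adding any multiple of it leaves the OLS line unchanged.
x = np.arange(1, 7, dtype=float)                 # 1, 2, ..., 6
e = np.array([1.0, -1.0, 0.0, 0.0, -1.0, 1.0])

results = []
for spread in (0.5, 2.0, 5.0):
    y = 2.0 * x + spread * e                     # same underlying line y = 2x
    slope, intercept = np.polyfit(x, y, 1)       # least-squares best-fit line
    r = np.corrcoef(x, y)[0, 1]                  # Pearson's r for this scatter
    results.append((spread, slope, intercept, r))
    print(f"spread={spread}: slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

Every run recovers the same slope and intercept, but r drops as the points move farther from the line.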
How Pearson correlation works
1. Establish alpha (say, 0.05).
2. Start with a null hypothesis. H0: There is no linear association between cortisol levels and time spent on the wards ($\rho_{xy} = 0$).
3. Compute a test statistic, called Pearson’s r.
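Step 3 can be sketched directly from the usual definition of r: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the numbers below are made up for illustration only; they are not the data from the example.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                      # deviations from the mean of x
    dy = y - y.mean()                      # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Hypothetical values: hours per week on the wards vs. cortisol level.
hours = [10, 12, 15, 18, 20, 25, 30]
cortisol = [11.0, 13.5, 12.0, 16.0, 15.5, 19.0, 18.0]

print(pearson_r(hours, cortisol))
print(np.corrcoef(hours, cortisol)[0, 1])  # NumPy's built-in gives the same value
```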
Final steps for Pearson correlation
4. Compare $r_{xy}$ to a known distribution of Pearson correlation coefficients to obtain a p-value.
5. Make a decision about rejecting H0. As usual, if p > α, we do not reject H0; if p < α, we reject H0.
Source: http://www.radford.edu/~jaspelme/statsbook/Chapter%20files/Table_of_Critical_Values_for_r.pdf
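The cited source uses a table of critical values. An equivalent, commonly used route (not the one on the slide) converts r to t = r√(n – 2)/√(1 – r²), which under H0 follows a t distribution with n – 2 degrees of freedom, assuming bivariate normality. A minimal sketch with SciPy, using made-up data; `p_value_from_r` and `decide` are hypothetical helper names.

```python
import numpy as np
from scipy import stats

def p_value_from_r(r, n):
    """Two-sided p-value for H0: rho = 0, via the t transformation with n - 2 df."""
    df = n - 2
    t = r * np.sqrt(df) / np.sqrt(1.0 - r**2)
    return 2.0 * stats.t.sf(abs(t), df)    # sf(t) = 1 - CDF(t), the upper-tail area

def decide(p, alpha=0.05):
    # Step 5: reject H0 only when p < alpha (p == alpha is treated as not rejecting).
    return "reject H0" if p < alpha else "do not reject H0"

# Hypothetical data (made-up values for illustration only).
x = np.array([10, 12, 15, 18, 20, 25, 30], dtype=float)
y = np.array([11.0, 13.5, 12.0, 16.0, 15.5, 19.0, 18.0])

r, p_scipy = stats.pearsonr(x, y)          # SciPy's built-in test, for comparison
p = p_value_from_r(r, len(x))
print(f"r = {r:.3f}, p = {p:.4f} (SciPy: {p_scipy:.4f}) -> {decide(p)}")
```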
Stressed medical students example
1. Establish alpha: α = 0.05.
2. Write your null hypothesis: There is no association between the average number of hours per week spent on the wards and cortisol levels ($\rho_{xy} = 0$).
3. Compute $r_{xy}$, the test statistic: $r_{xy} = 0.736$.
Last steps
4. Compare $r_{xy} = 0.736$ to a known distribution of r (degrees of freedom = n – 2).
5. Make a decision about H0: since p > α, we do not reject H0.
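The slide does not state n, and whether r = 0.736 is significant depends on it. The sketch below, using the same t transformation the tables are built from, computes the two-tailed critical r for a given df and the p-value for r = 0.736 at a few hypothetical sample sizes. With α = 0.05 two-tailed, p > α holds only for n ≤ 7 (df ≤ 5), so the slide's conclusion implies a small sample. The helpers `critical_r` and `p_value_from_r` are illustrative names.

```python
import numpy as np
from scipy import stats

def critical_r(df, alpha=0.05):
    """Two-tailed critical value of r for df = n - 2 degrees of freedom."""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / np.sqrt(df + t_crit**2)   # invert t = r*sqrt(df)/sqrt(1 - r^2)

def p_value_from_r(r, n):
    """Two-sided p-value for H0: rho = 0 with n - 2 degrees of freedom."""
    df = n - 2
    t = r * np.sqrt(df) / np.sqrt(1.0 - r**2)
    return 2.0 * stats.t.sf(abs(t), df)

r_xy = 0.736
for n in (7, 8, 12):                          # hypothetical sample sizes
    df = n - 2
    p = p_value_from_r(r_xy, n)
    print(f"n={n} (df={df}): critical r={critical_r(df):.3f}, p={p:.3f}")
```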
Correlation coefficient interpretations
[Figure: example scatterplots with $r_{xy}$ = 1, –1, ≈ 0.8, ≈ –0.8, ≈ 0.5, ≈ –0.5, ≈ 0, and ≈ –0.2.]
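To get a feel for what these values look like, one can simulate pairs with a chosen population correlation ρ by setting y = ρx + √(1 – ρ²)z, where x and z are independent standard normals. This is a sketch with synthetic data, not the data behind the slide's plots; `simulate_pair` is a hypothetical helper. Plotting each `(x, y)` pair would reproduce panels of the kind shown.

```python
import numpy as np

def simulate_pair(rho, n, rng):
    """Draw n (x, y) pairs from a bivariate normal with population correlation rho."""
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho**2) * z   # corr(x, y) = rho in the population
    return x, y

rng = np.random.default_rng(0)
targets = [1.0, -1.0, 0.8, -0.8, 0.5, -0.5, 0.0, -0.2]   # values labelled on the slide
sample_r = {}
for rho in targets:
    x, y = simulate_pair(rho, 20000, rng)
    sample_r[rho] = np.corrcoef(x, y)[0, 1]
    print(f"target rho = {rho:+.1f}, sample r = {sample_r[rho]:+.3f}")
```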
Caveat #1: Slope of the line
The slope of the best-fit line does not dictate the strength of the association. Only the relative distance of the data points from the best-fit line determines the strength of the association.
[Figure: lines with different slopes; $r_{xy}$ = 1 for all.]
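A quick check of this caveat: points lying exactly on lines with very different positive slopes all give r = 1, and a negative slope gives –1, because the slope's sign sets the direction but not the strength. (A perfectly horizontal line is the exception: y then has zero variance and r is undefined.) The slope values are arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(0, 10, 11)
rs = {}
for slope in (0.1, 1.0, 25.0, -3.0):
    y = slope * x + 4.0                    # every point lies exactly on the line
    rs[slope] = np.corrcoef(x, y)[0, 1]
    print(f"slope={slope:>5}: r = {rs[slope]:+.6f}")
```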
Caveat #2: Must be a linear association
Pearson’s r measures the strength of the linear association between two continuous variables. Some variables may be related to each other, but not linearly. Some associations may be positive or negative without being linear.
[Figure: nonlinear relationships; $r_{xy}$ = 0 for all.]
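A minimal illustration with hypothetical values: y = x² is completely determined by x, yet on x values symmetric around 0 the rising and falling halves cancel and Pearson's r is 0.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)   # symmetric around 0: -5, ..., 5
y = x**2                            # perfectly determined by x, but U-shaped
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")
```

A value of r near 0 therefore rules out a linear association, not every association; plotting the data first is the usual safeguard.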
Caveat #3: Outliers
Outliers often distort the linear association.
[Figure: three scatterplots with $r_{xy}$ = 0.80, 0.88, and 0.54.]
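Depending on where it falls, a single outlier can push r up or down. This sketch, with made-up data, shows the downward case: ten points close to the line y = x give r near 1, and adding one point far below the trend drags r close to 0.

```python
import numpy as np

# Hypothetical data: ten points with small scatter around y = x.
x = np.arange(1, 11, dtype=float)
e = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1])
y = x + e
r_clean = np.corrcoef(x, y)[0, 1]

# Add a single hypothetical outlier far below the trend.
x_out = np.append(x, 11.0)
y_out = np.append(y, -10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_clean:.3f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```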