Relationship between two continuous variables: when both variables are continuous, we use correlation or linear regression. Correlation means that larger values of one variable correspond to larger (or smaller) values of the other variable. The coefficient r measures the strength of the relationship: it ranges from +1 to −1; zero means no relationship, ±1 means the points lie on a straight line. The p-value measures statistical significance, i.e. the significance of r differing from zero. Parametric (Pearson) correlation assumes a normal distribution of both variables.
We start calculating Pearson's r from the covariance: cov(x, y) = Σ(x_i − x̄)(y_i − ȳ) / (n − 1), whose value depends on the units of measurement, which is not convenient, so let's rescale it by the two standard deviations: r = cov(x, y) / (s_x · s_y), and the result is always between −1 and +1.
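The two steps above (covariance, then rescaling) can be sketched in pure Python; the data here are made up for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: the covariance rescaled by the two standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # sample covariance: mean product of deviations, with n - 1 in the denominator
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    sx = sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 6]
print(round(pearson_r(x, y), 3))  # → 0.853
```

Note how the covariance alone would change if, say, x were measured in grams instead of kilograms, while r would not.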
[Scatterplot panels illustrating correlations of r = 0.96, 0.53, 0.43, −1, −0.83, −0.96]
Non-parametric correlation relies on ranks, so single observations lying far from the rest do not disturb the result. The usual choice is Spearman's (rank) correlation. Its power is lower, but there are also real differences in what is measured: what should we think about a non-linear relationship? It is also suitable for ordinal variables. A philosophical aspect: we can describe the same thing in different mathematical terms!
We report the result: "between … there was a correlation (r = …, N = …, p = …)", or if non-parametric, "… (rs = …; N = …, p = …)". Correlation is symmetrical and dimensionless. To approximate the relationship by a function, we use regression. The least-squares method minimizes the residuals, the differences between the observed and the predicted values. The fitted line has two parameters: the intercept and the slope (b). The slope has a unit, so its value depends on the units of the axes.
[Example plots: eggs laid vs weight (kg): y = 2.04x − 1.2; wool production (kg) vs hours basked: y = −0.195x + 7.1]
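The least-squares line can be computed directly from the textbook formulas (slope = covariance over the variance of x; intercept from the means); the data are hypothetical:

```python
def least_squares(x, y):
    """Intercept a and slope b of the line minimizing the squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx            # the line passes through (x̄, ȳ)
    return a, b

x = [1, 2, 3]                  # hypothetical data on an exact line
y = [3, 5, 7]
a, b = least_squares(x, y)
print(f"y = {b}x + {a}")       # → y = 2.0x + 1.0
```

Note that b carries a unit (units of y per unit of x), so rescaling either axis changes its value.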
The test follows the logic of ANOVA: F = MSmodel / MSerror, with SStotal = SSmodel + SSerror and R2 = SSmodel / SStotal; the model accounts for …% of the variance. There are two ways to express the strength of the relationship, the slope and R2; p does not measure the strength of the relationship.
Presenting the results: "weight depended on length (b = …, R2 = …, df = …, F = …, p < 0.001)", together with the equation, e.g. length = 3.78*temperature + 47.6. Report also the standard error of the slope. If the intercept is zero, the relationship is proportional: if x changes k times, then y also changes k times. Regression is not symmetrical!
The assumptions of regression analysis are as follows:
- residuals should be normally distributed;
- the variance of the residuals must be independent of the values of x, otherwise the data are heteroscedastic;
- no other dependence on x.
The distribution of the x variable itself is not important. Transformations may be used, but do not forget them when writing the equation. A special case is regression through the origin.