Correlation and linear regression


1 Correlation and linear regression
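Lecture 12: Correlation and linear regression. The least squares method of Carl Friedrich Gauß. [Figure: ordinary least squares regression of y on x (OLRy) fits the line y = ax + b by minimizing the squared vertical deviations Δy².]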

2 Correlation coefficient
Covariance, variance and the correlation coefficient. The slope a and the coefficient of correlation r are both zero if the covariance is zero. The coefficient of determination is r².
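For reference, the standard sample definitions behind these labels can be written as follows. The notation (s_xy, s_x², n) is an assumption chosen for this sketch, not taken from the slide.

```latex
\begin{align}
s_{xy} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) && \text{(covariance)}\\
s_x^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 && \text{(variance)}\\
a &= \frac{s_{xy}}{s_x^2} && \text{(slope of the regression of } y \text{ on } x)\\
r &= \frac{s_{xy}}{s_x\, s_y} && \text{(correlation coefficient)}\\
R^2 &= r^2 && \text{(coefficient of determination)}
\end{align}
```

Both a and r carry the covariance in the numerator, which is why both vanish when the covariance is zero.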

3 Relationships between macropterous, dimorphic and brachypterous ground beetles on 17 Mazurian lake islands
Brachypterous vs. macropterous species: positive correlation, r = 0.41 (r² = 0.17). The regression is weak: macropterous species richness explains only 17% of the variance in brachypterous species richness. Some islands have no brachypterous species. We do not really know which variable is the independent one; there is no clear-cut logical connection.
Dimorphic vs. macropterous species: positive correlation, r = 0.67 (r² = 0.45). The regression is moderate: macropterous species richness explains only 45% of the variance in dimorphic species richness. The relationship appears to be non-linear, so a log-transformation is indicated (there are no zero counts). Again, we do not really know which variable is the independent one; there is no clear-cut logical connection.

4 Negative correlation; r = -0.48 (r² = 0.23)
The regression is weak: island isolation explains only 23% of the variance in brachypterous species richness. There are two apparent outliers; without them the whole relationship would vanish, i.e. R² ≈ 0. Outliers have to be eliminated from regression analysis. We have a clear hypothesis about the logical relationship: isolation should be the predictor of species richness.
No correlation; r = 0.06 (r² < 0.01). The regression slope is nearly zero: area explains less than 1% of the variance in brachypterous species richness. We have a clear hypothesis about the logical relationship: area should be the predictor of species richness.

5 The matrix perspective
X is not square, so it does not possess an inverse.
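Because X has no inverse, the coefficients are usually obtained from the normal equations. A sketch of that standard derivation, assuming the model Y = Xa + ε with a the vector of regression coefficients and ε the residuals (this notation is an assumption, not taken from the slide):

```latex
\begin{align}
Y &= Xa + \varepsilon \\
X^{\mathsf T} Y &= X^{\mathsf T} X\, \hat{a}
  && \text{(least squares condition, } X^{\mathsf T}\hat{\varepsilon} = 0\text{)} \\
\hat{a} &= \left(X^{\mathsf T} X\right)^{-1} X^{\mathsf T} Y
\end{align}
```

The product XᵀX is square, so it can be inverted as long as the predictors are not redundant.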

6 Variance Covariance

7 The covariance matrix is square and symmetric
Variances lie on the diagonal and covariances off the diagonal. The covariance matrix is square and symmetric.
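The structure of the covariance matrix can be checked with a few lines of plain Python. This is a minimal sketch; the three data columns are made-up numbers and the function names are chosen only for illustration.

```python
def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_matrix(columns):
    """Square matrix with entry (i, j) = covariance of column i and column j."""
    return [[covariance(ci, cj) for cj in columns] for ci in columns]


# Hypothetical data: three variables measured on four sites.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [5.0, 3.0, 2.0, 1.0],
]
cov = covariance_matrix(data)

for row in cov:
    print([round(value, 3) for value in row])
```

The diagonal entries are the variances of the individual columns, and entry (i, j) equals entry (j, i).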

8 Non-linear relationships
Ground beetles on Mazurian lake islands, fitted with a linear function, a logarithmic function and a power function. The species-individuals relationships are clearly non-linear. The power function has the highest R² and therefore explains most of the variance in species richness. The coefficient of determination is a measure of goodness of fit. [Figure labels: intercept, slope.]
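The comparison of the three functions can be sketched in plain Python. The counts below are invented for illustration and are not the island data; the logarithmic and power fits are obtained here by ordinary least squares on transformed variables, which is one common approach and may differ from how the slide's fits were produced.

```python
import math


def ols(x, y):
    """Slope and intercept of the least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination on the original scale of the response."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical numbers of individuals and species per island.
individuals = [10, 30, 80, 200, 500, 1200]
species = [3, 5, 8, 12, 18, 27]
ln_n = [math.log(n) for n in individuals]
ln_s = [math.log(s) for s in species]

# Linear function: S = a * N + b
a, b = ols(individuals, species)
r2_linear = r_squared(species, [a * n + b for n in individuals])

# Logarithmic function: S = a * ln(N) + b
a, b = ols(ln_n, species)
r2_log = r_squared(species, [a * v + b for v in ln_n])

# Power function: S = c * N^z, fitted as ln(S) = z * ln(N) + ln(c)
z, ln_c = ols(ln_n, ln_s)
r2_power = r_squared(species, [math.exp(ln_c) * n ** z for n in individuals])

print(round(r2_linear, 3), round(r2_log, 3), round(r2_power, 3))
```

All three R² values are computed on the untransformed species counts so that they can be compared directly.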

9 Having more than one predictor
Describe species richness as a function of the number of individuals, area, and isolation of islands. We need a clear hypothesis about which variables are dependent and which are independent. Use a block diagram linking Individuals, Area, Isolation and Species.

10 Collinearity
Predictors are not independent: the number of individuals depends on area and degree of isolation. [Block diagram: Individuals, Area, Isolation, Species.] We need linear relationships, so we use ln-transformed values of species, area, and individuals. We check for non-linearities using plots and for multicollinearity using a correlation matrix. The correlation between area and individuals is highly significant, i.e. the probability of H0 is very small: of the predictors, area and individuals are highly correlated. In linear regression analysis, correlations between predictors below 0.7 are acceptable.
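The correlation-matrix check can be sketched as follows. The predictor values are hypothetical, the function names are illustrative, and the 0.7 cut-off is the rule of thumb quoted above.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(predictors, threshold=0.7):
    """Return every pair of predictors whose |r| reaches the threshold."""
    names = list(predictors)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(predictors[first], predictors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 2)))
    return flagged


# Hypothetical ln-transformed predictors for six islands.
predictors = {
    "ln_area": [0.5, 1.2, 2.0, 2.9, 3.5, 4.1],
    "ln_individuals": [2.1, 2.9, 3.8, 4.4, 5.6, 6.0],
    "isolation": [3.0, 1.0, 4.0, 2.0, 5.0, 1.5],
}
flagged = flag_collinear(predictors)
print(flagged)
```

A flagged pair suggests that the two predictors carry largely the same information, so one of them is a candidate for removal before fitting the model.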

11 The final data for our analysis
The model: multiple linear regression. The vector Y contains the response variable; the matrix X contains the effect (predictor) variables. The predictor variables have to contain different information: if X is singular, no inverse exists.
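A minimal sketch of multiple linear regression through the normal equations, written in plain Python. The data are constructed so that y = 1 + 2·x1 - 0.5·x2 exactly; the function names and the singularity tolerance are assumptions of this example. In this formulation the matrix that must be invertible is XᵀX, which becomes singular when predictors duplicate each other.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b, tolerance=1e-12):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tolerance:
            raise ValueError("singular matrix: predictors carry redundant information")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(X, y):
    """Least squares coefficients from (X'X) a = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(p * q for p, q in zip(column, y)) for column in Xt]
    return solve(XtX, Xty)


# Hypothetical design matrix: a column of ones (intercept) plus two predictors.
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 4.0, 3.0]]
y = [0.5, 3.0, 4.0, 6.5, 7.5]
coefficients = fit(X, y)
print([round(c, 6) for c in coefficients])
```

If the second predictor column is replaced by a copy of the first, `solve` raises `ValueError`, which illustrates why the predictors have to contain different information.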

12 The probability that R² is zero is only 0.01%.
With more than 99.9% probability R² > 0, and hence the model is statistically significant. The model explains 78.6% of the variance in species richness; 21.4% of the variance remains unexplained. The output also lists the probabilities that the coefficients deviate from zero. Isolation is not a significant predictor.

13 What distance to minimize?
Model I regression. [Figure: OLRy minimizes the squared vertical deviations Δy²; OLRx minimizes the squared horizontal deviations Δx².]

14 RMA
Model II regression. [Figure: deviations Δx and Δy of a data point from the RMA line.] The slope of the reduced major axis (RMA) regression is the geometric average of a_OLRy and a_OLRx.
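The three slopes can be compared with a short sketch. It assumes that a_OLRx means the regression of x on y re-expressed as a slope in the y-versus-x plane (syy / sxy); the data are invented, and the function fails with a division by zero when the covariance is exactly zero.

```python
import math


def regression_slopes(x, y):
    """Return (a_OLRy, a_OLRx, a_RMA), all expressed as slopes of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    a_olry = sxy / sxx   # minimizes squared vertical deviations
    a_olrx = syy / sxy   # minimizes squared horizontal deviations
    # Geometric average of the two slopes, carrying the sign of the covariance.
    a_rma = math.copysign(math.sqrt(a_olry * a_olrx), sxy)
    return a_olry, a_olrx, a_rma


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
a_olry, a_olrx, a_rma = regression_slopes(x, y)
print(round(a_olry, 3), round(a_olrx, 3), round(a_rma, 3))
```

The geometric average simplifies to the ratio of the standard deviations, s_y / s_x, so the RMA slope always lies between the two ordinary least squares slopes.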

15 Past standard output of linear regression
[Output labels: reduced major axis; parameters and standard errors; parametric probability for r = 0; permutation test for statistical significance.] Both tests indicate that Brach and Macro are not significantly correlated. The RMA regression slope is not significant. We do not have a clear hypothesis about the causal relationships; in this case RMA is indicated.

16 Permutation test for statistical significance
[Figure: distribution of the randomized r values with the observed r marked; labels: lower CL, upper CL, Σ N(2.5%) = 25 in each tail, μ > 0.] Calculating confidence limits: randomize x or y 1000 times and calculate r each time. Plot the statistical distribution and calculate the lower and upper confidence limits: rank all 1000 coefficients of correlation and take the values at rank positions 25 and 975.
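The procedure described above can be sketched in plain Python. The data, the function names and the fixed random seed are assumptions made for reproducibility of the example; y is shuffled while x is held fixed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(x, y, n_perm=1000, seed=0):
    """Return (observed r, lower CL, upper CL, significant) from shuffled y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    randomized = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # destroys any x-y association
        randomized.append(pearson_r(x, shuffled))
    randomized.sort()
    lower = randomized[n_perm * 25 // 1000 - 1]    # rank position 25 of 1000
    upper = randomized[n_perm * 975 // 1000 - 1]   # rank position 975 of 1000
    significant = observed < lower or observed > upper
    return observed, lower, upper, significant


# Hypothetical, strongly correlated data.
x = list(range(1, 13))
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.2, 9.1, 9.7, 11.3, 12.1]
observed, lower, upper, significant = permutation_test(x, y)
print(round(observed, 3), round(lower, 3), round(upper, 3), significant)
```

An observed r outside the interval between the two limits is treated as significant at the two-sided 5% level.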

17 The RMA regression has a much steeper slope.
This slope is often intuitively better. The coefficient of correlation is independent of the regression method. In OLRy regression, insignificance of the slope also means insignificance of r and R². The 95% confidence limits of the regression slope mark the range that contains the regression slope with 95% probability. The lower CL is negative, hence the zero slope lies within the 95% CL. [Figure labels: upper CL, lower CL.]

18 Outliers should be eliminated from regression analysis.
Outliers have a disproportionate influence on correlation and regression, because OLRy minimizes the squared deviations Δy². Instead of the Pearson coefficient of correlation, use Spearman's rank order correlation, which is the normal correlation computed on ranked data. Here r_Pearson = 0.79 and r_Spearman = 0.77.
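Spearman's coefficient as "normal correlation on ranked data" can be sketched directly. The data are invented and contain one deliberate outlier; tied values receive their average rank, which is the usual convention and an assumption of this example.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman's rank order correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data; the last y value is an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 40]
r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Ranking replaces the outlier's extreme value by the highest rank, so its leverage on the coefficient is limited.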

19 Home work and literature
Refresh: coefficient of correlation; Pearson correlation; Spearman correlation; linear regression; non-linear regression; model I and model II regression; RMA regression. Prepare for the next lecture: F-test; F-distribution; variance. Literature: Łomnicki: Statystyka dla biologów.

