Variance and covariance Sums of squares General linear models.


1 Variance and covariance Sums of squares General linear models

2

3 The coefficient of correlation
For a data matrix X that contains several variables, the correlation matrix R is a symmetric matrix that contains all pairwise correlations between the variables. The diagonal matrix Σ_X contains the standard deviations of the variables as entries. X − M, where M holds the column means, is called the centred matrix. We deal with samples, so sums of squares are divided by n − 1.
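The construction above can be sketched in a few lines of NumPy. This is an illustration on fabricated data, not from the slides: the correlation matrix R is built from the centred matrix X − M and the diagonal matrix of standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # 50 cases, 3 variables (made up)
n = X.shape[0]

M = X.mean(axis=0)                           # column means
Xc = X - M                                   # centred matrix X - M
S = Xc.T @ Xc / (n - 1)                      # sample covariance matrix
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))   # inverse of the std-dev diagonal matrix
R = D_inv @ S @ D_inv                        # correlation matrix

# R is symmetric with ones on the diagonal and matches NumPy's built-in
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```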

4 Linear regression European bat species and environmental correlates

5 Assumptions of linear regression
- There is a hypothesis about dependent and independent variables; the relation is supposed to be linear; we have a hypothesis about the distribution of errors around the hypothesized regression line.
- There is a hypothesis about dependent and independent variables; the relation is non-linear; we have no data about the distribution of errors around the hypothesized regression line.
- There is no clear hypothesis about dependent and independent variables; the relation is non-linear; we have no data about the distribution of errors around the hypothesized regression line.

6 Matrix approach to linear regression (N = 62)
X is not a square matrix, hence X⁻¹ does not exist. The least-squares estimate therefore comes from the normal equations: b = (X'X)⁻¹X'Y.

7 The species–area relationship of European bats
What about the part of variance explained by our model? 1.16: average number of species per unit area (species density). 0.24: spatial species turnover.

8

9 How to interpret the coefficient of determination
Total variance = explained (model) variance + residual (unexplained) variance. Statistical testing is done by an F test or a t test.
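The variance decomposition behind R² can be checked numerically. A sketch on made-up data: with an intercept in the model, the total sum of squares splits exactly into the explained and residual parts.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.8 * x + rng.normal(0, 1.0, 100)   # fabricated example data

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

ss_total = np.sum((y - y.mean()) ** 2)        # total variance
ss_resid = np.sum((y - y_hat) ** 2)           # unexplained (rest) variance
ss_model = np.sum((y_hat - y.mean()) ** 2)    # explained variance

r2 = 1 - ss_resid / ss_total                  # coefficient of determination
assert np.isclose(ss_total, ss_model + ss_resid)
print(r2)
```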

10

11 The general linear model
A model that assumes that a dependent variable Y can be expressed as a linear combination of predictor variables X is called a linear model. The vector E contains the error terms of each regression. The aim is to minimize E.

12 The general linear model
If the errors of the predictor variables are Gaussian, the error term e should also be Gaussian, and means and variances are additive: total variance = explained variance + unexplained (rest) variance.

13 Multiple regression
1. Model formulation
2. Estimation of model parameters
3. Estimation of statistical significance
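The three steps above can be sketched on fabricated data: formulate a model Y = b₀ + b₁X₁ + b₂X₂, estimate the parameters by least squares, and compute the overall F statistic for significance testing (the effect sizes here are arbitrary choices for the example).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 80, 2                                   # cases and predictors (made up)
X = rng.normal(size=(n, k))
y = 1.0 + 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(0, 0.5, n)

D = np.column_stack([np.ones(n), X])           # step 1: model formulation (design matrix)
b, *_ = np.linalg.lstsq(D, y, rcond=None)      # step 2: parameter estimation
y_hat = D @ b

ss_model = np.sum((y_hat - y.mean()) ** 2)     # step 3: overall F test
ss_resid = np.sum((y - y_hat) ** 2)
F = (ss_model / k) / (ss_resid / (n - k - 1))

print(F)  # compare against the F(k, n-k-1) distribution for significance
```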

14 Multiple R and R²

15 Adjusted R²
R: correlation matrix; n: number of cases; k: number of independent variables in the model. A variable whose contribution D < 0 is statistically not significant and should be eliminated from the model.
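Since the slide's formula image is lost, here is the standard adjusted R² formula with the slide's symbols (n cases, k independent variables), as a minimal sketch:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n: number of cases, k: number of independent variables.
    Penalizes R^2 for each additional predictor in the model.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example values are illustrative only
print(adjusted_r2(0.75, 62, 3))  # always below the raw R^2 of 0.75
```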

16 A mixed model

17 The final model
Is this model realistic?
- Very low species density (log scale!)
- Realistic increase of species richness with area
- Increase of species richness with winter length
- Increase of species richness at higher latitudes
- A peak of species richness at intermediate latitudes
The model makes realistic predictions. Problems might arise from intercorrelation between the predictor variables (multicollinearity). We solve this by a step-wise approach, eliminating the variables that are either not significant or give unreasonable parameter values. The variance explanation of this final model is higher than that of the previous one.

18 Multiple regression solves systems of intrinsically linear algebraic equations
- The matrix X'X must not be singular; that is, the variables have to be independent. Otherwise we speak of multicollinearity. Collinearity of r < 0.7 is in most cases tolerable.
- To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model.
- Statistical inference assumes that errors have a normal distribution around the mean.
- The model assumes linear (or algebraic) dependencies. Check first for non-linearities (alternatives: polynomial regression, general additive models).
- Check the distribution of residuals Y_exp − Y_obs; this distribution should be random.
- Check whether the parameters have realistic values.
- Multiple regression is a hypothesis-testing, not a hypothesis-generating technique!
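The multicollinearity warning above can be made concrete. A sketch on fabricated data: when one predictor is nearly a linear copy of another, their correlation exceeds the r < 0.7 guideline and X'X approaches singularity, visible in a large condition number.

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)   # nearly collinear with x1 (by construction)
X = np.column_stack([np.ones(100), x1, x2])

r = np.corrcoef(x1, x2)[0, 1]                  # pairwise collinearity check
cond = np.linalg.cond(X.T @ X)                 # large condition number = near-singular X'X

print(f"r = {r:.3f}, condition number of X'X = {cond:.1f}")
```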

19 Standardized coefficients of correlation
Z-transformed distributions have a mean of 0 and a standard deviation of 1. In the case of bivariate regression Y = aX + b, R_XX = 1; hence B = R_XY. The use of Z-transformed values therefore results in standardized correlation coefficients, termed β-values.
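The bivariate case is easy to verify numerically. A sketch on made-up data: regressing the Z-transformed Y on the Z-transformed X gives a slope (the β-value) equal to the correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(0, 0.5, 200)     # fabricated bivariate data

zx = (x - x.mean()) / x.std(ddof=1)       # Z-transform: mean 0, sd 1
zy = (y - y.mean()) / y.std(ddof=1)

beta = np.sum(zx * zy) / np.sum(zx * zx)  # regression slope of zy on zx
r = np.corrcoef(x, y)[0, 1]               # ordinary correlation coefficient

assert np.isclose(beta, r)                # beta equals r in the bivariate case
```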

