1
Variance and covariance
M contains the means.
Sums of squares
General additive models
3
The coefficient of correlation
For a matrix X that contains several variables, the following holds.
The matrix R is a symmetric matrix that contains all correlations between the variables.
The diagonal matrix $\Sigma_X$ contains the standard deviations of the variables in X as entries.
X − M is called the centred matrix.
We deal with samples.
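The slide's formula is not preserved in this transcript. As a minimal sketch, assuming the usual sample definition (division by n − 1), the correlation matrix can be built from the centred matrix X − M and a diagonal matrix of standard deviations. The names `correlation_matrix`, `S_inv`, and the random `data` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def correlation_matrix(X):
    """Correlation matrix built from the centred matrix X - M (sample version)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    M = X.mean(axis=0)                  # column means (broadcast over all rows)
    C = X - M                           # centred matrix X - M
    cov = C.T @ C / (n - 1)             # covariance matrix from sums of squares
    # Inverse of the diagonal matrix that holds the standard deviations
    S_inv = np.diag(1.0 / np.sqrt(np.diag(cov)))
    return S_inv @ cov @ S_inv

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))         # made-up data: 20 cases, 3 variables
R = correlation_matrix(data)
```

The result can be compared with `np.corrcoef(data, rowvar=False)`, which should agree up to floating-point error.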
5
Pre- and postmultiplication
Premultiplication
Postmultiplication
For diagonal matrices X the following holds
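The identities shown on this slide are not preserved. One standard property of diagonal matrices, which may or may not be the one the slide stated, is that premultiplication scales rows while postmultiplication scales columns. The matrices `A` and `D` below are made-up examples.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
D = np.diag([10.0, 100.0])   # hypothetical diagonal matrix

pre = D @ A    # premultiplication: row i of A is multiplied by D[i, i]
post = A @ D   # postmultiplication: column j of A is multiplied by D[j, j]
```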
6
Linear regression European bat species and environmental correlates
7
N = 62
Matrix approach to linear regression
X is not a square matrix, hence $X^{-1}$ doesn’t exist.
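Because X cannot be inverted directly, the usual route is through the square matrix X'X. The following is a minimal sketch of that least-squares solution, b = (X'X)^-1 X'Y, on made-up data; the function name `ols` and the example values are assumptions for illustration, not the bat data set.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system avoids forming the inverse of X'X explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x                            # points lying exactly on a line
X = np.column_stack([np.ones_like(x), x])    # column of ones for the intercept
b = ols(X, y)                                # b[0]: intercept, b[1]: slope
```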
8
The species–area relationship of European bats
What about the part of variance explained by our model?
1.16: average number of species per unit area (species density)
0.24: spatial species turnover
10
How to interpret the coefficient of determination
Statistical testing is done by an F-test or a t-test.
Total variance
Rest (unexplained, residual) variance
Explained variance
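The slide's formula is missing from the transcript. A minimal sketch, assuming the common definition of the coefficient of determination as one minus the ratio of unexplained to total sums of squares (the function name `r_squared` is illustrative):

```python
def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_rest / SS_total."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_total = sum((y - mean_y) ** 2 for y in y_obs)             # total variation
    ss_rest = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))   # unexplained part
    return 1.0 - ss_rest / ss_total
```

A perfect prediction gives 1, and predicting the mean for every case gives 0.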
12
The general linear model
A model that assumes that a dependent variable Y can be expressed by a linear combination of predictor variables X is called a linear model.
The vector E contains the error terms of each regression.
The aim is to minimize E, that is, the sum of its squared entries.
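In matrix notation, and assuming ordinary least squares as the fitting criterion, the model and its solution can be written as follows. The coefficient vector b is a symbol introduced here for illustration.

```latex
\[
Y = Xb + E, \qquad
\min_{b}\; E'E = (Y - Xb)'(Y - Xb)
\;\Longrightarrow\;
b = (X'X)^{-1}X'Y
\]
```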
13
The general linear model
If the errors of the predictor variables are Gaussian, the error term e should also be Gaussian, and means and variances are additive.
Total variance = Explained variance + Unexplained (rest) variance
14
Multiple regression
1. Model formulation
2. Estimation of model parameters
3. Estimation of statistical significance
15
Multiple R and $R^2$
16
The coefficient of determination
The correlation matrix of $y, x_1, x_2, \dots, x_m$ can be divided into four compartments.
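As a sketch of how the four compartments can be used, assume y occupies the first row and column of the full correlation matrix. The compartments are then the 1 in the corner, the row and column of correlations between y and the predictors, and the predictor block. A standard identity gives R² as r_xy' R_xx^-1 r_xy. The function name and the example matrices are assumptions for illustration.

```python
import numpy as np

def r2_from_correlations(R):
    """R^2 from a full correlation matrix with y in the first row/column."""
    R = np.asarray(R, dtype=float)
    r_xy = R[1:, 0]     # correlations between each predictor and y
    R_xx = R[1:, 1:]    # correlations among the predictors
    return float(r_xy @ np.linalg.solve(R_xx, r_xy))

# One predictor: R^2 reduces to the squared simple correlation
R_one = [[1.0, 0.6],
         [0.6, 1.0]]

# Two uncorrelated predictors: the squared correlations simply add up
R_two = [[1.0, 0.6, 0.3],
         [0.6, 1.0, 0.0],
         [0.3, 0.0, 1.0]]
```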
18
Adjusted $R^2$
R: correlation matrix
n: number of cases
k: number of independent variables in the model
D < 0 is not statistically significant; such a variable should be eliminated from the model.
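The slide's own formula is not preserved. A minimal sketch, assuming the common textbook form of the adjustment with n cases and k independent variables (the function name `adjusted_r2` is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than variables plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

With this form, adding a variable that barely raises R² lowers the adjusted value, because the penalty term grows with k.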
19
A mixed model
20
The final model
Is this model realistic?
Negative species density
Realistic increase of species richness with area
Increase of species richness with winter length
Increase of species richness at higher latitudes
A peak of species richness at intermediate latitudes
The model makes a series of unrealistic predictions. Our initial assumptions are wrong despite the high degree of variance explanation.
Our problem arises in part from the intercorrelation between the predictor variables (multicollinearity).
We solve the problem by a stepwise approach, eliminating the variables that are either not significant or give unreasonable parameter values.
The variance explanation of this final model is higher than that of the previous one.
21
Multiple regression solves systems of intrinsically linear algebraic equations.
The matrix X’X must not be singular. That is, the variables have to be independent. Otherwise we speak of multicollinearity. Collinearities of r < 0.7 are in most cases tolerable.
To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model.
Statistical inference assumes that errors have a normal distribution around the mean.
The model assumes linear (or algebraic) dependencies. Check first for non-linearities.
Check the distribution of the residuals $Y_{exp} - Y_{obs}$. This distribution should be random.
Check whether the parameters have realistic values.
Multiple regression is a hypothesis-testing and not a hypothesis-generating technique!
Polynomial regression
General additive model
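Two of these checks are easy to automate. The sketch below flags predictor pairs whose absolute correlation reaches the 0.7 rule of thumb and tests the 10-cases-per-variable rule. The function names, the threshold default, and the example data are assumptions for illustration.

```python
import numpy as np

def collinear_pairs(X, threshold=0.7):
    """Return (i, j, r) for predictor pairs with |r| >= threshold."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    k = R.shape[0]
    return [(i, j, float(R[i, j]))
            for i in range(k) for j in range(i + 1, k)
            if abs(R[i, j]) >= threshold]

def enough_cases(n_cases, n_variables, ratio=10):
    """Rule of thumb: at least `ratio` times as many cases as variables."""
    return n_cases >= ratio * n_variables

# Made-up predictors: x2 is almost a multiple of x1, x3 is unrelated
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
pairs = collinear_pairs(np.column_stack([x1, x2, x3]))
```

With N = 62 cases, as in the bat example, the rule of thumb would allow at most six variables in the model.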
22
Standardized coefficients of correlation
Z-transformed distributions have a mean of 0 and a standard deviation of 1.
In the case of bivariate regression Y = aX + b, $R_{XX} = 1$. Hence $B = R_{XY}$.
Hence the use of Z-transformed values results in standardized correlation coefficients, termed β-values.
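A minimal sketch of this argument, assuming sample standard deviations (n − 1) for the Z-transformation: the beta-values solve R_XX B = R_XY, and with a single predictor they collapse to the simple correlation. The function names and the data are illustrative assumptions.

```python
import numpy as np

def zscore(a):
    """Z-transform columns to mean 0 and (sample) standard deviation 1."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def beta_weights(X, y):
    """Standardized regression coefficients B = R_XX^-1 R_XY."""
    Zx, zy = zscore(X), zscore(y)
    n = len(zy)
    R_xx = Zx.T @ Zx / (n - 1)    # correlations among the predictors
    r_xy = Zx.T @ zy / (n - 1)    # correlations between predictors and y
    return np.linalg.solve(R_xx, r_xy)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
beta = beta_weights(x[:, None], y)   # bivariate case: a single predictor column
```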
23
How to interpret beta-values
Beta values are generalisations of simple coefficients of correlation. However, there is an important difference. The higher the correlation between two or more predictor variables (multicollinearity) is, the less r will depend on the correlation between X and Y. Hence other variables might have more and more influence on r and b. At high levels of multicollinearity it might therefore become more and more difficult to interpret beta-values in terms of correlations.
Because beta-values are standardized b-values, they should allow comparisons to be made about the relative influence of predictor variables. High levels of multicollinearity might lead to misinterpretations.
Beta values above one are always a sign of too high multicollinearity.
Hence high levels of multicollinearity might
reduce the exactness of beta-weight estimates,
change the probabilities of making type I and type II errors,
make it more difficult to interpret beta-values.
We might apply an additional parameter, the so-called coefficient of structure. The coefficient of structure $c_i$ is defined from $r_{iY}$ and $R^2$, where $r_{iY}$ denotes the simple correlation between predictor variable i and the dependent variable Y and $R^2$ the coefficient of determination of the multiple regression. Coefficients of structure therefore measure the fraction of total variability a given predictor variable explains. Again, the interpretation of $c_i$ is not always unequivocal at high levels of multicollinearity.
24
Partial correlations
Semipartial correlation
A semipartial correlation correlates a variable with one residual only.
The partial correlation $r_{xy/z}$ is the correlation of the residuals of X and Y after the influence of Z has been removed from both.
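The residual definitions can be sketched directly. Assuming simple linear regressions on a single control variable z, the partial correlation from residuals should match the standard closed-form expression in the three simple correlations. All function names and the data below are illustrative assumptions.

```python
import numpy as np

def residuals(a, z):
    """Residuals of a after a simple linear regression on z."""
    a = np.asarray(a, dtype=float)
    z = np.asarray(z, dtype=float)
    slope, intercept = np.polyfit(z, a, 1)
    return a - (slope * z + intercept)

def partial_corr(x, y, z):
    """Correlation of the residuals of x and of y, both regressed on z."""
    return float(np.corrcoef(residuals(x, z), residuals(y, z))[0, 1])

def semipartial_corr(x, y, z):
    """Correlation of the raw variable x with the residual of y on z only."""
    return float(np.corrcoef(x, residuals(y, z))[0, 1])

def partial_corr_formula(x, y, z):
    """Closed form: (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))."""
    r = np.corrcoef([x, y, z])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return float((r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2)))

# Made-up data
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.2, 1.9, 3.5, 3.8, 5.1, 6.4]
y = [2.0, 2.5, 2.9, 4.6, 4.8, 6.1]
```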
25
Path analysis and linear structure models
Multiple regression
Path analysis tries to do something that is logically impossible: to derive causal relationships from sets of observations.
Path analysis defines a whole model and tries to separate correlations into direct and indirect effects.
The error term e contains the part of the variance in Y that is not explained by the model. These errors are called residuals.
Regression analysis does not study the relationships between the predictor variables.
26
Path analysis is largely based on the computation of partial coefficients of correlation.
Path coefficients
Path analysis is a model confirmatory tool. It should not be used to generate models or even to search for models that fit the data set.
We start from regression functions
27
From Z-transformed values we get
$eZ_Y = 0$
$Z_Y Z_Y = 1$
$Z_X Z_Y = r_{XY}$
Path analysis is a nice tool to generate hypotheses. It fails at low coefficients of correlation and circular model structures.
28
Non-metric multiple regression
29
Statistical inference
Rounding errors due to different precisions cause the residual variance to be larger than the total variance.
30
Logistic and other regression techniques
We use odds
The logistic regression model
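The slide's formulas are not preserved. As a minimal sketch of the standard single-predictor form, the odds of a probability p are p / (1 − p), and the logistic model assumes that the log-odds are linear in x. The parameter names `b0` and `b1` are assumptions for illustration.

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds, the quantity modelled linearly in logistic regression."""
    return math.log(odds(p))

def logistic(x, b0, b1):
    """Predicted probability: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```

Applying `logit` to the output of `logistic` recovers the linear predictor b0 + b1 x, which is why the model can be fitted on the log-odds scale.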
32
Generalized non-linear regression models
A special regression model that is used in pharmacology
$b_0$ is the maximum response at dose saturation.
$b_1$ is the concentration that produces a half-maximal response.
$b_2$ determines the slope of the function; that is, it measures how fast the response increases with increasing drug dose.
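The model equation itself is missing from the transcript. One common parametrization that matches the three parameter descriptions is the Hill-type dose-response curve sketched below. Whether the slide used exactly this form is an assumption, as is the function name.

```python
def dose_response(x, b0, b1, b2):
    """Hill-type curve: response = b0 * x**b2 / (b1**b2 + x**b2).

    b0: maximum response at dose saturation
    b1: concentration giving half of the maximum response
    b2: slope parameter (how fast the response rises with dose)
    """
    return b0 * x**b2 / (b1**b2 + x**b2)
```

At x = b1 the response is exactly b0 / 2 for any slope b2, and for very large doses it approaches b0.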