Linear regression models
Purposes:
- To describe the linear relationship between two continuous variables: the response variable (y-axis) and a single predictor variable (x-axis)
- To determine how much of the variation in Y can be explained by the linear relationship with X, and how much remains unexplained
- To predict new values of Y from new values of X
The linear regression model is:

y_i = β0 + β1*x_i + ε_i

β0 = population intercept (the value of Y when x_i = 0)
β1 = population slope, the change in Y per unit change in X
ε_i = random or unexplained error associated with the i-th observation
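As a quick illustration (not part of the original slides), this Python sketch simulates data from the model; the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative values for the population parameters.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = rng.uniform(0, 10, size=50)        # predictor values
eps = rng.normal(0, sigma, size=50)    # random error for each observation
y = beta0 + beta1 * x + eps            # responses generated by the linear model
```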
Figure: a linear relationship between X and Y; the line crosses the y-axis at the intercept β0 and rises β1 units in Y for each 1.0-unit increase in X.
Linear models approximate non-linear functions over a limited domain. Predictions within the range of the observed X values are interpolation; predictions beyond that range are extrapolation.
Fitting data to a linear model: the expected value of Y at x_i is μ_y_i = β0 + β1*x_i, and each observation is y_i = β0 + β1*x_i + ε_i. The difference y_i − ŷ_i between an observed value and its predicted value is the residual.
The residual: e_i = y_i − ŷ_i

The residual sum of squares: RSS = Σ (y_i − ŷ_i)²
The “best fit” estimates are the parameter values that minimize the residual sum of squares (RSS), the summed squared differences between each observed value and the corresponding value predicted by the model.
Sum of squares: SS_X = Σ (x_i − x̄)²

Sum of cross products: SS_XY = Σ (x_i − x̄)(y_i − ȳ)
Variance: s_X² = SS_X / (n − 1)

Covariance: s_XY = SS_XY / (n − 1)
Least-squares estimate of the slope:

b1 = SS_XY / SS_X = s_XY / s_X²

where s_XY is the sample covariance of X and Y and s_X² is the sample variance of X.
To solve for the intercept: b0 = ȳ − b1*x̄
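A minimal numpy sketch of these estimators (the function and variable names are mine, not from the slides):

```python
import numpy as np

def least_squares_fit(x, y):
    """Estimate the intercept b0 and slope b1 by ordinary least squares."""
    x_bar, y_bar = x.mean(), y.mean()
    ss_x = np.sum((x - x_bar) ** 2)             # sum of squares of X
    ss_xy = np.sum((x - x_bar) * (y - y_bar))   # sum of cross products
    b1 = ss_xy / ss_x                           # slope: SS_XY / SS_X
    b0 = y_bar - b1 * x_bar                     # intercept: y-bar minus b1 * x-bar
    return b0, b1
```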
Variance of the error of regression (the residual mean square): s² = RSS / (n − 2), with n − 2 degrees of freedom because two parameters were estimated.
Variance components: the total variation in Y partitions into an explained and an unexplained part, SS_Y = SS_regression + RSS. The coefficient of determination measures the explained proportion.
Coefficient of determination: r² = SS_regression / SS_Y = 1 − RSS / SS_Y
Product-moment correlation coefficient: r = SS_XY / √(SS_X * SS_Y) = s_XY / (s_X * s_Y)
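Both quantities follow directly from the sums of squares; a sketch continuing the hypothetical least_squares_fit example above:

```python
import numpy as np

def r_squared(x, y, b0, b1):
    """Proportion of the variation in Y explained by the regression."""
    y_hat = b0 + b1 * x
    rss = np.sum((y - y_hat) ** 2)        # unexplained (residual) sum of squares
    ss_y = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1 - rss / ss_y

def correlation(x, y):
    """Product-moment correlation; its square equals r^2 in simple regression."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
```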
ANOVA table for regression

Source      df     Sum of squares   Mean square     Expected mean square   F ratio
Regression  1      Σ(ŷ_i − ȳ)²      SS_reg / 1      σ² + β1²*SS_X          MS_reg / MS_residual
Residual    n − 2  Σ(y_i − ŷ_i)²    RSS / (n − 2)   σ²
Total       n − 1  Σ(y_i − ȳ)²
Publication form of ANOVA table for regression

Source      Sum of Squares  df  Mean Square  F       Sig.
Regression  11.479          1   11.479       21.044  0.00035
Residual    8.182           15  0.545
Total       19.661          16
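A sketch of how the table's entries are computed, using scipy for the F-tail probability (the function name is illustrative):

```python
import numpy as np
from scipy import stats

def regression_anova(x, y, b0, b1):
    """One-predictor regression ANOVA: sums of squares, F ratio, p-value."""
    n = len(y)
    y_hat = b0 + b1 * x
    ss_reg = np.sum((y_hat - y.mean()) ** 2)   # variation explained by regression
    rss = np.sum((y - y_hat) ** 2)             # residual variation
    ms_reg, ms_res = ss_reg / 1, rss / (n - 2)
    f_ratio = ms_reg / ms_res
    p_value = stats.f.sf(f_ratio, 1, n - 2)    # upper tail of F(1, n-2)
    return ss_reg, rss, f_ratio, p_value
```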
Variance of the estimated intercept: s_b0² = s² * (1/n + x̄² / SS_X)
Variance of the slope estimator: s_b1² = s² / SS_X
Variance of the fitted value at x0: s_ŷ² = s² * (1/n + (x0 − x̄)² / SS_X)
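These three variance formulas translate directly into code; a sketch, assuming b0 and b1 were estimated as above:

```python
import numpy as np

def standard_errors(x, y, b0, b1):
    """Standard errors of the intercept, the slope, and the fitted value at x0."""
    n = len(y)
    ss_x = np.sum((x - x.mean()) ** 2)
    s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # error variance, RSS/(n-2)
    se_b0 = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / ss_x))
    se_b1 = np.sqrt(s2 / ss_x)
    def se_fit(x0):
        return np.sqrt(s2 * (1.0 / n + (x0 - x.mean()) ** 2 / ss_x))
    return se_b0, se_b1, se_fit
```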
Regression
Assumptions of regression
- The linear model correctly describes the functional relationship between X and Y
- The X variable is measured without error
- For a given value of X, the sampled Y values are independent with normally distributed errors
- Variances are constant along the regression line
Residual plot for species-area relationship
The influence function
Logistic regression
Height vs. survival in Hypericum cumulicola
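A sketch of fitting such a model in Python; the data below are made up for illustration, not Hypericum measurements:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: plant height (cm) and survival (1 = survived, 0 = died).
height = np.array([10, 12, 15, 18, 22, 25, 30, 35, 40, 45], dtype=float)
survival = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

# A large C approximates an unpenalized maximum-likelihood fit.
model = LogisticRegression(C=1e6)
model.fit(height.reshape(-1, 1), survival)

# Predicted probability of survival for a 20 cm plant.
p_survive = model.predict_proba([[20.0]])[0, 1]
```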
Multiple regression
Relative abundance of C3 and C4 plants (Paruelo & Lauenroth 1996): the geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses, and C4 grasses.
Data: 73 sites across temperate central North America

Response variable: relative abundance of PFTs (based on cover, biomass, and primary production) for each site

Predictor variables: longitude, latitude, mean annual temperature, mean annual precipitation, winter (%) precipitation, summer (%) precipitation, biome (grassland, shrubland)
Box 6.1. Relative abundance was transformed as ln(abundance + 1) because the raw values are positively skewed.
Collinearity causes computational problems because it makes the determinant of the matrix of X variables close to zero, and matrix inversion essentially involves dividing by the determinant, which is very sensitive to small differences in the numbers. It also inflates the standard errors of the estimated regression slopes.
Detecting collinearity
- Check tolerance values (the tolerance of a predictor is 1 − R² from regressing it on the other predictors; values near zero signal collinearity; see the sketch after this list)
- Plot the variables
- Examine a matrix of correlation coefficients between predictor variables
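A minimal sketch of the tolerance diagnostic (the function name is mine; X is an n-by-p matrix of predictors):

```python
import numpy as np

def tolerances(X):
    """Tolerance of each predictor: 1 - R^2 from regressing it on the others.
    Values near 0 signal collinearity; VIF = 1 / tolerance."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        tol[j] = 1 - r2
    return tol

# Pairwise correlations among predictors: np.corrcoef(X, rowvar=False)
```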
Dealing with collinearity: omit predictor variables if they are highly correlated with other predictor variables that remain in the model.
Additive model:

log10(C3 + 0.1) = β0 + β1(lat) + β2(long) + β3(map) + β4(mat) + β5(JJAmap) + β6(DJFmap)

ln(C3 + 1) = β0 + β1(lat) + β2(long) + β3(map) + β4(mat) + β5(JJAmap) + β6(DJFmap)

A constant (0.1 for the log10 form, 1 for the ln form) is added before taking logs because some relative abundances are zero.
Interaction model, after centering both lat and long:

ln(C3 + 1) = β0 + β1(lat) + β2(long) + β3(lat × long)
If we omit the interaction and refit the model, the partial regression slope for latitude changes; with the centered predictors, the collinearity problems disappear.
R² = 0.514
Matrix algebra approach to OLS estimation of multiple regression models:

Y = Xb + ε

The normal equations: X'Xb = X'Y

b = (X'X)⁻¹ X'Y
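In numpy this is a few lines; a sketch (X must include a leading column of ones for the intercept):

```python
import numpy as np

def ols_matrix(X, y):
    """Solve the normal equations X'Xb = X'y for the coefficient vector b."""
    return np.linalg.solve(X.T @ X, X.T @ y)   # avoids forming the explicit inverse

# np.linalg.lstsq(X, y, rcond=None) is numerically safer when predictors are
# collinear, since X'X is then near-singular (see the collinearity slides).
```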
Forward selection starts with no predictors and, at each step, adds the predictor that most improves the fit, stopping when no addition meets the entry criterion.
Backward selection starts with the full model and, at each step, drops the predictor that contributes least to the fit, stopping when every remaining predictor meets the retention criterion. A sketch of the forward variant follows.
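This sketch scores candidate models by AIC (all names are mine; the backward variant mirrors it by dropping columns instead):

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares using the given predictor columns of X."""
    Xs = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return np.sum((y - Xs @ coef) ** 2)

def forward_selection(X, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    n, p = X.shape
    aic = lambda cols: n * np.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)
    selected, remaining = [], list(range(p))
    best = aic(selected)                      # intercept-only model
    while remaining:
        score, j = min((aic(selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected
```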
Adjusted r²: adj r² = 1 − (1 − r²)(n − 1) / (n − p − 1), where p is the number of predictors; unlike r², it does not automatically increase as predictors are added.
Akaike information criterion: AIC = −2 ln(L) + 2k, where L is the maximized likelihood and k is the number of estimated parameters; smaller values indicate a better trade-off between fit and complexity.
Bayesian (Schwarz) information criterion: BIC = −2 ln(L) + k ln(n); its penalty for extra parameters grows with sample size, so it favors more parsimonious models than AIC.
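All three criteria can be computed from the residuals of a least-squares fit; a sketch (k = number of predictors; the AIC/BIC forms drop constants common to all models being compared):

```python
import numpy as np

def selection_criteria(y, y_hat, k):
    """Adjusted r^2, AIC, and BIC for a least-squares fit with k predictors."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    ss_y = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / ss_y
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    aic = n * np.log(rss / n) + 2 * (k + 1)         # up to an additive constant
    bic = n * np.log(rss / n) + (k + 1) * np.log(n)
    return adj_r2, aic, bic
```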
Hierarchical partitioning and model selection

No. pred  Model                   r²       Adj r²    Cp     AIC      Schwarz BIC
1         Lon                     0.00005  -0.0140   72.89  -159.90  -155.32
1         Lat                     0.4618   0.4542    7.37   -205.12  -200.54
1         Lon × Lat               0.0062   -0.0078   72.01  -160.35  -155.77
2         Lon + Lat               0.4671   0.4519    8.61   -203.85  -196.98
3         Lon + Lat + Lon × Lat   0.5137   0.4926    4      -208.53  -199.67