Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.

1 Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable and p independent variables.

2 Multiple Regression Model Y_i is the value of the dependent variable for the i-th unit. The values x_i1, x_i2, …, x_ip are the values of the independent variables. Z_i is an unobservable error.
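The model equation itself is not shown on this slide as transcribed. A standard form consistent with the definitions above, and with the later objective of estimating σ, would be the following. The scaling of the error by σ and the index range i = 1, …, n are assumptions, as is the usual reading that the Z_i are independent standard normal errors.

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \sigma Z_i,
\qquad i = 1, \dots, n
```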

3 Objectives Estimate the regression coefficients β_0, β_1, …, β_p. Estimate σ (crucial for tests). Test whether the regression coefficients β_1, …, β_p are all simultaneously zero (note that the intercept β_0 is left out of this test). Test whether some of the regression coefficients β_q, …, β_p are zero.
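The first two objectives can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the procedure of the statistics package used in the course: it solves the normal equations directly, and the function names `solve` and `ols_fit` and the small data set are hypothetical. The estimate of σ divides the residual sum of squares by n - p - 1.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_fit(rows, y):
    """Return (beta, sigma_hat); beta[0] is the intercept estimate."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])                        # k = p + 1 coefficients
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    Xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)               # normal equations X'X beta = X'y
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, x))
             for x, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    sigma_hat = (sse / (len(y) - k)) ** 0.5  # divide by n - p - 1
    return beta, sigma_hat


# Hypothetical data: two independent variables, five units.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1.2, 2.9, 4.1, 5.8, 8.1]
beta, sigma_hat = ols_fit(rows, y)
print(beta, sigma_hat)
```

Solving the normal equations is fine for a handful of well-scaled variables; statistical software typically uses a more numerically stable decomposition.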

4 Assumptions for Multiple Regression Regression function is linear. Error terms are independent. Constant error variance. Distribution of errors is normal.

5 Context of your second project Artificial data set, available on web site. Each set is individual. –If you analyze the wrong data set, no credit! Three dependent variables. –Three separate sections of your report! Six independent variables. 500 data points with replicated observations.

6 Check Scatterplots Use scatterplot matrix to get a brief summary look. –Graphs, scatterplot, matrix. If Y vs. x_i is flat and patternless, then your interpretation is that the regression coefficient of x_i is zero. Two of the dependent variables are random samples.

7 Table of regression coefficients Contains the OLS estimates. The line (constant) refers to β_0, the intercept. There is a line for each variable in the model that refers to β_q, the partial regression coefficient (slope) of the q-th independent variable.

8 Table of regression coefficients Five columns of numbers. Two are labeled “unstandardized coefficients.” –B column contains the OLS estimates. –Std. Error contains the estimated standard deviation of each estimate (its standard error).

9 Table of regression coefficients One is the standardized coefficient. –Scale-free coefficient often used in social science studies for comparison across studies. There is a column for t. –As usual, t = (B - 0)/SE(B). There is a column for sig. –Interpret as a p-value.
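The t and sig. columns can be reproduced from the B and Std. Error columns. The sketch below is an approximation under a stated assumption: it uses the standard normal distribution in place of the t distribution, which is close when there are several hundred data points, whereas statistical packages use the exact t distribution with n - p - 1 degrees of freedom. The function name `t_and_sig` is hypothetical.

```python
from statistics import NormalDist


def t_and_sig(b, se):
    """Return (t, approximate two-sided p-value) for H0: coefficient = 0."""
    t = (b - 0.0) / se
    # Normal approximation to the t distribution (assumes a large sample).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical table entries: B = 2.0, Std. Error = 0.5.
t, p = t_and_sig(2.0, 0.5)
print(t, p)
```

A small p-value here corresponds to a small entry in the sig. column.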

10 Interpretation There appears to be an association between an independent variable and the dependent variable if the observed significance level is small for that coefficient. In your report, specify which dependent variable has associations and which independent variables are significant.

11 Refinement of Model Rerun regression using only those variables that appear to be significant. Usually, the database of a study has many variables that have no association with the dependent variable. Most clients prefer that these variables not be used. –There are some technical problems with this approach that are widely ignored.

12 Strategy of Stepwise Regression Let the computer do the work. In regression box, specify stepwise. The computer will see whether additional variables can be added or added variables deleted. There are three basic strategies: forward selection, backward selection, and stepwise.

13 Using Stepwise Regression Examine final model selected. Note which variables are included. Examine information for excluded variables. –Check whether there is any possibility that one of the variables left out might matter.

14 Checking the Model Residual plots. Diagnostics. Lack of Fit test.

15 Residual Plots Always plot unstandardized residuals against unstandardized predicted. Plot unstandardized residuals against each independent variable in model. If there is a time order to data, plot residuals in time order.

16 Diagnostics Check for outliers. Check for influential points. –Cook’s distance is useful. Deleting the point with the largest Cook’s distance causes the greatest change in the coefficients. Box plot of residuals. Q-Q plot of residuals.
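To make Cook's distance concrete, here is a minimal sketch for the special case of one independent variable, where the leverage of each point has a simple closed form. The function name `cooks_distances` and the data are hypothetical; with several independent variables the leverages come from the hat matrix instead.

```python
def cooks_distances(x, y):
    """Cook's distance of every point in a one-variable linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    k = 2                                   # coefficients: intercept and slope
    mse = sum(e * e for e in resid) / (n - k)
    # Leverage of each point (diagonal of the hat matrix).
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e / (k * mse) * h / (1.0 - h) ** 2
            for e, h in zip(resid, lev)]


# Hypothetical data: the last point is far from the others in x and off the trend.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 20.0]
d = cooks_distances(x, y)
most_influential = max(range(len(d)), key=lambda i: d[i])
print(most_influential, d)
```

The index with the largest distance is the candidate to examine first.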

17 Lack of Fit Test Need replicated points (same settings of independent variables with different runs determining dependent variable). Your data has replicated points. Design your studies so that you can do a lack of fit test.

18 Approximate Lack of Fit Test Statistics, Compare Means, One-way anova. Dependent variable is residuals from regression model that you think is correct. Independent variable is the second column of your data set. Click OK.

19 Interpretation of Approximate Lack of Fit Test If the F statistic is near one (observed significance level large), then the model that generated the residuals “appears to be adequate.” That is, there is no empirical reason to go on. If the F statistic is much larger than one (small observed significance level), the model should be improved.
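The F statistic being interpreted here is the one-way ANOVA ratio of between-group to within-group mean squares, computed on the regression residuals. The sketch below assumes the grouping column labels each replicated setting of the independent variables; the function names and the residual values are hypothetical, and the degrees of freedom are those of a plain one-way ANOVA, which is why the test is only approximate.

```python
from collections import defaultdict


def group_by_setting(setting_ids, residuals):
    """Collect residuals that share the same setting label."""
    groups = defaultdict(list)
    for key, e in zip(setting_ids, residuals):
        groups[key].append(e)
    return list(groups.values())


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group MS over within-group MS."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical residuals from three replicated settings.
ids = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
res = [-0.4, 0.1, 0.3, 0.2, -0.1, -0.2, 0.0, 0.3, -0.2]
f_stat = one_way_f(group_by_setting(ids, res))
print(f_stat)
```

Each group needs at least two observations somewhere in the data so that the within-group sum of squares is not zero, which is why replicated points are required.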

20 Theory behind Lack of Fit Test One way analysis of variance. Covered next class. Happy Thanksgiving.

