Published by Paula Maxwell. Modified over 9 years ago.
1
Chapter 22: Building Multiple Regression Models Generalization of univariate linear regression models. Each unit of data has one value of the dependent variable and values of p independent variables.
2
Multiple Regression Model Y_i is the value of the dependent variable for the i-th unit. The values x_{i1}, x_{i2}, …, x_{ip} are the values of the independent variables. Z_i is an unobservable error:
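A sketch of the model equation this slide introduces, written from the definitions above. The factor σ is an assumption, suggested by the later objective "Estimate σ": it treats Z_i as a standardized error. If Z_i already carries the error variance, the σ factor should be dropped.

```latex
% Assumed form: Z_i has mean 0 and variance 1, so \sigma scales the error.
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \sigma Z_i,
\qquad i = 1, \dots, n
```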
3
Objectives Estimate the regression coefficients β_0, β_1, …, β_p. Estimate σ (crucial for tests). Test whether the regression coefficients β_1, …, β_p are all simultaneously zero (note that the intercept β_0 is left out). Test whether some of the regression coefficients, β_q, …, β_p, are zero.
4
Assumptions for Multiple Regression Regression function is linear. Error terms are independent. Constant error variance. Distribution of errors is normal.
5
Context of your second project Artificial data set, available on web site. Each set is individual. –If you analyze the wrong data set, no credit! Three dependent variables. –Three separate sections of your report! Six independent variables. 500 data points with replicated observations.
6
Check Scatterplots Use scatterplot matrix to get a brief summary look. –Graphs, scatterplot, matrix. If Y vs x_i is flat and patternless, then your interpretation is that the regression coefficient of x_i is zero. Two of the dependent variables are random samples.
7
Strategy 1 Enter all six independent variables (columns three through eight). –Statistics, regression, linear. Examine R² (easier to use sig of F statistic). If R² is large (sig small), then that dependent variable is not a random sample.
8
Analysis of variance table Three rows: regression, residual, and total. Five columns: degrees of freedom, sum of squares, mean square, F, and sig.
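The rows and columns of this table can be reproduced by hand. The following is a minimal sketch, assuming NumPy is available; the function name `anova_table` and the simulated data (500 points, six independent variables, only the first one related to y) are illustrative and are not the project data set. The sig column is left out because it needs the F distribution's upper tail, which a library call such as `scipy.stats.f.sf(F, df_regression, df_residual)` could supply.

```python
import numpy as np

def anova_table(X, y):
    """ANOVA decomposition for OLS of y on an intercept plus the columns of X."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS estimates
    fitted = A @ coef
    ss_total = float(np.sum((y - y.mean()) ** 2))
    ss_resid = float(np.sum((y - fitted) ** 2))
    ss_reg = ss_total - ss_resid
    df_reg, df_resid = p, n - p - 1
    ms_reg, ms_resid = ss_reg / df_reg, ss_resid / df_resid
    return {
        "regression": {"df": df_reg, "ss": ss_reg, "ms": ms_reg, "F": ms_reg / ms_resid},
        "residual": {"df": df_resid, "ss": ss_resid, "ms": ms_resid},
        "total": {"df": n - 1, "ss": ss_total},
        "r_squared": ss_reg / ss_total,
    }

# Illustrative data only: y depends on the first of six independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=500)
table = anova_table(X, y)
for row in ("regression", "residual", "total"):
    print(row, table[row])
print("R^2 =", table["r_squared"])
```

A large F (equivalently a small sig) is the signal Strategy 1 relies on to decide that a dependent variable is not a random sample.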
9
Table of regression coefficients Contains the OLS estimates. The line (constant) refers to β_0, the intercept. There is a line for each variable in the model that refers to β_q, the partial regression coefficient (slope) of the q-th independent variable.
10
Table of regression coefficients Five columns of numbers. Two are labeled “unstandardized coefficients”. –B column contains the OLS estimates. –Std. Error contains the estimated standard deviation of each estimate.
11
Table of regression coefficients One is the standardized coefficient. –Scale-free coefficient often used in social science studies for comparison across studies. There is a column for t. –As usual, t = (B − 0)/SE(B), the estimate divided by its standard error. There is a column for sig. –Interpret as a p-value.
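The columns of the coefficient table can be computed directly from the OLS formulas. This is a minimal sketch, assuming NumPy; `coefficient_table` and the simulated data are hypothetical names and values chosen for illustration. The standardized coefficient is taken as B times sd(x)/sd(y), and it is left undefined (NaN) for the constant. The sig column is omitted; it would be the two-sided tail of a t distribution with n − p − 1 degrees of freedom (for example via `scipy.stats.t.sf`).

```python
import numpy as np

def coefficient_table(X, y):
    """Return B, Std. Error, standardized coefficient, and t for each term."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # first column is the constant
    xtx_inv = np.linalg.inv(A.T @ A)
    b = xtx_inv @ A.T @ y                           # OLS estimates (B column)
    resid = y - A @ b
    mse = float(resid @ resid) / (n - p - 1)        # estimate of the error variance
    se = np.sqrt(mse * np.diag(xtx_inv))            # Std. Error column
    beta_std = np.full(p + 1, np.nan)               # not defined for the constant
    beta_std[1:] = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    t = b / se                                      # t = (B - 0) / SE(B)
    return b, se, beta_std, t

# Illustrative data only: the second independent variable has no effect on y.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=500)
b, se, beta_std, t = coefficient_table(X, y)
for name, row in zip(["(constant)", "x1", "x2"], zip(b, se, beta_std, t)):
    print(name, row)
```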
12
Interpretation There appears to be an association between an independent variable and the dependent variable if the observed significance level is small for that coefficient. Specify which dependent variable shows associations and which independent variables are significant.
13
Refinement of Model Rerun regression using only those variables that appear to be significant. Usually, the database of a study has many variables that have no association with the dependent variable. Most clients prefer that these variables not be used. –There are some technical problems with this approach that are widely ignored.
14
Partial correlation coefficient Correlation between Y and X_2, “controlling for” X_1 (holding the variable “constant”) given by the equation:
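For reference, the standard first-order partial correlation formula is given below. The subscript notation is an assumption: r_{Y2}, r_{Y1}, and r_{12} stand for the ordinary correlations between Y and X_2, Y and X_1, and X_1 and X_2.

```latex
r_{Y2 \cdot 1} =
  \frac{r_{Y2} - r_{Y1}\, r_{12}}
       {\sqrt{\left(1 - r_{Y1}^{2}\right)\left(1 - r_{12}^{2}\right)}}
```

The same quantity is the ordinary correlation between the residuals of Y regressed on X_1 and the residuals of X_2 regressed on X_1.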
15
Strategy 2: Stepwise Regression Let the computer do the work. In regression box, specify stepwise. The computer will see whether additional variables can be added or added variables deleted. There are three basic strategies: forward selection, backward selection, and stepwise.
16
Stepwise regression strategy Find independent variable with largest correlation with Y. Check whether that is significant. If no, stop. If yes, check second variable.
17
Stepwise regression strategy Find independent variable with highest partial correlation, controlling for first. If not significant, stop. If significant, check for a third variable. Find independent variable with highest partial correlation, controlling for first two.
18
Stepwise regression strategy Check whether its addition is significant. If no, stop. If yes, see whether the first or second step variable still contributes significantly. Continue iterating until there are no variables that can be added or deleted.
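The procedure on the last three slides can be sketched in a few lines. This is a minimal illustration, assuming NumPy, and not the algorithm of any particular statistics package. Two choices are assumptions: candidates are ranked by their partial F statistic (which orders them the same way as the squared partial correlation), and fixed thresholds `f_in` and `f_out` stand in for the significance checks. The function names and the simulated data are hypothetical.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def stepwise(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Stepwise selection with assumed F-to-enter and F-to-remove thresholds."""
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F, if above f_in.
        current = rss(X, y, selected)
        best_j, best_f = None, f_in
        for j in range(p):
            if j in selected:
                continue
            new = rss(X, y, selected + [j])
            f = (current - new) / (new / (n - len(selected) - 2))
            if f > best_f:
                best_j, best_f = j, f
        if best_j is not None:
            selected.append(best_j)
            changed = True
        # Backward step: drop the variable with the smallest partial F, if below f_out.
        if len(selected) > 1:
            full = rss(X, y, selected)
            df_resid = n - len(selected) - 1
            worst_j, worst_f = None, f_out
            for j in selected:
                reduced = rss(X, y, [k for k in selected if k != j])
                f = (reduced - full) / (full / df_resid)
                if f < worst_f:
                    worst_j, worst_f = j, f
            if worst_j is not None:
                selected.remove(worst_j)
                changed = True
        if not changed:
            break
    return selected

# Illustrative data only: y depends on columns 0 and 2 of six independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(size=500)
print(stepwise(X, y))
```

Keeping `f_out` below `f_in` prevents a variable from being added and removed in an endless cycle.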
19
Using Stepwise Regression Examine final model selected. Note which variables are included. Examine information for excluded variables. –Check whether there is any possibility that one of the variables left out might matter.
20
Checking the Model Residual plots. Diagnostics. Lack of Fit test. More next class and after the exam.
21
Univariate Linear Regression Problem Model: Y = β_0 + β_1 X + Z, where Z is the error term. Test: H_0: β_1 = 0. Alternative: H_1: β_1 > 0. The distribution of Y is normal under both null and alternative. Under the null, var(Y) = σ_0². Under the alternative, β_1 > 0 and var(Y) = σ_1².
22
Step 1: Choose the test statistic and specify its null distribution Use conditions of the null to find:
23
Bringing sample size into regression design The sample size n is hidden in the regression results. That is, let:
24
Step 2: Define the critical value For the univariate linear regression test:
25
Step 3: Define the Rejection Rule Each test is a right sided test, and so the rule is to reject when the test statistic is greater than the critical value.
26
Step 4: Specify the Distribution of Test Statistic under Alternative Use conditions of the alternative to find:
27
Step 5: Define a Type II Error For the univariate linear regression test:
28
Step 6: Find β For a univariate linear regression test:
29
Step 7: Phrase requirement on β That is, choose n so that (after algebraic clearing out):
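Steps 1 through 7 can be sketched as one chain under stated assumptions. None of the notation below is confirmed by the slide text: B_1 is taken to be the OLS slope estimate used as the test statistic, σ_X² is the variance of the x values (so that the sum of squared deviations of x equals nσ_X²), z_α and z_β are upper-tail standard normal quantiles for the significance level and the target Type II error probability, and β without a subscript is the Type II error probability.

```latex
\begin{align*}
\text{Null distribution: } & B_1 \sim N\!\left(0,\ \frac{\sigma_0^{2}}{n\,\sigma_X^{2}}\right) \\
\text{Critical value: } & c = z_\alpha \, \frac{\sigma_0}{\sigma_X \sqrt{n}} \\
\text{Alternative distribution: } & B_1 \sim N\!\left(\beta_1,\ \frac{\sigma_1^{2}}{n\,\sigma_X^{2}}\right) \\
\text{Type II error: } & \beta = P\!\left(B_1 \le c \mid H_1\right)
   = \Phi\!\left(\frac{c - \beta_1}{\sigma_1 / (\sigma_X \sqrt{n})}\right) \\
\text{Requirement on } n\text{: } & \sqrt{n} \;\ge\;
   \frac{z_\alpha\,\sigma_0/\sigma_X + z_\beta\,\sigma_1/\sigma_X}{\beta_1}
\end{align*}
```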
30
Univariate Linear Regression Note that the σ_0 factor is changed to σ_0/σ_X. There is a similar adjustment for the alternative standard deviation.
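A small calculator makes the role of the σ_0/σ_X adjustment concrete. This is a minimal sketch using only the Python standard library. It assumes the sample size requirement takes the usual one-sided normal form, √n ≥ (z_α·σ_0/σ_X + z_β·σ_1/σ_X)/β_1, where z_α and z_β are upper-tail standard normal quantiles; the function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def regression_sample_size(beta1, sigma0, sigma1, sigma_x, alpha=0.05, beta=0.10):
    """Smallest n meeting the assumed requirement for a right-sided slope test.

    beta1   : slope under the alternative (must be > 0)
    sigma0  : standard deviation under the null
    sigma1  : standard deviation under the alternative
    sigma_x : standard deviation of the independent variable
    alpha   : significance level; beta : target Type II error probability
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Both standard deviations are divided by sigma_x, as the slide notes.
    root_n = (z_alpha * sigma0 / sigma_x + z_beta * sigma1 / sigma_x) / beta1
    return ceil(root_n ** 2)

n_small_spread = regression_sample_size(beta1=0.5, sigma0=1.0, sigma1=1.0, sigma_x=1.0)
n_large_spread = regression_sample_size(beta1=0.5, sigma0=1.0, sigma1=1.0, sigma_x=2.0)
print(n_small_spread, n_large_spread)
```

Spreading the x values more widely (larger σ_X) lowers the required sample size, which is the practical consequence of the adjustment.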