Elements of Multiple Regression Analysis: Two Independent Variables Yong Sept. 2010
Why Multiple Regression? In the real world, using only one predictor (IV) to interpret or predict an outcome variable (DV) is rare; mostly we need several IVs. Multiple regression (Pearson, 1908) investigates the relationship between several independent (predictor) variables and a dependent (criterion) variable.
The prediction equation in multiple regression: Y' = a + b1X1 + b2X2 + … + bkXk, where Y' = predicted Y score, a = intercept, b1 … bk = regression coefficients, and X1 … Xk = scores on the IVs. With two IVs: Y' = a + b1X1 + b2X2.
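As a minimal sketch of the two-IV prediction equation (the intercept, slopes, and scores below are made-up illustrative values, not from the slides):

```python
def predict(a, b1, b2, x1, x2):
    """Predicted Y score from the two-IV equation Y' = a + b1*X1 + b2*X2."""
    return a + b1 * x1 + b2 * x2

# Illustrative values: a = 2.0, b1 = 0.5, b2 = 1.5, scores X1 = 4, X2 = 2.
y_hat = predict(a=2.0, b1=0.5, b2=1.5, x1=4.0, x2=2.0)
print(y_hat)  # 2.0 + 0.5*4 + 1.5*2 = 7.0
```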
Calculation of basic statistics Calculation with two IVs is similar to the one-IV case: not hard, but tedious. To perform the calculations with three or more IVs we need knowledge of matrix operations. The good news is that we can have the computer do the calculations!
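For example, the computer can solve for the intercept and b's all at once from the data matrix. A sketch assuming numpy is available; the data are made up so that Y = 1 + 2·X1 + 0.5·X2 exactly, and the solver recovers those coefficients:

```python
import numpy as np

# Made-up data constructed so Y = 1 + 2*X1 + 0.5*X2 exactly.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([4.0, 5.5, 9.0, 10.5, 13.5])

# A leading column of 1s gives the intercept its own "variable".
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solve for [a, b1, b2] (numerically stable alternative
# to forming (X'X)^-1 X'y by hand).
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef
```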
Why these calculations, as always? To obtain the intercept (a) and the regression coefficients (b's)!
Brain exercise Now we have the regression line! What's next? The predicted Y, or Y'! Then what? The deviation due to regression (Y' − Ȳ) and the regression sum of squares (SSreg = Σ(Y' − Ȳ)²). The deviation due to residuals (Y − Y') and the residual sum of squares (SSres = Σ(Y − Y')²).
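The partition above can be sketched numerically (made-up data, numpy assumed). For a least-squares fit with an intercept, the total sum of squares splits exactly into the regression and residual pieces:

```python
import numpy as np

# Made-up data that do NOT fit perfectly, so the residual SS is nonzero.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.0, 5.0, 9.0, 10.0, 14.0])

X = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ coef                          # the predicted scores Y'

ss_tot = np.sum((Y - Y.mean()) ** 2)      # total SS
ss_reg = np.sum((Y_hat - Y.mean()) ** 2)  # deviation due to regression
ss_res = np.sum((Y - Y_hat) ** 2)         # deviation due to residuals
# For an OLS fit with an intercept: ss_tot == ss_reg + ss_res.
```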
Sum of squares Recall that we have plenty of ways to calculate the sums of squares, and some methods do not require Y'. Remember, though, that we need Y' to calculate the residuals, which are essential for regression diagnostics (chapter 3).
Squared multiple correlation coefficient R² indicates the proportion of variance of the DV (Y) accounted for by the IVs (X's). Note that for two IVs, R²y.12 = (r²y1 + r²y2 − 2·ry1·ry2·r12) / (1 − r²12).
Test of significance of R² F test: whether R² is significantly different from 0. Rule of thumb: we reject H0 when the calculated F is greater than the table (critical) value, or when the calculated probability (p) is less than the significance level α. [Figure: F distribution showing the critical F separating the "fail to reject H0" region from the "reject H0" region, with the probability p.]
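The standard overall F statistic can be sketched directly from R² (the R², N, and k below are made-up illustrative values, not from the slides):

```python
# Overall F test for R^2 (standard formula):
#   F = (R^2 / k) / ((1 - R^2) / (N - k - 1))
# k = number of IVs, N = number of cases.  Values are made up.
R2, N, k = 0.60, 30, 2

df1 = k
df2 = N - k - 1
F = (R2 / df1) / ((1 - R2) / df2)
# With these numbers F is 20.25 (up to floating-point rounding).
# Reject H0: R^2 = 0 if F exceeds the critical F with (df1, df2)
# degrees of freedom, or equivalently if p < alpha.
```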
Test of significance of individual b's t test (mostly two-tailed, except when we can rule out one direction): whether b is significantly different from 0. Rule of thumb: we reject H0 when the absolute value of the calculated t is greater than the table (critical) value, or when the calculated probability is less than α. [Figure: t distribution showing the "fail to reject H0" and "reject H0" regions.]
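A sketch of where the t statistics come from, assuming numpy and made-up data: t = b / SE(b), with the standard errors taken from the diagonal of the estimated covariance matrix MSE·(X'X)⁻¹:

```python
import numpy as np

# Made-up data for illustration.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.0, 4.0, 9.0, 9.5, 15.0, 13.0])

X = np.column_stack([np.ones_like(X1), X1, X2])
n, p = X.shape                         # p counts the intercept too
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

resid = Y - X @ coef
mse   = resid @ resid / (n - p)        # residual variance estimate
cov   = mse * np.linalg.inv(X.T @ X)   # covariance matrix of the estimates
se    = np.sqrt(np.diag(cov))          # standard errors of a, b1, b2
t     = coef / se                      # compare |t| to t(alpha/2, n - p)
```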
Test of R² vs. test of b The test of R² is equivalent to testing all the b's simultaneously. The test of a given b determines whether it differs from 0 while controlling for the effects of the other IVs. For simple linear regression, the two tests are equivalent (F = t²).
Confidence interval Definition: if the experiment were repeated many times, 100(1 − α)% of the resulting intervals would contain the true parameter. If the CI does not include 0, we reject H0 and conclude that the given regression coefficient differs significantly from 0.
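A small sketch of the CI for a coefficient, b ± t_crit·SE(b). The b, SE, and critical t below are made-up/looked-up illustrative values (t ≈ 2.052 is the two-tailed 5% critical value for 27 df, from a t table):

```python
# Made-up coefficient and standard error; critical t for df = 27,
# two-tailed alpha = .05 (approximately 2.052 from a t table).
b, se, t_crit = 1.20, 0.40, 2.052

lower = b - t_crit * se   # 1.20 - 0.8208 = 0.3792
upper = b + t_crit * se   # 1.20 + 0.8208 = 2.0208
# The interval (0.3792, 2.0208) excludes 0, so this b differs
# significantly from 0 at alpha = .05.
```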
Test of increments in proportion of variance accounted for (R² change) In multiple linear regression, we can test how much R² increases or decreases when a given IV or a set of IVs is added to or deleted from the regression equation.
Test of increments in proportion of variance accounted for (R² change) The test is equivalent to testing the significance of an individual b when a single IV is added to or deleted from the regression equation. Note that the R² change attributable to a given IV or set of IVs depends on the order of addition or deletion.
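The increment test can be sketched with the standard R²-change F formula (all numbers below are made up for illustration):

```python
# F test for the R^2 change when going from a reduced model with
# k_red IVs to a full model with k_full IVs (standard formula):
#   F = ((R2_full - R2_red) / (k_full - k_red))
#       / ((1 - R2_full) / (N - k_full - 1))
# All values are made up.
R2_full, R2_red = 0.60, 0.45
k_full, k_red, N = 2, 1, 30

df1 = k_full - k_red
df2 = N - k_full - 1
F_change = ((R2_full - R2_red) / df1) / ((1 - R2_full) / df2)
# With these numbers F_change is 10.125 (up to floating-point rounding);
# compare it to the critical F with (df1, df2) degrees of freedom.
```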
Commonly used methods of adding or deleting variables Enter: enter all IVs at once in a single model. Stepwise: enter IVs one by one across several models, commonly based on R². Forward: enter IVs one by one based on the strength of their correlation with the DV. Backward: enter all IVs, then delete the weakest one unless deleting it significantly worsens the model. Hierarchical: enter IVs (one or more at a time) according to a theoretical framework.
Standardized regression coefficient (β, beta) In SPSS (now PASW) output, the coefficients table includes a column labeled Beta next to the unstandardized B column. Is it a population parameter?
Standardized regression coefficient (β, beta) The sample unstandardized regression coefficient (b) is the expected change in Y associated with a one-unit change in X. The sample standardized regression coefficient (β) is the expected change in Y, in standard deviation units, associated with a one-standard-deviation change in X.
Standardized regression coefficient (β, beta) The regression equation now is: z_Y' = β1·z1 + β2·z2 (in general, z_Y' = β1·z1 + … + βk·zk). Note that the intercept (a) disappears because the standardized score of a constant is always 0. β can be used to assess the relative contribution of each individual IV to the variance accounted for in the DV.
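Converting between b and β is a one-line computation via the standard identity β = b·(SD of X)/(SD of Y); the b and standard deviations below are made-up illustrative values:

```python
# Standardizing a regression coefficient: beta = b * sd_x / sd_y.
# All values are made up for illustration.
b_1  = 0.50   # unstandardized coefficient for X1
sd_x = 4.0    # standard deviation of X1
sd_y = 2.0    # standard deviation of Y

beta_1 = b_1 * sd_x / sd_y   # expected SD change in Y per SD change in X1
print(beta_1)                # 1.0
```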
What about the correlation coefficients (r's)? Later, we will discuss the correlation coefficients in detail, mostly in chapter 7 (Statistical Control: Partial and Semipartial Correlation).
Remarks Multiple regression is an upgraded version of simple linear regression, and its interpretation is similar. We need to emphasize the contribution of each individual IV. To some extent, multiple IVs explain and predict the DV better than a single IV, but this is not always true.