Multiple Regression – Part II

1 Multiple Regression – Part II

2 Major Types of Multiple Regression
- Standard multiple regression
- Sequential (hierarchical) regression
- Statistical (stepwise) regression

[Figure: Venn diagram of the DV's variance overlapping with IVs X1, X2, X3; areas a to e are the parts of the DV's variance shared with the IVs]
R² = a + b + c + d + e
R² = the squared multiple correlation: the proportion of variation in the DV that is predictable from the best linear combination of the IVs (i.e. the coefficient of determination).
R = the correlation between the observed and predicted Y values (R = r_YŶ).

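3 Adjusted R²
Adjusted R² = a modification of R² that adjusts for the number of terms in a model. R² never decreases when a new term is added to a model, but adjusted R² increases only if the new term improves the model more than would be expected by chance: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n = no. of cases and k = no. of IVs.
[Standard Error of the Estimate = the standard error of the predicted score (Ŷ), i.e. the standard deviation of the residuals (Y − Ŷ): √(SS(residual) / (n − k − 1))]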
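Both quantities can be recomputed from the sums of squares in the ANOVA table of this lecture (n = 1527, since df(total) = 1526). A minimal standard-library sketch; the helper names are illustrative, not SPSS functions:

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n cases and k IVs: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def std_error_of_estimate(ss_residual, n, k):
    """Square root of the residual mean square SS(residual) / (n - k - 1)."""
    return math.sqrt(ss_residual / (n - k - 1))

ss_total = 991.147   # SS(total) from the ANOVA table
n = 1527             # df(total) + 1
for k, ss_reg, ss_res in [(2, 11.732, 979.415), (5, 39.460, 951.687)]:
    r2 = ss_reg / ss_total
    print(k, round(r2, 3), round(adjusted_r2(r2, n, k), 3),
          round(std_error_of_estimate(ss_res, n, k), 5))
```

The printed values reproduce the Model Summary columns (R Square, Adjusted R Square, Std. Error of the Estimate) up to SPSS rounding.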

4 Standard (Simultaneous) Multiple Regression
All IVs enter the regression equation at once; each one is assessed as if it had entered the regression after all the other IVs. Each IV is assigned only the area of its unique contribution; the overlapping areas (b and d) contribute to R² but are not assigned to any individual IV.
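A small simulation makes the "unique area" idea concrete. The sketch below assumes NumPy and uses made-up, positively correlated IVs; `r_squared` is an illustrative helper. Each IV's unique contribution is the drop in R² when it is deleted, and what is left over is the overlapping (shared) variance that no single IV gets credit for:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xc, y, rcond=None)[0]
    resid = y - Xc @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                                   # common factor: correlated IVs (hypothetical data)
X = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
y = X @ np.array([0.4, 0.3, 0.2]) + rng.normal(size=n)

r2_full = r_squared(X, y)
# Unique contribution of each IV = drop in R^2 when that IV is deleted
sr2 = [r2_full - r_squared(np.delete(X, i, axis=1), y) for i in range(3)]
shared = r2_full - sum(sr2)                              # overlapping areas, credited to no single IV
print(round(r2_full, 3), [round(v, 3) for v in sr2], round(shared, 3))
```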

5 Table 1: Regression of (DV) Assessment of Socialism in 2003 on (IVs) Social Status, controlling for Gender and Age
Linear regression; DV = scores from 1 to 5
Model I: Effect of Social Status without Controlling for Lagged Assessment of Socialism

Independent variables   B (unstandardized coefficient)   Standard Error   Beta (standardized coefficient)
Gender (Male=1)         -0.044                           0.069            -0.023
Age                      0.011**                         0.003             0.135
Social Status           -0.207**                         0.034            -0.217
Constant                 2.504                           0.131
Fit statistics: N = 742; F = 15.5 (df = 3); Adjusted R² = 0.06
**p < 0.001; *p < 0.05

Interpretation of beta (standardized) coefficients: for a one standard deviation increase in X, Y changes by Beta standard deviations (holding the other IVs constant). Since the variables are transformed into z-scores (i.e. standardized), we can compare their relative impact on the DV (assuming the IVs are uncorrelated with each other).
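The link between B and Beta can be checked directly: Beta = B × (SD of X / SD of Y), which is the same as the slope from regressing z-scored Y on z-scored IVs. A sketch assuming NumPy, with hypothetical data loosely shaped like Table 1 (it does not reproduce the table's estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# Two hypothetical IVs on different scales (e.g. an age-like and a status-like variable)
X = np.column_stack([rng.normal(50, 15, n), rng.normal(3, 1, n)])
y = 2.5 + 0.01 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 0.9, n)

def ols(X, y):
    """OLS coefficients with an intercept in position 0."""
    Xc = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]

b = ols(X, y)[1:]                                          # unstandardized B
beta_from_b = b * X.std(axis=0, ddof=1) / y.std(ddof=1)    # B * SD(X) / SD(Y)

zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)          # z-scores
zy = (y - y.mean()) / y.std(ddof=1)
beta_direct = ols(zX, zy)[1:]                              # slopes on z-scored variables

print(np.round(beta_from_b, 4), np.round(beta_direct, 4))  # the two agree
```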

6 Sequential (hierarchical) Multiple Regression
- The researcher specifies the order in which IVs are added to the equation; each IV (or block of IVs) is assessed in terms of what it adds to the equation at its own point of entry. If X1 is entered first, then X2, then X3: X1 gets credit for a and b; X2 for c and d; X3 for e. IVs can be added one at a time or in blocks.
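The credit a block receives at its point of entry is the R² change, tested with F change = (ΔR² / k_added) / ((1 − R²_full) / df_residual). A stdlib sketch using the sums of squares from the two-step SPSS example in this lecture (three predictors enter at step 2, df1 = 3, df2 = 1521); `f_change` is an illustrative helper:

```python
def f_change(ss_reg_reduced, ss_reg_full, ss_res_full, k_added, df_res_full):
    """F test for the increment in R^2 when k_added IVs enter at a later step."""
    return ((ss_reg_full - ss_reg_reduced) / k_added) / (ss_res_full / df_res_full)

ss_total = 991.147
r2_step1 = 11.732 / ss_total          # model 1: gender, age
r2_step2 = 39.460 / ss_total          # model 2: + the block entered at step 2
delta_r2 = r2_step2 - r2_step1        # credit given to the step-2 block
F = f_change(11.732, 39.460, 951.687, k_added=3, df_res_full=1521)
print(round(delta_r2, 3), round(F, 3))  # 0.028 14.772
```

Both values match the "R Square Change" and "F Change" columns for model 2.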

7 Std. Error of the Estimate
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2   Sig. F Change
1      .109a  .012      .011               .80166                      .012             9.128     2    1524  .000
2      .200b  .040      .037               .79101                      .028             14.772    3    1521  .000
a. Predictors: (Constant), age1998, gender 1998, female=0
b. Predictors: (Constant), age1998, gender 1998, female=0, tertiary 1998 = 1, else =0, emlement 1998 = 1, else =0

The regression SUM of SQUARES: SS(total) = SS(regression) + SS(residual)
SS(regression) = Σ(Ŷ − Ȳ)² = the portion of variation in Y explained by using the IVs as predictors
SS(total) = Σ(Y − Ȳ)²
SS(residual) = Σ(Y − Ŷ)² = the sum of squared errors of prediction
R² = SS(regression) / SS(total)
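The decomposition, and the fact that R² = SS(regression)/SS(total) equals the squared correlation between Y and Ŷ, can be verified on simulated data. A sketch assuming NumPy; note that the identity relies on the model including an intercept:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 2))                          # hypothetical IVs
y = 3 + X @ np.array([1.0, -0.5]) + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), X])                # intercept column: the identity needs it
b = np.linalg.lstsq(Xc, y, rcond=None)[0]
y_hat = Xc @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)
r2 = ss_regression / ss_total

print(np.isclose(ss_total, ss_regression + ss_residual))   # True
print(np.isclose(r2, np.corrcoef(y, y_hat)[0, 1] ** 2))     # True: R = r(Y, Y-hat)
```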

8 ANOVA
The regression MEAN SQUARE: MSS(regression) = SS(regression) / df, df = k, where k = no. of IVs
The MEAN SQUARE residual (error): MSS(residual) = SS(residual) / df, df = n − (k + 1), where n = no. of cases and k = no. of IVs

Model                 Sum of Squares  df    Mean Square  F       Sig.
1  Regression         11.732          2     5.866        9.128   .000a
   Residual (error)   979.415         1524  .643
   Total              991.147         1526
2  Regression         39.460          5     7.892        12.613  .000b
   Residual (error)   951.687         1521  .626
   Total              991.147         1526
c. Dependent Variable: eval soc 1998 categories
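The mean squares and F in the table follow mechanically from the sums of squares and the df rules above. A stdlib sketch (`anova_f` is an illustrative helper) that reproduces both models:

```python
def anova_f(ss_reg, ss_res, n, k):
    """Mean squares and F for a regression with k IVs and n cases."""
    ms_reg = ss_reg / k                 # df = k
    ms_res = ss_res / (n - (k + 1))     # df = n - (k + 1)
    return ms_reg, ms_res, ms_reg / ms_res

for k, ss_reg, ss_res in [(2, 11.732, 979.415), (5, 39.460, 951.687)]:
    ms_reg, ms_res, F = anova_f(ss_reg, ss_res, n=1527, k=k)
    print(k, round(ms_reg, 3), round(ms_res, 3), round(F, 3))
```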

9 Hypothesis Testing with (Multiple) Regression: F-test
The null hypothesis for the regression model: $H_0: b_1 = b_2 = \dots = b_k = 0$
$F = \dfrac{MSS(\text{regression})}{MSS(\text{residual})}$
The sampling distribution of this statistic is the F-distribution with df = k and n − (k + 1).
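Because R² = SS(regression)/SS(total), the same F can be written in terms of R²: F = (R²/k) / ((1 − R²)/(n − k − 1)). A stdlib sketch (illustrative helper name) checking this against model 1 of the ANOVA table:

```python
def f_from_r2(r2, n, k):
    """Overall F for H0: b1 = ... = bk = 0, expressed through R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

r2 = 11.732 / 991.147                          # model 1: SS(regression) / SS(total)
print(round(f_from_r2(r2, n=1527, k=2), 3))    # 9.128, as in the ANOVA table
```

If SciPy is available, `scipy.stats.f.sf(F, k, n - k - 1)` gives the upper-tail p-value that SPSS reports as "Sig.".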

10 Unstandardized and Standardized Coefficients
Coefficients(a)
Model                                        B       Std. Error  Beta    t       Sig.
1  (Constant)                                1.661   .095                17.510  .000
   gender 1998, female=0                     -.010   .041        -.006   -.242   .809
   age1998                                   .008    .002        .109    4.270   .000
2  (Constant)                                1.762   .096                18.330  .000
   Gender (female=0)                         -.007               -.004   -.171   .864
   Age in 1998                               .006    .002        .090    3.330   .001
   Elementar educ 1998 = 1, else =0          .070    .054        .036    1.282   .200
   tertiary educ 1998 = 1, else =0           -.223   .052        -.115   -4.258  .000
   Estimated income for 1998 (in z-scores)   -.058   .020        -.077   -2.960  .003
B and Std. Error = unstandardized coefficients; Beta = standardized coefficients
a. Dependent Variable: eval soc 1998 categories

11 t-test for the effect of each independent variable
The null hypothesis for individual IVs: the test of $H_0: b_i = 0$ evaluates whether Y and $X_i$ are statistically dependent, controlling for the other IVs in the model.
We use the t statistic $t = \dfrac{b}{\sigma_B}$, where $\sigma_B$ is the standard error of b. In bivariate regression, $\sigma_B = s / \sqrt{\sum (X - \bar{X})^2}$ with $s = \sqrt{SS(\text{residual}) / (n - 2)}$. In multiple regression, $s = \sqrt{SS(\text{residual}) / (n - (k + 1))}$ and $\sigma_{B_i} = s / \sqrt{\sum (X_i - \bar{X}_i)^2 \, (1 - R_i^2)}$, where $R_i^2$ is the R² from regressing $X_i$ on the other IVs; t has n − (k + 1) df.
LIMITATION: the test is sensitive only to the unique variance an IV adds to R². A very important variable that shares variance with another IV in the analysis may be nonsignificant, although the two IVs in combination are responsible in large part for the size of R². That's why it is a good idea to report the correlation between each IV and the DV.
In sequential (and statistical) regression, t tests only b and beta, NOT the squared semi-partial correlation. The regression coefficients of the final model are independent of the order in which the IVs entered, while sri² depends directly on the order of the IV in the analysis. SPSS provides significance tests for sri² in "Sig. F Change" (Model Summary).

12 Assessing the importance of IVs
If the IVs are uncorrelated with each other: compare the standardized coefficients (betas); higher absolute values of beta reflect greater impact.
If the IVs are correlated with each other:
- compare the total relation of each IV with the DV, and of the IVs with each other, using bivariate correlations;
- compare the unique contribution of each IV to predicting the DV, generally assessed through partial or semi-partial correlations.
In partial correlation (pr), the contribution of the other IVs is taken out of both the IV and the DV; in semi-partial correlation (sr), it is taken out of the IV only. So the squared sr shows the unique contribution of the IV to the total variance of the DV.
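The "taken out of" language corresponds to residualizing: regress the IV (and, for pr, also the DV) on the other IVs and correlate the residuals. A sketch assuming NumPy, with two hypothetical correlated IVs; `residualize` is an illustrative helper. It also checks that sr² equals the gain in R² from adding the IV, and that |pr| ≥ |sr|:

```python
import numpy as np

def residualize(v, Z):
    """Residuals of v after an OLS regression on Z (intercept added)."""
    Zc = np.column_stack([np.ones(len(v)), Z])
    return v - Zc @ np.linalg.lstsq(Zc, v, rcond=None)[0]

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # correlated IVs (hypothetical data)
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

x1_res = residualize(x1, x2)                # x2 taken out of the IV
y_res = residualize(y, x2)                  # x2 taken out of the DV too

sr1 = np.corrcoef(y, x1_res)[0, 1]          # semi-partial correlation of x1
pr1 = np.corrcoef(y_res, x1_res)[0, 1]      # partial correlation of x1

ss_y = np.sum((y - y.mean()) ** 2)
r2_full = 1 - np.sum(residualize(y, np.column_stack([x1, x2])) ** 2) / ss_y
r2_x2_only = 1 - np.sum(y_res ** 2) / ss_y
print(round(sr1 ** 2, 4), round(r2_full - r2_x2_only, 4))  # sr^2 = gain in R^2
print(abs(pr1) >= abs(sr1))                                 # True
```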

13 Assessing the importance of IVs – continued
In standard multiple regression, sr² = the unique contribution of the IV to R² in that set of IVs (for an IV, sr² = the amount by which R² is reduced if that IV is deleted from the equation). If the IVs are correlated, usually the sum of sri² < R²; the difference R² − (sum of sri² for all IVs) = shared variance (i.e. variance contributed to R² by two or more IVs). In sequential regression, sri² = the amount of variance added to R² by each IV at the point where it is added to the model. In SPSS output, sri² is the "R Square Change" for each IV in the "Model Summary" table.
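In sequential regression the same IV can receive very different credit depending on when it enters, even though the final equation (and its coefficients) is the same. A sketch assuming NumPy and two hypothetical correlated IVs:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (1-D or 2-D) plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # correlated IVs (hypothetical data)
y = 0.4 * x1 + 0.4 * x2 + rng.normal(size=n)

r2_both = r_squared(np.column_stack([x1, x2]), y)
x1_first = r_squared(x1, y)                   # R^2 change for x1 entered at step 1 (areas a + b)
x1_second = r2_both - r_squared(x2, y)        # R^2 change for x1 entered after x2 (area a only)
print(round(x1_first, 3), round(x1_second, 3))
```

With positively correlated IVs, the IV entered first also collects the shared area, so `x1_first` exceeds `x1_second`.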

14 Suppressor Variables
Suppressor variable = an IV that helps predict the DV and increases R² through its correlation with other IVs: it suppresses variance (in those IVs) that is irrelevant to the prediction of the DV.
Types: traditional/classical suppression; cooperative/reciprocal suppression; negative/net suppression.
In the output, compare the simple correlation between each IV and the DV with the standardized regression coefficient (beta weight) for that IV. If beta is significant, check whether:
- the absolute value of the simple correlation between the IV and the DV is much smaller than beta;
- the simple correlation coefficient and beta have opposite signs.
With more than two or three IVs, suppressors are difficult to identify.
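With two IVs, the betas and R² follow directly from the three correlations, which makes classical suppression easy to see. A stdlib sketch with hypothetical correlations (not taken from any dataset in this lecture): X2 is uncorrelated with Y but correlated with X1.

```python
def two_iv_betas(r1y, r2y, r12):
    """Standardized coefficients and R^2 for two IVs, from the three correlations."""
    beta1 = (r1y - r2y * r12) / (1 - r12 ** 2)
    beta2 = (r2y - r1y * r12) / (1 - r12 ** 2)
    r2 = beta1 * r1y + beta2 * r2y
    return beta1, beta2, r2

# Hypothetical classical suppression: x2 is unrelated to Y but correlated with x1
beta1, beta2, r2 = two_iv_betas(r1y=0.4, r2y=0.0, r12=0.6)
print(round(beta1, 3), round(beta2, 3), round(r2, 3))  # 0.625 -0.375 0.25
```

Both diagnostic signs appear: beta1 (0.625) is well above the simple correlation 0.4, the suppressor gets a non-zero (negative) beta despite r = 0 with Y, and R² rises from 0.4² = 0.16 to 0.25.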

15 Interaction Terms; centering
If it is reasonable to assume that the importance of IV1 varies over the range of IV2, add an interaction term (compute IV1_2 = IV1 * IV2).
Centering: convert the IVs that form the interaction to deviation scores (Xi − mean of X), so each variable has mean = 0.
Why do it? It reduces possible problems with multicollinearity between the IVs and their interaction (or power) terms.
- Centering does not affect an IV's correlations with other variables.
- The unstandardized coefficient (b) for the highest-order term (the interaction, or the highest power) is the same as when uncentered, but the bs for the simple terms (b1, b2) change: each now describes the effect of that IV at the mean of the other.
- The betas (standardized coefficients) are different for all effects.
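What centering changes and what it leaves alone can be checked by fitting the same interaction model with raw and centered IVs. A sketch assuming NumPy and hypothetical IVs with non-zero means; the centered model is a reparametrization of the raw one, so b for the interaction is identical, while b1 shifts by b3 × mean(X2):

```python
import numpy as np

def ols(cols, y):
    """OLS coefficients [b0, b1, ...] for a list of predictor columns plus intercept."""
    Xc = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(Xc, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(10, 2, n)                       # hypothetical IVs with non-zero means
x2 = rng.normal(5, 1, n)
y = 1 + 0.3 * x1 + 0.5 * x2 + 0.2 * x1 * x2 + rng.normal(size=n)

b_raw = ols([x1, x2, x1 * x2], y)               # b0, b1, b2, b3 (uncentered)
c1, c2 = x1 - x1.mean(), x2 - x2.mean()         # deviation scores, mean 0
b_cen = ols([c1, c2, c1 * c2], y)

corr_raw = np.corrcoef(x1, x1 * x2)[0, 1]       # IV vs. its interaction term, uncentered
corr_cen = np.corrcoef(c1, c1 * c2)[0, 1]       # same, after centering
print(np.isclose(b_raw[3], b_cen[3]))                          # interaction b unchanged
print(np.isclose(b_cen[1], b_raw[1] + b_raw[3] * x2.mean()))   # b1 shifts
print(round(corr_raw, 2), round(corr_cen, 2))                  # collinearity drops
```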

