Slide 1: Multiple Regression – Part II
Advanced Statistical Methods: Continuous Variables
Contact: tomescu.1@sociology.osu.edu
Slide 2: Major Types of Multiple Regression
Three major types:
- Standard multiple regression
- Sequential (hierarchical) regression
- Statistical (stepwise) regression

R² = a + b + c + d + e, where the letters refer to areas of a Venn diagram of overlap between the IVs and the DV.
R² = the squared multiple correlation; it is the proportion of variation in the DV that is predictable from the best linear combination of the IVs (i.e., the coefficient of determination).
R = the correlation between the observed and predicted Y values (R = r_YŶ).
Slide 3: Adjusted R²
Adjusted R² is a modification of R² that adjusts for the number of terms in a model. R² always increases when a new term is added to a model, but adjusted R² increases only if the new term improves the model more than would be expected by chance. (The Standard Error of the Estimate is the standard error of the predicted score Ŷ.)
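For reference, the standard formula, with n cases and k IVs, is: Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). As a check against Model 2 in the Model Summary on slide 7 below (R² = .040, k = 5, and n = 1527, since the residual df is 1521 = n − k − 1): 1 − (.960)(1526/1521) ≈ .037, which matches the reported Adjusted R².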
Slide 4: Standard (Simultaneous) Multiple Regression
All IVs enter the regression equation at once; each one is assessed as if it had entered the regression after all other IVs had entered. Each IV is assigned only the area of its unique contribution; the overlapping areas (b and d in the Venn diagram) contribute to R² but are not assigned to any of the individual IVs.
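As an illustration, a minimal sketch of standard multiple regression in Python with statsmodels, using simulated data loosely patterned on the Table 1 variables (all variable names and values here are hypothetical, not the actual study data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data, hypothetical and for illustration only
rng = np.random.default_rng(0)
n = 742
df = pd.DataFrame({
    "gender": rng.integers(0, 2, n),      # male = 1
    "age": rng.uniform(18, 80, n),
    "status": rng.normal(0, 1, n),        # social status
})
df["eval_soc"] = 2.5 + 0.01 * df["age"] - 0.2 * df["status"] + rng.normal(0, 0.9, n)

# All IVs enter at once; each coefficient reflects the IV's unique
# contribution, assessed as if it entered after all the other IVs
model = smf.ols("eval_soc ~ gender + age + status", data=df).fit()
print(model.summary())                    # B, SE, t, p for each IV
print(model.rsquared, model.rsquared_adj)
```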
Slide 5: Table 1. Regression of (DV) Assessment of Socialism in 2003 on (IVs) Social Status, Controlling for Gender and Age

Model I: Effect of Social Status without Controlling for Lagged Assessment of Socialism (DV = scores from 1 to 5)

Independent variable   B (unstandardized)   Std. Error   Beta (standardized)
Gender (Male = 1)      -0.044               0.069        -0.023
Age                     0.011**             0.003         0.135
Social Status          -0.207**             0.034        -0.217
Constant                2.504               0.131

N = 742; fit statistics: F = 15.5 (df = 3), Adjusted R² = 0.06
**p < 0.001; *p < 0.05

Interpretation of beta (standardized) coefficients: for a one-standard-deviation increase in X, we get a change of Beta standard deviations in Y. Since the variables are transformed into z-scores (i.e., standardized), we can assess their relative impact on the DV (assuming the IVs are uncorrelated with each other).
Slide 6: Sequential (Hierarchical) Multiple Regression
The researcher specifies the order in which IVs are added to the equation; each IV (or block of IVs) is assessed in terms of what it adds to the equation at its own point of entry. If X1 is entered first, then X2, then X3: X1 gets credit for areas a and b; X2 for c and d; X3 for e. IVs can be added one at a time or in blocks.
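A sketch of a two-block sequential entry, continuing the simulated df from the earlier example: block 1 enters the demographics, block 2 adds social status, and the R² change at the second step gets an F test:

```python
from scipy import stats
import statsmodels.formula.api as smf

m1 = smf.ols("eval_soc ~ gender + age", data=df).fit()           # block 1
m2 = smf.ols("eval_soc ~ gender + age + status", data=df).fit()  # blocks 1 + 2

r2_change = m2.rsquared - m1.rsquared            # credit assigned at point of entry
f_change = (r2_change / 1) / ((1 - m2.rsquared) / m2.df_resid)   # 1 IV added
print(r2_change, f_change, stats.f.sf(f_change, 1, m2.df_resid))
```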
Slide 7: Model Summary; Std. Error of the Estimate

Model Summary
Model  R      R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2   Sig. F Change
1      .109a  .012      .011           .80166                      .012             9.128     2    1524  .000
2      .200b  .040      .037           .79101                      .028             14.772    3    1521

a. Predictors: (Constant), age1998, gender 1998 (female=0)
b. Predictors: (Constant), age1998, gender 1998 (female=0), tertiary 1998 (=1, else=0), elementary 1998 (=1, else=0)

The sums of squares decompose as SS(total) = SS(regression) + SS(residual), where:
SS(regression) = Σ(Ŷ − Ȳ)² = the portion of variation in Y explained by using the IVs as predictors
SS(total) = Σ(Y − Ȳ)²
SS(residual) = Σ(Y − Ŷ)² = the sum of squared errors of prediction
R² = SS(regression) / SS(total)
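Continuing the simulated example, a quick numerical check of the sums-of-squares identity and of R² = SS(regression)/SS(total):

```python
import numpy as np

y = df["eval_soc"].to_numpy()
yhat = m2.fittedvalues.to_numpy()

ss_total = np.sum((y - y.mean()) ** 2)     # Σ(Y − Ȳ)²
ss_resid = np.sum((y - yhat) ** 2)         # Σ(Y − Ŷ)²
ss_reg   = np.sum((yhat - y.mean()) ** 2)  # Σ(Ŷ − Ȳ)²

print(np.isclose(ss_total, ss_reg + ss_resid))  # True: the identity holds
print(ss_reg / ss_total, m2.rsquared)           # same value
```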
Slide 8: ANOVA

The regression mean square: MS(regression) = SS(regression) / df, with df = k, where k = the number of IVs.
The residual (error) mean square: MS(residual) = SS(residual) / df, with df = n − (k + 1), where n = the number of cases.

Model  Source            Sum of Squares  df    Mean Square  F       Sig.
1      Regression        11.732          2     5.866        9.128   .000a
       Residual (error)  979.415         1524  .643
       Total             991.147         1526
2      Regression        39.460          5     7.892        12.613  .000b
       Residual (error)  951.687         1521  .626
       Total             991.147         1526

c. Dependent Variable: eval soc 1998 categories
Slide 9: Hypothesis Testing with (Multiple) Regression: the F-test

The null hypothesis for the regression model: H0: b1 = b2 = … = bk = 0

F = MS(model) / MS(residual)

The sampling distribution of this statistic is the F-distribution.
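As a worked check against the ANOVA table above (Model 1), the F statistic can be reproduced from the printed sums of squares:

```python
from scipy import stats

ms_reg = 11.732 / 2          # MS(regression) = SS/df = 5.866
ms_res = 979.415 / 1524      # MS(residual) ≈ 0.643
F = ms_reg / ms_res          # ≈ 9.128, as reported
p = stats.f.sf(F, 2, 1524)   # upper-tail F probability; ≈ .000
print(F, p)
```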
Slide 10: Unstandardized and Standardized Coefficients

Coefficients (Dependent Variable: eval soc 1998 categories)

Model 1
Variable                                 B       Std. Error  Beta    t        Sig.
(Constant)                               1.661   .095                17.510   .000
gender 1998 (female=0)                   -.010   .041        -.006   -.242    .809
age1998                                  .008    .002        .109    4.270

Model 2
(Constant)                               1.762   .096                18.330
Gender (female=0)                        -.007               -.004   -.171    .864
Age in 1998                              .006                .090    3.330    .001
Elementary educ 1998 (=1, else=0)        .070    .054        .036    1.282    .200
Tertiary educ 1998 (=1, else=0)          -.223   .052        -.115   -4.258
Estimated income for 1998 (in z-scores)  -.058   .020        -.077   -2.960   .003
Slide 11: t-test for the Effect of Each Independent Variable
The null hypothesis for an individual IV is H0: bi = 0; the test evaluates whether Y and Xi are linearly related, controlling for the other variables in the model. We use the t statistic t = b / σ_b, where σ_b is the standard error of b; in the bivariate case, σ_b = √[ (SS(residual) / (n − 2)) / Σ(X − X̄)² ].

LIMITATION: the test is sensitive only to the unique variance an IV adds to R². A very important variable that shares variance with another IV in the analysis may be nonsignificant, even though the two IVs in combination are responsible in large part for the size of R². That is why it is a good idea to also report the correlation between each IV and the DV.

In sequential (and statistical) regression, the t-test applies only to b and beta, NOT to the squared semipartial correlation (sri²). Regression coefficients are independent of the order of entry of the IVs into the model, while sri² depends directly on that order. SPSS provides significance tests for sri² under "Sig. F Change" in the Model Summary table.
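A worked check using the tertiary-education row of Model 2 in the Coefficients table above: t is just B divided by its standard error, and the p-value comes from the t distribution with the residual df:

```python
from scipy import stats

B, se = -0.223, 0.052
t = B / se                    # ≈ -4.29 (table reports -4.258; B and SE are rounded)
p = 2 * stats.t.sf(abs(t), 1521)   # two-sided test, df = n - (k + 1) = 1521
print(t, p)
```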
Slide 12: Assessing the Importance of IVs

If the IVs are uncorrelated with each other: compare the standardized coefficients (betas); higher absolute values of beta reflect greater impact.

If the IVs are correlated with each other: compare the total relation of each IV with the DV, and of the IVs with each other, using bivariate correlations; and compare the unique contribution of each IV to predicting the DV, generally assessed through partial or semipartial correlations.

In a partial correlation (pr), the contribution of the other IVs is taken out of both the IV and the DV. In a semipartial correlation (sr), the contribution of the other IVs is taken out of the IV only; the squared semipartial correlation (sr²) shows the unique contribution of the IV to the total variance of the DV.
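A sketch of both correlations via residualizing, for the status IV in the simulated df from the earlier example: the other IVs are regressed out of X only (semipartial) or out of both X and Y (partial):

```python
import numpy as np
import statsmodels.formula.api as smf

x_res = smf.ols("status ~ gender + age", data=df).fit().resid    # X with other IVs removed
y_res = smf.ols("eval_soc ~ gender + age", data=df).fit().resid  # Y with other IVs removed

sr = np.corrcoef(df["eval_soc"], x_res)[0, 1]  # semipartial: removed from X only
pr = np.corrcoef(y_res, x_res)[0, 1]           # partial: removed from both
print(sr**2, pr**2)   # sr² = unique share of the DV's total variance
```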
Slide 13: Assessing the Importance of IVs (continued)

In standard multiple regression, sr² = the unique contribution of the IV to R² in that set of IVs: for a given IV, sr² is the amount by which R² would be reduced if that IV were deleted from the equation.

If the IVs are correlated, usually the sum of the sri² < R². The difference, R² − Σ sri² over all IVs, is shared variance: variance contributed to R² by two or more IVs jointly.

In sequential regression, sri² = the amount of variance added to R² by each IV at the point that it enters the model. In SPSS output, sri² appears as "R² Change" for each step in the Model Summary table.
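Continuing the same simulated example, sr² for an IV in standard regression can be verified as the drop in R² when that IV is deleted:

```python
full    = smf.ols("eval_soc ~ gender + age + status", data=df).fit()
reduced = smf.ols("eval_soc ~ gender + age", data=df).fit()

# Unique contribution of status; equals sr**2 from the previous sketch
print(full.rsquared - reduced.rsquared)
```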
Slide 14: Suppressor Variables

A suppressor variable is an IV that helps predict the DV and increases R² through its correlation with other IVs: it suppresses variance in those IVs that is irrelevant to the prediction of the DV. Three types are distinguished: traditional/classical suppression, cooperative/reciprocal suppression, and negative/net suppression.

To spot one in the output, compare the simple correlation between each IV and the DV with that IV's standardized regression coefficient (beta weight). If beta is significant, check whether (a) the absolute value of the simple correlation between the IV and the DV is much smaller than beta, or (b) the simple correlation coefficient and beta have opposite signs. With more than two or three IVs, a suppressor is difficult to identify.
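A small simulation of classical suppression (hypothetical data, for illustration only): x2 is unrelated to y but shares irrelevant variance with x1, so including x2 raises R² and pushes beta1 above the simple correlation r(y, x1):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)    # the part of x1 that actually predicts y
s = rng.normal(size=n)    # irrelevant variance shared by x1 and x2
d = pd.DataFrame({"x1": z + s, "x2": s})
d["y"] = z + rng.normal(size=n)

print(d.corr()["y"])      # r(y, x2) ≈ 0, yet x2 earns a significant coefficient
m = smf.ols("y ~ x1 + x2", data=d).fit()
print(m.params, m.rsquared)  # b2 < 0 and R² > r(y, x1)²: x2 acts as a suppressor
```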
Slide 15: Interaction Terms; Centering

If it is reasonable to assume that the importance of IV1 varies over the range of IV2, include an interaction term (compute IV1_2 = IV1 × IV2).

Centering: convert the IVs that form the interaction to deviation scores (Xi − mean of X), so that each variable has mean = 0. Why do it? The uncentered product term can create problems with multicollinearity (see the sketch below). Centering:
- does not affect correlations with other variables;
- leaves the unstandardized regression coefficients (b's) for the simple terms (b1, b2) the same as when uncentered;
- changes the b's for interactions (and powers) of the IVs included in the regression;
- changes the betas (standardized coefficients) for all effects.
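A sketch of the multicollinearity point with hypothetical predictors: before centering, a product term is strongly correlated with its components; after centering, that correlation largely disappears:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
d = pd.DataFrame({"x1": rng.uniform(20, 60, 1000),  # e.g., an age-like IV
                  "x2": rng.uniform(0, 10, 1000)})

d["raw_int"] = d["x1"] * d["x2"]   # uncentered product term
c1 = d["x1"] - d["x1"].mean()      # deviation scores, mean = 0
c2 = d["x2"] - d["x2"].mean()
cen_int = c1 * c2                  # centered product term

print(d["x1"].corr(d["raw_int"]))  # high: collinearity with the simple term
print(c1.corr(cen_int))            # near zero after centering
```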