Multiple Regression: II Three Types of Regression PSGE 7211
Goals I am still confused about… (10 minutes) Why are common causes important to include in my regression equation? Simultaneous regression analyses with 3 and 4 IVs How do I run a sequential regression analysis? What does it do? Why is stepwise regression the work of the devil? Prep for HW 4 Sequential regression analysis Checking for multicollinearity
Still confused… Computing scales Interpreting SPSS output (regression equation) Bs vs. Betas (what does standardized mean?)
Decimal Tabs http://www.dummies.com/how-to/content/how-to-work-with-word-2010s-decimal-tab.navId-405488.html http://cybertext.wordpress.com/2009/02/04/word-align-decimal-numbers/
Writing up regression analyses Results Overview of purpose of the study Preliminary analyses Descriptives Correlational analyses Main analyses Regression analyses
No Multicollinearity Tolerance ranges from 0 (no independence from other variables) to 1 (complete independence). Multicollinearity is indicated when Tolerance is low (closer to 0; < .4) and VIF is high (> 2.5; become really concerned around 6 or 7)
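The cutoffs above can be checked by hand. The following is a minimal sketch, assuming the usual definitions (tolerance is 1 minus the R² obtained by regressing one IV on the remaining IVs, and VIF is the reciprocal of tolerance). The function names and the example R² of .75 are hypothetical, chosen only for illustration.

```python
def tolerance_and_vif(r_squared_with_other_ivs):
    """Return (tolerance, VIF) for one IV.

    r_squared_with_other_ivs: R^2 from regressing this IV on all other IVs
    (assumed definition; a value of 1.0 would mean perfect collinearity).
    """
    tolerance = 1.0 - r_squared_with_other_ivs
    vif = 1.0 / tolerance
    return tolerance, vif


def flag_multicollinearity(tolerance, vif):
    """Apply the rough cutoffs from the slide: tolerance < .4 or VIF > 2.5."""
    return tolerance < 0.4 or vif > 2.5


# Hypothetical IV that shares 75% of its variance with the other IVs.
tol, vif = tolerance_and_vif(0.75)
print(tol, vif, flag_multicollinearity(tol, vif))
```

Because VIF is just 1 / tolerance, the two cutoffs (.4 and 2.5) describe the same boundary.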
Multicollinearity Present ?
Parental Ed Grades Time HW
Three IVs DV: 10th grade GPA (English, Math, Science, Social Studies) IVs: Parental Education (control), Time Spent on HW in School, Time Spent on HW outside of School [Path diagram: ParEd, HWin, and HWout predicting GPA]
Frequencies *Note the change in scales
Predictions?
Regression Results
Regression Results [Path diagram: ParEd, HWin, and HWout predicting GPA; coefficients shown: .23***, .26***] Note: Betas displayed (not Bs)
Magnitude of Effects Significance level vs. Effect Size Keith’s rules: Applies to influences on school learning, relevance to other areas? Betas < .05: too small to be considered meaningful Betas > .05: small, but meaningful if statistically significant Betas > .10: moderate Betas > .25: large
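As a quick aid for applying these rules of thumb, here is a minimal sketch that labels a Beta by size. The function name is hypothetical, and treating a Beta of exactly .05 as "too small" is an assumption, since the slide does not say how to handle the boundaries.

```python
def classify_beta(beta):
    """Label a standardized coefficient using the rules of thumb above.

    Uses the absolute value, so negative effects are judged by magnitude.
    Boundary handling (e.g., exactly .05) is an assumption of this sketch.
    """
    size = abs(beta)
    if size > 0.25:
        return "large"
    if size > 0.10:
        return "moderate"
    if size > 0.05:
        # Only meaningful if the coefficient is also statistically significant.
        return "small"
    return "too small to be meaningful"


for value in (0.03, 0.08, 0.19, 0.40, -0.30):
    print(value, classify_beta(value))
```

These labels say nothing about statistical significance, which has to be checked separately.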
Four IVs DV: 10th grade GPA (English, Math, Science, Social Studies) IVs: Parental Education (control), Prior Achievement (control), Time Spent on HW in School, Time Spent on HW outside of School [Path diagram: ParEd, PrAchvt, HWin, and HWout predicting GPA]
Regression Results What changed?
Regression Results [Path diagram: ParEd, PrAchvt, HWin, and HWout predicting GPA; coefficients shown: .08*, .40***, .19***]
Comparison of Coefficients Regression coefficients will often change depending on the variables included in the regression analysis. Without prior achievement, parent education has a strong to moderate effect
Why do Coefficients Change?
Common Cause Common cause: important common causes must be included to interpret regression coefficients as valid effects
Indirect Effects The regression weight for parental education changed because prior achievement was added to the model; parental education affects grades indirectly through prior achievement
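To see why a coefficient shrinks when a correlated variable is added, here is a minimal sketch using the textbook formula for standardized coefficients with two predictors. The three correlations are hypothetical (they are not the values from the class dataset), and the example ignores the homework variables to keep the arithmetic visible.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for Y regressed on X1 and X2.

    r_y1, r_y2: correlations of each predictor with Y.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations, for illustration only.
r_gpa_pared = 0.30    # GPA with parental education
r_gpa_prior = 0.50    # GPA with prior achievement
r_pared_prior = 0.40  # parental education with prior achievement

# With a single predictor, Beta equals the simple correlation.
beta_pared_alone = r_gpa_pared
beta_pared, beta_prior = two_predictor_betas(r_gpa_pared, r_gpa_prior, r_pared_prior)

print("ParEd alone:", round(beta_pared_alone, 3))
print("ParEd with prior achievement:", round(beta_pared, 3))
print("Prior achievement:", round(beta_prior, 3))
```

In this made-up example the Beta for parental education drops once prior achievement enters the equation, because part of its association with GPA runs through prior achievement.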
Should R² be low or high? For explanation, a high R² is less important than proper variable selection; a high R² is important for prediction. R² should be within the expected range; explaining 25% of the variance may be surprisingly high or low. A firm grounding in theory and prior research is critical for explanation
Multiple Regression [Diagram comparing three types] Simultaneous: IV1, IV2, IV3, IV4 entered together. Sequential: Block 1: IV1, IV2; Block 2: IV3, IV4; Block 3: IV1 x IV3. Stepwise: computer-selected subset (IV2***, IV4**)
The Variables DV: 10th grade Social Studies Achievement (Standardized score on History, Civics, and Geography Test) IVs: SES (control) Previous Achievement (control) Self-esteem/Self-concept Locus of Control
Predictions? All coefficients p < .001
Simultaneous Regression Results
Regression Results
Purpose Simultaneous regression is useful for explanatory research to determine the extent of influence of one or more variables on a DV Can be used for prediction: Useful for determining the relative influence of variables
Strengths Useful for explanation, especially when guided by theory Allows comparison of relative effects/importance of variables Policy/intervention implications Estimates direct effects Order of variables in model unimportant
Weaknesses Results change based on which variables are included in analysis Implies a theoretical model (you should know what that model is) Estimates only direct effects
What to interpret Overall R² Significance of Model and Regression Coefficients Magnitude of Bs: Useful for intervention; policy when variables are assessed on meaningful metric Magnitude of Betas: Useful for determining relative influence
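The link between Bs and Betas can be made concrete. The following is a minimal sketch of the standard conversion (Beta equals B times the SD of the IV divided by the SD of the DV); the function name and the numbers are hypothetical.

```python
def beta_from_b(b, sd_x, sd_y):
    """Convert an unstandardized coefficient (B) to a standardized one (Beta).

    b: change in the DV, in its own units, per one-unit change in the IV.
    sd_x, sd_y: standard deviations of the IV and the DV.
    """
    return b * sd_x / sd_y


# Hypothetical example: each extra hour of homework goes with 0.12 GPA points,
# homework hours have SD = 2.0, and GPA has SD = 1.5.
beta = beta_from_b(0.12, 2.0, 1.5)
print(round(beta, 2))
```

B answers "how many GPA points per hour?", which is useful for intervention; Beta answers "how many SDs of GPA per SD of homework?", which allows comparison across IVs measured on different scales.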
Sequential Regression Results
Sequential Regression Results is it significant?
Sequential Regression Results
Sequential MR, Different Order
Sequential MR, Different Order
Sequential MR, Another Order
Why Order of Entry is Important The variance symbolized by area 3 is attributed to whichever variable, X1 or X2, is entered first
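The effect of entry order on ΔR² can be worked through with two predictors. This is a minimal sketch using the standard two-predictor R² formula; the correlations are hypothetical and the function names are illustrative only.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for Y regressed on X1 and X2, computed from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def delta_r_squared_by_order(r_y1, r_y2, r_12):
    """Variance credited to each predictor when entered first vs. second."""
    full = r_squared_two_predictors(r_y1, r_y2, r_12)
    return {
        "full": full,
        "x1_first": r_y1 ** 2,          # X1 gets its unique AND the shared variance
        "x1_second": full - r_y2 ** 2,  # X1 gets only its unique variance
        "x2_first": r_y2 ** 2,
        "x2_second": full - r_y1 ** 2,
    }


# Hypothetical correlations, for illustration only.
result = delta_r_squared_by_order(r_y1=0.30, r_y2=0.50, r_12=0.40)
for label, value in result.items():
    print(label, round(value, 4))
```

Either order adds up to the same total R²; what changes is which variable is credited with the overlapping portion.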
Sequential v. Simultaneous MR Simultaneous MR estimates direct paths (a), whereas sequential MR also estimates variance due to total effects (direct effect + indirect effect; a + e x d)
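The decomposition a + e x d can be verified numerically. The sketch below assumes a simple model in which X1 affects X2 and both affect Y, with a as the direct path from X1 to Y and e and d as the two legs of the indirect route through X2 (the figure's exact labeling is not reproduced here, so this mapping is an assumption). The correlations are hypothetical.

```python
def total_effect_decomposition(r_y1, r_y2, r_12):
    """Return (direct, indirect, total) standardized effects of X1 on Y.

    Assumed model: X1 -> X2 -> Y and X1 -> Y.
    """
    denom = 1 - r_12 ** 2
    a = (r_y1 - r_y2 * r_12) / denom  # direct path X1 -> Y (simultaneous Beta)
    d = (r_y2 - r_y1 * r_12) / denom  # path X2 -> Y (simultaneous Beta)
    e = r_12                          # path X1 -> X2 (Beta with one predictor)
    indirect = e * d
    return a, indirect, a + indirect


# Hypothetical correlations, for illustration only.
r_y1, r_y2, r_12 = 0.30, 0.50, 0.40
direct, indirect, total = total_effect_decomposition(r_y1, r_y2, r_12)
print(round(direct, 3), round(indirect, 3), round(total, 3))
```

Under this model the total effect equals the Beta that X1 receives when it is entered alone in the first step, which is why the first step of a sequential regression reflects total rather than direct effects.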
Order of Entry Sequential MR implies a model; you should think about what that model is! As you read others’ research, see if you can draw their “model.” Is it reasonable?
Variations of Sequential MR Interpretation from last step The regression coefficient from each step may be interpreted as the “total” effect of a variable Can enter variables in blocks; enter background/control variables simultaneously, variables of interest sequentially Enter variables simultaneously, enter interactions/curves sequentially (Chapters 7&8)
Total Effects
Unique Variance Some researchers add each variable last in a sequential regression to determine its “unique” effects/variance You can get the same info in simultaneous regression, if you request semipartial (part) correlations and square the part correlations to determine unique variance (see pg. 89 of text)
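The equivalence between a squared part correlation and the ΔR² for a variable entered last can be checked with two predictors. This is a minimal sketch using standard formulas; the correlations are hypothetical and the function names are illustrative.

```python
import math


def squared_semipartial(r_y1, r_y2, r_12):
    """Squared semipartial (part) correlation of X1 with Y, controlling X2."""
    sr = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)
    return sr ** 2


def delta_r_squared_entered_last(r_y1, r_y2, r_12):
    """Change in R^2 when X1 is added after X2 in a sequential regression."""
    full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return full - r_y2 ** 2


# Hypothetical correlations, for illustration only.
unique_from_part = squared_semipartial(0.30, 0.50, 0.40)
unique_from_delta = delta_r_squared_entered_last(0.30, 0.50, 0.40)
print(round(unique_from_part, 4), round(unique_from_delta, 4))
```

Both routes give the same number, so requesting part correlations in a single simultaneous run saves rerunning the sequential analysis once per variable.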
Purpose Explanation Is a variable (or block of variables) important for an outcome? Does a variable explain variance beyond that explained by other influences? Test for significance of interactions and curves Prediction Does a variable aid in predicting some criterion? Useful for determining the relative influence of variables
Strengths Useful for explanation, especially when guided by theory Useful for testing interactions and curves Estimates total effects in implied model
Weaknesses/Caveats Results change ( ) based on which variables are included in analysis Can over or underestimate effects based on order of entry Order of entry implies a theoretical model
What to interpret Statistical significance of ΔR² Magnitude of ΔR² May interpret coefficients... -- From final block (same as simultaneous) -- From each block (total effects, given a model)
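Whether a block adds a significant amount of explained variance is usually judged with an F test on the change in R². Below is a minimal sketch of that standard formula; the function name and the example numbers are hypothetical, and the p-value lookup is left to an F table or statistical software.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the change in R^2 when a block of IVs is added.

    r2_reduced: R^2 before the block is entered.
    r2_full: R^2 after the block is entered.
    n: sample size.
    k_full: number of IVs in the model after the block is entered.
    k_added: number of IVs in the block just entered.
    Returns (delta_r2, F, df1, df2).
    """
    delta_r2 = r2_full - r2_reduced
    df1 = k_added
    df2 = n - k_full - 1
    f_value = (delta_r2 / df1) / ((1 - r2_full) / df2)
    return delta_r2, f_value, df1, df2


# Hypothetical example: two IVs added to a two-IV model, N = 200.
delta_r2, f_value, df1, df2 = f_change(0.25, 0.30, n=200, k_full=4, k_added=2)
print(round(delta_r2, 3), round(f_value, 2), df1, df2)
```

The resulting F is compared against the critical value for (df1, df2) degrees of freedom.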
Stepwise MR Variables are entered in steps, one at a time Computer determines order of entry based on contribution of each variable to explained variance Varieties: Forward, Backward, Stepwise (combination)
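To make the mechanics concrete, here is a minimal sketch of forward entry working from a correlation matrix. It is a simplification, not what any statistics package does: real stepwise routines typically use a significance test to decide entry, whereas this sketch uses a fixed minimum gain in R² (`min_gain`, a hypothetical parameter). All correlations are made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(subset, r_xx, r_xy):
    """R^2 for the DV regressed on the IVs whose indices are in subset."""
    if not subset:
        return 0.0
    a = [[r_xx[i][j] for j in subset] for i in subset]
    b = [r_xy[i] for i in subset]
    betas = solve(a, b)
    return sum(beta * r for beta, r in zip(betas, b))


def forward_selection(r_xx, r_xy, min_gain=0.01):
    """Enter IVs one at a time, always taking the largest gain in R^2."""
    chosen, current = [], 0.0
    remaining = list(range(len(r_xy)))
    while remaining:
        gains = {j: r_squared(chosen + [j], r_xx, r_xy) - current
                 for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:  # simplified stopping rule (assumption)
            break
        chosen.append(best)
        current += gains[best]
        remaining.remove(best)
    return chosen, current


# Hypothetical correlations among four IVs and between each IV and the DV.
names = ["IV1", "IV2", "IV3", "IV4"]
r_xx = [[1.0, 0.4, 0.3, 0.2],
        [0.4, 1.0, 0.3, 0.1],
        [0.3, 0.3, 1.0, 0.2],
        [0.2, 0.1, 0.2, 1.0]]
r_xy = [0.30, 0.50, 0.20, 0.40]

order, final_r2 = forward_selection(r_xx, r_xy)
print([names[i] for i in order], round(final_r2, 4))
```

Nothing in the procedure consults theory: the order is driven entirely by the sample correlations, so a small change in the data can change which variables enter.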
Stepwise MR
Stepwise MR, continued
Purpose Is there a purpose? Mostly “theoretical garbage” (Wolfle, 1980) and “tool of the devil” (Keith, 2006) Could be used if prediction is your sole goal; however, you can also use simultaneous and sequential MR for that purpose
Strengths (?) Identifies which subset of variables is useful for efficient prediction Doesn’t require thought or theory
Weaknesses/Caveats Doesn’t require thought or theory Give up control to computer Cannot use for valid explanation Theoretical garbage
Final Thoughts See summary table on page 100 of textbook Consider purpose of your research, then choose appropriate method Can combine methods
HW 4 Using a dataset of your choice, run a sequential or hierarchical multiple regression analysis with three or more independent (continuous) variables. As always, interpret your regression analysis using APA style (see HW 4 wikipage for more information). As part of your interpretation, discuss how and why you chose the variables in your regression model. What theoretical model is implied by your analysis? How did you decide which variables to include and the order of entry? You must support your decision with references to theory and past research (this need not exceed a couple of paragraphs). Now that you have conducted one sequential analysis, change the order of entry of the variables. In other words, run an alternative sequential regression analysis. Do you note any changes in the significance of your regression coefficients? Why or why not?