Soc 3306a Multiple Regression: Testing a Model and Interpreting Coefficients

Assumptions for Multiple Regression
- Random sample
- Distribution of y is relatively normal (check the histogram for the DV)
- Standard deviation of y is constant for each value of x (check the scatterplots, Figure 1)

Problems to Watch For...
- Violation of assumptions, especially non-normality of the DV and heteroscedasticity (Figure 1)
- Simpson's Paradox (Figure 3)
- Multicollinearity (Figures 1 and 2)

Building a Model in SPSS (Figure 2)
- Model building should be driven by your theory
- You can add your variables one at a time, checking at each step whether there is a significant improvement in the explanatory power of the model
- Use Method=Enter. In Block 1, enter your main IV. Under Statistics, ask for R² change
- Click Next and enter an additional IV
- Check the Change Statistics in the Model Summary: watch the changes in R² and the coefficients (especially the partial correlations) carefully. A sketch of the same workflow appears below
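As a rough illustration of the same blockwise logic outside SPSS, here is a minimal sketch using Python's statsmodels; the data and the variable names (x1, x2, y) are invented for demonstration.

```python
# Hypothetical sketch of blockwise entry: fit the main IV first, then add a
# second IV and check the change in R^2 (simulated data, invented names).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.5 * x1 + rng.normal(size=100)              # the two IVs overlap somewhat
y = 1.0 + 2.0 * x1 + 0.8 * x2 + rng.normal(size=100)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()                          # Block 1
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()   # Block 2

print("R^2 change:", m2.rsquared - m1.rsquared)
print(m2.compare_f_test(m1))   # (F, p, df difference) for the improvement
```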

Multiple Correlation R (Figure 2)
- Measures the correlation of all the IVs with the DV
- It is the correlation of the observed y values with the predicted y values
- Always positive (between 0 and +1)

Coefficient of Determination R² (Figure 2)
- Measures the proportional reduction in error (PRE) in predicting y using the prediction equation (taking x into account) rather than the mean of y
- R² = (TSS - SSE)/TSS
- This is the explained variation in y

TSS, SSE and RSS
- TSS = total variability around the mean of y
- SSE = residual sum of squares, or error: the unexplained variability
- RSS = TSS - SSE: the regression sum of squares, the explained variability in y
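Continuing the sketch above, the same quantities can be pulled from the fitted model m2 (the attribute names are statsmodels', not SPSS output):

```python
# Sums of squares behind R^2, continuing the model m2 fitted above.
sse = m2.ssr            # residual (error) sum of squares: the unexplained part
tss = m2.centered_tss   # total variability around the mean of y
rss = tss - sse         # regression sum of squares: the explained part
print("R^2 = (TSS - SSE)/TSS =", rss / tss)   # equals m2.rsquared
```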

F Statistic and p-value
- Look at the ANOVA table (Figure 2)
- F is the ratio of the regression mean square (RSS/df) to the residual (error) mean square (SSE/df)
- The larger the F, the smaller the p-value
- A small p-value (< .05, .01, or .001) is strong evidence for the significance of the model
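Again continuing the sketch, F can be reproduced from the mean squares, with k and the degrees of freedom following the slide's definitions:

```python
# F as the ratio of mean squares, matching the ANOVA table (continues m2).
k = 2                                     # number of predictors in m2
n = int(m2.nobs)
F = (rss / k) / (sse / (n - k - 1))
print(F, m2.fvalue)                       # the two agree
print("model p-value:", m2.f_pvalue)
```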

Slope (b), β, t-statistic and p-value (Coefficients Table in Figure 2)
- The slope is measured in the actual units of the variables: the change in y for a 1-unit change in x
- In multiple regression, each slope is controlled for all the other x variables
- β is the standardized slope, so strengths can be compared across predictors
- t = b/se with df = n - (k + 1), where k = the number of predictors
- A small p-value indicates a significant relationship with y, controlling for the other variables in the model
- Note: in bivariate regression, t² = F and β = r
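The t-statistics and standardized slopes can likewise be checked by hand. The z-scoring recipe below is one common way to obtain betas, not SPSS's own computation:

```python
# t = b/se for each coefficient, and betas via z-scored variables (continues m2).
print(m2.params / m2.bse)        # identical to m2.tvalues
print(m2.pvalues)

z = lambda v: (v - v.mean()) / v.std()
mz = sm.OLS(z(y), sm.add_constant(np.column_stack([z(x1), z(x2)]))).fit()
print("betas:", mz.params[1:])   # standardized slopes, comparable in strength
```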

Multicollinearity (Figures 1 and 2)
- Two independent variables in the model, e.g. x1 and x2, are each correlated with y but are also highly correlated with each other
- Both explain the same proportion of variation in y, so adding x2 to the model does not increase its explanatory value (R, R²)
- Check the correlations between the IVs in the correlation matrix
- Ask for and check the partial correlations in the multiple regression (Part and Partial under Statistics)
- If a partial correlation in the multiple model is much lower than the bivariate correlation, multicollinearity is indicated (a sketch follows)
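One way to see that drop numerically (a sketch, not the SPSS procedure): the partial correlation equals the correlation of the two sets of residuals after regressing out the control variable.

```python
# Bivariate vs. partial correlation of y with x2, controlling x1 (continues above).
r_biv = np.corrcoef(y, x2)[0, 1]
res_y = sm.OLS(y, sm.add_constant(x1)).fit().resid
res_x2 = sm.OLS(x2, sm.add_constant(x1)).fit().resid
r_partial = np.corrcoef(res_y, res_x2)[0, 1]
print(r_biv, r_partial)   # a large drop points to multicollinearity
```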

Types of Multivariate Relationships
1. Spuriousness
2. Causal chains (intervening variable)
3. Multiple causes (independent effects)
4. Suppressor variables
5. Interaction effects
Multiple regression can test for all of these.

1. Spuriousness (Figure 3)
- A spurious relationship means the model is incorrectly specified
- Indicated by a change in the sign of the partial correlations
- Can also check the partial regression plots (ask for all partial plots under Plots)
- Example: the bivariate relationship between acceleration time and vehicle weight was negative (as weight went up, time to accelerate to 60 mph went down), which makes no sense
- When horsepower was added to the model, the partial relationship between acceleration and weight became positive
- When a relationship changes sign (i.e. from negative to positive) or disappears, a spurious relationship may be present
- In this case, variation in both acceleration and weight is caused by horsepower
- The situation in Figure 3 is called Simpson's Paradox (a simulated version appears below)
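A simulated version of the cars example (all numbers invented) reproduces the sign flip: horsepower pushes weight up and acceleration time down, so the bivariate weight slope is negative until horsepower is controlled.

```python
# Simulated Simpson's Paradox: the weight slope flips sign once horsepower
# is added (invented coefficients; continues the imports above).
rng = np.random.default_rng(3)
hp = rng.normal(150, 30, 300)                       # horsepower
wt = 1000 + 10 * hp + rng.normal(0, 100, 300)       # heavier cars have more hp
acc = 30 - 0.1 * hp + 0.005 * wt + rng.normal(0, 1, 300)

print(sm.OLS(acc, sm.add_constant(wt)).fit().params[1])                         # negative
print(sm.OLS(acc, sm.add_constant(np.column_stack([wt, hp]))).fit().params[1])  # positive
```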

2. Causal Chains
- An intervening variable changes the relationship between x and y
- A relationship exists between X1 and Y at the bivariate level, but disappears with the addition of the control variable(s)
- The results can look the same as spuriousness; the major difference is interpretation (see Agresti Ch. 11), so you need to rely on theory
- Bivariate: X1 → Y
- Multivariate: X1 → X2 → Y
- Although the effect of X1 on Y disappears, X1 is still part of the "causal explanation" as an "indirect" cause

2. Causal Chains (cont.)
Two possibilities for causal chains:
- If the slope of X1 is no longer significant after introducing X2, X1 has an indirect causal effect: X1 → X2 → Y
- If the slope is weaker yet still significant, X1 has both an indirect effect (X1 → X2 → Y) and a direct effect (X1 → Y)

3. Multiple Causes
- Y (the DV) has multiple causes: the independent variables have relatively separate effects on the dependent variable
- Introducing controls does little to change the bivariate correlations, and the bivariate slopes stay similar
- Compare the bivariate correlations to the partial correlations in the multiple model, and compare the slopes in the bivariate and multiple models

4. Suppressor Variables
- Initially the slope between X1 and Y is non-significant
- When the control variable X2 is added, the slope becomes significant
- X2 is associated with both X1 and Y, which hides the initial relationship: X2 → X1 and X2 → Y (a simulated example follows)
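A tiny simulated suppressor, with coefficients invented so that the bivariate slope is near zero:

```python
# Simulated suppression: the x1 slope looks null until x2 is held constant
# (continues the imports above; all numbers invented).
rng = np.random.default_rng(2)
x2s = rng.normal(size=300)
x1s = x2s + rng.normal(size=300)
ys = 1.0 * x1s - 2.0 * x2s + rng.normal(size=300)   # chosen so cov(x1, y) = 0

print(sm.OLS(ys, sm.add_constant(x1s)).fit().pvalues[1])                          # typically > .05
print(sm.OLS(ys, sm.add_constant(np.column_stack([x1s, x2s]))).fit().pvalues[1])  # tiny
```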

5. Interaction Effects
- Not all IV effects on Y are independent; IVs often interact with one another in their effect on Y
- Usually suggested by theory
- An interaction is present when you enter a control variable and the original bivariate association differs by level of the control variable
- Does the slope of X1 differ by category of X2 when explaining Y?
- You can test this by introducing "interaction terms" into the multiple regression model (for example, see the optional reading, Agresti Ch. 11)

Interactions (cont.)
- The interaction term is the cross-product of X1 and X2 and is entered into the model together with X1 and X2 (go to Transform > Compute Variable...)
- The regression model becomes: E(y) = a + b1x1 + b2x2 + b3x1x2
- This produces two main effects and an interaction effect
- If the interaction is not significant, drop it from the model, since the effects of X1 and X2 are then independent of one another, and interpret the main effects
- See Figure 4 (and the sketch below)
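In Python's formula interface the cross-product can be written directly; the data frame and the column names (income, educ_r, educ_s) are invented for illustration.

```python
# Hypothetical interaction model: income on respondent's and spouse's
# education plus their cross-product (simulated data, invented names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"educ_r": rng.uniform(8, 22, 200),    # respondent's education
                   "educ_s": rng.uniform(8, 22, 200)})   # spouse's education
df["income"] = (10 + 2.0 * df["educ_r"] + 1.5 * df["educ_s"]
                + 0.3 * df["educ_r"] * df["educ_s"] + rng.normal(0, 5, 200))

# "educ_r * educ_s" expands to both main effects plus the cross-product,
# i.e. E(y) = a + b1*x1 + b2*x2 + b3*x1*x2
m_int = smf.ols("income ~ educ_r * educ_s", data=df).fit()
print(m_int.params)
print("interaction p-value:", m_int.pvalues["educ_r:educ_s"])
```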

Interpreting Interactions
- If the interaction slope is significant, the main effects should be interpreted in the context of the interaction model (see Figure 5)
- E(y) = a + b1x1 + b2x2 + b3x1x2
- Income (Y) is determined by Respondent's Education (x1), Spouse's Education (x2) and the interaction of x1 and x2
- By setting x2 at distinct levels (e.g. 10 and 20 years), you can calculate or graph the changing slopes for x1 (again, see Agresti); a sketch follows
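Continuing the hypothetical model above, the slope for x1 at a given level of x2 is b1 + b3*x2:

```python
# The slope of educ_r changes with the level of educ_s (continues m_int).
b = m_int.params
for x2_level in (10, 20):
    slope_x1 = b["educ_r"] + b["educ_r:educ_s"] * x2_level
    print(f"slope of educ_r when educ_s = {x2_level}: {slope_x1:.2f}")
```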

A Few Tips for SPSS Mini 6
- Review the relevant PowerPoint slides and the accompanying handouts
- Read the assignment over carefully before starting
- When creating your model, build it carefully, one block at a time
- Watch for spurious relationships and revise the model if needed
- Drop any unnecessary variables (e.g. where there is evidence of multicollinearity, or new variables that do not appreciably increase R²)
- Keep your model simple: aim for good explanatory value with the fewest variables possible