Multiple Regression PSYC 4310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher.


Multiple Regression: Basic Characteristics
One continuous DV (outcome variable); two or more quantitative IVs (predictors).
General form of the equation:
outcome_i = (model) + error_i
Y_i = (b_0 + b_1X_1i + b_2X_2i + … + b_nX_ni) + ε_i
record sales_i = b_0 + b_1(ad budget_i) + b_2(airplay_i) + ε_i
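As an illustration only (the course runs these analyses in SPSS on Record2.sav), the same model form can be fitted in Python with statsmodels; the data below are synthetic, and the names adverts, airplay, and sales are stand-ins for the record-sales example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
data = pd.DataFrame({
    "adverts": rng.uniform(0, 2000, n),   # advertising budget (illustrative units)
    "airplay": rng.integers(0, 60, n),    # radio plays per week (illustrative)
})
# Simulate an outcome that follows Y = b0 + b1*X1 + b2*X2 + error
data["sales"] = 40 + 0.08 * data["adverts"] + 3.5 * data["airplay"] + rng.normal(0, 60, n)

model = smf.ols("sales ~ adverts + airplay", data=data).fit()
print(model.params)    # b0 (Intercept), b1, b2
print(model.rsquared)  # proportion of variance in sales explained by the model
```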

[Figure: scatterplot of the relationship between record sales, advertising budget, and radio airplay, showing the slope for b_advert and the slope for b_airplay.]

Partitioning the Variance: Sums of Squares, R, and R²
SST: Represents the total amount of difference between the observed values and the mean value of the outcome variable.
SSR: Represents the degree of inaccuracy remaining when the best model is fitted to the data. SSR uses the differences between the observed data and the regression line.
SSM: Shows the reduction in inaccuracy resulting from fitting the regression model to the data. SSM uses the differences between the values of Y predicted by the model (the regression line) and the mean. A large SSM implies the regression model predicts the outcome variable better than the mean does.
Multiple R: The correlation between the observed values of Y (the outcome variable) and the values of Y predicted by the multiple regression model. It is a gauge of how well the model predicts the observed data.
R²: The proportion of variation in the outcome variable accounted for by the model.
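A minimal sketch, using synthetic data and Python/statsmodels rather than SPSS, of how SST, SSR, SSM, Multiple R, and R² relate to one another:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2 + X @ np.array([1.5, -0.8]) + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(X)).fit()
y_hat = fit.fittedvalues

sst = np.sum((y - y.mean()) ** 2)            # SST: observed values vs. the mean
ss_resid = np.sum((y - y_hat) ** 2)          # SSR: observed values vs. the regression line
ss_model = np.sum((y_hat - y.mean()) ** 2)   # SSM: model predictions vs. the mean
r_squared = ss_model / sst                   # proportion of variance accounted for
multiple_r = np.corrcoef(y, y_hat)[0, 1]     # correlation of observed and predicted Y
print(round(r_squared, 3), round(fit.rsquared, 3), round(multiple_r ** 2, 3))  # all agree
```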

Variance Partitioning
Variance in the outcome variable (DV) is due to the action of all IVs plus some error.
[Diagram: newspaper readership (Y) predicted from income (X1), gender (X2), and age (X3), with regression weights B1, B2, and B3.]

Covariation
[Venn diagram: the variance of newspaper readership (Y) overlapping the variances of income (X1), gender (X2), and age (X3), partitioned into error variance and the covariance regions Cov X1Y, Cov X2Y, Cov X3Y, Cov X1X2Y, and Cov X1X3Y.]

Partial Statistics
Partial correlations and regression coefficients: The effects of all other IVs are held constant when estimating the effect of each target IV; the covariation of the other IVs with the DV is subtracted out. Partial correlations describe the independent effect of an IV on the DV, controlling for the effects of all other IVs.

Part (Semi-Partial) Statistics
Part (semi-partial) r: The effects of the other IVs are NOT held constant for the outcome; they are removed only from the target predictor. Semi-partial rs indicate the marginal (additional, unique) effect of a particular IV on the DV, allowing all other IVs to operate normally.
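The distinction can be made concrete with the residual method: the partial r correlates two residualized variables, while the semi-partial r residualizes only the predictor. A small Python sketch with invented variables x1, x2, and y (not part of the course data):

```python
import numpy as np

def residualize(v, controls):
    """Return the part of v not linearly predictable from the control variables."""
    X = np.column_stack([np.ones(len(v))] + list(controls))
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

rng = np.random.default_rng(7)
x2 = rng.normal(size=300)
x1 = 0.6 * x2 + rng.normal(size=300)           # correlated predictors
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=300)

# Partial r: x2 removed from BOTH y and x1
partial_r = np.corrcoef(residualize(y, [x2]), residualize(x1, [x2]))[0, 1]
# Semi-partial (part) r: x2 removed from x1 only; y left intact
semipartial_r = np.corrcoef(y, residualize(x1, [x2]))[0, 1]
print(partial_r, semipartial_r)   # the partial r is typically the larger of the two
```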

Methods of Regression: Predictor Selection and Model Entry Rules
Selecting predictors: More is not better! Select the most important predictors based on past research findings.
Entering variables into the model: When predictors are uncorrelated, the order of entry makes no difference. It is rare to have completely uncorrelated predictors, so the method of entry becomes crucial.

Methods of Regression
Hierarchical (blockwise entry): Predictors are selected and entered by the researcher based on knowledge of their relative importance in predicting the outcome.
Forced entry (Enter): All predictors are forced into the model simultaneously.
Stepwise (mathematically determined entry): Forward method, Backward method, Stepwise method.

Hierarchical / Blockwise Entry Researcher decides order. Known predictors usually entered first, in order of their importance in predicting the outcome. Additional predictors can be added all at once, stepwise, or hierarchically (i.e., most important first).
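For illustration, hierarchical entry amounts to fitting one model per block and testing the change in R². A hedged Python/statsmodels sketch with invented variable names (the course does this through SPSS's Block buttons):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
d = pd.DataFrame({
    "parenting": rng.normal(size=n),
    "sibling_agg": rng.normal(size=n),
    "tv": rng.normal(size=n),
})
d["aggression"] = 0.4 * d["parenting"] + 0.3 * d["sibling_agg"] + 0.1 * d["tv"] + rng.normal(size=n)

# Block 1: known predictors from past research
block1 = smf.ols("aggression ~ parenting + sibling_agg", data=d).fit()
# Block 2: add the exploratory predictor(s)
block2 = smf.ols("aggression ~ parenting + sibling_agg + tv", data=d).fit()

r2_change = block2.rsquared - block1.rsquared
f_change, p_change, df_diff = block2.compare_f_test(block1)   # F test of the R-squared change
print(r2_change, f_change, p_change)
```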

Forced Entry (Enter)
All predictors are forced into the model simultaneously. This is the default option and the method most appropriate for testing theory (Studenmund & Cassidy, 1987).

Stepwise Entry: Forward Method
Procedure: The initial model contains only the intercept (b0). SPSS then selects the predictor that best predicts the outcome variable, namely the one with the highest simple correlation with the outcome. Subsequent predictors are selected on the basis of the size of their semi-partial correlation with the outcome variable; the semi-partial correlation measures how much of the remaining unexplained variance in the outcome is explained by each additional predictor. The process is repeated until all predictors that contribute significant unique variance have been included in the model.
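A rough Python sketch of the forward logic, using each candidate's entry p-value as a simplified stand-in for SPSS's semi-partial-correlation criterion; forward_select and the toy data are illustrative, not part of the course materials:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X, p_enter=0.05):
    """Greedy forward entry: at each step add the candidate whose entry
    p-value is smallest, as long as it falls below p_enter."""
    remaining = list(X.columns)
    selected = []
    while remaining:
        pvals = {}
        for name in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [name]])).fit()
            pvals[name] = fit.pvalues[name]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
y = 2 * X["a"] + X["c"] + rng.normal(size=200)
print(forward_select(y, X))   # typically ['a', 'c']
```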

Stepwise Entry: Backward Method
Procedure: SPSS places all predictors in the model and then computes the contribution of each one by evaluating its t-test. Significance values are compared against a removal criterion, and predictors not meeting the criterion are removed. (In SPSS the default removal criterion, called pout for "probability out", is p ≥ .10.) SPSS then re-estimates the regression equation with the remaining predictor variables. The process repeats until all the predictors in the equation are statistically significant and all those outside the equation are not. This method is preferable to the Forward method because of suppressor effects, which occur when a predictor has a significant effect but only when another variable is held constant.
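The backward logic can be sketched the same way; here the removal criterion p_out = .10 mirrors SPSS's default pout, but the function is an illustrative approximation rather than SPSS's exact algorithm:

```python
import statsmodels.api as sm

def backward_eliminate(y, X, p_out=0.10):
    """Start with all predictors; repeatedly drop the one with the largest
    p-value until every remaining predictor's p-value is below p_out."""
    kept = list(X.columns)
    while kept:
        fit = sm.OLS(y, sm.add_constant(X[kept])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] < p_out:
            break
        kept.remove(worst)
    return kept

# Usage mirrors forward_select above: backward_eliminate(y, X)
```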

Suppressor Variables: Defined
Suppressor variables increase the size of the regression coefficients associated with other IVs or sets of IVs (Conger, 1974). Suppressor variables could be termed enhancers (McFatter, 1979) when they correlate with other IVs and account for (or suppress) outcome-irrelevant variation (unexplained variance) in one or more other predictors, thereby improving the overall predictive power of the model. A variable may act as a suppressor (enhancer), even when the suppressor has a significant zero-order correlation with an outcome variable, by improving the relationship of other independent variables with the outcome variable.

Stepwise Entry: Stepwise Method Procedure Same as the Forward method, except that each time a predictor is added to the equation, a removal test is made of the least useful predictor. The regression equation is constantly reassessed to see whether any redundant predictors can be removed.

Assessing the Model I: Does the model fit the observed data? Outliers & Influential Cases
The mayor of London at the turn of the 20th century is interested in how drinking affects mortality. London is divided into eight regions termed "boroughs", and so he measures the number of pubs and the number of deaths over a period of time in each one.

Statistical Oddity?

Regression Diagnostics: Outliers and Residuals
If a model fits the sample data well, residuals (errors) should be small. Cases with large residuals may be outliers.
Unstandardized residuals: Measured in the same units as the outcome variable, so they are not comparable across different models; useful only in terms of their relative size.
Standardized residuals: Created by transforming unstandardized residuals into standard deviation units. In a normally distributed sample: 95% of z-scores should lie between -1.96 and +1.96 (no more than 5% should fall outside this range); 99% should lie between -2.58 and +2.58 (no more than 1% outside); 99.9% should lie between -3.29 and +3.29 (values beyond this are always a problem).
Studentized residuals: The unstandardized residual divided by an estimate of its standard deviation that varies from case to case; gives a more precise estimate of the error variance for a specific case.
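A short Python sketch (synthetic data; statsmodels rather than SPSS) showing standardized residuals computed as residual / sqrt(MSE), studentized residuals from OLSInfluence, and the expected proportions beyond the +/- 1.96 and +/- 2.58 cut-offs:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, 200)

fit = sm.OLS(y, X).fit()
infl = OLSInfluence(fit)

z_resid = fit.resid / np.sqrt(fit.mse_resid)    # standardized residuals (SD units)
stud_resid = infl.resid_studentized_internal    # residual / case-specific SE estimate

# Roughly 5% of cases should exceed |1.96| and about 1% should exceed |2.58|
print(np.mean(np.abs(z_resid) > 1.96), np.mean(np.abs(z_resid) > 2.58))
```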

Regression Diagnostics: Influential Cases
Several residual statistics are used to assess the influence of a particular case.
Adjusted predicted value: If a specific case does not exert a large influence on the model, then when the model is calculated WITHOUT that case we would expect the adjusted predicted value of the outcome to be very similar to the original predicted value.
DFFit: The difference between the adjusted predicted value and the original predicted value.
Mahalanobis distance: Measures the distance of a case from the means of the predictor variables (values above 25 are problematic, even with large samples and more than 5 predictors).
Cook's distance: A measure of the overall influence of a case on the model; values greater than 1 may be problematic (Cook & Weisberg, 1982).
Leverage: Measures the influence of the observed value of the outcome variable over the predicted values. Values range from 0 (no influence) to 1 (complete influence over the prediction).
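These influence statistics can be pulled from statsmodels' OLSInfluence, with the Mahalanobis distance computed by hand from the predictors; again a hedged sketch on synthetic data, not the SPSS output itself:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(8)
X_raw = rng.normal(size=(100, 3))
X = sm.add_constant(X_raw)
y = X @ np.array([0.5, 1.0, -0.7, 0.2]) + rng.normal(0, 1, 100)

infl = OLSInfluence(sm.OLS(y, X).fit())

cooks_d, _ = infl.cooks_distance     # flag values greater than 1 (Cook & Weisberg, 1982)
leverage = infl.hat_matrix_diag      # 0 = no influence, 1 = complete influence
dffits, _ = infl.dffits              # change in the prediction when the case is dropped

# Mahalanobis distance of each case from the centroid of the predictors
diff = X_raw - X_raw.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X_raw, rowvar=False))
mahal = np.sum(diff @ inv_cov * diff, axis=1)

print(np.max(cooks_d), np.max(leverage), np.max(mahal))
```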

Assessing the Model II: Checking Assumptions (Drawing Conclusions About the Population)
Variable types: IVs must be quantitative or categorical; the DV must be quantitative, continuous, and unbounded.
Non-zero variance: Predictors must have some variation.
No perfect collinearity: Predictors should not correlate too highly. Can be tested with the VIF (variance inflation factor), which indicates whether a predictor has a strong relationship with the other predictors; values over 10 are worrisome.
Homoscedasticity: Residuals at each level of the predictor(s) should have the same variance.
Independent errors: The residual terms for any two observations should be independent (uncorrelated). Tested with the Durbin-Watson statistic, which ranges from 0 to 4; a value of 2 means the residuals are uncorrelated, values greater than 2 indicate a negative correlation between adjacent residuals, and values below 2 indicate a positive correlation.
Normally distributed errors: Residuals are assumed to be random, normally distributed variables with a mean of 0.
Independence: All values of the DV are assumed to be independent.
Linearity: The relationship being modeled is assumed to be linear.
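Two of these checks, collinearity (VIF and tolerance) and independent errors (Durbin-Watson), can be illustrated in a few lines of Python; the deliberately correlated predictors x1 and x2 are invented for the example:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(11)
n = 200
d = pd.DataFrame({"x1": rng.normal(size=n)})
d["x2"] = 0.8 * d["x1"] + rng.normal(0, 0.6, n)   # deliberately correlated predictors
d["x3"] = rng.normal(size=n)
d["y"] = d["x1"] + 0.5 * d["x2"] - 0.3 * d["x3"] + rng.normal(size=n)

X = sm.add_constant(d[["x1", "x2", "x3"]])
fit = sm.OLS(d["y"], X).fit()

# Collinearity: VIF per predictor (worry above 10); tolerance is 1/VIF (worry below 0.1)
vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(dict(zip(["x1", "x2", "x3"], np.round(vifs, 2))))

# Independent errors: the Durbin-Watson statistic should be close to 2
print(durbin_watson(fit.resid))
```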

Multiple Regression Using SPSS Record2.sav

Estimates: Provides the estimated coefficients of the regression model, their test statistics, and significance values.
Confidence intervals: A useful tool for assessing the likely value of the regression coefficients in the population.
Model fit: Omnibus test of the model's ability to predict the DV.
R-squared change: The change in R² resulting from the inclusion of a new predictor.
Descriptives: Table of means, standard deviations, number of observations, and the correlation matrix.
Part and partial correlations: Produces zero-order correlations, partial correlations, and part correlations between each predictor and the DV.
Collinearity diagnostics: VIF (variance inflation factor), tolerance, eigenvalues of the scaled, uncentred cross-products matrix, condition indexes, and variance proportions.
Durbin-Watson: Tests the assumption of independent errors.
Casewise diagnostics: Lists the observed value of the outcome, the predicted value of the outcome, the difference between these values, and this difference standardized.

Interpreting Multiple Regression What can we learn from examining the correlations between the predictors?

Multiple Regression: Model Summary
The Durbin-Watson statistic should be close to 2; a value less than 1 or greater than 3 poses a problem.

Multiple Regression: Model Parameters

Multiple Regression: Casewise Diagnostics
Allows us to examine the residual statistics for extreme cases. We changed the default criterion from 3 to 2. Given a sample of 200, we would expect about 5% of cases (roughly 10) to have standardized residuals greater than approximately +/- 2 standard deviations.

Multiple Regression: ChildAgression.sav
A study was carried out to explore the relationship between aggression and several potential predictor variables in 666 children who had an older sibling. The potential predictor variables measured were:
Parenting_Style (high score = bad parenting)
Computer_Games (high score = more time spent playing computer games)
Television (high score = more time spent watching television)
Diet (high score = the child has a good diet)
Sibling_Aggression (high score = more aggression in the older sibling)
Past research indicated that parenting style and sibling aggression were good predictors of levels of aggression in younger children. All other variables were treated in an exploratory fashion. How will you analyze these data?

Past research indicated that parenting style and sibling aggression were good predictors of aggression, so these should be entered in Block 1.

How did you decide to add the three remaining variables? Hierarchically or simultaneously? Did the word problem provide you with any hints?

Multiple Regression: Syntax Be sure to check the Syntax to make sure you selected the desired analysis options.

Multiple Regression: Descriptive Statistics

Multiple Regression: Correlation Results Is multicollinearity a problem? How can you tell?

Multiple Regression: Summary of Model

Multiple Regression: Regression Coefficients
Collinearity diagnostics: The VIF (variance inflation factor) indicates whether a predictor has a strong linear relationship with the other predictors. No value should be larger than 10, and the average VIF should not be substantially greater than 1.
Tolerance: The reciprocal of the VIF; values below 0.1 indicate serious problems.
Partial correlations: Relationships between each predictor and the outcome variable, controlling for the effects of the other predictors.
Part correlations: The relationship between each predictor and the outcome, controlling for the effect the other predictors have on the outcome. In other words, the unique relationship each predictor has with the outcome.

Multiple Regression: Casewise Diagnostics
"Extreme" cases are those with standardized residuals less than -2 or greater than +2. We would expect 95% of cases to have standardized residuals within about +/- 2. In our sample, 36 of 666 cases are extreme, a rate of 5.4%.

Multiple Regression: Reporting the Results The ANOVA for the full model was significant, F(5,660)=11.88, p<.01. As illustrated in the model summary, the linear combination of the complete set of predictors (i.e., sibling aggression, parenting style, use of computer games, good diet, time spent watching television) accounted for a moderate portion of the variance in aggression, R2 = .08. The significant R2-change following the addition of use of computer games, good diet, time spent watching television, F(3,660)=7.03, p<.01, indicates these predictors explained an additional 3% of the variance in aggression beyond that explained by sibling aggression and parenting style.

Multiple Regression: Reporting the Results B SE B  t Sig. Block 1 Constant -.01 .01 -0.48 .63 Parenting Style .06 .19** 5.06 .00 Sibling Aggression .09 .04 .10* 2.49 .02 Block 2 -0.42 .68 .18** 3.89 .08 .08* 2.11 Time Watching TV .03 .05 0.72 .48 Use of Computer Games .14 .15** 3.85 Good Diet -.11 -.12** -2.87 An analysis of the regression coefficients for the full model showed that all predictors except for time watching TV contributed significantly to the model (p’s < .05). As shown in the table above, parenting style, use of computer games, and sibling aggression were positively related to aggression, whereas good diet was negatively related to aggression.