Multiple Regression Dr. Andy Field

Aims
- Understand when to use multiple regression.
- Understand the multiple regression equation and what the betas represent.
- Understand different methods of regression: hierarchical, stepwise, forced entry.
- Understand how to do a multiple regression in PASW/SPSS.
- Understand how to interpret a multiple regression.
- Understand the assumptions of multiple regression and how to test them.

What is Multiple Regression?
Linear regression is a model for predicting the value of one variable from another. Multiple regression is a natural extension of this model: we use it to predict values of an outcome from several predictors. It is a hypothetical model of the relationship between several variables.

Regression: An Example
A record company boss was interested in predicting record sales from advertising.
Data: 200 different album releases.
- Outcome variable: sales (CDs and downloads) in the week after release.
- Predictor variables: the amount (in £s) spent promoting the record before release (see last lecture), and the number of plays on the radio (new variable).

The Model with One Predictor

Multiple Regression as an Equation
With multiple regression the relationship is described using a variation of the equation of a straight line. An equation can be derived in which each predictor variable has its own coefficient, and the outcome variable is predicted from a combination of all the variables multiplied by their respective coefficients, plus a residual term:

Yi = b0 + b1X1i + b2X2i + … + bnXni + εi

Y is the outcome variable, b1 is the coefficient of the first predictor (X1), b2 is the coefficient of the second predictor (X2), bn is the coefficient of the nth predictor (Xn), and εi is the difference between the predicted and the observed value of Y for the ith subject. The model fitted is more complicated than with one predictor, but the basic principle is the same as simple regression: we seek the linear combination of predictors that correlates maximally with the outcome variable.
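To make the equation concrete, here is a minimal Python sketch (not part of the original slides) that estimates b0, b1 and b2 by least squares for a toy version of the album-sales example; all numbers are invented for illustration.

```python
import numpy as np

# Invented miniature stand-in for the album-sales data:
# advertising spend (in £1000s), radio plays, and first-week sales.
adverts = np.array([10.3, 98.5, 48.9, 200.5, 14.1, 120.0])
airplay = np.array([43.0, 28.0, 35.0, 12.0, 26.0, 40.0])
sales = np.array([330.0, 120.0, 360.0, 270.0, 220.0, 400.0])

# Design matrix with a leading column of 1s for the intercept b0.
X = np.column_stack([np.ones_like(adverts), adverts, airplay])

# Least-squares estimates: the b that minimises the squared residuals.
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = b
print(f"sales = {b0:.1f} + {b1:.3f}*adverts + {b2:.2f}*airplay")

# The residual term for each case: observed minus predicted.
residuals = sales - X @ b
```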

b0
b0 is the intercept: the value of the Y variable when all Xs = 0. This is the point at which the regression plane crosses the Y-axis (vertical).

Beta Values
b1 is the regression coefficient for variable 1; bn is the regression coefficient for the nth variable.

The Model with Two Predictors
[Figure: the regression plane for two predictors, labelled with the intercept b0 and the slopes bAdverts and bAirplay]

Methods of Regression
- Hierarchical: the experimenter decides the order in which variables are entered into the model.
- Forced entry: all predictors are entered simultaneously.
- Stepwise: predictors are selected using their semi-partial correlation with the outcome.

Hierarchical Regression
Known predictors (based on past research) are entered into the regression model first. New predictors are then entered in a separate step/block. The experimenter makes the decisions.

Hierarchical Regression
It is the best method:
- It is based on theory testing.
- You can see the unique predictive influence of a new variable on the outcome, because known predictors are held constant in the model.
Bad point: it relies on the experimenter knowing what they're doing!

Forced Entry Regression
All variables are entered into the model simultaneously. The results obtained depend on the variables entered into the model, so it is important to have good theoretical reasons for including each one.

Stepwise Regression I
Variables are entered into the model based on a mathematical criterion: the computer selects variables in steps. In step 1, SPSS looks for the predictor that can explain the most variance in the outcome variable.

We're trying to explain exam performance in terms of revision time, previous exam performance, and difficulty of the exam. First, look at the shared variance between exam performance and each of the three predictors. Q: How do we do this? A: Pearson's correlation.
[Figure: Venn diagram of the variance in exam performance accounted for by revision time (33.1%), previous exam (1.7%) and exam difficulty (1.3%)]

Stepwise Regression II
Having selected the first predictor, a second one is chosen from the remaining predictors. The semi-partial correlation is used as the criterion for selection.
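SPSS's stepwise procedure uses significance tests of the change in F for entry and removal; the sketch below shows only the core idea, adding at each step the remaining predictor with the largest squared semi-partial correlation (equivalently, the largest gain in R²). The 0.01 stopping threshold is an arbitrary assumption, not SPSS's rule.

```python
import numpy as np

def r_squared(X, y):
    """R^2 for an OLS fit of y on X (X includes an intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_stepwise(predictors, y, names):
    """Greedy forward selection: each step adds the remaining predictor
    whose gain in R^2 (its squared semi-partial correlation) is largest."""
    n = len(y)
    X = np.ones((n, 1))              # start from the intercept-only model
    current_r2, chosen = 0.0, []
    remaining = list(range(predictors.shape[1]))
    while remaining:
        gain, j = max((r_squared(np.column_stack([X, predictors[:, k]]), y)
                       - current_r2, k) for k in remaining)
        if gain < 0.01:              # crude stopping rule (assumption)
            break
        X = np.column_stack([X, predictors[:, j]])
        current_r2 += gain
        chosen.append(names[j])
        remaining.remove(j)
        print(f"step {len(chosen)}: added {names[j]}, R^2 = {current_r2:.3f}")
    return chosen
```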

Partial and Semi-Partial Correlation
A partial correlation measures the relationship between two variables, controlling for the effect that a third variable has on them both. A semi-partial correlation measures the relationship between two variables, controlling for the effect that a third variable has on only one of them.

Semi-Partial Correlation
[Figure: Venn diagrams using revision, exam performance and exam anxiety to contrast the partial correlation with the semi-partial correlation]

Semi-Partial Correlation in Regression
The semi-partial correlation measures the relationship between a predictor and the outcome, controlling for the relationship between that predictor and any others already in the model. It measures the unique contribution of a predictor to explaining the variance of the outcome.
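Both quantities can be computed directly from regression residuals, which makes the distinction concrete. A minimal Python sketch (function names are my own): the partial correlation removes the third variable z from both x and y, while the semi-partial removes it from x only.

```python
import numpy as np

def residualise(v, z):
    """Residuals of v after regressing it on z (with an intercept),
    i.e. the part of v that z cannot explain."""
    Z = np.column_stack([np.ones_like(z), z])
    b, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ b

def partial_and_semipartial(y, x, z):
    x_res = residualise(x, z)                      # z removed from x
    y_res = residualise(y, z)                      # z removed from y
    partial = np.corrcoef(y_res, x_res)[0, 1]      # z removed from both
    semi_partial = np.corrcoef(y, x_res)[0, 1]     # z removed from x only
    return partial, semi_partial
```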

Problems with Stepwise Methods
- They rely on a mathematical criterion: variable selection may depend on only slight differences in the semi-partial correlation.
- These slight numerical differences can lead to major theoretical differences.
- Stepwise methods should be used only for exploration.

Doing Multiple Regression
[SPSS screenshots: the Linear Regression dialog]

Regression Statistics

Regression Diagnostics

Output: Model Summary

R and R²
- R: the correlation between the observed values of the outcome and the values predicted by the model.
- R²: the proportion of variance accounted for by the model.
- Adjusted R²: an estimate of R² in the population (shrinkage).
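The three statistics are easy to compute by hand, which helps show how they relate. A sketch assuming a design matrix X whose first column is the intercept; the adjustment uses the usual shrinkage formula adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1).

```python
import numpy as np

def model_summary(X, y):
    """Return R, R^2 and adjusted R^2 for an OLS model.
    X is the design matrix including the intercept column."""
    n, p = X.shape
    k = p - 1                                 # number of predictors
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # shrinkage estimate
    return np.sqrt(r2), r2, adj_r2
```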

Output: ANOVA

Analysis of Variance: ANOVA
The F-test looks at whether the variance explained by the model (SSM) is significantly greater than the error within the model (SSR). It tells us whether using the regression model is significantly better at predicting values of the outcome than using the mean.
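In terms of sums of squares, F = (SSM / dfM) / (SSR / dfR), with dfM = k (the number of predictors) and dfR = n − k − 1. A sketch of the computation (scipy is used only for the p-value):

```python
import numpy as np
from scipy import stats

def regression_f_test(X, y):
    """F-test of the regression model against the mean-only model.
    X is the design matrix including the intercept column."""
    n, p = X.shape
    k = p - 1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_r = resid @ resid                     # residual sum of squares (SSR)
    ss_t = ((y - y.mean()) ** 2).sum()       # total sum of squares (SST)
    ss_m = ss_t - ss_r                       # model sum of squares (SSM)
    f = (ss_m / k) / (ss_r / (n - k - 1))
    return f, stats.f.sf(f, k, n - k - 1)    # F and its p-value
```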

Output: Betas

How to Interpret Beta Values
Beta values: the change in the outcome associated with a unit change in the predictor. Standardised beta values tell us the same thing, but expressed in standard deviations.

Beta Values
b1 = 0.087: as advertising increases by £1, record sales increase by 0.087 units.
b2 = 3589: each additional play of a song on Radio 1 per week increases its sales by 3589 units.

Constructing a Model

Standardised Beta Values
β1 = 0.523: as advertising increases by 1 standard deviation, record sales increase by 0.523 of a standard deviation.
β2 = 0.546: when the number of plays on radio increases by 1 standard deviation, sales increase by 0.546 standard deviations.
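Standardised betas are just the unstandardised b values rescaled by the standard deviations, βj = bj × sXj / sY, so they can be recovered without refitting the model. A quick sketch (here X holds the predictors only, without an intercept column):

```python
import numpy as np

def standardised_betas(X, y):
    """Convert unstandardised b values to standardised betas:
    beta_j = b_j * SD(predictor j) / SD(outcome)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
```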

Interpreting Standardised Betas
As advertising increases by one standard deviation (£485,655), record sales increase by 0.523 × 80,699 ≈ 42,206 units. If the number of plays on Radio 1 per week increases by one standard deviation (12 plays), record sales increase by 0.546 × 80,699 ≈ 44,062 units.

Reporting the Model

How Well Does the Model Fit the Data?
There are two ways to assess the accuracy of the model in the sample:
- Residual statistics: standardized residuals.
- Influential cases: Cook's distance.

Standardized Residuals
In an average sample, 95% of standardized residuals should lie between ±2, and 99% should lie between ±2.5. Any case for which the absolute value of the standardized residual is 3 or more is likely to be an outlier.
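A rough sketch of these checks in Python; dividing each residual by the root mean squared error approximates SPSS's ZRESID:

```python
import numpy as np

def check_standardized_residuals(X, y):
    """Standardised residuals with the slide's rule-of-thumb cut-offs.
    X is the design matrix including the intercept column."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    z = resid / np.sqrt(resid @ resid / (n - p))    # residual / RMSE
    print(f"within +/-2:   {np.mean(np.abs(z) < 2):.0%} (expect ~95%)")
    print(f"within +/-2.5: {np.mean(np.abs(z) < 2.5):.0%} (expect ~99%)")
    return np.flatnonzero(np.abs(z) >= 3)           # likely outliers
```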

Cook's Distance
Measures the influence of a single case on the model as a whole. Cook and Weisberg (1982) suggest that absolute values greater than 1 may be cause for concern.
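In Python, statsmodels computes Cook's distance directly from a fitted model; a sketch on simulated data (the data are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(50, 2)))      # intercept + 2 predictors
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(size=50)

results = sm.OLS(y, X).fit()
cooks_d = results.get_influence().cooks_distance[0]

# Rule of thumb from the slide: values above 1 deserve a closer look.
print("cases with Cook's distance > 1:", np.flatnonzero(cooks_d > 1))
```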

Generalization
When we run a regression, we hope to be able to generalize the sample model to the entire population. To do this, several assumptions must be met. Violating these assumptions stops us from generalizing our conclusions to the target population.

Straightforward Assumptions
- Variable type: the outcome must be continuous; predictors can be continuous or dichotomous.
- Non-zero variance: predictors must not have zero variance.
- Linearity: the relationship we model is, in reality, linear.
- Independence: each value of the outcome should come from a different person.

The More Tricky Assumptions
- No multicollinearity: predictors must not be highly correlated.
- Homoscedasticity: for each value of the predictors, the variance of the error term should be constant.
- Independent errors: for any pair of observations, the error terms should be uncorrelated.
- Normally distributed errors.

Multicollinearity
Multicollinearity exists when predictors are highly correlated. This assumption can be checked with collinearity diagnostics.

Collinearity Diagnostics
- Tolerance should be more than 0.2 (Menard, 1995).
- VIF should be less than 10 (Myers, 1990).
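Because tolerance is simply 1/VIF, both checks can be run in one pass. A sketch using statsmodels (X is a matrix of predictor columns only; the cut-offs above are hard-coded):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_diagnostics(X, names):
    """Print VIF and tolerance for each predictor, flagging any that
    break the VIF < 10 (Myers) or tolerance > 0.2 (Menard) rules."""
    exog = np.asarray(sm.add_constant(X))
    for i, name in enumerate(names, start=1):   # column 0 is the constant
        vif = variance_inflation_factor(exog, i)
        tol = 1.0 / vif
        flag = "  <-- check" if vif > 10 or tol < 0.2 else ""
        print(f"{name}: VIF = {vif:.2f}, tolerance = {tol:.3f}{flag}")
```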

Checking Assumptions about Errors
- Homoscedasticity / independence of errors: plot ZRESID against ZPRED.
- Normality of errors: normal probability plot.
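A sketch of both plots using matplotlib and statsmodels, assuming a fitted OLS result object called results (as in the Cook's distance sketch above):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

def assumption_plots(results):
    """Left: standardised residuals vs standardised predicted values
    (should look like a shapeless cloud). Right: normal probability plot."""
    fitted = results.fittedvalues
    z_pred = (fitted - fitted.mean()) / fitted.std()
    z_resid = results.get_influence().resid_studentized_internal

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(z_pred, z_resid)
    ax1.axhline(0, linestyle="--")
    ax1.set(xlabel="ZPRED", ylabel="ZRESID", title="Homoscedasticity check")

    sm.qqplot(results.resid, line="45", fit=True, ax=ax2)
    ax2.set_title("Normal probability plot")
    plt.show()
```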

Regression Plots

Homoscedasticity: ZRESID vs. ZPRED
[Figure: example plots of ZRESID against ZPRED, one labelled good and one labelled bad]

Normality of Errors: Histograms
[Figure: example residual histograms, one labelled good and one labelled bad]

Normality of Errors: Normal Probability Plot
[Figure: example normal probability plots, one labelled good and one labelled bad]