Lecture 6: Multiple Regression

1 Lecture 6: Multiple Regression
Laura McAvinue, School of Psychology, Trinity College Dublin

2 Previous Lectures
Relationship between two variables.
Correlation: a measure of the strength of association between two variables.
Simple linear regression: a measure of the ability of one variable (X) to predict the other variable (Y). It computes a regression equation that describes the relationship between the response variable (Y) and the predictor variable (X) by expressing Y as a function of X.

3 Multiple Regression
Used when there is more than one predictor variable. Two purposes: (1) to predict Y, given a combination of predictor variables; (2) to assess the relative importance of each predictor variable in explaining the response variable Y.

4 Regression Equations
Simple Linear Regression: Y = a + bX
Multiple Regression: Y = a + b1X1 + b2X2 + … + bkXk
b1 = regression coefficient for the first predictor variable, X1
b2 = regression coefficient for the second predictor variable, X2
a = intercept, the value of Y when all predictor variables are 0
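
To make the multiple regression equation concrete, here is a minimal Python sketch that estimates a, b1 and b2 by least squares using only the standard library. The data are made up so that the true values are known (a = 1, b1 = 2, b2 = 3), and the helper name `fit_ols` is an assumption of this sketch, not something taken from the lecture or from SPSS.

```python
def fit_ols(columns, y):
    """Least-squares estimates [a, b1, ..., bk] for Y = a + b1X1 + ... + bkXk."""
    # Design matrix: a leading 1 for the intercept, then one value per predictor.
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_ols([x1, x2], y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the made-up Y values contain no noise, the estimates recover the generating coefficients up to rounding. With real data the fit would not be exact, and perfectly collinear predictors would make the elimination step fail.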

5 Statistical Models
Running a regression analysis is not a simple matter of inputting data, clicking a button and obtaining a ‘fixed’ model of the data. You create the model of your data, and this is a subjective process in many respects: you shape the model you create. Your job is to create the model that best describes the data.

6 Multiple Regression
Assessing the relative contribution of each predictor variable to the response variable: Which variable contributes most? Which is the second biggest predictor? Which variables don’t seem to contribute to prediction?
Problem: the order in which you enter the variables into the analysis influences the model. The variable entered first is attributed more variance. By the time the last variable is entered, there might be very little variance left to explain.

7 Multiple Correlation
The predictor variables are correlated with each other and with the response variable. Which predictor variable gets credit for this shared variance?
[Diagram: variance in Y related to X1; variance in Y related to X2; variance in Y related to the shared variance between X1 & X2. Which variable gets credit, X1 or X2?]

8 Different Methods of Multiple Regression
Hierarchical Regression; Entry / Standard Regression; Sequential Methods (Forward Addition, Backward Selection, Stepwise); Combinatorial Approach

9 Hierarchical Regression
You decide the order in which the variables are entered, based on theory / prior research. This allows you to assess whether each predictor adds anything to the model, given the predictors that are already in the model.
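
The idea of asking what a predictor adds, given what is already in the model, can be sketched for the two-predictor case. The Python below uses made-up data and the standard formula for R² with two predictors, written in terms of the three pairwise correlations. The function names are assumptions of this sketch. It enters the variables in both orders to show that the variable entered first is credited with the variance the two predictors share.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R squared for Y regressed on X1 and X2, from the three correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Hypothetical data in which X1 and X2 overlap considerably.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 5, 7, 6, 8]
y = [3, 4, 6, 7, 9, 12, 12, 15]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
r2_full = r2_two_predictors(r_y1, r_y2, r_12)

# Order A: X1 entered first, then X2.
step1_a, gain_a = r_y1**2, r2_full - r_y1**2
# Order B: X2 entered first, then X1.
step1_b, gain_b = r_y2**2, r2_full - r_y2**2

print(f"X1 first: R2 = {step1_a:.3f}, then X2 adds {gain_a:.3f}")
print(f"X2 first: R2 = {step1_b:.3f}, then X1 adds {gain_b:.3f}")
```

Both orders end at the same total R². Only the split of the credit between the two predictors changes, which is why the order of entry should be justified by theory.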

10 Entry / Standard Regression
The computer package enters all predictor variables into the model simultaneously and creates a regression equation including all predictor variables. This allows us to assess the unique contribution of each predictor variable when all other variables are held constant.
Advantage: it is easy to see which variables significantly predict the response variable.
Disadvantage: it may not create the best model for predicting Y, as it will include variables that don’t significantly predict Y.

11 Sequential Models
Aim to create the ‘best model’: the combination of variables that best predicts the response variable. They build several models in a series of steps, adding or deleting variables at each step, depending on their contribution to predicting the response variable. The final model includes only variables which significantly and uniquely predict the response variable.

12 Sequential Methods: Forward Addition
Begins with only one variable in the model: the variable with the highest correlation (r) with the response variable. It then adds the variable with the next highest contribution and continues to add variables until no remaining variable makes a significant contribution to the response variable over and above the variables already in the equation.
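
Forward addition can be sketched in Python as a loop that keeps adding the predictor that raises R² the most. This is only an illustration under stated assumptions: the data are made up, and a minimum R² gain (`min_gain`) stands in for the significance test that a statistics package would apply at each step. The names `fit_ols`, `r_squared` and `forward_selection` are hypothetical helpers.

```python
def fit_ols(columns, y):
    """Least-squares estimates [a, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def r_squared(columns, y):
    """Proportion of variance in y explained by the fitted model."""
    coef = fit_ols(columns, y)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - (coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns)))) ** 2
        for i, yi in enumerate(y)
    )
    return 1 - ss_res / ss_tot


def forward_selection(predictors, y, min_gain=0.01):
    """Add predictors one at a time while R squared rises by at least min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[c] for c in chosen + [name]], y)
                  for name in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best_r2 < min_gain:
            break  # no remaining variable adds enough
        chosen.append(candidate)
        remaining.remove(candidate)
        best_r2 = scores[candidate]
    return chosen, best_r2


# Hypothetical data: y depends on x1 and x2 only; x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3, 1, 2, 3, 1, 2],
}
y = [1 + 2 * u + 3 * v for u, v in zip(predictors["x1"], predictors["x2"])]

chosen, best_r2 = forward_selection(predictors, y)
print(chosen, round(best_r2, 4))
```

In these data x2 has the highest correlation with y, so it enters first. x1 then enters, and x3 is left out because it adds nothing.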

13 Sequential Methods
Backward Selection: begins with all predictor variables in the model and successively deletes variables until only significant ones remain.
Stepwise Regression: similar to the previous two but more versatile. It generally moves forward, adding significant variables, but can move backward to eliminate a variable if it no longer significantly predicts the response when another variable is added.

14 Sequential Methods: Drawbacks
Inclusion in the model depends on a mathematical criterion rather than psychological theory or research. Variable selection could depend upon tiny differences in correlation between each predictor variable and the response variable, so slight numerical differences could lead to major differences in theoretical interpretation. Results are difficult to replicate.

15 Combinatorial Methods
Best Subsets Method: computes models with all possible combinations of the predictor variables and chooses the model that explains the most variance in the response variable.
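
A best-subsets search can be sketched by fitting every non-empty combination of predictors and ranking the models. One assumption here goes beyond the slide: raw R² never decreases when a predictor is added, so this sketch ranks models by adjusted R² instead, which penalises extra predictors. The data and helper names are made up for illustration.

```python
from itertools import combinations


def fit_ols(columns, y):
    """Least-squares estimates [a, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def r_squared(columns, y):
    """Proportion of variance in y explained by the fitted model."""
    coef = fit_ols(columns, y)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - (coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns)))) ** 2
        for i, yi in enumerate(y)
    )
    return 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """R squared penalised for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data: y is roughly 1 + 2*x1 + 3*x2 plus small noise; x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3, 1, 2, 3, 1, 2],
}
y = [9.1, 7.9, 19.2, 17.8, 29.1, 27.9]

results = {}
for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        r2 = r_squared([predictors[name] for name in subset], y)
        results[subset] = adjusted_r2(r2, len(y), k)

best = max(results, key=results.get)
print(best, round(results[best], 4))
```

With three predictors there are only seven candidate models. The count doubles with every added predictor, so the approach becomes expensive quickly.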

16 Critical Considerations for MR
Sample size; distribution requirements (residuals must be normally distributed); outliers; multi-collinearity

17 Sample Size
The ratio of cases to predictors should be substantial. Stevens (1996) advised about 15 participants per predictor variable. Size matters: the more people in your sample, the better the chance of the results being replicated.
However, an even bigger ratio is needed when: the response variable has a skewed distribution; the measures have poor reliability (substantial measurement error reduces the size of the true relationships between variables); stepwise methods are used (45-50 participants per predictor).

18 Residuals
Recall the Method of Least Squares: it fits the regression line by minimising the prediction error of the line, i.e. it minimises the sum of squares of the residuals, Σ(Y - Y’)².
It fits a line of the form Y = a + b1X1 + b2X2 + … + bkXk + e
Assumes: Y = Fit + noise
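
The least-squares idea can be checked numerically. The sketch below uses a single predictor for brevity (an assumption made to keep the code short) and made-up data. It computes the least-squares line in closed form, then shows that nudging either coefficient increases the sum of squared residuals.

```python
def sse(a, b, x, y):
    """Sum of squared residuals for the line Y' = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Closed-form least-squares estimates for one predictor.
mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # the "noise" part e
best = sse(a, b, x, y)

# Any small change to a or b gives a larger sum of squares.
worse = [sse(a + da, b + db, x, y)
         for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]

print(round(a, 3), round(b, 3), round(best, 4))
print([round(w, 4) for w in worse])
```

The residuals of a least-squares fit with an intercept also sum to zero, which matches the assumption that the noise has a mean of 0.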

19 Residuals
The Method of Least Squares models the noise (e) in the data using the normal distribution: it assumes the noise is normally distributed with a mean of 0 and variance σ². If this assumption is violated, the results of your regression analysis may not be valid. You need to check this by plotting the standardised residuals: Histogram; Normal Probability Plot.

20 Histogram

21 Normal Probability Plot
Plots the residual value that was obtained for each data point (observed) against the value you would expect if the residuals were normally distributed (expected). The points should fall along a straight diagonal line.
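
The coordinates of a normal probability plot can be computed by hand, as in this sketch. It makes two assumptions: the residuals are made up, and the expected values use the plotting position (i + 0.5) / n, which is one of several common conventions and may differ from what a given package uses.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals from a regression.
residuals = [-1.2, 0.4, 0.1, -0.3, 2.0, -0.8, 0.6, -0.5, 0.9, -1.2]

# Observed: standardised residuals, sorted from smallest to largest.
m, s = mean(residuals), stdev(residuals)
observed = sorted((r - m) / s for r in residuals)

# Expected: normal quantiles at evenly spaced cumulative probabilities.
n = len(observed)
expected = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

for exp, obs in zip(expected, observed):
    print(f"expected {exp:6.2f}   observed {obs:6.2f}")
```

If the residuals are roughly normal, the observed values track the expected ones closely, which is what the straight diagonal line on the plot shows.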

22 Outliers
Data points that lie far from the rest of the data and have large residuals. They have a big influence on the regression analysis. You can check for outliers using: scatterplots examining the relationship between the response variable and each predictor variable separately; casewise diagnostics in SPSS; plots of the standardised residuals.

23 Plot of Standardised Residuals
Plot of Standardized Predicted Values by Studentised Deleted Residuals (each residual divided by its standard deviation, which is calculated with the data point in question left out). Based on the assumption of normality, about 99.7% of residuals should lie within +3 and -3 standard deviations. Any point outside this range is an outlier.
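
The ±3 standard deviation rule can be illustrated in a few lines of Python. The standardised residuals below are made up, and plain standardised values are used in place of studentised deleted residuals to keep the sketch short.

```python
from statistics import NormalDist

# Share of a normal distribution lying within 3 standard deviations of the mean.
z = NormalDist()
share_within_3 = z.cdf(3) - z.cdf(-3)
print(round(share_within_3, 4))  # 0.9973

# Hypothetical standardised residuals; flag any case beyond +/-3.
std_resid = [0.3, -1.1, 2.4, -0.7, 3.4, 0.2, -2.9, 1.5]
outliers = [(i, r) for i, r in enumerate(std_resid) if abs(r) > 3]
print(outliers)  # [(4, 3.4)]
```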

24 Multi-Collinearity
Occurs when predictor variables are highly correlated with one another: high bivariate correlations (.7 / .8 or above) or a high multivariate correlation. It is not a desired feature of the dataset: some predictor variables are redundant and, statistically, it leads to unstable results.

25 Multi-Collinearity
To assess whether multi-collinearity is present:
Examine the bivariate correlations between predictor variables.
Tolerance statistic: 1 - R², where R is the multiple correlation between each predictor variable and all the others. If tolerance is low, the multiple correlation must be high and multi-collinearity is a problem.
Solution: leave out one of the predictor variables, or combine two highly correlated predictor variables.

26 Let’s take an example
We are interested in a theory which suggests that a person’s level of optimism (X1) and the social support (X2) that he/she has in his/her life predict how long he/she will survive (Y) after being diagnosed with cancer.
Three steps to Regression Analysis:
A. Examine the relationship between the predictor and response variables separately
B. Perform and interpret the multiple regression
C. Assess the appropriateness of the regression analysis

27 Let’s take an example
Open the following dataset: Software / Kevin Thomas / Multiple Regression Dataset
Run correlations between: Survival & Optimism; Survival & Social Support

28 Create Scatterplots & fit regression line
Graphs / Scatter / Simple Scatter / Y = Survival, X = Predictor Variable
Fit regression line: double click on the chart, then Elements / Fit line at total

29 Step 2: The Multiple Regression
Analyse, Regression, Linear
Dependent variable: Survival
Independent variables: Social, optimism
Method: Enter (gives a standard multiple regression)
Statistics: Regression Coefficients (Estimates), Model fit, Descriptives

30 Answer the questions on your worksheet
1. Does this model (i.e. combination of social support and optimism) significantly predict the response variable (survival in months)? Yes, F (2, 199) = 67.73, p < .001

31 Answer the questions on your worksheet
2. What percentage of variance in the response variable, survival in months, is explained by this model? 40.1% (R Square adjusted). R Square adjusted is an estimate of the population proportion of variation in survival due to optimism & support; it penalises for the number of variables in the model.
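
How adjusted R Square penalises for the number of predictors can be seen from its usual formula, sketched below together with the F statistic for overall model fit. The sample size n = 202 and k = 2 predictors are chosen to be consistent with the F(2, 199) reported for this example (the second degrees of freedom are n - k - 1). The unadjusted R² of 0.40 is a hypothetical input, not a value taken from the output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """F for the overall model, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 202, 2   # consistent with F(2, 199)
r2 = 0.40       # hypothetical unadjusted R squared

print(round(adjusted_r2(r2, n, k), 4))
print(round(f_statistic(r2, n, k), 2))

# With the same R squared, more predictors mean a larger penalty.
print(round(adjusted_r2(r2, n, 10), 4))
```

The adjustment is small when the sample is large relative to the number of predictors, and grows as predictors are added.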

32 Answer the questions on your worksheet
3. Write the regression equation: Survival in months = a + 3.67(optimism) + b2(social support), where a is the intercept and b2 is the regression coefficient for social support

33 Answer the questions on your worksheet
4. What does this equation tell us about the relationship between months of survival and social support? As social support increases by one unit, survival in months increases by almost 13 months

34 Answer the questions on your worksheet
5. Do both variables significantly predict survival in months? Yes, for optimism, t = 10, p < .001 & for social support, t = 4.026, p < .001

35 Answer the questions on your worksheet
6. Which of the predictor variables contributes most to the response variable? Beta = standardized regression coefficient (B rescaled to standard deviation units: B × SD of predictor / SD of response). It can be used to compare the strength of contribution of the predictor variables. Optimism has a Beta value of .558 and so contributes more than social support, which has a Beta value of .225.
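
The relationship between an unstandardised coefficient B and its Beta can be illustrated with a short sketch. It uses a single predictor and made-up data (both assumptions made for brevity). Beta is computed two ways: by rescaling B with the standard deviations, and by re-fitting the regression on z-scores.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def zscores(values):
    """Convert values to standard deviation units."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical predictor and response on very different scales.
x = [2, 4, 5, 7, 8, 10]
y = [110, 150, 145, 190, 200, 240]

b = slope(x, y)                            # unstandardised B, in units of y per unit of x
beta = b * stdev(x) / stdev(y)             # rescaled to standard deviation units
beta_from_z = slope(zscores(x), zscores(y))

print(round(b, 3), round(beta, 3), round(beta_from_z, 3))
```

Because Beta is free of the original measurement units, Betas for different predictors in the same model can be compared directly, whereas the B values cannot.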

36 Answer the questions on your worksheet
7. Use the regression equation to make the following prediction: if a person has an optimism score of 10 and a social support score of 2, how long would you expect them to survive?
Survival in months = a + 3.67(optimism) + b2(social support)
Survival in months = a + 3.67(10) + b2(2)
Survival in months = 67.02
About 67 months

37 Answer the questions on your worksheet
8. What is the standard error of this prediction? 62.43 months

38 Step 3: Assess the appropriateness of the Analysis
Check: distribution of residuals; outliers; multi-collinearity.
Re-run the regression, but this time select:
Statistics: Collinearity Diagnostics; Residuals, casewise diagnostics (outliers outside 3 standard deviations)
Plots: Histogram; Normal Probability Plot; plot of Standardized Predicted Values (Y: ZPRED) by Studentized Deleted Residuals (X: SDRESID)

39 Distribution of Residuals

40 Outliers
All residuals lie within -3 and 3 standard deviations.
Note that you expect about 0.3% of cases to lie outside this range, so in a large sample, one or two such cases could be ok.

41 Outliers
All residuals lie within -3 and 3 standard deviations

42 Multi-Collinearity
The bivariate correlation between the predictors is low (r = .182) even though significant (p = .01). Tolerance is high, meaning that the multiple correlation is small, so multi-collinearity is not a feature of this dataset.
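
With only two predictors, the multiple correlation of one predictor with "all the others" is simply the bivariate correlation, so tolerance can be worked out directly from r = .182. The sketch below uses the usual definition of tolerance, 1 - R². The function name is hypothetical, and the second call uses a made-up correlation of .8 for contrast.

```python
def tolerance_two_predictors(r_12):
    """Tolerance (1 - R squared) when there are exactly two predictors."""
    return 1 - r_12**2


tol_example = tolerance_two_predictors(0.182)  # correlation reported for this dataset
tol_high_r = tolerance_two_predictors(0.8)     # hypothetical highly correlated pair

print(round(tol_example, 3))  # 0.967
print(round(tol_high_r, 3))   # 0.36
```

A tolerance close to 1 means the predictors share very little variance, which agrees with the conclusion that multi-collinearity is not a problem here.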

43 Summary
Multiple Regression: to predict Y given a combination of predictor variables; to assess the relative importance of each predictor variable in explaining the response variable.
Statistical modelling: different methods.
Three steps: examine the relationship between the predictor and response variables separately; perform and interpret the multiple regression; assess the appropriateness of the regression analysis.
There are a number of critical considerations.

