MULTIPLE REGRESSION Using more than one variable to predict another
Last Week
- Coefficient of Determination, r²: explained variance between 2 variables
- Simple linear regression, y = mx + b: predicting one variable from another
- Based on explained variance: if r² is large, the IV should be a good predictor
- Predicting one dependent variable from one independent variable
- SEE, residuals
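The simple-regression recap above can be sketched in a few lines of Python. The numbers here are made up purely for illustration; nothing below comes from the activity dataset.

```python
import numpy as np

# Made-up illustration data: x = predictor (IV), y = outcome (DV)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of y = mx + b
m, b = np.polyfit(x, y, 1)

# Coefficient of determination: r^2 = 1 - SS_residual / SS_total
y_hat = m * x + b
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

With these toy numbers the fit is nearly perfect (r² above 0.99), which is exactly the "large r² means a good predictor" idea from the slide.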
Tonight
Predicting one DV from one IV is simple linear regression. Predicting one DV from multiple IVs is called multiple linear regression. More IVs usually allow for a better prediction of the DV.
If IV A explains 20% of the variance (r² = 0.20) and IV B explains 30% of the variance (r² = 0.30), can I use both to predict the dependent variable?
Example: Activity Dataset
To demonstrate, we'll use the same data as last week, on the pedometer and armband.
Goal: to predict Armband calories (real calories expended) as accurately as possible.
Let's start by trying to predict Armband calories with body weight: complete a simple linear regression with body weight.
Simple Regression Here is the simple regression output from using Body Weight (kg) to predict Armband Calories
Simple Regression
Results using Body Weight (kg): r² = 0.155, SEE = 400.5 calories
Can we improve on this equation by adding in new variables? First, we have to determine whether other variables in the dataset might be related to Armband Calories. Use a correlation matrix.
Correlations
Notice several variables have some association with Armband calories:

Variable       r       r²
Height       0.225    0.05
Weight       0.393    0.15
BMI          0.378    0.14
PedSteps     0.782    0.61
PedCalories  0.853    0.73
Create new regression equation
A simple regression equation looks like: y = mx + b
A multiple regression equation looks like: y = m₁x₁ + m₂x₂ + b
Subscripts are used to help organize the data. All we are doing is adding an additional variable into our equation; that new variable will have its own slope, m₂. For the sake of simplicity, let's add in pedometer steps as x₂.
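Fitting y = m₁x₁ + m₂x₂ + b by ordinary least squares can be sketched with NumPy as below. The weight/steps/calorie values are invented for illustration only, not taken from the activity dataset.

```python
import numpy as np

# Invented illustration data: body weight (kg), pedometer steps, armband calories
x1 = np.array([60.0, 70.0, 80.0, 90.0, 65.0, 85.0])
x2 = np.array([8000.0, 6000.0, 10000.0, 9000.0, 7000.0, 11000.0])
y = np.array([1900.0, 1950.0, 2500.0, 2550.0, 1900.0, 2650.0])

# Design matrix: one column per IV, plus a column of ones for the intercept b
X = np.column_stack([x1, x2, np.ones_like(x1)])

# Least-squares solution of y = m1*x1 + m2*x2 + b
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
m1, m2, b = coefs

y_hat = X @ coefs  # predicted calories for each subject
```

Adding a third predictor is just one more column in the design matrix and one more slope in the solution vector.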
OUTPUT…
Multiple Regression Output
Simple to Multiple
Results using Body Weight (kg): r² = 0.155, SEE = 400.5 calories
Results using Body Weight and Pedometer Steps: r² = 0.672, SEE = 251.7 calories
r² change = 0.672 – 0.155 = 0.517
If 2 variables are good, would 3 be even better?
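The r² and SEE values above come straight out of the regression output; as a sketch, both can be computed from residuals. The n − k − 1 denominator for SEE (k = number of predictors) is the usual convention and is assumed here.

```python
import math

def r2_and_see(y, y_hat, k):
    """r^2 and standard error of estimate (SEE) from observed vs. predicted values.

    k = number of predictors; SEE uses the usual n - k - 1 denominator.
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / (n - k - 1))

# r^2 change from the slide: two-predictor model minus one-predictor model
r2_change = 0.672 - 0.155  # = 0.517
```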
Adding one more in…
In addition to body weight (x₁) and pedometer steps (x₂), let's add in age (x₃).
Multiple Regression Output 2
Simple to Multiple
Results using Body Weight (kg): r² = 0.155, SEE = 400.5 calories
Results using Body Weight and Pedometer Steps: r² = 0.672, SEE = 251.7 calories, r² change = 0.517
Results using Body Weight, PedSteps, and Age: r² = 0.689, SEE = 247.7, r² change = 0.689 – 0.672 = 0.017
Multiple Regression Decisions
Should we recommend that age be used in the model? These decisions can be difficult; "model building" (or "model reduction") is more of an art than a science. Consider:
- p-value of age in the model = 0.104
- r² change from adding age = 0.017, or 1.7% of variance
- More coefficients (predictors) make the model more complicated to use and interpret
Does it make sense to include age? Should age be related to caloric expenditure?
Other Regression Issues: Sample Size
With too small a sample, you lack the statistical power to generalize your results to other samples or the whole population, and you increase your risk of Type II error (failing to reject the null hypothesis when it is false). In multiple regression, the more variables you use in your model, the greater your risk of Type II error. This is a complicated issue, but essentially you need large samples to use several predictors. Guidelines…
Other Regression Issues: Sample Size
Tabachnick & Fidell (1996): N > 50 + 8m, where N = required sample size and m = number of IVs. So, if you use 3 predictors (like we just did in our example): 50 + 8×3 = 74 subjects.
You can find several different 'guess-timates'; I usually just try to have 30 subjects, plus another 30 for each variable in the model (i.e., N = 30 + 30m). I like to play it safe…
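The two rules of thumb on this slide are easy to encode; both are guidelines, not hard requirements.

```python
def n_tabachnick_fidell(m):
    """Tabachnick & Fidell (1996) guideline: N > 50 + 8m, m = number of IVs."""
    return 50 + 8 * m

def n_conservative(m):
    """The more conservative 30 + 30m rule of thumb from the slide."""
    return 30 + 30 * m

# For the 3-predictor model from the example:
# n_tabachnick_fidell(3) -> 74 subjects; n_conservative(3) -> 120 subjects
```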
Other Regression Issues
Multiple regression has the same statistical assumptions as correlation/regression: check for normal distribution, outliers, etc.
One new concern with multiple regression is collinearity: you have to be careful that your IVs (predictor variables) are not highly correlated with each other. Collinearity can cause a model to overestimate r², and can also cause one new variable to eliminate another.
Example: Collinearity
Results of MLR using Body Weight, PedSteps, and Age: r² = 0.689, SEE = 247.7
Imagine we want to add in one other variable, Pedometer Calories. Look at the correlation matrix first…
Notice that Armband calories is highly correlated with both Pedometer Steps and Pedometer Calories. Initially this looks great, because we might have two very good predictors to use. But notice that Pedometer Calories is very highly correlated with Pedometer Steps. These two variables are probably collinear: they are very similar and may not explain 'unique' variance.
Here is the MLR result with Weight, Steps, and Age. Here is the MLR result after adding Pedometer Calories to the model: Pedometer Calories becomes the only significant predictor. In other words, the variance in the other 3 variables can be explained by Pedometer Calories; not all 4 variables add 'unique' variance to the model.
Example: Collinearity
Results of MLR using Body Weight, PedSteps, and Age: r² = 0.689, SEE = 247.7
Results of MLR using Body Weight, PedSteps, Age, and PedCalories: r² = 0.745, SEE = 226.2
Results of MLR using just PedCalories (eliminates collinearity): r² = 0.727, SEE = 227.5
Which model is the best model? Remember, we'd like to pick the strongest model with the fewest predictor variables.
Model Building
Collinearity makes model building more difficult:
1) When you add in new variables, you have to look at r², r² change, and SEE, but you also have to notice what's happening to the other IVs in the model
2) Sometimes you need to remove variables that used to be good predictors
3) This is why the model with the most variables is not always the best model; sometimes you can do just as well with 1 or 2 variables
What to do about Collinearity?
Your approach: use a correlation matrix to examine the variables BEFORE you try to build your model.
1) Check the IVs' correlations with the DV (high correlations will probably be the best predictors), but…
2) Check the IVs' correlations with the other IVs (high correlations probably indicate collinearity)
If you do find that two IVs are highly correlated, be aware that having them both in the model is probably not the best approach (pick the best one and keep it).
QUESTIONS…?
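The pre-screening approach above can be sketched as a small helper that flags highly correlated IV pairs. The 0.8 cutoff and all of the data values below are illustrative assumptions, not rules from the slides.

```python
import numpy as np

def flag_collinear(X, names, threshold=0.8):
    """Return IV pairs whose pairwise |r| exceeds threshold (columns of X are IVs)."""
    r = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(r[i, j]), 3)))
    return flagged

# Invented data: pedometer calories track pedometer steps almost perfectly
weight = [60.0, 70.0, 80.0, 90.0, 65.0, 85.0]
steps = [8000.0, 6000.0, 10000.0, 9000.0, 7000.0, 11000.0]
ped_cals = [400.0, 300.0, 510.0, 450.0, 355.0, 545.0]
X = np.column_stack([weight, steps, ped_cals])
pairs = flag_collinear(X, ["weight", "steps", "ped_cals"])
# Only the steps/ped_cals pair is flagged; keep one of the two, as the slide suggests
```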
Upcoming…
- In-class activity on MLR
- Homework (not turned in, due to exam): Cronk Section 5.4; OPTIONAL: Holcomb Exercises 31 and 32 (multiple correlation, NOT full multiple linear regression; similar to MLR, but looks at the model's r instead of making a prediction equation)
- Mid-term exam next week
- Group differences after spring break (t-test, ANOVA, etc.)