Published by Samantha Hunter. Modified over 9 years ago.
1
Multiple regression
2
Example: Brain and body size predictive of intelligence?
Sample of n = 38 college students.
Response (Y): intelligence, based on the PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.
Predictor (X_1): brain size, based on MRI scans (given as count/10,000).
Predictor (X_2): height in inches.
Predictor (X_3): weight in pounds.
3
Scatter matrix plots
Scatter plots of the response versus each predictor help in determining the nature and strength of the relationships.
Scatter plots of predictor versus predictor help in studying the relationships among the predictors, as well as in identifying the scope of the model and outliers.
4
Scatter matrix plot
5
Matrix plot in Minitab Select Graph >> Matrix plot … Specify all of the variables (response and predictors) you want graphed. Select OK.
6
Correlation matrix

Correlations: PIQ, MRI, Height, Weight

           PIQ     MRI  Height
MRI      0.378
Height  -0.093   0.588
Weight   0.003   0.513   0.700

Cell Contents: Pearson correlation
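Each cell of the matrix above is a Pearson correlation, which can also be computed outside Minitab. The following is a minimal sketch in Python using only the standard library. The function name `pearson_r` is hypothetical, and the five (MRI, Height) pairs are just the first five rows of the data listing given later in this presentation, so the printed value is illustrative and will not equal the full-sample correlation of 0.588.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# First five rows only (illustration, not the full n = 38 sample).
mri = [81.69, 103.84, 96.54, 95.15, 92.88]
height = [64.5, 73.3, 68.8, 65.0, 69.0]
print(round(pearson_r(mri, height), 3))
```

Applying the same function to every pair of columns would fill in the lower triangle of the matrix.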
7
Correlation matrix in Minitab
Stat >> Basic statistics >> Correlation…
Select all of the variables (response and predictors).
To get a “crisper” table, deselect the default “Display p-values” option.
8
Linear regression model with 3 predictors
Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3 + ε_i
where:
Y_i = intelligence (PIQ) of student i
X_i1 = brain size of student i (MRI)
X_i2 = height of student i (Height)
X_i3 = weight of student i (Weight)
ε_i = random error term for student i
9
Fitting multiple regression model in Minitab It’s basically the same as fitting a simple linear regression model. Stat >> Regression >> Regression… Select response and all predictors. Specify all desired options as you would for simple linear regression.
10
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998

How likely is it that b_3 = 0.0006 would be as extreme as it is (?!) if β_3 = 0?
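The T column in this output is each coefficient divided by its standard error, which is the test statistic for the null hypothesis that the corresponding β_k is 0. A short Python sketch that recomputes the column from the printed Coef and SE Coef values (the variable names are illustrative, and small differences from Minitab come from rounding in the printed inputs):

```python
# Coefficients and standard errors copied from the Minitab output above.
estimates = {
    "Constant": (111.35, 62.97),
    "MRI": (2.0604, 0.5634),
    "Height": (-2.732, 1.229),
    "Weight": (0.0006, 0.1971),
}

# t statistic for H0: beta_k = 0 is the estimate over its standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name:<9} T = {t:6.2f}")
```

The statistic for Weight is essentially 0, which is why its P-value (0.998) is so close to 1.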
11
Confidence intervals for β_k
Sample estimate ± margin of error

Predictor    Coef  SE Coef     T      P
Weight     0.0006   0.1971  0.00  0.998
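As a worked illustration of "sample estimate ± margin of error", the sketch below takes the margin of error to be a t multiplier times SE Coef, the usual form for a regression coefficient. The multiplier 2.0322 is an assumed table value of t(0.975; 34), with 34 = n - p = 38 - 4 error degrees of freedom for the three-predictor model; the function name `conf_int` is hypothetical.

```python
# Assumed t-table value t(0.975; 34); error df = n - p = 38 - 4 = 34.
T_CRIT = 2.0322

# Coefficients and standard errors from the three-predictor Minitab output.
estimates = {
    "Constant": (111.35, 62.97),
    "MRI": (2.0604, 0.5634),
    "Height": (-2.732, 1.229),
    "Weight": (0.0006, 0.1971),
}

def conf_int(coef, se, t_crit=T_CRIT):
    """95% confidence interval: estimate +/- t multiplier * standard error."""
    margin = t_crit * se
    return coef - margin, coef + margin

intervals = {name: conf_int(coef, se) for name, (coef, se) in estimates.items()}

for name, (lower, upper) in intervals.items():
    print(f"{name:<9} ({lower:8.3f}, {upper:8.3f})")
```

Under these assumptions the interval for Weight is roughly (-0.400, 0.401). It is centered almost exactly at 0, which agrees with the P-value of 0.998.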
12
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor     Coef  SE Coef      T      P
Constant    111.28    55.87   1.99  0.054
MRI         2.0606   0.5466   3.77  0.001
Height     -2.7299   0.9932  -2.75  0.009

S = 19.51   R-Sq = 29.5%   R-Sq(adj) = 25.5%

R-Sq: coefficient of (multiple) determination
R-Sq(adj): adjusted coefficient of (multiple) determination
13
Coefficient of (multiple) determination
Basically the same as before. R² = SSR/SSTO = proportionate reduction in total variation in Y associated with using the set of X_1, …, X_{p-1} variables.
Again, a large R² value does not necessarily imply that the fitted model is a useful one.
14
Adjusted coefficient of multiple determination
Problem: adding more X variables can only increase R² (or leave it unchanged), because SSTO never changes for a given set of data, while the remaining error (quantified by SSE) can only get smaller (or stay the same) when more predictor variables are considered.
Solution: adjust R² to take into account the number of predictors in the model.
15
Adjusted coefficient of multiple determination
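For reference, the standard textbook definition of the adjusted coefficient divides each sum of squares by its degrees of freedom. Here n is the number of observations and p is the number of regression coefficients including the intercept, which is assumed to match the X_1, …, X_{p-1} notation used for R²:

```latex
R_a^2 \;=\; 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}
      \;=\; 1 - \frac{SSE/(n-p)}{SSTO/(n-1)}
      \;=\; 1 - \frac{MSE}{SSTO/(n-1)}
```

Because SSTO/(n-1) is fixed for a given data set, R_a^2 rises only when a new predictor lowers MSE.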
16
PIQ = 111 + 2.06 MRI - 2.73 Height

S = 19.51   R-Sq = 29.5%   R-Sq(adj) = 25.5%

Analysis of Variance

Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

Calculation of R²(adj):
Interpretation of R²(adj):
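The summary line of this output can be recovered from the analysis of variance table alone. A minimal Python sketch, assuming the usual definitions R² = SSR/SSTO, R²(adj) = 1 - ((n-1)/(n-p)) SSE/SSTO and S = sqrt(MSE); the variable names are illustrative:

```python
# Values copied from the analysis of variance table above.
n = 38           # number of students
p = 3            # regression coefficients: intercept, MRI, Height
ssr = 5572.7     # regression sum of squares
sse = 13321.8    # error sum of squares
ssto = 18894.6   # total sum of squares

r_sq = ssr / ssto
r_sq_adj = 1 - ((n - 1) / (n - p)) * (sse / ssto)
s = (sse / (n - p)) ** 0.5   # square root of MSE

print(f"S = {s:.2f}")                  # S = 19.51
print(f"R-Sq = {r_sq:.1%}")            # R-Sq = 29.5%
print(f"R-Sq(adj) = {r_sq_adj:.1%}")   # R-Sq(adj) = 25.5%
```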
17
Impact of the adjustment
It’s a trade-off. R-Sq(adj) may even become smaller when another predictor variable is introduced into the model.

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height
S = 19.51   R-Sq = 29.5%   R-Sq(adj) = 25.5%

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight
S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%
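The drop from 25.5% to 23.3% can be traced to S alone. Writing the adjusted coefficient as 1 - MSE/(SSTO/(n-1)) with MSE = S², a larger S must give a smaller R-Sq(adj), since SSTO/(n-1) does not depend on the model. A sketch in Python, assuming SSTO = 18894.6 from the analysis of variance table for these data; the function name `r_sq_adj_from_s` is hypothetical:

```python
SSTO = 18894.6   # total sum of squares for the PIQ data (from the ANOVA table)
N = 38           # number of students

def r_sq_adj_from_s(s, ssto=SSTO, n=N):
    """Adjusted R^2 written as 1 - MSE / (SSTO / (n - 1)), with MSE = s**2."""
    return 1 - s ** 2 / (ssto / (n - 1))

two_pred = r_sq_adj_from_s(19.51)     # model with MRI and Height
three_pred = r_sq_adj_from_s(19.79)   # model with MRI, Height and Weight

print(f"MRI + Height:          {two_pred:.1%}")
print(f"MRI + Height + Weight: {three_pred:.1%}")
```

Adding Weight barely reduces SSE but uses up one error degree of freedom, so MSE (and S) goes up and R-Sq(adj) goes down.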
18
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Analysis of Variance

Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

Is there a relationship between the response variable and the set of predictor variables?
How likely is it that the sample would yield such an extreme F-statistic if the null hypothesis were true?
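The F-statistic is MSR/MSE, and its P-value is the upper-tail probability of an F distribution with (2, 35) degrees of freedom. The Python sketch below avoids a statistics library by using a closed form that holds only when the numerator degrees of freedom equal 2, namely P(F > f) = (1 + 2f/d)^(-d/2) for d denominator degrees of freedom. For any other numerator degrees of freedom a proper F distribution routine would be needed.

```python
# Mean squares and error degrees of freedom from the ANOVA table above.
msr = 2786.4
mse = 380.6
df_error = 35

f_stat = msr / mse

# Upper-tail probability of F(2, df_error).
# This closed form is valid ONLY for 2 numerator degrees of freedom.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"F = {f_stat:.2f}")     # F = 7.32
print(f"P = {p_value:.3f}")    # P = 0.002
```

A P-value this small says that an F-statistic as large as 7.32 would be very unlikely if both slope parameters were 0.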
19
Caution when predicting or estimating response
20
What is scope of model?
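One simple first check on scope is whether each predictor value of a new observation lies inside the range observed in the sample. The Python sketch below uses the smallest and largest MRI and Height values from the n = 38 data listing in this presentation. The function name `outside_marginal_range` and the MRI value 120.0 are hypothetical. Passing this check is necessary but not sufficient, because a combination of values can fall outside the joint region covered by the data even when each value is within its own range.

```python
# Observed (min, max) of each predictor among the n = 38 students.
OBSERVED_RANGE = {"MRI": (79.06, 107.95), "Height": (62.0, 77.0)}

def outside_marginal_range(new_obs, observed_range=OBSERVED_RANGE):
    """Return the names of predictors whose new value is outside the observed range."""
    flagged = []
    for name, value in new_obs.items():
        low, high = observed_range[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged

print(outside_marginal_range({"MRI": 91.0, "Height": 68.0}))    # []
print(outside_marginal_range({"MRI": 120.0, "Height": 68.0}))   # ['MRI']
```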
21
Predicted Values for New Observations

New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        113.16    3.21  (106.64, 119.68)   (73.02, 153.30)
2        108.99    4.33  (100.19, 117.78)   (68.41, 149.56)

Values of Predictors for New Observations

New Obs   MRI  Height
1        91.0    68.0
2        85.0    65.0

S = 19.51
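The Fit, CI and PI columns can be reproduced from the fitted equation and the reported SE Fit. In this Python sketch the coefficients are those of the two-predictor model (111.28, 2.0606, -2.7299), SE Fit is taken from the Minitab output rather than computed, and the multiplier 2.0301 is an assumed table value of t(0.975; 35). The interval formulas are the usual ones: Fit ± t × SE Fit for the mean response and Fit ± t × sqrt(S² + SE Fit²) for a new response. The function name `predict` is hypothetical.

```python
import math

B0, B_MRI, B_HEIGHT = 111.28, 2.0606, -2.7299   # fitted coefficients
S = 19.51                                        # square root of MSE
T_CRIT = 2.0301                                  # assumed t(0.975; 35) from a t table

def predict(mri, height, se_fit):
    """Return (fit, 95% CI for the mean response, 95% PI for a new response)."""
    fit = B0 + B_MRI * mri + B_HEIGHT * height
    ci_margin = T_CRIT * se_fit
    pi_margin = T_CRIT * math.sqrt(S ** 2 + se_fit ** 2)
    return (fit,
            (fit - ci_margin, fit + ci_margin),
            (fit - pi_margin, fit + pi_margin))

for mri, height, se_fit in [(91.0, 68.0, 3.21), (85.0, 65.0, 4.33)]:
    fit, ci, pi = predict(mri, height, se_fit)
    print(f"Fit {fit:.2f}  CI ({ci[0]:.2f}, {ci[1]:.2f})  PI ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider than the confidence interval because it also carries the variation of an individual student around the mean, measured by S.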
22
Diagnostics and remedial measures Most procedures carry directly over (with minor modification) from simple linear regression to multiple linear regression. But, some procedures are specific only to multiple linear regression (chapters 9, 10)
23
Residuals against each predictor Gives an indication of the adequacy of the regression function with respect to each specific predictor variable.
25
Unusual Observations

Obs  MRI     PIQ    Fit  SEFit  Residual  StResid
13    86  147.00  95.31   5.34     51.69    2.75R

R denotes an observation with a large standardized residual.
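The Residual and StResid entries for observation 13 can be checked by hand. The Python sketch below assumes that StResid is the internally studentized residual, that is, the residual divided by sqrt(S² - SEFit²), and that S = 19.51 comes from the model with MRI and Height. Both assumptions are consistent with the printed values. The function name `standardized_residual` is hypothetical.

```python
import math

S = 19.51   # square root of MSE for the model with MRI and Height

def standardized_residual(observed, fit, se_fit, s=S):
    """Return (residual, residual / its estimated standard deviation)."""
    residual = observed - fit
    # Estimated variance of a residual is MSE minus the variance of the fit.
    se_residual = math.sqrt(s ** 2 - se_fit ** 2)
    return residual, residual / se_residual

residual, st_resid = standardized_residual(observed=147.00, fit=95.31, se_fit=5.34)
print(f"Residual = {residual:.2f}, StResid = {st_resid:.2f}")
```

A standardized residual of about 2.75 means the observed PIQ lies almost three estimated standard deviations above the fitted value.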
28
Residuals versus omitted predictors
As usual. Also consider plotting residuals against interaction terms, such as X_1 X_2, because they too are potentially important omitted variables.
29
Regression interaction terms in Minitab
Use the calculator to create a new variable (MRI*Ht).
Select Calc >> Calculator.
Specify “Store result in variable”: MRI*Ht.
Specify Expression: MRI*Height.
Select OK.
The new (interaction) predictor variable will appear in the worksheet.
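Outside Minitab, the same interaction column is just the elementwise product of the two predictors. A minimal Python sketch, using the first three (MRI, Height) rows of the data listing in this presentation for illustration; the variable name `mri_x_height` is hypothetical:

```python
# First three rows of the data listing (illustration only).
mri = [81.69, 103.84, 96.54]
height = [64.5, 73.3, 68.8]

# Interaction predictor: product of MRI and Height for each student.
mri_x_height = [m * h for m, h in zip(mri, height)]

for m, h, product in zip(mri, height, mri_x_height):
    print(f"{m:7.2f} * {h:5.1f} = {product:10.3f}")
```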
32
Modified Levene test
MRIGrp 1: MRI ≤ 90.5
MRIGrp 2: MRI > 90.5
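The grouping above feeds the modified Levene (Brown-Forsythe) test for constant error variance: residuals are split by MRI at 90.5, absolute deviations from each group's median residual are computed, and the two sets of deviations are compared with a pooled two-sample t statistic. The Python sketch below shows that calculation. The residuals and MRI values in the example are made-up toy numbers, because the fitted residuals are not listed in this presentation, and the function names are hypothetical.

```python
import math
from statistics import mean, median

def split_by_threshold(residuals, mri, threshold=90.5):
    """Group 1: MRI <= threshold; group 2: MRI > threshold."""
    low = [e for e, m in zip(residuals, mri) if m <= threshold]
    high = [e for e, m in zip(residuals, mri) if m > threshold]
    return low, high

def modified_levene_t(group1, group2):
    """Pooled two-sample t statistic on absolute deviations from group medians."""
    med1, med2 = median(group1), median(group2)
    d1 = [abs(e - med1) for e in group1]
    d2 = [abs(e - med2) for e in group2]
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    pooled_var = (sum((d - m1) ** 2 for d in d1)
                  + sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Toy data (hypothetical, for illustration only).
toy_residuals = [1, 2, 3, -4, 0, 4]
toy_mri = [80, 85, 90, 95, 100, 105]

low, high = split_by_threshold(toy_residuals, toy_mri)
print(round(modified_levene_t(low, high), 4))
```

The statistic is compared with a t distribution on n - 2 degrees of freedom; a large absolute value suggests that the error variance differs between the two MRI groups.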
33
Lack-of-fit (LOF) test
Requires at least some repeated observations, that is, observations that share the same values on all predictor variables.
(X_1 = 59, X_2 = 63) and (X_1 = 59, X_2 = 63) is an example of a repeat.
(X_1 = 59, X_2 = 63) and (X_1 = 59, X_2 = 66) is not an example of a repeat.
34
Row     MRI  Height      Row     MRI  Height
  1   81.69    64.5       21   79.06    62.0
  2  103.84    73.3       22   95.50    68.0
  3   96.54    68.8       23   83.18    63.0
  4   95.15    65.0       24   93.55    72.0
  5   92.88    69.0       25   79.86    68.0
  6   99.13    64.5       26  106.25    77.0
  7   85.43    66.0       27   79.35    63.0
  8   90.49    66.3       28   86.67    66.5
  9   95.55    68.8       29   85.78    62.5
 10   83.39    64.5       30   94.96    67.0
 11  107.95    70.0       31   99.79    75.5
 12   92.41    69.0       32   88.00    69.0
 13   85.65    70.5       33   83.43    66.5
 14   87.89    66.0       34   94.81    66.5
 15   86.54    68.0       35   94.94    70.5
 16   85.22    68.5       36   89.40    64.5
 17   94.51    73.5       37   93.00    74.0
 18   80.80    66.3       38   93.59    75.5
 19   88.91    70.0
 20   90.59    76.5
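Whether a lack-of-fit test is possible comes down to whether any (MRI, Height) combination occurs more than once in this listing. A short Python sketch of that check, using the 38 rows above; the function name `replicate_groups` is hypothetical:

```python
from collections import Counter

# MRI and Height for rows 1-38 of the listing above.
mri = [81.69, 103.84, 96.54, 95.15, 92.88, 99.13, 85.43, 90.49, 95.55, 83.39,
       107.95, 92.41, 85.65, 87.89, 86.54, 85.22, 94.51, 80.80, 88.91, 90.59,
       79.06, 95.50, 83.18, 93.55, 79.86, 106.25, 79.35, 86.67, 85.78, 94.96,
       99.79, 88.00, 83.43, 94.81, 94.94, 89.40, 93.00, 93.59]
height = [64.5, 73.3, 68.8, 65.0, 69.0, 64.5, 66.0, 66.3, 68.8, 64.5,
          70.0, 69.0, 70.5, 66.0, 68.0, 68.5, 73.5, 66.3, 70.0, 76.5,
          62.0, 68.0, 63.0, 72.0, 68.0, 77.0, 63.0, 66.5, 62.5, 67.0,
          75.5, 69.0, 66.5, 66.5, 70.5, 64.5, 74.0, 75.5]

def replicate_groups(*predictors):
    """Map each predictor combination that occurs more than once to its count."""
    counts = Counter(zip(*predictors))
    return {combo: k for combo, k in counts.items() if k > 1}

print(replicate_groups(mri, height))          # {}
print(replicate_groups([59, 59], [63, 63]))   # {(59, 63): 2}
print(replicate_groups([59, 59], [63, 66]))   # {}
```

Several Height values repeat, but every MRI value is distinct, so no two students share both predictor values.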
35
Attempted LOF test

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Analysis of Variance

Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

No replicates. Cannot do pure error test.