Set 8: Regression Model with Multiple Predictors
Multivariate Data
K+1 variables:
- One response (dependent) variable: y = price of an item purchased by the individual
- K explanatory (independent) variables: x1 = income of the individual, x2 = education of the individual, ..., xK = age of the individual
Data for individual i: (x1i, x2i, ..., xKi, yi)
Graphs and Summary Measures
- Matrix plot
- Correlation matrix
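A minimal sketch of both summaries in Python, assuming the data sit in a hypothetical file purchases.csv with the example columns price, income, education, and age:

```python
# Matrix plot and correlation matrix for multivariate data.
# `purchases.csv` and its column names are hypothetical examples.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("purchases.csv")
cols = ["price", "income", "education", "age"]

pd.plotting.scatter_matrix(df[cols], figsize=(8, 8))  # matrix plot
plt.show()

print(df[cols].corr())  # correlation matrix
```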
Multiple Linear Regression Model
yi = β0 + β1x1i + ... + βKxKi + εi
Response = Model + Error term
- xk is an explanatory (independent) variable
- β0 and β1, ..., βK are unknown parameters: β0 is the intercept; β1, ..., βK are the regression coefficients
- εi is the unknown error, with E(εi) = 0 and Var(εi) = σ²
Least Squares (LS) Estimation
- Estimate the unknown regression function μy|x = β0 + β1x1 + ... + βKxK by the LS regression equation ŷ = b0 + b1x1 + ... + bKxK
- The estimates are given in the regression output
- b0 is the LS estimate of the intercept parameter β0
- The LS model passes through the point of means (x̄1, ..., x̄K, ȳ)
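A fitting sketch using statsmodels, continuing the hypothetical DataFrame df above; the formula and column names are illustrative:

```python
# Fit the multiple regression by least squares and read off b0, b1, ..., bK.
import pandas as pd
import statsmodels.formula.api as smf

model = smf.ols("price ~ income + education + age", data=df).fit()
print(model.params)  # intercept b0 and coefficients b1, ..., bK

# The fitted plane passes through the point of means:
x_bar = pd.DataFrame([df[["income", "education", "age"]].mean()])
print(float(model.predict(x_bar).iloc[0]), df["price"].mean())  # equal
```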
Statistics for each coefficient βj
- Estimate bj for each coefficient βj
- Standard error of the estimate for each coefficient: SE(bj)
- T statistic for testing βj = 0 (whether xj can be dropped from the model)
Statistics that can be computed:
- Margin of error using the t table: ME(bj) = t × SE(bj), df = n − K − 1
- Interval estimate for βj: [bj − ME(bj), bj + ME(bj)]
- All values inside the interval are acceptable values of βj; any value outside the interval is not
- If zero is inside the interval, xj can be dropped from the model
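Continuing the sketch, the interval estimates can be read from the fitted model or rebuilt by hand from the t table; n and K refer to the hypothetical data above:

```python
# 95% interval estimates for each coefficient, built-in and by hand.
from scipy import stats

print(model.conf_int(alpha=0.05))  # built-in [b_j - ME, b_j + ME]

n, K = len(df), 3                            # K predictors in the example
t_crit = stats.t.ppf(0.975, df=n - K - 1)    # t table value, df = n - K - 1
me = t_crit * model.bse                      # ME(b_j) = t * SE(b_j)
print(model.params - me)                     # lower endpoints
print(model.params + me)                     # upper endpoints
```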
ANOVA Table: Sums of Squares
- SS Total = SS Error + SS Regression
- Degrees of freedom: n − 1 = (n − K − 1) + K
- Given in the regression output
ANOVA Table: Mean Squares and F
- Mean Square Error: MSE = SSE/(n − K − 1); s² = MSE is the LS estimate of σ²
- Standard error of regression: s = √MSE
- Mean Square Regression: MSR = SSR/K
- F-ratio (test for the regression relationship): F = MSR/MSE
- Null model H0: β1 = β2 = ... = βK = 0; df1 = K, df2 = n − K − 1
- Given in the regression output
Analysis of Variance (ANOVA) Table

Source       df         SS    MS    F ratio
Regression   K          SSR   MSR   MSR/MSE
Error        n − K − 1  SSE   MSE
Total        n − 1      SST
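The table entries can be reconstructed from the fitted model above; note that statsmodels names the residual sum of squares ssr, not SSR:

```python
# Rebuild the ANOVA table entries and verify SST = SSR + SSE and F = MSR/MSE.
n, K = len(df), 3
sst = model.centered_tss          # SS Total
sse = model.ssr                   # SS Error (statsmodels' "sum of squared residuals")
ssr = model.ess                   # SS Regression ("explained sum of squares")
msr, mse = ssr / K, sse / (n - K - 1)
print(sst, ssr + sse)             # equal
print(msr / mse, model.fvalue)    # equal: the F ratio from the output
```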
R² and Adjusted R²
- R² = SSR/SST: the fraction of variation of Y explained by the regression model
- R² never decreases as predictors are added to the model
- F-ratio in terms of R²: F = (R²/K) / ((1 − R²)/(n − K − 1))
- Adjusted R²: R²adj = 1 − (SSE/(n − K − 1)) / (SST/(n − 1)) takes the number of predictors into account
- Useful for comparing models with different numbers of predictors
- Given in the regression output
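A small sketch of how the three quantities are linked algebraically:

```python
# R^2, adjusted R^2, and the F-ratio expressed in terms of R^2
# for n observations and K predictors.
def r2_statistics(r2: float, n: int, K: int) -> tuple[float, float]:
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - K - 1)   # penalizes extra predictors
    f_ratio = (r2 / K) / ((1 - r2) / (n - K - 1))   # same F as MSR/MSE
    return r2_adj, f_ratio

print(r2_statistics(0.75, n=50, K=3))  # illustrative values
```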
The Special Case of K = 1
When there is only one predictor in the model (K = 1):
- R² = Corr²(Y, X)
- F ratio = (t ratio)²
- When df1 = 1, the F critical value for α = .05 equals the square of the t critical value for α = .025 with df2 degrees of freedom
Partial F Test
- Test for dropping more than one variable from the model
- Fit the model without the variables to be dropped (the reduced model)
- Partial F statistic: F = [(SSEreduced − SSEfull)/(Kfull − Kreduced)] / [SSEfull/(n − Kfull − 1)]
- df1 = Kfull − Kreduced; df2 = n − Kfull − 1 (the same df as SSEfull)
- Reject the reduced model if F is large
- Sequential SS: place the predictors to be tested last in the model
- Alternative formula using the R²'s of the full and reduced models: F = [(R²full − R²reduced)/df1] / [(1 − R²full)/df2]
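A sketch of the test, fitting full and reduced versions of the hypothetical model above (dropping education and age purely for illustration):

```python
# Partial F test: can `education` and `age` be dropped together?
import statsmodels.formula.api as smf
from scipy import stats

full = smf.ols("price ~ income + education + age", data=df).fit()
reduced = smf.ols("price ~ income", data=df).fit()

df1 = full.df_model - reduced.df_model   # K_full - K_reduced
df2 = full.df_resid                      # n - K_full - 1
f = ((reduced.ssr - full.ssr) / df1) / (full.ssr / df2)
p = stats.f.sf(f, df1, df2)
print(f, p)                              # reject the reduced model if F is large (p small)
```

statsmodels.stats.anova.anova_lm(reduced, full) performs the same comparison directly.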
Margins of Error for Predictions
- Margin of error using the t table: df = n − K − 1
- Prediction of the mean outcome: ŷ ± t × (SE of prediction of the mean outcome)
- Interval estimates are given as an option in the output
- Prediction of a single outcome: ŷ ± t × (SE of prediction of the single outcome)
- The interval for prediction of a single outcome is wider than the interval for the mean outcome
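Both intervals are available from the fitted model above; the new predictor values are hypothetical:

```python
# Interval for the mean outcome vs. interval for a single outcome.
import pandas as pd

new = pd.DataFrame({"income": [50000], "education": [16], "age": [35]})
pred = model.get_prediction(new)
print(pred.summary_frame(alpha=0.05))
# mean_ci_lower / mean_ci_upper : interval for the mean outcome
# obs_ci_lower / obs_ci_upper   : wider interval for a single outcome
```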
Residuals
- Fitted values: ŷi = b0 + b1x1i + ... + bKxKi
- LS residuals: ei = yi − ŷi
Residual Diagnostics
- The distribution of the residuals should be normal:
  - Distribution plot
  - Normal probability plot
  - Tests of normality
- The residuals must be patternless:
  - Time series (sequence) plot of the residuals
  - Plot of residuals against the fitted values
  - Plot of residuals against each xj
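A sketch of the standard checks for the model fitted above:

```python
# Residual diagnostics: normality and patternlessness.
import matplotlib.pyplot as plt
from scipy import stats

resid, fitted = model.resid, model.fittedvalues

stats.probplot(resid, dist="norm", plot=plt)   # normal probability plot
plt.show()

plt.scatter(fitted, resid)                     # should look patternless
plt.axhline(0, linestyle="--")
plt.xlabel("fitted values"); plt.ylabel("residuals")
plt.show()

print(stats.shapiro(resid))                    # one test of normality
```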