Lab 4 Multiple Linear Regression
Meaning
An extension of simple linear regression.
It models the mean of a response variable as a linear function of several explanatory variables.
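Written out, the model described above might look like the following. The symbols (the coefficients \beta_j, the explanatory variables X_1, ..., X_k, and the count k) are notation assumed here for illustration, not taken from the lab handout.

```latex
\[
\mu(Y \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k
\]
```

With k = 1 this reduces to simple linear regression.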
Ways of analysis
Matrix of scatterplots
Matrix of correlations
Regression:
  fit the model (variable selection);
  interpret the model, t-test and F-test in regression;
  prediction;
  diagnostics (linearity, constant variance, normality, independence, outliers).
The response and the independent variables
The response: iq
The independent variables:
  MILK: 0 = no breast milk, 1 = yes
  FEM: 0 = male kid, 1 = female
  WEEKS: weeks in ventilation
  SOCIAL: mum’s social class, 1, 2, 3, 4, with 1 being the highest
  RANK: birth order of the kid
  EDUC: mum’s education level, 1, 2, 3, 4, 5, with 5 being the highest
Matrix of scatterplots
Correlation among iq, weeks, social, educ, rank
Matrix of correlations
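As a rough illustration of what a matrix of correlations contains, the sketch below computes pairwise Pearson correlations in plain Python. The helper names (`pearson`, `correlation_matrix`) and the numbers are made up for this example and are not the lab data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up values, for illustration only (not the lab data).
data = {
    "iq":    [92, 105, 98, 110, 87, 101],
    "weeks": [6, 1, 3, 1, 9, 2],
    "educ":  [2, 4, 3, 5, 1, 3],
}
corr = correlation_matrix(data)
for a in data:
    print(a.ljust(6), " ".join(f"{corr[(a, b)]:6.2f}" for b in data))
```

The diagonal is always 1 and the matrix is symmetric, so only the entries above (or below) the diagonal need to be read.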
Regression: fit the model
Procedure: Analyze > Regression > Linear
Methods of determining independent variables
Methods (details in instruction 4 P18)
Enter: The model is obtained with all specified variables. This is the default method.
Stepwise
Remove
Backward: The variables are removed from the model one by one if they meet the criterion for removal (a maximum significance level or a minimum F value).
Forward:
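To make the Backward method concrete, here is a minimal sketch of backward elimination using a minimum-F removal rule. It is not the software's own procedure: the function names, the threshold `f_remove=4.0`, and the data are assumptions chosen for illustration, and the least-squares fit uses the normal equations for brevity.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def backward_eliminate(data, y, f_remove=4.0):
    """Repeatedly drop the variable with the smallest partial F below f_remove."""
    kept = list(data)
    n = len(y)
    while kept:
        full = sse([data[k] for k in kept], y)
        df_resid = n - (len(kept) + 1)  # +1 for the intercept
        partial_f = {}
        for name in kept:
            reduced = sse([data[k] for k in kept if k != name], y)
            partial_f[name] = (reduced - full) / (full / df_resid)
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_remove:
            break  # every remaining variable passes the criterion
        kept.remove(weakest)
    return kept

# Made-up data: y depends on x1 only; x2 is unrelated to y.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, 0, 0, 1, 1, 0, 0, 1],
}
y = [5.1, 7.8, 11.15, 13.95, 16.9, 20.2, 22.85, 26.05]
kept = backward_eliminate(data, y)
print("variables kept:", kept)
```

The sketch assumes the residual sum of squares is never exactly zero and that there are more observations than coefficients. A significance-level criterion would compare each partial F with an F-distribution quantile instead of a fixed number.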
Regression: interpret the model
Interpretation of the output
1. Variables entered/removed
2. Model summaries (R, R^2)
3. ANOVA test (F-test)
Note on F-test
To test the overall significance of the model
Its null distribution: F-distribution
To further construct the extra-sum-of-squares F-test
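For reference, the two F statistics mentioned above can be written as follows. The notation is assumed here rather than taken from the lab: n is the number of observations, p the number of regression coefficients including the intercept, SS_res a residual sum of squares, and df a residual degrees of freedom.

```latex
% Overall F-test, H_0: all slope coefficients are zero
\[
F = \frac{(\mathrm{SS}_{\text{total}} - \mathrm{SS}_{\text{res}}) / (p - 1)}
         {\mathrm{SS}_{\text{res}} / (n - p)}
  \sim F_{p-1,\; n-p} \quad \text{under } H_0
\]

% Extra-sum-of-squares F-test, full model vs. reduced model
\[
F = \frac{(\mathrm{SS}_{\text{res}}^{\text{reduced}} - \mathrm{SS}_{\text{res}}^{\text{full}})
          / (\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}})}
         {\mathrm{SS}_{\text{res}}^{\text{full}} / \mathrm{df}_{\text{full}}}
  \sim F_{\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}},\; \mathrm{df}_{\text{full}}}
  \quad \text{under } H_0
\]
```

The overall F-test is the special case in which the reduced model contains only the intercept.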
4. Coefficients (estimation, t-test, CI of coefficients)
t-test in the i-th row
CI of coefficients
Note on t-test and CI of coefficients
t-test
  To test the significance of a single independent variable, given the other variables in the model
  Can be one-sided
  Its null distribution: t-distribution
95% CI of coefficients
  An estimate of the range of the coefficient with 95% confidence, i.e. the plausible range for the change in the mean of Y per 1-unit increase in the corresponding X, with the other variables held fixed
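In symbols, the t statistic and the 95% confidence interval for the i-th coefficient are usually written as below. The notation is assumed for illustration: SE denotes the standard error reported in the coefficients table, n the number of observations, and p the number of regression coefficients including the intercept.

```latex
\[
t = \frac{\hat\beta_i}{\mathrm{SE}(\hat\beta_i)} \sim t_{n-p}
\quad \text{under } H_0\colon \beta_i = 0
\]
\[
\hat\beta_i \pm t_{n-p}(0.975)\, \mathrm{SE}(\hat\beta_i)
\]
```

Here t_{n-p}(0.975) is the 97.5th percentile of the t-distribution with n - p degrees of freedom.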
Regression: prediction
Point estimation
Confidence interval of the mean (CI)
Prediction interval of one observation (PI)
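The difference between the two intervals can be seen in their standard forms, sketched below. The notation is assumed here: \hat\mu_0 is the fitted mean at the chosen values of the explanatory variables, \hat\sigma^2 the estimated residual variance, and n - p the residual degrees of freedom.

```latex
\[
\text{CI for the mean: } \quad
\hat\mu_0 \pm t_{n-p}(0.975)\, \mathrm{SE}(\hat\mu_0)
\]
\[
\text{PI for one observation: } \quad
\hat\mu_0 \pm t_{n-p}(0.975)\, \sqrt{\hat\sigma^2 + \mathrm{SE}(\hat\mu_0)^2}
\]
```

Both intervals share the same point estimate; the prediction interval is wider because it also accounts for the variability of a single observation around its mean.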
Multiple regression: diagnostics
Obtain plots to check the validity of the assumptions
Linearity: residuals vs predicted value (Y) / explanatory variable (X)
Constant variance: residuals vs predicted value (Y) / explanatory variable (X)
Normality: QQ plot of residuals
Independence: residuals versus the time order of the observations
Outliers and influential observations:
What is an influential observation?
An observation is influential if removing it markedly changes the estimated coefficients of the regression model.
An outlier may be an influential observation.
To identify outliers and/or influential observations
Studentized residuals: A case may be considered an outlier if the absolute value of its studentized residual exceeds 2.
Leverage values: A leverage larger than 2p/n, where p is the number of regression coefficients and n is the number of observations, implies that the observation has a high potential for influence.
Cook’s distances: If Cook’s distance is close to or larger than 1, the case may be considered influential.
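The three case statistics above can be computed by hand for a small data set. The sketch below is one way to do it in plain Python, assuming internally studentized residuals and taking p as the number of coefficients including the intercept; the function names and the data are made up for illustration.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def diagnostics(columns, y):
    """Leverage, studentized residual and Cook's distance for each case."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(rows[0])  # number of coefficients, intercept included
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma = math.sqrt(sum(e * e for e in resid) / (n - p))
    cases = []
    for r, e in zip(rows, resid):
        # Leverage h_i = x_i' (X'X)^{-1} x_i, via one linear solve per case.
        h = sum(v * z for v, z in zip(r, solve(xtx, r)))
        student = e / (sigma * math.sqrt(1.0 - h))
        cook = student ** 2 * h / (p * (1.0 - h))
        cases.append({"leverage": h, "studentized": student, "cook": cook})
    return cases, p

# Made-up data: the last case has an unusual x value.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 25.0]
cases, p = diagnostics([x], y)
n = len(y)
for i, c in enumerate(cases, start=1):
    flags = []
    if abs(c["studentized"]) > 2:
        flags.append("possible outlier")
    if c["leverage"] > 2 * p / n:
        flags.append("high leverage")
    if c["cook"] >= 1:
        flags.append("possibly influential")
    print(i, f"h={c['leverage']:.3f}", f"r={c['studentized']:.2f}",
          f"D={c['cook']:.2f}", ", ".join(flags))
```

The leverages always sum to p, which is why 2p/n (twice the average leverage) serves as the cut-off.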
Miscellaneous
Multicollinearity: it exists if the correlation between independent variables is close to or higher than 0.85 in absolute value
Remember to use Ln(WEEKS) from Question 5
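A quick screen for multicollinearity along these lines is sketched below: it log-transforms a weeks variable and lists the pairs of independent variables whose correlation reaches 0.85 in absolute value. The function names and values are made up for illustration, and the log transform assumes every weeks value is positive.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.85):
    """List variable pairs whose absolute correlation is at or above threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Made-up values, for illustration only (not the lab data).
weeks = [1, 2, 4, 8, 16, 3]
data = {
    "ln_weeks": [math.log(w) for w in weeks],  # natural log; needs weeks > 0
    "social":   [1, 2, 3, 4, 2, 3],
    "educ":     [5, 4, 2, 1, 4, 3],
}
flagged = collinear_pairs(data)
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.2f}")
```

A flagged pair suggests that the two variables carry overlapping information, so their individual coefficients should be interpreted with care.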
Miscellaneous
Understand the meaning of the 95% CI of coefficients
Identify the “full model” and the “reduced model” when doing the extra-sum-of-squares F-test
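As a small worked illustration of the full-model and reduced-model idea, the sketch below fits both models by least squares and forms the extra-sum-of-squares F statistic. The function names and the data are hypothetical, and the fit uses the normal equations for brevity; the reduced model must be nested in the full model.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def extra_ss_f(full, reduced, y):
    """Extra-sum-of-squares F statistic with its numerator and denominator df."""
    n = len(y)
    df_full = n - (len(full) + 1)        # residual df of the full model
    df_reduced = n - (len(reduced) + 1)  # residual df of the reduced model
    df_extra = df_reduced - df_full
    sse_full, sse_reduced = sse(full, y), sse(reduced, y)
    f_stat = ((sse_reduced - sse_full) / df_extra) / (sse_full / df_full)
    return f_stat, df_extra, df_full

# Made-up data. Full model: intercept + x1 + x2. Reduced model: intercept only.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 0, 0, 1, 1, 0, 0, 1]
y = [5.1, 7.8, 11.15, 13.95, 16.9, 20.2, 22.85, 26.05]
f_stat, df_extra, df_full = extra_ss_f([x1, x2], [], y)
print(f"F = {f_stat:.1f} on {df_extra} and {df_full} degrees of freedom")
```

The statistic would then be compared with the F-distribution on the two degrees of freedom shown; a large value favours the full model.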