Stat 112 Notes 14 Assessing the assumptions of the multiple regression model and remedies when assumptions are not met (Chapter 6).

Assumptions of the Multiple Linear Regression Model
1. Linearity: $E(Y \mid X_1 = x_1, \ldots, X_K = x_K) = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K$.
2. Constant variance: The standard deviation of Y for the subpopulation of units with $X_1 = x_1, \ldots, X_K = x_K$ is the same for all subpopulations.
3. Normality: The distribution of Y for the subpopulation of units with $X_1 = x_1, \ldots, X_K = x_K$ is normal for all subpopulations.
4. Independence: The observations are independent. [Independence can fail for time series data; we will cover this later.]

Assumptions for linear regression and their importance to inferences

| Inference | Assumptions that are important |
|---|---|
| Point prediction, point estimation | Linearity, independence |
| Confidence interval for slope, hypothesis test for slope, confidence interval for mean response | Linearity, constant variance, independence, normality (only if n < 30) |
| Prediction interval | Linearity, constant variance, independence, normality |
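
To make the model concrete, here is a minimal sketch (in Python with statsmodels, since these notes use JMP for the actual analyses; the data and coefficients are hypothetical) that simulates data satisfying all four assumptions and fits the multiple regression:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

# Two hypothetical explanatory variables
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)

# Linearity: E(Y | x1, x2) = 2 + 1.5*x1 - 0.8*x2.
# Constant variance, normality, independence: i.i.d. N(0, 3^2) errors.
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 3, n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # estimates should be near (2, 1.5, -0.8)
```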

Fast Food Chain Data

Checking Linearity
Plot the residuals versus each of the explanatory variables. Each of these plots should look like random scatter, with no pattern in the mean of the residuals. If a residual plot shows a problem, we can try transforming the x-variable and/or the y-variable.
Residual plot in JMP: use Fit Y by X with Y being the residuals; Fit Line will draw a horizontal line.

Residual Plots in JMP
After Fit Model, click the red triangle next to Response, select Save Columns, and then select Residuals. Use Fit Y by X with Y = Residuals and X the explanatory variable of interest. Fit Line will draw a horizontal line with intercept zero: it is a property of the residuals from multiple linear regression that a least squares regression of the residuals on any one of the explanatory variables has slope zero and intercept zero.
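
As a check on this zero-slope, zero-intercept property, a short continuation of the hypothetical sketch above regresses the saved residuals on one explanatory variable:

```python
# Continuing the sketch above: regress the residuals from the
# multiple regression on a single explanatory variable.
resid = fit.resid
resid_fit = sm.OLS(resid, sm.add_constant(x1)).fit()

# Intercept and slope are zero up to round-off, so the fitted line
# in the residual plot is the horizontal line at height zero.
print(resid_fit.params)
```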

Residual by Predicted Plot
Fit Model displays the Residual by Predicted Plot automatically in its output. This is a plot of the residuals versus the predicted Y's; we can think of the predicted Y's as summarizing all the information in the X's. As usual, we would like this plot to show random scatter.
- A pattern in the mean of the residuals as the predicted Y's increase indicates a problem with linearity. Look at the residual plots versus each explanatory variable to isolate the problem, and consider transformations.
- A pattern in the spread of the residuals indicates a problem with constant variance.
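
A sketch of this plot, continuing the same hypothetical example (matplotlib standing in for JMP's automatic display):

```python
import matplotlib.pyplot as plt

# Residuals versus predicted Y's, reusing `fit` from the sketch above.
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Y")
plt.ylabel("Residual")
plt.title("Residual by Predicted Plot")
plt.show()
```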

Corrections for Violations of the Linearity Assumption
When the residual plot shows a pattern in the mean of the residuals for one of the explanatory variables $X_j$, we should consider:
- Transforming $X_j$.
- Adding polynomial terms in $X_j$ (e.g., $X_j^2$, $X_j^3$).
- Transforming Y.
After making the transformation or adding polynomial terms (see the sketch below), we need to refit the model and look at the new residual plot versus $X_j$ to see whether linearity has been achieved.
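
For instance, a sketch of the polynomial remedy, continuing the hypothetical example (here pretending the residual plot against x1 had shown curvature):

```python
# Add a quadratic term in x1 and refit, then re-examine the
# residual plot against x1 for the new model.
X_quad = sm.add_constant(np.column_stack([x1, x1**2, x2]))
fit_quad = sm.OLS(y, X_quad).fit()
print(fit_quad.params)
```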

Quadratic Polynomials for Age and Income

Linearity now appears to be satisfied.

Checking Constant Variance Assumption
The residual plot versus each explanatory variable should exhibit constant spread. The residual plot versus the predicted values should also exhibit constant spread (this plot is often the most useful for detecting nonconstant variance).

Heteroscedasticity
When the requirement of constant variance is violated, we have a condition of heteroscedasticity. Diagnose heteroscedasticity by plotting the residuals against the predicted values $\hat{y}$. [Plot: residuals versus $\hat{y}$, with the spread of the residuals increasing as $\hat{y}$ increases.]
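
A small simulation (hypothetical data) that produces and displays the characteristic fan shape:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)

# Error standard deviation grows with the mean: heteroscedasticity.
y_het = 5 + 2 * x + rng.normal(0, 0.5 * x)

fit_het = sm.OLS(y_het, sm.add_constant(x)).fit()
plt.scatter(fit_het.fittedvalues, fit_het.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted y")
plt.ylabel("Residual")
plt.show()  # fan shape: spread widens as predicted y increases
```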

Reducing Nonconstant Variance/Nonnormality by Transformations
A brief list of transformations:
- $y' = \sqrt{y}$ (for $y > 0$): use when the RMSE increases with $\hat{y}$.
- $y' = \log y$ (for $y > 0$): use when the RMSE increases with $\hat{y}$, or when the residual distribution is skewed to the right.
- $y' = y^2$: use when the RMSE decreases with $\hat{y}$, or when the residual distribution is skewed to the left.

Checking whether a transformation of Y remedies nonconstant variance
1. Create a new column with the transformation of the Y variable: right-click in the new column, click Formula, and enter the appropriate formula for the transformation. (Note: Log is in the Transcendental group of functions.)
2. Fit the regression of the transformed Y on the X variables.
3. Check the residual by predicted plot to see whether the spread of the residuals appears constant over the range of predicted values.
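
A sketch of these three steps (Python standing in for JMP's formula editor, reusing the heteroscedastic example above; the log transform requires y > 0):

```python
# Step 1: create the transformed response.
log_y = np.log(y_het)  # y_het from the heteroscedasticity sketch; all positive here

# Step 2: fit the regression of log(y) on the X variable(s).
fit_log = sm.OLS(log_y, sm.add_constant(x)).fit()

# Step 3: check the residual by predicted plot for constant spread.
plt.scatter(fit_log.fittedvalues, fit_log.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted log(y)")
plt.ylabel("Residual")
plt.show()
```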

Interpreting coefficients when response is logged, explanatory variables not logged
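
In brief, a logged response gives each coefficient a multiplicative interpretation. A sketch of the standard result, assuming the usual normal-errors model:

```latex
\log Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon,
\qquad
\operatorname{median}(Y \mid X_1, \ldots, X_K)
  = e^{\beta_0} e^{\beta_1 X_1} \cdots e^{\beta_K X_K}.
```

Holding the other explanatory variables fixed, a one-unit increase in $X_j$ multiplies the median of Y by $e^{\beta_j}$; for small $\beta_j$, $e^{\beta_j} \approx 1 + \beta_j$, i.e., roughly a $100\,\beta_j$ percent change in Y.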

Checking Normality
To check normality, we use a normal quantile plot of the residuals. Normality is a reasonable assumption if all the residuals fall within the dashed confidence bands; residuals outside the dashed confidence bands indicate a violation of normality.
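
A sketch of the normal quantile plot for the residuals from the earlier hypothetical fit (statsmodels' qqplot does not draw JMP's confidence bands, so judge by how closely the points follow the line):

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Normal quantile (Q-Q) plot of the residuals from `fit` above.
sm.qqplot(fit.resid, line="45", fit=True)
plt.show()
```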

Normality does not appear to hold: some of the residuals fall outside the dotted confidence bands.

Normality appears to hold: all of the residuals fall within the dotted confidence bands.

Importance of Normality and Corrections for Nonnormality
For point estimation, confidence intervals and hypothesis tests for coefficients, and confidence intervals for the mean response, normality of the residuals is only important in small samples, because of the Central Limit Theorem. Guideline: we do not need to worry about normality if there are at least 30 observations, plus 10 additional observations for each explanatory variable in the multiple regression beyond the first one (e.g., at least 50 observations for a regression with three explanatory variables). For prediction intervals, normality is critical for all sample sizes. Correction for nonnormality: transformations of the Y variable.

Reducing Nonconstant Variance/Nonnormality by Transformations
A brief list of transformations:
- $y' = \sqrt{y}$ (for $y > 0$): use when the spread of the residuals increases with $\hat{y}$.
- $y' = \log y$ (for $y > 0$): use when the spread of the residuals increases with $\hat{y}$, or when the distribution of the residuals is skewed to the right.
- $y' = y^2$: use when the spread of the residuals decreases with $\hat{y}$, or when the distribution of the residuals is skewed to the left.

Order of Correction of Violations of Assumptions in Multiple Regression
First, focus on correcting a violation of the linearity assumption. Then, once the linearity assumption is satisfied, focus on correcting violations of constant variance; if constant variance is achieved, check that linearity still holds approximately. Then, focus on correcting violations of normality; if normality is achieved, check that linearity and constant variance still hold approximately.