Chapter 3 Multiple Linear Regression

Linear Regression Analysis, 5E, Montgomery, Peck & Vining

3.1 Multiple Regression Models

Suppose that the yield in pounds of conversion in a chemical process depends on temperature and the catalyst concentration. A multiple regression model that might describe this relationship is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where $x_1$ denotes temperature and $x_2$ denotes catalyst concentration. This is a multiple linear regression model in two variables.

3.1 Multiple Regression Models

Figure 3.1 (a) The regression plane for the model $E(y) = 50 + 10x_1 + 7x_2$. (b) The contour plot.

3.1 Multiple Regression Models

In general, the multiple linear regression model with k regressors is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

3.1 Multiple Regression Models

Linear regression models may also contain interaction effects:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$

If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, then the model can be written in the form

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$

which is a standard multiple linear regression model: the model remains linear because it is linear in the parameters.
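
As a quick illustration of the point above, here is a minimal numpy sketch (hypothetical data and coefficients): the interaction term is just another column of X, so ordinary least squares applies with no special machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(0, 10, n)           # e.g., temperature
x2 = rng.uniform(0, 5, n)            # e.g., catalyst concentration
y = 50 + 10 * x1 + 7 * x2 + 2 * x1 * x2 + rng.normal(0, 1, n)

# The interaction term is simply a new regressor: x3 = x1 * x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Linear in the betas, so least squares fits it directly
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                      # approximately [50, 10, 7, 2]
```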

3.2 Estimation of the Model Parameters
3.2.1 Least Squares Estimation of the Regression Coefficients

Notation:
n – number of observations available
k – number of regressor variables
p = k + 1 – number of parameters (coefficients) to estimate
y – response or dependent variable
$x_{ij}$ – ith observation or level of regressor j

3.2.1 Least Squares Estimation of the Regression Coefficients

The sample regression model can be written as

$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n$

3.2.1 Least Squares Estimation of the Regression Coefficients

The least squares function is

$S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$

The function S must be minimized with respect to the coefficients.

3.2.1 Least Squares Estimation of the Regression Coefficients

The least squares estimates of the coefficients must satisfy

$\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta} = 0 \quad \text{and} \quad \left.\frac{\partial S}{\partial \beta_j}\right|_{\hat\beta} = 0, \quad j = 1, 2, \ldots, k$

3.2.1 Least Squares Estimation of the Regression Coefficients

Simplifying, we obtain the least squares normal equations:

$n\hat\beta_0 + \hat\beta_1\sum_{i=1}^n x_{i1} + \hat\beta_2\sum_{i=1}^n x_{i2} + \cdots + \hat\beta_k\sum_{i=1}^n x_{ik} = \sum_{i=1}^n y_i$

$\hat\beta_0\sum_{i=1}^n x_{i1} + \hat\beta_1\sum_{i=1}^n x_{i1}^2 + \cdots + \hat\beta_k\sum_{i=1}^n x_{i1}x_{ik} = \sum_{i=1}^n x_{i1}y_i$

$\vdots$

$\hat\beta_0\sum_{i=1}^n x_{ik} + \hat\beta_1\sum_{i=1}^n x_{ik}x_{i1} + \cdots + \hat\beta_k\sum_{i=1}^n x_{ik}^2 = \sum_{i=1}^n x_{ik}y_i$

There are p = k + 1 equations, one for each coefficient. The ordinary least squares estimators are the solutions to the normal equations.

3.2.1 Least Squares Estimation of the Regression Coefficients

Matrix notation is typically used. Let

$y = X\beta + \varepsilon$

where

y is an $n \times 1$ vector of observations,
X is an $n \times p$ matrix of the levels of the regressor variables (with a leading column of ones),
$\beta$ is a $p \times 1$ vector of regression coefficients, and
$\varepsilon$ is an $n \times 1$ vector of random errors.

3.2.1 Least Squares Estimation of the Regression Coefficients

In matrix form, the least squares function is

$S(\beta) = \sum_{i=1}^n \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta)$

Setting $\partial S/\partial \beta = 0$ gives

$X'X\hat\beta = X'y$

3.2.1 Least Squares Estimation of the Regression Coefficients

These are the least-squares normal equations. Provided $X'X$ is nonsingular, the solution is the ordinary least squares estimator

$\hat\beta = (X'X)^{-1}X'y$

3.2.1 Least Squares Estimation of the Regression Coefficients

The vector of fitted values is

$\hat y = X\hat\beta = X(X'X)^{-1}X'y = Hy$

where $H = X(X'X)^{-1}X'$ is the hat matrix.

3.2.1 Least Squares Estimation of the Regression Coefficients

The n residuals can be written in matrix form as

$e = y - \hat y = y - X\hat\beta$

There will be some situations where an alternative form proves useful:

$e = y - Hy = (I - H)y$
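
The matrix algebra above translates directly into code. A minimal numpy sketch with synthetic data follows; in practice one would prefer numpy.linalg.lstsq or a QR factorization over forming $(X'X)^{-1}$ explicitly, for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
beta_true = np.array([2.0, 1.6, 0.014])
y = X @ beta_true + rng.normal(0, 3, n)

XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)   # solves the normal equations X'X b = X'y

H = X @ np.linalg.solve(XtX, X.T)      # hat matrix H = X (X'X)^{-1} X'
y_hat = H @ y                          # fitted values, equivalently X @ beta_hat
e = y - y_hat                          # residuals e = (I - H) y
```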

Example 3.1 The Delivery Time Data

The model of interest is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where y is delivery time, $x_1$ is the number of cases, and $x_2$ is the distance.

Example 3.1 The Delivery Time Data

Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.

Example 3.1 The Delivery Time Data

Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.

MINITAB Output for the delivery time data (Example 3.1)

3.2.2 Geometrical Interpretation of Least Squares

Geometrically, least squares projects the observation vector y orthogonally onto the subspace spanned by the columns of X (the estimation space). The fitted vector $\hat y = Hy$ is that projection, and the residual vector $e = y - \hat y$ is orthogonal to it.

3.2.3 Properties of the Least-Squares Estimators

Statistical properties: $\hat\beta$ is unbiased, $E(\hat\beta) = \beta$.

Variances/covariances: $\mathrm{Var}(\hat\beta) = \sigma^2(X'X)^{-1}$. By the Gauss-Markov theorem, $\hat\beta$ is the best linear unbiased estimator.

3.2.4 Estimation of $\sigma^2$

The residual sum of squares can be shown to be

$SS_{Res} = y'y - \hat\beta'X'y$

The residual mean square for the model with p parameters is

$\hat\sigma^2 = MS_{Res} = \frac{SS_{Res}}{n - p}$
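
In code, continuing the numpy sketch style from above (self-contained, synthetic data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3                                   # p = k + 1 parameters
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
y = X @ np.array([2.0, 1.6, 0.014]) + rng.normal(0, 3, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = y @ y - beta_hat @ (X.T @ y)          # SS_Res = y'y - b'X'y
ms_res = ss_res / (n - p)                      # model-dependent estimate of sigma^2
print(ms_res)
```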

3.2.4 Estimation of $\sigma^2$

Recall that the estimator of $\sigma^2$ is model dependent; that is, change the form of the model and the estimate of $\sigma^2$ will change. Note that the variance estimate is a function of the residuals, the "unexplained noise about the fitted regression line."

Example 3.2 The Delivery Time Data (computation of $\hat\sigma^2$)

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression

Scatter diagrams of the regressor variable(s) against the response may be of little value in multiple regression. These plots can actually be misleading: if there is interdependency between two or more regressor variables, the true relationship between $x_i$ and y may be masked.

3.3 Hypothesis Testing in Multiple Linear Regression

Once we have estimated the parameters in the model, we face two immediate questions:
1. What is the overall adequacy of the model?
2. Which specific regressors seem important?

3.3 Hypothesis Testing in Multiple Linear Regression

This section considers four cases:
1. Test for significance of regression (sometimes called the global test of model adequacy)
2. Tests on individual regression coefficients (or groups of coefficients)
3. The special case of orthogonal columns in X
4. Testing the general linear hypothesis

3.3.1 Test for Significance of Regression

The test for significance of regression determines whether there is a linear relationship between the response and any of the regressor variables. The hypotheses are

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1: \beta_j \neq 0$ for at least one j

3.3.1 Test for Significance of Regression

As in Chapter 2, the total sum of squares can be partitioned into two parts:

$SS_T = SS_R + SS_{Res}$

This leads to an ANOVA procedure with the test statistic

$F_0 = \frac{SS_R/k}{SS_{Res}/(n - k - 1)} = \frac{MS_R}{MS_{Res}}$

3.3.1 Test for Significance of Regression

The standard ANOVA is conducted with

$SS_R = \hat\beta'X'y - \frac{\left(\sum_{i=1}^n y_i\right)^2}{n}, \qquad SS_{Res} = y'y - \hat\beta'X'y, \qquad SS_T = y'y - \frac{\left(\sum_{i=1}^n y_i\right)^2}{n}$

3.3.1 Test for Significance of Regression

ANOVA table:

Source       Sum of Squares   DF           Mean Square   F0
Regression   SS_R             k            MS_R          MS_R/MS_Res
Residual     SS_Res           n - k - 1    MS_Res
Total        SS_T             n - 1

Reject $H_0$ if $F_0 > F_{\alpha,\, k,\, n-k-1}$.
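
A sketch of the global F test with numpy and scipy (synthetic data; scipy.stats.f supplies the F distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
y = X @ np.array([2.0, 1.6, 0.014]) + rng.normal(0, 3, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_t = y @ y - y.sum() ** 2 / n                # total sum of squares
ss_res = y @ y - beta_hat @ (X.T @ y)          # residual sum of squares
ss_r = ss_t - ss_res                           # regression sum of squares

f0 = (ss_r / k) / (ss_res / (n - k - 1))
p_value = stats.f.sf(f0, k, n - k - 1)         # P(F > f0)
print(f0, p_value)
```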

Example 3.3 The Delivery Time Data

To test $H_0: \beta_1 = \beta_2 = 0$, we calculate the F statistic

$F_0 = \frac{MS_R}{MS_{Res}} = \frac{SS_R/2}{SS_{Res}/(n-3)}$

and compare it with $F_{\alpha,\, 2,\, n-3}$.

3.3.1 Test for Significance of Regression: $R^2$ and Adjusted $R^2$

$R^2$ is calculated exactly as in simple linear regression:

$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$

$R^2$ can be inflated simply by adding more terms to the model (even insignificant terms). The adjusted $R^2$,

$R^2_{adj} = 1 - \frac{SS_{Res}/(n-p)}{SS_T/(n-1)}$

penalizes the addition of terms that are not helpful, so it is useful for comparing models with different numbers of regressors.

3.3.1 Test for Significance of Regression: Adjusted $R^2$ Example

Say $SS_T = 245.00$ and $n = 15$. Suppose that for a model with three regressors (p = 4), $SS_{Res} = 90.00$. Then

$R^2_{adj} = 1 - \frac{90.00/11}{245.00/14} = 1 - \frac{8.18}{17.50} = 0.5325$

Now suppose that a fourth regressor has been added (p = 5) and $SS_{Res} = 88.00$:

$R^2_{adj} = 1 - \frac{88.00/10}{245.00/14} = 1 - \frac{8.80}{17.50} = 0.4971$

The adjusted $R^2$ decreased, signaling that the fourth regressor is not worth adding.
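
The same arithmetic as a tiny helper function (values taken from the slide):

```python
def adjusted_r2(ss_res: float, ss_t: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (SS_Res/(n-p)) / (SS_T/(n-1))."""
    return 1 - (ss_res / (n - p)) / (ss_t / (n - 1))

print(adjusted_r2(90.0, 245.0, n=15, p=4))   # ~0.5325 (three regressors)
print(adjusted_r2(88.0, 245.0, n=15, p=5))   # ~0.4971 (four regressors)
```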

3.3.2 Tests on Individual Regression Coefficients

Hypothesis test on any single regression coefficient:

$H_0: \beta_j = 0, \qquad H_1: \beta_j \neq 0$

Test statistic:

$t_0 = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} = \frac{\hat\beta_j}{se(\hat\beta_j)}$

where $C_{jj}$ is the jth diagonal element of $(X'X)^{-1}$. Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-k-1}$.

This is a partial or marginal test: $\hat\beta_j$ depends on all of the other regressors in the model. See Example 3.4, p. 88 of the text.
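
A sketch of the marginal t tests (numpy plus scipy; synthetic data as before):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
y = X @ np.array([2.0, 1.6, 0.014]) + rng.normal(0, 3, n)

p = k + 1
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # se(b_j) = sqrt(sigma^2 C_jj)
t0 = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t0), n - p)  # two-sided marginal tests
print(np.column_stack([beta_hat, se, t0, p_values]))
```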

The Extra-Sum-of-Squares Method

The extra-sum-of-squares method can also be used to test hypotheses on individual model parameters or groups of parameters. Partition the full model as

$y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$

where $\beta_2$ contains the r coefficients to be tested. The extra sum of squares due to $\beta_2$ is

$SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1)$

and the test statistic

$F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{MS_{Res}}$

is compared with $F_{\alpha,\, r,\, n-p}$.

3.3.3 Special Case of Orthogonal Columns in X

If the columns of $X_1$ are orthogonal to the columns of $X_2$ (that is, $X_1'X_2 = 0$), then the sum of squares due to $\beta_2$, $SS_R(\beta_2 \mid \beta_1)$, is free of any dependence on the regressors in $X_1$.

Example

Consider a dataset with four regressor variables and a single response. Fitting the equation with all regressors gives

y = -19.9 + 0.0123 x1 + 27.3 x2 - 0.0655 x3 - 0.196 x4

Looking at the t tests, suppose that x3 is insignificant, so it is removed. What is the equation now? Generally, it is not

y = -19.9 + 0.0123 x1 + 27.3 x2 - 0.196 x4

Example

The model must be refit with the insignificant regressor left out. The refit regression equation is

y = -24.9 + 0.0117 x1 + 31.0 x2 - 0.217 x4

Refitting is necessary because the coefficient estimate for each regressor depends on all of the other regressors in the model.

Example

However, if the columns of X are orthogonal to each other, there is no need to refit: the estimates do not change when regressors are dropped. Can you think of some situations where we would have orthogonal columns? (Designed experiments, such as 2^k factorial designs, are the classic case; see the sketch below.)
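
A small demonstration of both points, with hypothetical data: dropping a regressor changes the other estimates when the columns are correlated, but not when they are orthogonal (here, a balanced two-level design gives orthogonal columns).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40

# Correlated regressors: dropping x2 changes the estimate for x1
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)        # nearly collinear with x1
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
Xf = np.column_stack([np.ones(n), x1, x2])
Xr = np.column_stack([np.ones(n), x1])
print(np.linalg.lstsq(Xf, y, rcond=None)[0])    # full-model estimates
print(np.linalg.lstsq(Xr, y, rcond=None)[0])    # x1 coefficient shifts noticeably

# Orthogonal regressors: a balanced +/-1 design, so d1 . d2 = 0 by construction
d1 = np.tile([-1.0, 1.0], n // 2)
d2 = np.repeat([-1.0, 1.0], n // 2)
y2 = 1 + 2 * d1 + 3 * d2 + rng.normal(size=n)
Xf2 = np.column_stack([np.ones(n), d1, d2])
Xr2 = np.column_stack([np.ones(n), d1])
print(np.linalg.lstsq(Xf2, y2, rcond=None)[0])
print(np.linalg.lstsq(Xr2, y2, rcond=None)[0])  # d1 coefficient is unchanged
```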

3.4.1 Confidence Intervals on the Regression Coefficients

A 100(1 - α) percent confidence interval for the regression coefficient $\beta_j$ is

$\hat\beta_j - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}} \;\le\; \beta_j \;\le\; \hat\beta_j + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}}$

or, equivalently,

$\hat\beta_j \pm t_{\alpha/2,\, n-p}\, se(\hat\beta_j)$

3.4.2 Confidence Interval Estimation of the Mean Response

A 100(1 - α) percent confidence interval on the mean response at the point $x_0 = (1, x_{01}, x_{02}, \ldots, x_{0k})'$ is

$\hat y_0 \pm t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0}$

where $\hat y_0 = x_0'\hat\beta$. See Example 3.9 on page 95 of the text and the discussion that follows.

3.4.3 Simultaneous Confidence Intervals on Regression Coefficients

It can be shown that

$\frac{(\hat\beta - \beta)'X'X(\hat\beta - \beta)}{p\, MS_{Res}} \sim F_{p,\, n-p}$

From this result, the joint 100(1 - α) percent confidence region for all parameters in $\beta$ is

$\frac{(\hat\beta - \beta)'X'X(\hat\beta - \beta)}{p\, MS_{Res}} \le F_{\alpha,\, p,\, n-p}$

3.5 Prediction of New Observations

A 100(1 - α) percent prediction interval for a future observation at $x_0$ is

$\hat y_0 \pm t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)}$
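
Both intervals in one numpy/scipy sketch (synthetic data and a hypothetical query point x0; note the only difference is the extra 1 under the square root for the prediction interval):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, alpha = 25, 3, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
y = X @ np.array([2.0, 1.6, 0.014]) + rng.normal(0, 3, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

x0 = np.array([1.0, 8.0, 275.0])               # hypothetical point of interest
y0_hat = x0 @ beta_hat
t = stats.t.ppf(1 - alpha / 2, n - p)
q = x0 @ XtX_inv @ x0                          # x0'(X'X)^{-1} x0

ci = (y0_hat - t * np.sqrt(sigma2_hat * q), y0_hat + t * np.sqrt(sigma2_hat * q))
pi = (y0_hat - t * np.sqrt(sigma2_hat * (1 + q)),
      y0_hat + t * np.sqrt(sigma2_hat * (1 + q)))
print(ci, pi)                                  # the prediction interval is wider
```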

3.6 Hidden Extrapolation in Multiple Regression

In prediction, exercise care about potentially extrapolating beyond the region containing the original observations.

Figure 3.10 An example of extrapolation in multiple regression.

3.6 Hidden Extrapolation in Multiple Regression

We define the smallest convex set containing all n original data points $(x_{i1}, x_{i2}, \ldots, x_{ik})$, $i = 1, 2, \ldots, n$, as the regressor variable hull (RVH). If a point $(x_{01}, x_{02}, \ldots, x_{0k})$ lies inside or on the boundary of the RVH, prediction or estimation involves interpolation; if it lies outside the RVH, extrapolation is required.

3.6 Hidden Extrapolation in Multiple Regression

Diagonal elements $h_{ii}$ of the hat matrix $H = X(X'X)^{-1}X'$ can aid in determining whether hidden extrapolation exists. Let $h_{max}$ denote the largest $h_{ii}$. The set of points x (not necessarily data points used to fit the model) that satisfy

$x'(X'X)^{-1}x \le h_{max}$

is an ellipsoid enclosing all points inside the RVH.

3.6 Hidden Extrapolation in Multiple Regression

Let $x_0$ be a point at which prediction or estimation is of interest, and let

$h_{00} = x_0'(X'X)^{-1}x_0$

If $h_{00} > h_{max}$, the point lies outside the ellipsoid and is a point of extrapolation.
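
A sketch of the check (numpy; synthetic design matrix, hypothetical query point):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # diagonal of H = X (X'X)^{-1} X'
h_max = h.max()

x0 = np.array([1.0, 29.0, 1400.0])             # candidate prediction point
h00 = x0 @ XtX_inv @ x0
print("extrapolation" if h00 > h_max else "inside the h_max ellipsoid")
```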

Example 3.13

Consider prediction or estimation at several candidate points (labeled a through d in Figure 3.10) relative to the RVH for the delivery time data.

Figure 3.10 Scatterplot of cases and distance for the delivery time data.

3.7 Standardized Regression Coefficients

It is often difficult to compare regression coefficients directly because the regressors may have very different units. It can help to work with dimensionless regression coefficients, usually called standardized regression coefficients. Two common methods of scaling:
1. Unit normal scaling
2. Unit length scaling

3.7 Standardized Regression Coefficients: Unit Normal Scaling

The first approach employs unit normal scaling for the regressors and the response variable. That is,

$z_{ij} = \frac{x_{ij} - \bar x_j}{s_j}, \qquad y_i^* = \frac{y_i - \bar y}{s_y}$

3.7 Standardized Regression Coefficients: Unit Normal Scaling

where

$s_j^2 = \frac{\sum_{i=1}^n (x_{ij} - \bar x_j)^2}{n - 1} \qquad \text{and} \qquad s_y^2 = \frac{\sum_{i=1}^n (y_i - \bar y)^2}{n - 1}$

are the sample variances of regressor $x_j$ and of the response.

3.7 Standardized Regression Coefficients: Unit Normal Scaling

All of the scaled regressors and the scaled response have sample mean equal to zero and sample variance equal to 1. The model becomes

$y_i^* = b_1 z_{i1} + b_2 z_{i2} + \cdots + b_k z_{ik} + \varepsilon_i$

The least squares estimator is

$\hat b = (Z'Z)^{-1}Z'y^*$

3.7 Standardized Regression Coefficients: Unit Length Scaling

In unit length scaling,

$w_{ij} = \frac{x_{ij} - \bar x_j}{\sqrt{S_{jj}}}, \qquad y_i^0 = \frac{y_i - \bar y}{\sqrt{SS_T}}$

where $S_{jj} = \sum_{i=1}^n (x_{ij} - \bar x_j)^2$ is the corrected sum of squares for regressor $x_j$.

3.7 Standardized Regression Coefficients: Unit Length Scaling

Each scaled regressor $w_j$ has mean 0 and unit length. The regression model becomes

$y_i^0 = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \varepsilon_i$

The vector of least squares regression coefficients is

$\hat b = (W'W)^{-1}W'y^0$

3.7 Standardized Regression Coefficients: Unit Length Scaling

In unit length scaling, the W'W matrix is in the form of a correlation matrix:

$W'W = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{12} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & & \vdots \\ r_{1k} & r_{2k} & \cdots & 1 \end{pmatrix}$

where $r_{ij}$ is the simple correlation between $x_i$ and $x_j$.
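
A numpy sketch of unit length scaling, confirming that W'W reproduces the correlation matrix of the regressors (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 25
X = np.column_stack([rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])  # no intercept column
y = 2 + X @ np.array([1.6, 0.014]) + rng.normal(0, 3, n)

Xc = X - X.mean(axis=0)
W = Xc / np.sqrt((Xc ** 2).sum(axis=0))        # centered, unit-length columns
y0 = (y - y.mean()) / np.sqrt(((y - y.mean()) ** 2).sum())

print(W.T @ W)                                 # correlation matrix of the regressors
print(np.corrcoef(X, rowvar=False))            # same thing, for comparison

b_hat = np.linalg.lstsq(W, y0, rcond=None)[0]  # standardized coefficients
print(b_hat)
```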

3.7 Standardized Regression Coefficients

The standardized coefficients are related to the original coefficients by

$\hat b_j = \hat\beta_j \sqrt{\frac{S_{jj}}{SS_T}}, \quad j = 1, 2, \ldots, k$

Example 3.14 illustrates the computation of standardized regression coefficients.

3.8 Multicollinearity

A serious problem that may dramatically impact the usefulness of a regression model is multicollinearity: near-linear dependence among the regressors. The regressors are the columns of the X matrix, so an exact linear dependence would make X'X singular, and near dependence makes it nearly so. The presence of multicollinearity can dramatically impair the ability to estimate regression coefficients and other uses of the regression model.

3.8 Multicollinearity

The main diagonal elements of the inverse of the X'X matrix in correlation form, $(W'W)^{-1}$, are often called variance inflation factors (VIFs), and they are an important multicollinearity diagnostic. For the soft drink delivery data, which has two regressors, both VIFs equal $1/(1 - r_{12}^2)$, where $r_{12}$ is the correlation between cases and distance.

3.8 Multicollinearity

The variance inflation factors can also be written as

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the coefficient of multiple determination obtained from regressing $x_j$ on the other regressor variables. If $x_j$ is highly correlated with any other regressor, $R_j^2$ will be large and so will $VIF_j$.
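
A numpy sketch that computes VIFs both ways, via the diagonal of $(W'W)^{-1}$ and via $1/(1 - R_j^2)$ (synthetic, deliberately correlated regressors):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)       # correlated with x1
X = np.column_stack([x1, x2])

# Route 1: diagonal of the inverse correlation matrix (W'W)^{-1}
Xc = X - X.mean(axis=0)
W = Xc / np.sqrt((Xc ** 2).sum(axis=0))
vif_1 = np.diag(np.linalg.inv(W.T @ W))

# Route 2: regress each x_j on the others and use 1 / (1 - R_j^2)
vif_2 = []
for j in range(X.shape[1]):
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    xj = X[:, j]
    r = xj - others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    r2 = 1 - (r @ r) / ((xj - xj.mean()) @ (xj - xj.mean()))
    vif_2.append(1 / (1 - r2))

print(vif_1, np.array(vif_2))                  # the two routes agree
```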

3.9 Why Do Regression Coefficients Have the Wrong Sign?

Regression coefficients may have the wrong sign for the following reasons:
1. The range of some of the regressors is too small.
2. Important regressors have not been included in the model.
3. Multicollinearity is present.
4. Computational errors have been made.
