STA302/1001 - Week 9: Multiple Regression


STA302/1001 week 9, slide 1: Multiple Regression
A multiple regression model is a model that has more than one explanatory variable in it. Some of the reasons to use multiple regression models are:
 Often multiple X's arise naturally from a study.
 We want to control for some X's.
 We want to fit a polynomial.
 We want to compare regression lines for two or more groups.

STA302/1001 week 9, slide 2: Multiple Linear Regression Model
In a multiple linear regression model there are p predictor variables. The model is

Yi = β0 + β1Xi1 + β2Xi2 + … + βpXip + εi,  i = 1, …, n.

This model is linear in the β's. The variables themselves may be non-linear, e.g., log(X1), X1·X2, etc. We need to estimate the p+1 β's and σ². There are p+2 parameters in this model, and so we need at least that many observations to be able to estimate them, i.e., we need n ≥ p + 2.

STA302/1001 week 9, slide 3: Multiple Regression Model in Matrix Form
In matrix notation the multiple regression model is

Y = Xβ + ε,

where Y = (Y1, …, Yn)' and ε = (ε1, …, εn)' are n×1 vectors, β = (β0, β1, …, βp)' is a (p+1)×1 vector, and X is the n×(p+1) matrix whose i-th row is (1, Xi1, …, Xip). The matrix X is called the 'design matrix'. The Gauss-Markov assumptions are E(ε | X) = 0 and Var(ε | X) = σ²I. These imply E(Y | X) = Xβ and Var(Y | X) = σ²I. The least-squares estimate of β is

b = (X'X)⁻¹X'Y.
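To make the matrix formulas concrete, here is a minimal numpy sketch on simulated data; the simulated setup and all names below are illustrative assumptions, not part of the course materials.

import numpy as np

# Simulate n observations with p predictors (illustrative values).
rng = np.random.default_rng(302)
n, p = 26, 3

# Design matrix: a column of 1's for the intercept, then the predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([2.0, 1.5, -0.5, 0.8])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Least-squares estimate b = (X'X)^{-1} X'y; solving the normal equations
# is preferred numerically to forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true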

STA302/1001 week 9, slide 4: Estimate of σ²
The estimate of σ² is

s² = MSE = SSE / (n − p − 1) = Σ eᵢ² / (n − p − 1),

where eᵢ = Yᵢ − Ŷᵢ are the residuals. It has n − p − 1 degrees of freedom because p + 1 parameters (the β's) are estimated from the data. Claim: s² is an unbiased estimator of σ². Proof:
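A standard argument: write the hat matrix H = X(X'X)⁻¹X', so that the residual vector is e = Y − Xb = (I − H)Y = (I − H)ε, since (I − H)Xβ = 0. The matrix I − H is symmetric and idempotent, so

E(e'e | X) = E(ε'(I − H)ε | X) = σ² tr(I − H) = σ²(n − tr(H)) = σ²(n − p − 1),

because tr(H) = tr((X'X)⁻¹X'X) = tr(I) = p + 1. Dividing by n − p − 1 gives E(s² | X) = σ².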

STA302/1001 week 9, slide 5: General Comments about Multiple Regression
The regression equation gives the mean response for each combination of explanatory variables. The regression equation will not be useful if it is very complicated or a function of a large number of explanatory variables. We generally want a "parsimonious" model, that is, a model that is as simple as possible while still adequately describing the response variable. It is unwise to think that there is some exact, discoverable equation. Many possible models are available; one or two of them may adequately approximate the mean of the response variable.

STA302/1001 week 9, slide 6: Example, House Prices in Chicago
Data on 26 house sales in Chicago were collected (clearly collected some time ago). The variables in the data set are:
price - selling price in $1000's
bdr - number of bedrooms
flr - floor space in square feet
fp - number of fireplaces
rms - number of rooms
st - storm windows (1 if present, 0 if absent)
lot - lot size (frontage) in feet
bth - number of bathrooms
gar - garage size (0 = no garage, 1 = one-car garage, etc.)
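In practice a model like this would be fit with statistical software. A hedged sketch using Python's statsmodels follows; the file name chicago_houses.csv is a hypothetical placeholder, while the variable names are those listed on the slide.

import pandas as pd
import statsmodels.formula.api as smf

# Load the house-sales data (file name assumed for illustration).
houses = pd.read_csv("chicago_houses.csv")

# Regress selling price on all eight explanatory variables.
fit = smf.ols("price ~ bdr + flr + fp + rms + st + lot + bth + gar",
              data=houses).fit()
print(fit.summary())  # coefficients, t-tests, overall F-test, R-squared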

STA302/1001 week 9, slide 7: Interpreting Regression Coefficients
In multiple regression we interpret the coefficient of the j-th predictor variable (βj, estimated by bj) as the change in Y associated with a one-unit change in Xj with all the other variables held constant. Note that it may be impossible in practice to hold all other variables constant. In the house-price example above: for 100 extra square feet of floor space (everything else held constant), the price goes up by $1,760 on average. For one more room (everything else held constant), the price goes up by $3,900 on average. For one more bedroom (everything else held constant), the price goes down by $7,700 on average.
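Reading these dollar figures back into the model's units (price is recorded in $1000's), the fitted coefficients implied by the slide are roughly b_flr ≈ 0.0176 per square foot (0.0176 × 100 = 1.76, i.e., $1,760), b_rms ≈ 3.9, and b_bdr ≈ −7.7.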

STA302/1001 week 9, slide 8: Inference for Regression Coefficients
As in simple linear regression, we are interested in testing H0: βj = 0 versus Ha: βj ≠ 0. The test statistic is

t = bj / se(bj),

which has a t-distribution with n − p − 1 degrees of freedom when H0 is true. We can calculate the P-value from the t-table with n − p − 1 df. This test indicates whether or not the j-th predictor variable contributes significantly to the prediction of the response variable over and above all the other predictor variables. A confidence interval for βj (assuming all the other predictor variables are in the model) is given by

bj ± t(α/2; n − p − 1) · se(bj).
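Continuing the simulated-data sketch from slide 3 (X, y, beta_hat, n, and p are carried over from that code), the t statistics and 95% confidence intervals can be computed from first principles:

from scipy import stats

e = y - X @ beta_hat                  # residual vector
df = n - p - 1                        # residual degrees of freedom
s2 = e @ e / df                       # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # se(b_j)

t_stats = beta_hat / se               # t = b_j / se(b_j)
p_values = 2 * stats.t.sf(np.abs(t_stats), df)
t_crit = stats.t.ppf(0.975, df)       # two-sided 95% critical value
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(t_stats, p_values, ci, sep="\n")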

STA302/1001 week 9, slide 9: ANOVA Table
The ANOVA table in the multiple regression model is:

Source       df           SS      MS                      F
Regression   p            SSReg   MSReg = SSReg/p         MSReg/MSE
Error        n − p − 1    SSE     MSE = SSE/(n − p − 1)
Total        n − 1        SST

STA302/1001 week 9, slide 10: Coefficient of Multiple Determination, R²
As in the simple linear regression model, R² = SSReg/SST. In multiple regression this is called the "coefficient of multiple determination"; it is not the square of a correlation coefficient. In multiple regression we need to be cautious about judging a model by its R², because R² always goes up when more predictor variables are added to the model, regardless of whether those predictor variables are useful for predicting Y.

STA302/1001 week 9, slide 11: Adjusted R²
An attempt to make R² more useful is to calculate the Adjusted R² ("Adj R-Sq" in SAS). Adjusted R² is adjusted for the number of predictor variables in the model; it can actually go down when more predictors are added, so it can be used for choosing the best model. It is defined as

Adjusted R² = 1 − [SSE/(n − p − 1)] / [SST/(n − 1)] = 1 − MSE / [SST/(n − 1)].

Since SST/(n − 1) does not depend on the model, Adjusted R² increases only if MSE decreases.
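A quick illustration of this point, continuing the simulated-data sketch (X, y, n, and rng carried over): adding a pure-noise column always raises R² but typically lowers Adjusted R².

def r2_stats(X, y):
    # Fit by least squares and return (R^2, Adjusted R^2).
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sse = (y - X @ b) @ (y - X @ b)
    sst = np.sum((y - y.mean()) ** 2)
    k = X.shape[1] - 1                # number of predictors
    r2 = 1 - sse / sst
    adj = 1 - (sse / (len(y) - k - 1)) / (sst / (len(y) - 1))
    return r2, adj

X_noise = np.column_stack([X, rng.normal(size=n)])  # junk predictor
print(r2_stats(X, y))
print(r2_stats(X_noise, y))           # R^2 goes up; Adj R^2 typically drops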

STA302/1001 week 9, slide 12: ANOVA F Test in Multiple Regression
In multiple regression, the ANOVA F test is designed to test the hypothesis

H0: β1 = β2 = … = βp = 0 versus Ha: at least one βj ≠ 0.

This test assesses whether or not the model has any predictive ability. The test statistic is

F = MSReg / MSE = [SSReg/p] / [SSE/(n − p − 1)].

If H0 is true, this test statistic has an F distribution with p and n − p − 1 degrees of freedom.
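Continuing the same sketch, the overall F statistic and its P-value (e and stats are carried over from the slide-8 code):

sse = e @ e                            # error sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssreg = sst - sse                      # regression sum of squares
F = (ssreg / p) / (sse / (n - p - 1))  # MSReg / MSE
p_value = stats.f.sf(F, p, n - p - 1)
print(F, p_value)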

STA302/1001 week 9, slide 13: F-Test versus t-Tests in Multiple Regression
In multiple regression, the F test is designed to test the overall model while the t tests are designed to test individual coefficients.
 If the F-test is significant and all or some of the t-tests are significant, then there are some useful explanatory variables for predicting Y.
 If the F-test is not significant (large P-value) and all the t-tests are not significant, then no explanatory variable contributes to the prediction of Y.
 If the F-test is significant but none of the t-tests is significant, this is an indication of "multicollinearity", i.e., correlated X's. It means that individual X's do not contribute to the prediction of Y over and above the other X's.

STA302/1001 week 9, slide 14
If the F-test is not significant but some of the t-tests are significant, it is an indication of one of two things:
 The model has no predictive ability, but if there are many predictors, a few may have small P-values (Type I errors in the t-tests).
 The predictors were chosen poorly. If one useful predictor is added to many that are unrelated to the outcome, its contribution may not be enough for the overall model to be significant (F-test).