MODEL BUILDING IN REGRESSION MODELS

Model Building and Multicollinearity
Suppose we have five factors that we feel could linearly affect y. If all five are included, the model is:
y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + β_5 x_5 + ε
Although the p-value for the F-test (Significance F) might be small, one or more (if not all) of the p-values for the individual t-tests may be large.
Question: Which factors make up the “best” model?
– This is called model building.

Model Building
There are many approaches to model building.
– Eliminating some (or all) of the variables with high p-values is one approach.
Forward stepwise regression “builds” the model by adding one variable at a time.
Modified F-tests can be used to test whether a certain subset of the variables should be included in the model.

The Stepwise Regression Approach
y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + β_5 x_5 + ε
Step 1: Run five simple linear regressions:
– y = β_0 + β_1 x_1
– y = β_0 + β_2 x_2
– y = β_0 + β_3 x_3
– y = β_0 + β_4 x_4
– y = β_0 + β_5 x_5
Check the p-value for each.
– Note: for simple linear regression, Significance F = p-value for the t-test.
Suppose the model with x_4 has the lowest p-value (< α). Then x_4 enters the model.
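The note that Significance F equals the t-test p-value in simple linear regression follows from the identity F = t² when there is a single x variable. A minimal Python sketch of that identity, assuming made-up illustrative data (the function name and the numbers are not from the slides):

```python
import math

def simple_regression_stats(x, y):
    """Return (slope, t statistic for the slope, F statistic) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    mse = sse / (n - 2)                 # error df = n - 2 with one x variable
    t_stat = b1 / math.sqrt(mse / sxx)  # slope divided by its standard error
    f_stat = ssr / mse                  # MSR = SSR / 1
    return b1, t_stat, f_stat

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
slope, t_stat, f_stat = simple_regression_stats(x, y)
print(f"slope={slope:.4f}  t={t_stat:.4f}  F={f_stat:.4f}  t^2={t_stat ** 2:.4f}")
```

Because the two statistics are tied together this way, checking either p-value in Step 1 gives the same answer.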

Stepwise Regression
Step 2: Run four 2-variable linear regressions. Check Significance F and the p-values for:
– y = β_0 + β_4 x_4 + β_1 x_1
– y = β_0 + β_4 x_4 + β_2 x_2
– y = β_0 + β_4 x_4 + β_3 x_3
– y = β_0 + β_4 x_4 + β_5 x_5
Suppose the model with x_4 and x_3 has the lowest p-values (all < α). Add x_3.

Stepwise Regression
Step 3: Run three 3-variable linear regressions:
– y = β_0 + β_3 x_3 + β_4 x_4 + β_1 x_1
– y = β_0 + β_3 x_3 + β_4 x_4 + β_2 x_2
– y = β_0 + β_3 x_3 + β_4 x_4 + β_5 x_5
Suppose none of these models has all p-values < α. STOP: the best model is the one with x_3 and x_4 only.
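The three steps above can be sketched in plain Python. This is an illustrative sketch, not the Excel procedure used in the slides: the helper names, the tie-breaking rule (rank acceptable models by the p-value of the newly added variable), and the ±1 coded data are all assumptions. The disturbance term is deliberately chosen to be orthogonal to the factors so that the outcome is reproducible.

```python
import math
from itertools import product

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def t_p_value(t, df, steps=2000):
    """Two-sided p-value for a t statistic, integrating the t density by Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def ols_p_values(columns, y):
    """Fit y on an intercept plus the given columns; return the slope p-values."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    sse = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2 for i in range(n))
    df = n - k                               # error degrees of freedom
    mse = sse / df
    pvals = []
    for j in range(1, k):                    # skip the intercept
        unit = [1.0 if r == j else 0.0 for r in range(k)]
        var_j = mse * solve(xtx, unit)[j]    # j-th diagonal entry of mse * (X'X)^-1
        pvals.append(t_p_value(beta[j] / math.sqrt(var_j), df))
    return pvals

def forward_stepwise(candidates, y, alpha=0.05):
    """Add one variable per step; stop when no trial model has all p-values < alpha."""
    chosen = []
    while len(chosen) < len(candidates):
        best = None
        for name in candidates:
            if name in chosen:
                continue
            pvals = ols_p_values([candidates[v] for v in chosen + [name]], y)
            # Assumed ranking rule: lowest p-value for the newly added variable
            if max(pvals) < alpha and (best is None or pvals[-1] < best[0]):
                best = (pvals[-1], name)
        if best is None:
            break
        chosen.append(best[1])
    return chosen

# Made-up 2^3 factorial data: y depends on x1 and x2 but not on x3
runs = list(product([-1.0, 1.0], repeat=3))
candidates = {f"x{j + 1}": [r[j] for r in runs] for j in range(3)}
y = [10 + 3 * a + 2 * b + 0.5 * a * b * c for a, b, c in runs]
selected = forward_stepwise(candidates, y)
print(selected)  # prints ['x1', 'x2']
```

With real data a statistics package would normally supply the p-values; the hand-rolled solver and integrator are here only to keep the sketch self-contained.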

Example

Regression on 5 Variables

Summary of Results from 1-Variable Tests

Performing Tests With More Than One Variable
Remember: the range for X must be contiguous. Use CUT and INSERT CUT CELLS to arrange the X columns so that they are next to each other.

Summary of Results From 2-Variable Tests

Summary of Results from 3-Variable Tests

Summary of Results from 4-Variable Tests

Best Model
The best model is the three-variable model that includes x_1, x_4, and x_5.

TESTING PARTS OF THE MODEL
Sometimes we wish to decide whether to keep a set of variables “as a group” or eliminate them all from the model.
– Example: A model might include 3 dummy variables to account for how the dependent variable is affected by a particular season (or quarter) of the year. We will either keep all seasons or keep none.
The general approach is to assess how much “extra value” these additional variables add to the model.
– The approach is a Modified F-test.
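As an illustration of why seasonal variables travel as a group, here is a small Python sketch of how four quarters could be encoded as 3 dummy variables. The function name and the choice of quarter 1 as the baseline are assumptions made for this example, not something stated in the slides.

```python
def quarter_dummies(quarter):
    """Encode a quarter (1-4) as three 0/1 dummies (Q2, Q3, Q4).

    Quarter 1 is the baseline (an assumed choice), so it maps to (0, 0, 0).
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return tuple(1 if quarter == k else 0 for k in (2, 3, 4))

# Made-up sequence of observations by quarter
quarters = [1, 2, 3, 4, 1]
rows = [quarter_dummies(qtr) for qtr in quarters]
for qtr, row in zip(quarters, rows):
    print(qtr, row)
```

Dropping only one of the three columns would merge that quarter into the baseline, which is why the group is kept or removed as a whole.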

Approach: Compare Two Models – The Full Model and The Reduced Model
Suppose a model consists of p variables and we wish to consider whether or not to keep a set of p-q of those p variables in the model.
Two models:
– Full model – p variables
– Reduced model – q variables
For notational convenience, assume the last p-q of the p variables are the ones that would be eliminated.
– A sample of size n is taken.

The Modified F-Test
Modified F-Test:
H_0: β_{q+1} = β_{q+2} = … = β_p = 0
H_A: At least one of these p-q β’s ≠ 0
This is an F-test of the form: Reject H_0 (Accept H_A) if F > F_{α, p-q, n-p-1}, where
– p-q = number of variables considered for elimination
– n-p-1 = degrees of freedom for the error term of the full model

The Modified F-Statistic
For this model, the F-statistic is defined by:
F = ((SSE_Reduced - SSE_Full)/(p-q))/MSE_Full
where MSE_Full = SSE_Full/(n-p-1).
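The modified F-statistic divides the drop in SSE per removed variable, (SSE Reduced - SSE Full)/(p-q), by the MSE of the full model. A minimal Python sketch of that calculation follows; the function name is hypothetical and the SSE values are made up for illustration (they are not the housing data).

```python
def modified_f_statistic(sse_reduced, sse_full, n, p, q):
    """Modified (partial) F statistic for dropping the last p - q of p variables.

    n is the sample size, p the number of variables in the full model,
    and q the number of variables kept in the reduced model.
    """
    if not 0 <= q < p:
        raise ValueError("need 0 <= q < p")
    df_error_full = n - p - 1                # error df of the full model
    mse_full = sse_full / df_error_full
    return ((sse_reduced - sse_full) / (p - q)) / mse_full

# Made-up SSE values, for illustration only
f_stat = modified_f_statistic(sse_reduced=500.0, sse_full=400.0, n=38, p=5, q=3)
print(f_stat)  # prints 4.0
```

The result is then compared with the critical value F with p-q and n-p-1 degrees of freedom.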

Example
A housing price model (Full model) is proposed for homes in Laguna Hills that takes into account p = 5 factors:
– House Size, Lot Size, Age, Whether or not there is a pool, # Bedrooms
A reduced model that takes into account only the first three of these (q = 3) was discussed earlier.
Based on a sample of n = 38 sales, can we conclude that adding these p-q = 2 additional variables (Pool, # Bedrooms) is significant?

The Modified F-Test For This Example
Modified F-Test:
H_0: β_4 = β_5 = 0
H_A: At least one of β_4 and β_5 ≠ 0
For α = .05, the test is: Reject H_0 (Accept H_A) if F > F_{.05, 2, 32}
F_{.05, 2, 32} can be generated in Excel by FINV(.05,2,32) = 3.29.
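The critical value can also be checked outside Excel. When the numerator degrees of freedom equal 2, as in this example, the upper tail of the F distribution has the closed form P(F > x) = (1 + 2x/d)^(-d/2), where d is the error degrees of freedom, so the critical value can be solved for directly. The sketch below relies on that special case only; the function name is hypothetical, and for other numerator degrees of freedom a statistics library or Excel's FINV would be needed.

```python
def f_critical_two_numerator_df(alpha, df_error):
    """Upper-alpha critical value of F with 2 numerator df and df_error denominator df.

    Solves (1 + 2x/df_error) ** (-df_error / 2) = alpha for x.
    Valid only when the numerator degrees of freedom equal 2.
    """
    return (df_error / 2.0) * (alpha ** (-2.0 / df_error) - 1.0)

n, p, q = 38, 5, 3          # values from the housing example
df_error = n - p - 1        # 32 error degrees of freedom in the full model
crit = f_critical_two_numerator_df(0.05, df_error)
print(round(crit, 2))  # prints 3.29
```

This agrees with the FINV(.05,2,32) value quoted above.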

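Full Model
Excel regression output for the full model, showing SSE Full, MSE Full, and DFE Full.

Reduced Model
Excel regression output for the reduced model, showing SSE Reduced.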

The Partial F-Test
– Partial F-statistic: =((G3-C13)/2)/D13, where G3 holds the SSE from the Reduced output worksheet
– Critical value: =FINV(.05,2,B13)

The Modified F-Statistic
For this model, the modified F-statistic is compared with the critical value F_{.05, 2, 32} = 3.29.
Because the modified F-statistic exceeds the critical value, there is enough evidence to conclude that including Pool and Bedrooms is significant.

Review
Stepwise regression helps determine a “best model” from a series of possible independent variables (x’s).
– Approach:
Step 1 – Run one-variable regressions.
– If there is a p-value < α, keep the variable with the lowest p-value as a variable in the model.
Step 2 – Run 2-variable regressions.
– One of the two variables in each model is the one determined in Step 1.
– Keep the model with the lowest p-values if both are < α.
Repeat with 3, 4, 5 variables, etc., until no model has all p-values < α.
Modified F-test for testing the significance of parts of the model:
– Compare F to F_{α, p-q, DFE(Full)}, where F = ((SSE_Reduced - SSE_Full)/(# terms removed))/MSE_Full