Model Selection

Presentation transcript:


Forward selection:
1. Regress Y on each of the k potential X variables.
2. Determine the best single-variable model.
3. Regress Y on the best variable together with each of the remaining k-1 variables.
4. Determine the best model that includes the previous best variable and one new best variable.
5. If the adjusted R² declines, the standard error of the regression increases, the t-statistic of the best variable is insignificant, or the coefficients are theoretically inconsistent, STOP and use the previous best model. Otherwise, repeat steps 2-4 until stopped or until the all-variable model has been reached.
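The steps above can be sketched in a short Python routine. This is a minimal illustration using adjusted R² as the sole stopping criterion; the function names and the use of plain least squares are assumptions for the sketch, not from the slides.

```python
import numpy as np

def adj_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on X (intercept added)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    k = Xd.shape[1] - 1
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def forward_select(y, X):
    """Greedy forward selection: at each step add the predictor that most
    improves adjusted R^2; stop when no addition improves it."""
    remaining = list(range(X.shape[1]))
    chosen, best = [], -np.inf
    while remaining:
        scores = [(adj_r2(y, X[:, chosen + [j]]), j) for j in remaining]
        score, j = max(scores)
        if score <= best:           # adjusted R^2 declined: stop
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen, best
```

In practice the slide's other stopping rules (standard error, t-statistics, sign checks) would be tested at each step as well; adjusted R² alone keeps the sketch short.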

Interpreting the stopping rules:
- If the adjusted R² declines when a variable is added, the variable's added explanatory value does not outweigh its modeling cost.
- If the standard error increases, the additional variable has not improved estimation.
- If the t-statistic of one of the variables is insignificant, there may be too many variables.
- If the coefficients are inconsistent with theory, multicollinearity effects may be at work.

Backward elimination:
1. Regress Y on all k potential X variables.
2. Use t-tests to determine which X has the least significance.
3. If that X does not meet some minimum level of significance, remove it from the model.
4. Regress Y on the remaining k-1 X variables.
Repeat steps 2-4 until all remaining Xs meet the minimum significance level.
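A sketch of backward elimination under the same assumptions as before (plain OLS, illustrative function names). The threshold `t_min` stands in for the slide's "minimum level of significance".

```python
import numpy as np

def t_stats(y, X):
    """OLS t-statistics for each column of X (intercept added first)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return (beta / se)[1:]          # drop the intercept's t-stat

def backward_eliminate(y, X, t_min=2.0):
    """Repeatedly drop the least-significant predictor until every
    remaining |t| >= t_min (t_min ~ 2 is roughly a 5% two-sided level)."""
    keep = list(range(X.shape[1]))
    while keep:
        t = np.abs(t_stats(y, X[:, keep]))
        worst = int(np.argmin(t))
        if t[worst] >= t_min:       # all remaining Xs are significant
            break
        keep.pop(worst)
    return keep
```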

The t-tests should be used one at a time. T1 can tell you to drop X1 and keep X2-X6; T2 can tell you to drop X2 and keep X1 and X3-X6. Together, they do not necessarily tell you to drop both and keep X3-X6.

If a t-statistic is not significant, we can remove that X and simplify the model while still maintaining the model's high R². Typical stopping rule: continue until all Xs meet some target "significance level to stay" (often .10 or .15, to keep more Xs).

The forward and backward heuristics may or may not arrive at the same final model; in general, however, the resulting models should be quite similar. Backward elimination requires starting from a model that includes all possible explanatory variables, which can be a practical constraint: Excel, for example, will only conduct a regression with up to 16 variables.

When using many variables in a regression, some of the explanatory variables may be highly correlated with other explanatory variables. In the extreme, when two of the variables are exactly linearly related, the multiple regression fails because the estimates are unstable. Simple indicators of multicollinearity: a failure of the F-test, an increase in the standard error, an insignificant t-statistic for a previously significant variable, and theoretically inconsistent coefficients. Recall also that when using a categorical variable, one of the categories must be left out as the baseline.

Variance inflation factors (VIFs) should be calculated after reaching a supposed stopping point in a multiple-regression selection method. The VIF for each independent variable is found by regressing that independent variable on the other independent variables and computing VIF = 1 / (1 - R²), where R² comes from that auxiliary regression. A simple rule of thumb is that the VIFs should be less than 4.
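The VIF calculation just described can be sketched directly (function name and data are assumptions for illustration):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress column j
    on the remaining columns and compute 1 / (1 - R_j^2)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        sst = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - (resid @ resid) / sst
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Independent predictors give VIFs near 1; a VIF above the rule-of-thumb threshold of 4 flags a variable that the others largely reproduce.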

The forward and backward heuristics rely on adding or deleting one variable at a time. It is also possible to evaluate the statistical significance of including a whole set of variables by constructing the partial F-statistic.

The partial F test. Suppose there are r variables in the group. Define the full model to be the one with all k predictors, and the reduced model to be the one with the group left out (it has k - r variables).

Look at the increase in the sum of squared errors, SSE_Reduced - SSE_Full, to see how much of the explained variation is lost. Divide this by r, the number of variables in the group, and put it in ratio to the MSE of the full model. The result is the partial F-statistic.

The statistic

F = ((SSE_Reduced - SSE_Full) / r) / MSE_Full

has an F distribution with r numerator and (n - k - 1) denominator degrees of freedom.
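The partial F computation can be sketched as follows (function names and synthetic data are illustrative assumptions):

```python
import numpy as np

def sse(y, X):
    """Sum of squared errors from OLS of y on X (intercept added)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return resid @ resid

def partial_f(y, X_full, drop):
    """Partial F statistic for jointly dropping the columns listed in
    `drop` from the full model. Returns (F, df1, df2)."""
    n, k = X_full.shape
    r = len(drop)
    X_red = np.delete(X_full, drop, axis=1)
    sse_full, sse_red = sse(y, X_full), sse(y, X_red)
    mse_full = sse_full / (n - k - 1)
    F = ((sse_red - sse_full) / r) / mse_full
    return F, r, n - k - 1
```

Dropping a genuinely useful variable inflates SSE_Reduced and hence F; dropping noise variables leaves F near 1.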

[Slide shows side-by-side regression output for the full and reduced models.]

H0: the four variable coefficients in the group are insignificant.
H1: at least one variable coefficient in the group is useful.
F = ((SSE_Reduced - SSE_Full) / 4) / MSE_Full
The correct F distribution to test against has 4 numerator and 81 denominator degrees of freedom. The tabled value for an F(4, 60) distribution is 2.53 at a significance level of .05 and 3.65 at a significance level of .01.

Indicator variables, extensions:
- Two lines, different slopes
- More than two categories
- Multicategory, multislope

Recall that using the Executive variable alone created a salary model with two lines having different intercepts. Adding the Alpha Experience variable resulted in a model that still has two lines with different intercepts. But what if there is an interaction effect between Executive status and Alpha experience?

The Executive status variable has two categories, 0 and 1. Create two variables from Alpha experience:
- When Executive = 0, the first variable retains the Alpha value; otherwise it equals 0.
- When Executive = 1, the second variable retains the Alpha value; otherwise it equals 0.
Using these three variables (Executive status and the two Alpha variables) results in a model with two lines having different intercepts and different slopes, capturing a simple interaction effect among the variables.
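The coding above can be sketched with hypothetical salary data. The variable names (`executive`, `alpha`) and all numeric values are assumptions for illustration, not the slides' actual data.

```python
import numpy as np

# Hypothetical data: executive is a 0/1 indicator, alpha is years of
# Alpha experience.
rng = np.random.default_rng(3)
n = 120
executive = rng.integers(0, 2, size=n)
alpha = rng.uniform(0, 20, size=n)

# Split Alpha experience into two variables: each retains the Alpha
# value only for its own Executive group, and equals 0 otherwise.
alpha_exec0 = np.where(executive == 0, alpha, 0.0)
alpha_exec1 = np.where(executive == 1, alpha, 0.0)

# Simulated salary with a different intercept AND slope per group.
salary = (40 + 20 * executive + 1.5 * alpha_exec0 + 3.0 * alpha_exec1
          + rng.normal(scale=2.0, size=n))

# Fit: intercept, Executive dummy, and the two group-specific slopes.
X = np.column_stack([np.ones(n), executive, alpha_exec0, alpha_exec1])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
# beta recovers roughly [40, 20, 1.5, 3.0]: two lines with different
# intercepts and different slopes.
```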
