Managerial Economics & Decision Sciences Department
business analytics II ▌ multicollinearity
session seven: introduction · inflated standard deviations · the F-test
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

session seven: multicollinearity

readings
► statistics & econometrics (MSN): Chapter 7
► case study (CS): Dubuque Hot Dog

learning objectives
- define multicollinearity
- understand the effects of multicollinearity
- detecting multicollinearity: the vif command

multicollinearity: fundamentals …one too many…

key concept: multicollinearity
► Multicollinearity occurs when two or more explanatory (x) variables are highly correlated.

► If two variables are highly correlated, then they tend to move in lockstep:
- as a result, they lack independent action; in effect, they represent the "same experiment"
- the regression may be able to determine that "the experiment" actually had an effect on y, but it may not be able to determine which of the two variables is responsible
- thus, each variable used individually may be significant, but when entered jointly, multicollinearity may lead to neither being significant
► When variables are multicollinear, their standard errors are inflated, which makes it more difficult to draw inferences about the impact of each one separately (significance issues).

Remark.
- This is not the same as an interaction/slope dummy, which is a new variable created to ask a specific question about whether the effect of one variable depends on the level of another.
- You can have multicollinearity without any interaction, and vice versa.
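To see the "same experiment" effect concretely, here is a toy do-file sketch (not from the deck; the variable names and the 0.1 noise scale are made up for illustration) in which y genuinely depends on two near-collinear regressors, yet each coefficient is estimated with a large standard error:

 clear
 set seed 123
 set obs 200
 gen x1 = rnormal()            // first regressor
 gen x2 = x1 + 0.1*rnormal()   // second regressor moves in lockstep with x1
 gen y  = x1 + x2 + rnormal()  // y truly depends on both
 regress y x1 x2               // jointly: large std. errors, weak t-statistics
 regress y x1                  // alone: x1 strongly significant

Running the last two commands shows each variable looking significant on its own but not when entered jointly, which is exactly the pattern described above.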

an illustration: …the Dubuque hot dogs…

► Dubuque Hot Dogs offers branded hot dogs; its main competitors are Oscar Mayer and Ball Park. There are two kinds of Ball Park hot dogs: regular and all-beef.
► Data are available in hotdog.dta:
- MKTDUB gives Dubuque's weekly market share in decimals, e.g. 0.04 means a 4% share
- average prices (in cents) during each week for Dubuque, Oscar Mayer and the two Ball Park hot dogs
► We try to explain Dubuque's market share using the price variables in a regression: own price, Oscar's price, and the two Ball Park prices.

Remark.
- Should we expect any variables to exhibit multicollinearity?
- If any variables are suspected of multicollinearity, the prices of the two Ball Park hot dogs are the most likely to be correlated with each other (similar production and distribution costs, etc.). The fact that the two prices have a common underlying generating process is a good example of the "same experiment".
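A minimal sketch of loading the data and checking the suspected correlation directly, assuming the variable names used in the regressions that follow (pdub, poscar, pbpreg, pbpbeef):

 . use hotdog.dta, clear
 . correlate pbpreg pbpbeef
 . regress MKTDUB pdub poscar pbpreg pbpbeef

A pairwise correlation close to 1 between pbpreg and pbpbeef would be a first warning sign before any formal diagnostic.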

an illustration: …the Dubuque hot dogs…

► The regression results are shown in the table below (numeric estimates omitted):

 MKTDUB | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 pdub |
 poscar |
 pbpreg |
 pbpbeef |
 _cons |

Figure 1. Results for regression of MKTDUB on pdub, poscar, pbpreg and pbpbeef

► The coefficients for the two Ball Park price variables are insignificant. We mentioned that the main effect of multicollinearity is inflated standard deviations for the coefficients. The standard deviation is the denominator of the t-statistic (for significance), so in the presence of multicollinearity the t-statistic is small and the corresponding p-value is large. Thus we'll tend to see variables as insignificant (due to multicollinearity) when in fact those variables might have explanatory power.

 pbpreg | Coef. Std. Err. t P>|t|
 pbpbeef |
 _cons |

Figure 2. Results for regression of pbpreg on pbpbeef

Remark. To emphasize the concept of action: since the two variables are highly correlated, we cannot see a lot of "independent movement" between the two variables. This is the origin of the inability to disentangle the separate effects of two highly correlated variables on the y variable.
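Figure 2 presumably comes from the auxiliary regression of one Ball Park price on the other, which in Stata is simply:

 . regress pbpreg pbpbeef

A very high R-squared in this auxiliary regression is the direct evidence that the two prices move in lockstep.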

an illustration: …the Dubuque hot dogs…

► In the initial regression we saw that pbpreg and pbpbeef are likely to be correlated, and that might induce inflated standard deviations for the coefficients. But are the standard deviations really inflated?
► The command vif, run after the regression, delivers a list of "variance inflation factors", one per coefficient:

 . regress MKTDUB pdub poscar pbpreg pbpbeef
 . vif

 Variable | VIF 1/VIF
 pbpreg |
 pbpbeef |
 poscar |
 pdub |
 Mean VIF |

Figure 3. Results for vif command

key concept: inflated standard deviations
► For a given coefficient, if the VIF value is greater than 10 then we have evidence that the standard deviation of that coefficient is inflated. The p-value is then likely larger than it should be, i.e. it will tend to indicate that the coefficient is insignificant when in fact it might be significant.
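For reference, each variance inflation factor is computed from an auxiliary regression of one explanatory variable on all the others:

 VIF_j = 1 / (1 - R2_j)

where R2_j is the R-squared from regressing x_j on the remaining x variables. A sketch of reproducing the VIF for pbpreg by hand (e(r2) is the R-squared Stata stores after regress):

 . regress pbpreg pdub poscar pbpbeef
 . display 1/(1 - e(r2))

With R2_j above 0.9 the VIF exceeds 10, which is the rule of thumb quoted above.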

detecting multicollinearity

► Detecting inflated standard deviations does not by itself confirm multicollinearity. To identify multicollinearity we use the F-test.
► The F-test tells us whether one or more variables add predictive power to a regression:

hypothesis
 H0: all of the regression coefficients (β) on the variables you are testing equal 0
 Ha: at least one of the regression coefficients (β) is different from 0

► In plain language: you are basically testing whether these variables are any more related to y than junk variables would be.

Remark. The F-test for a single variable returns the same significance level as the (two-sided) t-test.

► The F-test for a group of variables can be executed in STATA using the test or testparm command after running a regression:

 testparm xvar1 xvar2 … xvark

Remark. After the STATA command testparm, list the variables whose coefficients you want to test; the null hypothesis is that all of those coefficients are zero.
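For reference, the statistic behind test/testparm has the standard form (not shown in the deck): with q coefficients tested, n observations and k regressors in the full model,

 F = [ (SSR_restricted - SSR_unrestricted) / q ] / [ SSR_unrestricted / (n - k - 1) ]

which is compared to the F(q, n - k - 1) distribution; a large drop in the sum of squared residuals when the group is added produces a large F and a small p-value. An equivalent way to run the test after the Dubuque regression uses the built-in test command:

 . regress MKTDUB pdub poscar pbpreg pbpbeef
 . test pbpreg pbpbeef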

detecting multicollinearity

► The F-test applied to pbpreg and pbpbeef provides the following result:

 . testparm pbpreg pbpbeef
 ( 1) pbpreg = 0      ← the (joint) null hypothesis
 ( 2) pbpbeef = 0
 F( 2, 108) =
 Prob > F =           ← the p-value

Figure 4. Results for testparm command

► The decision criterion is similar to the one we introduced in the context of the t-test. First choose a significance level α, then compare the p-value with α:

decision
 if p-value < α: reject H0
 otherwise: cannot reject H0 at the significance level α

► What exactly does it mean to reject the null in the context of the F-test? It means that at least one of the coefficients is significantly different from zero.
► What if you fail to reject the null in the context of the F-test? It means the group of tested variables is no more predictive than junk: drop the group.
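A small arithmetic aside (inferred from the output, not stated in the deck): the denominator degrees of freedom are n - k - 1 = 108, and with k = 4 price regressors this implies the sample contains n = 108 + 4 + 1 = 113 weekly observations.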

detecting multicollinearity

► If the F-test does reject the null, then one or more variables in the group has a nonzero coefficient, but:
- this does not imply that every variable in the group has a nonzero coefficient
- you should still generally keep all of the group
► Extra thought is needed when working with categorical variables, such as seasonality. Don't be surprised if, despite a significant F-test, none of the categorical variables is individually significant:
- remember that significance is relative to the omitted category
- thus, you may still have significant pairwise comparisons between included categories
► If you do have multicollinearity, then:
- if your goal is to predict y, or if the multicollinear variables are not the key predictors you are interested in, then multicollinearity is not a problem
- if the variables are key predictors and the multicollinearity is problematic, then perform an F-test on the group of variables; if the test is significant, conclude that these variables collectively matter, even though you may be unable to sort out individual effects
- do not conclude that one variable, but not the other, is what matters; if you knew which one(s) mattered and how, you would not have a problem with multicollinearity in the first place

detecting multicollinearity

► The F-test showed that the group matters, and we now know that the reason the coefficients for pbpreg and pbpbeef come out insignificant is inflated standard deviations. Can we simply remove one of the two? Which one?

 MKTDUB | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 pdub |
 poscar |
 pbpreg |
 _cons |

Figure 5. Results for regression of MKTDUB on pdub, poscar and pbpreg
Remark. Keep only pbpreg and drop pbpbeef: now pbpreg becomes significant.

 MKTDUB | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 pdub |
 poscar |
 pbpbeef |
 _cons |

Figure 6. Results for regression of MKTDUB on pdub, poscar and pbpbeef
Remark. Keep only pbpbeef and drop pbpreg: now pbpbeef becomes significant.

► In both cases the remaining variable becomes significant. This should not be a surprise: the F-test indicated that, although the two variables are individually insignificant, at least one of them has real explanatory power.
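A sketch of the two alternative specifications in Stata (assuming hotdog.dta is still loaded):

 . regress MKTDUB pdub poscar pbpreg
 . regress MKTDUB pdub poscar pbpbeef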

detecting multicollinearity

► Another interesting fact: in both regressions that use only one of the two variables, the coefficient of the included variable is very close to the sum of the coefficients of the two variables from the regression that included both:
- initial regression (both included): b_pbpreg and b_pbpbeef, and their sum
- regression with pbpreg only: b_pbpreg close to that sum
- regression with pbpbeef only: b_pbpbeef close to that sum
► In fact, in either of the two regressions in which only one variable is included, the coefficient of the included variable "picks up" the cumulative effect.
► Nevertheless, since we are not (yet) able to figure out how to split the cumulative effect, i.e. which of the two variables to include and which to exclude, we choose to continue the analysis with both included.
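The mechanics behind this "pick up" follow standard omitted-variable algebra (a reasoning step not spelled out in the deck): if pbpbeef is dropped, the estimated coefficient on pbpreg tends to

 b_pbpreg → β_pbpreg + δ · β_pbpbeef

where δ is the slope from the auxiliary regression of pbpbeef on pbpreg. Because the two prices move nearly one-for-one, δ ≈ 1, so the included coefficient approximates the sum β_pbpreg + β_pbpbeef, exactly as observed.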

detecting multicollinearity · quiz

► Which of the two competitors (Oscar or Ball Park) do you think has a stronger impact on "stealing" market share from Dubuque?

 MKTDUB | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 pdub |
 poscar |
 pbpreg |
 pbpbeef |
 _cons |

Figure 7. Results for regression of MKTDUB on pdub, poscar, pbpreg and pbpbeef

► By now it should be easy to see that a one-cent increase in Oscar's price increases Dubuque's market share by the coefficient of poscar, while a one-cent increase in the prices of Ball Park's products increases Dubuque's market share by the sum of the coefficients on pbpreg and pbpbeef. It seems that Ball Park is a more serious competitor than Oscar.
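The combined Ball Park effect can be computed with standard Stata as well; klincom appears to be a course-provided wrapper, and the built-in lincom reports the same point estimate with its standard error and confidence interval:

 . regress MKTDUB pdub poscar pbpreg pbpbeef
 . lincom _b[pbpreg] + _b[pbpbeef]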

detecting multicollinearity · quiz

► Can we test whether a change in Ball Park's prices has a higher impact on Dubuque's market share than a change in Oscar's price?
► To show that a change in Ball Park's prices has a higher impact, we need to show that β3 + β4 > β2, where the coefficients correspond to the regression

 MKTDUB = β0 + β1 pdub + β2 poscar + β3 pbpreg + β4 pbpbeef + ε

► The hypotheses are:

hypothesis
 H0: β3 + β4 - β2 ≤ 0
 Ha: β3 + β4 - β2 > 0

 . klincom _b[pbpreg] + _b[pbpbeef] - _b[poscar]

 MKTDUB | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 (1) |
 If Ha: < then Pr(T < t) = .906
 If Ha: not = then Pr(|T| > |t|) = .187
 If Ha: > then Pr(T > t) = .094

Figure 8. Results for klincom command

► The relevant one-sided p-value is Pr(T > t) = .094, so we can reject the null at the 10% level but not at 5%: only weak evidence that Ball Park's prices matter more. The point here is to realize that klincom makes this kind of comparison easy.
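As a consistency check on the output above: for a positive t-statistic the one-sided p-value is half the two-sided one, and indeed .187 / 2 ≈ .094.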