Managerial Economics & Decision Sciences Department | business analytics: true and truncated relations, the omitted variable bias effect, spurious regressions


1 Managerial Economics & Decision Sciences Department
business analytics II, week 6: omitted variables bias and spurious regressions
► true and truncated relations
► the omitted variable bias effect
► spurious regressions
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

2 session six: omitted variable bias
learning objectives
► statistics & econometrics
■ define the omitted variable bias
■ construct the “influence diagram”
■ data mining and coping with omitted variable(s) bias
■ detecting spurious regression
► correlation
readings
► (MSN) Chapter 7
► (CS) Refrigerator Pricing

3 introduction: energy cost and refrigerator pricing
counter-intuitive results…
► We are provided with information on 41 popular models of refrigerators; the data is in the file newfridge.dta. The relevant variables for this case are Price (the refrigerator price in $), energycost (the annual energy cost of running the refrigerator in $/year) and volume_cuinches (the volume in cubic inches). The results of the regression of Price on energycost are shown below:

Figure 1. Abridged results for regression
      Price |     Coef.   Std. Err.
 -----------+-----------------------
 energycost |  17.14957   6.075478
      _cons |  300.1567    290.463

Remark. A decrease of $20 in energycost implies a decrease in selling price: 17.14957·(−$20) = −$342.9914. This is counter-intuitive: for a refrigerator that consumes less energy, one would expect the Price to increase.
► A simple inspection of the variables available in the study reveals other features that are probably relevant to the pricing decision, such as volume (volume_cuinches). Let’s include this variable in the regression (as an independent variable, with the assumption that it affects the selling price):

Figure 2. Abridged results for regression
           Price |     Coef.   Std. Err.
 ----------------+-----------------------
      energycost | -2.427965   13.02064
 volume_cuinches |  .0217692   .0128861
           _cons | -342.8964   474.8011

Remark. A decrease of $20 in energycost implies an increase in selling price of (−2.427965)·(−$20) = $48.5593, holding the volume constant.
This means that for the same features (such as volume), if the spending on energy consumed by the refrigerator is lower, you’d expect to be able to sell the refrigerator at a higher price. It seems that we are on the right track.
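The two remarks above are plain arithmetic on the reported coefficients. A minimal Python sketch reproduces them; the function name `predicted_price_change` is illustrative only and does not come from the course materials.

```python
def predicted_price_change(coef_energycost, delta_energycost):
    """Predicted change in Price for a change in energycost, other regressors held fixed."""
    return coef_energycost * delta_energycost

# Coefficients on energycost as reported in Figures 1 and 2.
simple = predicted_price_change(17.14957, -20)     # Price on energycost only
multiple = predicted_price_change(-2.427965, -20)  # Price on energycost and volume_cuinches

print(round(simple, 4))    # negative: price falls when energy cost falls
print(round(multiple, 4))  # positive: price rises, holding volume fixed
```

The sign flip between the two results is the counter-intuitive pattern that the rest of the session explains.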

4 omitted variables: what if a variable is missing?
► We’ll consider for the moment a hypothetical situation characterized by:
- independent variables x and z
- dependent variable y
► The true relations between these variables are:
true causal: y = b0 + b1·x + b2·z
correlation: z = a0 + a1·x
► Ideally, when studying the interaction between x, z and y we would use the true relations above. More often than not, however, we end up using a truncated relation (intentionally, because of limited data availability, or for lack of an in-depth analysis of causal/correlation effects).
► The truncated relation between x and y is:
truncated causal: y = b0* + b1*·x
key concept: true, truncated relations and omitted variable bias
■ a true relation describes the relation between y and all relevant independent variables affecting y
■ a truncated relation describes the relation between y and only a few of the independent variables affecting y
■ the effect, if any, on b0* and b1* (when compared to b0 and b1) is called the omitted variable bias.

5 omitted variables: what if a variable is missing?
key issue: understand and quantify the omitted variable bias
► For the specific model introduced, the key problems we want to address are understanding:
■ the true effect on y of a change in x given the true relations
■ the perceived effect on y of a change in x given the truncated relation
and eventually
■ quantifying the omitted variable bias
true relations: y = b0 + b1·x + b2·z (true causal) and z = a0 + a1·x (correlation)
truncated relation: y = b0* + b1*·x (truncated causal)

6 omitted variables: what if a variable is missing?
► There are three channels “at work” in propagating a change in x to a change in y for the true causal relation:
■ direct channel: a change in x results in a change in y through the causal relation. Direct causal effect: a one-unit change in x results in a change of b1 units in y.
■ correlation channel: a change in x is associated with a change in z through the correlation relation. Correlation effect: a one-unit change in x is associated with a change of a1 units in z.
■ indirect channel: a change in z results in a change in y through the causal relation. Indirect causal effect: a one-unit change in z results in a change of b2 units in y.
Remark. Notice how the total effect on y is the result of two separate channels: direct and correlation-indirect.
Figure 3. “Propagation channels” for true causal relation (direct, correlation and indirect channels): y = b0 + b1·x + b2·z, z = a0 + a1·x
Figure 4. “Propagation channel” for truncated causal relation (perceived direct channel): y = b0* + b1*·x

7 omitted variables: what if a variable is missing?
► Consider a change in x (recall the symbol for this: Δx):
true relation
■ direct channel: Δy = b1·Δx
■ correlation channel: Δz = a1·Δx
■ indirect channel: Δy = b2·Δz = b2·a1·Δx
■ true effect: Δy = (b1 + b2·a1)·Δx
truncated relation
■ perceived direct channel: Δy = b1*·Δx
■ truncated effect: Δy = b1*·Δx

8 omitted variables: what if a variable is missing?
true relation, true effect: Δy = (b1 + b2·a1)·Δx
truncated relation, truncated effect: Δy = b1*·Δx
► In the truncated relation we omitted the variable z and we estimate the value of the coefficient b1*. By ignoring the indirect channel, the coefficient b1* will actually “pick up” the perceived effect determined above:
b1* = b1 + b2·a1
■ b1*: perceived effect of x on y (perceived direct channel)
■ b1: true effect of x on y (direct channel)
■ b2·a1: omitted variable bias (correlation-indirect channel)

9 omitted variables: what if a variable is missing?
Example: overestimation of b1 (true causal relation with b1 = 4 and b2 = 2, correlation relation with a1 = 5)
■ direct channel: Δy = 4·Δx
■ correlation channel: Δz = 5·Δx
■ indirect channel: Δy = 2·Δz = 10·Δx
■ total effect: Δy = 14·Δx
Here b1* = 14 is the coefficient when y is regressed on x only, and b1 = 4 is the coefficient when y is regressed on x and z.
► Notice how the total effect on y of a one-unit change in x is 14, and this is what the truncated regression would indicate. However, the direct effect on y of a one-unit change in x is 4.
► In this case b1* = 14 and b1 = 4. Thus b1 is overestimated by 10 units, which is the omitted variable bias: ovb = b1* − b1 = 10

10 omitted variables: what if a variable is missing?
Example: underestimation of b1 (true causal relation with b1 = 4 and b2 = 2, correlation relation with a1 = −5)
■ direct channel: Δy = 4·Δx
■ correlation channel: Δz = −5·Δx
■ indirect channel: Δy = 2·Δz = −10·Δx
■ total effect: Δy = −6·Δx
Here b1* = −6 is the coefficient when y is regressed on x only, and b1 = 4 is the coefficient when y is regressed on x and z.
► Notice how the total effect on y of a one-unit change in x is −6, and this is what the truncated regression would indicate. However, the direct effect on y of a one-unit change in x is 4.
► In this case b1* = −6 and b1 = 4. Thus b1 is underestimated by 10 units, which is the omitted variable bias: ovb = b1* − b1 = −10
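The two numeric examples can be checked by simulation. The sketch below generates data from the true relations with b1 = 4, b2 = 2 and a1 = ±5, then regresses y on x alone. The intercepts, noise scales, sample size and seed are arbitrary assumptions made for illustration, and the helper names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def truncated_slope(b1, b2, a1, n=5000, seed=0):
    """Simulate the true relations, then regress y on x only (z omitted)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Correlation relation with noise, so z is related to x but not perfectly.
    z = [1.0 + a1 * xi + rng.gauss(0, 1) for xi in x]
    # True causal relation; the intercept 3.0 and noise scale are arbitrary choices.
    y = [3.0 + b1 * xi + b2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)

over = truncated_slope(b1=4, b2=2, a1=5)    # should be close to 4 + 2*5 = 14
under = truncated_slope(b1=4, b2=2, a1=-5)  # should be close to 4 + 2*(-5) = -6
print(round(over, 2), round(under, 2))
```

In both runs the truncated slope lands near b1 + b2·a1 rather than near the direct effect b1 = 4.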

11 omitted variables: what if a variable is missing?
key concept: omitted variable bias
► The omitted variable bias (ovb) is defined as the difference between the coefficient based on the truncated equation and the coefficient obtained from the true/causal relation.
► The ovb is easily derived with a bit of algebra using the three equations (true causal, correlation and truncated causal): ovb1 = b1* − b1 = b2·a1
Remark. Notice that ovb1 depends on the combination of both magnitudes and signs of the relation between z and y (through b2) and between z and x (through a1). The table below summarizes how the truncated relation captures the effect on y of a one-unit increase in x:

Figure 5. Over- and under-estimation of true effect
        | b2 > 0          | b2 < 0
 a1 > 0 | overestimation  | underestimation
 a1 < 0 | underestimation | overestimation
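Figure 5 can be read as a sign rule on the product b2·a1. A minimal sketch, assuming ovb1 = b2·a1 as in the numeric examples; the function name `bias_direction` is hypothetical.

```python
def bias_direction(b2, a1):
    """Classify the bias of the truncated slope b1* relative to b1.

    The bias equals b2 * a1, so only the signs of the two terms matter.
    """
    ovb = b2 * a1
    if ovb > 0:
        return "overestimation"
    if ovb < 0:
        return "underestimation"
    return "no bias"

print(bias_direction(b2=2, a1=5))   # both positive
print(bias_direction(b2=2, a1=-5))  # opposite signs
```

Only the signs of b2 and a1 are needed to get the direction, which is why the rule is useful even when the magnitudes are unknown.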

12 omitted variables: what if a variable is missing?
► To summarize:
omitted variable bias for constant: ovb0 = b0* − b0 = b2·a0
omitted variable bias for coefficient: ovb1 = b1* − b1 = b2·a1
Remarks:
■ if there is no correlation effect between x and z, i.e. a1 = 0, then ovb1 = 0 and therefore the truncated equation gives the correct effect of x on y
■ to continue with the previous situation: even with no correlation effect between x and z, the truncated equation gives a biased estimate of the constant as long as b2 ≠ 0 (and a0 ≠ 0)
■ the case of no correlation effect between x and z implies that the truncated equation provides the correct slope but a biased constant; thus inference
- on changes in y is still correct when based on the truncated equation
- on the level of y is biased when based on the truncated equation
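Both bias terms follow from substituting the correlation relation into the true causal relation. The short derivation below treats the relations as exact and writes b_0^{*}, b_1^{*} for the truncated coefficients; this notation is assumed here to match the starred coefficients in the slides.

```latex
\begin{align*}
y &= b_0 + b_1 x + b_2 z, \qquad z = a_0 + a_1 x \\
y &= b_0 + b_1 x + b_2 (a_0 + a_1 x) \\
  &= \underbrace{(b_0 + b_2 a_0)}_{b_0^{*}} + \underbrace{(b_1 + b_2 a_1)}_{b_1^{*}}\, x \\
\text{ovb}_0 &= b_0^{*} - b_0 = b_2 a_0, \qquad
\text{ovb}_1 = b_1^{*} - b_1 = b_2 a_1
\end{align*}
```

With a_1 = 0 the slope bias vanishes, while the constant is still shifted by b_2 a_0.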

13 energy cost and refrigerator pricing: analysis revisited
► Back to the two regressions we ran; let’s formalize the two models so far:
truncated: Price = 300.1567 + 17.14957·energycost
true: Price = −342.8964 − 2.427965·energycost + 0.0217692·volume_cuinches
► Let’s look at the relation between volume (volume_cuinches) and energy cost (energycost); in particular, let’s run the regression of volume_cuinches on energycost:

Figure 6. Regression results of volume_cuinches on energycost
 volume_cui~s |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
 -------------+----------------------------------------------------------------
   energycost |  899.3238   73.76336   12.19   0.000    750.1233   1048.524
        _cons |  29539.62   3526.558    8.38   0.000    22406.49   36672.76

► We can write, not necessarily in the sense of a causal relation but correlation-wise:
correlation: volume_cuinches = 29539.62 + 899.3238·energycost

14 energy cost and refrigerator pricing: analysis revisited
► Just rearrange the equations as
true: Price = b0 + b1·energycost + b2·volume_cuinches
correlation: volume_cuinches = a0 + a1·energycost
truncated: Price = (b0 + b2·a0) + (b1 + b2·a1)·energycost, where b1 is the direct channel and b2·a1 is the correlation-indirect channel
► Notice that b1 + b2·a1 = −2.428 + (0.022·899.324) = 17.357
Remark. With these rounded coefficients we obtain only approximately that b1* = b1 + b2·a1 (17.357 against the reported 17.150). The gap comes from rounding b2 to 0.022: with the unrounded estimates, −2.427965 + 0.0217692·899.3238 ≈ 17.1496, which agrees with the truncated coefficient 17.14957 up to the reported precision. For OLS estimates computed on the same sample the relation holds exactly, even though energycost and volume_cuinches are related but not perfectly. What matters for interpretation is the direction of the bias, which depends on the signs of b2 and a1, not on their magnitudes.
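The decomposition can be checked directly against the coefficients reported in Figures 1, 2 and 6. This is a sketch that assumes all three regressions were estimated on the same 41 observations; the variable names are illustrative.

```python
# Coefficients as reported in Figures 1, 2 and 6.
b0_true, b1_true, b2_true = -342.8964, -2.427965, 0.0217692  # Price on energycost, volume
a0, a1 = 29539.62, 899.3238                                  # volume_cuinches on energycost
b0_trunc, b1_trunc = 300.1567, 17.14957                      # Price on energycost only

# Slope and constant implied for the truncated regression by the decomposition.
implied_slope = b1_true + b2_true * a1
implied_cons = b0_true + b2_true * a0

print(round(implied_slope, 4), b1_trunc)
print(round(implied_cons, 4), b0_trunc)
```

With the unrounded coefficients, the implied slope and constant agree with the reported truncated estimates to within about 0.001, so the larger gap seen with 0.022 and 2.428 is a rounding effect.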

15 application: ads and profitability
► A recent study by an industry trade journal found that auto dealerships that spend heavily on marketing earn lower profits than dealerships that do not spend as heavily. This finding controls for market size and type of car being sold, but does not account for the extent of competition faced by each dealership.
■ What do you believe is the direction of the correlation between marketing expenditures and the extent of competition?
■ What do you believe is the direction of the relation between the extent of competition and profits?
■ What bias, if any, does failing to account for competition impart to the study by the trade journal?
truncated: profits = b0* + b1*·marketing
true: profits = b0 + b1·marketing + b2·competition
correlation: competition = a0 + a1·marketing

16 application: ads and profitability
Answer: The direction (underestimation or overestimation) of the omitted variable bias is determined by the product of
■ b2: the direction of the effect of the omitted variable on the dependent variable (holding any other x’s in the model fixed)
■ a1: the direction of the relation between the omitted and the included x-variable (holding any other x’s in the model fixed).
► Holding marketing spending, size of market and type of car sold fixed, we expect that dealers facing more competition are likely to experience lower profits, i.e. b2 < 0
► We also might expect that (holding size of market and type of car sold fixed) dealers facing more competition are likely to spend more on marketing, i.e. a1 > 0
► The direction of the bias is given by the sign of b2·a1; since b2 < 0 and a1 > 0, sign(ovb) is negative and so b1* < b1. The study therefore underestimates the effect of marketing spending on profits.

17 application: fast food wages
► A major fast food chain conducted a study of worker productivity. Using regression, it found that productivity (measured along several dimensions that are relevant to the production and service of fast food) is higher for workers who earn higher wages, even after controlling for worker experience.
► However, wages appear to have no effect on productivity in a regression that also controls for the median income of the local community where the fast food chain is located. In this regression, the estimated coefficient on median income is positive.
► What must be the sign of the correlation between wages and median income (holding worker experience fixed) in the study?
truncated: productivity = b0* + b1*·wages
true: productivity = b0 + b1·wages + b2·median income
correlation: median income = a0 + a1·wages

18 application: fast food wages
Answer: In this example we know the direction of the bias from the information we are given about the two regressions:
i. the coefficient on wages is positive in the regression from which local income is excluded (b1* > 0) and zero when it is included (b1 = 0); thus the bias b1* − b1 is positive
ii. the estimated effect of local income on productivity, holding worker experience and wages fixed, is positive; thus b2 > 0
■ Since b1* = b1 + b2·a1 with b1* > 0, b1 = 0 and b2 > 0, we infer that a1 > 0.
■ Thus local income and wages must be positively correlated.

19 omitted variables: conclusions
► Living with ovb: It is impossible to get data on all the factors that might affect the dependent variable. This exposes all regressions to potential omitted variable bias, which is why we should always think about possible biases in our regressions. Fortunately, the omitted variable bias can be managed:
■ Omitting variables results in biased coefficients only if they are
i. related to the dependent variable, and
ii. related to included independent variables
■ If the magnitudes of i. and ii. above are small, the bias will be small.
■ Even if omitted variable bias exists, it may be possible to determine the direction of the bias. This will allow us to state that the reported coefficients are either upper or lower bounds on the actual effects.
■ Thinking about omitted variable bias forces us to carefully identify the correct model before we run any regressions, and to do a better job of variable selection in the first place.
■ As we will discuss in a later class, fixed effects models can sometimes further mitigate or eliminate OVB.

20 spurious regressions
► We have a spurious regression when we find a statistically significant relationship between two truly unrelated variables, i.e. we reject H0: βk = 0 although there is no causal relation between xk and y. In reality, variable xk does not belong in the regression.
► A typical spurious regression is obtained when both the dependent and independent variables in a regression are in fact determined by a third, genuine independent variable:
■ true relations: y = a0 + a1·z and x = b0 + b1·z
■ spurious relation: y = β0 + β1·x
► It’s not difficult to find the connection between the coefficients in the spurious relation between y and x and the coefficients in the true relations. Retrieve z as a function of x, z = (x − b0)/b1, and plug it into the first true relation. Notice again the “propagation” of the sign (positive/negative) from the real relations to the spurious relation:
β0 = a0 − a1·b0/b1 and β1 = a1/b1
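A small simulation illustrates the mechanism: x and y never enter each other's equations, yet a regression of y on x finds a strongly significant slope close to a1/b1. The coefficient values, noise scale, sample size and seed below are arbitrary assumptions, and x is taken to be an exact linear function of z, as in the relations above.

```python
import math
import random

def ols_with_t(x, y):
    """Return (slope, t-statistic of the slope) for the regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

rng = random.Random(1)
a0, a1 = 1.0, 2.0    # y = a0 + a1*z (+ noise)
b0, b1 = 3.0, -0.5   # x = b0 + b1*z
z = [rng.gauss(0, 1) for _ in range(2000)]
y = [a0 + a1 * zi + rng.gauss(0, 1) for zi in z]
x = [b0 + b1 * zi for zi in z]  # x depends on z only, never on y

slope, t_stat = ols_with_t(x, y)
print(round(slope, 2), round(a1 / b1, 2), round(t_stat, 1))
```

The estimated slope inherits its sign from a1 and b1, matching the “propagation” of signs described above.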

21 spurious regressions: conclusions
► Key points: We may incorrectly include xk in the model because it appears statistically significant.
■ As a result, we draw false conclusions about variable xk
■ To make matters worse, by including the “junk” variable xk in the model, we may bias the coefficients on all other included variables
► Subjectivity and hindsight: Spurious correlation plays to our psychology:
■ We are hard-wired to try to explain empirical phenomena, and “smart” people can rationalize almost any finding
■ Thus, after the fact, we can claim that almost any correlation is consistent with theory
■ Such correlations may have been accidents: artifacts of sampling, noise, and the definition of statistical significance (which only limits the probability of observing such a result by random chance when there is no true relation). As a result, we reach silly conclusions that cannot be replicated
► Data mining: In explaining y you try several (often an extremely high number of) potential independent variables, keep the best ones, i.e. the ones that create the best fit, and then try to rationalize in hindsight.
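The data-mining risk is easy to reproduce. In the sketch below, y and every candidate x are independent noise, so no variable truly belongs in the model, yet some candidates still clear the usual 5% threshold. The sample size, number of candidates, seed and the 1.96 cutoff (a normal approximation) are assumptions for illustration.

```python
import math
import random

def t_stat(x, y):
    """t-statistic of the slope in a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)

rng = random.Random(42)
n_obs, n_candidates = 100, 200
y = [rng.gauss(0, 1) for _ in range(n_obs)]  # y is pure noise

# Every candidate x is also pure noise, so none truly belongs in the model.
significant = 0
for _ in range(n_candidates):
    x = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(t_stat(x, y)) > 1.96:  # approximate 5% two-sided cutoff
        significant += 1

print(significant, "of", n_candidates, "unrelated variables look significant")
```

Roughly 5% of the candidates are expected to pass by chance alone, which is the pool from which hindsight explanations get built.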

22 spurious regressions: conclusions
► Potential solutions: Include variables on the RHS only when there is a plausible reason to do so, and do your theorizing ex ante, not ex post; we are all expert “rationalizers”!
■ Use a smaller significance level when you try many variables or estimate many regressions
■ Question the process used to generate results
► Data mining to your advantage: While it is much better to approach your data with hypotheses based on a good understanding of what you are studying, you may sometimes approach a problem without such an understanding
■ Data mining can identify patterns that may lead to hypotheses
- Do not mistake the data mining step for a test of your hypotheses: gather more data and conduct new tests. This is known as “out of sample” testing
- Spurious correlation tells us that this process will be costly
■ Most hypotheses coming out of data mining will not be confirmed
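One common way to “use a smaller significance level” is the Bonferroni adjustment, which divides the target level by the number of tests. The slides do not name a specific method, so this choice is an assumption, and the function names are hypothetical. The second function shows why an adjustment is needed when the tests are independent.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall level at most alpha."""
    return alpha / n_tests

def chance_of_false_positive(alpha, n_tests):
    """Probability of at least one false rejection across independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(chance_of_false_positive(0.05, 20))  # well above 0.05 when 20 variables are tried
print(bonferroni_alpha(0.05, 20))          # stricter per-test level
```

The adjustment is conservative, so it complements rather than replaces out-of-sample testing.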

