
1 assignment 7 solutions ► office networks ► super staffing
Managerial Economics & Decision Sciences Department. Developed for assignment 7 solutions ► office networks ► super staffing. © Kellogg School of Management

2 non-linear models ► STATA ► non-linearity (log models)
assignment 7 - solutions: non-linear models

learning objectives
► STATA
 testing for curvature: rvfplot
 testing and correcting for heteroskedasticity: hettest, robust
 correcting for clustering: cluster()
► non-linearity (log models)
 test for curvature and its effect on linear regression
 use of logarithmic (log) models: interpretation and prediction with log models
► heteroskedasticity
 define heteroskedasticity and its effect on linear regression
 correction for heteroskedasticity: log models and the "white wash" approach
► independence and clustering
 define independence of errors and the effect of clustering
 correction for clustering

readings
► (MSN) Chapter 8
► (KTN) Log vs. Linear Models, Noise, Heteroskedasticity, and Grouped Data

3 non-linear models office networks
model specification and data

curvature. A simple scatter diagram (on the left below) indicates the presence of "curvature". For the sake of presentation, and to understand the rvfplot command, let's first run the linear regression Emails = b0 + b1·Computers. The results are shown in the table below, while the fitted line is shown on the right below.

. regress Emails Computers
 Emails | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 Computers |
 _cons |

(left figure: scatter diagram showing curvature; right figure: fitted linear regression)

► Despite the obvious curvature, the results of the linear regression do not reflect the misfit with respect to curvature.

4 non-linear models office networks
curvature. It is fairly easy to see that fitting a straight line definitely "misses" the curvature. This is why, if possible, a first visual "check" is required, and, even better, try rvfplot. We consider three points (A), (B) and (C), shown on the left, for which the predicted values (according to the linear regression) are shown. For each of these points we measure the distance between the true y and the predicted value, i.e. the residual y − ŷ.

► rvfplot plots the residual y − ŷ against the fitted value ŷ for each observation; thus the vertical axis from the left figure becomes the horizontal axis on the right figure, and the vertical axis on the right simply measures the distance from the true y to the predicted value. The more curvature in the rvfplot, the more the considered regression misses the curvature in the original data.

(left figure: scatter with fitted line and points A, B, C; right figure: rvfplot of residuals vs. fitted values)
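The mechanics behind rvfplot can be sketched in Python. The data below are hypothetical (a convex, upward-curving series, not the slide's actual Emails data), and the OLS fit is hand-rolled: when the data curve upward, the residuals against the fitted values trace a "U" of their own, with positive residuals at both ends and negative residuals in the middle.

```python
import math

# Hypothetical convex (U-shaped) data, mimicking the Emails-vs-Computers pattern.
xs = list(range(11))                      # 0, 1, ..., 10
ys = [math.exp(0.3 * x) for x in xs]      # grows faster than linearly

# Ordinary least squares fit of a straight line y = b0 + b1*x.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

# What rvfplot displays: the residual (y - y_hat) against the fitted value y_hat.
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
```

The residual pattern, not the fitted line itself, is what exposes the curvature: a straight-line fit to convex data sits above the data in the middle and below it at both ends.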

5 ln(Emails) = b0 + b1·Computers
curvature. Since the curvature is "U" shaped we try a log-linear specification (for an inverted "U" shape, i.e. "∩", we would have to try a linear-log specification). The regression is ln(Emails) = b0 + b1·Computers.

Remark. Every time we change the specification we have to make sure to transform the variables first. In this case only the Emails variable is transformed, thus we first generate the logarithm of Emails and then run the regression of this on Computers.

. generate lnEmails = ln(Emails)
. regress lnEmails Computers
 Source | SS df MS    Number of obs = 24
                      F(1, 22)
 Model |              Prob > F
 Residual |           R-squared
                      Adj R-squared
 Total |              Root MSE

 lnEmails | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 Computers | .1188471
 _cons | 2.712677
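The "transform first, then regress" step can be sketched in Python with a minimal hand-rolled OLS. The data below are hypothetical and noiseless, generated from the coefficients the slide reports (2.712677 and 0.1188471), so the log-linear regression recovers them exactly:

```python
import math

# Coefficients taken from the slide's estimated model.
b0_true, b1_true = 2.712677, 0.1188471

# Hypothetical noiseless data: 24 offices, matching n = 24 implied by F(1, 22).
computers = list(range(1, 25))
emails = [math.exp(b0_true + b1_true * c) for c in computers]

# Step 1: transform the dependent variable (the "generate lnEmails" step).
ln_emails = [math.log(e) for e in emails]

# Step 2: OLS of ln_emails on computers (the "regress lnEmails Computers" step).
n = len(computers)
mx, my = sum(computers) / n, sum(ln_emails) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(computers, ln_emails))
      / sum((x - mx) ** 2 for x in computers))
b0 = my - b1 * mx
```

With real (noisy) data the recovered coefficients would only approximate the true ones; the point here is the order of operations: transform the variable, then regress.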

6 ln(Emails) = 2.712677 + 0.1188471·Computers
curvature. The estimated regression is ln(Emails) = 2.712677 + 0.1188471·Computers.

Remark. To generate the predicted values for the new regression, simply generate lnEmails_hat as
 generate lnEmails_hat = _b[_cons] + _b[Computers]*Computers
and then revert to Emails_hat as
 generate Emails_hat = exp(lnEmails_hat)

(left figure: the log specification fits the actual data better, showing fitted values according to the log specification; right figure: rvfplot with no curvature in the plot)

7 ln(Emails) = 2.712677 + 0.1188471·Computers
curvature. The estimated regression is ln(Emails) = 2.712677 + 0.1188471·Computers.

Remark. In the previous graph we plotted the true number of Emails and the predicted number of Emails; in other words, we first predicted ln(Emails) and then transformed this into Emails. This is repeated in the diagram on the left. However, we can also "live" in a "logarithmic world": why not compare the logarithm of the true Emails, i.e. lnEmails, with the predicted logarithm, i.e. lnEmails_hat? This is shown in the right diagram. Of course the "fit" should be similarly "good" in both cases.

(left figure, units of measurement: Emails; right figure, units of measurement: ln(Emails); both show fitted values according to the log specification)

8 ln(Emails) = 2.712677 + 0.1188471·Computers
i. estimation. The estimated regression is ln(Emails) = 2.712677 + 0.1188471·Computers.

► For Computers = 20 we first get the estimate of the logarithm of Emails as
 ln(Emails)|Computers = 20 = 2.712677 + 0.1188471·20 = 5.0896
then we "exponentiate" back to get the estimated number of Emails as
 Emails|Computers = 20 = exp(ln(Emails)|Computers = 20) = exp(5.0896) = 162.32
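The two-step calculation (predict in log units, then exponentiate) can be replicated directly, using the coefficients the slide reports:

```python
import math

# Fitted model from the slide: ln(Emails) = 2.712677 + 0.1188471 * Computers
b0, b1 = 2.712677, 0.1188471

ln_hat = b0 + b1 * 20           # step 1: predict in logarithmic units -> 5.0896
emails_hat = math.exp(ln_hat)   # step 2: exponentiate back -> about 162.3 Emails
```

Note the order matters: exponentiating the prediction is not the same as predicting on the raw scale with a linear model.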

9 ln(Emails) = 2.712677 + 0.1188471·Computers
ii. prediction interval. The estimated regression is ln(Emails) = 2.712677 + 0.1188471·Computers.

► Since we are asked about an interval for the estimate of a single office with Computers = 20, we use the kpredint command:

. kpredint _b[_cons]+_b[Computers]*20
 Estimate: 5.089
 Standard Error of Individual Prediction:
 Individual Prediction Interval (95%): [4.626, 5.552]
 t-ratio:
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

Remark. The above interval [4.626, 5.552] is for the estimated ln(Emails) = 5.089.

► To obtain the interval for Emails we need to exponentiate the lower and upper bounds of the interval (no need for correction here since we are dealing with one observation):
 lower bound(Emails) = exp(4.626) = 102.16; upper bound(Emails) = exp(5.552) = 257.90
► The prediction interval is thus [102.16, 257.90] and the estimate is 162.32.

10 non-linear models office networks
ii. prediction interval. Units of measurement translation:

► The interval [4.626, 5.552] is the interval provided by kpredint, and it will always be "centered" around the estimate, 5.089, at a distance of 0.463 on each side.

► The interval [102.16, 257.90] is the "transformed" interval, obtained through the exp(·) function in order to translate the initial interval from "logarithmic units" into the original units. Since the exponential function is non-linear, the transformation of the initial interval is not proportional, i.e. equal distances in the initial "logarithmic units" do not translate into equal distances in the original units.

Remark. The interval for the logarithm, i.e. [4.626, 5.552], is centered around the estimated logarithm, i.e. 5.089. But notice that the interval for Emails, i.e. [102.16, 257.90], is actually not centered on the estimate for Emails, i.e. 162.32: the lower bound sits 60.158 below the estimate while the upper bound sits 95.577 above it.
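The asymmetry is easy to verify numerically from the interval the slide reports: the log-scale interval is symmetric around the estimate, but after exponentiating, the distance above the estimate is larger than the distance below it.

```python
import math

# Interval in logarithmic units, as reported by kpredint on the slide.
lo_ln, est_ln, hi_ln = 4.626, 5.089, 5.552   # symmetric: 0.463 on each side

# Exponentiate to translate into Emails units.
lo, est, hi = (math.exp(v) for v in (lo_ln, est_ln, hi_ln))

# The log-scale interval is symmetric, but the Emails-scale interval is not.
below = est - lo   # distance from estimate down to the lower bound, about 60
above = hi - est   # distance from estimate up to the upper bound, about 96
```

Because exp(·) is convex, equal steps in log units always produce a larger step on the upside than on the downside, which is exactly the pattern on the slide.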

11 non-linear models office networks
iii. benchmarking the estimate. We are asked to "estimate the probability that the daily internal Emails at a particular office with 20 computers will be under 200". There are two important issues here:
 Understand what the probability is about, i.e. what exactly are you asked to calculate?
 Make sure the units of measurement are used consistently.

As for the first issue: the regression only gives us an estimate of the daily internal Emails at an office with 20 computers (and this estimate is about 162), but we do not know the true value of this number. Thus the true number of Emails, call it trueY, remains a random variable for us, and the probability refers exactly to this trueY, i.e. for a given benchmark yb (in our case yb = 200) we have to calculate
 Pr[trueY < yb]
This looks like a daunting task… unless we remember that we know the following fact:
 (trueY − ŷ)/se(ŷ) ~ T(n − k − 1)
where ŷ is the sample estimate of trueY, se(ŷ) is the standard error of the individual prediction, and T(n − k − 1) is a T-distributed random variable with n − k − 1 degrees of freedom (k is the number of variables used to obtain the estimate). For our situation the estimate is obtained as a logarithm in a regression with one variable, thus k = 1.

12 non-linear models office networks
iii. benchmarking the estimate. Changing to logarithms (there is no need for a correction here, since we are dealing with a single observation):
 Pr[trueY < yb] = Pr[ln(trueY) < ln(yb)]
where the fact above now reads (ln(trueY) − est)/se ~ T(n − k − 1), with est the estimated ln(Emails) and se its standard error of individual prediction.
► But how do we really use this result? The above implies that for any number t (this is our choice):
 Pr[(ln(trueY) − est)/se < t] = Pr[T(n − k − 1) < t]
► With a bit of work (algebra):
 Pr[ln(trueY) < est + t·se] = Pr[T(n − k − 1) < t]
► We are asked to evaluate Pr[ln(trueY) < ln(yb)], so choose t such that est + t·se = ln(yb). Call this particular t that solves the equation tb; then
 Pr[ln(trueY) < ln(yb)] = Pr[T(n − k − 1) < tb]
► Can we find tb? It must satisfy est + tb·se = ln(yb), thus
 tb = (ln(yb) − est)/se

13 non-linear models office networks
iii. benchmarking the estimate. Our conclusion is that the desired probability is:
 Pr[trueY < yb] = Pr[T(n − k − 1) < tb], with tb = (ln(yb) − est)/se
► The final step is to calculate tb. Going back to the output from kpredint:

. kpredint _b[_cons]+_b[Computers]*20
 Estimate: 5.089
 Standard Error of Individual Prediction:
 Individual Prediction Interval (95%): [4.626, 5.552]
 t-ratio:
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

we identify the estimate, 5.089, and the standard error of individual prediction, while ln(yb) = ln(200) = 5.2983, thus tb = (5.2983 − 5.089)/se.
► We find Pr[trueY < 200] ≈ 0.82.

Remark. If you run kpredint _b[_cons] + _b[Computers]*20 - ln(200) you should get for Ha: > the result 0.82.
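The t-ratio computation can be replicated with the numbers the slide reports. The standard error of individual prediction is not legible in the scraped output, so the value below is an assumed reconstruction: the 95% interval half-width 0.463 divided by the 97.5% critical value of T(22), roughly 2.074.

```python
import math

# Benchmarking Pr(Emails < 200) at Computers = 20, in log units.
est_ln = 5.089                  # estimate reported by kpredint
se_ind = 0.463 / 2.074          # ASSUMED: se backed out from the 95% interval

ln_yb = math.log(200)           # benchmark converted to log units: about 5.298
tb = (ln_yb - est_ln) / se_ind  # t-ratio, about 0.94
# The slide reports Pr(T(22) < tb) = 0.82 for this t-ratio.
```

The key consistency check is the units: the benchmark 200 must be converted to ln(200) before being compared with the log-scale estimate.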

14 non-linear models office networks
iv.-v. estimate of average and confidence interval. The logic for this part is very similar to the one we used in answering the previous question. There are a few differences in terms of the correction factor and the standard error of the estimate, given that now we are estimating the average number of Emails for all offices that have 20 computers.

► First, since we are estimating the average number of Emails, we run klincom:

. klincom _b[_cons]+_b[Computers]*20
 lnEmails | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 (1) |
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

► Thus, the (corrected) estimate is:
 est. avg. Emails = exp(estimate)*exp(e(rmse)^2/2)
and the confidence interval:
 lower bound = exp(lower klincom bound)*exp(e(rmse)^2/2)
 upper bound = exp(upper klincom bound)*exp(e(rmse)^2/2)
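The exp(e(rmse)^2/2) correction is the standard lognormal retransformation for a mean: if ln(Y) is normal with mean mu and standard deviation sigma, then E[Y] = exp(mu)·exp(sigma^2/2), so exponentiating alone understates the average. A sketch, with a hypothetical rmse value since the regression's Root MSE is not legible on the slide:

```python
import math

def corrected_mean(ln_estimate, rmse):
    # Lognormal retransformation: E[Y] = exp(mu) * exp(sigma^2 / 2),
    # the exp(e(rmse)^2/2) correction applied after klincom.
    return math.exp(ln_estimate) * math.exp(rmse ** 2 / 2)

naive = math.exp(5.089)                 # no correction: about 162
corrected = corrected_mean(5.089, 0.2)  # rmse = 0.2 is an ASSUMED illustration
```

No such correction was needed for the single-office prediction interval earlier, because there we bounded an individual outcome, not an expected value.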

15 non-linear models office networks
vi. benchmarking the estimate. We saw that, with est and se now taken from the klincom output for the average:
 (ln(trueAvgY) − est)/se ~ T(n − k − 1)
► But how do we really use this result? The above implies that for any number t (this is our choice):
 Pr[(ln(trueAvgY) − est)/se < t] = Pr[T(n − k − 1) < t]
► With a bit of work (algebra):
 Pr[ln(trueAvgY) < est + t·se] = Pr[T(n − k − 1) < t]
► We need to evaluate:
 Pr[trueAvgY < yb] = Pr[ln(trueAvgY) < ln(yb)]

16 non-linear models office networks
vi. benchmarking the estimate. Use the last two equalities: choose tb such that est + tb·se = ln(yb), i.e. tb = (ln(yb) − est)/se, where est and se come from the klincom output.
► Finally: Pr[trueAvgY < 200] ≈ 0.99.

Remark. Notice the difference: the company is 82% sure that the number of Emails will be less than 200 for one office with 20 computers, but it is 99% sure that the average number of Emails will be less than 200 across all offices with 20 computers.

17 non-linear models super staffing
Part I. i. linear regression estimate. We estimate the regression (results below): supers = b0 + b1·workers

. regress supers workers
 supers | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 workers |
 _cons |

► The coefficient on workers means that increasing employment by one worker requires 0.1 more supers; it can be restated as: for every 10 extra workers there is a need for one extra super.

Part I. ii. prediction. Running kpredint for workers = 1200 gives the output:

. kpredint _b[_cons]+_b[workers]*1200
 Estimate:
 Standard Error of Individual Prediction:
 Individual Prediction Interval (95%): [94, 188]
 t-ratio:
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

► The new factory requires about 140 supers, with a 95% prediction interval:
 lower bound = 94
 upper bound = 188
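The linear prediction and the coefficient interpretation can be sketched numerically. The slope 0.1 is from the slide; the intercept below is hypothetical, chosen only so the point prediction matches the slide's "about 140 supers" at workers = 1200:

```python
# Linear model: supers = b0 + b1 * workers. Slope per the slide; the
# intercept is an ASSUMED value for illustration, not the regression output.
b0_assumed, b1 = 20.0, 0.1

supers_hat = b0_assumed + b1 * 1200   # point prediction: 140 supers

# Interpreting b1 in a linear model: an absolute change in y per unit of x,
# so 10 extra workers require one extra super.
extra_supers = b1 * 10
```

Contrast this with the log models in Parts II and III, where the same coefficient has a percentage interpretation instead of an absolute one.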

18 ln(supers) = b0 + b1·workers
Part II. i. log-linear regression estimate. We estimate the regression (results below): ln(supers) = b0 + b1·workers

. regress lnsupers workers
 lnsupers | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 workers |
 _cons |

► The coefficient on workers: increasing employment by one worker requires 0.1 percent more supers.

Part II. ii. prediction. Running kpredint for workers = 1200 gives the output:

. kpredint _b[_cons]+_b[workers]*1200
 Estimate: 4.96
 Standard Error of Individual Prediction:
 Individual Prediction Interval (95%): [4.41, 5.50]
 t-ratio:
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

► The new factory requires about exp(4.96) = 142 supers, with a 95% prediction interval:
 lower bound = exp(4.41) = 82
 upper bound = exp(5.50) = 245
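Exponentiating the log-linear prediction and its interval bounds, using the values the slide reports, reproduces the numbers quoted above:

```python
import math

# kpredint output from the slide, in log units: estimate 4.96, 95% interval [4.41, 5.50].
est = math.exp(4.96)   # about 142 supers
lo  = math.exp(4.41)   # about 82
hi  = math.exp(5.50)   # about 245
```

As with the Emails model, the exponentiated interval is not symmetric around the exponentiated estimate.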

19 ln(supers) = b0 + b1·ln(workers)
Part III. i. log-log regression estimate. We estimate the regression (results below): ln(supers) = b0 + b1·ln(workers)

. regress lnsupers lnworkers
 lnsupers | Coef. Std. Err. t P>|t| [95% Conf. Interval]
 lnworkers |
 _cons |

► The coefficient on lnworkers: increasing employment by one percent requires 0.91 percent more supers; it can be restated as: for every extra 10 percent increase in the number of workers, the number of supers should increase by about 9 percent.

Part III. ii. prediction. Running kpredint for ln(workers) = ln(1200) = 7.09 gives the output:

. kpredint _b[_cons]+_b[lnworkers]*7.09
 Estimate: 4.96
 Standard Error of Individual Prediction:
 Individual Prediction Interval (95%): [4.57, 5.35]
 t-ratio:
 If Ha: < then Pr(T < t) = 1
 If Ha: not = then Pr(|T| > |t|) = 0
 If Ha: > then Pr(T > t) = 0

► The new factory requires about exp(4.96) = 142 supers, with a 95% prediction interval:
 lower bound = exp(4.57) = 96
 upper bound = exp(5.35) = 211
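In the log-log model the slope is an elasticity, and the regressor itself must be entered in log units before predicting. Both points can be checked with the values the slide reports:

```python
import math

# Elasticity interpretation of the log-log slope reported on the slide.
b1 = 0.91
pct_supers = b1 * 10          # a 10% rise in workers -> about a 9.1% rise in supers

# Prediction at workers = 1200: the model takes ln(workers), not workers.
ln_workers = math.log(1200)   # about 7.09, as used in the kpredint call

# Exponentiate the reported log-scale interval back to supers.
est = math.exp(4.96)          # about 142 supers
lo  = math.exp(4.57)          # about 96
hi  = math.exp(5.35)          # about 211
```

Note the log-log interval [96, 211] is tighter than the log-linear one [82, 245], even though both center on roughly 142 supers.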

