DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES

Presentation transcript:

This sequence explains how you can include qualitative explanatory variables in your regression model.

[Figure: COST plotted against N, with cost functions for regular schools (intercept $\beta_1$) and occupational schools (intercept $\beta_1'$).]

Suppose that you have data on the annual recurrent expenditure, COST, and the number of students enrolled, N, for a sample of secondary schools, of which there are two types: regular and occupational.

The occupational schools aim to provide skills for specific occupations, and they tend to be relatively expensive to run because they need to maintain specialized workshops.

One way of dealing with the difference in the costs would be to run separate regressions for the two types of school.

However, this would have the drawback that you would be running regressions with two small samples instead of one large one, with an adverse effect on the precision of the estimates of the coefficients.

Another way of handling the difference would be to hypothesize that the cost function for occupational schools has an intercept $\beta_1'$ that is greater than that for regular schools.

Regular school: $\mathit{COST} = \beta_1 + \beta_2 N + u$
Occupational school: $\mathit{COST} = \beta_1' + \beta_2 N + u$

Effectively, we are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same. The assumption of equal marginal cost is not very plausible, and we will relax it in due course.

Let us define $\delta$ to be the difference in the intercepts: $\delta = \beta_1' - \beta_1$.

Then $\beta_1' = \beta_1 + \delta$, and we can rewrite the cost function for occupational schools as $\mathit{COST} = \beta_1 + \delta + \beta_2 N + u$.

We can now combine the two cost functions by defining a dummy variable OCC that has value 0 for regular schools and 1 for occupational schools.

Regular school (OCC = 0): $\mathit{COST} = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1): $\mathit{COST} = \beta_1 + \delta + \beta_2 N + u$
Combined equation: $\mathit{COST} = \beta_1 + \delta\,\mathit{OCC} + \beta_2 N + u$

Dummy variables always have two values, 0 or 1. If OCC is equal to 0, the cost function becomes that for regular schools. If OCC is equal to 1, the cost function becomes that for occupational schools.
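As a rough illustration of how the combined equation collapses into the two separate cost functions, the sketch below evaluates its deterministic part for OCC = 0 and OCC = 1. The function name `cost` and all parameter values are hypothetical, chosen only to make the arithmetic visible; they are not estimates from the slides.

```python
def cost(n_students, occ, beta1, beta2, delta):
    """Deterministic part of COST = beta1 + delta*OCC + beta2*N (no disturbance term)."""
    if occ not in (0, 1):
        raise ValueError("OCC is a dummy variable and must be 0 or 1")
    return beta1 + delta * occ + beta2 * n_students


# Hypothetical parameter values, for illustration only.
beta1, beta2, delta = 10_000.0, 300.0, 50_000.0

regular = cost(100, 0, beta1, beta2, delta)       # intercept is beta1
occupational = cost(100, 1, beta1, beta2, delta)  # intercept is beta1 + delta
gap = occupational - regular                      # equals delta for any N
```

Whatever value of N is chosen, the gap between the two school types is the same, which is exactly the equal-marginal-cost assumption built into this specification.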

We will now fit a function of this type using actual data for a sample of 74 secondary schools in Shanghai.

[Figure: scatter diagram of COST against N, distinguishing occupational and regular schools.]

The table shows the data for the first 10 schools in the sample. The annual cost is measured in yuan, one yuan being worth about 20 cents U.S. at the time. N is the number of students in the school.

[Table columns: School, Type, COST, N, OCC. School types in order: 1 Occupational, 2 Occupational, 3 Regular, 4 Occupational, 5 Regular, 6 Regular, 7 Regular, 8 Occupational, 9 Occupational, 10 Occupational. Most COST, N, and OCC entries were truncated in the transcript.]

OCC is the dummy variable for the type of school. School 10, for example, is an occupational school with COST = 61,000, N = 99, and OCC = 1.
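The construction of the dummy variable from the school type can be sketched as follows. The list of types is the one given for the first 10 schools in the table; the variable names are illustrative assumptions, not part of the original slides.

```python
# School types for the first 10 schools, in the order listed in the table.
school_types = [
    "Occupational", "Occupational", "Regular", "Occupational", "Regular",
    "Regular", "Regular", "Occupational", "Occupational", "Occupational",
]

# OCC = 1 for occupational schools, 0 for regular schools.
occ = [1 if school_type == "Occupational" else 0 for school_type in school_types]
```

Regular schools are the reference category here, so the intercept of the regression refers to them and the coefficient of OCC measures the difference for occupational schools.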

We now run the regression of COST on N and OCC, treating OCC just like any other explanatory variable, despite its artificial nature. The Stata output is shown.

. reg COST N OCC

[Stata output: regression with 74 observations (F statistic with 2 and 71 degrees of freedom) and coefficient rows for N, OCC, and _cons. The numerical entries were not preserved in the transcript.]
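For readers without Stata, the same kind of regression can be sketched in Python with NumPy. The data below are synthetic: the sample size of 74 matches the text, but the enrolment range, the "true" parameter values, and the disturbance variance are assumptions made purely for illustration, so the estimates will not match the Shanghai results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools = 74

# Synthetic data (assumed values, not the Shanghai sample).
n = rng.integers(50, 1500, size=n_schools).astype(float)  # enrolment N
occ = (rng.random(n_schools) < 0.5).astype(float)          # dummy variable OCC
true_beta1, true_delta, true_beta2 = 20_000.0, 100_000.0, 300.0
u = rng.normal(0.0, 30_000.0, size=n_schools)               # disturbance term
cost = true_beta1 + true_delta * occ + true_beta2 * n + u

# OLS: the dummy enters the design matrix like any other regressor.
X = np.column_stack([np.ones(n_schools), n, occ])           # _cons, N, OCC
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)
b1, b2, d = coef

# Conventional standard errors and t statistics.
resid = cost - X @ coef
dof = n_schools - X.shape[1]                                # 74 - 3 = 71
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se
```

The degrees of freedom, 71, agree with the F(2, 71) reported in the Stata header.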

We will begin by interpreting the regression coefficients.

The regression results have been rewritten in equation form. From it we can derive cost functions for the two types of school by setting OCC equal to 0 or 1.

$\widehat{\mathit{COST}} = -34{,}000 + 133{,}000\,\mathit{OCC} + 331N$

If OCC is equal to 0, we get the equation for regular schools, $\widehat{\mathit{COST}} = -34{,}000 + 331N$. It implies that the marginal cost per student per year is 331 yuan and that the annual overhead cost is –34,000 yuan.

Obviously, a negative intercept does not make any sense at all, and it suggests that the model is misspecified in some way. We will come back to this later.

The coefficient of the dummy variable is an estimate of $\delta$, the extra annual overhead cost of an occupational school.

Putting OCC equal to 1, we estimate the annual overhead cost of an occupational school to be 99,000 yuan: $\widehat{\mathit{COST}} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$. The marginal cost is the same as for regular schools. It must be, given the model specification.
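The arithmetic behind the two fitted cost functions can be checked with a short sketch. The intercept (–34,000) and slope (331) are the values quoted in the text; the dummy coefficient of 133,000 is taken as the difference between the two quoted overhead costs, 99,000 and –34,000. The function name `fitted_cost` is an illustrative assumption.

```python
# Fitted coefficients, in yuan, as quoted in the text.
B1_HAT = -34_000.0     # intercept (regular schools)
DELTA_HAT = 133_000.0  # coefficient of OCC: 99,000 - (-34,000)
B2_HAT = 331.0         # marginal cost per student


def fitted_cost(n_students, occ):
    """Fitted annual cost for a school with n_students and dummy value occ."""
    return B1_HAT + DELTA_HAT * occ + B2_HAT * n_students


regular_overhead = fitted_cost(0, 0)        # intercept with OCC = 0
occupational_overhead = fitted_cost(0, 1)   # intercept with OCC = 1
marginal_regular = fitted_cost(1, 0) - fitted_cost(0, 0)
marginal_occupational = fitted_cost(1, 1) - fitted_cost(0, 1)
```

Both marginal costs come out at 331 yuan because N enters the equation with a single coefficient that does not depend on OCC.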

The scatter diagram shows the data and the two cost functions derived from the regression results.

[Figure: scatter diagram of COST against N with the fitted cost functions for occupational and regular schools.]

In addition to the estimates of the coefficients, the regression results will include standard errors and the usual diagnostic statistics.

We will perform a t test on the coefficient of the dummy variable. Our null hypothesis is $H_0: \delta = 0$ and our alternative hypothesis is $H_1: \delta \neq 0$.

In words, our null hypothesis is that there is no difference in the overhead costs of the two types of school. The t statistic is 6.40, so the null hypothesis is rejected at the 0.1% significance level.

We can perform t tests on the other coefficients in the usual way. The t statistic for the coefficient of N is 8.34, so we conclude that the marginal cost is (very) significantly different from zero.

In the case of the intercept, the t statistic is –1.43, so we do not reject the null hypothesis $H_0: \beta_1 = 0$.
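To see why t statistics of 6.40 and 8.34 lead to rejection while –1.43 does not, one can approximate the two-sided p-values for a t distribution with 71 degrees of freedom. The sketch below uses only the standard library and integrates the t density numerically with Simpson's rule; the helper names, the integration width, and the step count are assumptions for illustration, and a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, width=50.0, steps=20_000):
    """Approximate P(|T| >= |t_stat|) by Simpson's rule on the upper tail.

    The tail is truncated at |t_stat| + width, which is harmless for the
    moderately large df used here. steps must be even.
    """
    a = abs(t_stat)
    h = width / steps
    total = t_pdf(a, df) + t_pdf(a + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


DF = 71  # 74 observations minus 3 estimated parameters
p_occ = two_sided_p(6.40, DF)         # coefficient of OCC
p_n = two_sided_p(8.34, DF)           # coefficient of N
p_intercept = two_sided_p(-1.43, DF)  # intercept
```

The first two p-values fall well below 0.001, consistent with rejection at the 0.1% level, while the p-value for the intercept is above 0.05.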

Thus one explanation of the nonsensical negative overhead cost of regular schools might be that they do not actually have any overheads and our estimate is a random number.

A more realistic version of this hypothesis is that $\beta_1$ is positive but small (as you can see, the 95 percent confidence interval includes positive values) and the error term is responsible for the negative estimate.

As already noted, a further possibility is that the model is misspecified in some way. We will continue to develop the model in the next sequence.

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 5.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.