Published by Owen Snow. Modified over 9 years ago.
Christopher Dougherty
EC220 - Introduction to econometrics (chapter 5)
Slideshow: dummy variable classification with two categories

Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 5). [Teaching Resource] © 2012 The Author

This version available at: http://learningresources.lse.ac.uk/131/
Available in LSE Learning Resources Online: May 2012

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/

http://learningresources.lse.ac.uk/
DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES

This sequence explains how you can include qualitative explanatory variables in your regression model.
Suppose that you have data on the annual recurrent expenditure, COST, and the number of students enrolled, N, for a sample of secondary schools, of which there are two types: regular and occupational.
The occupational schools aim to provide skills for specific occupations and they tend to be relatively expensive to run because they need to maintain specialized workshops.
One way of dealing with the difference in the costs would be to run separate regressions for the two types of school.
However, this would have the drawback that you would be running regressions with two small samples instead of one large one, with an adverse effect on the precision of the estimates of the coefficients.
Another way of handling the difference would be to hypothesize that the cost function for occupational schools has an intercept $\beta_1'$ that is greater than that for regular schools.

Regular school: $COST = \beta_1 + \beta_2 N + u$
Occupational school: $COST = \beta_1' + \beta_2 N + u$
Effectively, we are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same. The marginal cost assumption is not very plausible and we will relax it in due course.
Let us define $\delta$ to be the difference in the intercepts: $\delta = \beta_1' - \beta_1$.
Then $\beta_1' = \beta_1 + \delta$ and we can rewrite the cost function for occupational schools as shown.

Regular school: $COST = \beta_1 + \beta_2 N + u$
Occupational school: $COST = \beta_1 + \delta + \beta_2 N + u$
We can now combine the two cost functions by defining a dummy variable OCC that has value 0 for regular schools and 1 for occupational schools.

Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$
OCC = 0, regular school: $COST = \beta_1 + \beta_2 N + u$
OCC = 1, occupational school: $COST = \beta_1 + \delta + \beta_2 N + u$
Dummy variables always have two values, 0 or 1. If OCC is equal to 0, the cost function becomes that for regular schools. If OCC is equal to 1, the cost function becomes that for occupational schools.
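The way the combined equation collapses into the two separate cost functions can be sketched in Python. This is only an illustration of the systematic part of the equation (the disturbance term u is left out). The function name `cost` and the parameter values are hypothetical, chosen to make the arithmetic easy to follow.

```python
def cost(n, occ, beta1, beta2, delta):
    """Systematic part of the combined cost function: beta1 + delta*OCC + beta2*N."""
    if occ not in (0, 1):
        raise ValueError("OCC is a dummy variable and must be 0 or 1")
    return beta1 + delta * occ + beta2 * n


# Hypothetical parameter values, for illustration only (not estimates).
beta1, beta2, delta = 10.0, 2.0, 5.0

regular = cost(100, 0, beta1, beta2, delta)       # beta1 + beta2*N
occupational = cost(100, 1, beta1, beta2, delta)  # beta1 + delta + beta2*N

# At the same enrolment, the two costs differ by exactly delta.
print(occupational - regular)
```

Because OCC multiplies only the intercept shift, the slope with respect to N is the same whichever value OCC takes.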
We will now fit a function of this type using actual data for a sample of 74 secondary schools in Shanghai.
The table shows the data for the first 10 schools in the sample. The annual cost is measured in yuan, one yuan being worth about 20 cents U.S. at the time. N is the number of students in the school.

School  Type             COST     N    OCC
  1     Occupational   345,000   623    1
  2     Occupational   537,000   653    1
  3     Regular        170,000   400    0
  4     Occupational   526,000   663    1
  5     Regular        100,000   563    0
  6     Regular         28,000   236    0
  7     Regular        160,000   307    0
  8     Occupational    45,000   173    1
  9     Occupational   120,000   146    1
 10     Occupational    61,000    99    1
OCC is the dummy variable for the type of school.
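As a rough sketch of how such a dummy could be built from the school type, the Python snippet below recodes the ten schools listed in the table. The names `schools` and `occ_dummy` are hypothetical and used only for this illustration.

```python
# First 10 schools from the table: (school type, annual cost in yuan, enrolment N).
schools = [
    ("Occupational", 345_000, 623),
    ("Occupational", 537_000, 653),
    ("Regular", 170_000, 400),
    ("Occupational", 526_000, 663),
    ("Regular", 100_000, 563),
    ("Regular", 28_000, 236),
    ("Regular", 160_000, 307),
    ("Occupational", 45_000, 173),
    ("Occupational", 120_000, 146),
    ("Occupational", 61_000, 99),
]


def occ_dummy(school_type):
    """Return 1 for an occupational school and 0 for a regular school."""
    if school_type == "Occupational":
        return 1
    if school_type == "Regular":
        return 0
    raise ValueError(f"unknown school type: {school_type!r}")


# One 0/1 value per school, in the same order as the table.
occ = [occ_dummy(school_type) for school_type, _, _ in schools]
print(occ)
```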
We now run the regression of COST on N and OCC, treating OCC just like any other explanatory variable, despite its artificial nature. The Stata output is shown.

. reg COST N OCC

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   56.86
   Model |  9.0582e+11     2  4.5291e+11               Prob > F      =  0.0000
Residual |  5.6553e+11    71  7.9652e+09               R-squared     =  0.6156
---------+------------------------------               Adj R-squared =  0.6048
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   89248

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   331.4493   39.75844      8.337   0.000       252.1732    410.7254
     OCC |   133259.1   20827.59      6.398   0.000       91730.06    174788.1
   _cons |  -33612.55   23573.47     -1.426   0.158      -80616.71    13391.61
------------------------------------------------------------------------------
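To show what treating OCC like any other explanatory variable means mechanically, here is a minimal pure-Python least-squares sketch. It is not the Stata computation. It uses only the 10 schools listed in the table, so its coefficients will not match the 74-school output above, and the helper names `solve` and `ols` are hypothetical.

```python
# First 10 schools from the table: (COST in yuan, N, OCC).
rows = [
    (345_000, 623, 1), (537_000, 653, 1), (170_000, 400, 0), (526_000, 663, 1),
    (100_000, 563, 0), (28_000, 236, 0), (160_000, 307, 0), (45_000, 173, 1),
    (120_000, 146, 1), (61_000, 99, 1),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y, x_rows):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


y = [float(cost) for cost, _, _ in rows]
# Regressors: a constant, N, and the dummy OCC, which enters as an ordinary column.
X = [[1.0, float(n), float(occ)] for _, n, occ in rows]

b_cons, b_n, b_occ = ols(y, X)
residuals = [yi - (b_cons + b_n * r[1] + b_occ * r[2]) for yi, r in zip(y, X)]
print(b_cons, b_n, b_occ)
```

Nothing in the fitting step distinguishes the OCC column from the N column. Its special role comes only from the interpretation of its coefficient as an intercept shift.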
We will begin by interpreting the regression coefficients.
The regression results have been rewritten in equation form. From this equation we can derive cost functions for the two types of school by setting OCC equal to 0 or 1.

$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$
If OCC is equal to 0, we get the equation for regular schools, as shown. It implies that the marginal cost per student per year is 331 yuan and that the annual overhead cost is -34,000 yuan.

Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$
Obviously, a negative intercept does not make any sense, and it suggests that the model is misspecified in some way. We will come back to this later.
The coefficient of the dummy variable is an estimate of $\delta$, the extra annual overhead cost of an occupational school.
Putting OCC equal to 1, we estimate the annual overhead cost of an occupational school to be 99,000 yuan. The marginal cost is the same as for regular schools. It must be, given the model specification.

Occupational school (OCC = 1): $\widehat{COST} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$
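The derivation of the two fitted cost functions can be checked with a short Python sketch that uses the unrounded coefficients from the Stata output. The function name `fitted_cost` is hypothetical.

```python
# Coefficients as reported in the Stata output.
b_cons = -33612.55   # intercept, which applies to regular schools (OCC = 0)
b_occ = 133259.1     # coefficient of the dummy OCC
b_n = 331.4493       # marginal cost per student


def fitted_cost(n, occ):
    """Fitted annual cost in yuan for a school with n students."""
    return b_cons + b_occ * occ + b_n * n


# Overhead cost: fitted cost at N = 0 for each type of school.
intercept_regular = fitted_cost(0, 0)
intercept_occupational = fitted_cost(0, 1)

# Marginal cost: change in fitted cost for one extra student, for each type.
slope_regular = fitted_cost(1, 0) - fitted_cost(0, 0)
slope_occupational = fitted_cost(1, 1) - fitted_cost(0, 1)

print(round(intercept_regular), round(intercept_occupational))
```

With the unrounded coefficients the occupational intercept is about 99,600 yuan. The figure of 99,000 in the text comes from adding the rounded values -34,000 and 133,000.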
The scatter diagram shows the data and the two cost functions derived from the regression results.
In addition to the estimates of the coefficients, the regression results will include standard errors and the usual diagnostic statistics.
We will perform a t test on the coefficient of the dummy variable. Our null hypothesis is $H_0: \delta = 0$ and our alternative hypothesis is $H_1: \delta \neq 0$.
In words, our null hypothesis is that there is no difference in the overhead costs of the two types of school. The t statistic is 6.40, so it is rejected at the 0.1% significance level.
We can perform t tests on the other coefficients in the usual way. The t statistic for the coefficient of N is 8.34, so we conclude that the marginal cost is (very) significantly different from 0.
In the case of the intercept, the t statistic is -1.43, so we do not reject the null hypothesis $H_0: \beta_1 = 0$.
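The three t statistics quoted above can be reproduced from the reported coefficients and standard errors. The sketch below does only that arithmetic. The dictionary names are hypothetical, and no critical values are computed.

```python
# (coefficient, standard error) pairs as reported in the Stata output.
estimates = {
    "N": (331.4493, 39.75844),
    "OCC": (133259.1, 20827.59),
    "_cons": (-33612.55, 23573.47),
}

# Under H0: coefficient = 0, the t statistic is the estimate divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```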
Thus, one explanation of the nonsensical negative overhead cost of regular schools might be that they do not actually have any overheads and our estimate is a random number.
A more realistic version of this hypothesis is that $\beta_1$ is positive but small (as you can see, the 95 percent confidence interval includes positive values) and the error term is responsible for the negative estimate.
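The remark about the confidence interval can be checked from the reported figures alone. In this sketch the critical value of t is not taken from tables. It is backed out from the reported interval, on the assumption that the interval was built as the estimate plus or minus the critical value times the standard error.

```python
# Intercept estimate, standard error, and 95% interval as reported in the Stata output.
coef, se = -33612.55, 23573.47
lower, upper = -80616.71, 13391.61

# If the interval is coef +/- t_crit * se, the critical value can be backed out.
t_crit = (upper - lower) / (2 * se)

# Rebuild the interval from the backed-out critical value as a consistency check.
rebuilt = (coef - t_crit * se, coef + t_crit * se)

# The interval straddles zero, so positive values of the intercept are not ruled out.
includes_zero = lower < 0 < upper
print(round(t_crit, 3), includes_zero)
```

The implied critical value is close to 2, which is what one would expect for a 95 percent interval with 71 degrees of freedom.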
As already noted, a further possibility is that the model is misspecified in some way. We will continue to develop the model in the next sequence.
Copyright Christopher Dougherty 2011. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 5.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

11.07.25