Chapter 8: DUMMY VARIABLE (D.V.) REGRESSION MODELS Econometrics Econ. 405
I. The Nature of Dummy Variables
In regression analysis the dependent variable is frequently influenced by variables that are essentially qualitative in nature, such as sex, race, color, religion, nationality, geographical region, etc. One way to “quantify” such attributes is to construct artificial variables that take the value 1 to indicate the presence of the attribute and 0 to indicate its absence.
Variables that assume such 0 and 1 values are called dummy variables. Example: (A) “1” may indicate that a person is a female and 0 may designate a male; (B) “1” may indicate that a person is a college graduate, and 0 that the person is not, and so on.
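As a minimal sketch of the 0/1 coding described above, the snippet below builds two dummy variables from a few records. The records, field names, and values are hypothetical and serve only as an illustration.

```python
# Hypothetical records; the field names and values are made up for illustration.
people = [
    {"name": "A", "sex": "female", "college_graduate": True},
    {"name": "B", "sex": "male", "college_graduate": False},
    {"name": "C", "sex": "female", "college_graduate": False},
]

# 1 marks the presence of the attribute, 0 its absence.
female = [1 if p["sex"] == "female" else 0 for p in people]
graduate = [int(p["college_graduate"]) for p in people]

print(female)
print(graduate)
```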
II. Estimating Models with Dummy Variables
Yi=β1+β2X2i+β3Di+ui
Dummy variable (D): =1 if the person is female, =0 if the person is male.
β3 = the wage gain/loss if the person is a woman rather than a man (holding other things fixed).
Note: The coefficients attached to the dummy variables are known as the differential intercept coefficients.
Now we have two cases:
Di=0: Yi=β1+β2X2i+β3(0)+ui, so Yi=β1+β2X2i+ui
Di=1: Yi=β1+β2X2i+β3(1)+ui, so Yi=(β1+β3)+β2X2i+ui
The two groups share the slope β2 but have different intercepts: β1 when Di=0 and β1+β3 when Di=1.
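The two cases can be checked numerically. The sketch below is illustrative only: the data are made up and generated without an error term, so ordinary least squares recovers the chosen coefficients (β1=1.0, β2=0.5, β3=-1.8, all hypothetical). The `ols` helper is a small normal-equations solver written for this example, not a library routine.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Made-up data with no error term (assumption for illustration).
educ = [3, 4, 5, 6, 7, 8, 4, 6]
D = [0, 1, 0, 1, 0, 1, 1, 0]          # 1 = female, 0 = male
wage = [1.0 + 0.5 * x - 1.8 * d for x, d in zip(educ, D)]

# Regressors: constant, education, dummy.
X = [[1.0, x, d] for x, d in zip(educ, D)]
b1, b2, b3 = ols(X, wage)

intercept_male = b1         # case Di = 0
intercept_female = b1 + b3  # case Di = 1
print(intercept_male, intercept_female, b2)
```

Both groups get the same slope `b2`; only the intercept differs, by the differential intercept coefficient `b3`.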
Numerical Illustration:
Wage (in KD)    Education (Year)    D
5000            7                   1
2000            5
3600            6
5500            8
1000            3
1500            4
...             ...                 ...
So on
Graphical Illustration:
Holding education and other variables (if any) fixed, women earn $1.81 less per hour than men.
II. Caution in the Use of Dummy Variables
When dealing with dummy variables in the regression function, you should be aware of some important aspects. There are three forms of the model that are used to explain multiple regression analysis with qualitative information.
Dummy variable trap
1- When including a separate dummy variable for every category (for example, one for men and one for women) together with an intercept: This model cannot be estimated (perfect collinearity).
2- Alternatively, one could omit the intercept. Disadvantages:
1) More difficult to test for differences between the parameters
2) R-squared formula only valid if regression contains intercept
3- When using dummy variables in a model with an intercept, one category always has to be omitted: either the base category is men, or the base category is women.
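The perfect collinearity behind the dummy variable trap can be made concrete by computing the rank of the regressor matrix. The sketch below is illustrative: the female/male coding is hypothetical, and `rank` is a helper written for this example that uses exact fractions to avoid rounding issues.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    rows = [[Fraction(v) for v in r] for r in M]
    rk = 0
    for c in range(len(rows[0])):
        # Find a row at or below position rk with a nonzero entry in column c.
        p = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][c] / rows[rk][c]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

# Hypothetical sample: the dummy coding is made up for illustration.
female = [1, 0, 1, 0, 0, 1]
male = [1 - f for f in female]
educ = [7, 5, 6, 8, 3, 4]

# Trap: intercept + male dummy + female dummy (+ education).
X_trap = [[1, m, f, e] for m, f, e in zip(male, female, educ)]
# Fix: omit one category (here men are the base category).
X_ok = [[1, f, e] for f, e in zip(female, educ)]

rank_trap = rank(X_trap)  # fewer than 4 columns' worth of information
rank_ok = rank(X_ok)      # equals the number of columns
print(rank_trap, len(X_trap[0]))
print(rank_ok, len(X_ok[0]))
```

Because the intercept column equals male + female in every row, the four-column matrix cannot have full column rank, so the coefficients are not separately identified.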
III. Interaction Variables
We can use dummy variables as standalone independent variables, but we can also interact (multiply) them with quantitative variables. Interacting dummy variables with quantitative variables provides flexibility to detect both overall differences between groups and differences that vary with the value of the quantitative variables.
The product of the dummy variable (D) with the independent variable (X) results in a new term called the interaction term:
Yi=β0+………+βi DiXi + ………+ ui
The inclusion of an interaction term in your econometric model allows the regression function to have a different intercept and slope for each group identified by the dummy variables (used in the interaction term). The coefficient for your dummy variable in the regression shifts the intercept, while the coefficient of your interaction term changes the slope.
Consider the same case but now with the dummy affecting the slope:
Yi=β1+β2X2i+β3DiX2i+ui
Now we have two cases:
Di=0: Yi=β1+β2X2i+β3(0)X2i+ui, so Yi=β1+β2X2i+ui
Di=1: Yi=β1+β2X2i+β3(1)X2i+ui, so Yi=β1+(β2+β3)X2i+ui
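A small numerical sketch of the slope-dummy case follows. The data are made up and generated without an error term, so least squares recovers the chosen values (β1=1.0, β2=0.5, β3=0.2, all hypothetical). The `ols` helper is written for this example and is not a library routine.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Made-up data with no error term (assumption for illustration).
x2 = [3, 4, 5, 6, 7, 8, 4, 6]
D = [0, 1, 0, 1, 0, 1, 1, 0]
y = [1.0 + 0.5 * x + 0.2 * d * x for x, d in zip(x2, D)]

# Regressors: constant, X2, and the interaction term D*X2.
X = [[1.0, x, d * x] for x, d in zip(x2, D)]
b1, b2, b3 = ols(X, y)

slope_d0 = b2        # case Di = 0
slope_d1 = b2 + b3   # case Di = 1
print(b1, slope_d0, slope_d1)
```

Here both groups share the intercept `b1`, and the interaction coefficient `b3` is the difference between the two slopes.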
IV. Testing of Significance
When using dummy variables in the regression, you have to take into account the collective significance of those variables. Their effect can be collectively significant even if they are individually insignificant.
Example: Assume that the determination of the college grade point average (GPA) is reflected by a regression function estimated in two forms:
Unrestricted model (contains full set of interactions)
Restricted model (same regression for both groups)
Variables:
College grade point average (dependent variable)
Standardized aptitude test score
High school rank percentile
Total hours spent in college courses
Estimation of the unrestricted model:
Null hypothesis: All interaction effects are zero, i.e. the same regression coefficients apply to men and women.
Tested individually, the hypothesis that the interaction effects are zero cannot be rejected.
Joint test with F-statistic: the null hypothesis is rejected.
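For reference, the joint test compares the fit of the restricted and unrestricted models. The standard form of the F-statistic is sketched below; the symbols are assumptions introduced here and do not come from the slides: SSR_r and SSR_ur are the sums of squared residuals of the restricted and unrestricted models, q is the number of restrictions under the null hypothesis, n is the sample size, and k is the number of regressors in the unrestricted model.

```latex
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}
```

A large value of F means that imposing the restrictions worsens the fit considerably, which is evidence against the null hypothesis even when each restriction looks insignificant on its own.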
The Chow Test for Structural Stability
Alternative way to compute the F-statistic (in the same previous example):
Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions.
Run the regression for the restricted model and store its SSR.
If the test is computed in this way, it is called the Chow test.
Important: The test assumes a constant error variance across groups.
Step 3: Calculate the F-statistic.
Step 4: If the calculated F-statistic is bigger than the critical value F(k,n-2k-2), then reject the null hypothesis that the parameters are stable for the whole data set.
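The computation can be sketched in a few lines. Everything below is an illustration under stated assumptions: the data are made up, there is a single regressor (k=1), and the hypothesis tested is that the intercept and the slope are both the same in the two groups, so the numerator uses q = k + 1 restrictions. If your hypothesis corresponds to a different number of restrictions, adjust `q` accordingly. The helper `fit_ssr` is written for this example.

```python
def fit_ssr(x, y):
    """Fit y = a + b*x by OLS and return the sum of squared residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Made-up data for two groups (assumption for illustration).
x_men = [3, 4, 5, 6, 7, 8]
y_men = [10, 12, 13, 16, 17, 20]
x_women = [3, 4, 5, 6, 7, 8]
y_women = [8, 9, 9, 11, 11, 13]

k = 1                               # number of slope regressors
n = len(x_men) + len(x_women)       # total sample size

# Unrestricted SSR: sum of the SSRs of the two separate regressions.
ssr_ur = fit_ssr(x_men, y_men) + fit_ssr(x_women, y_women)
# Restricted SSR: one pooled regression for both groups.
ssr_r = fit_ssr(x_men + x_women, y_men + y_women)

q = k + 1                           # restrictions: intercept and slope
df_denom = n - 2 * (k + 1)          # equals n - 2k - 2
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)
print(F, q, df_denom)
```

The resulting `F` is then compared with the tabulated critical value for the chosen significance level and the degrees of freedom `q` and `df_denom`.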