Research Method Lecture 6 (Ch7): Multiple regression with qualitative variables

Dummy variables
• Often, our data contain qualitative variables, such as gender. These are not quantitative variables: they record categories rather than amounts.

• However, such qualitative variables are also important in analyzing data. For example, you may want to answer the following question: "Is there a gender wage gap?"

• To incorporate such a qualitative variable into the OLS equation, we first convert the qualitative information into a quantitative variable called a "dummy variable".
• If you would like to incorporate gender information in your model, create the following dummy variable:
Female = 1 if the person is female
       = 0 if the person is male
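A minimal Stata sketch of this conversion, assuming the raw data store gender as a string variable called gender. The variable name and the four toy observations are hypothetical and are not taken from any course dataset.

```stata
clear
* Toy data: a hypothetical string variable holding the qualitative information
input str6 gender
"female"
"male"
"female"
"male"
end

* The logical expression evaluates to 1 when true and 0 when false.
* The if-condition leaves the dummy missing when gender itself is missing.
gen female = (gender == "female") if gender != ""

list gender female
```

The same pattern works for any two-category variable: pick the category that should be coded 1 and test for it.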

Incorporating a dummy variable as an independent variable
• Suppose you are interested in the gender wage gap. Then you include the dummy variable for female as
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{experience} + u$$
where wage is the hourly wage rate and experience is in years. Then $\delta_0$ shows the difference in log wage (approximately the proportional wage difference) between a female and a male who have the same experience. To understand this, see the next slide.

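For males (female = 0), the predicted log wage at a given experience is
$$\widehat{\log(\text{wage})} = \hat{\beta}_0 + \hat{\beta}_1\,\text{experience}$$
For females (female = 1), the predicted log wage at a given experience is
$$\widehat{\log(\text{wage})} = \hat{\beta}_0 + \hat{\delta}_0 + \hat{\beta}_1\,\text{experience}$$
Therefore, the gender difference in log wage at a given experience (female minus male) is given by $\hat{\delta}_0$. If females earn less than males, $\hat{\delta}_0$ will be negative.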

• Using a graph, the gender wage gap is described as an intercept shift because:
Intercept for males: $\hat{\beta}_0$
Intercept for females: $\hat{\beta}_0 + \hat{\delta}_0$
• Assuming that females earn a lower wage (that is, $\hat{\delta}_0$ is negative), the predicted wage-experience profiles would look like the ones in the next slide.

[Figure: The estimated wage-experience profiles by gender. Vertical axis: log(wage). Horizontal axis: experience. Two lines, labeled Male and Female.]
Note that $\hat{\delta}_0$ is usually negative, so the wage-experience profile for females lies below the one for males.

The base group
• When you include (female), you do not include (male). Including both together with the intercept would create perfect collinearity, because female + male = 1 for every observation.
• The predicted wage for males is given by setting female = 0.
• Thus the wage gap is estimated relative to males.
• This means that, in our example, males are the base group. This group is also called the benchmark group, the excluded group, or the excluded category.

Example
• Using Wage1.dta, estimate the following model. Is there a gender wage gap? How big is the wage gap?
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{experience} + u$$

Females earn approximately 39% less than males after controlling for experience.
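The 39% figure reads the coefficient on female directly as a proportional change, which is only an approximation in a log(wage) model. As a worked check, assuming the estimate is roughly -0.39 (the value implied by the slide; the regression output itself is not reproduced here), the exact percentage difference would be:

```latex
100\,[\exp(\hat{\delta}_0) - 1] = 100\,[\exp(-0.39) - 1] \approx 100\,(0.677 - 1) = -32.3
```

Under that assumption the exact gap is closer to 32%. The approximation gets worse as the coefficient grows in absolute value.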

Policy analysis using a dummy variable
• The State of Michigan provided a job training grant program for manufacturing companies. Did this grant help firms provide more training to their employees?
• To answer this question, you may estimate the following model:
$$(\text{hours of training per employee}) = \beta_0 + \delta_0\,\text{grant} + \beta_1 \log(\text{sales}) + u$$
where (grant) is a dummy variable taking the value 1 if the firm received the grant, and 0 otherwise.

The grant appears to have a statistically significant effect on employee training.

Using dummy variables for multiple categories
• When you examine the gender gap, there are only two groups: males and females.
• However, in some situations, there are more than two categories. For example, you may want to examine the wage differences among the following four groups:
Married men
Married women
Single men
Single women

• Then the solution is to create dummy variables for all the categories except one. For example, you estimate
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{marriedmen} + \delta_1\,\text{marriedwomen} + \delta_2\,\text{singlewomen} + \beta_1\,\text{education} + \beta_2\,\text{experience} + \beta_3\,\text{experience}^2 + u$$
The excluded group is single men, so the differences in wage among the four groups are estimated relative to single men.

• Exercise: using WAGE1.dta, estimate the model on the previous slide.

Married men earn 24.6% more than single men. Married women earn 21.8% less than single men. Single women earn 12.1% less than single men.

Here is the do file I used to obtain the results.

use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\WAGE1.DTA", clear

* Create dummy for married men
gen marriedmen=0
replace marriedmen=1 if female==0 & married==1

* Create dummy for married women
gen marriedwomen=0
replace marriedwomen=1 if female==1 & married==1

* Create dummy for single women
gen singlewomen=0
replace singlewomen=1 if female==1 & married==0

* Estimate the model (single men are the excluded group)
reg lwage marriedmen marriedwomen singlewomen educ exper expersq

Incorporating ordinal information by using dummy variables
• Some information is ordinal, like credit ratings or law school rankings.
• For concreteness, consider estimating the effect of a municipality's credit rating on its municipal bond interest rate.
• You have a credit rating variable that takes values from 1 to 5. Rating 1 is the worst rating, and 5 is the best rating.

• How do we incorporate this information? One possibility is to estimate
$$(\text{municipal bond interest rate}) = \beta_0 + \beta_1\,(\text{credit rating}) + (\text{other factors})$$
Then $\beta_1$ shows the change in the municipal bond interest rate when the credit rating increases by 1.

• But this assumes that the effect of improving the credit rating from 1 to 2 is the same as the effect of improving the rating from 2 to 3, and so on.
• There is no reason why the improvement from 1 to 2 should have the same effect as the improvement from 2 to 3.
• In this situation, it is better to create a dummy variable for each rating, excluding one category, and then include them in the model.

• That is, create the following 4 dummies:
CR1 = 1 if credit rating = 1, = 0 otherwise
CR2 = 1 if credit rating = 2, = 0 otherwise
CR3 = 1 if credit rating = 3, = 0 otherwise
CR4 = 1 if credit rating = 4, = 0 otherwise
The excluded category is credit rating = 5.

$$(\text{municipal bond interest rate}) = \beta_0 + \beta_1\,\text{CR1} + \beta_2\,\text{CR2} + \beta_3\,\text{CR3} + \beta_4\,\text{CR4} + (\text{other factors})$$
Then $\beta_1$ shows the effect of having credit rating 1 on the bond interest rate relative to credit rating 5. The other coefficients are interpreted in the same way.
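The dummy-per-rating approach can be sketched in Stata as follows. The data are simulated purely for illustration: the variable names rating and rate, the sample size, and every number in the data-generating line are assumptions, not values from a real bond dataset.

```stata
clear
set seed 12345
set obs 500

* Hypothetical ordinal variable: integer rating from 1 (worst) to 5 (best)
gen rating = ceil(5*runiform())

* Hypothetical interest rate whose steps between adjacent ratings are unequal
gen rate = 6 - 1.5*(rating>=2) - 0.3*(rating>=3) - 0.8*(rating>=4) ///
    - 0.1*(rating>=5) + rnormal(0, 0.5)

* Manual dummies CR1-CR4; rating 5 is the excluded category
forvalues j = 1/4 {
    gen CR`j' = (rating == `j')
}
regress rate CR1 CR2 CR3 CR4

* Same model with factor-variable notation; ib5. makes rating 5 the base
regress rate ib5.rating
```

Both regressions should report the same coefficients; the factor-variable form simply saves creating the dummies by hand.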

Exercise
• Using beauty.dta, examine whether physical attractiveness affects wages. Use the variables for "below average looks" and "above average looks". Include other variables where it makes sense to do so. Also try estimating the model separately for males and females.

Interactions involving dummy variables
Example 1
• Suppose that you are interested in the gender wage gap, but you suspect that the gap may change with experience.
• Then you would estimate the following:
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \delta_1\,(\text{female})(\text{experience}) + \beta_1\,\text{experience} + u$$

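Then the expected log wage for males (female = 0) at a given experience is written as
$$\beta_0 + \beta_1\,\text{experience}$$
The expected log wage for females (female = 1) at a given experience is written as
$$(\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,\text{experience}$$
Thus, the gender gap at a given experience (female minus male) is:
$$\delta_0 + \delta_1\,\text{experience}$$

• Thus $\delta_0$ is the gender wage gap at hiring (i.e., experience = 0). Usually it is negative. So, if the coefficient on the interaction term, $\delta_1$, is positive, the gender gap is decreasing with experience. If $\delta_1$ is negative, the gender gap is increasing with experience.
• The case where the gender gap is increasing with experience is shown in the following slide.

• Case where the gender gap is increasing with experience (i.e., $\delta_1$ is negative):
[Figure: log(wage) plotted against experience, with a Male line and a Female line. Gender gap at a given experience = $\delta_0 + \delta_1\,\text{experience}$.]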

Exercise
• Using Wage1.dta, estimate the following model.
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \delta_1\,(\text{female})(\text{experience}) + \beta_1\,\text{experience} + u$$
Q1. Is the gender gap increasing or decreasing with experience?
Q2. What is the gender gap at hiring (experience = 0)?
Q3. What is the gender gap at experience equal to 10? Is the gender gap significant at this experience?

• Answer
1. The gender gap is increasing with experience, since the coefficient on the interaction term is negative.
2. Gender gap at hiring = -0.293 (about -0.29).
3. Gender gap at experience equal to 10 = -0.293 + (-0.00586)*10 = -0.35. This gap is significant at the 5% level.
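Testing whether the gap at 10 years of experience is significant means testing a linear combination of two coefficients, not a single coefficient. One way to do this in Stata is lincom, sketched below. The sketch assumes WAGE1.DTA sits in the current working directory, and the interaction variable name femexper is a hypothetical choice.

```stata
* Assumes WAGE1.DTA (Wooldridge) is in the current working directory
use "WAGE1.DTA", clear

* Interaction between the female dummy and experience (hypothetical name)
gen femexper = female*exper

regress lwage female femexper exper

* Gender gap at experience = 10: delta0 + 10*delta1,
* reported with its standard error, t-statistic and confidence interval
lincom female + 10*femexper
```

Changing the 10 to another value gives the estimated gap, and its significance, at that level of experience.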

The interaction between two dummy variables
• Suppose that you are interested in whether the gender wage gap is concentrated in a particular group of people. For example, you want to know whether the gender wage gap is concentrated among married people.

• Then you can estimate the following model.
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \delta_1\,(\text{female})(\text{married}) + \beta_1\,\text{experience} + \beta_2\,\text{married} + u$$
• Then we have the following:
Gender gap for married people = $\delta_0 + \delta_1$
Gender gap for single people = $\delta_0$

Exercise
• Using Wage1.dta, estimate the following model.
$$\log(\text{wage}) = \beta_0 + \delta_0\,\text{female} + \delta_1\,(\text{female})(\text{married}) + \beta_1\,\text{experience} + \beta_2\,\text{married} + u$$
• What is the gender wage gap among married people? Is it statistically significant?
• What is the gender wage gap among single people? Is it statistically significant?

1. Gender wage gap among married people = (-0.133) + (-0.372) = -0.505. It is significant at the 5% level.
2. Gender wage gap among single people = -0.133. It is significant at the 5% level. (This is based on the usual t-test.)
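The gap among married people is a sum of two coefficients, so its significance cannot be read off a single t-statistic. A possible Stata sketch follows, again assuming WAGE1.DTA is in the current working directory; the variable name femmarried is hypothetical.

```stata
* Assumes WAGE1.DTA (Wooldridge) is in the current working directory
use "WAGE1.DTA", clear

* Interaction of the two dummies (hypothetical name)
gen femmarried = female*married

regress lwage female femmarried exper married

* Gender gap among married people: delta0 + delta1, with its standard error
lincom female + femmarried

* Gender gap among single people is the coefficient on female itself
test female
```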

Testing for differences in regression functions across groups (the Chow test)
• Suppose initially that you are interested in examining the determinants of the GPA of college students. So you have the following equation in mind:
$$(\text{cumulative GPA}) = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,\text{Hispanic} + \beta_3\,(\text{total hours}) + u$$
where SAT is the SAT score, Hispanic is the dummy for Hispanics, and (total hours) is the total hours of college courses.

• But suppose that you wonder whether the explanatory variables have different effects on GPA depending on gender.
• That is, you wonder whether males and females have different coefficients.
• We can test whether this is the case by estimating the following model.

$$(\text{cumulative GPA}) = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,\text{Hispanic} + \beta_3\,(\text{total hours}) + \delta_0\,\text{female} + \delta_1\,(\text{female})(\text{SAT}) + \delta_2\,(\text{female})(\text{Hispanic}) + \delta_3\,(\text{female})(\text{total hours}) + u$$
Then we can test whether males and females have different coefficients by testing the following hypotheses using an F-test.
$H_0$: $\delta_0 = 0$, $\delta_1 = 0$, $\delta_2 = 0$, $\delta_3 = 0$
$H_1$: $H_0$ is not true

• This particular F-test is called the Chow test.
• Now, using GPA3.dta, conduct the Chow test described above.

We reject the null hypothesis that males and females have the same regression function (that is, the same coefficients) at the 5% significance level.
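The interaction version of the Chow test can be sketched in Stata as below. To keep the example runnable without the course files, it uses simulated data: the variable names, sample size and all coefficients in the data-generating line are assumptions and do not come from GPA3.dta.

```stata
clear
set seed 2024
set obs 400

* Simulated, hypothetical student data (not GPA3.dta)
gen female   = runiform() < 0.5
gen sat      = rnormal(1000, 150)
gen hispanic = runiform() < 0.2
gen tothrs   = rnormal(60, 20)
gen cumgpa   = 1 + 0.001*sat - 0.1*hispanic + 0.002*tothrs ///
    + 0.3*female + 0.0005*female*sat + rnormal(0, 0.5)

* Interact female with every explanatory variable
gen femsat  = female*sat
gen femhisp = female*hispanic
gen femhrs  = female*tothrs

regress cumgpa sat hispanic tothrs female femsat femhisp femhrs

* Chow test: H0 says the intercept shift and all slope shifts are zero
test female femsat femhisp femhrs
```

The test command reports the F-statistic with 4 and n-8 degrees of freedom, matching the k+1 restrictions in the null hypothesis.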

Chow test: what to do when you have a lot of variables
• The Chow test is easy when your initial model contains 3 or 4 variables.
• But if your model contains many variables, creating interaction terms takes a lot of time.
• Here is another way to do the same Chow test.

The equivalent procedure for the Chow test (explained using the same example)
Step 1: Estimate the initial model using only the male sample.
$$(\text{cumulative GPA}) = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,\text{Hispanic} + \beta_3\,(\text{total hours}) + u$$
Then obtain the SSR. Call this $SSR_1$.

Step 2: Estimate the initial model using only the female sample.
$$(\text{cumulative GPA}) = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,\text{Hispanic} + \beta_3\,(\text{total hours}) + u$$
Then obtain the SSR. Call this $SSR_2$.

Step 3: Estimate the initial model using the pooled sample (both males and females included).
$$(\text{cumulative GPA}) = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,\text{Hispanic} + \beta_3\,(\text{total hours}) + u$$
Then obtain the SSR. Call this $SSR_p$.

Step 4: Compute the following statistic:
$$F = \frac{SSR_p - (SSR_1 + SSR_2)}{SSR_1 + SSR_2}\cdot\frac{n - 2(k+1)}{k+1}$$
Under the null hypothesis, this F-statistic follows an F distribution with [k+1, n-2(k+1)] degrees of freedom. You reject the null hypothesis that males and females have the same coefficients if the F-statistic falls in the rejection region. This particular F-statistic is called the Chow statistic. It is the same as the F-statistic you obtain when you include the interaction terms as described before. Here k is the number of slope parameters in the initial model. Note that k does not include female, so in our example k = 3. n is the number of observations.

Exercise
• Conduct the Chow test again using the alternative method described above.

[Regression output: male-only sample, giving SSR1, and female-only sample, giving SSR2.]

[Regression output: pooled sample (both males and females), giving SSRp.]
The Chow statistic follows F[3+1, 724-2(3+1)] = F(4, 716). The cutoff at the 5% significance level is 2.37. The computed statistic exceeds this cutoff, so we reject the null hypothesis that males and females have the same coefficients. Also note that this F-statistic is the same as the one you obtained using the other method.
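The four steps can be sketched in Stata using the residual sum of squares that regress stores in e(rss). The data below are simulated so the sketch runs on its own; the variable names, sample size and coefficients are assumptions and are not taken from GPA3.dta.

```stata
clear
set seed 2024
set obs 400

* Simulated, hypothetical student data (not GPA3.dta)
gen female   = runiform() < 0.5
gen sat      = rnormal(1000, 150)
gen hispanic = runiform() < 0.2
gen tothrs   = rnormal(60, 20)
gen cumgpa   = 1 + 0.001*sat - 0.1*hispanic + 0.002*tothrs ///
    + 0.3*female + 0.0005*female*sat + rnormal(0, 0.5)

* k = number of slope parameters in the initial model (female not counted)
local k = 3

* Step 1: male sample only
quietly regress cumgpa sat hispanic tothrs if female == 0
scalar ssr1 = e(rss)

* Step 2: female sample only
quietly regress cumgpa sat hispanic tothrs if female == 1
scalar ssr2 = e(rss)

* Step 3: pooled sample
quietly regress cumgpa sat hispanic tothrs
scalar ssrp = e(rss)
scalar nobs = e(N)

* Step 4: Chow statistic, 5% critical value and p-value
scalar chowF = ((ssrp - (ssr1 + ssr2))/(`k' + 1)) / ///
    ((ssr1 + ssr2)/(nobs - 2*(`k' + 1)))
display "Chow F            = " chowF
display "5% critical value = " invFtail(`k' + 1, nobs - 2*(`k' + 1), 0.05)
display "p-value           = " Ftail(`k' + 1, nobs - 2*(`k' + 1), chowF)
```

Only three regressions are needed however many explanatory variables the model has, which is the practical advantage over building every interaction term.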

Always think about whether the policy variable is endogenous
• Suppose that you are interested in estimating the effect of employee training grants on employee productivity. Then you may estimate
$$(\text{productivity}) = \beta_0 + \beta_1\,\text{grant} + \beta_2\,\text{sales} + (\text{other factors}) + u$$

• Using JTRAIN.dta, we estimate the above model.
• We use the log of the scrap rate as the measure of productivity. The lower the scrap rate, the higher the productivity.

So, we did not find evidence that (grant) reduces the scrap rate (i.e., no evidence that the grant increases productivity). But is this estimate the true effect? The most important condition for OLS is that the explanatory variables should be uncorrelated with the error term. Let us consider whether (grant) is uncorrelated with the error term.

• The answer is that the grant is likely to be correlated with the error term. In other words, (grant) is likely to be endogenous. Thus, the coefficient on (grant) is likely to be biased.
• The reason is the following.
• This employee training grant is given to firms on a first-come, first-served basis.

• Thus, it is very likely that the firms with less productive workers saw a greater benefit in this training grant, so less productive firms are more likely to have received the grant.
• This causes the endogeneity problem.
• To clarify the situation, consider a variable, (ability), which is the average ability of workers prior to the grant application.

• (Ability) would affect the scrap rate of the firm, but it is unobserved by the researcher.
• Thus, this variable is contained in the error term. This can be written as:
$$\log(\text{scrap}) = \beta_0 + \beta_1\,\text{grant} + \beta_2\,\text{sales} + (\beta_3\,\text{ability} + e)$$
So the error term u is equal to ($\beta_3$ ability + e).
• Since firms with low-ability workers are more likely to get the grant, (grant) and (ability) are negatively correlated.

• This means that u and (grant) are correlated. Thus (grant) is endogenous, and therefore the coefficient on (grant) will be biased.
• Notice that the endogeneity in this example is caused by the fact that the firms with low-ability workers self-selected into the grant program.
• Thus, this is often called the self-selection problem.

• Therefore, in a policy analysis, if the observations self-select into the program, you should always suspect endogeneity in the policy variable.

• Now, the next question is: what is the direction of the bias?
• We can use the omitted variable bias framework to guess the direction.

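• If we had the variable (ability), we could estimate the following model, which satisfies all the OLS assumptions:
$$\log(\text{scrap}) = \beta_0 + \beta_1\,\text{grant} + \beta_2\,\text{sales} + \beta_3\,\text{ability} + e \quad (1)$$
• However, since we do not observe (ability), we can only estimate the following model, which omits ability:
$$\log(\text{scrap}) = \beta_0 + \beta_1\,\text{grant} + \beta_2\,\text{sales} + u \quad (2)$$

• Let $\hat{\beta}_1$ be the OLS coefficient for (grant) from equation (1), which we could estimate if we had the variable ability.
• Let $\tilde{\beta}_1$ be the estimated OLS coefficient for (grant) using equation (2).
• Then, using the result on omitted variable bias in handout 2, the relationship between $\tilde{\beta}_1$ and $\hat{\beta}_1$ is given by:
$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_3\,\tilde{\delta}_1$$
Here $\tilde{\beta}_1$ is the actual estimate of the effect of grant (which is unfortunately biased), $\hat{\beta}_1$ is the estimate of the true effect of (grant), and $\hat{\beta}_3\,\tilde{\delta}_1$ is the bias term, which determines the direction of the bias.

• $\tilde{\delta}_1$ is the OLS coefficient for (grant) in the following regression:
$$\text{ability} = \tilde{\delta}_0 + \tilde{\delta}_1\,\text{grant} + \tilde{\delta}_2\,\text{sales} + \text{residual}$$
• Since firms with high-ability workers are more productive (i.e., their scrap rate is low), $\hat{\beta}_3$ will be negative.
• Since (grant) and (ability) are negatively correlated, $\tilde{\delta}_1$ is negative.
• Therefore, we can predict the direction of the bias as follows.

$$\tilde{\beta}_1 = \underbrace{\hat{\beta}_1}_{\text{negative}} + \underbrace{\hat{\beta}_3\,\tilde{\delta}_1}_{(-)\times(-)\,=\,\text{positive}}$$
$\hat{\beta}_1$ is negative since the grant is likely to reduce the scrap rate (i.e., increase productivity). The bias term is positive, so there will be positive (upward) bias. Thus, even if the true effect of the grant on the scrap rate is negative, the bias term will offset it, at least in part. The endogeneity problem therefore biases the coefficient towards not finding an effect of the grant on the scrap rate.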
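The direction of the bias can be illustrated with a small simulation in Stata. Everything here is an assumption made for illustration: the data are artificial, the true grant effect is set to -0.3, ability lowers the scrap rate, low-ability firms select into the grant, and sales is left out to keep the sketch short.

```stata
clear
set seed 7
set obs 2000

* Simulated, hypothetical firms; all numbers are illustrative assumptions
gen ability = rnormal()

* Self-selection: low-ability firms are more likely to receive the grant
gen grant = (ability + rnormal() < 0)

* True effect of grant on the log scrap rate is -0.3; ability lowers scrap
gen lscrap = 1 - 0.3*grant - 0.5*ability + rnormal(0, 0.5)

* Equation (1): ability is observed and included
quietly regress lscrap grant ability
scalar b1_long = _b[grant]
scalar b3_long = _b[ability]

* Auxiliary regression of the omitted variable on grant
quietly regress ability grant
scalar d1 = _b[grant]

* Equation (2): ability is omitted
quietly regress lscrap grant
scalar b1_short = _b[grant]

display "grant coefficient with ability:    " b1_long
display "grant coefficient without ability: " b1_short
display "long coefficient + bias term:      " b1_long + b3_long*d1
```

The last two displayed numbers should coincide, since the omitted variable formula holds exactly in the sample, and the coefficient from the regression without ability should lie above the one from the regression with ability.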
