Published by Anissa Greene. Modified over 9 years ago.
1
17-1 McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly
2
17-2 Chapter 17 Learning Objectives (LOs)
LO 17.1: Use dummy variables to capture a shift of the intercept.
LO 17.2: Test for differences between the categories of a qualitative variable.
LO 17.3: Use dummy variables to capture a shift of the intercept and/or slope.
3
17-3 Is There Evidence of Wage Discrimination? Three Seton Hall professors recently learned in a court decision that they could pursue their lawsuit alleging the University paid higher salaries to younger instructors and male professors. Mary Schweitzer works in human resources at another college and has been asked by the college to test for age and gender discrimination in salaries. She gathers data on 42 professors, including the salary, experience, gender, and age of each.
4
17-4 Is There Evidence of Wage Discrimination? Using this data set, Mary hopes to:
1. Test whether salary differs by a fixed amount between males and females.
2. Determine whether there is evidence of age discrimination in salaries.
3. Determine if the salary difference between males and females increases with experience.
5
17-5 17.1 Dummy Variables In previous chapters, all the variables used in regression applications have been quantitative. In empirical work it is common to have some variables that are qualitative: the values represent categories that may have no implied ordering. We can include these factors in a regression through the use of dummy variables. A dummy variable for a qualitative variable with two categories assigns a value of 1 for one of the categories and a value of 0 for the other. LO 17.1 Use dummy variables to capture a shift of the intercept.
6
17-6 Variables with Two Categories For example, suppose we are interested in determining the impact of gender on salary. We might first define a dummy variable d (a more descriptive name, such as Dgender, is better in practice) with the following structure: let d = 1 if gender = “female” and d = 0 if gender = “male.” This allows us to include a measure for gender in a regression model and quantify the impact of gender on salary. LO 17.1
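The coding rule on this slide can be sketched in a few lines of Python. The list of gender labels below is made up for illustration and is not part of the professor data set.

```python
# Hypothetical gender labels (an assumption, not the slide's data).
genders = ["female", "male", "male", "female"]

# Coding from the slide: d = 1 if gender is "female", d = 0 if "male".
d = [1 if g == "female" else 0 for g in genders]

print(d)  # [1, 0, 0, 1]
```

The resulting 0/1 column can then enter a regression like any quantitative variable.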
7
17-7 Regression with a Dummy Variable LO 17.1
8
17-8 Regression with a Dummy Variable LO 17.1
9
17-9 Regression with a Dummy Variable Graphically, we can see how the dummy variable shifts the intercept of the regression line. LO 17.1
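The equations on these three slides did not survive in this transcript. As a sketch only, assuming the usual model with one quantitative variable x and one dummy d (the symbols β and b below are standard textbook notation, not recovered from the slides), the intercept shift can be written as:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 d + \varepsilon \\
d = 0:\quad \hat{y} &= b_0 + b_1 x \\
d = 1:\quad \hat{y} &= (b_0 + b_2) + b_1 x
\end{align*}
```

Both lines share the slope b_1, so the dummy moves the line up or down by b_2 without rotating it.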
10
17-10 Salaries, Gender, and Age LO 17.1
d₁ = 1 for male and 0 for female
d₂ = 0 for young and 1 for old
Salary   Exper   d₁   d₂   Gender   Age
 67.50    14      1    0   Male     Under
 53.51     6      1    0   Male     Under
 50.05     2      0    0   Female   Under
111.88    34      1    1   Male     Over
 63.68    21      1    0   Male     Under
 75.56    35      0    1   Female   Over
 65.50    14      1    0   Male     Under
11
17-11 Estimation Results LO 17.1
a. The estimated model is ŷ = 40.61 + 1.13x + 13.92d₁ + 4.34d₂.
b. The predicted salary of a 50-year-old male professor (d₁ = 1 and d₂ = 0) with 10 years of experience (x = 10) is ŷ = 40.61 + 1.13(10) + 13.92(1) + 4.34(0) = 65.83, or $65,830. The corresponding salary of a 50-year-old female (d₁ = 0 and d₂ = 0) is ŷ = 40.61 + 1.13(10) + 13.92(0) + 4.34(0) = 51.91, or $51,910. The predicted difference in salary between a male and a female professor with 10 years of experience is $13,920 (65,830 − 51,910). This difference can also be inferred from the estimated coefficient 13.92 of the gender dummy variable d₁. Note that the salary difference does not change with experience. For instance, the predicted salary of a 50-year-old male with 20 years of experience is $77,130. The corresponding salary of a 50-year-old female is $63,210, for the same difference of $13,920.
12
17-12 Estimation Results LO 17.1 c. For a 65-year-old female professor with 10 years of experience, the predicted salary is ŷ = 40.61 + 1.13(10) + 13.92(0) + 4.34(1) = 56.25, or $56,250. Prior to any statistical testing, it appears that an older female professor earns, on average, $4,340 (56,250 − 51,910) more than a younger female professor with the same experience.
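The predictions in parts b and c can be checked with a short Python sketch. The coefficients are the ones reported on the slides; the function and variable names are illustrative choices, not part of the original material.

```python
# Coefficients from the estimated model on the slides
# (salary is measured in $1,000s).
B0, B_EXPER, B_MALE, B_OLD = 40.61, 1.13, 13.92, 4.34


def predict_salary(exper, male, old):
    """Predicted salary in $1,000s; male and old are 0/1 dummy values."""
    return B0 + B_EXPER * exper + B_MALE * male + B_OLD * old


young_male = predict_salary(10, male=1, old=0)
young_female = predict_salary(10, male=0, old=0)
old_female = predict_salary(10, male=0, old=1)

print(round(young_male, 2))                 # 65.83
print(round(young_female, 2))               # 51.91
print(round(old_female, 2))                 # 56.25
print(round(young_male - young_female, 2))  # 13.92
```

Changing `exper` to 20 leaves the male/female gap at 13.92, which is the intercept-shift property noted in part b.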
13
17-13 Testing the Significance of Dummy Variables The statistical tests discussed in Chapter 15 remain valid for dummy variables as well. We can perform a t-test (using p-value) for individual significance, form a confidence interval using the parameter estimate and its standard error, and conduct a partial F test for joint significance. LO 17.2 Test for differences between the categories of a qualitative variable.
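As a reminder of the tools this slide refers to, the individual test statistic and confidence interval for a dummy coefficient are commonly written as below. The notation is an assumption here (b_j for the estimate, se(b_j) for its standard error, β_{j0} for the hypothesized value, n observations, k explanatory variables), since the slide itself gives no formulas.

```latex
\begin{align*}
t_{df} &= \frac{b_j - \beta_{j0}}{se(b_j)}, \qquad df = n - k - 1 \\
\text{CI:}\quad & b_j \pm t_{\alpha/2,\,df}\; se(b_j)
\end{align*}
```

With β_{j0} = 0, rejecting the null means the category coded 1 differs from the reference category, holding the other variables fixed.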
14
17-14 Example 17.2 LO 17.2
15
17-15 Multiple Categories LO 17.2
16
17-16 Multiple Categories LO 17.2
           d₁   d₂
Public      1    0
Alone       0    1
Carpool     0    0
17
17-17 Avoiding the Dummy Variable Trap Given the intercept term, we exclude one of the dummy variables from the regression. If we included as many dummy variables as categories, the dummy variables would sum to the intercept column, creating perfect multicollinearity, and such a model cannot be estimated. So, we include one fewer dummy variable than the number of categories of the qualitative variable. LO 17.2
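A minimal Python sketch of the trap, using the commute categories from the earlier slide. The observations and the helper `encode_dummies` are hypothetical and exist only for illustration.

```python
def encode_dummies(values, reference):
    """Return one 0/1 column per category, excluding the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical observations of the commute variable (not real data).
commute = ["Public", "Alone", "Carpool", "Alone", "Public"]

# k - 1 = 2 dummies, with Carpool as the reference category.
dummies = encode_dummies(commute, reference="Carpool")
print(dummies)
# {'Alone': [0, 1, 0, 1, 0], 'Public': [1, 0, 0, 0, 1]}

# The trap: with all k = 3 dummies, every row sums to 1, which duplicates
# the intercept column of ones (perfect multicollinearity).
all_three = {c: [1 if v == c else 0 for v in commute] for c in set(commute)}
row_sums = [sum(col[i] for col in all_three.values()) for i in range(len(commute))]
print(row_sums)  # [1, 1, 1, 1, 1]
```

Dropping any one category removes the exact linear dependence; the dropped category becomes the baseline against which the other coefficients are read.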
18
17-18 Homework Problem 8 on p. 524. The data file (SATdummy) is posted on the S: drive. The answers are in the appendix.
19
17-19 Example 17.3 A recent article suggests that Asian-Americans face serious discrimination in the college admissions process (The Boston Globe, February 8, 2010). Specifically, Asian applicants typically need an extra 140 points on the SAT to compete with white students. Another report suggests that colleges are eager to recruit Hispanic students who are generally underrepresented in applicant pools (USA Today, February 8, 2010). In an attempt to corroborate these claims, a sociologist first wants to determine if SAT scores differ by ethnic background. She collects data on 200 individuals from her city with their recent SAT scores and ethnic background.
20
17-20 Example 17.3 Use 3, not 4, dummy variables (DV), as follows:
Race        d₁ (White)   d₂ (Black)   d₃ (Asian)
White           1            0            0
Black           0            1            0
Asian           0            0            1
Hispanic        0            0            0
21
17-21 Example 17.3
b. For an Asian individual, we set d₁ = 0, d₂ = 0, d₃ = 1 and calculate ŷ = 1388.89 + 264.86 = 1653.75. Thus, the predicted SAT score for an Asian individual is approximately 1654. The predicted SAT score for a Hispanic individual (d₁ = d₂ = d₃ = 0) is ŷ = 1388.89, or approximately 1389.
c. Since the p-values corresponding to d₁ and d₃ are approximately zero, we conclude at the 5% level that the SAT scores of White and Asian students are different from those of Hispanic students. However, with a p-value of 0.16, we cannot conclude that the SAT scores of Black and Hispanic students are statistically different.
22
17-22 Homework Problem 11 on p. 524. The data file (Retail Sales) is posted on the S: drive. Do not do part d.
23
17-23 Problem 11 on page 524 A government researcher is analyzing the relationship between retail sales and the gross national product (GNP). He also wonders whether there are significant differences in retail sales related to the quarters of the year. He collects ten years of quarterly data. A portion is shown in the accompanying table; the complete data set can be found on the text website, labeled Retail Sales.
24
17-24
a. Estimate y = β₀ + β₁x + β₂d₁ + β₃d₂ + β₄d₃ + ε, where y is retail sales, x is GNP, d₁ is a dummy variable that equals 1 if quarter 1 and 0 otherwise, d₂ is a dummy variable that equals 1 if quarter 2 and 0 otherwise, and d₃ is a dummy variable that equals 1 if quarter 3 and 0 otherwise.
b. Predict retail sales in quarters 2 and 4 if GNP equals $13,000 billion.
c. Which of the quarterly sales are significantly different from those of the 4th quarter at the 5% level?
d. Use the partial F test to determine if the three seasonal dummy variables used in the model are jointly significant at the 5% level.
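For part d, the partial F statistic compares the error sum of squares of the restricted model (GNP only) with that of the unrestricted model (GNP plus the three quarterly dummies). The sketch below assumes the standard form of the statistic; the function name and the SSE values are hypothetical, and only the sample size and degrees of freedom follow from the problem statement.

```python
def partial_f(sse_restricted, sse_unrestricted, num_restrictions, n, k):
    """Partial F statistic; k counts explanatory variables in the unrestricted model."""
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


# Problem 11 setup: 10 years x 4 quarters = 40 observations; the unrestricted
# model has k = 4 regressors (x, d1, d2, d3); H0 drops the 3 dummies.
n, k, q = 40, 4, 3
df = (q, n - k - 1)
print(df)  # (3, 35)

# Hypothetical SSE values, for illustration only (not computed from the data).
f_stat = partial_f(sse_restricted=500.0, sse_unrestricted=350.0,
                   num_restrictions=q, n=n, k=k)
print(f_stat)  # 5.0
```

The computed statistic would be compared with the critical value of the F distribution with (3, 35) degrees of freedom at the 5% level.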