1 Revisiting Salary at Acme Bank: Background
A bank is facing a discrimination suit in which it is accused of paying its female employees less than its male employees.
The bank had 208 employees: 140 females and 68 males.
2 Raw Data: The first 6 records of the data are shown below.
Summary statistics by Gender
3 Simple regression: Salary vs. Gender
4 Difference in salaries
On average, female salaries are lower than male salaries by $8,295 (the coefficient of Gender, where Gender=0 for male and Gender=1 for female).
Although r-squared is low (12%), most people would agree that the difference in salaries is statistically significant, based on the low p-value.
Simple regression only looks at things at the gross or surface level.
Multiple regression helps us net out the effects of other important variables, such as prior experience, job grade, and educational background.
We need to include additional variables (information) in the analysis to see whether salaries are still different once these factors are taken into account.
We expand the model by including prior experience in the banking industry (YrsPrior) and years at this bank (YrsExp).
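In a simple regression on a single 0/1 dummy, the intercept is the mean salary of the Gender=0 group and the slope is the female mean minus the male mean, which is why the Gender coefficient reads directly as an average salary difference. The sketch below checks this with the closed-form least-squares formulas on a few made-up salaries (hypothetical numbers, not the Acme data).

```python
from statistics import mean

def simple_ols(x, y):
    """Closed-form least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data (NOT Acme's): Gender = 0 for male, 1 for female.
gender = [0, 0, 0, 1, 1, 1, 1]
salary = [40000, 44000, 48000, 35000, 36000, 38000, 39000]

b0, b1 = simple_ols(gender, salary)
male_mean = mean(s for g, s in zip(gender, salary) if g == 0)
female_mean = mean(s for g, s in zip(gender, salary) if g == 1)

print(b0, male_mean)                # intercept should match the male mean
print(b1, female_mean - male_mean)  # slope should match female mean minus male mean
```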
5 MR1: Salary vs. Gender, years at the current bank (YrsExp), and prior experience in banking (YrsPrior)
6 Questions
1. Is MR1 better than the simple regression model? (Check that the model is significant, and compare r-squared values.)
2. Interpret the coefficient of Gender in MR1.
3. Interpret the coefficients of YrsExp and YrsPrior.
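Because r-squared can only rise when predictors are added, comparisons between models of different sizes usually also look at adjusted r-squared, and "the model is significant" usually refers to the overall F-test. A minimal sketch, assuming you read the error sum of squares (SSE), total sum of squares (SST), sample size n and number of predictors k off the regression output; the sums of squares in the usage lines are hypothetical.

```python
def fit_summary(sse, sst, n, k):
    """Return (r2, adj_r2, f_stat) for a regression with k predictors and n observations.

    sse: residual (error) sum of squares
    sst: total sum of squares of the response about its mean
    """
    ssr = sst - sse                                      # regression sum of squares
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # penalizes extra predictors
    f_stat = (ssr / k) / (sse / (n - k - 1))             # overall F-test, H0: all slopes = 0
    return r2, adj_r2, f_stat

# Hypothetical sums of squares for two models fit to the same 208 employees.
simple = fit_summary(sse=880.0, sst=1000.0, n=208, k=1)
mr1 = fit_summary(sse=700.0, sst=1000.0, n=208, k=3)
print("simple: r2=%.3f adj=%.3f F=%.1f" % simple)
print("MR1:    r2=%.3f adj=%.3f F=%.1f" % mr1)
```

The F statistic is then compared with an F distribution with k and n - k - 1 degrees of freedom; regression software reports this as the model's "Significance F" or overall p-value.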
7 Further expansion
Next, we add Job Grade (see slide #2) to the MR1 model.
Since Job Grade is a categorical variable with 6 levels, 5 dummy variables (Job_2 through Job_6) are created to represent these levels.
Job_2 was set to 1 if Job Grade was 2, and zero otherwise; the same approach was used to code Job_3 through Job_6.
4. Which Job Grade is represented by the default setting where Job_2, Job_3, Job_4, Job_5 and Job_6 all equal zero? Please ask me to elaborate if this is not clear.
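The dummy coding described above can be sketched as a small helper, assuming the grades are numbered 1 through 6 (the helper name `job_dummies` is hypothetical). Each employee gets one 0/1 column per non-baseline grade, so the grade with no column is the one where every dummy is zero.

```python
def job_dummies(grade, levels=range(2, 7)):
    """Return {'Job_2': 0/1, ..., 'Job_6': 0/1} for one employee's Job Grade.

    Assumes grades are numbered 1-6; the grade without its own column
    is represented by all five dummies equal to zero.
    """
    return {f"Job_{g}": int(grade == g) for g in levels}

for grade in [1, 3, 6]:
    print(grade, job_dummies(grade))
```

Using 5 dummies rather than 6 avoids perfect collinearity with the intercept (the "dummy variable trap").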
8 MR2
9 Questions
5. Is this model better than the first two models? (Check that the model is significant; please also look at r-squared.)
6. Are female salaries significantly lower than male salaries at the 10% level?
7. Are you clear on the interpretation of the coefficients for the Job Grade dummy variables? Can you determine the difference in salary for a person moving from Job Grade 2 to 3, all else equal?
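Each Job Grade dummy coefficient is the salary shift relative to the baseline grade, so moving between two non-baseline grades, all else equal, changes the predicted salary by the difference of their coefficients (for Grade 2 to 3, the Job_3 coefficient minus the Job_2 coefficient). A minimal sketch; the coefficient values below are hypothetical placeholders, not the MR2 estimates.

```python
def grade_effect(coefs, grade):
    """Salary shift for a Job Grade relative to the baseline grade (all dummies zero)."""
    return coefs.get(f"Job_{grade}", 0.0)  # the baseline grade has no dummy, so its shift is 0

def grade_move(coefs, from_grade, to_grade):
    """Predicted salary change for moving between grades, all else equal."""
    return grade_effect(coefs, to_grade) - grade_effect(coefs, from_grade)

# Hypothetical coefficients (NOT the MR2 estimates); substitute the values from slide #8.
coefs = {"Job_2": 1500.0, "Job_3": 4000.0, "Job_4": 7500.0, "Job_5": 12000.0, "Job_6": 21000.0}
print(grade_move(coefs, 2, 3))
```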
10 Full Model
More information was added to further expand the model:
Four dummy variables (Ed_2 through Ed_5) were added to represent the 5 education levels.
Age for each employee was also included.
11 MR3
12 Questions
8. Is the most recent model, MR3 (slide #11), better than MR2 (slide #8)?
9. Are female salaries significantly lower than male salaries at the 10% level?
10. Using p-values, identify the variables that do not appear to contribute significantly to MR3.
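Question 10 amounts to screening the coefficient p-values against the chosen level. For questions 6 and 9, if "significantly lower" is read as a one-sided test, the two-sided p-value reported by regression software can be halved, but only when the estimate has the sign the alternative predicts (negative for Gender here). A minimal sketch; the helper names and the p-values in the usage lines are hypothetical, not taken from MR3.

```python
def one_sided_p(two_sided_p, coef, direction="less"):
    """Convert a two-sided t-test p-value into a one-sided one.

    direction="less" tests H1: coefficient < 0 (e.g. female salaries lower).
    """
    if direction not in ("less", "greater"):
        raise ValueError("direction must be 'less' or 'greater'")
    agrees = coef < 0 if direction == "less" else coef > 0
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2

def not_significant(p_values, alpha=0.10):
    """Names of variables whose (two-sided) p-value exceeds alpha."""
    return [name for name, p in p_values.items() if p > alpha]

# Hypothetical p-values (NOT from MR3).
p_values = {"Gender": 0.14, "YrsExp": 0.001, "Age": 0.62, "Ed_2": 0.45}
print(not_significant(p_values))
print(one_sided_p(0.14, coef=-1500.0))
```

In this made-up case Gender fails the two-sided screen at 10% but passes a one-sided test, which is why it matters to decide in advance which test the question calls for.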
13 Refining the regression model: removing variables
Age does not appear to contribute significantly to the MR3 model.
Most education levels (with the exception of Ed_5) also do not appear to contribute significantly to MR3.
Thus, we exclude Age and the Education Level variables from further analysis. This gives us the same model as MR2, but it is shown again as MR4 on the next slide for easy reference.
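Dropping a whole group of variables at once (here Age plus Ed_2 through Ed_5, even though Ed_5 looks significant on its own) can be checked with a partial F-test that compares the SSE of the full and reduced models. A minimal sketch, assuming MR3 has 13 predictors (Gender, YrsExp, YrsPrior, Job_2 to Job_6, Ed_2 to Ed_5, Age) and 5 of them are dropped; the SSE values are hypothetical.

```python
def partial_f(sse_reduced, sse_full, n, k_full, q):
    """Partial F statistic for dropping q predictors from a full model with k_full predictors.

    H0: the q dropped coefficients are all zero.
    """
    df_error = n - k_full - 1
    return ((sse_reduced - sse_full) / q) / (sse_full / df_error)

# Hypothetical SSE values; MR3 assumed to have 13 predictors, 5 of which are dropped.
f_stat = partial_f(sse_reduced=720.0, sse_full=700.0, n=208, k_full=13, q=5)
print(round(f_stat, 3))
```

The statistic is compared with an F distribution with q and n - k_full - 1 degrees of freedom (for example via `scipy.stats.f.sf(f_stat, 5, 194)` if SciPy is available); a small F supports removing the group.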
14 MR4 (Same as MR2)
15 Are further refinements possible?
YrsPrior does not seem to add much to MR4, so let's redo the analysis excluding this variable.
16 MR5
17 Gender and experience: is there an interaction?
The above table shows the correlations between Salary, Yrs of Experience, and Job Grade, separately for males and females.
The correlation between Yrs of Experience and Salary appears to be much stronger for males than for females. In other words, male employees appear to be moving up the salary ladder faster than female employees, and the analyst felt this may be the source of salary discrimination at Acme Bank.
Thus, there appears to be an interaction: the effect that Yrs of Experience has on Salary depends on whether the employee is male or female.
The regression analysis can be improved by adding an interaction term to the model; the method is described on the next slide.
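A table like the one above can be produced by computing the Pearson correlation separately within each gender group. A minimal sketch on hypothetical records (not Acme data); note that a correlation measures how tightly salary tracks experience, while the slope, which the interaction term on the next slide estimates, measures how fast salary rises.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corr_by_group(records, group_key, x_key, y_key):
    """Correlation between x_key and y_key computed separately within each group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r)
    return {g: pearson([r[x_key] for r in rs], [r[y_key] for r in rs])
            for g, rs in groups.items()}

# Hypothetical records (NOT Acme data): Gender = 0 male, 1 female.
records = [
    {"Gender": 0, "YrsExp": 1, "Salary": 35000},
    {"Gender": 0, "YrsExp": 3, "Salary": 41000},
    {"Gender": 0, "YrsExp": 5, "Salary": 47000},
    {"Gender": 1, "YrsExp": 1, "Salary": 35000},
    {"Gender": 1, "YrsExp": 3, "Salary": 36000},
    {"Gender": 1, "YrsExp": 5, "Salary": 35500},
]
result = corr_by_group(records, "Gender", "YrsExp", "Salary")
print(result)
```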
18 Variables for the model
To capture the interaction between Yrs of Experience and Gender, a new variable called Gen*YrsExp was created by multiplying each employee's Gender (0 or 1) by their experience at Acme (YrsExp).
Thus, the MR model with interaction regresses Salary on Gender, YrsExp, and Gen*YrsExp.
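Building the Gen*YrsExp column is a single element-wise product. A minimal sketch on hypothetical rows (the helper name is hypothetical); the product is 0 for every male and equals YrsExp for every female.

```python
def add_gender_interaction(records):
    """Return copies of the records with Gen*YrsExp = Gender * YrsExp added."""
    return [{**r, "Gen*YrsExp": r["Gender"] * r["YrsExp"]} for r in records]

# Hypothetical rows: a male with 10 years and a female with 7 years at the bank.
rows = [{"Gender": 0, "YrsExp": 10}, {"Gender": 1, "YrsExp": 7}]
out = add_gender_interaction(rows)
print(out)
```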
19 Regression model with interaction
20 Questions
11. What is the regression equation for male employees?
12. What is the regression equation for female employees?
13. How do we interpret the regression coefficients on slide #19?
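Writing the slide #19 model with generic coefficients b_0 to b_3 (placeholders for the estimates on that slide) and substituting Gender = 0 or 1 gives one straight line per group:

```latex
\begin{aligned}
\text{Model:}\quad & \widehat{\text{Salary}} = b_0 + b_1\,\text{Gender} + b_2\,\text{YrsExp} + b_3\,(\text{Gender}\times\text{YrsExp}) \\
\text{Male (Gender}=0):\quad & \widehat{\text{Salary}} = b_0 + b_2\,\text{YrsExp} \\
\text{Female (Gender}=1):\quad & \widehat{\text{Salary}} = (b_0 + b_1) + (b_2 + b_3)\,\text{YrsExp} \\
\text{Female minus male:}\quad & b_1 + b_3\,\text{YrsExp}
\end{aligned}
```

So b_1 shifts the female intercept, b_3 is the difference between the female and male slopes, and the gender gap itself changes with experience unless b_3 = 0.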
21 Questions (contd.)
14. According to the regression model, what is the predicted salary for a male (Gender=0) who has 1 year of experience at this bank?
15. What is the predicted salary for a male (Gender=0) who has 6 years of experience at this bank?
16. Answer the above questions for a female (Gender=1) at this bank.
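The predictions in questions 14 to 16 come from plugging Gender and YrsExp into the fitted equation, including the Gen*YrsExp term. A minimal sketch; the coefficient tuple below is hypothetical, not the slide #19 estimates, and should be replaced with the values from the output.

```python
def predict_salary(b, gender, yrs_exp):
    """Predicted salary from the interaction model.

    b = (b0, b1, b2, b3) for Intercept, Gender, YrsExp, Gen*YrsExp.
    """
    b0, b1, b2, b3 = b
    return b0 + b1 * gender + b2 * yrs_exp + b3 * gender * yrs_exp

# Hypothetical coefficients (NOT the slide #19 estimates).
b = (30000.0, 4000.0, 1200.0, -800.0)
for gender in (0, 1):
    for yrs in (1, 6):
        print(gender, yrs, predict_salary(b, gender, yrs))
```

With these made-up numbers, five extra years raise the male prediction by 5 x 1,200 but the female prediction by only 5 x (1,200 - 800), which is what an interaction looks like in the predictions.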
22 Questions (contd.)
17. Looking at your answers, can you tell whether there is an interaction?
18. Can you explain the interaction?
19. Is the interaction significant (α = 10%)?
23 How many interaction variables?
Suppose we want to test the interaction between Gender and Job Grade.
20. How many interaction variables would we need?
21. Let's add the interaction variable to our best MR model so far (MR5 on slide #16) to see whether further improvements are possible. The new model (MR6) is shown on the following slide. Do you think the model with interaction is better (is MR6 better than MR5, and why)?
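Because Job Grade enters the model as a set of dummies, an interaction with Gender is built by multiplying Gender by each dummy in turn, so the Gender gap can differ in every non-baseline grade. A minimal sketch, assuming the Job_2 to Job_6 coding from slide #7 (the helper name and record are hypothetical).

```python
def gender_job_interactions(record, levels=range(2, 7)):
    """Gen*Job_g = Gender * Job_g for each Job Grade dummy already in the record."""
    return {f"Gen*Job_{g}": record["Gender"] * record[f"Job_{g}"] for g in levels}

# Hypothetical female employee in Job Grade 4.
female_grade4 = {"Gender": 1, "Job_2": 0, "Job_3": 0, "Job_4": 1, "Job_5": 0, "Job_6": 0}
terms = gender_job_interactions(female_grade4)
print(terms)
```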
24 MR6: Full model with interaction
25 But your data has an outlier!
Before we accept that there are significant differences between male and female salaries, we'd like to address the issue of outliers.
Specifically, there is a female employee in the highest job grade (Job Grade 6) who has 33 years of experience at Acme but whose salary is only $30,000. This one employee could be a major source of the apparent discrimination at Acme Bank.
To see whether this is the case, we remove this employee from the data and redo the regression analysis.
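Removing the employee described above is a filter on the data before refitting. A minimal sketch on hypothetical rows: only the first row mirrors the description on this slide, the other rows and the field name `JobGrade` are made up for illustration.

```python
def drop_matching(records, **criteria):
    """Return the records that do NOT match every key=value pair in criteria."""
    return [r for r in records if not all(r.get(k) == v for k, v in criteria.items())]

# Hypothetical rows; only the first mirrors the employee described above.
records = [
    {"Gender": 1, "JobGrade": 6, "YrsExp": 33, "Salary": 30000},
    {"Gender": 1, "JobGrade": 6, "YrsExp": 20, "Salary": 80000},
    {"Gender": 0, "JobGrade": 6, "YrsExp": 33, "Salary": 95000},
]
kept = drop_matching(records, Gender=1, JobGrade=6, YrsExp=33, Salary=30000)
print(len(records), "->", len(kept))
```

Matching on all four attributes, rather than on salary alone, avoids accidentally dropping other employees who happen to share one value.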
26 MR7: Regression with outlier removed
27 Questions
22. Does MR7 support the argument that male and female salaries are different? Does this make the case stronger or weaker for those accusing Acme of gender discrimination?
23. Is MR7 better than MR6?
28 Outcome of the Case
So what was the outcome of the case? Any guesses?