Example 12.4 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield The Partial F Test
| 12.2 | 12.3 | 12.3a | 12.1a | 12.4a | a12.1a12.4a12.5 Objective To use several partial F tests to see whether various groups of explanatory variables should be included in a regression equation for salary, given that other variables are already in the equation.
BANK.XLS
• Recall from Example 11.3 that the Fifth National Bank has 208 employees.
• The data for these employees are stored in this file.
• In the previous chapter we ran several regressions for Salary to see whether there is convincing evidence of salary discrimination against females.
• We will continue this analysis here.
Analysis Overview
• First, we will regress Salary on the Female dummy, YrsExper, and the interaction between Female and YrsExper, labeled Fem_YrsExper. This will be the reduced equation.
• Then we’ll see whether the JobGrade dummies Job_2 to Job_6 add anything significant to the reduced equation. If so, we will then see whether the interactions between the Female dummy and the JobGrade dummies, labeled Fem_Job2 to Fem_Job6, add anything significant to what we already have.
Analysis Overview -- continued
• If so, we’ll finally see whether the education dummies Ed_2 to Ed_5 add anything significant to what we already have.
Solution
• First, note that we created all of the dummies and interaction variables with StatPro’s Data Utilities procedures.
• Also, note that we have used three sets of dummies: for gender, job grade, and education level.
• When we use these in a regression equation, the dummy for one category of each set should always be excluded; it is the reference category. The reference categories we have used are “male”, job grade 1, and education level 1.
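The slides build these variables with StatPro; the same construction can be sketched in plain Python. This is only an illustration under stated assumptions: the three records are made up (they are not rows of BANK.XLS), the column name `EducLev` and the helper names `add_dummies` and `build_design_row` are hypothetical, and the level ranges (job grades 1 to 6, education levels 1 to 5) are inferred from the dummy names Job_2 to Job_6 and Ed_2 to Ed_5.

```python
# Hypothetical employee records; the values are made up for illustration
# and are not taken from BANK.XLS.
employees = [
    {"Female": 1, "YrsExper": 3.0, "JobGrade": 1, "EducLev": 3},
    {"Female": 0, "YrsExper": 14.0, "JobGrade": 4, "EducLev": 5},
    {"Female": 1, "YrsExper": 8.0, "JobGrade": 6, "EducLev": 1},
]


def add_dummies(row, column, prefix, levels, reference):
    """Add one 0/1 dummy per level, skipping the reference level."""
    for level in levels:
        if level != reference:
            row[f"{prefix}_{level}"] = 1 if row[column] == level else 0


def build_design_row(record):
    """Return a copy of the record with dummies and interactions added."""
    row = dict(record)
    # Job grade 1 and education level 1 are the reference categories,
    # so no Job_1 or Ed_1 column is created.
    add_dummies(row, "JobGrade", "Job", range(1, 7), reference=1)
    add_dummies(row, "EducLev", "Ed", range(1, 6), reference=1)
    # Interactions are products of the Female dummy with other variables.
    row["Fem_YrsExper"] = row["Female"] * row["YrsExper"]
    for grade in range(2, 7):
        row[f"Fem_Job{grade}"] = row["Female"] * row[f"Job_{grade}"]
    return row


design = [build_design_row(r) for r in employees]
```

An employee in a reference category simply has zeros in every dummy of that set, which is why including a dummy for every category would be redundant with the intercept.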
Solution -- continued
• The output for the “smallest” equation using Female, YrsExper, and Fem_YrsExper as explanatory variables is shown here.
Solution -- continued
• We’re off to a good start. These three variables already explain 63.9% of the variation in Salary.
• The output for the next equation, which adds the explanatory variables Job_2 to Job_6, is on the next slide.
• This equation appears much better. For example, R² has increased to 81.1%. We check whether it is significantly better with the partial F test.
Solution -- continued
• The degrees of freedom in cell C28 is the same as the value in cell C12, the degrees of freedom for SSE. It serves as the denominator degrees of freedom of the test.
• Then we calculate the F-ratio in cell C29 with the formula =((Reduced!D12-Complete!D12)/Complete!C27)/Complete!E12, where Reduced!D12 refers to SSE for the reduced equation from the Reduced sheet, Complete!D12 is SSE for the complete equation, Complete!C27 is the numerator degrees of freedom (the number of extra variables), and Complete!E12 is the mean square error for the complete equation.
• Finally, we calculate the corresponding p-value in cell C30 with the formula =FDIST(C29,C27,C28). It is practically 0, so there is no doubt that the job grade dummies add significantly to the explanatory power of the equation.
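The two spreadsheet formulas above can be mirrored in a short, dependency-free Python sketch. It is not the StatPro implementation: the function names are hypothetical, the right-tail F probability (what FDIST returns) is computed from the regularized incomplete beta function with a standard continued-fraction expansion, and the SSE values in the demonstration are made up rather than taken from the bank's output.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each pass applies the even-numbered term, then the odd-numbered one.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def f_right_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0:
        return 1.0
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f)
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def partial_f_test(sse_reduced, sse_complete, extra_vars, df_complete):
    """Return (F-ratio, p-value) for the extra block of variables."""
    mse_complete = sse_complete / df_complete
    f_ratio = ((sse_reduced - sse_complete) / extra_vars) / mse_complete
    return f_ratio, f_right_tail(f_ratio, extra_vars, df_complete)


# Made-up SSE values, NOT the bank's actual regression output.
f_ratio, p_value = partial_f_test(
    sse_reduced=1500.0, sse_complete=1000.0, extra_vars=5, df_complete=199)
print(f"F = {f_ratio:.2f}, p-value = {p_value:.3g}")
```

The arguments of `partial_f_test` correspond to Reduced!D12, Complete!D12, Complete!C27, and Complete!C12 in the spreadsheet; dividing SSE by its degrees of freedom reproduces the mean square in Complete!E12.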
Solution -- continued
• Do the interactions between the Female dummy and the job dummies add anything more?
• We again use the partial F test, but now the previous complete equation becomes the new reduced equation, and the equation that includes the new interaction terms becomes the new complete equation.
• The output for this new complete equation is shown on the next slide.
• We perform the partial F test exactly as before. The formula in C34 is =((Complete!D12-MoreComplete!D12)/MoreComplete!C32)/MoreComplete!E12.
Solution -- continued
• Again the p-value is extremely small, so there is no doubt that the interaction terms add significantly to what we already had.
• Finally, we add the education dummies.
• The resulting output is shown on the next slide.
• Note that the terms reduced and complete are relative: this output now corresponds to the complete equation, and the previous output corresponds to the reduced equation.
Solution -- continued
• The formula in cell C38 for the F-ratio is now =((MoreComplete!D12-StillMoreComplete!D12)/StillMoreComplete!C36)/StillMoreComplete!E12. R² increased only from 84.0% to 84.7%, and the p-value is not extremely small.
• According to the partial F test, this increase is not quite enough to qualify for statistical significance at the 5% level.
• Based on this evidence, there is not much to gain from including the education dummies in the equation, so we would probably elect to exclude them.
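A rough check of this conclusion is possible from the reported R² values alone. Because the reduced and complete equations share the same total sum of squares, the F-ratio can be rewritten in terms of R². The arithmetic below is only a sketch: it assumes the complete equation contains all 17 variables introduced so far (Female, YrsExper, Fem_YrsExper, five job dummies, five interactions, four education dummies), so that the error degrees of freedom are 208 - 17 - 1 = 190, and it uses R² values rounded to three decimals, so the result is approximate. The subscripts R and C (reduced, complete) and the symbols k and p are notation introduced here.

```latex
% k = number of extra variables, p = number of explanatory variables
% in the complete equation, n = number of observations.
F \;=\; \frac{(SSE_R - SSE_C)/k}{SSE_C/(n-p-1)}
  \;=\; \frac{(R_C^2 - R_R^2)/k}{(1 - R_C^2)/(n-p-1)}

% Education dummies: k = 4, n - p - 1 = 190 (assumed counts, rounded R^2).
F \;\approx\; \frac{(0.847 - 0.840)/4}{(1 - 0.847)/190}
  \;=\; \frac{0.00175}{0.000805}
  \;\approx\; 2.17
```

Tabulated 5% critical values for an F distribution with 4 numerator and about 190 denominator degrees of freedom are roughly 2.4, so a ratio near 2.2 is consistent with the slide's statement that the education dummies fall just short of significance. Since a difference of 0.007 in rounded R² values is imprecise, the spreadsheet's SSE-based formula remains the authoritative calculation.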
Concluding Comments
• First, the partial F test is the formal test of significance for an extra set of variables. Many users look only at the R² and/or sₑ values to check whether extra variables are doing a “good job”.
• Second, if the partial F test shows that a block of variables is significant, it does not imply that each variable in this block is significant. Some of these variables can have low t-values.
Concluding Comments -- continued
• Third, producing all of these outputs and doing the partial F tests is a lot of work. Therefore, we included a “Block” option in StatPro to make life easier. To run the analysis in this example, use the StatPro/Regression Analysis/Block menu item. After selecting Salary as the response variable, we see this dialog box.
Concluding Comments -- continued
• We want four blocks of explanatory variables, and we want a given block to enter only if it passes the partial F test at the 5% level. In later dialog boxes we specify the explanatory variables. Once we have specified all this, the regression calculations are done in stages. The output appears on the next two slides. Note that the output for Block 4 has been left off because it did not pass the F test at the 5% level.
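The staged logic described here can be sketched as a loop over blocks. This is not StatPro's code, and how StatPro treats the first block or a failing block is not stated in the slides; the sketch assumes every block, including the first, is tested against whatever is already in the equation, and that a failing block is skipped while later blocks are still tried. The callables `sse_of` and `p_value_of` are hypothetical placeholders for a regression routine and a right-tail F probability, and the demonstration numbers are made up.

```python
def enter_blocks(blocks, sse_of, n, p_value_of, alpha=0.05):
    """Add blocks of variables in order, keeping only those that pass.

    blocks     -- list of lists of variable names, in the order to try
    sse_of     -- hypothetical callable: list of variables -> SSE of the
                  regression using exactly those variables
    n          -- number of observations
    p_value_of -- hypothetical callable: (F, df1, df2) -> right-tail p-value
    """
    kept = []      # variables currently in the equation
    report = []    # one (block, F-ratio, p-value, entered) tuple per block
    for block in blocks:
        candidate = kept + block
        df_complete = n - len(candidate) - 1
        sse_reduced = sse_of(kept)          # current equation
        sse_complete = sse_of(candidate)    # current equation plus the block
        f_ratio = ((sse_reduced - sse_complete) / len(block)) / (
            sse_complete / df_complete)
        p_value = p_value_of(f_ratio, len(block), df_complete)
        entered = p_value < alpha
        report.append((block, f_ratio, p_value, entered))
        if entered:
            kept = candidate
    return kept, report


# Demonstration with made-up SSE values and a crude stand-in p-value rule.
sse_table = {(): 100.0, ("a",): 50.0, ("a", "b"): 49.9, ("a", "c"): 20.0}
kept, report = enter_blocks(
    blocks=[["a"], ["b"], ["c"]],
    sse_of=lambda variables: sse_table[tuple(variables)],
    n=30,
    p_value_of=lambda f, df1, df2: 0.001 if f > 4 else 0.5,
)
```

In the demonstration, block `b` barely lowers SSE and is left out, just as Block 4 is left out of the bank's output, while the blocks before and after it enter.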
Concluding Comments -- continued
• Finally, we have concentrated on the partial F test and statistical significance in this example. We don’t want you to lose sight, however, of the bigger picture. Once we have decided on a “final” regression equation, we need to analyze its implications for the problem at hand.
• In this case the bank is interested in possible salary discrimination against females, so we should interpret this final equation in these terms. Our point is simply that you shouldn’t get so caught up in the details of statistical significance that you lose sight of the original purpose of the analysis!