Class 5 Multiple Regression Models
We can readily imagine that there may be several factors that we can include in our model to explain test scores.
Using Excel: The procedure is the same as before (Tools > Data Analysis > Regression). Note that the independent variables have to be in contiguous columns. The F-test now tests whether the independent variables, taken together, explain any of the variation in y. The problem becomes tricky because the degree to which a variable appears to be important in explaining the variation in y depends on the other variables present!
Hypothesis Testing: The F-test tests whether all of the coefficients of the independent variables are zero. For our model: The t-test tests whether the coefficient of an individual independent variable is zero.
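In general notation, the two tests can be written roughly as follows. This is a sketch under assumptions: the model is taken to have k independent variables fitted to n observations, and the symbols k, n, SSR (regression sum of squares), SSE (error sum of squares), b_j (estimated coefficient), and s_{b_j} (its standard error) are introduced here for illustration rather than taken from the slides.

```latex
% Model (general form, assumed for illustration)
E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k

% F-test: do the variables, taken together, explain any variation in y?
H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0
\quad \text{vs.} \quad
H_a : \text{at least one } \beta_j \neq 0,
\qquad
F = \frac{SSR / k}{SSE / (n - k - 1)}

% t-test: does x_j matter, given the other variables in the model?
H_0 : \beta_j = 0
\quad \text{vs.} \quad
H_a : \beta_j \neq 0,
\qquad
t = \frac{b_j}{s_{b_j}}, \quad \text{with } n - k - 1 \text{ degrees of freedom}
```

Excel's regression output reports the F statistic with its p-value ("Significance F") and a t statistic and p-value for each coefficient.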
Some Final Comments: The first step in building a regression model is to develop a list of candidate variables. Notice that measurement might be a problem. Note that the t-test now takes on an important role, but all you need are the p-values! Examination of residuals may provide clues about other factors that you have left out.
Adding Qualitative Factors: Qualitative factors can be added to the model through the use of dummy variables. Consider the following data: Is there evidence in the data of discrimination based on gender?
Coding the Data: We can add the gender factor by coding a variable in the following way: if Female then x = 1; if Male then x = 0. What does our model say about salary? $E(y) = \beta_0 + \beta_1 x$, where E(y) is the expected salary.
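To see what the coded model says, it may help to fit it to a small data set. The sketch below uses made-up salaries (the original data table is not reproduced here), and the helper name `fit_simple_regression` is hypothetical. It suggests that with this 0/1 coding the intercept is the mean male salary and the slope is the difference between the female and male means.

```python
# Hypothetical salaries (in $1000s), made up for illustration only.
salaries = [52.0, 48.0, 55.0, 61.0, 58.0, 45.0, 50.0, 63.0]
genders = ["F", "F", "F", "M", "M", "F", "M", "M"]

# Dummy coding from the slide: Female -> x = 1, Male -> x = 0.
x = [1 if g == "F" else 0 for g in genders]
y = salaries


def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for E(y) = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(x, y)

# Group means computed directly, for comparison with the coefficients.
male_mean = sum(s for s, g in zip(salaries, genders) if g == "M") / genders.count("M")
female_mean = sum(s for s, g in zip(salaries, genders) if g == "F") / genders.count("F")

print(f"b0 = {b0:.2f}  (male mean = {male_mean:.2f})")
print(f"b1 = {b1:.2f}  (female mean - male mean = {female_mean - male_mean:.2f})")
```

In other words, for males (x = 0) the model gives E(y) = β0, and for females (x = 1) it gives E(y) = β0 + β1, so β1 measures the salary gap between the two groups.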
Doing the Analysis: After doing the regression analysis, what hypothesis should we test? Is there another way of doing this test, using prior material?
Coding Variables with More than Two Levels: Consider the following data set. How would you code the qualitative factor "additive" for the model? Different additives were added to the gasoline, resulting in the following miles per gallon (MPG). Is there a difference among the additives? What model should we build to check this? Be careful about what the model implies!
Coding Qualitative Variables (Summary): The coding of dummy variables depends upon the number of levels that the qualitative factor has. For k levels, use k-1 dummy variables. In the case where k = 5, this adds four variables to the model (four columns in your spreadsheet).
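The k-1 rule can be sketched in a few lines of code. The function name `dummy_code`, the level labels, and the choice of baseline level below are all hypothetical, chosen only to illustrate the k = 5 case. The baseline level is the one coded as all zeros.

```python
def dummy_code(values, baseline=None):
    """Code a qualitative factor with k levels as k-1 columns of 0/1.

    Returns (column_names, rows). The baseline level gets all zeros.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of the levels")
    # One column per non-baseline level.
    others = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return others, rows


# Hypothetical factor with k = 5 levels (labels made up for illustration).
factor = ["A", "B", "C", "D", "E", "C", "A"]
names, rows = dummy_code(factor, baseline="A")

print(names)
for level, row in zip(factor, rows):
    print(level, row)
```

Each of the four columns would occupy one column in the spreadsheet. A fifth column is not needed, since a row of all zeros already identifies the baseline level.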
More on Dummy Variables: Of course, these dummy variables just define different populations whose means we are comparing. If there are only two populations (one dummy variable), you can use the pooled t-test. In a regression model, we have the luxury of including other factors, and so of controlling for them!
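The link between the pooled t-test and the regression t-test can be checked numerically. The sketch below uses made-up salary figures and hypothetical helper names (`pooled_t`, `slope_t`). It computes the pooled two-sample t statistic directly, then the t statistic for the slope of a regression on a single 0/1 dummy variable, and compares the two. It computes only the statistics, not the p-values.

```python
import math

# Hypothetical salaries (in $1000s), made up for illustration only.
female = [52.0, 48.0, 55.0, 45.0]
male = [61.0, 58.0, 50.0, 63.0]


def mean(v):
    return sum(v) / len(v)


def sum_sq_dev(v):
    """Sum of squared deviations from the group mean."""
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v)


def pooled_t(group1, group0):
    """Pooled two-sample t statistic for mean(group1) - mean(group0)."""
    n1, n0 = len(group1), len(group0)
    sp2 = (sum_sq_dev(group1) + sum_sq_dev(group0)) / (n1 + n0 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


def slope_t(x, y):
    """t statistic for the slope in the simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    return b1 / math.sqrt(mse / sxx)


# Dummy coding: Female -> 1, Male -> 0.
x = [1] * len(female) + [0] * len(male)
y = female + male

t_pooled = pooled_t(female, male)
t_slope = slope_t(x, y)
print(f"pooled t = {t_pooled:.4f}, regression slope t = {t_slope:.4f}")
```

The two statistics agree, and both have n - 2 degrees of freedom, so the tests give the same p-value. The advantage of the regression form is that further variables can be added to it.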
More on Dummy Variables: If you have only a set of dummy variables (as in the fuel additive problem), you can use ANOVA.
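As a rough illustration of the ANOVA route, the sketch below computes the one-way ANOVA F statistic by hand. The MPG readings, the number of additives (three), and the function name `one_way_anova_f` are all assumptions made for this example; the original data set is not reproduced here.

```python
# Hypothetical MPG readings for three additives, made up for illustration.
mpg = {
    "additive_1": [24.0, 26.0, 25.0, 25.0],
    "additive_2": [27.0, 29.0, 28.0, 28.0],
    "additive_3": [24.0, 23.0, 25.0, 24.0],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> observations."""
    all_obs = [v for obs in groups.values() for v in obs]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    group_means = {name: sum(obs) / len(obs) for name, obs in groups.items()}

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(obs) * (group_means[name] - grand_mean) ** 2
        for name, obs in groups.items()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - group_means[name]) ** 2
        for name, obs in groups.items()
        for v in obs
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(mpg)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

This F statistic is the same one the regression F-test reports when the model contains only the k-1 dummy variables for the additive, which is why either tool can be used for this problem.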