
DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building.

Presentation transcript:

1 DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building

2 Multiple Regression
y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ
So we will be using p different variables to predict y. Conceptually we will be doing the same thing as in simple linear regression, except with more variables. We will be approaching this topic mostly by example. Several of the concepts are also applicable to simple linear regression.
DSCI 346 Lect 6 (15 pages)
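As a rough illustration (not from the slides), fitting a multiple regression is ordinary least squares with several predictor columns. All values below are made up; the variable names only echo the upcoming example.

```python
import numpy as np

# Made-up data: predict y from p = 2 predictors (values are hypothetical).
x1 = np.array([25.0, 40.0, 55.0, 30.0, 65.0, 50.0])   # e.g. age
x2 = np.array([1.2, 2.5, 3.1, 1.8, 4.0, 2.2])          # e.g. severity score
y  = np.array([3.0, 5.1, 6.4, 3.9, 7.8, 5.5])

# Design matrix: a column of ones gives the intercept beta_0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimates of (beta_0, beta_1, beta_2).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted
```

A handy sanity check: with an intercept column included, the OLS residuals sum to (numerically) zero.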

3 Example
We are trying to see which factors influence the total amount of dollars spent on medical care by a person with diabetes.
Our outcome (dependent variable) is the amount spent (net pay).
Our predictors (independent variables) will be: age, gender, Severity of Illness Score, whether or not the patient had a test for blood sugar level, whether or not the patient had a test for cholesterol level, whether or not the patient was on a blood pressure medication, and the percent of the severity of illness score that was attributable to each of the following disorders: metabolic, musculoskeletal, psychiatric, respiratory, diabetes, neoplasms, cardiovascular.

4 Normality and Transformations
Do a histogram of the net pay variable.
Not normal; transform the variable by taking ln (base e logarithm) of net pay.

5 Normality and Transformations
Do a histogram of the severity of illness variable.
Not normal; transform the variable by taking ln (base e logarithm).
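A minimal sketch of the transformation step (the dollar values are made up): the natural log pulls in the long right tail of a skewed variable, and it is reversible.

```python
import numpy as np

# Hypothetical right-skewed net pay values in dollars.
net_pay = np.array([120.0, 250.0, 400.0, 900.0, 2500.0, 15000.0])

# Natural-log (base e) transform; requires strictly positive values.
ln_net_pay = np.log(net_pay)

# exp undoes the transform, so predictions can be mapped back to dollars.
back = np.exp(ln_net_pay)
```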

6 Checking the model
Do a scattergram of ln net pay vs ln severity; look for a non-linear pattern.
Do a scattergram of ln net pay vs age.

7 Other data transformations
Yes/no variables are transformed into 1/0 variables (e.g. the gender variable becomes female = 1 and male = 0), since in regression models all variables must be numeric.
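The recoding on this slide is a one-liner; the gender values below are hypothetical.

```python
# Hypothetical gender column; regression needs numeric inputs,
# so recode female = 1, male = 0 as the slide describes.
gender = ["F", "M", "F", "F", "M"]
female = [1 if g == "F" else 0 for g in gender]
```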

8 Multicollinearity
Multicollinearity exists when independent variables are correlated with each other and therefore have some redundancy with regard to the information they provide in explaining the variation in the dependent variable. One major impact of multicollinearity is that the significance tests of the independent variables are not accurate.

9 Multicollinearity
One way to measure multicollinearity is the Variance Inflation Factor (VIF).
VIF (for each independent variable) = 1/(1 - R²_j), where R²_j is the coefficient of determination when the jth independent variable is regressed against the remaining k-1 independent variables.
VIFs > 5.0 indicate issues with multicollinearity.
Example (birthweight data):

Variable | R²_j    | VIF_j | Multicollinearity?
Age      | 0.06929 | 1.07  | No
LWT      | 0.04328 | 1.05  | No
FTV      | 0.05709 | 1.06  | No
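The VIF formula above is easy to verify against the table; the R²_j inputs are the ones from the birthweight example on this slide.

```python
def vif(r2_j):
    """Variance Inflation Factor from the R^2 of regressing
    predictor j on the remaining predictors."""
    return 1.0 / (1.0 - r2_j)

# R^2_j values from the birthweight example on this slide.
r2 = {"Age": 0.06929, "LWT": 0.04328, "FTV": 0.05709}
vifs = {name: round(vif(v), 2) for name, v in r2.items()}
# vifs -> {'Age': 1.07, 'LWT': 1.05, 'FTV': 1.06}; all below 5.0, so no issue.
```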

10 Interactions
Sometimes the effect of one of the variables is impacted by the value of another variable. To model this, another variable is created by multiplying the variables together. For example, if you believed that the impact of the severity of illness was different for females than for males, you could create an interaction variable by multiplying the gender variable by the severity variable. In this example, I believed that the impact of the blood sugar test, the cholesterol test, and the use of blood pressure medication was impacted by the severity, so I created three interaction terms.
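Constructing one of these interaction terms is an elementwise product; the column values below are made up.

```python
import numpy as np

# Hypothetical columns: 1/0 indicator for having had a blood sugar test,
# and ln(severity) for the same patients.
blood_sugar_test = np.array([1, 0, 1, 1, 0])
ln_severity = np.array([0.5, 1.2, 0.9, 1.6, 0.7])

# The interaction term is the elementwise product of the two columns;
# it enters the regression as one more predictor.
bs_x_severity = blood_sugar_test * ln_severity
```

Where the indicator is 0, the interaction column is 0, so the severity slope for that group comes from the main-effect term alone.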

11 Fit model and check for outliers by looking at standardized residuals (residuals that have been transformed so they have approximately a standard normal distribution).
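A simplified sketch of the outlier check, on made-up residuals. Statistical software usually also adjusts each residual for its leverage; dividing by the residual standard deviation captures the basic idea.

```python
import numpy as np

# Hypothetical residuals from a fitted model (index 3 is an obvious outlier).
resid = np.array([0.2, -0.3, 0.1, 6.0, -0.2, 0.1, -0.4, 0.3, -0.2, 0.2])

# Basic standardization: divide by the residual standard deviation.
std_resid = resid / resid.std(ddof=1)

# Flag standardized residuals beyond +/-3 as severe outliers.
outliers = np.where(np.abs(std_resid) > 3)[0]
```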

12 Remove outliers by deleting those data points and refit the model. Check again for severe outliers; delete data points and repeat until no more severe outliers remain.

13 Model fitting and variable selection
Backwards selection: you want a significant fit, all the variables to be significant, and a reasonable adjusted R².
Significant fit since p-value < .05, but not all variables are significant.

14 Remove the least significant predictor, refit, and repeat until all variables are significant.
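The loop on this slide can be sketched as follows. This is a simplified version that uses |t| ≥ 2 as the "significant" rule instead of p-values, on synthetic data; real software tests p-values at a chosen level.

```python
import numpy as np

def t_stats(X, y):
    """OLS coefficients divided by their standard errors."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return b / se

def backward_select(X, y, names, t_min=2.0):
    """Repeatedly drop the least significant predictor (smallest |t|)
    until every remaining predictor has |t| >= t_min.  Column 0 is the
    intercept and is never dropped."""
    names = list(names)
    while X.shape[1] > 1:
        t = np.abs(t_stats(X, y))[1:]        # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            break
        X = np.delete(X, worst + 1, axis=1)
        del names[worst + 1]
    return X, names

# Synthetic data: y depends on x1 but not on the pure-noise column x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])
X_final, kept = backward_select(X, y, ["const", "x1", "x2"])
```

With these data the genuinely predictive column x1 survives the elimination.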

15 Check model fit using residuals. Here the model is underpredicting high values and overpredicting low values.

16 Other model building strategies
Forward selection: start with the independent variable with the highest R², add in the second significant independent variable that creates the highest model R², etc.
Best possible subset: do all possible combinations, then choose the best. The issue is that "best" is not universally defined; common criteria include adjusted R², Mallows' Cp, all independent variables significant, makes sense, etc.
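A best-possible-subset search can be sketched directly, using adjusted R² as the criterion (one common choice among those listed). The toy data are made up, with y driven almost entirely by x1.

```python
import numpy as np
from itertools import combinations

def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit; X must include the intercept column."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ b) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))

def best_subset(X, y, names):
    """Fit every non-empty subset of predictors and return the subset
    with the highest adjusted R^2."""
    n = len(y)
    best_score, best_names = -np.inf, None
    for r in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), r):
            Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
            score = adj_r2(Xs, y)
            if score > best_score:
                best_score, best_names = score, [names[c] for c in cols]
    return best_score, best_names

# Toy data: y is (nearly) a linear function of x1 alone.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2.0 * x1 + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1])
X = np.column_stack([x1, x2])
score, chosen = best_subset(X, y, ["x1", "x2"])
```

Note the cost: with k candidate predictors there are 2ᵏ - 1 subsets, which is why stepwise strategies exist.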
