1 More on regression Petter Mostad 2005.10.24

2 More on indicator variables If an independent variable is an indicator variable, cases where it is 1 will just have an addition to the constant term. To use different slopes for these cases, additional variables must be added (products of predictors and indicators). By viewing the constant term as a data column, we can express the models more symmetrically.
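A minimal sketch (numpy only; the coefficient values are illustrative) of how an indicator and a product term enter the design matrix, with the constant term viewed as a column of ones:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x = rng.normal(size=n)
    d = rng.integers(0, 2, size=n)              # indicator variable (0 or 1)
    y = 1.0 + 2.0*x + 0.5*d + 1.5*x*d + rng.normal(scale=0.1, size=n)

    # constant term as a data column of ones; x*d allows a different slope when d = 1
    X = np.column_stack([np.ones(n), x, d, x*d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)                                 # near (1.0, 2.0, 0.5, 1.5)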

3 Several indicator variables A model with two indicator variables assumes that the effect of one indicator adds to the effect of the other. If this is unsuitable, use an additional interaction variable (the product of the indicators). For categorical variables with m possible values, use m-1 indicators.
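A short sketch of m-1 dummy coding, assuming pandas is available (the column names are illustrative):

    import pandas as pd

    color = pd.Series(["red", "green", "blue", "green", "red"])
    # m = 3 levels -> keep m-1 = 2 indicator columns
    dummies = pd.get_dummies(color, prefix="color", drop_first=True)
    print(dummies)
    # An interaction of two indicators is simply their product, e.g.
    # df["a_times_b"] = df["a"] * df["b"]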

4 Logistic regression What if the dependent variable is an indicator variable? The model then has two stages: first, we predict a value z_i from the predictors as before; then the probability of indicator value 1 is given by p_i = exp(z_i) / (1 + exp(z_i)). Given data, we can estimate the coefficients in a similar way as before.
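A minimal sketch of the two-stage idea on simulated data, assuming statsmodels is available; the coefficient values are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    z = -0.5 + 1.5*x                        # linear predictor z_i
    p = np.exp(z) / (1 + np.exp(z))         # logistic link
    y = rng.binomial(1, p)                  # 0/1 dependent variable

    X = sm.add_constant(x)                  # constant-term column
    fit = sm.Logit(y, X).fit(disp=0)        # maximum-likelihood estimation
    print(fit.params)                       # near (-0.5, 1.5)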

5 Experimental design So far, we have considered data as given. To the extent that we can control what data we have, how should we choose to set the independent variables?
–Choice of variables
–Choice of values for these variables

6 Choice of variables Include variables which you believe have a clear influence on the dependent variable, even if the variable is "uninteresting": this helps find the true relationship between the "interesting" variables and the dependent variable. Avoid including a pair (or a set) of variables whose values are clearly linearly related.

7 Multicollinearity To discover it, make plots and compute correlations (or regress one predictor on the others). To deal with it:
–Remove unnecessary variables
–Define and compute an "index" combining the collinear variables
–If the variables are kept, the model can still be used for prediction
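A sketch (numpy only, simulated data) of two diagnostics: the pairwise correlations, and the R^2 from regressing one predictor on the others (R^2 near 1 signals multicollinearity; 1/(1-R^2) is the variance inflation factor):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.95*x1 + 0.05*rng.normal(size=n)   # nearly collinear with x1
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])

    print(np.corrcoef(X, rowvar=False))      # inspect pairwise correlations

    # regress x1 on the other predictors (with a constant)
    others = np.column_stack([np.ones(n), X[:, 1:]])
    beta, *_ = np.linalg.lstsq(others, X[:, 0], rcond=None)
    resid = X[:, 0] - others @ beta
    r2 = 1 - resid.var() / X[:, 0].var()
    print(r2, 1/(1 - r2))                    # R^2 and VIF for x1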

8 Specification bias Unless two independent variables are uncorrelated, the estimate of one will influence the estimate of the other. Omitting a variable therefore biases the estimates of the variables that remain. Thus, one should be humble when interpreting regression results: there are probably always variables one could have added.
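A small simulation of this (numpy only, illustrative values): x1 and x2 are correlated and both affect y, so omitting x2 shifts the estimated coefficient of x1:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    x1 = rng.normal(size=n)
    x2 = 0.8*x1 + 0.6*rng.normal(size=n)     # correlated with x1
    y = 1.0 + 2.0*x1 + 3.0*x2 + rng.normal(size=n)

    full = np.column_stack([np.ones(n), x1, x2])
    short = np.column_stack([np.ones(n), x1])
    b_full, *_ = np.linalg.lstsq(full, y, rcond=None)
    b_short, *_ = np.linalg.lstsq(short, y, rcond=None)
    print(b_full[1])    # close to the true value 2.0
    print(b_short[1])   # biased: roughly 2.0 + 3.0*0.8 = 4.4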

9 Choice of values Should have a good spread: again, avoid collinearity. Should cover the range for which the model will be used. For categorical variables, one may choose to combine levels in a systematic way.

10 Generating experimental designs For n binary variables, there are 2^n ways to set them in different combinations. If 2^n is too big, there are systematic ways to choose a subset of these 2^n experiments. If 2^n is too small, we can run several experiments at each setting.
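A minimal sketch of generating the full 2^n design for n binary variables:

    from itertools import product

    n = 3
    design = list(product([0, 1], repeat=n))    # all 2**n settings
    print(len(design))                          # 8
    for setting in design:
        print(setting)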

11 Heteroscedasticity – what is it? In the standard regression model it is assumed that all error terms have the same variance. If the variance varies with the independent variables or the dependent variable, the model is heteroscedastic. Sometimes, it is clear that data exhibit such properties.

12 Heteroscedasticity – why does it matter? Our standard methods for estimation, confidence intervals, and hypothesis testing assume equal variances. If we go on and use these methods anyway, our answers might be quite wrong!

13 Heteroscedasticity – how to detect it? Fit a regression model, and study the residuals:
–make a plot of them against the independent variables
–make a plot of them against the predicted values for the dependent variable
Possibility: test for heteroscedasticity by regressing the squared residuals on the predicted values.
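A sketch of that residual-based check (numpy only, simulated data): a clearly nonzero slope when regressing squared residuals on the fitted values suggests heteroscedasticity:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300
    x = rng.uniform(1, 10, size=n)
    y = 2.0 + 0.5*x + rng.normal(scale=0.3*x)   # error sd grows with x

    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted

    # regress squared residuals on the predicted values
    Z = np.column_stack([np.ones(n), fitted])
    gamma, *_ = np.linalg.lstsq(Z, resid**2, rcond=None)
    print(gamma[1])   # clearly positive here, since the variance grows with x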

14 Heteroscedasticity – what to do about it? Use a transformation of the dependent variable:
–log-linear models
If the standard deviation of the errors appears to be proportional to the predicted values, a two-stage regression analysis is a possibility.
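A sketch of such a two-stage (weighted) fit, numpy only with illustrative values: divide the model through by the first-stage fitted values, then rerun least squares:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 300
    x = rng.uniform(1, 10, size=n)
    y = 2.0 + 0.5*x + rng.normal(scale=0.2*(2.0 + 0.5*x))

    X = np.column_stack([np.ones(n), x])
    b1, *_ = np.linalg.lstsq(X, y, rcond=None)    # first stage: plain OLS
    w = X @ b1                                    # fitted values, ~ error sd

    # second stage: rows scaled by 1/w, i.e. weighted least squares
    b2, *_ = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)
    print(b1, b2)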

15 Dependence over time Sometimes, y_1, y_2, ..., y_n are not completely independent observations (given the independent variables).
–Lagged values: y_i may depend on y_{i-1} in addition to its independent variables
–Autocorrelated errors: successive observations y_i, y_{i+1}, ... depend similarly on unobserved variables

16 Lagged values In this case, we may run a multiple regression just as before, but including the previous dependent variable y_{i-1} as a predictor variable for y_i.
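A minimal sketch (numpy only, simulated data): build the lagged column and drop the first observation to align the lag:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 400
    x = rng.normal(size=n)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = 0.6*y[i-1] + 1.0*x[i] + rng.normal(scale=0.5)

    X = np.column_stack([np.ones(n-1), y[:-1], x[1:]])   # constant, lag, x
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    print(beta)   # roughly (0.0, 0.6, 1.0)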

17 Autocorrelated errors In the standard regression model, the errors are independent. Using standard regression formulas anyway can lead to errors: typically, the uncertainty in the result is underestimated.
–Example: taking observations closer and closer together in time will not increase your knowledge about the regression parameters beyond a certain point

18 Autocorrelation – how to detect? Plot residuals against time! The Durbin-Watson test compares the possibility of independent errors with a first-order autoregressive model. Test statistic: d = Σ_{t=2}^n (e_t − e_{t−1})^2 / Σ_{t=1}^n e_t^2, where the e_t are the residuals. It is available as an option in SPSS.
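A sketch of computing the statistic directly from residuals (statsmodels also provides statsmodels.stats.stattools.durbin_watson); the example residuals are made up:

    import numpy as np

    def durbin_watson(e):
        # d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2
        return np.sum(np.diff(e)**2) / np.sum(e**2)

    e = np.array([0.5, 0.6, 0.4, -0.2, -0.3, -0.1, 0.2])
    print(durbin_watson(e))   # near 2 suggests independent errors;
                              # near 0, positive autocorrelation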

19 Autocorrelation – what to do? It is possible to use a two-stage regression procedure:
–If a first-order autoregressive model ε_t = ρ ε_{t−1} + u_t is appropriate, the transformed model y_t − ρ y_{t−1} = β_0(1 − ρ) + β_1(x_t − ρ x_{t−1}) + u_t will have uncorrelated errors
Estimate ρ from the Durbin-Watson statistic (ρ ≈ 1 − d/2), and estimate the regression coefficients from the transformed model above.
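A sketch of the two-stage procedure (in the style of Cochrane-Orcutt), numpy only with illustrative values:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 400
    x = rng.normal(size=n)
    eps = np.zeros(n)
    for t in range(1, n):                       # AR(1) errors with rho = 0.7
        eps[t] = 0.7*eps[t-1] + rng.normal(scale=0.5)
    y = 1.0 + 2.0*x + eps

    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)   # first stage: plain OLS
    e = y - X @ b
    d = np.sum(np.diff(e)**2) / np.sum(e**2)    # Durbin-Watson statistic
    rho = 1 - d/2                               # rough estimate of rho

    # second stage: quasi-differenced regression with uncorrelated errors
    ys = y[1:] - rho*y[:-1]
    Xs = np.column_stack([np.ones(n-1)*(1 - rho), x[1:] - rho*x[:-1]])
    bs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    print(rho, bs)   # rho near 0.7; bs near (1.0, 2.0)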

