SADC Course in Statistics
Multiple Linear Regression: Further issues and anova results (Session 07)

Slide 2: Learning Objectives

At the end of this session, you will be able to
- appreciate requirements and limitations of variables used in a multiple regression
- recognise the dependence of anova results on the order of fitting variables
- interpret anova results when terms are fitted sequentially
- understand the difference between interpretation of t-probabilities and anova F-probabilities when there are 2 or more xs.

Slide 3: The crimes example again!

Recall that in the example relating the number of acts regarded as crimes to age, college years and parents' income, the college variable was non-significant.
Although a quantitative variable, college had only 3 possible values!
This is NOT a problem, since college is an x variable and there were many observations at each of these values.
It would be a problem if the y-variable had only a few distinct values: the normality assumption would then be violated.

Slide 4: Points to note about the variables

In the regression analyses considered so far,
1. The y-variable is a quantitative measurement, assumed to have an approximate normal distribution.
2. The x-variables are quantitative variates, each contributing 1 d.f. to the model.
However, some xs could be categorical factors, each contributing d.f. = number of levels - 1 to the model. The latter case will be discussed later!
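
The d.f. rule for a categorical factor can be seen by counting the indicator (dummy) columns needed to represent it in the model. The following is a minimal sketch in Python; the factor `region`, its values and the helper `dummy_columns` are hypothetical and are not taken from the course data.

```python
def dummy_columns(values):
    """Return one 0/1 indicator column per level, except the first (baseline) level."""
    levels = sorted(set(values))
    # The baseline level is absorbed by the constant term, so it gets no column.
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels[1:]}

# Hypothetical factor with 3 levels (invented for illustration)
region = ["north", "south", "east", "south", "north", "east"]
columns = dummy_columns(region)

n_levels = len(set(region))
print("number of levels:", n_levels)
print("indicator columns (model d.f. contributed):", len(columns))
```

Each indicator column takes 1 d.f., so a factor with 3 levels contributes 2 d.f. to the model.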

Slide 5: But care is sometimes needed...

If an x-variable has only a few values, pay attention to the number of observations at each value.
In practical 6, variable empl was highly significant (p=0.006).
The residual plot looked OK, apart from one outlier (where just 1 household (HH) had 3 employed members).
But... will empl remain significant if the outlier is removed?

Slide 6: Results after deleting outlier

lnexpdf | Coef.   Std. Err.   t   P>|t|
hhsize  |
empl    |
const.  |

Note that empl is now non-significant!
It is dangerous to use a model where conclusions depend on just 1 observation!
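
The effect of a single observation on a fitted coefficient can be checked by refitting without it. The sketch below, in Python, uses invented data (not the practical 6 data set) in which only the last household has 3 employed members. It fits a simple regression with one x only, whereas the model in the practical also included hhsize, and it compares slopes rather than p-values.

```python
def slope(x, y):
    """Least-squares slope from a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Invented data: only the last household has 3 employed members
empl = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
lnexp = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.0, 5.2, 4.9, 7.0]

slope_all = slope(empl, lnexp)
# Refit after dropping the single extreme household
slope_without = slope(empl[:-1], lnexp[:-1])

print(f"slope with all households: {slope_all:.3f}")
print(f"slope without the outlier: {slope_without:.3f}")
```

In this invented example nearly all of the apparent relationship comes from the one extreme household, which is the situation the slide warns about.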

Slide 7: ANOVA for 2 variables (sequential)

We return to the crimes example to show the effect of the order of fitting terms.

Source   | df   Seq.SS   MS   F   Prob>F
age      |
college  |
Residual |
Total    |

Here, age is fitted first, then college, so the F-probs need to be interpreted accordingly: the F-prob for age ignores college, while the F-prob for college allows for age.

Slide 8: ANOVA for 2 variables (sequential)

Consider now the anova with the order of fitting terms changed...

Source   | df   Seq.SS   MS   F   Prob>F
college  |
age      |
Residual |
Total    |

Here, college is fitted first, then age. Note the change in F-probs from the previous slide. Why is this?
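
The dependence of sequential sums of squares on the order of fitting can be reproduced with a small calculation. The sketch below, in Python, uses invented data with two correlated explanatory variables `x1` and `x2` (not the crimes data); the helpers `rss` and `sequential_ss` are illustrative names. Each sequential S.S. is the drop in residual S.S. when a term is added to the terms already fitted.

```python
def rss(columns, y):
    """Residual S.S. from least-squares regression of y on the columns plus a constant."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, held as an augmented matrix
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(bj * xj for bj, xj in zip(b, X[i]))) ** 2 for i in range(n))

def sequential_ss(first, second, y):
    """S.S. for `first`, S.S. for `second` given `first`, and the residual S.S."""
    ss_first = rss([], y) - rss([first], y)
    ss_second = rss([first], y) - rss([first, second], y)
    return ss_first, ss_second, rss([first, second], y)

# Invented data: x1 and x2 are strongly correlated
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 16, 14]

x1_first = sequential_ss(x1, x2, y)
x2_first = sequential_ss(x2, x1, y)
print("x1 then x2: SS(x1) = %.2f, SS(x2 | x1) = %.2f, residual = %.2f" % x1_first)
print("x2 then x1: SS(x2) = %.2f, SS(x1 | x2) = %.2f, residual = %.2f" % x2_first)
```

In both orders the two sequential S.S. add to the same regression S.S. and the residual line is unchanged. Only the split between the two terms differs, because the explanatory variables are correlated.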

Slide 9: Discussion...

What is the same and what is different across slides 7 and 8 above? Order of fitting seems to matter! What do the results mean?
How do the F-probs from above compare with the t-probs below for the model estimates?

crimes  | Coef.   P>|t|
age     |
college |
const.  |

Slide 10: Exercise: 2nd example (Q2, Practical 6)

Open penrain.dta from Q2 of the previous practical. Note down the anova results below from a regression of rain on elevation, then altitude.

Source     d.f.   S.S.   M.S.   F   Prob.
Elevation    1
Altitude     1
Residual    13
Total       15

Interpretation of F-probs:

Slide 11: Changing order of fitting

Now fit altitude, then elevation. Note down the results below.

Source     d.f.   S.S.   M.S.   F   Prob.
Altitude     1
Elevation    1
Residual    13
Total       15

Interpretation of F-probs:

Slide 12: Model parameter estimates

Finally, note down the parameter estimates and the corresponding t-probabilities:

Parameter   Estimate of model parameter   t-Prob.
Altitude
Elevation
Constant

Overall conclusions:

Slide 13: Adjusted sums of squares

Some software packages present adjusted sums of squares, combining the results from the anova tables in slides 10 and 11 into a single anova:

Source      df   Adj. SS   Adj. MS   F   Prob.
Altitude
Elevation
Residual
Total

Note that the sums of squares now do not add to the total S.S.
What do the F-probabilities now represent?
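
An adjusted sum of squares for a term is the S.S. obtained when that term is fitted last, after all the other terms. The sketch below, in Python, computes adjusted S.S. for invented data with two correlated explanatory variables `x1` and `x2` (not the penrain data); the helper `rss` is an illustrative name.

```python
def rss(columns, y):
    """Residual S.S. from least-squares regression of y on the columns plus a constant."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, held as an augmented matrix
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(bj * xj for bj, xj in zip(b, X[i]))) ** 2 for i in range(n))

# Invented data: x1 and x2 are strongly correlated
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 16, 14]

rss_full = rss([x1, x2], y)
regression_ss = rss([], y) - rss_full      # S.S. explained by x1 and x2 together
adj_x1 = rss([x2], y) - rss_full           # x1 fitted last, i.e. adjusted for x2
adj_x2 = rss([x1], y) - rss_full           # x2 fitted last, i.e. adjusted for x1

resid_ms = rss_full / (len(y) - 3)         # residual d.f. = n - 3 (constant + 2 xs)
f_x1, f_x2 = adj_x1 / resid_ms, adj_x2 / resid_ms

print(f"Adj. SS: x1 = {adj_x1:.2f}, x2 = {adj_x2:.2f}")
print(f"Adj. SS added together = {adj_x1 + adj_x2:.2f}, regression SS = {regression_ss:.2f}")
print(f"Adjusted F: x1 = {f_x1:.2f}, x2 = {f_x2:.2f}")
```

With correlated explanatory variables the adjusted S.S. do not add up to the regression S.S. For a term with 1 d.f., the adjusted F statistic equals the square of that term's t statistic in the full model, so the adjusted F-probability matches the t-probability.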

Slide 14: Key Points

- Recognise the type of variable (y) being modelled. The methods discussed apply when y is quantitative.
- The explanatory variables (the xs) can be variables of any type, but so far we have only considered quantitative xs.
- Take care when interpreting anova F-probs to check whether the sums of squares are sequential or adjusted.
- Note that all t-probabilities (associated with the parameter estimates) are adjusted for all other terms in the model.

Slide 15: Practical work follows to ensure learning objectives are achieved…