Regression diagnostics
Hein Stigum
Presentation, data and programs at:
Jan-19, H.S.
Agenda
Linear regression diagnostics
- Assumptions
- Robust results
(Logistic regression)
(Poisson regression)
Time: linear: min; logistic (binary, conditional): 60 min
2 January 2019
Linear regression
Birth weight by gestational age
Workflow
- Scatterplots
- Bivariate analysis
- Regression
  - Model fitting: cofactors in/out, interactions
  - Test of assumptions: independent errors, linear effects, constant error variance
  - Influence (robustness)
Notes: In model fitting, adjust for a cofactor if it is a confounder. Problems arise if the cofactor has missing values, has measurement error, or is in the causal path. For the test of assumptions: in Stata the estimation remains in memory, so post-estimation commands can be run on it. Normally check the assumptions first, then influence. Note the leverage value of the outlier.
Scatterplot
A pregnancy of 280 days is normal. N = 518.
Results
Outcome: birth weight
Covariates: gestational age, sex, parity
Model: linear regression
Note: synthetic data.
Notes: Study of birth weight, which varies with sex, mother's age and especially gestational age (length of pregnancy in days). Descriptive (bivariate analysis). Expected birth weight = 3531 g for boys with parity = 0 and gestational age = 280 days. To get the average one must take /2+2/5*230+2*17=3580.
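The least-squares fit behind a linear regression can be sketched in a few lines. This is a minimal sketch with a single covariate only (the slide also adjusts for sex and parity, which are omitted here); the data values and the helper name `fit_ols` are hypothetical, not the presentation's synthetic data. Centring gestational age at 280 days is an assumed design choice that makes the intercept the expected weight for a normal-length pregnancy, similar in spirit to the "expected at gest = 280" note.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for the model y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical toy values, NOT the presentation's data (N = 518).
gest = [260, 270, 280, 290, 300]          # gestational age, days
weight = [3000, 3250, 3500, 3700, 3800]   # birth weight, g

# Centre gestational age at 280 days so the intercept is the
# expected birth weight for a normal-length pregnancy.
gest_c = [g - 280 for g in gest]
a, b = fit_ols(gest_c, weight)
print(f"expected weight at 280 days: {a:.0f} g, {b:.1f} g per extra day")
# → expected weight at 280 days: 3450 g, 20.5 g per extra day
```

In Stata the corresponding fit would come from `regress`.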
Model diagnostics
Model assumptions:
1. Independent errors (residuals)
2. Linear effects
3. Constant error variance
Robustness
Must Y be normal? Compare normal Y with skewed X, and skewed Y with normal X.
Checking assumptions
1. Independent residuals
No diagnostic tool.
Possible violations:
- Pupils nested in schools: weak correlations
- Repeated measurements: strong correlations
Models:
- Adjust for clustering
- Linear mixed models
- GEE
Birth weight example: a possible violation is repeated births by the same mother.
2. Linear effects
Save the residuals and predicted values, and plot residuals vs predicted values.
If non-linear: plot residuals vs each continuous variable, then add a square term or cut the variable into categories.
After adding gest^2, linearity is now OK.
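Saving residuals and predicted values and looking for a pattern can be illustrated with a minimal sketch. The data are hypothetical (y is exactly x squared) and `fit_ols` is a hypothetical helper; in Stata the two variables would typically be saved with `predict` (options `xb` and `residuals`) and the plot drawn with `rvfplot`.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data with a curved (quadratic) relationship.
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]

a, b = fit_ols(x, y)
fitted = [a + b * xi for xi in x]               # predicted values
resid = [yi - fi for yi, fi in zip(y, fitted)]  # residuals

# Residuals vs predicted: positive at both ends, negative in the middle.
# This U-shape is the sign of a missing square term.
for f, r in zip(fitted, resid):
    print(f"pred = {f:5.1f}   resid = {r:5.1f}")
```

With a straight-line fit to curved data the residuals follow the pattern +, -, -, -, +; adding the square term as a covariate would remove it.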
Linear effect test
Model 1: only linear terms
Model 2: linear terms + square term
Significant means non-linearity.
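Comparing Model 1 with Model 2 can be done with a nested-model F test; when only one term is added, this F equals the square of the t statistic for the square term, so it agrees with the p-value of gest^2 in Model 2. The sketch below computes only the F statistic, under assumed toy data and hypothetical helper names (`rss_ols`, `nested_f`); the p-value would come from the F(1, n - 3) distribution, for example via `scipy.stats.f.sf`.

```python
def rss_ols(columns, y):
    """Fit y on an intercept plus the given predictor columns by least squares.
    Returns (residual sum of squares, number of parameters)."""
    n = len(y)
    X = [[1.0] + [float(col[r]) for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    v = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        v[c], v[piv] = v[piv], v[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                for k in range(c, p):
                    A[r][k] -= f * A[c][k]
                v[r] -= f * v[c]
    beta = [v[i] / A[i][i] for i in range(p)]
    rss = sum((y[r] - sum(X[r][i] * beta[i] for i in range(p))) ** 2 for r in range(n))
    return rss, p


def nested_f(rss1, p1, rss2, p2, n):
    """F statistic comparing smaller model 1 with larger model 2."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


# Hypothetical data: y is roughly x squared.
x = [0, 1, 2, 3, 4, 5]
y = [1, 1, 5, 9, 17, 25]
x2 = [xi ** 2 for xi in x]

rss1, p1 = rss_ols([x], y)       # Model 1: only the linear term
rss2, p2 = rss_ols([x, x2], y)   # Model 2: linear + square term
f = nested_f(rss1, p1, rss2, p2, len(y))
print(f"F = {f:.1f} on ({p2 - p1}, {len(y) - p2}) degrees of freedom")
```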
3. Constant residual variance
Plot residuals vs predicted values.
If non-constant variance:
- Robust regression
- Weighted regression
- SPSS: log transform, use Poisson regression, look for a missing cofactor
Constant variance test
Significant means non-constant variance.
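The slide does not name its test. A minimal sketch of one common choice is shown below: the Koenker (studentized) n·R² form of the Breusch-Pagan test, which regresses squared residuals on the fitted values and compares n·R² with a chi-square(1) distribution. It may not give the same number as the test used in the presentation (Stata's `estat hettest` is one implementation of this family). The data and helper names are hypothetical; the chi-square(1) tail uses the identity P(X > x) = erfc(sqrt(x/2)).

```python
import math


def simple_fit(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def koenker_bp(x, y):
    """Score test for non-constant variance, Koenker's n*R^2 form."""
    a, b = simple_fit(x, y)
    fitted = [a + b * xi for xi in x]
    sq_resid = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    lm = len(x) * r_squared(fitted, sq_resid)
    p_value = math.erfc(math.sqrt(lm / 2))   # upper tail of chi-square(1)
    return lm, p_value


# Hypothetical data whose scatter grows with x (heteroskedastic).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 1, 6, 2, 9, 11, 4]
lm, p = koenker_bp(x, y)
print(f"LM = {lm:.2f}, p = {p:.4f}")
```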
Weighted regression
Estimate the residual variance; weights = 1/variance.
Effect: takes care of heteroskedasticity (“robustification”).
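Weighted least squares with weights = 1/variance can be sketched for one covariate as below. The variance model is a loud assumption: here the residual variance is taken to be proportional to x squared, so each weight is 1/x²; in practice the variance would first be estimated, for example from the OLS residuals. The data and the name `wls_fit` are hypothetical.

```python
def wls_fit(x, y, w):
    """Weighted least squares for y = a + b*x: minimises sum w_i*(y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data whose scatter grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 1, 6, 2, 9, 11, 4]

# ASSUMPTION: variance proportional to x**2, so weight = 1/variance = 1/x**2.
w = [1 / xi ** 2 for xi in x]
a_w, b_w = wls_fit(x, y, w)
a_o, b_o = wls_fit(x, y, [1.0] * len(x))   # equal weights = ordinary least squares
print(f"OLS: {a_o:.2f} + {b_o:.2f}x   WLS: {a_w:.2f} + {b_w:.2f}x")
```

Observations with large assumed variance get small weights, so the noisy high-x points pull less on the line than in OLS.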
Summary of assumptions
Dependent residuals: linear mixed models
    xtmixed
Non-linear effects: add a square term
    gen gest2=gest^2
    regress weigth gest gest2 sex
Non-constant variance:
    regress weigth gest sex, robust
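The `, robust` option keeps the regression coefficients and replaces the standard errors with heteroskedasticity-robust (sandwich) ones; Stata's version includes an n/(n - k) small-sample factor, often called HC1. A minimal sketch for a single covariate, with hypothetical data and a hypothetical helper name `ols_with_se`, compares the two standard errors:

```python
import math


def ols_with_se(x, y):
    """Slope of y = a + b*x with classical and robust (HC1) standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE: one common residual variance s^2 = RSS / (n - 2)
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # Sandwich (HC0) variance, then the n/(n - k) factor (HC1), with k = 2
    hc0 = sum((xi - mx) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    se_robust = math.sqrt(hc0 * n / (n - 2))
    return slope, se_classical, se_robust


x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data, scatter grows with x
y = [2, 1, 1, 6, 2, 9, 11, 4]
b, se_c, se_r = ols_with_se(x, y)
print(f"slope = {b:.2f}, classical SE = {se_c:.3f}, robust SE = {se_r:.3f}")
```

The slope is the same either way; only the uncertainty changes, and here the robust SE is larger because the biggest residuals sit at the extreme x values.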
Checking robustness: measures of influence
Measures of influence
Remove observation 1 and see the change; remove observation 2 and see the change; and so on.
Measure the change in:
- Predicted values (y)
- Deviance
- Coefficients (beta)
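The remove-one-observation-at-a-time idea can be sketched directly by refitting, here for the change in the slope only. The data are hypothetical (five points on the line y = x plus one far-out point) and the helper names are illustrative.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def delta_betas(x, y):
    """Change in slope when each observation is removed in turn (full minus leave-one-out)."""
    _, b_full = fit_ols(x, y)
    changes = []
    for i in range(len(x)):
        _, b_without_i = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(b_full - b_without_i)
    return changes


# Hypothetical data: five points on y = x, plus one far-out point.
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 0]
changes = delta_betas(x, y)
for i, d in enumerate(changes, start=1):
    print(f"obs {i}: change in slope = {d:+.3f}")
```

The same loop could record the change in predicted values or in the deviance instead of the slope.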
Influence idea
Outlierness: residuals
Leverage: distance from the x-mean
Influence: combination of the two
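For a single covariate the three ingredients have closed forms, sketched below with hypothetical data and helper names. Leverage is h = 1/n + (x - x̄)²/Sxx, outlierness is the squared residual (shown as its share of the residual sum of squares, the quantity plotted against leverage in Stata's `lvr2plot`), and Cook's distance is used here as one standard combination of the two.

```python
def influence_measures(x, y):
    """Leverage, normalised squared residual and Cook's distance for y = a + b*x."""
    n, p = len(x), 2                  # p = number of coefficients (intercept, slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in resid)
    s2 = rss / (n - p)
    # Leverage: distance from the x-mean
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    # Outlierness: squared residual as a share of the residual sum of squares
    norm_r2 = [e ** 2 / rss for e in resid]
    # Influence: Cook's distance combines residual size and leverage
    cook = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return lev, norm_r2, cook


# Hypothetical data: five points on y = x, plus one far-out point.
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 0]
lev, norm_r2, cook = influence_measures(x, y)
for i, (h, r2, d) in enumerate(zip(lev, norm_r2, cook), start=1):
    print(f"obs {i}: leverage = {h:.3f}  resid^2 share = {r2:.3f}  Cook's D = {d:.3f}")
```

The far-out point does not have the largest residual (it pulls the line towards itself), but its high leverage gives it by far the largest Cook's distance.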
Leverage versus residuals squared
“Adjusted” scatterplot: added-variable plot (partial regression leverage plot)
Look at:
- Obs. 321: high leverage, medium residual
- Obs. 111: low leverage, high residual
These plots lack the ability to see two points with large opposite effects, as seen in delta beta.
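An added-variable plot for a covariate x, adjusted for another covariate z, plots the part of y not explained by z against the part of x not explained by z; by the Frisch-Waugh-Lovell theorem the slope through these points equals the multiple-regression coefficient of x. A minimal sketch with hypothetical data and helper names (Stata draws this plot with `avplot`):

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def residualize(v, z):
    """Residuals from the simple regression of v on z (with intercept)."""
    a, b = fit_ols(z, v)
    return [vi - a - b * zi for vi, zi in zip(v, z)]


def added_variable(y, x, z):
    """Added-variable plot coordinates for x adjusted for z, and the slope through them."""
    rx = residualize(x, z)   # part of x not explained by z (horizontal axis)
    ry = residualize(y, z)   # part of y not explained by z (vertical axis)
    _, slope = fit_ols(rx, ry)
    return rx, ry, slope


# Hypothetical data with an exact relation y = 1 + 2x + 3z.
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]
rx, ry, slope = added_variable(y, x, z)
print(f"slope in added-variable plot = {slope:.3f}")
# → slope in added-variable plot = 2.000
```

Points far from zero on the horizontal axis (rx) have high leverage for the adjusted coefficient, which is why this "adjusted" scatterplot is useful for spotting influential observations.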
Delta beta (for gestational age)
Advantage: a direct measure of the coefficient we are interested in, in both positive and negative directions.
Disadvantage: one measure for each covariate.
Scaled?
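One common way to scale delta beta is the DFBETAS definition of Belsley, Kuh and Welsch: divide the change in the coefficient by the leave-one-out residual standard error times the square root of that coefficient's diagonal element of (X'X)⁻¹, which for the slope of a simple regression is 1/Sxx. That this matches the "scaled" version meant on the slide is an assumption. The sketch uses hypothetical data and helper names.

```python
import math


def fit_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def rmse(x, y):
    """Residual standard error sqrt(RSS / (n - 2)) of the simple regression."""
    a, b = fit_ols(x, y)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (len(x) - 2))


def scaled_dfbetas(x, y):
    """(b - b_(i)) / (s_(i) * sqrt(1/Sxx)) for the slope, with Sxx from the full data."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)   # 1/Sxx = slope element of (X'X)^-1
    _, b_full = fit_ols(x, y)
    out = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        _, b_i = fit_ols(xs, ys)
        out.append((b_full - b_i) / (rmse(xs, ys) / math.sqrt(sxx)))
    return out


# Hypothetical data: five points near y = x, plus one far-out point.
x = [1, 2, 3, 4, 5, 10]
y = [1.2, 1.8, 3.1, 4.2, 4.7, 0.0]
dfb = scaled_dfbetas(x, y)
for i, d in enumerate(dfb, start=1):
    print(f"obs {i}: scaled delta beta = {d:+.2f}")
```

Scaling puts the changes on a standard-error scale, so values can be compared across covariates, although there is still one measure per covariate.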
Delta fitted value, DFITS
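DFITS measures how much an observation's own fitted value changes when that observation is dropped, scaled by the leave-one-out residual standard error and the square root of the leverage. The sketch below, for one covariate with hypothetical data and helper names, computes it from the usual closed form (externally studentized residual times sqrt(h/(1 - h))) and checks it against brute-force refitting. Stata provides it through `predict` with the `dfits` option.

```python
import math


def fit_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def dfits(x, y):
    """Closed form: DFITS_i = t_i * sqrt(h_i / (1 - h_i))."""
    n, p = len(x), 2
    a, b = fit_ols(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx                # leverage
        s2_i = (rss - e ** 2 / (1 - h)) / (n - p - 1)   # residual variance without obs i
        t = e / math.sqrt(s2_i * (1 - h))               # externally studentized residual
        out.append(t * math.sqrt(h / (1 - h)))
    return out


def dfits_by_refit(x, y):
    """Definition: change in obs i's fitted value when i is dropped, / (s_(i) * sqrt(h_i))."""
    n = len(x)
    a, b = fit_ols(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a_i, b_i = fit_ols(xs, ys)
        rss_i = sum((yj - a_i - b_i * xj) ** 2 for xj, yj in zip(xs, ys))
        s_i = math.sqrt(rss_i / (n - 1 - 2))
        h = 1 / n + (x[i] - mx) ** 2 / sxx
        out.append(((a + b * x[i]) - (a_i + b_i * x[i])) / (s_i * math.sqrt(h)))
    return out


x = [1, 2, 3, 4, 5, 10]              # hypothetical data
y = [1.2, 1.8, 3.1, 4.2, 4.7, 0.0]
closed, refit = dfits(x, y), dfits_by_refit(x, y)
for i, (c, r) in enumerate(zip(closed, refit), start=1):
    print(f"obs {i}: DFITS = {c:+.2f} (refit: {r:+.2f})")
```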
Summary: robustness, influence
Linear regression is sensitive!
Look for influential points:
- Leverage versus residual plots
- Added-variable plots
- Delta beta
Rerun the regression without the influential points and look for changes in the coefficients, the constant term and the p-values.
Found an influential point in a dataset with N = 30 000! (MoBa, exercise)
Logistic, Poisson regression
Assumptions:
- Independent errors: as before
- Linear effects: as before
- Constant error variance: no!
Robustness:
- Linear: not robust!
- Poisson: medium robust
- Logistic: fairly robust
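The "no!" for constant error variance can be read off the variance each model implies for the outcome. In the standard model definitions (stated here as a sketch), the logistic and Poisson variances are functions of the mean, so they change with the covariates by construction rather than being a separate assumption to check:

```latex
\begin{aligned}
\text{Linear:}   &\quad \operatorname{Var}(Y \mid x) = \sigma^2 \quad \text{(constant)} \\
\text{Logistic:} &\quad \operatorname{Var}(Y \mid x) = p(x)\,\bigl(1 - p(x)\bigr),
                  \qquad p(x) = \frac{1}{1 + e^{-x^\top \beta}} \\
\text{Poisson:}  &\quad \operatorname{Var}(Y \mid x) = \mu(x) = e^{x^\top \beta}
\end{aligned}
```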