Published by Rüdiger Seidel. Modified over 6 years ago.
1
Standard statistical analysis: linear, logistic and Cox regression
Hein Stigum
Presentation, data and programs at: courses
2 h talk:
  Part 1: start -> robustness
  Part 2: recap, robustness -> end
Jan-19 H.S.
2
Goal of analysis
Outcome. How much: mean, risk, rate
Exposure. More among exposed: difference in means, RR or OR
Bivariate analysis (crude or unadjusted)
Multivariable analysis (adjusted, regression)
Adjusting removes confounding and some selection bias
3
Workflow
DAG: E gest age, D birth weight, C1 sex, C2 education
Scatter and density plots
Bivariate analysis
Regression
  Model estimation
  Test of assumptions
    Independent errors
    Linear effects
    No interactions
    (Constant error variance)
  Influence
Notes:
  Model fitting: adjust if the cofactor is a confounder. Problems: the cofactor has missing values, the cofactor has measurement error, or the cofactor is on the causal path.
  Test of assumptions: in Stata the estimation remains in memory, so post-estimation commands can be used.
  Normally check assumptions first, then influence. Note the leverage value of the outlier.
4
Purpose of regression
Estimation: estimate the association between exposure and outcome, adjusted for other covariates.
  Concerns: DAGs, bias, precision
Prediction: use an estimated model to predict the outcome given covariates in a new dataset.
  Concerns: predictive power, model fit, R2
Confounding matters for estimation; model fit matters for prediction.
5
Outcome and regression types
Numerical data
  Continuous (weight): linear regression
  Count (partners): Poisson regression
Categorical data
  Binary (0,1)
    Risk: linear-risk model, log-risk model
    Rate: Cox, Poisson
    Odds: logistic
  Ordinal (small, med, large): ologit
  Multinomial (m. status): mlogit
6
ANALYSIS 1, CONTINUOUS OUTCOME
7
Data: m:\pc\dokumenter\a_Courses\Master UiO\birth1.sav
SPSS from Kiosk
8
Table 1
Outcome: Birth weight
Exposure: Gestational age
Covariates:
9
Table 2
Outcome: Birth weight
Exposure: Gestational age
Adjusted for: Education and sex
Note: change to crude and adjusted models
10
DAG: Gestational age and birth weight
Nodes: E gest age, D birth weight, C1 sex, C2 education
Birth weight analysis (continuous outcome):
  Plots by gestational age
  Compare means
  Linear regression
11
Regression idea
x = gest, y = weight
Straight line: y = constant + slope times x
beta0 = constant or intercept; beta1 = regression coefficient or slope
y = dependent, continuous; x = ..
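The straight-line idea can be sketched as a least-squares fit. This is a minimal illustration in Python rather than Stata so the arithmetic is explicit. The data values and the helper name `fit_line` are hypothetical and are not taken from birth1.sav.

```python
def fit_line(x, y):
    """Least-squares fit of y = beta0 + beta1 * x; returns (beta0, beta1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                 # slope: change in y per unit of x
    beta0 = mean_y - beta1 * mean_x   # intercept: value of y at x = 0
    return beta0, beta1


# Hypothetical data: gestational age (days) and birth weight (grams)
gest = [260, 270, 280, 290, 300]
weight = [3000, 3200, 3400, 3600, 3800]

beta0, beta1 = fit_line(gest, weight)
print("intercept:", beta0, "slope:", beta1)
```

With one covariate, this is the same estimate that `regress bw gest` reports as the coefficient for gest.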
12
Model, measure and assumptions
Association measure: b1 = change in y for a one-unit increase in x1
Assumptions:
  Independent errors
  Linear effects
  Constant error variance
Influence
Notes:
  In GLM, b-s and betas are both coefficients.
  In some fields (and in SPSS), b-s are coefficients and betas are standardized coefficients.
13
Outcome distributions by exposure
Analysis options shown with the plotted distributions:
  Linear regression
  Cutoff, logistic regression
  Linear regression, or log-transform and then linear regression
14
Scatter and density plots
Density plot: distribution of birth weight for low/high gestational age. Look for shift in shape.
Scatter plot: birth weight by gestational age. Look for deviations from linearity and outliers.
15
Bi-variate
Weight by sex (continuous by binary)
  ttest bw, by(sex)        // t-test
Weight by education (continuous by categorical-3)
  anova bw educ            // one-way anova
Weight by gest. age (continuous by continuous)
  regress bw gest          // regression
  ttest bw, by(gest2)      // cut in 2, t-test
16
Bi-variate result
17
Model 1: outcome + exposure
  regress bw gest          // crude model
  estimates store m1       // store model results
18
Model 2 and 3: Add covariates
  regress bw gest i.educ sex      // add covariates
  estimates table m1 m2 m3        // compare coefs
Estimate association: m1 is biased, m2 = m3. Is m3 more precise?
  m2: se(gest) = 0.934
  m3: se(gest) = 0.926
  estimates stats m1 m2 m3        // compare fit
Prediction: m3 is best
19
Results so far
Westreich & Greenland, The Table 2 fallacy, 2013
20
Assumptions
21
Test of assumptions
Assumption: Plot (Test)
  Independent residuals: discuss. Residuals are dependent if there are many children from the same mother.
  Linear effects: plot residuals versus predicted y
  Constant variance: plot residuals versus predicted y (test: estat hettest)
  No interactions
Commands:
  predict res, residuals
  predict pred, xb
  scatter res pred
  estat hettest            // p=0.2: no heteroskedasticity
22
Violations of assumptions
  Dependent residuals: robust(cluster) or mixed model
  Non-linear effects: add square term or spline
  Non-constant variance: use robust variance estimation
    regress y x, robust
  Interactions: add product (interaction) terms
Notes:
  Robust variance estimation with cluster variable: regress y x1 x2, vce(cluster village)
  Linear mixed models: xtmixed
  Robust variance: regress y x1 x2, robust
23
Influence
Measures of influence
Influence is easier to understand when effects are linear and without interaction.
24
Influence idea (different data)
Have taken N down to 1000 and moved the outlier out to gest=400.
Beta changes from 10 to 17 when the influential outlier is removed.
Est. change = delta-beta * se(gest) = -3.5*2.1 = -6.8
Influence = "leverage * residual"
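The influence idea can be illustrated by fitting a slope with and without a single far-out point. This is a minimal Python sketch on hypothetical data, not the dataset behind the slide. The point at gest=400 has high leverage (far from the other x values) and a large residual, so it pulls the slope strongly.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data; the last observation is the outlier at gest=400
gest = [250, 260, 270, 280, 290, 300, 400]
bw = [3000, 3200, 3400, 3600, 3800, 4000, 3000]

slope_all = slope(gest, bw)                  # outlier included
slope_without = slope(gest[:-1], bw[:-1])    # outlier removed

print("slope with outlier:   ", slope_all)
print("slope without outlier:", slope_without)
print("change due to outlier:", slope_all - slope_without)
```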
25
Delta-beta for gestational age
  dfbeta gest              // create delta-beta
  scatter _dfbeta_1 id     // plot vs id-variable
Note: delta-beta is variable specific.
If obs. no. 370 is removed, beta will change by -0.7 se's: -0.7*0.9 = -0.6 g
New beta: 17.9-(-0.6) = 18.5 g
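The conversion from a standardized delta-beta to grams can be written out as a small calculation. This is a minimal sketch using the numbers quoted on the slide. The function name is hypothetical, and multiplying by the full-sample standard error is the approximation used on the slide.

```python
def estimate_without_obs(beta, se, dfbeta):
    """Approximate the coefficient after dropping one observation.

    dfbeta is the standardized delta-beta, roughly
    (beta with the observation - beta without it) / se.
    Returns (change in coefficient units, coefficient without the observation).
    """
    change = dfbeta * se
    return change, beta - change


# Values quoted on the slide for observation 370
change, beta_new = estimate_without_obs(beta=17.9, se=0.9, dfbeta=-0.7)
print("estimated change (g):", round(change, 1))
print("beta without obs 370 (g):", round(beta_new, 1))
```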
26
Removing outlier
  regress bw gest i.educ sex if id!=370
  est store m4
  est table m3 m4, b(%8.1f)
Est. change = delta-beta * se = -0.7*0.9 = -0.6
27
Removing outlier cont.
Full model: N=5000
Outlier removed: N=4999
One outlier affected several estimates.
Final model: report only the effect of exposure.
28
bw2: Non-linear effects
29
bw2: Non-linear effects
  scatter bw2 gest         // or: scatter res pred
Handle: add polynomial or spline
Weak non-linearity in exposure: would care
Weak non-linearity in confounder: may not care
30
Non-linear effects: polynomial
  regress bw2 c.gest##c.gest i.educ sex        // 2nd-order polynomial in gest
  margins, at(gest=(250(10)310))               // predicted bw2 by gest
  marginsplot                                  // plot
Could also ask for the effect of gest on bw2:
  margins, dydx(gest) at(gest=(250(10)310))
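With a second-order polynomial, the effect of gest is no longer a single number: the slope at a given gest is b1 + 2*b2*gest. The Python sketch below tabulates predicted values and slopes over the same grid as the margins commands. The coefficients are hypothetical, chosen only for illustration, and the other covariates are held fixed and absorbed into b0.

```python
def predicted_bw(gest, b0, b1, b2):
    """Predicted outcome from y = b0 + b1*gest + b2*gest^2."""
    return b0 + b1 * gest + b2 * gest ** 2


def gest_slope(gest, b1, b2):
    """Derivative of the quadratic with respect to gest."""
    return b1 + 2 * b2 * gest


# Hypothetical coefficients (not estimates from the course data)
b0, b1, b2 = -18800.0, 150.0, -0.25

# Same grid as at(gest=(250(10)310))
for g in range(250, 311, 10):
    print(g, predicted_bw(g, b0, b1, b2), gest_slope(g, b1, b2))
```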
31
Non-linear effects: cubic spline
Plot; list of knots (display)
  mkspline g=gest, cubic nknots(4)     // make spline with 4 knots (g1, g2, g3)
  regress bw2 g1 g2 g3 i.educ sex      // regression with spline
  gen igest=5*round(gest/5)            // gest rounded to multiples of 5
  margins, over(igest)                 // predicted bw by gest
  * marginsplot                        // plot
Package (SSC = Statistical Software Components):
  findit postrcspline
  ssc hot, author(Buis)
32
Non-linear effects: linear spline
Plot (as previous)
  mkspline g1 280 g2=gest              // make linear spline with knot at 280
  regress bw2 g1 g2 i.educ sex         // regression with spline
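The two variables created by a linear spline with one knot can be sketched as follows. This assumes the default (non-marginal) coding, in which the coefficient of g1 is the slope below the knot and the coefficient of g2 is the slope above it. The function name is hypothetical.

```python
def linear_spline(x, knot):
    """Split x into two pieces around a single knot.

    g1 follows x up to the knot and then stays at the knot value.
    g2 is zero up to the knot and then counts units above it.
    """
    g1 = min(x, knot)
    g2 = max(x - knot, 0)
    return g1, g2


# Gestational ages below, at, and above the knot at 280
for gest in (270, 280, 290):
    print(gest, linear_spline(gest, 280))
```

Note that g1 + g2 always equals the original value, so the two pieces together carry the same information as gest but allow the slope to change at the knot.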
33
bw3: Interaction, only linear effects
34
Interaction definitions
Interaction: combined effect of two variables
Scale:
  Linear models: additive. y = b0 + b1x1 + b2x2; effect of both x1 and x2 = b1 + b2
  Logistic, Poisson, Cox: multiplicative. Effect of both x1 and x2 = OR1*OR2
Interaction: deviation from additivity (multiplicativity); the effect of x1 depends on x2
35
bw3: Interaction (only linear effects)
Add interaction terms; show results
  regress bw3 c.gest##i.sex i.educ     // main + gest-sex interaction
  margins, dydx(gest) at(sex=0)        // effect of gest for boys
  margins, dydx(gest) at(sex=1)        // effect of gest for girls
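What the two margins commands report can be worked out by hand from the coefficients: with a gest-by-sex product term, the slope of gest is b_gest when sex=0 and b_gest + b_int when sex=1. The sketch below uses hypothetical coefficient values; the coding sex=0 for boys and sex=1 for girls follows the slide.

```python
def gest_effect(b_gest, b_int, sex):
    """Slope of gest in y = b0 + b_gest*gest + b_sex*sex + b_int*gest*sex."""
    return b_gest + b_int * sex


# Hypothetical coefficients (not estimates from the course data)
b_gest, b_int = 18.0, -3.0

effect_boys = gest_effect(b_gest, b_int, sex=0)
effect_girls = gest_effect(b_gest, b_int, sex=1)
print("effect of gest, boys: ", effect_boys)
print("effect of gest, girls:", effect_girls)
```

If b_int is zero there is no interaction on the additive scale and the two effects coincide.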
36
Summing up
DAG: E gest age, D birth weight, C1 sex, C2 education
Scatter and density plots
Bivariate analysis
Regression
  Model estimation
  Test of assumptions
    Independent errors
    Linear effects
    No interactions
    Constant error variance
  Influence
37
Summing up 1
Build model
  regress bw gest                      // crude model
  est store m1                         // store
  regress bw gest i.educ sex           // full model
  est store m2
  est table m1 m2                      // compare coefficients
Interaction
  regress bw3 c.gest##i.sex i.educ     // test interaction
  margins, dydx(gest) at(sex=0)        // gest for boys
Assumptions
  predict res, residuals               // residuals
  predict pred, xb                     // predicted
  scatter res pred                     // plot
38
Summing up 2
Non-linearity (linear spline)
  mkspline g1 280 g2=gest              // spline with knot at 280
  regress bw2 g1 g2 i.educ sex         // regression with spline
Robustness
  dfbeta gest                          // delta-beta
  scatter _dfbeta_1 id                 // plot versus id
39
ANALYSIS 2, BINARY OUTCOME
SPSS from Kiosk
Data: m:\pc\dokumenter\a_Courses\Master UiO\Smoking and Alzheimer.sav
40
Table 1
Outcome: Alzheimer
Exposure: Smoking
Covariates:
41
Table 2
Outcome: Alzheimer
Exposure: Smoking
Adjusted for: Age, Physical activity and Education
Note: change to crude and adjusted models
42
Logistic model and assumptions
Assumptions:
  Independent residuals
  Linear effects (on the log-odds scale)
  No interactions (on the multiplicative scale)
Notes:
  Inverse link in the plots. Mu = proportion with disease in each group.
  Influence: use db
  Logistic.pdf: Natalia Sarkisian
43
Association measure, odds ratio
Model: linear effects on the log-odds level
Start with the OR for a variable and two values: OR for x1 (2 vs 1)
Hence: OR = e^b1
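The equations on this slide did not survive extraction. The standard derivation they most likely showed is sketched below, assuming a logistic model with two covariates; the symbols p, b0, b1 and b2 are notation chosen here to match the b-s used elsewhere in the talk.

```latex
\begin{align}
\ln\!\left(\frac{p}{1-p}\right) &= b_0 + b_1 x_1 + b_2 x_2
  && \text{(linear on the log-odds scale)} \\
\mathrm{odds}(x_1, x_2) &= e^{\,b_0 + b_1 x_1 + b_2 x_2} \\
\mathrm{OR}_{x_1}(2 \text{ vs } 1)
  &= \frac{\mathrm{odds}(x_1 = 2,\, x_2)}{\mathrm{odds}(x_1 = 1,\, x_2)}
   = \frac{e^{\,b_0 + 2 b_1 + b_2 x_2}}{e^{\,b_0 + b_1 + b_2 x_2}}
   = e^{\,b_1}
\end{align}
```

The terms involving b0 and x2 cancel, so the odds ratio for a one-unit increase in x1 is the same at every value of x2 as long as the model has no interaction term.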
44
Short: need to know
Assume: linear effects on the log-odds scale
Association measure: OR = e^b
Scale: multiplicative; exposed to both x1 and x2: OR1*OR2
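The step from coefficients to odds ratios, and the multiplicative scale, can be checked numerically. This is a minimal Python sketch with hypothetical coefficients, not estimates from the smoking and Alzheimer data.

```python
import math


def odds_ratio(b):
    """Odds ratio for a one-unit increase, from a logistic coefficient b."""
    return math.exp(b)


# Hypothetical log-odds coefficients for two binary exposures x1 and x2
b1 = math.log(2.0)
b2 = math.log(1.5)

or1 = odds_ratio(b1)
or2 = odds_ratio(b2)

# Exposed to both: coefficients add on the log-odds scale,
# so odds ratios multiply on the odds scale
or_both = odds_ratio(b1 + b2)

print("OR1:", or1, "OR2:", or2)
print("OR for both exposures:", or_both, "product OR1*OR2:", or1 * or2)
```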
45
Purpose of regression
Estimation: estimate the association between exposure and outcome, adjusted for other covariates.
  Concerns: DAGs, bias, precision
Prediction: use an estimated model to predict the outcome given covariates in a new dataset.
  Concerns: predictive power, model fit, R2
Confounding matters for estimation; model fit matters for prediction.
46
Workflow
DAG
Bivariate analysis
Regression
  Model fitting: exposure + confounders
  Test of assumptions
    Independent errors
    Linear effects (on the log-odds scale)
    No interactions (on the multiplicative scale)
  Influence
47
Go to syntax
48
The end