Published by Adam Curtis
1
Stata 3, Regression
Hein Stigum
July 2015
Presentation, data and programs at: http://folk.uio.no/heins/
2
Agenda
Linear regression
GLM
Logistic regression
Binary regression
(Conditional logistic)
3
Linear regression
Birth weight by gestational age
4
Regression idea
5
Model and assumptions
Model
Assumptions
– Independent errors
– Linear effects
– Constant error variance
6
Association measure: RD
Model:
Start with:
Hence:
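The equations on this slide were images and did not survive in the transcript. A plausible reconstruction, assuming a linear model with exposure x_1 and one covariate x_2 (the symbols are assumptions, not taken from the slides), would be:

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 + 1, x_2) - E(y \mid x_1, x_2) \\
 &= \bigl[\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2\bigr]
  - \bigl[\beta_0 + \beta_1 x_1 + \beta_2 x_2\bigr] = \beta_1 \\
\text{Hence:} \quad & \mathrm{RD} = \beta_1
\end{align*}
```

Under this reading, the coefficient of the exposure is the adjusted difference in mean outcome per unit increase in x_1.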
7
Purpose of regression
Estimation
– Estimate the association between outcome and exposure, adjusted for other covariates
Prediction
– Use an estimated model to predict the outcome given covariates in a new dataset
8
Adjusting for confounders
Do not adjust
– Cofactor is a collider
– Cofactor is in the causal path
May or may not adjust
– Cofactor has missing values
– Cofactor is measured with error
9
Workflow
Scatterplots
Bivariate analysis
Regression
– Model fitting
    Cofactors in/out
    Interactions
– Test of assumptions
    Independent errors
    Linear effects
    Constant error variance
– Influence (robustness)
10
Scatterplot
11
Syntax
Estimation
– regress y x1 x2            linear regression
– xi: regress y x1 i.c1      categorical c1
Post estimation
– predict yf, xb             predict
Manage models
– estimates store m1         save model
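A minimal runnable sketch of the syntax above. It assumes Stata's bundled auto dataset instead of the birth-weight data used in the slides, so the variable names price, mpg, weight and foreign are stand-ins. From Stata 11 on, factor-variable notation (i.varname) works without the xi: prefix.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not the slide data)
sysuse auto, clear

* Linear regression with two continuous covariates
regress price mpg weight

* Categorical covariate with factor-variable notation, no xi: prefix needed
regress price mpg i.foreign

* Linear prediction from the most recently fitted model
predict yf, xb

* Store the model as m1 and show it in a results table
estimates store m1
estimates table m1
```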
12
Model 1: outcome+exposure
13
Model 2: Add confounders
Estimate association: m1=m2
Prediction: m2 is best
14
"Dummies"
Assume educ is coded 1, 2, 3 for low, medium and high education
Choose low educ as reference
Make dummies for the two other categories:
    generate medium=(educ==2) if educ<.
    generate high  =(educ==3) if educ<.
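The "if educ<." qualifier matters because a logical expression such as (educ==2) evaluates to 0, not missing, when educ is missing. A small sketch of the difference, assuming the bundled auto dataset, where rep78 stands in for educ because it has some missing values:

```stata
* Stand-in data: rep78 plays the role of educ (an assumption for illustration)
sysuse auto, clear

* Without the qualifier, cars with missing rep78 are silently coded 0
generate byte rep3_bad = (rep78 == 3)

* With the qualifier, missing rep78 stays missing in the dummy
generate byte rep3 = (rep78 == 3) if rep78 < .

* Observations that differ between the two versions
count if rep3_bad == 0 & missing(rep3)

* Factor-variable alternative: ib1. makes category 1 the reference, no dummies needed
regress price ib1.rep78
```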
15
Interaction
Model:
Start with:
Hence:
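The equations on this slide are missing from the transcript. One possible reconstruction, assuming a linear model with exposure x_1, a binary covariate x_2 and a product term (the symbols are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 + 1, x_2) - E(y \mid x_1, x_2) = \beta_1 + \beta_3 x_2 \\
\text{Hence:} \quad & \mathrm{RD}_{x_1 \mid x_2 = 0} = \beta_1, \qquad
                      \mathrm{RD}_{x_1 \mid x_2 = 1} = \beta_1 + \beta_3
\end{align*}
```

With an interaction term the effect of x_1 is no longer a single number: it depends on the level of x_2.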
16
Model 3: with interaction
17
Test of assumptions
Predict y and residuals
– predict y, xb
– predict res, resid
Plot resid vs y
– independent?
– linear?
– const. var?
    twoway (scatter res y) (qfitci res y)
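A runnable version of the residual check, sketched on the bundled auto dataset (an assumption; the slides use birth-weight data). The prediction is named yhat here to keep it distinct from the outcome variable.

```stata
* Stand-in data and model (assumptions for illustration)
sysuse auto, clear
regress price mpg weight

* Linear prediction and residuals from the fitted model
predict yhat, xb
predict res, residuals

* Residuals against predictions, with a quadratic fit and its confidence band
twoway (scatter res yhat) (qfitci res yhat)

* Built-in shortcut for the residual-versus-fitted plot
rvfplot, yline(0)
```

A flat quadratic fit around zero with an even spread is what the linearity and constant-variance assumptions would lead one to expect.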
18
Violations of assumptions
Dependent residuals
    Mixed models: xtmixed
Nonlinear effects
    gen gest2=gest^2
    regress weight gest gest2 sex
Non-constant variance
    regress weight gest sex, robust
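The three remedies can be sketched on stand-in data. The datasets (auto, and the pig data fetched with webuse, which needs an internet connection) are assumptions. In Stata 13 and later, xtmixed is called mixed.

```stata
* Stand-in data (assumption): bundled auto dataset
sysuse auto, clear

* Nonlinear effect: quadratic term via factor-variable notation
regress price c.mpg##c.mpg weight

* Non-constant variance: robust standard errors
regress price mpg weight, vce(robust)

* Dependent residuals: random intercept per pig in repeated weight measurements
webuse pig, clear
mixed weight week || id:
```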
19
Measures of influence
Measure change in:
– Outcome (y)
– Deviance
– Coefficients (beta)
    Delta beta, Cook's distance
Remove obs 1, see change
Remove obs 2, see change
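Stata computes these leave-one-out measures without refitting the model by hand. A minimal sketch on the bundled auto dataset (an assumption); the 4/n cutoff is only a common rule of thumb, not something the slides state.

```stata
* Stand-in data and model (assumptions for illustration)
sysuse auto, clear
regress price mpg weight

* Cook's distance: overall change in the coefficients when one observation is removed
predict cook, cooksd

* Delta-beta: change in each coefficient, one new variable per covariate
dfbeta

* List observations above the rule-of-thumb cutoff 4/n
list make cook if cook > 4/e(N) & cook < .
```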
20
Points with high influence
    lvr2plot, mlabel(id)
21
Added variable plot: gestational age
    avplot gest, mlabel(id)
22
Removing outlier
23
Influence
24
Final model
Give meaning to the constant term:
    sum gest                      /* find smallest value */
    generate gest2=gest-204       /* smallest gest=204 */
    generate sex2=sex-1           /* boys=0, girls=1 */
    regress weight gest2 sex2     /* final model */
    estimates store m4
After this recoding the constant is the expected birth weight of a boy at gestational age 204.
25
Logistic regression
Being bullied
26
Model and assumptions
Model
Assumptions
– Independent residuals
– Linear effects
27
Association measure, Odds ratio
Model:
Start with:
Hence:
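The equations on this slide are not in the transcript. A plausible reconstruction, assuming a logistic model with exposure x_1 and one covariate x_2, and writing the probability as mu as a later slide does (the remaining symbols are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & \ln\!\left(\frac{\mu}{1-\mu}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2,
  \qquad \mu = P(y = 1 \mid x_1, x_2) \\
\text{Start with:} \quad & \ln\bigl(\mathrm{odds}_{x_1+1}\bigr) - \ln\bigl(\mathrm{odds}_{x_1}\bigr) = \beta_1 \\
\text{Hence:} \quad & \mathrm{OR} = \frac{\mathrm{odds}_{x_1+1}}{\mathrm{odds}_{x_1}} = e^{\beta_1}
\end{align*}
```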
28
Syntax
Estimation
– logistic y x1 x2            logistic regression
– xi: logistic y x1 i.c1      categorical c1
Post estimation
– predict yf, pr              predict probability
Manage models
– estimates store m1          save model
– est table m1, eform         show OR
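A runnable sketch of this syntax. It assumes the low-birth-weight example data that webuse fetches from the Stata website (internet connection needed), not the bullying data of the slides; low, age and smoke are variables in that dataset.

```stata
* Stand-in data (assumption): Hosmer-Lemeshow low birth weight data
webuse lbw, clear

* Logistic regression; output is reported as odds ratios
logistic low age i.smoke

* Store the model and show the odds ratios in a table
estimates store m1
estimates table m1, eform

* Predicted probability of the outcome for each observation
predict yf, pr
```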
29
Workflow
Bivariate analysis
Regression
– Model fitting
    Cofactors in/out
    Interactions
– Test of assumptions
    Independent errors
    Linear effects
– Influence (robustness)
30
Bivariate
Generate dummies
    gen Island=(country==2) if country<.
    gen Norway=(country==3) if country<.
    gen Finland=(country==4) if country<.
    gen Denmark=(country==5) if country<.
31
Model 1: outcome and exposure
    xi: logistic bullied i.country                         use xi: i.var for categorical variables
Alternative commands:
    xi: logistic bullied i.country, coef                   coefs instead of ORs
    xi: logistic bullied i.country if sex!=. & age!=.      do if sex and age not missing
32
Model 2: Add confounders
Estimate associations: m1=m2
Predict: m2 best
33
Interaction
Model:
Start with:
Hence:
34
Model 3: interaction
35
Test of assumptions
Linear effects (of age)
– findit lincheck                                      search and install
– lincheck xi: logistic bullied age i.country sex
36
Points with high influence
    estimates restore m2     restore best model
    predict p, p             probability (mu in our notation)
    predict db, db           delta-beta (one value, not one per estimate)
    scatter db p             delta-beta plot
37
Removing 2 observations
Conclusion: Robust results
38
Generalized Linear Models
Being bullied
39
Designs and measures
Models      Measures
GLM         RR, RD, OR
Survival    Rate Ratio
40
Generalized Linear Models, GLM
Linear regression
Logistic regression
Poisson regression
41
GLM: Distribution and link
Distribution family: Normal, Binomial, Poisson
– Given by data
– Influences p-values and CIs
Link function: Identity, Logit, Log
– May choose
– Shape (= inverse of the link)
– Scale: Additive (identity), Multiplicative (logit, log)
– Association measure: RD (identity), OR (logit), RR (log)
42
Distribution and link examples
Link: Identity, linear model, additive scale
Note: not for traditional case-control data
43
Being bullied, 3 models
    glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)
    glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)
    glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)
44
Convergence problems
If glm does not converge, use:
– poisson y x1 x2, irr robust      RR
– regress y x1 x2, robust          RD
Stop
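A sketch of the two fallbacks on stand-in data, assuming the low-birth-weight example dataset from webuse (internet connection needed). The robust option is needed in both because a binary outcome is neither Poisson nor constant-variance normal; vce(robust) is the current spelling of the robust option used on the slide.

```stata
* Stand-in data (assumption): binary outcome low, covariates age and smoke
webuse lbw, clear

* Risk ratios: Poisson regression on the binary outcome, robust standard errors
poisson low age i.smoke, irr vce(robust)

* Risk differences: linear regression on the binary outcome, robust standard errors
regress low age i.smoke, vce(robust)
```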
45
Association measure, RR
Model:
Start with:
Hence:
46
Association measure: RD
Model:
Start with:
Hence:
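The equations for the RR and RD slides are missing from the transcript. A plausible reconstruction for a binomial outcome with probability mu, exposure x_1 and covariate x_2 (symbols other than mu are assumptions), using the log link for RR and the identity link for RD:

```latex
\begin{align*}
\text{RR, log link:} \quad
  & \ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
  & \ln(\mu_{x_1+1}) - \ln(\mu_{x_1}) = \beta_1
    \;\Longrightarrow\; \mathrm{RR} = \frac{\mu_{x_1+1}}{\mu_{x_1}} = e^{\beta_1} \\[6pt]
\text{RD, identity link:} \quad
  & \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
  & \mu_{x_1+1} - \mu_{x_1} = \beta_1
    \;\Longrightarrow\; \mathrm{RD} = \beta_1
\end{align*}
```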
47
The importance of scale
Additive scale (RD)
Absolute increase
– Females: 30-20=10
– Males: 20-10=10
Conclusion: Same increase for males and females
Multiplicative scale (RR)
Relative increase
– Females: 30/20=1.5
– Males: 20/10=2.0
Conclusion: Larger increase for males
48
Conditional logistic regression
For matched case-control data
49
Truths and Misconceptions
Cohort studies
– Exposed and unexposed should be as similar as possible, except for exposure
– Matching removes confounding
Case-Control studies
– Cases and controls should be as similar as possible, except for disease
– Matching removes confounding
[Figure labels: Exposed, Unexposed, Diseased/Cases, Healthy/Controls]
50
Matching and analysis
Unmatched (age)
– Ordinary model
– May adjust for age
– May interpret age effect
Frequency matched (age)
– Ordinary model
– Must adjust for age
– Cannot interpret age effect
One-one matched (age)
– Conditional model
– No effect measure for age
51
Data preparation
Save as tab-delimited in Excel
Read and fix in Stata
– insheet using "file.txt", clear
– mvdecode m*, mv(9)
– gsort id -cc
52
Syntax
Estimation
    clogit y x1 x2, group(id)        conditional logistic
    clogit y x1 x2, group(id) or     OR instead of coef
Post estimation
    predict yf, pc1                  predict probability
Manage models
    estimates store m1               save model
53
Bivariate analysis
Loop through all variables
    foreach var of varlist m* {
        quietly: clogit cc `var', group(id) or
        est store `var'
    }
Show results
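The same loop can be tried on public data. This sketch assumes the matched low-birth-weight example dataset from webuse (internet connection needed), where low is the case indicator and pairid identifies the matched pair; the results table at the end is one possible way to do the "show results" step, not necessarily the one used in the lecture.

```stata
* Stand-in data (assumption): 1:1 matched case-control data
webuse lowbirth2, clear

* One conditional logistic model per exposure, stored under the exposure's name
foreach var of varlist lwt smoke ptd ht ui {
    quietly clogit low `var', group(pairid) or
    estimates store `var'
}

* Show the stored results side by side as odds ratios
estimates table lwt smoke ptd ht ui, eform
```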
54
Multivariable analysis
Stepwise
    stepwise, pe(0.25): clogit cc m2 m4 m5 m12 m13 m18, group(id) or
Final model:
55
Stata regression commands
56
Regression with simple error structure
– regress     linear regression (also heteroskedastic errors)
– nl          nonlinear least squares
GLM
– logistic    logistic regression
– poisson     Poisson regression
– binreg      binary outcome, OR, RR, or RD effect measures
Conditional logistic
– clogit      for matched case-control data
Multiple outcome
– mlogit      multinomial logit (not ordered)
– ologit      ordered logit
Regression with complex error structure
– xtmixed     linear mixed models
– xtlogit     random effect logistic
57
Syntax
Estimation
– regress y x1 x2              linear regression
– logistic y x1 x2             logistic regression
– xi: regress y x1 i.x2        categorical x2
Manage results
– estimates store m1           store results
– estimates table m1 m2        table of results
– estimates stats m1 m2        statistics of results
Post estimation
– predict y, xb                linear prediction
– predict res, resid           residuals
– lincom b0+2*b3               linear combination
Help
– help logistic postestimation