Stata 3, Regression. Hein Stigum, 2 July 2015. Presentation, data and programs at: http://folk.uio.no/heins/

1 Stata 3, Regression. Hein Stigum. Presentation, data and programs at: http://folk.uio.no/heins/

2 Agenda
– Linear regression
– GLM
– Logistic regression
– Binary regression (conditional logistic)

3 Linear regression: birth weight by gestational age

4 Regression idea

5 Model and assumptions
Model
Assumptions
– Independent errors
– Linear effects
– Constant error variance

6 Association measure: RD
Model:
Start with:
Hence:
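The slide's equations were not preserved. A sketch of the usual derivation, assuming μ is the expected outcome (the symbol used later in this deck), x_1 the exposure and x_2 a single covariate:

```latex
\begin{aligned}
\text{Model:}      \quad & \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mu(x_1 + 1, x_2) - \mu(x_1, x_2)
   = [\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2] - [\beta_0 + \beta_1 x_1 + \beta_2 x_2] \\
\text{Hence:}      \quad & \text{RD} = \beta_1
\end{aligned}
```

The covariate term cancels, so β_1 is the difference in expected outcome per unit of exposure at any fixed value of x_2.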

7 Purpose of regression
Estimation
– Estimate the association between outcome and exposure, adjusted for other covariates
Prediction
– Use an estimated model to predict the outcome given covariates in a new dataset

8 Adjusting for confounders
Do not adjust
– Cofactor is a collider
– Cofactor is on the causal path
May or may not adjust
– Cofactor has missing values
– Cofactor has measurement error

9 Workflow
Scatterplots
Bivariate analysis
Regression
– Model fitting
    Cofactors in/out
    Interactions
– Test of assumptions
    Independent errors
    Linear effects
    Constant error variance
– Influence (robustness)

10 Scatterplot

11 Syntax
Estimation
– regress y x1 x2 (linear regression)
– xi: regress y x1 i.c1 (categorical c1)
Post estimation
– predict yf, xb (linear prediction)
Manage models
– estimates store m1 (save model)
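A minimal do-file sketch of the syntax above, run on simulated data because the course's birth-weight file is not reproduced here. Variable names and the sex coding (1 = boy, 2 = girl) follow the slides; the simulation coefficients are made up. It uses factor-variable notation `i.sex` (Stata 11 and later), which does the job of the `xi:` prefix.

```stata
* Hypothetical simulated data, not the course dataset
clear
set seed 2015
set obs 200
generate id = _n
generate sex = 1 + (runiform() < 0.5)              // 1 = boy, 2 = girl
generate gest = round(rnormal(280, 12))            // gestational age
generate weight = 3500 + 20*(gest - 280) - 120*(sex == 2) + rnormal(0, 400)

* Estimation: i.sex creates the dummy for girls on the fly
regress weight gest i.sex

* Post estimation: linear prediction
predict yf, xb

* Manage models: store the results and show coefficients with SEs
estimates store m1
estimates table m1, b se
```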

12 Model 1: outcome + exposure

13 Model 2: add confounders
Estimated association: m1 = m2
Prediction: m2 is best

14 "Dummies"
Assume educ is coded 1, 2, 3 for low, medium and high education. Choose low education as the reference and make dummies for the two other categories:
generate medium=(educ==2) if educ<.
generate high  =(educ==3) if educ<.
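A small sketch, on nine made-up observations, showing that the `if educ<.` condition keeps a missing education value missing in the dummies, and that `tabulate, generate()` builds equivalent 0/1 variables automatically. In Stata 11 and later, writing `ib1.educ` in a model command gives the same coding with low education as reference.

```stata
* Hypothetical toy data: educ = 1 (low), 2 (medium), 3 (high), one missing value
clear
set obs 9
generate educ = mod(_n - 1, 3) + 1
replace educ = . in 9

* Dummies as on the slide; "if educ<." keeps missing educ missing
generate medium = (educ == 2) if educ < .
generate high   = (educ == 3) if educ < .

* Alternative: one 0/1 variable per category (ed1, ed2, ed3)
tabulate educ, generate(ed)
list educ medium high ed1 ed2 ed3
```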

15 Interaction
Model:
Start with:
Hence:
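The slide's equations were not preserved. A sketch of the usual derivation with a product term, assuming the same symbols as a plain linear model (μ expected outcome, x_1 exposure, x_2 effect modifier):

```latex
\begin{aligned}
\text{Model:}      \quad & \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & \mu(x_1 + 1, x_2) - \mu(x_1, x_2)
   = [\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2 + \beta_3 (x_1 + 1) x_2]
   - [\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2] \\
\text{Hence:}      \quad & \text{RD}(x_2) = \beta_1 + \beta_3 x_2
   \qquad (\beta_1 \text{ if } x_2 = 0,\ \ \beta_1 + \beta_3 \text{ if } x_2 = 1)
\end{aligned}
```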

16 Model 3: with interaction
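A sketch of Model 3 on simulated data (made-up coefficients; `girl` and `gestXgirl` are hypothetical names). The product term lets the gestational-age slope differ by sex, and `lincom` adds the two coefficients to give the slope among girls, i.e. β_1 + β_3 in a model μ = β_0 + β_1·gest + β_2·girl + β_3·gest·girl.

```stata
* Hypothetical simulated data, not the course dataset
clear
set seed 2015
set obs 200
generate sex = 1 + (runiform() < 0.5)              // 1 = boy, 2 = girl
generate gest = round(rnormal(280, 12))
generate weight = 3500 + 20*(gest - 280) - 120*(sex == 2) + rnormal(0, 400)

* Explicit dummy and product term
generate girl = (sex == 2)
generate gestXgirl = gest*girl

regress weight gest girl gestXgirl
estimates store m3

* Slope of gest among boys is _b[gest]; among girls it is this sum
lincom gest + gestXgirl
```

The same model can be fitted with factor-variable notation as `regress weight i.sex##c.gest`.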

17 Test of assumptions
Predict y and residuals
– predict y, xb
– predict res, resid
Plot residuals vs predicted y
– Independent?
– Linear?
– Constant variance?
twoway (scatter res y) (qfitci res y)
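A runnable version of the residual check on simulated data (hypothetical coefficients). The predicted value is named `yhat` here to avoid confusion with an outcome called y; `rvfplot` is Stata's built-in residual-versus-fitted plot and is shown as a shortcut, not as the slide's method.

```stata
* Hypothetical simulated data, not the course dataset
clear
set seed 2015
set obs 200
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 12))
generate weight = 3500 + 20*(gest - 280) - 120*(sex == 2) + rnormal(0, 400)

regress weight gest i.sex
predict yhat, xb              // predicted y
predict res, residuals        // residuals

* Residuals vs predicted y, with a quadratic fit and its confidence band
twoway (scatter res yhat) (qfitci res yhat)

* Built-in equivalent: residual-versus-fitted plot with a reference line at 0
rvfplot, yline(0)
```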

18 Violations of assumptions
Dependent residuals
– Mixed models: xtmixed
Non-linear effects
– gen gest2=gest^2
– regress weight gest gest2 sex
Non-constant variance
– regress weight gest sex, robust
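A sketch of the two fixes on simulated data (made-up coefficients). The squared term is named `gest_sq` here, a hypothetical name, so it does not clash with the centred `gest2` used in the final model; `vce(robust)` is the current spelling of the `, robust` option and gives heteroscedasticity-robust standard errors.

```stata
* Hypothetical simulated data, not the course dataset
clear
set seed 2015
set obs 200
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 12))
generate weight = 3500 + 20*(gest - 280) - 120*(sex == 2) + rnormal(0, 400)

* Non-linear effect: add a squared term and test it
generate gest_sq = gest^2
regress weight gest gest_sq i.sex
test gest_sq

* Non-constant variance: robust (sandwich) standard errors
regress weight gest i.sex, vce(robust)
```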

19 Measures of influence
Measure change in:
– Outcome (y)
– Deviance
– Coefficients (beta): delta-beta, Cook's distance
Remove obs 1 and see the change, then remove obs 2 and see the change, and so on.
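The "remove one observation, see the change" idea can be sketched directly as a loop, here on simulated data with hypothetical coefficients. `delta_b` is the raw change in the gest coefficient; Stata's own DFBETA (`predict ..., dfbeta(gest)`) scales this change by a standard error, and `cooksd` summarizes the change over all coefficients.

```stata
* Hypothetical simulated data, not the course dataset
clear
set seed 2015
set obs 200
generate id = _n
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 12))
generate weight = 3500 + 20*(gest - 280) - 120*(sex == 2) + rnormal(0, 400)

* Full-sample fit and Cook's distance
quietly regress weight gest i.sex
predict cook, cooksd
scalar b_all = _b[gest]

* Drop one observation at a time and record the change in the gest coefficient
generate double delta_b = .
forvalues i = 1/`=_N' {
    quietly regress weight gest i.sex if id != `i'
    quietly replace delta_b = b_all - _b[gest] if id == `i'
}

* Most influential observations (by Cook's distance) first
gsort -cook
list id cook delta_b in 1/5
```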

20 Points with high influence
lvr2plot, mlabel(id)

21 Added variable plot: gestational age
avplot gest, mlabel(id)

22 Removing outlier

23 Influence

24 Final model
sum gest                    /* find smallest value */
generate gest2=gest-204     /* smallest gest=204 */
generate sex2=sex-1         /* boys=0, girls=1 */
regress weight gest2 sex2   /* final model */
estimates store m4
Give meaning to the constant term:
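Reading the constant off the recoding in the code comments (smallest gest = 204, boys = 0 after `sex2=sex-1`), a short sketch of why centring gives the constant a meaning:

```latex
\begin{aligned}
\text{Model:} \quad & E(\text{weight}) = \beta_0 + \beta_1\,\text{gest2} + \beta_2\,\text{sex2},
  \qquad \text{gest2} = \text{gest} - 204,\ \ \text{sex2} = \text{sex} - 1 \\
\text{Boy, gest} = 204: \quad & \text{gest2} = 0,\ \ \text{sex2} = 0
  \;\Rightarrow\; E(\text{weight}) = \beta_0
\end{aligned}
```

So the constant is the expected birth weight of a boy at the smallest observed gestational age, instead of an extrapolation to gest = 0.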

25 Logistic regression: being bullied

26 Model and assumptions
Model
Assumptions
– Independent residuals
– Linear effects

27 Association measure: odds ratio
Model:
Start with:
Hence:
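The slide's equations were not preserved. A sketch of the usual derivation, assuming μ is the probability of the outcome, odds = μ/(1 − μ), x_1 the exposure and x_2 one covariate:

```latex
\begin{aligned}
\text{Model:}      \quad & \ln\frac{\mu}{1-\mu} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \ln \text{odds}(x_1 + 1, x_2) - \ln \text{odds}(x_1, x_2)
   = [\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2] - [\beta_0 + \beta_1 x_1 + \beta_2 x_2] = \beta_1 \\
\text{Hence:}      \quad & \text{OR} = \frac{\text{odds}(x_1 + 1, x_2)}{\text{odds}(x_1, x_2)} = e^{\beta_1}
\end{aligned}
```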

28 Syntax
Estimation
– logistic y x1 x2 (logistic regression)
– xi: logistic y x1 i.c1 (categorical c1)
Post estimation
– predict yf, pr (predict probability)
Manage models
– estimates store m1 (save model)
– est table m1, eform (show OR)
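A minimal sketch of the logistic syntax on simulated data. The bullying survey is not reproduced; the country codes 1 to 5, the ages and all simulation coefficients below are hypothetical.

```stata
* Hypothetical simulated data, not the course's bullying survey
clear
set seed 2015
set obs 1000
generate id = _n
generate country = 1 + floor(5*runiform())      // codes 1-5
generate sex = 1 + (runiform() < 0.5)
generate age = 11 + floor(5*runiform())          // made-up ages 11-15
generate bullied = runiform() < invlogit(-1.5 + 0.4*(country == 3) + 0.3*(sex == 1) - 0.1*(age - 13))

* Estimation: logistic reports odds ratios by default
logistic bullied i.country sex age

* Post estimation: predicted probability
predict yf, pr

* Manage models: eform shows exponentiated coefficients (ORs)
estimates store m1
estimates table m1, eform
```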

29 Workflow
Bivariate analysis
Regression
– Model fitting
    Cofactors in/out
    Interactions
– Test of assumptions
    Independent errors
    Linear effects
– Influence (robustness)

30 Bivariate
Generate dummies (Island = Iceland):
gen Island=(country==2) if country<.
gen Norway=(country==3) if country<.
gen Finland=(country==4) if country<.
gen Denmark=(country==5) if country<.

31 Model 1: outcome and exposure
xi: logistic bullied i.country (use xi: and i.var for categorical variables)
xi: logistic bullied i.country, coef (coefficients instead of ORs)
xi: logistic bullied i.country if sex!=. & age!=. (run only where sex and age are not missing)
Alternative commands:
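One way to guarantee that Model 1 uses the same observations as Model 2, sketched on simulated data with hypothetical coefficients: fit Model 2 first, then restrict Model 1 with `if e(sample)`. This is a swapped-in alternative to the slide's condition; note that `sex!=.` does not exclude extended missing values (.a to .z), whereas `e(sample)` or `!missing(sex, age)` does.

```stata
* Hypothetical simulated data, not the course's bullying survey
clear
set seed 2015
set obs 1000
generate country = 1 + floor(5*runiform())
generate sex = 1 + (runiform() < 0.5)
generate age = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.4*(country == 3) + 0.3*(sex == 1) - 0.1*(age - 13))
replace age = . if runiform() < 0.05              // some missing ages

* Model 2 first: its estimation sample drops the missing ages
logistic bullied i.country sex age
estimates store m2
scalar n2 = e(N)

* Model 1 on exactly the same observations
logistic bullied i.country if e(sample)
estimates store m1

estimates table m1 m2, eform
```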

32 Model 2: add confounders
Estimated associations: m1 = m2
Prediction: m2 is best

33 Interaction
Model:
Start with:
Hence:
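The slide's equations were not preserved. A sketch of the usual derivation for a logistic model with a product term, using the same symbols as for the odds ratio (μ probability, odds = μ/(1 − μ)):

```latex
\begin{aligned}
\text{Model:}      \quad & \ln\frac{\mu}{1-\mu} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & \ln \text{odds}(x_1 + 1, x_2) - \ln \text{odds}(x_1, x_2) = \beta_1 + \beta_3 x_2 \\
\text{Hence:}      \quad & \text{OR}(x_2) = e^{\beta_1 + \beta_3 x_2}
   \qquad (e^{\beta_1} \text{ if } x_2 = 0,\ \ e^{\beta_1} e^{\beta_3} \text{ if } x_2 = 1)
\end{aligned}
```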

34 Model 3: interaction

35 Test of assumptions
Linear effects (of age)
– findit lincheck (search for and install)
– lincheck xi:logistic bullied age i.country sex
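`lincheck` is a community-contributed command and its syntax is not checked here. A built-in alternative, swapped in as an assumption rather than the slide's method: add a squared age term (hypothetical name `age_sq`) and test it with a Wald test. Data are simulated with made-up coefficients.

```stata
* Hypothetical simulated data, not the course's bullying survey
clear
set seed 2015
set obs 1000
generate country = 1 + floor(5*runiform())
generate sex = 1 + (runiform() < 0.5)
generate age = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.4*(country == 3) + 0.3*(sex == 1) - 0.1*(age - 13))

* Quadratic term for age; a small p-value suggests a non-linear age effect
generate age_sq = age^2
logistic bullied age age_sq i.country sex
test age_sq
```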

36 Points with high influence
estimates restore m2 (restore best model)
predict p, p (probability; mu in our notation)
predict db, db (delta-beta: one summary value per observation, not one per coefficient)
scatter db p (delta-beta plot)
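A runnable sketch of the delta-beta plot on simulated data (hypothetical coefficients). `dbeta` is the full name of the `db` option after `logistic`; it is Pregibon's influence statistic and is computed per covariate pattern, so observations sharing a pattern get the same value.

```stata
* Hypothetical simulated data, not the course's bullying survey
clear
set seed 2015
set obs 1000
generate id = _n
generate country = 1 + floor(5*runiform())
generate sex = 1 + (runiform() < 0.5)
generate age = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.4*(country == 3) + 0.3*(sex == 1) - 0.1*(age - 13))

logistic bullied i.country sex age
estimates store m2

estimates restore m2
predict p, pr           // predicted probability (mu)
predict db, dbeta       // Pregibon delta-beta
scatter db p, mlabel(id)

* Largest delta-beta first
gsort -db
list id bullied p db in 1/5
```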

37 Removing 2 observations
Conclusion: robust results

38 Generalized linear models: being bullied

39 Designs and measures
Models     Measures
GLM        RR, RD, OR
Survival   Rate ratio

40 Generalized Linear Models, GLM
– Linear regression
– Logistic regression
– Poisson regression

41 GLM: distribution and link
Distribution family
– Given by the data
– Influences p-values and CIs
Link function
– May choose
– Shape (= link⁻¹)
– Scale
– Association measure
Distribution families: Normal, Binomial, Poisson
Link       Scale            Measure
Identity   Additive         RD
Logit      Multiplicative   OR
Log        Multiplicative   RR
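In standard GLM notation (assumed here, since the slide's symbols were not preserved), with μ the expected outcome and η the linear predictor, the three links and their shapes (inverse links) are:

```latex
\begin{aligned}
& g(\mu) = \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \qquad \mu = g^{-1}(\eta) \\
\text{Identity:} \quad & g(\mu) = \mu, \qquad g^{-1}(\eta) = \eta \\
\text{Logit:}    \quad & g(\mu) = \ln\frac{\mu}{1-\mu}, \qquad g^{-1}(\eta) = \frac{e^{\eta}}{1 + e^{\eta}} \\
\text{Log:}      \quad & g(\mu) = \ln \mu, \qquad g^{-1}(\eta) = e^{\eta}
\end{aligned}
```

Because the identity link is linear in β, effects add (RD); the logit and log links turn added β terms into multiplied odds or risks (OR, RR).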

42 Distribution and link examples
Link: identity → linear model → additive scale
Note: not for traditional case-control data

43 Being bullied, 3 models
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)      /* OR */
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)        /* RR */
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)   /* RD */

44 Convergence problems
If glm does not converge, use:
– poisson y x1 x2, irr robust (RR)
– regress y x1 x2, robust (RD)
Stop
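A sketch of the two fallbacks on a simulated 0/1 outcome (hypothetical coefficients). Robust standard errors are needed because a Poisson or normal model is misspecified for a binary outcome; the coefficients still estimate the log RR (reported as RR with `irr`) and the RD, but the model-based standard errors are not reliable.

```stata
* Hypothetical simulated data, not the course's bullying survey
clear
set seed 2015
set obs 1000
generate country = 1 + floor(5*runiform())
generate sex = 1 + (runiform() < 0.5)
generate age = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.4*(country == 3) + 0.3*(sex == 1) - 0.1*(age - 13))

* RR: Poisson regression with robust SEs on a 0/1 outcome
poisson bullied i.country sex age, irr vce(robust)
estimates store rr

* RD: linear regression with robust SEs on a 0/1 outcome
regress bullied i.country sex age, vce(robust)
estimates store rd
```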

45 Association measure: RR
Model:
Start with:
Hence:
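The slide's equations were not preserved. A sketch of the usual derivation for a log link, assuming μ is the risk, x_1 the exposure and x_2 one covariate:

```latex
\begin{aligned}
\text{Model:}      \quad & \ln \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \ln \mu(x_1 + 1, x_2) - \ln \mu(x_1, x_2)
   = [\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2] - [\beta_0 + \beta_1 x_1 + \beta_2 x_2] = \beta_1 \\
\text{Hence:}      \quad & \text{RR} = \frac{\mu(x_1 + 1, x_2)}{\mu(x_1, x_2)} = e^{\beta_1}
\end{aligned}
```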

46 Association measure: RD
Model:
Start with:
Hence:
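For a binomial outcome with identity link, the algebra is the linear-regression one, but μ is now a risk, so β_1 is a difference in risks. A sketch with the same assumed symbols; note that nothing in the identity link keeps the fitted μ inside [0, 1], which is one reason such models can fail to converge.

```latex
\begin{aligned}
\text{Model:}      \quad & \mu = P(y = 1 \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mu(x_1 + 1, x_2) - \mu(x_1, x_2) = \beta_1 \\
\text{Hence:}      \quad & \text{RD} = \beta_1
\end{aligned}
```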

47 The importance of scale
Additive scale (absolute increase, RD)
– Females: 30 - 20 = 10
– Males: 20 - 10 = 10
– Conclusion: same increase for males and females
Multiplicative scale (relative increase, RR)
– Females: 30/20 = 1.5
– Males: 20/10 = 2.0
– Conclusion: larger increase for males
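The same numbers written as interaction contrasts (a standard summary, not shown on the slide): the additive contrast is 0, so there is no interaction on the RD scale, while the multiplicative contrast differs from 1, so there is interaction on the RR scale.

```latex
\begin{aligned}
\text{Additive:}       \quad & (30 - 20) - (20 - 10) = 10 - 10 = 0 \\
\text{Multiplicative:} \quad & \frac{30/20}{20/10} = \frac{1.5}{2.0} = 0.75
\end{aligned}
```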

48 Conditional logistic regression
For matched case-control data

49 Truths and misconceptions
Cohort studies
– Exposed and unexposed should be as similar as possible, except for exposure
– Matching removes confounding
Case-control studies
– Cases and controls should be as similar as possible, except for disease
– Matching removes confounding
[Figure: exposed vs unexposed; diseased (cases) vs healthy (controls)]

50 Matching and analysis
Unmatched (age)
– Ordinary model
– May adjust for age
– May interpret age effect
Frequency matched (age)
– Ordinary model
– Must adjust for age
– Cannot interpret age effect
One-to-one matched (age)
– Conditional model
– No effect measure for age

51 Data preparation
Save as tab-delimited text in Excel
Read and fix in Stata
– insheet using "file.txt", clear
– mvdecode m*, mv(9) (recode 9 to missing in all m* variables)
– gsort id -cc (sort by id, then by cc in descending order)

52 Syntax
Estimation
– clogit y x1 x2, group(id) (conditional logistic)
– clogit y x1 x2, group(id) or (OR instead of coefficients)
Post estimation
– predict yf, pc1 (predict probability of being the case, assuming one case per group)
Manage models
– estimates store m1 (save model)

53 Bivariate analysis
Loop through all variables:
foreach var of varlist m* {
    quietly: clogit cc `var', group(id) or
    est store `var'
}
Show results
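A self-contained sketch of the loop on simulated 1:1 matched data: 100 matched sets with one case (cc = 1) and one control each. The exposures `m2` and `m4` and their prevalences are made up. The final line is one possible way to show the stored results side by side as odds ratios.

```stata
* Hypothetical matched data, not the course dataset
clear
set seed 2015
set obs 100
generate id = _n
expand 2
bysort id: generate cc = (_n == 1)           // one case and one control per id
generate m2 = runiform() < 0.3 + 0.2*cc      // exposure more common in cases
generate m4 = runiform() < 0.4               // exposure unrelated to case status

* Bivariate conditional logistic models, one per exposure
foreach var of varlist m* {
    quietly clogit cc `var', group(id) or
    estimates store `var'
}

* Show results: odds ratios side by side
estimates table m2 m4, eform
```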

54 Multivariable analysis
Stepwise:
stepwise, pe(0.25): clogit cc m2 m4 m5 m12 m13 m18, group(id) or
Final model:

55 Stata regression commands

56 Regression with simple error structure
– regress (linear regression; also heteroscedastic errors via robust SEs)
– nl (non-linear least squares)
GLM
– logistic (logistic regression)
– poisson (Poisson regression)
– binreg (binary outcome; OR, RR or RD effect measures)
Conditional logistic
– clogit (for matched case-control data)
Outcomes with more than two categories
– mlogit (multinomial logit, unordered categories)
– ologit (ordered logit)
Regression with complex error structure
– xtmixed (linear mixed models)
– xtlogit (random-effects logistic regression)

57 Syntax
Estimation
– regress y x1 x2 (linear regression)
– logistic y x1 x2 (logistic regression)
– xi: regress y x1 i.x2 (categorical x2)
Manage results
– estimates store m1 (store results)
– estimates table m1 m2 (table of results)
– estimates stats m1 m2 (model statistics such as AIC and BIC)
Post estimation
– predict y, xb (linear prediction)
– predict res, resid (residuals)
– lincom b0+2*b3 (linear combination of coefficients)
Help
– help logistic postestimation

