Introduction to analysis DAGitty


1 Introduction to analysis DAGitty
Hein Stigum
Presentation, data and programs at: courses
Schedule:
Time         Topic                                       Duration  Block (total)
09:00-09:45  Continuous symmetrical                      0:45
10:00-10:45  (Skewed) and categorical + intro to groups  0:45      Bivariate (1:30)
11:00-11:45  Groups                                      0:45
11:45-12:30  Lunch                                       0:45
12:30-13:15  Groups cont.                                0:45      Groups (1:30)
13:30-14:15  Plenary                                     0:45
14:30-16:00  Regression                                  1:30      Regression (1:30)
Nov-18 H.S.

2 Agenda
Factors that influence analysis
Concepts
DAGitty
Bivariate analysis
  Continuous symmetrical
  Continuous skewed
  Categorical
Multivariable analysis
  Linear regression
  Logistic regression
Here: emphasis on bivariate analysis. Real world: emphasis on multivariable analysis. A separate course in linear and logistic regression is needed.
Outcome variable decides analysis.

3 Factors that influence analysis

4 Factors that influence analysis
Sampling
  Simple random: simple analysis
  Stratified/clustered: weighted analysis
Design
  Cross-section: -
  Cohort: survival analysis
  Traditional case-control: logistic regression
Outcome variable type
  Decides analysis type

5 Sampling
Population: Area 1 N=100.000, Area 2 N=10.000
Random sample N=200: prevalence?
Stratified sample N=100, 100:
               Area 1       Area 2
Sample         100          100
Disease        5%           10%
Sampling p     100/100.000  100/10.000
Weights=1/p    1000         100
Unweighted average = 7.5%
Weighted average = (1000∗5% + 100∗10%)/(1000+100) = 5.5%
Stata: survey commands, prefix svy:
SPSS: ?
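
The weighted average on this slide can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the Stata `svy:` workflow the slide refers to; the dictionary layout and function names are made up for illustration, and only the numbers come from the slide.

```python
# Stratified sample from the slide: two areas, 100 people sampled in each.
# The data layout and function names are illustrative assumptions.
strata = [
    {"name": "Area 1", "population": 100_000, "sampled": 100, "prevalence": 0.05},
    {"name": "Area 2", "population": 10_000, "sampled": 100, "prevalence": 0.10},
]


def sampling_weight(stratum):
    """Weight = 1/p, where p = sampled / population."""
    return stratum["population"] / stratum["sampled"]


def unweighted_prevalence(strata):
    """Plain average of the stratum prevalences (equal sample sizes here)."""
    return sum(s["prevalence"] for s in strata) / len(strata)


def weighted_prevalence(strata):
    """Average of the stratum prevalences, weighted by 1/p."""
    total_weight = sum(sampling_weight(s) for s in strata)
    weighted_sum = sum(sampling_weight(s) * s["prevalence"] for s in strata)
    return weighted_sum / total_weight


print(round(unweighted_prevalence(strata), 3))  # 0.075
print(round(weighted_prevalence(strata), 3))    # 0.055
```

The unweighted 7.5% overstates the overall prevalence because the small, high-prevalence area is over-represented in the sample.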

6 Sampling and analysis
Stratified sampling: use weighted analysis (Stata: svy, SPSS: ?)
Can ignore sampling if:
  Only report prevalence by strata, not overall
  Prevalences by strata are similar
  Regression: the strata act as risk factors or confounders for disease, so can adjust for stratum
[Figure: DAG with C (stratum), E and D]

7 Risk, Rate and Odds
Risk: probability, proportion, %
Rate: km/h, cases/person-time
Odds: disease per healthy person
Two mathematical quantities:
  Proportion: a fraction where the numerator (top) is part of the denominator (bottom). Example: colds in class = 2/30 = 7%. No dimension, range 0-1.
  Rate: change in one quantity per change in another (time). Example: speed; drive 100 km in 2 h, then the average speed is 50 km/h. Has dimension, no upper bound.
Statistical concept:
  Risk: probability, no dimension. Example: flip a coin.
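
The three quantities can be computed side by side using the slide's own examples. This is a minimal Python sketch; the function names are illustrative, and the odds example reuses the colds-in-class numbers, which the slide only uses for the proportion.

```python
def proportion(cases, total):
    """Cases divided by everyone (cases are part of the denominator)."""
    return cases / total


def rate(change, time):
    """Change in one quantity per unit of another (here: km per hour)."""
    return change / time


def odds(cases, total):
    """Cases divided by non-cases ('disease per healthy person')."""
    return cases / (total - cases)


print(round(proportion(2, 30), 3))  # 0.067, about 7% with a cold
print(rate(100, 2))                 # 50.0 km/h
print(round(odds(2, 30), 3))        # 0.071 colds per healthy pupil
```

For a rare outcome the odds and the proportion are close (0.071 versus 0.067); they drift apart as the outcome becomes common.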

8 Cohorts
Closed cohort: count persons, risk
Open cohort: count person-time, rate
Closed cohort with time-varying covariates: count person-time, rate
[Figure legend: line = follow-up, circle = event (disease), red line = exposed]

9 Design and analysis
Design                               Measure  Analysis
Cross-section                        -        -
Cohort, closed                       risk     Log-risk (or logistic)
Cohort, open                         rate     Cox regression
Traditional case-control, unmatched  odds     Logistic regression
Traditional case-control, matched    odds     Conditional logistic

10 Datatypes
Categorical data
  Nominal: married/single/divorced
  Ordinal: small/medium/large
  Binary
  One set of methods for categorical data, e.g. proportion married
Numerical data
  Discrete: number of children
  Continuous: weight
  One set of methods for numerical data, e.g. average weight
Coding 1, 2, 3: is 2 twice as much as 1?
Outcome variable decides analysis.

11 Outcome data type dictates type of analysis
Start with continuous data

12 Data- and regression-types
Numerical data
  Continuous (weight): linear regression
  Count (partners): Poisson regression
Categorical data
  Binary (death)
    Risk: linear-risk or log-risk model
    Rate: Cox, Poisson
    Odds: logistic
  Ordinal (small, med, large): ordered logistic
  Multinomial (marital status): multinomial logistic

13 Why Stata
Pro
  Price
  Aimed at epidemiology
  Many methods, growing
  Graphics
  Structured, programmable
  Coming soon to a course near you
  Used by leading universities, and at many summer schools
Con
  Memory > file size
  Copy tables (esttab)

14 Bias and precision
Concepts

15 Precision and bias
Measures of populations
  Precision: random error
  Bias: systematic error
[Figure: true value and estimate, with precision and bias marked]
Ignore errors at the individual level. We measure the population by measuring a sample; the aim is to measure as precisely and as correctly as possible.

16 P-values and confidence intervals
Precision

17 Precision: Estimation
[Figure: a random sample is drawn from the population; the sample gives an estimate with confidence interval, ( | )]
Contributions to uncertainty: large spread, small sample.
95% confidence interval: 95% of repeated intervals will contain the true value.

18 Precision: Testing
[Figure: population and sample, with group 1 and group 2]
Two groups that we want to compare.
p-value = P(observing this difference or more, when the true difference is zero)
So much for theory; next, practical analysis.

19 Precision: Significance level
Birth weight, 500 newborns, observed difference between boys and girls:
Difference  p-value
10 g        0.90     H0: boys=girls
50 g        0.40
100 g       0.10
130 g       0.04     Ha: boys≠girls
150 g       0.02
Significance level: p<0.05
Half are boys; standard deviation = 700 g in both groups.
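
The p-values on this slide can be approximated from the numbers in the notes (250 boys, 250 girls, standard deviation 700 g in each group). The sketch below is Python rather than the Stata used in the course, and it assumes a simple two-sample z-test with known, equal standard deviations; the slide does not say which test produced its rounded values, so expect agreement only to about one decimal.

```python
import math


def two_sample_p_value(difference, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumes equal group sizes and the same known standard deviation in
    both groups; these are simplifying assumptions for illustration.
    """
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = abs(difference) / se
    return math.erfc(z / math.sqrt(2))    # equals 2 * (1 - Phi(z))


for diff in (10, 50, 100, 130, 150):
    p = two_sample_p_value(diff, sd=700, n_per_group=250)
    print(f"{diff:>4} g  p = {p:.2f}")
```

With these assumptions the 130 g difference is the first one that falls below the 0.05 significance level, as on the slide.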

20 Precision: Test situations
1 sample test: weight = 10
2 independent samples: weight by sex
K independent samples: weight by age groups
2 dependent samples: weight last year = weight today
Analytic

21 DAGitty
Causal graphs

22 Bias: DAGs
[Figure: DAG with E = gest age, D = birth weight, C1 = sex, C2 = parity]
Associations: bivariate (unadjusted)
Causal effects: multivariable (adjusted)
Reasons for doing bivariate analysis: closer to data, find category codings, missing values, empty cells, …
Reasons for reporting bivariate analysis: ?
Draw your assumptions before your conclusions.

23 DAGitty
Free program to draw and analyze DAGs
1 h presentation + exercises, or just exercises (add more)

24 DAGitty background
DAGitty: draw DAGs, analyze DAGs, test DAGs
Web page: run or download
Johannes Textor, Theoretical Biology & Bioinformatics group, University of Utrecht

25 Interface

26 Draw model
Draw new model: Model>New model, Exposure, Outcome
New variables, connect, rename:
  n  new variable (or double click)
  c  connect (hit c over V1 and over V2 to connect)
  r  rename
  d  delete
Status (toggle on/off):
  e  exposure
  o  outcome
  u  unobserved
  a  adjusted
Example: draw the Vitamin->Birth defects DAG. Draw E and D, add the new variables Age and Obesity, connect them, then try a (adjust) and u (unobserved).

27 Export DAG
Export to Word or PowerPoint:
  "Zoom" the DAGitty drawing first (Ctrl-roll)
  Use "Snipping tool", or use Model>Export as PDF
[Figure: exported drawing without zooming and with zooming]

28 Model code
Variable list (variable, status, @ x, y):
Age 1 @ 0.151, 0.840
Birth%20defects O @ 0.468,
Obesity 1 @ 0.470,
Vitamin E @ 0.145,
Arrow list:
Age Obesity Vitamin
Obesity Birth%20defects
Vitamin Birth%20defects
May change the x and y values to align the variables.

29 Changed model code
Aligning x and y coordinates (no space after ,):
Age 1 @0.1,0.8
Birth%20defects O @0.5,1.0
Obesity 1 @0.5,0.8
Vitamin E @0.1,1.0
Arrow list:
Age Obesity Vitamin
Obesity Birth%20defects
Vitamin Birth%20defects
[Figure: variables aligned at x = 0.1 and 0.5, y = 0.8 and 1.0]
Copy, paste and Update DAG.
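
To see what DAGitty does with such an arrow list, the paths between exposure and outcome can be enumerated by hand or with a short script. The following Python sketch is not DAGitty code and does not use any DAGitty API; it reads the arrow list as "parent followed by its children" (an interpretation of the model code above), decodes `%20` as a space, and uses a simplified open/closed rule that ignores descendants of colliders.

```python
# Arrow list from the model code, read as: parent followed by its children.
arrows = {
    "Age": ["Obesity", "Vitamin"],
    "Obesity": ["Birth defects"],
    "Vitamin": ["Birth defects"],
}
edges = {(parent, child) for parent, children in arrows.items() for child in children}


def neighbours(node):
    """Nodes linked to `node` by an arrow in either direction."""
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}


def all_paths(start, goal, path=None):
    """All paths from start to goal that visit each node at most once."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in sorted(neighbours(start)):
        if nxt not in path:
            found.extend(all_paths(nxt, goal, path))
    return found


def is_backdoor(path):
    """A backdoor path starts with an arrow pointing into the exposure."""
    return (path[1], path[0]) in edges


def colliders(path):
    """Interior nodes where both neighbouring arrows point into the node."""
    return [mid for left, mid, right in zip(path, path[1:], path[2:])
            if (left, mid) in edges and (right, mid) in edges]


def is_open(path, adjusted=()):
    """Simplified rule: a path is closed by an unadjusted collider or by
    adjusting for a non-collider on it (descendants of colliders ignored)."""
    on_path_colliders = colliders(path)
    closed_by_collider = any(c not in adjusted for c in on_path_colliders)
    closed_by_adjustment = any(
        node in adjusted and node not in on_path_colliders for node in path[1:-1])
    return not (closed_by_collider or closed_by_adjustment)


for path in all_paths("Vitamin", "Birth defects"):
    kind = "backdoor" if is_backdoor(path) else "not backdoor"
    status = "open" if is_open(path) else "closed"
    print(" - ".join(path), "|", kind, "|", status)
```

The only backdoor path runs through Age and Obesity, so `is_open(path, adjusted={"Age"})` or `is_open(path, adjusted={"Obesity"})` returns False for it: adjusting for either variable closes it.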

30 Exercises

31 Exercise
Draw the Vitamin-Birth defects DAG. Use Obesity as an observed variable.
  Interpret the "Causal effect identification"
  Interpret the "Testable implications"
  Add arrow from Age to Birth defects
  Make Obesity an unobserved variable

32 Exercise
Draw the Statin-CHD DAG. Use Lifestyle as an unobserved variable.
  Interpret the "Causal effect identification" for total effects
  Interpret the "Causal effect identification" for direct effects
  Interpret the "Testable implications"
Variables: C = cholesterol, U = lifestyle, E = statin, D = CHD

33 Effects of adjustment

34 Effects of adjustment
[Figure: DAG with A, B, C, E and D]
What variables should we adjust for? What are the effects of adjustment?
Variable  Adjust  Bias                Precision
A         no      bias amplification  reduce precision (collinearity)
B         maybe   no                  improve precision (model dependent)
C         yes     remove confounding  ?
(Pearl 2011)

35 Effects of adjustment: Precision
Should we adjust for B?
DAG: no bias from B, need not adjust.
May include B to improve precision, but this depends on the model!
  Linear regression: including B gives better precision
  Logistic regression: including B gives worse precision (OR not collapsible)
Notes: E->D=1 in linear regr, C->D=2 in linear regr; D2=10% in logistic, crude E->D=1.17, adjusted E->D=1.43, no E-C interaction
Robinson and Jewell 1991; Xing and Xing 2010

36 Why plot data?

37 Problem example
Lunch meals per week
Table of means (around 5 per week)
Linear regression

38 Problem example 2
Iron level by sex
Both linear and logistic regression
Opposite results (boys: higher mean, but more deficiency)
[Figure: iron level in blood]

39 Bivariate analysis 1
Continuous symmetric outcome: Birth weight

40 Distribution
drop if weight<2000
kdensity weight
kdensity weight

41 Central tendency and dispersion
Mean and standard deviation: Std Dev for data
Mean with confidence interval: Std Err for estimate
3600-2*550=2500, 3600+2*550=4700
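
The difference between "Std Dev for data" and "Std Err for estimate" is easiest to see with numbers. The following Python sketch uses the mean (3600 g) and standard deviation (550 g) from the slide; the sample size `n = 500` is an assumption made for illustration, since this slide does not state it, and the "mean ± 2" rule assumes roughly normal data.

```python
import math

mean, sd = 3600, 550  # grams, from the slide
n = 500               # ASSUMPTION: sample size is not given on this slide

# The standard deviation describes the data:
# roughly 95% of birth weights lie within mean +/- 2 SD.
data_range = (mean - 2 * sd, mean + 2 * sd)

# The standard error describes the estimate:
# the mean itself is known far more precisely than a single weight.
se = sd / math.sqrt(n)
ci_for_mean = (mean - 2 * se, mean + 2 * se)

print(data_range)                            # (2500, 4700)
print(tuple(round(x) for x in ci_for_mean))  # (3551, 3649)
```

The interval for the data is about 2200 g wide, while the interval for the mean is only about 100 g wide under the assumed sample size.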

42 Compare groups, equal variance?
Not equal
Compare boys and girls. Focusing on means or focusing on the low tail gives opposite results!

43 2 independent samples
Are birth weights the same for boys and girls?
Scatterplot: to see linear/non-linear effect, look for outliers
Density plot: to see equal variance

44 2 independent samples test
ttest weight, by(sex) unequal     unequal variances
ttest var1==var2                  paired test

45 K independent samples
Is birth weight the same over parity?
Scatterplot: to see linear/non-linear effect, look for outliers
Density plot: to see equal variance (possibly a bit larger variance in the 2-7 (red) group)

46 K independent samples test
Equal means? Equal variances?

47 Continuous by continuous
Does birth weight depend on gestational age?
[Figures: scatterplot; scatterplot with outlier dropped]

48 Continuous by continuous tests
Cut gestational age up in groups, then use t-test or ANOVA
or
Use linear regression with 1 covariate

49 Test situations
1 sample test:                 ttest weight == 10
2 independent samples:         ttest weight, by(sex)
K independent samples:         oneway weight parity
2 dependent samples (paired):  ttest weight_last_year == weight_today
Note: with two variables, as in ttest weight0==weight1, Stata assumes a paired test.

50 Bivariate analysis 2
Continuous skewed outcome: Number of sexual partners

51 Distribution
kdensity partners if partners<=50
The 75% fractile is lower here than on the next slide because observations with partners>50 are dropped here.

52 Central tendency and dispersion
Median and percentiles, confidence interval options:
  cci     binomial exact; conservative confidence interval
  normal  normal, based on observed centiles
  meansd  normal, based on mean and standard deviation

53 2 independent samples
Do males and females have the same number of partners?
Scatterplot: to see linear/non-linear effect, look for outliers
Density plot: to see equal variance
Unequal variance! Test somewhat problematic.
N=400 and quite skewed, so nonparametric tests are probably needed. If N is large and/or the data are not very skewed, the means will still follow a normal distribution and standard (parametric) tests will do.

54 2 independent samples test
Equal medians?

55 K independent samples
Do partners vary with age?
Scatterplot: to see linear/non-linear effect, look for outliers. Problems with unequal variance.
Density plot: difficult to read, no apparent differences.

56 K independent samples test
Equal medians?
Probably a cohort effect rather than an age effect.

57 Table of descriptives

58 Table of tests
Remarks:
  If unequal variance in ANOVA: use linear regression with robust variance estimation
  If N is large: may use parametric tests
  Categorical ordered: use nonparametric tests
  Mann-Whitney U = Wilcoxon rank sum

59 Bivariate analysis 3
Categorical outcome: Being bullied

60 Frequency and proportion
Proportion:
Proportion with CI:
May standardize, adjust for clusters, or use bootstrap or jackknife estimation.
May weight if stratified sample.

61 Proportion, confidence interval
x = "disease", n = total number
proportion: p = x/n
standard error: se(p) = sqrt(p(1-p)/n)
95% confidence interval: p ± 1.96*se(p)
How much must n increase to get half the standard error?
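
The question on this slide can be answered numerically. The sketch below is Python, uses the standard Wald formulas for a proportion (an assumption, since the slide's own formulas were not preserved in the transcript), and works on hypothetical counts.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Proportion, its standard error, and a Wald confidence interval."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# HYPOTHETICAL counts: the same proportion in a sample four times as large.
p1, se1, ci1 = proportion_ci(17, 100)
p4, se4, ci4 = proportion_ci(68, 400)

print(round(se1 / se4, 2))  # 2.0
```

Because the standard error shrinks with the square root of n, the sample must be four times as large to halve it.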

62 Crosstables
Are boys bullied as much as girls?
Equal proportions?

63 Ordered categories, trend
Equal proportions?

64 Table of tests
Remarks:
  If unequal variance in ANOVA: use linear regression with robust variance estimation
  If N is large: may use parametric tests
  Categorical ordered: use nonparametric tests
  Mann-Whitney U = Wilcoxon rank sum

65 Multivariable analysis 1
Continuous outcome: Linear regression, Birth weight

66 Regression idea
X = gestational age, Y = weight
Straight line: y = constant + slope times x
beta0 = constant or intercept, beta1 = regression coefficient or slope
y = dependent variable, continuous; x=..
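
The straight-line idea can be made concrete with an ordinary least-squares fit. This is a minimal Python sketch rather than the Stata `regress` command the course would use, and the five data points are hypothetical values invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx                # slope
    beta0 = mean_y - beta1 * mean_x  # constant (intercept)
    return beta0, beta1


# HYPOTHETICAL data: gestational age in days, birth weight in grams.
gest = [260, 270, 280, 290, 300]
weight = [3100, 3350, 3600, 3800, 4100]

beta0, beta1 = fit_line(gest, weight)
print(round(beta1, 1))  # 24.5 grams per extra day of gestation
print(round(beta0))     # -3270, the predicted weight at gest = 0
```

The slope is the quantity of interest; the constant is the prediction at x = 0, which here is far outside the data.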

67 Model and assumptions
Model
Association measure
  beta1 = increase in y for one unit increase in x1
Assumptions
  Independent errors
  Linear effects
  Constant error variance
Robustness
  Influence
Note: in GLM, b's and betas are both coefficients. In some fields (and in SPSS), b's are coefficients and betas are standardized coefficients.

68 Workflow
DAG
Scatterplots
Bivariate analysis
Regression
  Model estimation
  Test of assumptions: independent errors, linear effects, constant error variance
  Robustness: influence
[Figure: DAG with E = gest age, D = birth weight, C1 = sex, C2 = parity]
Notes:
  Model fitting: adjust if the cofactor is a confounder. Problems: cofactor has missing values, cofactor has error, cofactor is in the causal path.
  Test of assumptions: in Stata the estimation remains in memory; use post-estimation commands.
  Normally do assumptions first, then influence. Note the leverage value of the outlier.

69 Categorical covariates
2 categories: OK
3+ categories: use "dummies"
"Dummies" are 0/1 variables used to create contrasts.
Want 3 categories for parity: 0, 1 and 2-7 children. Choose 0 as reference and make dummies for the two other categories:
generate Parity1 = (parity==1) if parity<.
generate Parity2_7 = (parity>=2) if parity<.
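
The same dummy coding can be written out in a general-purpose language to make the logic explicit. The following Python sketch mirrors the two Stata `generate` lines above; the function name is illustrative, and `None` stands in for Stata's missing value.

```python
def parity_dummies(parity):
    """Return (Parity1, Parity2_7) with parity 0 as the reference category.

    A missing parity (None) gives missing dummies, like `if parity<.`
    in the Stata commands.
    """
    if parity is None:
        return None, None
    return int(parity == 1), int(parity >= 2)


for parity in [0, 1, 2, 5, None]:
    print(parity, parity_dummies(parity))
```

The reference category (0 children) is the one where both dummies are 0, so each regression coefficient is a contrast against mothers with no previous children.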

70 Create meaningful constant
Expected birth weight at:
  gest=0, sex=0, parity=0: not meaningful
  gest=280, sex=1, parity=0: meaningful
Sex is coded 1=boys and 2=girls.

71 Model estimation

72 Test of assumptions
Plot residuals versus predicted y:
  Independent residuals?
  Linear effects?
  Constant variance?
Dependent residuals if many children from the same mother.

73 Violations of assumptions
Dependent residuals: use mixed models or GEE (linear mixed models: xtmixed)
Non-linear effects: add square term
Non-constant variance: use robust variance estimation

74 Influence
Beta changes from 6 to 16 when removing influential outlier.

75 Measures of influence
Remove obs 1, see change; remove obs 2, see change; and so on.
Measure change in:
  Predicted outcome
  Deviance
  Coefficients (beta): delta beta

76 Delta beta for gestational age
If obs nr 539 is removed, beta will change from 6 to 16.
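
The "remove one observation, see the change" idea can be sketched directly. The Python code below computes the raw change in the slope when each observation is left out; this is an unscaled illustration, not Stata's `dfbeta` (which is standardized), and the data, including the outlier in the last position, are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def delta_betas(xs, ys):
    """Change in slope when each observation is removed in turn."""
    full = slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        without_i = slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(without_i - full)
    return changes


# HYPOTHETICAL data; the last observation is an outlier
# (very short gestation recorded with a high birth weight).
gest = [260, 270, 280, 290, 300, 200]
weight = [3100, 3350, 3600, 3800, 4100, 3900]

changes = delta_betas(gest, weight)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
print(most_influential)  # 5, the outlier
```

Plotting these changes against the observation number gives the kind of delta-beta plot the slide describes, with one point standing far away from the rest.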

77 Removing outlier
Full model
Outlier removed
One outlier affected two estimates.
Final model

78 Multivariable analysis 2
Binary outcome: Logistic regression, Being bullied

79 Ordered categories and model
Interval versus ordered scale:
  Interval scale: 1, 2, 3
  Ordered scale: low, medium, high
Categories  Regression model
2           Logistic
3-7         Ordinal logistic
>7          Linear (treat as interval)

80 Logistic model and assumptions
Association measure
  Odds ratio in y for 1 unit increase in x1
Assumptions
  Independent errors
  Linear effects on the log odds scale
Robustness
  Influence

81 Being bullied
We want the total effect of country on being bullied.
The risk of being bullied depends on age and sex. The age and sex distribution may differ between countries. Should we adjust for age and sex?
[Figure: DAG with E = country, D = bullied, C1 = age, C2 = sex; a noncausal open path is a biasing path]
No, age and sex are mediating variables.

82 Logistic: being bullied
Roughly: the same risk of being bullied in Iceland as in Sweden, 2 times the risk in Norway as in Sweden, and 3 times the risk in Finland.
Prevalence of being bullied = 17%
OR versus RR:
  OR ≈ RR if the outcome is rare
  OR > RR (further from 1) if the outcome is common
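
The relation between OR and RR can be checked with a small calculation. The Python sketch below compares a rare and a common outcome with the same risk ratio of 2; the risks are hypothetical values chosen for illustration, not estimates from the bullying data.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    return risk_exposed / risk_unexposed


def odds(risk):
    return risk / (1 - risk)


def odds_ratio(risk_exposed, risk_unexposed):
    return odds(risk_exposed) / odds(risk_unexposed)


# HYPOTHETICAL risks, both with a risk ratio of 2.
rare = (0.02, 0.01)    # rare outcome
common = (0.34, 0.17)  # common outcome

print(round(risk_ratio(*rare), 2), round(odds_ratio(*rare), 2))      # 2.0 2.02
print(round(risk_ratio(*common), 2), round(odds_ratio(*common), 2))  # 2.0 2.52
```

With a common outcome the odds ratio lies further from 1 than the risk ratio, so reading an OR as "times the risk" is only a rough approximation.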

83 Summing up
DAGs: state prior knowledge, guide analysis
Plots: linearity, variance, outliers
Bivariate analysis
  Continuous symmetrical: mean, t-test, ANOVA
  Continuous skewed: median, nonparametric
  Categorical: freq, cross, chi-square
Multivariable analysis
  Continuous: linear regression
  Binary: logistic regression

