Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins
Andy Bogart, MS
Jack Goldberg, PhD
Multiple Informant Data: Military Service in Vietnam

id   pairid   PTSD   self report   military rec.
1    1        45     yes           no
2    1        17     yes
3    2        66     no            yes
4    2        58     yes
Command:
regress ptsd sr, robust

Linear regression
Number of obs =
F( 1, 10794) =
Prob > F =
R-squared =
Root MSE =

     | Robust
ptsd | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
  sr |
_cons |

Self Report      sr |
Command:
regress ptsd mr, robust

Linear regression
Number of obs =
F( 1, 10710) =
Prob > F =
R-squared =
Root MSE =

     | Robust
ptsd | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
  mr |
_cons |

Self Report      sr |
Military Record  mr |
Model 1: The General Multiple Source Model

Equation terms: expected outcome; intercept; source indicators; source by exposure interaction terms

Generates the same estimates as the k marginal source-specific models
Allows testing for a difference in sources
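The slide's equation did not survive extraction; only its term labels remain. A plausible reconstruction for k = 2 sources is sketched below. It is inferred from the regression of ptsd on s1, z1, and z2 that the slides fit later, not copied from the slide itself. The coefficient symbols \beta_0, \beta_1, \gamma_1, \gamma_2 are notation assumed here for illustration.

```latex
\begin{aligned}
E[\text{ptsd}] &= \beta_0 + \beta_1 s_1 + \gamma_1 z_1 + \gamma_2 z_2,
  \qquad z_j = \text{service} \times s_j \\
\text{self-report rows } (s_1 = 1):\quad
E[\text{ptsd}] &= (\beta_0 + \beta_1) + \gamma_1\,\text{sr} \\
\text{military-record rows } (s_2 = 1):\quad
E[\text{ptsd}] &= \beta_0 + \gamma_2\,\text{mr}
\end{aligned}
```

Under this reading each source has its own intercept and slope, which is why the stacked model reproduces the separate source-specific fits, and a test of \gamma_1 = \gamma_2 (test z1 = z2 in Stata) compares the two sources.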
Data columns: id, pairid, PTSD, sr, mr

Command:
expand 2
generate service=0
by id: replace service = sr if _n==1
by id: replace service = mr if _n==2

Data columns: id, pairid, PTSD, service
Command:
generate s1 = 0
generate s2 = 0
by id: replace s1 = 1 if _n==1
by id: replace s2 = 1 if _n==2
generate z1 = service * s1
generate z2 = service * s2

Data columns: id, pairid, PTSD, service, s1, s2, z1, z2
Command:
xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

Iteration 1: tolerance = 7.894e-14

GEE population-averaged model
Number of obs =
Group variable: pin
Number of groups =
Link: identity
Family: Gaussian
Correlation: independent
Obs per group: min = 1, avg = 2.0, max = 2
Wald chi2(3) =
Prob > chi2 =
Scale parameter:
Pearson chi2(21508):
Deviance =
Dispersion (Pearson):
Dispersion =

(Std. Err. adjusted for clustering on pin)

     | Semi-robust
ptsd | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
  s1 |
  z1 |
  z2 |
_cons |

Self Report      sr |
Military Record  mr |
But wait... these guys are twins! Data within twin pairs might be correlated...
Command:
svyset id [pweight = sampweight], strata(pairid)

pweight: sampweight
VCE: linearized
Strata 1: pairid
SU 1: id
FPC 1:
Command:
svy: regress ptsd s1 z1 z2

Survey: Linear regression
Number of strata = 6172
Number of PSUs =
Number of obs =
Population size =
Design df = 6172
F( 3, 6170) =
Prob > F =
R-squared =

         | Linearized
logptsd2 | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
      s1 |
      z1 |
      z2 |
   _cons |

Note: 35 strata omitted because they contain no population members

Self Report      sr |
Military Record  mr |
Command:
test z1 = z2

Adjusted Wald test
( 1) z1 - z2 = 0
chi2( 1) =
Prob > chi2 =

Self Report      sr |
Military Record  mr |

Moral of the story: The two sources contain different information. We should not combine them.

Or, should we??
Model 2: Multiple Source Model of Within- and Between-pair exposure effects

Equation terms: intercept; source indicators; within-pair source by within-pair effect interaction terms; between-pair source by between-pair effect interaction terms

Same estimates as k separate marginal within & between models
Allows testing for a difference in reports of within effects & between effects
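The equation for this slide was also lost in extraction. A hedged reconstruction for k = 2 sources follows, inferred from the later regression of ptsd on s1, z1diff, z1bar, z2diff, and z2bar. The symbols \omega_j (within-pair effect for source j) and \delta_j (between-pair effect for source j) are notation assumed here, not the authors'.

```latex
\begin{aligned}
E[\text{ptsd}] &= \beta_0 + \beta_1 s_1
  + \sum_{j=1}^{2} \bigl( \omega_j\, z_j^{\text{diff}} + \delta_j\, \bar z_j \bigr), \\
\bar z_j &= \text{pair mean of } z_j \text{ on rows with } s_j = 1
  \ (0 \text{ otherwise}), \qquad
z_j^{\text{diff}} = z_j - \bar z_j
\end{aligned}
```

In this notation, comparing \omega_1 with \omega_2 asks whether the sources agree on the within-pair effect, and comparing \delta_1 with \delta_2 asks the same of the between-pair effect.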
Command:
bysort pairid: egen z1bar = mean(z1) if s1==1
bysort pairid: replace z1bar=0 if s1==0
generate z1diff = z1 - z1bar

Data columns: id, pairid, s1, z1, z1bar, z1diff
Command (Repeat that procedure to make z2bar and z2diff)
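A minimal sketch of that repeated step for the second source, mirroring the z1bar and z1diff commands with s2 and z2 substituted. It assumes the long-format data already contain pairid, s2, and z2 as built earlier; the slides do not show these lines, so treat them as an illustration of the pattern rather than the authors' exact code.

```stata
* Pair mean of the military-record exposure, computed on source-2 rows only
bysort pairid: egen z2bar = mean(z2) if s2==1

* Source-1 rows carry no military-record information, so set them to 0
replace z2bar = 0 if s2==0

* Within-pair deviation from the pair mean (0 on source-1 rows, where z2 is 0)
generate z2diff = z2 - z2bar
```

The egen call leaves z2bar missing on rows where s2==0, which is why the replace step is needed before computing z2diff.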
Command:
svy: regress ptsd s1 z1diff z1bar z2diff z2bar

Survey: Linear regression
Number of strata = 6172
Number of PSUs =
Number of obs =
Population size =
Design df = 6172
F( 5, 6168) =
Prob > F =
R-squared =

       | Linearized
  ptsd | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
    s1 |
z1diff |
 z1bar |
z2diff |
 z2bar |
 _cons |

Note: 35 strata omitted because they contain no population members
Command:
test z1diff = z2diff

Adjusted Wald test
( 1) z1diff - z2diff = 0
F( 1, 6172) = 0.36
Prob > F =

Command:
test z1bar = z2bar

Adjusted Wald test
( 1) z1bar - z2bar = 0
F( 1, 6172) =
Prob > F =

Within-pair estimates don't differ much.
Between-pair estimates do!!

Moral of the story:
1. Combine the within-pair info.
2. Keep between-pair info. separate.
Model 3: Multiple Source Model with a Combined within-pair effect

Equation terms: intercept; source indicators; combined-source within-pair effect; between-pair source by between-pair effect interaction terms

Assumes within-pair effect to be common to all k sources
Often yields a more precise estimate of the within-pair effect
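As with the earlier models, the slide's equation is missing from the extracted text. A sketch for k = 2 sources, inferred from the regression of ptsd on s1, wservice, z1bar, and z2bar fitted afterwards, is given below. The symbols \omega (common within-pair effect) and \delta_j (between-pair effect for source j) are assumed notation, and w stands for the wservice variable.

```latex
\begin{aligned}
E[\text{ptsd}] &= \beta_0 + \beta_1 s_1 + \omega\, w
  + \delta_1 \bar z_1 + \delta_2 \bar z_2, \\
w &= z_1^{\text{diff}} + z_2^{\text{diff}}
\end{aligned}
```

Written this way, the model amounts to giving z1diff and z2diff a single shared coefficient while leaving the two between-pair coefficients free to differ.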
Data columns: id, pairid, z1diff, z2diff, wservice

Command:
generate wservice = z1diff + z2diff
Command:
svy: regress ptsd s1 wservice z1bar z2bar

Survey: Linear regression
Number of strata = 6172
Number of PSUs =
Number of obs =
Population size =
Design df = 6172
F( 4, 6169) =
Prob > F =
R-squared =

         | Linearized
logptsd2 | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
      s1 |
wservice |
   z1bar |
   z2bar |
   _cons |

Note: 35 strata omitted because they contain no population members
Conclusions from VET Registry analysis

Sources differed in Model 1, so we did not combine them overall.
Within-pair estimates in Model 2 did not differ much by source, so...
Model 3 combined within-pair estimates.

Within-pair estimate:
Combined Record 0.16 (0.14, 0.19)
7 to 14% gain in efficiency over individual sources

Between-pair estimates in Model 2 differed significantly.
Model 3 estimates separate between-pair effects for each source.

Source-specific between-pair estimates:
Self Report 0.19 (0.17, 0.20)
Military Record 0.15 (0.13, 0.16)
Future Directions

Accommodate covariate adjustment
Compare pooled estimators to "AND" and "OR" type derived exposure variables
Address zygosity within regression models
Acknowledgements & References

Jack Goldberg at UW

Margaret Pepe at UW
1. Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:

Nicholas Horton at Harvard
2. Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:
Thank you for listening