Published by Johnathan Bryan. Modified over 9 years ago.
1
Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins
Andy Bogart, MS
Jack Goldberg, PhD
2
Multiple Informant Data: Military Service in Vietnam

id  pairid  PTSD  self report  military rec.
1   1       45    yes          no
2   1       17    yes          yes
3   2       66    no           yes
4   2       58    yes          yes
6
Command: regress ptsd sr, robust

Linear regression                               Number of obs =   10796
                                                F(  1, 10794) =  639.43
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0599
                                                Root MSE      =  .34613
------------------------------------------------------------------------------
             |               Robust
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          sr |   .1793066   .0070909    25.29   0.000     .1654071     .193206
       _cons |   3.130085   .0039722   788.00   0.000     3.122299    3.137871
------------------------------------------------------------------------------

Self Report       sr |   .1793066   .0070909
7
Command: regress ptsd mr, robust

Linear regression                               Number of obs =   10712
                                                F(  1, 10710) =  440.68
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0423
                                                Root MSE      =  .34992
------------------------------------------------------------------------------
             |               Robust
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mr |    .152672   .0072727    20.99   0.000      .138416    .1669279
       _cons |   3.144166   .0040245   781.26   0.000     3.136277    3.152054
------------------------------------------------------------------------------

Self Report       sr |   .1793066   .0070909
Military Record   mr |    .152672   .0072727
8
Model 1: The General Multiple Source Model

Terms of the model equation, as labeled on the slide: expected outcome, intercept, source indicators, source by exposure interaction terms.

Generates the same estimates as the k marginal source-specific models
Allows testing for a difference in sources
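Only the labels of the slide's equation survived, not the equation itself. The following is a reconstruction for the two-source case, written to match those labels and the regressors used in the fitted command later in the deck (s1, z1, z2). The symbols \beta_0, \gamma_1, \beta_1, \beta_2 and the subscripts are notation assumed here, not taken from the slide.

```latex
% i indexes the person; r indexes the stacked record
% (r = 1: self report, r = 2: military record). Notation is assumed.
\[
\begin{aligned}
E[Y_{ir}] &= \beta_0 + \gamma_1\, s_{1,ir} + \beta_1\, z_{1,ir} + \beta_2\, z_{2,ir},
\qquad z_{j,ir} = \mathrm{service}_{ir} \times s_{j,ir} \\
\text{self-report record } (s_1 = 1):\quad E[Y_{ir}] &= (\beta_0 + \gamma_1) + \beta_1\, \mathrm{sr}_i \\
\text{military-record record } (s_2 = 1):\quad E[Y_{ir}] &= \beta_0 + \beta_2\, \mathrm{mr}_i
\end{aligned}
\]
```

Under this reading, the intercept of the self-report regression should equal \beta_0 + \gamma_1. The fitted values in the deck are consistent with that: 3.144166 - 0.0140807 = 3.130085, the _cons reported by regress ptsd sr.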
10
Command: expand 2

Before:
id  pairid  PTSD  sr  mr
1   1       45    1   0
2   1       17    1   1
3   2       66    0   1
4   2       58    1   1

After:
id  pairid  PTSD  sr  mr
1   1       45    1   0
2   1       17    1   1
3   2       66    0   1
4   2       58    1   1
1   1       45    1   0
2   1       17    1   1
3   2       66    0   1
4   2       58    1   1
12
Command: generate service=0

id  pairid  PTSD  sr  mr  service
1   1       45    1   0   0
2   1       17    1   1   0
3   2       66    0   1   0
4   2       58    1   1   0
1   1       45    1   0   0
2   1       17    1   1   0
3   2       66    0   1   0
4   2       58    1   1   0
13
Command: by id: replace service = sr if _n==1

id  pairid  PTSD  sr  mr  service
1   1       45    1   0   1
2   1       17    1   1   1
3   2       66    0   1   0
4   2       58    1   1   1
1   1       45    1   0   0
2   1       17    1   1   0
3   2       66    0   1   0
4   2       58    1   1   0
14
Command: by id: replace service = mr if _n==2

id  pairid  PTSD  sr  mr  service
1   1       45    1   0   1
2   1       17    1   1   1
3   2       66    0   1   0
4   2       58    1   1   1
1   1       45    1   0   0
2   1       17    1   1   1
3   2       66    0   1   1
4   2       58    1   1   1
15
Command: (not shown on the slide)

id  pairid  PTSD  service
1   1       45    1
2   1       17    1
3   2       66    0
4   2       58    1
1   1       45    0
2   1       17    1
3   2       66    1
4   2       58    1
16
Commands:
generate s1 = 0
generate s2 = 0

id  pairid  PTSD  service  s1  s2
1   1       45    1        0   0
2   1       17    1        0   0
3   2       66    0        0   0
4   2       58    1        0   0
1   1       45    0        0   0
2   1       17    1        0   0
3   2       66    1        0   0
4   2       58    1        0   0
17
Commands:
by id: replace s1 = 1 if _n==1
by id: replace s2 = 1 if _n==2

id  pairid  PTSD  service  s1  s2
1   1       45    1        1   0
2   1       17    1        1   0
3   2       66    0        1   0
4   2       58    1        1   0
1   1       45    0        0   1
2   1       17    1        0   1
3   2       66    1        0   1
4   2       58    1        0   1
18
Commands:
generate z1 = service * s1
generate z2 = service * s2

id  pairid  PTSD  service  s1  s2  z1  z2
1   1       45    1        1   0   1   0
2   1       17    1        1   0   1   0
3   2       66    0        1   0   0   0
4   2       58    1        1   0   1   0
1   1       45    0        0   1   0   0
2   1       17    1        0   1   0   1
3   2       66    1        0   1   0   1
4   2       58    1        0   1   0   1
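The data steps on the preceding slides can be collected into one do-file. This is a minimal sketch on the four-person toy data, not the presenters' actual script. The input block and the explicit sort are additions made here: by id: requires data sorted by id, which expand 2 on its own does not guarantee. Because of the sort, rows come out grouped by id rather than in the slide order.

```stata
* Toy data from the slides: one row per twin, exposure from two sources
clear
input id pairid ptsd sr mr
1 1 45 1 0
2 1 17 1 1
3 2 66 0 1
4 2 58 1 1
end

* Stack two identical copies of every person, one per source
expand 2
* Added here (assumption): the by-prefix below needs data sorted by id
sort id

* service = exposure as reported by the source that the record stands for
generate service = 0
by id: replace service = sr if _n == 1
by id: replace service = mr if _n == 2

* Source indicators: first copy is self report, second is military record
generate s1 = 0
generate s2 = 0
by id: replace s1 = 1 if _n == 1
by id: replace s2 = 1 if _n == 2

* Source-by-exposure interaction terms
generate z1 = service * s1
generate z2 = service * s2

list id pairid ptsd service s1 s2 z1 z2, sepby(id)
```

The two copies of each person are identical before service is filled in, so it does not matter which copy the sort places first.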
19
Command: xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

Iteration 1: tolerance = 7.894e-14

GEE population-averaged model                   Number of obs      =     21508
Group variable:                        pin      Number of groups   =     10809
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       2.0
Correlation:                   independent                     max =         2
                                                Wald chi2(3)       =    640.25
Scale parameter:                  .1210952      Prob > chi2        =    0.0000

Pearson chi2(21508):               2604.52      Deviance           =   2604.52
Dispersion (Pearson):             .1210952      Dispersion         =  .1210952

                                    (Std. Err. adjusted for clustering on pin)
------------------------------------------------------------------------------
             |             Semi-robust
        ptsd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0140807   .0016444    -8.56   0.000    -.0173037   -.0108576
          z1 |   .1793066   .0070906    25.29   0.000     .1654093    .1932038
          z2 |    .152672   .0072724    20.99   0.000     .1384183    .1669256
       _cons |   3.144166   .0040243   781.30   0.000     3.136278    3.152053
------------------------------------------------------------------------------

Compare with the separate source-specific regressions:
Self Report       sr |   .1793066   .0070909
Military Record   mr |    .152672   .0072727
20
But wait... these guys are twins! Data within twin pairs might be correlated...
22
Command: svyset id [pweight = sampweight], strata(pairid)

      pweight: sampweight
          VCE: linearized
     Strata 1: pairid
         SU 1: id
        FPC 1:
23
Command: svy: regress ptsd s1 z1 z2

Survey: Linear regression

Number of strata   =      6172                 Number of obs      =      24557
Number of PSUs     =     12344                 Population size    =      21508
                                               Design df          =       6172
                                               F(   3,   6170)    =     230.51
                                               Prob > F           =     0.0000
                                               R-squared          =     0.0511
------------------------------------------------------------------------------
             |             Linearized
    logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0140807    .001642    -8.58   0.000    -.0172995   -.0108619
          z1 |   .1793066    .006818    26.30   0.000     .1659408    .1926723
          z2 |    .152672   .0069024    22.12   0.000     .1391409     .166203
       _cons |   3.144166   .0035541   884.66   0.000     3.137198    3.151133
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members

Compare with the separate source-specific regressions:
Self Report       sr |   .1793066   .0070909
Military Record   mr |    .152672   .0072727
24
Command: test z1 = z2

. test z1 = z2

 ( 1)  z1 - z2 = 0

           chi2(  1) =   44.89
         Prob > chi2 =    0.0000

. test z1 = z2

Adjusted Wald test

 ( 1)  z1 - z2 = 0

           chi2(  1) =   45.66
         Prob > chi2 =    0.0000

Self Report       sr |   .1793066    .006818
Military Record   mr |    .152672   .0069024

Moral of the story: The two sources contain different information. We should not combine them.

Or, should we??
25
Model 2: Multiple Source Model of Within- and Between-pair exposure effects

Terms of the model equation, as labeled on the slide: intercept, source indicators, within-pair source by within-pair effect interaction terms, between-pair source by between-pair effect interaction terms.

Same estimates as k separate marginal within & between models
Allows testing for a difference in reports of within effects & between effects
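As with Model 1, only the labels of the equation survived. The following reconstruction for two sources matches those labels and the regressors in the fitted command (s1, z1diff, z1bar, z2diff, z2bar). The coefficient names \beta_{Wj} and \beta_{Bj} are assumed here, not taken from the slide.

```latex
% \bar z_j is the twin-pair mean of z_j among source-j records (0 on other records);
% W = within-pair effect, B = between-pair effect. Notation is assumed.
\[
\begin{aligned}
E[Y_{ir}] = \beta_0 + \gamma_1\, s_{1,ir}
  &+ \beta_{W1}\,(z_{1,ir} - \bar z_{1,ir}) + \beta_{B1}\,\bar z_{1,ir} \\
  &+ \beta_{W2}\,(z_{2,ir} - \bar z_{2,ir}) + \beta_{B2}\,\bar z_{2,ir}
\end{aligned}
\]
```

In the Stata variable names used in the deck, z_j - \bar z_j is zjdiff and \bar z_j is zjbar.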
26
id  pairid  s1  z1
1   1       1   1
2   1       1   1
1   1       0   0
2   1       0   0
27
Command: bysort pairid: egen z1bar = mean(z1) if s1==1

id  pairid  s1  z1  z1bar
1   1       1   1   1
2   1       1   1   1
1   1       0   0   .
2   1       0   0   .
28
Commands:
bysort pairid: egen z1bar = mean(z1) if s1==1
bysort pairid: replace z1bar=0 if s1==0

id  pairid  s1  z1  z1bar
1   1       1   1   1
2   1       1   1   1
1   1       0   0   0
2   1       0   0   0
29
Commands:
bysort pairid: egen z1bar = mean(z1) if s1==1
bysort pairid: replace z1bar=0 if s1==0

id  pairid  s1  z1  z1bar
3   2       1   0   0.5
4   2       1   1   0.5
3   2       0   0   0
4   2       0   0   0
30
Commands:
bysort pairid: egen z1bar = mean(z1) if s1==1
bysort pairid: replace z1bar=0 if s1==0

id  pairid  s1  z1  z1bar
1   1       1   1   1
2   1       1   1   1
3   2       1   0   0.5
4   2       1   1   0.5
1   1       0   0   0
2   1       0   0   0
3   2       0   0   0
4   2       0   0   0
31
Commands:
bysort pairid: egen z1bar = mean(z1) if s1==1
bysort pairid: replace z1bar=0 if s1==0
generate z1diff = z1 - z1bar

id  pairid  s1  z1  z1bar  z1diff
1   1       1   1   1       0
2   1       1   1   1       0
3   2       1   0   0.5    -0.5
4   2       1   1   0.5     0.5
1   1       0   0   0       0
2   1       0   0   0       0
3   2       0   0   0       0
4   2       0   0   0       0
32
(Repeat that procedure to make z2bar and z2diff)
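Repeating the procedure for the second source might look like the sketch below. It mirrors the z1 commands with s2 and z2 substituted and runs on the stacked toy data, which is re-entered here only so the example is self-contained. The input block is an addition, not part of the slides.

```stata
* Stacked toy data: first four rows are self-report records (s1 == 1),
* last four are military-record records (s2 == 1)
clear
input id pairid s1 s2 z1 z2
1 1 1 0 1 0
2 1 1 0 1 0
3 2 1 0 0 0
4 2 1 0 1 0
1 1 0 1 0 0
2 1 0 1 0 1
3 2 0 1 0 1
4 2 0 1 0 1
end

* Pair mean of the military-record exposure, on military-record rows only
bysort pairid: egen z2bar = mean(z2) if s2 == 1
* Rows belonging to the other source get 0 instead of missing
replace z2bar = 0 if s2 == 0
* Within-pair deviation from the pair mean
generate z2diff = z2 - z2bar

list id pairid s2 z2 z2bar z2diff, sepby(pairid)
```

On the toy data, pair 1 is discordant on the military record (0 and 1), so its z2bar is 0.5 and its z2diff values are -0.5 and 0.5. Pair 2 is concordant, so its z2diff is 0.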
33
Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar

Survey: Linear regression

Number of strata   =      6172                 Number of obs      =      24557
Number of PSUs     =     12344                 Population size    =      21508
                                               Design df          =       6172
                                               F(   5,   6168)    =     154.41
                                               Prob > F           =     0.0000
                                               R-squared          =     0.0512
------------------------------------------------------------------------------
             |             Linearized
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0182144   .0016726   -10.89   0.000    -.0214933   -.0149355
      z1diff |   .1669005   .0134838    12.38   0.000     .1404675    .1933335
       z1bar |   .1857651   .0074393    24.97   0.000     .1711816    .2003487
      z2diff |   .1618065   .0138901    11.65   0.000      .134577     .189036
       z2bar |   .1482027   .0074941    19.78   0.000     .1335116    .1628937
       _cons |   3.145802   .0037693   834.58   0.000     3.138413    3.153191
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members
36
Command: test z1diff = z2diff

Adjusted Wald test

 ( 1)  z1diff - z2diff = 0

       F(  1,  6172) =    0.36
            Prob > F =    0.5509
37
Command: test z1diff = z2diff

Adjusted Wald test

 ( 1)  z1diff - z2diff = 0

       F(  1,  6172) =    0.36
            Prob > F =    0.5509

Command: test z1bar = z2bar

Adjusted Wald test

 ( 1)  z1bar - z2bar = 0

       F(  1,  6172) =   83.66
            Prob > F =    0.0000

Within-pair estimates don’t differ much. Between-pair estimates do!!

Moral of the story:
1. Combine the within-pair info.
2. Keep between-pair info. separate
38
Model 3: Multiple Source Model with a Combined within-pair effect

Terms of the model equation, as labeled on the slide: intercept, source indicators, combined source within-pair effect, between-pair source by between-pair effect interaction terms.

Assumes within-pair effect to be common to all k sources
Often yields a more precise estimate of the within-pair effect
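A reconstruction of the Model 3 equation for two sources, matching the slide labels and the regressors in the fitted command (s1, wservice, z1bar, z2bar). The symbols are assumed here, not taken from the slide. The one change from a model with source-specific within-pair effects is that a single coefficient \beta_W multiplies the sum of the two within-pair deviations.

```latex
% wservice = z1diff + z2diff; on any given record only one of the two is nonzero.
% Notation is assumed.
\[
\begin{aligned}
E[Y_{ir}] = \beta_0 + \gamma_1\, s_{1,ir}
  &+ \beta_{W}\,\bigl[(z_{1,ir} - \bar z_{1,ir}) + (z_{2,ir} - \bar z_{2,ir})\bigr] \\
  &+ \beta_{B1}\,\bar z_{1,ir} + \beta_{B2}\,\bar z_{2,ir}
\end{aligned}
\]
```

This is the source-specific within-pair model with the constraint \beta_{W1} = \beta_{W2} = \beta_W imposed.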
39
id  pairid  z1diff  z2diff
1   1        0       0
2   1        0       0
3   2       -0.5     0
4   2        0.5     0
1   1        0      -0.5
2   1        0       0.5
3   2        0       0
4   2        0       0
40
Command: generate wservice = z1diff + z2diff

id  pairid  z1diff  z2diff  wservice
1   1        0       0       0
2   1        0       0       0
3   2       -0.5     0      -0.5
4   2        0.5     0       0.5
1   1        0      -0.5    -0.5
2   1        0       0.5     0.5
3   2        0       0       0
4   2        0       0       0
41
Command: svy: regress ptsd s1 wservice z1bar z2bar

Survey: Linear regression

Number of strata   =      6172                 Number of obs      =      24557
Number of PSUs     =     12344                 Population size    =      21508
                                               Design df          =       6172
                                               F(   4,   6169)    =     192.48
                                               Prob > F           =     0.0000
                                               R-squared          =     0.0512
------------------------------------------------------------------------------
             |             Linearized
    logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0182138   .0016722   -10.89   0.000    -.0214919   -.0149358
    wservice |   .1644434   .0129988    12.65   0.000     .1389611    .1899256
       z1bar |   .1857654   .0074392    24.97   0.000     .1711819    .2003489
       z2bar |   .1482022   .0074941    19.78   0.000     .1335111    .1628933
       _cons |   3.145802   .0037693   834.59   0.000     3.138412    3.153191
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members
45
Conclusions from VET Registry analysis

Model 1: Sources differed in Model 1, so we did not combine them overall
Model 2: Within-pair estimates in Model 2 did not differ much by source, so...
Model 3: Model 3 combined within-pair estimates

Within-pair estimate:
Combined Record 0.16 (0.14, 0.19)
7-14% gain in efficiency over individual sources
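The slides do not say how the efficiency gain was computed. One reading that reproduces the reported range is to take relative efficiency as the ratio of squared standard errors. The source-specific within-pair standard errors from Model 2 (0.0134838 for self report, 0.0138901 for military record) are each compared with the combined within-pair standard error from Model 3 (0.0129988). The definition is an assumption made here, not the presenters' stated method.

```latex
% Relative efficiency taken as a variance ratio (assumed definition).
\[
\left(\frac{0.0134838}{0.0129988}\right)^{2} \approx 1.076
\qquad\text{and}\qquad
\left(\frac{0.0138901}{0.0129988}\right)^{2} \approx 1.142
\]
```

Under that definition the combined estimate is about 7.6% more efficient than the self-report estimate and about 14.2% more efficient than the military-record estimate, in line with the 7-14% quoted on the slide.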
46
Conclusions from VET Registry analysis

Model 2: Between-pair estimates in Model 2 differed significantly
Model 3: Model 3 estimates separate between-pair effects for each source

Source-specific between-pair estimates:
Self Report      0.19 (0.17, 0.20)
Military Record  0.15 (0.13, 0.16)
47
Future Directions

Accommodate covariate adjustment
Compare pooled estimators to “AND” and “OR” type derived exposure variables
Address zygosity within regression models
48
Acknowledgements & References

Jack Goldberg at UW
Margaret Pepe at UW
Nicholas Horton at Harvard

1. Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18: 163-173.
2. Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23: 2911-2933.
49
Thank you for listening