Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins


Andy Bogart, MS
Jack Goldberg, PhD

Multiple Informant Data: Military Service in Vietnam

id   pairid   PTSD   self report   military rec.
1    1        45     yes           no
2    1        17     yes           yes
3    2        66     no            yes
4    2        58     yes           yes

Vietnam service is defined as having a self report or a military report (or both) of having served in Vietnam.
All veterans provided valid post-traumatic stress disorder (PTSD) data and at least one valid report of Vietnam service (yes/no).
All Vietnam veterans also provided at least one valid report of Purple Heart receipt (yes/no).


Command: regress ptsd sr, robust

Linear regression                               Number of obs =   10796
                                                F( 1, 10794)  =  639.43
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0599
                                                Root MSE      =  .34613

------------------------------------------------------------------------------
             |               Robust
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          sr |   .1793066   .0070909    25.29   0.000     .1654071     .193206
       _cons |   3.130085   .0039722   788.00   0.000     3.122299    3.137871
------------------------------------------------------------------------------

Self Report:        sr |  .1793066   .0070909

Command: regress ptsd mr, robust

Linear regression                               Number of obs =   10712
                                                F( 1, 10710)  =  440.68
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0423
                                                Root MSE      =  .34992

------------------------------------------------------------------------------
             |               Robust
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mr |    .152672   .0072727    20.99   0.000      .138416    .1669279
       _cons |   3.144166   .0040245   781.26   0.000     3.136277    3.152054
------------------------------------------------------------------------------

Self Report:        sr |  .1793066   .0070909
Military Record:    mr |   .152672   .0072727

Model 1: The General Multiple Source Model

expected outcome = intercept
                   + source indicators
                   + source by exposure interaction terms

Generates the same estimates as the k marginal source-specific models.
Allows testing for a difference in sources.
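The equation on this slide did not survive in the transcript. One way to write the model that the labels describe is sketched below. The notation is an assumption, not the authors': Y_i is the outcome for veteran i, X_{ij} is the exposure reported by source j, and s_{ijm} is the indicator that a row comes from source m.

```latex
\[
E\left[ Y_i \mid X_{ij} \right]
  = \beta_0
  + \sum_{m=1}^{k-1} \alpha_m \, s_{ijm}
  + \sum_{m=1}^{k} \beta_m \, s_{ijm} X_{ij},
\qquad
s_{ijm} = \mathbf{1}[\, j = m \,].
\]
% With k = 2 sources this reduces to
% E[ptsd] = beta_0 + alpha_1 * s1 + beta_1 * z1 + beta_2 * z2,
% where z1 = service * s1 and z2 = service * s2.
```

Under this reading, source j has its own intercept (beta_0 + alpha_j, with alpha_k = 0) and its own slope beta_j, which is why the pooled fit can reproduce the k separate source-specific regressions.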

Multiple Informant Data

id   pairid   PTSD   self report   military rec.
1    1        45     yes           no
2    1        17     yes           yes
3    2        66     no            yes
4    2        58     yes           yes

Command: expand 2

[Table: columns id, pairid, PTSD, sr, mr; every row of the original data now appears twice]

Command: generate service=0

[Table: columns id, pairid, PTSD, sr, mr, service]

Command: by id: replace service = sr if _n==1

[Table: same columns; the first row for each id now carries the self report in service]

Command: by id: replace service = mr if _n==2

[Table: same columns; the second row for each id now carries the military record in service]

[Table: resulting data with columns id, pairid, PTSD, service]

Commands:
    generate s1 = 0
    generate s2 = 0

[Table: columns id, pairid, PTSD, service, s1, s2]

Commands:
    by id: replace s1 = 1 if _n==1
    by id: replace s2 = 1 if _n==2

[Table: same columns; s1 = 1 on the first row for each id, s2 = 1 on the second]

Commands:
    generate z1 = service * s1
    generate z2 = service * s2

[Table: columns id, pairid, PTSD, service, s1, s2, z1, z2]
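The data steps above can be run end to end on a small example. The following is a minimal sketch rather than the authors' do-file: the four veterans are toy values patterned on the example table, and the variable `source` is a hypothetical helper that is not on the slides. It uses `bysort` because `by id:` requires the data to be sorted by id, which is not guaranteed after `expand 2` appends the copies.

```stata
* Toy data only: four hypothetical veterans in two twin pairs.
clear
input id pairid ptsd sr mr
1 1 45 1 0
2 1 17 1 1
3 2 66 0 1
4 2 58 1 1
end

* Duplicate every row, then number the copies within veteran.
* source is a hypothetical helper: 1 = self report, 2 = military record.
expand 2
bysort id: generate source = _n

* One exposure column holding the report from the matching source.
generate service = cond(source == 1, sr, mr)

* Source indicators and source-by-exposure interaction terms.
generate s1 = (source == 1)
generate s2 = (source == 2)
generate z1 = service * s1
generate z2 = service * s2

* Sanity checks: two rows per veteran, and each interaction term
* is zero on the rows that belong to the other source.
by id: assert _N == 2
assert z1 == 0 if s2 == 1
assert z2 == 0 if s1 == 1

list id pairid ptsd service s1 s2 z1 z2, sepby(id)
```

The single `cond()` call replaces the slides' generate-then-replace sequence; both approaches should give the same `service` column when every veteran has both reports.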

Command: xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

Self Report:        sr |  .1793066   .0070909
Military Record:    mr |   .152672   .0072727

Iteration 1: tolerance = 7.894e-14

GEE population-averaged model                   Number of obs      =     21508
Group variable:                        pin      Number of groups   =     10809
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       2.0
Correlation:                   independent                     max =         2
                                                Wald chi2(3)       =    640.25
Scale parameter:                  .1210952      Prob > chi2        =    0.0000

Pearson chi2(21508):               2604.52      Deviance           =   2604.52
Dispersion (Pearson):             .1210952      Dispersion         =  .1210952

                                   (Std. Err. adjusted for clustering on pin)
------------------------------------------------------------------------------
             |             Semi-robust
        ptsd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0140807   .0016444    -8.56   0.000    -.0173037   -.0108576
          z1 |   .1793066   .0070906    25.29   0.000     .1654093    .1932038
          z2 |    .152672   .0072724    20.99   0.000     .1384183    .1669256
       _cons |   3.144166   .0040243   781.30   0.000     3.136278    3.152053
------------------------------------------------------------------------------
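A quick arithmetic check shows how this pooled fit reproduces the two separate regressions. The symbols are assumed notation: alpha_1 is the coefficient on s1, beta_0 the constant, and beta_1 and beta_2 the coefficients on z1 and z2. All numbers are taken from the output shown in this transcript.

```latex
\begin{align*}
\hat\beta_0 &= 3.144166
  && \text{(intercept of the military-record regression)} \\
\hat\beta_0 + \hat\alpha_1 &= 3.144166 - 0.0140807 = 3.1300853 \approx 3.130085
  && \text{(intercept of the self-report regression)} \\
\hat\beta_1 &= 0.1793066
  && \text{(slope on sr)} \\
\hat\beta_2 &= 0.152672
  && \text{(slope on mr)}
\end{align*}
```

The observation count also adds up: 10796 + 10712 = 21508 rows in the stacked data.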

But wait . . . these guys are twins! Data within twin pairs might be correlated . . .

All subjects provided at least one valid source report for the specified analyses.

Both twins provided valid PTSD data.
We created sham PTSD data for the ineligible twin, and set his sampling probability to zero.
We created a sham twin for the existing twin, and set the sham twin’s sampling probability to zero.

Command: svyset id [pweight = sampweight], strata(pairid)

      pweight: sampweight
          VCE: linearized
     Strata 1: pairid
         SU 1: id
        FPC 1: <zero>

Command: svy: regress ptsd s1 z1 z2

Self Report:        sr |  .1793066   .0070909
Military Record:    mr |   .152672   .0072727

Survey: Linear regression

Number of strata   =      6172                  Number of obs      =     24557
Number of PSUs     =     12344                  Population size    =     21508
                                                Design df          =      6172
                                                F( 3, 6170)        =    230.51
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0511

------------------------------------------------------------------------------
             |             Linearized
    logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0140807    .001642    -8.58   0.000    -.0172995   -.0108619
          z1 |   .1793066    .006818    26.30   0.000     .1659408    .1926723
          z2 |    .152672   .0069024    22.12   0.000     .1391409     .166203
       _cons |   3.144166   .0035541   884.66   0.000     3.137198    3.151133
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members

Command: test z1 = z2

Self Report:        sr |  .1793066    .006818
Military Record:    mr |   .152672   .0069024

. test z1 = z2

 ( 1)  z1 - z2 = 0

           chi2( 1) =   44.89
        Prob > chi2 =  0.0000

. test z1 = z2

Adjusted Wald test

 ( 1)  z1 - z2 = 0

           chi2( 1) =   45.66
        Prob > chi2 =  0.0000

Moral of the story: The two sources contain different information. We should not combine them. Or, should we??

Model 2: Multiple Source Model of Within- and Between-pair exposure effects

expected outcome = intercept
                   + source indicators
                   + source by within-pair effect interaction terms
                   + source by between-pair effect interaction terms

Same estimates as k separate marginal within & between models.
Allows testing for a difference in reports of within effects & between effects.
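The slide's equation is not in the transcript. A sketch of the model its labels describe follows, in assumed notation: p(i) is the twin pair of veteran i, \bar{X}_{p(i)j} is the pair mean of the source-j report, and s_{ijm} indicates that a row comes from source m.

```latex
\[
E\left[ Y_i \mid X_{ij}, \bar{X}_{p(i)j} \right]
  = \beta_0
  + \sum_{m=1}^{k-1} \alpha_m \, s_{ijm}
  + \sum_{m=1}^{k} s_{ijm}
    \left\{ \beta^{W}_m \left( X_{ij} - \bar{X}_{p(i)j} \right)
          + \beta^{B}_m \, \bar{X}_{p(i)j} \right\}.
\]
% With k = 2 this matches a regression of the outcome on
% s1, z1diff, z1bar, z2diff, z2bar:
% beta^W_1, beta^W_2 are the within-pair effects (z1diff, z2diff),
% beta^B_1, beta^B_2 are the between-pair effects (z1bar, z2bar).
```

The within-pair term compares a twin with his co-twin, so it is free of anything the pair shares; the between-pair term compares pairs with different average exposure.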

[Table: columns id, pairid, s1, z1]

Commands:
    bysort pairid: egen z1bar = mean(z1) if s1==1
    bysort pairid: replace z1bar=0 if s1==0

[Table: columns id, pairid, s1, z1, z1bar; z1bar is missing on the s1==0 rows after the first command and 0 after the second; in pair 2 the s1==1 rows have z1bar = 0.5]

Command: generate z1diff = z1 - z1bar

[Table: columns id, pairid, s1, z1, z1bar, z1diff; for id 3 (pair 2), z1bar = 0.5 and z1diff = -0.5]

Command (Repeat that procedure to make z2bar and z2diff)
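The repetition can be written as a loop over the two sources. This is a minimal sketch on toy long-format data (hypothetical values, two rows per veteran), not the authors' code; it builds the pair mean and the within-pair deviation for each source in turn.

```stata
* Toy long-format data: one row per veteran per source (hypothetical values).
clear
input id pairid s1 s2 z1 z2
1 1 1 0 1 0
1 1 0 1 0 0
2 1 1 0 1 0
2 1 0 1 0 1
3 2 1 0 0 0
3 2 0 1 0 1
4 2 1 0 1 0
4 2 0 1 0 1
end

forvalues j = 1/2 {
    * Pair mean of the source-j report, computed on source-j rows only.
    bysort pairid: egen z`j'bar = mean(z`j') if s`j' == 1
    * Rows that belong to the other source get 0, as on the slides.
    replace z`j'bar = 0 if s`j' == 0
    * Within-pair deviation from the pair mean.
    generate z`j'diff = z`j' - z`j'bar
}

* Spot checks against values that can be worked out by hand.
assert z1bar == 0.5 if pairid == 2 & s1 == 1
assert z1diff == -0.5 if id == 3 & s1 == 1
assert z2diff == 0.5 if id == 2 & s2 == 1

list id pairid s1 z1 z1bar z1diff z2 z2bar z2diff, sepby(id)
```

Real data with a missing report from one source would need extra care, since `mean()` ignores missing values and the pair mean would then rest on a single twin.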

Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar

Survey: Linear regression

Number of strata   =      6172                  Number of obs      =     24557
Number of PSUs     =     12344                  Population size    =     21508
                                                Design df          =      6172
                                                F( 5, 6168)        =    154.41
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0512

------------------------------------------------------------------------------
             |             Linearized
        ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0182144   .0016726   -10.89   0.000    -.0214933   -.0149355
      z1diff |   .1669005   .0134838    12.38   0.000     .1404675    .1933335
       z1bar |   .1857651   .0074393    24.97   0.000     .1711816    .2003487
      z2diff |   .1618065   .0138901    11.65   0.000      .134577     .189036
       z2bar |   .1482027   .0074941    19.78   0.000     .1335116    .1628937
       _cons |   3.145802   .0037693   834.58   0.000     3.138413    3.153191
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members

Commands:
    test z1diff = z2diff
    test z1bar = z2bar

Adjusted Wald test

 ( 1)  z1diff - z2diff = 0

       F( 1, 6172) =    0.36
          Prob > F =  0.5509

Adjusted Wald test

 ( 1)  z1bar - z2bar = 0

       F( 1, 6172) =   83.66
          Prob > F =  0.0000

Within-pair estimates don’t differ much.
Between-pair estimates do!!

Moral of the story: Combine the within-pair info. Keep between-pair info. separate.

Model 3: Multiple Source Model with a Combined within-pair effect

expected outcome = intercept
                   + source indicators
                   + combined source within-pair effect
                   + source by between-pair effect interaction terms

Assumes the within-pair effect to be common to all k sources.
Often yields a more precise estimate of the within-pair effect.
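The slide's equation is missing from the transcript. A sketch of the model its labels describe follows, in assumed notation: p(i) is the twin pair of veteran i, \bar{X}_{p(i)j} is the pair mean of the source-j report, and s_{ijm} indicates that a row comes from source m. The only change from the source-specific within-pair model is that a single coefficient beta^W is shared by all sources.

```latex
\[
E\left[ Y_i \mid X_{ij}, \bar{X}_{p(i)j} \right]
  = \beta_0
  + \sum_{m=1}^{k-1} \alpha_m \, s_{ijm}
  + \beta^{W} \left( X_{ij} - \bar{X}_{p(i)j} \right)
  + \sum_{m=1}^{k} \beta^{B}_m \, s_{ijm} \, \bar{X}_{p(i)j}.
\]
% With k = 2, the shared within-pair regressor is
% wservice = z1diff + z2diff.
% On any given row only the deviation for that row's own source is nonzero,
% so the sum equals X_{ij} - \bar{X}_{p(i)j}.
```

Pooling the within-pair term is reasonable only when the source-specific within-pair effects look alike, which is what the test of z1diff = z2diff examines.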

[Table: columns id, pairid, z1diff, z2diff; the nonzero entries are -0.5 and 0.5]

Command: generate wservice = z1diff + z2diff

[Table: columns id, pairid, z1diff, z2diff, wservice]

Command: svy: regress ptsd s1 wservice z1bar z2bar

Survey: Linear regression

Number of strata   =      6172                  Number of obs      =     24557
Number of PSUs     =     12344                  Population size    =     21508
                                                Design df          =      6172
                                                F( 4, 6169)        =    192.48
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0512

------------------------------------------------------------------------------
             |             Linearized
    logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0182138   .0016722   -10.89   0.000    -.0214919   -.0149358
    wservice |   .1644434   .0129988    12.65   0.000     .1389611    .1899256
       z1bar |   .1857654   .0074392    24.97   0.000     .1711819    .2003489
       z2bar |   .1482022   .0074941    19.78   0.000     .1335111    .1628933
       _cons |   3.145802   .0037693   834.59   0.000     3.138412    3.153191
------------------------------------------------------------------------------
Note: 35 strata omitted because they contain no population members

Conclusions from VET Registry analysis

Sources differed in Model 1, so we did not combine them overall.
Within-pair estimates in Model 2 did not differ much by source, so Model 3 combined within-pair estimates.

Within-pair estimate:
    Combined           0.16 (0.14, 0.19)

7–14% gain in efficiency over individual sources.
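The slides do not say how the efficiency gain was computed. One reading that reproduces the quoted range, offered here as an assumption, is the ratio of estimated variances: the squared standard error of each source-specific within-pair estimate divided by the squared standard error of the combined estimate, using the standard errors reported in the regression output.

```latex
\begin{align*}
\text{Self report:} \quad
  \left( \frac{0.0134838}{0.0129988} \right)^{2} &\approx 1.076
  && \text{(about a 7--8\% gain)} \\[4pt]
\text{Military record:} \quad
  \left( \frac{0.0138901}{0.0129988} \right)^{2} &\approx 1.142
  && \text{(about a 14\% gain)}
\end{align*}
```

Here 0.0134838 and 0.0138901 are the standard errors of z1diff and z2diff, and 0.0129988 is the standard error of wservice.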

Conclusions from VET Registry analysis (continued)

Between-pair estimates in Model 2 differed significantly.
Model 3 estimates separate between-pair effects for each source.

Source-specific between-pair estimates:
    Self Report        0.19 (0.17, 0.20)
    Military Record    0.15 (0.13, 0.16)

Future Directions

Accommodate covariate adjustment.
Compare pooled estimators to “AND” and “OR” type derived exposure variables.
Address zygosity within regression models.

Acknowledgements & References

Jack Goldberg at UW

Margaret Pepe at UW
    Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:163-173.

Nicholas Horton at Harvard
    Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:2911-2933.

Thank you for listening