Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins Andy Bogart, MS Jack Goldberg, PhD.



Multiple Informant Data: Military Service in Vietnam
id pairid PTSD self report military rec.
1 1 45 yes no
2 1 17 yes
3 2 66 no yes
4 2 58 yes

Command: regress ptsd sr, robust
[Stata output: linear regression with robust standard errors, F(1, 10794); coefficient rows sr and _cons. Summary row: Self Report (sr).]

Command: regress ptsd mr, robust
[Stata output: linear regression with robust standard errors, F(1, 10710); coefficient rows mr and _cons. Summary rows: Self Report (sr), Military Record (mr).]

Model 1: The General Multiple Source Model
[Equation with labeled terms: expected outcome, intercept, source indicators, source-by-exposure interaction terms]
Generates the same estimates as the k marginal source-specific models
Allows testing for a difference between sources
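The labeled terms of Model 1 can be written out for the two-source case used in this talk. This is a sketch only: the symbols are assumed for illustration, chosen to match the variable names s1, z1, and z2 that appear in the Stata commands.

```latex
% Assumed notation (not taken from the slide):
%   Y_i    : PTSD outcome for twin i
%   X_{ik} : service in Vietnam as reported by source k
%            (k = 1 self report, k = 2 military record)
%   s_k    : indicator that a data row carries source k
%   z_k    : X_{ik} s_k, the source-by-exposure interaction
\begin{align*}
E[Y_i] &= \beta_0 + \beta_1 s_1 + \gamma_1 z_1 + \gamma_2 z_2 \\[4pt]
\text{source 1 rows } (s_1 = 1,\ s_2 = 0):\quad
E[Y_i \mid X_{i1}] &= (\beta_0 + \beta_1) + \gamma_1 X_{i1} \\
\text{source 2 rows } (s_1 = 0,\ s_2 = 1):\quad
E[Y_i \mid X_{i2}] &= \beta_0 + \gamma_2 X_{i2}
\end{align*}
```

Written this way, each source has its own intercept and slope, which is why the stacked model reproduces the separate source-specific regressions, and a difference between sources corresponds to \gamma_1 \neq \gamma_2.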

Multiple Informant Data
id pairid PTSD self report military rec.
1 1 45 yes no
2 1 17 yes
3 2 66 no yes
4 2 58 yes

Command: expand 2
[Data table with columns: id, pairid, PTSD, sr, mr]

Command: generate service=0
[Data table with columns: id, pairid, PTSD, sr, mr, service]

Command: by id: replace service = sr if _n==1
[Data table with columns: id, pairid, PTSD, sr, mr, service]

Command: by id: replace service = mr if _n==2
[Data table with columns: id, pairid, PTSD, sr, mr, service]

[Data table with columns: id, pairid, PTSD, service]

Command: generate s1 = 0
Command: generate s2 = 0
[Data table with columns: id, pairid, PTSD, service, s1, s2]

Command: by id: replace s1 = 1 if _n==1
Command: by id: replace s2 = 1 if _n==2
[Data table with columns: id, pairid, PTSD, service, s1, s2]

Command: generate z1 = service * s1
Command: generate z2 = service * s2
[Data table with columns: id, pairid, PTSD, service, s1, s2, z1, z2]
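The reshaping steps above can be collected into one short do-file sketch. The toy values are hypothetical (they are not the VET Registry data), the variable name source is introduced here only for illustration, and bysort is used in place of by so the sketch does not depend on the data already being sorted by id.

```stata
* Hypothetical toy data: two twin pairs, exposure coded 1 = yes, 0 = no
clear
input id pairid ptsd sr mr
1 1 45 1 0
2 1 17 1 1
3 2 66 0 1
4 2 58 0 0
end

* One row per twin per source
expand 2
bysort id: generate byte source = _n

* Exposure as reported by the source that the row represents
generate service = cond(source == 1, sr, mr)

* Source indicators
generate byte s1 = (source == 1)
generate byte s2 = (source == 2)

* Source-by-exposure interaction terms
generate z1 = service * s1
generate z2 = service * s2

list id pairid ptsd source service s1 s2 z1 z2, sepby(id)
```

Each twin ends up with two rows: on the self-report row z1 carries sr and z2 is 0, and on the military-record row z2 carries mr and z1 is 0.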

Command: xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust
[Stata output: GEE population-averaged model; group variable pin; link identity; family Gaussian; correlation independent; obs per group min = 1, avg = 2.0, max = 2; Wald chi2(3); Pearson chi2(21508); semi-robust standard errors adjusted for clustering on pin; coefficient rows s1, z1, z2, _cons. Summary rows: Self Report (sr), Military Record (mr).]

But wait... these guys are twins! Data within twin pairs might be correlated...

Command: svyset id [pweight = sampweight], strata(pairid)
pweight: sampweight
VCE: linearized
Strata 1: pairid
SU 1: id
FPC 1:

Command: svy: regress ptsd s1 z1 z2
[Stata output: survey linear regression; Number of strata = 6172; Design df = 6172; F(3, 6170); linearized standard errors; dependent variable shown as logptsd2; coefficient rows s1, z1, z2, _cons. Note: 35 strata omitted because they contain no population members. Summary rows: Self Report (sr), Military Record (mr).]

Command: test z1 = z2
[Stata output: Adjusted Wald test; ( 1) z1 - z2 = 0; chi2(1) and Prob > chi2. Summary rows: Self Report (sr), Military Record (mr).]
Moral of the story: The two sources contain different information. We should not combine them.
Or, should we??

Model 2: Multiple Source Model of Within- and Between-pair Exposure Effects
[Equation with labeled terms: intercept, source indicators, source by within-pair effect interaction terms, source by between-pair effect interaction terms]
Same estimates as k separate marginal within & between models
Allows testing for a difference in reports of within effects & between effects
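For two sources, the labeled terms of Model 2 can be sketched as the equation below. The symbols are assumed for illustration and are chosen to line up with the Stata variable names z1diff, z1bar, z2diff, and z2bar.

```latex
% Assumed notation (not taken from the slide):
%   s_1            : indicator that a data row carries source 1
%   z_k            : exposure reported by source k on source-k rows, 0 otherwise
%   \bar{z}_k      : mean of z_k within the twin pair on source-k rows, 0 otherwise
%   z_k^{\Delta}   : z_k - \bar{z}_k, the within-pair deviation
%   \gamma_k^{W}, \gamma_k^{B} : within-pair and between-pair effects for source k
\begin{align*}
E[Y] &= \beta_0 + \beta_1 s_1
      + \gamma_1^{W} z_1^{\Delta} + \gamma_1^{B} \bar{z}_1
      + \gamma_2^{W} z_2^{\Delta} + \gamma_2^{B} \bar{z}_2
\end{align*}
```

In this notation, a difference between sources in the within-pair effect corresponds to \gamma_1^{W} \neq \gamma_2^{W}, and a difference in the between-pair effect to \gamma_1^{B} \neq \gamma_2^{B}.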

[Data table with columns: id, pairid, s1, z1]

Command: bysort pairid: egen z1bar = mean(z1) if s1==1
[Data table with columns: id, pairid, s1, z1, z1bar]

Command: bysort pairid: replace z1bar=0 if s1==0
[Data table with columns: id, pairid, s1, z1, z1bar]

Command: generate z1diff = z1 - z1bar
[Data table with columns: id, pairid, s1, z1, z1bar, z1diff]

(Repeat that procedure to make z2bar and z2diff.)
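Repeating the procedure for both sources can be sketched with a loop. The long-format toy data below are hypothetical, and the loop is one possible way to write the repetition rather than the authors' own code.

```stata
* Hypothetical long-format toy data: two pairs, two rows (sources) per twin
clear
input id pairid s1 s2 z1 z2
1 1 1 0 1 0
1 1 0 1 0 0
2 1 1 0 0 0
2 1 0 1 0 1
3 2 1 0 1 0
3 2 0 1 0 1
4 2 1 0 1 0
4 2 0 1 0 1
end

forvalues k = 1/2 {
    * Pair mean of source k's exposure, computed on source-k rows only
    bysort pairid: egen z`k'bar = mean(z`k') if s`k' == 1
    * Rows that belong to the other source get zero
    replace z`k'bar = 0 if s`k' == 0
    * Within-pair deviation from the pair mean
    generate z`k'diff = z`k' - z`k'bar
}

list, sepby(pairid)
```

Twins who are discordant on a source get deviations of 0.5 and -0.5 on that source's rows, while concordant pairs get 0, so only discordant pairs contribute to the within-pair effect.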

Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar
[Stata output: survey linear regression; Number of strata = 6172; Design df = 6172; F(5, 6168); linearized standard errors; coefficient rows s1, z1diff, z1bar, z2diff, z2bar, _cons. Note: 35 strata omitted because they contain no population members.]

Command: test z1diff = z2diff
[Stata output: Adjusted Wald test; ( 1) z1diff - z2diff = 0; F(1, 6172) = 0.36]

Command: test z1bar = z2bar
[Stata output: Adjusted Wald test; ( 1) z1bar - z2bar = 0; F(1, 6172)]

Within-pair estimates don’t differ much. Between-pair estimates do!!
Moral of the story:
1. Combine the within-pair info.
2. Keep between-pair info. separate.

Model 3: Multiple Source Model with a Combined Within-pair Effect
[Equation with labeled terms: intercept, source indicators, combined-source within-pair effect, source by between-pair effect interaction terms]
Assumes within-pair effect to be common to all k sources
Often yields a more precise estimate of the within-pair effect
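For two sources, Model 3 can be sketched as follows. The symbols are assumed for illustration; w stands for the combined within-pair variable, named wservice in the Stata commands.

```latex
% Assumed notation (not taken from the slide):
%   z_k^{\Delta} : within-pair deviation of source k's exposure (z1diff, z2diff)
%   \bar{z}_k    : pair mean of source k's exposure (z1bar, z2bar)
%   w            : z_1^{\Delta} + z_2^{\Delta}, the combined within-pair term
\begin{align*}
E[Y] &= \beta_0 + \beta_1 s_1
      + \gamma^{W} w
      + \gamma_1^{B} \bar{z}_1 + \gamma_2^{B} \bar{z}_2,
\qquad w = z_1^{\Delta} + z_2^{\Delta}
\end{align*}
```

This is the two-source within- and between-pair model with the single constraint that both sources share one within-pair coefficient \gamma^{W}, while the between-pair coefficients stay source-specific.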

Command: generate wservice = z1diff + z2diff
[Data table with columns: id, pairid, z1diff, z2diff, wservice]

Command: svy: regress ptsd s1 wservice z1bar z2bar
[Stata output: survey linear regression; Number of strata = 6172; Design df = 6172; F(4, 6169); linearized standard errors; dependent variable shown as logptsd2; coefficient rows s1, wservice, z1bar, z2bar, _cons. Note: 35 strata omitted because they contain no population members.]

Conclusions from VET Registry analysis
Sources differed in Model 1, so we did not combine them overall.
Within-pair estimates in Model 2 did not differ much by source, so...
Model 3 combined within-pair estimates.
Within-pair estimate: Combined Record 0.16 (0.14, 0.19)
7-14% gain in efficiency over individual sources

Conclusions from VET Registry analysis
Between-pair estimates in Model 2 differed significantly.
Model 3 estimates separate between-pair effects for each source.
Source-specific between-pair estimates:
Self Report 0.19 (0.17, 0.20)
Military Record 0.15 (0.13, 0.16)

Future Directions
Accommodate covariate adjustment
Compare pooled estimators to “AND” and “OR” type derived exposure variables
Address zygosity within regression models

Acknowledgements & References
Jack Goldberg at UW
Margaret Pepe at UW
1. Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:
Nicholas Horton at Harvard
2. Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:

Thank you for listening