Lecture 4 (Chapter 4). Linear Models for Correlated Data


Lecture 4 (Chapter 4)

Linear Models for Correlated Data
We aim to develop a general linear model framework for longitudinal data, in which the inferences we make about the parameters of interest recognize the likely correlation structure of the data. There are two ways of achieving this:
1. Build explicit parametric models of the covariance structure.
2. Use methods of inference that are robust to misspecification of the covariance structure.

General Linear Models for Correlated Data: Examples
Uniform Correlation Model
  –One-sample repeated measures ANOVA
Growth Model
Exponential Correlation Model
Autoregressive Model of Order 1

A simple example Covariance matrix Correlation matrix

Notation: Balanced Data

Notation: Unbalanced Data
Outcome measures on subject “i” repeated n_i times
Ex: Values of the covariates for subject “i” in long format (rows as shown on the slide: 12y white 1 3y white 1 10y white)
Covariance matrix for subject i
Regression model for longitudinal data

Notation
Vector of responses in the super-population
Design matrix
Vector of regression coefficients

Notation
We assume a common covariance matrix across subjects (i.e. everyone has the same covariance matrix)
Covariance matrix of subject i
Covariance matrix between subject “j” and subject “k”

Covariates may be:

Covariates may be… (cont’d)

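General Linear Model with Correlated Errors (Balanced Data)
Dimensions: response vector N x 1; design matrix N x p; vector of regression coefficients p x 1; covariance matrix N x N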
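The slide's equation did not survive in the transcript. A standard way of writing a model with these dimensions is sketched below. The symbols m (number of subjects), n (measurements per subject, so N = nm) and V_0 (the common n x n within-subject block) are assumptions introduced here for illustration, as is the zero covariance between different subjects.

```latex
% Y: N x 1 responses, X: N x p design matrix, beta: p x 1 coefficients
Y \sim \mathrm{MVN}\left( X\beta,\ \sigma^2 V \right),
\qquad
V =
\begin{pmatrix}
V_0    & 0      & \cdots & 0      \\
0      & V_0    & \cdots & 0      \\
\vdots &        & \ddots & \vdots \\
0      & 0      & \cdots & V_0
\end{pmatrix}
\quad (N \times N,\ N = nm)
```

The block-diagonal form encodes two statements: every subject has the same covariance matrix, and measurements on different subjects are uncorrelated.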

Uniform Correlation Model: Parametric form of the covariance matrix
When measurements are equally spaced and the data are balanced, one possible assumption is that the correlation between any pair of measurements on the same subject is the same (the “exchangeable” correlation structure).
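As a minimal sketch of what the uniform correlation assumption means for one subject, the snippet below builds the matrix with 1 on the diagonal and a common value rho everywhere else. The function names and the example values (n = 4, rho = 0.5, variance 9.8) are illustrative choices, not taken from the lecture's own code.

```python
def exchangeable_corr(n, rho):
    """Build the n x n uniform (exchangeable) correlation matrix as nested lists."""
    if n < 2:
        raise ValueError("need at least two repeated measurements")
    # A valid (positive definite) matrix requires -1/(n-1) < rho < 1.
    if not (-1.0 / (n - 1) < rho < 1.0):
        raise ValueError("rho must lie strictly between -1/(n-1) and 1")
    return [[1.0 if j == k else float(rho) for k in range(n)] for j in range(n)]


def scale_to_covariance(corr, sigma2):
    """Multiply a correlation matrix by a common variance sigma2."""
    return [[sigma2 * r for r in row] for row in corr]


# Illustrative values only.
R = exchangeable_corr(4, 0.5)
V = scale_to_covariance(R, 9.8)
for row in R:
    print(row)
```

Every off-diagonal entry is identical, so the correlation does not depend on how far apart in time two measurements are.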

Example: Weight of Pigs

Example: Weight of Pigs (cont’d)
Figure 3.1. Data on the weights of 48 pigs over a 9-week period.
What do we see?
1. All pigs gain weight over time.
2. The pigs which are the largest at the beginning are the largest at the end.
3. Variance across pigs increases over time. (Increasing variation in the growth rates of the individual pigs.)

Example: Weight of Pigs
For this type of repeated measures study, we recognize two sources of random variation:
1. Between: There is heterogeneity between pigs, due for example to natural biological (genetic?) variation (random intercept)
2. Within: There is random variation in the measurement process for a particular unit at any given time. For example, on any given day a particular guinea pig may yield different weight measurements due to differences in scale (equipment) and/or small fluctuations in weight during a day (slope on time)

A) Linear model with random intercept
Random effect
Variance between units (clusters)
Variance within units (measurement error variance)
Total variance
Proportion of total variance due to between-units variance
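The quantities listed on this slide can be written out as follows. This is a sketch in one common notation; the symbols U_i, Z_ij, \nu^2 and \tau^2 are assumptions chosen here, since the slide's own formulas are not preserved in the transcript.

```latex
% Random-intercept model for measurement j on unit i
Y_{ij} = \beta_0 + \beta_1 t_{ij} + U_i + Z_{ij},
\qquad U_i \sim N(0, \nu^2), \quad Z_{ij} \sim N(0, \tau^2), \quad U_i \perp Z_{ij}

% Variance between units, variance within units, total variance
\operatorname{Var}(U_i) = \nu^2, \qquad
\operatorname{Var}(Z_{ij}) = \tau^2, \qquad
\operatorname{Var}(Y_{ij}) = \nu^2 + \tau^2

% Proportion of total variance due to between-units variance
\rho = \frac{\nu^2}{\nu^2 + \tau^2}
```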

Simulated Data: Non-Clustered
Plot axes: cluster number (units); repeated measures within a cluster
Total variance = 9.8
Variance within = 9.8
Variance between = 0
Within-cluster correlation = 0

Simulated Data: Clustered
Plot axes: cluster number (units); repeated measures within a cluster
Total variance = 9.8
Variance within = 3.2
Variance between = 6.6
Within-cluster correlation = 6.6/9.8 = 0.67
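The arithmetic on this slide can be checked with a short sketch. It computes the within-cluster correlation from the two variance components, then simulates balanced clustered data and recovers the components with a simple method-of-moments (one-way ANOVA) estimate. The number of clusters (2000), the cluster size (9), the seed, and the estimator are all assumptions made for illustration; the transcript does not say how the slide's data were generated.

```python
import random
import statistics


def intraclass_correlation(var_between, var_within):
    """Within-cluster correlation = between variance / total variance."""
    return var_between / (var_between + var_within)


def simulate_clusters(n_clusters, n_per_cluster, var_between, var_within, seed=0):
    """Simulate y_ij = u_i + e_ij with independent Gaussian u_i and e_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, var_between ** 0.5)  # cluster-level random effect
        data.append([u + rng.gauss(0.0, var_within ** 0.5)
                     for _ in range(n_per_cluster)])
    return data


def estimate_variance_components(data):
    """Method-of-moments estimates (between, within) for balanced clusters."""
    n = len(data[0])
    within = statistics.fmean([statistics.variance(c) for c in data])
    cluster_means = [statistics.fmean(c) for c in data]
    # Var(cluster mean) = var_between + var_within / n, so subtract the second term.
    between = max(statistics.variance(cluster_means) - within / n, 0.0)
    return between, within


rho_slide = intraclass_correlation(6.6, 3.2)  # 6.6 / 9.8
data = simulate_clusters(2000, 9, 6.6, 3.2, seed=1)
between_hat, within_hat = estimate_variance_components(data)
rho_hat = intraclass_correlation(between_hat, within_hat)
print(round(rho_slide, 2), round(rho_hat, 2))
```

Setting the between variance to 0 in the same simulation gives the non-clustered case, where the within-cluster correlation is 0.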

B) Marginal Model with a Uniform Correlation Structure
Model for the mean
Model for the covariance matrix
Dimensions of the covariance terms: n x n; (n x 1)(1 x n); n x n

Models A and B are equivalent
Variance between
Variance within
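A short derivation shows why a random-intercept model and a marginal model with uniform correlation imply the same covariance matrix. The notation is an assumption made here: \nu^2 for the between-unit variance, \tau^2 for the within-unit variance, \mathbf{1} for an n x 1 vector of ones, and I for the n x n identity.

```latex
% Random-intercept model: Y_{ij} = \mu_{ij} + U_i + Z_{ij},
% with Var(U_i) = \nu^2, Var(Z_{ij}) = \tau^2, all terms independent
\operatorname{Var}(Y_{ij}) = \nu^2 + \tau^2,
\qquad
\operatorname{Cov}(Y_{ij}, Y_{ik}) = \operatorname{Var}(U_i) = \nu^2 \quad (j \neq k)

% Stacking the n measurements on unit i
\operatorname{Var}(Y_i) = \tau^2 I + \nu^2 \mathbf{1}\mathbf{1}^{\top}
= \sigma^2 \left[ (1 - \rho) I + \rho\, \mathbf{1}\mathbf{1}^{\top} \right],
\qquad
\sigma^2 = \nu^2 + \tau^2, \quad \rho = \frac{\nu^2}{\nu^2 + \tau^2}
```

The right-hand side is the uniform correlation structure, with \sigma^2 \rho equal to the variance between and \sigma^2 (1 - \rho) equal to the variance within.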

Pigs – Independent fit
. xtreg weight time, pa i(Id) corr(ind)
GEE population-averaged model          Number of obs      = 432
Group variable: Id                     Number of groups   = 48
Link: identity                         Obs per group: min = 9
Family: Gaussian                                      avg = 9.0
Correlation: independent                              max = 9
                                       Wald chi2(1)       =
Scale parameter:                       Prob > chi2        =
Pearson chi2(432):                     Deviance           =
Dispersion (Pearson):                  Dispersion         =
weight | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time   |
_cons  |
Independence correlation model results

Pigs – Marginal Model
. xtreg weight time, pa i(Id) corr(exch)
Iteration 1: tolerance = 5.585e-15
GEE population-averaged model          Number of obs      = 432
Group variable: Id                     Number of groups   = 48
Link: identity                         Obs per group: min = 9
Family: Gaussian                                      avg = 9.0
Correlation: exchangeable                             max = 9
                                       Wald chi2(1)       =
Scale parameter:                       Prob > chi2        =
weight | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time   |
_cons  |
“Population Average”, Marginal Model with Exchangeable Correlation structure results

Marginal Model
E[Y_i] = β_0 + β_1 time

Pigs – RE model
. xtreg weight time, re i(Id) mle
Random-effects ML regression           Number of obs      = 432
Group variable (i): Id                 Number of groups   = 48
Random effects u_i ~ Gaussian          Obs per group: min = 9
                                                      avg = 9.0
                                                      max = 9
                                       LR chi2(1)         =
Log likelihood =                       Prob > chi2        =
weight   | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time     |
_cons    |
/sigma_u |        (std between)
/sigma_e |        (std within)
rho      |
Linear model with a random intercept - “conditional model”

Random Effects Model
Conditional mean: E[Y_i | U_i] = β_0 + β_1 time + U_i
Marginal mean: E[Y_i] = β_0 + β_1 time

Pigs – GEE Fit
. xtgee weight time, i(Id) corr(exch)
Iteration 1: tolerance = 5.585e-15
GEE population-averaged model          Number of obs      = 432
Group variable: Id                     Number of groups   = 48
Link: identity                         Obs per group: min = 9
Family: Gaussian                                      avg = 9.0
Correlation: exchangeable                             max = 9
                                       Wald chi2(1)       =
Scale parameter:                       Prob > chi2        =
weight | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time   |
_cons  |
GEE fit – Marginal Model with Exchangeable Correlation structure results

Pigs – GEE Fit
. xtgee weight time, i(Id) corr(exch)
. xtcorr
Estimated within-Id correlation matrix R (9 x 9; columns c1 to c9, one row per repeated measure)
GEE fit – Marginal Model with Exchangeable Correlation structure results

One sample repeated measures ANOVA

One sample repeated measures ANOVA (cont’d)

One group polynomial growth curve model

Cov(Y_i) can be uniform or exponential

Pigs – RE model, quadratic trend
. gen timesq = time*time
. xtreg weight time timesq, re i(Id) mle
Random-effects ML regression           Number of obs      = 432
Group variable (i): Id                 Number of groups   = 48
Random effects u_i ~ Gaussian          Obs per group: min = 9
                                                      avg = 9.0
                                                      max = 9
                                       LR chi2(2)         =
Log likelihood =                       Prob > chi2        =
weight   | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time     |
timesq   |
_cons    |
/sigma_u |
/sigma_e |
rho      |
Exchangeable Correlation structure results

Pigs – Marginal model, quadratic trend
. xtgee weight time timesq, i(Id) corr(exch)
GEE population-averaged model          Number of obs      = 432
Group variable: Id                     Number of groups   = 48
Link: identity                         Obs per group: min = 9
Family: Gaussian                                      avg = 9.0
Correlation: exchangeable                             max = 9
                                       Wald chi2(2)       =
Scale parameter:                       Prob > chi2        =
weight | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time   |
timesq |
_cons  |
Exchangeable Correlation structure results

Exponential Correlation / Autoregressive Model
“Auto-correlated Errors”
STATA: xtgee corr(ar1)
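As a rough illustration of how these two structures relate, the sketch below builds an AR(1) correlation matrix, where the correlation between measurements j and k is rho to the power |j - k|, and an exponential correlation matrix, where it is exp(-phi |t_j - t_k|). The parameter names rho and phi and the example values are assumptions introduced here. For equally spaced times one unit apart the two coincide when rho = exp(-phi).

```python
import math


def ar1_corr(n, rho):
    """AR(1) correlation matrix: entry (j, k) is rho ** |j - k|."""
    return [[float(rho) ** abs(j - k) for k in range(n)] for j in range(n)]


def exponential_corr(times, phi):
    """Exponential correlation matrix: entry (j, k) is exp(-phi * |t_j - t_k|)."""
    return [[math.exp(-phi * abs(s - t)) for t in times] for s in times]


# Illustrative values only: three equally spaced measurement times.
rho = 0.5
A = ar1_corr(3, rho)
E = exponential_corr([0, 1, 2], -math.log(rho))
for row in A:
    print(row)
```

Unlike the uniform model, the correlation here decays as measurements get further apart in time. The exponential form also accepts unequally spaced times.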

Autoregressive

Pigs – Marginal model: AR(1)
. xtgee weight time, i(Id) corr(AR1) t(time)
GEE population-averaged model          Number of obs      = 432
Group and time vars: Id time           Number of groups   = 48
Link: identity                         Obs per group: min = 9
Family: Gaussian                                      avg = 9.0
Correlation: AR(1)                                    max = 9
                                       Wald chi2(1)       =
Scale parameter:                       Prob > chi2        =
weight | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time   |
_cons  |
GEE-fit Marginal Model with AR1 Correlation structure

Pigs – RE model: AR(1)
. xtregar weight time
RE GLS regression with AR(1) disturbances   Number of obs      = 432
Group variable (i): Id                      Number of groups   = 48
R-sq: within  =                             Obs per group: min = 9
      between =                                            avg = 9.0
      overall =                                            max = 9
                                            Wald chi2(2)       =
corr(u_i, Xb) = 0 (assumed)                 Prob > chi2        =
weight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
time    |
_cons   |
rho_ar  |        (estimated autocorrelation coefficient)
sigma_u |
sigma_e |
rho_fov |        (fraction of variance due to u_i)
theta   |
Random Effects Model with AR1 Correlation structure

Marginal Model

Important Points
Modelling the correlation in longitudinal data is important for obtaining correct inferences on the regression coefficients β.
There are correspondences between random effects and marginal models in the linear case, because the interpretation of the regression coefficients is the same as in cross-sectional data.
Correlation can be formulated in terms of subject-specific models and/or transition models:
  –Exchangeable correlation model: subject-specific formulation
  –Exponential correlation model: transition model formulation

(Still More) Important Points
Three basic elements of correlation structure:
  –Random effects
  –Autocorrelation or serial dependence
  –Noise, measurement error
Incorporating correlation into the estimation of regression models is achieved via weighted least squares.
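For reference, the weighted least squares estimator mentioned here has the following standard form. The symbols are assumptions introduced for this sketch: W is a symmetric weight matrix and \sigma^2 V is the true covariance matrix of the response vector y.

```latex
% Weighted least squares with a symmetric weight matrix W
\hat{\beta}_W = (X^{\top} W X)^{-1} X^{\top} W y,
\qquad
\operatorname{Var}(\hat{\beta}_W)
= \sigma^2 (X^{\top} W X)^{-1} X^{\top} W V W X (X^{\top} W X)^{-1}

% Special case W = V^{-1}
\hat{\beta} = (X^{\top} V^{-1} X)^{-1} X^{\top} V^{-1} y,
\qquad
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^{\top} V^{-1} X)^{-1}
```

With W = I the estimator reduces to ordinary least squares, which ignores the correlation. Its variance is then given by the first formula rather than by the usual ordinary least squares expression.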

(Still More) Important Points
There are many ways of estimating correlation parameters. We will study some of these.
Correlation models can be approximate.
  –We will call these working correlation models (“our best shot”).
  –Regression coefficient estimates will still be correct.
  –We will see how to “fix up” standard errors to account for inaccuracies in correlation models.