Repeated Measures, Part 3 May, 2009 Charles E. McCulloch, Division of Biostatistics, Dept of Epidemiology and Biostatistics, UCSF.


Repeated Measures, Part 3 May, 2009 Charles E. McCulloch, Division of Biostatistics, Dept of Epidemiology and Biostatistics, UCSF

2 Outline
1. More on xtgee
2. Examples
3. Robust standard errors
4. Binary outcomes
5. Changing the link function
6. Modeling practice
7. Summary

3 More on xtgee
Recall the command format:
xtgee depvar predvars,
    family(distribution)
    link(how to relate mean to predictors)
    corr(correlation structure)
    i(cluster variable)
    t(time variable)
    robust

4 More on xtgee
Here are the commonly used options:
Family
    binomial
    Gaussian (i.e., normal): the default
    gamma
    nbinomial
    Poisson

5 More on xtgee
Link
    identity (model mean directly): default for Gaussian
    log: default for Poisson
    logit: default for binomial
    power
    probit
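As a reminder of what the link option controls, the marginal model behind xtgee can be written as below. This is a sketch in generic notation: the symbols (g, mu, beta, x, and the power exponent k) are assumptions introduced here, not notation from the slides.

```latex
% Marginal (population-averaged) mean model: the link g relates the
% mean of observation j in cluster i to the linear predictor.
\[
  g\bigl(\mu_{ij}\bigr) = \beta_0 + \beta_1 x_{1ij} + \dots + \beta_p x_{pij},
  \qquad \mu_{ij} = \mathrm{E}[Y_{ij}]
\]
% The link choices listed on the slide:
\[
  \begin{aligned}
    \text{identity:} \quad & g(\mu) = \mu \\
    \text{log:}      \quad & g(\mu) = \log \mu \\
    \text{logit:}    \quad & g(\mu) = \log\frac{\mu}{1-\mu} \\
    \text{power:}    \quad & g(\mu) = \mu^{k} \\
    \text{probit:}   \quad & g(\mu) = \Phi^{-1}(\mu)
  \end{aligned}
\]
```

Here \Phi is the standard normal cumulative distribution function.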

6 More on xtgee: main menu

7 More on xtgee
Correlation structure
    independent
    exchangeable: the default
    ar #
    unstructured
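To make the four choices concrete, here is what each working correlation matrix looks like for a cluster of three observations. The cluster size and the symbols (rho, rho_jk) are assumptions chosen for illustration, and the autoregressive case is shown only for order 1.

```latex
\[
  R_{\text{independent}} =
  \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
  \qquad
  R_{\text{exchangeable}} =
  \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}
\]
\[
  R_{\text{ar 1}} =
  \begin{pmatrix} 1 & \rho & \rho^{2} \\ \rho & 1 & \rho \\ \rho^{2} & \rho & 1 \end{pmatrix},
  \qquad
  R_{\text{unstructured}} =
  \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}
\]
```

Exchangeable uses a single correlation for every pair, while unstructured estimates a separate correlation for each pair of time points, so it needs the time variable.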

8 Examples
Let’s try this on the birthweight data.
. xtgee bweight birthord initage, family(gaussian) link(identity) corr(exchangeable) i(momid)
. xtgee bweight birthord initage, fam(gau) link(i) corr(exch) i(momid)
. xtgee bweight birthord initage, i(momid)
all give the same output:

9 Examples
Iteration 1: tolerance = 7.180e-13

GEE population-averaged model        Number of obs      = 1000
Group variable: momid                Number of groups   = 200
Link: identity                       Obs per group: min = 5
Family: Gaussian                                    avg = 5.0
Correlation: exchangeable                           max = 5
                                     Wald chi2(2)       =
Scale parameter:                     Prob > chi2        =

bweight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

10 Examples
The command
. xtcorr
Estimated within-momid correlation matrix R:
     c1  c2  c3  c4  c5
r1
r2
r3
r4
r5
gives the estimated correlation structure.
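For readers who want to run these steps without the course data, here is a minimal do-file sketch. The data are simulated and purely hypothetical (the seed, the coefficients and the variances are all made up); only the variable names follow the slides. It uses xtset rather than the i() option, and estat wcorrelation, which more recent Stata releases provide in place of xtcorr.

```stata
* Hypothetical simulated stand-in for the birthweight data:
* 200 mothers (momid), 5 births each. All numbers are made up.
clear
set seed 2009
set obs 200
generate momid   = _n
generate initage = 15 + floor(10*runiform())   // mother's initial age, constant within mother
generate u       = rnormal(0, 350)             // effect shared by all births of one mother
expand 5                                       // 5 records per mother, 1000 in total
bysort momid: generate birthord = _n
generate bweight = 2600 + 45*birthord + 25*initage + u + rnormal(0, 450)

* Declare the cluster and time variables, then fit the GEE
* with the default options spelled out
xtset momid birthord
xtgee bweight birthord initage, family(gaussian) link(identity) corr(exchangeable)

* Display the estimated working correlation matrix
estat wcorrelation
```

Because the simulated births share a mother-level effect, the exchangeable structure is the correct one here by construction; that will not generally hold for real data.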

11 Examples: variation I
. xtgee bweight birthord initage, i(momid) corr(uns)

GEE population-averaged model        Number of obs      = 1000
Group and time vars: momid birthord  Number of groups   = 200
Link: identity                       Obs per group: min = 5
Family: Gaussian                                    avg = 5.0
Correlation: unstructured                           max = 5
                                     Wald chi2(2)       =
Scale parameter:                     Prob > chi2        =

bweight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

. xtcorr
Estimated within-momid correlation matrix R:
     c1  c2  c3  c4  c5
r1
r2
r3
r4
r5

The estimated correlations are a bit different, but the coefficients and SEs have remained almost unchanged.

12 Examples: variation II
. xtgee bweight birthord initage, i(momid) corr(uns) robust

Iteration 1: tolerance =
Iteration 2: tolerance =
Iteration 3: tolerance =
Iteration 4: tolerance = 1.668e-07

GEE population-averaged model        Number of obs      = 1000
Group and time vars: momid birthord  Number of groups   = 200
Link: identity                       Obs per group: min = 5
Family: Gaussian                                    avg = 5.0
Correlation: unstructured                           max = 5
                                     Wald chi2(2)       =
Scale parameter:                     Prob > chi2        =

(standard errors adjusted for clustering on momid)

         |         Semi-robust
bweight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

Robust doesn’t change the coefficients, but recalculates the SEs.

13 Robust standard errors
The “robust” option asks Stata to estimate the standard errors empirically from the data.
This has the significant advantage that it gives valid standard errors even when the assumed correlation structure is wrong.
It is also better than assuming an unstructured variance-covariance structure, because it bypasses the estimation of the correlations over “time” to directly get an estimate of the standard errors.
The robust option works well when there are many “subjects”, not too much data per subject, and not much missing data. For example, it would work very well with 2,000 subjects, most of them measured yearly for four years. It would not work well if the “subjects” were 8 centers in a multi-center trial, each with 1,000 patients enrolled.
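The empirical estimate described above is usually written as a "sandwich" formula. The version below is the standard GEE form in generic notation; the symbols (m clusters, D_i, V_i, mu_i) are assumptions introduced here and do not appear in the slides.

```latex
% D_i : derivative of the cluster-i mean vector with respect to beta
% V_i : working covariance matrix of cluster i (from the assumed correlation)
% m   : number of clusters ("subjects")
\[
  \widehat{\mathrm{Var}}\bigl(\hat\beta\bigr) = B^{-1} M B^{-1},
  \qquad
  B = \sum_{i=1}^{m} D_i^{\top} V_i^{-1} D_i
\]
\[
  M = \sum_{i=1}^{m} D_i^{\top} V_i^{-1}
      \bigl(y_i - \hat\mu_i\bigr)\bigl(y_i - \hat\mu_i\bigr)^{\top}
      V_i^{-1} D_i
\]
```

The middle term M is built from one residual cross-product per cluster, which is why the estimate needs many clusters to be reliable: with 8 centers, M is an average over only 8 terms, however many patients each center has.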

14 Examples: variation III (xtmixed)
. xtmixed bweight birthord initage || momid:

Mixed-effects REML regression        Number of obs      = 1000
Group variable: momid                Number of groups   = 200
                                     Obs per group: min = 5
                                                    avg = 5.0
                                                    max = 5
                                     Wald chi2(2)       =
Log restricted-likelihood =          Prob > chi2        =

bweight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

Random-effects Parameters | Estimate   Std. Err.   [95% Conf. Interval]
momid: Identity           |
             sd(_cons)    |
             sd(Residual) |

LR test vs. linear regression: chibar2(01) =      Prob >= chibar2 =
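The model that this xtmixed command fits can be sketched as a random intercept model. The notation is an assumption introduced here; in particular, initage is indexed by mother only, on the assumption that it is constant across a mother's births.

```latex
% i indexes mothers (momid), j indexes births within a mother
\[
  \text{bweight}_{ij} = \beta_0 + \beta_1\,\text{birthord}_{ij}
                      + \beta_2\,\text{initage}_{i} + b_i + \varepsilon_{ij}
\]
\[
  b_i \sim N\bigl(0, \sigma_b^{2}\bigr), \qquad
  \varepsilon_{ij} \sim N\bigl(0, \sigma_\varepsilon^{2}\bigr)
\]
% Implied correlation between two births of the same mother
\[
  \mathrm{Corr}\bigl(\text{bweight}_{ij}, \text{bweight}_{ik}\bigr)
  = \frac{\sigma_b^{2}}{\sigma_b^{2} + \sigma_\varepsilon^{2}},
  \qquad j \neq k
\]
```

In the output, sd(_cons) estimates \sigma_b and sd(Residual) estimates \sigma_\varepsilon, so the random intercept model implies the same exchangeable correlation that xtgee assumes by default.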

15 Binary outcomes/logistic regression
Now let’s take a look at the use of xtgee for clustered logistic regression. I took the Georgia babies data set and artificially dichotomized it as to whether birthweight was above or below 3000 grams.
What options do we use now?
    family =
    link =
    corr =

16 Binary outcomes: low birthweight
. xtgee lowbirth birthord initage, i(momid) family(binomial)

GEE population-averaged model        Number of obs      = 1000
Group variable: momid                Number of groups   = 200
Link: logit                          Obs per group: min = 5
Family: binomial                                    avg = 5.0
Correlation: exchangeable                           max = 5
                                     Wald chi2(2)       =
Scale parameter: 1                   Prob > chi2        =

lowbirth | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

How does this compare to the continuous data fit? OK to assume exchangeable?

17 Binary outcomes: robust option
. xtgee lowbirth birthord initag, i(momid) family(bino) robust

GEE population-averaged model        Number of obs      = 1000
Group variable: momid                Number of groups   = 200
Link: logit                          Obs per group: min = 5
Family: binomial                                    avg = 5.0
Correlation: exchangeable                           max = 5
                                     Wald chi2(2)       =
Scale parameter: 1                   Prob > chi2        =

(Std. Err. adjusted for clustering on momid)

         |         Semi-robust
lowbirth | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |
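A runnable sketch of the clustered logistic fit with robust standard errors follows. It again uses hypothetical simulated data rather than the Georgia babies file, and it assumes that lowbirth is coded 1 for a birthweight below 3000 grams. The vce(robust) spelling and the eform option are choices made here, not taken from the slides.

```stata
* Hypothetical simulated data: 200 mothers, 5 births each (all numbers made up)
clear
set seed 2009
set obs 200
generate momid   = _n
generate initage = 15 + floor(10*runiform())
generate u       = rnormal(0, 350)             // mother-level effect
expand 5
bysort momid: generate birthord = _n
generate bweight  = 2600 + 45*birthord + 25*initage + u + rnormal(0, 450)
generate lowbirth = bweight < 3000             // assumed coding: 1 = below 3000 g

* Logistic GEE with an exchangeable working correlation.
* vce(robust) requests the empirical (sandwich) standard errors;
* eform reports exponentiated coefficients, i.e. odds ratios.
xtset momid
xtgee lowbirth birthord initage, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust) eform
```

Dropping vce(robust) leaves the coefficients unchanged and alters only the standard errors, which is the comparison the two slides invite.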

18 Changing the link function
What kind of model would this command fit?
. xtgee bweight birthord initage, i(momid) link(log)

GEE population-averaged model        Number of obs      = 1000
Group variable: momid                Number of groups   = 200
Link: log                            Obs per group: min = 5
Family: Gaussian                                    avg = 5.0
Correlation: exchangeable                           max = 5
                                     Wald chi2(2)       =
Scale parameter:                     Prob > chi2        =

bweight  | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
birthord |
initage  |
_cons    |

Interpretations:

19 Changing the link function

20 Changing the link function
What kind of model would this command fit?
. xtgee lowbirth birthord initage, i(momid) family(bino) link(log)
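The slides leave both link(log) questions open. One standard way to read the two models is sketched below; the notation is an assumption introduced here, not the answer given in class.

```latex
% Gaussian family with log link: the mean birthweight is modeled on the log scale
\[
  \log \mathrm{E}\bigl[\text{bweight}_{ij}\bigr]
  = \beta_0 + \beta_1\,\text{birthord}_{ij} + \beta_2\,\text{initage}_{ij}
\]
% so a one-unit increase in birth order multiplies the mean by exp(beta_1)
\[
  \frac{\mathrm{E}[\text{bweight} \mid \text{birthord} = k+1]}
       {\mathrm{E}[\text{bweight} \mid \text{birthord} = k]} = e^{\beta_1}
\]
% Binomial family with log link: the probability itself is modeled on the log scale
\[
  \log \Pr\bigl(\text{lowbirth}_{ij} = 1\bigr)
  = \beta_0 + \beta_1\,\text{birthord}_{ij} + \beta_2\,\text{initage}_{ij}
\]
% so exp(beta_1) is a risk ratio (relative risk), not an odds ratio
\[
  \frac{\Pr(\text{lowbirth} = 1 \mid \text{birthord} = k+1)}
       {\Pr(\text{lowbirth} = 1 \mid \text{birthord} = k)} = e^{\beta_1}
\]
```

With the logit link the exponentiated coefficients are odds ratios instead, so the choice of link changes the effect measure being reported and not only the fit.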

21 Modeling practice 1
Epileptics were randomly allocated to a placebo or an anti-seizure drug (Progabide) group. The number of seizures was recorded during a baseline period and for four periods after beginning treatment. Is the drug effective at reducing the number of seizures?
    family =
    link =
    corr =
    predictors =

22 Modeling practice 2
A quality improvement program was designed to reduce prescription of antibiotics for antibiotic resistant infections in emergency departments. 16 hospitals were randomized to the program or control group. Is the program effective?
    family =
    link =
    corr =
    predictors =

23 Notes on xtgee
xtgee is a flexible regression command.
Handles a single level of clustering.
Handles a wide variety of distributions, links and correlation structures.
Five questions: distribution, predictors, link, correlation structure, cluster variable.
Not designed for inferences about the correlation structure.
Doesn’t give predicted values for each cluster.

24 Summary
Hierarchical data structures are common. They lead to correlated data.
Ignoring the correlation can be a serious error.
xtgee can handle a single level of clustering and a variety of outcome types.
xtmixed can handle multiple levels for normally distributed data.
Not discussed in class: xtmelogit and xtmepoisson can handle two levels of clustering for binary and Poisson outcomes.
Random effects models and robust SEs are also available for time-to-event data.
GEE methods have the advantage of robust SEs.
Random effects models have the advantage of being able to generate predicted values and partition variability.