Lecture 9: Marginal Logistic Regression Model and GEE (Chapter 8)

1 Lecture 9: Marginal Logistic Regression Model and GEE (Chapter 8)

2 Marginal Logistic Regression Model and GEE
Marginal models are suitable for estimating population-average parameters. For example, in the Indonesian study, a marginal model can be used to address questions such as: What is the prevalence of respiratory infection in children as a function of age? Is the prevalence of respiratory infection greater in the sub-population of children with vitamin A deficiency? How does the association between vitamin A deficiency and respiratory infection change with age? The scientific objective is to characterize and contrast populations of children.

3 Marginal Models for Binary Responses: Logistic Regression
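Model for the mean: $\operatorname{logit} \Pr(Y_{ij}=1) = \log\frac{\mu_{ij}}{1-\mu_{ij}} = x_{ij}^\top\beta$, where $\mu_{ij} = E(Y_{ij})$.
Model for the association: the marginal odds ratio
$$\mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$$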

4 Marginal Odds Ratio
An odds ratio greater than 1 (log odds ratio greater than 0) indicates a positive association. Two possible specifications: (1) the degree of association is the same for all pairs of observations from the same subject; (2) the degree of association is inversely proportional to the time between observations from the same subject.
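As a small illustration, the Python sketch below computes the marginal odds ratio $\mathrm{OR}(Y_{ij}, Y_{ik})$ from the four joint probabilities of two binary responses on the same subject. The function name and the probabilities are hypothetical; the point is only that concordant pairs push the odds ratio above 1, while independence gives exactly 1.

```python
import math


def marginal_odds_ratio(p11, p10, p01, p00):
    """Marginal odds ratio OR(Y_j, Y_k) for two binary responses from one subject.

    p11 = Pr(Y_j=1, Y_k=1), p10 = Pr(Y_j=1, Y_k=0),
    p01 = Pr(Y_j=0, Y_k=1), p00 = Pr(Y_j=0, Y_k=0).
    """
    if not math.isclose(p11 + p10 + p01 + p00, 1.0):
        raise ValueError("joint probabilities must sum to 1")
    return (p11 * p00) / (p10 * p01)


# Hypothetical joint distributions (illustrative numbers only)
positive = marginal_odds_ratio(0.4, 0.1, 0.1, 0.4)          # concordant pairs dominate: OR > 1
independent = marginal_odds_ratio(0.25, 0.25, 0.25, 0.25)   # joint = product of margins: OR = 1
print(positive, independent)
```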

5 Parameter Interpretation in Logistic Regression: ICHS Study
Marginal Logistic Regression

6 Parameter Interpretation in Logistic Regression: ICHS Study
Logistic Regression with Random Effects

7 Parameter Interpretation in Logistic Regression: ICHS Study
Transition Logistic Regression Model

8 Parameter Interpretation in Logistic Regression: ICHS Study
Transition Logistic Regression Model (cont’d)

9 Maximum Likelihood Estimation of β in GLM: Cross-sectional Data
If $Y_i$ is binary or a count, we specify the likelihood function and estimate the parameters of interest by maximum likelihood estimation.

10 Maximum Likelihood Estimation of β in GLM: Cross-sectional Data
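For example, if $Y_i$ is binary, i.e. $Y_i \sim \text{Bernoulli}(\mu_i)$ with $\operatorname{logit}(\mu_i) = \beta_0 + \beta_1 x_i$, we estimate $\beta_0$ and $\beta_1$ by maximizing
$$L(\beta_0, \beta_1) = \prod_i \mu_i^{y_i}(1-\mu_i)^{1-y_i}.$$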
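To make the objective concrete, here is a minimal Python sketch of the Bernoulli log-likelihood under a logit link (assumed, as in the rest of the lecture). The data are made up; for a single binary covariate the maximizer has the closed form $\hat\beta_0 = \operatorname{logit}(\hat p_0)$, $\hat\beta_1 = \operatorname{logit}(\hat p_1) - \operatorname{logit}(\hat p_0)$, which the example uses as a check.

```python
import math


def logistic_loglik(beta0, beta1, x, y):
    """Bernoulli log-likelihood sum_i [y_i log(mu_i) + (1 - y_i) log(1 - mu_i)],
    with logit(mu_i) = beta0 + beta1 * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        log_mu = -math.log1p(math.exp(-eta))           # log(mu_i)
        log_one_minus_mu = -math.log1p(math.exp(eta))  # log(1 - mu_i)
        total += yi * log_mu + (1 - yi) * log_one_minus_mu
    return total


# Hypothetical data: binary covariate x, binary outcome y
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
print(logistic_loglik(0.0, 0.0, x, y))                      # null model, every mu_i = 0.5
print(logistic_loglik(math.log(1 / 3), math.log(9), x, y))  # closed-form MLE for a binary x
```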

11 Maximum Likelihood Estimation of β in GLM: Cross-sectional Data
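For example, if $Y_i$ is a count, i.e. $Y_i \sim \text{Poisson}(\mu_i)$ with $\log(\mu_i) = \beta_0 + \beta_1 x_i$, we estimate $\beta_0$ and $\beta_1$ by maximizing
$$L(\beta_0, \beta_1) = \prod_i \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}.$$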
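The count case can be sketched the same way in Python, assuming a log link $\log\mu_i = \beta_0 + \beta_1 x_i$; `math.lgamma(y + 1)` supplies $\log(y_i!)$. The counts are hypothetical, and with a binary covariate the MLE is simply the log of each group's mean count ($\hat\beta_0 = \log 2$, $\hat\beta_1 = \log 2.5$ here).

```python
import math


def poisson_loglik(beta0, beta1, x, y):
    """Poisson log-likelihood sum_i [y_i * eta_i - mu_i - log(y_i!)],
    with log(mu_i) = eta_i = beta0 + beta1 * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)  # lgamma(y + 1) = log(y!)
    return total


# Hypothetical counts: group means are 2 (x = 0) and 5 (x = 1)
x = [0, 0, 1, 1]
y = [1, 3, 4, 6]
print(poisson_loglik(math.log(2), math.log(2.5), x, y))
```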

12 Maximum Likelihood Estimation of β in GLM: Cross-sectional Data
In general, we solve
$$S(\beta) = \sum_i \frac{\partial \mu_i}{\partial \beta}\, v_i^{-1}(y_i - \mu_i) = 0,$$
which is called the score equation; solving it is equivalent to maximizing the likelihood function. Solutions to the score equation are not available in closed form, so they require an iterative procedure called the iteratively weighted least squares (IWLS) algorithm…

13 Maximum Likelihood Estimation of β in GLM: Cross-sectional Data
Main ideas of IWLS: let $\mu_i = E(Y_i)$ and $v_i = \operatorname{var}(Y_i) = v(\mu_i)$. Choose $\beta$ to make $\mu_i$ close to $y_i$ on average, weighting each $y_i$ by $v_i^{-1}$.
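A minimal IWLS loop for logistic regression with one covariate, written in plain Python under the assumption of a logit link: each pass computes weights $w_i = \mu_i(1-\mu_i)$ and a working response $z_i = \eta_i + (y_i-\mu_i)/w_i$, then solves a 2×2 weighted least squares problem. The function name, the data, and the convergence settings are illustrative.

```python
import math


def iwls_logistic(x, y, max_iter=25, tol=1e-10):
    """Fit logit(mu_i) = b0 + b1 * x_i by iteratively weighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate X'WX (a symmetric 2x2 matrix) and X'Wz
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)      # v(mu_i)^{-1} (d mu_i / d eta_i)^2 for the logit link
            z = eta + (yi - mu) / w  # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        # Weighted least squares update: solve (X'WX) b = X'Wz
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = iwls_logistic(x, y)
print(b0, b1)  # should approach log(1/3) and log(9)
```

For the logit (canonical) link these IWLS updates coincide with Newton-Raphson steps on the log-likelihood, which is why convergence is fast on non-separable data.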

14 GEE Estimation of β in GLM: Longitudinal Data
In the case of a linear regression model with the assumption of normality, the extension from ordinary linear regression to longitudinal problems was facilitated by thinking about a multivariate normal distribution. By specifying a model for the mean $E[Y_i]$ and a model for the covariance matrix $V_i$, we fully specify the multivariate normal distribution, $Y_i \sim N(X_i\beta, V_i)$, and can use MLE.

15 GEE Estimation of β in GLM: Longitudinal Data
Unfortunately, if the elements of $Y_i$ are counts or binary responses, we cannot naturally extend the Bernoulli or Poisson distributions to take correlation into account. Multivariate extensions of these distributions are quite complex (except for biostat students!). The main impediments with binary and count data are: (1) there are no convenient multivariate generalizations of the necessary probability distributions; (2) population-average and subject-specific approaches do not lead to the same model for the mean response.

16 GEE (applies only to marginal models)
Under a GEE approach, we forget about trying to specify a model for the whole multivariate distribution of a data vector. Instead, the idea is to just model the mean response E[Yi] and the covariance matrix Vi of a data vector as in the normal case. GEE is based on the concept of “estimating equations” and provides a very general approach for analyzing correlated responses that can be discrete or continuous.

17 GEE
The idea behind GEE is to generalize and extend the usual likelihood equations for a GLM with a univariate response by incorporating the covariance matrix of the vector of responses Y. For linear models, the generalized least squares (GLS) estimator of the vector of regression coefficients is a special case of the GEE approach.

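18 GEE
In the absence of a convenient likelihood to work with, it is sensible to estimate β by solving the following multivariate equation:
$$S(\beta) = \sum_{i=1}^{m} D_i^{\top} V_i^{-1}\,(Y_i - \mu_i) = 0,$$
where $D_i = \partial \mu_i / \partial \beta$ and $V_i = \operatorname{var}(Y_i)$. Note: with continuous (Gaussian) data, the estimate from this score equation reduces to the MLE.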

19 GEE (cont’d)
The method of generalized estimating equations provides consistent estimates of the mean parameters even when a model for the correlation may not be reliably specified. The GEE equation is a multivariate generalization of the score equation used to maximize the likelihood function under a GLM.

20 GLM for Longitudinal Data (GEE)
In summary, for GEE models we specify a GLM for the mean response and a working structure for the covariance matrix $V_i$, anywhere from independence to completely unstructured. The estimates of β and their robust standard errors will be consistent (i.e., they converge to the true values as the number of subjects grows). If the specification of $V_i$ is correct, then for multivariate Gaussian data the GEE solution is the maximum likelihood estimate.

21 GEE
One important property of the GLM family is that the score function depends only on the mean and variance of $Y_i$. Therefore the estimating equation
$$\sum_{i=1}^{m} \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\top} V_i^{-1}\,\{Y_i - \mu_i(\beta)\} = 0$$
can be used to estimate the regression coefficients for any choice of link and variance functions, whether or not they correspond to a particular member of the exponential family. This is the generalized estimating equation.

22 GEE Properties
$\hat\beta$ is nearly efficient relative to the maximum likelihood estimate of β, provided that $\operatorname{var}(Y_i)$ has been reasonably approximated. The GEE is the maximum likelihood score equation for multivariate Gaussian data, and for binary data, when $\operatorname{var}(Y_i)$ is correctly specified. $\hat\beta$ is consistent for β even when $\operatorname{var}(Y_i)$ is incorrectly specified.

23 What we need to specify for implementing GEE
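(1) A model for the mean, $g(\mu_{ij}) = x_{ij}^\top\beta$; (2) a known variance function, $\operatorname{var}(Y_{ij}) = \phi\, v(\mu_{ij})$; (3) a working correlation matrix, i.e. a model for the pairwise correlations among the responses.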

24 Working covariance matrix
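$V_i = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \operatorname{diag}\{v(\mu_{i1}), \dots, v(\mu_{in_i})\}$ and $R_i(\alpha)$ is the working correlation matrix, is called the working covariance matrix, to distinguish it from the true underlying covariance matrix of $Y_i$.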
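One way to build a working covariance matrix for binary responses, sketched in Python under stated assumptions: the variance function $v(\mu) = \mu(1-\mu)$, scale $\phi = 1$, and an exchangeable working correlation with a hypothetical value $\alpha = 0.3$ (setting $\alpha = 0$ gives the independence working model). The helper names and the fitted means are made up for this example.

```python
import math


def exchangeable_corr(n, alpha):
    """R(alpha): 1 on the diagonal, alpha off the diagonal (alpha = 0 gives independence)."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def working_cov_binary(mu, corr, phi=1.0):
    """V = phi * A^{1/2} R A^{1/2}, with A = diag(mu_j (1 - mu_j)) for binary responses."""
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(mu)
    return [[phi * sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]


# Hypothetical subject with three visits and fitted means mu_ij
mu_i = [0.2, 0.3, 0.5]
V_i = working_cov_binary(mu_i, exchangeable_corr(3, 0.3))
for row in V_i:
    print([round(v, 4) for v in row])
```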

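25 GEE
$\hat\beta$ is chosen to minimize $\sum_i (Y_i - \mu_i)^\top V_i^{-1} (Y_i - \mu_i)$, treating $V_i$ as fixed. Setting the derivative with respect to β to zero gives the GEE equations $\sum_i D_i^\top V_i^{-1}(Y_i - \mu_i) = 0$, with $D_i = \partial\mu_i/\partial\beta$; the solution of the GEE equations is the GEE estimate $\hat\beta$.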

26 Properties of GEE Estimates
The GEE estimator is consistent whether or not the within-subject associations/correlations have been correctly modelled. That is, for the GEE estimator to provide a valid estimate of the true β, we only require that the model for the mean response has been correctly specified.

27 Asymptotic distribution of the GEE estimator
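In large samples, the GEE estimator is approximately multivariate normal:
$$\hat\beta \sim N\!\left(\beta,\; B^{-1} M B^{-1}\right),\quad B = \sum_i D_i^\top V_i^{-1} D_i,\quad M = \sum_i D_i^\top V_i^{-1}\operatorname{var}(Y_i)\, V_i^{-1} D_i,$$
where $\operatorname{var}(Y_i)$ is the true covariance matrix of $Y_i$.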

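28 Sandwich estimate of $\operatorname{var}(\hat\beta)$
$$\widehat{\operatorname{var}}(\hat\beta) = \hat B^{-1}\hat M \hat B^{-1},$$
where the "bread" is $\hat B = \sum_i \hat D_i^\top \hat V_i^{-1}\hat D_i$ and the "meat" is $\hat M = \sum_i \hat D_i^\top \hat V_i^{-1}(Y_i-\hat\mu_i)(Y_i-\hat\mu_i)^\top \hat V_i^{-1}\hat D_i$. The residual outer products $(Y_i-\hat\mu_i)(Y_i-\hat\mu_i)^\top$ replace the unknown true covariance matrix of $Y_i$, which makes $\hat B^{-1}\hat M \hat B^{-1}$ a consistent estimate of $\operatorname{var}(\hat\beta)$.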
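The sketch below, in plain Python, fits a marginal logistic model by GEE with an independence working correlation and then forms both the model-based covariance $B^{-1}$ and the sandwich $B^{-1} M B^{-1}$, where the meat sums outer products of subject-level score contributions. It is restricted to an intercept plus one covariate so that 2×2 matrices can be inverted by hand; the data and function names are hypothetical, and no particular software package is being reproduced.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def bread_and_scores(clusters, beta):
    """Return B = sum_i D_i' V_i^{-1} D_i and per-subject scores D_i' V_i^{-1} (y_i - mu_i).

    With a logit link and an independence working correlation, D_i' V_i^{-1} reduces to X_i',
    so each score is sum_j x_ij (y_ij - mu_ij) and B = sum_ij mu_ij (1 - mu_ij) x_ij x_ij'.
    """
    bread = [[0.0, 0.0], [0.0, 0.0]]
    scores = []
    for subject in clusters:
        u = [0.0, 0.0]
        for x, y in subject:
            row = (1.0, x)  # intercept and covariate
            mu = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
            for r in range(2):
                u[r] += row[r] * (y - mu)
                for c in range(2):
                    bread[r][c] += mu * (1.0 - mu) * row[r] * row[c]
        scores.append(u)
    return bread, scores


def gee_independence_logit(clusters, max_iter=50, tol=1e-10):
    """Solve the independence GEE by Fisher scoring; return beta, model-based and sandwich covariances."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        bread, scores = bread_and_scores(clusters, beta)
        total = [sum(u[r] for u in scores) for r in range(2)]
        bread_inv = inv2(bread)
        step = [sum(bread_inv[r][c] * total[c] for c in range(2)) for r in range(2)]
        beta = [beta[r] + step[r] for r in range(2)]
        if abs(step[0]) + abs(step[1]) < tol:
            break
    bread, scores = bread_and_scores(clusters, beta)
    meat = [[sum(u[r] * u[c] for u in scores) for c in range(2)] for r in range(2)]
    bread_inv = inv2(bread)
    sandwich = matmul2(matmul2(bread_inv, meat), bread_inv)
    return beta, bread_inv, sandwich


# Hypothetical crossover-style data: each subject contributes (treatment, outcome) at two periods
clusters = [
    [(1, 1), (0, 0)], [(1, 1), (0, 1)], [(1, 0), (0, 0)],
    [(1, 1), (0, 0)], [(1, 0), (0, 1)], [(1, 1), (0, 0)],
]
beta, naive_cov, robust_cov = gee_independence_logit(clusters)
print("beta:", beta)
print("model-based SE:", [math.sqrt(naive_cov[r][r]) for r in range(2)])
print("sandwich SE:   ", [math.sqrt(robust_cov[r][r]) for r in range(2)])
```

The point estimates are identical to an ordinary logistic regression that ignores clustering; only the covariance changes, which is the "robust or not" distinction in the xtgee slides.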

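29 Link to Stata command xtgee for continuous data
For continuous data with an identity link, $\mu_i = X_i\beta$ and $D_i = X_i$; substituting into the GEE equations gives
$$\hat\beta = \Big(\sum_i X_i^\top V_i^{-1} X_i\Big)^{-1} \sum_i X_i^\top V_i^{-1} Y_i.$$
xtgee, identity link, corr(exch): uses the weighted least squares (model-based) variance $\big(\sum_i X_i^\top V_i^{-1} X_i\big)^{-1}$ for $\operatorname{var}(\hat\beta)$.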
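For the identity link the GEE equations have the closed-form solution $\hat\beta = \big(\sum_i X_i^\top V_i^{-1}X_i\big)^{-1}\sum_i X_i^\top V_i^{-1}Y_i$. A small Python sketch, assuming $n_i = 2$ and a known exchangeable $V_i = \sigma^2\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}$ shared by every subject (in practice the correlation and scale are estimated from residuals, which this sketch does not do). The noise-free data are hypothetical, chosen so that the answer $(1, 2)$ is known in advance.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gls_exchangeable(subjects, sigma2, rho):
    """Closed-form GEE/GLS estimate for the identity link with n_i = 2 and
    V = sigma2 * [[1, rho], [rho, 1]] for every subject.

    subjects: list of (X_i, y_i), X_i = [[1, x_i1], [1, x_i2]], y_i = [y_i1, y_i2].
    Returns (beta_hat, model-based covariance (sum_i X_i' V^{-1} X_i)^{-1}).
    """
    c = 1.0 / (sigma2 * (1.0 - rho * rho))
    v_inv = [[c, -c * rho], [-c * rho, c]]  # inverse of the 2x2 exchangeable V
    xtvx = [[0.0, 0.0], [0.0, 0.0]]
    xtvy = [0.0, 0.0]
    for X, y in subjects:
        for j in range(2):
            for k in range(2):
                for r in range(2):
                    xtvy[r] += X[j][r] * v_inv[j][k] * y[k]
                    for s in range(2):
                        xtvx[r][s] += X[j][r] * v_inv[j][k] * X[k][s]
    cov = inv2(xtvx)
    beta = [sum(cov[r][s] * xtvy[s] for s in range(2)) for r in range(2)]
    return beta, cov


# Hypothetical noise-free data generated from y = 1 + 2x
subjects = [
    ([[1, 0], [1, 1]], [1, 3]),
    ([[1, 2], [1, 3]], [5, 7]),
]
beta, cov = gls_exchangeable(subjects, sigma2=1.0, rho=0.5)
print(beta)
```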

30 Link to Stata command xtgee for continuous data (cont’d)
xtgee, identity link, corr(exch), robust: uses the sandwich estimator for $\operatorname{var}(\hat\beta)$.

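31 Link to Stata commands xtgee for binary data
Substitute $\operatorname{logit}(\mu_{ij}) = x_{ij}^\top\beta$ and $v(\mu_{ij}) = \mu_{ij}(1-\mu_{ij})$ into the GEE equation. There is no closed-form solution, so an iterative procedure is needed. The difference between using robust standard errors or not is analogous to the continuous-data case.
xtgee, logit link, corr(exch)
xtgee, logit link, corr(exch), robust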

32 Using GEE

33 Bottom Line: Focus on modeling the mean structure
If the scientific focus is on the regression coefficients β: (1) focus on modeling the mean structure; (2) use a reasonable approximation of the covariance structure; (3) check the inferences for β by comparing the robust standard errors of $\hat\beta$ under different working covariance assumptions. If the standard errors differ substantially, a more careful treatment of the covariance model might be necessary.

34 Maximum Likelihood Estimation for Binary Data

35 Example: 2x2 crossover trial
Data from the 2x2 crossover trial on cerebrovascular deficiency, adapted from Jones and Kenward, where treatments A and B are the active drug and placebo, respectively; the outcome indicates whether an electrocardiogram was judged abnormal (0) or normal (1).

36 Example: 2x2 crossover trial (cont’d)
Goal: to compare the effect of an active drug (A) and a placebo (B) on cerebrovascular deficiency. 34 patients received A followed by B; 33 patients received B followed by A. $Y_{ij} = 1$ if the electrocardiogram reading is normal. At period 1:

37 Example: 2x2 crossover trial (cont’d)
Calculate the MLE of the odds ratio separately for period 1 and period 2. The odds ratio of a normal reading for the active drug versus the placebo is larger than 1, and therefore indicates that the active drug produces a higher proportion of normal readings. However, the estimate is not statistically significant. Should we combine the data for Periods 1 and 2?

38 Example: 2x2 crossover trial (cont’d)
This approach has several limitations: (1) it ignores the carry-over effect, i.e. the treatment at period 1 might influence the response at period 2 (treatment × period interaction); (2) the two responses from the same subject are likely to be correlated, and in fact the odds ratio between them is estimated to be large. So, let's use GEE to estimate a population-average odds ratio, taking within-subject correlation into account.

39 Example: 2x2 crossover trial (cont’d)
GEE approach: we combine data from both periods, analyzing the 2x2 crossover trial as a longitudinal study with $n_i = n = 2$ observations per subject and $m = 67$ subjects.

40 Example: 2x2 crossover trial (cont’d)
GEE Approach (cont’d) Fit a logistic regression model:

41 $\exp(0.57) - 1 = 0.77$: the population-average odds of a normal reading are estimated to be 77% higher when using the active drug than when using the placebo. $\exp(3.56) \approx 35$: subjects with a normal response at the first visit have odds of a normal reading at the next visit that are about 35 times higher than those whose first response was abnormal.
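Both interpretations are simple arithmetic on the reported coefficients (0.57 for treatment, 3.56 for the within-subject log odds ratio); a quick Python check:

```python
import math

treatment_log_or = 0.57    # estimated treatment coefficient (log odds ratio) from the slide
association_log_or = 3.56  # estimated within-subject log odds ratio from the slide

pct_increase = math.exp(treatment_log_or) - 1  # proportional increase in the odds
association_or = math.exp(association_log_or)
print(f"{pct_increase:.0%} higher odds; within-subject OR = {association_or:.1f}")
```

This prints `77% higher odds; within-subject OR = 35.2`, so the estimated within-subject odds ratio is slightly above 35.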

42 In summary
Model 1 includes the treatment × period interaction (little support from the data) and estimates the marginal odds ratio by GEE. Model 2 drops the treatment × period interaction and estimates the marginal odds ratio by GEE. Model 3 assumes that the marginal odds ratio is 1; here the treatment coefficient is estimated as 0.56, with standard error 0.38 (much larger than under Models 1 and 2). Note: if we fit Model 3 but use robust standard errors, we obtain results similar to the GEE approach.

43 The prevalence of respiratory infections in six consecutive quarters reveals a positive seasonal trend with a summer maximum. The prevalence of xerophthalmia also shows some seasonality, with a winter maximum.

44 Example: Respiratory Infections
275 children in Indonesia were examined for up to six consecutive quarters for the presence of respiratory infections ($i = 1, \dots, m = 275$; $j = 1, \dots, 6$ visits). Goals of the analysis: (1) determine whether the prevalence of respiratory infection is higher among children who suffer from xerophthalmia (an ocular manifestation of chronic vitamin A deficiency); (2) estimate the change in respiratory infection with age; (3) consider seasonality as a potential confounder.

45 Cross-Sectional Analysis
Model 1: First visit only. Look only at the data from the first visit, and fit a logistic regression model of respiratory infection on xerophthalmia and age, adjusting for other covariates. We find a strong non-linear cross-sectional age effect on the prevalence of respiratory infection: the cross-sectional analysis suggests that prevalence increases from age 12 months and reaches its peak at age 20 months before starting to decline.

46 Cross-Sectional Analysis
Model 2: All visits, controlling for seasonality. Look at the data from all visits, and fit a logistic regression model of respiratory infection on xerophthalmia and age, adjusting for other covariates. We still find a strong non-linear cross-sectional age effect on the prevalence of respiratory infection. The age coefficients in Model 2 can be interpreted as weighted averages of the cross-sectional age coefficients for each visit.

47 Longitudinal Analysis
Here we want to distinguish the contributions of cross-sectional and longitudinal information to the estimated relationship of respiratory infection and age.

48 Longitudinal Analysis
Model 3: Separate CS from LDA. Separate differences among sub-populations of children at different ages at a fixed time (cross-sectional, CS) from changes within children over time (longitudinal, LD).
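One common way to separate the two sources of information, assumed here rather than taken from the slide, is to write a child's age at visit $j$ as baseline age plus time since baseline, $\text{age}_{ij} = \text{age}_{i1} + (\text{age}_{ij} - \text{age}_{i1})$, and give each piece its own coefficient, $\beta_C$ and $\beta_L$. A minimal Python sketch with hypothetical ages and coefficients, using linear terms only for simplicity:

```python
def split_age(ages):
    """Split one child's visit ages into (baseline age, change since baseline) pairs.

    The baseline term age_i1 carries cross-sectional information; the change
    age_ij - age_i1 carries longitudinal information.
    """
    baseline = ages[0]
    return [(baseline, age - baseline) for age in ages]


def linear_predictor(beta0, beta_c, beta_l, ages):
    """Hypothetical logit-scale predictor beta0 + beta_C * age_i1 + beta_L * (age_ij - age_i1)."""
    return [beta0 + beta_c * base + beta_l * change for base, change in split_age(ages)]


ages = [12, 15, 18, 21]  # hypothetical ages in months at quarterly visits
print(split_age(ages))
# When beta_C == beta_L the model collapses to a single age effect beta * age_ij
print(linear_predictor(-2.0, 0.05, 0.05, ages))
```

If $\beta_C = \beta_L$ the two terms collapse back to a single age effect, so comparing the estimates of $\beta_C$ and $\beta_L$ compares cross-sectional with longitudinal information.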

49 Longitudinal Analysis
Model 3: Separate CS from LDA

50 Longitudinal Analysis
Model 4: Separate CS from LDA, controlling for seasonality

51 Summary of Results
The convex relationship between age and the risk of respiratory infection appears to coincide with the pattern of seasonality. If we include harmonic terms, the longitudinal parameters (corresponding to the follow-up rows in the table) are not statistically significant. The longitudinal information (i.e. variation over time in respiratory infection versus variation over time in age) is highly confounded by seasonality. Therefore, in the presence of a strong seasonal signal, we can learn little about the effects of aging from data collected over an 18-month period if we restrict our attention to longitudinal information. However, much can be learned by comparing children at different ages, as long as we can assume that there are no cohort effects confounding the inferences about age. BE CAREFUL: in longitudinal analysis, always look for time-varying confounders!

52 Table: Logistic regressions of the prevalence of respiratory infection on age and xerophthalmia, adjusting for gender, season, and height for age. Models 1 and 2 estimate cross-sectional effects; Models 3 and 4 distinguish cross-sectional from longitudinal effects. Models 2-4 are fitted using the alternating logistic regression implementation of GEE. Columns: Model 1 (CS, first visit only), Model 2 (CS, all visits), Model 3 (LD), Model 4 (LD); in Models 3 and 4 the age effect is split into a cross-sectional coefficient $\beta_C$ and a longitudinal coefficient $\beta_L$.

53 The association between RI and Xero
The association between RI and xerophthalmia is positive, although not statistically significant at the 5% level (Model 2).

54 Prevalence of respiratory infection increases from age 12 months and reaches its peak at 20 months before starting to decline. The age coefficients in Model 2 can be interpreted as a weighted average of the cross-sectional age coefficients from each visit.

55 The risk of RI declines in the first 7 to 8 months of follow-up, before rising later in life.
Figure axis: $(\text{age}_{ij} - \text{age}_{ik})$

