Use of Estimating Equations and Quadratic Inference Functions in Complex Surveys Leigh Ann Harrod and Virginia Lesser Department of Statistics Oregon State University
The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR National Research Program on Design-Based/Model-Assisted Survey Methodology for Aquatic Resources at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.
BACKGROUND
Generalized estimating equations (GEE) and quadratic inference functions (QIF) are used in longitudinal studies
The techniques are useful when observations are clustered or correlated
Can be generalized to any survey data collected over time or space
Work by Liang and Zeger (1986)
Question of interest:
–Pattern of change over time
–Dependence of response on covariates
Approach:
–Working GLM for the marginal distribution of the response
Advantages of estimating equations:
–Give consistent estimates of regression parameters and their variances
–Increase efficiency
–Methods reduce to maximum likelihood when responses are multivariate normal
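Work by Liang and Zeger (cont’d)
Let $R(\alpha)$ be an $n \times n$ correlation matrix
–The “working correlation matrix”
Let $\alpha$ be an $s \times 1$ vector that fully characterizes $R(\alpha)$
Define $V_i = A_i^{1/2} R(\alpha) A_i^{1/2} / \phi$, where $A_i$ is the diagonal matrix of marginal variance functions
–$V_i = \mathrm{Cov}(Y_i)$ if $R(\alpha)$ is the true correlation matrix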
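GEE
Define GEE as: $\sum_{i=1}^{K} D_i^T V_i^{-1} S_i = 0$, where $S_i = Y_i - \mu_i$, $D_i = \partial \mu_i / \partial \beta$, and $V_i$ is the working covariance matrix of $Y_i$
Similar to quasi-likelihood approach
–Substitute estimators for $\alpha$ and $\phi$
–Solve for $\hat{\beta}$
Consistency of $\hat{\beta}$ depends on
–Correct specification of the mean, not of $R(\alpha)$
–MCAR data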
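To make the estimating equation concrete, here is a minimal sketch for the simplest possible case: an intercept-only model with identity link, constant variance, and an exchangeable working correlation. The function name `gee_mean_exchangeable` is hypothetical, and `alpha` is treated as a fixed, user-supplied value (in practice α and φ are estimated and the equation is solved iteratively). In this special case the GEE has a closed-form solution, because for an exchangeable R the row sums of the inverse are all equal to 1 / (1 + (n - 1)α).

```python
def gee_mean_exchangeable(clusters, alpha):
    """Solve sum_i 1' V_i^{-1} (y_i - mu * 1) = 0 for mu.

    Assumptions of this sketch (not from the slides): identity link,
    constant variance, exchangeable working correlation with a fixed
    alpha such that each R is invertible.
    """
    numerator = 0.0
    denominator = 0.0
    for y in clusters:
        n = len(y)
        # For exchangeable R, 1' R^{-1} = 1' / (1 + (n - 1) * alpha),
        # so each cluster contributes its residual sum times this factor.
        weight = 1.0 / (1.0 + (n - 1) * alpha)
        numerator += weight * sum(y)
        denominator += weight * n
    return numerator / denominator


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [10.0]]
    # alpha = 0 is the independence working correlation (pooled mean).
    print(gee_mean_exchangeable(data, 0.0))
    # A positive alpha down-weights observations from the larger cluster.
    print(gee_mean_exchangeable(data, 0.5))
```

With equal cluster sizes the weights cancel and the estimate is the pooled mean for any valid α, which illustrates why consistency rests on the mean specification rather than on R(α).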
Zeger, Liang, & Albert (1988)
Two approaches:
–Subject-specific (SS) model
–Population-averaged (PA) model
When there is no heterogeneity between subjects, SS model = PA model
Applications
–Site-specific trend over time
–Population-averaged trend of many sites over time
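As an illustration of how the two models relate, consider a logistic model with a Gaussian random intercept. The notation below ($b_i$, $\sigma^2$, $\beta_{SS}$, $\beta_{PA}$, $c$) is assumed for this sketch rather than taken from the slides; the approximation is the one given for the logistic-normal case in Zeger, Liang, and Albert (1988).

```latex
% Subject-specific model: random intercept b_i ~ N(0, \sigma^2)
\operatorname{logit} P(Y_{ij} = 1 \mid b_i) = x_{ij}^T \beta_{SS} + b_i

% Population-averaged model for the marginal mean
\operatorname{logit} P(Y_{ij} = 1) \approx x_{ij}^T \beta_{PA},
\qquad
\beta_{PA} \approx \frac{\beta_{SS}}{\sqrt{1 + c^2 \sigma^2}},
\qquad
c = \frac{16\sqrt{3}}{15\pi}
```

Setting $\sigma^2 = 0$ (no heterogeneity between subjects) gives $\beta_{PA} = \beta_{SS}$, which is the equality stated on the slide.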
Rao, Yung, Hidiroglou (2002)
GEE used with poststratification to obtain GREG estimator
Use calibration weights = (design weight) x (poststratification adjustment factor)
Simple cases (mean, LS)
–Closed form solution available
–Taylor linearization variance estimator
Complex cases (logistic)
–Newton-Raphson to obtain estimate
–Jackknife variance estimator
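The calibration weights can be sketched for the simple-mean case as follows. This is an illustration under stated assumptions, not the authors' implementation: the adjustment factor is taken to be the known poststratum count divided by its design-weighted estimate, and the function names and the `known_counts` argument are hypothetical.

```python
def calibration_weights(design_weights, poststrata, known_counts):
    """Return design weight x poststratification adjustment factor.

    Assumption of this sketch: the adjustment factor for poststratum h is
    N_h / N_hat_h, where N_hat_h is the sum of design weights in h.
    """
    estimated = {}
    for w, h in zip(design_weights, poststrata):
        estimated[h] = estimated.get(h, 0.0) + w
    return [w * known_counts[h] / estimated[h]
            for w, h in zip(design_weights, poststrata)]


def poststratified_mean(y, design_weights, poststrata, known_counts):
    """Weighted mean of y using the calibration weights (closed form)."""
    cw = calibration_weights(design_weights, poststrata, known_counts)
    return sum(w * v for w, v in zip(cw, y)) / sum(cw)


if __name__ == "__main__":
    w = [10.0, 10.0, 20.0, 20.0]
    strata = ["a", "a", "b", "b"]
    counts = {"a": 30.0, "b": 30.0}
    print(calibration_weights(w, strata, counts))
    print(poststratified_mean([1.0, 3.0, 5.0, 7.0], w, strata, counts))
```

By construction the calibration weights within each poststratum sum to the known count, which is the calibration property the closed-form case relies on.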
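Qu, Lindsay, & Li (2000)
Drawbacks of the GEE approach:
–When R is misspecified, the moment estimator of α doesn’t give the optimal $\hat{\beta}$
–Moment estimator of α doesn’t exist in some cases
Goal: introduce strategy for estimating the working correlation to correct these problems
Quadratic inference functions (QIF)
–Form: $Q_N(\beta) = \bar{g}_N^T C_N^{-1} \bar{g}_N$, where $\bar{g}_N = \frac{1}{N} \sum_{i=1}^{N} g_i(\beta)$ is the extended score and $C_N = \frac{1}{N^2} \sum_{i=1}^{N} g_i(\beta) g_i(\beta)^T$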
Developing QIF
Plays inferential role similar to negative of log-likelihood
Optimal linear combination of elements of the score vector reduces to QL equation
Combine parameter estimates optimally when dimensions of parameters differ for different missing data patterns
Model the inverse of the correlation matrix as a linear combination of known matrices
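The last point can be written out as follows, using the construction in Qu, Lindsay, and Li (2000). The symbols $M_k$, $a_k$, and $m$ are notation assumed for this sketch; $D_i$ is the derivative of the mean with respect to $\beta$ and $A_i$ the diagonal matrix of marginal variances.

```latex
% Inverse working correlation as a linear combination of known basis matrices
R^{-1}(\alpha) \approx \sum_{k=1}^{m} a_k M_k

% Example: exchangeable working correlation, m = 2,
% M_1 = I, M_2 = matrix with 0 on the diagonal and 1 elsewhere

% Extended score: one block per basis matrix, with no a_k to estimate
\bar{g}_N(\beta) = \frac{1}{N} \sum_{i=1}^{N} g_i(\beta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \begin{pmatrix}
      D_i^T A_i^{-1/2} M_1 A_i^{-1/2} (Y_i - \mu_i) \\
      \vdots \\
      D_i^T A_i^{-1/2} M_m A_i^{-1/2} (Y_i - \mu_i)
    \end{pmatrix}
```

Because the extended score has more components than parameters, it cannot generally be set to zero; the components are instead combined through a quadratic form, so the coefficients $a_k$ never need to be estimated directly.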
QIF
QL equation is a linear combination of the elements of the “extended score” g
Resulting estimator is efficient if the weights are the inverse of the variance
QIF analogous to Rao’s score test statistic
–May be used to test MCAR (ignorable missing) data assumption for several missing data patterns
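A small numerical sketch of a quadratic inference function, under assumptions chosen to keep it self-contained: an intercept-only model with identity link and unit variance, and two basis matrices (the identity, and an AR(1)-type matrix with 1 on the first off-diagonals). The function names are hypothetical, and the weighting matrix is the sample second-moment matrix of the per-cluster extended scores scaled by 1/N², as in Qu, Lindsay, and Li (2000).

```python
def extended_score(y, mu):
    """Extended score g_i(mu) for one cluster (two components).

    Assumptions of this sketch: D_i is a vector of ones and A_i is the
    identity, so each component is 1' M_k (y - mu).
    """
    resid = [v - mu for v in y]
    n = len(resid)
    # M_1 = identity: plain residual sum.
    g_identity = sum(resid)
    # M_2 = ones on the first off-diagonals: each residual is counted
    # once per adjacent time point (1 at the ends, 2 in the interior).
    neighbours = [(j > 0) + (j < n - 1) for j in range(n)]
    g_adjacent = sum(d * r for d, r in zip(neighbours, resid))
    return g_identity, g_adjacent


def qif(clusters, mu):
    """Q_N(mu) = gbar' C_N^{-1} gbar with C_N = (1/N^2) sum g_i g_i'."""
    n_clusters = len(clusters)
    scores = [extended_score(y, mu) for y in clusters]
    gbar0 = sum(s[0] for s in scores) / n_clusters
    gbar1 = sum(s[1] for s in scores) / n_clusters
    c00 = sum(s[0] * s[0] for s in scores) / n_clusters ** 2
    c01 = sum(s[0] * s[1] for s in scores) / n_clusters ** 2
    c11 = sum(s[1] * s[1] for s in scores) / n_clusters ** 2
    det = c00 * c11 - c01 * c01
    if det <= 0.0:
        raise ValueError("C_N is singular for these data and this mu")
    # Quadratic form using the explicit inverse of the 2 x 2 matrix C_N.
    return (c11 * gbar0 ** 2 - 2.0 * c01 * gbar0 * gbar1
            + c00 * gbar1 ** 2) / det


if __name__ == "__main__":
    data = [[1.0, 2.0, 4.0], [2.0, 2.5, 3.0],
            [0.5, 3.0, 2.0], [3.0, 1.0, 2.5]]
    for candidate in (1.5, 2.0, 2.5):
        print(candidate, qif(data, candidate))
```

The QIF estimate is the value of the parameter that minimizes this function; in the sketch one could compare candidate values directly, whereas a real analysis would use a numerical optimizer.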
Applications
Extend use of GEE in survey methodology to include QIF
Use QIF to
–Estimate trend in time: for one site; over all sites
–Account for spatial correlation at a point in time
–Account for revisit sites within a year (e.g. ODFW habitat surveys)
Research directions
Weight QIF by within-cluster variance for cluster samples
Account for variable probability sampling
Conduct tests of trend using asymptotic distribution
Acknowledgements Annie Qu, Oregon State University
References
Liang, K. and S.L. Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73:
Rao, J.N.K., W. Yung, and M.A. Hidiroglou (2002). Estimating equations for the analysis of survey data using poststratification information. Sankhya 64(A):
Qu, Annie, B.G. Lindsay, and B. Li (2000). Improving generalised estimating equations using quadratic inference functions. Biometrika, 87:
Qu, Annie and Peter X.-K. Song (2002). Testing ignorable missingness in estimating equation approaches for longitudinal data. Biometrika, 89:
Zeger, S.L. and K. Liang (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42:
Zeger, S.L., K. Liang, and P.S. Albert (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics, 44:
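Work by Liang and Zeger (cont’d)
Marginal density of response is: $f(y_{ij}) = \exp[\{y_{ij}\theta_{ij} - a(\theta_{ij}) + b(y_{ij})\}\phi]$, so that $E(y_{ij}) = a'(\theta_{ij})$ and $\mathrm{Var}(y_{ij}) = a''(\theta_{ij})/\phi$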