HSRP 734: Advanced Statistical Methods June 19, 2008
Extensions of Logistic Regression
Outcomes with more than 2 categories
–Categories have order
–Unordered
Conditional logistic regression
–Analysis of matched data
Extensions of Logistic Regression
Exact methods for small samples
–Fisher’s exact test
–Exact logistic regression
Correlated/Clustered data
–GEE method
–Mixed models
Extensions of Logistic Regression
Outcomes with more than 2 categories (polytomous or polychotomous)
Cumulative logit model
–Proportional odds model for ordinal outcomes (ordered categories)
Generalized logit model
–For nominal outcomes (unordered categories) or non-proportional odds models
Extensions of Logistic Regression
Cumulative logit model
–Fits a logistic regression model with g-1 intercepts for a g-category outcome and one model coefficient for each predictor
–Models the cumulative probability of being in a “lower” category
Ordinal Logistic Regression
Odds ratios take on the interpretation “% increase/decrease in the odds of being in a lower/higher category”
Subject to the “Proportional Odds” assumption
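The cumulative logit model described above can be sketched in a few lines. This is a minimal illustration, assuming the parameterization logit P(Y <= j) = intercept_j + slope * x; the intercepts and slope are hypothetical values chosen for the example, not estimates from any data set.

```python
import math

def cumulative_logit_probs(intercepts, coefs, x):
    """Cumulative and per-category probabilities under a proportional odds model.

    intercepts: g-1 increasing intercepts, one per cut point
    coefs:      one coefficient per predictor, shared by every cut point
    x:          predictor values for one subject
    """
    eta = sum(b * xi for b, xi in zip(coefs, x))
    # P(Y <= j) for j = 1..g-1
    cum = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    full = cum + [1.0]  # P(Y <= g) is always 1
    probs = [full[0]] + [full[j] - full[j - 1] for j in range(1, len(full))]
    return cum, probs

def odds(p):
    return p / (1.0 - p)

# Hypothetical parameters for a 3-category outcome and one predictor
intercepts = [-1.0, 0.5]
coefs = [0.7]

cum0, probs0 = cumulative_logit_probs(intercepts, coefs, [0.0])
cum1, probs1 = cumulative_logit_probs(intercepts, coefs, [1.0])

# Under proportional odds, the odds ratio is the same at every cut point
cumulative_odds_ratios = [odds(c1) / odds(c0) for c0, c1 in zip(cum0, cum1)]
```

Every entry of `cumulative_odds_ratios` equals exp(0.7), which is what the “Proportional Odds” assumption means in practice: one odds ratio per predictor, regardless of where the outcome scale is cut.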
Extensions of Logistic Regression
Generalized logit model
–Fits a logistic regression model with g-1 intercepts and g-1 model coefficients for each predictor for a g-category outcome
–Model captures the multinomial probability of being in a particular category using generalized logits
Nominal Logistic Regression
Odds ratios have the regular interpretation; just be careful about which comparisons are being made (reference category)
Does not assume “Proportional Odds”
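To illustrate how generalized logits map to category probabilities, here is a small sketch. It assumes the last category is the reference (its linear predictor is fixed at 0); all parameter values are hypothetical.

```python
import math

def generalized_logit_probs(intercepts, coefs, x):
    """Category probabilities under a generalized (baseline-category) logit model.

    intercepts: g-1 intercepts, one per non-reference category
    coefs:      g-1 coefficient lists, one list per non-reference category
    x:          predictor values for one subject
    The last category is treated as the reference.
    """
    etas = [a + sum(b * xi for b, xi in zip(bs, x))
            for a, bs in zip(intercepts, coefs)]
    etas.append(0.0)                      # reference category
    m = max(etas)                         # subtract max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters: 3 categories, one predictor
intercepts = [0.2, -0.4]
coefs = [[0.5], [-1.0]]

p_x0 = generalized_logit_probs(intercepts, coefs, [0.0])
p_x1 = generalized_logit_probs(intercepts, coefs, [1.0])

# Odds ratio for category 1 versus the reference category, per unit of x
or_cat1_vs_ref = (p_x1[0] / p_x1[2]) / (p_x0[0] / p_x0[2])
# Odds ratio for category 2 versus the reference category, per unit of x
or_cat2_vs_ref = (p_x1[1] / p_x1[2]) / (p_x0[1] / p_x0[2])
```

Each odds ratio is exp of the coefficient for that category, and each compares one category against the reference, which is why the choice of reference category matters for interpretation.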
SAS
Conditional logistic regression
Can use for matched data (e.g., case-control studies)
Provides unbiased estimates of odds ratios and CI’s
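For the simplest matched design (1:1 matched case-control pairs with one binary exposure), the conditional likelihood can be written out directly. The sketch below uses hypothetical pair counts and checks that the familiar discordant-pair ratio maximizes that likelihood; it is an illustration of the idea, not a general-purpose fitting routine.

```python
import math

def conditional_loglik(beta, pairs):
    """Conditional log-likelihood for 1:1 matched pairs.

    pairs: list of (x_case, x_control) exposure values, one tuple per matched pair.
    Each pair contributes exp(beta*x_case) / (exp(beta*x_case) + exp(beta*x_control)).
    """
    ll = 0.0
    for x_case, x_control in pairs:
        ll += beta * x_case - math.log(
            math.exp(beta * x_case) + math.exp(beta * x_control)
        )
    return ll

# Hypothetical data: (case exposed?, control exposed?) for 100 matched pairs
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 15 + [(0, 0)] * 45

# Only discordant pairs carry information about beta
n_case_only = sum(1 for xc, xk in pairs if xc == 1 and xk == 0)
n_control_only = sum(1 for xc, xk in pairs if xc == 0 and xk == 1)

or_hat = n_case_only / n_control_only   # conditional MLE of the odds ratio
beta_hat = math.log(or_hat)
```

Concordant pairs add the same constant to the log-likelihood for every value of beta, so they do not affect the estimate.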
SAS
Extensions to Logistic Regression
Exact Logistic Regression
–Small sample size
–Adequate sample size but rare event (sparse data)
Fisher’s exact test
Exact test for an RxC table where the chi-square test assumptions are doubtful
Why not always use Fisher’s exact test and exact logistic regression?
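As a concrete illustration, the two-sided Fisher’s exact test for the 2x2 special case can be computed by enumerating every table with the observed margins. This sketch covers only 2x2 tables (not general RxC), and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    tol = 1e-9  # guards against floating-point ties
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + tol))

# Hypothetical small table: 3 of 4 vs 1 of 4
p_value = fisher_exact_2x2(3, 1, 1, 3)
```

The enumeration is cheap here, but the number of tables grows quickly with sample size and table dimensions, which is one practical reason exact methods are not used by default.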
SAS
Extensions of Logistic Regression
Longitudinal data / repeated measures data / clustered data with binary outcomes
Multilevel models (nested data structures)
GEE (Generalized Estimating Equations)
GLMM (Generalized Linear Mixed Models)
Two methods for handling clustered outcomes
Mixed models
–Likelihood based
–Use random effects to model clustered observations
–Continuous outcome (but now extended for categorical)
Generalized Estimating Equation (GEE)
–Non-likelihood based
–Can handle large number of clusters
–Categorical outcome
GEE
GEE can be used in
–Longitudinal studies: repeated measures of the same individual form a cluster
–Community studies: subjects clustered by neighborhood
–Familial studies: subjects clustered by family
–Epidemiological studies: different forms of clusters, e.g., pedigree
GEE
In general GEE has 3 sets of parameters to estimate:
–Regression parameter (population-averaged effects)
–Correlation parameter (cluster parameter)
–Scale factor (not uncommon to assume =1)
Comparing SLR and GEE
SLR | GEE
No dispersion allowed for variance: Var(y) = mu(1-mu) | Dispersion allowed for variance: Var(y) = mu(1-mu)*scale_factor
No need to specify correlation matrix | Need to specify correlation structure
Both: exp(coefficient) has an odds ratio interpretation
GEE
In its simplest form, GEE can be considered an extension of logistic regression for clustered data
Clustered data are common
–Time: Longitudinal analysis with repeated measurements on an individual (e.g., BL, 1m, 2m, 6m follow-up)
–Individual: Cross-sectional analysis with multiple outcomes (e.g., left eye, right eye)
–Background: Subjects clustered because of common geographical or social background (e.g., clinic)
Correlation structure
–Often called the working correlation structure in GEE
–Specifies how the observations within a cluster are related
–Often assumes the correlation structure is the same across clusters
Unstructured
–All correlation coefficients free to take any value
Exchangeable
–Any two responses within the same cluster have the same correlation
–Simple (1 parameter to estimate)
Autoregressive AR(1)
Correlation between responses depends on the interval of time between responses
–Farther apart responses => weaker correlation
–Only 1 parameter to estimate!
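The three working correlation structures differ mainly in how many parameters they need. The sketch below builds the exchangeable and AR(1) matrices for a cluster of a given size and counts the free parameters of an unstructured matrix; the correlation values and function names are hypothetical, chosen only for illustration.

```python
def exchangeable(n, rho):
    """n x n matrix with 1 on the diagonal and a common correlation rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n, rho):
    """n x n AR(1) matrix: correlation rho**|i - j| decays as time points get farther apart."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def n_params_unstructured(n):
    """Number of free correlations in an unstructured n x n matrix."""
    return n * (n - 1) // 2

# Hypothetical cluster of 4 repeated measurements
exch = exchangeable(4, 0.3)   # every off-diagonal entry is 0.3
auto = ar1(4, 0.5)            # 0.5, 0.25, 0.125 at lags 1, 2, 3
free = n_params_unstructured(4)

for row in auto:
    print(["%.3f" % v for v in row])
```

With 4 observations per cluster the unstructured form already needs 6 correlations, while exchangeable and AR(1) each need 1, which is the trade-off between flexibility and simplicity noted above.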
Correlation matrix
Selection of a “working correlation structure” is at the discretion of the researcher!
How does the correlation structure affect the results?
Properties of GEE estimators
What about the variance (standard error) estimate if the “working” correlation matrix is not correctly specified?
–Model-based estimate => not consistent
–Empirical (robust) estimate => still consistent
Properties of GEE estimators
Even if the correlation structure is misspecified, the estimate of the logistic regression coefficients is still consistent
–If the correlation is misspecified, the estimate is not as efficient (SE is larger)
–This property contributes to the popularity of GEE
GEE works well with larger numbers of clusters
SAS
Review