Methods for Multilevel Analysis XH Andrew Zhou, PhD Professor, Department of Biostatistics University of Washington
Examples of Multilevel (Hierarchical) Data: individual-family-neighborhood; student-classroom-school-district; patient-provider-facility (e.g., the Ambulatory Care Quality Improvement Project, ACQUIP). Other types: multiple outcomes nested within an individual.
ACQUIP Alcohol Trial: A group-randomized trial. Intervention: at each visit, providers were given feedback on the patient's general perceived health status and on the condition-specific perceived health status for six common conditions: chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), hypertension, depression, diabetes, and alcohol problems. Outcome at 1-yr follow-up: (1) patients' self-reports of advice about alcohol from their provider (a binary outcome).
Hierarchical Nature of Data: patients within providers within facilities. Patient characteristics, e.g., advice at baseline, comorbidity. Provider characteristics, e.g., panel size. Facility characteristics, e.g., urban vs. rural.
Research Questions: Whether the intervention was significantly related to patient self-reports of advice about alcohol from their providers one year after the intervention. The independent effects of patient-level, provider-level, and facility-level factors. Quantification of provider-to-provider and facility-to-facility variability, and the degree to which it can be explained by patient-level, provider-level, and facility-level factors.
Research Questions, Cont Do facilities differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors? Do providers differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors?
Multilevel (Hierarchical) Models A hierarchical model analysis will treat the sites and the providers as random effects and will parse out the amount of total variation in the outcome that is attributable to each level of hierarchy.
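To make "the amount of total variation attributable to each level" concrete, consider a linear model with one random effect per level. Each level's share is its variance component divided by the sum of all components. Below is a minimal Python sketch with hypothetical component values. For binary outcomes, this partition is usually computed on a latent scale, which the sketch does not attempt.

```python
def variance_partition(components):
    """Share of the total variance attributable to each level.

    components: dict mapping level name -> variance component (all >= 0).
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {level: var / total for level, var in components.items()}


# Hypothetical variance components, not estimates from the ACQUIP data.
shares = variance_partition({"facility": 0.05, "provider": 0.15, "patient": 0.80})
for level, share in shares.items():
    print(f"{level}: {share:.2f}")
```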
An example using two-level linear model on schools A study of the relationship between a single student-level predictor variable (say, socioeconomic status (SES)) and one student-level outcome variable (mathematics achievement) in J schools randomly drawn from the entire population of schools.
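The SES-Achievement relationship in one school: Our regression model would be
$$Y_i = \beta_0 + \beta_1 X_i + r_i,$$
where $Y_i$ is the mathematics achievement of student $i$, $X_i$ is the student's SES, and $r_i$ is a random error with mean zero and variance $\sigma^2$. Figure 2.1 provides a scatterplot of this relationship.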
Centering in covariates: $\beta_0$ is defined as the expected achievement of a student whose SES is zero. It may be helpful to rescale the independent variable, $X$, so that the intercept is meaningful. We center SES by subtracting the mean SES from each score, $X_i - \bar{X}$. The intercept $\beta_0$ is then the expected achievement of a student with average SES. Figure 2.2 shows the regression model with centering.
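The sketch below is a small numerical check of what centering does. It uses hypothetical scores for one school and ordinary least squares via NumPy. Centering SES at its mean leaves the slope unchanged. It moves the intercept to the expected achievement at average SES, which for an OLS fit equals the school's mean achievement.

```python
import numpy as np

def fit_line(x, y):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical SES and achievement scores for one school; no student has SES = 0.
ses = np.array([1.0, 1.5, 2.0, 2.4, 3.1, 3.5])
ach = np.array([8.0, 10.0, 12.5, 13.0, 15.5, 17.0])

b0_raw, b1_raw = fit_line(ses, ach)               # intercept: expected achievement at SES = 0
b0_cen, b1_cen = fit_line(ses - ses.mean(), ach)  # intercept: expected achievement at mean SES

print(f"uncentered: intercept={b0_raw:.2f}, slope={b1_raw:.2f}")
print(f"centered:   intercept={b0_cen:.2f}, slope={b1_cen:.2f}")
print(f"school mean achievement = {ach.mean():.2f}")
```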
The SES-Achievement relationship in two schools Figure 2.3 shows separate regression models for two schools.
The two lines indicate that School 1 and School 2 differ in two ways: (1) School 1 has a higher mean than School 2 ($\beta_{01} > \beta_{02}$); (2) SES is less predictive of achievement in School 1 than in School 2 ($\beta_{11} < \beta_{12}$). If students had been randomly assigned to the two schools, we could say that School 1 is both more "effective" and more "equitable". Students are not assigned at random, however, so such interpretations of school effects are unwarranted unless other differences in student composition are taken into account.
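The SES-Achievement relationship in J schools (2-level Variance Component)
For student $i$ in school $j$: $Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$, with $r_{ij} \sim N(0, \sigma^2)$.
It is often sensible and convenient to assume that the intercept and slope have a bivariate normal distribution across the population of schools:
$$\begin{pmatrix}\beta_{0j}\\ \beta_{1j}\end{pmatrix} \sim N\left(\begin{pmatrix}\gamma_0\\ \gamma_1\end{pmatrix}, \begin{pmatrix}\tau_{00} & \tau_{01}\\ \tau_{01} & \tau_{11}\end{pmatrix}\right)$$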
Interpretation: $\gamma_0$: the average school mean for the population of schools; $\tau_{00}$: the population variance among the school means; $\gamma_1$: the average SES-achievement slope for the population of schools; $\tau_{11}$: the population variance among the slopes; $\tau_{01}$: the population covariance between slopes and intercepts.
Figure 2.4 provides a scatterplot of the relationship between $\beta_{0j}$ and $\beta_{1j}$ for a hypothetical sample of 200 schools. There is more dispersion among means than among slopes ($\tau_{00} > \tau_{11}$). The two effects also tend to be negatively correlated ($\tau_{01} < 0$): schools with high average achievement, $\beta_{0j}$, tend to have weak SES-achievement relationships, $\beta_{1j}$.
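A sample like the one in Figure 2.4 can be sketched by drawing $(\beta_{0j}, \beta_{1j})$ pairs from a bivariate normal distribution. The population values below are hypothetical. They are chosen so that $\tau_{00} > \tau_{11}$ and $\tau_{01} < 0$, which implies a correlation of -0.5.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical population values (not taken from any real data set).
gamma = np.array([12.0, 2.5])      # (gamma_0, gamma_1): mean intercept and mean slope
tau = np.array([[8.0, -1.0],       # [[tau_00, tau_01],
                [-1.0, 0.5]])      #  [tau_01, tau_11]]

# Each row is (beta_0j, beta_1j) for one of 200 schools.
betas = rng.multivariate_normal(gamma, tau, size=200)

print("sample means:", betas.mean(axis=0).round(2))
print("variance of intercepts:", betas[:, 0].var(ddof=1).round(2))
print("variance of slopes:", betas[:, 1].var(ddof=1).round(2))
print("correlation:", np.corrcoef(betas.T)[0, 1].round(2))
```

Plotting `betas[:, 0]` against `betas[:, 1]` would give a scatterplot of the kind described: wide spread in intercepts, narrow spread in slopes, and a downward tilt.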
Modeling the second level: Having examined graphically how schools vary in their intercepts and slopes, we wish to develop a model that predicts $\beta_{0j}$ and $\beta_{1j}$ from school characteristics. Let $W_j$ be an indicator that takes the value one for Catholic schools and zero for public schools.
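Two-level Linear Model, Cont
$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$$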
Interpretation: $\gamma_{00}$: the mean achievement for public schools; $\gamma_{01}$: the mean achievement difference between Catholic and public schools; $\gamma_{10}$: the average SES-achievement slope in public schools; $\gamma_{11}$: the mean difference in SES-achievement slopes between Catholic and public schools; $u_{0j}$: the unique effect of school $j$ on mean achievement, holding $W_j$ constant; $u_{1j}$: the unique effect of school $j$ on the SES-achievement slope, holding $W_j$ constant.
Estimation methods: The parameters of these level-2 regression models cannot be estimated directly, because their outcomes ($\beta_{0j}$, $\beta_{1j}$) are not observed. However, the data contain the information needed for this estimation.
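Estimation methods, cont. Combining the models at the two levels, we obtain
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}_{\cdot j}) + \gamma_{11} W_j (X_{ij} - \bar{X}_{\cdot j}) + u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$$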
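The sketch below simulates from the combined model, $Y_{ij} = \gamma_{00} + \gamma_{01}W_j + \gamma_{10}x_{ij} + \gamma_{11}W_j x_{ij} + u_{0j} + u_{1j}x_{ij} + r_{ij}$, where $x_{ij}$ is SES centered at the school mean. All parameter values and the function name are hypothetical. The final OLS fit illustrates the next point in the text. OLS still targets the right fixed effects, but it treats the errors as independent with constant variance, which they are not.

```python
import numpy as np

def simulate_schools(J, n, gamma, tau, sigma2, catholic_share, rng):
    """Simulate n students in each of J schools from the combined two-level model.

    gamma = (g00, g01, g10, g11); tau = 2x2 covariance of (u_0j, u_1j).
    Returns the school index, W_j per student, group-centered SES, and achievement.
    """
    g00, g01, g10, g11 = gamma
    school = np.repeat(np.arange(J), n)
    w = (rng.random(J) < catholic_share).astype(float)    # W_j: 1 = Catholic, 0 = public
    u = rng.multivariate_normal([0.0, 0.0], tau, size=J)  # rows are (u_0j, u_1j)

    ses = rng.normal(0.0, 1.0, size=J * n)
    school_means = np.bincount(school, weights=ses) / n
    x = ses - school_means[school]                        # center SES within each school

    wj = w[school]
    y = (g00 + g01 * wj + g10 * x + g11 * wj * x
         + u[school, 0] + u[school, 1] * x
         + rng.normal(0.0, np.sqrt(sigma2), size=J * n))  # r_ij
    return school, wj, x, y

rng = np.random.default_rng(7)
school, wj, x, y = simulate_schools(
    J=300, n=25, gamma=(12.0, 2.0, 3.0, -1.0),
    tau=np.array([[4.0, -0.5], [-0.5, 0.4]]), sigma2=36.0,
    catholic_share=0.4, rng=rng)

# OLS gives unbiased fixed-effect estimates here, but its standard errors
# ignore the within-school dependence and the non-constant error variance.
X = np.column_stack([np.ones_like(x), wj, x, wj * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef.round(2))
```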
Estimation methods, Cont The overall linear regression model is not the typical linear model assumed in standard ordinary least squares (OLS). Efficient estimation and accurate hypothesis testing based on OLS require that the random errors are independent, normally distributed, and have constant variance. In contrast, random errors in our overall model are dependent within each school and also have non-constant variances.
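Estimation methods, cont. The variance of the random errors has the following complicated form:
$$\operatorname{Var}\left(u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}\right) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}_{\cdot j}) + \tau_{11}(X_{ij} - \bar{X}_{\cdot j})^2 + \sigma^2$$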
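The composite error variance can be evaluated directly. The minimal sketch below assumes group-centered SES, $r_{ij}$ independent of the school effects, and hypothetical variance components. It also computes the covariance between two students in the same school, which is what makes the errors dependent.

```python
import numpy as np

def composite_error_variance(x_centered, tau00, tau11, tau01, sigma2):
    """Var(u_0j + u_1j * x + r_ij) for group-centered SES x.

    Assumes (u_0j, u_1j) has covariance [[tau00, tau01], [tau01, tau11]]
    and r_ij is independent of the school effects with variance sigma2.
    """
    x = np.asarray(x_centered, dtype=float)
    return tau00 + 2.0 * tau01 * x + tau11 * x**2 + sigma2

def composite_error_covariance(x_a, x_b, tau00, tau11, tau01):
    """Covariance between the errors of two different students in the same school."""
    return tau00 + tau01 * (x_a + x_b) + tau11 * x_a * x_b

# Hypothetical variance components; the variance changes with SES (non-constant).
variance = composite_error_variance([-1.0, 0.0, 1.0], tau00=4.0, tau11=0.4, tau01=-0.5, sigma2=36.0)
print(variance)
print(composite_error_covariance(-1.0, 1.0, tau00=4.0, tau11=0.4, tau01=-0.5))
```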
Estimation methods, cont. Although standard regression analysis is not appropriate, such models can be estimated by an iterative maximum likelihood procedure. Figure 2.5 provides a graphical representation of the model specified above. It shows two hypothetical plots of the association between $\beta_{0j}$ and $\beta_{1j}$, one for public schools and one for Catholic schools. The plots show that Catholic schools have both higher mean achievement and weaker SES effects than public schools.
Estimation methods, Cont. Three types of parameters are to be estimated: 1. Fixed effects ($\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}$) 2. Random level-1 coefficients ($\beta_{0j}, \beta_{1j}$) 3. Variance-covariance components ($\sigma^2, \tau_{00}, \tau_{11}, \tau_{01}$)
Three common estimation methods: (1) Maximum likelihood (ML), a general estimation procedure that produces the population parameter estimates that maximize the probability of observing the data given the model. (2) Iterative generalized least squares (IGLS) and restricted iterative generalized least squares (RIGLS). (3) Bayesian methods.
ML method Two different likelihood functions: 1. Full Maximum Likelihood (FML) – both the regression coefficients and the variance components are included in the likelihood function. 2. Restricted Maximum Likelihood (RML) – only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second estimation step.
Comparison of these two methods: FML is more efficient and provides estimates of both the variance components and the fixed-effect parameters. However, FML may produce biased (typically too small) estimates of the variance components. RML provides less biased estimates of the variance components, and when the groups are balanced it is equivalent to the ANOVA estimates, which are optimal in that case. FML continues to be used for two reasons: (1) its computation is generally easier, and (2) models that differ in their fixed parameters can be compared with likelihood-based tests. With RML, only models that differ in the random part can be compared with likelihood-based tests.
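The reason full ML tends to underestimate variance components is easiest to see in a single-level regression with $p$ coefficients. The ML estimate of $\sigma^2$ is RSS/$n$, which ignores the $p$ degrees of freedom used to estimate the fixed effects. The REML estimate is RSS/$(n-p)$, which is unbiased. The Monte Carlo sketch below illustrates this principle using a hypothetical design and NumPy only; it is not a multilevel fit.

```python
import numpy as np

rng = np.random.default_rng(11)
n, p, sigma2 = 10, 3, 4.0                      # hypothetical small-sample design
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])

ml, reml = [], []
for _ in range(10000):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    ml.append(rss / n)           # ML: ignores the p estimated coefficients
    reml.append(rss / (n - p))   # REML: accounts for them

print(f"true sigma^2 = {sigma2}")
print(f"mean ML estimate   = {np.mean(ml):.2f} (theory: {sigma2 * (n - p) / n:.2f})")
print(f"mean REML estimate = {np.mean(reml):.2f}")
```

The downward bias of ML, a factor of $(n-p)/n$, matters most when $n$ is small relative to $p$. In multilevel models, the number of groups plays a role similar to $n$ for the group-level variance components.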
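IGLS and RIGLS: The combined model is
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}_{\cdot j}) + \gamma_{11} W_j (X_{ij} - \bar{X}_{\cdot j}) + u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$$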
IGLS and RIGLS, Cont. If $\sigma^2$, $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ were known, the covariance matrix of the composite errors could be constructed immediately, and the fixed effects could be estimated by generalized least squares (GLS). Because these components are unknown, estimation instead proceeds through an iterative process known as iterative generalized least squares (IGLS).
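IGLS and RIGLS, Cont. The first step is to start with reasonable estimates of the fixed parameters. These are typically the ordinary least squares (OLS) estimates, which assume $\tau_{00} = \tau_{11} = \tau_{01} = 0$. From these estimates, the raw residuals are formed:
$$\tilde{e}_{ij} = Y_{ij} - \hat\gamma_{00} - \hat\gamma_{01} W_j - \hat\gamma_{10}(X_{ij} - \bar{X}_{\cdot j}) - \hat\gamma_{11} W_j (X_{ij} - \bar{X}_{\cdot j})$$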
IGLS and RIGLS, Cont
With the estimates of the variance components ($\sigma^2$, $\tau_{00}$, $\tau_{11}$, $\tau_{01}$) from GLS, the iterative procedure returns to the fixed part of the model and calculates new estimates of the fixed effects. The procedure alternates between the fixed and random parts in this way until convergence, that is, until the parameter estimates no longer change from one iteration to the next.
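Below is a minimal sketch of the IGLS alternation for the simplest case, a two-level random-intercept model in which each group has covariance $\sigma^2 I + \tau_{00}\mathbf{1}\mathbf{1}'$. The variance step regresses the residual cross-products on their expectations by ordinary least squares. This is a labeled simplification of the generalized least squares weighting that IGLS uses for the variance components. The function names are hypothetical, and this is not the implementation in any particular package.

```python
import numpy as np

def gls_fixed(X, y, groups, tau00, sigma2):
    """GLS estimate of the fixed effects given the variance components,
    with group covariance V_j = sigma2 * I + tau00 * (all-ones matrix)."""
    p = X.shape[1]
    xtvx, xtvy = np.zeros((p, p)), np.zeros(p)
    for g in np.unique(groups):
        idx = groups == g
        Xj, yj = X[idx], y[idx]
        nj = len(yj)
        Vj_inv = np.linalg.inv(sigma2 * np.eye(nj) + tau00 * np.ones((nj, nj)))
        xtvx += Xj.T @ Vj_inv @ Xj
        xtvy += Xj.T @ Vj_inv @ yj
    return np.linalg.solve(xtvx, xtvy)

def variance_step(resid, groups):
    """Estimate (tau00, sigma2) from residual cross-products within groups.

    E[e_i e_i'] = tau00 + sigma2 * [i == i']; fitted here by OLS (simplification).
    """
    rows, targets = [], []
    for g in np.unique(groups):
        e = resid[groups == g]
        iu = np.triu_indices(len(e))               # each pair (i, i') once, diagonal included
        targets.append(np.outer(e, e)[iu])
        is_diag = (iu[0] == iu[1]).astype(float)
        rows.append(np.column_stack([np.ones_like(is_diag), is_diag]))
    A, b = np.vstack(rows), np.concatenate(targets)
    (tau00, sigma2), *_ = np.linalg.lstsq(A, b, rcond=None)
    return max(float(tau00), 0.0), max(float(sigma2), 1e-8)

def igls_random_intercept(X, y, groups, max_iter=100, tol=1e-8):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # start: OLS, i.e. tau00 = 0
    tau00, sigma2 = 0.0, float(np.var(y - X @ beta))
    for _ in range(max_iter):
        tau00, sigma2 = variance_step(y - X @ beta, groups)   # random part
        new_beta = gls_fixed(X, y, groups, tau00, sigma2)     # fixed part
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, tau00, sigma2

# Usage on simulated data with hypothetical values tau00 = 2, sigma2 = 1.
rng = np.random.default_rng(3)
n_groups, n_per = 150, 10
groups = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
X = np.column_stack([np.ones_like(x), x])
u = rng.normal(0.0, np.sqrt(2.0), size=n_groups)
y = 1.0 + 0.5 * x + u[groups] + rng.normal(size=x.size)
beta, tau00, sigma2 = igls_random_intercept(X, y, groups)
print(beta.round(3), round(tau00, 3), round(sigma2, 3))
```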
IGLS and RIGLS, Cont. IGLS may produce biased estimates of the random parameters because it does not take into account the sampling variation of the fixed-effect estimates. The bias may be most severe in small samples. Restricted iterative generalized least squares (RIGLS) adjusts for this and produces less biased estimates. In normal models, IGLS converges to the full maximum likelihood estimates and RIGLS to the restricted maximum likelihood estimates.
Bayesian method: Bayesian methods combine prior information about the parameters with the information contained in the data to produce a posterior distribution. Markov chain Monte Carlo (MCMC) methods are commonly used to generate a random sample from the posterior distribution. MCMC methods are also iterative and include Gibbs sampling and Metropolis-Hastings sampling. They tend to produce more accurate interval estimates in small samples.
Three-level binary response models for alcohol drinking: Let $Y_{ijk}$ be the binary response indicating whether subject $i$, cared for by provider $j$ in hospital $k$, received drinking advice. Let $X_{ijk}$ be the intervention status of subject $i$ of provider $j$ in hospital $k$.
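Three-level logistic regression
$$Y_{ijk} \mid \pi_{ijk} \sim \text{Binomial}(1, \pi_{ijk}), \qquad \operatorname{logit}(\pi_{ijk}) = \beta_0 + \beta_1 X_{ijk} + v_k + u_{jk},$$
$$v_k \sim N(0, \sigma_v^2), \qquad u_{jk} \sim N(0, \sigma_u^2), \qquad \operatorname{Var}(Y_{ijk} \mid \pi_{ijk}) = \sigma_e^2\, \pi_{ijk}(1 - \pi_{ijk})$$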
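The sketch below shows how data arise under a three-level random-intercept logistic model of this kind, with patients nested in providers nested in sites. The notation ($\beta_0$, $\beta_1$, $v_k$, $u_{jk}$), the parameter values, and the assignment of the intervention to half the providers are assumptions for illustration, not the ACQUIP design.

```python
import numpy as np

def simulate_three_level_logistic(K, J, n, beta0, beta1, var_site, var_provider, rng):
    """Simulate n patients per provider, J providers per site, K sites:
        logit P(Y_ijk = 1) = beta0 + beta1 * X_ijk + v_k + u_jk,
        v_k ~ N(0, var_site), u_jk ~ N(0, var_provider).
    Hypothetical design: half of the providers deliver the intervention.
    """
    site = np.repeat(np.arange(K), J * n)          # site index k for each patient
    provider = np.repeat(np.arange(K * J), n)      # provider q belongs to site q // J
    x_provider = rng.permutation(np.arange(K * J) % 2)
    x = x_provider[provider].astype(float)         # X_ijk: intervention status
    v = rng.normal(0.0, np.sqrt(var_site), size=K)
    u = rng.normal(0.0, np.sqrt(var_provider), size=K * J)
    eta = beta0 + beta1 * x + v[site] + u[provider]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return site, provider, x, y

rng = np.random.default_rng(5)
site, provider, x, y = simulate_three_level_logistic(
    K=10, J=8, n=30, beta0=-0.5, beta1=0.3, var_site=0.05, var_provider=0.15, rng=rng)
print("overall proportion:", y.mean().round(3))
print("intervention vs control:", y[x == 1].mean().round(3), y[x == 0].mean().round(3))
```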
The patient-level scale parameter $\sigma_e^2$, which multiplies the binomial variance $\pi_{ijk}(1 - \pi_{ijk})$, provides a natural test of whether the assumption of binomial variation is valid. If $\sigma_e^2$ is significantly different from one, the data are said to exhibit extra-binomial variation. The data are under-dispersed if $\sigma_e^2 < 1$ and over-dispersed if $\sigma_e^2 > 1$.
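For grouped binomial counts, a simple single-level moment estimate of the scale factor is the Pearson chi-square statistic divided by its degrees of freedom. This swapped-in illustration shows what the parameter measures. It is not the MQL/PQL estimate of the multilevel scale parameter, and the data below are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, n, p, n_params):
    """Pearson chi-square / df estimate of the extra-binomial scale factor.

    y: successes, n: trials, p: fitted probabilities, n_params: number of
    estimated fixed parameters. Values near 1 are consistent with binomial
    variation; > 1 suggests over-dispersion, < 1 under-dispersion.
    """
    y, n, p = (np.asarray(a, dtype=float) for a in (y, n, p))
    chi2 = np.sum((y - n * p) ** 2 / (n * p * (1.0 - p)))
    return chi2 / (len(y) - n_params)

# Hypothetical grouped data: 2 of 10 and 5 of 10 successes, with p = 0.3 taken as known.
phi = pearson_dispersion([2, 5], [10, 10], [0.3, 0.3], n_params=0)
print(round(phi, 3))
```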
Two estimation methods: There are two estimation approaches for multilevel logistic regression models: (1) a quasi-likelihood approach and (2) a Bayesian approach with MCMC methods. I briefly describe these two approaches below.
Two Quasi-likelihood methods: In the quasi-likelihood approach, the first step is to approximate the nonlinear logistic regression equation with a Taylor series expansion, which represents a nonlinear function as an infinite series of terms. If the series is truncated after the first-order (linear) term, the estimation is a first-order approximation. If the second-order term is also retained, it is a second-order approximation. If the Taylor series is expanded about the fixed part of the model only, the estimation is known as marginal quasi-likelihood (MQL).
Two Quasi-likelihood methods, Cont. If the Taylor series is expanded about both the fixed part and the current predictions of the random effects, the estimation is known as penalized quasi-likelihood (PQL). Once the quasi-likelihood has been formed, the IGLS or RIGLS procedure can be applied to estimate the parameter values.
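Both quasi-likelihood methods replace the binary response with a linearized working response and then run weighted (R)IGLS on it. The sketch below shows the first-order working response for the logit link. In this sketch, the only difference between first-order MQL and PQL is the linear predictor the expansion is taken around. Names and numbers are hypothetical, and second-order terms are omitted.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def working_response(y, eta):
    """First-order linearization of a logistic model around linear predictor eta.

    Returns the working response z = eta + (y - pi) / (pi * (1 - pi))
    and the weight w = pi * (1 - pi) used in the next weighted GLS step.
    """
    pi = expit(eta)
    w = pi * (1.0 - pi)
    z = eta + (y - pi) / w
    return z, w

y = np.array([1.0, 0.0])
fixed_part = np.array([0.0, 0.0])   # X @ beta at the current estimates (hypothetical)
u_hat = np.array([0.5, 0.5])        # current predicted random effect (hypothetical)

# MQL: expand around the fixed part only.
z_mql, w_mql = working_response(y, fixed_part)
# PQL: expand around the fixed part plus the predicted random effects.
z_pql, w_pql = working_response(y, fixed_part + u_hat)
print(z_mql, w_mql)
print(z_pql.round(3), w_pql.round(3))
```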
Bayesian method The MCMC method used for the logistic regression equations in this paper will be Metropolis-Hastings sampling.
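The multilevel sampler itself is not shown in the source. As a hedged illustration of the Metropolis-Hastings idea, the sketch below runs a random-walk sampler for a single-level logistic regression with independent N(0, 10²) priors, which are an assumption. A multilevel version would add the random effects and their variances as further parameters to update.

```python
import numpy as np

def log_posterior(beta, X, y, prior_sd=10.0):
    """Log posterior (up to a constant) for logistic regression with
    independent N(0, prior_sd^2) priors on the coefficients."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))   # stable log(1 + exp(eta))
    logprior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return loglik + logprior

def random_walk_mh(X, y, n_iter=10000, step=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for t in range(n_iter):
        proposal = beta + step * rng.normal(size=beta.shape)
        candidate = log_posterior(proposal, X, y)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws[t] = beta
    return draws, accepted / n_iter

# Simulated data with hypothetical true coefficients (-0.5, 1.0).
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x))))
draws, acc = random_walk_mh(X, y, rng=rng)
post = draws[2000:]                    # discard burn-in
print(post.mean(axis=0).round(2), round(acc, 2))
```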
ACQUIP Alcohol Trial: Binary outcome at 1-yr follow-up: (1) patients' self-reports of advice about alcohol from their provider. Patient-level covariates. Provider-level covariates.
The Alcohol Example, Cont. Random assignment at the firm level should ensure that, on average, the two groups are balanced on the baseline covariates. However, imbalance may still occur, and confounding may still be a problem. Patient-level potential confounders: hypertension, liver disease, smoking in the past year, and the AUDIT score. Provider-level potential confounders: the number of patients per provider (panel size) and provider training.
Alcohol example
Three-level logistic regression, Cont. Here, Hypertension, LiverDisease, and PastYearSmoker are dichotomous variables equal to one if the patient reported the condition and zero otherwise. BaselineAUDIT is the patient's AUDIT score at baseline, a continuous variable ranging from 0 to 40. PanelSize indicates the range of the provider's panel size. Fellow, NP, PA, Resident, and RN are indicator variables for the categorical variable provider type; the reference provider type is staff physician.
Results: Table shows the MQL estimates under the four combinations of first- or second-order approximation and binomial or extra-binomial assumptions. Table shows the corresponding PQL estimates under the same four combinations.
Results for fixed effects: The estimates of the fixed effects are quite stable across estimation procedures. The estimate of the intervention effect is approximately 1.35, indicating that a patient in the intervention group is more likely to report advice than a patient in the control group. This result is not significant under a two-tailed test but is significant under a one-tailed test. The one-tailed p-values range from 0.02 to 0.05, depending on which estimate is considered.
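Under a normal (Wald) approximation, when the estimate lies in the hypothesized direction, the one-tailed p-value is half the two-tailed one. The small sketch below uses a hypothetical estimate and standard error, not the ACQUIP values. If an effect is reported as an odds ratio, the test would be applied to its logarithm.

```python
import math

def wald_p_values(estimate, se, null=0.0):
    """Wald z statistic with one-tailed (upper) and two-tailed normal p-values."""
    z = (estimate - null) / se
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2.0))    # P(|Z| >= |z|)
    return z, one_tailed, two_tailed

# Hypothetical estimate and standard error (not the ACQUIP results).
z, p1, p2 = wald_p_values(0.27, 0.15)
print(f"z = {z:.2f}, one-tailed p = {p1:.3f}, two-tailed p = {p2:.3f}")
```

With these numbers, the effect would be significant at the 0.05 level one-tailed but not two-tailed, which is the same pattern the text describes.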
Results for fixed effects, Cont. A patient's self-report of advice at baseline and the patient's baseline AUDIT score are the only additional variables significantly associated with a patient's self-report of advice on the one-year follow-up survey. None of the provider-level variables is associated with a patient's self-report of advice on the one-year follow-up survey.
Results for variances and covariances: Estimates of the variance components vary slightly more across estimation procedures in this model. The estimate of the site-level variance component has increased from approximately zero to the range of 0.01 to However, the confidence intervals for these estimates tend to include zero, indicating, as before, that there may be little or no residual clustering at the site level. The provider-level variance component estimates lie between 0.01 and 0.16 and vary the most across the different estimation procedures.
Results for variances and covariances, cont The majority of the residual variation is at the patient level. The estimates for the patient level variance component remain close to one and support the assumption of binomial variance at the patient level.