Part 2
Schematic of the alcohol model
Marginal and conditional models
Variance components
Random effects and Bayes
General, linear MLMs
MULTI-LEVEL MODELS
Biological, physical, and psycho/social processes that influence health occur at many levels:
– Cell, organ, person, family, neighborhood, city, society, ..., solar system
– Crew, vessel, fleet, ...
– Block, block group, tract, ...
– Visit, patient, physician, clinic, HMO, ...
Covariates can be at each level
Many "units of analysis"
A more modern and flexible parlance and approach: "many variance components"
Factors in Alcohol Abuse
Cell: neurochemistry
Organ: ability to metabolize ethanol
Person: genetic susceptibility to addiction
Family: alcohol abuse in the home
Neighborhood: availability of bars
Society: regulations; organizations; social norms
ALCOHOL ABUSE
A multi-level, interaction model
Interaction between prevalence/density of bars & state drunk-driving laws
Relation between alcohol abuse in a family & ability to metabolize ethanol
Genetic predisposition to addiction
Household environment
State regulations about intoxication & job requirements
ONE POSSIBLE DIAGRAM
[Diagram: predictor variables (personal income, family income, percent poverty in neighborhood, state support of the poor) pointing to the response, alcohol abuse]
NOTATION (the reverse of the order I usually use!)
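X & Y DIAGRAM
[Diagram: predictor variables at each level pointing to the response]
Person: X.p(sijk)
Family: X.f(sij)
Neighborhood: X.n(si)
State: X.s(s)
Response: Y(sijk)
Here s indexes state, i neighborhood within state, j family within neighborhood, and k person within family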
Standard Regression Analysis Assumptions
Data follow a normal distribution
All the key covariates are included
Xs are measured without error
Responses are independent
Non-independence (dependence): within-cluster correlation
Two responses from the same family (cluster) tend to be more similar than do two observations from different families
Two observations from the same neighborhood tend to be more similar than do two observations from different neighborhoods
Why?
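A minimal simulation sketch of one answer to "Why?": if two family members share an unobserved family effect a, so that each response is a + e with independent person-level noise e, their correlation is var(a) / (var(a) + var(e)). The variances, the number of families, and the function names are illustrative assumptions, not taken from the slides.

```python
import random


def icc(var_family, var_person):
    """Theoretical within-family correlation when Y = a_family + e_person."""
    return var_family / (var_family + var_person)


def pearson(xs, ys):
    """Sample Pearson correlation (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_two_per_family(n_families, sd_family, sd_person, seed=1):
    """Two members per family; both share one unobserved family effect."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_families):
        a = rng.gauss(0.0, sd_family)                 # shared family effect (never observed)
        first.append(a + rng.gauss(0.0, sd_person))   # member 1 adds own noise
        second.append(a + rng.gauss(0.0, sd_person))  # member 2 adds own noise
    return first, second


y1, y2 = simulate_two_per_family(20000, sd_family=1.0, sd_person=1.0)
same_family = pearson(y1, y2)
# Shift by one family to pair people from different families
different_families = pearson(y1[:-1], y2[1:])
print(icc(1.0, 1.0), round(same_family, 2), round(different_families, 2))
```

With equal family and person variances the theoretical within-family correlation is 0.5; pairs drawn from different families should show a correlation near zero.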
EXPANDED DIAGRAM
[Diagram: predictor variables (personal income, family income, percent poverty in neighborhood, state support for the poor) pointing to the response, alcohol abuse, with unobserved factors added (genes, availability of bars, efforts on drunk driving) labeled "unobserved random intercepts; omitted covariates"]
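X & Y EXPANDED DIAGRAM
[Diagram: predictor variables at each level pointing to the response, plus unobserved random intercepts (omitted covariates) at the family, neighborhood, and state levels]
Person: X.p(sijk)
Family: X.f(sij); random intercept a.f(sij)
Neighborhood: X.n(si); random intercept a.n(si)
State: X.s(s); random intercept a.s(s)
Response: Y(sijk)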
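One way to write the expanded diagram as an equation, as a sketch assuming additive linear effects, mutually independent mean-zero random intercepts with variances σ²_f, σ²_n, σ²_s, and a person-level residual e with variance σ²_e (the coefficients β and the residual are assumptions, not on the slide). Given the covariates, the random intercepts generate both the total variance and the within-cluster covariances:

$$
\begin{aligned}
Y_{sijk} &= \beta_0 + \beta_p X^{p}_{sijk} + \beta_f X^{f}_{sij} + \beta_n X^{n}_{si} + \beta_s X^{s}_{s} + a^{f}_{sij} + a^{n}_{si} + a^{s}_{s} + e_{sijk} \\
\mathrm{Var}(Y_{sijk}) &= \sigma^2_f + \sigma^2_n + \sigma^2_s + \sigma^2_e \\
\mathrm{Cov}(Y_{sijk}, Y_{sijk'}) &= \sigma^2_f + \sigma^2_n + \sigma^2_s \quad (k \ne k',\ \text{same family}) \\
\mathrm{Cov}(Y_{sijk}, Y_{sij'k'}) &= \sigma^2_n + \sigma^2_s \quad (j \ne j',\ \text{same neighborhood}) \\
\mathrm{Cov}(Y_{sijk}, Y_{si'j'k'}) &= \sigma^2_s \quad (i \ne i',\ \text{same state})
\end{aligned}
$$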
Variance Inflation and Correlation Induced by Unmeasured or Omitted Latent Effects
Alcohol usage for family members is correlated because they share an unobserved "family effect" via common
– genes, diet, family culture, ...
Repeated observations within a neighborhood are correlated because neighbors share common
– traditions, access to services, stress levels, ...
Including relevant covariates can uncover latent effects and reduce variance and correlation
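A simulation sketch of the last point, under stated assumptions: the family effect is γ·g + u, where g is an observed family-level covariate (hypothetical) and u is the part that stays latent. Omitting g inflates both the variance and the within-family correlation; including it leaves only u to induce correlation. With γ = 1, var(u) = 0.25 and var(e) = 1, the theoretical values are a total variance of 2.25 and correlation 1.25/2.25 ≈ 0.56 before, versus 1.25 and 0.25/1.25 = 0.2 after.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def variance(v):
    """Sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def ols(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def simulate(n_families=20000, gamma=1.0, sd_u=0.5, sd_e=1.0, seed=2):
    """Family effect = gamma * g + u; g is an observed family covariate (hypothetical),
    u is the latent remainder. Two members per family."""
    rng = random.Random(seed)
    g, y1, y2 = [], [], []
    for _ in range(n_families):
        gk = rng.gauss(0.0, 1.0)
        family_effect = gamma * gk + rng.gauss(0.0, sd_u)
        g.append(gk)
        y1.append(family_effect + rng.gauss(0.0, sd_e))
        y2.append(family_effect + rng.gauss(0.0, sd_e))
    return g, y1, y2


g, y1, y2 = simulate()
# Covariate omitted: its effect is absorbed into the latent family effect
var_before, corr_before = variance(y1 + y2), pearson(y1, y2)
# Covariate included: fit on pooled members, then examine the residuals
intercept, slope = ols(g + g, y1 + y2)
r1 = [y - intercept - slope * x for x, y in zip(g, y1)]
r2 = [y - intercept - slope * x for x, y in zip(g, y2)]
var_after, corr_after = variance(r1 + r2), pearson(r1, r2)
print(round(var_before, 2), round(corr_before, 2), round(var_after, 2), round(corr_after, 2))
```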
Key Components of a Multi-level Model
Specification of predictor variables (fixed effects) at multiple levels: the "traditional" model
– Main effects and interactions at and between levels
– With these, it's already multi-level!
Specification of correlation among responses within a cluster
– via random effects and other correlation-inducers
Both the fixed-effects and random-effects specifications must be informed by scientific understanding, the research question, and empirical evidence
INFERENTIAL TARGETS
Marginal mean or other summary "on the margin"
– For specified covariate values, the average response across the population
Conditional mean or other summary, conditional on:
– Other responses (conditioning on observed values)
– Unobserved random effects
Marginal Model Inferences
Public-health relevant
Features of the distribution of the response, averaged over the reference population
– Mean response
– Variance of the response distribution
– Comparisons for different covariate values
Examples
– Mean alcohol consumption for men compared to women
– Rate of alcohol abuse for states with active addiction treatment programs versus states without
– Association is not causation!
Conditional Inferences
Conditional on observed values or latent effects
– Probability that a person abuses alcohol, conditional on the number of family members who do
– A person's average alcohol consumption, conditional on the neighborhood average
Warning
For conditional models, don't put a left-hand-side (response) variable on the right-hand side "by hand"
Use the MLM to structure the conditioning
The Warning
Model: $Y_{it} = \beta_0 + \beta_1\,\mathrm{smoking}_{it} + e_{it}$
Don't do this:
$Y_{i(t+1)} \mid Y_{it} = \beta_0 + \beta_1\,\mathrm{smoking}_{it} + \rho\,Y_{it} + e^{*}_{i(t+1)}$
Do this (better still, let probability theory do it):
$Y_{i(t+1)} \mid Y_{it} = \beta_0 + \beta_1\,\mathrm{smoking}_{i(t+1)} + \rho\,(Y_{it} - \beta_0 - \beta_1\,\mathrm{smoking}_{it}) + e^{**}_{i(t+1)}$
Because: unless you center the regressor, the smoking effect will not have a marginal-model interpretation, will be attenuated, will depend on $\rho$, won't be "exportable," ...
See Louis (1988), Stanek et al. (1989)
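A simulation sketch of the attenuation, under stated assumptions: smoking is constant within a person (so the smoking terms at t and t+1 coincide), and the two occasion-level errors have correlation ρ = 0.6, with β1 = 2. Then E[Y at t+1 | smoking, Y at t] = β0(1 − ρ) + β1(1 − ρ)·smoking + ρ·Y at t, so the "don't" regression recovers β1(1 − ρ) = 0.8 rather than β1, while conditioning on the lagged residual keeps β1. The parameter values and helper names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def ols_two(x, z, y):
    """Slopes of y on x and z (with intercept), from the centered normal equations."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (d - my) for a, d in zip(x, y))
    szy = sum((c - mz) * (d - my) for c, d in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def simulate(n, beta0=1.0, beta1=2.0, rho=0.6, sigma=1.0, seed=3):
    """Two occasions per person; smoking constant within person (an assumption);
    errors have variance sigma**2 and correlation rho across occasions."""
    rng = random.Random(seed)
    smoking, y_t, y_next = [], [], []
    for _ in range(n):
        s = 1.0 if rng.random() < 0.5 else 0.0
        e1 = rng.gauss(0.0, sigma)
        e2 = rho * e1 + rng.gauss(0.0, sigma * (1.0 - rho ** 2) ** 0.5)
        smoking.append(s)
        y_t.append(beta0 + beta1 * s + e1)
        y_next.append(beta0 + beta1 * s + e2)
    return smoking, y_t, y_next


smoking, y_t, y_next = simulate(50000)
marginal = slope(smoking, y_next)                          # near beta1 = 2.0
naive_smoking, naive_lag = ols_two(smoking, y_t, y_next)   # near 0.8 and rho = 0.6
# "Do this": condition on the lagged residual. The true beta0 = 1, beta1 = 2 are used
# only because the data are simulated; in practice the model estimates them jointly.
lag_residual = [y - 1.0 - 2.0 * s for s, y in zip(smoking, y_t)]
do_smoking, do_lag = ols_two(smoking, lag_residual, y_next)  # near 2.0 and 0.6
print(round(marginal, 2), round(naive_smoking, 2), round(do_smoking, 2))
```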
Random Effects Models
Latent effects are unobserved; they are inferred from the correlation among residuals
Random effects models prescribe the marginal mean and the source of correlation
Assumptions about the latent variables determine the nature of the correlation matrix
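A small sketch of how the latent-variable assumptions fix the correlation matrix, assuming a model of the form Y_t = μ + a + b·t + e_t with random intercept a, optional random slope b, and independent residuals e_t (the variance values are illustrative). A random intercept alone gives equal correlation for every pair of occasions; adding a random slope makes the correlation depend on the times.

```python
def covariance_matrix(times, var_a, var_e, var_b=0.0, cov_ab=0.0):
    """Cov(Y_t, Y_s) for Y_t = mu + a + b*t + e_t: random intercept a,
    optional random slope b, independent residuals e_t."""
    V = []
    for i, t in enumerate(times):
        row = []
        for j, s in enumerate(times):
            c = var_a + cov_ab * (t + s) + var_b * t * s
            if i == j:
                c += var_e  # residual variance only on the diagonal
            row.append(c)
        V.append(row)
    return V


def to_correlation(V):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(V)
    return [[V[i][j] / (V[i][i] * V[j][j]) ** 0.5 for j in range(n)] for i in range(n)]


times = [0, 1, 2, 3]
# Random intercept only: exchangeable ("compound symmetry") correlation
R_intercept = to_correlation(covariance_matrix(times, var_a=1.0, var_e=1.0))
# Random intercept plus random slope: correlation changes with time
R_slope = to_correlation(covariance_matrix(times, var_a=1.0, var_e=1.0, var_b=0.25))
for row in R_slope:
    print([round(r, 3) for r in row])
```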
Conditional and Marginal Models
Conditioning on random effects
For linear models, regression coefficients and their interpretation in conditional and marginal models are identical: the average of a linear model is the linear model of the average
For non-linear models, coefficients have different meanings and values
– Marginal models: population-average parameters
– Conditional models: cluster-specific parameters
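A numerical sketch of the non-linear case, assuming a logistic model with a normal random intercept, logit P(Y = 1 | a) = β0 + β1·x + a with a ~ N(0, σ²), and hypothetical values β0 = 0, β1 = 1, σ = 1.5. Averaging over a gives the marginal (population-average) probability; its log-odds ratio for x = 1 vs. x = 0 is smaller than the conditional (cluster-specific) β1. The code also compares against a commonly used approximation, β1 / sqrt(1 + c²σ²) with c = 16√3/(15π).

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=4000, width=10.0):
    """P(Y = 1) averaged over a ~ N(0, sigma^2) when logit P(Y = 1 | a) = eta + a.
    Midpoint-rule integration over [-width*sigma, width*sigma]."""
    h = 2.0 * width * sigma / n_grid
    total = 0.0
    for k in range(n_grid):
        a = -width * sigma + (k + 0.5) * h
        density = math.exp(-0.5 * (a / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += expit(eta + a) * density * h
    return total


beta0, beta1, sigma = 0.0, 1.0, 1.5  # hypothetical values
# Marginal (population-average) log-odds ratio for x = 1 versus x = 0
marginal_beta1 = logit(marginal_prob(beta0 + beta1, sigma)) - logit(marginal_prob(beta0, sigma))
# Approximate attenuation of the cluster-specific coefficient
c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
approx_beta1 = beta1 / math.sqrt(1.0 + c ** 2 * sigma ** 2)
print(round(marginal_beta1, 3), round(approx_beta1, 3))
```

The larger the random-intercept variance σ², the further the population-average coefficient falls below the cluster-specific one.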
Death Rates for Coronary Artery Bypass Graft (CABG)
CABG DEATH RATE
Observed & Predicted Deviations of Annual Charges (in dollars) for Specialist Services vs. Primary Care Services
John Robinson's research
[Figure: y-axis, deviation in specialists' charges; square (blue) = posterior mean of predicted deviation; dot (red) = posterior mean of observed deviation]
Observed and Predicted Deviations for Specialist Services: Log(Charges > $0) and Probability of Any Use of Service
John Robinson's research
[Figure: y-axis, mean deviation of log(charges > $0); dot (red) = posterior mean of observed deviation; square (blue) = posterior mean of predicted deviation]
Informal Information Borrowing
DIRECT ESTIMATES
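A minimal sketch of the information borrowing that the random-effects (Bayes) view formalizes, assuming a normal-normal model in which each unit's direct estimate y_k has standard error se_k and the true rates vary around an ensemble mean μ with variance τ². Here μ and τ² are treated as known (an assumption; in practice they are estimated), and the death rates are made-up illustrative numbers, not data from the slides.

```python
def posterior_means(direct, std_errors, mu, tau2):
    """Normal-normal model: y_k ~ N(theta_k, se_k^2), theta_k ~ N(mu, tau2).
    Posterior mean = B_k * mu + (1 - B_k) * y_k, B_k = se_k^2 / (se_k^2 + tau2)."""
    out = []
    for y, se in zip(direct, std_errors):
        b = se ** 2 / (se ** 2 + tau2)  # larger se -> more borrowing from the ensemble
        out.append(b * mu + (1.0 - b) * y)
    return out


# Hypothetical direct (unit-specific) death rates and their standard errors
direct = [0.02, 0.06, 0.10]
std_errors = [0.01, 0.03, 0.01]
shrunk = posterior_means(direct, std_errors, mu=0.04, tau2=0.02 ** 2)
print([round(v, 4) for v in shrunk])
```

Each direct estimate moves toward μ, and the imprecise middle estimate (se = 0.03) moves the most.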
Effect of Regressors at Various Levels
Including regressors at a level will reduce the size of the variance component at that level
– and reduce the sum of the variance components
Including regressors may change the "percent accounted for," sometimes in unpredictable ways
Except in the perfectly balanced case, including regressors will also affect other variance components
"Vanilla" Multi-level Model (patients within physicians within clinics)
i indexes patient, j physician, k clinic
$Y_{ijk}$ = measured value for the $i$th patient of the $j$th physician in the $k$th clinic
Pure vanilla: $Y_{ijk} = \mu + a_i + b_j + c_k$
With no replications at the patient level, there is no residual error term
Total variance (independent random effects): $\mathrm{Var}(Y_{ijk}) = \sigma^2_a + \sigma^2_b + \sigma^2_c$
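A minimal arithmetic sketch of the variance decomposition for the vanilla model, assuming independent random effects with hypothetical component values. It computes the total variance, the percent accounted for at each level, and the within-cluster correlations the components imply.

```python
def decompose(var_patient, var_physician, var_clinic):
    """Total variance, percent by level, and implied correlations for
    Y_ijk = mu + a_i + b_j + c_k with independent random effects."""
    total = var_patient + var_physician + var_clinic
    percent = {
        "patient": 100.0 * var_patient / total,
        "physician": 100.0 * var_physician / total,
        "clinic": 100.0 * var_clinic / total,
    }
    # Two patients of the same physician (hence the same clinic) share b_j and c_k
    corr_same_physician = (var_physician + var_clinic) / total
    # Two patients in the same clinic but with different physicians share only c_k
    corr_same_clinic_only = var_clinic / total
    return total, percent, corr_same_physician, corr_same_clinic_only


total, percent, r_phys, r_clinic = decompose(6.0, 3.0, 1.0)  # hypothetical components
print(total, percent, r_phys, r_clinic)
```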
Cascading Hierarchies
With a Physician-level Covariate
$X_{jk}$ is a physician-level covariate
This is equivalent to using the full subscript $X_{ijk}$, but noting that $X_{ijk} = X_{i'jk}$ for all $i$ and $i'$
Model with a covariate: $Y_{ijk} = \mu + a_i + b_j + c_k + \beta X_{jk}$
Compute the total variance and percent accounted for as before, but now there is less overall variability, less at the physician level and, usually, a reallocation of the remaining variance
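A sketch of the reallocation under an idealized assumption with hypothetical numbers: the covariate term βX_jk explains β²·Var(X) of the physician-level variance and leaves the patient and clinic components unchanged, as might hold in a perfectly balanced design. Even so, the percent accounted for by the untouched levels rises, because the total has shrunk.

```python
def percents(components):
    """Total variance and percent of total for each named variance component."""
    total = sum(components.values())
    return total, {name: 100.0 * v / total for name, v in components.items()}


before = {"patient": 6.0, "physician": 3.0, "clinic": 1.0}  # hypothetical components
beta, var_x = 1.0, 2.0  # hypothetical covariate effect and covariate variance
# Idealization: the covariate removes beta**2 * var_x from the physician level only
after = dict(before, physician=before["physician"] - beta ** 2 * var_x)

total_before, pct_before = percents(before)
total_after, pct_after = percents(after)
print(total_before, pct_before)
print(total_after, pct_after)
```

Here the physician share falls from 30% to 12.5%, while the patient and clinic shares rise (60% to 75%, 10% to 12.5%) with no change in their variances.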
Hypothetical Results
[Tables not recoverable: two slides, each listing the variance components and their percent of total variance]
Random Effects Should Replace "Unit of Analysis"
Models contain fixed effects, random effects (variance components), and other correlation-inducers
There are many "units," so in effect there is no single set of units
Random effects induce unexplained (co)variance
Some of the unexplained (co)variance may be explicable by including additional covariates
MLMs are one way to induce a structure and estimate the random effects (REs)