Published by Meryl Fisher. Modified over 9 years ago.
1
Bayesian factor and structural equation models in spatial applications. Specification, identification and model assessment, with case study illustrations Peter Congdon, Queen Mary University of London Dept of Geography & Centre for Statistics
2
Outline Background: Bayesian approaches to LV models, advantages & disadvantages Computational options including WINBUGS Wider application contexts of Bayesian LV & SEM models Spatial Priors; Common Spatial Factors
3
Outline (continued) Different sorts of spatial factor model (depending on form of manifest variables) and possible identification issues Assessing models, model fit & model choice. Possible variable/model choice approaches Case studies
4
Case Studies Social capital & mental health, multilevel model using Health Survey for England (HSE) Multilevel model for joint prevalence of obesity & diabetes, BRFSS respondents nested within US counties & states (CDC Behavioral Risk Factor Surveillance System) Suicide & self-harm, ecological study for small areas (wards) in Eastern England
5
Background SEM and factor models originate in, and are still most widely used in, psychological, educational & behavioural applications. Recent Bayesian applications to psychological & educational testing data include SEM (e.g. Lee & Song, 2003), LCA, item analysis, and factor analysis per se (e.g. Aitkin & Aitkin, 2005; Press & Shigemasu, 1998). Also some work on automated Bayesian model choice in the normal linear factor model.
6
Advantages of Bayesian Approach Straightforward to depart from standard assumptions often built into classical estimation methods (e.g. factor scores multivariate normal & independent over subjects) Advantage in generalizations such as nonlinear factor effects, multiplicative factor schemes
7
Advantages of Bayesian Approach (continued) Random effect models (of which factor/SEM models are subclass) can be fitted without relying on numerical methods to integrate out random effects Potential for Bayesian model choice procedures (e.g. stochastic search variable selection) in factor/SEM models
8
Disadvantages of Bayesian Approach Identification issues (re “naming” of factors): can have label switching for latent constructs during MCMC updating if there aren’t constraints to ensure consistent labelling. Slow convergence of model parameters or global model fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects)
9
Disadvantages of Bayesian Approach Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix) Bayesian approach may need sensible priors when applied to factor models (“diffuseness“ not necessarily suitable)
10
Bayesian Computing Many Bayesian applications to SEM and factor analysis are facilitated by the WINBUGS package. See Congdon (Applied Bayesian Modelling, 2003); Lee (Structural Equation Modeling: a Bayesian Approach, 2007). Alternative is R, with more programming involved. BayesX can’t model common factors.
11
WINBUGS Despite acronym, WINBUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling Program code is essentially a description of the priors & likelihood, but can monitor model-related quantities of interest
12
Computing Illustration: a Normal SEM Wheaton Study: 3 latent variables, each measured by two indicators. Alienation67 measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale). Alienation71 is measured in same way, but using 1971 scales. Third latent variable, SES (socio-economic status) measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.
13
Structural model relates alienation in 1971 ($F_2$) to alienation in 1967 ($F_1$) and SES ($G$):
$F_{2i} = \beta F_{1i} + \gamma_2 G_i + u_{2i}$
$F_{1i} = \gamma_1 G_i + u_{1i}$
Measurement model for alienation:
$y_{ji} = \alpha_j + \lambda_j F_{1i}$, $j=1,2$
$y_{ji} = \alpha_j + \lambda_j F_{2i}$, $j=3,4$
Measurement model for SES:
$x_{ji} = \delta_j + \kappa_j G_i$, $j=1,2$
14
WINBUGS for Wheaton study
model {
  for (i in 1:n) {
    # structural model
    F2[i] ~ dnorm(mu.F2[i],1)
    mu.F2[i] <- beta*F1[i]+gam[2]*G[i]
    F1[i] ~ dnorm(mu.F1[i],1)
    mu.F1[i] <- gam[1]*G[i]
  }
  # priors (normal uses inverse variance)
  for (j in 1:2) {gam[j] ~ dnorm(0,0.001)}
  beta ~ dnorm(0,0.001)
15
  # measurement equations for alienation
  for (i in 1:n) {
    for (j in 1:4) {y[i,j] ~ dnorm(mu[i,j],tau[j])}
    mu[i,1] <- alph[1]+lam[1]*F1[i]
    mu[i,2] <- alph[2]+lam[2]*F1[i]
    mu[i,3] <- alph[3]+lam[3]*F2[i]
    mu[i,4] <- alph[4]+lam[4]*F2[i]
  }
  # PRIORS
  for (j in 1:4) {
    alph[j] ~ dnorm(0,0.001)
    # gamma prior on precisions
    tau[j] ~ dgamma(1,0.001)
    # alternative prior starts with s.d. of residuals
    # sd.y[j] ~ dunif(0,100); tau[j] <- 1/(sd.y[j]*sd.y[j])
    # identifiability constraint on loadings to ensure
    # positive alienation measure
    lam[j] ~ dnorm(1,1) I(0,)
  }
16
  # measurement of SES (G[i])
  for (i in 1:n) {
    G[i] ~ dnorm(0,1)
    for (j in 1:2) {x[i,j] ~ dnorm(mu.x[i,j],tau.x[j])}
    mu.x[i,1] <- del[1]+kappa[1]*G[i]
    mu.x[i,2] <- del[2]+kappa[2]*G[i]
  }
  for (j in 1:2) {
    del[j] ~ dnorm(0,0.001)
    # gamma prior on precisions
    tau.x[j] ~ dgamma(1,0.001)
    # identifying constraint ensures +ve SES scale
    kappa[j] ~ dnorm(1,1) I(0,)
  }
}
17
Monitoring model related quantities Suppose one were interested in the posterior probabilities that $F_{2i} > F_{1i}$ (alienation increasing for the $i$th subject). Add code:
for (i in 1:n) {delF[i] <- step(F2[i]-F1[i])}
Then posterior means of delF provide the required probabilities.
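The same calculation can be reproduced outside WINBUGS from saved MCMC output. A minimal Python sketch, assuming the sampled factor scores have been exported as lists of iterations; the function name and the small set of draws are hypothetical and purely illustrative:

```python
def prob_increase(f1_draws, f2_draws):
    """Posterior Pr(F2 > F1) per subject: the mean of step(F2 - F1) over draws.

    f1_draws, f2_draws: lists of MCMC iterations, each a list of n factor scores.
    The BUGS step(e) function is 1 when e >= 0 and 0 otherwise; mirrored here.
    """
    n_iter = len(f1_draws)
    n = len(f1_draws[0])
    totals = [0] * n
    for f1, f2 in zip(f1_draws, f2_draws):
        for i in range(n):
            totals[i] += 1 if f2[i] - f1[i] >= 0 else 0
    return [t / n_iter for t in totals]


# Hypothetical draws for 2 subjects over 4 iterations (illustrative values only)
f1 = [[0.1, 0.5], [0.2, 0.4], [0.0, 0.6], [0.3, 0.7]]
f2 = [[0.4, 0.1], [0.1, 0.2], [0.5, 0.3], [0.6, 0.9]]
probs = prob_increase(f1, f2)
```

With these made-up draws the first subject has F2 above F1 in three of four iterations and the second in one of four.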
18
Widening Applications of Latent Variable Methods In particular: application contexts of Bayes SEM/factor models now include ecological (area level) studies of health variations. Usually no longer valid to assume units (i.e. areas) are independent. Instead spatial correlation in latent variable(s) (common spatial factors) over the areas should be considered
19
Multi-Level Latent Variable Models Latent variable methods also more widely applied in multilevel health studies Such models consider joint impact of individual level and area level risk factors on health status With several outcomes (data both multivariate & multilevel) can model area effects using common factor(s)
20
SOME SPATIAL PRIORS: THE BASIS FOR COMMON SPATIAL FACTORS
21
Priors incorporating spatial structure: basis for common spatial factors May be specified over continuous space (geostatistical models often used for “kriging”) OR for discrete sets of areas with irregular boundaries (“lattices” or “polygons”) Major classes: Simultaneous Autoregressive (SAR) or Conditional Autoregressive (CAR) priors
22
Spatial Priors My focus: CAR priors for “lattices” (e.g. administrative areas) These are priors for “structured” effects (where labels of area units are important) as opposed to unstructured effects (unaffected or exchangeable over different labelling scheme for areas)
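To make "structured" concrete: under the intrinsic CAR prior with 0/1 adjacency weights, the effect in area i, given all other areas, is normal with mean equal to the average of its neighbours' effects and variance equal to a common variance divided by the number of neighbours. A minimal Python sketch of that conditional; the adjacency structure, values and function name are hypothetical:

```python
def icar_conditional(i, s, neighbours, sigma2):
    """Conditional mean and variance of s[i] given its neighbours (intrinsic CAR,
    binary adjacency weights): mean = neighbour average, var = sigma2 / n_i."""
    nbrs = neighbours[i]
    mean = sum(s[j] for j in nbrs) / len(nbrs)
    var = sigma2 / len(nbrs)
    return mean, var


# Hypothetical map: 4 areas in a row, each adjacent to the next
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
s = [0.5, -0.1, 0.3, 0.9]          # illustrative current values of the effects
mean1, var1 = icar_conditional(1, s, neighbours, sigma2=1.0)
```

Relabelling the areas without changing the neighbour lists would change the prior, which is why area labels matter for structured effects.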
25
Substantive Basis Generally taken to represent unmeasured area level risk factors for health that vary relatively smoothly over space (regardless of arbitrary administrative boundaries that may define units of analysis) Substantive grounding: increased recognition of genuine spatial effects on health (“contextual” effects)
26
DIFFERENT TYPES OF COMMON SPATIAL FACTOR
27
(A) Manifest health variables Manifest variables are health outcomes $y_{ij}$ (areas $i$, variables $j$). Common residual factor $s_i$ expresses spatial clustering recurring over several outcomes $j$. Interpretable as index of common health risks over outcomes. Example: Wang & Wall 2003
28
(B) Census Indicator Confirmatory Model. Common Spatial Socioeconomic Factor or Factors (deprivation, rurality, etc) based on relevant indicators Z ik (k=1,..,K) such as unemployment, low income etc. Often census indicators form bulk of manifest variables Example: Hogan & Tchernis JASA 2004
29
(C) Two Classes of Manifest Variable Common factor(s) used to explain variations in observed Y variables (health outcomes). But factors mainly measured by socioeconomic indicators Z (e.g. census data) Example: my Eastern region suicide study Partly confirmatory, partly exploratory
30
MANIFEST VARIABLES: AREA HEALTH VARIABLES
31
(A) Shared Spatial Residual Effects Unobserved area effects common to several health outcomes modelled by shared spatial effect. Typical scenario: area counts $y_{ij}$ for areas $i$ and outcomes $j$. Poisson or binomial likelihood
32
Types of Event May be deaths, hospitalizations, incidence counts for different cancer types, prevalence counts, etc. Expected events (offset) $E_{ij}$ based on standard age rates applied to area populations: $y_{ij} \sim \text{Poisson}(E_{ij}\rho_{ij})$. Can also have populations at risk: $y_{ij} \sim \text{Poisson}(N_i\rho_{ij})$ or $y_{ij} \sim \text{Bin}(N_i, \pi_{ij})$
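The expected count used as the offset comes from indirect standardization: standard age-specific rates multiplied by the area's population in each age band, then summed. A minimal Python sketch; the rates, populations and observed count are made up for illustration:

```python
def expected_count(std_rates, area_pop_by_age):
    """Expected events in one area: sum over age bands of rate x population."""
    return sum(r * p for r, p in zip(std_rates, area_pop_by_age))


# Hypothetical standard rates per person for three age bands
std_rates = [0.001, 0.005, 0.02]
# Hypothetical area populations in the same age bands
area_pop = [4000, 3000, 1000]

E = expected_count(std_rates, area_pop)   # 4 + 15 + 20 expected events
observed = 52
crude_relative_risk = observed / E        # unsmoothed ratio of observed to expected
```

The ratio of observed to expected events is the crude estimate that the Poisson model with spatial effects then smooths.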
33
Multivariate Spatial Effects One option for such data: no reduction. Multivariate residual effects:
$\log(\rho_{ij}) = \alpha_j + s_{ij}$
(or $\log(\rho_{ij}) = \alpha_j + \beta_j x_i + s_{ij}$)
For $s_{ij}$ could use multivariate version of conditional autoregressive prior
34
Multivariate Spatial Effects Multivariate normal CAR Prior is example of Markov Random Field (Rue & Held, 2005). Easily applied in WINBUGS using mv.car prior. May fit well but proliferation of parameters (more parameters than data points)
35
Alternative: common spatial factor
$\log(\rho_{ij}) = \alpha_j + \lambda_j s_i$
Parsimonious and provides interpretable summary measure of health risk. $s_i$ is univariate CAR (or some other prior with spatial dependence). Correlation between outcomes within areas modelled via loadings $\lambda_j$.
36
Identification: Location & Scale Need $\sum_i s_i = 0$ for location identification. Centre effects at each MCMC iteration. Scale identifiability: EITHER set var(s)=1 and all $\lambda_j$ are free loadings (fixed scale), OR leave var(s) unknown and constrain a loading, e.g. $\lambda_1 = 1.0$ (anchoring constraint)
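The two scale options describe the same fitted model, because only the product of loading and factor score enters the regression. A minimal Python sketch of centring and of moving from the anchored to the fixed-variance parameterization; the names `lam` and `s` for loadings and factor scores, and all values, are illustrative assumptions:

```python
from statistics import pstdev


def centre(s):
    """Impose the sum-to-zero (location) constraint on the factor scores."""
    m = sum(s) / len(s)
    return [v - m for v in s]


def anchored_to_fixed_scale(s, lam):
    """Rescale so var(s) = 1; loadings absorb the scale, leaving lam_j * s_i unchanged."""
    sd = pstdev(s)
    return [v / sd for v in s], [l * sd for l in lam]


s = centre([0.4, -1.0, 2.2, 0.0])      # hypothetical factor scores at one iteration
lam = [1.0, 0.6, 1.5]                  # anchored: first loading preset to 1
s_unit, lam_free = anchored_to_fixed_scale(s, lam)
```

After rescaling, the first loading is no longer 1 but equals the former standard deviation of the scores.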
37
Labelling Problems in Repeated Sampling Even in simple model, labelling may be an issue. Consider fixed variance identification option, var(s)=1, loadings all unknown. Suppose diffuse priors are taken on loadings in
$\log(\rho_{ij}) = \alpha_j + \lambda_j s_i$
without directional constraint.
38
Labelling Problems (continued) Then can have:
a) $\lambda_j$ all positive combined with $s_i$ acting as positive measure of health risk (higher $s_i$ in areas with higher cancer rates)
OR
b) $\lambda_j$ all negative combined with $s_i$ acting as negative measure of health risk ($s_i$ higher in areas with lower cancer rates)
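The ambiguity arises because flipping the sign of every loading and every factor score leaves the linear predictor, and hence the likelihood, unchanged. A minimal Python check with made-up intercepts, loadings and scores (all names and values are illustrative):

```python
import math


def log_relative_risk(alpha, lam, s):
    """Linear predictor alpha_j + lam_j * s_i for every area i and outcome j."""
    return [[alpha[j] + lam[j] * s_i for j in range(len(alpha))] for s_i in s]


alpha = [0.1, -0.2]          # hypothetical outcome intercepts
lam = [0.8, 1.2]             # hypothetical loadings
s = [-1.0, 0.0, 1.0]         # hypothetical factor scores

eta_a = log_relative_risk(alpha, lam, s)                               # labelling (a)
eta_b = log_relative_risk(alpha, [-l for l in lam], [-v for v in s])   # labelling (b)

same_fit = all(math.isclose(a, b) for row_a, row_b in zip(eta_a, eta_b)
               for a, b in zip(row_a, row_b))
```

Since the data cannot distinguish the two labellings, only the prior (for example a positivity constraint on a loading) can.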
39
Identifying constraints for consistent labelling For unambiguous labelling advisable to constrain one or more $\lambda_j$ to be positive (e.g. truncated normal or gamma prior). Note that anchoring constraint with var(s) unknown, and preset loading (e.g. $\lambda_1 = 1.0$), may be intrinsically better identified: it steers remaining unknown coefficients to consistent labelling
40
Loadings and Labellings May not be sufficient just to rely on constraining one loading (e.g. assume +ve) to ensure consistent labelling. Sometimes said that constraining direction on one loading ensures consistent identification… What if the indicator chosen for the constrained loading (e.g. $\lambda_1 > 0$) is a poor measure of the construct?
41
Loadings and Labellings If twenty indicators are measuring a construct, the 19 unconstrained loadings may “fit” a different label (e.g. deprivation) to that implied by the remaining constrained loading (e.g. affluence) Personal View: Much depends on suitable selection of manifest indicators and which (and how many, maybe >1 ) are chosen to have constrained loadings
42
WINBUGS Code for manifest variable scenario A
43
Extensions of Spatial Common Factors Product schemes. Consider health outcomes arranged by area $i$ and age $x$. Populations at risk $N_{ix}$.
$y_{ix} \sim \text{Poisson}(N_{ix}\rho_{ix})$
$\log(\rho_{ix}) = \alpha_x + \lambda_x s_i$
$\lambda_x$ show which age groups are most sensitive to spatial variations in risk represented by $s_i$. Variation on Lee-Carter (JASA 1992) mortality forecasting model
44
Random Effect Loadings $\lambda_x$ potentially random, rather than fixed effects. Identified using sum to 1 or averaging to 1 constraint, e.g. $\lambda_x$ multinomial, or $\lambda_x \sim \text{Gamma}(h,h)$
45
Nonlinear effects of common factor One possibility: just take powers of $s_i$, e.g.
$\log(\rho_{ij}) = \alpha_j + \lambda_j s_i + \delta_j s_i^2$
Or: spline for nonlinear effects in common factor score $s_i$, e.g. under fixed variance var(s)=1 option, locate knots $\kappa_k$ at selected quantiles on cumulative standard normal.
46
Linear Spline Then linear spline
$\log(\rho_{ij}) = \alpha_j + \lambda_j s_i + \sum_k b_{jk}(s_i - \kappa_k)_+$
$b_{jk}$ might be random effects, but raises identification issues…?
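The truncated-line terms are zero below each knot and linear above it. A minimal Python sketch, assuming three knots at the quartiles of the standard normal and made-up spline coefficients; function names, knot choice and values are illustrative:

```python
from statistics import NormalDist


def knots_at_quantiles(probs):
    """Knots at chosen quantiles of the standard normal (fixed-variance option)."""
    return [NormalDist().inv_cdf(p) for p in probs]


def spline_term(s_i, knots, b):
    """Sum over knots of b_k * (s_i - knot_k)_+ for one outcome."""
    return sum(b_k * max(s_i - k, 0.0) for b_k, k in zip(b, knots))


knots = knots_at_quantiles([0.25, 0.5, 0.75])
b = [0.3, -0.2, 0.1]                       # hypothetical spline coefficients
term_high = spline_term(1.0, knots, b)     # all three knots are exceeded
term_low = spline_term(-2.0, knots, b)     # below every knot, so no contribution
```

Each coefficient changes the slope of the factor effect once the score passes the corresponding knot.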
47
INDICATOR BASED SPATIAL CONSTRUCTS
48
(B) Indicator Based Spatial Constructs Many studies use latent constructs to analyze population health variations. Such constructs (e.g. deprivation) not directly observed Instead derived from a collection of relevant indicator variables that are observed, using multivariate techniques or other “composite variable” methods Many health outcomes show “deprivation gradient”
49
Latent Constructs in Population Health Example: Townsend deprivation score based on summing standardized census area values for 4 input variables (sum of “z scores”) % unemployed, % with no car, % households overcrowded, % households not owner occupiers
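A sum-of-z-scores composite of this kind can be sketched in a few lines of Python. This follows only the description above (standardize each input across areas, then add); any preliminary transformation of the inputs is omitted, and the three areas and their percentages are made up:

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one indicator across areas."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def summed_z_score(indicators):
    """Composite score per area: sum of the standardized indicators."""
    columns = [z_scores(col) for col in indicators.values()]
    return [sum(vals) for vals in zip(*columns)]


# Hypothetical percentages for three areas
indicators = {
    "unemployed": [4.0, 8.0, 12.0],
    "no_car": [20.0, 30.0, 40.0],
    "overcrowded": [2.0, 3.0, 7.0],
    "not_owner_occupied": [25.0, 35.0, 60.0],
}
scores = summed_z_score(indicators)
```

Such a score weights all inputs equally, whereas a factor model estimates the loadings.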
50
Other area constructs Other examples of latent constructs relevant to area health variations: rurality/urbanicity, social fragmentation Social fragmentation scores used to analyze variations in area suicide rates and psychiatric hospitalization rates
51
Confirmatory Indicator Based Model Confirmatory model: indicators $k=1,\ldots,K$ are established proxies for latent construct, e.g. area unemployment rates, welfare recipients, social housing rates as indicators of area deprivation. Census rates $r_{ik} = z_{ik}/D_{ik}$ where $z_{ik}$ are counts (e.g. unemployed), $D_{ik}$ are relevant denominators (e.g. economically active populations).
52
One option for confirmatory model Use Gaussian approximation to binomial (Hogan & Tchernis JASA 2004) with variance stabilizing transformation: $R_{ik} = \sqrt{r_{ik}}$, $\text{var}(R_{ik}) = \sigma^2_k/D_{ik}$
→ normal measurement equations
$R_{ik} \sim N(\delta_k + \kappa_k F_i,\ \sigma^2_k/D_{ik})$
where $F_i$ scores follow spatial CAR prior
53
Or use relevant Exponential Family links in deriving common spatial factor
$P(z_{ik} \mid \theta_{ik}) = \exp\{[z_{ik}\theta_{ik} - b(\theta_{ik})]/\phi + c(z_{ik}, \phi)\}$
e.g. $z_{ik}$ binomial with populations $N_i$, $z_{ik} \sim \text{Bin}(N_i, \pi_{ik})$. Logit link, plus overdispersion effects $w_{ik}$:
$\text{logit}(\pi_{ik}) = \delta_k + \kappa_k F_i + w_{ik}$
$w_{ik}$: normal and uncorrelated over indicators $k$.
54
For other indicators transform to normality For intrinsic proportions (e.g. proportion of area that is green space as indicator of rurality) take logit transform to approximate normality; for population density take log transform, etc.
55
TWO CLASSES OF MANIFEST VARIABLE
56
(C) Spatial Factors in Model with 2 classes of manifest variable Health Outcomes $Y_{ij}$ ($j=1,\ldots,J$); e.g. mortality or incidence counts. Social Indicators $Z_{ik}$ ($k=1,\ldots,K$); e.g. census rates of unemployment. Typical Scenario: multiple common spatial factors ($F_{1i},\ldots,F_{Qi}$) primarily measured by Z variables (indicators established as relevant).
57
2 class model But Factors F also act to potentially explain area variations in health outcomes Y. Z to F links confirmatory, Y to F links exploratory
58
Example Four Poisson health outcomes Y1-Y4. Eight indicators: Z1-Z4 measure F1; Z5-Z8 measure F2; both F1 and F2 may explain Y
$Y_{ij} \sim \text{Po}(E_{ij}\rho_{ij})$
$\log(\rho_{ij}) = \alpha_j + \lambda_{j1}F_{1i} + \lambda_{j2}F_{2i}$
$Z_{ik} \sim \text{EF}(\mu_{ik})$
$g(\mu_{i1}) = \kappa_1 F_{1i} + w_{i1}$ ……
$g(\mu_{i5}) = \kappa_5 F_{2i} + w_{i5}$ ………
59
MODEL CHOICE
60
Formal Choice or Not Formal Bayes model criteria (e.g. marginal likelihood/Bayes factor) difficult to derive; also change with priors. Popular alternative (AIC analogue): Deviance Information Criterion (DIC). Average deviance Dev.bar + effective parameter count $d_e$:
DIC = Dev.bar + $d_e$
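In the usual definition the effective parameter count is the average deviance minus the deviance evaluated at the posterior mean of the parameters. A minimal Python sketch computing both from saved deviance draws; the deviance values are made up:

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (Dev.bar, d_e, DIC) with d_e = Dev.bar - D(posterior mean)."""
    dev_bar = sum(deviance_draws) / len(deviance_draws)
    d_e = dev_bar - deviance_at_posterior_mean
    return dev_bar, d_e, dev_bar + d_e


# Hypothetical deviance samples and plug-in deviance
dev_bar, d_e, dic_value = dic([210.0, 214.0, 206.0, 210.0], 204.0)
```

Here the average deviance is 210, the effective parameter count 6 and the DIC 216; lower DIC indicates the preferred model.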
61
Model Fit in Realistic Applications Multilevel applications to health survey data may involve thousands of subjects (e.g. HSE study). Ecological applications may involve hundreds of small areas (Eastern region suicide study)
62
Model Fit in Realistic Applications Convergence of DIC and $d_e$ typically slow in models with many random effects (such as factor scores). Slow convergence also applies to other measures of fit, e.g. Monte Carlo estimates of conditional predictive ordinates. Model selection alternatives…
63
Model Choice using Variable Selection Model selection potentially for both loadings and factor variance/covariance structure. Don’t necessarily apply selection for all elements in any particular application (e.g. depending whether exploratory or confirmatory) Apply to selected aspects of spatial SEM models, e.g. loadings only or correlations between factors only
64
Selection in 2 manifest variable SEM Spatial factor models with 2 types of manifest variable (health outcomes $Y_j$ + socioeconomic indices $Z_k$). Apply selection to loadings $\lambda_{jq}$ linking $Y_j$ to $F_q$ (exploratory part of model). But don’t apply selection to Z on F loadings (confirmatory sub-model based on extensive prior knowledge)
65
Mixture Priors for Selecting Loadings
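One common form of mixture prior pairs each loading with a binary retention indicator: the loading is drawn from a "slab" when the indicator is 1 and is zero otherwise. The Python sketch below assumes an exact-zero spike, a normal slab and a prior retention probability of 0.5; these are illustrative choices, not necessarily the prior used in the case studies, and all names are hypothetical:

```python
import random


def draw_loading(rng, p_retain=0.5, slab_sd=1.0):
    """Draw (indicator, loading) from a spike-and-slab prior with a spike at zero."""
    retain = 1 if rng.random() < p_retain else 0
    loading = rng.gauss(0.0, slab_sd) if retain == 1 else 0.0
    return retain, loading


def retention_probability(indicator_draws):
    """Share of draws in which the loading is retained (indicator equals 1)."""
    return sum(indicator_draws) / len(indicator_draws)


rng = random.Random(1)                       # fixed seed for reproducibility
draws = [draw_loading(rng) for _ in range(2000)]
prior_retain = retention_probability([j for j, _ in draws])
```

Applied to posterior rather than prior draws of the indicators, `retention_probability` gives the posterior probability that a loading is needed.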
67
Random Effects Selection Selection procedures for random effects and/or their variance/covariance structure, e.g. Cai and Dunson (2008), Tüchler & Frühwirth-Schnatter (2008). These extend to factor and SEM models as factors are shared random effects
68
RE Selection: Multivariate Spatial Prior Q>1 for shared common spatial factors. Within area covariance matrix in MCAR prior denoted $\Sigma_F$
69
Cholesky Decomposition of Covariance Matrix $\Sigma_F$
70
Selection on variances and/or covariances Suppose investigator sure about number of factors (confirmatory model based on substantial evidence) BUT not sure whether correlations between factors are needed. Selection can be applied to relevant parameters in decomposition of $\Sigma_F$ → mixture prior selection on $\gamma_{qr}$ parameters to decide whether correlations needed
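One decomposition used in the random effects selection literature writes the covariance matrix as L G G' L, with L diagonal and G unit lower triangular; setting an off-diagonal element of G to zero removes the corresponding dependence. The Python sketch below assumes that form (the exact decomposition on the original slide is not recoverable) and uses made-up values:

```python
def covariance_from_decomposition(scales, gamma):
    """Build Sigma = L G G' L, L = diag(scales), G unit lower triangular.

    gamma[q][r] (r < q) are the free lower-triangular elements of G.
    """
    Q = len(scales)
    G = [[1.0 if q == r else (gamma[q][r] if r < q else 0.0) for r in range(Q)]
         for q in range(Q)]
    A = [[scales[q] * G[q][r] for r in range(Q)] for q in range(Q)]   # A = L G
    return [[sum(A[q][k] * A[r][k] for k in range(Q)) for r in range(Q)]
            for q in range(Q)]                                        # Sigma = A A'


scales = [1.0, 2.0]
sigma_corr = covariance_from_decomposition(scales, [[0.0, 0.0], [0.5, 0.0]])
sigma_indep = covariance_from_decomposition(scales, [[0.0, 0.0], [0.0, 0.0]])
```

With the off-diagonal element at 0.5 the two factors have covariance 1; with it selected out (zero) the matrix is diagonal, so the factors are uncorrelated.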
71
CASE STUDIES
72
Social capital and mental health, multilevel model using Health Survey for England (HSE) Multilevel model, joint prevalence of obesity & diabetes, BRFSS subjects nested within US counties & states Suicide & self-harm, ecological (area) study for wards in Eastern England
73
Case Study 1, Mental Health & Social Capital, Health Survey for England Y is observed mental health status (binary). Y=1 if subject’s GHQ12 score is 4 or more, Y=0 otherwise. Pr(Y=1) related to known socioeconomic risk factors X at individual subject level. Pr(Y=1) also related to known indicators of geographic context, G (e.g. micro-area deprivation quintile, region of residence, urban-rural residence). Micro-areas (32K in England) called Super Output Areas
74
Latent Risks Finally Pr(Y=1) also related to latent subject level risks, $\{F_{1i}, F_{2i}, \ldots, F_{Qi}\}$. Examples: social capital, perceived stress. Structural model: $Y \sim f(Y \mid X, G, F, \theta)$
75
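Health Outcome Sub-Model Regression involves 9065 adult subjects. $Y_i \sim \text{Bin}(1, \pi_i)$. Use log-link (→ relative risk interpretation). Q=1 for single latent risk factor (social capital)
$\log(\pi_i) = \beta X_i + \gamma G_i + \lambda F_i$
$= \beta_0 + \beta_{1,\mathrm{gend}[i]} + \beta_{2,\mathrm{age}[i]} + \beta_{3,\mathrm{eth}[i]} + \beta_{4,\mathrm{oph}[i]} + \beta_{5,\mathrm{own}[i]} + \beta_{6,\mathrm{noqual}[i]} + \gamma_{1,\mathrm{reg}[i]} + \gamma_{2,\mathrm{dep}[i]} + \gamma_{3,\mathrm{urb}[i]} + \lambda F_i$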
76
Multiple Indicators for Social Capital Social capital measured by a battery of K survey “items” (e.g. questions about neighbourhood perceptions, organisational memberships etc), $\{Z_1, \ldots, Z_K\}$
$Z \sim g(Z \mid F, \kappa)$
e.g. with binary questions, link probability of positive response $\pi_k = \Pr(Z_k = 1)$ to latent construct via
$\text{logit}(\pi_k) = \delta_k + \kappa_k F$
77
Indicators of Social Capital Social Support Score (Z1) 5 binary items (Z2-Z6) relate to neighbourhood perceptions (e.g. can people be trusted?; do people try to be helpful?; this area is a place I enjoy living in; etc) Final item (Z7) relates to membership of organisations or groups.
78
Multiple Causes of Social Capital Social capital varies by demographic groups and geographic context (urban status, region, small area deprivation category, etc). So have multiple causes of F as well as multiple indicators
$F \sim h(F \mid X^*, G^*, \varphi)$
$X^*$ and $G^*$ are individual and contextual variables relevant to “causing” social capital variations
79
Multiple Cause Sub-Model $F_i \sim N(\mu_i, 1)$
$\mu_i = \varphi_{1,\mathrm{gend}[i]} + \varphi_{2,\mathrm{eth}[i]} + \varphi_{3,\mathrm{noqual}[i]} + \varphi_{4,\mathrm{urb}[i]} + \varphi_{5,\mathrm{reg}[i]} + \varphi_{6,\mathrm{dep}[i]}$
$\varphi$: fixed effects parameters with reference category (zero coefficient) for identification. Only a small number of regions in HSE. With finer spatial detail could take area $\varphi$ effects as spatially random (but weak identification…?)
80
Effect of F on Y Social capital has a significant effect in reducing the chances of psychiatric caseness. The effect is apparent in a relative risk of 0.35 of psychiatric morbidity for high capital individuals (score F=+1) compared to low capital individuals (F=-1). Obtained as exp(-0.525)/exp(0.525) = exp(-1.05) ≈ 0.35, where -0.525 is the coefficient for the social capital effect.
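The arithmetic can be checked directly. Under the log link the relative risk between two factor scores is the exponential of the coefficient times the difference in scores; a short Python check using the coefficient quoted above (variable names are illustrative):

```python
import math

coef = -0.525                      # social capital coefficient on the log scale
high, low = 1.0, -1.0              # factor scores being compared

# Ratio of risks at F=+1 and F=-1; all other terms in the regression cancel
relative_risk = math.exp(coef * high) / math.exp(coef * low)
```

The ratio equals exp(-1.05), which rounds to 0.35.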
81
Geographic Context: Micro-area Deprivation Gradient from Multiple Cause Model
82
Case Study 2: Diabetes & Obesity in US Data from 2007 Behavioral Risk Factor Surveillance System (BRFSS) Multinomial outcome (J=6 categories) defined by diabetic status and weight category (obese, overweight, normal).
83
Multinomial Categories Reference category are subjects with neither condition. All other categories are “ill” relative to reference category
84
Multilevel multicategory regression Regression includes: o subject level risk factors (age, ethnicity, gender, education), o known geographic effects (e.g. county poverty), o county and state random effects to model unknown geographic influences (e.g. unknown environmental exposures).
85
Regression & Likelihood
86
Model Form Model includes known subject risk factors and contextual variables (e.g. county poverty) Unknown contextual risks: assume county and state latent effects, shared over categories j=1,..,J-1. Illustrates nested latent spatial effects
87
County & State Effects Take county effects $v_c$ ($c=1,\ldots,3142$) to be spatially correlated (CAR prior). But $u_s$ (state effects, $s=1,\ldots,51$) taken to be unstructured. Avoids confounding of two spatially structured effects
88
Regression Terms for j=1,..J-1
89
Case Study 3, Suicide & Self Harm: Eastern Region Wards in England Two classes of manifest variables. $Y_1$-$Y_4$: suicide totals in small areas. $Z_1$-$Z_{14}$: fourteen small area social indicators. Q=3 latent constructs ($F_1$ fragmentation, $F_2$ deprivation, $F_3$ urbanicity). Converse of $F_3$ is “rurality”. Common spatial factors.
90
Local Authority Map: Eastern England
91
Geographic Framework N=1118 small areas (called wards, subdivisions of local authorities). Small area focus beneficial: people with similar socio-demographic characteristics tend to cluster in relatively small areas, so greater homogeneity in risk factors related to social status On other hand, health events may be rare…
92
Confirmatory Sub-Model Confirmatory Z-on-F model. Each indicator $Z_k$ loads only on one construct $F_q$. Most indicators binomial. A few taken as normal after transformation. Mostly 2001 Census, a few non-census (service access score, proportion greenspace)
93
Exponential Family Model for modelling Z-on-F effects For indicator $k \in \{1,\ldots,14\}$, $G_k \in \{1,2,3\}$ denotes which construct it loads on. Regression with link $g$ allows for overdispersion via “unique” $w$ effects
$g(\mu_{ik}) = \delta_k + \kappa_{k,G_k} F_{G_k,i} + w_{ik}$
94
Expected Direction of Confirmatory Model Loadings
95
Health Outcome Sub-Model (Y-on-F effects) Model for Y-on-F effects
$Y_{ij} \sim \text{Po}(E_{ij}\rho_{ij})$, $j=1,\ldots,4$
$\log(\rho_{ij}) = \alpha_j + \lambda_{j1}F_{1i} + \lambda_{j2}F_{2i} + \lambda_{j3}F_{3i} + u_{ij}$
Coefficient selection on $\lambda_{jq}$ using relatively informative priors under “retain” option when $J_{jq}=1$. Using diffuse priors means null model tends to be selected
97
Redundant Coefficients Some coefficients (e.g. urbanicity on male and female suicide, deprivation on female suicide) not retained under model selection. Four coefficients in the Y-on-F model were set to zero in at least some MCMC iterations → averaging over $2^4 = 16$ Y-on-F models
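With four coefficients subject to selection, each MCMC iteration visits one of sixteen retain/drop configurations, and the share of iterations spent in each configuration estimates its posterior model probability. A minimal Python sketch with made-up indicator draws (names and values are illustrative):

```python
from collections import Counter
from itertools import product

# All retain (1) / drop (0) configurations for four coefficients
models = list(product([0, 1], repeat=4))


def model_frequencies(indicator_draws):
    """Relative frequency of each visited configuration over MCMC iterations."""
    counts = Counter(tuple(d) for d in indicator_draws)
    n = len(indicator_draws)
    return {m: c / n for m, c in counts.items()}


# Hypothetical indicator draws from four iterations
draws = [(1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 1), (0, 0, 1, 1)]
freq = model_frequencies(draws)
```

Posterior summaries of the retained coefficients are then automatically averaged over the visited models.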
102
Future Directions in Spatial Factor Modelling Extend model selection to interactions between factors, nonlinear effects etc In England, model area socioeconomic structure (and maybe some health outcomes) at “neighbourhood” level (32000 “Super Output Areas” with mean population 1500). In US, similar scope for modelling SES structure in relation to health events for Zip Code Tabulation Areas or ZCTAs (around 31K across US, on average about 10K population)
103
More generally Bayesian software options for latent variable and SEM applications more widely available Potentialities of WINBUGS in this context not always appreciated Scope for dedicated Bayesian factor analysis package