Disease Prevalence Estimates for Neighbourhoods: Combining Spatial Interpolation and Spatial Factor Models
Peter Congdon, Queen Mary University of London
Data on disease prevalence
Health data may be collected across one spatial framework (e.g. health providers), while policy interest lies in health contrasts over another spatial framework (e.g. neighbourhoods). The aim is to use data for one framework to provide spatially interpolated estimates of disease prevalence for the other, while also incorporating neighbourhood morbidity indicators that carry further information on prevalence.
Data Framework
Focusing on England, prevalence totals for chronic diseases are maintained by 8200 general practices for their populations (subject to measurement error, i.e. excesses or deficits in “case-finding”). See the Prevalence data tables.
These data are not provided for small-area populations, e.g. neighbourhoods across England (Lower Super Output Areas, or LSOAs).
Study focus: GP populations and LSOAs in Outer NE London (970K population), and estimation of neighbourhood psychosis prevalence.
London Borough Map
Discrete Process Convolution
Use principles of discrete process convolution to estimate neighbourhood prevalence. Geostatistical techniques (multivariate Gaussian process) are computationally demanding for the large number of units involved.
Base Framework: Prevalence for GP Populations
Target Framework: Prevalence for Neighbourhoods
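The idea of a discrete process convolution can be sketched as follows: latent effects w_j sit on a coarse grid of points, and the smoothed process at any site (a GP practice location or a neighbourhood centroid) is a kernel-weighted combination of those effects. This is a minimal illustration only, not the model fitted in the study. The Gaussian kernel, the normalisation of the weights, the log link to relative risk, and all coordinates and names (`convolve`, `gp_sites`, `lsoa_centroids`) are assumptions made for the example.

```python
import math
import random

def gaussian_kernel(distance, scale):
    """Unnormalised Gaussian kernel weight for a given distance."""
    return math.exp(-0.5 * (distance / scale) ** 2)

def convolve(sites, grid, w, scale):
    """Kernel-weighted average of the grid effects w at each site.

    Normalising the weights to sum to 1 is an assumption of this sketch.
    """
    smoothed = []
    for sx, sy in sites:
        weights = [gaussian_kernel(math.hypot(sx - gx, sy - gy), scale)
                   for gx, gy in grid]
        total = sum(weights)
        smoothed.append(sum(k * wj for k, wj in zip(weights, w)) / total)
    return smoothed

rng = random.Random(0)

# Hypothetical 5 x 5 grid of process locations with latent effects w_j.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
w = [rng.gauss(0.0, 1.0) for _ in grid]

# Hypothetical coordinates for the base and target frameworks.
gp_sites = [(0.5, 0.5), (3.2, 1.1), (2.0, 4.0)]
lsoa_centroids = [(1.0, 1.0), (2.5, 2.5), (4.0, 0.5), (0.2, 3.8)]

# The same latent effects feed both frameworks, which is what lets
# information from the practices carry over to the neighbourhoods.
s_gp = convolve(gp_sites, grid, w, scale=1.0)
s_lsoa = convolve(lsoa_centroids, grid, w, scale=1.0)

# Assumed log link: smoothed process -> relative risk per neighbourhood.
rr_lsoa = [math.exp(s) for s in s_lsoa]
```

In a full Bayesian analysis the effects `w` would be unknowns sampled by MCMC rather than fixed random draws. The point here is only that the cost scales with the number of grid points rather than with the number of practices or neighbourhoods.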
Discrete Process Model
Model for Base Framework, Study Data
Model for Target Framework
INCORPORATING OBSERVED INDICATORS of NEIGHBOURHOOD PREVALENCE
SCHEMATIC REPRESENTATION
LIKELIHOOD: REFLEXIVE INDICATORS
PARAMETER IDENTIFICATION
POTENTIAL SENSITIVITY IN INFERENCES & FIT
Sensitivity to kernel density choice
Sensitivity to constraint adopted (kernel scale set or unknown; process variance set or unknown)
Sensitivity to form of process effects, e.g. w_j normal vs Student t
Sensitivity to density of discrete grid
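Two of these sensitivities can be illustrated with a short sketch: how the kernel choice changes the weight given to distant grid points, and how Student t process effects differ from normal ones. The specific kernels (Gaussian and exponential), the unit scale, and the 4 degrees of freedom are assumptions chosen for illustration, not the specifications compared in the study.

```python
import math
import random

def gaussian_kernel(d, scale):
    """Gaussian kernel: weight decays quickly with distance."""
    return math.exp(-0.5 * (d / scale) ** 2)

def exponential_kernel(d, scale):
    """Exponential kernel: heavier tail, so distant points keep more weight."""
    return math.exp(-d / scale)

def draw_effects(n, rng, df=None):
    """Draw n process effects: standard normal, or Student t if df is given."""
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        if df is not None:
            # Student t as a scale mixture of normals:
            # z / sqrt(g), with g ~ Gamma(shape=df/2, scale=2/df).
            z /= math.sqrt(rng.gammavariate(df / 2.0, 2.0 / df))
        draws.append(z)
    return draws

# Compare kernel weights at a few distances (in units of the kernel scale).
for d in [0.0, 0.5, 1.0, 2.0, 3.0]:
    print(d, round(gaussian_kernel(d, 1.0), 4), round(exponential_kernel(d, 1.0), 4))

rng = random.Random(1)
w_normal = draw_effects(1000, rng)        # normal process effects
w_t = draw_effects(1000, rng, df=4)       # heavier-tailed alternative
```

At short distances the Gaussian kernel gives the larger weight, while at long distances the exponential kernel does, so the two choices imply different amounts of spatial borrowing of strength.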
SPATIAL SENSITIVITY IN INTERPOLATED NEIGHBOURHOOD PREVALENCE
Models can be compared in terms of localised hot spot probabilities of high psychosis risk, Pr(ρ_k > 1 | y, h) > 0.9, where ρ_k denotes the relative psychosis risk in neighbourhood k.
Alternatively, compare clustering of excess psychosis risk. Define binary indicators J_k = I(ρ_k > 1). Over MCMC iterations, monitor excess risk in both neighbourhood k and its adjacent neighbourhoods l = 1, ..., L_k. C_k is the probability indicator of a high-risk cluster centred on neighbourhood k.
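Both summaries can be computed directly from stored MCMC draws of the neighbourhood relative risks. The sketch below is illustrative: the source does not give the formula for C_k, so the per-iteration form used here (the indicator for neighbourhood k multiplied by the proportion of its adjacent neighbourhoods that also show excess risk) is an assumption, as are the function names and the toy data.

```python
def hot_spot_probabilities(samples, threshold=1.0):
    """Posterior probability that each neighbourhood's relative risk exceeds threshold.

    samples: list of MCMC draws, each a list of relative risks (one per neighbourhood).
    """
    n_iter = len(samples)
    n_areas = len(samples[0])
    return [sum(1 for draw in samples if draw[k] > threshold) / n_iter
            for k in range(n_areas)]

def cluster_probabilities(samples, neighbours, threshold=1.0):
    """Posterior mean of an assumed cluster indicator C_k.

    Per iteration: C_k = J_k * (sum of J_l over adjacent l) / L_k,
    where J_k = 1 if the relative risk in k exceeds threshold, else 0.
    """
    n_iter = len(samples)
    n_areas = len(samples[0])
    totals = [0.0] * n_areas
    for draw in samples:
        J = [1 if r > threshold else 0 for r in draw]
        for k in range(n_areas):
            adj = neighbours[k]
            totals[k] += J[k] * sum(J[l] for l in adj) / len(adj)
    return [t / n_iter for t in totals]

# Toy example: three neighbourhoods in a row, four MCMC draws.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
samples = [
    [1.2, 1.5, 0.8],
    [1.1, 0.9, 1.3],
    [1.4, 1.2, 1.1],
    [0.7, 1.3, 1.6],
]

hot = hot_spot_probabilities(samples)
clus = cluster_probabilities(samples, neighbours)
flagged = [k for k, p in enumerate(hot) if p > 0.9]  # hot spots at the 0.9 cut-off
```

In this toy data each neighbourhood exceeds a relative risk of 1 in three of four draws, so none passes the 0.9 cut-off, and the cluster measure is lower than the hot spot probability because it also requires excess risk among neighbours.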
Study Specifications
Fit Comparisons
Comparing Neighbourhood Spatial Risk Patterns
OVERLAP AT NEIGHBOURHOOD LEVEL (K=562)
Density plot (M4), prevalence rate
Map of Interpolated Neighbourhood Prevalence under M4
Map of Clustering Probabilities under M4 (posterior means of C_k)
Future Research
Modify the interpolation to include “formative” influences on prevalence (e.g. area deprivation).
How does the model work with other chronic diseases, or with jointly dependent disease outcomes (e.g. diabetes, obesity)?
Space-time prevalence models, etc.