Measuring spatial clustering in disease patterns. Peter Congdon, Queen Mary University of London

1 Measuring spatial clustering in disease patterns. Peter Congdon, Queen Mary University of London. p.congdon@qmul.ac.uk http://www.geog.qmul.ac.uk/staff/congdonp.html http://webspace.qmul.ac.uk/pcongdon/

2 Background: spatial principles and spatial correlation
• Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”
• Spatial correlation (similar values in nearby spatial units) is a common feature of geographic datasets (spatial econometrics, area health, political science, etc.).
• Correlation can be positive or negative, but positive correlation is most common.
• Spatial correlation indices therefore measure correlation while also accounting for distance between spatial units (including spatial contiguity).
• Reference (null) pattern: spatial randomness. Values observed at one location do not depend on values observed at neighbouring locations.

3 Background: spatial principles and spatial heterogeneity
• Michael Goodchild, in “Challenges in geographical information science” (Proc. R. Soc. A, 2011), also mentions a second empirical principle: spatial heterogeneity.
• One example of such heterogeneity is local variation in the degree of spatial dependence, which leads to local indices of spatial association.

4 Background: observation types
• My focus is on spatial lattice data: N areal subdivisions (e.g. administrative areas) which together constitute the entire study region.
• This contrasts with point data (geostatistics), where the major focus is on interpolating a response between observed locations.

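5 Global Indices of Spatial Association
• Moran index (for N areas, continuous centred data $Z_i$), in its standard form: $I = \dfrac{N}{\sum_i \sum_j w_{ij}} \cdot \dfrac{\sum_i \sum_j w_{ij} Z_i Z_j}{\sum_i Z_i^2}$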

6 Spatial Weights
• Possible options for spatial weights $W=[w_{ij}]$:
• Adjacency: $w_{ij}=1$ if area j is adjacent to area i; otherwise $w_{ij}=0$.
• Distance-based: for example the inverse distance between locations i and j, $w_{ij}=1/d_{ij}$.
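
As an illustration of how the Moran index combines centred values with a weights matrix, here is a minimal Python sketch. It assumes the usual textbook definition of Moran's I and binary adjacency weights; the function name `morans_i` and the four-area "line" layout are hypothetical and not taken from the slides.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x and spatial weights W (assumed zero diagonal)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()              # centre the data
    s0 = W.sum()                  # sum of all weights
    return (len(x) / s0) * (z @ W @ z) / (z @ z)

# Hypothetical layout: four areas in a row, each adjacent to its immediate neighbour(s).
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])

i_clustered = morans_i([1, 1, 5, 5], W_line)    # similar values side by side
i_alternating = morans_i([1, 5, 1, 5], W_line)  # dissimilar values side by side
```

With these toy values the clustered pattern gives a positive index and the alternating pattern a negative one, matching the positive/negative correlation distinction made earlier.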

7 Global Indices of Spatial Association: Binary data
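
The formulas on this slide did not survive in the transcript. As a hedged sketch, the standard global join-count statistics for a binary indicator (BB, BW, WW) can be computed as below; these are the conventional definitions, assumed rather than confirmed to be the ones shown on the slide, and the function name `join_counts` is hypothetical.

```python
import numpy as np

def join_counts(b, W):
    """Global join counts for binary indicator b and symmetric binary weights W.

    Each join is counted once (hence the factor 0.5 on the double sums).
    """
    b = np.asarray(b, dtype=float)
    W = np.asarray(W, dtype=float)
    bb = 0.5 * (b @ W @ b)                              # both areas coded 1
    ww = 0.5 * ((1 - b) @ W @ (1 - b))                  # both areas coded 0
    bw = 0.5 * (W * (b[:, None] - b[None, :]) ** 2).sum()  # discordant pairs
    return bb, bw, ww

# Hypothetical layout: four areas in a row with indicators 1, 1, 0, 0.
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
bb, bw, ww = join_counts([1, 1, 0, 0], W_line)
```

The three counts add up to the total number of joins (here three), which is a useful sanity check on any weights matrix.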

8 Background: Area health data and spatial correlation
• Health data with full population coverage (as opposed to survey data) are often available only for geographic aggregates.
• These may be small neighbourhoods, such as English lower super output areas (LSOAs), with an average population of 1500-2000.
• Small area units (with homogeneous social structure, environment and other exposures) are preferable for reducing ecological bias.

9 Background: Area health data and spatial correlation
• Examples of area health data (e.g. for electoral wards, LSOAs): mortality data by cause, cancer incidence data, health prevalence data.
• Spatial correlation in area health outcomes reflects clustering in risk factors (observed and unobserved), such as deprivation/affluence, health behaviours, environmental factors and neighbourhood social capital.

10 Bayesian Relative Risk Models for Area Spatial Data
• Bayesian models for area disease risks are now widely applied (e.g. to detect a smooth underlying risk surface over space).
• Assume observed disease counts $y_i$ are Poisson distributed, $y_i \sim \mathrm{Po}(e_i r_i)$, where $e_i$ are expected counts.
• Relative risks $r_i$ average 1 when the sum of expected counts equals the sum of observed counts. Expected counts (in the demographic sense) are obtained by applying region-wide disease rates to each small area population.
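
To make the expected-count construction concrete, here is a small Python sketch. It is a simplification that applies a single region-wide rate to each area's population (the slides refer to region-wide rates, which in practice would usually be age-specific); the populations and counts are invented for illustration.

```python
import numpy as np

# Hypothetical populations and observed disease counts for five small areas.
pop = np.array([1500, 1800, 2000, 1600, 1700])
y = np.array([3, 9, 4, 2, 6])

region_rate = y.sum() / pop.sum()   # one region-wide rate (simplifying assumption)
e = pop * region_rate               # expected counts e_i
crude_rr = y / e                    # crude relative risks y_i / e_i

# With this construction sum(e) equals sum(y), so the e-weighted mean RR is 1.
weighted_mean_rr = np.average(crude_rr, weights=e)
```

The crude ratios are noisy when the expected counts are small, which is the motivation for smoothing them with a Bayesian model.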

11 Bayesian Relative Risk Models for Area Spatial Data
• One option for modelling area relative risks is the convolution scheme (Besag et al., 1991):
• $\log(r_i)=\alpha+s_i+u_i$
• Spatial error: $s_i \sim$ Conditional Autoregressive (CAR)
• Heterogeneity/overdispersion error: $u_i \sim$ unstructured white noise

12 Neighbourhood Clustering in Elevated Risk
• Consider binary risk measures: $b_i=1$ if relative risk $r_i>1$, $b_i=0$ otherwise. The indicator $b_i$ is latent (unknown) because $r_i$ is latent.
• Other thresholds can be used (e.g. $r_i>1.5$).
• Interest is often in posterior exceedance probabilities of elevated disease risk, $E_i=\Pr(r_i>1\mid y)=\Pr(b_i=1\mid y)$, in each area separately.
• Possible rules: area i is a hotspot if $E_i>0.9$ or if $E_i>0.8$. A suitable threshold may depend on data frequency.
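
A minimal sketch of how exceedance probabilities might be estimated from posterior samples: the proportion of draws in which $r_i$ exceeds the threshold. The array `r_draws` below is a tiny made-up stand-in for MCMC output (rows are iterations, columns are areas), not output from any model in the slides.

```python
import numpy as np

# Hypothetical posterior draws of relative risks: 4 iterations x 3 areas.
r_draws = np.array([[1.20, 0.80, 1.60],
                    [1.10, 0.90, 1.40],
                    [0.90, 0.70, 1.50],
                    [1.30, 1.05, 1.80]])

threshold = 1.0
b_draws = (r_draws > threshold).astype(int)  # b_i at each iteration
E = b_draws.mean(axis=0)                     # estimate of Pr(r_i > 1 | y)
hotspot = E > 0.9                            # one of the rules mentioned above
```

In a real analysis there would be thousands of draws per area, and the cut-off (0.9 or 0.8) would be chosen with the data frequency in mind.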

14 Neighbourhood Clustering in Elevated Risk
• “Hotspot” detection does not measure broader local clustering in relative risks.
• High risk clustering:
• (a) Area i embedded in a high risk cluster (i.e. a high risk cluster centre): both area i and all surrounding areas j have elevated risk ($E_i$ and $E_j$ both high).
• (b) High risk outlier or high risk cluster edge: area i is high risk ($E_i$ high), but all or a majority of adjacent areas j are low risk ($E_j$ low).

15 Neighbourhood Clustering in Elevated Risk
• Low risk clustering:
• (c) Area i embedded in a low risk cluster: both area i and surrounding areas have low risk ($E_i$ and $E_j$ both low).
• (d) Low risk outlier or low risk cluster edge: area i is low risk ($E_i$ low), but all or many adjacent areas are high risk.

16 Spatial Scan Clusters
• The best-known approach is the spatial scan method, which produces lists of areas in a cluster at a given significance level, e.g. under a Poisson model for $\{y_i, e_i\}$ data.
• Spatial scan: a circle (or ellipse) of varying size systematically scans the study region (moving window).
• Each geographic unit (e.g. census tract) is a potential cluster centre.
• Clusters are reported for those circles where observed values within the circle are greater than expected values.
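
For orientation, the sketch below shows the Poisson log-likelihood ratio commonly used to score one candidate window in a Kulldorff-style spatial scan. This is an assumption about the scoring rule: the slides do not give the formula. The function name and the numbers are hypothetical, and expected counts are assumed to be scaled so that they sum to the total observed count.

```python
import math

def poisson_scan_llr(c, e, total):
    """Log-likelihood ratio for a window with c observed and e expected cases.

    total is the overall number of cases in the study region. Only windows
    with an excess of cases (c > e) score above zero.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # If every case falls inside the window, the outside term vanishes.
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

no_excess = poisson_scan_llr(10, 10.0, 100)
mild_excess = poisson_scan_llr(15, 10.0, 100)
strong_excess = poisson_scan_llr(20, 10.0, 100)
```

A full scan would evaluate this score over all circles and centres and assess significance by Monte Carlo replication, which is omitted here.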

17 Stochastic Approach to Measuring Clustering in Elevated Risk
• The method to be described provides a measure of cluster status for each area when relative health risks $r_i$ (and health status indicators $b_i$) are unknowns.
• It can be considered a method of cluster detection that is included in MCMC updating.
• It encompasses high risk and low risk clustering, and also outliers (isolated high or low risk hotspots).

18 Synthetic Data
• Known adjacency structure: 113 middle level super output areas (MSOAs) in Outer NE London.
• 15 of the 113 areas have high RR ($r_i$ circa 1.75). The remainder have below-average RR ($r_i$ circa 0.9).
• High risk areas are located in three high risk clusters.
• $y_i$ and $e_i$ are known, and hence so are the crude relative risks, but whether RRs are significantly elevated depends on the information in the data.

19 Synthetic Data
• Assess $E_i$ and $b_i$ (using the convolution model) under different expected case counts: $e_i=20.39$ or $e_i=58.77$.
• For $e_i=20.39$, $y_i$ is either 18 or 36 (so that the sums of observed and expected counts match).
• For $e_i=58.77$, $y_i$ is either 52 or 103.
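
The arithmetic behind the synthetic design can be checked directly. The sketch below uses only the figures quoted on the slides (15 high risk areas, 98 background areas, and the stated counts); the function name `summarise` is hypothetical.

```python
N_HIGH, N_LOW = 15, 98   # 15 high risk areas out of 113

def summarise(e, y_low, y_high):
    """Totals and crude relative risks for one synthetic scenario."""
    return {
        "total_observed": N_HIGH * y_high + N_LOW * y_low,
        "total_expected": (N_HIGH + N_LOW) * e,
        "rr_high": y_high / e,
        "rr_low": y_low / e,
    }

low_freq = summarise(20.39, 18, 36)
high_freq = summarise(58.77, 52, 103)
```

In both scenarios the observed and expected totals agree to within rounding, and the crude relative risks are close to the stated 1.75 and 0.9. The scenarios differ only in how much information the counts carry.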

20 Synthetic Data. Exp=20.39, Known RRs

21 How to Detect Clustering in Relative Disease Risk: Local Join-Counts
• Join counts (BB, WW, BW) measure global spatial clustering in binary risk indicators $b_i$.
• How can local clustering of excess risk be detected?
• Use a local version of the global BB statistic, with summation only over neighbours of area i (not a double summation): $J_{11i}=b_i\sum_j w_{ij}b_j$
• $w_{ij}=1$ if areas (i,j) are adjacent, $w_{ij}=0$ otherwise.

22 Local Join-Counts to describe local clustering
• $J_{11i}$ measures high risk “cluster embeddedness”.
• $J_{11i}$ will be high for areas surrounded by other high risk areas,
• i.e. when area i and all or most of its neighbours j have high risk.

23 Local Join-Counts to describe local clustering
• Local version of the BW statistic: $J_{10i}=b_i\sum_j w_{ij}(1-b_j)$
• A measure of “cluster marginality” (cluster edge areas) or of outlier status.
• It will be high when area i has elevated risk, but most or all neighbours have low risk.

24 Local Join-Counts for low risk clustering
• Local version of the WW statistic: $J_{00i}=(1-b_i)\sum_j w_{ij}(1-b_j)$, high when area i and its neighbours both have low risk.
• Finally, the local WB statistic, which measures the situation of a low risk area that is discrepant from its neighbours: $J_{01i}=(1-b_i)\sum_j w_{ij}b_j$

25 Local Join-Counts under Binary Spatial Weights
• Consider binary weights $w_{ij}$.
• Denote the areas adjacent to area i as its “neighbourhood” $N_i$.
• $L_i$ = number of areas adjacent to area i, i.e. the total number of areas in the neighbourhood $N_i$ of area i.
• The common high risk joins formula (local BB count) is now $J_{11i}=b_i\sum_{j\in N_i}b_j$
• High risk discrepant join count: $J_{10i}=b_i\sum_{j\in N_i}(1-b_j)$
• Also: $J_{01i}=(1-b_i)\sum_{j\in N_i}b_j$ and $J_{00i}=(1-b_i)\sum_{j\in N_i}(1-b_j)$

26 Local Join-Counts under Binary Spatial Weights
• Have $L_i=J_{11i}+J_{10i}+J_{01i}+J_{00i}$.
• Multinomial sampling: the denominators $L_i$ are known, but $\{J_{11i},J_{10i},J_{01i},J_{00i}\}$ are unknowns in a modelling situation where relative disease risks $r_i$ and risk indicators $b_i$ are unknowns.
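
The four local join counts and the identity $L_i=J_{11i}+J_{10i}+J_{01i}+J_{00i}$ can be illustrated for a single known configuration of indicators. This is a minimal sketch with a made-up five-area adjacency structure; in the modelling situation described above the $b_i$ would vary across MCMC iterations rather than being fixed.

```python
import numpy as np

def local_join_counts(b, W):
    """Local join counts for binary indicators b and binary adjacency matrix W."""
    b = np.asarray(b)
    W = np.asarray(W)
    high_nb = W @ b          # number of high risk neighbours of each area
    low_nb = W @ (1 - b)     # number of low risk neighbours of each area
    return {
        "J11": b * high_nb,
        "J10": b * low_nb,
        "J01": (1 - b) * high_nb,
        "J00": (1 - b) * low_nb,
        "L": W.sum(axis=1),  # neighbourhood sizes L_i
    }

# Hypothetical map: area 0 adjacent to areas 1, 2, 3; area 3 also adjacent to area 4.
W = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    W[i, j] = W[j, i] = 1

counts = local_join_counts([1, 1, 1, 0, 0], W)
total = counts["J11"] + counts["J10"] + counts["J01"] + counts["J00"]
```

For each area exactly two of the four counts can be non-zero, depending on whether $b_i$ is 1 or 0, which is why their sum always recovers $L_i$.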

27 Probabilities of Local Clustering
• Proportion $\pi_{11i}$ of joins representing joint high risk, defined by $E(J_{11i})=L_i\pi_{11i}$.
• Estimate during the MCMC run ($J_{11i}$ and $b_i$ varying by iteration) as $\pi_{11i}=J_{11i}/L_i=b_i\sum_{j\in N_i}b_j/L_i$
• $\pi_{11i}$ estimates the probability that area i is a member of a high risk cluster.
• As $\pi_{11i}\to E_i$, area i is likely to be a cluster centre.
• The term $\sum_{j\in N_i}b_j/L_i\to 1$ when all adjacent areas have definitive high risk.

28 Probabilities of Local Clustering
• Proportion $\pi_{10i}$ of local joins that are (1,0) pairs, defined by $E(J_{10i})=L_i\pi_{10i}$.
• $\pi_{10i}$ estimates the probability that area i is a high risk local outlier.
• Estimate during the MCMC run: $\pi_{10i}=J_{10i}/L_i=b_i\sum_{j\in N_i}(1-b_j)/L_i$

29 Decomposition of Exceedance Probability
• Can show that $E_i=\Pr(r_i>1\mid y)=\pi_{11i}+\pi_{10i}$.
• Have $J_{11i}+J_{10i}=b_i\sum_{j\in N_i}b_j+b_i\sum_{j\in N_i}(1-b_j)=b_iL_i$.
• So $E(J_{11i})+E(J_{10i})=E(b_i)L_i=E_iL_i$.
• Also $E(J_{11i})+E(J_{10i})=L_i\pi_{11i}+L_i\pi_{10i}$, and dividing by $L_i$ gives the result.
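
The decomposition can be checked numerically by averaging the per-iteration proportions over posterior draws. In the sketch below the indicator draws are simulated independently per area purely as a stand-in for MCMC output (real draws would be spatially dependent); the ring-shaped adjacency and the probabilities in `prob_high` are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ring of six areas: each area has exactly two neighbours.
N = 6
W = np.zeros((N, N), dtype=int)
for i in range(N):
    W[i, (i + 1) % N] = 1
    W[i, (i - 1) % N] = 1
L = W.sum(axis=1)

# Stand-in for MCMC output: 2000 draws of the binary indicators b_i.
prob_high = np.array([0.95, 0.90, 0.85, 0.20, 0.10, 0.30])
b_draws = (rng.random((2000, N)) < prob_high).astype(int)

high_nb = b_draws @ W.T                               # high risk neighbours per draw
pi11 = (b_draws * high_nb / L).mean(axis=0)           # posterior mean of J11i / L_i
pi10 = (b_draws * (L - high_nb) / L).mean(axis=0)     # posterior mean of J10i / L_i
E = b_draws.mean(axis=0)                              # exceedance probabilities
```

Because $J_{11i}+J_{10i}=b_iL_i$ holds at every iteration, `pi11 + pi10` reproduces `E` exactly (up to floating-point error), whatever the draws are.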

30 Synthetic Data Example: Cluster Focus
• Area 25 is a cluster centre. So is area 23, in that it has only high risk neighbours.
• Areas 27 and 28 are cluster edges (they have as many background risk neighbours as high risk neighbours).

31 Cluster Focus (simulation with average $e_i=20.39$, and $b_i=1$ if $r_i>1$)

32 Cluster Focus (simulation with average $e_i=58.77$, and $b_i=1$ if $r_i>1$)

33 Cluster Centres and Edges
• Cluster centre status verified: $\pi_{11i}$ is close to $E_i$ for areas 25 and 23.
• Cluster edge status becomes clearer with more frequent data (for areas 27 and 28).

34 Cluster Focus (simulation with average $e_i=20.39$): map of high risk cluster probabilities $\pi_{11i}$

35 Cluster Focus (simulation with average $e_i=58.77$): map of high risk cluster probabilities $\pi_{11i}$

36 Another simulation where the clustering pattern is known: cluster centre status under an uneven risk scenario
• Performance of $\pi_{11i}$ in measuring cluster centre status for contrasting situations:
• (1) EVEN RISK. High risk characterises all neighbours surrounding area i (so area i is a cluster centre), and risk is evenly distributed among neighbours.
• (2) UNEVEN RISK. High risk is not common to all neighbours but is unevenly concentrated among a few neighbours, so area i is no longer a cluster centre and is possibly a cluster edge.

37 Even risk vs uneven risk scenarios

39 WinBUGS code
    model {
      for (i in 1:N) {
        y[i] ~ dpois(mu[i])
        mu[i] <- e[i]*r[i]
        log(r[i]) <- alph + s[i] + u[i]
        u[i] ~ dnorm(0, tau.u)
        # high risk indicator for area i
        b[i] <- step(r[i]-1)
        # joins and join counts over the neighbours of area i,
        # stored in positions C[i]+1 to C[i+1] of the neighbourhood vector
        for (j in C[i]+1 : C[i+1]) {
          j11[i,j] <- b[i]*b.map[j]
          j10[i,j] <- b[i]*(1-b.map[j])
          j01[i,j] <- (1-b[i])*b.map[j]
          j00[i,j] <- (1-b[i])*(1-b.map[j])
        }
        J11[i] <- sum(j11[i, C[i]+1 : C[i+1]])
        J10[i] <- sum(j10[i, C[i]+1 : C[i+1]])
        J01[i] <- sum(j01[i, C[i]+1 : C[i+1]])
        J00[i] <- sum(j00[i, C[i]+1 : C[i+1]])
        # proportions of joins of each type (L[i] = number of neighbours)
        pi.L[1,i] <- J11[i]/L[i]
        pi.L[2,i] <- J10[i]/L[i]
        pi.L[3,i] <- J01[i]/L[i]
        pi.L[4,i] <- J00[i]/L[i]
      }
      # neighbourhood vector of risks and indicators
      for (i in 1:NN) {
        wt[i] <- 1
        r.map[i] <- r[map[i]]
        b.map[i] <- b[map[i]]
      }
      # priors
      alph ~ dflat()
      tau.s ~ dgamma(1, 0.001)
      rho ~ dexp(1)
      tau.u <- rho*tau.s
      s[1:N] ~ car.normal(map[], wt[], L[], tau.s)
    }
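
The model above relies on three data vectors: `map` (neighbour ids listed area by area), `L` (number of neighbours per area) and `C` (cumulative offsets, so that the neighbours of area i occupy positions `C[i]+1` to `C[i+1]` of `map`). The slides do not show how these were built. The Python helper below is one hypothetical way to derive them from an adjacency list, assuming 1-based area ids as in BUGS.

```python
def bugs_adjacency(neighbours):
    """Build map, L, C, N and NN from a dict {area id: [adjacent area ids]}.

    Area ids are assumed to run from 1 to N. The returned C list has N+1
    entries starting at 0, so its k-th element (0-based in Python)
    corresponds to C[k+1] in the BUGS code.
    """
    n = len(neighbours)
    L = [len(neighbours[i]) for i in range(1, n + 1)]
    adj = [j for i in range(1, n + 1) for j in neighbours[i]]
    C = [0]
    for size in L:
        C.append(C[-1] + size)   # running total of neighbourhood sizes
    return {"N": n, "NN": len(adj), "L": L, "map": adj, "C": C}

# Hypothetical example: three areas in a row (1-2-3).
data = bugs_adjacency({1: [2], 2: [1, 3], 3: [2]})
```

Here `NN` is the total length of the neighbourhood vector, which is the upper limit of the second loop in the model.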

40 Real Example: Suicide in North West England
• Suicide counts and expected counts $\{y_i,e_i\}$ for 922 small areas (middle level super output areas, MSOAs) in NW England over 5 years (2006-10).
• Model: $y_i\sim\mathrm{Po}(e_ir_i)$, with relative risks $r_i$ averaging 1, and $\log(r_i)=\alpha+s_i+u_i$, $s_i\sim$ CAR, $u_i\sim$ WN.
• Overdispersion: $u_i$ is needed as well as the spatial term.
• Monitor exceedance and high risk clustering with $b_i=1$ if $r_i>1$, $b_i=0$ otherwise.
• Spatial interactions $w_{ij}$ are binary, based on adjacency.

41 Smoothed Suicide Risk. Note the small expected values $e_i$ (average 3.5), which impede strong inferences about elevated risk and about clustering.

42 Real Example: Suicide in North West England
• Flexscan (developed by Toshiro Tango) detects five significant clusters (p value under 0.05): the most likely cluster (albeit irregular in shape) consists of 9 areas in Blackpool.

43 High Suicide Risk Cluster, Blackpool and Surrounds

44 Real Example: Suicide in North West England, Areas within the Flexscan cluster

45 Exceedance Probs for Blackpool Suicide Cluster (ARCMAP area IDs)
Possible Questions
• What is the most plausible cluster centre (if any)? Which areas are more likely to be cluster edges?
• Of the two areas inside the doughnut, area 7 has the higher exceedance probability ($E_7=0.72$, $E_4=0.48$).
• Area 9 has $E_9=0.98$, and five of its 6 neighbours have $E_j>0.8$; the other neighbour has $E_j=0.72$. Area 9 has the highest $\pi_{11i}$, namely 0.87.
• Area 6 has four neighbours, only two with $E_j>0.8$ and two with $E_j$ below 0.5 ($E_4=0.48$, $E_{41}=0.26$). It has $\pi_{11i}=0.54$ and $\pi_{10i}=0.34$, which points to a cluster edge.

46 Local Join-Counts for Bivariate Clustering
• Local BB statistic for two outcomes A, B with event counts $y_{Ai}$, $y_{Bi}$.
• Binary indicators: $b_{ABi}=1$ if both $r_{Ai}>1$ and $r_{Bi}>1$; $b_{ABi}=0$ otherwise.
• Bivariate high risk clustering local join count: $J_{11ABi}=b_{ABi}\sum_j w_{ij}b_{ABj}$

47 Local Join-Counts for Bivariate Clustering
• $J_{11ABi}$ is high in a bivariate high risk cluster, i.e. when area i and its neighbours j all have high risk on both outcomes.
• Bivariate high risk clustering probability $\pi_{11ABi}$: the proportion of joins that are joint high risk, defined by $E(J_{11ABi})=L_i\pi_{11ABi}$.
• Estimate during the MCMC run via $\pi_{11ABi}=J_{11ABi}/L_i$
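
A small sketch of the bivariate join count for one fixed configuration of indicators. The four-area row layout and the indicator values are made up; in practice $b_{Ai}$ and $b_{Bi}$ vary over MCMC iterations and the ratio $J_{11ABi}/L_i$ is averaged across them.

```python
import numpy as np

# Hypothetical layout: four areas in a row.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
L = W.sum(axis=1)

b_A = np.array([1, 1, 1, 0])   # high risk on outcome A
b_B = np.array([1, 1, 0, 0])   # high risk on outcome B

b_AB = b_A * b_B               # 1 only where both outcomes are high risk
J11AB = b_AB * (W @ b_AB)      # joint high risk joins for each area
prop_11AB = J11AB / L          # single-configuration analogue of pi_11ABi
```

Area 2 is high risk on outcome A only, so it contributes nothing to the bivariate count even though it would count in a univariate analysis of A.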

48 Two outcomes: Likelihood and Prior
• $y_A$: suicide deaths; $y_B$: self-harm hospitalisations.
• Self-harm is much more frequent than suicide, with average $e_{Bi}$ of 93.
• Likelihood: $y_{Ai}\sim\mathrm{Po}(e_{Ai}r_{Ai})$, $y_{Bi}\sim\mathrm{Po}(e_{Bi}r_{Bi})$
• Assume correlated spatial effects: $\log(r_{Ai})=\alpha_A+s_{Ai}+u_{Ai}$; $\log(r_{Bi})=\alpha_B+s_{Bi}+u_{Bi}$, with $u_{Ai}\sim$ WN, $u_{Bi}\sim$ WN and $(s_{Ai},s_{Bi})\sim$ BVCAR.

49 Example: suicide mortality and self-harm hospitalisations in North West England. Smoothed suicide risk, Wigan and adjacent boroughs

50 Example: suicide mortality and self-harm hospitalisations in North West England. Smoothed self-harm risk, Wigan and adjacent boroughs

51 Bivariate clustering: suicide and self-harm, Wigan and surrounds. Probabilities $\pi_{11ABi}$ of joint outcome high risk cluster status

52 Another Bivariate Example: Pre-Primary Obesity ($y_A$) and End-Primary Child Obesity ($y_B$) in NE London. Map is of RRs in Pre-Primary Obesity

53 RRs for End-Primary Child Obesity ($y_B$). Relative risks in this outcome show negative skew

54 Probabilities of Joint High Risk Clustering

55 Probabilities of Joint Low Risk Clustering

56 Final Thoughts
• The cluster status approach provides an alternative/complementary perspective to the “list of areas” approach, and provides additional insights with regard to:
• cluster centres vs edges,
• low risk clustering as well as high risk clustering in an integrated perspective,
• high/low risk outliers.
• It allows assessment of the impacts of covariates on spatial clustering.
• The bivariate method can also be applied when outcome A is a disease and outcome B is a risk factor. It then detects varying strength of association between disease and risk factor.

