Published by Ashlynn Bruce. Modified over 8 years ago.
1
A genetic algorithm for irregularly shaped spatial clusters
Luiz Duczmal, André L. F. Cançado, Lupércio F. Bessegato
Statistics Department, Universidade Federal de Minas Gerais, Brazil
2005 Syndromic Surveillance Conference
2
ABSTRACT
We propose a new approach to the detection and inference of irregularly shaped spatial clusters, using a genetic algorithm. We minimize graph-related operations through fast offspring generation and fast evaluation of Kulldorff's scan likelihood ratio statistic. The algorithm is more than ten times faster than a similar approach based on simulated annealing and has lower variance, so it yields tighter confidence intervals in the Monte Carlo significance evaluation of the most likely cluster found. An application to spatial disease cluster detection is discussed.
3
Spatial Scan Statistics (Kulldorff, 1997)
Map with m regions, total population N, and C cases.
Under the null hypothesis there is no cluster in the map, and the number of cases in each region is Poisson distributed, with expectation proportional to the region's population.
4
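For each circle centered at a region's centroid, let z be the collection of regions whose centroids lie inside it. Let
$c_z$ = number of cases inside z,
$\mu_z$ = expected number of cases inside z under the null hypothesis ($\mu_z = C\,N_z/N$, where $N_z$ is the population of z).
The likelihood ratio for zone z is
$$L(z) = \left(\frac{c_z}{\mu_z}\right)^{c_z}\left(\frac{C-c_z}{C-\mu_z}\right)^{C-c_z} \quad \text{if } c_z > \mu_z, \qquad L(z) = 1 \text{ otherwise.}$$
The scan statistic is defined as $T = \max_z L(z)$.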
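A minimal Python sketch of the statistic above, assuming the Poisson null model with $\mu_z = C\,N_z/N$. The function names (`log_likelihood_ratio`, `scan_statistic`) and the four-region toy map are hypothetical illustrations, not code from the presentation; candidate zones are passed in as a list rather than generated from circles.

```python
import math


def log_likelihood_ratio(c_z: float, mu_z: float, total_cases: float) -> float:
    """Return log L(z) for Kulldorff's Poisson scan statistic.

    L(z) = 1 (so log L(z) = 0) unless the zone has more cases than expected.
    Assumes 0 < mu_z < total_cases for any zone with an excess of cases.
    """
    if c_z <= mu_z:
        return 0.0
    outside_cases = total_cases - c_z
    llr = c_z * math.log(c_z / mu_z)
    if outside_cases > 0:  # the term 0 * log(0) is taken as 0
        llr += outside_cases * math.log(outside_cases / (total_cases - mu_z))
    return llr


def scan_statistic(zones, cases, population):
    """Return (zone, log L) for the candidate zone with the highest L(z).

    Expected cases follow the null model: mu_z = C * N_z / N.
    """
    total_cases, total_pop = sum(cases), sum(population)
    best_zone, best_llr = None, -math.inf
    for zone in zones:
        c_z = sum(cases[i] for i in zone)
        mu_z = total_cases * sum(population[i] for i in zone) / total_pop
        llr = log_likelihood_ratio(c_z, mu_z, total_cases)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Toy map with four equally populated regions (hypothetical numbers).
cases = [10, 2, 3, 5]
population = [100, 100, 100, 100]
zone, llr = scan_statistic([(0,), (0, 3), (1, 2)], cases, population)
print(zone, round(llr, 3))  # prints (0,) 2.877
```

Working with log L(z) instead of L(z) avoids overflow for large case counts and does not change which zone is the maximum.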
5
The collection (or zone) z with the highest L(z) is the most likely cluster. We sweep through all the $m^2$ possible circular zones, looking for the highest L(z) value. To assess its significance, we compare this value against the maximum L(z) for maps with cases distributed randomly under the null hypothesis. The procedure is repeated thousands of times, once for each set of randomly distributed cases (Monte Carlo testing; Dwass, 1957).
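The Monte Carlo step can be sketched as follows. This is an illustration under stated assumptions: null maps are drawn by placing the same C cases in regions with probability proportional to population (the conditional Poisson null), the candidate zones are a fixed hypothetical list, and the p-value uses the usual (rank + 1)/(R + 1) convention. Names such as `monte_carlo_p_value` are invented for the example.

```python
import math
import random
from collections import Counter


def log_lr(c, mu, total):
    """log L(z) for Kulldorff's Poisson model; 0 when there is no excess."""
    if c <= mu:
        return 0.0
    out = total - c
    tail = out * math.log(out / (total - mu)) if out > 0 else 0.0
    return c * math.log(c / mu) + tail


def max_log_lr(cases, population, zones):
    """Highest log L(z) over the candidate zones for one map."""
    total, pop = sum(cases), sum(population)
    return max(
        log_lr(sum(cases[i] for i in z), total * sum(population[i] for i in z) / pop, total)
        for z in zones
    )


def monte_carlo_p_value(cases, population, zones, replications=999, seed=0):
    """Monte Carlo p-value of the observed maximum log L(z)."""
    rng = random.Random(seed)
    observed = max_log_lr(cases, population, zones)
    total, m = sum(cases), len(cases)
    at_least_as_large = 0
    for _ in range(replications):
        # Null map: the same C cases, each placed in a region with
        # probability proportional to that region's population.
        counts = Counter(rng.choices(range(m), weights=population, k=total))
        simulated = [counts[i] for i in range(m)]
        if max_log_lr(simulated, population, zones) >= observed:
            at_least_as_large += 1
    # The observed map is counted as one of the replications + 1 maps.
    return observed, (at_least_as_large + 1) / (replications + 1)


# Hypothetical map with a strong excess of cases in region 0.
cases = [40, 2, 3, 5]
population = [100, 100, 100, 100]
zones = [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]
observed, p = monte_carlo_p_value(cases, population, zones, replications=199)
print(round(observed, 2), p)
```

With R replications the smallest attainable p-value is 1/(R + 1), which is why the slides' p-values of 0.001 correspond to 999 replications.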
6
Extreme example of an irregularly shaped cluster (Duczmal, Kulldorff & Huang, 2006)
7
Compactness
A(z) = area of the zone z
H(z) = perimeter of the convex hull of z
Intuitively, the convex hull of a planar object is the region enclosed by a rubber band stretched around it.
K(z) = the area of z divided by the area of the circle with perimeter H(z). Since that circle has radius $H(z)/2\pi$ and area $H(z)^2/4\pi$, $K(z) = 4\pi A(z)/H(z)^2$.
8
Compactness for some common shapes
Circle: K(z) = 1
Square: K(z) = π/4
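A minimal sketch of the compactness measure $K(z) = 4\pi A(z)/H(z)^2$, assuming each region is a simple polygon given as a list of vertices and that regions do not overlap, so A(z) is the sum of the region areas. The convex hull uses Andrew's monotone chain, a standard hull algorithm chosen here for illustration; the presentation does not say how the authors computed H(z).

```python
import math


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def perimeter(poly):
    return sum(math.dist(a, b) for a, b in zip(poly, poly[1:] + poly[:1]))


def compactness(zone_polygons):
    """K(z) = area of z / area of the circle whose perimeter is H(z)."""
    area = sum(polygon_area(p) for p in zone_polygons)
    hull = convex_hull([v for p in zone_polygons for v in p])
    return 4 * math.pi * area / perimeter(hull) ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
right_square = [(1, 0), (2, 0), (2, 1), (1, 1)]
print(compactness([square]))                # pi/4, about 0.785
print(compactness([square, right_square]))  # 1 x 2 rectangle: 2*pi/9, about 0.698
```

Joining two squares into an elongated zone lowers K(z), which is the behaviour the penalty relies on.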
9
Penalty function for the log of the likelihood ratio, LLR(z) = log L(z): $K(z) \cdot \mathrm{LLR}(z)$
Generalized compactness correction: $K(z)^a \cdot \mathrm{LLR}(z)$
a = 1: full compactness correction
a = 0.5: medium compactness correction
a = 0.0: no compactness correction
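To show how the exponent a shifts the choice of cluster, here is a small sketch with two hypothetical candidate zones: a compact one with a lower LLR and an irregular one with a higher LLR. The LLR and K(z) values are invented for illustration; only the formula $K(z)^a \cdot \mathrm{LLR}(z)$ comes from the slide.

```python
def penalized_llr(llr: float, compactness: float, a: float) -> float:
    """Generalized compactness correction: K(z)**a * LLR(z).

    a = 1 gives the full correction, a = 0.5 a medium one, a = 0 none.
    """
    return compactness ** a * llr


def most_likely_zone(candidates, a):
    """Name of the candidate with the highest penalized LLR."""
    return max(candidates, key=lambda name: penalized_llr(*candidates[name], a))


# Hypothetical candidates: name -> (LLR, compactness K).
candidates = {
    "compact": (10.0, 0.8),
    "irregular": (14.0, 0.3),
}

for a in (1.0, 0.5, 0.0):
    print(f"a = {a}: {most_likely_zone(candidates, a)}")
# a = 1.0: compact
# a = 0.5: compact
# a = 0.0: irregular
```

Without correction the raw LLR favours the irregular zone; any penalty here is enough to prefer the compact one.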
10
Genetic Algorithms
Objective: find a quasi-optimal solution to a maximization problem.
1. Build an initial population.
2. Randomly cross over parents to generate offspring.
3. Select children and parents for the next generation.
4. Apply random mutations.
5. Repeat steps 2-4 for a predefined number of generations or until the objective function no longer improves.
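The generic loop above can be sketched as a textbook elitist genetic algorithm. This is not the authors' zone-based algorithm (their offspring are connected zones built from parent trees); it is a hedged illustration on a toy bit-string problem ("OneMax", maximize the number of 1 bits). The `patience` parameter, which stops the run after that many generations without improvement, is an assumption modelling "until the objective function no longer improves".

```python
import random


def genetic_algorithm(fitness, random_individual, crossover, mutate,
                      pop_size=30, max_generations=200, patience=25,
                      mutation_rate=0.2, seed=0):
    """Generic elitist GA maximizing `fitness`; returns (best, history)."""
    rng = random.Random(seed)
    # 1. Initial population.
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    stale = 0  # generations without improvement
    for _ in range(max_generations):
        # 2. Random crossing-over of pairs of parents produces the offspring.
        children = [crossover(*rng.sample(population, 2), rng) for _ in range(pop_size)]
        # 3. Selection: the fittest among parents and children survive.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        # 4. Random mutation (the current leader is left untouched).
        population = population[:1] + [
            mutate(ind, rng) if rng.random() < mutation_rate else ind
            for ind in population[1:]
        ]
        leader = max(population, key=fitness)
        if fitness(leader) > fitness(best):
            best, stale = leader, 0
        else:
            stale += 1
        history.append(fitness(best))
        # 5. Stop after the generation budget or when progress stalls.
        if stale >= patience:
            break
    return best, history


N_BITS = 20


def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]


def one_point_crossover(p1, p2, rng):
    cut = rng.randint(1, N_BITS - 1)
    return p1[:cut] + p2[cut:]


def flip_one_bit(ind, rng):
    child = list(ind)
    i = rng.randrange(N_BITS)
    child[i] = 1 - child[i]
    return child


best, history = genetic_algorithm(sum, random_bits, one_point_crossover, flip_one_bit)
print(sum(best), len(history))
```

Keeping the best individual across generations makes the best-so-far fitness non-decreasing, which is one simple way to get the stable convergence the slides emphasize.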
11
Initial population construction: start at a region of the map.
12
Initial population construction: add the neighbor that forms the highest-LLR 2-cell zone.
13
Initial population construction: add the neighbor that forms the highest-LLR 3-cell zone.
14
Initial population construction: add the neighbor that forms the highest-LLR 4-cell zone.
15
Initial population construction: stop (no neighbor can be added to form a 5-cell zone with a higher LLR than the current zone).
16
Initial population construction: start at another region of the map.
17
Initial population construction: add the neighbor that forms the highest-LLR 2-cell zone.
18
Initial population construction (continued): repeat the previous steps for every region of the map.
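The construction steps above can be sketched as a greedy region-growing routine. Assumptions: the map is given as a hypothetical adjacency dictionary, zones are scored with the unpenalized Poisson LLR as on the slides, and ties between neighbors are broken by the lowest region index. The five-region path map in the example is invented.

```python
import math


def log_lr(c, mu, total):
    """log L(z) for Kulldorff's Poisson model; 0 when there is no excess."""
    if c <= mu:
        return 0.0
    out = total - c
    tail = out * math.log(out / (total - mu)) if out > 0 else 0.0
    return c * math.log(c / mu) + tail


def grow_zone(start, neighbors, cases, population):
    """Greedily grow a connected zone from `start` while its LLR increases."""
    total_cases, total_pop = sum(cases), sum(population)

    def zone_llr(zone):
        c = sum(cases[i] for i in zone)
        mu = total_cases * sum(population[i] for i in zone) / total_pop
        return log_lr(c, mu, total_cases)

    zone = {start}
    current = zone_llr(zone)
    while True:
        # Regions adjacent to the zone but not yet in it (sorted for reproducibility).
        frontier = sorted({j for i in zone for j in neighbors[i]} - zone)
        if not frontier:
            break
        candidate = max(frontier, key=lambda j: zone_llr(zone | {j}))
        candidate_llr = zone_llr(zone | {candidate})
        if candidate_llr <= current:
            break  # no larger zone has a higher LLR: stop
        zone.add(candidate)
        current = candidate_llr
    return frozenset(zone), current


def initial_population(neighbors, cases, population):
    """One greedily grown zone per starting region of the map."""
    return [grow_zone(s, neighbors, cases, population) for s in range(len(cases))]


# Hypothetical map: five regions in a row (0-1-2-3-4), equal populations.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cases = [1, 8, 9, 1, 1]
population = [100] * 5
for zone, llr in initial_population(neighbors, cases, population):
    print(sorted(zone), round(llr, 3))
```

Because every zone is grown by adding an adjacent region, each member of the initial population is connected by construction.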
19
THE OFFSPRING GENERATION (a simple example)
20
THE OFFSPRING GENERATION (a simple example)
21
THE OFFSPRING GENERATION (a simple example)
22
THE OFFSPRING GENERATION (a simple example): another possible numbering
23
THE OFFSPRING GENERATION (a more sophisticated example)
24
One instance of two parent trees
25
Advantages:
- Offspring generation is very inexpensive;
- All child zones are automatically connected;
- Random mutations are easy to implement;
- Selection for the next generation is straightforward;
- The evolution converges quickly;
- The variance between different test runs is small.
26
Population Evolution Performance
27
Irregularly shaped clusters benchmark, Northeast US counties map. Duczmal L, Kulldorff M, Huang L. (2006) Evaluation of spatial scan statistics for irregularly shaped clusters. To appear in J. Comput. Graph. Stat.
28
Power evaluation of the genetic algorithm, compared to the simulated annealing algorithm.
29
Cluster of high incidence of breast cancer, São Paulo State, Brazil, 2002. Population adjusted for age and under-reporting.
Compactness correction: 1.0
Cluster cases: 2,924; cluster population: 346,024; incidence: 0.00845; LLR: 298.9; p-value: 0.001
Data source: DATASUS, G. L. Souza
30
Compactness correction: 0.5
Cluster cases: 3,078; cluster population: 361,373; incidence: 0.00852; LLR: 343.8; p-value: 0.001
31
Compactness correction: 0.0
Cluster cases: 3,324; cluster population: 394,294; incidence: 0.00843; LLR: 449.6; p-value: 0.001
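The reported incidence is cluster cases divided by cluster population. The short check below recomputes it from the figures on the three breast cancer slides; nothing beyond those figures is assumed. Relaxing the compactness correction enlarges the cluster while its incidence stays close to 0.0085.

```python
# (cluster cases, cluster population) reported for each compactness correction a.
clusters = {
    1.0: (2924, 346024),
    0.5: (3078, 361373),
    0.0: (3324, 394294),
}

incidence = {a: cases / pop for a, (cases, pop) in clusters.items()}
for a, rate in incidence.items():
    print(f"a = {a}: incidence = {rate:.5f}")
# a = 1.0: incidence = 0.00845
# a = 0.5: incidence = 0.00852
# a = 0.0: incidence = 0.00843
```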
32
Conclusions
- The genetic algorithm for disease cluster detection is fast and has lower variance than similar approaches;
- Its use in epidemiological studies and syndromic surveillance is encouraged;
- The power evaluation tests clearly demonstrate the need for penalty functions for irregular cluster shapes;
- Its power to detect clusters is similar to that of the simulated annealing algorithm;
- Flexible control of cluster shape gives the practitioner more insight into the geographic delineation of clusters.
33
References
Duczmal L, Kulldorff M, Huang L. (2006) Evaluation of spatial scan statistics for irregularly shaped clusters. J. Comput. Graph. Stat., to appear.
Duczmal L, Assunção R. (2004) A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Comp. Stat. & Data Anal., 45, 269-286.
Kulldorff M, Huang L, Pickle L, Duczmal L. (2005) An elliptic spatial scan statistic. Submitted.
Patil GP, Taillie C. (2004) Upper level set scan statistic for detecting arbitrarily shaped hotspots. Envir. Ecol. Stat., 11, 183-197.
Tango T, Takahashi K. (2005) A flexibly shaped spatial scan statistic for detecting clusters. Int. J. Health Geogr., 4, 11.
Kulldorff M. (1997) A spatial scan statistic. Comm. Statist. Theory Meth., 26(6), 1481-1496.
Kulldorff M, Tango T, Park PJ. (2003) Power comparisons for disease clustering tests. Comp. Stat. & Data Anal., 42, 665-684.
Kulldorff M, Feuer EJ, Miller BA, Freedman LS. (1997) Breast cancer clusters in the Northeast United States: a geographic analysis. Amer. J. Epidem., 146, 161-170.
de Souza Jr. GL. (2005) The detection of clusters of breast cancer in São Paulo State, Brazil. M.Sc. Dissertation, Univ. Fed. Minas Gerais.