NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance.
Geoinformatic hotspot surveillance system
Geographic and Network Surveillance for Arbitrarily Shaped Hotspots
Overview: Geospatial Surveillance, Upper Level Set Scan Statistic System, Spatial-Temporal Surveillance, Typology of Space-Time Hotspots, Hotspot Prioritization, Ranking Without Having to Integrate Multiple Indicators
Surveillance Geoinformatics for Hotspot Detection, Prioritization, Early Warning and Sustainable Management
Upper Level Set Scan System Definition: A hotspot is that portion of the study region with an elevated risk of an adverse outcome.
Federal Agency Partnerships: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS
Features of ULS Scan Statistic: Identifies arbitrarily shaped hotspots; applicable to data on a network; confidence sets and hotspot ratings; computationally efficient; generalizes to space-time scan.
Poset Prioritization System Objective: Prioritize or rank hotspots based on multiple indicator and stakeholder criteria without having to integrate indicators into an index, using Hasse diagrams and partially ordered sets. Example: Prioritization of disease clusters with multiple indicators.
National Applications and Case Studies: Biosurveillance, Carbon Management, Coastal Management, Community Infrastructure, Crop Surveillance, Disaster Management, Disease Surveillance, Ecosystem Health, Environmental Justice, Sensor Networks, Robotic Networks, Environmental Management, Environmental Policy, Homeland Security, Invasive Species, Poverty Policy, Public Health, Public Health and Environment, Syndromic Surveillance, Social Networks, Stream Networks
Figure: Changing Connectivity of ULS as Level Drops
G.P. Patil, R. Acharya, W.L. Myers, P. Patankar, Y. Cai, and S.L. Rathbun, The Penn State University, University Park, PA; R. Modarres, George Washington University, Washington, D.C.
Example: West Nile Virus First isolated in 1937, this mosquito-borne disease, indigenous to north Africa, the Middle East and west Asia, was first introduced into the United States in
Map panels: Disease Count Quintiles, Population Quintiles, Disease Rate Quintiles, Likelihood Quintiles
Comparison of ULS Scan with Circular Scan (panels: ULS Scan, Circular Scan)
Confidence set for ULS Hotspot: Hotspot Membership Rating
Example: Lyme Disease Infections from the bacterium Borrelia burgdorferi vectored by ticks from the genus Ixodes.
Comparison of ULS Scan with Cylindrical Scan (panels: ULS Scan, Cylindrical Scan; plot of Disease Rates by Year)
Example: Human-environment indicator values for 16 European countries. Admissible linear extensions consist of the rankings that are compatible with the rankings of all indicators. Treating each linear extension as a voter, the cumulative rank function (crf) is obtained from the frequencies at which each object receives each rank. There are a total of 3,764,448 admissible linear extensions. The crf for Sweden exceeds that of all remaining countries. The crfs of all countries dominate that of Ireland. The remaining countries cannot be uniquely ordered based on their crfs. Belgium, Netherlands and United Kingdom have identical crfs.
Hasse Diagram: The crfs also form a partially ordered set. There are only 182 admissible linear extensions for this poset, yielding the cumulative rank function. One more iteration yields the rankings in the data table.
Center for Statistical Ecology and Environmental Statistics
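The voting description of the cumulative rank function can be illustrated with a minimal Python sketch. It assumes the poset is the product order on indicator vectors (larger is better on every indicator) and enumerates linear extensions by brute force, which is only practical for a handful of objects; the function names and the four-object data set are hypothetical and are not taken from the poster.

```python
from itertools import permutations

def dominates(a, b):
    """True if a is at least as large as b on every indicator and differs somewhere."""
    return all(x >= y for x, y in zip(a, b)) and a != b

def linear_extensions(scores):
    """Yield every ranking (best first) compatible with all indicator rankings."""
    names = list(scores)
    for order in permutations(names):
        pos = {name: i for i, name in enumerate(order)}
        if all(pos[a] < pos[b]
               for a in names for b in names
               if dominates(scores[a], scores[b])):
            yield order

def cumulative_rank_functions(scores):
    """Treat each linear extension as a voter and accumulate rank frequencies."""
    extensions = list(linear_extensions(scores))
    n = len(scores)
    counts = {name: [0] * n for name in scores}
    for order in extensions:
        for rank, name in enumerate(order):
            counts[name][rank] += 1
    crf = {}
    for name, freq in counts.items():
        running, cumulative = 0, []
        for k in freq:
            running += k
            cumulative.append(running / len(extensions))
        crf[name] = cumulative
    return crf, len(extensions)

# Hypothetical toy data: four objects scored on two indicators.
scores = {"A": (3, 3), "B": (2, 1), "C": (1, 2), "D": (0, 0)}
crf, n_ext = cumulative_rank_functions(scores)
```

In this toy poset B and C are incomparable, so they receive identical cumulative rank functions, which mirrors the tie reported for Belgium, Netherlands and United Kingdom. A poset with millions of linear extensions would need a counting or sampling method rather than enumeration.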
Demonstration Example DataSet
Demonstration Example Data (map panels: Cases, Disease Rate, Population, Likelihood)
Demonstration Example Upper Level Set Tree: Cells
Demonstration Example Upper Level Set Tree: Zones [3] [3,18] [3,18,0] [17] [17,16] [14] [17,16;14;15] [3,18,0,4,8,7,19,5;17,16,14,15;11] [8] [3,18,0,4] [3,18,0,4;8,7;19] [8,7]
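How zones such as those listed above arise can be sketched in a few lines of Python. The sketch assumes that an upper level set at level g contains every cell whose rate is at least g, and that zones are the connected components of that set under cell adjacency; the five-cell chain, its rates, and the function names are hypothetical and are not the demonstration data set.

```python
def connected_components(cells, adjacency):
    """Split a set of cells into connected zones, ordered by smallest cell id."""
    cells = set(cells)
    seen, zones = set(), []
    for start in sorted(cells):
        if start in seen:
            continue
        stack, zone = [start], set()
        while stack:
            cell = stack.pop()
            if cell in zone:
                continue
            zone.add(cell)
            stack.extend(n for n in adjacency[cell] if n in cells and n not in zone)
        seen |= zone
        zones.append(sorted(zone))
    return zones

def upper_level_zones(rates, adjacency):
    """For each observed level, from highest to lowest, list the zones of the ULS."""
    result = []
    for g in sorted(set(rates.values()), reverse=True):
        uls = [cell for cell, rate in rates.items() if rate >= g]
        result.append((g, connected_components(uls, adjacency)))
    return result

# Hypothetical chain of five cells: 0 - 1 - 2 - 3 - 4
rates = {0: 5.0, 1: 1.0, 2: 4.0, 3: 3.0, 4: 0.5}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
zones = upper_level_zones(rates, adjacency)
```

As the level drops, new zones appear (cell 2 at level 4.0), grow (cell 3 joins at 3.0), and merge (cell 1 joins the two zones at 1.0). These are the events recorded by the nodes of the tree.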
ULS Scan: Disease Rate Hotspots using ULS
[Table: Log Likelihood, p-value and Relative Risk for Hotspot 1 (red) and Hotspot 2 (orange)]
Confidence Set for ULS Hotspot Hotspot membership rating
Bivariate Hotspot Detection The circle-based SaTScan and data-driven ULS scan statistics are designed to identify hotspots based on the elevated responses of one variable over the scan region. These techniques are appropriate for detecting univariate hotspots. What can be done when the data under consideration provide many correlated responses in each cell?
Bivariate Hotspot Detection A simple and effective approach to multivariate hotspot detection applies the univariate ULS to each variable in the data set and identifies the univariate hotspots. Multivariate hotspots are those connected cells that appear in the intersection of the univariate hotspots of all variables. We will refer to this strategy as the intersection method.
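The intersection method can be sketched as follows, assuming each univariate ULS run has already returned its hotspot as a set of cell ids. The function name, the cell ids, and the adjacency structure are hypothetical; the sketch only shows the intersection and the split into connected zones.

```python
def intersection_hotspots(hotspots_by_variable, adjacency):
    """Return connected zones of cells that lie in every univariate hotspot."""
    if not hotspots_by_variable:
        return []
    common = set.intersection(*(set(h) for h in hotspots_by_variable))
    seen, zones = set(), []
    for start in sorted(common):
        if start in seen:
            continue
        stack, zone = [start], set()
        while stack:
            cell = stack.pop()
            if cell in zone:
                continue
            zone.add(cell)
            # Only follow neighbours that are themselves in the intersection.
            stack.extend(n for n in adjacency[cell] if n in common and n not in zone)
        seen |= zone
        zones.append(sorted(zone))
    return zones

# Hypothetical hotspots for two variables on a chain 1 - 2 - 3 - 5 - 7
hotspot_x = {1, 2, 3, 7}
hotspot_y = {2, 3, 5, 7}
adjacency = {1: [2], 2: [1, 3], 3: [2, 5], 5: [3, 7], 7: [5]}
joint = intersection_hotspots([hotspot_x, hotspot_y], adjacency)
```

Cells 2 and 3 form one joint zone, while cell 7 stands alone because its only neighbour, cell 5, is missing from the X hotspot.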
Use of Covariates Another approach to multivariate hotspot detection uses explanatory variables (Patil and Taillie, 2004). The cell sizes (population, area, etc.) are proportional to the model expectations and provide a link between the response variable and the explanatory variables. When a functional relationship is identified, regression techniques provide a basis for adjusting the rates. To obtain hotspots based on all variables, the univariate ULS scan statistic is applied to the response variable and the adjusted sizes.
Bivariate Data For each cell a, observations are available in the form of quadruplets (X_a, Y_a, B_a, A_a), where X_a, Y_a and B_a are non-negative integers and A_a is a fixed and known constant. Suppose N_a = A_a people reside in cell a, where each person has disease X with probability Πx and disease Y with probability Πy. The variable X_a counts the people in cell a who have disease X. Similarly, Y_a counts the people in cell a who have disease Y. The variable B_a counts the people in cell a who have both diseases. One can also formulate an equivalent approach when a count of disease-free individuals is available for every cell.
Table I: Bivariate Bernoulli distribution defined on cell a.
         Y=0      Y=1      Total
X=0      P00      P01      1-Πx
X=1      P10      P11      Πx
Total    1-Πy     Πy       1
Bivariate Binomial Model If (X_a, Y_a) has a bivariate binomial distribution with parameters (P11, P01, P10; N_a), then the correlation coefficient is ρ = (P11 - Πx Πy) / sqrt(Πx (1 - Πx) Πy (1 - Πy)). It is possible for one of the counts, say Y, to record the absence of a certain condition (disease) that may accompany X. In this case, the two counts are negatively correlated and the joint hotspot analysis is in fact a hot/cold spot analysis, as we look for high values of one variable and low values of the other.
Joint Hotspot Analysis In joint hotspot analysis, we look for zones with elevated responses relative to the rest of the region. Elevated responses are measured in terms of large values of the intensity function G_a=(G_{X_a}, G_{Y_a}) where G_{X_a} and G_{Y_a} are X and Y rates in cell a. Under the null hypothesis of no joint hotspots, we state H_0: Π_{X_a}= Π_x is the same for all cells a in R (no hotspots with respect to disease X), Π_{Y_a}= Π_y is the same for all cells a in R (no hotspots with respect to disease Y), and that P11 is specified.
Joint Hotspot Analysis Specifying the marginals, Πx and Πy, does not completely specify the distribution under the null hypothesis of no joint hotspots. We also need to specify P11, i.e. the probability that an individual has both diseases. We will study H_0 under different values of P11. Note that when P11 is specified a priori (by specifying a correlation coefficient, for example), one does not need the individual counts B_a for each cell a, and only the pairs (X_a, Y_a) are used. We can assume that the variables are independent, so that P11 = Πx Πy, and study the hotspots obtained under independence. One can also set ρ, and hence P11, at a fixed high (low) value. Using these values, one can study the sensitivity of the hotspots obtained and compare them to the independence case.
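Fixing ρ determines P11 by inverting the correlation formula of the bivariate binomial model, P11 = Πx Πy + ρ sqrt(Πx (1 - Πx) Πy (1 - Πy)). The Python sketch below performs that inversion and fills in the remaining cells of Table I. The function names are hypothetical, and the feasibility check (all four cell probabilities must be non-negative) is an added safeguard that the slides do not discuss.

```python
import math

def p11_from_rho(pi_x, pi_y, rho):
    """Solve rho = (P11 - pi_x*pi_y) / sqrt(pi_x(1-pi_x) pi_y(1-pi_y)) for P11."""
    p11 = pi_x * pi_y + rho * math.sqrt(pi_x * (1 - pi_x) * pi_y * (1 - pi_y))
    lower = max(0.0, pi_x + pi_y - 1.0)  # keeps P00 non-negative
    upper = min(pi_x, pi_y)              # keeps P10 and P01 non-negative
    if not lower <= p11 <= upper:
        raise ValueError("rho is not attainable for these marginals")
    return p11

def cell_probabilities(pi_x, pi_y, rho):
    """Return the four cells of the bivariate Bernoulli table."""
    p11 = p11_from_rho(pi_x, pi_y, rho)
    return {
        "P00": 1.0 - pi_x - pi_y + p11,
        "P01": pi_y - p11,   # X = 0, Y = 1
        "P10": pi_x - p11,   # X = 1, Y = 0
        "P11": p11,
    }

independent = cell_probabilities(0.2, 0.3, 0.0)   # P11 = 0.2 * 0.3
associated = cell_probabilities(0.2, 0.3, 0.5)    # stronger association
```

Not every ρ is attainable for given marginals: with Πx = 0.2 and Πy = 0.3, for instance, ρ = 0.9 would require P11 > Πx, so the sketch raises an error instead of returning an invalid table.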
Exceedance The rates define a piece-wise constant surface over the tessellation. This surface is 3-dimensional for each rate and 4-dimensional when both rates are considered. One can generalize the exceedance approach of defining the ULS to the multivariate setting. We may define the multivariate level vector G=(g,g,…,g) and multivariate exceedance vector G>g. Thus, the multivariate ULS is U_g={a: G_a > g}. Similarly, we can define multivariate exceedance in terms of the levels of a scalar function of the rates, such as the norm sqrt(G_x^2+G_y^2), the sum G_x+G_y, or the maximum max(G_x,G_y), among others. This function is defined for all cells of R and over the vertices of the associated abstract graph. It takes a finite number of values (levels) in the tessellation, and each level g determines an upper level set.
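The exceedance rules above can be compared on a small example. The Python sketch below assumes two rates per cell and uses strict exceedance, as in U_g = {a: G_a > g}; the cell labels, rate values, and function names are hypothetical.

```python
import math

# Scalar summaries of the two rates named in the text.
SCORES = {
    "norm": lambda gx, gy: math.hypot(gx, gy),  # sqrt(G_x^2 + G_y^2)
    "sum": lambda gx, gy: gx + gy,              # G_x + G_y
    "max": lambda gx, gy: max(gx, gy),          # max(G_x, G_y)
}

def componentwise_uls(rates, g):
    """Cells whose rates both exceed the common level g."""
    return sorted(a for a, (gx, gy) in rates.items() if gx > g and gy > g)

def scalar_uls(rates, g, score="norm"):
    """Cells whose scalar summary of the two rates exceeds g."""
    f = SCORES[score]
    return sorted(a for a, (gx, gy) in rates.items() if f(gx, gy) > g)

# Hypothetical (G_x, G_y) rates for three cells.
rates = {"a": (3.0, 4.0), "b": (5.0, 1.0), "c": (1.0, 1.0)}
both_high = componentwise_uls(rates, 2.0)
either_high = scalar_uls(rates, 2.0, "max")
```

The componentwise rule keeps only cells that are elevated in both variables, while the max rule admits a cell that is elevated in either one, so the choice of summary decides whether a cell such as b counts as part of a joint hotspot.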
Sensitivity Analysis How sensitive are the joint hotspots to the degree of association between X and Y? We do not expect to see common hotspots when X and Y are independent. As the strength of association between the variables increases, we expect to see many more common hotspots. In some cases, information on B_a, the number of individuals with both diseases in cell a, may not be available a priori. Consider the bivariate binomial model and pairs of random observations (X_a, Y_a), where X and Y have marginal binomial distributions with a given degree of association.
Sensitivity Analysis At each cell a in R, we simulate a bivariate binomial random vector with parameters Π_x, Π_y, and P11, where Π_x, Π_y are estimated from the marginal distributions and P11 is specified. The resulting data set will be used to obtain the new hotspots with the correlation, ρ. The generated sample will exhibit marginal hotspots that are similar to the ones obtained from the original data. The joint hotspots will reflect the effects of the new degree of association on the data. We assume that the variables are independent; hence, P11= Πx Πy or ρ=0 and study the hotspots obtained under independence.
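One way to carry out this simulation is to assign each of the N_a residents of a cell to one of the four cells of Table I, which yields a bivariate binomial pair (X_a, Y_a) together with B_a. The Python sketch below uses only the standard library; the function names, cell labels, population sizes, and parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_cell(n, pi_x, pi_y, p11, rng):
    """Draw (X, Y, B) for one cell with n residents."""
    p10 = pi_x - p11               # disease X only
    p01 = pi_y - p11               # disease Y only
    p00 = 1.0 - pi_x - pi_y + p11  # neither disease
    if min(p00, p01, p10, p11) < 0:
        raise ValueError("p11 is incompatible with the marginals")
    draws = rng.choices(("11", "10", "01", "00"),
                        weights=(p11, p10, p01, p00), k=n)
    tally = Counter(draws)
    both = tally["11"]
    return both + tally["10"], both + tally["01"], both

def simulate_region(sizes, pi_x, pi_y, p11, seed=0):
    """Simulate every cell of the region; sizes maps cell id to N_a."""
    rng = random.Random(seed)
    return {a: simulate_cell(n, pi_x, pi_y, p11, rng) for a, n in sizes.items()}

# Hypothetical region of three cells, simulated under independence.
sizes = {"cell1": 500, "cell2": 1200, "cell3": 80}
pi_x, pi_y = 0.2, 0.3
sample = simulate_region(sizes, pi_x, pi_y, p11=pi_x * pi_y)
```

Rerunning `simulate_region` with a larger `p11` (for example, one obtained from a chosen ρ) and applying the ULS scan to each simulated data set gives the comparison against the independence case described above.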
Case Study I: Microbial Hotspots Cryptosporidium and Giardia are microscopic parasites that, if swallowed, cause diarrhea and stomach cramps in immunocompetent persons and severe illness in susceptible individuals. Cryptosporidium oocysts and Giardia cysts exist in surface waters and have been detected in drinking water. Cryptosporidium and Giardia have caused a number of waterborne disease outbreaks in the U.S.
A comparison of Cryptosporidium parvum oocysts (4-6 microns in length) and Giardia lamblia cysts (11-14 microns in length). Bar = 10 microns (Lindquist, 2005).
Case Study I: Microbial Hotspots The dataset we consider is the number of people diagnosed with Cryptosporidiosis and Giardiasis in the state of Ohio. Figures 1 and 2 show the top hotspots along with their likelihoods for Cryptosporidiosis and Giardiasis, respectively. Figure 1 shows the likelihood of Cryptosporidiosis in each county, where only the top two hotspots are statistically significant. Figure 2 shows the likelihood of Giardiasis in each county, where the top hotspot is not significant. Hence, there is no joint hotspot to consider, as the two diseases do not define hotspots with any cells in common.
Figure 1: Cryptosporidiosis hotspots and likelihoods in the State of Ohio, based on reported cases of Cryptosporidiosis by county. The top two hotspots are statistically significant.
Figure 2: Giardiasis hotspots and likelihoods in the State of Ohio, based on reported cases of Giardiasis by county of residence. The top hotspot is not statistically significant.
Mapping of Crime Hotspots Also called hot addresses (Eck and Weisburd, 1995; Sherman, Gartin and Buerger, 1989), hotspots are concentrations of individual events that suggest a series of related crimes (Eck, Chainey, Cameron, Leitner and Wilson, 2005). Similar to disease counts, crime rates are not uniformly distributed across the tessellation. Crime is usually more prevalent in some areas while largely absent in others. Allocation of resources is usually based on where the demand for law enforcement is highest.
Mapping of Crime Hotspots The Uniform Crime Reporting Program (ICPSR, 2004) provides data collected at the county level for all states and for several offenses, including murder, rape, robbery, aggravated assault, burglary, larceny and auto theft, among others. Robbery is defined as the taking of personal property in the possession or immediate presence of another by the use of violence or intimidation. Burglary is the act of breaking into a house at night to commit theft or another felony.
Figure 3: The top five hotspots of Burglary in counties of the state of Ohio are significant at level.
Figure 4: The top three hotspots of Robbery in counties of the state of Ohio are significant at level.
Figure 5: The top significant hotspots at level obtained by the intersection method for Burglary and Robbery in counties of the state of Ohio, 2002.
ULS Scan on Multivariate Data For multivariate data, ULS is run once per dimension of the data, with each run operating on a single dimension. The final clusters are the intersections of the clusters obtained by the individual runs. This example considers crime data for every US state. Every state has two observations: the count of robberies and the count of murders committed in that state. The aim is to cluster those states that have a high incidence of both robbery and murder. Map panels: Hotspot for Murder, Hotspot for Robbery, Intersection Hotspot
References
Eck, J. E. and Weisburd, D. (1995). Crime places in crime theory. In J. E. Eck and D. Weisburd (eds.), Crime Places, Vol. 4. Monsey, NY: Criminal Justice Press.
Eck, J. E., Chainey, S., Cameron, J. G., Leitner, M. and Wilson, R. E. (2005). Mapping Crime: Understanding Hotspots. National Institute of Justice.
ICPSR (2004). U.S. Department of Justice, Federal Bureau of Investigation. Uniform Crime Reporting Program Data: County-Level Detailed Arrest and Offense Data.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26.
Lindquist, H.D.A. (2005). Photo from US EPA microbiology Web page.
Patil, G. P. and Taillie, C. (2004). Upper level set statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11.
Patil, G. P., Modarres, R. and Patankar, P. (2005). The ULS software, version 1.0. Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Pennsylvania State University.
Sherman, L. W., Gartin, P. R. and Buerger, M. E. (1989). Hotspots of predatory crime: routine activities and criminology of place. Criminology, 27(1).