NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance.

Geoinformatic hotspot surveillance system

Geographic and Network Surveillance for Arbitrarily Shaped Hotspots

Overview: Geospatial Surveillance; Upper Level Set Scan Statistic System; Spatial-Temporal Surveillance; Typology of Space-Time Hotspots; Hotspot Prioritization; Ranking Without Having to Integrate Multiple Indicators.

Surveillance Geoinformatics for Hotspot Detection, Prioritization, Early Warning and Sustainable Management.

Upper Level Set Scan System. Definition: a hotspot is that portion of the study region with an elevated risk of an adverse outcome.

Federal Agency Partnerships: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS.

Features of the ULS scan statistic: it identifies arbitrarily shaped hotspots, is applicable to data on a network, provides confidence sets and hotspot ratings, is computationally efficient, and generalizes to a space-time scan.

Poset Prioritization System. Objective: prioritize or rank hotspots based on multiple indicator and stakeholder criteria, without having to integrate the indicators into an index, using Hasse diagrams and partially ordered sets. Example: prioritization of disease clusters with multiple indicators.

National Applications and Case Studies: Biosurveillance, Carbon Management, Coastal Management, Community Infrastructure, Crop Surveillance, Disaster Management, Disease Surveillance, Ecosystem Health, Environmental Justice, Sensor Networks, Robotic Networks, Environmental Management, Environmental Policy, Homeland Security, Invasive Species, Poverty Policy, Public Health, Public Health and Environment, Syndromic Surveillance, Social Networks, Stream Networks.

[Figure: Changing Connectivity of ULS as Level Drops]

G.P. Patil, R. Acharya, W.L. Myers, P. Patankar, Y. Cai, and S.L. Rathbun, The Penn State University, University Park, PA; R. Modarres, George Washington University, Washington, D.C.

Example: West Nile Virus. First isolated in 1937, this mosquito-borne disease is indigenous to North Africa, the Middle East and West Asia, and was later introduced into the United States.
[Figure panels: Disease Count Quintiles; Population Quintiles; Disease Rate Quintiles; Likelihood Quintiles]
[Figure: Comparison of ULS Scan with Circular Scan (panels: ULS Scan; Circular Scan)]
[Figure: Confidence Set for ULS Hotspot; Hotspot Membership Rating]

Example: Lyme Disease. Infections from the bacterium Borrelia burgdorferi, vectored by ticks of the genus Ixodes.
[Figure: Comparison of ULS Scan with Cylindrical Scan (panels: ULS Scan; Cylindrical Scan; Disease Rates by Year)]

Example: Human-environment indicator values for 16 European countries. Admissible linear extensions consist of rankings compatible with the rankings of all indicators, and there are a total of 3,764,448 of them. Treating each linear extension as a voter, the cumulative rank function (crf) is obtained from the frequencies at which each object receives each rank. The crf for Sweden exceeds that of all remaining countries, and the crf's of all countries dominate that of Ireland. The remaining countries cannot be uniquely ordered based on their crf's; Belgium, the Netherlands and the United Kingdom have identical crf's.
[Figure: Hasse Diagram]
The crf's also form a partially ordered set. There are only 182 admissible linear extensions for this poset, yielding a new cumulative rank function. One more iteration yields the rankings in the data table.

Center for Statistical Ecology and Environmental Statistics
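The linear-extension ranking in the European-countries example can be illustrated with the brute-force sketch below. It makes the following assumptions:

- Each object is described by a tuple of indicator values in which larger is better.
- One object dominates another when it is at least as large on every indicator.
- The function names are hypothetical.

Enumerating every permutation is only practical for a handful of objects. It would not scale to the 16-country example, which has 3,764,448 admissible extensions.

```python
from itertools import permutations

def dominates(a, b):
    """a dominates b: at least as large on every indicator and not identical."""
    return all(x >= y for x, y in zip(a, b)) and a != b

def admissible_linear_extensions(data):
    """All rankings (best first) compatible with the indicator-wise partial order."""
    names = list(data)
    pairs = [(p, q) for p in names for q in names if dominates(data[p], data[q])]
    extensions = []
    for order in permutations(names):
        position = {name: i for i, name in enumerate(order)}
        # admissible: every dominating object is ranked ahead of the object it dominates
        if all(position[p] < position[q] for p, q in pairs):
            extensions.append(order)
    return extensions

def cumulative_rank_function(data):
    """crf[name][r] = share of admissible rankings placing name at rank r+1 or better."""
    extensions = admissible_linear_extensions(data)
    crf = {}
    for name in data:
        # each linear extension acts as one voter assigning a rank to every object
        counts = [0] * len(data)
        for order in extensions:
            counts[order.index(name)] += 1
        running, cumulative = 0, []
        for c in counts:
            running += c
            cumulative.append(running / len(extensions))
        crf[name] = cumulative
    return extensions, crf
```

Consider four objects A = (3, 3), B = (2, 1), C = (1, 2) and D = (0, 0). A dominates all the others, and B and C each dominate D. Only two rankings are therefore admissible, and the incomparable objects B and C share the crf [0, 0.5, 1, 1].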

Demonstration Example: Data Set

Demonstration Example: Data
[Figure panels: Cases; Disease Rate; Population; Likelihood]

Demonstration Example: Upper Level Set Tree (Cells)

Demonstration Example: Upper Level Set Tree (Zones)
Zones shown in the tree: [3] [3,18] [3,18,0] [17] [17,16] [14] [17,16;14;15] [3,18,0,4,8,7,19,5;17,16,14,15;11] [8] [3,18,0,4] [3,18,0,4;8,7;19] [8,7]

ULS Scan: Disease Rate Hotspots Using ULS
[Figure: ULS Scan, Hotspot 1 (red) and Hotspot 2 (orange); table columns: Log Likelihood, p-value, Relative Risk]

Confidence Set for ULS Hotspot
[Figure: hotspot membership rating]

Bivariate Hotspot Detection
The circle-based SaTScan and data-driven ULS scan statistics are designed to identify hotspots based on the elevated responses of a single variable over the scan region. These techniques are appropriate for detecting univariate hotspots. What can be done when the data under consideration provide several correlated responses in each cell?

Bivariate Hotspot Detection
A simple and effective approach to multivariate hotspot detection applies the univariate ULS to each variable in the data set and identifies the univariate hotspots. Multivariate hotspots are those connected cells that appear in the intersection of the univariate hotspots of all variables. We will refer to this strategy as the intersection method.
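The intersection method can be sketched as follows. The sketch makes these assumptions:

- The univariate ULS hotspots have already been computed and are passed in as sets of cell labels.
- The tessellation's adjacency is a dictionary mapping each cell to its neighbours.
- The function names are hypothetical.

The common cells are split into connected zones with a breadth-first search.

```python
from collections import deque

def connected_zones(cells, adjacency):
    """Group a set of cells into connected zones of the adjacency graph (BFS)."""
    remaining = set(cells)
    zones = []
    while remaining:
        start = remaining.pop()
        zone, queue = {start}, deque([start])
        while queue:
            for nb in adjacency.get(queue.popleft(), ()):
                # only neighbours that are also common hotspot cells join the zone
                if nb in remaining:
                    remaining.remove(nb)
                    zone.add(nb)
                    queue.append(nb)
        zones.append(zone)
    return zones

def intersection_hotspots(univariate_hotspots, adjacency):
    """Cells flagged in every variable's hotspot, grouped into connected zones."""
    common = set.intersection(*(set(h) for h in univariate_hotspots))
    return connected_zones(common, adjacency)
```

Take a chain of cells 1-2-3-4-5, with an X hotspot {1, 2, 3, 5} and a Y hotspot {2, 3, 4, 5}. The common cells are {2, 3, 5}, which fall into two separate joint zones: {2, 3} and {5}.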

Use of Covariates
Another approach to multivariate hotspot detection uses explanatory variables (Patil and Taillie, 2004). The cell sizes (population, area, etc.) are proportional to the model expectations and provide a link between a response variable and the explanatory variables. When a functional relationship is identified, regression techniques often provide a basis for adjusting the rates. To obtain hotspots based on all variables, the univariate ULS scan statistic is applied to the response variable with the adjusted sizes.
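The slides do not specify the regression model, so the choice here is an assumption made for illustration only. The sketch fits a Poisson log-linear model with a single covariate and log(size) as an offset, log E[count_a] = log(size_a) + b0 + b1 * covariate_a, using Newton-Raphson. Each fitted expected count is then used as the adjusted size of its cell. The function names are hypothetical.

```python
import math

def fit_poisson_offset(counts, sizes, covariate, iters=50):
    """Newton-Raphson fit of log E[count_a] = log(size_a) + b0 + b1 * covariate_a."""
    b0, b1 = math.log(sum(counts) / sum(sizes)), 0.0  # start from the pooled rate
    for _ in range(iters):
        mu = [s * math.exp(b0 + b1 * x) for s, x in zip(sizes, covariate)]
        # score vector (g0, g1) and Fisher information [[h00, h01], [h01, h11]]
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * x for y, m, x in zip(counts, mu, covariate))
        h00 = sum(mu)
        h01 = sum(m * x for m, x in zip(mu, covariate))
        h11 = sum(m * x * x for m, x in zip(mu, covariate))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def adjusted_sizes(counts, sizes, covariate):
    """Fitted expected counts, used as covariate-adjusted sizes for the ULS scan."""
    b0, b1 = fit_poisson_offset(counts, sizes, covariate)
    return [s * math.exp(b0 + b1 * x) for s, x in zip(sizes, covariate)]
```

The model includes an intercept, so at the maximum-likelihood fit the adjusted sizes sum to the total observed count. The rescaling therefore redistributes expected cases across cells according to the covariate.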

Bivariate Data
For each cell a, observations are available in the form of quadruplets (X_a, Y_a, B_a, A_a), where X_a, Y_a and B_a are non-negative integers and A_a is a fixed and known constant. Suppose N_a = A_a people reside in cell a, and that each person has disease X with probability Π_x and disease Y with probability Π_y. The variable X_a counts the people in cell a who have disease X. Similarly, Y_a counts the people in cell a who have disease Y, and B_a counts those who have both diseases. An equivalent approach can be formulated when a count of disease-free individuals is available for every cell.

Table I: Bivariate Bernoulli distribution defined on cell a.

           Y=0       Y=1      Total
  X=0      P00       P01      1-Π_x
  X=1      P10       P11      Π_x
  Total    1-Π_y     Π_y      1

Bivariate Binomial Model
If (X_a, Y_a) has a bivariate binomial distribution with parameters (P11, P01, P10; N_a), then the correlation coefficient is

  ρ = (P11 - Π_x Π_y) / sqrt(Π_x(1 - Π_x) Π_y(1 - Π_y)).

One of the counts, say Y, may record the absence of a condition (disease) that can accompany X. In this case the two counts are negatively correlated. The joint hotspot analysis is then in fact a hot/cold spot analysis, because we look for low values of one variable and high values of the other.
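The sketch below makes the relationships in Table I and the correlation formula concrete. It completes the 2x2 table from the marginals Π_x, Π_y and the joint probability P11, then evaluates ρ. Function names are illustrative only.

```python
import math

def cell_probabilities(pi_x, pi_y, p11):
    """Complete the 2x2 table of Table I from the marginals and P11."""
    p10 = pi_x - p11               # X=1, Y=0
    p01 = pi_y - p11               # X=0, Y=1
    p00 = 1.0 - p11 - p10 - p01    # neither disease
    return {"P00": p00, "P01": p01, "P10": p10, "P11": p11}

def correlation(pi_x, pi_y, p11):
    """rho = (P11 - pi_x*pi_y) / sqrt(pi_x(1-pi_x) pi_y(1-pi_y))."""
    return (p11 - pi_x * pi_y) / math.sqrt(pi_x * (1 - pi_x) * pi_y * (1 - pi_y))
```

For example, with Π_x = 0.2 and Π_y = 0.1, setting P11 = 0.02 = Π_x Π_y gives ρ = 0, and P11 = 0.05 gives ρ = 0.03 / 0.12 = 0.25. The same ρ applies to the counts (X_a, Y_a). They are sums of N_a independent bivariate Bernoulli pairs, so their covariance and variances are all scaled by N_a.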

Joint Hotspot Analysis
In joint hotspot analysis, we look for zones with elevated responses relative to the rest of the region. Elevated responses are measured in terms of large values of the intensity function G_a = (G_{X_a}, G_{Y_a}), where G_{X_a} and G_{Y_a} are the X and Y rates in cell a. The null hypothesis H_0 of no joint hotspots states three things:
- Π_{X_a} = Π_x is the same for all cells a in R (no hotspots with respect to disease X).
- Π_{Y_a} = Π_y is the same for all cells a in R (no hotspots with respect to disease Y).
- P11 is specified.

Joint Hotspot Analysis
Specifying the marginals Π_x and Π_y does not completely specify the distribution under the null hypothesis of no joint hotspots. We also need to specify P11, i.e., the probability that an individual has both diseases. We will study H_0 under different values of P11. When P11 is specified a priori (for example, by specifying a correlation coefficient), the individual counts B_a are not needed, and only the pairs (X_a, Y_a) are used. We can assume that the variables are independent, so that P11 = Π_x Π_y, and study the hotspots obtained under independence. One can also set ρ, and hence P11, at a fixed high (or low) value. These values can then be used to study the sensitivity of the hotspots obtained and to compare them with the independence case.
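When ρ is fixed at a chosen value, P11 follows by inverting the correlation formula. Not every ρ is attainable for given marginals, because all four cell probabilities in Table I must be non-negative. This constraint bounds P11 to [max(0, Π_x + Π_y - 1), min(Π_x, Π_y)]. The function below is a hypothetical helper that applies this check.

```python
import math

def p11_from_correlation(pi_x, pi_y, rho):
    """P11 implied by a chosen correlation rho, given the marginals pi_x and pi_y."""
    p11 = pi_x * pi_y + rho * math.sqrt(pi_x * (1 - pi_x) * pi_y * (1 - pi_y))
    # Non-negative cell probabilities bound P11 (Frechet bounds for a 2x2 table).
    lower = max(0.0, pi_x + pi_y - 1.0)
    upper = min(pi_x, pi_y)
    if not lower - 1e-12 <= p11 <= upper + 1e-12:
        raise ValueError(f"rho={rho} is not attainable: P11 must lie in [{lower}, {upper}]")
    return p11
```

With Π_x = 0.2 and Π_y = 0.1, P11 can be at most 0.1. The largest attainable correlation is therefore (0.1 - 0.02) / 0.12 ≈ 0.67, and requesting ρ = 0.9 raises an error. A "fixed high value" of ρ must respect this ceiling.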

Exceedance
The rates define a piecewise constant surface over the tessellation. This surface is 3-dimensional for each rate and 4-dimensional when both rates are considered. The exceedance approach to defining the ULS generalizes to the multivariate setting. Define the multivariate level vector (g, g, ..., g), and say that cell a exceeds level g when G_a > (g, ..., g) componentwise. The multivariate ULS is then U_g = {a : G_a > (g, ..., g)}. Multivariate exceedance can similarly be defined in terms of the levels of other summaries of G_a, among them the norm sqrt(G_x^2 + G_y^2), the sum G_x + G_y, and max(G_x, G_y). Such a function is defined for all cells of R, i.e., over the vertices of the associated abstract graph. It takes a finite number of values (levels) over the tessellation, and each level g determines an upper level set.
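A minimal sketch of multivariate upper level sets follows. It relies on one observation: componentwise exceedance G_a > (g, ..., g) is the same as min(G_a) > g, so a scalar summary function covers both this case and the alternatives listed above. It assumes the following:

- Rates are stored as a dictionary from cell to a tuple (G_x, G_y).
- Adjacency is a dictionary of neighbour lists.
- The function names are hypothetical.

The sketch only enumerates the upper level sets and their connected zones, which are the candidate zones of the ULS tree. It does not compute likelihoods or p-values.

```python
from collections import deque

def connected_zones(cells, adjacency):
    """Group a set of cells into connected zones of the adjacency graph (BFS)."""
    remaining = set(cells)
    zones = []
    while remaining:
        start = remaining.pop()
        zone, queue = {start}, deque([start])
        while queue:
            for nb in adjacency.get(queue.popleft(), ()):
                if nb in remaining:
                    remaining.remove(nb)
                    zone.add(nb)
                    queue.append(nb)
        zones.append(zone)
    return zones

def multivariate_uls(rates, adjacency, summary=min):
    """Upper level sets of a scalar summary of the rate vectors, from the top level down.

    summary=min gives componentwise exceedance; sum, max, or a norm such as
    lambda g: (g[0] ** 2 + g[1] ** 2) ** 0.5 give the other variants.
    """
    levels = sorted({summary(g) for g in rates.values()}, reverse=True)
    result = []
    for level in levels:
        # cells at or above this observed level; this equals {a : f(G_a) > g}
        # for any g between this level and the next lower observed level
        cells = {a for a, g in rates.items() if summary(g) >= level}
        result.append((level, connected_zones(cells, adjacency)))
    return result
```

As the level drops, new cells join the upper level set. Separate zones can then merge, which is the changing connectivity that the ULS tree records.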

Sensitivity Analysis
How sensitive are the joint hotspots to the degree of association between X and Y? We do not expect to see common hotspots when X and Y are independent, whereas we expect to see many more common hotspots as the strength of association between the variables increases. In some cases, B_a (the number of individuals in cell a with both diseases) may not be available. Consider the bivariate binomial model and pairs of random observations (X_a, Y_a), where X and Y have marginal binomial distributions with a given degree of association.

Sensitivity Analysis
At each cell a in R, we simulate a bivariate binomial random vector with parameters Π_x, Π_y and P11, where Π_x and Π_y are estimated from the marginal distributions and P11 is specified. The resulting data set is used to obtain the new hotspots under correlation ρ. The generated sample will exhibit marginal hotspots similar to those obtained from the original data, while the joint hotspots will reflect the effect of the new degree of association. As a baseline, we assume that the variables are independent, so that P11 = Π_x Π_y (ρ = 0), and study the hotspots obtained under independence.
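One way to carry out this simulation is sketched below, under these assumptions:

- Π_x and Π_y are supplied per cell, for example the observed cell rates, so that the marginal hotspots are preserved.
- A single ρ is imposed everywhere.
- The function name is hypothetical.

Each cell's N_a individuals are split with a multinomial draw over the four cells of Table I (both diseases, X only, Y only, neither). X_a and Y_a are then recovered as the row and column totals.

```python
import numpy as np

def simulate_joint_counts(sizes, pi_x, pi_y, rho, seed=None):
    """Simulate (X_a, Y_a, B_a) in every cell from a bivariate binomial with correlation rho.

    sizes, pi_x, pi_y are per-cell sequences: N_a and the marginal probabilities
    estimated for cell a. rho is the common correlation to impose.
    """
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes, dtype=int)
    pi_x = np.asarray(pi_x, dtype=float)
    pi_y = np.asarray(pi_y, dtype=float)
    # P11 implied by rho: P11 = pi_x*pi_y + rho*sqrt(pi_x(1-pi_x) pi_y(1-pi_y))
    p11 = pi_x * pi_y + rho * np.sqrt(pi_x * (1 - pi_x) * pi_y * (1 - pi_y))
    lower = np.maximum(0.0, pi_x + pi_y - 1.0)
    upper = np.minimum(pi_x, pi_y)
    if np.any(p11 < lower - 1e-12) or np.any(p11 > upper + 1e-12):
        raise ValueError("rho is not attainable for the marginals of some cell")
    p11 = np.clip(p11, lower, upper)  # absorb floating-point round-off at the bounds
    x = np.empty(len(sizes), dtype=int)
    y = np.empty(len(sizes), dtype=int)
    b = np.empty(len(sizes), dtype=int)
    for a, n in enumerate(sizes):
        # multinomial over Table I: (both, X only, Y only, neither)
        probs = np.clip([p11[a], pi_x[a] - p11[a], pi_y[a] - p11[a],
                         1.0 - pi_x[a] - pi_y[a] + p11[a]], 0.0, None)
        n11, n10, n01, _ = rng.multinomial(n, probs)
        x[a], y[a], b[a] = n11 + n10, n11 + n01, n11
    return x, y, b
```

Rerunning the univariate ULS on the simulated X and Y for several values of ρ, and intersecting the results, gives the sensitivity of the joint hotspots to the assumed association. Setting ρ = 0 reproduces the independence baseline.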

Case Study I: Microbial Hotspots
Cryptosporidium and Giardia are microscopic parasites that, if swallowed, cause diarrhea and stomach cramps in immunocompetent persons and severe illness in susceptible individuals. Cryptosporidium oocysts and Giardia cysts occur in surface waters and have been detected in drinking water. Both parasites have caused a number of waterborne disease outbreaks in the U.S.

A comparison of Cryptosporidium parvum oocysts (4-6 microns in length) and Giardia lamblia cysts (11-14 microns in length). Bar = 10 microns (Lindquist, 2005).

Case Study I: Microbial Hotspots
The dataset we consider is the number of people diagnosed with Cryptosporidiosis and Giardiasis in the state of Ohio. Figures 1 and 2 show the top hotspots, along with their likelihoods, for Cryptosporidiosis and Giardiasis, respectively. Figure 1 shows the likelihood of Cryptosporidiosis in each county; only the top two hotspots are statistically significant. Figure 2 shows the likelihood of Giardiasis in each county; the top hotspot is not significant. Hence there is no joint hotspot to consider, as the two diseases do not define hotspots with any cells in common.

Figure 1: Cryptosporidiosis hotspots and likelihoods in the State of Ohio, based on reported cases of Cryptosporidiosis by county. The top two hotspots are statistically significant.

Figure 2: Giardiasis hotspots and likelihoods in the State of Ohio, based on reported cases of Giardiasis by county of residence. The top hotspot is not statistically significant.

Mapping of Crime Hotspots
Also called hot addresses (Eck and Weisburd, 1995; Sherman, Gartin and Buerger, 1989), hotspots are concentrations of individual events that suggest a series of related crimes (Eck, Chainey, Cameron, Leitner and Wilson, 2005). As with disease counts, crime rates are not uniformly distributed across the tessellation: crime is usually more prevalent in some areas and largely absent in others. Law enforcement resources are usually allocated where demand for them is highest.

Mapping of Crime Hotspots
The Uniform Crime Reporting Program (ICPSR, 2004) provides county-level data for all states and for several offenses, including murder, rape, robbery, aggravated assault, burglary, larceny and auto theft, among others. Robbery is defined as the taking of personal property in the possession or immediate presence of another by the use of violence or intimidation. Burglary is the act of breaking into a house at night to commit theft or another felony.

Figure 3: The top five hotspots of Burglary in counties of the state of Ohio are statistically significant.

Figure 4: The top three hotspots of Robbery in counties of the state of Ohio are statistically significant.

Figure 5: The top statistically significant hotspots for Burglary and Robbery obtained by the intersection method, counties of the state of Ohio, 2002.

ULS Scan on Multivariate Data
With multivariate data, ULS is run once for each dimension of the data, and each run operates on a single variable. The multivariate clusters are the intersections of the clusters obtained from the individual runs. This example considers crime data for every US state. Each state has two observations: the count of robberies and the count of murders committed in that state. The aim is to cluster the states that have a high incidence of both robbery and murder.
[Figure panels: Hotspot for Murder; Hotspot for Robbery; Intersection Hotspot]

References
Eck, J. E. and Weisburd, D. (1995). Crime places in crime theory. In J. E. Eck and D. Weisburd (eds.), Crime Places, Vol. 4. Monsey, NY: Criminal Justice Press.
Eck, J. E., Chainey, S., Cameron, J. G., Leitner, M. and Wilson, R. E. (2005). Mapping Crime: Understanding Hotspots. National Institute of Justice.
ICPSR (2004). U.S. Department of Justice, Federal Bureau of Investigation. Uniform Crime Reporting Program Data: County-Level Detailed Arrest and Offense Data.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26.
Lindquist, H. D. A. (2005). Photo from US EPA microbiology Web page.
Patil, G. P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11.
Patil, G. P., Modarres, R. and Patankar, P. (2005). The ULS Software, Version 1.0. Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Pennsylvania State University.
Sherman, L. W., Gartin, P. R. and Buerger, M. E. (1989). Hot spots of predatory crime: routine activities and the criminology of place. Criminology, 27(1).