1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010.

1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010

Ankara Turkey Workshop Digital Governance and Hotspot GeoInformatics G.P.Patil Penn State University Center for Statistical Ecology and Environmental Statistics April 30-May 1 2011

3 Federal Agency Partnership CDC DOD EPA NASA NIH NOAA USFS USGS Agency Databases Thematic Databases Other Databases Homeland Security Disaster Management Public Health Ecosystem Health Other Case Studies Statistical Processing: Hotspot Detection, Prioritization, etc. Data Sharing, Interoperable Middleware Standard or De Facto Data Model, Data Format, Data Access Arbitrary Data Model, Data Format, Data Access Application Specific De Facto Data/Information Standard Agency Databases Thematic Databases Other Databases Homeland Security Disaster Management Public Health Ecosystem Health Other Case Studies Statistical Processing: Hotspot Detection, Prioritization, etc. Data Sharing, Interoperable Middleware Standard or De Facto Data Model, Data Format, Data Access Arbitrary Data Model, Data Format, Data Access Application Specific De Facto Data/Information Standard SurvellanceGeoinformaticsof Hotspot Detection, Prioritization and Early Warning NSF Digital Government Project #0307010 PI: G. P. Patil gpp@stat.psu.edu Websites: http://www.stat.psu.edu/~gpp/ http://www.stat.psu.edu/hotspots/ http://www.stat.psu.edu/%7Egpp/DGOnlineNews2006.mht NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance. Cellular Surface National and International Applications Biosurveillance Carbon Management Coastal Management Community Infrastructure Crop Surveillance Disaster Management Disease Surveillance Ecosystem Health Environmental Justice Environmental Management Environmental Policy Homeland Security Invasive Species Poverty Policy Public Health Public Health and Environment Robotic Networks Sensor Networks Social Networks Syndromic Surveillance Tsunami Inundation Urban Crime Water Management

4 Hotspot Detection On Synoptic Regions and Networks

5 Examples of Hotspot Analysis Spatial Disease surveillance Biodiversity: species-rich and species-poor areas Geographical poverty analysis Network Water resource impairment at watershed scales Water distribution systems, subway systems, and road transport systems Social Networks/Terror Networks

6 Issues in Hotspot Analysis Estimation: Identification of areas having unusually high (or low) response Testing: Can the elevated response be attributed to chance variation (false positive) or is it statistically significant? Explanation: Assess explanatory factors that may account for the elevated response

7 The Spatial Scan Statistic Move a circular window across the map. Move a circular window across the map. Use a variable circle radius, from zero up Use a variable circle radius, from zero up to a maximum where 50 percent of the population is included.

8 A small sample of the circles used

11 For each circle: – Obtain actual and expected number of cases inside and outside the circle. – Calculate Likelihood Function Compare Circles: – Pick circle with maximum likelihood. This is the most likely cluster, i.e., the cluster least likely to have occurred by chance Inference: – Generate random replicas of the data set under the null- hypothesis of no clusters (Monte Carlo sampling) – Compare most likely clusters in real and random data sets (Likelihood ratio test)

12 Hospital Emergency Admissions in New York City Hospital emergency admissions data from a majority of New York City hospitals. Hospital emergency admissions data from a majority of New York City hospitals. At midnight, hospitals report last 24 hour of At midnight, hospitals report last 24 hour of data to New York City Department of Health A spatial scan statistic analysis is performed every morning A spatial scan statistic analysis is performed every morning If an alarm, a local investigation is conducted If an alarm, a local investigation is conducted

13 Detecting Emerging Clusters Instead of a circular window in two dimensions, we use a cylindrical window in three dimensions. Instead of a circular window in two dimensions, we use a cylindrical window in three dimensions. The base of the cylinder represents space, while the height represents time. The base of the cylinder represents space, while the height represents time. The cylinder is flexible in its circular base and starting date, but we only consider those cylinders that reach all the way to the end of the study period. Hence, we are only considering ‘alive’ clusters. The cylinder is flexible in its circular base and starting date, but we only consider those cylinders that reach all the way to the end of the study period. Hence, we are only considering ‘alive’ clusters. West Nile Virus, NY City, Success Story. West Nile Virus, NY City, Success Story.

14 Spatial Scan Statistic Limitations and Needs Circles capture only compactly shaped clusters Circles capture only compactly shaped clusters Want to identify clusters of arbitrary shape Want to identify clusters of arbitrary shape Circles handle only synoptic (tessellated ) data Circles handle only synoptic (tessellated ) data Want to handle data on a network Want to handle data on a network Circles provide point estimate of hotspot Circles provide point estimate of hotspot Want to assess estimation uncertainty (hotspot confidence set) Want to assess estimation uncertainty (hotspot confidence set)

15 Crime Rate Hotspots

16 Circular spatial scan statistic zonation Cholera outbreak along a river flood plain Small circles miss much of the outbreak Large circles include many unwanted cells

17 Poor Hotspot Delineation by Circular Zones Hotspot Circular zone approximations Circular zones may represent single hotspot as multiple hotspots

18 Attractive Features Identifies arbitrarily shaped clusters Identifies arbitrarily shaped clusters Data-adaptive zonation of candidate hotspots Data-adaptive zonation of candidate hotspots Applicable to data on a network Applicable to data on a network Provides both a point estimate as well as a confidence set for the hotspot Provides both a point estimate as well as a confidence set for the hotspot Uses hotspot-membership rating to map hotspot boundary uncertainty Uses hotspot-membership rating to map hotspot boundary uncertainty Computationally efficient Computationally efficient Applicable to both discrete and continuous syndromic responses Applicable to both discrete and continuous syndromic responses Identifies arbitrarily shaped clusters in the spatial-temporal domain Identifies arbitrarily shaped clusters in the spatial-temporal domain Provides a typology of space-time hotspots with discriminatory surveillance potential Provides a typology of space-time hotspots with discriminatory surveillance potential Hotspot Detection Innovation Upper Level Set Scan Statistic

19 Candidate Zones for Hotspots Goal: Identify geographic zone(s) in which a response is significantly elevated relative to the rest of a region Goal: Identify geographic zone(s) in which a response is significantly elevated relative to the rest of a region A list of candidate zones Z is specified a priori A list of candidate zones Z is specified a priori –This list becomes part of the parameter space and the zone must be estimated from within this list –Each candidate zone should generally be spatially connected, e.g., a union of contiguous spatial units or cells –Longer lists of candidate zones are usually preferable –Expanding circles or ellipses about specified centers are a common method of generating the list

20 ULS Candidate Zones Question: Are there data-driven (rather than a priori) ways of selecting the list of candidate zones? Question: Are there data-driven (rather than a priori) ways of selecting the list of candidate zones? Motivation for the question: A human being can look at a map and quickly determine a reasonable set of candidate zones and eliminate many other zones as obviously uninteresting. Can the computer do the same thing? Motivation for the question: A human being can look at a map and quickly determine a reasonable set of candidate zones and eliminate many other zones as obviously uninteresting. Can the computer do the same thing? A data-driven proposal: Candidate zones are the connected A data-driven proposal: Candidate zones are the connected components of the upper level sets of the response surface. The candidate zones have a tree structure (echelon tree is a subtree), which may assist in automated detection of multiple, but geographically separate, elevated zones. Null distribution: If the list is data-driven (i.e., random), its variability must be accounted for in the null distribution. A new list must be developed for each simulated data set. Null distribution: If the list is data-driven (i.e., random), its variability must be accounted for in the null distribution. A new list must be developed for each simulated data set.

21 Data-adaptive approach to reduced parameter space  0 Data-adaptive approach to reduced parameter space  0 Zones in  0 are connected components of upper level sets of the empirical intensity function G a = Y a / A a Zones in  0 are connected components of upper level sets of the empirical intensity function G a = Y a / A a Upper level set (ULS) at level g consists of all cells a where G a  g Upper level set (ULS) at level g consists of all cells a where G a  g Upper level sets may be disconnected. Connected components are Upper level sets may be disconnected. Connected components are the candidate zones in  0 These connected components form a rooted tree under set inclusion. These connected components form a rooted tree under set inclusion. –Root node = entire region R –Leaf nodes = local maxima of empirical intensity surface –Junction nodes occur when connectivity of ULS changes with falling intensity level ULS Scan Statistic

22 Upper Level Set (ULS) of Intensity Surface Hotspot zones at level g (Connected Components of upper level set)

23 Changing Connectivity of ULS as Level Drops g

24 ULS Connectivity Tree Schematic intensity “surface” N.B. Intensity surface is cellular (piece-wise constant), with only finitely many levels A, B, C are junction nodes where multiple zones coalesce into a single zone A B C

25 A hotspot confidence set with two connected components is shown on the ULS tree. The connected components correspond to different hotspot loci while the nodes within a connected component correspond to different delineations of that hotspot – all at the appropriate confidence level. Confidence Region on ULS Tree

26 Determine a confidence set for the hotspot Determine a confidence set for the hotspot Each member of the confidence set is a zone which is a statistically plausible delineation of the hotspot at specified confidence Each member of the confidence set is a zone which is a statistically plausible delineation of the hotspot at specified confidence Confidence set lets us rate individual cells a for hotspot membership Confidence set lets us rate individual cells a for hotspot membership Rating for cell a is percentage of zones in confidence set that contain a. (More generally, use weighted proportion.) Rating for cell a is percentage of zones in confidence set that contain a. (More generally, use weighted proportion.) Map of cell ratings: Map of cell ratings: –Inner envelope = cells with 100% rating –Outer envelope = cells with positive rating Hotspot Delineation and Hotspot Rating

27 Tessellated study area Want zones to be connected Maximize L(Z) for Z in  Candidate Zones Z Allowable zoneNot a zone

28 Tessellated study area Want zones to be connected Maximize L(Z) for Z in  Candidate Zones Z Allowable zoneNot a zone

29 Space-Time Detection and Early Warning ULS Scan Statistic The traditional space-time scan statistic employs cylinders as the candidate zones in the reduced parameter space. In many instances, the cylindrical shape may be a poor approximation to actual space-time hotspots, whereas the ULS approach is able to adapt its shape to the actual hotspot. Since the ULS tree is derived from the adjacency matrix, the same software will work once the notion of adjacency has been specified for space- time cells.

30 Hotspot Detection for Continuous Responses Human Health Context:  Blood pressure levels for spatial variation in hypertension  Cancer survival (censoring issues) Environmental Context:  Landscape metrics such as forest cover, fragmentation, etc.  Pollutant loadings  Animal abundance

31 Hotspot Model for Continuous Responses Simplest distributional model: Additivity with respect to the index parameter k suggests that we model k as proportional to size: Scale parameter  takes one value inside Z and another outside Z Other distribution models (e.g., lognormal) are possible but are computationally more complex and applicable to only a single spatial scale

32 Mapping Priority Hotspots of Vegetative Disturbance for Carbon Budgets

33 Early Detection of Biological Invasions

34 Scalable Wireless Geo-Telemetry with Miniature Smart Sensors Geo-telemetry enabled sensor nodes deployed by a UAV into a wireless ad hoc mesh network: Transmitting data and coordinates to TASS and GIS support systems

35 Target Tracking in Distributed Sensor Networks

36 Wireless Sensor Networks for Habitat Monitoring

37 Space-Time Poverty Hotspot Typology Federal Anti-Poverty Programs have had little success in eradicating pockets of persistent poverty Federal Anti-Poverty Programs have had little success in eradicating pockets of persistent poverty Can spatial-temporal patterns of poverty hotspots provide clues to the causes of poverty and lead to improved location-specific anti- poverty policy ? Can spatial-temporal patterns of poverty hotspots provide clues to the causes of poverty and lead to improved location-specific anti- poverty policy ?

38 Hotspot Persistence Hotspot Persistence Space (census tract) 1970 1980 1990 2000 Time (census year) Persistent Hotspot Long Duration Space (census tract) 1970 1980 1990 2000 Time (census year) One-shot Hotspot Short Duration Persistence is a property of space-time hotspots Persistence can be assessed by the projection of the space-time hotspot onto the time axis

39 Application of the Scan Statistic to Wetland and Stream Monitoring: Testing the Method in the Upper Juniata Watershed, Pennsylvania, USA

40 Potential advantages of critical areas as an indicator of stress Delineation of contributing areas is labor intensive Delineation of contributing areas is labor intensive Can critical areas give us a first cut as to wetlands in poor condition? Can critical areas give us a first cut as to wetlands in poor condition? Can they serve as an early warning? Can they serve as an early warning?

42 Restoration Targeting with SaTScan Can effectively target areas for restoration activities or best management practices Can effectively target areas for restoration activities or best management practices Can be scaled so that critical area matches management area Can be scaled so that critical area matches management area

43 Network Analysis of Biological Integrity in Freshwater Streams

44 Network-Based Surveillance Subway system surveillance Subway system surveillance Drinking water distribution system surveillance Drinking water distribution system surveillance Stream and river system surveillance Stream and river system surveillance Postal System Surveillance Postal System Surveillance Road transport surveillance Road transport surveillance Syndromic Surveillance Syndromic Surveillance

52 Bivariate Hotspot Detection The circle-based SaTScan and data- driven ULS scan statistic are designed to identify hotspots based on the elevated responses of one variable over the scan region. The circle-based SaTScan and data- driven ULS scan statistic are designed to identify hotspots based on the elevated responses of one variable over the scan region. These techniques are appropriate for detecting univariate hotspots. What can be done when the data under consideration provides many correlated responses in each cell? These techniques are appropriate for detecting univariate hotspots. What can be done when the data under consideration provides many correlated responses in each cell?

53 Bivariate Hotspot Detection A simple and effective approach to multivariate hotspot detection applies the univariate ULS to each variable in the data set and identifies the univariate hotspots. A simple and effective approach to multivariate hotspot detection applies the univariate ULS to each variable in the data set and identifies the univariate hotspots. Multivariate hotspots are those connected cells that appear in the intersection of the univariate hotspots of all variables. Multivariate hotspots are those connected cells that appear in the intersection of the univariate hotspots of all variables. We will refer to this strategy as the intersection method. We will refer to this strategy as the intersection method.

54 Use of Covariates Another approach to multivariate hotspot detection calls for the use of explanatory variables, Patil and Taillie (2004) Another approach to multivariate hotspot detection calls for the use of explanatory variables, Patil and Taillie (2004) The size (population, area, etc) are proportional to model expectations and provide a link between a response variable and other explanatory variables. The size (population, area, etc) are proportional to model expectations and provide a link between a response variable and other explanatory variables. Regression techniques often provide a basis for adjusting the rates when a functional relationship is identified. Regression techniques often provide a basis for adjusting the rates when a functional relationship is identified. To obtain hotspots based on all variables, the univariate ULS scan statistic is applied to the response variable and the adjusted sizes. To obtain hotspots based on all variables, the univariate ULS scan statistic is applied to the response variable and the adjusted sizes.

55 Poset Prioritization for Multicriteria Ranking and Prioritization

56 We also present a prioritization innovation. It lies in the ability for prioritization and ranking of hotspots based on multiple indicator and stakeholder criteria without having to integrate indicators into an index, using Hasse diagrams and partial order sets. This leads us to early warning systems, and also to the selection of investigational areas. Prioritization Innovation Partial Order Set Ranking

58 First stage screening First stage screening –Significant clusters by SaTScan and/or upper level sets upper level sets Second stage screening Second stage screening –Multicriteria noteworthy clusters by partially ordered sets and Hass diagrams Final stage screening Final stage screening –Follow up clusters for etiology, intervention based on multiple criteria using Hass diagrams based on multiple criteria using Hass diagrams Multiple Criteria Analysis, Multiple Indicators and Choices, Health Statistics, Disease Etiology, Health Policy, Resource Allocation

59 Hotspot Prioritization and Poset Ranking Multiple hotspots with intensities significantly elevated relative to the rest of the region Multiple hotspots with intensities significantly elevated relative to the rest of the region Ranking based on likelihood values, and additional attributes: raw intensity values, socio-economic and demographic factors, feasibility scores, excess cases, seasonal residence, atypical demographics, etc. Ranking based on likelihood values, and additional attributes: raw intensity values, socio-economic and demographic factors, feasibility scores, excess cases, seasonal residence, atypical demographics, etc. Multiple attributes, multiple indicators Multiple attributes, multiple indicators Ranking without having to integrate the multiple indicators into a composite index Ranking without having to integrate the multiple indicators into a composite index

60 Regions of comparability and incomparability for the inherent importance ordering of hotspots. Hotspots form a scatterplot in indicator space and each hotspot partitions indicator space into four quadrants

61 HUMAN ENVIRONMENT INTERFACE LAND, AIR, WATER INDICATORS RANK COUNTRY LANDAIRWATER 1 Sweden 2 Finland 3 Norway 5 Iceland 13 Austria 22 Switzerland 39 Spain 45 France 47 Germany 51 Portugal 52 Italy 59 Greece 61 Belgium 64 Netherlands 77 Denmark 78 United Kingdom 81 Ireland 69.0176.4627.381.7940.5730.1732.6328.3432.5634.6223.3521.5921.8419.439.8312.649.2535.2419.0563.9880.2529.8528.107.746.502.1014.296.893.200.001.075.041.131.99100981001001001001001001008210098100100100100100 for land - % of undomesticated land, i.e., total land area-domesticated (permanent crops and pastures, built up areas, roads, etc.) for air - % of renewable energy resources, i.e., hydro, solar, wind, geothermal for water - % of population with access to safe drinking water

62 Hasse Diagram (all countries)

63 Hasse Diagram (Western Europe)

64 Figure 5. Hasse diagrams for four different posets. Poset D has a disconnected Hasse diagram with two connected components {a, c, e} and {b, d}.

65 Figure 17. Hasse diagram of Poset B (left) and a decision tree enumerating all possible linear extensions of the poset (right). Every downward path through the decision tree determines a linear extension. Dashed links in the decision tree are not implied by the partial order and are called jumps. If one tried to trace the linear extension in the original Hasse diagram, a “jump” would be required at each dashed link. Note that there is a pure-jump linear extension (path a, b, c, d, e, f) in which every link is a jump.

66 Figure 18. Histograms of the rank-frequency distributions for Poset B.

67 Cumulative Rank Frequency Operator – 5 An Example of the Procedure In the example from the preceding slide, there are a total of 16 linear extensions, giving the following cumulative frequency table. Rank Element123456 a91416161616 b71215161616 c0410161616 d026121616 e00141016 f0000616 Each entry gives the number of linear extensions in which the element (row label) receives a rank equal to or better that the column heading

68 Cumulative Rank Frequency Operator – 6 An Example of the Procedure 16 The curves are stacked one above the other and the result is a linear ordering of the elements: a > b > c > d > e > f

69 Cumulative Rank Frequency Operator – 7 An example where F must be iterated Original Poset (Hasse Diagram) a f eb c g d h a f e b ad c h g a f e b ad c h g F F 2

70 Cumulative Rank Frequency Operator – 8 An example where F results in ties Original Poset (Hasse Diagram) a cb d a b, c (tied) d F Ties reflect symmetries among incomparable elements in the original Hasse diagram Elements that are comparable in the original Hasse diagram will not become tied after applying F operator

71 Poset Ranking Representability Existence of weighting scheme and the correspondingly weighted index inducing poset ranking Existence of weighting scheme and the correspondingly weighted index inducing poset ranking w.x = | w | | x | cos( w, x ) w.x = | w | | x | cos( w, x ) = | w | x projection of x on w. = | w | x projection of x on w. w.d = 0, w.d > 0, w.d 0, w.d < 0 where d = x 1 – x 2 Representability on convex set S of w Representability on convex set S of w

73 This report is very disappointing. What kind of software are you using?

74 Stone-Age Space-Age Syndrome Stone-age dataSpace-age data Stone-age analysis Space-age analysis

1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010.

Similar presentations

Presentation on theme: "1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010.

Similar presentations

Presentation on theme: "1 Forum for Interdisciplinary Mathematics Patna, India G. P. Patil December 2010."— Presentation transcript:

Similar presentations

About project

Feedback