Mining Statistically Significant Co-location and Segregation Patterns.

Outline
–Motivation
–Key concepts
–Problem definition
–Related work
–Challenges
–Contributions
–Validation

Motivation
Finding co-located events provides useful evidence for decision making and scientific research:
–Ecology
–Biology
–Epidemiology
–…
Co-location patterns can also arise by chance and need attention. Possible causes:
–Presence of spatial autocorrelation
–Abundance of feature instances
–…

Key Concept (1)

Key Concept (2)
Null hypothesis
–A hypothesis that one tries to reject based on the observations in the dataset.
Alternative hypothesis
–The complement of the null hypothesis, which is accepted when the null hypothesis is rejected.

Key Concept (2)
Null hypothesis
–For a co-location pattern C: a participation index at least as high as the observed one can be obtained under a random feature distribution (with spatial autocorrelation taken into account).
–For a segregation pattern C: a participation index at least as low as the observed one can be obtained under a random feature distribution.
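
The participation index used in these hypotheses is commonly defined as the minimum, over the features of a pattern, of the fraction of each feature's instances that take part in the pattern. A minimal Python sketch for a two-feature pattern follows. It assumes a Euclidean distance threshold (`radius`) as the neighborhood relation; the function name and the sample coordinates are illustrative and do not come from the presentation.

```python
from math import dist


def participation_index(points_a, points_b, radius):
    """Participation index of the two-feature pattern {A, B}.

    An instance participates in the pattern if at least one instance
    of the other feature lies within `radius` of it (assumed
    neighborhood relation).
    """
    def participation_ratio(src, other):
        if not src:
            return 0.0
        hits = sum(1 for p in src if any(dist(p, q) <= radius for q in other))
        return hits / len(src)

    # The index is the smaller of the two participation ratios.
    return min(participation_ratio(points_a, points_b),
               participation_ratio(points_b, points_a))


# Illustrative coordinates only.
a = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
b = [(0.5, 0.0), (5.0, 5.5)]
pi_ab = participation_index(a, b, radius=1.0)
print(pi_ab)  # 0.75: three of four A instances and both B instances participate
```

A pattern with more than two features would require enumerating clique instances, which this sketch leaves out.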

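Key Concept (3)
Statistical significance
–Significance is judged against a significance level α (the Type I error rate), which is the probability of rejecting the null hypothesis given that it is true.
–For each observed pattern, the p-value is the probability, under the null hypothesis, of obtaining a participation index at least as extreme as the observed one. The pattern is significant when its p-value does not exceed α.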
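
Written as formulas, the two p-values can be expressed as below. The symbols are assumptions introduced here for illustration: PI_obs(C) is the participation index of C observed in the data, and PI_0(C) is the participation index of C in a dataset generated under the null hypothesis.

```latex
p_{\text{co-location}}(C) = \Pr\bigl(PI_0(C) \ge PI_{\mathrm{obs}}(C)\bigr),
\qquad
p_{\text{segregation}}(C) = \Pr\bigl(PI_0(C) \le PI_{\mathrm{obs}}(C)\bigr)
```

With this notation, C is reported as significant when the relevant p-value is at most α.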

Key Concept (4)

Problem Definition

Related work
Approach                                                               Co-location Patterns   Segregation Patterns   Significance Test
Spatial Co-location Patterns Detection                                 √
Spatial Segregation Patterns Detection                                                        √
Mining Statistically Significant Co-location and Segregation Patterns  √                      √                      √

Challenge
–Co-location/segregation patterns determined by a manually set threshold produce false positives and are sensitive to the dataset.
–No probability model is available to compute the p-value in closed form.
–Testing significance through Monte Carlo simulation is computationally expensive.
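
To illustrate why the simulation is costly, here is a minimal Python sketch of a Monte Carlo p-value estimate. It is a generic estimator, not necessarily the procedure used in the presentation. The function names, the `simulate` callback, and the add-one correction are assumptions; in a real test `simulate` would generate one full dataset under the null model and return its participation index, so every replicate costs one dataset generation plus one index computation.

```python
import random


def monte_carlo_p_value(observed, simulate, n_sims=999, tail="upper", seed=0):
    """Estimate a p-value from `n_sims` replicates of the null model.

    `simulate(rng)` must return the test statistic of one dataset
    generated under the null hypothesis (hypothetical callback).
    tail="upper" counts replicates >= observed (co-location);
    tail="lower" counts replicates <= observed (segregation).
    """
    rng = random.Random(seed)
    if tail == "upper":
        extreme = sum(simulate(rng) >= observed for _ in range(n_sims))
    elif tail == "lower":
        extreme = sum(simulate(rng) <= observed for _ in range(n_sims))
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    # Add-one correction: the observed dataset counts as one replicate,
    # so the estimate is never exactly zero.
    return (extreme + 1) / (n_sims + 1)


def placeholder_null_statistic(rng):
    """Stand-in for 'simulate a dataset and compute its index'.

    Returns a uniform value in [0, 1); purely illustrative.
    """
    return rng.random()


p = monte_carlo_p_value(0.95, placeholder_null_statistic, n_sims=999)
print(p <= 0.05)  # compare the estimate with a chosen significance level
```

The smallest p-value this estimator can return is 1 / (n_sims + 1), so the number of replicates bounds the significance levels that can be tested.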

Contributions
–Incorporates a statistical significance test into co-location and segregation pattern detection, which reduces spurious patterns caused by randomness.
–Proposes three approaches for algorithm acceleration:
  –a subset-based filter
  –a grid-based sampling framework
  –a spatial-join based pruning technique

Subset-based Filter

Grid-based Sampling

Spatial-join Based Pruning

Quality of Approximation – Grid-based Participation Index

Inhibition (synthetic data set)

Auto-correlation (synthetic data set)

Mixed Spatial Interactions (synthetic data set)

Runtime Comparison (1) Fixed total cluster number of each auto-correlated feature

Runtime Comparison (2) Varying total cluster number of each auto-correlated feature

Experiments (real data set)
–Ants
–Bramble Canes
–Lansing Woods
–Toronto address repository

Ants Data
