Predicting Locations Using Map Similarity (PLUMS): A Framework for Spatial Data Mining
Sanjay Chawla (Vignette Corporation)
Shashi Shekhar, Weili Wu (CS, Univ. of Minnesota)
Uygar Ozesmi (Erciyes University, Turkey)
Outline
– Motivation
– Application Domain
– Distinguishing characteristics of spatial data mining
– Problem Definition
– Spatial Statistics Approach
– Our approach: PLUMS
– Experiments, Results, Conclusion and Future Work
Motivation
Historical Examples of Spatial Data Exploration
– Asiatic Cholera, 1855
– Theory of Gondwanaland
– Effect of fluoride on Dental Hygiene
A potential application in news
– Tracking the West Nile Virus
Application Domain
Wetland Management: Predicting locations of bird (red-winged blackbird) nests in wetlands
Why did we choose this application?
– Strong spatial component
– Domain expertise
– Classical data mining techniques (logistic regression, neural nets) had already been applied
Application Domain (continued)
[Figure panels: Nest Locations; Distance to Open Water; Vegetation Durability; Water Depth]
Unique Characteristics of Spatial Data Mining
Spatial Autocorrelation Property
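The spatial autocorrelation property says that values at nearby sites tend to be related rather than independent. The slide does not name a specific statistic, so the sketch below uses Moran's I, a standard measure swapped in here purely for illustration. The helper names `morans_i` and `rook_weights`, the 4 x 4 grid, and the two toy maps are all assumptions, not part of the original talk.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for a 1-D array of site values and an n x n spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

def rook_weights(rows, cols):
    """Binary contiguity matrix for a grid: cells sharing an edge are neighbours."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:              # neighbour to the right
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < rows:              # neighbour below
                w[i, i + cols] = w[i + cols, i] = 1
    return w

w = rook_weights(4, 4)
clustered = np.array([[1, 1, 0, 0]] * 4).ravel()           # left half 1, right half 0
checker = np.indices((4, 4)).sum(axis=0).ravel() % 2        # alternating 0/1 cells

print(morans_i(clustered, w))   # positive: similar values sit next to each other
print(morans_i(checker, w))     # negative: every neighbour has the opposite value
```

A positive value indicates that similar values cluster in space, which is the situation the deck treats as characteristic of spatial data.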
Unique Characteristics of Spatial Data Mining (continued)
Average Distance to Nearest Prediction (ADNP)
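The formula for ADNP did not survive in the slide text. Going only by the name, a minimal sketch is the mean, over actual locations, of the Euclidean distance to the nearest predicted location. That reading, the function name `adnp`, and the toy coordinates are assumptions made for illustration.

```python
import numpy as np

def adnp(actual, predicted):
    """Mean Euclidean distance from each actual site to its nearest predicted site."""
    a = np.asarray(actual, dtype=float)       # shape (K, 2)
    p = np.asarray(predicted, dtype=float)    # shape (M, 2)
    # Pairwise distances between every actual and every predicted site, shape (K, M)
    d = np.linalg.norm(a[:, None, :] - p[None, :, :], axis=2)
    return d.min(axis=1).mean()

actual = [(0, 0), (3, 3)]
near_miss = [(0, 1), (3, 4)]    # each prediction is one cell away from a nest
far_miss = [(9, 9), (9, 8)]     # predictions are far from every nest

print(adnp(actual, near_miss))
print(adnp(actual, far_miss))
```

Neither prediction map hits an actual cell exactly, so a cell-by-cell classification accuracy would score them the same; a distance-based measure separates the near miss from the far miss.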
Location Prediction: Problem Formulation
Given:
– A spatial framework S
– Explanatory functions
– Dependent function F
– A family F of learning model function mappings
Find: an element of the family F
Objective: maximize (map_similarity = classification_accuracy + spatial_accuracy)
Constraints: spatial autocorrelation exists
Spatial Statistics Approach
Logistic Regression
Spatial Statistics: Solution Techniques
Least Squares Estimation: biased and inconsistent
Maximum Likelihood: involves computation of a large determinant (from W)
Bayesian: Markov Chain Monte Carlo (e.g. Gibbs sampling)
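To make the determinant cost concrete: the slide text does not give the model, so assume the standard spatial autoregressive form y = ρWy + Xβ + ε, whose log-likelihood contains the term ln|I − ρW| for an n x n neighbourhood matrix W. The sketch below computes that term two ways. The ring-shaped W, the value of `rho`, and the function names are illustrative assumptions.

```python
import numpy as np

def log_det_direct(rho, w):
    """ln|I - rho*W| via a dense factorisation, repeated for every candidate rho."""
    n = w.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) - rho * w)
    return logdet

def log_det_from_eigs(rho, eigs):
    """Same quantity from precomputed eigenvalues of W: sum of ln(1 - rho*lambda_i)."""
    return np.sum(np.log(1.0 - rho * eigs))

# Toy W: n sites on a ring, each site's two neighbours weighted 0.5 (rows sum to 1).
n = 200
idx = np.arange(n)
w = np.zeros((n, n))
w[idx, (idx + 1) % n] = 0.5
w[idx, (idx - 1) % n] = 0.5

eigs = np.linalg.eigvalsh(w)    # valid here because this W is symmetric
rho = 0.5
print(log_det_direct(rho, w), log_det_from_eigs(rho, eigs))
```

The dense route costs on the order of n³ operations each time ρ changes during the likelihood search, which is why the determinant dominates for large maps. Precomputing eigenvalues once is a common workaround, though it is itself an O(n³) step.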
Our Approach
Experiment Setup
Result(1)
Result(2)
Conclusion and Future Work
PLUMS >> classical data mining techniques
PLUMS ≈ state-of-the-art spatial statistics approaches
Better performance (two orders of magnitude)
Try other configurations of the PLUMS framework and formalize!