C.T. Lu, Spatial Data Mining
Spatial Data Mining: Three Case Studies
Presented by: Chang-Tien Lu
Spatial Database Lab, Department of Computer Science, University of Minnesota
ctlu@cs.umn.edu
http://www.cs.umn.edu/research/shashi-group
Group Members: Shashi Shekhar, Weili Wu, Yan Huang, C.T. Lu

Outline
- Introduction
- Case 1: Location Prediction
- Case 2: Spatial Association: Co-location
- Case 3: Spatial Outlier Detection
- Conclusion and Future Directions

Introduction: Spatial Data Mining
- Spatial databases are too large to analyze manually, e.g.:
  - NASA Earth Observation System (EOS)
  - National Institute of Justice: crime mapping
  - Census Bureau, Dept. of Commerce: census data
- Spatial data mining: discover frequent and interesting spatial patterns for post-processing (knowledge discovery)
- Pattern examples: spatial outliers, location prediction, clustering, spatial association, trends, ...
- Historical example: London, 1854, cholera deaths mapped around a contaminated water pump (John Snow)

Framework
- Problem statement: capture special needs
- Data exploration: maps
- Try reusing classical methods: data mining, spatial statistics
- Invent new methods if reuse is not applicable
- Develop efficient algorithms
- Validation, performance tuning

Case 1: Location Prediction
- Problem: predict nesting sites in marshes, given vegetation, water depth, distance to edge, etc.
- Data: maps of nests and attributes; nests are spatially clustered and attributes are spatially smooth
- Classical methods: logistic regression, decision trees, Bayesian classifiers
  - but the independence assumption is violated: these methods miss spatial auto-correlation!
- Spatial auto-regression (SAR)
- Open issue: spatial accuracy vs. classification accuracy
- Open issue: performance, since SAR learning is slow!

Location Prediction
Given:
1. A spatial framework
2. Explanatory functions (e.g., the maps of distance to open water, vegetation durability and water depth)
3. A dependent class (nest locations)
4. A family of function mappings
Find: a classification model
Objective: maximize classification_accuracy
Constraints: spatial autocorrelation exists
[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]

Evaluation: Change Model
- Linear regression: $y = X\beta + \epsilon$
- Spatial Autoregression Model (SAR): $y = \rho W y + X\beta + \epsilon$
  - $W$ models neighborhood relationships
  - $\rho$ models the strength of spatial dependencies
  - $\epsilon$ is the error vector
- Mixed Spatial Autoregression Model (MSAR): $y = \rho W y + X\beta + WX\gamma + \epsilon$
  - considers the impact of the explanatory variables from neighboring observations
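
To make the SAR and MSAR equations concrete, the sketch below builds a neighborhood matrix $W$ and generates responses by solving $(I - \rho W)y = X\beta + \epsilon$ (plus $WX\gamma$ for MSAR, where $\gamma$ is our name for the coefficient of $WX$). It assumes a row-normalized rook-contiguity $W$ on a regular grid; the grid, the helper names and any parameter values are hypothetical illustrations, not taken from the case study. With $\rho = 0$ and $\gamma = 0$ both forms reduce to linear regression.

```python
import numpy as np

def grid_contiguity_W(rows, cols):
    """Row-normalized rook-contiguity matrix for a rows x cols grid (hypothetical W).

    Cell (r, c) has index r * cols + c; its neighbors are the cells directly
    above, below, left and right of it. Needs at least two cells.
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # every row sums to 1

def simulate_sar(W, X, beta, rho, eps):
    """SAR: y = rho*W*y + X*beta + eps, i.e. y = (I - rho*W)^-1 (X*beta + eps)."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

def simulate_msar(W, X, beta, gamma, rho, eps):
    """MSAR: y = rho*W*y + X*beta + W*X*gamma + eps (neighbors' X also enters)."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + W @ X @ gamma + eps)
```

Solving the linear system rather than forming $(I - \rho W)^{-1}$ explicitly is the usual numerically safer choice; both are cubic in the number of sites for a dense $W$.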

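Measure: ROC Curve
- Receiver Operating Characteristic (ROC) curve: the locus of the pair (TPR, FPR) for each cut-off probability
- Classification accuracy is summarized by a confusion matrix, where $A_nP_n$ counts sites that are actual nests and predicted nests, $A_nP_{nn}$ actual nests predicted as non-nests, $A_{nn}P_n$ actual non-nests predicted as nests, and $A_{nn}P_{nn}$ actual non-nests predicted as non-nests
- $TPR = \frac{A_nP_n}{A_nP_n + A_nP_{nn}}$
- $FPR = \frac{A_{nn}P_n}{A_{nn}P_n + A_{nn}P_{nn}}$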
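
A minimal sketch of how the ROC points are obtained: sweep the cut-off probability, turn model scores into nest / no-nest predictions, fill the confusion matrix and compute (TPR, FPR). The function names, the rule "predict a nest if score >= cut-off" and the sample labels in the usage check are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Return (A_nP_n, A_nP_nn, A_nnP_n, A_nnP_nn) for boolean nest labels."""
    pairs = list(zip(actual, predicted))
    a_n_p_n = sum(1 for a, p in pairs if a and p)            # nest, predicted nest
    a_n_p_nn = sum(1 for a, p in pairs if a and not p)       # nest, predicted no-nest
    a_nn_p_n = sum(1 for a, p in pairs if not a and p)       # no-nest, predicted nest
    a_nn_p_nn = sum(1 for a, p in pairs if not a and not p)  # no-nest, predicted no-nest
    return a_n_p_n, a_n_p_nn, a_nn_p_n, a_nn_p_nn

def roc_points(actual, scores, cutoffs):
    """(TPR, FPR) for each cut-off; a site is predicted a nest if score >= cut-off.

    Assumes both nest and no-nest sites occur in `actual`.
    """
    points = []
    for cut in cutoffs:
        predicted = [s >= cut for s in scores]
        a_n_p_n, a_n_p_nn, a_nn_p_n, a_nn_p_nn = confusion_counts(actual, predicted)
        tpr = a_n_p_n / (a_n_p_n + a_n_p_nn)
        fpr = a_nn_p_n / (a_nn_p_n + a_nn_p_nn)
        points.append((tpr, fpr))
    return points
```

For example, with actual labels `[True, True, False, False]` and scores `[0.9, 0.4, 0.6, 0.1]`, a cut-off of 0.8 yields (TPR, FPR) = (0.5, 0.0). Lowering the cut-off moves the point toward (1, 1).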

Evaluation: Change Model
[Figure: ROC curves of linear regression vs. spatial regression]
The spatial model is better.

Solution Procedures for the Spatial Autoregression Model (SAR)
$y = \rho W y + X\beta + \epsilon$
- $\rho$ and $\beta$ can be estimated using maximum likelihood theory or Bayesian statistics; e.g., the spatial econometrics package uses a Bayesian approach based on sampling-based Markov Chain Monte Carlo (MCMC).
- Maximum-likelihood estimation requires $O(n^3)$ operations, since the likelihood contains the log-determinant $\ln|I - \rho W|$ of an $n \times n$ matrix.
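
One way to see where the $O(n^3)$ cost comes from is a small maximum-likelihood sketch. Assuming Gaussian errors, $\beta$ and $\sigma^2$ can be profiled out for each fixed $\rho$, leaving the concentrated log-likelihood $\ln|I - \rho W| - \frac{n}{2}\ln\hat\sigma^2(\rho)$ (constants dropped), which is then maximized over a grid of candidate $\rho$ values. The dense log-determinant is the cubic step. The 1-D chain neighborhood, the grid search and the function names are illustrative assumptions; this is not the Bayesian MCMC procedure of the spatial econometrics package.

```python
import numpy as np

def chain_W(n):
    """Row-normalized contiguity matrix for n >= 2 sites on a line (hypothetical W)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0  # right neighbor
    W[idx + 1, idx] = 1.0  # left neighbor
    return W / W.sum(axis=1, keepdims=True)

def concentrated_loglik(rho, y, X, W):
    """SAR log-likelihood at fixed rho with beta and sigma^2 profiled out."""
    n = len(y)
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)  # beta(rho): OLS of (I - rho W) y on X
    resid = Ay - X @ beta
    sigma2 = resid @ resid / n                       # sigma^2(rho)
    _, logdet = np.linalg.slogdet(A)                 # ln|I - rho W|: the O(n^3) step
    return logdet - 0.5 * n * np.log(sigma2), beta

def fit_sar_grid(y, X, W, rho_grid):
    """Maximum-likelihood rho over a grid of candidates, plus beta at that rho."""
    scores = [concentrated_loglik(r, y, X, W)[0] for r in rho_grid]
    best = rho_grid[int(np.argmax(scores))]
    return best, concentrated_loglik(best, y, X, W)[1]
```

For a row-normalized $W$ and $|\rho| < 1$, $I - \rho W$ is nonsingular with a positive determinant, so only the log-magnitude from `slogdet` is needed. Sparse or eigenvalue-based log-determinants are the usual ways to avoid the cubic cost.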

Evaluation: Change Measure
- New measure: ADNP (average distance to nearest prediction)
- Spatial accuracy (map similarity)
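
ADNP can be sketched as follows, assuming the definition $ADNP(A, P) = \frac{1}{K}\sum_{k=1}^{K} d(A_k, A_k.nearest(P))$, where $A_1, \dots, A_K$ are the actual nest locations, $P$ is the set of predicted nest locations and $d$ is Euclidean distance. Unlike classification accuracy, a prediction one cell away from a real nest is penalized only by that small distance. The function name and the brute-force nearest-neighbor search are illustrative choices.

```python
import math

def adnp(actual, predicted):
    """Average distance from each actual nest to its nearest predicted location.

    actual, predicted: lists of (x, y) coordinates; predicted must be non-empty.
    Lower is better: 0 means every actual nest coincides with some prediction.
    """
    if not predicted:
        raise ValueError("need at least one predicted location")
    total = 0.0
    for ax, ay in actual:
        # brute-force nearest prediction; a spatial index would replace this on large maps
        total += min(math.hypot(ax - px, ay - py) for px, py in predicted)
    return total / len(actual)
```

For instance, actual nests at (0, 0) and (10, 0) with predictions at (0, 0) and (10, 3) give ADNP = (0 + 3) / 2 = 1.5.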

Predicting Location using Map Similarity

Predicting Location using Map Similarity (PLUMS) Components
- Map similarity: average distance to nearest prediction (ADNP), ...
- Search algorithm: greedy, gradient descent
- Function family: generalized linear (GL) (logit, probit), non-linear, GL with auto-correlation
- Discretization of parameter space: uniform, non-uniform, multi-resolution, ...

Association Rules
- Application: supermarket shelf management
- Goal: identify items that are bought together by sufficiently many customers
- Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items (transaction data)
- A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer
- So, don't be surprised if you find six-packs of beer stacked next to diapers!

Association Rules: Support and Confidence
- Item set $I = \{i_1, i_2, \dots, i_k\}$
- Transactions $T = \{t_1, t_2, \dots, t_n\}$
- Association rule: $A \rightarrow B$
- Support S: A and B occur together in at least S percent of the transactions, $P(A \cup B)$, i.e., the probability that a transaction contains every item of $A \cup B$
- Confidence C: of all the transactions in which A occurs, at least C percent also contain B, $P(B \mid A)$
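
Both measures reduce to counting transactions. The sketch below computes support as the fraction of transactions containing a whole itemset and confidence as $support(A \cup B) / support(A)$; the function names and the four-basket dataset in the check are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of `itemset` (P(A U B) for A U B)."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, a, b):
    """P(B | A) = support(A U B) / support(A); assumes A occurs in some transaction."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)
```

On the baskets {diaper, milk, beer}, {diaper, milk}, {milk, bread} and {diaper, milk, beer, bread}, the rule {diaper, milk} -> {beer} has support 2/4 = 0.5 and confidence 2/3.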

Case 2: Spatial Association Rules
- Problem: given a set of Boolean spatial features, find subsets of co-located features, e.g., (fire, drought, vegetation)
- Data: continuous space; partitioning is not natural
- Classical data mining approach: association rules
  - but there are no transactions, and hence no support measure!
- Approach: work with continuous data without transactionizing it
  - Participation index (support): the minimum, over the features of a pattern, of the fraction of that feature's instances appearing in the join result
  - Confidence = Pr[fire at s | drought in N(s) and vegetation in N(s)]
  - New algorithm using spatial joins

Co-location
Can you find co-location patterns from the following sample dataset?
[Figure: sample dataset; the answer patterns were shown as map symbols]

Co-location
Can you find co-location patterns from the following sample dataset?

Spatial Co-location
A set of features frequently co-located
Given:
- A set $T$ of $K$ Boolean spatial feature types, $T = \{f_1, f_2, \dots, f_K\}$
- A set $P$ of $N$ locations, $P = \{p_1, \dots, p_N\}$, in a spatial framework $S$, where each $p_i \in P$ is an instance of some spatial feature in $T$
- A neighbor relation $R$ over locations in $S$
Find:
- $T_c$ = subsets of $T$ that are frequently co-located
Objective: correctness, completeness, efficiency
Constraints:
- $R$ is symmetric and reflexive
- Monotonic prevalence measure
[Figure panels: Reference feature centric; Window centric; Event centric]

Participation Index
- Participation ratio $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \dots, f_k\}$: the fraction of instances of $f_i$ that have all of the features $\{f_1, \dots, f_{i-1}, f_{i+1}, \dots, f_k\}$ nearby
- Participation index $= \min_i \{pr(f_i, c)\}$

Comparison with association rules:

| | Association rules | Co-location rules |
|---|---|---|
| Underlying space | Discrete sets | Continuous space |
| Item types | Item-types | Events / Boolean spatial features |
| Collections | Transactions | Neighborhoods |
| Prevalence ($A \rightarrow B$) | Support: $P(A \cup B)$ | Participation index |
| Conditional probability ($A \rightarrow B$) | Confidence: $P(B \mid A)$ | $P[B \text{ at } L \mid A \text{ in } N(L)]$ |

Spatial Co-location Patterns
- Spatial features A, B, C and their instances
- Possible associations are (A, B), (B, C), etc.
- The neighbor relationship includes the following pairs: (A1, B1), (A2, B1), (A2, B2), (B1, C1), (B2, C2)
[Figure: dataset]

Spatial Co-location Patterns
- Spatial features A, B, C and their instances
- Partition approach [Yasuhiko, KDD 2001]: two partitionings of the same dataset give Support(A,B) = 2, Support(B,C) = 2 and Support(A,B) = 1, Support(B,C) = 2
  - Support is not well defined, i.e., it is not independent of the execution trace
  - It has a fast heuristic that is hard to analyze for correctness/completeness
[Figure: dataset]

Spatial Co-location Patterns
- Spatial features A, B, C and their instances
- Reference feature approach [Han SSD 95]:
  - Use C as the reference feature to get transactions
  - Transactions: (B1), (B2)
  - Support(A,B) = ∅, since no transaction contains both A and B
- Note: the neighbor relationship includes the following pairs: (A1, B1), (A2, B1), (A2, B2), (B1, C1), (B2, C2)
[Figure: dataset]
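
The reference-feature transactions for this dataset can be reproduced in a few lines. The sketch assumes that only the instances A1, A2, B1, B2, C1 and C2 exist and that an instance id encodes its feature type (both are assumptions, since the dataset figure is not preserved); each instance of the reference feature C yields one transaction holding its neighbors. Helper names are hypothetical.

```python
def feature(instance):
    """Feature type of an instance id, e.g. 'A1' -> 'A' (hypothetical naming scheme)."""
    return instance.rstrip("0123456789")

def neighbors_of(x, pairs):
    """All instances related to x by the symmetric neighbor relation."""
    return {b for a, b in pairs if a == x} | {a for a, b in pairs if b == x}

def reference_transactions(ref_feature, instances, pairs):
    """One transaction per reference-feature instance: its neighbors' instances."""
    return [sorted(neighbors_of(x, pairs)) for x in sorted(instances) if feature(x) == ref_feature]

def transaction_support(transactions, features):
    """Number of transactions holding an instance of every feature in `features`."""
    return sum(1 for t in transactions if set(features) <= {feature(i) for i in t})
```

With C as reference, the transactions are (B1) and (B2), and no transaction contains an A instance, so the pattern (A, B) gets no support even though A and B are neighbors three times.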

Spatial Co-location Patterns
- Spatial features A, B, C and their instances
- Our approach (event centric):
  - Neighborhoods instead of transactions
  - Spatial join on the neighbor relationship
  - Support: participation index = min(p_ratio)
  - P_ratio(A, (A,B)) = fraction of the instances of A participating in join(A, B, neighbor)
- Examples (both A instances and both B instances take part in the join of A and B; likewise for B and C):
  - Support(A, B) = min(2/2, 2/2) = 1
  - Support(B, C) = min(2/2, 2/2) = 1
[Figure: dataset]
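
The participation index on this dataset can be checked with a brute-force sketch: join the instances of the pattern's features on the neighbor relation, then, for each feature, divide the number of its distinct instances appearing in the join by its total number of instances. It assumes the same six instances as above; the names are hypothetical, and a real co-location miner would use spatial joins and prune candidates with the monotonic prevalence measure instead of enumerating all combinations.

```python
from itertools import product

def feature(instance):
    """Feature type of an instance id, e.g. 'A1' -> 'A' (hypothetical naming scheme)."""
    return instance.rstrip("0123456789")

def participation_index(colocation, instances, pairs):
    """Return (participation index, participation ratios, join rows) for a feature tuple."""
    nbr = set(pairs) | {(b, a) for a, b in pairs}  # symmetric neighbor relation
    by_feature = {f: [i for i in instances if feature(i) == f] for f in colocation}
    # Spatial join: one instance per feature, all pairwise neighbors.
    rows = [
        combo
        for combo in product(*(by_feature[f] for f in colocation))
        if all((p, q) in nbr for k, p in enumerate(combo) for q in combo[k + 1:])
    ]
    # Participation ratio: fraction of each feature's instances that appear in some row.
    ratios = {f: len({row[k] for row in rows}) / len(by_feature[f])
              for k, f in enumerate(colocation)}
    return min(ratios.values()), ratios, rows
```

The join of A and B has three rows, (A1, B1), (A2, B1) and (A2, B2), but each ratio counts distinct instances, so both ratios are 2/2 and the index is 1.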

Spatial Co-location Patterns: Comparison on the Same Dataset
- Spatial features A, B, C and their instances
- Partition approach: Support(A,B) = 2, Support(B,C) = 2 under one partitioning; Support(A,B) = 1, Support(B,C) = 2 under another
- Reference feature approach (C as reference feature): transactions (B1), (B2); Support(A,B) = ∅
- Our approach: Support(A,B) = min(2/2, 2/2) = 1; Support(B,C) = min(2/2, 2/2) = 1
[Figure: dataset]

Case 3: Spatial Outlier Detection
Spatial outlier: a data point that is extreme relative to its neighbors

Application Domain: Traffic Data

Spatial Outlier Detection
Given:
- A spatial framework $SF$ consisting of locations $s_1, s_2, \dots, s_n$
- An attribute function $f: s_i \rightarrow \mathbb{R}$ ($\mathbb{R}$: the set of real numbers)
- A neighborhood relationship $N \subseteq SF \times SF$
- A neighborhood aggregation function $f^N_{aggr}: \mathbb{R}^N \rightarrow \mathbb{R}$
- A difference function $F_{diff}: \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$
- A statistic test function $ST: \mathbb{R} \rightarrow \{True, False\}$; the test is based on $F_{diff}(f, f^N_{aggr}(f, N))$
Find:
- $O = \{s_i \mid s_i \in SF, s_i \text{ is a spatial outlier}\}$
Objective:
- Correctness: the attribute value of each $s_i \in O$ is extreme compared with its neighbors
- Computational efficiency

An Example of a Spatial Outlier

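Spatial Outlier Detection: $Z_{s(x)}$ Approach
Declare $x$ a spatial outlier if
$Z_{s(x)} = \left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$,
where $S(x) = f(x) - E_{y \in N(x)}(f(y))$ is the difference between the attribute value at $x$ and the average over its neighbors, and $\mu_s$ and $\sigma_s$ are the mean and standard deviation of $S(x)$ over all locations.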
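
The $Z_{s(x)}$ test can be sketched directly from the formula: compute $S(x)$ for every location, standardize, and flag locations above $\theta$. The dictionary inputs, the default $\theta = 2$, the population standard deviation and the nine-station example are illustrative assumptions, not values from the traffic study.

```python
import statistics

def zs_outliers(values, neighbors, theta=2.0):
    """Spatial outliers by the Z_s(x) test.

    values: dict site -> f(x); neighbors: dict site -> non-empty list of neighbor sites.
    S(x) = f(x) - mean of f over N(x); x is flagged if |S(x) - mu_s| / sigma_s > theta.
    Assumes S(x) is not constant (sigma_s > 0).
    """
    s = {x: values[x] - statistics.fmean(values[y] for y in neighbors[x]) for x in values}
    s_vals = list(s.values())
    mu = statistics.fmean(s_vals)
    sigma = statistics.pstdev(s_vals)
    return [x for x in values if abs(s[x] - mu) / sigma > theta]
```

For nine stations on a line, all reading 10 except station 4 reading 30, $S(x)$ is 20 at station 4 and -10 at its two neighbors, so only station 4 exceeds $\theta = 2$ (its score is about 2.45).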

Evaluation of the Statistical Assumption
- The distribution of the traffic station attribute $f(x)$ looks normal
- The distribution of $S(x)$ looks normal too!

Different Spatial Outlier Tests
- Spatial statistic approach
- Scatter plot approach (Luc Anselin, 1994)
- Moran scatter plot approach (Luc Anselin, 1995)
- Variogram cloud approach (graphical)

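Scatter Plot Approach
Given:
- An attribute function $f(x)$
- A neighborhood relationship $N(x)$
- An aggregation function $E(x) = \frac{1}{|N(x)|}\sum_{y \in N(x)} f(y)$
- A difference function $F_{diff}$: $\epsilon = E(x) - (m \cdot f(x) + b)$, where $m$ and $b$ are the slope and intercept of the regression line of $E(x)$ on $f(x)$
Detect spatial outliers by the statistic test function ST: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$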
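
A minimal sketch of the scatter plot test: average each location's neighbors to get $E(x)$, fit the line $E = m f + b$ by least squares, and standardize the residuals $\epsilon$. The list-of-neighbor-indices input, $\theta = 2$ and the twelve-station example are assumptions for illustration.

```python
import numpy as np

def scatterplot_outliers(f, neighbors, theta=2.0):
    """Scatter-plot spatial outlier test.

    f: attribute values f(x); neighbors: list of neighbor-index lists, one per location.
    Returns (flagged indices, slope m, intercept b).
    """
    f = np.asarray(f, dtype=float)
    E = np.array([f[nb].mean() for nb in neighbors])  # neighborhood average E(x)
    m, b = np.polyfit(f, E, 1)                         # least-squares line E = m*f + b
    eps = E - (m * f + b)                              # difference function
    z = (eps - eps.mean()) / eps.std()                 # standardized residuals
    return np.flatnonzero(np.abs(z) > theta).tolist(), m, b
```

Because the line is fitted to all points, a location whose $f(x)$ is unusual globally as well as locally can pull the line toward itself and escape detection. In the example used for checking (stations on a line with values 10, 10, 10, 10, 20, 10, 10, 10, 20, 20, 20, 20), the isolated 20 at index 4 is flagged because other locations share its $f(x)$ value.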

Graphical Spatial Outlier Test

[Figure panels: Original data; Graphical spatial tests]

A Unified Algorithm
- Separate two phases:
  - Model building
  - Testing (a node or a set of nodes)
- Computation structure of model building; key insights:
  - Spatial self-join using the N(x) relationship
  - Algebraic aggregate functions can be computed in one disk scan of the spatial join
- Computation structure of testing:
  - Single node: spatial range query
  - Get-All-Neighbors(x) operation

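An Example: Scatter Plot
Model building:
- An attribute function $f(x)$
- Neighborhood aggregate function $E(x) = \frac{1}{|N(x)|}\sum_{y \in N(x)} f(y)$
- Distributive aggregate functions: $\sum f(x)$, $\sum E(x)$, $\sum f(x)E(x)$, $\sum f(x)^2$, $\sum E(x)^2$
- Algebraic aggregate functions: $m = \frac{n\sum f(x)E(x) - \sum f(x)\sum E(x)}{n\sum f(x)^2 - \left(\sum f(x)\right)^2}$, $b = \frac{\sum E(x) - m\sum f(x)}{n}$, and $\sigma_\epsilon$, where $n$ is the number of locations
Testing:
- Difference function $\epsilon = E(x) - (m \cdot f(x) + b)$
- Statistic test function $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$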
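
The point of the distributive/algebraic split is that model building needs only one pass over the spatial self-join output. The sketch accumulates a count and five sums while scanning $(f(x), E(x))$ pairs, then derives $m$, $b$ and $\sigma_\epsilon$ algebraically, expanding $\sum (E - mf - b)^2$ in terms of the same sums. It assumes the $(f(x), E(x))$ pairs are already produced by the join, uses the population form of $\sigma_\epsilon$, and relies on $\mu_\epsilon = 0$ for a least-squares line with an intercept; the function names are hypothetical.

```python
import math

def scatter_model_one_pass(pairs):
    """Model building from a single scan of (f(x), E(x)) pairs.

    Accumulates distributive aggregates only; m, b and sigma_eps are algebraic in them.
    """
    n = sf = se = sfe = sff = see = 0.0
    for fx, ex in pairs:  # one scan, e.g. over the spatial self-join output
        n += 1
        sf += fx
        se += ex
        sfe += fx * ex
        sff += fx * fx
        see += ex * ex
    m = (n * sfe - sf * se) / (n * sff - sf * sf)
    b = (se - m * sf) / n
    # sum of (E - m*f - b)^2, expanded in terms of the accumulated sums
    sse = see + m * m * sff + n * b * b - 2 * m * sfe - 2 * b * se + 2 * m * b * sf
    sigma = math.sqrt(max(sse, 0.0) / n)
    return m, b, sigma

def scatter_test(fx, ex, m, b, sigma, theta=2.0):
    """Testing phase for one node; mu_eps is 0 for a least-squares line with intercept."""
    return abs(ex - (m * fx + b)) / sigma > theta
```

Testing a single station then only needs its own $f(x)$, its neighbors' values (the Get-All-Neighbors(x) range query) and the three model parameters.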

Outlier Stations Detected

Outlier Station Detected

Conclusion and Future Directions
- Spatial domains may not satisfy the assumptions of classical methods:
  - data: auto-correlation, continuous geographic space
  - patterns: global vs. local, e.g., outliers vs. spatial outliers
  - data exploration: maps and albums
- Open issues:
  - patterns: hot spots, spatial trends, ...
  - metrics: spatial accuracy (predicted locations), spatial contiguity (clusters)
  - spatio-temporal datasets: spatio-temporal outliers
  - scale and resolution sensitivity of patterns
  - geo-statistical confidence measures for mined patterns

References
1. S. Shekhar and Y. Huang, "Discovering Spatial Co-location Patterns: A Summary of Results," Proc. 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), July 2001.
2. S. Shekhar, C.T. Lu, and P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications," Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
3. S. Shekhar, C.T. Lu, and P. Zhang, "Detecting Graph-based Spatial Outlier," Intelligent Data Analysis, to appear in Vol. 6(3), 2002.
4. S. Chawla, S. Shekhar, W. Wu, and U. Ozesmi, "Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations," Proc. 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.
5. S. Chawla, S. Shekhar, W. Wu, and U. Ozesmi, "Modeling Spatial Dependencies for Mining Geospatial Data," First SIAM International Conference on Data Mining, 2001.
6. S. Shekhar, Y. Huang, W. Wu, and C.T. Lu, "What's Spatial about Spatial Data Mining: Three Case Studies," in V. Kumar, R. Grossman, C. Kamath, and R. Namburu (eds.), Data Mining for Scientific and Engineering Applications, Kluwer Academic Publishers, 2001, ISBN 1-4020-0033-2.
7. S. Shekhar and Y. Huang, "Multi-resolution Co-location Miner: A New Algorithm to Find Co-location Patterns in Spatial Datasets," Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002.

http://www.cs.umn.edu/research/shashi-group
Thank you!