Spatial Data Mining: Three Case Studies

Presentation transcript:

Spatial Data Mining: Three Case Studies
Presented by: Chang-Tien Lu, Spatial Database Lab, Department of Computer Science, University of Minnesota
Group Members: Shashi Shekhar, Weili Wu, Yan Huang, C.T. Lu

Outline
- Introduction
- Case 1: Location Prediction
- Case 2: Spatial Association: Co-location
- Case 3: Spatial Outlier Detection
- Conclusion and Future Directions

Introduction: spatial data mining
- Spatial databases are too large to analyze manually
  - NASA Earth Observation System (EOS)
  - National Institute of Justice: crime mapping
  - Census Bureau, Dept. of Commerce: census data
- Spatial data mining
  - Discover frequent and interesting spatial patterns for post-processing (knowledge discovery)
  - Pattern examples: spatial outliers, location prediction, clustering, spatial association, trends, ...
- Historical example: London, 1854, cholera and the water pump

Framework
- Problem statement: capture special needs
- Data exploration: maps
- Try reusing classical methods: data mining, spatial statistics
- Invent new methods if reuse is not applicable
- Develop efficient algorithms
- Validation, performance tuning

Case 1: Location Prediction
- Problem: predict nesting sites in marshes
  - Given vegetation, water depth, distance to edge, etc.
- Data: maps of nests and attributes
  - Spatially clustered nests, spatially smooth attributes
- Classical methods: logistic regression, decision trees, Bayesian classifier
  - But the independence assumption is violated! They miss auto-correlation!
  - Spatial auto-regression (SAR)
- Open issue: spatial accuracy vs. classification accuracy
- Open issue: performance. SAR learning is slow!

Location Prediction
- Given:
  1. A spatial framework
  2. Explanatory functions
  3. A dependent class
  4. A family of function mappings
- Find: a classification model
- Objective: maximize classification_accuracy
- Constraints: spatial autocorrelation exists
- Figure panels: nest locations, distance to open water, vegetation durability, water depth

Evaluation: Change Model
- Linear regression
- Spatial Autoregression Model (SAR): y = ρWy + Xβ + ε
  - W models neighborhood relationships
  - ρ models the strength of spatial dependencies
  - ε is the error vector
- Mixed Spatial Autoregression Model (MSAR): y = ρWy + Xβ + WXγ + ε
  - Considers the impact of the explanatory variables from the neighboring observations
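
To make the SAR equation concrete, here is a minimal sketch that solves y = ρWy + Xβ + ε for y by fixed-point iteration. It assumes a row-normalized W and |ρ| < 1, which is what makes the iteration converge. The function names, the four-location chain, and all numeric values are illustrative assumptions, not taken from the talk.

```python
def row_normalize(adjacency):
    """Turn a 0/1 adjacency matrix into a row-stochastic neighborhood matrix W."""
    W = []
    for row in adjacency:
        total = sum(row)
        W.append([v / total if total else 0.0 for v in row])
    return W


def sar_fixed_point(W, X, beta, eps, rho, iterations=200):
    """Solve y = rho*W*y + X*beta + eps by repeated substitution.

    Assumes W is row-normalized and |rho| < 1, so the iteration converges.
    """
    n = len(W)
    base = [sum(x * b for x, b in zip(X[i], beta)) + eps[i] for i in range(n)]
    y = base[:]
    for _ in range(iterations):
        y = [rho * sum(W[i][j] * y[j] for j in range(n)) + base[i]
             for i in range(n)]
    return y


def sar_residual(W, X, beta, eps, rho, y):
    """Largest absolute violation of the SAR equation at any location."""
    n = len(W)
    return max(
        abs(y[i]
            - rho * sum(W[i][j] * y[j] for j in range(n))
            - sum(x * b for x, b in zip(X[i], beta))
            - eps[i])
        for i in range(n)
    )


# Hypothetical data: four locations on a line, 0 - 1 - 2 - 3.
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W = row_normalize(adjacency)
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]  # intercept + one attribute
beta = [0.2, 1.0]
eps = [0.0, 0.0, 0.0, 0.0]

y_independent = sar_fixed_point(W, X, beta, eps, rho=0.0)  # plain linear model
y_spatial = sar_fixed_point(W, X, beta, eps, rho=0.5)      # neighbors pull y up
```

With ρ = 0 the model reduces to ordinary linear regression; a positive ρ lets each location borrow from its neighbors' values.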

Measure: ROC Curve
- Receiver Operating Characteristic (ROC)
- ROC curve: locus of the pair (TPR, FPR) for each cut-off probability
  - TPR = A_n P_n / (A_n P_n + A_n P_nn)
  - FPR = A_nn P_n / (A_nn P_n + A_nn P_nn)
  - (A = actual, P = predicted, n = nest, nn = no-nest)
- Classification accuracy: confusion matrix
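
The ROC construction can be sketched as follows: for each cut-off probability, build the confusion-matrix counts and emit the (TPR, FPR) pair. The function name, scores and labels are made-up illustrations; the (TPR, FPR) ordering follows the slide.

```python
def roc_points(scores, labels, cutoffs):
    """Return one (TPR, FPR) pair per cut-off probability.

    scores: predicted probability of "nest" per location.
    labels: True where a nest is actually present.
    """
    points = []
    for cutoff in cutoffs:
        tp = fn = fp = tn = 0
        for score, actual in zip(scores, labels):
            predicted = score >= cutoff
            if actual and predicted:
                tp += 1          # actual nest, predicted nest
            elif actual:
                fn += 1          # actual nest, predicted no-nest
            elif predicted:
                fp += 1          # actual no-nest, predicted nest
            else:
                tn += 1          # actual no-nest, predicted no-nest
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((tpr, fpr))
    return points


# Hypothetical scores for five locations.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False]
points = roc_points(scores, labels, cutoffs=[0.0, 0.5, 1.0])
```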

Evaluation: Change Model
- Figure panels: Linear Regression, Spatial Regression
- Spatial model is better

Solution Procedures
- Spatial Autoregression Model (SAR): y = ρWy + Xβ + ε
- Solutions: ρ and β can be estimated using maximum likelihood theory or Bayesian statistics
  - e.g., the spatial econometrics package uses a Bayesian approach based on sampling-based Markov Chain Monte Carlo (MCMC) methods
  - Maximum-likelihood-based estimation requires O(n^3) operations

Evaluation: Change Measure
- New measure: ADNP, the average distance to nearest prediction
- Spatial accuracy (map similarity)
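
A minimal sketch of ADNP, assuming it is the mean, over actual nest locations, of the Euclidean distance to the closest predicted location. The slide only names the measure, so the exact form, the function name and the coordinates below are assumptions.

```python
import math


def adnp(actual, predicted):
    """Average distance from each actual site to its nearest predicted site."""
    if not actual or not predicted:
        raise ValueError("need at least one actual and one predicted location")
    total = 0.0
    for a in actual:
        total += min(math.dist(a, p) for p in predicted)
    return total / len(actual)


# Hypothetical coordinates.
actual_nests = [(0, 0), (3, 4)]
predicted_nests = [(0, 1), (3, 0)]
score = adnp(actual_nests, predicted_nests)  # (1 + 4) / 2
```

Unlike classification accuracy, this value shrinks when a wrong prediction lands close to a real nest, which is the sense in which it measures spatial accuracy.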

Predicting Location using Map Similarity

Predicting Location using Map Similarity
- PLUMS components
  - Map similarity: average distance to nearest prediction (ADNP), ...
  - Search algorithm: greedy, gradient descent
  - Function family: generalized linear (GL) (logit, probit), non-linear, GL with auto-correlation
  - Discretization of parameter space: uniform, non-uniform, multi-resolution, ...

Association Rule
- Supermarket shelf management
- Goal: identify items that are bought together by sufficiently many customers
- Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items (transaction data)
- A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer
- So don't be surprised if you find six-packs of beer stacked next to diapers!

Association Rules: Support and Confidence
- Item set I = {i1, i2, ..., ik}
- Transactions T = {t1, t2, ..., tn}
- Association rule: A -> B
- Support S: A and B occur together in at least S percent of the transactions, P(A U B)
- Confidence C: of all the transactions in which A occurs, at least C percent contain B, P(B|A)
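
As a small illustration of these two measures, the sketch below computes support and confidence over a handful of made-up market baskets. The function names and the transactions are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent): support(A and B) / support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / support_a


# Hypothetical point-of-sale transactions.
transactions = [
    {"diaper", "milk", "beer"},
    {"diaper", "milk"},
    {"milk", "bread"},
    {"diaper", "milk", "beer", "bread"},
]
rule_support = support(transactions, {"diaper", "milk", "beer"})
rule_confidence = confidence(transactions, {"diaper", "milk"}, {"beer"})
```

Both functions depend on a fixed list of transactions, which is exactly what continuous spatial data lacks.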

Case 2: Spatial Association Rule
- Problem: given a set of Boolean spatial features, find subsets of co-located features, e.g. (fire, drought, vegetation)
- Data: continuous space, partition not natural
- Classical data mining approach: association rules
  - But no transactions! No support measure!
- Approach: work with continuous data without transactionizing it!
  - Participation index (support): min. fraction of instances of a feature in the join result
  - Confidence = Pr[fire at s | drought in N(s) and vegetation in N(s)]
  - New algorithm using spatial joins

Co-location
- Can you find co-location patterns from the following sample dataset?
- Answers: (shown as figures on the slide)

Co-location
- Can you find co-location patterns from the following sample dataset?

Spatial Co-location
- A set of features frequently co-located
- Given
  - A set T of K Boolean spatial feature types, T = {f_1, f_2, ..., f_K}
  - A set P of N locations, P = {p_1, ..., p_N}, in a spatial framework S; each p_i ∈ P is of some spatial feature in T
  - A neighbor relation R over locations in S
- Find: T_c = subsets of T that are frequently co-located
- Objective: correctness, completeness, efficiency
- Constraints
  - R is symmetric and reflexive
  - Monotonic prevalence measure
- Figure panels: Reference Feature Centric, Window Centric, Event Centric

Co-location: Participation index
- Participation index = min{pr(f_i, c)}
- Participation ratio pr(f_i, c) of feature f_i in co-location c = {f_1, f_2, ..., f_k}: fraction of instances of f_i with features {f_1, ..., f_{i-1}, f_{i+1}, ..., f_k} nearby
- Comparison with association rules:
  (row) | Association rules | Co-location rules
  Underlying space | discrete sets | continuous space
  Item-types | item-types | events / Boolean spatial features
  Collections | transactions | neighborhoods
  Prevalence (A -> B) | Support: P(A U B) | Participation index
  Conditional probability (A -> B) | Confidence: P[A|B] | P[A in N(L) | B at L]

Spatial Co-location Patterns
- Dataset: spatial features A, B, C and their instances
- Possible associations are (A, B), (B, C), etc.
- Neighbor relationship includes the following pairs:
  - A1, B1
  - A2, B1
  - A2, B2
  - B1, C1
  - B2, C2

Spatial Co-location Patterns
- Dataset: spatial features A, B, C and their instances
- Partition approach [Yasuhiko, KDD 2001]
  - One partitioning: Support(A,B) = 2, Support(B,C) = 2
  - Another partitioning: Support(A,B) = 1, Support(B,C) = 2
  - Support is not well defined, i.e., not independent of the execution trace
  - Has a fast heuristic which is hard to analyze for correctness/completeness

Spatial Co-location Patterns
- Dataset: spatial features A, B, C and their instances
- Reference feature approach [Han SSD 95]
  - Use C as the reference feature to get transactions
  - Transactions: (B1), (B2)
  - Support(A,B) = ∅
- Note: the neighbor relationship includes the following pairs: (A1, B1), (A2, B1), (A2, B2), (B1, C1), (B2, C2)

Spatial Co-location Patterns
- Dataset: spatial features A, B, C and their instances
- Our approach (event centric)
  - Neighborhoods instead of transactions
  - Spatial join on the neighbor relationship
  - Support: participation index = min(p_ratio)
  - p_ratio(A, (A,B)) = fraction of instances of A participating in join(A, B, neighbor)
- Examples
  - Support(A,B) = min(3/2, 3/2) = 1.5
  - Support(B,C) = min(2/2, 2/2) = 1
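
A minimal sketch of the participation index on the sample dataset, assuming two instances per feature (as the denominators in the examples suggest) and the neighbor pairs listed for the dataset. `participation_index`, `instances` and `join_rows` are illustrative names, not from the talk. The sketch counts distinct participating instances, following the definition of the participation ratio as a fraction of instances; under that reading both features of (A,B) have ratio 2/2, whereas a value of 3/2 would result from counting the three join rows instead.

```python
def participation_index(instances, join_rows):
    """Return (index, ratios) for one candidate co-location.

    instances: feature -> set of all instance ids of that feature.
    join_rows: list of dicts, one per row of the spatial join,
               mapping feature -> participating instance id.
    """
    ratios = {}
    for feature, all_ids in instances.items():
        participating = {row[feature] for row in join_rows}
        ratios[feature] = len(participating) / len(all_ids)
    return min(ratios.values()), ratios


# Join results derived from the neighbor pairs listed for the dataset.
join_ab = [{"A": "A1", "B": "B1"}, {"A": "A2", "B": "B1"}, {"A": "A2", "B": "B2"}]
join_bc = [{"B": "B1", "C": "C1"}, {"B": "B2", "C": "C2"}]

pi_ab, ratios_ab = participation_index(
    {"A": {"A1", "A2"}, "B": {"B1", "B2"}}, join_ab)
pi_bc, ratios_bc = participation_index(
    {"B": {"B1", "B2"}, "C": {"C1", "C2"}}, join_bc)
```

Because the measure is computed from the join result rather than from a partitioning or a chosen reference feature, it does not depend on how the space is cut up.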

Spatial Co-location Patterns
- Dataset: spatial features A, B, C and their instances
- Partition approach
  - One partitioning: Support(A,B) = 2, Support(B,C) = 2
  - Another partitioning: Support(A,B) = 1, Support(B,C) = 2
- Reference feature approach
  - C as the reference feature
  - Transactions: (B1), (B2)
  - Support(A,B) = ∅
- Our approach
  - Support(A,B) = min(3/2, 3/2) = 1.5
  - Support(B,C) = min(2/2, 2/2) = 1

Case 3: Spatial Outlier Detection
- Spatial outlier: a data point that is extreme relative to its neighbors

Application Domain: Traffic Data

Spatial Outlier Detection
- Given
  - A spatial framework SF consisting of locations s_1, s_2, ..., s_n
  - An attribute function f : s_i → R (R: set of real numbers)
  - A neighborhood relationship N ⊆ SF × SF
  - A neighborhood aggregation function f_aggr^N : R^N → R
  - A difference function F_diff : R × R → R
  - A statistic test function ST : R → {True, False}
  - The test is based on F_diff(f, f_aggr^N(f, N))
- Find: O = {v_i | v_i ∈ V, v_i is a spatial outlier}
- Objective
  - Correctness: the attribute value of v_i is extreme compared with its neighbors
  - Computational efficiency

An Example of a Spatial Outlier

Spatial Outlier Detection: Z_S(x) approach
- Function: (formula shown as an image on the slide)
- Declare x as a spatial outlier if: (condition shown as an image on the slide)
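
The formulas for this test did not survive in the transcript. A minimal sketch, assuming the commonly used form S(x) = f(x) minus the average of f over the neighbors of x, with x declared an outlier when |(S(x) - μ_S) / σ_S| exceeds a threshold θ. The threshold value, the function name and the traffic-like readings are all hypothetical.

```python
import statistics


def z_test_outliers(f, neighbors, theta=2.0):
    """Flag locations whose difference from their neighborhood average is extreme.

    f: location -> attribute value.
    neighbors: location -> list of neighboring locations.
    theta: assumed threshold on the standardized difference.
    """
    # S(x): attribute value minus the neighborhood average.
    s = {x: f[x] - statistics.fmean([f[y] for y in neighbors[x]]) for x in f}
    mu = statistics.fmean(list(s.values()))
    sigma = statistics.pstdev(list(s.values()))
    if sigma == 0:
        return []
    return [x for x in f if abs((s[x] - mu) / sigma) > theta]


# Hypothetical readings from seven stations along one road; station 3 spikes.
values = [10, 10, 10, 50, 10, 10, 10]
f = dict(enumerate(values))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}
outliers = z_test_outliers(f, neighbors)
```

Note that stations 2 and 4 also deviate from their neighborhood averages because the spike is their neighbor, but only station 3 crosses the assumed threshold.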

Evaluation of Statistical Assumption
- Distribution of traffic station attribute f(x) looks normal
- Distribution of S(x) looks normal too!

Different Spatial Outlier Tests
- Spatial statistic approach
- Scatter plot approach (Luc Anselin '94)
- Moran scatter plot approach (Luc Anselin '95)
- Variogram cloud approach (graphic)

Scatter plot approach
- Given
  - An attribute function f(x)
  - A neighborhood relationship N(x)
  - An aggregation function E(x)
  - A difference function F_diff: ε = E(x) - (m·f(x) + b)
- Detect spatial outliers by the statistic test function ST
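
The aggregation and test formulas were images on the slide. One consistent way to write the scatter plot test is given below, assuming E(x) is the neighborhood average, (m, b) is the least-squares line of E(x) against f(x) over n locations, and the statistic test standardizes the residual against a threshold θ. Only the definition of ε comes from the slide; the rest is an assumption.

```latex
E(x) = \frac{1}{|N(x)|} \sum_{y \in N(x)} f(y)

m = \frac{n \sum f(x)\,E(x) - \sum f(x) \sum E(x)}
         {n \sum f(x)^2 - \bigl(\sum f(x)\bigr)^2},
\qquad
b = \frac{\sum E(x) - m \sum f(x)}{n}

\epsilon(x) = E(x) - \bigl(m\, f(x) + b\bigr),
\qquad
ST:\ \left| \frac{\epsilon(x) - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta
```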

Graphical Spatial Outlier Test

Graphical Spatial Tests
- Figure panel: Original Data

A Unified Algorithm
- Separate two phases
  - Model building
  - Testing (a node or a set of nodes)
- Computation structure of model building
  - Key insight: spatial self-join using the N(x) relationship
  - Algebraic aggregate functions can be computed in one disk scan of the spatial join
- Computation structure of testing
  - Single node: spatial range query
  - Get-All-Neighbors(x) operation

An Example: Scatter Plot
- Model building
  - An attribute function f(x)
  - Neighborhood aggregate function
  - Distributive aggregate functions
  - Algebraic aggregate functions
- Testing
  - Difference function
  - Statistic test function
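
The two phases can be sketched for the scatter plot test as follows. Model building makes a single pass over the locations and their neighbors, accumulating running sums (distributive aggregates), then derives the slope, intercept and residual spread (algebraic aggregates) from them. Testing re-reads only one node's neighbors. The formulas on the slide were images, so the least-squares form, the names and the data here are assumptions.

```python
import math


def build_model(f, neighbors):
    """One scan over the self-join: running sums, then slope, intercept, sigma."""
    n = sf = se = sff = sfe = see = 0.0
    for x, fx in f.items():
        ex = sum(f[y] for y in neighbors[x]) / len(neighbors[x])  # E(x)
        n += 1
        sf += fx
        se += ex
        sff += fx * fx
        sfe += fx * ex
        see += ex * ex
    # Algebraic aggregates computed from the distributive sums.
    m = (n * sfe - sf * se) / (n * sff - sf * sf)
    b = (se - m * sf) / n
    var_eps = (see - se * se / n - m * (sfe - sf * se / n)) / n
    return m, b, math.sqrt(var_eps)


def z_score(x, f, neighbors, model):
    """Testing phase for one node: fetch its neighbors, standardize its residual."""
    m, b, sigma = model
    ex = sum(f[y] for y in neighbors[x]) / len(neighbors[x])
    eps = ex - (m * f[x] + b)
    return eps / sigma  # least-squares residuals have zero mean


# Hypothetical readings along a chain of 11 stations; station 5 spikes.
values = [0, 1, 2, 3, 4, 16, 6, 7, 8, 9, 10]
f = dict(enumerate(values))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}

model = build_model(f, neighbors)
z5 = z_score(5, f, neighbors, model)
```

In a small chain like this one, the spike also inflates the neighborhood averages of the adjacent stations, so their residuals are large too; any threshold choice is left to the reader.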

Outlier Stations Detected

Outlier Station Detected

Conclusion and Future Directions
- Spatial domains may not satisfy the assumptions of classical methods
  - Data: auto-correlation, continuous geographic space
  - Patterns: global vs. local, e.g., outliers vs. spatial outliers
  - Data exploration: maps and albums
- Open issues
  - Patterns: hot-spots, spatial trends, ...
  - Metrics: spatial accuracy (predicted locations), spatial contiguity (clusters)
  - Spatio-temporal datasets: spatio-temporal outliers
  - Sensitivity of patterns to scale and resolution
  - Geo-statistical confidence measures for mined patterns

References
1. S. Shekhar and Y. Huang, "Discovering Spatial Co-location Patterns: a Summary of Results", in Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July.
2. S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications", the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
3. S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outlier", Intelligent Data Analysis, to appear in Vol. 6(3).
4. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations", Proc. Int. Conf. on 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14.
5. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Modeling Spatial Dependencies for Mining Geospatial Data", First SIAM International Conference on Data Mining.
6. S. Shekhar, Y. Huang, W. Wu, C.T. Lu, "What's Spatial about Spatial Data Mining: Three Case Studies", chapter in Data Mining for Scientific and Engineering Applications, V. Kumar, R. Grossman, C. Kamath, R. Namburu (eds.), Kluwer Academic Pub., 2001.
7. Shashi Shekhar and Yan Huang, "Multi-resolution Co-location Miner: a New Algorithm to Find Co-location Patterns in Spatial Datasets", Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002.

Thank you!