Spatial Data Mining: Three Case Studies For additional details www.cs.umn.edu/~shekhar/problems.html Shashi Shekhar, University of Minnesota Presented.

Presentation transcript:

Spatial Data Mining: Three Case Studies
For additional details: www.cs.umn.edu/~shekhar/problems.html
Shashi Shekhar, University of Minnesota
Presented to UCGIS Summer Assembly 2001

Background
- NSF workshop on GIS and DM (3/99)
- Spatial data [1, 8]: traffic, bird habitats, global climate, logistics, ...
- Spatial patterns: outliers, location prediction, associations, sequential associations, trends, ...

Framework
- Problem statement: capture special needs
- Data exploration: maps, new methods
- Try reusing classical methods
  - from data mining, spatial statistics
- If reuse is not possible, invent new methods
- Validation, performance tuning

Case 1: Spatial Outliers
- Problem: stations different from neighbors [SIGKDD 2001]
- Data: space-time plot, distribution of f(x), S(x)
- Distribution of base attribute:
  - spatially smooth
  - frequency distribution over value domain: normal
- Classical test: Pr.[item in population] is low
  - Q? distribution of diff.[f(x), neighborhood agg{f(x)}]
  - Insight: this statistic is distributed normally!
  - Test: (z-score on the statistic) > 2
  - Performance: spatial join, clustering methods

Spatial outlier detection [4]
- Spatial outlier: a data point that is extreme relative to its neighbors
- Given
  - A spatial graph $G = \{V, E\}$
  - A neighbor relationship (k neighbors)
  - An attribute function $f: V \to \mathbb{R}$
  - An aggregation function $f_{aggr}: \mathbb{R}^k \to \mathbb{R}$
  - Confidence level threshold $\theta$
- Find
  - $O = \{v_i \mid v_i \in V,\ v_i \text{ is a spatial outlier}\}$
- Objective
  - Correctness: the attribute value of $v_i$ is extreme compared with its neighbors
  - Computational efficiency
- Constraints
  - Attribute value is normally distributed
  - Computation cost dominated by I/O operations

Spatial outlier detection
- Spatial Outlier Detection Test
  1. Choice of spatial statistic: $S(x) = f(x) - E_{y \in N(x)}(f(y))$
     - Theorem: S(x) is normally distributed if f(x) is normally distributed
  2. Test for outlier detection: $\left| (S(x) - \mu_s) / \sigma_s \right| > \theta$, where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of S(x)
- Hypothesis: I/O cost is determined by clustering efficiency
- [Figure: plots of f(x) and S(x); a spatial outlier and its neighbors]
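The test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [4]: it assumes the aggregation function is the plain average over N(x), keeps everything in memory (so it says nothing about I/O cost), and the names `spatial_outliers`, `values`, and `neighbors` are hypothetical.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, theta=2.0):
    """Return the nodes whose spatial statistic S(x) has |z-score| > theta.

    values:    dict mapping node -> attribute value f(x)
    neighbors: dict mapping node -> list of neighboring nodes N(x)
    theta:     threshold on the z-score (the slides use 2)
    """
    # S(x) = f(x) - average of f(y) over the neighbors y of x.
    # Assumption: the neighborhood aggregate is the arithmetic mean.
    s = {x: values[x] - mean(values[y] for y in neighbors[x]) for x in values}

    mu = mean(s.values())        # mean of S over all nodes
    sigma = pstdev(s.values())   # population standard deviation of S
    if sigma == 0:
        return []                # every node matches its neighborhood

    return sorted(x for x in s if abs((s[x] - mu) / sigma) > theta)


# Ten stations on a line; station 5 reads far higher than its neighbors.
values = {i: 10 for i in range(10)}
values[5] = 50
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(spatial_outliers(values, neighbors))
```

Note that the stations next to the anomalous one also get a non-zero S(x), because the anomaly pulls their neighborhood average up; the z-score threshold is what separates them from the outlier itself.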

Spatial outlier detection: Results
1. CCAM achieves higher clustering efficiency (CE)
2. CCAM has lower I/O cost
3. Higher CE leads to lower I/O cost
4. Page size improves CE for all methods
[Figure: I/O cost and CE value for Z-order, CCAM, and Cell-Tree]

Case 2: Location Prediction
- Citations: SIAM DM Conf. 2001, SIGKDD DMKD 2000
- Problem: predict nesting sites in marshes
  - given vegetation, water depth, distance to edge, etc.
- Data: maps of nests and attributes
  - spatially clustered nests, spatially smooth attributes
- Classical methods: logistic regression, decision trees, Bayesian classifier
  - but the independence assumption is violated! They miss auto-correlation!
  - Spatial auto-regression (SAR), Markov random field Bayesian classifier
  - Open issue: spatial accuracy vs. classification accuracy
  - Open issue: performance; SAR learning is slow!

Location Prediction [6, 7, 8]
- Given:
  1. Spatial framework
  2. Explanatory functions
  3. A dependent function
  4. A family of function mappings
- Find: a function
- Objective: maximize classification_accuracy
- Constraints: spatial autocorrelation exists
- [Figure: maps of nest locations, distance to open water, vegetation durability, and water depth]

Evaluation: Changing Model
- [Figure: linear regression vs. spatial regression]
- Spatial model is better
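The model equations did not survive in this transcript. As a reminder of what is being compared, the textbook forms of the two models are given below; treating these as the forms used on the slides is an assumption. Here $y$ is the dependent variable (nest presence), $X$ holds the explanatory attributes, $W$ is a neighborhood (contiguity) matrix over the spatial framework, and $\rho$ measures the strength of spatial autocorrelation.

```latex
% Classical linear regression: observations assumed independent
y = X\beta + \varepsilon

% Spatial auto-regression (SAR): each location also depends on its neighbors
y = \rho W y + X\beta + \varepsilon
\quad\Longleftrightarrow\quad
y = (I - \rho W)^{-1}\,(X\beta + \varepsilon)
```

Setting $\rho = 0$ recovers the classical model, so SAR can only match or improve the fit when spatial autocorrelation is present. The $(I - \rho W)^{-1}$ term is one way to see why SAR learning is expensive on large spatial frameworks.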

Evaluation: Changing measure New measure:

Case 3: Spatial Association Rules
- Citation: Symp. on Spatial Databases 2001
- Problem: given a set of boolean spatial features
  - find subsets of co-located features, e.g. (fire, drought, vegetation)
  - Data: continuous space, partition not natural, no reference feature
- Classical data mining approach: association rules
  - But, Look Ma! No Transactions!!! No support measure!
- Approach: work with continuous data without transactionizing it!
  - confidence = Pr.[fire at s | drought in N(s) and vegetation in N(s)]
  - support: cardinality of spatial join of instances of fire, drought, dry veg.
  - participation: min. fraction of instances of a feature in join result
  - new algorithm using spatial joins and apriori_gen filters
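The apriori_gen filter mentioned above can be sketched as follows. This is the classical Apriori candidate-generation step applied to feature sets, shown only as an illustration; the function name and the tuple representation are assumptions, and the spatial-join step that counts instances is not shown.

```python
from itertools import combinations


def apriori_gen(prevalent):
    """Generate size-(k+1) candidate feature sets from prevalent size-k sets.

    prevalent: set of sorted tuples of feature names, all of length k.
    A candidate is kept only if every size-k subset is itself prevalent,
    which is safe when the prevalence measure is monotonic.
    """
    prevalent = set(prevalent)
    candidates = set()
    for a in prevalent:
        for b in prevalent:
            # Join step: same first k-1 features, different last feature.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune step: all size-k subsets must be prevalent.
                if all(sub in prevalent
                       for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates


pairs = {
    ("drought", "fire"),
    ("drought", "vegetation"),
    ("fire", "vegetation"),
    ("fire", "water"),
}
print(apriori_gen(pairs))
```

In this example (fire, vegetation, water) is pruned without any spatial join, because (vegetation, water) is not among the prevalent pairs.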

Co-location Patterns [2, 3]
- Can you find co-location patterns from the following sample dataset?
- [Figure: sample dataset and the answer patterns]


Co-location Patterns
- Spatial co-location: a set of features frequently co-located
- Given
  - A set T of K boolean spatial feature types, $T = \{f_1, f_2, \ldots, f_K\}$
  - A set P of N locations, $P = \{p_1, \ldots, p_N\}$, in a spatial framework S; each $p_i \in P$ is of some spatial feature in T
  - A neighbor relation R over locations in S
- Find
  - $T_c$ = the subsets of T that are frequently co-located
- Objective
  - Correctness
  - Completeness
  - Efficiency
- Constraints
  - R is symmetric and reflexive
  - Monotonic prevalence measure
- [Figure: reference feature centric, window centric, and event centric models]

Co-location Patterns
- Participation index
  1. Participation ratio $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$: fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby
  2. Participation index = $\min_i \{pr(f_i, c)\}$
- Algorithm: Hybrid Co-location Miner
- Comparison with association rules:

                                   Association rules        Co-location rules
  underlying space                 discrete sets            continuous space
  item-types                       item-types               events / Boolean spatial features
  collections                      transactions             neighborhoods
  prevalence measure               support                  participation index
  conditional probability measure  Pr.[ A in T | B in T ]   Pr.[ A in N(L) | B at L ]
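The participation ratio and participation index can be computed directly from their definitions. The sketch below is a brute-force illustration, not the Hybrid Co-location Miner: it assumes the neighbor relation R is "Euclidean distance at most d", enumerates every combination of instances instead of using spatial joins, and assumes every feature has at least one instance. The function and parameter names are hypothetical.

```python
from itertools import combinations, product
from math import dist


def row_instances(points, pattern, d):
    """Return index tuples (one instance per feature) that are pairwise neighbors.

    points:  dict mapping feature name -> list of (x, y) instance locations
    pattern: tuple of feature names forming the co-location c
    d:       neighbor distance; two instances are neighbors if dist <= d
    """
    rows = []
    for idx in product(*(range(len(points[f])) for f in pattern)):
        coords = [points[f][i] for f, i in zip(pattern, idx)]
        if all(dist(p, q) <= d for p, q in combinations(coords, 2)):
            rows.append(idx)
    return rows


def participation_index(points, pattern, d):
    """Return (participation index, dict of participation ratios per feature)."""
    rows = row_instances(points, pattern, d)
    ratios = {}
    for k, f in enumerate(pattern):
        participating = {row[k] for row in rows}  # distinct instances of f used
        ratios[f] = len(participating) / len(points[f])
    return min(ratios.values()), ratios


points = {
    "fire": [(0, 0), (5, 5), (9, 9)],
    "drought": [(0, 1), (5, 6)],
}
pi, ratios = participation_index(points, ("fire", "drought"), d=1.5)
print(pi, ratios)
```

Here two of the three fire instances have a drought instance nearby, while both drought instances have a fire instance nearby, so the index is limited by the fire ratio. Taking the minimum is what makes the measure monotonic: adding a feature to a pattern can never raise it.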

Conclusions & Future Directions
- Spatial domains may not satisfy assumptions of classical methods
  - data: auto-correlation, continuous geographic space
  - patterns: global vs. local, e.g. spatial outliers vs. outliers
  - data exploration: maps and albums
- Open issues
  - patterns: hot-spots, blobology (shape), spatial trends, ...
  - metrics: spatial accuracy (predicted locations), spatial contiguity (clusters)
  - spatio-temporal datasets
  - scale and resolution sensitivity of patterns
  - geo-statistical confidence measure for mined patterns

References
1. S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Lu, "Spatial Databases: Accomplishments and Research Needs", IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb.
2. S. Shekhar and Y. Huang, "Discovering Spatial Co-location Patterns: a Summary of Results", in Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July.
3. S. Shekhar, Y. Huang, and H. Xiong, "Performance Evaluation of Co-location Miner", the IEEE International Conference on Data Mining (ICDM'01), Nov. (submitted)
4. S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications", the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
5. S. Shekhar, S. Chawla, the book "Spatial Database: Concepts, Implementation and Trends". (To be published in 2001)
6. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations", Proc. Int. Conf. on 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14.
7. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Modeling Spatial Dependencies for Mining Geospatial Data", First SIAM International Conference on Data Mining.
8. S. Shekhar, P.R. Schrater, R.R. Vatsavai, W. Wu, and S. Chawla, "Spatial Contextual Classification and Prediction Models for Mining Geospatial Data", IEEE Transactions on Multimedia. (Submitted)

Some papers are available on the Web sites: