Spatial Data Mining. Satoru Hozumi, CS 157B.

Presentation transcript:

Spatial Data Mining Satoru Hozumi CS 157B

Learning Objectives
- Understand the concept of spatial data mining
- Learn techniques for finding spatial patterns

Examples of Spatial Patterns
- 1854 Asiatic cholera outbreak in London: a water pump was identified as the source.
- Cancer clusters, used to investigate health hazards.
- Crime hot spots, used to plan police patrol routes.
- Effects on weather in the US caused by unusual warming of the Pacific Ocean (El Niño).

What is a Spatial Pattern?
- What is not a pattern?
  - Random, haphazard, chance, stray, accidental, unexpected.
  - Without definite direction, trend, rule, method, design, aim, purpose.
- What is a pattern?
  - A frequent arrangement, configuration, composition, regularity.
  - A rule, law, method, design, description.
  - A major direction, trend, prediction.

Defining Spatial Data Mining
- Search for spatial patterns.
- Non-trivial search, as “automated” as possible.
  - Large search space of plausible hypotheses
  - Ex. Asiatic cholera: candidate causes include water, food, air, insects.
- Interesting, useful, and unexpected spatial patterns.
  - Useful in a certain application domain
    - Ex. Shutting off the identified water pump => saved human lives.
  - May provide a new understanding of the world
    - Ex. The connection between the water pump and cholera led to the “germ” theory.

What is NOT Spatial Data Mining
- Simple querying of spatial data
  - Finding neighbors of Canada given names and boundaries of all countries (search space not large)
- Uninteresting or obvious patterns
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
  - Common knowledge: nearby places have similar rainfall
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in evenings

Families of Spatial Data Mining Patterns
- Location prediction
  - Where will a phenomenon occur?
- Spatial interactions
  - Which subset of spatial phenomena interact?
- Hot spot
  - Which locations are unusual or share commonalities?

Location Prediction
- Where will a phenomenon occur?
- Which spatial events are predictable?
- How can a spatial event be predicted from other spatial events?
- Examples
  - Where will an endangered bird nest?
  - Which areas are prone to fire given maps of vegetation and drought?
  - What should be recommended to a traveler in a given location?

Spatial Interactions
- Which spatial events are related to each other?
- Which spatial phenomena depend on other phenomena?
- Examples
  - Earth science: climate and disturbance => {wild fires, hot, dry, lightning}
  - Epidemiology: disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}

Hot spots
- Is a phenomenon spatially clustered?
- Which spatial entities are unusual or share common characteristics?
- Examples
  - Crime hot spots to plan police patrols

Spatial Queries
- Spatial range queries
  - Find all cities within 50 miles of Paris
  - Query has an associated region (location, boundary)
  - Answer includes overlapping or contained data regions
- Nearest-neighbor queries
  - Find the 10 cities nearest to Paris
  - Results must be ordered by proximity
- Spatial join queries
  - Find all cities near a lake
  - Join condition involves regions and proximity.
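The three query types can be sketched with brute-force distance checks. This is only an illustration: the city and lake names, the planar coordinates (in miles), and the function names are all hypothetical, and a real spatial database would typically use a spatial index rather than scanning every point.

```python
import math

# Hypothetical planar coordinates in miles (illustrative data only).
cities = {"A": (0.0, 0.0), "B": (30.0, 40.0), "C": (60.0, 80.0), "D": (10.0, 0.0)}
lakes = {"L1": (12.0, 1.0), "L2": (200.0, 200.0)}


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(name for name, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest names, ordered by proximity."""
    return sorted(points, key=lambda name: dist(points[name], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: pairs (a, b) whose locations lie within `max_dist`."""
    return [
        (a, b)
        for a, pa in sorted(left.items())
        for b, pb in sorted(right.items())
        if dist(pa, pb) <= max_dist
    ]
```

With this data, `range_query(cities, (0.0, 0.0), 50.0)` returns the cities within 50 miles of the origin, `nearest_neighbors(cities, (0.0, 0.0), 2)` returns the two closest in order, and `spatial_join(cities, lakes, 5.0)` pairs each city with any lake within 5 miles.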

Unique Properties of Spatial Patterns
- Items in traditional data are independent of each other, whereas properties of locations in a map are often “auto-correlated” (patterns exist)
- Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex
- Items in traditional data describe discrete objects, whereas spatial data is continuous

Association Rules
- Support = the number of times a rule shows up in a database, often reported as a fraction of all records
- Confidence = conditional probability of Y given X
- Example
  - (Bedrock type = limestone), (soil depth = low) => (sink hole risk = high)
  - Support = 20%, confidence = 0.8
  - Interpretation: locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
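A minimal sketch of how support and confidence could be computed for a rule X => Y. The 20 location records below are invented so that the numbers match the slide's example (support 20%, confidence 0.8); the attribute labels and the function name `rule_stats` are assumptions for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of records containing both X and Y
    confidence = P(Y | X) = count(X and Y) / count(X)
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_x = sum(1 for r in records if antecedent <= r)
    n_xy = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_xy / len(records)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical data: each record is the set of attributes observed at one location.
records = (
    [{"bedrock=limestone", "soil_depth=low", "sinkhole_risk=high"}] * 4
    + [{"bedrock=limestone", "soil_depth=low", "sinkhole_risk=low"}] * 1
    + [{"bedrock=granite", "soil_depth=high", "sinkhole_risk=low"}] * 15
)

support, confidence = rule_stats(
    records,
    antecedent={"bedrock=limestone", "soil_depth=low"},
    consequent={"sinkhole_risk=high"},
)
```

Here 4 of the 20 records satisfy the whole rule, and 4 of the 5 records that satisfy the antecedent also satisfy the consequent.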

Apriori Algorithm to mine association rules
- Key challenge
  - Very large search space
- Key assumption
  - Few associations have support above the given threshold
  - Associations with low support are not interesting
- Key insight
  - If an association item set has high support, then so do all its subsets
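The key insight can be turned into a level-wise search: a candidate item set of size k is only counted if every one of its subsets of size k-1 was already found frequent. The following is a compact sketch of that frequent-item-set step, not a tuned implementation; the function name, the toy transactions, and the threshold are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all item sets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the threshold.
        level = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                level[c] = supp
        frequent.update(level)
        k += 1
        # Join step: merge frequent sets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical transactions over items "a", "b", "c".
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=0.5)
```

With a threshold of 0.5, all single items and all pairs are frequent, while {a, b, c} appears in only 2 of 5 transactions and is rejected.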

Association rules Example

Techniques for Association Mining
- Classical method
  - Association rules given item types and transactions
  - Assumes spatial data can be decomposed into transactions
  - Such decomposition may alter spatial patterns
- New spatial methods
  - Spatial association rule
  - Spatial co-location

Associations, Spatial associations, co-location

Associations, Spatial associations, co-location

Co-location Rules
- For point data in space
- Do not need transactions; work directly with continuous space
- Use a neighborhood definition and spatial joins
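A minimal sketch of the idea, assuming a distance-threshold neighborhood and two feature types. The spatial join finds neighboring instance pairs, and the pair is scored with a participation index (the smaller of the two fractions of instances that take part in at least one neighboring pair). That measure is one common choice from the co-location literature and is an assumption here, as are the coordinates and function names.

```python
import math


def neighbor_pairs(points_a, points_b, d):
    """Spatial join: index pairs (i, j) with points_a[i] within distance d of points_b[j]."""
    return [
        (i, j)
        for i, p in enumerate(points_a)
        for j, q in enumerate(points_b)
        if math.hypot(p[0] - q[0], p[1] - q[1]) <= d
    ]


def participation_index(points_a, points_b, d):
    """Minimum, over both feature types, of the fraction of instances with a neighbor."""
    pairs = neighbor_pairs(points_a, points_b, d)
    ratio_a = len({i for i, _ in pairs}) / len(points_a)
    ratio_b = len({j for _, j in pairs}) / len(points_b)
    return min(ratio_a, ratio_b)


# Hypothetical point locations for two feature types.
feature_a = [(0, 0), (5, 5), (10, 10), (50, 50)]
feature_b = [(0, 1), (5, 6), (30, 30)]
pi = participation_index(feature_a, feature_b, d=2.0)
```

No transactions are built at any point; the neighborhood relation alone determines which instances count as occurring together.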

Co-location rules

Clustering
- Process of discovering groups in large databases
- Spatial view: rows in a database = points in a multidimensional space.
- Visualization may reveal interesting groups

Clustering
- Hierarchical
  - Start with all points in one cluster
  - Split and merge until a stop criterion is reached
- Partitional
  - Start with random central points
  - Assign points to the nearest central point
  - Update the central points
  - Approach with statistical rigor
- Density
  - Find clusters based on density of regions
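The partitional steps (random central points, assign, update) correspond to the k-means procedure. A small sketch follows, assuming 2-D points, Euclidean distance, and a seeded random start so the run is repeatable; the function name and data are illustrative.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partitional clustering: returns (centers, clusters) for 2-D points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start with random central points
    clusters = []
    for _ in range(iterations):
        # Assign each point to its nearest central point.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]),
            )
            clusters[nearest].append(p)
        # Update each central point to the mean of its cluster.
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: assignments will not change
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(points, k=2)
```

On data this cleanly separated the result does not depend on the starting points; on real data, different random starts can give different clusterings, which is why the procedure is often run several times.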

Outliers
- Observations inconsistent with the rest of the dataset
- Observations inconsistent with their neighborhoods
- A local instability or discontinuity

Variogram Cloud
- Create a variogram cloud by plotting the attribute difference against the distance for each pair of points
- Select points common to many outlying pairs
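As a rough sketch, the cloud can be built by enumerating every pair of points, and "outlying pairs" can be taken as pairs that are close together yet differ strongly in attribute value. The absolute difference, the two thresholds, and the sample data below are assumptions chosen for illustration; other formulations use a squared or square-root difference.

```python
import math
from collections import Counter
from itertools import combinations


def variogram_cloud(locations, values):
    """One (i, j, distance, attribute difference) tuple per pair of points."""
    return [
        (
            i,
            j,
            math.hypot(locations[i][0] - locations[j][0],
                       locations[i][1] - locations[j][1]),
            abs(values[i] - values[j]),
        )
        for i, j in combinations(range(len(locations)), 2)
    ]


def outlier_candidates(locations, values, max_dist, min_diff):
    """Count, per point, how many outlying pairs (near but dissimilar) it belongs to."""
    counts = Counter()
    for i, j, d, diff in variogram_cloud(locations, values):
        if d <= max_dist and diff >= min_diff:
            counts[i] += 1
            counts[j] += 1
    return counts.most_common()


# Hypothetical data: five locations on a line, one with an unusual value.
locations = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
values = [10, 11, 30, 12, 11]
ranked = outlier_candidates(locations, values, max_dist=1.0, min_diff=10)
```

The point that appears in the most outlying pairs comes first in `ranked`; here that is index 2, which takes part in both outlying pairs.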

Moran Scatter Plot
- Plot the normalized attribute value against the weighted average in the neighborhood for each location
- Select points in the upper left and lower right quadrants
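A sketch of the quadrant test, assuming z-score normalization and equal weights for all neighbors of a location (a row-normalized weight matrix). The neighbor lists, data, and function names are hypothetical.

```python
import statistics


def moran_scatter(values, neighbors):
    """Return (z, lag): normalized values and the neighborhood average of z."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    lag = [statistics.fmean([z[j] for j in neighbors[i]]) for i in range(len(values))]
    return z, lag


def moran_outliers(values, neighbors):
    """Indices in the upper-left (z < 0, lag > 0) or lower-right (z > 0, lag < 0) quadrant."""
    z, lag = moran_scatter(values, neighbors)
    return [i for i in range(len(values)) if z[i] * lag[i] < 0]


# Hypothetical data: five locations on a line, each adjacent to its immediate neighbors.
values = [10, 11, 30, 12, 11]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
flagged = moran_outliers(values, neighbors)
```

In this example the high value at index 2 lands in the lower right quadrant, and its two neighbors land in the upper left quadrant because the high value pulls their neighborhood averages up. The flagged set therefore usually needs a second look to separate the actual outlier from the locations next to it.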

Scatter plot
- Plot the normalized attribute value against the weighted average in the neighborhood for each location
- Fit a linear regression line
- Select points that are unusually far from the regression line.
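One way to make "unusually far" concrete is to fit an ordinary least-squares line and flag points whose residual exceeds a multiple of the residual standard deviation. The factor of 2, the function name, and the sample values are assumptions for this sketch.

```python
import statistics


def regression_outliers(x, y, threshold=2.0):
    """Indices whose residual from the least-squares line exceeds threshold * std dev."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Ordinary least-squares slope and intercept.
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical data: x is the attribute value at each location,
# y is the average attribute value over that location's neighborhood.
attribute = [1, 2, 3, 4, 5, 6, 7, 8]
neighborhood_avg = [1, 2, 3, 12, 5, 6, 7, 8]
outliers = regression_outliers(attribute, neighborhood_avg)
```

Only the location whose neighborhood average departs sharply from the fitted trend (index 3) is flagged. A single extreme point also tilts the fitted line, so a robust fit may be preferable on noisier data.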

Conclusion
- Patterns are the opposite of random
- Common spatial patterns:
  - Location prediction
  - Feature interaction
  - Hot spot
- Spatial patterns may be discovered using:
  - Techniques like associations, clustering and outlier detection