Brief Introduction to Spatial Data Mining


Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets.
Reading Material:
Spatial Statistics Software:

Ch. Eick: Spatial Data Mining (inspired by a talk given at UH by Shashi Shekhar (UMN))

Examples of Spatial Patterns
Historic Examples (section 7.1.5, p. 186)
- 1855 Asiatic Cholera in London: a water pump identified as the source
- Fluoride and healthy gums near Colorado river
- Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle
Modern Examples
- Cancer clusters to investigate environmental health hazards
- Crime hotspots for planning police patrol routes
- Bald eagles nest on tall trees near open water
- West Nile virus spreading from the north-east USA to the south and west
- Unusual warming of the Pacific Ocean (El Nino) affects weather in the USA

Why Learn about Spatial Data Mining?
Two basic reasons for new work
- Consideration of use in certain application domains
- Provide fundamental new understanding
Application domains
- Scale up secondary spatial (statistical) analysis to very large datasets
  - Describe/explain locations of human settlements in the last 5000 years
  - Find cancer clusters to locate hazardous environments
  - Prepare land-use maps from satellite imagery
  - Predict habitat suitable for endangered species
- Find new spatial patterns
  - Find groups of co-located geographic features

Why Learn about Spatial Data Mining? - 2
New understanding of geographic processes for critical questions
- Ex. How is the health of planet Earth?
- Ex. Characterize effects of human activity on environment and ecology
- Ex. Predict effect of El Nino on weather and economy
Traditional approach: manually generate and test hypotheses
- But spatial data is growing too fast to analyze manually
  - Satellite imagery, GPS tracks, sensors on highways, …
- Number of possible geographic hypotheses is too large to explore manually
  - Large number of geographic features and locations
  - Number of interacting subsets of features grows exponentially
  - Ex. Find teleconnections between weather events across ocean and land areas
SDM may reduce the set of plausible hypotheses
- Identify hypotheses supported by the data
- For further exploration using traditional statistical methods

Autocorrelation
Items in traditional data are assumed to be independent of each other, whereas properties of locations in a map are often "auto-correlated".
First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things.
- People with similar backgrounds tend to live in the same area
- Economies of nearby regions tend to be similar
- Changes in temperature occur gradually over space (and time)
Waldo Tobler in 2000
Papers on "Laws in Geography":

Characteristics of Spatial Data Mining
- Autocorrelation
- Patterns usually have to be defined in the spatial attribute subspace and not in the complete attribute space
- Longitude and latitude (or other coordinate systems) are the glue that links different data collections together
- People are used to maps in GIS; therefore, data mining results have to be summarized on top of maps
- Patterns not only refer to points, but can also refer to lines, polygons, or other higher-order geometrical objects
- Patterns exist at different levels of granularity
- Large number of patterns, large dataset sizes
- Spatial patterns, e.g. spatial clusters, can have arbitrary shapes
- Regional knowledge is of particular importance due to the lack of global knowledge in geography (spatial heterogeneity)

Why Is Regional Knowledge Important in Spatial Data Mining?
A special challenge in spatial data mining is that information is usually not uniformly distributed in spatial datasets.
It has been pointed out in the literature that "whole map statistics are seldom useful", that "most relationships in spatial data sets are geographically regional, rather than global", and that "there is no average place on the Earth's surface" [Goodchild03, Openshaw99].
Therefore, it is not surprising that domain experts are mostly interested in discovering hidden patterns at a regional scale rather than a global scale.
Michael Frank Goodchild

Spatial Autocorrelation: Distance-based measure
K-function definition
- Test against randomness for point pattern
- $K(h) = \lambda^{-1} E[\text{number of events within distance } h \text{ of an arbitrary event}]$
- $\lambda$ is the intensity of events
- Model departure from randomness in a wide range of scales
Inference
- For Poisson complete spatial randomness (CSR): $K(h) = \pi h^2$
- Plot $\hat{K}(h)$ against $h$, compare to Poisson CSR
  - >: cluster
  - <: decluster/regularity
[Figure: K-Function based Spatial Autocorrelation]
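The estimate of K(h) that the slide compares against the Poisson CSR value can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: it assumes a rectangular study region of known `area`, estimates the intensity as n / area, and ignores edge correction, which real spatial statistics software normally applies. The names `ripley_k` and `compare_to_csr` are hypothetical.

```python
from math import dist, pi


def ripley_k(points, h, area):
    """Naive estimate of K(h) with no edge correction (an assumption).

    K_hat(h) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= h}
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and dist(p, q) <= h
    )
    return area * close_pairs / (n * (n - 1))


def compare_to_csr(points, h, area):
    """Compare K_hat(h) with the CSR value pi * h^2, as on the slide."""
    k_hat = ripley_k(points, h, area)
    k_csr = pi * h ** 2
    if k_hat > k_csr:
        return "cluster"
    if k_hat < k_csr:
        return "decluster/regularity"
    return "consistent with CSR"


# Four points packed into a corner of a 10 x 10 region.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
print(compare_to_csr(tight, h=0.2, area=100.0))
```

A raw comparison like this says nothing about statistical significance; in practice the observed curve is usually compared with an envelope obtained from simulated random patterns.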

Basic Approach Using K-Functions

Example: Collocation of Red and Green Objects
FOR radii r_1, …, r_n DO
    FOR all green objects g DO
        Compute #-of-red objects within radius r_j of g
    ENDDO
    Compute average ro_j of values observed in previous loop
    Put entry (r_j, (ro_j / total_number_of_red_objects)) into Curve
ENDDO
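The loop on this slide translates almost directly into Python. The sketch below assumes objects are 2-D points given as (x, y) tuples and that "within radius" means Euclidean distance less than or equal to r; the function name `colocation_curve` is hypothetical.

```python
from math import dist


def colocation_curve(green, red, radii):
    """For each radius r, average over green objects the number of red
    objects within r, then normalise by the total number of red objects."""
    if not green or not red:
        raise ValueError("both object sets must be non-empty")
    curve = []
    for r in radii:
        # Number of red objects within radius r of each green object.
        counts = [sum(1 for q in red if dist(g, q) <= r) for g in green]
        average = sum(counts) / len(counts)
        curve.append((r, average / len(red)))
    return curve


green = [(0.0, 0.0)]
red = [(1.0, 0.0), (3.0, 0.0)]
print(colocation_curve(green, red, [0.5, 1, 5]))
# [(0.5, 0.0), (1, 0.5), (5, 1.0)]
```

Each curve value is the fraction of all red objects that an average green object sees within radius r, so the curve rises from 0 to 1 as r grows; a curve that rises quickly at small radii suggests that red objects tend to sit near green ones.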

Associations, Spatial Associations, Co-location
Find patterns from the following sample dataset? [sample dataset figure not preserved in transcript]
Answers: … and … [answers not preserved in transcript]

Illustration of Cross-Correlation
[Figure: Cross K-function for example data]

Colocation Rules – Spatial Interest Measures

Cross-Correlation
Cross K-function definition
- $K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$
- Cross K-function of some pair of spatial feature types
Example
- Which pairs are frequently co-located
- Statistical significance
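A cross K-function estimate follows the bracketed definition on this slide: count type j events near each type i event, average over the type i events, and divide by the intensity of type j. The sketch below is a rough illustration under stated assumptions: the intensity of type j is estimated as n_j / area, no edge correction is applied, and the name `cross_k` is hypothetical.

```python
from math import dist, pi


def cross_k(type_i, type_j, h, area):
    """Naive cross-K estimate: area / (n_i * n_j) times the number of
    (i, j) pairs whose distance is at most h. No edge correction."""
    n_i, n_j = len(type_i), len(type_j)
    if n_i == 0 or n_j == 0:
        raise ValueError("both event sets must be non-empty")
    close_pairs = sum(1 for p in type_i for q in type_j if dist(p, q) <= h)
    return area * close_pairs / (n_i * n_j)


type_i = [(0.0, 0.0)]
type_j = [(1.0, 0.0), (3.0, 0.0)]
k_hat = cross_k(type_i, type_j, h=1.0, area=10.0)
# If the two types were spatially independent, the expected value would be pi * h^2.
print(k_hat, pi * 1.0 ** 2)
```

Values well above pi * h^2 suggest that the two feature types attract each other at scale h; judging whether the difference is statistically significant needs a separate test, for example against simulated patterns.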

Spatial Association Rules
- A special reference spatial feature
- Transactions are defined around instances of the special spatial feature
- Item-types = spatial predicates
- Example: Table 7.5 (p. 204)
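To make the idea of reference-feature-centric transactions concrete, the following sketch builds one transaction per instance of the reference feature and then scores a rule with the usual support and confidence. It is an illustration only: the single predicate `close_to(...)`, the distance threshold, and the sample coordinates are assumptions, not the contents of Table 7.5.

```python
from math import dist


def build_transactions(reference_points, features, radius):
    """One transaction per reference instance; items are spatial predicates
    of the (assumed) form close_to(<feature>)."""
    transactions = []
    for ref in reference_points:
        items = frozenset(
            f"close_to({name})"
            for name, points in features.items()
            if any(dist(ref, p) <= radius for p in points)
        )
        transactions.append(items)
    return transactions


def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    matching = [t for t in transactions if antecedent <= t]
    both = [t for t in matching if consequent <= t]
    support = len(both) / len(transactions)
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence


towns = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]  # hypothetical reference feature
features = {
    "water": [(0.0, 1.0), (10.0, 1.0)],
    "highway": [(0.0, -1.0), (20.0, 1.0)],
}
transactions = build_transactions(towns, features, radius=2.0)
print(support_confidence(
    transactions,
    frozenset({"close_to(water)"}),
    frozenset({"close_to(highway)"}),
))
```

Once the transactions exist, any standard association rule miner can be run on them, which is what makes this formulation convenient; the cost is that results depend on the choice of reference feature.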

Co-location rules vs. traditional association rules

                                 Association rules          Co-location rules
Underlying space                 discrete sets              continuous space
Item-types                       item-types                 events / Boolean spatial features
Collection                       Transaction (T)            Neighborhood (N)
Prevalence measure               support                    participation index
Conditional probability metric   Pr.[ A in T | B in T ]     Pr.[ A in N(L) | B at location L ]

Participation index = $\min_i \{ pr(f_i, c) \}$
where $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$ is the fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby.
N(L) = neighborhood of location L
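The participation index can be computed by brute force for small examples. The sketch below assumes that "nearby" means every pair of instances in a co-location instance lies within a fixed Euclidean `radius` (a clique neighborhood), and that every feature has at least one instance; the function name and sample data are hypothetical. Enumerating all combinations is exponential in the number of features, so this only illustrates the measure and is not a practical mining algorithm.

```python
from itertools import combinations, product
from math import dist


def participation_index(instances, colocation, radius):
    """instances: dict mapping feature name -> list of (x, y) points.
    colocation: list of feature names.
    Returns (participation index, dict of participation ratios)."""
    participating = {f: set() for f in colocation}
    index_ranges = [range(len(instances[f])) for f in colocation]
    # Try every combination that takes one instance from each feature.
    for combo in product(*index_ranges):
        points = [instances[f][i] for f, i in zip(colocation, combo)]
        if all(dist(p, q) <= radius for p, q in combinations(points, 2)):
            for f, i in zip(colocation, combo):
                participating[f].add(i)
    ratios = {
        f: len(participating[f]) / len(instances[f]) for f in colocation
    }
    return min(ratios.values()), ratios


instances = {
    "A": [(0.0, 0.0), (10.0, 10.0)],
    "B": [(0.0, 1.0), (0.0, -1.0), (50.0, 50.0)],
}
index, ratios = participation_index(instances, ["A", "B"], radius=1.5)
print(index, ratios)
```

In this example one of the two A instances and two of the three B instances take part in a co-location instance, so the participation ratios are 1/2 and 2/3 and the index is their minimum, 1/2.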

Conclusions: Spatial Data Mining
- Spatial patterns are the opposite of random
- Common spatial patterns: location prediction, feature interaction, hot spots, geographically referenced statistical patterns, co-location, emergent patterns, …
- SDM = search for unexpected interesting patterns in large spatial databases
- Spatial patterns may be discovered using techniques like classification, associations, clustering, and outlier detection
- New techniques are needed for SDM due to
  - Spatial autocorrelation
  - Importance of non-point data types (e.g. polygons)
  - Continuity of space
  - Regional knowledge; also establishes a need for scoping
  - Separation between spatial and non-spatial subspace (in traditional approaches, clusters are usually defined over the complete attribute space)
- Knowledge sources are available now
  - Raw knowledge to perform spatial data mining is mostly available online now (e.g. relational databases, Google Earth)
  - GIS tools are available that facilitate integrating knowledge from different sources

Spatial Regression
Spatial Regression.pptx

Example Videos Discussing Spatial Analysis
- What is GIS?
- (Geographically weighted regression software advertisement video)
- (Spatial Analysis and Remote Sensing Degree at UA)
- ArcGIS Spatial Analyst Overview http://
- (ArcGIS 9.3: Advanced planning and analysis - Part 1)
- (Example using Spatial Analysis to Analyze Medical Data; the video is not really that "great"; if you know a better one, share it with us!)
- ACM GIS Conference, discusses advances in Geographical Information Systems and related areas
- Houston Area GIS Day Nov. 10, 2011