Introduction to Spatial Data Mining 강홍구 2003-08-23 Database Laboratory CHAPTER 7.

Presentation transcript:

Introduction to Spatial Data Mining 강홍구 Database Laboratory CHAPTER 7

Overview
Our focus
  – Understanding spatial data mining
  – The process of discovering interesting and useful patterns of information in large databases
Defining Spatial Data Mining
  – Search for spatial patterns
  – Non-trivial search: as "automated" as possible, to reduce human effort
  – Interesting, useful, and unexpected spatial patterns
Important steps in the data mining process
  – Data extraction, data cleaning, feature selection, algorithm design and tuning, output analysis

Pattern discovery
A pattern can be a summary statistic
  – Mean, median, standard deviation of a dataset
The promise of data mining is the ability to rapidly and automatically search for local and potentially high-utility patterns using computer algorithms
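
As a small illustration of a pattern in the form of a summary statistic, the sketch below computes the mean, median, and standard deviation of one attribute. The sample values and the function name `summarize` are made up for this example, and the population standard deviation is an assumed choice.

```python
import statistics

# Hypothetical sample: one attribute value per spatial location.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def summarize(data):
    """Return the mean, median, and population standard deviation of data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.pstdev(data),
    }


summary = summarize(values)
print(summary)
```

Such global summaries ignore where the values occur. The local patterns mentioned on the slide need location to be taken into account.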

The Data-Mining Process

The Data-Mining Process (cont)
Domain Expert (DE)
  – Identifies SDM goals and the spatial dataset
  – Describes domain knowledge
Data Mining Analyst (DMA)
  – Helps identify pattern families and the SDM techniques to be used
  – Explains the SDM outputs to the Domain Expert
Joint effort
  – Feature selection
  – Selection of patterns for further exploration

Statistics and Data Mining
Data mining acts as a filter step before the application of rigorous statistical tools
  – Ex.: MBR (minimum bounding rectangle) search in an R-tree

Data Mining as a Search Problem
A data mining algorithm searches a potentially large space of patterns to come up with candidate patterns
Restricting the search space is not completely unjustified

Unique Feature of Spatial Data Mining
"Everything is related to everything else, but nearby things are more related than distant things" (Tobler's first law of geography)

Famous Historical Examples of Spatial Data Exploration
  – 1855, Asiatic cholera in London: a water pump identified as the source
  – Fluoride and healthy gums near the Colorado River
  – Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle

An Illustrative Application Domain
  – Scale up secondary spatial (statistical) analysis to very large datasets
      The attributes related to red-winged blackbird nest locations are diverse
  – Find new spatial patterns
      Find groups of co-located geographic features

Measures of Spatial Form and Auto-correlation
Space can be divided into continuous space and discrete space
  – Continuous space is identified by coordinates, discrete space by objects
Moran's I: A Global Measure of Spatial Autocorrelation
  – Spatial autocorrelation: a concept from spatial statistics, an area within statistics devoted to the analysis of spatial data
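
The slide names Moran's I without giving a formula. A minimal sketch, assuming the standard definition I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i is the deviation of value i from the mean and W is the sum of all weights, could look like this. The binary "chain" contiguity matrix and the two sample sequences are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    squared = sum(d * d for d in z)
    return (n / total_weight) * (cross / squared)


def chain_weights(n):
    """Assumed neighborhood: n cells in a row, adjacent cells are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 9, 9, 9]    # similar values sit next to each other
alternating = [1, 9, 1, 9, 1, 9]  # neighbors always differ

print(morans_i(clustered, w))
print(morans_i(alternating, w))
```

The clustered sequence yields a positive value (similar values are neighbors) and the alternating sequence a negative one, which is how the sign of I is usually read.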

Spatial Statistical Model
Statistical models are often used to represent observations in terms of random variables
  – Point process
      A model for the spatial distribution of the points in a point pattern
      Ex.: trees in a forest, gas stations in a city
  – Lattices
  – Geostatistics

The Data-Mining Trinity
Location Prediction
  – Questions addressed
      Where will a phenomenon occur?
      Which spatial events are predictable?
      How can a spatial event be predicted from other spatial events?
  – Equations, rules, other methods…
  – Examples:
      Where will an endangered bird nest?
      Which areas are prone to fire given maps of vegetation, climate, etc.?

The Data-Mining Trinity (cont)
Spatial Interactions
  – Questions addressed
      Which spatial events are related to each other?
      Which spatial phenomena depend on other phenomena?
  – Examples:
      Predator-prey species, e.g. wolves, deer
      Symbiotic species, e.g. bees, flowering plants
      Event causation, e.g. vegetation, drought, fire

The Data-Mining Trinity (cont)
Hot spots
  – Questions addressed
      Is a phenomenon spatially clustered?
      Which spatial entities or clusters are unusual?
      Which spatial entities share common characteristics?
  – Examples:
      Cancer clusters [CDC] to launch investigations
      Crime hot spots to plan police patrols
  – Defining unusual
      Comparison group: neighborhood, entire population
      Significance: probability of being unusual is high

Classification techniques
Classification is the task of finding a function
  – f : D -> L
  – D is the domain of f, and L is the set of labels
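
To make f : D -> L concrete, here is a minimal sketch that builds such a function with a 1-nearest-neighbor rule. This is one assumed classifier among many, not the method the slides prescribe. The training points, the labels "nest" and "no-nest", and the helper name `train_nearest_neighbor` are hypothetical.

```python
import math


def train_nearest_neighbor(examples):
    """Build f : D -> L from labeled examples [(point, label), ...]."""

    def f(point):
        # Return the label of the training point closest to `point`.
        _, nearest_label = min(
            examples, key=lambda ex: math.dist(ex[0], point)
        )
        return nearest_label

    return f


# D: 2-D feature vectors; L: {"nest", "no-nest"} (made-up data).
training = [
    ((0.0, 0.0), "no-nest"),
    ((0.5, 1.0), "no-nest"),
    ((5.0, 5.0), "nest"),
    ((6.0, 4.5), "nest"),
]
f = train_nearest_neighbor(training)

held_out = [((0.2, 0.4), "no-nest"), ((5.5, 5.0), "nest")]
accuracy = sum(f(p) == label for p, label in held_out) / len(held_out)
print(accuracy)
```

This sketch treats every example as independent, so it ignores spatial autocorrelation among neighboring locations.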

Mapping Techniques to Spatial Pattern Families
Overview
  – There are many techniques to find a spatial pattern family
  – Choice of technique depends on feature selection, spatial data, etc.
Spatial pattern families vs. techniques
  – Location prediction: classification, function determination
  – Interaction: correlation, association
  – Hot spots: clustering, outlier detection
We discuss these techniques now
  – With emphasis on spatial problems
  – Even though these techniques apply to non-spatial datasets too

Location Prediction as a classification problem
Given:
  – 1. Spatial framework
  – 2. Explanatory functions
  – 3. A dependent class
  – 4. A family of function mappings
Find: Classification model
Objective: maximize classification_accuracy
Constraints: Spatial autocorrelation exists

Association rule discovery techniques
Association rules are patterns of the form X -> Y
  – Ex.: Diapers -> beer
An association rule is characterized by two parameters: support and confidence
  – For a rule A => B:
      Support: A and B occur together in at least s percent of the transactions
      Confidence: of all the transactions in which A occurs, at least c percent also contain B
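
Support and confidence can be computed directly from a transaction list. The sketch below does so for the rule diapers -> beer on a made-up set of five transactions. The function names and the data are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Here diapers and beer appear together in 2 of 5 transactions, and 2 of the 3 transactions with diapers also contain beer. A rule would be reported only if both values clear the chosen thresholds s and c.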

Associations, Spatial associations, Co-location
Can you find patterns in the following sample dataset?

Associations, Spatial associations, Co-location (cont) Answers: and

Clustering
Clustering
  – Process of discovering groups in large databases
  – Spatial view: rows in a database = points in a multi-dimensional space
Categories of clustering algorithms
  – Hierarchical clustering method
  – Partitional clustering algorithm
  – Density-based clustering algorithm
  – Grid-based clustering algorithm
New spatial methods
  – Comparison with complete spatial random processes
  – Neighborhood EM

Algorithmic Ideas in Clustering
Hierarchical
  – Start with all points in one cluster
  – Then split and merge until a stopping criterion is reached
Partitioning based
  – Start with random central points
  – Assign points to the nearest central point
  – Update the central points
  – Approach with statistical rigor
Density
  – Find clusters based on the density of regions
Grid-based
  – Quantize the clustering space into a finite number of cells
  – Use thresholding to pick high-density cells
  – Merge neighboring cells to form clusters
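
The three grid-based steps (quantize, threshold, merge) can be sketched as follows. This is an illustrative toy under stated assumptions, not a specific published algorithm: cells are squares of side `cell_size`, a cell is dense when it holds at least `min_points` points, and dense cells are merged when they touch, including diagonally. All names and data are hypothetical.

```python
from collections import Counter, deque


def grid_clusters(points, cell_size, min_points):
    """Return clusters as sets of dense grid cells."""
    # 1. Quantize: map each point to the grid cell that contains it.
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    # 2. Threshold: keep only the cells with enough points.
    unvisited = {cell for cell, c in counts.items() if c >= min_points}
    # 3. Merge: breadth-first search over adjacent dense cells.
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster = {start}
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in unvisited:
                        unvisited.remove(neighbor)
                        cluster.add(neighbor)
                        queue.append(neighbor)
        clusters.append(cluster)
    return clusters


points = [
    (0.1, 0.2), (0.4, 0.7), (0.8, 0.3),  # cell (0, 0)
    (1.2, 0.5), (1.6, 0.1), (1.9, 0.9),  # cell (1, 0), next to (0, 0)
    (5.1, 5.2), (5.5, 5.5), (5.9, 5.1),  # cell (5, 5), far away
    (3.3, 8.8),                          # lone point, below the threshold
]
clusters = grid_clusters(points, cell_size=1.0, min_points=3)
print(clusters)
```

The result depends on the cell size and the density threshold, so both would need tuning on real data.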

Idea of Outliers
What is an outlier?
  – Observations inconsistent with the rest of the dataset
  – Techniques for global outliers
      Statistical tests based on membership in a distribution: Pr.[item in population] is low
      Non-statistical tests based on distance, nearest neighbors, convex hull, etc.
What is a spatial outlier?
  – Observations inconsistent with their neighborhoods
  – A local instability or discontinuity
New techniques for spatial outliers
  – Graphical: variogram cloud, Moran scatterplot
  – Algebraic: scatterplot, Z(S(x))
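
The slide lists Z(S(x)) without defining it. One common reading, assumed here, takes S(x) as the attribute value at x minus the average over x's neighbors and Z as the standardized S. A location is flagged when |Z| exceeds a threshold. The threshold of 2.0, the chain-shaped neighborhood, and the data are hypothetical.

```python
import statistics


def spatial_outliers(values, neighbors, threshold=2.0):
    """Flag locations whose standardized neighborhood difference is large.

    values:    {location: attribute value}
    neighbors: {location: list of neighboring locations}
    """
    # S(x): value at x minus the mean value over x's neighbors.
    s = {
        x: values[x] - statistics.mean([values[n] for n in neighbors[x]])
        for x in values
    }
    mu = statistics.mean(list(s.values()))
    sigma = statistics.pstdev(list(s.values()))
    # Z(S(x)) = (S(x) - mu) / sigma; keep locations beyond the threshold.
    return [x for x in values if abs((s[x] - mu) / sigma) > threshold]


# Ten locations along a line; location 4 differs sharply from its neighbors.
readings = [10.0, 11.0, 10.0, 12.0, 50.0, 11.0, 10.0, 12.0, 11.0, 10.0]
values = dict(enumerate(readings))
neighbors = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < len(readings)]
    for i in range(len(readings))
}

flagged = spatial_outliers(values, neighbors)
print(flagged)
```

The neighbors of the flagged location also get large |S| values because their neighborhood averages include the outlier, which is one reason the threshold matters.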

Conclusions
Patterns are the opposite of random
Common spatial patterns:
  – Location prediction: classification, function determination
  – Feature interaction: spatial association, co-location, correlation
  – Hot spots: spatial outlier detection, clustering
SDM = search for interesting, unexpected, and useful patterns or rules in large spatial databases
Spatial patterns may be discovered using
  – Techniques like classification, associations, clustering, and outlier detection
  – New techniques are needed for SDM due to
      Spatial auto-correlation
      Continuity of space