Introduction to Spatial Data Mining. 강홍구, 2003-08-23, Database Laboratory. CHAPTER 7.


1 Introduction to Spatial Data Mining. 강홍구, 2003-08-23, Database Laboratory. CHAPTER 7

2 Overview Our focus –Understanding of spatial data mining –The process of discovering interesting and useful information patterns in large databases Defining Spatial Data Mining –Search for spatial patterns –Non-trivial search: as “automated” as possible, to reduce human effort –Interesting, useful and unexpected spatial patterns Important steps in the data mining process –Data extraction, data cleaning, feature selection, algorithm design and tuning, output analysis

3 Pattern discovery A pattern can be a summary statistic –Mean, median, standard deviation of a dataset The promise of data mining is the ability to rapidly and automatically search for local and potentially high-utility patterns using computer algorithms

4 The Data-Mining Process

5 The Data-Mining Process (cont) Domain Expert (DE) –Identifies SDM goals, spatial dataset –Describes domain knowledge Data Mining Analyst (DMA) –Helps identify pattern families, SDM techniques to be used –Explains the SDM outputs to the Domain Expert Joint effort –Feature selection –Selection of patterns for further exploration

6 Statistics and Data Mining Data mining acts as a filter step before the application of rigorous statistical tools –E.g., MBR (minimum bounding rectangle) search in an R-tree

7 Data Mining as a Search Problem A data mining algorithm searches a potentially large space of patterns to come up with candidate patterns. Restriction is not completely unjustified

8 Unique Feature of Spatial Data Mining Everything is related to everything else, but nearby things are more related than distant things

9 Famous Historical Examples of Spatial Data Exploration 1855 Asiatic cholera in London: a water pump identified as the source. Fluoride and healthy gums near the Colorado River. Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle

10 An Illustrative Application Domain –Scale up secondary spatial (statistical) analysis to very large datasets: the attributes related to red-winged blackbird nest locations are varied –Find new spatial patterns: find groups of co-located geographic features

11 Measures of Spatial Form and Auto-correlation Space can be divided into continuous space and discrete space –Continuous space is identified by coordinates, discrete space by objects Moran’s I: A Global Measure of Spatial Autocorrelation –Spatial autocorrelation is studied in spatial statistics, an area within statistics devoted to the analysis of spatial data
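
The slide names Moran's I without giving its formula. A minimal sketch follows, assuming the standard definition I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name, the binary contiguity weight matrix and the sample values are illustrative assumptions, not taken from the slides.

```python
def morans_i(values, weights):
    """Global Moran's I for attribute values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Assumed toy layout: four locations on a line, adjacent cells are neighbours.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], line_weights)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], line_weights)  # neighbours always differ
```

With these inputs the clustered arrangement gives a positive value (about 0.33) and the alternating one a negative value (about -1.0), matching the usual reading of the sign of Moran's I.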

12 Spatial Statistical Model Statistical models are often used to represent observations in terms of random variables Point process –A model for the spatial distribution of the points in a point pattern, e.g. trees in a forest, gas stations in a city –Lattices –Geostatistics

13 The Data-Mining Trinity Location Prediction –Questions addressed Where will a phenomenon occur? Which spatial events are predictable? How can a spatial event be predicted from other spatial events? –Equations, rules, other methods… –Examples: Where will an endangered bird nest? Which areas are prone to fire given maps of vegetation, climate, etc.?

14 The Data-Mining Trinity (cont) Spatial Interactions –Questions addressed Which spatial events are related to each other? Which spatial phenomena depend on other phenomena? –Examples: Predator-prey species, e.g. wolves, deer Symbiotic species, e.g. bees, flowering plants Event causation, e.g. vegetation, droughty weather, fire

15 The Data-Mining Trinity(cont) Hot spots –Question addressed Is a phenomenon spatially clustered? Which spatial entities or clusters are unusual? Which spatial entities share common characteristics? –Examples: Cancer clusters [CDC] to launch investigations Crime hot spots to plan police patrols –Defining unusual Comparison group: –neighborhood –entire population Significance: probability of being unusual is high

16 Classification techniques Classification is to find a function –f : D -> L –D is the domain of f, L is the set of labels

17 Mapping Techniques to Spatial Pattern Families Overview – There are many techniques to find a spatial pattern family – Choice of technique depends on feature selection, spatial data, etc. Spatial pattern families vs. Techniques – Location Prediction: Classification, function determination – Interaction : Correlation, Association – Hot spots: Clustering, Outlier Detection We discuss these techniques now –With emphasis on spatial problems –Even though these techniques apply to non-spatial datasets too

18 Location Prediction as a classification problem Given: –1. Spatial Framework –2. Explanatory functions –3. A dependent class –4. A family of function mappings Find: Classification model Objective: maximize classification_accuracy Constraints: Spatial Autocorrelation exists
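
A minimal sketch of the formulation above, assuming a 1-nearest-neighbour rule as the family of function mappings. The slides do not name a specific classifier; the feature names (vegetation height, distance to water), the labels and the sample data are hypothetical. The sketch only maximizes plain classification accuracy and does not model spatial autocorrelation, which the slide lists as a constraint.

```python
import math


def nearest_neighbor_classifier(training):
    """Return f: D -> L that labels a feature vector with the class of
    its closest training example (Euclidean distance)."""
    def f(features):
        _, label = min(training, key=lambda ex: math.dist(ex[0], features))
        return label
    return f


def classification_accuracy(f, labeled):
    """Fraction of labeled examples for which f predicts the true class."""
    correct = sum(1 for features, label in labeled if f(features) == label)
    return correct / len(labeled)


# Hypothetical explanatory features per location:
# (vegetation height, distance to water), both scaled to [0, 1].
training = [
    ((0.9, 0.1), "nest"),
    ((0.8, 0.2), "nest"),
    ((0.2, 0.9), "no-nest"),
    ((0.1, 0.8), "no-nest"),
]
held_out = [
    ((0.85, 0.15), "nest"),
    ((0.15, 0.85), "no-nest"),
    ((0.6, 0.4), "no-nest"),   # closer to the "nest" examples, so it is misclassified
]
f = nearest_neighbor_classifier(training)
accuracy = classification_accuracy(f, held_out)
```

Here two of the three held-out locations are labeled correctly, so the accuracy is 2/3.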

19 Association rule discovery techniques Association rules are patterns of the form X -> Y –E.g. Diapers -> beer An association rule is characterized by two parameters: support and confidence –For a rule A => B, Support: A and B occur together in at least s percent of the transactions Confidence: of all the transactions in which A occurs, at least c percent of them contain B
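
The two parameters can be computed directly from a transaction list. The sketch below is illustrative only: the function names and the four sample baskets are assumptions, not data from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Raises ZeroDivisionError if the
    antecedent never occurs."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
rule_support = support(baskets, {"diapers", "beer"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})  # 2 of the 3 diaper baskets
```

For these baskets the rule Diapers -> beer has support 0.5 and confidence 2/3, so it would pass thresholds of s = 50 percent and c = 60 percent.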

20 Associations, Spatial associations, Co-location find patterns from the following sample dataset?

21 Associations, Spatial associations, Co-location (cont) Answers: and

22 Clustering –Process of discovering groups in large databases –Spatial view: rows in a database = points in a multi-dimensional space Categories of clustering algorithms –Hierarchical clustering method –Partitional clustering algorithm –Density-based clustering algorithm –Grid-based clustering algorithm New spatial methods –Comparison with complete spatial random processes –Neighborhood EM

23 Algorithmic Ideas in Clustering Hierarchical –Start with all points in one cluster –Then split and merge until a stopping criterion is reached Partitioning based –Start with random central points –Assign points to the nearest central point –Update the central points –Approach with statistical rigor Density –Find clusters based on density of regions Grid-based –Quantize the clustering space into a finite number of cells –Use thresholding to pick high-density cells –Merge neighboring cells to form clusters
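
The partitioning-based steps above can be sketched as a k-means-style loop. This is a minimal illustration under assumptions: the function name and sample points are hypothetical, and the initial central points are passed in explicitly (rather than drawn at random, as the slide describes) so that the example is repeatable.

```python
import math


def partition_cluster(points, centers, iterations=10):
    """Alternate between assigning points to the nearest central point
    and moving each central point to the mean of its members."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Step 1: assign every point to the index of its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Step 2: update each center to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep a center that lost all points
        if new_centers == centers:  # stop once the centers no longer move
            break
        centers = new_centers
    return centers, assignment


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = partition_cluster(points, centers=[(0, 0), (10, 10)])
```

On this input the loop converges after the first update, with the first three points in cluster 0 and the last three in cluster 1. To follow the slide's random start, the initial centers could be drawn with `random.sample(points, k)`.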

24 Idea of Outliers What is an outlier? –Observations inconsistent with rest of the dataset –Techniques for global outliers Statistical tests based on membership in a distribution –Pr.[item in population] is low Non-statistical tests based on distance, nearest neighbors, convex hull, etc. What is a spatial outlier? –Observations inconsistent with their neighborhoods –A local instability or discontinuity New techniques for spatial outliers –Graphical - Variogram cloud, Moran scatterplot –Algebraic - Scatterplot, Z(S(x))
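
A minimal sketch of the algebraic test Z(S(x)), assuming the common definition in which S(x) is the attribute value at x minus the average over its neighbours, and x is flagged when the standardized |S(x)| exceeds a threshold. The function name, the threshold of 2.0, the neighbour lists and the sample values are illustrative assumptions.

```python
import statistics


def spatial_outliers(values, neighbors, threshold=2.0):
    """Return locations whose difference from their neighbourhood average
    is unusually large compared with the same difference elsewhere."""
    # S(x): attribute value at x minus the mean over the neighbours of x.
    diffs = {
        x: values[x] - statistics.mean([values[y] for y in neighbors[x]])
        for x in values
    }
    s_values = list(diffs.values())
    mu = statistics.mean(s_values)
    sigma = statistics.pstdev(s_values)
    if sigma == 0:
        return []  # every location matches its neighbourhood equally well
    # Z(S(x)): standardized difference; large magnitude marks a spatial outlier.
    return [x for x, s in diffs.items() if abs((s - mu) / sigma) > threshold]


# Hypothetical sensor readings at nine locations along a line.
readings = [10, 11, 10, 12, 40, 11, 10, 12, 11]
values = dict(enumerate(readings))
neighbors = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < len(readings)]
    for i in range(len(readings))
}
outliers = spatial_outliers(values, neighbors)
```

Only location 4 (value 40) is flagged here: it is inconsistent with its neighbourhood, while locations 3 and 5 have large S(x) too but stay below the threshold.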

25 Conclusions Patterns are the opposite of random Common spatial patterns: –Location prediction: Classification, function determination –Feature interaction: spatial association, co-location, correlation –Hot spots: spatial outlier detection, clustering SDM = search for interesting, unexpected, and useful patterns or rules in large spatial databases Spatial patterns may be discovered using –Techniques like classification, associations, clustering, and outlier detection –New techniques are needed for SDM due to spatial auto-correlation and continuity of space

