Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han,

Similar presentations


Presentation on theme: "1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han,"— Presentation transcript:

1 1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han, J. Simon Fraser University, Canada represented by Chung-hao Tan Nov.16.2000

2 2 Outlines What is data mining? What is spatial data mining? Generalization-based knowledge discovery. Clustering-based analysis. Exploring spatial association rules. Mining in image database. Future direction & conclusion.

3 3 What Is Data Mining? A short definition: “extracting implicit knowledge from large amount of data.” The form of discovered knowledge: –Regression and classification. –Association rules. –Clustering. What can be contributed by database research? –Efficient data access method (indexing). –Query optimizer. –Data integration. –… => Data Warehousing research provides a convenient platform for data mining.

4 4 An Example of Data Mining Technique Example: –Data: Stock trading data (price, size, number of trades, etc.). –Query: Given the current and past trading information, can you tell me whether it will go up or go down in the next minute? –Method: Bayesian CART model search (Chipman, 1997). => try to find a classification or regression tree to model the data. –Result: 1. Reduce the misclassification rate from 53% to 30%. 2. Identify those important classification rules. 3. Identify those important variables (predictors).

5 5 An Example of Data Mining Technique (Cont.)

6 6

7 7 What Is Spatial Data Mining? A short definition: Extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial database. Benefits: –Understand spatial data; query optimization. –Discover relationships between spatial data and non-spatial data. –Construction of spatial knowledge base (e.g. associations). Application: –GIS. –Image database exploration. –Robot navigation. –… (any applications which use spatial data).

8 8 Primitives of Spatial Data Mining Spatial characteristic rules: –A general description of spatial data. –E.g. price range of houses in various regions. Spatial discriminating rules: –A general description of comparison among spatial data. –E.g. a comparison of price ranges of houses in various regions. Spatial association rules: –Implication of one or a set of features by another set of features. –E.g. house near beach -> is expensive.

9 9 Primitives of Spatial Data Mining (Cont.) Thematic maps: –Present the spatial distribution of a single or a few attributes. –E.g. Temperature thematic map. –Data stored by raster image or vector image. Image database: –A special kind of spatial database where data almost entirely consists of image or pictures (e.g. satellite image or medical image). –These images have coordination properties.

10 10 Data Mining Architecture An example: (by Matheus, 1993)

11 11 Mining By Statistic Methods Methods: –Regression model. Disadvantage. –Assumption of statistical independence among the spatially distributed data. –Need experts’ domain knowledge (in spatial data). –Cannot model non-linear rules or symbolic values very well. –Do not work well with incomplete or inconclusive data.

12 12 Generalization-based Method Ideas: –Learning from examples. –Combined with generalization. Concept hierarchy. –Explicitly given by the domain experts. –Higher levels are more general terms. Attributed-oriented induction: –Performed by climbing the generalization hierarchies and summarizing the general relationships between spatial and non- spatial data at higher concept levels. –Until reaching a generalization threshold.

13 13 Spatial-data-dominant Generalization Ideas: –First step: Spatial-oriented induction. Merging spatial regions according to the spatial concept hierarchy. –Second step: Attribute-oriented induction. Non-spatial data at each merged regions are generalized at a given level by the threshold.

14 14 Non-spatial-data-dominant Generalization Ideas: –First step: Attribute-oriented induction. Non-spatial data are generalized at a given level by the threshold. –Second step: Spatial-oriented induction. Merging spatial regions which have the same non-spatial description. Ignore those small regions with different non-spatial descriptions but inside a large merged region.

15 15 Generalization-based Method (Cont.)

16 16 Clustering-based Method Ideas: –Clusters can be found without using any background knowledge. –Unsupervised learning. –Methods: PAM – Repeat to find a better k representatives by trying all possible pairs of combinations. CLARA – Same as PAM, but using a subset of data as samples. CLARANS – Same as PAM, but randomly changing the samples at each iteration.

17 17 SD-CLARANS Ideas: –First step: Spatial-oriented induction. Spatial-relevant data are collected and clustered. –Second step: Attributed-oriented induction. Find out the non-spatial description of objects in each cluster.

18 18 NSD-CLARANS Ideas: –First step: Attributed-oriented induction. Produce a number of generalized tulples. –Second step: Spatial-oriented induction. For each such generalized tuple, all spatial components are collected and clustered.

19 19 Other Issues In Clustering Need a fast access method to the spatial data (e.g. R*- tree). Focus on relevant data only. Using CF tree (for example) to store clustered results: –A tuple of data is incrementally inserted into the closet leaf node (a sub-cluster). –If the diameter of the sub-cluster exceeds a threshold after insertion, split that leaf node. –Each internal node contains a Clustering Feature (CF). CF = (N, LS, SS)N: #points in the sub-cluster. LS: linear sum of the N points. SS: square sum of the N points. –Linear scalability; insensibility to the input order; good quality of clustering.

20 20 Exploring Spatial Associations Example: –Is_a(x, school) -> close_to(x, park) 80%. –Topological relations: intersect, overlap, disjoint… –Spatial orientation: left_of, west_of… –Distance information: close_to, far_away… Minimum Support: –Ignore those rules with small number of evidences. –E.g. Ignore the relation associating only 5% house in that area and a single school. –Strong rule: A rule with large support (exceeds the minimum support threshold). Minimum Confidence: –Filter out those rules with low confidence. –E.g. Ignore the relations X->Y with only 5% confidence.

21 21 Multi-level Spatial Associations Rules Using tree to explore: –Collect task-relevant data. –Computation starts at high level of spatial predicates like close_to. –Utilize spatial indexing methods. –For those pattern that pass the filtering at the high levels, do further refinements at the lower levels, like adjacent_to, intersects, distance_less_than_x, etc. –Filter out those patterns that do not exceed Minimum Support Threshold or Minimum Confidence Threshold. –Derive the strong association rules!

22 22 Using Approximation and Aggregation Ideas: –Instead of asking “where the clusters in the spatial database?”, we want to know “what are the characteristics of the clusters in terms of the features that are close to them?” –E.g. “90% of the expensive house in a cluster are close to a lake”. –Using computational geometry concept. –First step: Eliminate unnecessary features. –Second step: Calculate the aggregate proximity of points in the cluster to the convex boundary of each features. –Experiment result: processing 50,000 features within 2 seconds.

23 23 Mining In Image Database Ideas: –Mining useful information in image database. –Example: Automatically identify volcano on the surface of Venus from images transmitted by the spacecraft. –Question: Is the above example related to spatial data mining research?

24 24 Future Directions Data mining in spatial object-oriented database. Mining under uncertainty. Alternative Clustering Techniques. Mining spatial data deviation and evolution rules. Using multiple thematic maps. Interleaved generalization. Generalization using temporal spatial data. Spatial Data Mining Query Language. Multidimensional rule visualization.

25 25 Conclusion What is spatial data mining? (Non-)Spatial-data-dominant generalization (Non-)Spatial-data-dominant clustering Spatial association rules Using approximation and aggregation Mining in image database


Download ppt "1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han,"

Similar presentations


Ads by Google