1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han,

Presentation transcript:

1 CS599 Spatial & Temporal Database
Spatial Data Mining: Progress and Challenges
Survey paper that appeared in DMKD96, by Koperski, K., Adhikary, J. and Han, J., Simon Fraser University, Canada
Presented by Chung-hao Tan, Nov

2 Outline
–What is data mining?
–What is spatial data mining?
–Generalization-based knowledge discovery.
–Clustering-based analysis.
–Exploring spatial association rules.
–Mining in image databases.
–Future directions & conclusion.

3 What Is Data Mining?
A short definition: “extracting implicit knowledge from large amounts of data.”
The forms of discovered knowledge:
–Regression and classification.
–Association rules.
–Clustering.
What can database research contribute?
–Efficient data access methods (indexing).
–Query optimizers.
–Data integration.
–…
=> Data warehousing research provides a convenient platform for data mining.

4 An Example of Data Mining Technique
Example:
–Data: Stock trading data (price, size, number of trades, etc.).
–Query: Given the current and past trading information, will the price go up or down in the next minute?
–Method: Bayesian CART model search (Chipman, 1997). => Try to find a classification or regression tree that models the data.
–Result:
1. Reduced the misclassification rate from 53% to 30%.
2. Identified the important classification rules.
3. Identified the important variables (predictors).

5 An Example of Data Mining Technique (Cont.)

7 What Is Spatial Data Mining?
A short definition: Extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases.
Benefits:
–Understand spatial data; query optimization.
–Discover relationships between spatial data and non-spatial data.
–Construction of spatial knowledge bases (e.g. associations).
Applications:
–GIS.
–Image database exploration.
–Robot navigation.
–… (any application that uses spatial data).

8 Primitives of Spatial Data Mining
Spatial characteristic rules:
–A general description of spatial data.
–E.g. the price range of houses in various regions.
Spatial discriminant rules:
–A general description of the features that distinguish one class of spatial data from another.
–E.g. a comparison of the price ranges of houses in various regions.
Spatial association rules:
–Implication of one feature or set of features by another set of features.
–E.g. house is near a beach -> house is expensive.

9 Primitives of Spatial Data Mining (Cont.)
Thematic maps:
–Present the spatial distribution of a single attribute or a few attributes.
–E.g. a temperature thematic map.
–Data is stored in raster or vector form.
Image databases:
–A special kind of spatial database in which the data consists almost entirely of images or pictures (e.g. satellite images or medical images).
–These images have coordinate properties.

10 Data Mining Architecture An example: (by Matheus, 1993)

11 Mining By Statistical Methods
Methods:
–Regression models.
Disadvantages:
–Assume statistical independence among spatially distributed data, which often does not hold.
–Need experts’ domain knowledge (of spatial data).
–Cannot model non-linear rules or symbolic values very well.
–Do not work well with incomplete or inconclusive data.

12 Generalization-based Method
Ideas:
–Learning from examples.
–Combined with generalization.
Concept hierarchy:
–Explicitly given by the domain experts.
–Higher levels hold more general terms.
Attribute-oriented induction:
–Performed by climbing the generalization hierarchies and summarizing the general relationships between spatial and non-spatial data at higher concept levels.
–Continues until a generalization threshold is reached.
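The hierarchy climbing described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the crop hierarchy, the function name `generalize`, and the reading of the threshold as "at most this many distinct values" are hypothetical and do not come from the survey.

```python
from collections import Counter

# Hypothetical concept hierarchy: each value maps to its more general parent.
HIERARCHY = {
    "wheat": "grain", "corn": "grain",
    "apple": "fruit", "pear": "fruit",
    "grain": "crop", "fruit": "crop",
}

def generalize(values, hierarchy, threshold):
    """Climb the hierarchy one level at a time until at most
    `threshold` distinct values remain, then count the tuples
    that fall under each generalized value."""
    current = list(values)
    while len(set(current)) > threshold:
        lifted = [hierarchy.get(v, v) for v in current]
        if lifted == current:  # top of the hierarchy reached
            break
        current = lifted
    return Counter(current)

observed = ["wheat", "corn", "apple", "pear", "corn"]
summary = generalize(observed, HIERARCHY, threshold=2)
top_level = generalize(observed, HIERARCHY, threshold=1)
```

With a threshold of 2 the five tuples are summarized as grain and fruit; a threshold of 1 forces one more climb, to the single concept crop.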

13 Spatial-data-dominant Generalization
Ideas:
–First step: Spatial-oriented induction. Spatial regions are merged according to the spatial concept hierarchy.
–Second step: Attribute-oriented induction. The non-spatial data in each merged region is generalized to the level set by the threshold.

14 Non-spatial-data-dominant Generalization
Ideas:
–First step: Attribute-oriented induction. Non-spatial data is generalized to the level set by the threshold.
–Second step: Spatial-oriented induction. Spatial regions that have the same non-spatial description are merged. Small regions with a different non-spatial description that lie inside a large merged region are ignored.

15 Generalization-based Method (Cont.)

16 Clustering-based Method
Ideas:
–Clusters can be found without using any background knowledge.
–Unsupervised learning.
Methods:
–PAM: Repeatedly looks for a better set of k representatives by trying all possible swaps of a representative with a non-representative.
–CLARA: Same as PAM, but uses a subset of the data as a sample.
–CLARANS: Same as PAM, but randomly changes the sample at each iteration.
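The PAM swap search named on this slide can be sketched as follows. This is a minimal illustration, not the algorithm as evaluated in the survey: the function names, the naive choice of the first k points as starting representatives, and the toy data are assumptions. CLARA and CLARANS would restrict the same search to a sample of the data or to randomly chosen swaps.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest representative."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Exhaustive swap search: in every round, try replacing each
    representative with each non-representative and keep the best
    swap; stop when no swap lowers the total cost."""
    medoids = list(points[:k])  # naive initialization (assumption)
    cost = total_cost(points, medoids)
    while True:
        best_trial, best_cost = None, cost
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost:
                    best_trial, best_cost = trial, trial_cost
        if best_trial is None:
            return sorted(medoids), cost
        medoids, cost = best_trial, best_cost

# Two well-separated groups of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = pam(points, k=2)
```

Every round evaluates k × (n − k) swaps, each costing a pass over the data, which is why the sampling variants are attractive for large spatial databases.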

17 SD-CLARANS
Ideas:
–First step: Spatial-oriented induction. Spatially relevant data is collected and clustered.
–Second step: Attribute-oriented induction. Find the non-spatial description of the objects in each cluster.

18 NSD-CLARANS
Ideas:
–First step: Attribute-oriented induction. Produce a number of generalized tuples.
–Second step: Spatial-oriented induction. For each such generalized tuple, all spatial components are collected and clustered.

19 Other Issues In Clustering
Need a fast access method to the spatial data (e.g. R*-tree).
Focus on relevant data only.
Using a CF tree (for example) to store clustered results:
–A tuple of data is incrementally inserted into the closest leaf node (a sub-cluster).
–If the diameter of the sub-cluster exceeds a threshold after insertion, split that leaf node.
–Each internal node contains a Clustering Feature (CF). CF = (N, LS, SS), where N is the number of points in the sub-cluster, LS is the linear sum of the N points, and SS is the square sum of the N points.
–Linear scalability; insensitivity to the input order; good quality of clustering.
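The CF triple is useful because it is additive: two sub-clusters merge by adding their triples, and the centroid and diameter follow from N, LS and SS without revisiting the points. The sketch below illustrates that arithmetic only; the class and method names are hypothetical, and the tree itself (node splitting, branching factor) is not modeled.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    n: int      # N: number of points in the sub-cluster
    ls: tuple   # LS: per-dimension linear sum of the points
    ss: float   # SS: sum of squared norms of the points

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), float(sum(x * x for x in p)))

    def merge(self, other):
        """Additivity: the CF of a union is the component-wise sum."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def diameter(self):
        """Root of the average squared pairwise distance:
        sqrt((2*N*SS - 2*|LS|^2) / (N*(N-1)))."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        value = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(value, 0.0))

# Insert a point into a sub-cluster and check it against a threshold.
leaf = CF.of_point((0.0, 0.0)).merge(CF.of_point((2.0, 0.0)))
needs_split = leaf.diameter() > 1.5   # hypothetical threshold
```

For the two points above, the diameter is their distance, 2.0, so a threshold of 1.5 would trigger a split.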

20 Exploring Spatial Associations
Example:
–Is_a(x, school) -> close_to(x, park) (80%).
–Topological relations: intersect, overlap, disjoint…
–Spatial orientation: left_of, west_of…
–Distance information: close_to, far_away…
Minimum Support:
–Ignore rules backed by too little evidence.
–E.g. ignore a relation that associates only 5% of the houses in an area with a single school.
–Strong rule: A rule with large support (exceeding the minimum support threshold).
Minimum Confidence:
–Filter out rules with low confidence.
–E.g. ignore relations X->Y with only 5% confidence.
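Support and confidence for a rule such as is_a(x, school) -> close_to(x, park) can be computed with simple counting. The sketch below assumes each object is represented as a set of predicate strings that already hold for it; the data, the function name, and the choice to measure support against all objects are illustrative assumptions (definitions of support vary between papers).

```python
def support_and_confidence(objects, antecedent, consequent):
    """Support: share of all objects satisfying both sides of the rule.
    Confidence: share of antecedent-satisfying objects that also
    satisfy the consequent."""
    n_antecedent = sum(1 for o in objects if antecedent <= o)
    n_both = sum(1 for o in objects if (antecedent | consequent) <= o)
    support = n_both / len(objects)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Toy data: 5 schools, 4 of them close to a park, plus 5 other objects.
objects = (
    [{"is_a:school", "close_to:park"}] * 4
    + [{"is_a:school"}]
    + [{"is_a:house"}] * 5
)
support, confidence = support_and_confidence(
    objects, {"is_a:school"}, {"close_to:park"}
)
# A rule is kept only if it clears both (hypothetical) thresholds.
is_strong = support >= 0.3 and confidence >= 0.7
```

Here the rule holds for 4 of 10 objects (support 0.4) and for 4 of 5 schools (confidence 0.8), matching the 80% figure in the slide's example.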

21 Multi-level Spatial Association Rules
Using a tree to explore:
–Collect task-relevant data.
–Computation starts at a high level of spatial predicates, like close_to.
–Utilize spatial indexing methods.
–For those patterns that pass the filtering at the high levels, do further refinement at the lower levels, like adjacent_to, intersects, distance_less_than_x, etc.
–Filter out those patterns that do not exceed the Minimum Support Threshold or the Minimum Confidence Threshold.
–Derive the strong association rules!
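The filter-then-refine idea on this slide can be sketched with two distance thresholds. This is a simplified illustration under stated assumptions: the function name, the thresholds, and the toy coordinates are hypothetical, and plain point distances stand in for the spatial index and approximate geometry a real system would use at the coarse level.

```python
import math

def mine_levels(houses, lakes, coarse_dist, fine_dist, min_support):
    """Evaluate the coarse predicate close_to first; compute the finer
    predicate distance_less_than_x only for objects that passed it."""
    close = [h for h in houses
             if any(math.dist(h, lake) <= coarse_dist for lake in lakes)]
    coarse_support = len(close) / len(houses)
    if coarse_support < min_support:
        return {}  # pattern pruned at the high level, no refinement
    result = {"close_to": coarse_support}

    # Refinement touches only the survivors of the coarse filter.
    near = [h for h in close
            if any(math.dist(h, lake) <= fine_dist for lake in lakes)]
    fine_support = len(near) / len(houses)
    if fine_support >= min_support:
        result["distance_less_than_x"] = fine_support
    return result

houses = [(0, 0), (1, 0), (5, 0), (50, 50)]
lakes = [(0, 1)]
kept = mine_levels(houses, lakes, coarse_dist=10, fine_dist=2, min_support=0.5)
pruned = mine_levels(houses, lakes, coarse_dist=10, fine_dist=2, min_support=0.8)
```

Three of four houses pass the coarse test and two pass the refined one, so both levels survive a 50% support threshold; at 80% the pattern is dropped before any refinement is attempted.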

22 Using Approximation and Aggregation
Ideas:
–Instead of asking “where are the clusters in the spatial database?”, we want to know “what are the characteristics of the clusters in terms of the features that are close to them?”
–E.g. “90% of the expensive houses in a cluster are close to a lake”.
–Uses computational geometry concepts.
–First step: Eliminate unnecessary features.
–Second step: Calculate the aggregate proximity of the points in the cluster to the convex boundary of each feature.
–Experimental result: processing 50,000 features within 2 seconds.

23 Mining In Image Databases
Ideas:
–Mining useful information from image databases.
–Example: Automatically identify volcanoes on the surface of Venus from images transmitted by the spacecraft.
–Question: Is the above example related to spatial data mining research?

24 Future Directions
–Data mining in spatial object-oriented databases.
–Mining under uncertainty.
–Alternative clustering techniques.
–Mining spatial data deviation and evolution rules.
–Using multiple thematic maps.
–Interleaved generalization.
–Generalization using temporal spatial data.
–Spatial data mining query language.
–Multidimensional rule visualization.

25 Conclusion
–What is spatial data mining?
–(Non-)Spatial-data-dominant generalization
–(Non-)Spatial-data-dominant clustering
–Spatial association rules
–Using approximation and aggregation
–Mining in image databases