 Introduction  Methods for Knowledge Discovery in Spatial Databases ◦ Generalization-Based Knowledge Discovery ◦ Methods Using Clustering ◦ Methods.

Slides:



Advertisements
Similar presentations
Copyright Jiawei Han, modified by Charles Ling for CS411a
Advertisements

Office of SA to CNS GeoIntelligence Introduction Data Mining vs Image Mining Image Mining - Issues and Challenges CBIR Image Mining Process Ontology.
Spatial Database Systems. Spatial Database Applications GIS applications (maps): Urban planning, route optimization, fire or pollution monitoring, utility.
7/03Spatial Data Mining G Dong (WSU) & H. Liu (ASU) 1 6. Spatial Mining Spatial Data and Structures Images Spatial Mining Algorithms.
Clustering Basic Concepts and Algorithms
PARTITIONAL CLUSTERING
Mining Multiple-level Association Rules in Large Databases
Fast Algorithms For Hierarchical Range Histogram Constructions
BIRCH: Is It Good for Databases? A review of BIRCH: An And Efficient Data Clustering Method for Very Large Databases by Tian Zhang, Raghu Ramakrishnan.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Clustering CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu.
1 Enviromatics Spatial database systems Spatial database systems Вонр. проф. д-р Александар Маркоски Технички факултет – Битола 2008 год.
Spring 2003Data Mining by H. Liu, ASU1 6. Spatial Mining Spatial Data and Structures Images Spatial Mining Algorithms.
Spatial Mining.
Clustering II.
Spatial Information Systems (SIS) COMP Spatial queries and operations.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Spatial Data Mining: Progress and Challenges Survey Paper Krzysztof Koperski, Junas Adhikary, and Jiawei Han (1996) Review by Brad Danielson CMPUT 695.

• Introduction
• Methods for Knowledge Discovery in Spatial Databases
  ◦ Generalization-Based Knowledge Discovery
  ◦ Methods Using Clustering
  ◦ Methods Exploring Spatial Associations
  ◦ Using Approximation and Aggregation
  ◦ Mining in Image Databases
• Future Directions
• Conclusion
• References

Our objectives: ◦ Describe existing spatial data mining methods ◦ Give a general perspective of the field’s current state ◦ Summarize the paper’s description of the state of spatial data mining in 1996.

• A spatial database is a database that is optimized to store and query data related to objects in space, including points, lines and polygons. While typical databases understand various numeric and character data types, additional functionality must be added for a database to process spatial data types. These types are typically called geometry or feature types.

• Database systems use indexes to look up values quickly, but the way most databases index data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up database operations.
• In addition to typical SQL queries such as SELECT statements, spatial databases can perform a wide variety of spatial operations. A few examples are:
  ◦ Spatial Measurements
  ◦ Spatial Functions
  ◦ Spatial Predicates
  ◦ Constructor Functions
  ◦ Observer Functions
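To make the idea of a spatial index concrete, here is a minimal sketch of a uniform grid index answering a rectangular range query. The class name `GridIndex`, the cell size and the sample objects are all hypothetical; real spatial databases typically use more elaborate structures such as R-trees.

```python
from collections import defaultdict


class GridIndex:
    """Toy spatial index: buckets points into square cells of side `cell`."""

    def __init__(self, cell=10.0):
        self.cell = cell
        self.buckets = defaultdict(list)

    def _key(self, x, y):
        # Map a coordinate to the grid cell that contains it.
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, name, x, y):
        self.buckets[self._key(x, y)].append((name, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return names of points inside the rectangle, visiting only nearby cells."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for name, x, y in self.buckets.get((kx, ky), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


# Hypothetical objects, for illustration only.
index = GridIndex(cell=10.0)
index.insert("school", 12, 15)
index.insert("park", 14, 18)
index.insert("mall", 95, 40)
nearby = index.range_query(10, 10, 20, 20)
```

Only the cells overlapping the query rectangle are scanned, which is the point of an index: the distant "mall" bucket is never touched.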

• Data mining is a field at the intersection of computer science and statistics. It is the process of discovering patterns in large data sets, using methods from artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

• “Spatial data mining, or knowledge discovery in spatial database, refers to the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases.” (Koperski and Han, 1995)
• Data mining, or knowledge discovery in databases, refers to the “discovery of interesting, implicit, and previously unknown knowledge from large databases.” (Frawley et al., 1992)
WHAT’S THE DIFFERENCE?

DATA = an attribute of an object: the (WHAT) dimension.
SPATIAL DATA = attribute data referenced to a specific location: the (WHERE) & (WHAT) dimensions.
The attributes of spatial objects are:
- Highly dependent on location
- Often influenced by neighboring objects

Object   What
Milk     Is cold
Yogurt   Is warm
Butter   Is warm

Object   Where                What
Milk     In fridge (X1, Y1)   Is cold
Milk     On table (X2, Y2)    Is warm

Object   Where                What
Milk     In fridge (X1, Y1)   Is warm
Yogurt   In fridge (X1, Y1)   Is warm
Butter   In fridge (X1, Y1)   Is warm

SPATIAL DATA MINING: THE FRIDGE IS BROKEN!
Object() : Location() -> Characteristic()
(Milk) : (In Fridge) -> [should be] -> (Cold)
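The fridge example can be sketched in a few lines: group observations by location and flag any location where every object contradicts the expected characteristic. The `expected` rule table and the function name `suspicious_locations` are assumptions made for this illustration, not part of the surveyed paper.

```python
from collections import defaultdict

# (object, where, what) observations taken from the fridge tables.
observations = [
    ("Milk", "In fridge", "warm"),
    ("Yogurt", "In fridge", "warm"),
    ("Butter", "In fridge", "warm"),
    ("Milk", "On table", "warm"),
]

# Hypothetical background rule: Location -> characteristic objects should have there.
expected = {"In fridge": "cold"}


def suspicious_locations(observations, expected):
    """Return locations where every observed object violates the expected state."""
    by_location = defaultdict(list)
    for _obj, where, what in observations:
        by_location[where].append(what)
    return [
        loc
        for loc, states in by_location.items()
        if loc in expected and all(state != expected[loc] for state in states)
    ]


broken = suspicious_locations(observations, expected)
```

Looking at the objects alone (the WHAT dimension) shows nothing unusual; the anomaly only appears once the WHERE dimension is taken into account.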

• To understand spatial data
• To discover relationships between spatial and non-spatial data
• To capture the general characteristics in a concise way
• To build spatial knowledge bases

• Various kinds of rules can be discovered from databases in general
• Spatial Characteristic Rule: General description of spatial data.
  ◦ Example: A rule describing the general price range of houses in various geographic regions in a city.

• Spatial Discriminant Rule: General description of the features discriminating or contrasting a class of spatial data from other classes.
  ◦ Example: Comparison of price ranges of houses in different geographical regions.
• Spatial Association Rule: Describes the implication of one feature or a set of features by another set of features in spatial databases.
  ◦ Example: Associating the price range of houses with nearby spatial features, like beaches.

• Thematic Maps:
  ◦ Present the spatial distribution of attributes.
  ◦ Differ from general maps, where the objective is to present the positions of objects.
  ◦ Used for discovering different rules
  ◦ 2 ways to represent a thematic map: Raster, Vector

• Image Databases:
  ◦ A special kind of spatial database where the data consists almost entirely of images or pictures.
  ◦ Used in remote sensing and medical imaging
  ◦ Usually stored in the form of grid arrays representing the image intensity in spectral ranges.

• Spatial Data Structures:
  ◦ Consist of points, lines, rectangles, etc.
• Spatial Computations:
  ◦ Spatial join is one of the most expensive spatial operations.
  ◦ Map overlay is an important operation for geographic information systems.

Spatial data mining algorithms must efficiently overcome:
• The huge volume of spatial data
• The complexity of spatial data types/structures
• The complexity of spatial access/query methods
• Expensive spatial processing operations
[Figure: many objects make a HUGE spatial database. Example queries: "Where is Citizen{Brad}?" and "Which highways cross national park boundaries?" (= spatial JOIN)]
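A rough sketch of why the spatial join is costly: in its naive form every object on one side is compared against every object on the other. The version below only tests minimum bounding rectangles (MBRs), so its output is a candidate list that an exact geometry test would still have to refine. The layer contents and all names are hypothetical.

```python
def mbr_intersects(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def spatial_join(left, right):
    """Nested-loop join: O(len(left) * len(right)) rectangle comparisons."""
    return [
        (left_name, right_name)
        for left_name, left_box in left.items()
        for right_name, right_box in right.items()
        if mbr_intersects(left_box, right_box)
    ]


# Hypothetical layers: bounding rectangles of highways and parks.
highways = {"Highway A": (0, 4, 10, 5), "Highway B": (20, 0, 21, 10)}
parks = {"Park P": (3, 3, 6, 8)}

candidates = spatial_join(highways, parks)
```

A spatial index on either layer would avoid most of the pairwise comparisons, which is why the integration with spatial access methods matters for mining algorithms.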

◦ Generalization-Based Knowledge Discovery ◦ Methods Using Clustering ◦ Methods Exploring Spatial Associations ◦ Using Approximation and Aggregation ◦ Mining in Image Databases

• Spatial Data
  ◦ Geometric (location, area, perimeter)
  ◦ Topological (adjacency, inclusion)
• Non-Spatial Data
  ◦ Stored in a traditional database with an attribute that is a pointer to the spatial description (Aref et al.)

• Need for background knowledge in the form of concept hierarchies: spatial and non-spatial
  ◦ Non-Spatial:
  ◦ Spatial: Counties -> Provinces -> Larger Regions

• Two generalization-based algorithms presented by Lu et al.:
  ◦ Spatial Data Dominant Generalization:
    ▪ Generalization of the spatial objects continues until the "spatial generalization threshold" is reached.
    ▪ The non-spatial data is then analyzed.
  ◦ Non-Spatial Data Dominant Generalization:
    ▪ The non-spatial data is generalized to a higher concept level.
    ▪ Neighbouring areas with similar generalized attributes are merged.
• Both depend on the concept hierarchies, hence the need for algorithms that do not use these hierarchies
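As a rough illustration of non-spatial data dominant generalization, the sketch below climbs one level of a concept hierarchy and then merges neighbouring regions whose generalized values agree. The hierarchy, region names and adjacency list are invented for this example, and the union-find merge is one possible implementation choice rather than the method of Lu et al.

```python
# Hypothetical one-level concept hierarchy for a non-spatial attribute.
hierarchy = {"corn": "grain", "wheat": "grain", "apple": "fruit"}

# Hypothetical regions with their attribute values and adjacency pairs.
regions = {"R1": "corn", "R2": "wheat", "R3": "apple", "R4": "corn"}
neighbours = [("R1", "R2"), ("R2", "R3"), ("R3", "R4")]


def generalize_and_merge(regions, hierarchy, neighbours):
    # Step 1: generalize the non-spatial data to a higher concept level.
    general = {r: hierarchy.get(value, value) for r, value in regions.items()}

    # Step 2: merge neighbouring regions that share a generalized value,
    # using a small union-find structure.
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in neighbours:
        if general[a] == general[b]:
            parent[find(a)] = find(b)

    groups = {}
    for r in regions:
        groups.setdefault(find(r), []).append(r)
    return sorted((sorted(members), general[members[0]]) for members in groups.values())


merged = generalize_and_merge(regions, hierarchy, neighbours)
```

R1 and R2 collapse into a single "grain" area, while R4 stays separate because it is not adjacent to another "grain" region. The result is only as good as the hierarchy supplied, which is the dependence noted above.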

• No need for background information such as concept hierarchies
• Clusters are found directly from the data
• A similar approach in machine learning is called "unsupervised learning"
• Clustering algorithms:
  ◦ PAM (Kaufman and Rousseeuw, 1990)
  ◦ CLARA (Kaufman and Rousseeuw, 1990)
  ◦ CLARANS (Ng and Han)
    ▪ SD (CLARANS): Spatial Dominant Approach
    ▪ NSD (CLARANS): Non-Spatial Dominant Approach

PAM (Partitioning Around Medoids):
• n objects and k clusters
• Selects the most representative point for each cluster
• The most centrally located point in a cluster is its medoid
• Computationally inefficient

PAM example with K = 2 (figure not reproduced):
1. Arbitrarily choose k objects as the initial medoids.
2. Assign each remaining object to its nearest medoid.
3. Randomly select a non-medoid object, O_random.
4. Compute the total cost of swapping a medoid O with O_random (Total Cost = 26 in the figure).
5. If the quality is improved, swap O and O_random.
6. Loop until there is no change.
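The swap loop can be sketched as follows. This is a simplified variant that tries every (medoid, non-medoid) swap in a fixed order and accepts the first improvement, rather than picking O_random at random; Manhattan distance, the initial medoids and the sample points are assumptions made for the illustration.

```python
from itertools import product


def dist(p, q):
    # Manhattan distance, chosen only to keep the arithmetic easy to follow.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    medoids = list(points[:k])  # arbitrary initial medoids
    best = total_cost(points, medoids)
    improved = True
    while improved:  # loop until no swap improves the clustering
        improved = False
        for i, candidate in product(range(k), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best:  # swap only if quality improves
                medoids, best, improved = trial, cost, True
    return sorted(medoids), best


# Two well-separated hypothetical groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, cost = pam(points, 2)
```

Every pass evaluates k * (n - k) swaps, each costing a scan of all n points, which is where the "computationally inefficient" remark comes from.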

CLARA (Clustering LARge Applications):
• Very similar to PAM
• The difference is sampling: the medoids are searched for in a sample rather than in the whole data set
• The idea: if the sample is selected at random, it should represent the data well, so CLARA can deal with larger data sets than PAM.
• The drawback: it may not find the best clustering. If an object that should be a medoid is not selected during sampling, it can never be chosen.
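A minimal sketch of the sampling idea: find medoids on a random sample, then judge them against the full data set, and keep the best of several samples. To stay short, the per-sample search here is an exhaustive search over medoid combinations, a stand-in for PAM that is only feasible because the sample is tiny. Parameter names and data are hypothetical.

```python
import random
from itertools import combinations


def dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])  # Manhattan distance (assumption)


def total_cost(points, medoids):
    return sum(min(dist(p, m) for m in medoids) for p in points)


def clara(points, k, sample_size, num_samples, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_samples):
        sample = rng.sample(points, sample_size)
        # Stand-in for PAM: best k medoids within the sample, by brute force.
        medoids = min(combinations(sample, k), key=lambda ms: total_cost(sample, ms))
        # The candidate medoids are evaluated on the whole data set.
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = sorted(medoids), cost
    return best_medoids, best_cost


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
sampled_medoids, sampled_cost = clara(points, k=2, sample_size=4, num_samples=5)
```

If neither (1, 1) nor (8, 8) lands in any sample, the result is necessarily worse than what PAM finds on the full data, which is the drawback described above.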

CLARANS:
• Tries to combine PAM and CLARA
• The idea: CLARA uses the same sample for all stages of the search; CLARANS draws a new random sample at each step
• The chance of missing potential solutions is reduced
• Experimentally shown to be more efficient than PAM and CLARA
• Also detects outliers, i.e. points that do not belong to any cluster
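The randomized search can be sketched as below: from the current set of medoids, examine randomly chosen single-swap neighbours, move whenever one is cheaper, and give up on a local search after a fixed number of failed attempts. The parameter names `num_local` and `max_neighbor`, their default values and the Manhattan distance are choices made for this sketch.

```python
import random


def dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])  # Manhattan distance (assumption)


def total_cost(points, medoids):
    return sum(min(dist(p, m) for m in medoids) for p in points)


def clarans(points, k, num_local=4, max_neighbor=20, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_local):  # independent restarts
        current = rng.sample(points, k)
        current_cost = total_cost(points, current)
        tries = 0
        while tries < max_neighbor:
            # A neighbour differs from the current solution in exactly one medoid.
            i = rng.randrange(k)
            candidate = rng.choice(points)
            if candidate in current:
                tries += 1
                continue
            neighbor = current[:i] + [candidate] + current[i + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                current, current_cost, tries = neighbor, cost, 0
            else:
                tries += 1
        if current_cost < best_cost:
            best_medoids, best_cost = sorted(current), current_cost
    return best_medoids, best_cost


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
found_medoids, found_cost = clarans(points, k=2)
```

Unlike CLARA, no object is excluded up front: every point remains a possible medoid at every step, and only the set of neighbours actually inspected is sampled.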

• Based upon CLARANS, two spatial mining algorithms were developed:
  ◦ SD (CLARANS): Spatial Dominant Approach
  ◦ NSD (CLARANS): Non-Spatial Dominant Approach

SD (CLARANS):
• Objects are clustered spatially by CLARANS
• Attribute-oriented induction is performed on the non-spatial description of the objects
• Result: a description of each cluster by its relevant non-spatial attributes
  ◦ Spatial Cluster: Downtown Edmonton
  ◦ Non-Spatial (Attribute) Cluster: 50% Commercial, 40% Residential, 10% Public Services

NSD (CLARANS):
• Attribute-oriented generalization is performed on the non-spatial description of the objects
• For each generalized tuple, the objects are clustered by CLARANS
• Clusters that overlap are merged
  ◦ Spatial Cluster: Region East of 50 St, South of 35 St, North of Whitemud.
  ◦ Non-Spatial (Attribute) Cluster: Mostly Industrial

• Two Drawbacks:
  ◦ The assumption that all objects to be clustered are stored in main memory
    ▪ Requirement of disk-based methods
    ▪ Integration with spatial access methods like the R*-tree
  ◦ The efficiency of the algorithm can be improved by some modifications
    ▪ Focusing on the representative objects when selecting medoids
    ▪ Restricting access to objects that do not actually contribute to the computation

BIRCH:
• Presented because R-trees are not always available and are time-consuming to construct
• An algorithm presented by Zhang et al., 1996
• Can be used with any clustering algorithm, such as CLARANS
• Two concepts are used in the algorithm:
  ◦ Clustering Feature: a triple summarizing information about subclusters of points
  ◦ CF Tree: a balanced tree that stores clustering features

Points: (3,4), (2,6), (4,5), (4,7), (3,8)
Clustering Feature: CF = (N, LS, SS)
N: number of data points
LS: $\sum_{i=1}^{N} X_i$
SS: $\sum_{i=1}^{N} X_i^2$
CF = (5, (16,30), (54,190))
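The triple for the five points above can be checked with a short sketch. It follows the per-dimension form of SS used on the slide; the function names are hypothetical. It also shows why the summary is convenient: the CF of two merged subclusters is just the component-wise sum of their CFs.

```python
def clustering_feature(points):
    """Return (N, LS, SS) with LS and SS computed per dimension."""
    n = len(points)
    ls = tuple(sum(column) for column in zip(*points))
    ss = tuple(sum(v * v for v in column) for column in zip(*points))
    return n, ls, ss


def merge(cf_a, cf_b):
    """Combine two clustering features without revisiting the raw points."""
    (na, lsa, ssa), (nb, lsb, ssb) = cf_a, cf_b
    return (
        na + nb,
        tuple(a + b for a, b in zip(lsa, lsb)),
        tuple(a + b for a, b in zip(ssa, ssb)),
    )


points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
cf = clustering_feature(points)

# The same result obtained by merging the CFs of two subclusters.
cf_merged = merge(clustering_feature(points[:2]), clustering_feature(points[2:]))

# The centroid follows directly from the summary: LS / N.
centroid = tuple(component / cf[0] for component in cf[1])
```

Because statistics such as the centroid can be derived from the triple alone, a CF tree can summarize subclusters without keeping every point in memory.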

• Desire to discover rules that associate spatial objects with other spatial objects
• Association rules: Agrawal et al., 1993
• Association rules for spatial databases: Koperski and Han, 1995
• Example rule: is_a(x, school) -> close_to(x, park) (80%)
• Other predicates: intersects, overlap, disjoint, left_of, west_of, close_to, far_away

• Support (X): probability that a tuple contains X
• Confidence (X -> Y): probability that a tuple containing X also contains Y
• Predefined thresholds (minimum support, minimum confidence) determine which associations are reported
• Strong Rule: a rule whose support is no less than the minimum support and whose confidence is no less than the minimum confidence
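A small sketch of these measures on spatial predicates, arranged so that the school/park rule comes out at 80% confidence. The ten tuples, the predicate strings and the thresholds are invented for the illustration; the rule's support is taken as the support of X and Y together.

```python
def support(rows, items):
    """Fraction of tuples that contain every predicate in `items`."""
    items = set(items)
    return sum(items <= row for row in rows) / len(rows)


def confidence(rows, x, y):
    """Fraction of tuples containing X that also contain Y."""
    return support(rows, set(x) | set(y)) / support(rows, x)


def is_strong(rows, x, y, min_support, min_confidence):
    rule_support = support(rows, set(x) | set(y))
    return rule_support >= min_support and confidence(rows, x, y) >= min_confidence


# Hypothetical data: 5 schools (4 of them close to a park) and 5 other objects.
rows = (
    [{"is_a(school)", "close_to(park)"}] * 4
    + [{"is_a(school)"}]
    + [{"is_a(house)"}] * 5
)

X, Y = {"is_a(school)"}, {"close_to(park)"}
rule_confidence = confidence(rows, X, Y)
rule_is_strong = is_strong(rows, X, Y, min_support=0.3, min_confidence=0.7)
```

Raising `min_support` above 0.4 would reject the same rule even though its confidence is unchanged, which shows that the two thresholds filter independently.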

• Clustering methods answer where the groups of data are
• More interesting: why are the clusters there?
• Rephrased: what are the characteristics of the clusters in terms of the features that are close to them?
[Figure: cluster C1 centered at (X1, Y1) and cluster C2 centered at (X2, Y2); 45% of the objects in clusters C1 & C2 are close to feature (River)]

• CRH algorithm presented by Knorr and Ng, 1995
• CRH: C (encompassing circle), R (isothetic rectangle), H (convex hull)
• Uses filters to reduce a large number of features to a small set of candidate features
• Input: any number of features and one cluster
• Output: a list of relevant features and the points located within a defined distance of those features
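The filtering idea can be sketched with the two cheapest approximations: an encompassing circle, then an isothetic (axis-aligned) rectangle. The convex hull stage is omitted, features are reduced to single points, and the circle is a simple non-minimal one centred on the rectangle, so this is a simplification under stated assumptions rather than the algorithm of Knorr and Ng. All names and data are hypothetical.

```python
import math


def bounding_rectangle(points):
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def encompassing_circle(points):
    # Simplification: centre of the bounding rectangle, radius to the farthest point.
    x0, y0, x1, y1 = bounding_rectangle(points)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return cx, cy, radius


def dist_to_rectangle(p, rect):
    x0, y0, x1, y1 = rect
    dx = max(x0 - p[0], 0, p[0] - x1)
    dy = max(y0 - p[1], 0, p[1] - y1)
    return math.hypot(dx, dy)


def filter_features(cluster, features, threshold):
    """Keep features whose distance to the cluster's approximations is small."""
    cx, cy, radius = encompassing_circle(cluster)
    rect = bounding_rectangle(cluster)
    survivors = {}
    for name, (fx, fy) in features.items():
        # Stage C: cheap circle test discards clearly distant features.
        if max(math.hypot(fx - cx, fy - cy) - radius, 0) > threshold:
            continue
        # Stage R: tighter rectangle test on the remaining candidates.
        d_rect = dist_to_rectangle((fx, fy), rect)
        if d_rect <= threshold:
            survivors[name] = d_rect
    return survivors


cluster = [(0, 0), (4, 0), (4, 2), (0, 2)]
features = {"river": (5, 1), "airport": (20, 20), "tower": (2, 4.2)}
relevant = filter_features(cluster, features, threshold=1.5)
```

Here "airport" is rejected by the circle alone, while "tower" passes the circle but fails the rectangle, so each stage only has to examine what the cheaper one could not rule out.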

• There have been studies in this field on the automatic recognition and categorization of objects
• Typical systems are composed of 3 components:
  ◦ Data focusing
  ◦ Feature extraction
  ◦ Classification learning

• Data Focusing: increases the overall efficiency of the system by first identifying the portion of the image being analyzed that is most likely to contain the target object
• Feature Extraction: extracts interesting features from the data, using pattern recognition methods
• Classification Learning: discriminates the target objects from other objects that look alike

• This system has about 80% accuracy.
• It is difficult for experts to provide classifications with 100% certainty
• False classifications can produce large errors because they are treated as negative examples

• Identified future directions for spatial data mining:
  ◦ Data Mining in Spatial Object-Oriented DB
  ◦ Alternative Clustering Techniques: clustering overlapping objects, fuzzy clustering of spatial data
  ◦ Mining under uncertainty: evidential reasoning, fuzzy set approaches
  ◦ Spatial Data Deviation and Evolution Rules: rule application to data that changes over time
  ◦ Interleaved Generalization (spatial and non-spatial)
  ◦ Generalization of Temporal Spatial Data (data evolution)
  ◦ Parallel Data Mining (multi-processor systems)
  ◦ Spatial Data Mining Query Language
  ◦ Multidimensional Rule Visualization and Multiple Thematic Maps
“The variety of yet unexplored topics and problems makes knowledge discovery in spatial databases an attractive and challenging research field.”

Topic                                     Currently Active Research Field      References
DM in Spatial Obj-Oriented DB             Yes                                  >10 (1997-2012)
Alternative / Fuzzy Spatial Clustering    Yes                                  1996, >10 (1997-2012)
Mining under uncertainty                  Yes                                  >10 (1997-2012)
Deviation / Evolution Rules               Yes                                  >10 (1997-2012)
Interleaved Generalization                Vague…
Generalization of Temporal Spatial Data   Yes                                  >10 (1997-2012)
Parallel Data Mining                      merged
Spatial Data Mining Query Language        Yes: GeoMiner, GMQL, Spatial SQL     1991, 1994, 1996, 1997 (h&k), >10 (1997-2012)
Visualization Topics                      Yes, esp. GIS                        >10 (1997-2012)

• Data Mining / Knowledge Discovery of Spatial Data is a large, active research area.
• While it was a “young” field at the time this survey paper was written, it is quickly maturing in applications such as:
  ◦ Geographic Information Systems
  ◦ Medical Imaging
  ◦ Robotics Navigation


• Krzysztof Koperski, Junas Adhikary, Jiawei Han. Spatial Data Mining: Progress and Challenges Survey Paper. Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996.

• W. G. Aref and H. Samet. Extending DBMS with Spatial Operations. In Proc. 2nd Symp. SSD’91, Zurich, Switzerland, Aug.
• W. Lu, J. Han, and B. C. Ooi. Discovery of General Knowledge in Large Spatial Databases. In Proc. Far East Workshop on Geographic Information Systems, Singapore, June.
• L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons.
• R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In Proc. Int. Conf. Very Large Data Bases, Santiago, Chile, September.
• T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an Efficient Data Clustering Method for Very Large Databases. In Proc. ACM-SIGMOD Int. Conf. Management of Data, Montreal, Canada, June.
• R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules Between Sets of Items in Large Databases. In Proc. ACM-SIGMOD Int. Conf. Management of Data, Washington, D.C., May.
• K. Koperski and J. Han. Discovery of Spatial Association Rules in Geographic Information Databases. In Proc. 4th Int’l Symp. on Large Spatial Databases (SSD’95), Portland, Maine, August.
• E. Knorr and R. T. Ng. Applying Computational Geometry Concepts to Discovering Spatial Aggregate Proximity Relationships. Technical Report, University of British Columbia.
Paper Review and Slides by Brad Danielson