Spatial Data Mining: Progress and Challenges Survey Paper Krzysztof Koperski, Junas Adhikary, and Jiawei Han (1996) Review by Brad Danielson CMPUT 695.

Slides:



Advertisements
Similar presentations
Copyright Jiawei Han, modified by Charles Ling for CS411a
Advertisements

Office of SA to CNS GeoIntelligence Introduction Data Mining vs Image Mining Image Mining - Issues and Challenges CBIR Image Mining Process Ontology.
Spatial Database Systems. Spatial Database Applications GIS applications (maps): Urban planning, route optimization, fire or pollution monitoring, utility.
7/03Spatial Data Mining G Dong (WSU) & H. Liu (ASU) 1 6. Spatial Mining Spatial Data and Structures Images Spatial Mining Algorithms.
Ch2 Data Preprocessing part3 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.
Mining Multiple-level Association Rules in Large Databases
Spatial Data Mining-Applications Hemant Kumar Jerath,B.Tech. MS Project Student Mangalore University Advisors: Dr. B.K Mohan & Dr.(Mrs.).P. Venkatachalam.
BIRCH: Is It Good for Databases? A review of BIRCH: An And Efficient Data Clustering Method for Very Large Databases by Tian Zhang, Raghu Ramakrishnan.
Chapter 9. Mining Complex Types of Data
Spring 2003Data Mining by H. Liu, ASU1 6. Spatial Mining Spatial Data and Structures Images Spatial Mining Algorithms.
Spatial Mining.
Clustering II.
Managing Data Resources
Chapter 16 Parallel Data Mining 16.1From DB to DW to DM 16.2Data Mining: A Brief Overview 16.3Parallel Association Rules 16.4Parallel Sequential Patterns.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Spatio-Temporal Databases
Spatial Data Mining CSE 6331, Fall 1999 Ajay Gupta
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
1 Partitioning Algorithms: Basic Concepts  Partition n objects into k clusters Optimize the chosen partitioning criterion Example: minimize the Squared.
Data Mining – Intro.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Preparing Data for Analysis and Analyzing Spatial Data/ Geoprocessing Class 11 GISG 110.
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
Applied Cartography and Introduction to GIS GEOG 2017 EL
IST 210 Introduction to Spatial Databases. IST 210 Evolution of acronym “GIS” Fig 1.1 Geographic Information Systems (1980s) Geographic Information Science.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Clustering Analysis of Spatial Data Using Peano Count Trees Qiang Ding William Perrizo Department of Computer Science North Dakota State University, USA.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Spatial Data Analysis Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What is spatial data and their special.
Exploring ArcToolbox Presented by: Isaac Johnson.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Modeling Storing and Mining Moving Object Databases Proceedings of the International Database Engineering and Applications Symposium (IDEAS’04) Sotiris.
Spatial Data Mining Ashkan Zarnani Sadra Abedinzadeh Farzad Peyravi.
1 CS599 Spatial & Temporal Database Spatial Data Mining: Progress and Challenges Survey Paper appeared in DMKD96 by Koperski, K., Adhikary, J. and Han,
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
Spatial Query Processing Spatial DBs do not have a set of operators that are considered to be basic elements in a query evaluation. Spatial DBs handle.
Spatial DBMS Spatial Database Management Systems.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Clustering COMP Research Seminar BCB 713 Module Spring 2011 Wei Wang.
Christoph F. Eick Questions and Topics Review November 11, Discussion of Midterm Exam 2.Assume an association rule if smoke then cancer has a confidence.
Mining Weather Data for Decision Support Roy George Army High Performance Computing Research Center Clark Atlanta University Atlanta, GA
NR 143 Study Overview: part 1 By Austin Troy University of Vermont Using GIS-- Introduction to GIS.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
INTRODUCTION TO GIS  Used to describe computer facilities which are used to handle data referenced to the spatial domain.  Has the ability to inter-
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
1 Efficient and Effective Clustering Methods for Spatial Data Mining Raymond T. Ng, Jiawei Han Pavan Podila COSC 6341, Fall ‘04.
Lecture 10 Creating and Maintaining Geographic Databases Longley et al., Ch. 10, through section 10.4.
1/12/ Multimedia Data Mining. Multimedia data types any type of information medium that can be represented, processed, stored and transmitted over.
Data Mining and Decision Support
Towards Unifying Vector and Raster Data Models for Hybrid Spatial Regions Philip Dougherty.
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
Clustering Wei Wang. Outline What is clustering Partitioning methods Hierarchical methods Density-based methods Grid-based methods Model-based clustering.
Color Image Segmentation Mentor : Dr. Rajeev Srivastava Students: Achit Kumar Ojha Aseem Kumar Akshay Tyagi.
CENTENNIAL COLLEGE SCHOOL OF ENGINEERING & APPLIED SCIENCE VS 361 Introduction to GIS SPATIAL OPERATIONS COURSE NOTES 1.
 Introduction  Methods for Knowledge Discovery in Spatial Databases ◦ Generalization-Based Knowledge Discovery ◦ Methods Using Clustering ◦ Methods.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
CLARANS: A Method for Clustering Objects for Spatial Data Mining IEEE Transactions on Knowledge and Data Enginerring, 2002 Raymond T. Ng et al. 22 MAR.
Data Mining – Intro.
Clustering CSC 600: Data Mining Class 21.
Physical Database Design
Clustering Wei Wang.
Nearest Neighbors CSC 576: Data Mining.
Text Categorization Berlin Chen 2003 Reference:
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
Presentation transcript:

Spatial Data Mining: Progress and Challenges Survey Paper Krzysztof Koperski, Junas Adhikary, and Jiawei Han (1996) Review by Brad Danielson CMPUT /11/2007

Introduction Authors objectives:  Describe and critique existing spatial data mining methods  Give readers a general perspective of the field’s current state  Make suggestions for future directions and growth potential of spatial data mining

Introduction My objectives:  Summarize the paper’s description of the state of spatial data mining in  Examine the predictions for future directions made by these authors.  Briefly examine the accuracy of these predictions by doing a topic search on spatial data mining research from 1997 to 2007.

Spatial Data Mining: Definition “Spatial data mining, or knowledge discovery in spatial database, refers to the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases.” (Koperski and Han, 1995) Data mining, or knowledge discovery in databases, refers to the “ discovery of interesting, implicit, and previously unknown knowledge from large databases.” (Frawley et al, 1992) WHAT’S THE DIFFERENCE?

What is spatial data and why is mining it different that mining “normal” data? DATA = an attribute of an object. SPATIAL DATA = Attribute data referenced to a specific location. The Attributes of spatial objects are: - Highly dependant on location - Often influenced by neighboring objects The (WHAT) dimension. (WHERE) & (WHAT)

Spatial Data …? ObjectWhereWhat MilkIn fridge (X 1,Y 1 ) Is cold MilkOn table (X 2,Y 2 ) Is warm ObjectWhereWhat MilkIn fridge (X 1,Y 1 ) Is warm YogurtIn fridge (X 1,Y 1 ) Is warm ButterIn fridge (X 1,Y 1 ) Is warm SPATIAL DATA MINING: THE FRIDGE IS BROKEN! Object() : Location() -> Characteristic() (Milk) : (In Fridge) -> [should be] -> (Cold)

Why do Spatial Data Mining? To understand spatial data To discover relationships between spatial and non spatial data To capture the general characteristics in a concise way To build spatial knowledge-bases

Critical Challenge in Spatial Data Mining SD mining algorithms must efficiently overcome: The huge volume of spatial data The complexity of spatial data types/structures The complexity of spatial accessing/query methods Expensive spatial processing operations Object = HUGE S-DB GO! Where is: Citizen{Brad} Which highways cross Nat’n Park boundaries? = spatial JOIN

Spatial Data Mining Methods 1. Generalization Based Knowledge Discovery 2. Clustering Methods 3. Aggregate Proximity Measuring 4. Spatial Association Rules

Generalization-Based Knowledge Discovery Requires background knowledge of dataset, presented as concept hierarchies Developing these hierarchies conceptually similar to Hierarchical Clustering Hierarchies are either based on spatial attributes or non-spatial attributes. Queries about data can then be made at levels of generalization in the hierarchy.

Concept Hierarchies Non-spatial attribute hierarchy: Agricultural land use Spatial attribute hierarchy: Agricultural Land Size Divisions Quarter (.25 mile sq.) Section (1 mile sq.) Township (36 mile sq.) Level of generalization Dominion Land Survey Queries can be made at different levels of Generalization

Spatial vs Non-spatial Dominant Generalization Discrete precipitation values generalized into 7 classes. Spatial data were classified based on their fit to these precipitation attribute classes. Spatial objects (individual regions) are generalized (merged) until the desired zone size is reached. Non-spatial data falling into these zones is then generalized and reported on. Examples from (Koperski et al, 1996) Goal: produce high-level descriptions of the data. Thematic maps are effective ways to summarize the data and their spatial relationships.

Clustering Methods Goal: like Generalization, to reveal relationships between spatial and non-spatial attributes Techniques used are based on some clustering methods we examined in class:  PAM (k–medoids clustering)  CLARA (k-medoids, where medoids are chosen from a sample of a large DB)  CLARANS (mixture of PAM and CLARA, where new k-medoids are tested from new random samples) 2 spatial data mining variations of CLARANS:  SD(CLARANS): spatial dominant approach  NSD(CLARANS): non-spatial dominant approach

SD(CLARANS) 1. The spatial components of objects in the dataset are collected and clustered using CLARANS 2. Non-spatial description of the objects are brought into the resulting clusters. 3. Result: each cluster (defined by spatial boundaries) is described by it’s relative abundance of non-spatial attributes.

NSD(CLARANS) 1. Non-spatial attribute generalization produces k generalized attribute groups 2. The spatial components are clustered using CLARANS to find k clusters. 3. If these spatial clusters overlap, they may be merged, and their attribute descriptions merged as well. 4. Result: each cluster is described by a single attribute description

SD(CLARANS): Example Spatial Cluster: Downtown Edmonton Attributes: (Building types) 50% Commercial, 40% Residential, 10% Public Services

NSD(CLARANS): Example Attribute Cluster (Building types): Mostly Industrial Spatial Cluster: Region East of 50St & South of HW#16 & North of Whitemud &…

Clustering Methods: Improvements CLARANS is inefficient at calculating total distance between clusterings.  Integrate with more efficient spatial access methods developed for Spatial Databases Use cluster focusing to increase efficiency in finding locally optimized clusters:  Information about sub-clusters is summarized in tree structures  Sub-clusters (or nodes in the sub-cluster tree) can be tested when selecting a new medoid (center) during the recursive process of optimizing clusters  (mixture of Hierarchical Clustering and Partition Clustering)

Aggregate Proximity Measuring Clustering methods are effective at finding where groups of data are in a spatial DB BUT – it’s often more interesting to know WHY the clusters are there. C1 C2 Cluster C1 centered at X 1,Y 1 Cluster C2 centered at X 2,Y 2 C1 C2 45% of the objects in Clusters C1 & C2 are close to feature{River}

Aggregate Proximity Measuring Calculates the distance between a feature and the set of points in a cluster. This may reveal how outside features influence the cluster. C1 C2 Cluster C1 centered at X 1,Y 1 Cluster C2 centered at X 2,Y 2 C1 C2 45% of the objects in Clusters C1 & C2 are close to feature{River}

Aggregate Proximity Measuring – How? CRH Algorithm (Circle, Rectangle, convex Hull) Input: any number of local map features AND one cluster Uses filters of various geometries to find the characteristics of a cluster wrt nearby features The C and R filters prune candidate features, and only promising features are sent to the H filter. The H filter builds more accurate buffers around the selected features. Aggregate proximity is calculated between the points in the cluster and the buffers around the features of interest. Output: list of features with smallest aggregate proximity to points in the cluster, and percentages of points located within a defined distance from features.

Mining Spatial Association Rules Generalization and Clustering characterize spatial objects based on their non-spatial attributes. Spatial Association Rules are required to associate spatial objects with other spatial objects. Examples from paper: is_a(x,school) -> close_to(x, park)(80%) Reads: 80% of schools are close to parks. Suggested predicates for spatial association rules:  topological relations: intersects, overlap, disjoint  spatial orientations: left_of, west_of, etc.  distance information: close_to, far_from, etc min support and min confidence are required to filter out infrequent and weak rules.

Multi-level Mining of Spatial Association Rules Query: describe a set of objects using relations to other objects. is_a(x,school) -> adjacent_to(x, park) High Level mining:  Use coarse spatial predicates such as g_close_to (generalized close to)  Spatial objects satisfy g_close_to if the distance between their Minimum Bounding Rectangles is less than a threshold. Low Level mining:  More detailed, accurate predicates are used to build rules with objects which pass through from High Level. Apriori rational: if a pattern is not large at High Level, it will not be large at Low Level  This minimizes expensive spatial computations

Predictions, circa 1996 Identified Future Directions for spatial data mining:  Data Mining in Spatial Object Oriented DB  Alternative Clustering Techniques: Clustering overlapping objects, Fuzzy Clustering of Spatial Data  Mining under uncertainty: Evidential reasoning, Fuzzy sets approaches  Spatial Data Deviation and Evolution Rules: Rule application to data that changes over time  Interleaved Generalization (spatial and non-spatial)  Generalization of Temporal Spatial Data (data evolution)  Parallel Data Mining (multi-processor systems)  Spatial Data Mining Query Language  Multidimensional Rule Visualization and Multiple Thematic Maps “The variety of yet unexplored topics and problems makes knowledge discovery in spatial databases an attractive and challenging research field.”

Prediction Accuracy, 10 years later TopicCurrently Active Research Field References DM in Spatial Obj-Oriented DBYes >10 (1997 – 2007) Alternative / Fuzzy Spatial Clustering Yes 1996, >10 (1997 – 2007) Mining under uncertaintyYes – esp. Robo nav >10 (1997 – 2007) Deviation / Evolution RulesYes – mixed topics >10 (1997 – 2007) Interleaved GeneralizationVague… Generalization of Temporal Spatial Data Yes >10 (1997 – 2007) Parallel Data Miningmerged Spatial Data Mining Query Language Yes: GeoMiner, GMQL, Spatial SQL 1991, 1994, 1996, 1997 (h&k), >10 (1997 – 2007) Visualization TopicsYes – esp. GIS >10 (1997 – 2007)

Conclusion Data Mining / Knowledge Discovery of Spatial Data is a large, active research area. While it was a “young” field at the time this survey paper was written, it is quickly maturing in applications such as:  Geographic Information Systems  Medical Imaging  Robotics Navigation

References Krzysztof Koperski, Junas Adhikary, Jiawei Han. Spatial Data Mining: Progress and Challenges Survey Paper. Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996 K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In Proc. 4th Int'l Symp. on Large Spatial Databases (SSD'95), pages , Portland, Maine, Aug W. J. Frawley, G. Piatetsky-Shapiro, and C. J. Matheus. Knowledge discovery in databases: An overview. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages Roddick, Hornsby, and Spiliopoulou. An Updated Bibliography of Temporal, Spatial, and Spatio-temporal Data Mining Research, TSDM2000, LNAI 2007, pp. 147–163, 2001.c Springer-Verlag Berlin Heidelberg 2001