Spatial Congeries Pattern Mining Presented by: Iris Zhang Supervisor: Dr. David Cheung 24 October 2003.

Outline
- Introduction
- Motivation
- Related work
- Formal definition
- Algorithms
- Experiments
- Conclusion

Introduction
- KDD: discovery of interesting, implicit, and previously unknown knowledge from large databases [FPM91]
- Spatial data mining: extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases [KH95]

Features of Spatial Data Mining
- Spatial autocorrelation: everything is related to everything else, but nearby things are more related than distant things (Tobler, 1979)
- Spatial heterogeneity: the variation in spatial data is a function of location

Motivation
- A famous historical example: in 1909, the residents of Colorado Springs were found to have healthy teeth, and the local drinking water had a high level of fluoride.
- Researchers confirmed the positive role of fluoride in controlling tooth decay.
- {healthy teeth, high level of fluoride}

Motivation (Cont’) Another case [HSX02]

Related work
- Neighboring class sets mining
- Co-location pattern mining

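Neighboring Class Sets
Access records of mobile services:

| ID  | Position       | Services  | … |
| --- | -------------- | --------- | - |
| xxx | (14975, 27020) | Weather   | … |
| xxx | (16723, 24301) | Timetable | … |
| xxx | (15521, 26441) | Ticket    | … |
| xxx | …              | …         | … |
|     | (14737, 26752) | Timetable | … |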

Neighboring Class Sets
- Neighboring class sets: ((timetable, ticket), 4), ((timetable, weather), 3), ((ticket, weather), 2), ((timetable, ticket, weather), 2) [Mor01]

Neighboring Class Sets Grouping of points [Mor01]

Neighboring Class Sets Apriori generation of valid instances [Mor01]

Problems
- Undercounts the number of instances
- Depends on the order of classes when generating instances of a k-neighboring class set (k > 2)
- Uses an absolute number as the support threshold

Co-location Patterns Mining
- Co-location: a subset of Boolean features
- E.g.: (drought, El Niño, substantial increase in vegetation, extremely high precipitation)

Co-location Patterns Mining
- Row instance $I = \{i_1, i_2, \ldots, i_k\}$ of a co-location $C = \{f_1, f_2, \ldots, f_k\}$:
  - $i_j$ is an instance of $f_j$ ($j = 1, 2, \ldots, k$)
  - $i_p$ and $i_q$ are neighbors of each other
  - E.g.: (A.1, B.1) is a row instance of co-location {A, B}
- Table instance $T$ of $C$: the set of all row instances of $C$
  - E.g.: {(A.1, B.1), (A.2, B.4), (A.3, B.4)} is the table instance of {A, B}

Co-location Patterns Mining
- Participant ratio for feature $f_i$: Pr({A,B}, A) = 3/4 = 75%, Pr({A,B}, B) = 2/5 = 40%
- Participant index of a co-location C: Pi({A,B}) = min(0.75, 0.4) = 0.4
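The two measures can be checked with a few lines of Python. This is a minimal sketch, assuming the table instance from the {A, B} example and instance totals of 4 for A and 5 for B (read off the 3/4 and 2/5 fractions); the function name `participation` is illustrative and does not come from the slides.

```python
from collections import defaultdict

def participation(table_instance, features, instance_counts):
    """Return (ratio per feature, index) for one co-location."""
    participants = defaultdict(set)
    for row in table_instance:
        for feature, inst in zip(features, row):
            participants[feature].add(inst)  # distinct instances taking part
    ratios = {f: len(participants[f]) / instance_counts[f] for f in features}
    return ratios, min(ratios.values())

features = ("A", "B")
table_instance = [("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.4")]
instance_counts = {"A": 4, "B": 5}  # assumed totals, taken from 3/4 and 2/5

ratios, index = participation(table_instance, features, instance_counts)
print(ratios, index)
```

B.4 appears in two rows but is counted once, which is why the ratio for B is 2/5 and not 3/5.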

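Co-location Pattern Mining
- Co-location rule: $C_1 \rightarrow C_2\ (p, cp)$
  - $C_1$ and $C_2$ are co-locations, $C_1 \cap C_2 = \emptyset$
  - p: participant index, cp: conditional probability
  - E.g.: {A} → {B} (40%, 75%)
- Conditional probability of a co-location rule: the fraction of row instances of $C_1$ that also take part in a row instance of $C_1 \cup C_2$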
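The conditional probability can be written out as a formula. The notation below is an assumption and does not come from the transcript: $T(C)$ stands for the table instance of $C$, and $\pi_{C_1}$ for projection onto the features of $C_1$ with duplicates removed.

```latex
cp(C_1 \rightarrow C_2) = \frac{\left|\pi_{C_1}\bigl(T(C_1 \cup C_2)\bigr)\right|}{\left|T(C_1)\right|}
\qquad
cp(\{A\} \rightarrow \{B\}) = \frac{|\{A.1, A.2, A.3\}|}{4} = 75\%
```

The worked value uses the {A, B} example, where three of the four instances of A take part in a row instance of {A, B}.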

Co-location Patterns Mining
- Apriori property: the participant index is monotonically non-increasing as the size of the co-location increases
- Apriori-like mining algorithm
  - Candidate generation
  - Instance generation

Co-location Patterns Mining
- Candidate generation
  - Join
  - Prune
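The join and prune steps can be sketched in the usual Apriori style. This is an illustrative sketch rather than the algorithm from [HSX02]: it assumes co-locations are sorted tuples of feature names, and the name `apriori_gen` and the features A to D are hypothetical.

```python
from itertools import combinations

def apriori_gen(frequent_prev):
    """Build size-k candidates from frequent size-(k-1) co-locations."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later set shares this prefix
            cand = a + (b[-1],)  # join: same prefix, different last feature
            # prune: every size-(k-1) subset must itself be frequent
            if all(sub in prev_set for sub in combinations(cand, len(cand) - 1)):
                candidates.append(cand)
    return candidates

frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
candidates_3 = apriori_gen(frequent_2)
print(candidates_3)
```

Here ("B", "C", "D") is produced by the join but pruned because ("C", "D") is not frequent, so only ("A", "B", "C") survives.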

Co-location Patterns Mining
- Instance generation
  - Geometric approach: R-tree join
  - Combinatorial approach: sort-merge join
  - Hybrid approach: R-tree join to get the instances of size-2 co-locations, sort-merge join to get the instances of size-k (k > 2) co-locations
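The hybrid idea can be illustrated with a simplified sketch. Two substitutions are made here and should be read as assumptions: a brute-force distance check stands in for the R-tree join, and a dictionary lookup on the shared first element stands in for the sort-merge join. The neighbor relation is taken to be Euclidean distance at most `d`, and all point data are made up.

```python
from math import dist

def size2_instances(points_a, points_b, d):
    """All (a, b) id pairs within distance d (brute force, not an R-tree)."""
    return [(ia, ib) for ia, pa in points_a.items()
            for ib, pb in points_b.items() if dist(pa, pb) <= d]

def extend_instances(inst_ab, inst_ac, points_b, points_c, d):
    """Join rows that share their first element, then check the b-c pair."""
    by_a = {}
    for ia, ic in inst_ac:
        by_a.setdefault(ia, []).append(ic)
    return [(ia, ib, ic) for ia, ib in inst_ab for ic in by_a.get(ia, [])
            if dist(points_b[ib], points_c[ic]) <= d]

A = {"A.1": (0, 0), "A.2": (10, 10)}
B = {"B.1": (1, 0)}
C = {"C.1": (0, 1), "C.2": (10, 11)}
d = 2.0

inst_ab = size2_instances(A, B, d)
inst_ac = size2_instances(A, C, d)
inst_abc = extend_instances(inst_ab, inst_ac, B, C, d)
print(inst_ab, inst_ac, inst_abc)
```

Only the pair not already covered by the two size-2 inputs (here B and C) needs a fresh distance check when a row is extended.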

Co-location Patterns Mining Example

Problems
- The participant index measure may overrate some co-locations
- The features are binary
- Example:
  - Pr({A,B}, A) = 2/8 = 25%, Pr({A,B}, B) = 6/6 = 100%
  - Pi({A,B}) = min(25%, 100%) = 25%
  - {B} → {A} (25%, 100%), {A} → {B} (25%, 25%)
  - Probability({A,B}) = 7/(8*6) ≈ 15%

Spatial Congeries Patterns Mining
- Input:
  - $D = \{D_1, D_2, \ldots, D_n\}$
  - A spatial relation to regulate the relation of objects in patterns
  - A min_fre threshold to determine whether an itemset is frequent
- Output: the complete set of Spatial Congeries patterns

Spatial Congeries Patterns Mining
Example of datasets:

| ID | Attribute              | Type    | Description                          |
| -- | ---------------------- | ------- | ------------------------------------ |
| D1 | Vegetation durability  | Ordinal | Ordinal scale from 10 to 100         |
| D2 | Water depth            | Numeric | In centimeters                       |
| D3 | Distance to open water | Numeric | In meters                            |
| D4 | Nest location          | Binary  | Existence or absence of a bird nest  |

- Attribute values can be translated to categorical values
- {VD:10, WD:shallow, DOP:near, NL:existent} can be a pattern

Formal Definition
- Item: an attribute value in a dataset. I is the set of all items.
  - E.g.: water depth: shallow
- Itemset: a subset of I
  - E.g.: {VD:10, WD:shallow, DOP:near, NL:existent}
  - E.g.: {VD:100, WD:deep, DOP:far, NL:absent}

Formal Definition
- Spatial relation: a rule that regulates the spatial relation of objects in patterns
- Instances of an item i: the points that have attribute value i
- Instance of an itemset: if the instances of all items in the itemset satisfy the spatial relation, the combination of these instances is an instance of the itemset

Observation
- The number of instances of an itemset is not monotonically non-increasing
  - E.g.: one instance of {triangle, circle} can produce two instances of {triangle, circle, rectangle}
- Conclusion: the number of instances of an itemset cannot be used as the measure to determine whether the itemset is a pattern

Formal Definition
- Frequency of an itemset: the number of instances of the itemset over the number of all possible combinations of instances of its items
  - E.g.: Frequency({A,B}) = 7/(8*6) ≈ 15%
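The frequency measure can be sketched by brute force. The sketch assumes the spatial relation is "every pair of points lies within distance `d`", which is only one possible choice; the name `frequency` and the sample points are hypothetical.

```python
from itertools import combinations, product
from math import dist, prod

def frequency(item_points, d):
    """item_points: one list of (x, y) points per item in the itemset."""
    total = prod(len(pts) for pts in item_points)  # all possible combinations
    if total == 0:
        return 0.0
    count = sum(
        1 for combo in product(*item_points)
        # assumed spatial relation: all pairs within distance d
        if all(dist(p, q) <= d for p, q in combinations(combo, 2))
    )
    return count / total

a_pts = [(0, 0), (5, 0)]
b_pts = [(1, 0), (4, 0), (100, 100)]
freq_ab = frequency([a_pts, b_pts], 2.0)
print(freq_ab)        # 2 valid pairs out of 2 * 3 combinations
print(7 / (8 * 6))    # the slide's example, roughly 0.146
```

Enumerating every combination is exponential in the itemset size, so this only serves to make the definition concrete.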

Formal Definition
- Spatial Congeries pattern: an itemset whose frequency is no less than the frequency threshold min_fre

Property of Frequency
- Lemma: the frequency of an itemset is monotonically non-increasing as the size of the itemset increases.
- Proof (simplified): for a size-(k-1) itemset $I_{k-1} = \{v_1, v_2, \ldots, v_{k-1}\}$ and a size-k itemset $I_k = \{v_1, v_2, \ldots, v_{k-1}, v_k\}$
  - $m_q$ is the number of instances of $I_q$
  - $n_q$ is the number of instances of item $v_q$
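The argument is not preserved in the transcript; one way to reconstruct it, under the assumption that removing a point from a valid instance leaves a valid instance (true for a pairwise distance threshold, for example), is the following sketch.

```latex
f(I_k) = \frac{m_k}{\prod_{q=1}^{k} n_q}, \qquad
m_k \le m_{k-1}\, n_k
\;\Longrightarrow\;
f(I_k) \le \frac{m_{k-1}\, n_k}{\prod_{q=1}^{k} n_q}
       = \frac{m_{k-1}}{\prod_{q=1}^{k-1} n_q}
       = f(I_{k-1})
```

The inequality $m_k \le m_{k-1} n_k$ holds because every instance of $I_k$ is an instance of $I_{k-1}$ extended by one of the $n_k$ instances of $v_k$.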

Algorithm-1
- Step 1: generate the complete set of size-2 patterns by R-tree join on complete R-trees

Algorithm-1
- Step 2: generate size-k (k > 2) patterns level by level
  - Generate size-k (k > 2) candidates
    - Join two size-(k-1) patterns
    - Prune the candidates that have subsets that are not frequent
  - Generate size-k (k > 2) instances

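Sample
Legend: square = a1, triangle = a2, circle = b1, diamond = c1

| Dataset | X  | Y  | Value |
| ------- | -- | -- | ----- |
| A       | X1 | Y1 | a1    |
| A       | X2 | Y2 | a2    |
| A       | X3 | Y3 | a1    |
| A       | X4 | Y4 | a1    |
| A       | X5 | Y5 | a2    |
| B       | X6 | Y6 | b1    |
| B       | X7 | Y7 | b1    |
| B       | X8 | Y8 | b1    |
| C       | X9 | Y9 | c1    |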

Process of Algorithm-1
- R-tree join (RJ) to find the instances of size-2 candidates
  - Build an R-tree for each dataset A, B, and C
  - Do RJ to find the instances of size-2 candidates: $m_{a1b1} = 5$, $m_{a2b1} = 3$, $m_{a1c1} = 2$, $m_{a2c1} = 0$, $m_{b1c1} = 0$
  - Get size-2 patterns a1b1, a2b1, and a1c1 according to the frequency threshold of 50%: $f_{a1b1} = 5/(3 \times 3) \approx 56\%$, $f_{a2b1} = 3/(2 \times 3) = 50\%$, $f_{a1c1} = 2/(3 \times 1) \approx 67\%$, $f_{a2c1} = 0$, $f_{b1c1} = 0$
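The size-2 frequencies follow directly from the instance counts. The sketch below only redoes the arithmetic: the pair counts are the ones reported on the slide, the item counts (3 a1, 2 a2, 3 b1, 1 c1) come from the sample datasets, and the variable names are illustrative.

```python
# Item counts from the sample datasets and pair counts from the R-tree join.
n = {"a1": 3, "a2": 2, "b1": 3, "c1": 1}
m = {("a1", "b1"): 5, ("a2", "b1"): 3, ("a1", "c1"): 2,
     ("a2", "c1"): 0, ("b1", "c1"): 0}
min_fre = 0.5

# frequency = instances of the pair / all combinations of item instances
freq = {pair: m[pair] / (n[pair[0]] * n[pair[1]]) for pair in m}
patterns = [pair for pair, f in freq.items() if f >= min_fre]

for pair, f in freq.items():
    print(pair, f"{f:.0%}")
print(patterns)
```

a2b1 sits exactly on the threshold and is kept because a pattern only needs a frequency no less than min_fre.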

Process of Algorithm-1
- Sort-merge join to find the instances of size-k (k > 2) candidates
  - Generate size-3 candidates
    - Join size-2 patterns a1b1 and a1c1 to form a1b1c1
    - Prune a1b1c1 because b1c1 is not a pattern
  - Get size-3 patterns (there are no size-3 patterns)
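The join and prune steps of this example can be replayed in a short sketch. It assumes that an itemset holds at most one item per dataset and that the first letter of an item names its dataset; both are conventions of this illustration, as are the function names.

```python
from itertools import combinations

size2_patterns = {("a1", "b1"), ("a2", "b1"), ("a1", "c1")}

def dataset_of(item):
    return item[0]  # assumption: "a1" belongs to dataset "a", and so on

def size3_candidates(patterns):
    """Return (joined candidates, candidates surviving the prune)."""
    joined, kept = set(), set()
    for p, q in combinations(sorted(patterns), 2):
        items = sorted(set(p) | set(q))
        # join: three items in total, each from a different dataset
        if len(items) == 3 and len({dataset_of(i) for i in items}) == 3:
            joined.add(tuple(items))
    for cand in joined:
        # prune: every size-2 subset must be a pattern
        if all(sub in patterns for sub in combinations(cand, 2)):
            kept.add(cand)
    return joined, kept

joined, kept = size3_candidates(size2_patterns)
print(joined, kept)
```

a1b1 and a2b1 are not joined because a1 and a2 come from the same dataset, and a1b1c1 is dropped because b1c1 is missing from the size-2 patterns.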

Algorithm-2
- Step 1: generate all patterns for one combination of subsets. Each subset corresponds to an item; all points in the subset have that item as their attribute value.
  - E.g.: the first combination is a1b1c1. R-trees are built for the subsets of a1, b1, and c1 in order to generate size-2 patterns; a sort-merge join then generates the size-k (k > 2) patterns.
- Step 2: generate all patterns for the next combination, until no combination remains.
  - E.g.: the second combination is a2b1c1.

Process of Algorithm-2
- Generate patterns for combination a1b1c1
  - RJ on the R-trees for a1, b1, and c1 to get the instances of candidates a1b1, a1c1, and b1c1
  - Suppose a1b1 and a1c1 are patterns; the size-3 candidate is a1b1c1
  - Sort-merge join to get the instances of a1b1c1
- Generate patterns for combination a2b1c1
  - RJ on the R-trees for a2, b1, and c1 to get the instances of candidates a2b1 and a2c1; the instances of b1c1 have already been generated, so there is no need to generate them again
  - Suppose a2b1 is a pattern
  - There is no size-3 candidate
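The combination-by-combination control flow, including the reuse of pairs that were already evaluated, can be sketched as follows. Only the size-2 stage is shown; `pair_is_pattern` is a hypothetical callback standing in for the R-tree join plus frequency test, and the supposed patterns are the ones assumed in the example above.

```python
from itertools import combinations, product

def mine_by_combination(datasets, pair_is_pattern):
    """datasets: {name: [items]}; returns (size-2 patterns, evaluated pairs)."""
    cache = {}       # pair -> result, shared across combinations
    evaluated = []   # pairs that actually triggered an evaluation
    patterns = set()
    for combo in product(*datasets.values()):  # one item per dataset
        for pair in combinations(combo, 2):
            if pair not in cache:
                cache[pair] = pair_is_pattern(pair)
                evaluated.append(pair)
            if cache[pair]:
                patterns.add(pair)
        # mining of size-k (k > 2) patterns inside the combination is omitted
    return patterns, evaluated

datasets = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"]}
supposed = {("a1", "b1"), ("a1", "c1"), ("a2", "b1")}
patterns, evaluated = mine_by_combination(datasets, lambda pair: pair in supposed)
print(patterns)
print(evaluated)
```

The pair b1c1 is evaluated once, during the first combination, and is looked up from the cache when a2b1c1 is processed.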

Experiment
- Environment
  - CPU type: Pentium III Xeon 700 MHz
  - RAM: 4096 MB
- Dataset
  - Synthetic dataset with Gaussian distribution
  - No. of clusters: 5
  - Map size: 800
  - E.g.: (622, 478, 5) is a point in a dataset
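A dataset of this shape could be generated along the following lines. The slides do not give the generator, so everything beyond the stated parameters is an assumption: uniformly placed cluster centers, a standard deviation of 40, clamping to the map, uniformly drawn attribute values from 1 to 5, and the function name `make_dataset`.

```python
import random

def make_dataset(n_points, n_clusters=5, map_size=800, n_values=5,
                 sigma=40.0, seed=0):
    """Return a list of (x, y, attribute_value) points around Gaussian clusters."""
    rng = random.Random(seed)
    centers = [(rng.uniform(0, map_size), rng.uniform(0, map_size))
               for _ in range(n_clusters)]
    points = []
    for _ in range(n_points):
        cx, cy = rng.choice(centers)
        # Gaussian scatter around the chosen center, clamped to the map
        x = min(max(round(rng.gauss(cx, sigma)), 0), map_size)
        y = min(max(round(rng.gauss(cy, sigma)), 0), map_size)
        points.append((x, y, rng.randint(1, n_values)))
    return points

sample = make_dataset(1000)
print(sample[:3])
```

Each point has the same (x, y, value) layout as the example point (622, 478, 5).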

Experiment-1
- No. of datasets: 3
- No. of attribute values: 5
- Distance threshold: 100
- Frequency threshold: 0.01

Experiment-2
- No. of points in each dataset: 1000
- No. of attribute values: 5
- Distance threshold: 100
- Frequency threshold: 0.01

Experiment-3
- No. of datasets: 5
- No. of points in each dataset: 1000
- No. of attribute values: 5
- Distance threshold: 100

Experiment-4
- No. of datasets: 3
- No. of points in each dataset: 1000
- No. of attribute values: 5
- Frequency threshold: 0.01

Conclusions
- The neighboring class set mining and co-location pattern mining problems were introduced
- Spatial Congeries pattern mining was formulated, and two Apriori-like mining algorithms were provided for it
- Future work:
  - More pruning methods should be used to reduce the time and space requirements
  - The experiments should be run on real datasets

References
- [HSX02] Huang Y., Shekhar S., Xiong H. Discovering Co-location Patterns from Spatial Datasets: A General Approach. Submitted to IEEE TKDE (under second-round review).
- [HXSP03] Huang Y., Xiong H., Shekhar S., Pei J. Mining Confident Co-location Rules without a Support Threshold. Proc. of 18th ACM Symposium on Applied Computing (ACM SAC), 2003.
- [Mor01] Morimoto Y. Mining Frequent Neighboring Class Sets in Spatial Databases. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2001.

Q&A