Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees) Qin Ding, Qiang Ding, and William Perrizo Computer Science Department North.

Presentation transcript:

Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees) Qin Ding, Qiang Ding, and William Perrizo Computer Science Department North Dakota State University, USA May 2002 (P-tree technology is patent pending by NDSU)

Outline
Concepts
– Association Rule Mining
– Market Basket Data
– Remotely Sensed Imagery (RSI) data
– Peano Count Trees (P-trees)
Association rule mining on RSI data using P-trees
Performance analysis
Conclusion

Association Rule Mining
Originally proposed for market basket data. Given
– A set of items I = {i_1, i_2, …, i_m} (e.g., items purchasable in a market)
– A set of transactions D (e.g., customers checking out; each transaction = id + itemset)
An association rule is X => Y, where X and Y are disjoint itemsets.
– X and Y are considered as events: X is the event that a transaction contains X, and X => Y is the event "if t contains X, then it contains Y".
– X is called the antecedent, Y is called the consequent.
Two measures: support (% of transactions containing X ∪ Y) and confidence (% of those transactions containing X which also contain Y).
Given minimum thresholds, minsup and minconf,
– Find the frequent itemsets, i.e., those with support above minsup.
– Derive all rules supported by frequent sets, with confidence above minconf.
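
The two measures defined on this slide can be illustrated with a minimal Python sketch. The function names and the toy `baskets` data are assumptions made for illustration; they do not come from the presentation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X ∪ Y) / support(X); returns 0.0 if X never occurs."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_x


# Hypothetical market-basket data (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup = support(baskets, {"bread", "milk"})        # 2 of 4 baskets
conf = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule such as bread => milk would be reported only if `sup` clears minsup and `conf` clears minconf.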

Association rule mining on RSI data
RSI data can be viewed as a relational table
– Each band (column) is an attribute (for simplicity we assume all values are bytes).
– Each pixel (row) is a transaction.
– Each interval in each band is an item.
– Row/column or longitude/latitude is the primary key.
ARM task on RSI data
– To mine implicit relations among different bands, for example, relations among spectral bands and yield.
Example Rule (NDVI): NIR[192,255] ^ RED[0,63] => Yield[128,255]
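
To make the pixel-as-transaction view concrete, here is a small Python sketch that turns one pixel into a set of interval items. It assumes fixed-width intervals of 64 byte values per band, which is only one possible discretization (the example rule on the slide mixes interval widths); the function names and the sample pixel are hypothetical.

```python
def interval_item(band, value, width=64):
    """Name the interval item that a byte value falls into, e.g. NIR[192,255]."""
    lo = (value // width) * width
    return f"{band}[{lo},{lo + width - 1}]"


def pixel_to_transaction(pixel, width=64):
    """Map a pixel (band name -> byte value) to its set of interval items."""
    return {interval_item(band, value, width) for band, value in pixel.items()}


# Hypothetical pixel with three bands.
pixel = {"NIR": 200, "RED": 30, "Yield": 130}
items = pixel_to_transaction(pixel)
```

Each band contributes exactly one item per pixel, so a transaction always has as many items as there are bands.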

Important ARM Algorithms
– Apriori: level-wise (stepwise) algorithm that builds candidate k-itemsets from frequent (k-1)-itemsets
– DHP (Direct Hashing and Pruning): hashes itemset counts and prunes transactions
– Partition: divides the database into small partitions such that each can be processed independently and efficiently in memory
– DIC (Dynamic Itemset Counting): overlaps the counting of candidate itemsets at different points during a scan
– FP-growth: uses a Frequent Pattern tree (FP-tree) to avoid explicit candidate generation
– Others…

Remotely Sensed Imagery (RSI) Data
Satellite image
– TM (Thematic Mapper) imagery (6, 7 or 8 bands)
– TM is Landsat satellite imagery covering the earth every 18 days since …
– ETM+ (Landsat-7) contains 8 bands: 7 VIR bands (Blue, Green, Red, NIR, MIR, TIR, MIR2) and 1 Panchromatic band (PC)
Aerial photography
– TIFF (3 bands: Blue, Green, Red)
Ground data
– Yield, Moisture, Nitrate, Temperature, Elevation, etc.

Precision Agriculture Dataset: TIFF image and related bands (1320×1320). Panels: RGB, Moisture, Yield, Nitrate.

As a relation, the dataset has the columns x, y, R, G, B, Y, M, N, where x: Row, y: Column, R: Red, G: Green, B: Blue, Y: Yield, M: Moisture, N: Nitrate.

Spatial Data Formats (example with two bands, BAND 1 and BAND 2)
– BSQ format (2 files): Band 1, Band 2
– BIL format (1 file)
– BIP format (1 file)
– bSQ format (16 files): B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B…
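
The four layouts can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' file format code: bands are small nested lists, BSQ is band sequential, BIL is band interleaved by line, BIP is band interleaved by pixel, and bSQ splits every 8-bit band into one bit sequence per bit position, with bit 1 taken as the most significant bit (an assumption). The function names and sample values are hypothetical.

```python
def to_bsq(bands):
    """Band sequential: one raster-order sequence per band."""
    return [[v for row in band for v in row] for band in bands]


def to_bil(bands):
    """Band interleaved by line: for each row, that row from every band in turn."""
    n_rows = len(bands[0])
    return [v for r in range(n_rows) for band in bands for v in band[r]]


def to_bip(bands):
    """Band interleaved by pixel: for each pixel, its value in every band in turn."""
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in bands]


def to_bsq_bits(bands, bits=8):
    """bSQ: one bit sequence per (band, bit position), named B<band><bit>."""
    files = {}
    for b, band in enumerate(bands, start=1):
        flat = [v for row in band for v in row]
        for j in range(1, bits + 1):
            shift = bits - j  # j = 1 selects the most significant bit
            files[f"B{b}{j}"] = [(v >> shift) & 1 for v in flat]
    return files


# Hypothetical 2x2 image with two 8-bit bands.
band1 = [[254, 127], [14, 193]]
band2 = [[37, 240], [200, 19]]
bit_files = to_bsq_bits([band1, band2])
```

With two 8-bit bands, `to_bsq_bits` yields 16 bit sequences, matching the "16 files" on the slide.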

Peano Count Tree (P-tree) P-tree represents RSI data bit-by-bit in a recursive quadrant-by-quadrant arrangement. P-trees are a lossless compressed representation of the original data.

An example of a 2-D P-tree. Figure labels: quadrant-based; pure (Pure-1/Pure-0) quadrant; Peano or Z-ordering; root count; bSQ file; bSQ file arranged as a spatial dataset (2-D raster order).
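
A count tree of this kind can be built recursively. The sketch below is a minimal illustration, assuming a square bit raster whose side is a power of two and the quadrant order upper-left, upper-right, lower-left, lower-right. The tuple representation `(count, children)` and the sample raster are choices made here, not taken from the slides.

```python
def build_ptree(bits, r0=0, c0=0, size=None):
    """Return (count_of_1_bits, children); children is None for a pure quadrant."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size))
    # Pure-0 or pure-1 quadrants are not subdivided: this is where compression comes from.
    if count == 0 or count == size * size:
        return (count, None)
    half = size // 2
    offsets = ((0, 0), (0, half), (half, 0), (half, half))  # Z-order of quadrants
    children = [build_ptree(bits, r0 + dr, c0 + dc, half) for dr, dc in offsets]
    return (count, children)


# Hypothetical 4x4 bit raster (one bSQ file arranged in 2-D raster order).
raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
tree = build_ptree(raster)
```

The first element of `tree` is the root count; only the mixed upper-right quadrant is expanded down to single pixels.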

Peano Mask Tree (PM-tree)
Truth-Trees (1 if the condition is true of the quadrant, else 0)
– E.g., Pure-1 and Pure-0 Trees
– All are lossless compressed representations of the dataset

Figure labels: Peano or Z-ordering; Pure-1/Pure-0 quadrant; root count; level; fan-out; QID (Quadrant ID). Example pixel: ( 7, 1 ) = ( 111, 001 ) in binary.
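
One way to read the ( 7, 1 ) = ( 111, 001 ) example is that a pixel's quadrant path is obtained by pairing up the bits of its two coordinates, level by level. The sketch below assumes the first coordinate is the row, the second is the column, and quadrants are numbered 0 (upper-left), 1 (upper-right), 2 (lower-left), 3 (lower-right). Both the numbering and the function name are assumptions.

```python
def quadrant_path(row, col, levels):
    """Return the quadrant digit at each level, from the root down to the pixel."""
    digits = []
    for level in range(levels - 1, -1, -1):
        row_bit = (row >> level) & 1
        col_bit = (col >> level) & 1
        digits.append(2 * row_bit + col_bit)  # interleave one row bit with one column bit
    return digits


# Pixel (7, 1) in an 8x8 image: row 111, column 001 in binary.
path = quadrant_path(7, 1, levels=3)
```

Under these assumptions the path for pixel (7, 1) is 2, 2, 3: lower-left at the first two levels, then lower-right.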

P-tree Operations. Figure: a P-tree with root count 55 (quadrant counts 16, 8, 15, 16) and its PM-tree; two operand trees, P-tree-1 and P-tree-2, with their AND result and OR result; and the complement tree with root count 9 (quadrant counts 0, 8, 1, 0).

Ptree ANDing Operation. Figure: PM-tree1 AND PM-tree2 gives the result PM-tree; the AND is computed using depth-first Pure-1 path codes.
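
The ANDing of two PM-trees can be sketched recursively. This is a simplified illustration rather than the path-code method named on the slide: a node is represented here as `1` (pure-1), `0` (pure-0), or a list of four child nodes (mixed, shown as `m` in the figures). The representation, the function names, and the two sample trees are assumptions.

```python
def pm_and(a, b):
    """AND two PM-trees given as 1, 0, or a list of four child nodes."""
    if a == 0 or b == 0:
        return 0          # a pure-0 quadrant forces the result to pure-0
    if a == 1:
        return b          # a pure-1 quadrant leaves the other operand unchanged
    if b == 1:
        return a
    children = [pm_and(x, y) for x, y in zip(a, b)]
    if all(child == 0 for child in children):
        return 0          # collapse a quadrant that became pure-0
    return children


def root_count(node, size):
    """Number of 1 bits under a node that covers a size x size quadrant."""
    if node == 0:
        return 0
    if node == 1:
        return size * size
    return sum(root_count(child, size // 2) for child in node)


# Hypothetical PM-trees over a 4x4 image.
tree1 = [1, [0, 0, 0, 1], 0, 1]
tree2 = [1, [0, 1, 0, 1], 1, 0]
result = pm_and(tree1, tree2)
```

Whole quadrants are resolved without visiting their pixels whenever one operand is pure, which is what makes the AND cheap on compressible data.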

Various P-trees
– Basic P-trees P_{i,j}
– Value P-trees P_i(v): derived from basic P-trees using AND and COMPLEMENT
– Tuple P-trees P(v_1, v_2, …, v_n): derived from value P-trees using AND
– Interval P-trees P_i(v_1, v_2): derived from value P-trees using OR
– Cube P-trees P([v_11, v_12], …, [v_N1, v_N2]): derived from interval P-trees using AND
– Predicate P-trees P(p): derived using AND, OR, and COMPLEMENT
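
The derivation of value and interval P-trees from basic P-trees can be illustrated on uncompressed bit sequences, which stand in for the trees here. The sketch assumes that each basic P-tree corresponds to one bit plane of a band, most significant bit first. The helper names and the 2-bit sample band are hypothetical.

```python
def value_mask(bit_planes, value_bits):
    """AND the planes, complementing each plane where the wanted bit is '0'."""
    n = len(bit_planes[0])
    mask = [1] * n
    for plane, want in zip(bit_planes, value_bits):
        for k in range(n):
            bit = plane[k] if want == "1" else 1 - plane[k]
            mask[k] &= bit
    return mask


def interval_mask(bit_planes, values):
    """OR together the value masks of every value in the interval."""
    n = len(bit_planes[0])
    mask = [0] * n
    for value_bits in values:
        vm = value_mask(bit_planes, value_bits)
        mask = [m | v for m, v in zip(mask, vm)]
    return mask


# Hypothetical 2-bit band with pixel values 01, 11, 01, 00.
planes = [
    [0, 1, 0, 0],  # most significant bit of each pixel
    [1, 1, 1, 0],  # least significant bit of each pixel
]
mask_01 = value_mask(planes, "01")
mask_00_to_01 = interval_mask(planes, ["00", "01"])
```

Summing a mask gives the count that the root of the corresponding P-tree would hold.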

Association Rule Mining on RSI Data using P-trees
Admissible Itemsets (Asets)
– Asets are itemsets of the form Int_1 × Int_2 × … × Int_n = Π_{i=1..n} Int_i, where Int_i is an interval of values in Band i (some of which may be the full value range).
– Example: Aset {[01,01]_1, [11,11]_2}
P-ARM algorithm
Pruning techniques

P-ARM algorithm
Procedure P-ARM {
  Data_Discretization;
  F_1 = {frequent 1-Asets};
  For (k = 2; F_{k-1} ≠ ∅; k++) do begin
    C_k = p-gen(F_{k-1});
    Forall candidate Asets c ∈ C_k do
      c.count = AND_rootcount(c);
    F_k = {c ∈ C_k | c.count >= minsup}
  end
  Answer = ∪_k F_k
}
– F_1 is determined directly from P-tree root counts and pruning techniques rather than from a scan of the transaction database.
– The p-gen function differs from the apriori-gen function in Apriori by using some pruning techniques.
– The AND_rootcount function calculates Aset counts directly by ANDing the appropriate basic P-trees instead of scanning the transaction database.
– The support count for Aset {B1[0,64), B2[64,128)} (or {[00,00]_1, [01,01]_2}) is the root count of P_1(00) AND P_2(01).
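
The loop in the pseudocode above can be sketched in Python under simplifying assumptions. Each item's P-tree is replaced by an integer bitmask over pixels, so AND_rootcount becomes a bitwise AND followed by a count of 1 bits. Candidate generation is a plain Apriori-style join with band-based pruning, since the slides do not spell out the details of p-gen. The function name, the data layout, and the sample masks are hypothetical.

```python
from itertools import combinations


def p_arm(item_masks, n_pixels, minsup):
    """item_masks maps (band, interval) to an int bitmask of pixels; minsup is a fraction."""
    min_count = minsup * n_pixels

    def and_rootcount(aset):
        mask = (1 << n_pixels) - 1
        for item in aset:
            mask &= item_masks[item]   # stands in for ANDing P-trees
        return bin(mask).count("1")    # stands in for reading the root count

    frequent = {}
    level = {frozenset([i]) for i in item_masks if and_rootcount([i]) >= min_count}
    k = 1
    while level:
        for aset in level:
            frequent[aset] = and_rootcount(aset)
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) != k:
                continue
            if len({band for band, _ in c}) != k:
                continue               # band-based pruning: at most one item per band
            if all(frozenset(s) in level for s in combinations(c, k - 1)):
                candidates.add(c)
        level = {c for c in candidates if and_rootcount(c) >= min_count}
    return frequent


# Hypothetical masks over 4 pixels (bit p set means pixel p has the item).
masks = {
    ("B1", "00"): 0b0111,
    ("B1", "01"): 0b1000,
    ("B2", "01"): 0b0011,
    ("B2", "10"): 0b1100,
}
found = p_arm(masks, n_pixels=4, minsup=0.5)
```

No pass over the pixel table occurs inside the loop; every count comes from AND operations on the per-item masks, which mirrors the role of `AND_rootcount` in the pseudocode.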

Pruning Techniques
– Band-based pruning: an itemset with two items from the same band will have support zero.
– Constraint-based pruning: e.g., specify yield as the only consequent band of interest. Note: in the performance comparisons we did not use this pruning technique (to maintain fairness, since it is hard to implement in other algorithms).
– Bit-based pruning for multi-level rules: if Aset [128,255] (or [1,1]_2) is not frequent, then the Asets [128,191] (or [10,10]_2) and [192,255] (or [11,11]_2) cannot be frequent either.
– Others
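
Bit-based pruning can be sketched by identifying each interval with a bit prefix, so that refining an interval appends one bit. The helper names below are hypothetical, and 8-bit band values are assumed, as elsewhere in the slides.

```python
def prefix_interval(prefix, bits=8):
    """Value range covered by a bit prefix, e.g. '1' covers [128, 255]."""
    free_bits = bits - len(prefix)
    lo = int(prefix, 2) << free_bits
    hi = lo + (1 << free_bits) - 1
    return lo, hi


def refine(prefix, frequent_prefixes):
    """Return the two finer prefixes, or nothing if the coarse interval is infrequent."""
    if prefix not in frequent_prefixes:
        return []  # no sub-interval of an infrequent interval can be frequent
    return [prefix + "0", prefix + "1"]


# Suppose only the lower half of the band, prefix '0', was found frequent.
frequent_prefixes = {"0"}
skipped = refine("1", frequent_prefixes)
kept = refine("0", frequent_prefixes)
```

The pruning is sound because every pixel in [128,191] or [192,255] is also in [128,255], so the finer intervals can never have higher support than the coarse one.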

P-ARM versus Apriori: scalability with support threshold. Chart: 1,742,400 pixels (transactions).

P-ARM versus Apriori (cont.): scalability with number of transactions. Chart: support threshold = 10%.

P-ARM versus FP-growth: scalability with support threshold. Chart: run time (sec.) versus support threshold (axis ticks include 30%, 50%, 70%, 90%) for P-ARM and FP-growth. Labels: 17,424,000 pixels (transactions); 1,742,400 pixels (transactions).

P-ARM versus FP-growth (cont.): scalability with the number of transactions. Chart: support threshold = 10%.

Conclusion
A model for association rule mining on RSI data
– P-trees facilitate fast calculation of support
– P-trees facilitate significant pruning techniques
Applications other than precision agriculture
– Flood prediction and monitoring
– Community and regional planning
– Virtual archeology
– Mineral exploration
– Bioinformatics/Genomics
– VLSI design