Efficient OLAP Operations for Spatial Data Using P-Trees
Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding, William Perrizo
North Dakota State University

OUTLINE
- Introduction
- Review of Peano Trees (P-trees)
- OLAP Operations Using P-trees
  - Peano Data Cubes (PD-cubes)
  - OLAP Operations
- Performance Analysis
- Conclusion

INTRODUCTION
Efficient OLAP for spatial data warehouses is in great demand. Spatial warehouses are growing with more and more spatial data, such as remotely sensed images, geographical information, and digital sky survey data. The data in a warehouse are conceptually modeled as data cubes (Gray et al., 1997).

INTRODUCTION (Cont.)
OLAP queries are complex and time-consuming. There are two major approaches to speeding up OLAP:
- using index structures
- operating on compressed data.
Bitmap indexes are space-inefficient for high-cardinality attributes and are suitable only for narrow domains.

Our Approach to OLAP
- A new data warehousing structure, the PD-cube, is developed to facilitate OLAP operations and queries.
- Fast logical operations on P-trees are used to accomplish OLAP operations.
- Predicate P-trees are used to reduce data accesses efficiently by filtering out “big holes” consisting of consecutive 0s.

REVIEW OF PEANO TREES (P-trees)
- The P-tree is a quadrant-based tree structure (assuming a 2-dimensional image; more generally, for n-dimensional data, an n-polytant tree).
- It is used to facilitate compression and very fast logical operations on bit sequential (bSQ) data (Perrizo, 2001).
- P-trees can be 1-dimensional, 2-dimensional, 3-dimensional, and so on.
- The most useful form of a P-tree is the predicate P-tree, e.g., the Pure1 P-tree (P1-tree) and the NonPure0 P-tree (NP0-tree).
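To make “bit sequential (bSQ) data” concrete, the following minimal Python sketch splits one attribute band into bit planes, one per bit position; each plane is the input from which a P-tree is built. The function name `to_bsq` and the sample band are hypothetical. The only convention borrowed from the slides is that bit 1 is the most significant bit, as in the range-slice example, where P21 & P22 selects Y = 11**.

```python
from typing import List

Band = List[List[int]]  # one attribute of a 2-D image: a grid of unsigned integers


def to_bsq(band: Band, bits: int) -> List[Band]:
    """Split a band into `bits` bit planes (bSQ files).

    planes[0] is bit 1 (the most significant bit), planes[bits - 1] the least.
    """
    planes = []
    for j in range(bits):
        shift = bits - 1 - j
        planes.append([[(v >> shift) & 1 for v in row] for row in band])
    return planes


band = [[5, 7], [0, 2]]  # hypothetical 2x2 band of 3-bit values
for j, plane in enumerate(to_bsq(band, 3), start=1):
    print(f"bit {j}: {plane}")
```

Each plane keeps the spatial layout of the band, so runs of identical bits in neighboring pixels become the pure quadrants that P-trees compress.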

bSQ File and a Pure1 Tree (P1-tree)
P1-tree: a tree node is 1 iff its sub-quadrant is purely 1-bits.
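A minimal sketch of P1-tree construction for a single bit plane, assuming a square plane whose side is a power of two and children listed in Peano (Z) order: upper-left, upper-right, lower-left, lower-right. The `Node` class and `build_p1` are hypothetical names; a mixed quadrant gets bit 0 and is subdivided, while a pure quadrant becomes a leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    bit: int                                              # 1 iff the quadrant is purely 1-bits
    children: List["Node"] = field(default_factory=list)  # empty for pure quadrants


def build_p1(plane: List[List[int]], r: int = 0, c: int = 0,
             size: Optional[int] = None) -> Node:
    """Build a P1-tree over the size x size quadrant whose top-left cell is (r, c)."""
    if size is None:
        size = len(plane)
    cells = [plane[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    if all(cells):
        return Node(1)   # pure-1 quadrant: leaf with bit 1
    if not any(cells):
        return Node(0)   # pure-0 quadrant: leaf with bit 0
    half = size // 2     # mixed quadrant: bit 0, recurse into four sub-quadrants
    return Node(0, [
        build_p1(plane, r, c, half),
        build_p1(plane, r, c + half, half),
        build_p1(plane, r + half, c, half),
        build_p1(plane, r + half, c + half, half),
    ])


plane = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
tree = build_p1(plane)
print(tree.bit, [k.bit for k in tree.children])     # root and level-1 quadrants
print([k.bit for k in tree.children[3].children])   # lower-right quadrant
```

Only the lower-right quadrant is expanded; the other three are stored as single pure-1 leaves, which is where the compression comes from.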

A Count P-tree
(Note: counts are usually the ultimate goal, but Pure1 trees are easier to work with and produce the needed counts quite quickly.)
[Figure: count P-tree built in Peano (Z-) order, annotated with a pure (Pure-1/Pure-0) quadrant, the root count, level, fan-out, and quadrant ID (QID); example pixel (7, 1) = (111, 001) in binary.]
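The following is a minimal sketch of a count P-tree and of quadrant IDs, using hypothetical names `CountNode`, `build_count` and `qid`. Each node stores the number of 1-bits in its quadrant, and recursion stops at pure quadrants. For the QID we assume the pixel pair (7, 1) is (row, column) and that each level contributes one base-4 digit formed by a row bit (high) and a column bit (low), consistent with the child order used here. If the figure uses the opposite axis convention, the digits change.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CountNode:
    count: int                                                 # number of 1-bits in the quadrant
    children: List["CountNode"] = field(default_factory=list)  # empty for pure quadrants


def build_count(plane: List[List[int]], r: int = 0, c: int = 0,
                size: Optional[int] = None) -> CountNode:
    """Count P-tree over a 2^k x 2^k bit plane; children in Peano (Z) order."""
    if size is None:
        size = len(plane)
    count = sum(plane[i][j] for i in range(r, r + size) for j in range(c, c + size))
    if count == 0 or count == size * size:
        return CountNode(count)  # pure-0 or pure-1 quadrant: no need to descend
    half = size // 2
    return CountNode(count, [
        build_count(plane, r, c, half),
        build_count(plane, r, c + half, half),
        build_count(plane, r + half, c, half),
        build_count(plane, r + half, c + half, half),
    ])


def qid(row: int, col: int, levels: int) -> List[int]:
    """Quadrant ID: one base-4 digit per level, most significant first.

    Each digit interleaves one row bit (high) with one column bit (low):
    upper-left = 0, upper-right = 1, lower-left = 2, lower-right = 3.
    """
    return [((row >> k) & 1) << 1 | ((col >> k) & 1) for k in range(levels - 1, -1, -1)]


plane = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 0, 1]]
root = build_count(plane)
print(root.count, [k.count for k in root.children])  # root count and level-1 counts
print(qid(7, 1, 3))                                   # pixel (7, 1) in an 8x8 image
```

Under these assumptions, (7, 1) = (111, 001) interleaves to the digit pairs 10, 10, 11, i.e. QID 2.2.3.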

bSQ File and an NP0-tree
NP0-tree: a node is 1 iff its sub-quadrant is not pure zero. (More generally, in a predicate P-tree a node is 1 iff its sub-quadrant satisfies the predicate.)
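The “more general” predicate P-tree can be sketched by passing the node predicate as a parameter. The sketch below is hypothetical (`build_tree`, `non_pure0` and `pure1` are illustrative names) and assumes recursion stops at pure quadrants, as in the P1-tree. The NP0-tree and the P1-tree then differ only in the predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    bit: int
    children: List["Node"] = field(default_factory=list)  # empty for pure quadrants


def build_tree(plane: List[List[int]], pred: Callable[[Sequence[int]], bool],
               r: int = 0, c: int = 0, size: Optional[int] = None) -> Node:
    """Predicate tree: a node's bit is 1 iff its quadrant's cells satisfy `pred`."""
    if size is None:
        size = len(plane)
    cells = [plane[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    node = Node(1 if pred(cells) else 0)
    if all(cells) or not any(cells):
        return node  # pure quadrant: leaf
    half = size // 2
    node.children = [build_tree(plane, pred, rr, cc, half)
                     for rr, cc in ((r, c), (r, c + half), (r + half, c), (r + half, c + half))]
    return node


def non_pure0(cells: Sequence[int]) -> bool:
    return any(cells)   # NP0-tree: at least one 1-bit


def pure1(cells: Sequence[int]) -> bool:
    return all(cells)   # P1-tree: every bit is 1


plane = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 0, 1]]
np0 = build_tree(plane, non_pure0)
p1 = build_tree(plane, pure1)
print(np0.bit, [k.bit for k in np0.children])  # every level-1 quadrant has a 1
print(p1.bit, [k.bit for k in p1.children])    # only three quadrants are pure 1
```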

Logical Operations of P-trees
- Operations are performed level by level.
- Holes of consecutive 0s can be filtered out.
- For example, when ANDing NP0-tree1 and NP0-tree2, we only need to load the quadrant with QID 2.
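Here is a minimal sketch of level-by-level ANDing. It assumes a simplified encoding in which each node is 0 (pure-0), 1 (pure-1), or a list of four children in Peano order (mixed), standing in for the paired P1/NP0 trees on the slide. A pure-0 node on either side ends the recursion immediately, which is how 0-holes are skipped without visiting their contents. The names `build`, `p_and` and `expand` are hypothetical.

```python
from typing import List, Optional, Union

Tree = Union[int, List["Tree"]]  # 0 = pure-0, 1 = pure-1, list of 4 = mixed


def build(plane: List[List[int]], r: int = 0, c: int = 0,
          size: Optional[int] = None) -> Tree:
    if size is None:
        size = len(plane)
    cells = [plane[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    if not any(cells):
        return 0
    if all(cells):
        return 1
    h = size // 2
    return [build(plane, r, c, h), build(plane, r, c + h, h),
            build(plane, r + h, c, h), build(plane, r + h, c + h, h)]


def p_and(a: Tree, b: Tree) -> Tree:
    """AND two trees level by level."""
    if a == 0 or b == 0:
        return 0          # a 0-hole on either side: skip the whole subtree
    if a == 1:
        return b          # pure-1 is the identity for AND
    if b == 1:
        return a
    kids = [p_and(x, y) for x, y in zip(a, b)]
    return 0 if all(k == 0 for k in kids) else kids  # collapse an all-zero result


def expand(tree: Tree, size: int) -> List[List[int]]:
    """Turn a tree back into a size x size bit matrix (used here only for checking)."""
    if not isinstance(tree, list):
        return [[tree] * size for _ in range(size)]
    h = size // 2
    q = [expand(t, h) for t in tree]
    return [q[0][i] + q[1][i] for i in range(h)] + [q[2][i] + q[3][i] for i in range(h)]


a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[1, 0, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
result = p_and(build(a), build(b))
print(result)  # quadrants 1-3 are 0-holes in `a`, so only quadrant 0 is compared
```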

OLAP OPERATIONS USING P-TREES
1. Peano Data Cube (PD-cube)
2. OLAP Operations
   1) Slice/Dice
   2) Rollup
3. Performance Analysis

Peano Data Cube (PD-cube)
- The data cube is partitioned by bit position.
- Each bit-wise data cube is stored in Peano order.
- This takes advantage of the continuity and sparseness of spatial data.
An example: a 3-D data cube representing crop yield with three dimensions: X-coordinate, Y-coordinate, and time T.
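A minimal sketch of how a 3-D yield cube might be split into per-bit cubes stored in Peano order. We assume the 3-D Peano (Z) order takes one bit each of X, Y and T per level to form an octant digit, with X as the highest bit of the digit; the slides do not spell out the octant numbering. The names `peano_index` and `pd_cubes` and the sample yields are hypothetical.

```python
from typing import List

Cube = List[List[List[int]]]  # cube[x][y][t] = yield value


def peano_index(x: int, y: int, t: int, levels: int) -> int:
    """Position of cell (x, y, t) in 3-D Peano (Z) order.

    At each level, one bit of x, y and t forms a base-8 digit (octant number).
    """
    idx = 0
    for k in range(levels - 1, -1, -1):
        digit = ((x >> k) & 1) << 2 | ((y >> k) & 1) << 1 | ((t >> k) & 1)
        idx = (idx << 3) | digit
    return idx


def pd_cubes(cube: Cube, bits: int, levels: int) -> List[List[int]]:
    """Split a (2^levels)^3 cube into `bits` bit cubes, each a flat list in Peano order.

    result[0] holds the most significant bit of every cell.
    """
    side = 1 << levels
    planes = [[0] * side ** 3 for _ in range(bits)]
    for x in range(side):
        for y in range(side):
            for t in range(side):
                pos = peano_index(x, y, t, levels)
                for j in range(bits):
                    planes[j][pos] = (cube[x][y][t] >> (bits - 1 - j)) & 1
    return planes


cube = [[[3, 1], [0, 2]], [[2, 3], [1, 0]]]  # hypothetical 2-bit yields, X, Y, T in {0, 1}
for j, plane in enumerate(pd_cubes(cube, bits=2, levels=1), start=1):
    print(f"bit {j}:", plane)
```

For a 2x2x2 cube the Peano order coincides with raster order. The two diverge from `levels=2` on: `peano_index(1, 0, 0, 2)` is 4, whereas x-major raster order would place cell (1, 0, 0) at 16, so spatially close cells stay close in the bit cubes.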

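A Fact Table and the PD-cubes
[Figure: a fact table with columns X, Y, T, and Yield, and the corresponding PD-cubes. Binary values legible in the figure: 1111, 0100, 0001, 1100, 0010, 1100, 1111, 0010, 0000, 1111, 0010.]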

OLAP Query Examples
- “Find all galaxies brighter than magnitude 22.”
- “Find the average crop yield in a field.”
- “Find the area of the region with the color red.”
- “Find the total traffic flow during a given period.”

Slice/Dice Operations
Typical select statements may have a number of predicates in their “where” clause. The predicates may include “=”, “<”, “>”, “≤” and “≥”. These predicates lead to two different query scenarios: equality queries (“=”) and range queries (“<”, “>”, “≤”, “≥”).

Equal Select Slice Example
Suppose we have a 3-D data cube representing crop yield with dimensions X, Y, and T, where X = {0, 1}, Y = {0, 1}, and T = {0, 1}.
[Figure: fact table with columns X, Y, T, and Yield.]

P-trees for the 3-D Cube Example
$P_{ij}$ is the P-tree for the $j$-th bit of the $i$-th attribute.

Slice: “Get yield where Y = 1”
First obtain the P-tree masks for the condition, then trim all P-trees accordingly.
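A minimal sketch of an equality slice, assuming flat bitmaps (Python ints, one bit per cell) stand in for the compressed P-trees: the mask ANDs $P_{ij}$ where the query value has a 1 and the complement $P'_{ij}$ where it has a 0, then the mask trims the yields. Cells are listed in (X, Y, T) order; the yield values and the names `bit_planes`, `equal_mask` and `trim` are hypothetical.

```python
from typing import List


def bit_planes(values: List[int], bits: int) -> List[int]:
    """planes[j] is an int bitmap whose k-th bit is bit j+1 (MSB first) of values[k]."""
    planes = []
    for j in range(bits):
        shift = bits - 1 - j
        planes.append(sum(1 << k for k, v in enumerate(values) if (v >> shift) & 1))
    return planes


def equal_mask(planes: List[int], value: int, bits: int, n: int) -> int:
    """Mask of the n cells whose attribute equals `value`."""
    full = (1 << n) - 1
    mask = full
    for j in range(bits):
        if (value >> (bits - 1 - j)) & 1:
            mask &= planes[j]            # bit must be 1: use P_ij
        else:
            mask &= full & ~planes[j]    # bit must be 0: use the complement P'_ij
    return mask


def trim(values: List[int], mask: int) -> List[int]:
    """Keep only the cells selected by the mask."""
    return [v for k, v in enumerate(values) if (mask >> k) & 1]


# Cells in (X, Y, T) order: (0,0,0), (0,0,1), (0,1,0), ..., (1,1,1)
ys = [0, 0, 1, 1, 0, 0, 1, 1]
yields = [9, 3, 4, 7, 1, 12, 15, 2]  # hypothetical yield values
mask = equal_mask(bit_planes(ys, 1), value=1, bits=1, n=len(ys))
print(trim(yields, mask))
```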

Range Slice: “Get yield where Y > 1001”
The data set {“Y > 1001”} consists of two subsets, {“Y = 11**”} and {“Y = 101*”}, where * is 1 or 0. The query clause can therefore be written as “where Y = 11** || Y = 101*”, and the result is retrieved with the P-tree mask

$$PM_{gt} = PM_1 \lor PM_2, \qquad PM_1 = P_{21} \land P_{22}, \qquad PM_2 = P_{21} \land P'_{22} \land P_{23},$$

where $P'_{22}$ is the complement of $P_{22}$; $PM_1$ selects Y = 11** and $PM_2$ selects Y = 101*.
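The decomposition generalizes to any constant: for every bit position where the constant has a 0, add the subset whose higher-order bits equal the constant's and whose bit at that position is 1. For 1001 this yields exactly 11** and then 101*. The sketch below assumes flat int bitmaps in place of P-trees, and `bit_planes` and `greater_mask` are hypothetical names.

```python
from typing import List


def bit_planes(values: List[int], bits: int) -> List[int]:
    """planes[j] is an int bitmap whose k-th bit is bit j+1 (MSB first) of values[k]."""
    return [sum(1 << k for k, v in enumerate(values) if (v >> (bits - 1 - j)) & 1)
            for j in range(bits)]


def greater_mask(planes: List[int], c: int, bits: int, n: int) -> int:
    """Mask of cells whose value is > c, built as an OR of prefix subsets."""
    full = (1 << n) - 1
    result = 0
    prefix = full                          # cells whose bits so far equal c's bits
    for j in range(bits):
        if (c >> (bits - 1 - j)) & 1:
            prefix &= planes[j]            # c has 1 here: the value must match it
        else:
            result |= prefix & planes[j]   # value has 1 where c has 0: value > c
            prefix &= full & ~planes[j]    # otherwise keep matching c's 0
    return result


values = list(range(16))                   # one cell for every 4-bit Y value
mask = greater_mask(bit_planes(values, 4), c=0b1001, bits=4, n=len(values))
print([v for k, v in enumerate(values) if (mask >> k) & 1])
```

At most one AND/OR step is needed per bit position, so the cost grows with the bit width of the attribute, not with the number of values in the range.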

Other Properties of Range Queries
- Combination of an equality query and a range query: divide {“T ≥ …”} into two subsets, {“T > …”} and {“T = …”}.
- Complement of a range query: the data set {“T ≤ …”} is the complement of {“T > …”}. With the result of the query “Get yield where T > …”, we can easily retrieve the query “Get yield where T ≤ …” by taking the complement, i.e., $PM_{le} = PM'_{gt}$.
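A minimal sketch of both properties, again assuming int bitmaps stand in for P-tree masks. To keep it short, the “>” and “=” masks are computed directly from a hypothetical list of T values; in the PD-cube they would come from P-tree operations. The names `mask_where`, `ge_mask` and `le_mask` and the constant `v` are hypothetical.

```python
from typing import Callable, List


def mask_where(values: List[int], pred: Callable[[int], bool]) -> int:
    """Int bitmap of the cells whose value satisfies pred (a stand-in for a P-tree mask)."""
    return sum(1 << k for k, v in enumerate(values) if pred(v))


def ge_mask(pm_gt: int, pm_eq: int) -> int:
    return pm_gt | pm_eq              # {T >= v} = {T > v} OR {T = v}


def le_mask(pm_gt: int, n: int) -> int:
    full = (1 << n) - 1
    return full & ~pm_gt              # PM_le = PM'_gt, restricted to the n valid cells


ts = [0, 3, 1, 2, 3, 0, 2, 1]         # hypothetical T values, one per cell
v = 2                                 # hypothetical query constant
gt = mask_where(ts, lambda t: t > v)
eq = mask_where(ts, lambda t: t == v)
print(bin(ge_mask(gt, eq)), bin(le_mask(gt, len(ts))))
```

The `full &` in `le_mask` matters. Python's `~` on an int produces a negative number, and the complement must not select cells outside the cube.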

Rollup Operations
The PD-cube is stored in Peano order rather than raster order, so its rollup is accomplished differently from that of a traditional data cube. Based on the Peano storage of the PD-cube, we develop a recursive rollup algorithm.

Rollup of “Yield” along Dimension T
$S_2[\,] = \{1, 1, 1, 0\}$
$S_1[\,] = \{0, 1, 1, 1\}$
$S_0[\,] = \{2, 1, 1, 1\}$
$S[\,] = \{8, 7, 7, 7\}$

$$S[i] = S_2[i] \times 2^2 + S_1[i] \times 2^1 + S_0[i] \times 2^0$$
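A minimal sketch of the bit-count arithmetic behind a rollup along T. It assumes $S_j[i]$ is the number of 1-bits at bit position $j$ (weight $2^j$) summed over T for the $i$-th (X, Y) cell, so the rolled-up total is the weighted sum of the per-bit counts. It works on plain nested lists rather than the recursive Peano-order algorithm, and `rollup_along_t` and the sample yields are hypothetical.

```python
from typing import List

Cube = List[List[List[int]]]  # cube[x][y][t] = yield value


def rollup_along_t(cube: Cube, bits: int) -> List[int]:
    """Sum yields over T for every (X, Y) cell using per-bit counts.

    counts[j][i] is the number of 1-bits at bit position j across T for the
    i-th (X, Y) cell; the rollup is sum over j of counts[j][i] * 2**j.
    """
    cells = [(x, y) for x in range(len(cube)) for y in range(len(cube[0]))]
    counts = [[sum((v >> j) & 1 for v in cube[x][y]) for x, y in cells]
              for j in range(bits)]
    return [sum(counts[j][i] << j for j in range(bits)) for i in range(len(cells))]


cube = [[[5, 3], [7, 0]], [[1, 1], [6, 6]]]  # hypothetical 3-bit yields, T in {0, 1}
print(rollup_along_t(cube, bits=3))
```

In a P-tree setting each per-bit count would come from a root count after masking, so the rollup needs no access to the raw yield values.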

PERFORMANCE ANALYSIS
- We compare our algorithm with the bitmap-indexed data cube method.
- The data are prepared in five sizes: 128x128, 256x256, 512x512, 1024x1024, and 2048x2048.
- When the cube size exceeds 1900 KB, our method outperforms the bitmap-indexed data cube method.
- As the cube size increases, the response time of the bitmap-indexed data cube method increases drastically.

CONCLUSION
- A general spatial data warehousing structure, the PD-cube, is presented to facilitate OLAP operations.
- The fast logical operations of P-trees are used to accomplish these operations.
- Predicate P-trees are used to find the “big holes” of consecutive 0s by performing logical operations.
- Experiments indicate that OLAP operations using P-trees are much faster than the bitmap-indexed data cube method for large cubes.

FUTURE WORK
One future research direction is to extend the PD-cube to parallel data warehouse systems. It appears particularly promising to partition large cubes horizontally, vertically, or both into small cubes to improve query performance through parallelism.

Thank you.