Presentation transcript:

Cube Tree
Dimension: number of group-by attributes
Relation tuples map to points in the space
Aggregates: projections of all data points onto all the subspaces
The intersection between a subspace and the orthogonal hyper-plane stores the aggregates
The origin represents the aggregate with no grouping
A group-by aggregate is queried on the corresponding hyper-planes
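
The mapping on this slide can be illustrated with a minimal sketch. It is not the Cubetree implementation: the function name `cube_points`, the use of SUM as the aggregate, and the convention that coordinate 0 marks a projected-out dimension (so real dimension values must be positive integers) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cube_points(tuples, n_dims):
    """Map each tuple to a point and project it onto every subspace.

    Each input tuple is (d1, ..., dn, measure). A coordinate of 0 means
    the dimension was projected out (hypothetical convention; assumes
    real dimension values are positive integers). SUM is the aggregate.
    """
    cells = defaultdict(int)
    for *coords, measure in tuples:
        # Enumerate every subset of dimensions that is kept (2^n subspaces).
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                point = tuple(coords[d] if d in kept else 0
                              for d in range(n_dims))
                cells[point] += measure
    return dict(cells)

# Three tuples over two dimensions (A, B) with one measure.
sales = [(1, 2, 10), (1, 3, 5), (2, 2, 7)]
cube = cube_points(sales, 2)
```

In this sketch `cube[(0, 0)]` is the aggregate with no grouping (the origin), `cube[(1, 0)]` is the group-by-A aggregate for A = 1, and `cube[(0, 2)]` is the group-by-B aggregate for B = 2.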

Packed R-Tree
Sort-pack (for multi-dimensional data)
  – Achieves excellent clustering
  – Significantly reduces overlap and dead space
A preferred structure for Datacube storage
This representation of the Datacube provides good clustering for only half of the total group-bys
Degradation is due to strong interleaving between points of these group-bys
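
A rough sketch of the sort-pack idea: sort the points, then fill leaf nodes to capacity and record each leaf's minimum bounding rectangle. The plain lexicographic sort order and the name `sort_pack` are assumptions for illustration; the slides do not specify the actual sort order, and only the leaf level is built here.

```python
def sort_pack(points, capacity):
    """Sort points, then fill leaves to capacity.

    Returns a list of (mbr, leaf_points) pairs, where mbr is a
    (low_corner, high_corner) pair of coordinate tuples.
    """
    ordered = sorted(points)  # lexicographic order (assumed)
    leaves = []
    for i in range(0, len(ordered), capacity):
        leaf = ordered[i:i + capacity]
        # Bounding rectangle: per-dimension min and max over the leaf.
        lows = tuple(min(c) for c in zip(*leaf))
        highs = tuple(max(c) for c in zip(*leaf))
        leaves.append(((lows, highs), leaf))
    return leaves

points = [(3, 1), (1, 2), (2, 2), (1, 1), (3, 3), (2, 1)]
leaves = sort_pack(points, 2)
```

Because neighbouring points in the sort order land in the same leaf, the leaves are full and their rectangles do not overlap in this small example.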

Dataless & Reduced Cubetree
Dataless Cubetree: contains only aggregate values, no data values
Better clustering than a full tree in an R-Tree
  – Projection points are not interleaved
Reduced Cubetree: each hyper-plane that contains aggregates forms an R-Tree independently
The dimension of the R-Tree is reduced by one
Better clustering and query performance

Allocation of group-bys to R-Trees
A set of group-bys is compatible if there exists a sort order that guarantees no dispersion
Allocate each group-by to one of the N R-Trees
  – The set of group-bys for this R-Tree is compatible
  – If a group-by cannot find a compatible set, assign it to a set that contains all of its group-by attributes (false allocation)
The selection of the sort order for the Packed R-Tree is also an important parameter for favoring some preferred group-bys

Bulk Incremental Update

Iceberg Cube
Selectively compute only those partitions that satisfy an aggregate condition
Aggregates with low support reveal little meaning and make the cube sparse
Conditions such as
  – Minimum support of a partition
  – Required range
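
As a small illustration of the minimum-support condition, the sketch below computes a single group-by and keeps only the cells whose count reaches a threshold. The function name `iceberg_group_by`, the dict-based row format, and the use of COUNT as the aggregate are assumptions made for this example.

```python
from collections import Counter

def iceberg_group_by(rows, dims, minsup):
    """Count rows per group and keep only groups with count >= minsup."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    return {key: n for key, n in counts.items() if n >= minsup}

rows = [
    {"city": "A", "item": "x"},
    {"city": "A", "item": "y"},
    {"city": "B", "item": "x"},
    {"city": "A", "item": "x"},
]
by_city = iceberg_group_by(rows, ["city"], 2)
by_city_item = iceberg_group_by(rows, ["city", "item"], 2)
```

With a minimum support of 2, only city A survives the first group-by and only the (A, x) cell survives the second; the low-support cells are never stored.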

Bottom-Up Cube
The parent is used to compute the child

Bottom-Up Cube (2)
Start from a bottom single-dimension group-by
If the current input can be pruned, return
Partition the data in this group-by
If a partition satisfies minsup
  – Recursively call BUC with the partition as input
Loop until all dimensions are done
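
The steps above can be sketched as a short recursive function. This is a simplified reading of the slide, not the published algorithm verbatim: it assumes COUNT as the aggregate, treats "satisfies minsup" as count >= minsup, partitions with an in-memory dictionary, and uses a hypothetical output format in which each cell is keyed by a tuple of (dimension index, value) pairs.

```python
from collections import defaultdict

def buc(rows, n_dims, minsup, start=0, prefix=(), out=None):
    """Depth-first iceberg cube over tuples, with COUNT as the aggregate.

    prefix holds the (dimension index, value) pairs fixed so far.
    """
    if out is None:
        out = {}
    if len(rows) < minsup:
        return out  # prune: no refinement of this input can reach minsup
    out[prefix] = len(rows)
    for d in range(start, n_dims):
        # Partition the current input on dimension d.
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            if len(part) >= minsup:
                # Recurse with the partition as the new input.
                buc(part, n_dims, minsup, d + 1,
                    prefix + ((d, value),), out)
    return out

rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
cells = buc(rows, 2, minsup=2)
```

In this example the partition for "b" on dimension 0 has a single row, so it is pruned and none of its refinements is ever examined.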

Bottom-Up Cube (3)
Similar idea to Apriori-gen
Apriori generates all the candidates at the same level first (breadth-first)
BUC proceeds depth-first
  – To reduce the memory requirement
Dimension ordering: provides better pruning
  – Cardinality, skew and correlation
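
One way to act on the cardinality criterion is to process dimensions in order of decreasing number of distinct values, so that partitions shrink quickly and fall below the support threshold early. The sketch below shows only that heuristic; the function name `order_by_cardinality` is hypothetical, and skew and correlation are not modelled.

```python
def order_by_cardinality(rows, n_dims):
    """Return dimension indices sorted by decreasing distinct-value count."""
    cardinality = [len({row[d] for row in rows}) for d in range(n_dims)]
    return sorted(range(n_dims), key=lambda d: cardinality[d], reverse=True)

rows = [("a", "x", 1), ("a", "y", 2), ("a", "x", 3)]
order = order_by_cardinality(rows, 3)
```

Here the third dimension has three distinct values, the second has two and the first has one, so the dimensions would be processed in the order 2, 1, 0.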