1 Fast Computation of Sparse Datacubes Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan Ken :: Yiu Man Lung.

2 Content  Introduction  Existing Methods  Proposed Method: Partitioned-Cube  Memory-Cube  Experiment  Conclusion

3 Introduction  Datacube queries compute aggregates over database relations at a variety of granularities.  Cube by: Product, Country, Date  Aggregation Function: Sum(Sales)
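To make the CUBE BY idea concrete, here is a minimal Python sketch (not from the slides) that computes Sum(Sales) for every subset of the CUBE BY attributes. The sample rows, the `ALL` marker string and the function name `naive_cube` are assumptions made for illustration. It is the brute-force approach, not Partitioned-Cube.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # assumed marker for an attribute that is aggregated away


def naive_cube(rows, dims, measure):
    """Brute force: one pass over the rows for each of the 2^k group-bys."""
    cube = defaultdict(int)
    for r in range(len(dims) + 1):
        for keep in combinations(dims, r):
            for row in rows:
                # Attributes outside `keep` are replaced by the ALL marker.
                key = tuple(row[d] if d in keep else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sample data, not taken from the slides.
rows = [
    {"Product": "pen", "Country": "US", "Date": "2001-01-01", "Sales": 5},
    {"Product": "pen", "Country": "HK", "Date": "2001-01-01", "Sales": 7},
    {"Product": "ink", "Country": "US", "Date": "2001-01-02", "Sales": 3},
]
cube = naive_cube(rows, ["Product", "Country", "Date"], "Sales")
print(cube[(ALL, ALL, ALL)])    # grand total over all rows
print(cube[("pen", ALL, ALL)])  # total for one product
```

With k CUBE BY attributes this makes 2^k passes over the data, which is the cost the methods on the following slides try to avoid.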

4 Sparseness  A relation is sparse when its cardinality is a small fraction of the size of the cross product of the attribute domains.  Sparse relations are of particular interest, so effective datacube computation over them is important.
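The sparseness definition can be read as a simple ratio. The sketch below is an illustration only: the function name `density` and the sample rows are hypothetical, and each attribute domain is approximated by the values that actually occur in the rows.

```python
from math import prod


def density(rows, attrs):
    """Relation cardinality divided by the size of the cross product of the domains.

    Assumption: each attribute's domain is approximated by the values seen in rows.
    """
    if not rows:
        return 0.0
    domain_sizes = [len({row[a] for row in rows}) for a in attrs]
    return len(rows) / prod(domain_sizes)


# Hypothetical sample data: 3 tuples, 3 values per attribute.
rows = [
    {"Product": "pen", "Country": "US", "Date": 1},
    {"Product": "ink", "Country": "HK", "Date": 2},
    {"Product": "pad", "Country": "JP", "Date": 3},
]
ratio = density(rows, ["Product", "Country", "Date"])  # 3 tuples over 27 possible cells
print(ratio)
```

A ratio far below 1 means most cells of the cross product are empty, which is the sparse case the slides target.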

5 Problem  CUBE BY attributes with large domains  Large number of CUBE BY attributes  Existing methods are not efficient  We need something new: Partitioned-Cube

6 Existing Methods  PIPESORT  Optimizes overall cost by evaluating each path  Poor performance when the relation is sparse  Lower bound of no. of sorting is  Large I/O cost for huge cuboids

7  OVERLAP  Minimizes disk access by overlapping cuboids  But I/O cost is at least quadratic in k, the no. of CUBE BY attributes, even given memory-sized partitions  Classifies the cuboids into “Partition” and “SortRun” states  I/O depends on the partition size and the number of sorted runs

8  Array-Based Algorithms  Partition the data and store fragments in memory. Data compression may be applied  Allow direct access to the memory cells  For sparse data, array fragments may not fit into memory. Then, a more costly data structure would be required

9 Partitioned-Cube  Partitions the large relation into fragments that fit into memory  It follows the recursive structure of datacubes  A sub-datacube is obtained by fixing each possible value of a CUBE BY attribute

10 Partitioned-Cube (cont.)
Algorithm Partitioned-Cube(R, {B1, …, Bm}, A, G)
R: a set of tuples
{B1, …, Bm}: CUBE BY attributes
A: attribute to be aggregated
G: aggregate function
F: finest granularity datacube tuples
D: remaining tuples
Step 1: if (R fits in memory) then return Memory-Cube(R, {B1, …, Bm}, A, G)
Step 2: scan R, partition on Bj in {B1, …, Bm} into R1, …, Rn
Step 3: for (i = 1 to n) (Fi, Di) = Partitioned-Cube(Ri, {B1, …, Bm}, A, G)
Step 4: let F = union of Fi's
Step 5: let (F', D') = Partitioned-Cube(F, {B1, …, Bj-1, Bj+1, …, Bm}, A, G)
Step 6: let D = union of F', D' and Di's
Step 7: return (F, D)
R:
Country  Year  Sales
US       2000  10
US       2001  5
US       2000  8
US       2002  6
HK       2000  6
HK       2001  8
HK       2001  7
HK       2002  7
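The steps above can be sketched in Python. This is a rough illustration under several assumptions that are not in the slides: "fits in memory" is modelled by a row-count threshold `mem_limit`, the partitioning attribute Bj is simply the first attribute, `memory_cube` is a brute-force stand-in for Memory-Cube, and the aggregate is SUM (Step 5 re-aggregates already aggregated values, which only works for distributive functions). Each partition is cubed on the remaining attributes and its tuples are tagged with the fixed Bj value, following the sub-datacube description on the previous slide. Keys are frozensets of (attribute, value) pairs; an attribute missing from a key stands for ALL.

```python
from collections import defaultdict
from itertools import combinations


def memory_cube(rows, attrs, measure):
    """Brute-force stand-in for Memory-Cube; returns (finest tuples, remaining tuples)."""
    finest, rest = defaultdict(int), defaultdict(int)
    for r in range(len(attrs) + 1):
        out = finest if r == len(attrs) else rest
        for keep in combinations(attrs, r):
            for row in rows:
                out[frozenset((a, row[a]) for a in keep)] += row[measure]
    return dict(finest), dict(rest)


def partitioned_cube(rows, attrs, measure, mem_limit):
    if len(rows) <= mem_limit or not attrs:                 # Step 1
        return memory_cube(rows, attrs, measure)
    bj, others = attrs[0], attrs[1:]                        # assumed choice of Bj
    parts = defaultdict(list)                               # Step 2
    for row in rows:
        parts[row[bj]].append(row)
    F, D = {}, {}
    for value, part in parts.items():                       # Step 3
        Fi, Di = partitioned_cube(part, others, measure, mem_limit)
        tag = {(bj, value)}                                 # Bj is fixed in this partition
        F.update({k | tag: v for k, v in Fi.items()})       # Step 4: F = union of Fi's
        D.update({k | tag: v for k, v in Di.items()})
    f_rows = [{**dict(k), measure: v} for k, v in F.items()]
    F2, D2 = partitioned_cube(f_rows, others, measure, mem_limit)  # Step 5
    D.update(F2)                                            # Step 6
    D.update(D2)
    return F, D                                             # Step 7


# Relation R from the slide.
data = [("US", 2000, 10), ("US", 2001, 5), ("US", 2000, 8), ("US", 2002, 6),
        ("HK", 2000, 6), ("HK", 2001, 8), ("HK", 2001, 7), ("HK", 2002, 7)]
R = [{"Country": c, "Year": y, "Sales": s} for c, y, s in data]
F, D = partitioned_cube(R, ["Country", "Year"], "Sales", mem_limit=4)
print(D[frozenset({("Country", "US")})], D[frozenset()])
```

With `mem_limit=4` the eight tuples are split into the US and HK partitions, mirroring the walkthrough on the following slides.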

11 Partitioned-Cube (cont.) STEP 1: Partition the large relation into fragments that fit into memory
R:
Country  Year  Sales
US       2000  10
US       2001  5
US       2000  8
US       2002  6
HK       2000  6
HK       2001  8
HK       2001  7
HK       2002  7
R1:
Country  Year  Sales
US       2000  10
US       2001  5
US       2000  8
US       2002  6
R2:
Country  Year  Sales
HK       2000  6
HK       2001  8
HK       2001  7
HK       2002  7

12 Partitioned-Cube (cont.) STEP 2: Compute the tuples in the corresponding sub-datacube
R1:
Country  Year  Sales
US       2000  10
US       2001  5
US       2000  8
US       2002  6
F1:
Country  Year  Sales
US       2000  18
US       2001  5
US       2002  6
D1:
Country  Year  Sales
US       ALL   29

13 Partitioned-Cube (cont.) STEP 3: In the same way, compute F2 and D2
R2:
Country  Year  Sales
HK       2000  6
HK       2001  8
HK       2001  7
HK       2002  7
F2:
Country  Year  Sales
HK       2000  6
HK       2001  15
HK       2002  7
D2:
Country  Year  Sales
HK       ALL   28

14 Partitioned-Cube (cont.) Step 4: F = union of F1 and F2  Step 5: recursively call the function on F to get F' and D'
F:
Country  Year  Sales
US       2000  18
US       2001  5
US       2002  6
HK       2000  6
HK       2001  15
HK       2002  7
F':
Country  Year  Sales
ALL      2000  24
ALL      2001  20
ALL      2002  13
D':
Country  Year  Sales
ALL      ALL   57

15 Partitioned-Cube (cont.) Step 6: D = union of F', D', D1 and D2  Step 7: return (F, D)
F:
Country  Year  Sales
US       2000  18
US       2001  5
US       2002  6
HK       2000  6
HK       2001  15
HK       2002  7
D (union of F', D', D1 and D2):
Country  Year  Sales
ALL      2000  24
ALL      2001  20
ALL      2002  13
ALL      ALL   57
US       ALL   29
HK       ALL   28

16 Partitioned-Cube (cont.) Recursively execute STEP 2 if there are more than 2 attributes
R1:
Country  Year  Sales
US       2000  10
US       2001  5
US       2000  8
US       2002  6
F1:
Country  Year  Sales
US       2000  18
US       2001  5
US       2002  6
D1:
Country  Year  Sales
US       ALL   29

17 Memory-Cube  Performs complex operations over each fragment independently  Minimizes the total no. of paths in the search lattice  Shares the sort work  Computes the tuples in the corresponding sub-datacube  Computes the datacube tuples with the value ALL for the attributes

18 Memory-Cube  Minimize the total no. of paths in the search lattice
G(1) = D → ∅
G(2) = CD → C → ∅; D
G(3) = BCD → BC → B → ∅; BD → D; CD → C
G(4) = ABCD → ABC → AB → A → ∅; ABD → AD → D; ACD → AC → C; BCD → BC → B; BD; CD
No. of paths in G(4): 6 = 4C2
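The path sets G(1) to G(4) listed above follow a regular pattern, which the sketch below reproduces. The construction is inferred from the listed paths and is an assumption, not a quotation of the authors' procedure: take two copies of the paths for the remaining attributes, add the new attribute to every node of the first copy, then move the last (coarsest) node of each path in the second copy to the end of the matching path in the first copy. The names `build_paths` and `show` are hypothetical.

```python
def build_paths(attrs):
    """Return a list of paths; each path is a list of frozensets (group-by nodes)."""
    if not attrs:
        return [[frozenset()]]          # G(0): a single path holding the empty group-by
    first, rest = attrs[0], attrs[1:]
    sub = build_paths(rest)
    left = [[node | {first} for node in path] for path in sub]
    right = [list(path) for path in sub]
    result = []
    for lp, rp in zip(left, right):
        lp.append(rp.pop())             # move the coarsest node of the right path
        result.append(lp)
    result.extend(rp for rp in right if rp)  # keep right paths that still have nodes
    return result


def show(path):
    return " -> ".join("".join(sorted(n)) or "{}" for n in path)


g4 = build_paths(list("ABCD"))
for p in g4:
    print(show(p))
```

Every group-by appears on exactly one path, and consecutive nodes on a path differ by one attribute, so one sort order per path is enough.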

19 Memory-Cube  Share Sort Work  Re-ordering the sort sequence can improve performance  The sort result for a shorter sort order can be reused for a longer one  E.g. S6 = CD, S3 = CAD. After sorting for S6, the entire relation does not have to be re-sorted for S3; only each block of tuples that shares a C value needs to be independently sorted in the AD order.
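The CD to CAD example can be sketched as follows. The helper name `refine_sort` and the sample rows are hypothetical; the point is only that a relation already sorted on C can be brought into CAD order by sorting each block of equal C values on (A, D).

```python
from itertools import groupby


def refine_sort(rows, done, extra):
    """rows are already sorted on the attributes in `done`; extend the order by `extra`.

    Only each block of rows sharing the same `done` values is sorted, on `extra`.
    """
    out = []
    for _, block in groupby(rows, key=lambda r: tuple(r[a] for a in done)):
        out.extend(sorted(block, key=lambda r: tuple(r[a] for a in extra)))
    return out


# Hypothetical sample rows over attributes A, C, D.
rows = [{"A": a, "C": c, "D": d} for a, c, d in
        [(2, 1, 9), (1, 2, 3), (1, 1, 4), (2, 2, 1), (1, 1, 2)]]
by_cd = sorted(rows, key=lambda r: (r["C"], r["D"]))        # sort order S6 = CD
by_cad = refine_sort(by_cd, done=["C"], extra=["A", "D"])   # sort order S3 = CAD
```

Sorting many small blocks is cheaper than one full sort because each sort only touches the tuples of one C value.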

20 Memory-Cube  Sorts the in-memory relation according to the attribute order of the path  Like PIPESORT, makes a single scan through the data  Aggregates all small fragments on the path  Outputs the datacube result by combining these small fragments
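A minimal sketch of the single-scan idea, assuming SUM as the aggregate and one path given as an attribute order: after sorting on that order, one pass keeps a running total for every prefix of the order and emits it when the prefix value changes. The function name `aggregate_along_path` is hypothetical and the code is a simplification, not the authors' implementation.

```python
def aggregate_along_path(rows, order, measure):
    """Sort once on `order`, then compute SUMs for every prefix of `order` in one scan.

    Returns a list `results` where results[p] maps the first p attribute values
    to their total; results[0] holds the grand total under the empty tuple.
    """
    k = len(order)
    results = [dict() for _ in range(k + 1)]
    current = [None] * (k + 1)   # prefix value currently being accumulated
    running = [0] * (k + 1)      # running total for that prefix
    started = False
    for row in sorted(rows, key=lambda r: tuple(r[a] for a in order)):
        key = tuple(row[a] for a in order)
        for p in range(k + 1):
            prefix = key[:p]
            if started and prefix != current[p]:
                results[p][current[p]] = running[p]   # prefix finished: emit it
                running[p] = 0
            current[p] = prefix
            running[p] += row[measure]
        started = True
    if started:
        for p in range(k + 1):                        # flush the last groups
            results[p][current[p]] = running[p]
    return results


# Relation R from the Partitioned-Cube slides.
data = [("US", 2000, 10), ("US", 2001, 5), ("US", 2000, 8), ("US", 2002, 6),
        ("HK", 2000, 6), ("HK", 2001, 8), ("HK", 2001, 7), ("HK", 2002, 7)]
R = [{"Country": c, "Year": y, "Sales": s} for c, y, s in data]
res = aggregate_along_path(R, ["Country", "Year"], "Sales")
print(res[1])  # totals per Country
```

Because the input is sorted, the tuples of each group are contiguous, so every group-by on the path is finished with a single scan and constant state per prefix.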

21 Solution Analysis  I/O cost is linear in k, the no. of CUBE BY attributes  CPU cost (in-memory sorts) is exponential in k  CPU cost should be dominated by the I/O time

22 Experiment  CPU time vs. no. of tuples  Exponential in no. of CUBE BY attributes

23 Experiment  CPU, I/O, CPU usage % vs. no. of CUBE BY attributes  CPU usage % drops for large no. of CUBE BY attributes

24 Experiment  Share sorting work  CPU time is dominated by I/O time

25 Conclusion  Partitioned-Cube computes datacubes quickly over large sparse relations  Minimizes the number of sort orders  Shows the advantages of sharing sort orders in the datacube computation  First solution with LINEAR I/O cost

26 Reference  Kenneth A. Ross, Divesh Srivastava: Fast Computation of Sparse Datacubes. VLDB 1997

27 Q & A Section