Cubing Heuristics (JIT lecture) Heuristics used during data cube computation.

Slides:



Advertisements
Similar presentations
An Introduction to Data Warehousing
Advertisements

An Array-Based Algorithm for Simultaneous Multidimensional Aggregates By Yihong Zhao, Prasad M. Desphande and Jeffrey F. Naughton Presented by Kia Hall.
Materialization and Cubing Algorithms. Cube Materialization Each cell of the data cube is a view consisting of an aggregation of interest. The values.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Implementação do DW. SAD Tagus 2004/05 H. Galhardas O problema e as soluções Grandes quantidades de dados => Métodos de acesso e processamento de interrogações.
1 Copyright Jiawei Han. Modified by Charles Ling for CS411a/538a, UWO, Data Mining and Data Warehousing v Introduction v Data warehousing and OLAP.
1 Lecture 23: Query Execution Friday, March 4, 2005.
15.8 Algorithms using more than two passes Presented By: Seungbeom Ma (ID 125) Professor: Dr. T. Y. Lin Computer Science Department San Jose State University.
Generating the Data Cube (Shared Disk) Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint Work with F. Dehne T. Eavis S. Hambrusch.
Data Warehousing Willem Visser RW334. Somebody is watching! Everybody seems to be recording your every move Loyalty cards Cookies – Facebook, Twitter,…
Bhargav Vadher (208) APRIL 9 th, 2008 Submittetd To: Dr. T Y Lin Computer Science Department San Jose State University.
Implementation & Computation of DW and Data Cube.
Nested-Loop joins “one-and-a-half” pass method, since one relation will be read just once. Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in.
Lecture 24: Query Execution Monday, November 20, 2000.
Query Execution 15.5 Two-pass Algorithms based on Hashing By Swathi Vegesna.
Data Warehouses and OLAP
Data Warehousing & OLAP. 2 What is Data Warehouse? “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data.
15.5 Two-Pass Algorithms Based on Hashing 115 ChenKuang Yang.
1 Computing the cube Abhinandan Das CS 632 Mar
© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/ Data Warehouse and Data Cube Lecture Notes for Chapter 3 Introduction to Data Mining By.
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates
Data Cube Computation Model dependencies among the aggregates: most detailed “view” can be computed from view (product,store,quarter) by summing-up all.
Instructor: Dan Hebert
August 25, 2015Data Mining: Concepts and Techniques 1 Data Warehousing What is a data warehouse? A multi-dimensional data model Data warehouse architecture.
Introduction to Data Mining and Data Warehousing Muhammad Ali Yousuf DSC – ITM Friday, 9 th May 2003.
An Introduction to Data Warehousing Yannis Kotidis AT&T Labs-Research.
1 CUBE: A Relational Aggregate Operator Generalizing Group By By Ata İsmet Özçelik.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 14 – Join Processing.
©Jiawei Han and Micheline Kamber
1 Cube Computation and Indexes for Data Warehouses CPS Notes 7.
Efficient Methods for Data Cube Computation and Data Generalization
Methods for Multiplying. Standard Algorithm Partial Products Draw table with dimensions of digits in each number. Ex.
OLAP : Blitzkreig Introduction 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema :
1 Fast Computation of Sparse Datacubes Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan Ken :: Yiu Man Lung.
1 1 MSCIT 5210: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline.
Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator –
1 Data Warehousing & OLAP. 2 Data Warehousing and OLAP Technology for Data Mining What is a data warehouse? A multi-dimensional data model Data warehouse.
Dr. N. MamoulisAdvanced Database Technologies1 Topic 6: Data Warehousing & OLAP Defined in many different ways, but not rigorously. A decision support.
ITCS 6163 Cube Computation. Two Problems Which cuboids should be materialized? –Ullman et.al. Sigmod 96 paper How to efficiently compute cube? –Agrawal.
Frank Dehnewww.dehne.net Parallel Data Cube Data Mining OLAP (On-line analytical processing) cube / group-by operator in SQL.
Shilpa Seth.  Multidimensional Data Model Concepts Multidimensional Data Model Concepts  Data Cube Data Cube  Data warehouse Schemas Data warehouse.
CS4432: Database Systems II Query Processing- Part 3 1.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
1 Data Mining: Concepts and Techniques — Chapter 2 —
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CS4432: Database Systems II Query Processing- Part 2.
Multiplication Facts. 9 6 x 4 = 24 5 x 9 = 45 9 x 6 = 54.
Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.
CSCE Database Systems Chapter 15: Query Execution 1.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
On Computing the Data Cube. Research Report 10026, IBM Almaden Research Center, San Jose, California, 병렬 분산 컴퓨팅 연구실 석사 1 학기 송지숙.
INFORMATION INTEGRATION Sandeep Singh Balouria CS-257 ID- 101.
Multiplication Facts. 2x2 4 8x2 16 4x2 8 3x3.
Multiplication Facts Review: x 1 = 1 10 x 8 =
1 Lecture 23: Query Execution Monday, November 26, 2001.
Data Mining and Big Data
Multiplication Facts All Facts. 0 x 1 2 x 1 10 x 5.
A data cube (1) OR a lattice of cuboids
Chapter 5. Data Cube Technology
Multiplication Facts.
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5 —
A B D C G5b Date 1Qtr 2Qtr 3Qtr 4Qtr TV Product PC
Ge Yang Ruoming Jin Gagan Agrawal The Ohio State University
Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Shell Cube Introducing iceberg cubes will lessen the burden of computing trivial aggregate.
Multiplication Facts.
CS 412 Intro. to Data Mining Chapter 5. Data Cube Technology
Multiplication Facts.
Chapter 4: Data Cube Computation and Data Generalization
Lecture 24: Query Execution
Multiplication Facts 3 x Table.
Presentation transcript:

Cubing Heuristics (JIT lecture) Heuristics used during data cube computation

2 Cuboids Corresponding to the Cube all product date country product,dateproduct,countrydate, country product, date, country 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D(base) cuboid

3 Efficient Data Cube Computation  Data cube can be viewed as a lattice of cuboids The bottom-most cuboid is the base cuboid The top-most cuboid (apex) contains only one cell How many cuboids in an n-dimensional cube with L levels?  Materialization of data cube Materialize every (cuboid) (full materialization), none (no materialization), or some (partial materialization) Selection of which cuboids to materialize  Based on size, sharing, access frequency, etc.

4 Cube Computation: ROLAP-Based Method  Efficient cube computation methods ROLAP-based cubing algorithms (Agarwal et al’96) Array-based cubing algorithm (Zhao et al’97) Bottom-up computation method (Bayer & Ramarkrishnan’99)  ROLAP-based cubing algorithms Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples Grouping is performed on some subaggregates as a “partial grouping step” Aggregates may be computed from previously computed aggregates, rather than from the base fact table

5 Cube Computation: ROLAP-Based Method (2)  Hash/sort based methods ( Agarwal et. al. VLDB’96 ) Smallest-parent: computing a cuboid from the smallest cubod previously computed cuboid. Cache-results: caching results of a cuboid from which other cuboids are computed to reduce disk I/Os Amortize-scans: computing as many as possible cuboids at the same time to amortize disk reads Share-sorts: sharing sorting costs cross multiple cuboids when sort-based method is used Share-partitions: sharing the partitioning cost cross multiple cuboids when hash-based algorithms are used