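00-03-30, Parallel and Distributed Computing Lab
Slide 1: Cubing Algorithms, Storage Estimation, and Storage and Processing Alternatives for OLAP
Lee Eun-jeong (이은정), first-semester master's student, Parallel and Distributed Computing Lab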

Slide 2: Contents
- Introduction
- Cubing Relational Tables
- Storage Explosion and the Cube
- Three Strategies
- Comparing the Algorithms
- An Object-Relational ADT
- Conclusions

Slide 3: Introduction
- Key demand of OLAP applications: queries must be answered quickly.
- Goal of the research: to exploit the structure of the multidimensional model to provide extremely high performance for queries.

Slide 4: Introduction
- Group-by (in SQL terms): the ability to simultaneously aggregate across many sets of dimensions.
- “Cube” operator (proposed by Gray): computes aggregates over all subsets of the dimensions specified in the “cube” operation.
- Precomputation of aggregates → speeds up multidimensional data analysis.

Slide 5: Introduction
- The cube computation problem: precompute some or all of the cube. Deciding what to precompute, and how much, is difficult.
- Three strategies for estimating the size of the cube:
  - assume the data is uniformly distributed
  - a simple sampling-based algorithm
  - a probabilistic counting algorithm
- Implementing a multidimensional array ADT in Paradise, an object-relational DBMS.

Slide 6: Cubing Relational Tables
- “Cube” operator (proposed by Gray): a generalization of group-by to n dimensions, introduced to formalize simultaneous aggregation and to express it in SQL: CUBE [DISTINCT | ALL] … BY …
- Cuboid: each such group-by aggregate.
- Base cuboid: the group-by aggregate over all the attributes in the cube.
- For n attributes there are $2^n$ cuboids, one of which is the base cuboid.
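A minimal sketch of the cube operator's semantics, assuming the base data is a list of dicts and SUM is the aggregate (the function name `naive_cube` and the sample rows are hypothetical). It computes each of the $2^n$ cuboids independently by rescanning the base data, which is exactly the cost that exploiting the relationships between cuboids is meant to avoid.

```python
from collections import defaultdict
from itertools import combinations


def naive_cube(rows, dims, measure):
    """Return {cuboid_dims: {group_key: SUM(measure)}} for all 2^n subsets of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:  # one full scan of the base data per cuboid
                agg[tuple(row[d] for d in subset)] += row[measure]
            cube[subset] = dict(agg)
    return cube


# Hypothetical base data for CUBE Product, Year, Customer BY SUM(sales)
sales = [
    {"Product": "p1", "Year": 1996, "Customer": "c1", "sales": 10},
    {"Product": "p1", "Year": 1997, "Customer": "c2", "sales": 5},
    {"Product": "p2", "Year": 1996, "Customer": "c1", "sales": 7},
]
cube = naive_cube(sales, ["Product", "Year", "Customer"], "sales")
print(len(cube))             # 8 cuboids for n = 3
print(cube[("Product",)])    # {('p1',): 15, ('p2',): 7}
print(cube[()])              # {(): 22}
```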

Slide 7: Cubing Relational Tables
- Example: CUBE Product, Year, Customer BY SUM(sales) → computes the sales aggregate cuboids on all 8 subsets of the set {Product, Year, Customer}.
- Key challenge: understand how the cuboids in this collection are related to each other, and exploit these relationships to minimize I/O → a class of sorting-based methods is explored.
  - In the experiments, these methods always performed better.
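To illustrate why a shared sort order helps, here is a sketch of one sort-based pipeline in the spirit of PipeSort-style methods (a swapped-in illustration, not necessarily the algorithm from the talk). After a single sort on (Product, Year, Customer), one scan produces the prefix cuboids (Product, Year, Customer), (Product, Year), (Product) and () together, because each group can be emitted as soon as its prefix value changes. The remaining cuboids, such as (Year, Customer), would need other sort orders. Function name and data are hypothetical.

```python
def prefix_pipeline(rows, order, measure):
    """Sort once on `order`; one scan emits SUM(measure) for every prefix cuboid.

    For order (A, B, C) the prefix cuboids are (A, B, C), (A, B), (A) and ().
    Each cuboid's result is a list of (key, total) in sorted key order.
    """
    order = tuple(order)
    n = len(order)
    out = {order[:k]: [] for k in range(n + 1)}
    current = [None] * (n + 1)  # group key currently open at each prefix length
    totals = [0] * (n + 1)      # running SUM for each open group

    def flush(level):
        # Close the open groups of every prefix length >= level.
        for k in range(level, n + 1):
            out[order[:k]].append((current[k], totals[k]))
            totals[k] = 0

    started = False
    for row in sorted(rows, key=lambda r: tuple(r[d] for d in order)):
        key = tuple(row[d] for d in order)
        if started:
            # Shortest prefix whose value changed; every longer prefix changed too.
            level = next((k for k in range(1, n + 1) if key[:k] != current[k]), None)
            if level is not None:
                flush(level)
        started = True
        for k in range(n + 1):
            current[k] = key[:k]
            totals[k] += row[measure]
    if started:
        flush(0)
    return out


sales = [
    {"Product": "p1", "Year": 1996, "Customer": "c1", "sales": 10},
    {"Product": "p1", "Year": 1997, "Customer": "c2", "sales": 5},
    {"Product": "p2", "Year": 1996, "Customer": "c1", "sales": 7},
]
out = prefix_pipeline(sales, ("Product", "Year", "Customer"), "sales")
print(out[("Product",)])  # [(('p1',), 15), (('p2',), 7)]
print(out[()])            # [((), 22)]
```

Only one running total per prefix length is kept in memory, which is what makes the sorted scan attractive compared with rescanning the data per cuboid.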

Slide 8: Storage Explosion and the Cube
- Virtually all OLAP products resort to some degree of precomputation of these aggregates: the more that is precomputed, the faster queries are answered.
- The problem of estimating how much storage …
  - full precomputation problem → the cube framework
- Example: over a sales table with dimensions ProductId and StoreId, the group-bys are (), (ProductId), (StoreId), (ProductId, StoreId) → 4 group-bys
  - (StoreId): select StoreId, SUM(Quantity) from sales group by StoreId;
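The four group-bys can be run directly with Python's built-in `sqlite3` module to count how many tuples a fully precomputed two-dimensional cube stores. The rows of `sales` are hypothetical toy data; only the schema (ProductId, StoreId, Quantity) comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ProductId INTEGER, StoreId INTEGER, Quantity INTEGER)")
# Hypothetical toy rows: (ProductId, StoreId, Quantity)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 3), (2, 1, 4), (3, 2, 6), (3, 3, 1)],
)

group_bys = [(), ("ProductId",), ("StoreId",), ("ProductId", "StoreId")]


def group_by_sql(cols):
    """Build the SQL for one group-by of the cube; () is the grand total."""
    if not cols:
        return "SELECT SUM(Quantity) FROM sales"
    c = ", ".join(cols)
    return f"SELECT {c}, SUM(Quantity) FROM sales GROUP BY {c}"


# Number of tuples each materialized group-by would store
sizes = {cols: len(conn.execute(group_by_sql(cols)).fetchall()) for cols in group_bys}
print(sizes)                # {(): 1, ('ProductId',): 3, ('StoreId',): 3, ('ProductId', 'StoreId'): 5}
print(sum(sizes.values()))  # 12
```

Here five base tuples grow to 12 stored tuples once all four group-bys are materialized.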

Slide 9: Figure (a) (1/3), X: sales

Slide 10: Figure (b) (2/3), X: sales

Slide 11: Figure (c) (3/3), X: sales

Slide 12: Table

Slide 13: Storage Explosion and the Cube
- A cube with hierarchies has a much larger storage requirement than a cube without hierarchies.
  - E.g., Figure (a): without hierarchies, 34 tuples; with hierarchies, 73 tuples.
  - Even for a small database and a small number of dimensions, the resulting cube sizes are very different.
- An example of the range of blowup that can occur: Figure (b) vs. Figure (c) → results in the Table.
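The effect of hierarchies can be reproduced on toy data. In this sketch (hypothetical data and hierarchies: ProductId → Category, StoreId → City), each dimension contributes one of its hierarchy levels or ALL to a group-by, and the cube size is the sum of distinct keys over all group-bys.

```python
from itertools import product

# Hypothetical base tuples: (ProductId, StoreId)
rows = [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3)]

# Hypothetical hierarchies: each level maps a base value to its value at that level
product_levels = {"ProductId": lambda p: p, "Category": lambda p: "A" if p <= 2 else "B"}
store_levels = {"StoreId": lambda s: s, "City": lambda s: "X" if s <= 2 else "Y"}


def cube_size(rows, dim_levels):
    """Return (number of group-bys, total tuples) when each dimension
    contributes one of its levels or ALL (represented by None)."""
    choices = [list(levels.values()) + [None] for levels in dim_levels]
    n_group_bys, total = 0, 0
    for combo in product(*choices):
        keys = {tuple(f(v) for f, v in zip(combo, row) if f is not None) for row in rows}
        n_group_bys += 1
        total += len(keys)
    return n_group_bys, total


flat = cube_size(rows, [{"ProductId": lambda p: p}, {"StoreId": lambda s: s}])
with_hierarchies = cube_size(rows, [product_levels, store_levels])
print(flat)              # (4, 12)
print(with_hierarchies)  # (9, 27)
```

With two-level hierarchies on both dimensions, the same five base tuples yield 9 group-bys and 27 cube tuples instead of 4 group-bys and 12 tuples.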

Slide 14: Three Strategies (1/3)
- Assuming uniformly distributed data
  - If r elements are chosen uniformly and at random from a set of n elements, the expected number of distinct elements obtained is $n - n\left(1 - \frac{1}{n}\right)^{r}$. This lets us estimate the size of the group-by on any subset of the attributes.
  - Cube size estimate: with k dimensions, where dimension i has a hierarchy of size $h_i$, the total number of group-bys is $\prod_{i=1}^{k} (h_i + 1)$.
  - Overestimates the size of the cube and requires counts of distinct values …, but is simple and fast.
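A sketch of the uniform-assumption estimator, assuming each dimension is described by the number of distinct values at each of its hierarchy levels (the `level_cards` layout, the statistics, and the function names are hypothetical). Choosing ALL for a dimension is modeled as a level with one value, so the loop visits exactly $\prod_i (h_i + 1)$ group-bys and applies the distinct-elements formula to each, taking the product of the chosen level cardinalities as the group-by's key space.

```python
from itertools import product
from math import prod


def expected_distinct(n, r):
    """Expected number of distinct values when r elements are drawn uniformly
    at random (with replacement) from n possible values."""
    return n - n * (1 - 1 / n) ** r


def uniform_cube_estimate(num_tuples, level_cards):
    """Estimate the cube size in tuples under the uniform-distribution assumption.

    level_cards[i] lists the number of distinct values at each hierarchy level
    of dimension i. ALL is modeled as a level with 1 value.
    """
    total = 0.0
    for combo in product(*[list(cards) + [1] for cards in level_cards]):
        total += expected_distinct(prod(combo), num_tuples)
    return total


def number_of_group_bys(level_cards):
    return prod(len(cards) + 1 for cards in level_cards)  # prod_i (h_i + 1)


# Hypothetical statistics: 1,000 tuples; Product has levels with 100 and 10 values,
# Store has levels with 50 and 5 values.
cards = [[100, 10], [50, 5]]
print(number_of_group_bys(cards))  # 9
print(uniform_cube_estimate(1000, cards))
```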

Slide 15: Three Strategies (2/3)
- Simple sampling-based algorithm
  - Take a random subset of the database and compute the cube on that subset.
  - Scale the result by the ratio of the database size to the sample size, where |D| is the size of the database, |s| is the sample size, and CUBE(s) is the size of the cube computed on s.
  - The size of the cube on the entire database D is approximated by $\mathrm{CUBE}(s) \cdot \frac{|D|}{|s|}$.
  - This simple, biased estimator produces surprisingly good estimates.
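A sketch of the sampling-based estimator, assuming base tuples are plain Python tuples of dimension values and no hierarchies (function names and data are hypothetical). It computes the exact cube size of a uniform random sample s and scales it by |D|/|s|.

```python
import random


def cube_size(rows, num_dims):
    """Exact number of tuples in the full cube (all 2^n group-bys, no hierarchies)."""
    total = 0
    for mask in range(1 << num_dims):
        keys = {tuple(r[i] for i in range(num_dims) if (mask >> i) & 1) for r in rows}
        total += len(keys)
    return total


def sampled_cube_estimate(rows, num_dims, sample_size, seed=0):
    """Approximate the cube size by CUBE(s) * |D| / |s| for a uniform random sample s."""
    sample = random.Random(seed).sample(rows, sample_size)
    return cube_size(sample, num_dims) * len(rows) / sample_size


# Hypothetical base data: (ProductId, StoreId) pairs
rows = [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3)]
print(cube_size(rows, 2))                 # 12
print(sampled_cube_estimate(rows, 2, 3))  # estimate from a 3-tuple sample
```

Because group keys that repeat across the database are counted once in the sample but then scaled up linearly, the quality of this estimate depends on how many duplicates the database contains.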

Slide 16: Three Strategies (3/3)
- Probabilistic counting algorithm (Flajolet & Martin)
  - Counts the number of distinct elements in a multiset.
  - By estimating the number of distinct elements in a particular grouping of the data, we can estimate the number of tuples in that grouping.
  - Needs a single pass through the database, using only a fixed amount of additional storage.
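A minimal sketch of Flajolet and Martin's probabilistic counting with stochastic averaging (PCSA), assuming a 64-bit hash derived from `hashlib.blake2b` over `repr(value)` and m = 64 bitmaps; the class and helper names are hypothetical. To estimate a cube's size, one such sketch per group-by could be fed the projected key of every tuple during the same single scan. With few distinct values relative to m, this basic estimator is biased upward.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def hash64(value):
    """Deterministic 64-bit hash of a value's repr (an assumption for this sketch)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def lowest_one_bit(x):
    """0-based position of the least significant 1-bit (64 if x == 0)."""
    return (x & -x).bit_length() - 1 if x else 64


def lowest_zero_bit(b):
    """0-based position of the least significant 0-bit of a non-negative int."""
    return ((b + 1) & ~b).bit_length() - 1


class PCSASketch:
    """Probabilistic counting with stochastic averaging over m bitmaps."""

    def __init__(self, m=64):
        self.m = m
        self.bitmaps = [0] * m

    def add(self, value):
        h = hash64(value)
        bucket, rest = h % self.m, h // self.m  # pick a bitmap, use the remaining bits
        self.bitmaps[bucket] |= 1 << lowest_one_bit(rest)

    def estimate(self):
        # Average the position of the first unset bit across all bitmaps.
        mean_r = sum(lowest_zero_bit(b) for b in self.bitmaps) / self.m
        return self.m / PHI * 2 ** mean_r


sketch = PCSASketch()
for i in range(20000):
    sketch.add(i)  # adding a duplicate would only set bits that are already set
print(round(sketch.estimate()))  # near 20000; standard error roughly 0.78 / sqrt(m)
```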

Slide 17: Comparing the Algorithms
- Sampling-based algorithm: the estimate depends on the number of duplicates present in the database.
- Uniform-distribution assumption: as the skew in the data increases, the estimate becomes inaccurate.
- Probabilistic counting algorithm: performs very well under various degrees of skew, giving more reliable, accurate, and predictable estimates.
- For a reasonably quick and accurate estimate of the size of the cube → the probabilistic counting algorithm.

Slide 18: An Object-Relational ADT
- Storage structures:
  (1) a relational table, e.g., (I, J, K, D), where the array indices are integers
  (2) an array, stored in row-major or column-major order
- Advantages of MOLAP:
  - dense arrays are stored more compactly in an array
  - an array lookup is a simple arithmetic operation
- Advantages of ROLAP:
  - sparse data sets are stored more compactly in tables
  - you get everything a standard SQL database brings (e.g., scalability to very large data sets)
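The array lookup arithmetic can be sketched as follows, assuming I, J, K are 0-based integer indices with known extents and D is the stored measure (the extents, rows, and function names are hypothetical). Row-major order makes the last index vary fastest; column-major order would walk the dimensions in reverse.

```python
def row_major_offset(index, shape):
    """Offset of a multidimensional index in a row-major (C-order) flat array."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {shape}")
        offset = offset * n + i
    return offset


def table_to_array(rows, shape, fill=0):
    """Load relational tuples (I, J, K, D) into a dense row-major array."""
    size = 1
    for n in shape:
        size *= n
    array = [fill] * size
    for *index, value in rows:
        array[row_major_offset(index, shape)] = value
    return array


shape = (2, 3, 4)  # hypothetical extents of I, J, K
rows = [(0, 0, 0, 10), (1, 2, 3, 7), (0, 1, 2, 5)]
array = table_to_array(rows, shape)
print(row_major_offset((1, 2, 3), shape))  # 23
print(array[23])                           # 7
```

The example also shows the density trade-off: three tuples occupy 24 array cells, which is why sparse data is stored more compactly in a table.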

Slide 19: Using Paradise
- To implement an array ADT (MOLAP style)
- To implement bitmap indices
- To propose a “query evaluation algorithm”: an example of a high-performance ROLAP system
- Both approaches run on the same code base, which reduces the number of differing factors (e.g., the same concurrency control and recovery system)
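Bitmap indices are one of the structures listed above. A minimal sketch using Python integers as bitsets (hypothetical data and names; this is not Paradise's implementation): each distinct value of a column maps to a bitmap of the row ids holding it, and a conjunctive predicate becomes a bitwise AND.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitmap (Python int) of the rows holding it."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def row_ids(bitmap):
    """Decode a bitmap into the list of row ids whose bits are set."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


store_col = [1, 2, 1, 3, 1]        # hypothetical StoreId column
product_col = [10, 10, 20, 10, 30]  # hypothetical ProductId column
store_idx = build_bitmap_index(store_col)
product_idx = build_bitmap_index(product_col)

# WHERE StoreId = 1 AND ProductId = 10  ->  AND of two bitmaps
matches = store_idx[1] & product_idx[10]
print(row_ids(matches))  # [0]
```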

Slide 20: Conclusions
- Consider the problem of computing the “cube” over data stored in arrays rather than in tables.
- Start with a data set in a table → convert it to an array and “cube” the array → store the result back in tables.
  - This is faster than cubing the table directly, and very efficient.
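The table → array → cube → tables pipeline can be sketched as below, assuming a dense 2-D array in row-major order and SUM as the aggregate (names and data are hypothetical). For simplicity it rescans the array once per cuboid and treats 0 as an empty cell; a real array-based cubing algorithm would share work between cuboids instead of rescanning.

```python
from itertools import product


def cube_array(array, shape):
    """Sum out every subset of dimensions of a dense row-major array.

    Returns {kept_dims: {index_key: total}}; zero cells are treated as empty.
    """
    n = len(shape)
    # itertools.product yields indices with the last one varying fastest,
    # i.e. in the same row-major order as the flat array.
    cells = list(zip(product(*(range(s) for s in shape)), array))
    cube = {}
    for mask in range(1 << n):
        kept = tuple(d for d in range(n) if (mask >> d) & 1)
        agg = {}
        for index, value in cells:
            if value:
                key = tuple(index[d] for d in kept)
                agg[key] = agg.get(key, 0) + value
        cube[kept] = agg
    return cube


# 1. Start from a (hypothetical) table of (I, J, value) tuples.
table = [(0, 0, 5), (1, 2, 7), (0, 1, 3)]
shape = (2, 3)
# 2. Convert it to a dense row-major array.
array = [0] * (shape[0] * shape[1])
for i, j, v in table:
    array[i * shape[1] + j] = v
# 3. Cube the array, then 4. store each cuboid back as a list of tuples.
cube = cube_array(array, shape)
result_tables = {kept: [key + (total,) for key, total in agg.items()]
                 for kept, agg in cube.items()}
print(result_tables[(0,)])  # [(0, 8), (1, 7)]
print(result_tables[()])    # [(15,)]
```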