Materialization and Cubing Algorithms

Cube Materialization
Each cell of the data cube is a view consisting of an aggregation of interest. The values of these cells depend on the values of other cells in the data cube. Materializing (precomputing) some or all of these cells is a common and powerful query optimization technique.
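A minimal Python sketch of what "every cell is an aggregation" means, assuming a hypothetical four-row fact table with three dimensions and a SUM measure. Every group-by (cuboid) is materialized as a dictionary of cells. For clarity each cuboid is computed directly from the base rows. The dependency between cells shows up in the fact that the cells of every cuboid sum to the single apex (All) cell.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (country, year, product, applications)
FACTS = [
    ("Norway", 2005, "car", 3),
    ("Norway", 2006, "home", 5),
    ("USA", 2005, "car", 4),
    ("USA", 2006, "car", 2),
]
N_DIMS = 3  # country, year, product; the last column is the measure

def full_cube(facts, n_dims):
    """Materialize every cuboid as {cuboid: {cell: SUM(measure)}}.

    A cuboid is a tuple of dimension positions; () is the apex cuboid (All).
    """
    cube = {}
    for k in range(n_dims + 1):
        for cuboid in combinations(range(n_dims), k):
            cells = defaultdict(int)
            for row in facts:
                cells[tuple(row[i] for i in cuboid)] += row[n_dims]
            cube[cuboid] = dict(cells)
    return cube

cube = full_cube(FACTS, N_DIMS)
print(len(cube))   # 8 cuboids (2**3 group-bys)
print(cube[()])    # {(): 14}
print(cube[(0,)])  # {('Norway',): 8, ('USA',): 6}
```

Computing all 2^n cuboids this way rescans the base data once per cuboid. Reducing that cost is what the materialization choices and cubing algorithms in the following slides aim to do.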

Materialization (contd.)
The size of the data warehouse and the complexity of queries can cause queries to take a very long time to complete. Materializing (precomputing) the results of frequently asked queries is a commonly used technique for improving performance.

Issues in View Materialization
– Which views should we materialize, and which indexes should we build on the precomputed results?
– Given a query and a set of materialized views, can we use the materialized views to answer the query?
– How frequently should we refresh materialized views to keep them consistent with the underlying tables? (And how can we do this incrementally?)
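For the second question, one common case can be checked mechanically. The sketch below assumes SUM as the aggregate and a hypothetical base table. A materialized (country, year) view can answer GROUP BY country because it keeps every grouping column of the query, and rolling up the smaller view gives the same result as scanning the base table. The helper names (group_sum, can_answer) are illustrative only, not taken from any system.

```python
from collections import defaultdict

# Hypothetical base table: (country, year, product, amount)
BASE = [
    ("Norway", 2005, "car", 3),
    ("Norway", 2006, "home", 5),
    ("USA", 2005, "car", 4),
    ("USA", 2006, "car", 2),
    ("USA", 2006, "home", 1),
]
COLUMNS = ("country", "year", "product")

def group_sum(rows, dims, columns):
    """SUM(measure) GROUP BY dims; the last field of each row is the measure."""
    pos = [columns.index(d) for d in dims]
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[p] for p in pos)] += r[-1]
    return dict(out)

def can_answer(view_dims, query_dims):
    """A SUM view can answer a GROUP BY query if it keeps every grouping column."""
    return set(query_dims) <= set(view_dims)

# Materialize the (country, year) view once ...
VIEW_DIMS = ("country", "year")
view = [key + (total,) for key, total in group_sum(BASE, VIEW_DIMS, COLUMNS).items()]

# ... then answer GROUP BY country from the view instead of BASE.
query = ("country",)
assert can_answer(VIEW_DIMS, query)
from_view = group_sum(view, query, VIEW_DIMS)
print(from_view)  # {('Norway',): 8, ('USA',): 7}
```

This containment test is only sound for distributive aggregates such as SUM, COUNT, MIN and MAX, and it ignores selection predicates. AVG, for example, would need the view to store both a sum and a count.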

Bottom-Up Cubing (BUC)
BUC is a cube construction algorithm that proceeds from the apex cuboid toward the base (most specific) cuboid. Its authors draw the cuboid lattice with the apex at the bottom and the base cuboid at the top, so in their picture this order runs bottom-up, hence the name. BUC can use the Apriori pruning property to compute iceberg cubes during construction, as the next slide explains.

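BUC Algorithm
BUC is a recursive algorithm: it partitions the tuples on the values of one dimension at a time and recurses into each partition, which is what makes iceberg pruning possible. It does not perform simultaneous aggregation (aggregate computation is not shared between parent and child group-bys); its best feature is the sharing of partitioning costs.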
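A simplified BUC-style sketch, assuming a COUNT(*) measure with the iceberg condition COUNT(*) >= min_sup and a hypothetical four-tuple input. Each call first aggregates its whole partition into one cell, then partitions on each remaining dimension and recurses only into partitions that meet the threshold. Each recursive call works on an already-partitioned subset, which is where the shared partitioning cost comes from. The original algorithm partitions in place with a counting sort; a dictionary of lists is used here only for readability.

```python
from collections import defaultdict

ALL = "*"  # placeholder for an aggregated-away dimension

def buc(rows, n_dims, min_sup, start=0, cell=None, out=None):
    """Iceberg cube of COUNT(*) with HAVING COUNT(*) >= min_sup, BUC-style.

    rows : tuples of dimension values (the current partition)
    cell : the cell being built, with ALL in dimensions not yet bound
    out  : dict collecting {cell: count} for every qualifying cell
    """
    if cell is None:
        cell = [ALL] * n_dims
    if out is None:
        out = {}
    if len(rows) < min_sup:        # only reachable on the initial call
        return out
    out[tuple(cell)] = len(rows)   # aggregate the whole partition first
    for d in range(start, n_dims):
        parts = defaultdict(list)  # partition on dimension d
        for r in rows:
            parts[r[d]].append(r)
        for value, part in parts.items():
            # Apriori pruning: if this partition is below min_sup, every more
            # specific cell inside it is too, so the recursion is skipped.
            if len(part) >= min_sup:
                cell[d] = value
                buc(part, n_dims, min_sup, d + 1, cell, out)
        cell[d] = ALL
    return out

ROWS = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
iceberg = buc(ROWS, 3, min_sup=2)
```

With min_sup=2 the sketch keeps 7 cells, such as ("a1", "*", "*") with count 3; cells like ("a2", "*", "*") are never expanded because their partition holds a single tuple.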

Bottom-Up Data Cube Computation Example
Cell values: numbers of loan applications
Country    Applications
Norway     84
…          114
USA        99
All        297
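The example's numbers can be checked directly. The All cell is the sum of the country cells, so a cubing algorithm can derive it from them instead of rescanning the raw data. The "…" key is kept as the slide's own placeholder for the countries it does not list.

```python
# Country-level cells from the example; "…" stands for the countries
# the slide does not list individually.
by_country = {"Norway": 84, "…": 114, "USA": 99}

# The All cell depends on the country cells: it is their sum.
all_cell = sum(by_country.values())
print(all_cell)  # 297
```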

Introduction to MOLAP Cube
Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Although the array-based (Multi-Way) algorithm is designed for MOLAP systems, it can also be used for Relational OLAP (ROLAP) systems: the table data is converted to an array, cubed as if in a MOLAP system, and then converted back to a table.
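A small sketch of the ROLAP round trip, assuming a hypothetical two-dimensional sales table and a dense in-memory array (no chunking yet). Dimension values are mapped to array indexes, the group-bys are computed by summing along array axes, and the 1-D results are turned back into rows. The function names are illustrative only.

```python
# Hypothetical relational input: (store, item, sales)
TABLE = [("s1", "i1", 5), ("s1", "i2", 3), ("s2", "i1", 7)]

def table_to_array(table):
    """Map dimension values to array indexes and fill a dense 2-D array."""
    stores = sorted({r[0] for r in table})
    items = sorted({r[1] for r in table})
    s_idx = {v: n for n, v in enumerate(stores)}
    i_idx = {v: n for n, v in enumerate(items)}
    arr = [[0] * len(items) for _ in stores]  # empty cells stay 0
    for s, i, sales in table:
        arr[s_idx[s]][i_idx[i]] += sales
    return arr, stores, items

def cube_2d(arr):
    """Aggregate the array into the (store), (item) and () group-bys."""
    by_store = [sum(row) for row in arr]
    by_item = [sum(col) for col in zip(*arr)]
    return by_store, by_item, sum(by_store)

def array_to_table(values, labels):
    """Turn a 1-D aggregate array back into (label, value) rows."""
    return list(zip(labels, values))

arr, stores, items = table_to_array(TABLE)
by_store, by_item, total = cube_2d(arr)
print(array_to_table(by_store, stores))  # [('s1', 8), ('s2', 7)]
print(array_to_table(by_item, items))    # [('i1', 12), ('i2', 3)]
print(total)                             # 15
```

The empty cell (s2, i2) already hints at the sparsity issue discussed next: a dense array stores it even though no row produced it.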

Array Storage
There are three major issues relating to the storage of the array that must be resolved:
– In a multidimensional application, the array is likely to be too large to fit in memory.
– Many of the cells in the array are likely to be empty, because there is no data for that combination of coordinates.
– In many cases, the array must be loaded from data that is not in array format (e.g., from a relational table or from an external load file).

Resolving Storage Issues
– A large n-dimensional array that cannot fit into memory is divided into small n-dimensional chunks (each sized to correspond to the disk block size), and each chunk is stored as one object on disk.
– Sparse chunks (data density below 40%) use chunk-offset compression: for each valid array entry, a pair (offsetInChunk, data) is stored.
– To load data from formats other than arrays, a partition-based loading algorithm is used. It takes as input the table, the size of each dimension, and a predefined chunk size, and returns a (possibly compressed) chunked array.
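A sketch of chunking plus chunk-offset compression, under simplifying assumptions: the input is a list of ((coordinates), value) cells with unique coordinates, every chunk dimension divides the corresponding array dimension, and offsets inside a chunk are row-major. Cells are first bucketed by chunk (in the spirit of partition-based loading), then each chunk is stored densely or, if its density is below the 40% threshold, as sorted (offsetInChunk, value) pairs. Empty chunks are not stored at all. The names load_chunked and chunk_of are hypothetical.

```python
from collections import defaultdict
from math import prod

def chunk_of(coords, chunk_shape):
    """Chunk number along each dimension and row-major offset inside the chunk."""
    chunk_id = tuple(c // s for c, s in zip(coords, chunk_shape))
    offset = 0
    for c, s in zip(coords, chunk_shape):
        offset = offset * s + (c % s)
    return chunk_id, offset

def load_chunked(cells, dim_sizes, chunk_shape, density_threshold=0.4):
    """Bucket (coords, value) cells by chunk, then store each chunk densely,
    or as a sorted list of (offsetInChunk, value) pairs if it is sparse."""
    buckets = defaultdict(list)
    for coords, value in cells:
        assert all(0 <= c < n for c, n in zip(coords, dim_sizes))
        cid, off = chunk_of(coords, chunk_shape)
        buckets[cid].append((off, value))
    cells_per_chunk = prod(chunk_shape)
    chunks = {}
    for cid, pairs in buckets.items():
        if len(pairs) / cells_per_chunk < density_threshold:
            chunks[cid] = ("sparse", sorted(pairs))
        else:
            dense = [0] * cells_per_chunk
            for off, v in pairs:
                dense[off] = v
            chunks[cid] = ("dense", dense)
    return chunks

# A 4 x 4 array split into 2 x 2 chunks: one 75%-full chunk, one 25%-full chunk.
CELLS = [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((3, 3), 9)]
chunks = load_chunked(CELLS, dim_sizes=(4, 4), chunk_shape=(2, 2))
print(chunks)  # {(0, 0): ('dense', [1, 2, 3, 0]), (1, 1): ('sparse', [(3, 9)])}
```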

Basic Array Cubing Algorithm
1. Construct the minimum-size spanning tree for the group-bys of the cube.
2. Compute any group-by $D_{i_1} D_{i_2} \cdots D_{i_k}$ of the cube from the “parent” $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$ that has the minimum size.
3. Read in each chunk of $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$ along the dimension $D_{i_{k+1}}$ and aggregate each chunk into a chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$.
4. Once a chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$ is complete, output it to disk and reuse its memory for the next chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$, so that only one chunk is kept in memory at a time.
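A sketch of the steps above under stated assumptions. For steps 1 and 2, min_size_parents picks, for every group-by, the parent with one extra dimension whose dense array is smallest, using the product of dimension sizes as the size estimate (a real system may use actual, possibly compressed, sizes). For steps 3 and 4, aggregate_along_last handles only the 2-D case (parent AB, child A): it reads parent chunks along B, keeps one child chunk in memory, and yields ("writes out") each chunk as soon as it is complete. Both function names are hypothetical.

```python
from itertools import combinations
from math import prod

def min_size_parents(dim_sizes):
    """Steps 1-2: for every group-by with fewer than n dimensions, choose the
    parent (one extra dimension) whose dense array is smallest."""
    n = len(dim_sizes)
    parents = {}
    for k in range(n):
        for child in combinations(range(n), k):
            candidates = [tuple(sorted(child + (d,))) for d in range(n) if d not in child]
            parents[child] = min(candidates, key=lambda g: prod(dim_sizes[i] for i in g))
    return parents

def aggregate_along_last(parent_chunks, n_chunks, chunk_shape):
    """Steps 3-4 for a 2-D parent AB and child A.

    parent_chunks[(ca, cb)] is a chunk_shape[0] x chunk_shape[1] list of rows.
    Parent chunks are read along B; each finished child chunk is yielded
    before the next one is started.
    """
    n_a, n_b = n_chunks
    for ca in range(n_a):
        child = [0] * chunk_shape[0]  # the single child chunk in memory
        for cb in range(n_b):
            for x, row in enumerate(parent_chunks[ca, cb]):
                child[x] += sum(row)
        yield ca, child

# Dimensions 0, 1, 2 = A, B, C with sizes 4, 8, 16.
print(min_size_parents((4, 8, 16))[(2,)])  # (0, 2): AC (64 cells) beats BC (128)

FULL = [[a * 4 + b for b in range(4)] for a in range(4)]
CHUNKS = {(ca, cb): [FULL[ca * 2 + x][cb * 2:cb * 2 + 2] for x in range(2)]
          for ca in range(2) for cb in range(2)}
print(list(aggregate_along_last(CHUNKS, (2, 2), (2, 2))))  # [(0, [6, 22]), (1, [38, 54])]
```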

Inferences
The Multi-Way Array Algorithm overlaps the computation of different group-bys while using minimal memory for each group-by. Because relational data can be converted to a chunked array, cubed, and converted back to a table, the algorithm is valuable in both ROLAP and MOLAP systems.
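A simplified sketch of the overlap, assuming a dense n x n x n array split into equal s x s x s chunks and scanned once in row-major chunk order (A outermost, C innermost). Every value is folded into AB, AC and BC in the same pass instead of rescanning ABC for each group-by. For this scan order the sketch only needs one AB chunk, one row of AC chunks and the whole BC plane resident at a time, and it reports that peak chunk count. It illustrates the idea only; the scan-order analysis and on-disk output of the actual algorithm are not reproduced here, and the function names are hypothetical.

```python
def zero_chunk(s):
    return [[0] * s for _ in range(s)]

def multiway(full, s):
    """One row-major scan over the s*s*s chunks of a dense n*n*n array `full`
    (dims A, B, C), folding every value into AB, AC and BC at the same time.

    Returns the three group-bys as {chunk index: s*s chunk} plus the peak
    number of 2-D chunks held in memory during the scan.
    """
    m = len(full) // s                 # chunks per dimension (assumes s divides n)
    ab, ac, bc = {}, {}, {}
    peak = 0
    for i in range(m):
        ac_row = {}                    # AC chunks (i, k): complete once i advances
        for j in range(m):
            ab_chunk = zero_chunk(s)   # AB chunk (i, j): complete once k finishes
            for k in range(m):
                if k not in ac_row:
                    ac_row[k] = zero_chunk(s)
                if (j, k) not in bc:   # BC chunks: complete only at the end
                    bc[j, k] = zero_chunk(s)
                ac_c, bc_c = ac_row[k], bc[j, k]
                for x in range(s):
                    for y in range(s):
                        for z in range(s):
                            v = full[i * s + x][j * s + y][k * s + z]
                            ab_chunk[x][y] += v
                            ac_c[x][z] += v
                            bc_c[y][z] += v
                peak = max(peak, 1 + len(ac_row) + len(bc))
            ab[i, j] = ab_chunk        # only one AB chunk is live at a time
        for k, chunk in ac_row.items():
            ac[i, k] = chunk           # this row of AC chunks is finished
    return ab, ac, bc, peak

N, S = 4, 2
FULL = [[[a + 2 * b + 3 * c for c in range(N)] for b in range(N)] for a in range(N)]
ab, ac, bc, peak = multiway(FULL, S)
print(peak)  # 7 = 1 AB chunk + 2 AC chunks + 4 BC chunks
```

Changing the scan order changes which group-by needs the most memory, which is why the dimension order matters for this kind of algorithm.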