Achieving Scalability in OLAP Materialized View Selection
Thomas P. Nadeau, Toby J. Teorey
University of Michigan
DOLAP 2002



2 Topics
Overview of OLAP
Exponentiality in View Selection
Our Polynomial Greedy Algorithm (PGA)
Test Results
Conclusions
Current Work

3 Example Star Schema
Fact table Sell: CustID, DateID, BindID, Cost
Dimension table Calendar: DateID, Month, Quarter, Year
Dimension table Customer: CustID, Name, City, State/Prov
Dimension table Bind Style: BindID, Desc

4 Star Schema Viewed with Data
Bind Style
BindID  Desc
PB      Paper Back
HC      Hard Cover
Calendar
DateID  Month  Quarter  Year
1/1/98  Jan
/2/98   Jan
/31/00  Dec    4        2000
Customer
CustID  Name         City       State/Prov
00001   U of M       Ann Arbor  MI
00002   Smith & Co.  Toronto    Ont
Sell (fact table, many rows)
CustID  DateID  BindID  Cost
        /31/00  PB      $500
        /1/99   HC      $1100

5 Eight Dimensions of Book Database
Attribute      Hierarchy Levels
Trim Width     4
Trim Length    4
Pages          4
Quantity       4
Stock Width    4
Stock Length   4
Bind Style     4
Press          4

6 Combinatorial Explosion
Possible views $= \prod_{i=1}^{d} \ell_i$, where d = |dimensions| and $\ell_i$ = |levels| in dimension i
Book database example
– 2 dimensions, $4^2$ = 16 views
– 4 dimensions, $4^4$ = 256 views
– 6 dimensions, $4^6$ = 4,096 views
– 8 dimensions, $4^8$ = 65,536 views
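The view count on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `possible_views` is made up here, and the level counts are the four levels per dimension given for the book database.

```python
from math import prod

def possible_views(levels_per_dimension):
    """Number of possible views: the product of the level counts of all dimensions."""
    return prod(levels_per_dimension)

# Book database: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(d, "dimensions:", possible_views([4] * d), "views")
```

Dimensions with different level counts work the same way, for example `possible_views([4, 3, 2])`.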

7 Recap
Materialized views speed up query responses
Disk space limits view materialization
The update window is a constraint
Solution: select strategic views

8 Our OLAP Optimization Approach
[Diagram. Components: View Size Estimation, View Selection, View Maintenance, Query Optimization. Other labels: Fact Table, Update, Users, Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Queries, Quick Responses. Legend: Completed Work, Current Work.]

9 View Selection: Example of Hypercube Lattice [HRU96]
p = Part, s = Supplier, c = Customer
{c, p, s} 6M
{p, s} 0.8M    {c, s} 6M    {c, p} 6M
{s} 0.01M    {p} 0.2M    {c} 0.1M
{} 1
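One way to hold this lattice in code is a mapping from each view (a set of dimension letters) to its row count. The sketch below is an assumption about representation, not something the slides prescribe; the names `SIZES`, `descendants`, and `children` are hypothetical.

```python
from itertools import combinations

M = 1_000_000

# Row counts copied from the lattice on this slide.
SIZES = {
    frozenset("cps"): 6 * M,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6 * M,
    frozenset("cp"): 6 * M,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}

def descendants(view):
    """Every view computable from `view`: all subsets of its dimensions, itself included."""
    dims = sorted(view)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]

def children(view):
    """Views one level down: drop exactly one dimension."""
    return [view - {dim} for dim in sorted(view)]

print(len(descendants(frozenset("ps"))))  # {p, s}, {p}, {s}, {}
```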

10 Example of HRU Algorithm [HRU96]
p = Part, s = Supplier, c = Customer
Lattice as on slide 9.
Iteration 1: Benefits of Possible Materialization Choices
{p, s}  5.2M x 4 = 20.8M
{c, s}  0 x 4 = 0
{c, p}  0 x 4 = 0
{s}     5.99M x 2 = 11.98M
{p}     5.8M x 2 = 11.6M
{c}     5.9M x 2 = 11.8M
{}      6M - 1

11 Example of HRU
p = Part, s = Supplier, c = Customer
Lattice as on slide 9.
Iteration 1: Benefits of Possible Materialization Choices
{p, s}  5.2M x 4 = 20.8M
{c, s}  0 x 4 = 0
{c, p}  0 x 4 = 0
{s}     5.99M x 2 = 11.98M
{p}     5.8M x 2 = 11.6M
{c}     5.9M x 2 = 11.8M
{}      6M - 1
{p, s} has the largest benefit, so it is materialized first.
Iteration 2: Benefits of Possible Materialization Choices
{c, s}  0 x 4 = 0
{c, p}  0 x 4 = 0
{s}     0.79M x 2 = 1.58M
{p}     0.6M x 2 = 1.2M
{c}     5.9M x 2 = 11.8M
{}      0.8M - 1
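The two iterations can be reproduced with a small greedy loop in the style of [HRU96]. This is a sketch under assumptions: it scores a view by the rows saved over all of its descendants, measured against each descendant's cheapest already-materialized ancestor, and the names `SIZES`, `descendants`, and `hru_select` are hypothetical. Because it charges {} against {p, s} once {p, s} is materialized, its second-round figure for {c} is 6.6M rather than the 11.8M shown on the slide; the chosen view is the same.

```python
from itertools import combinations

M = 1_000_000
SIZES = {
    frozenset("cps"): 6 * M, frozenset("ps"): 800_000,
    frozenset("cs"): 6 * M, frozenset("cp"): 6 * M,
    frozenset("s"): 10_000, frozenset("p"): 200_000,
    frozenset("c"): 100_000, frozenset(): 1,
}

def descendants(view):
    """All subsets of the view's dimensions, the view itself included."""
    dims = sorted(view)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]

def hru_select(sizes, top, k):
    """Greedily pick k views; the top view (fact table) is assumed materialized."""
    # cost[w]: rows read to answer w from its cheapest materialized ancestor.
    cost = {w: sizes[top] for w in sizes}
    selected = []
    for _ in range(k):
        def benefit(v):
            return sum(max(cost[w] - sizes[v], 0) for w in descendants(v))
        candidates = [v for v in sizes if v != top and v not in selected]
        best = max(candidates, key=benefit)
        selected.append(best)
        for w in descendants(best):
            cost[w] = min(cost[w], sizes[best])
    return selected

picked = hru_select(SIZES, frozenset("cps"), 2)
print(["".join(sorted(v)) for v in picked])
```

Every round scores every remaining view against all of its descendants.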

12 Exponentiality in HRU
$O(kn^2)$ time, where k = |views to select|, n = |possible views|
$n = 2^d$ in a non-hierarchical database, where d = |dimensions|
HRU algorithm is $O(k\,2^{2d})$ time
Two sources of exponentiality
– Each possible view is evaluated
– Each view evaluation considers the effect of materialization on every descendant
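The step from the bound in n to the bound in d is a single substitution. The hierarchical line below is an added illustration that uses the book database's view count from slide 6.

```latex
\begin{align*}
  n = 2^{d} \;&\Rightarrow\; k n^{2} = k \left(2^{d}\right)^{2} = k\,2^{2d} \\
  \text{Book database: } n = 4^{8} = 65{,}536 \;&\Rightarrow\; n^{2} = 4{,}294{,}967{,}296
\end{align*}
```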

13 Polynomial Greedy Algorithm (PGA)
[Flowchart with two phases, Nomination and Selection.
Steps: Select fact table; Nominate smallest child view; For each candidate, Evaluate benefit; Select view greedily; Start new path.
Guards: [continuing path] / [path ended]; [more candidates] / [else]; [termination condition met] / [else].]

14 Example of PGA [NT02]
p = Part, s = Supplier, c = Customer
{c, p, s} 6M
{p, s} 0.8M    {c, s} 6M    {c, p} 6M
{s} 0.01M    {p} 0.2M    {c} 0.1M
{} 1

15 Example of PGA
p = Part, s = Supplier, c = Customer
Lattice as on slide 14.
Nomination
Candidates: {p, s}, {s}, {}

16 Example of PGA
p = Part, s = Supplier, c = Customer
Lattice as on slide 14.
Nomination, then Selection
Iteration 1 candidates and benefits
{p, s}  5.2M x 4 = 20.8M
{s}     5.99M x 2 = 11.98M
{}      6M - 1

17 Example of PGA
p = Part, s = Supplier, c = Customer
Lattice as on slide 14.
Nomination, Selection, then Nomination
Iteration 1 candidates and benefits
{p, s}  5.2M x 4 = 20.8M
{s}     5.99M x 2 = 11.98M
{}      6M - 1
New candidates: {c, s}, {s}, {c}, {}

18 Example of PGA
p = Part, s = Supplier, c = Customer
Lattice as on slide 14.
Nomination, Selection, Nomination, then Selection
Iteration 1 candidates and benefits
{p, s}  5.2M x 4 = 20.8M
{s}     5.99M x 2 = 11.98M
{}      6M - 1
Iteration 2 candidates and benefits
{c, s}  0 x 2 = 0
{s}     0.79M x 2 = 1.58M
{c}     5.9M x 2 = 11.8M
{}      6M - 1
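The candidate lists in this example are consistent with the nomination rule sketched below, though the rule is inferred from the slides rather than taken from [NT02]. The assumptions are that each path starts at the fact table, that a path repeatedly nominates the smallest child that is neither already a candidate nor already selected, that ties go to the first child in alphabetical order of the dropped dimension, and that unselected candidates stay in the pool. Only nomination is shown; the selection step is taken from the slide instead of being computed. All names are hypothetical.

```python
M = 1_000_000
SIZES = {
    frozenset("cps"): 6 * M, frozenset("ps"): 800_000,
    frozenset("cs"): 6 * M, frozenset("cp"): 6 * M,
    frozenset("s"): 10_000, frozenset("p"): 200_000,
    frozenset("c"): 100_000, frozenset(): 1,
}

def children(view):
    """Views one level down: drop exactly one dimension."""
    return [view - {dim} for dim in sorted(view)]

def nominate_path(sizes, start, pool, selected):
    """Walk down from `start`, adding the smallest not-yet-seen child at each step."""
    current = start
    while True:
        fresh = [c for c in children(current) if c not in pool and c not in selected]
        if not fresh:
            return pool  # path ended
        current = min(fresh, key=lambda v: sizes[v])  # on ties, the first child wins
        pool.append(current)

top = frozenset("cps")
pool, selected = [], []

nominate_path(SIZES, top, pool, selected)
first_pool = list(pool)

# Selection result copied from the slide: {p, s} wins iteration 1.
pool.remove(frozenset("ps"))
selected.append(frozenset("ps"))

nominate_path(SIZES, top, pool, selected)
second_pool = list(pool)

print([set(v) or "{}" for v in first_pool])
print([set(v) or "{}" for v in second_pool])
```

Each step of a path looks at no more than d children and a path has at most d steps, which is where the polynomial bound on nomination comes from.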

19 Nomination Complexity
Maximum swatch width is d.
Maximum path length is d.
Finding one path is $O(d^2)$ time.
Our strategy nominates a path each time a view is selected, so the complexity is $O(d^2 k)$ time.

20 Evaluating Views in PGA
Polynomial time evaluation requires approximating materialization benefits
– Account for smallest ancestor
– Account for materialized view with largest overlap in descendants
Complexity of our algorithm is $O(d^2 k^2)$

21 Complexities
d = |dimensions|
g = geometric mean of the number of hierarchical levels per dimension
k = |views selected for materialization|
ℓ = |layers in lattice|
Database Type      HRU                      PGA
Non-Hierarchical   $O(k\,2^{2d})$ time      $O(d^2 k^2)$ time, $O(d^2 k)$ space
Hierarchical       $O(k\,g^{2d})$ time      $O(d k^2 \ell)$ time, $O(d k \ell)$ space
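To get a feel for the gap between the two non-hierarchical time bounds, the expressions can be evaluated directly. This is a rough sketch: big-O bounds drop constant factors, so the numbers are growth proxies rather than predicted running times, and k = 10 is an arbitrary value chosen for illustration.

```python
def hru_steps(d, k):
    """Growth proxy for the HRU bound k * 2^(2d), non-hierarchical case."""
    return k * 2 ** (2 * d)

def pga_steps(d, k):
    """Growth proxy for the PGA bound d^2 * k^2, non-hierarchical case."""
    return d ** 2 * k ** 2

K = 10  # assumed number of views to select
for d in (2, 4, 6, 8):
    print(f"d={d}: HRU ~ {hru_steps(d, K):,}  PGA ~ {pga_steps(d, K):,}")
```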

22 Near Optimal Selection (d = 2, ℓ = 4)
[Chart. Axes: Materialization Costs (rows), Query Costs (rows).]

23 Query Costs at Four Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (thousands of rows).]

24 Query Costs at Six Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (millions of rows).]

25 Query Costs at Eight Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (millions of rows).]

26 Performance at Four Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (seconds).]

27 Performance at Six Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (minutes).]

28 Performance at Eight Dimensions
[Chart comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (minutes).]

29 Conclusions
PGA finds a good set of views for materialization even when HRU fails because of its algorithmic complexity
PGA extends the usefulness of OLAP systems into higher dimensionality

30 Current Work
[Diagram, as on slide 8. Components: View Size Estimation, View Selection, View Maintenance, Query Optimization. Other labels: Fact Table, Update, Users, Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Queries, Quick Responses. Legend: Completed Work, Current Work.]

31 Current Work
Design alternative data structures for materialized views in OLAP
Test impact of new data structures on update and query costs
Integrate our work into an OLAP system

32 References
[HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp , Montreal, Canada.
[NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp 1 – 13, Toronto, Canada.
[NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version).