Materialized View Selection and Maintenance using Multi-Query Optimization. Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ramamritham.

Materialized Views
– Complex results materialized in order to speed up queries that depend on these results
– Increasingly supported by commercial database systems (e.g. Oracle8i)
– Crucial in data warehousing environments

Materialized View Maintenance
– As underlying data changes, the materialized views need to be refreshed
– Efficient view maintenance is crucial!
  – The need to provide up-to-date query responses is growing
  – The amount of data added to data warehouses is increasing
  – The maintenance time window is shrinking

Focus
Efficient techniques for maintenance of a set of materialized views (MVs) by
– Transient materialization of common subexpressions (CSEs)
– Selection of additional MVs
– Computation of the best maintenance policy and plan for each MV

Transient Materialization of Common Subexpressions
– CSEs materialized to reduce maintenance cost by sharing computation, disposed after use
– Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96]
  – Huge search space; considered impractical
– Earlier work by Sellis [TODS88]
– Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]

Selection of Additional MVs
– Additional views materialized permanently to reduce the overall maintenance cost
– Motivated by Ross et al. [SIGMOD96]
  – restricted to incremental maintenance only
  – do not consider transient materialization
– MV selection in general addressed in Roussopoulos [TODS82], Agrawal et al. [VLDB00]

Best Maintenance Policy and Plan Computation
For each MV,
– Determine the best maintenance policy (incremental or recomputation)
– Find the corresponding best plan
– Earlier work by Vista [EDBT98]
  – Does not take into account transient materialization of CSEs or presence of other MVs
– Current systems need manual specification of the maintenance policy
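The per-view policy decision on this slide can be pictured as a comparison of two cost estimates. The following is only a minimal sketch: the function name `choose_policy`, the `PolicyChoice` record and the cost numbers are hypothetical, and in the framework described here the estimates would come from the optimizer rather than being passed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyChoice:
    view: str
    policy: str  # "incremental" or "recomputation"
    cost: float


def choose_policy(view: str, incremental_cost: float,
                  recomputation_cost: float) -> PolicyChoice:
    """Pick the cheaper of the two maintenance policies for one view."""
    if incremental_cost <= recomputation_cost:
        return PolicyChoice(view, "incremental", incremental_cost)
    return PolicyChoice(view, "recomputation", recomputation_cost)


# Hypothetical estimates for the same view under two update volumes.
small_batch = choose_policy("ABC", incremental_cost=12.0, recomputation_cost=80.0)
large_batch = choose_policy("ABC", incremental_cost=140.0, recomputation_cost=80.0)
```

With these made-up numbers the same view is refreshed incrementally for the small batch and recomputed for the large one, which is the kind of decision the slide says current systems leave to manual specification.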

Contribution
A framework that consolidates the choice of
– CSEs to be transiently materialized
– Additional MVs
– Best maintenance plan (incremental/recomputation)
Integrated with a state-of-the-art query optimizer (Volcano [ICDE93])

Example
[Figure: example maintenance plan over nodes dA, B, C, D, dE, BC, DE, ABC, CDE and BCDE, with edges labelled merge, incremental refresh and recomputation, and nodes marked permanent, transient or initial set.]

Approach
1. Setting up the search space of maintenance plans
2. Best maintenance plan computation
3. Transient/Permanent materialized view selection

Setting Up the Maintenance Plan Space
– The Query DAG representation for recomputation plans
– Incorporating incremental plans

Representation of the Recomputation Plan Space
– AND/OR Query DAG: equivalence classes (OR nodes) and operations (AND nodes)
– Additionally incorporates subsumption derivations
– Details in Roy et al. [SIGMOD00]
[Figure: AND/OR Query DAG over nodes A, B, C, D, AB, BC, CD, ABC and BCD, with the best plan marked.]
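To make the AND/OR structure concrete, here is a minimal sketch of such a DAG with a naive best-plan cost computation. The class names, the `local_cost` and `scan_cost` fields and all cost values are assumptions made for illustration; they are not the Volcano data structures, and the recursion below charges a shared subexpression once per use rather than modelling sharing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperationNode:
    """AND node: one operator that needs all of its input equivalence classes."""
    op: str
    inputs: List["EquivalenceNode"]
    local_cost: float  # hypothetical cost of the operator itself


@dataclass
class EquivalenceNode:
    """OR node: alternative operations that all produce the same result."""
    name: str
    alternatives: List[OperationNode] = field(default_factory=list)
    scan_cost: float = 0.0  # used only for base relations (no alternatives)


def best_cost(node: EquivalenceNode) -> float:
    """Cheapest plan: minimum over alternatives (OR), sum over inputs (AND)."""
    if not node.alternatives:
        return node.scan_cost
    return min(
        op.local_cost + sum(best_cost(child) for child in op.inputs)
        for op in node.alternatives
    )


# Base relations with hypothetical scan costs.
a = EquivalenceNode("A", scan_cost=10.0)
b = EquivalenceNode("B", scan_cost=20.0)
c = EquivalenceNode("C", scan_cost=5.0)

# Two alternative ways of computing ABC share the same equivalence node.
ab = EquivalenceNode("AB", [OperationNode("join", [a, b], 8.0)])
bc = EquivalenceNode("BC", [OperationNode("join", [b, c], 3.0)])
abc = EquivalenceNode("ABC", [
    OperationNode("join", [ab, c], 6.0),  # (A join B) join C
    OperationNode("join", [a, bc], 4.0),  # A join (B join C)
])
```

Calling `best_cost(abc)` walks both alternatives under the OR node for ABC and keeps the cheaper one.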

Incremental Plans: Propagation Based Differential Generation
– Differentials propagated one at a time
– For each differential dR
  – Start at dR and compute node differentials bottom-up along the "best plan" in a topological order
  – Differential of a node computed as a function of its inputs and their differentials, e.g.
    $d(E_1 \bowtie E_2) = (E_1 \bowtie dE_2) \cup (E_2 \bowtie dE_1) \cup (dE_1 \bowtie dE_2)$
    where $dE_i$ = differential of $E_i$ wrt dR
  – Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above
– Ross et al. [SIGMOD96]
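The join rule on this slide can be checked on a toy example. The sketch below assumes insert-only differentials, set semantics, and an equi-join on the first field of each tuple; the helper names `join` and `join_differential` and the sample rows are hypothetical. It compares merging the differential into the old result against recomputing the join from the refreshed inputs.

```python
from typing import Set, Tuple

Row = Tuple[int, str]          # (join key, payload)
Joined = Tuple[int, str, str]  # (join key, left payload, right payload)


def join(left: Set[Row], right: Set[Row]) -> Set[Joined]:
    """Equi-join on the first field (nested loops, for clarity only)."""
    return {(k1, p1, p2) for (k1, p1) in left for (k2, p2) in right if k1 == k2}


def join_differential(e1: Set[Row], de1: Set[Row],
                      e2: Set[Row], de2: Set[Row]) -> Set[Joined]:
    """Differential of E1 join E2, with E1 and E2 in their old state."""
    return join(e1, de2) | join(e2_swap(de1), e2) if False else (
        join(e1, de2) | join(de1, e2) | join(de1, de2)
    )


def e2_swap(rows: Set[Row]) -> Set[Row]:
    """Identity helper kept only so the expression above type-checks."""
    return rows


e1 = {(1, "a1"), (2, "a2")}
e2 = {(1, "b1"), (3, "b3")}
de1 = {(3, "a3")}  # rows inserted into E1
de2 = {(2, "b2")}  # rows inserted into E2

old_result = join(e1, e2)
delta = join_differential(e1, de1, e2, de2)
refreshed = old_result | delta          # incremental refresh by merging
recomputed = join(e1 | de1, e2 | de2)   # recomputation from scratch
```

In this example `refreshed` and `recomputed` contain the same rows, which is the property the incremental plan relies on. Deletions and bag semantics would need a richer differential representation than the plain sets assumed here.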

Incorporating Incremental Plans: Propagation Based Differential Generation
[Figures: four copies of the AND/OR Query DAG (equivalence classes as OR nodes, operations as AND nodes, best plan marked), showing in turn the propagation of dA, dB, dC and dD.]

Incorporating Incremental Plans: Logical Representation
– For each equivalence node and each base differential affecting it
  – Introduce a new equivalence node representing its differential
  – Populate it with the differential plans
– Maintain statistics for the full expression after successive merges
– Large space overhead!
[Figure: logical representation for node AB over A, B, dA and dB, showing the recomputation plan, the incremental plan (B dA, A dB) and the Merge operator.]

Incorporating Incremental Plans: Actual Space-Efficient Representation
– Reuse the same structure for successive propagation cycles
  – separate best plan pointers for each cycle
  – separate statistics for the full expression after successive merges
– Also incorporates sort orders, indices, etc.; see Roy et al. [SIGMOD00]
[Figure: node AB over A, B, dA and dB with the differential plans B dA and A dB.]

Maintenance Plan Computation
Given
– Set of nodes M_t materialized transiently
  – can include full results as well as differentials
– Set of nodes M_p materialized permanently
  – includes full results but not differentials
compute the best consolidated maintenance plan for M_p

Maintenance Plan Computation
– Best plan computed using a query optimizer extended as follows:
  – A plan accessing a materialized view (transient or permanent) does not include its computation, only its use
– Cost of a maintenance plan:
  $\mathrm{totalcost}(M_p, M_t) = \sum_{e \in M_p} \mathrm{maintcost}(e \mid M_p, M_t) + \sum_{e \in M_t} \mathrm{trmatcost}(e \mid M_p, M_t)$
  where
  – $\mathrm{maintcost}(e \mid M_p, M_t)$: cost of the cheapest maintenance plan for e (recomputation/incremental)
  – $\mathrm{trmatcost}(e \mid M_p, M_t)$: cost of computing and materializing e
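As a rough illustration of the cost formula on this slide, the sketch below sums maintenance costs over the permanent set and transient materialization costs over the transient set. The two cost functions are toy stand-ins with made-up numbers; in the framework described here they would be supplied by the extended optimizer.

```python
from typing import Callable, FrozenSet

CostFn = Callable[[str, FrozenSet[str], FrozenSet[str]], float]


def total_cost(m_p: FrozenSet[str], m_t: FrozenSet[str],
               maintcost: CostFn, trmatcost: CostFn) -> float:
    """totalcost(M_p, M_t): maintenance of M_p plus materialization of M_t."""
    return (sum(maintcost(e, m_p, m_t) for e in m_p)
            + sum(trmatcost(e, m_p, m_t) for e in m_t))


def toy_maintcost(e: str, m_p: FrozenSet[str], m_t: FrozenSet[str]) -> float:
    """Hypothetical model: maintaining a view is cheaper once BC is available."""
    without_bc = {"ABC": 100.0, "BCD": 90.0}[e]
    with_bc = {"ABC": 40.0, "BCD": 35.0}[e]
    return with_bc if "BC" in (m_p | m_t) else without_bc


def toy_trmatcost(e: str, m_p: FrozenSet[str], m_t: FrozenSet[str]) -> float:
    """Hypothetical cost of computing and transiently materializing e."""
    return {"BC": 30.0}[e]


views = frozenset({"ABC", "BCD"})
cost_without_sharing = total_cost(views, frozenset(), toy_maintcost, toy_trmatcost)
cost_with_transient_bc = total_cost(views, frozenset({"BC"}),
                                    toy_maintcost, toy_trmatcost)
```

With these invented numbers, transiently materializing BC adds its own cost but lowers the maintenance cost of both views, so the total drops.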

Transient/Permanent Materialized View Selection
Given a set of MVs M already materialized, determine
– Set of nodes M_t to materialize transiently
– Set of nodes M_p (⊇ M) to materialize permanently
such that totalcost(M_p, M_t) is minimized
Exhaustive approach too expensive. Need heuristics!

Transient/Permanent Materialized View Selection: A Greedy Heuristic
Input: Initial MVs M
Output: M_p (⊇ M), M_t, and the corresponding best plan
Begin
    M_p = M; M_t = {}
    S = set of equivalence nodes in the DAG for M
    While (S ≠ {})
        Pick z ∈ S which maximizes Benefit(z | M_p, M_t)
        If (Benefit(z | M_p, M_t) ≤ 0) break
        If (z is a full result and maintcost(z | M_p, M_t) < trmatcost(z | M_p, M_t))
            M_p = M_p ∪ {z}
        else
            M_t = M_t ∪ {z}
        S = S – {z}
    Return (M_p, M_t)
End
How to compute Benefit(z | M_p, M_t)?
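The greedy loop on this slide translates fairly directly into code. The following sketch assumes the loop stops when the best remaining benefit is not positive, and it takes the benefit and cost functions as parameters; the toy functions, node names and numbers are all hypothetical and stand in for values the optimizer would compute.

```python
from typing import Callable, Iterable, Set, Tuple

NodeFn = Callable[[str, Set[str], Set[str]], float]


def greedy_select(initial_mvs: Iterable[str], candidates: Iterable[str],
                  benefit: NodeFn, maintcost: NodeFn, trmatcost: NodeFn,
                  is_full_result: Callable[[str], bool]
                  ) -> Tuple[Set[str], Set[str]]:
    """Greedily add nodes to M_p (permanent) or M_t (transient)."""
    m_p: Set[str] = set(initial_mvs)
    m_t: Set[str] = set()
    s: Set[str] = set(candidates)
    while s:
        # sorted() only makes tie-breaking deterministic.
        z = max(sorted(s), key=lambda n: benefit(n, m_p, m_t))
        if benefit(z, m_p, m_t) <= 0:
            break
        if is_full_result(z) and maintcost(z, m_p, m_t) < trmatcost(z, m_p, m_t):
            m_p.add(z)   # cheaper to keep and maintain permanently
        else:
            m_t.add(z)   # compute, use, and discard
        s.remove(z)
    return m_p, m_t


def toy_benefit(z: str, m_p: Set[str], m_t: Set[str]) -> float:
    """Hypothetical benefits: CD stops paying off once BC is materialized."""
    if z == "BC":
        return 50.0
    if z == "CD":
        return -5.0 if "BC" in (m_p | m_t) else 20.0
    return 10.0  # "delta_BC"


permanent, transient = greedy_select(
    initial_mvs={"ABC"},
    candidates={"BC", "CD", "delta_BC"},
    benefit=toy_benefit,
    maintcost=lambda z, m_p, m_t: 5.0,
    trmatcost=lambda z, m_p, m_t: 30.0,
    is_full_result=lambda z: not z.startswith("delta_"),
)
```

In this toy run BC is added permanently because maintaining it is cheaper than rematerializing it, the differential is added transiently, and CD is never chosen because its benefit is negative by the time it is the only candidate left.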

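Transient/Permanent Materialized View Selection: Benefit Computation
$\mathrm{Benefit}(z \mid M_p, M_t) = \mathrm{gain}(z \mid M_p, M_t) - \mathrm{investment}(z \mid M_p, M_t)$
where
$\mathrm{gain}(z \mid M_p, M_t) = \sum_{e \in M_p} \bigl(\mathrm{maintcost}(e \mid M_p, M_t) - \mathrm{maintcost}(e \mid M_p, M_t \cup \{z\})\bigr) + \sum_{e \in M_t} \bigl(\mathrm{trmatcost}(e \mid M_p, M_t) - \mathrm{trmatcost}(e \mid M_p, M_t \cup \{z\})\bigr)$
and
$\mathrm{investment}(z \mid M_p, M_t) = \begin{cases} \min\bigl(\mathrm{maintcost}(z \mid M_p, M_t),\ \mathrm{trmatcost}(z \mid M_p, M_t)\bigr) & \text{if } z \text{ is a full result} \\ \mathrm{trmatcost}(z \mid M_p, M_t) & \text{if } z \text{ is a differential} \end{cases}$
Benefit computation expensive. Need efficient techniques!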
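The benefit definition on this slide can be written out as three small functions. This is a sketch under assumed inputs: the cost functions are toy tables with invented numbers, and the node names are illustrative only.

```python
from typing import Callable, FrozenSet

CostFn = Callable[[str, FrozenSet[str], FrozenSet[str]], float]


def gain(z: str, m_p: FrozenSet[str], m_t: FrozenSet[str],
         maintcost: CostFn, trmatcost: CostFn) -> float:
    """Cost saved on the existing nodes if z is also made available."""
    m_t_with_z = m_t | {z}
    return (sum(maintcost(e, m_p, m_t) - maintcost(e, m_p, m_t_with_z)
                for e in m_p)
            + sum(trmatcost(e, m_p, m_t) - trmatcost(e, m_p, m_t_with_z)
                  for e in m_t))


def investment(z: str, m_p: FrozenSet[str], m_t: FrozenSet[str],
               maintcost: CostFn, trmatcost: CostFn,
               is_full_result: Callable[[str], bool]) -> float:
    """Cost of making z available: cheaper of the two options for full results."""
    if is_full_result(z):
        return min(maintcost(z, m_p, m_t), trmatcost(z, m_p, m_t))
    return trmatcost(z, m_p, m_t)


def benefit(z: str, m_p: FrozenSet[str], m_t: FrozenSet[str],
            maintcost: CostFn, trmatcost: CostFn,
            is_full_result: Callable[[str], bool]) -> float:
    return (gain(z, m_p, m_t, maintcost, trmatcost)
            - investment(z, m_p, m_t, maintcost, trmatcost, is_full_result))


def toy_maintcost(e: str, m_p: FrozenSet[str], m_t: FrozenSet[str]) -> float:
    """Hypothetical (cost without BC, cost with BC) per node."""
    table = {"ABC": (100.0, 40.0), "BCD": (90.0, 35.0), "BC": (45.0, 45.0)}
    without_bc, with_bc = table[e]
    return with_bc if "BC" in (m_p | m_t) else without_bc


def toy_trmatcost(e: str, m_p: FrozenSet[str], m_t: FrozenSet[str]) -> float:
    return {"BC": 30.0}[e]


benefit_of_bc = benefit(
    "BC", frozenset({"ABC", "BCD"}), frozenset(),
    toy_maintcost, toy_trmatcost,
    is_full_result=lambda z: not z.startswith("delta_"),
)
```

Each call evaluates every node in both sets twice, once with and once without z, which gives a feel for why the slide calls benefit computation expensive.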

Transient/Permanent Materialized View Selection: Improving Efficiency of the Greedy Heuristic
– Cost-propagation based incremental techniques to efficiently compute Benefit
– Monotonicity assumption
  – Reduces the number of Benefit computations
– Techniques to determine if a node can be shared across a given maintenance plan
  – Reduces the number of nodes considered for transient materialization
– Adapted from Roy et al. [SIGMOD00]. See paper for details.
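One common way to exploit a monotonicity assumption in a greedy loop is lazy re-evaluation: if a node's benefit never increases as more nodes are chosen, a previously computed benefit is an upper bound, so only the node at the top of a heap needs to be recomputed. The sketch below illustrates that general idea and is not taken from the slides; the function `lazy_greedy`, the simplified single-set benefit signature and the toy numbers are all assumptions.

```python
import heapq
from typing import Callable, FrozenSet, Iterable, List, Tuple


def lazy_greedy(candidates: Iterable[str],
                benefit: Callable[[str, FrozenSet[str]], float]
                ) -> Tuple[List[str], int]:
    """Greedy selection that recomputes a benefit only when its bound is stale.

    Assumes benefit(z, chosen) never increases as `chosen` grows.
    Returns the chosen nodes in order and the number of benefit evaluations.
    """
    chosen: List[str] = []
    evaluations = 0
    # Entries are (-upper_bound, node, number of nodes chosen when computed).
    heap: List[Tuple[float, str, int]] = []
    for z in sorted(candidates):
        heap.append((-benefit(z, frozenset()), z, 0))
        evaluations += 1
    heapq.heapify(heap)
    while heap:
        neg_bound, z, computed_at = heapq.heappop(heap)
        if computed_at != len(chosen):
            # Stale bound: refresh it and let the heap reorder.
            fresh = benefit(z, frozenset(chosen))
            evaluations += 1
            heapq.heappush(heap, (-fresh, z, len(chosen)))
            continue
        if -neg_bound <= 0:
            break  # best remaining benefit is not positive
        chosen.append(z)
    return chosen, evaluations


def toy_benefit(z: str, chosen: FrozenSet[str]) -> float:
    """Hypothetical monotone benefits: y is worth less once x is chosen."""
    if z == "y":
        return 5.0 if "x" in chosen else 40.0
    return {"x": 50.0, "w": 10.0, "v": 8.0, "u": -1.0}[z]


picked, benefit_calls = lazy_greedy({"x", "y", "w", "v", "u"}, toy_benefit)
```

On this toy input a plain greedy loop that re-evaluates every remaining candidate in each round would make 15 benefit calls (5 + 4 + 3 + 2 + 1), while the lazy version makes fewer. The result matches plain greedy only when the monotonicity assumption actually holds.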

Benchmark
– Single Views
  – Same views as above, refreshed separately
– Set of Views
  – 10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together

Effect of Transient and Permanent Materialization
[Figure: two panels, Single Views and Set of Views.]

Effect of Adaptive Maintenance Policy Selection
[Figure: two panels, Single Views and Set of Views.]

Scalability Analysis
[Figure: two panels, Optimization Memory Requirements and Optimization Time.]
Negligible one-time costs

Conclusion
Presented techniques
– Automate sharing of computation
– Automate view selection
– Automate maintenance policy selection and plan computation
– Do the above in an integrated manner
  – leading to benefits greater than could be achieved by considering each dimension individually
– Are efficient and scalable
  – the overall benefits greatly outweigh the one-time cost
– Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)

Future Work
Extend presented techniques
– To handle limited space
– To speed up a workload of queries in addition to maintenance of a set of materialized views
– To work in dynamic query result caching environments

Questions