Materialized View Selection for XQuery Workloads Asterios Katsifodimos 1, Ioana Manolescu 1 & Vasilis Vassalos 2 1 Inria Saclay & Université Paris-Sud,

Presentation transcript:

Materialized View Selection for XQuery Workloads. Asterios Katsifodimos (1), Ioana Manolescu (1) & Vasilis Vassalos (2). (1) Inria Saclay & Université Paris-Sud, (2) Athens University of Economics and Business.

View selection in XML databases. Problem definition: find a set of materialized views that minimizes the workload evaluation cost without exceeding a space budget.

Contributions. View selection for multiple-view XQuery rewriting, over a rich subset of XQuery: tree patterns with multiple return nodes and value joins. We provide: candidate view pruning methods; two view selection algorithms, Utility-Driven Greedy (UDG) and the Reduce-Optimize Algorithm (ROA); and an extensive experimental evaluation, outperforming and extending state-of-the-art works.

Outline: The View Selection Problem; View Language & Candidate Views; View Selection Algorithms; Related Work & Experimentation.

Query and view language: anatomy of a query. Example: return the ID of every book along with its text and author, if the book author has a paper in the SIGMOD conference. [Figure: two tree patterns connected by a value join on author. First pattern: book (ID) with children text (cont) and author (val). Second pattern: paper with children author and conference [=“SIGMOD”]. Annotations: ID = ID of book; cont = subtree of the text element.]
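The query anatomy above can be sketched as a small data structure. This is only an illustration: the class names (`Node`, `JoinedPattern`), the field names, and the choice of `//` for the pattern roots are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One node of a tree pattern (hypothetical representation)."""
    label: str
    axis: str = "/"                    # edge from the parent: "/" or "//"
    returns: Tuple[str, ...] = ()      # stored attributes among ID, val, cont
    predicate: Optional[str] = None    # value the node must equal, if any
    children: List["Node"] = field(default_factory=list)


@dataclass
class JoinedPattern:
    """Tree patterns connected by value joins between pairs of nodes."""
    patterns: List[Node]
    value_joins: List[Tuple[Node, Node]]


def size(node: Node) -> int:
    """Number of nodes in the pattern rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


# The example query: books with their text and author, where the author
# also appears in a paper whose conference is "SIGMOD".
book_author = Node("author", returns=("val",))
book = Node("book", axis="//", returns=("ID",),
            children=[Node("text", returns=("cont",)), book_author])
paper_author = Node("author")
paper = Node("paper", axis="//",
             children=[paper_author, Node("conference", predicate="SIGMOD")])
query = JoinedPattern(patterns=[book, paper],
                      value_joins=[(book_author, paper_author)])
```

Keeping the value join as a pair of node references, rather than folding it into either tree, mirrors the slide's picture of two separate patterns linked by a join.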

Candidate Views. Candidate views: views that can participate in a rewriting of a query. Property: candidate views are exactly those embeddable in a query. Example. Query: book with children author (val) and text (cont). Candidate views: v1 = author (ID, val); v2 = book (ID) with child text (ID, cont). Rewriting: PROJECT [text cont, author val] over JOIN [v1.author ID > v2.book ID] over SCAN(v1) and SCAN(v2).
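A minimal sketch of how such a rewriting could be evaluated over materialized views. It assumes that the join predicate is a structural (parent-child) comparison of node IDs, modeled here with hypothetical Dewey-style tuple IDs; the view contents, field names, and helper functions are all invented for illustration.

```python
# Hypothetical view extents. IDs are Dewey-style tuples (an assumption):
# a node is the parent of another if its ID equals the other's ID
# without the last step.
v1 = [  # author (ID, val)
    {"author_id": (1, 1, 2), "author_val": "Alice"},
    {"author_id": (1, 2, 2), "author_val": "Bob"},
]
v2 = [  # book (ID) with child text (ID, cont)
    {"book_id": (1, 1), "text_id": (1, 1, 1), "text_cont": "<text>First</text>"},
    {"book_id": (1, 2), "text_id": (1, 2, 1), "text_cont": "<text>Second</text>"},
]


def is_parent(parent_id, child_id):
    """Structural test on Dewey-style IDs."""
    return child_id[:-1] == parent_id


def rewrite(authors, books):
    """PROJECT [text cont, author val] over a structural JOIN of two SCANs."""
    joined = (
        {**a, **b}
        for a in authors           # SCAN(v1)
        for b in books             # SCAN(v2)
        if is_parent(b["book_id"], a["author_id"])
    )
    return [(t["text_cont"], t["author_val"]) for t in joined]


result = rewrite(v1, v2)
```

The nested-loop join is chosen only for brevity; a real engine would use a structural join operator.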

Candidate Views: number of candidate views. For a query of m value joins and k tree patterns: [formula omitted in the transcript]. Early pruning is needed. Rules of thumb for pruning: drop all views that can be replaced by others; views should not store anything extraneous. Challenge: remove the maximum number of views while preserving low-cost and/or small-size rewriting possibilities.

Candidate Views: pruning techniques. ① Annotate all nodes with ID: this maximizes rewriting opportunities. ② Do not store unnecessary data, i.e., useless cont, val or //-axis: this avoids expensive rewritings and saves space. [Figure: the query (book with children author val and text cont) and its candidate views before and after pruning. v1: book, author ID; v1': book ID, author ID. v2: author ID, cont; v2': author ID, val. v3: book ID, text ID, cont; v3': book ID, text ID, cont.]


Materializing a set of views: view set benefit. Benefit of materializing a set of views V for workload Q over documents D: benefit(V, Q) = (cost of evaluating Q over D) - (cost of evaluating Q over V). Computing the benefit requires invoking the rewriting algorithm, which is expensive. Space occupancy of a view set V: its total size (in bytes).

View Selection Algorithms: knapsack-inspired view selection. The problem is highly similar to the classic 0-1 knapsack problem:

Knapsack    View Selection
Weight      View size
Profit      Benefit (evaluation cost savings)

Typical element of the greedy algorithms for knapsack: utility(v, Q) = benefit({v} U V, Q) / size(v).

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm. U = utility (= benefit/size); S = space occupancy.
1. Enumerate candidate views
2. Compute view utilities
3. Order views by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
[Figure: space budget S=12; candidate views with (U=10, S=7), (U=60, S=5), (U=50, S=4), (U=8, S=2).]

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm. U = utility (= benefit/size); S = space occupancy.
1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
[Figure: space budget S=12; candidate views with (U=12, S=7), (U=40, S=5), (U=64, S=4), (U=9, S=2).]

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm. U = utility (= benefit/size); S = space occupancy.
1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
[Figure: space budget S=12; candidate views with (U=13, S=7), (U=10, S=5), (U=64, S=4), (U=4, S=2).]
Greedy algorithms for knapsack are not a perfect fit for our problem: the utility of a view may change after every round, because it depends on the other views already selected.
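The five UDG steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `benefit` is a hypothetical callable standing in for the (expensive) rewriting-based benefit computation, the toy benefit values are invented, and only the view sizes and the budget of 12 echo the figure. Utility follows the slide's definition, utility(v, Q) = benefit({v} U V, Q) / size(v), recomputed in every round.

```python
def udg(candidates, sizes, benefit, budget):
    """Utility-driven greedy selection (sketch).

    candidates: list of view names (step 1, already enumerated)
    sizes:      dict mapping view name to space occupancy
    benefit:    hypothetical callable, frozenset of views -> benefit
    budget:     total space budget
    """
    selected, used = [], 0
    remaining = list(candidates)
    while True:
        fitting = [v for v in remaining if used + sizes[v] <= budget]
        if not fitting:
            break  # step 5: nothing else fits in the budget

        def utility(v):
            # Step 2: utility depends on the views already selected.
            return benefit(frozenset(selected + [v])) / sizes[v]

        best = max(fitting, key=utility)   # steps 3-4
        selected.append(best)
        used += sizes[best]
        remaining.remove(best)
    return selected


# Toy instance (invented numbers): views b and c overlap, so materializing
# both yields less than the sum of their individual benefits.
SIZES = {"a": 7, "b": 5, "c": 4, "d": 2}
BASE = {"a": 70, "b": 300, "c": 200, "d": 16}


def toy_benefit(views):
    total = sum(BASE[v] for v in views)
    if "b" in views and "c" in views:
        total -= 150
    return total


chosen = udg(["a", "b", "c", "d"], SIZES, toy_benefit, budget=12)
```

Because utilities are recomputed after each selection, every round calls `benefit` once per remaining candidate, which is where the cost noted on the slide comes from.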

View Selection Algorithms: state space search (state = candidate view set). Initial state: the query workload. Best state: the one with the largest benefit under the space budget. Transformations lead from one state to another, e.g., transform(S1) → S8. [Figure: search graph over states S1 to S16.]

View Selection Algorithms: state transformations (Break, Join, Generalize, Adapt). Break: break a view into smaller parts. Reveals common sub-expressions of views. Can reduce or increase space occupancy. Increases query evaluation costs. [Figure: example patterns book (text cont, author val) and paper (author, conference [=“SIGMOD”]).]

View Selection Algorithms: state transformations (Break, Join, Generalize, Adapt). Join: the opposite of Break; join two views into one. Reduces evaluation costs. Joined views can be smaller in size. [Figure: example patterns book (text cont, author val) and paper (author, conference [=“SIGMOD”]), with ID and val,ID annotations.]

View Selection Algorithms: state transformations (Break, Join, Generalize, Adapt). Generalize: generalization/relaxation of a view. Reveals common sub-expressions of views. Can reduce or increase space occupancy. Increases query evaluation costs. [Figure: example patterns book (text cont, author val) and paper (author, conference val [=“SIGMOD”]), with val and cont annotations.]

View Selection Algorithms: state transformations (Break, Join, Generalize, Adapt). Adapt: specialization of views by (1) conversion of //-axis to /-axis, (2) addition of existential nodes. Reduces evaluation costs. “Adapted” views can be smaller in size. [Figure: example patterns book (text, author) and paper (author, conference val [=“SIGMOD”]), with a cont annotation.] Together, Break, Join, Generalize and Adapt allow generating all states, and are guaranteed not to generate pruned views.

View Selection Algorithms: the Reduce-Optimize algorithm (ROA). The number of states is huge, and the rewriting algorithm is called after every state transition, hence the need for heuristics. Proposal: ROA, a heuristic three-phase algorithm with phases Reduce, Optimize and Jump.

View Selection Algorithms: the Reduce-Optimize algorithm (ROA). [Figure: space occupancy and benefit over time, relative to the space budget, across successive phases Reduce, Optimize, Jump, Reduce, Optimize, Reduce, ... Legend: solution, best state, revisited state, intermediary state.]

View Selection Algorithms: reducing ROA search time (heuristics).
1. Some transitions may apply several transformations at once
2. Stop the rewriting algorithm early: after k rewritings are found, or at a timeout
3. Consider only the lowest-cost rewritings


Related Work

Algorithm                   Rewriting power
[Mandhani, Suciu VLDB05]    1-view rewritings
[Tang et al. DASFAA09]      1-view rewritings
Utility-Driven Greedy       Multiple view rewritings
Reduce-Optimize             Multiple view rewritings

Experimental Evaluation: settings. Query workloads: tree patterns: Q1 (14 queries), Q2 (50), Q3 (100); tree patterns + joins: Q4 (50 queries, 20% joins). Query selectivity: ⅓ low, ⅓ medium, ⅓ high. Database: 1GB XMark (10 x 100MB documents). Space budget: S = size(Q); tested space budgets: S, S/2, S/4, S/6. Algorithms: UDG and ROA. Competitors: [Mandhani & Suciu VLDB05], [Tang et al. DASFAA09]. Implementation: ViP2P, Java.

Experimental Evaluation: workload evaluation time of Q1 (14 queries). [Figure: hit ratio, and evaluation time versus docs, for Reduce-Optimize (ROA), Utility-Driven Greedy (UDG), Space/Time Greedy [Tang et al. DASFAA09], Space Optimal [Tang et al. DASFAA09] and Set-Cover Greedy [Mandhani & Suciu VLDB05].]

Experimental Evaluation: evaluation time and hit ratio for Q3 (100 queries). [Figure: hit ratio, and evaluation time versus docs, for Reduce-Optimize (ROA) and Set-Cover Greedy [Mandhani & Suciu VLDB05].]

Experimental Evaluation: ROA evaluation for Q4 (50 queries, 20% value-joined). [Figure: hit ratio, and % of evaluation time vs. documents.]

Conclusions. Automatic selection of XQuery views for multiple-view rewritings. Reduction of candidate views by orders of magnitude. ROA performs better than related work: it scales and finds good solutions relatively fast, with 80% of the benefits attained in ~2 minutes and the maximum benefit attained within 25 minutes. The algorithms of [Tang et al. DASFAA09] did not scale beyond 14 queries. Utility-Driven Greedy (UDG) did not scale beyond 50 queries.

Thank you. Questions?

Backup slides

Cost of algebraic plans: estimating the evaluation cost of a rewriting. Algebraic plan cost: the execution cost of an operator has a CPU cost and an I/O cost, both of which depend on the input. The evaluation cost of a plan is calculated bottom-up. Data statistics: a DataGuide of every document, enriched with the number of instances of each path, the average val size (bytes) and average cont size (bytes) of each path, and the distinct values of each path. These statistics are used to estimate the cardinality and size of a view.

Cost of algebraic plans: cost estimation example.

View    Size     Cardinality
v1      500KB    50
v2      100KB    10

Plan: PROJECT [text cont, author val] over JOIN [v1.author=v2.author] over SCAN(v1) and SELECT [conference=“SIGMOD”] over SCAN(v2).

Operator    IO     CPU        OUTPUT
SCAN(v1)    500    50         50
SCAN(v2)    100    10         10
SELECT      100    10+10      5
JOIN               70+50*5    25 (50*5*0.1)
PROJECT     600               25
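The bottom-up computation can be sketched in a few lines. The cost model below is an assumption chosen to reproduce the figures that survive in the example (scan I/O equal to the view size in KB, one CPU unit per tuple processed, a CPU cost of left x right tuples for the join, selectivities 0.5 for the selection and 0.1 for the join); the cost of PROJECT in particular is a guess, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Est:
    """Cumulative estimate for a (sub)plan."""
    io: int    # cumulative I/O cost
    cpu: int   # cumulative CPU cost
    out: int   # estimated output cardinality


def scan(size_kb, cardinality):
    # Assumed: I/O equals the view size in KB, CPU is one unit per tuple.
    return Est(io=size_kb, cpu=cardinality, out=cardinality)


def select(child, selectivity):
    # Assumed: one CPU unit per input tuple.
    return Est(child.io, child.cpu + child.out, round(child.out * selectivity))


def join(left, right, selectivity):
    # Assumed: CPU proportional to the number of tuple pairs compared.
    pairs = left.out * right.out
    return Est(left.io + right.io,
               left.cpu + right.cpu + pairs,
               round(pairs * selectivity))


def project(child):
    # Assumed (not recoverable from the slide): one CPU unit per tuple.
    return Est(child.io, child.cpu + child.out, child.out)


# Bottom-up evaluation of the example plan.
s1 = scan(500, 50)           # SCAN(v1)
s2 = scan(100, 10)           # SCAN(v2)
sel = select(s2, 0.5)        # SELECT [conference="SIGMOD"]
j = join(s1, sel, 0.1)       # JOIN [v1.author=v2.author]
plan = project(j)            # PROJECT [text cont, author val]
```

Each operator only needs its children's estimates, so the whole plan is costed in a single pass from the leaves to the root.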

Experimental Evaluation: ROA time to attain increasing benefits (minutes). [Figure]

Experimental Evaluation: candidate views pruning. [Figure legend: CS0 max: maximum estimated number of candidate views; CS0 min: minimum estimated number of candidate views; CS1: pruned candidate view set; CS2: pruned candidate view set, only linear path candidates.]

Candidate Views: size of the set of candidate views for a tree pattern. The cardinality of the set of candidate views of a tree pattern query q of |q| nodes is bounded by: [formula omitted in the transcript]. The bound accounts for: combinations of nodes of q, e.g., ({a},{b},{c},{a,b},{a,c},{a,b,c}); edge combinations, i.e., how to connect nodes with (/, //), e.g., /a/b, //a/b, /a//b, //a//b; and return node variations, of which there are 12 for each node in a pattern, e.g., (a ID,cont, a val, a ID,val, ...). Example: q = /a/b val /c.
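The edge-combination factor is easy to check by enumeration. The sketch below covers only that factor, for a linear path, where each step independently uses / or //; it does not reproduce the full bound, and the function name is hypothetical.

```python
from itertools import product


def axis_variants(labels):
    """All ways to connect the steps of a linear path with / or //."""
    return [
        "".join(axis + label for axis, label in zip(axes, labels))
        for axes in product(("/", "//"), repeat=len(labels))
    ]


variants = axis_variants(["a", "b"])
```

For the two-step path this yields the four variants listed on the slide, and in general a path of n steps has 2^n axis variants, which is one reason the candidate set grows so quickly.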

Candidate Views: size of the set of candidate views for a joined pattern. Given a joined pattern q with k tree patterns and m value joins, the candidate view set size of q is bounded by: [formula omitted in the transcript]. The bound combines the value join combinations with the number of views resulting from all possible cartesian products of the k tree patterns.

View Selection Algorithms: benefit of materializing a set of views. The benefit of materializing a view set V is the difference between the cost of evaluating the workload from the documents and the cost of evaluating it over V. [Formula omitted in the transcript; its terms are: the frequency of query q, the cost of evaluating query q from the documents, and the cost of evaluating query q given the set of materialized views V.]
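A plausible reading of this definition, consistent with the earlier benefit(V, Q) = (cost of evaluating Q over D) - (cost of evaluating Q over V), is a frequency-weighted sum over the workload. The symbols freq and cost below are assumed names, not necessarily the authors' notation:

```latex
\mathrm{benefit}(V, Q) \;=\; \sum_{q \in Q} \mathrm{freq}(q)\,\bigl(\mathrm{cost}(q, D) - \mathrm{cost}(q, V)\bigr)
```

Here cost(q, D) is the cost of evaluating q from the documents D and cost(q, V) is the cost of evaluating q given the materialized views V, so the benefit is positive whenever the views make the weighted workload cheaper.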

Tree pattern query of |q| nodes. Joined pattern query with m value joins and k tree patterns.