Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA.

Presentation transcript:

Schema-Based Query Optimization for XQuery over XML Streams. Hong Su, Elke A. Rundensteiner, Murali Mani. Worcester Polytechnic Institute, Massachusetts, USA. VLDB 2005.

Schema-Based Query Optimization (SQO). Schema knowledge can be used to optimize queries. SQO is well studied in deductive and relational databases: join elimination, predicate elimination, detection of empty answer sets, … It is equally applicable to XML for flat value filtering.

SQO for XML Pattern Retrieval. General XML SQO: applicable to both static and streaming XML; e.g., query tree minimization [Amer-Yahia+02]. Static-XML-specific SQO: focuses on expediting random access of data; e.g., query rewrite using “extents” (indices built on element types) [Fernandez+98], … Stream-specific XML SQO: focuses on expediting token-by-token sequential access of data.

Stream Specific SQO Example: /seller[shipTo]. Without schema: buffer the seller element and retrieve /shipTo. With schema: buffer the seller element, retrieve /shipTo and also retrieve /sameAddr; when the latter is retrieved, skip the remaining computation.

Related Work. YFilter [Diao02] and XSM [Ludäscher 03]: use the schema to decide whether pattern results are recursive, or to decide the types of child elements; essentially propose general XML SQO. FluXQuery [Koch+04]: uses the schema to minimize buffer size; complementary to our focus (we aim to skip unnecessary computations). SIX [Gupta+03]: uses indices interleaved with XML data to reduce parsing; could be combined with our techniques.

Challenge: Constraint Useful? Query /seller/shipTo: the plan retrieves /shipTo and /sameAddr; when /sameAddr is retrieved there is nothing to save, because /shipTo is the only pattern retrieval. Query /seller[shipTo]/billTo: the plan retrieves /shipTo, /sameAddr and /billTo; when /sameAddr is retrieved there is nothing to save, because /billTo has already been retrieved.

Challenge: Benefits/Overhead? Maximal benefits: no beneficial optimization should be missed; any failed pattern should be detected as early as possible. Minimal overhead: no redundant optimization should be introduced; whether a particular pattern fails should not be checked repeatedly.

Challenge: Plan Execution. The optimization happens at a lower level than query rewrite, so specific physical implementations are needed. Example: for /seller[shipTo], the plan buffers the seller element, retrieves /shipTo and /sameAddr, and acts when /sameAddr is retrieved. No query can capture this optimization.
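The early-filtering idea for /seller[shipTo] can be sketched in Python. This is only an illustration under stated assumptions, not the authors' engine: the (kind, name) token format and the function name filter_sellers are hypothetical, and the sketch assumes a schema order constraint under which shipTo can only occur before sameAddr inside a seller.

```python
def filter_sellers(tokens, ending_mark=None):
    """Evaluate /seller[shipTo] over a stream of (kind, name) tokens.

    Returns (matched_sellers, tokens_buffered). If ending_mark is given
    (assumption: the schema says shipTo cannot occur after it), buffering
    for the current seller stops once the mark arrives without a shipTo.
    """
    matched = 0
    buffered = 0
    in_seller = False
    for kind, name in tokens:
        if not in_seller:
            if kind == "start" and name == "seller":
                in_seller, skipping, seen_ship_to, depth = True, False, False, 0
                buffered += 1
            continue
        if kind == "start":
            depth += 1
            if depth == 1 and name == "shipTo":
                seen_ship_to = True
            elif depth == 1 and name == ending_mark and not seen_ship_to:
                skipping = True  # shipTo can no longer appear: stop buffering
        elif depth == 0:
            in_seller = False  # end tag of the seller element itself
            matched += seen_ship_to
        else:
            depth -= 1
        if not skipping:
            buffered += 1
    return matched, buffered


def element(name, *children):
    """Hypothetical helper: flatten an element into start/end tokens."""
    inner = [tok for child in children for tok in child]
    return [("start", name)] + inner + [("end", name)]


stream = (
    element("seller", element("sameAddr"), element("primary"))
    + element("seller", element("shipTo"), element("sameAddr"))
)
without_schema = filter_sellers(stream)
with_schema = filter_sellers(stream, ending_mark="sameAddr")
```

Both runs report the same single matching seller; the run with the ending mark buffers fewer tokens because the first seller is abandoned as soon as sameAddr arrives.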

Outline: SQO Technique Design; SQO Application; Execution of Optimized Plan; Experiments.

Physical Implementation of Pattern Retrieval. Note: it is important to understand the physical stream engine implementation in order to design effective SQO. Our implementation: the widely used automata implementation [e.g., Tukwila, YFilter].

Example Query and its Automata. Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone=“ ” return for $c in $a/item where $c//keyword=“auto” return $b/*/phone. [Figure: automaton with transitions labelled auctions, auction, seller, shipTo, primary, secondary, phone, * and λ; as input tokens arrive, the stack holds sets of active states such as [0], [1], [2,3], [11] and [12#], where # is the buffering flag.]

Example Query and its Automata (same query, automaton and stack figure as on the previous slide; # is the buffering flag). Optimization opportunities: 1. Avoid transitions as much as possible. 2. Revoke the buffering flag as soon as possible.
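A stack-based automaton of this kind can be sketched as follows. This is a simplified illustration, assuming a transition table keyed by (state, label) and my own state numbering (not the numbering in the slide figure); it omits λ transitions and buffering flags. Each start tag pushes the set of states reached from the current top of the stack, and each end tag pops it.

```python
def run_automaton(transitions, final_states, tokens):
    """Run a stack-based path automaton over (kind, name) tokens.

    transitions maps (state, label) -> next state; the label "*" matches
    any element name. Returns (hits, transitions_fired).
    """
    stack = [{0}]  # stack of active-state sets; state 0 is the start state
    hits = 0
    fired = 0
    for kind, name in tokens:
        if kind == "start":
            reached = set()
            for state in stack[-1]:
                for label in (name, "*"):
                    target = transitions.get((state, label))
                    if target is not None:
                        reached.add(target)
                        fired += 1
            stack.append(reached)
            hits += len(reached & final_states)
        else:
            stack.pop()  # end tag: return to the parent's active states
    return hits, fired


# Hypothetical automaton for /auctions/auction/seller/*/phone.
TRANSITIONS = {
    (0, "auctions"): 1,
    (1, "auction"): 2,
    (2, "seller"): 3,
    (3, "*"): 4,
    (4, "phone"): 5,
}
names = ["auctions", "auction", "seller", "primary", "phone"]
tokens = [("start", n) for n in names]
tokens += [("end", "phone"), ("end", "primary")]
tokens += [("start", "secondary"), ("start", "phone")]
tokens += [("end", n) for n in ["phone", "secondary", "seller", "auction", "auctions"]]
hits, fired = run_automaton(TRANSITIONS, {5}, tokens)
```

The fired counter makes the first optimization opportunity concrete: every transition that can be avoided inside a context element that is known to fail is work saved.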

Is Constraint Useful for Opt.? Constraints are used to find “ending marks” of a pattern within a context element. For example, sameAddr is the ending mark of /shipTo within the seller element context.

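Is Constraint Useful for Opt.? An ending mark is helpful if the context element can be filtered out earlier, which first requires that the pattern may fail to appear. Example query: for $a in /auctions/auction, $b in $a/seller … Combined with a schema in which the pattern cannot fail to appear, the ending mark for $a/seller is not helpful; combined with a schema in which it may fail to appear, the ending mark for $a/seller is helpful.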

Is Constraint Useful for Opt.? An ending mark is helpful if the context element can be filtered out earlier: the pattern may fail to appear, and the pattern is required. Schema: <!element item (category?, desc, …)>. With the query for $c in $a/item return $a/category, the ending mark for $a/category is not helpful. With the query for $c in $a/item[category] return $a/category, the ending mark for $a/category is helpful.

Is Constraint Useful for Opt.? An ending mark is helpful if the context element can be filtered out earlier (the pattern may fail to appear; the pattern is required) and the early filtering can be beneficial (transitions may happen after ending marks; buffering flags may be raised before ending marks).
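The usefulness criteria can be written down as a small predicate. This is a sketch only: the PatternInfo record and its field names are hypothetical, and treating the two benefit conditions as alternatives (either one is enough) is my assumption about how the slide's two bullets combine.

```python
from dataclasses import dataclass


@dataclass
class PatternInfo:
    may_fail: bool           # schema allows the pattern to be absent
    required: bool           # query needs the pattern to keep the context element
    transitions_after: bool  # transitions may happen after the ending mark
    buffering_before: bool   # a buffering flag may be raised before the mark


def ending_mark_is_helpful(p: PatternInfo) -> bool:
    """Apply the two-part criterion to one pattern's ending mark."""
    can_filter_early = p.may_fail and p.required
    # Assumption: either kind of saving makes the early filtering worthwhile.
    filtering_pays_off = p.transitions_after or p.buffering_before
    return can_filter_early and filtering_pays_off


# category is optional in the schema but only returned, not required.
returned_only = PatternInfo(may_fail=True, required=False,
                            transitions_after=True, buffering_before=True)
# category is optional in the schema and required by the [category] filter.
filtered_on = PatternInfo(may_fail=True, required=True,
                          transitions_after=True, buffering_before=False)
```

Under these assumptions the first pattern's ending mark is rejected and the second is kept, matching the two item/category examples.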

SQO Design. Helpful ending marks are identified by our SQO. Three SQO rules are designed, using occurrence constraints, exclusive constraints, and order constraints.

Example SQO Rule. The rule uses an occurrence constraint and outputs an event-condition-action. Query: for $a in /auctions/auction, $b in $a/seller where $b/*/phone = “ ” … Event: second is encountered in a seller. Condition: $b/*/phone = “ ” not satisfied yet. Action: skip the remaining computations within the current seller element.

Properties of SQO Application: maximal benefits and minimal overhead.

Maximal Benefit. “Rule independence” is defined and a proof of “maximal benefits” is given: if the rules are all independent, then as long as each rule is applied to each pattern, maximal benefits are ensured.

Minimal Overhead: Redundancy. Same-pattern redundancy: multiple ending marks are adopted for the same pattern. Query: for $a in /auctions/auction, $b in $a/seller[shipTo] … The schema constraints yield two ending marks for $b/shipTo. One of them is guaranteed to capture the failure of /shipTo, so the other is redundant.

Minimal Overhead: Redundancy? Parent-child pattern redundancy: ending marks of child patterns filter the parent pattern early. Query: for $a in /auctions/auction, $b in $a/seller[shipTo] … Constraints: an ending mark for $b/shipTo (labelled optional) and an ending mark for $a/seller (labelled required). The child's ending mark can be used to capture the failure of $a/seller[shipTo], so the other ending mark is redundant.

SQO Application Algorithm. Input: the XQuery represented as a tree; the XML Schema represented as a graph. Processing: the query tree is traversed top-down, with “maximal benefits” ensured; the local and regional appliers are applied to each tree node; same-pattern redundancy is excluded by the local applier; parent-child pattern redundancy is excluded by the regional applier. Output: event-condition-actions attached to tree nodes.
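The shape of this algorithm can be sketched as below. Everything concrete here is an assumption for illustration: the QueryNode class, the precomputed candidates table (standing in for the schema graph and the three SQO rules), keeping the first candidate as the surviving ending mark, and dropping a parent's mark when a required child already has one. The slides do not spell out these choices.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    pattern: str
    required_by_parent: bool = False
    children: list = field(default_factory=list)
    ecas: list = field(default_factory=list)


def local_applier(node, candidates):
    """Same-pattern redundancy: keep one ending mark per pattern.
    Assumption: the first candidate is the one kept."""
    return candidates.get(node.pattern, [])[:1]


def regional_applier(node, marks, candidates):
    """Parent-child redundancy: drop the parent's marks when a required
    child pattern already has an ending mark (assumed direction)."""
    for child in node.children:
        if child.required_by_parent and candidates.get(child.pattern):
            return []
    return marks


def apply_sqo(root, candidates):
    """Traverse the query tree top-down and attach ECAs to its nodes."""
    pending = [root]
    while pending:
        node = pending.pop()  # a parent is always visited before its children
        marks = regional_applier(node, local_applier(node, candidates), candidates)
        node.ecas = [
            (mark, "pattern result still empty", "skip rest of context element")
            for mark in marks
        ]
        pending.extend(node.children)
    return root


# Hypothetical candidate ending marks per pattern.
ship_to = QueryNode("$b/shipTo", required_by_parent=True)
seller = QueryNode("$a/seller", children=[ship_to])
candidates = {"$a/seller": ["item"], "$b/shipTo": ["sameAddr", "primary"]}
apply_sqo(seller, candidates)
```

After the call, only the shipTo node carries an event-condition-action, with a single ending mark.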

Encoding ECAs in Automata. E (event): push-in or pop-out of a state. C (condition): the pattern result buffer is checked. A (action): actions include suspending computations by removing automata transitions; cleaning up results generated within the current context element; and preparing to recover computation for the next context element (e.g., backing up transitions).
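The three actions can be sketched on a transition table like this. The class name, method names and state numbers are hypothetical; the sketch only shows the mechanics of removing transitions, cleaning up, and restoring from a backup, not the authors' implementation.

```python
class ContextAutomaton:
    """Transition table plus buffers for one context element (hypothetical)."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)  # (state, label) -> next state
        self.backup = {}                      # transitions removed by an ECA
        self.pattern_results = []             # buffer checked by the condition
        self.context_results = []             # results produced in this context

    def fire_eca(self, states_to_cut):
        """Event has occurred; check the condition, then run the actions."""
        if self.pattern_results:
            return False  # condition fails: the pattern was already matched
        for key in list(self.transitions):
            if key[0] in states_to_cut:
                # Suspend computation, keeping a backup for later recovery.
                self.backup[key] = self.transitions.pop(key)
        self.context_results.clear()  # clean up results of this context element
        return True

    def recover(self):
        """Restore the removed transitions for the next context element."""
        self.transitions.update(self.backup)
        self.backup.clear()


automaton = ContextAutomaton({
    (1, "auction"): 2,
    (2, "shipTo"): 3,
    (2, "primary"): 11,
    (11, "phone"): 12,
})
automaton.context_results.append("phone result")
fired = automaton.fire_eca({2, 11})
cut_table = dict(automaton.transitions)
automaton.recover()
```

Keeping the backup separate from the live table means recovery is a plain dictionary update when the next context element starts.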

Example: ECAs in Automata. Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone=“ ” return for $c in $a/item … Event: 1st encountered. Condition: none. Action: cut all transitions from 1. q2; 2. states reachable via : q3; 3. states between q2 and q13: q9 … [Figure: automaton over auctions, auction, seller, shipTo, sameAddr, item, primary, secondary and phone, annotated with (1, startTag, none, state 2) … and (…, state 3), above the token stream <sameAddr> </sameAddr> <item> </item> <primary> </primary> …]

Optimization Affected by? How often the pattern fails (pattern selectivity), and how much gain each early filtering brings (unit gain).

Necessity of Design Guideline. [Figure: performance plotted against the selectivity of the pattern with the only useful ending mark, for three plans: plan without SQO; plan with SQO (1 ending mark); plan with SQO but no guideline considered (30 ending marks).]

Conclusion. First SQO on streaming XML. Supports SQO on nested XQuery with “*” or “//”. Offers criteria for “useful” constraints. Ensures maximal benefits and minimal overhead in SQO application. Provides an execution strategy in the widely used automata-based model. Implements the SQO optimizer in the Raindrop system (VLDB’04 demo). Experimentally demonstrates that SQO brings significant improvement with little overhead.

Visit the website of our XQuery engine over XML streams project (RAINDROP). Supported by the USA National Science Foundation and an IBM PhD Fellowship.