Pattern tree algebras: sets or sequences? Stelios Paparizos, H. V. Jagadish University of Michigan Ann Arbor, MI USA.



Outline: XML and XQuery Order and Duplicates (Document Order; OrderBy Clause; Binding Order; Duplicates and XQuery); Hybrid Collections; Correct Output Order; Thinking Efficiently; Experimental Evaluation; Final Words

Document Order Usage: Provides the capability to re-establish the original document information. Example: Return the authors of the book with title = “Grilling…”
FOR $b IN document(t)//book WHERE $b/title = “Grilling for amateurs” RETURN $b/author
Names shown on the slide (presumably the returned authors, in document order): Mario, Stelios, Alton

Document Order: Implicit, derived from the XML data model. The order in which data is represented in a document is important information. Requires the original XML order representation within a single document. Requires an order amongst documents during a single execution of a query. Enforced on every XPath expression and every sequence operation, e.g. union.

ORDER BY Clause Order: Explicit specification with the ORDER BY clause. Results are sorted using each item’s value. Example: Return all books sorted by year of publication.
XQuery: FOR $b IN document(t)//book ORDER BY $b/year RETURN $b
SQL: SELECT book FROM t ORDER BY year

Binding Order Usage: Provides a mechanism to produce results in multiple document orders. Example: Return books and articles with the same author, ordering the results by the document order of (book, article) or of (article, book).
Ordered by (book, article):
FOR $b IN document(t)//book FOR $a IN document(t)//article WHERE $b/author = $a/author RETURN ($b, $a)
Results: book1 – article1, book1 – article2, book2 – article1, book2 – article2, book2 – article3
Ordered by (article, book):
FOR $a IN document(t)//article FOR $b IN document(t)//book WHERE $b/author = $a/author RETURN ($b, $a)
Results: book1 – article1, book2 – article1, book1 – article2, book2 – article2, book2 – article3
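The two result listings above can be reproduced with nested loops, where the variable bound first acts as the major sort key. This is a minimal Python sketch, not the TLC plan from the talk; the author sets (`"A"`, `"B"`) and the helper `join` are hypothetical and chosen only so that the join yields the five pairs shown on the slide.

```python
# Hypothetical data in document order: (name, set of authors).
books = [("book1", {"A"}), ("book2", {"A", "B"})]
articles = [("article1", {"A"}), ("article2", {"A"}), ("article3", {"B"})]


def join(outer, inner):
    """Nested-loop value join on author; the outer variable is bound first."""
    return [
        (o_name, i_name)
        for o_name, o_authors in outer
        for i_name, i_authors in inner
        if o_authors & i_authors  # at least one author in common
    ]


# FOR $b ... FOR $a ...: books vary slowest.
book_first = join(books, articles)

# FOR $a ... FOR $b ...: articles vary slowest; pairs are still emitted as ($b, $a).
article_first = [(b, a) for a, b in join(articles, books)]

for label, pairs in (("(book, article)", book_first), ("(article, book)", article_first)):
    print(label, ", ".join(f"{b} - {a}" for b, a in pairs))
```

Both lists contain the same pairs; only the sequence differs, which is why the binding order has to be tracked separately from the document order.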

Binding Order: Implicit, derived from the way the query is typed by the user. Results are sorted based on the order in which variables are bound. Uses multiple document orders.

XQuery and Duplicates: XQuery operates on duplicate-free sequences. The LET clause creates a binding to the sequence of matching elements. The FOR clause creates a binding to each element of the sequence of matching elements. Hence, XQuery requires all duplicates to be removed at variable binding.

Outline: XML and XQuery Order and Duplicates; Hybrid Collections; Correct Output Order; Thinking Efficiently; Experimental Evaluation; Final Words

Dilemma: Use Sequences or Sets (or Bags or …). Sets lose all ordering information, and order can be important in intermediate steps. Sequences are expensive to manipulate, and optimization possibilities can be restricted. Both sets and sequences are duplicate-free, and duplicate elimination can be a costly procedure that should be avoided when possible.

Solution: Use Hybrid Collections. A Hybrid Collection can have duplicate semantics that vary between a bag and a set, and order semantics that vary between a set and a sequence. It is described by a Duplicate Specification and an Ordering Specification.

Duplicate Specification (D-Spec): Given a collection of trees C_T, the D-Spec describes how duplicates were removed from the collection. Possible parameter values: “empty”: duplicates can be present; “tree”: duplicates were removed using deep-tree comparison amongst the trees in C_T; list of nodes u: duplicates were removed using a comparison of the nodes referred to by “u” in each tree in C_T.
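The three D-Spec values can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not from the talk: each tree is flattened to a dict from a node label to a value, deep-tree comparison is approximated by comparing all entries of that dict, and the function name `eliminate_duplicates` is hypothetical.

```python
def eliminate_duplicates(trees, d_spec):
    """Apply a D-Spec to a list of trees, keeping the first of each duplicate group.

    d_spec is None for "empty" (duplicates may remain), the string "tree"
    for whole-tree comparison, or a list of node labels u.
    """
    if d_spec is None:
        return list(trees)  # "empty": nothing is removed
    if d_spec == "tree":
        # Stand-in for deep-tree comparison on the flattened trees.
        def key(tree):
            return tuple(sorted(tree.items()))
    else:
        # Compare only the nodes referred to by the list u.
        def key(tree):
            return tuple(tree.get(label) for label in d_spec)
    seen, result = set(), []
    for tree in trees:
        k = key(tree)
        if k not in seen:
            seen.add(k)
            result.append(tree)
    return result


trees = [{"B": 1, "E": "x"}, {"B": 1, "E": "y"}, {"B": 1, "E": "x"}]
print(len(eliminate_duplicates(trees, None)))    # bag semantics
print(len(eliminate_duplicates(trees, "tree")))  # set semantics on whole trees
print(len(eliminate_duplicates(trees, ["B"])))   # duplicates judged on node B only
```

The node-list case is the partial specification: it removes more trees than whole-tree comparison whenever trees agree on the listed nodes but differ elsewhere.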

Duplicate Specification Example

Ordering Item (O-Item): The minimum unit used when sorting a collection C_T. Parameters: a reference to the node to sort by; ascending (‘asc’) or descending (‘desc’); empty greater (‘g’) or empty least (‘l’) for trees without a matching node. Example: O-Item (B, asc, l)

Ordering Specification (O-Spec): Given a collection C_T, the O-Spec describes how the trees are sorted in the collection. It accepts as its parameter an ordered list of O-Items. Sorting took place in the order in which the O-Items are specified.
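A sort driven by an O-Spec can be sketched in Python as follows. The assumptions here are mine, not the talk's: trees are flattened to dicts from node label to value, an O-Item is a tuple `(node, 'asc' | 'desc', 'g' | 'l')`, and the function name `sort_by_o_spec` is hypothetical. The sketch relies on Python's sort being stable, so sorting by the last O-Item first and the first O-Item last gives the first O-Item the highest priority.

```python
def sort_by_o_spec(trees, o_spec):
    """Sort trees by an ordered list of O-Items (node, direction, empty)."""
    result = list(trees)
    # Stable sorts applied from the least significant O-Item to the most significant.
    for node, direction, empty in reversed(o_spec):
        def key(tree, node=node, empty=empty):
            if node in tree:
                return (0, tree[node])
            # Trees without a matching node rank above ('g') or below ('l') all values.
            return (1 if empty == "g" else -1, None)
        result.sort(key=key, reverse=(direction == "desc"))
    return result


trees = [
    {"B": 2, "E": "x"},
    {"B": 1, "E": "y"},
    {"E": "z"},           # no B node
    {"B": 2, "E": "a"},
]
least = sort_by_o_spec(trees, [("B", "asc", "l"), ("E", "desc", "g")])
greater = sort_by_o_spec(trees, [("B", "asc", "g"), ("E", "desc", "g")])
print([t["E"] for t in least])
print([t["E"] for t in greater])
```

With O-Item (B, asc, l) the tree without a B node comes first; switching to ‘g’ moves it to the end, while ties on B are still broken by E in descending order.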

Ordering Specification Example (figure panels: “Fully-ordered”, “Partially-ordered”, “Any order”)

Outline: XML and XQuery Order and Duplicates; Hybrid Collections; Correct Output Order; Thinking Efficiently; Experiments; Final Words

TLC-C Correct Output Algorithm

TLC-C Basic Principles: Duplicate behavior is correct with sets. Document order is modeled by our node identifiers. Pattern tree matches return information in document order. The ORDER BY clause is mapped to a list of ordering items and a sort operation. Binding order is determined during parsing by tracking how the query was typed. A sort operation is used at the end of each single-block FLWOR statement to capture the binding order.

Binding Order Example:
FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b
Algebraic plan (TLC). Orderlist: 2, 3, 5, 6, 4

Binding Order Example:
FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b
Algebraic plan with correct output order (TLC-C). Orderlist: 2, 3, 5, 6, 4
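The final sort on the orderlist can be sketched in Python. The pattern-tree figure is not part of this transcript, so the mapping of numbers to variables is an assumption: 2 = $b, 3 = $a, 5 = $e, 6 = $h, 4 = $i, which is the order in which the FOR clauses bind them. The node identifiers in `matches` are hypothetical document-order positions, and `binding_order_sort` is an illustrative name.

```python
ORDER_LIST = [2, 3, 5, 6, 4]  # orderlist from the slide


def binding_order_sort(matches, order_list=ORDER_LIST):
    """Sort pattern-tree matches lexicographically by node id, following the orderlist."""
    return sorted(matches, key=lambda match: tuple(match[n] for n in order_list))


# Each match maps a pattern-tree node number to a (hypothetical) node identifier.
matches = [
    {2: 10, 3: 13, 4: 15, 5: 20, 6: 21},
    {2: 10, 3: 13, 4: 14, 5: 20, 6: 21},
    {2: 10, 3: 11, 4: 12, 5: 20, 6: 22},
    {2: 10, 3: 11, 4: 12, 5: 20, 6: 21},
]

ordered = binding_order_sort(matches)
for match in ordered:
    print([match[n] for n in ORDER_LIST])
```

Because node identifiers model document order, one lexicographic sort over the orderlist yields the nesting that the five FOR clauses would produce: author varies slowest within a book, then editor, then hobby, then interest.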

Outline: XML and XQuery Order and Duplicates; Hybrid Collections; Correct Output Order; Thinking Efficiently (Enhancing an algebra with Hybrid Collections; Minimizing Duplicate Elimination procedures; Selections and Ordering; Nested Queries and Ordering); Experimental Evaluation; Final Words

Operators with Ordering (example): Select S[apt, ord](C_T) produces the matches of the annotated pattern tree (apt) on the input collection C_T. The new parameter ord is used for ordering: ‘empty’: unspecified order; ‘maintain’: preserve the order of the input C_T; ‘list-resort u’: destroy the order of C_T and resort using the input list of node references u; ‘list-add u’: preserve the order of the input C_T and sort ties using the input list of node references u.
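The four values of ord can be sketched in Python as a function that orders already-computed matches. This is an illustration under stated assumptions, not the Timber operator: matches are dicts from node label to value, an O-Spec is reduced to a plain list of node labels sorted ascending, the input is assumed to be sorted by `input_o_spec`, and `order_select_output` is a hypothetical name.

```python
def order_select_output(matches, input_o_spec, ord_mode, u=()):
    """Return (ordered matches, resulting O-Spec) for one value of ord."""
    if ord_mode == "empty":
        return list(matches), []                  # order is unspecified
    if ord_mode == "maintain":
        return list(matches), list(input_o_spec)  # input order is kept
    if ord_mode == "list-resort":
        new_spec = list(u)                        # old order is discarded
    elif ord_mode == "list-add":
        new_spec = list(input_o_spec) + list(u)   # u only breaks ties
    else:
        raise ValueError(f"unknown ord value: {ord_mode}")
    ordered = sorted(matches, key=lambda m: tuple(m[n] for n in new_spec))
    return ordered, new_spec


# Input already sorted on node A (input_o_spec = ["A"]).
matches = [{"A": 1, "B": 9}, {"A": 1, "B": 3}, {"A": 2, "B": 1}]

added, added_spec = order_select_output(matches, ["A"], "list-add", ["B"])
resorted, resorted_spec = order_select_output(matches, ["A"], "list-resort", ["B"])
print([(m["A"], m["B"]) for m in added], added_spec)
print([(m["A"], m["B"]) for m in resorted], resorted_spec)
```

Returning the resulting O-Spec alongside the data is the point of the design: a later operator can inspect it and skip a sort that the collection already satisfies.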

Algebraic Identities (example):
Select S and Sort O can be merged: O[ol](S[any, any](…)) ↔ S[any, ol](…)
Select S and Sort O can be swapped: O[ol](S[any, maintain](…)) ↔ S[any, maintain](O[ol](…))
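The swap identity can be checked on a toy collection in Python. The sketch simplifies Select to an order-preserving filter (the real operator matches an annotated pattern tree) and Sort to a stable sort on a list of node labels; `select`, `sort`, the predicate `keep`, and the data are all hypothetical.

```python
def select(trees, predicate):
    """Order-preserving selection, standing in for S[apt, maintain]."""
    return [t for t in trees if predicate(t)]


def sort(trees, order_list):
    """Stable sort on a list of node labels, standing in for O[ol]."""
    return sorted(trees, key=lambda t: tuple(t[n] for n in order_list))


collection = [
    {"year": 2001, "id": 1},
    {"year": 1999, "id": 2},
    {"year": 2001, "id": 3},
    {"year": 1999, "id": 4},
]


def keep(tree):
    return tree["id"] != 2


sort_after = sort(select(collection, keep), ["year"])   # O[ol](S[any, maintain](...))
sort_before = select(sort(collection, ["year"]), keep)  # S[any, maintain](O[ol](...))
print([t["id"] for t in sort_after])
print([t["id"] for t in sort_before])
```

Both plans give the same sequence because the filter never reorders its input and the sort is stable, so an optimizer is free to place the sort wherever it is cheaper.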

Minimize Duplicate Eliminations: Step 1: Remove redundant duplicate elimination procedures. Step 2: Explore partial duplicate specifications to further minimize duplicate elimination procedures.

Minimize DEs Step 1 Example:
FOR $o IN document(“auction.xml”)//open_auction WHERE count($o/bidder) > 5 RETURN {$o/quantity} {$o/type}
From 6 DE procedures to 1.

Minimize DEs Step 2 Example:
FOR $o IN document(“auction.xml”)//open_auction WHERE count($o/bidder) > 5 RETURN {$o/quantity} {$o/type}
The DE procedure is modified to DE: ID(2). Then, using algebraic rewrites, it is eliminated completely.

Selections and Ordering: For “selection” type queries, use algebraic rewrites and push the sort down to the select operator.

Selections and Ordering Example:
FOR $b IN document(“lib.xml”)//book FOR $a IN $b/author FOR $e IN $b/editor FOR $h IN $e/hobby FOR $i IN $a/interest RETURN $b
Push Sort into Select using algebraic identities. The optimizer can plan the Select operator without having the forced blocking sort at the end.

Joins and Ordering Example:
FOR $a IN document(t)//article FOR $b IN document(t)//book WHERE $b/author = $a/author RETURN ($b, $a)
Algebraic plan with correct output order (TLC-C)

Joins and Ordering Example: Push Sort into Join using algebraic identities.

Joins and Ordering Example: Push Sort further down into Selects using algebraic identities.

Nested Queries and Ordering:
FOR $b IN document(“lib.xml”)/book
LET $k := FOR $a IN document(“lib.xml”)/article WHERE $b/author = $a/author AND $a/conf = “VLDB” RETURN $a
WHERE $b/year = 1999
RETURN {$b} {$k}
Algebraic plan with correct output order (TLC-C)

Nested Queries and Reorder: Rewrite the Sort and the blocking Join into a Reorder operation.

Outline: XML and XQuery Order and Duplicates; Hybrid Collections; Correct Output Order; Thinking Efficiently; Experimental Evaluation; Final Words

Experimental Setup: Timber system; 128MB buffer pool; value index when necessary (not for all queries); Intel Pentium III-M 866 MHz; Windows 2000 Professional; IDE hard drive; 512MB RAM; XMark dataset, factor 1; 707MB total space (472MB data + 241MB index)

Minimizing Duplicate Eliminations (chart). Queries: x17 (more selective), x19 (less selective), q2 (value join)

Selections and Ordering (chart). Queries: x13 (simple output), x17 (more selective), x19 (less selective)

Join and Ordering (chart). Queries: q1 (less selective), q2 (more selective), x3 (less selective)

Nested Queries and Ordering

Ordering and Duplicate Optimizations (chart). Queries: x19 (selection), q2 (value join), x8 (nested query)

Outline: XML and XQuery Order and Duplicates; Hybrid Collections; Correct Output Order; Thinking Efficiently; Experimental Evaluation; Final Words

Related Work: Relational systems recognize smart sort placement as a problem: D. Simmen, E. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In Proc. SIGMOD Conf., 1996. The XML navigational-based approach includes a study of ordering requirements: J. Hidders and P. Michiels. Avoiding unnecessary ordering operations in XPath. In Proc. DBPL Conf., 2003. XML algebraic-based approaches use sets or sequences. Aside from the performance limitations, it is unknown whether they fully address the XQuery binding order to produce correct results.

Final Words: Ordering in XQuery is a complex procedure with significant performance ramifications. Introduced Hybrid Collections with an Ordering Specification as a means to a correct and flexible solution, with a similar path for duplicates. Showed algebraic optimizations that take advantage of the provided flexibility. Demonstrated the performance increase experimentally.