ADT 2010 MonetDB/XQuery (2/2): High-Performance, Purely Relational XQuery Processing Stefan Manegold.

Slides:

Advertisements

Similar presentations

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide

Advertisements

Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li *, Junichi Tatemura Wang-Pin Hsiung,

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

The Volcano/Cascades Query Optimization Framework

CSE 6331 © Leonidas Fegaras XML and Relational Databases 1 XML and Relational Databases Leonidas Fegaras.

TIMBER A Native XML Database Xiali He The Overview of the TIMBER System in University of Michigan.

XQUERY. What is XQuery? XQuery is the language for querying XML data The best way to explain XQuery is to say that XQuery is to XML what SQL is to database.

Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.

MonetDB/XQueryhttp://monetdb.cwi.nlBioWise InfoMgmt 2009 Peter Boncz (CWI Amsterdam) Querying XML Data Sources using MonetDB/XQuery.

Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.

1 Rewriting Nested XML Queries Using Nested Views Nicola Onose joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola University of California,

Last Time –Main memory indexing (T trees) and a real system. –Optimize for CPU, space, and logging. But things have changed drastically! Hardware trend:

1 COS 425: Database and Information Management Systems XML and information exchange.

Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude.

Inbal Yahav A Framework for Using Materialized XPath Views in XML Query Processing VLDB ‘04 DB Seminar, Spring 2005 By: Andrey Balmin Fatma Ozcan Kevin.

Presented by Cathrin Weiss, Panagiotis Karras, Abraham Bernstein Department of Informatics, University of Zurich Summarized by: Arpit Gagneja.

Peter BonczCWI Scientific Meeting 28/4/2006MonetDB/XQuery MonetDB/XQuery: using relational technology to query XML documents Peter Boncz Centrum voor Wiskunde.

Access Path Selection in a Relation Database Management System (summarized in section 2)

Indexing XML Data Stored in a Relational Database VLDB`2004 Shankar Pal, Istvan Cseri, Gideon Schaller, Oliver Seeliger, Leo Giakoumakis, Vasili Vasili.

TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.

XML Processing Moves Forward XSLT 2.0 and XQuery 1.0 Michael Kay Prague 2005.

Comparing XSLT and XQuery Michael Kay XTech 2005.

Hexastore: Sextuple Indexing for Semantic Web Data Management

MonetDB/XQuery Technology Preview 1 Stefan Manegold CWI Amsterdam -

XML as a Boxwood Data Structure Feng Zhou, John MacCormick, Lidong Zhou, Nick Murphy, Chandu Thekkath 8/20/04.

Access Path Selection in a Relational Database Management System Selinger et al.

Pattern tree algebras: sets or sequences? Stelios Paparizos, H. V. Jagadish University of Michigan Ann Arbor, MI USA.

Database Management 9. course. Execution of queries.

Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.

Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.

MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands.

ADT 2010 XML/XQuery Data Management MonetDB/XQuery (1/2) Beyond Chapter 10 of Silberschatz, Korth, Sudarshan “Database System Concepts” Stefan Manegold.

VLDB'02, Aug 20 Efficient Structural Joins on Indexed XML1 Efficient Structural Joins on Indexed XML Documents Shu-Yao Chien, Zografoula Vagena, Donghui.

Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.

Optimization in XSLT and XQuery Michael Kay. 2 Challenges XSLT/XQuery are high-level declarative languages: performance depends on good optimization Performance.

5/2/20051 XML Data Management Yaw-Huei Chen Department of Computer Science and Information Engineering National Chiayi University.

Ohio State University Department of Computer Science and Engineering Data-Centric Transformations on Non- Integer Iteration Spaces Swarup Kumar Sahoo Gagan.

Tree-Pattern Queries on a Lightweight XML Processor MIRELLA M. MORO Zografoula Vagena Vassilis J. Tsotras Research partially supported by CAPES, NSF grant.

Sept. 27, 2002 ISDB’02 Transforming XPath Queries for Bottom-Up Query Processing Yoshiharu Ishikawa Takaaki Nagai Hiroyuki Kitagawa University of Tsukuba.

Leonidas FegarasThe Joy of SAX1 The Joy of SAX Leonidas Fegaras University of Texas at Arlington

Streaming XPath Engine Oleg Slezberg Amruta Joshi.

CS4432: Database Systems II Query Processing- Part 2.

Benefits of Path Summaries in an XML Query Optimizer Supporting Multiple Access Attila Barta Mariano P. Consens Alberto O. Mendelzon University of Toronto.

Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.

Holistic Twig Joins Optimal XML Pattern Matching Nicolas Bruno Columbia University Nick Koudas Divesh Srivastava AT&T Labs-Research SIGMOD 2002.

Query Processing – Implementing Set Operations and Joins Chap. 19.

1 Holistic Twig Joins: Optimal XML Pattern Matching Nicolas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 2002 Presented by Jun-Ki Min.

XML Native Query Processing Chun-shek Chan Mahesh Marathe Wednesday, February 12, 2003.

1 Updates ADT 2010 ADT 2010 XQuery Updates in MonetDB/XQuery Stefan Manegold

MonetDB/XQuery Technology Preview 1 Stefan Manegold Centrum voor Wiskunde en Informatica Amsterdam -

CS4432: Database Systems II Query Processing- Part 1 1.

XRANK: RANKED KEYWORD SEARCH OVER XML DOCUMENTS Lin Guo Feng Shao Chavdar Botev Jayavel Shanmugasundaram Abhishek Chennaka, Alekhya Gade Advanced Database.

ADT 2010 Introduction to (XML, XPath &) XQuery Chapter 10 in Silberschatz, Korth, Sudarshan “Database System Concepts” Stefan Manegold

Chiu Luk CS257 Database Systems Principles Spring 2009

CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.

”Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters” Published In SIGMOD '07 By Yahoo! Senthil Nathan N IIT Bombay.

Practical Database Design and Tuning

Efficient Evaluation of XQuery over Streaming Data

Querying and Transforming XML Data

UNIT 11 Query Optimization

Database Management System

OrientX: an Integrated, Schema-Based Native XML Database System

(b) Tree representation

Practical Database Design and Tuning

Querying XML XPath.

Querying XML XPath.

Early Profile Pruning on XML-aware Publish-Subscribe Systems

One-Pass Algorithms for Database Operations (15.2)

Query Optimization.

Presentation transcript:

ADT 2010 MonetDB/XQuery (2/2): High-Performance, Purely Relational XQuery Processing Stefan Manegold

2 ADT 2010 XPath evaluation (SQL)‏ Example query: /descendant::open_auction[./bidder]/annotation SELECT DISTINCT a.pre FROM doc r, doc oa, doc b, doc a WHERE r.pre=0 AND oa.pre > r.pre AND oa.post oa.pre AND b.post oa.pre AND a.post < oa.post AND a.level = oa.level + 1 AND a.name = “annotation” AND a.kind = “elem” ORDER BY a.pre (potentially?) expensive joins due to range predicates (potentially?) expensive duplicate elimination <- descendant } child

3 ADT 2010 Staircase Join [VLDB03] document List of context nodes seek pre|post are not random numbers: =>  exploit the tree properties encoded in them

4 ADT 2010 document List of context nodes scan Staircase Join [VLDB03] pre|post are not random numbers: =>  exploit the tree properties encoded in them

5 ADT 2010 document List of context nodes skip Staircase Join [VLDB03] pre|post are not random numbers: =>  exploit the tree properties encoded in them

6 ADT 2010 document List of context nodes seek seek  scan  skip  seek  scan  skip ... Staircase Join [VLDB03] pre|post are not random numbers: =>  exploit the tree properties encoded in them

7 ADT 2010 skipping: avoid touching node ranges that cannot contain results Generate a duplicate-free result in document order pruning: reduce the context set a-priori partitioning: single sequential pass over the document document List of context nodes seek seek  scan  skip  seek  scan  skip ... Staircase Join [VLDB03]

8 ADT 2010 Example: (c1,c2)/descendant:* SELECT DISTINCT doc.pre FROM c, doc WHERE doc.pre > c.pre AND doc.post < c.post Avoid comparing large chunks of the document table. Staircase Join: Skipping

9 ADT 2010 Example: (c1,c2,c3,c4)/ancestor:* SELECT DISTINCT doc.pre FROM c, doc WHERE doc.pre < c.pre AND doc.post > c.post Eliminate: c1, c3 Keep: c2, c4 Staircase Join: Pruning

10 ADT 2010 Example: (c1,c2,c3)/ancestor:* SELECT DISTINCT doc.pre FROM c, doc WHERE doc.pre < c.pre AND doc.post > c.post Single-pass algorithm that avoids generating duplicates Staircase Join: Partitioning

11 ADT 2010 Example: (c1,c2,c3)/ancestor:* SELECT DISTINCT doc.pre FROM c, doc WHERE doc.pre < c.pre AND doc.post > c.post Single-pass algorithm that avoids generating duplicates Staircase Join: Partitioning

12 ADT 2010 Schedule So far: RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath Accelerator, Pre/Post plane)‏ XPath navigation (Staircase Join)‏

13 ADT 2010 Schedule So far RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath Accelerator, Pre/Post plane)‏ XPath navigation (Staircase Join)‏ Now: XQuery to Relational Algebra Compiler: Item- & Sequence- Representation Efficient FLWoR Evaluation (Loop-Lifting)‏ Optimization RDBMS back-end support for XML/XQuery (2/2): Updateable Document Representation

14 ADT 2010 Source Language: XQuery Core

15 ADT 2010 Target Language: Relational Algebra

16 ADT 2010 sequence = table of items add pos column for maintaining order ignore polymorphism for the moment Sequence Representation

17 ADT 2010 Item sequences, sequence order (10, “x”,, 10) → Item Problems: Polymorphic columns Redundant storage Copy overhead (especially with strings)‏ Sequence Representation

18 ADT 2010 Item sequences, sequence order (10, “x”,, 10) → Item Item Representation

19 ADT 2010 Iterations

20 ADT 2010 Iterations

21 ADT 2010 Iterations

22 ADT 2010 Iterations

23 ADT 2010 Iterations

24 ADT 2010 Loop-Lifting

25 ADT 2010 Loop-Lifting

26 ADT 2010 Loop-Lifting

27 ADT 2010 Loop-Lifting

28 ADT 2010 Deriving loop

29 ADT 2010 Nested Scopes

30 ADT 2010 Nested Scopes

31 ADT 2010 Loop-lifting

32 ADT 2010 Full Example join calcproject

33 ADT 2010 XQuery On SQL Hosts [VLDB04] XQuery Construct sequence construction if-then-else for-loops calculations list functions, e.g. fn:first()‏ element construction XPath steps Relational Mapping A union B select(A=true,B) union select(A=false,C)‏ cartesian product project(A,x=expr(Y1,..Yn)‏)‏ select(A,pos=1)‏ updates in temporary tables staircase-join

34 ADT 2010 XMark Query 8 let $auction := doc("auctions.xml")‏ return for $p in $auction/site/people/person let $a := for $t in $auction/site/ closed_auctions/closed_auction where = return $t return {count($a)}

35 ADT 2010 Peephole Optimization [Grust, XIME-P 2005]

36 ADT 2010 Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

37 ADT 2010 Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

38 ADT 2010 Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

39 ADT 2010 Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

40 ADT 2010 Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

41 ADT 2010 Plan simplification Peephole Optimization Peephole Optimization [Grust, XIME-P 2005]

42 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [pos|iter], [Wang&Cherniack, VLDB’03] C43 B23 A13 Z102 Y51 X41 itempositer [iter,pos]  Peephole Optimization [Boncz et al., SIGMOD 2006]

43 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [pos|iter], [Wang&Cherniack, VLDB’03] C43 B23 A13 Z102 Y51 X41 itempositer  C33 B23 A13 Z12 Y21 X11 itempositer [iter,pos] Peephole Optimization [Boncz et al., SIGMOD 2006]

44 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [pos|iter], [Wang&Cherniack, VLDB’03]  hash-based (streaming) DENSE_RANK C43 B23 Y51 A13 Z102 X41 itempositer  C33 B23 Y21 A13 Z12 X11 itempositer [pos|iter] Peephole Optimization [Boncz et al., SIGMOD 2006]

45 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [pos|iter], [Wang&Cherniack, VLDB’03]  hash-based (streaming) DENSE_RANK Peephole Optimization [Boncz et al., SIGMOD 2006]

46 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [pos|iter], [Wang&Cherniack, VLDB’03]  hash-based (streaming) DENSE_RANK Sort Reduction if [iter,item] order is required, and [iter] only is present use a refine-sort rather than a full sort.  pipelinable sort Peephole Optimization [Boncz et al., SIGMOD 2006]

47 ADT 2010 Sort Avoidance generated by DENSE RANK pos ORDER BY pos PARTITION BY iter by tracking secondary ordering properties [item|iter], [Wang&Cherniack, VLDB’03]  hash-based (streaming) DENSE_RANK Sort Reduction if [iter,item] order is required, and [iter] only is present use a refine-sort rather than a full sort.  pipelinable sort Join Detection detect cartesian products as multi-valued dependencies in the precense of a theta-selection  theta-join Peephole Optimization [Boncz et al., SIGMOD 2006]

48 ADT 2010 Loop-lifted XPath Steps Many algorithms have been proposed & studied for XPath evaluation: Dataguide based, Structural Join, Staircase Join, Holistic Twig Join IN: sequence of context nodes in (doc order)‏ OUT: sequence of document nodes (unique, in doc order)‏

49 ADT 2010 Loop-lifted XPath Steps In XQuery, expressions generally occur inside FLWR blocks, i.e. inside a for-loop for $x in doc()//employee $x/ancestor::department Choice: call XPath algorithm N times, accessing document and index structures N times. use a loop-lifted algorithm: IN: for each iteration, a sequence of context nodes OUT: for each iteration, a sequence of document nodes (per iteration unique, in doc order)‏

50 ADT 2010 Staircase join document List of context nodes

51 ADT 2010 Loop-lifted staircase join document List of context nodesActive stack Multiple lists of context nodes Adapt: pruning, partitioning and skipping rules to correctly deal with multiple context sets

52 ADT 2010 Loop-lifted staircase join Results on the 20 XMark queries:

53 ADT 2010 Performance Evaluation Extensive performance Evaluation on XMark data sizes 110KB, 1MB, 11MB, 110MB, 1.1GB, 11GB MonetDB/XQuery, Galax0.6, X-Hive 6.0, Berkeley DB XML 2.0, eXisT 8GB RAM Extensive XMark performance Literature Overview IPSI-XQ v1.1.1b, Dynamic Interval Encoding, Kweelt, QuiP, FluX, TurboXPath, Timber, Qizx/Open (Version 0.4/p1), Saxon (Version 8.0), BEA/XQRL, VX Crude comparison (normalized by CPU SPECint)‏

54 ADT 2010 XMark Benchmark 1MB XML 1GB XML Galax X-Hive Berkeley DB XML eXisT

55 ADT S. Manegold: “An Empirical Evaluation of XQuery Processors”, Information Systems, 33: , April More Benchmarks & Performance Results? S. Manegold: “An Empirical Evaluation of XQuery Processors”, Information Systems, 33: , April S. Manegold: “An Empirical Evaluation of XQuery Processors”, Information Systems, 33: , April