MonetDB/XQuery: Using a Relational DBMS for XML
Peter Boncz, CWI, The Netherlands



Outline
- Basic XML / XQuery
- Introduction of Pathfinder and MonetDB projects
- Relational XQuery
  - XPath steps in the pre/post plane
  - Translating for-loops, and beyond
- Optimizations
  - Order prevention
  - Loop-Lifted Staircase join
  - Join recognition
- Outlook
  - Conclusions

Peter Boncz, TU Delft. Pathfinder - MonetDB/XQuery


XML
Standard, flexible syntax for data exchange
- Regular, structured data
  - Database content of all kinds: inventory, billing, orders, …
  - "Small" typed values
- Irregular, unstructured text
  - Documents of all kinds: transcripts, books, legal briefs, …
  - "Large" untyped values
Lingua franca of B2B applications…
- Increase access to products & services
- Integrate disparate data sources
- Automate business processes
… and numerous other application domains
- Bio-informatics, library science, …

XML: A First Look
XML document describing a catalog of books (element markup not preserved; text content only):
- No Such Thing as a Bad Day
- Hamilton Jordan
- Longstreet Press, Inc
- Publisher: This book is the moving account of one man's successful battles against three cancers... No Such Thing as a Bad Day is warmly recommended.

XQuery 1.0
Functional, strongly-typed query language
XQuery 1.0 =
  XPath 2.0 for navigation, selection, extraction
  + A few more expressions
    - For-Let-Where-Order By-Return (FLWOR)
    - XML construction
    - Operators on types
  + User-defined functions & modules
  + Strong typing

XSLT vs. XQuery
XSLT 1.0: XML -> XML, HTML, Text
- Loosely-typed scripting language
- Format XML in HTML for display in browser
- Must be highly tolerant of variability/errors in data
XQuery 1.0: XML -> XML
- Strongly-typed query language
- Large-scale database access
- Must guarantee safety/correctness of operations on data
Over time, XSLT & XQuery may both serve needs of many application domains
XQuery will become a hidden, commodity language

Navigation, Selection, Extraction
Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher="Longstreet Press"]/title
  => No Such Thing As A Bad Day
Publications with Jerome Simeon as author or editor
  $cat//*[(author|editor) = "Jerome Simeon"]
  => XQuery from the Experts …, XQuery Formal Semantics …

Transformation & Construction
First author & title of books published by A/W
  for $b in $cat//book[publisher = "Addison Wesley"]
  return { $b/author[1], $b/title }
  => Don Chamberlin, XQuery from the Experts

Literals & Constants
- Strings: "hello world"
- Booleans: fn:true(), fn:false()
  - Written as functions to avoid lexical conflicts with element names, e.g. //false, $v/flag/true
- Numbers: 12 [integer], 10.3E2 [double], 1.0 [decimal]; xs:decimal("1.0"), xs:unsignedLong(" ")
- Dates, times, & (totally ordered) durations: xs:date(" "), xs:time("04:20:00"), xdt:dayTimeDuration("P21D"), xdt:yearMonthDuration("P1Y2M")
- User-defined atomic types: mycompany:inventory-id("XXX-123")

Functions & Operators
Arithmetic & comparison operators
- Numerics: arithmetic (+, -, *, div, idiv, mod); comparison (=, !=, <, <=, >, >=)
- Dates/Times/Durations: xs:date(" ") + xdt:dayTimeDuration("P10D"); xs:date(" ") >= xs:date(" ")
- Nodes: comparison by identity and document order (is, <<, >>), e.g. on //incision
Built-in functions
- Strings: fn:starts-with("WWW 2004", "WWW"); fn:matches("WWW 2004", "^W*")
- Sequences: fn:avg((1,2,3,4)); fn:distinct-values(//price)
- All other XML Schema primitive types …

Selection & Projection
Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher="Longstreet Press"]/title
  => No Such Thing As A Bad Day
Publications with Jérôme Siméon as author or editor
  $cat//*[(author|editor) = "Jérôme Siméon"]
  => XQuery from the Experts …, XQuery 1.0 Formal Semantics …
Books with "good" reviews
  $cat//book[fn:contains(review/text(), "2 thumbs up")]

Sources of Input
Several ways to access inputs
- Document function: fn:doc("…"), fn:doc( Expr )
- Variables
  - Bound in a for expression or in the host language: $cat/catalog/book

Sequences & Iteration
Sequence constructor
- Return all books followed by all W3C specifications:
  ($cat/catalog/book, $cat/catalog/W3Cspec)
XPath expression
- Return all books & W3C specifications in document order:
  $cat/catalog/(book|W3Cspec)
For expression
- Similar to map: apply a function to each item in the sequence
- Return number of authors in each book:
  for $b in $cat/catalog/book return fn:count($b/authors)
  => (3, 1, 2, …)

Conditional & Quantified
Conditional
  if //show[year >= 2000] then "A-OK!" else "Error!"
Existential quantification
- Implicit meaning of predicate expressions: //show[year >= 2000]
- Explicit expression: //show[some $y in ./year satisfies $y >= 2000]
Universal quantification
  //show[every $y in year satisfies $y >= 2000]

Putting It Together
For each author, return the number of books and the receipts for books published in the past 2 years, ordered by name. Slide annotations are shown as (: comments :); parts lost from the transcript are marked with …
  let $cat := fn:doc("…www.bn.com/catalog.xml"),     (: Join :)
      $sales := fn:doc("…")
  for $author in distinct-values($cat//author)        (: Grouping :)
  let $books := …[… >= 2000 and author = $a],         (: S.J. :)
      $receipts := … = …
  order by $author                                    (: Ordering :)
  return                                              (: XML Construction :)
    { $author } { fn:count($books) }                  (: Aggregation :)
    { fn:sum($receipts) }

Recursive Processing
Recursive functions support recursive data
  declare function partCount($p as element(part)) as element(partCt)
  { { for $p2 in $p/part return partCount($p2) } }

XML Schema Languages
Many variants…
- DTDs, XML Schema, RELAX NG, XDuce
… with similar goals: to define
- Types of literal (terminal) data
- Names of elements & attributes
XQuery is designed to support (all of) XML Schema
- Structural & name constraints over types
- Regular tree expressions over elements, attributes, atomic types

TeXQuery: Full-text extensions
Text search & querying of structured content
Limited support in XQuery 1.0
- String operators with collation sequences:
  $cat//book[contains(review/text(), "two thumbs up")]
- Stop words, proximity searching, ranking
  Ex: "Tony Blair" within two words of "George Bush"
- Phrases that span tags and annotations
  Ex: Match "Mr. English sponsored the bill" in: Mr. English for himself and Mr. Coyne sponsored the bill in the Committee for Financial Services


XQuery Systems: 2 Approaches
Tree-based
- Tree is the basic data structure
  - Also on disk (if an XQuery DBMS)
- Navigational approach
  - Galax [Simeon..], Flux [Koch..], X-Hive
- Tree algebra approach
  - TIMBER [Jagadish..]
Relational
- Data shredded into relational tables
- XQuery translated into a database query (e.g. SQL)

The Pathfinder Project
Challenge / Goal:
- Turn RDBMSs into efficient XQuery engines
People:
- Maurice van Keulen (University of Twente)
- Torsten Grust, Jens Teubner (University of Konstanz)
- Jan Rittinger (University of Konstanz & CWI)
Task: generate code for MonetDB

MonetDB: Applied CS Research at CWI
A decade of "query-intensive" application experience
- image retrieval: Peter Bosch -> ImageSpotter
- audio/video retrieval: Alex van Ballegooij -> RAM
- XML text retrieval: de Vries / Hiemstra -> TIJAH
- biological sequences: Arno Siebes -> BRICKS
- XML databases: Albrecht Schmidt -> XMark; Grust / van Keulen -> Pathfinder
- GIS: Wilco Quak -> MAGNUM
- data warehousing / OLAP / data mining: SPSS -> DataDistilleries; Univ. Massachusetts -> PROXIMITY
CWI research group successfully spun off DataDistilleries (now SPSS)

Pathfinder - MonetDB
[Figure: compiler pipeline. Pathfinder stages: Parser, Sem. Analysis, Core Translation, Typechecking. Back ends shown: Relational Algebra, SQL, Database; and Core to MIL Translation, MIL (Query Algebra), MonetDB.]

Open Source
MonetDB + Pathfinder on Sourceforge
- Mozilla License
- Project homepage and developers website (URLs not preserved)
RoadMap
- 14-apr-04: initial beta release of MonetDB/SQL
- 30-sep-04: first official release of MonetDB/SQL
- 30-may-05: beta release of MonetDB/XQuery (i.e. Pathfinder)

MonetDB

MonetDB Particulars
Column-wise fragmentation
- BAT: Binary Association Tables [oid,X]
- Don't touch what you don't need

Binary Association Tables (BATs)

BAT storage as thin arrays

MonetDB Particulars
Column-wise fragmentation
- BAT: Binary Association Tables [oid,X]
- Don't touch what you don't need
Void (virtual-oid) columns
- Contain the dense sequence 0,1,2,3,4,…
- Require no space
- Positional access (nice for XPath skipping): pre = void
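The idea of a [void,X] column can be sketched in a few lines. This is only an illustration of the concept, not MonetDB's implementation: the class name `VoidBAT`, its methods, and the sample columns are all hypothetical. The head column is never stored; an oid is simply an array position.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class VoidBAT(Generic[T]):
    """Hypothetical [void, T] column: the head is the dense sequence
    seqbase, seqbase+1, ... and is therefore not materialized."""
    tail: List[T]
    seqbase: int = 0

    def fetch(self, oid: int) -> T:
        # Positional access: the oid is an array offset, no search needed.
        return self.tail[oid - self.seqbase]

    def select(self, predicate: Callable[[T], bool]) -> List[int]:
        # Scan only this one column and return the qualifying oids.
        return [self.seqbase + i for i, v in enumerate(self.tail) if predicate(v)]


# One logical table stored as separate columns (made-up sample data).
name = VoidBAT(["a", "b", "c", "d"])
level = VoidBAT([0, 1, 2, 1])

hits = level.select(lambda l: l == 1)       # touches only the level column
names = [name.fetch(oid) for oid in hits]   # positional lookups in name
print(hits, names)
```

A query that filters on one column and projects another reads just those two arrays, which is the "don't touch what you don't need" point of the slide.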

DBMS Architecture

Monet: DBMS Microkernel

MonetDB: extensible architecture
Front-end/back-end:
- support multiple data models
- support multiple end-user languages
- support diverse application domains
Pathfinder: XQuery frontend

Architecture


Outline
- Basic XML / XQuery
- Introduction of Pathfinder and MonetDB projects
- Relational XQuery
  - XPath steps in the pre/post plane
  - Translating for-loops, and beyond
- MonetDB Implementation
  - Data structures
- Optimizations
  - Order prevention
  - Loop-Lifted Staircase join
  - Join recognition
- Outlook
  - Conclusions

XPath on an RDBMS
Node-based relational encoding of XQuery's data model
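A minimal sketch of a node-based encoding, assuming the pre/post numbering named in the outline: each node becomes one row holding its preorder and postorder rank, and an XPath axis becomes a range selection on those two columns. The tree representation `(tag, [children])`, the function names, and the sample document are assumptions made for illustration.

```python
def encode(tree):
    """Return one (pre, post, tag) row per node, in document order.
    `tree` is a nested (tag, [children]) tuple (hypothetical input format)."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        tag, children = node
        row = [counters["pre"], None, tag]   # pre rank assigned on entry
        counters["pre"] += 1
        rows.append(row)
        for child in children:
            visit(child)
        row[1] = counters["post"]            # post rank assigned on exit
        counters["post"] += 1

    visit(tree)
    return [tuple(r) for r in rows]


def descendants(rows, ctx):
    """descendant axis: pre > pre(ctx) and post < post(ctx)."""
    c_pre, c_post, _ = rows[ctx]
    return [tag for pre, post, tag in rows if pre > c_pre and post < c_post]


def ancestors(rows, ctx):
    """ancestor axis: pre < pre(ctx) and post > post(ctx)."""
    c_pre, c_post, _ = rows[ctx]
    return [tag for pre, post, tag in rows if pre < c_pre and post > c_post]


# Sample document: a( b( c, d ), e( f ) )
doc = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])
rows = encode(doc)
print(rows)
print(descendants(rows, 1), ancestors(rows, 5))
```

The full scans in `descendants` and `ancestors` are deliberately naive; the pruning, partitioning and skipping slides that follow are about avoiding exactly that work.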

Tree Knowledge 1: pruning

Tree Knowledge 2: Partitioning

Staircase Join Algorithm

Tree Knowledge 3: Skipping

Pre/Post -> Pre/Level/Size
Done for better skipping and updates
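Why pre/size/level helps with skipping can be sketched as follows, under the assumption that `size` holds the number of descendants of each node and `level` its depth. The arrays describe a made-up document a(b(c, d), e(f)) and are indexed by pre rank; the function names are illustrative.

```python
# Columns indexed by pre rank for the sample document a( b( c, d ), e( f ) ).
size = [5, 2, 0, 0, 1, 0]    # number of descendants of each node
level = [0, 1, 2, 2, 1, 2]   # depth of each node


def descendants(pre):
    # The subtree of a node is the contiguous pre range (pre, pre + size].
    return list(range(pre + 1, pre + size[pre] + 1))


def children(pre):
    # Hop from child to child, skipping each child's subtree in one step.
    out = []
    p, end = pre + 1, pre + size[pre]
    while p <= end:
        out.append(p)
        p += size[p] + 1
    return out


def post(pre):
    # The post rank stays derivable: post = pre + size - level.
    return pre + size[pre] - level[pre]


print(descendants(1), children(0), [post(p) for p in range(len(size))])
```

Because a subtree is a contiguous range of pre values, and pre is a void column, a descendant step needs no comparison on a second column at all.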

Updates
Dense pre-numbers are nice for XPath
- Positional skipping in Staircase join!
But how to handle updates?
[Figure labels: Dense / Not Dense]

Planned Update Solution

XPath -> XQuery

Sequence Representation
- sequence = table of items
- add a pos column for maintaining order
- ignore polymorphism for the moment

(10, "x", <a/>, 10) ->

  pos | item
  ----+-------
    1 | 10
    2 | "x"
    3 | pre(a)
    4 | 10
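The pos|item table above can be mimicked with a list of tuples. In this sketch the node item is represented by its pre rank wrapped in a `("node", pre)` tuple, which is an assumption made for the example; the slide itself sets polymorphism aside.

```python
def to_table(seq):
    """Encode an ordered sequence as (pos, item) rows, pos starting at 1."""
    return [(pos, item) for pos, item in enumerate(seq, start=1)]


def from_table(table):
    """Rows are an unordered relation; sequence order is recovered from pos."""
    return [item for _, item in sorted(table, key=lambda row: row[0])]


seq = [10, "x", ("node", 0), 10]   # ("node", 0) stands in for pre(a)
table = to_table(seq)

# Physical row order does not matter once pos is stored explicitly.
shuffled = [table[2], table[0], table[3], table[1]]
print(table, from_table(shuffled))
```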

For-loops: the iter column

Loop-lifting

Full Example
[Figure: relational plan; operators shown: join, calc, project]

Mapping Rules
XQuery construct -> relational algebra; see VLDB'04 / TDM'04 [Grust, Teubner]
- Sequence construction -> union
- If-Then-[Else] -> select, [union]
- For loop -> map with cartesian product (all combinations)
- Calculations -> projection expressions
- List-functions (e.g. fn:first) -> select(pos=1)
- Element construction -> updates using descendant
- Path steps -> selections on the pre/post plane
Staircase join [VLDB03]:
- Single pass for a *set* of context nodes
- Elaborate skipping!
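A few of these rules can be traced on a tiny query. The sketch below evaluates `for $x in (10, 20, 30) return ($x, $x + 1)` over iter|pos|item tables: the for-loop creates one inner iteration per item plus a map relation back to the outer iteration, the calculation is a column-wise projection, and sequence construction is a union followed by renumbering of pos. All function names are hypothetical, and this is a simplification of the loop-lifting scheme rather than Pathfinder's actual plan.

```python
def renumber(rows):
    """rows: (iter, item) pairs in final order; assign pos 1, 2, ... per iter."""
    out, counts = [], {}
    for it, item in rows:
        counts[it] = counts.get(it, 0) + 1
        out.append((it, counts[it], item))
    return out


def bind_for(seq):
    """For loop: each (iter, pos, item) row becomes its own inner iteration.
    Returns the variable's table and the map relation inner -> outer."""
    ordered = sorted(seq, key=lambda r: r[:2])
    var = [(inner, 1, item) for inner, (_, _, item) in enumerate(ordered, start=1)]
    mapping = {inner: outer for inner, (outer, _, _) in enumerate(ordered, start=1)}
    return var, mapping


def calc(table, fn):
    """Calculation: a projection expression applied to the item column."""
    return [(it, pos, fn(item)) for it, pos, item in table]


def seq_concat(t1, t2):
    """Sequence construction (e1, e2): tagged union, then renumber per iter."""
    tagged = [(it, 0, pos, item) for it, pos, item in t1]
    tagged += [(it, 1, pos, item) for it, pos, item in t2]
    tagged.sort(key=lambda r: r[:3])
    return renumber([(it, item) for it, _, _, item in tagged])


def back_map(body, mapping):
    """Return clause: map inner iterations back to the outer one, in order."""
    rows = sorted(((mapping[it], it, pos, item) for it, pos, item in body),
                  key=lambda r: r[:3])
    return renumber([(outer, item) for outer, _, _, item in rows])


q = [(1, 1, 10), (1, 2, 20), (1, 3, 30)]         # (10, 20, 30) in iteration 1
x, mapping = bind_for(q)                         # $x, one row per iteration
body = seq_concat(x, calc(x, lambda v: v + 1))   # ($x, $x + 1) in every iteration
result = back_map(body, mapping)
print(result)
```

The point of the exercise is that the loop body is evaluated once for all iterations, as bulk operations on tables, instead of once per binding of `$x`.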

XMark Query 2

XMark Query 2 (common subexpr)


Order Prevention
To encode order, we use the pos column
New pos columns are created using the DENSE RANK (SQL) primitive
- Needs [pos] | [iter] order
- More commonly [iter,pos]
This requires a lot of sorting! -> often not necessary

Order Prevention
[VLDB03 Wang & Cherniack] define:
- Order properties of relations
- Order propagation rules for relational operators
- Decoration of physical plans with order properties -> eliminate sort
New ideas:
- RefineSort: pipelined algorithm that extends sort order
- Order property [C1] | [C2]: "for each equal value of [C2], in order of appearance, the values in [C1] are monotonically increasing"
- Hash-based DENSE RANK only requires [pos] | [iter] -> sorts on [iter,pos] avoided
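The last bullet can be sketched as follows: if pos values are already increasing within each iter in order of appearance, a hash table keyed on iter is enough to hand out new dense ranks in a single scan, with no sort on [iter,pos]. The function name and the sample rows are hypothetical, and this illustrates the idea rather than MonetDB's operator.

```python
def dense_rank_hashed(rows):
    """rows: (iter, pos, item) tuples satisfying [pos] | [iter], i.e. within
    each iter value, pos never decreases in order of appearance.
    Returns rows with pos replaced by a dense rank per iter, in one scan."""
    state = {}   # iter -> (last pos seen, rank handed out for it)
    out = []
    for it, pos, item in rows:
        last, rank = state.get(it, (None, 0))
        if last is not None and pos < last:
            raise ValueError("input violates the [pos] | [iter] order property")
        if pos != last:
            rank += 1            # a new pos value gets the next rank
        state[it] = (pos, rank)
        out.append((it, rank, item))
    return out


# Iterations are interleaved, so the input is NOT sorted on [iter, pos],
# but pos is increasing within each iter.
rows = [(2, 10, "a"), (1, 5, "b"), (2, 30, "c"), (1, 7, "d"), (1, 7, "e")]
ranked = dense_rank_hashed(rows)
print(ranked)
```

A sort-based version would first order the whole input on [iter,pos]; the hashed version trades that sort for one dictionary lookup per row.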


Order Prevention
XQuery strategies:
- Generate logical plan (SQL): the RDBMS optimizer must be order-aware
- Generate physical plan (MIL): the XQuery generator is order-aware (current Pathfinder/MonetDB approach)


Join Recognition
- For loop -> map with all combinations -> O(N*N)
- If a `simple' condition exists on two loop variables -> join
- Only make a map with the matching combinations
- E.g. with a hash table -> O(N)
Performed on the XCore tree
- Recognize if-then expressions
Open question: where to optimize best?

  for $p in $auction/site/people/person
  for $t in $auction/site/closed_auctions/closed_auction
  where … = …
  return $t
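The difference between the two strategies can be sketched as below. The where-clause of the slide's query is not preserved, so the sketch simply assumes an equality between a person key and a key stored in each closed auction; the field names `id` and `buyer` and the sample data are hypothetical.

```python
from collections import defaultdict


def nested_loops(persons, auctions):
    """Literal translation of two nested for-loops: every combination is
    built and then filtered, O(N*N)."""
    return [t for p in persons for t in auctions if p["id"] == t["buyer"]]


def hash_join(persons, auctions):
    """Recognized join: build a hash table on one side and probe it with the
    other, so only matching combinations are ever produced."""
    by_buyer = defaultdict(list)
    for t in auctions:
        by_buyer[t["buyer"]].append(t)
    return [t for p in persons for t in by_buyer.get(p["id"], [])]


persons = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
auctions = [
    {"buyer": "p2", "price": 10},
    {"buyer": "p1", "price": 25},
    {"buyer": "p2", "price": 40},
    {"buyer": "p9", "price": 5},
]
joined = hash_join(persons, auctions)
print(joined)
```

Both functions return the same rows in the same order (outer loop order first, then inner order), which is what allows the rewrite to be applied without changing the query result.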

Join Optimization

  for $x in $foo
  for $y in $bar
  where … < …
  return $x

[Figure: two plans. First plan: path steps /p1 and /p2, theta-join, project. Second plan: path steps /p1 and /p2, Aggr(min) and Aggr(max), theta-join.]
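One plausible reading of the Aggr(min)/Aggr(max) plan, stated here as an assumption because the where-clause is not preserved: the condition compares two node sequences with the existentially quantified `<`, which holds exactly when the minimum of the left side is below the maximum of the right side. The sketch below checks that equivalence on made-up data; the function names are hypothetical.

```python
def exists_lt(a_vals, b_vals):
    """General comparison A < B: true if some a in A and b in B have a < b."""
    return any(a < b for a in a_vals for b in b_vals)


def theta_join_naive(foo, bar):
    # Compare every value pair for every ($x, $y) combination.
    return [x for x, a in foo for y, b in bar if exists_lt(a, b)]


def theta_join_minmax(foo, bar):
    # Aggregate first: one min per $x and one max per $y, then a
    # single-valued theta-join. Empty sequences can never match.
    mins = [(x, min(a)) for x, a in foo if a]
    maxs = [(y, max(b)) for y, b in bar if b]
    return [x for x, lo in mins for y, hi in maxs if lo < hi]


# (binding, values reached by the path step) pairs; sample data is made up.
foo = [("x1", [5, 9]), ("x2", [20]), ("x3", [])]
bar = [("y1", [1, 6]), ("y2", [30]), ("y3", [])]
print(theta_join_minmax(foo, bar))
```

After aggregation each binding carries a single value, so the theta-join no longer has to deal with duplicate matches caused by multi-valued operands.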


Loop-lifted staircase join
Staircase join [VLDB03]:
- Single pass for a *set* of context nodes
- Elaborate skipping!
Loop-lifting -> multiple iters -> multiple sets of context nodes
Loop-Lifted Staircase Join
- In a single pass: process multiple input context node lists
- Use a stack
- Exploit axis properties for pruning
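The three bullets can be illustrated for the descendant axis. This is a simplified sketch over a pre/size encoding, not the published algorithm: the function name, the `(iter, pre)` input format and the sample document a(b(c, d), e(f)) are assumptions. It scans the document once, keeps a stack of context nodes whose subtree is currently open, prunes context nodes already covered within the same iteration, and skips ahead when no subtree is open.

```python
def ll_descendant(size, contexts):
    """size[pre] = number of descendants of node pre.
    contexts = (iter, pre) pairs, any order.
    Returns (iter, pre) pairs: pre is a descendant of a context node of iter."""
    ctx = sorted((pre, it) for it, pre in contexts)   # document order
    out = []
    stack = []             # (last pre in subtree, iter), innermost on top
    active_iters = set()   # iters that currently have an open subtree
    i, p = 0, 0
    while True:
        # Leave subtrees that ended before the current document node.
        while stack and stack[-1][0] < p:
            _, it = stack.pop()
            active_iters.discard(it)
        if not stack:
            if i == len(ctx):
                break
            p = ctx[i][0]   # skipping: jump straight to the next context node
        # Node p is a descendant of every context node still on the stack.
        for _, it in stack:
            out.append((it, p))
        # Open the subtrees of the context nodes located at p.
        while i < len(ctx) and ctx[i][0] == p:
            _, it = ctx[i]
            # Pruning: a context nested inside an open one of the same iter
            # adds nothing; a leaf has no descendants at all.
            if it not in active_iters and size[p] > 0:
                stack.append((p + size[p], it))
                active_iters.add(it)
            i += 1
        p += 1
    return out


size = [5, 2, 0, 0, 1, 0]                       # a( b( c, d ), e( f ) )
contexts = [(1, 1), (2, 0), (2, 4), (3, 5)]     # (iter, pre)
result = ll_descendant(size, contexts)
print(result)
```

In the example, iteration 2 has the nested context nodes 0 and 4; node 4 is pruned because its descendants are already produced via node 0, so no duplicates need to be removed afterwards.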

Staircase join
[Figure labels: document; list of context nodes]

Loop-lifted staircase join
[Figure labels: document; multiple lists of context nodes; active stack]


Scalability
Test platform: Opteron 1.6GHz, 8GB RAM, Red Hat Linux 64-bit
- Can process an 11GB document!
- Mostly linear scaling with document size
- Some swapping in the join queries
- Q11 + Q12 generate a quadratic result

XMark 10MB: Pathfinder vs X-Hive & Galax

XMark 1GB: Pathfinder vs X-Hive
[Figure label: did not finish]

Conclusions
Relational approach can be scalable & fast
Crucial optimizations
- Join recognition
- Loop-lifted XPath steps
- Order awareness
Future roadmap (beta: May 30, Holland Open)
- Algebraic query optimization
- Updates (not in release)