1
MonetDB/XQuery: Using a Relational DBMS for XML
Peter Boncz, CWI, The Netherlands
2
Outline
Basic XML / XQuery
Introduction of Pathfinder and MonetDB projects
Relational XQuery
–XPath steps in the pre/post plane
–Translating for-loops, and beyond
Optimizations
–Order prevention
–Loop-Lifted Staircase join
–Join recognition
Outlook
–Conclusions
Peter Boncz, TU Delft, 10-5-2005, Pathfinder - MonetDB/XQuery
4
XML
Standard, flexible syntax for data exchange
–Regular, structured data
  Database content of all kinds: inventory, billing, orders, …
  "Small" typed values
–Irregular, unstructured text
  Documents of all kinds: transcripts, books, legal briefs, …
  "Large" untyped values
Lingua franca of B2B applications…
–Increase access to products & services
–Integrate disparate data sources
–Automate business processes
… and numerous other application domains
–Bio-informatics, library science, …
5
XML: A First Look
XML document describing a catalog of books
  No Such Thing as a Bad Day
  Hamilton Jordan
  Longstreet Press, Inc.
  17.60
  Publisher: This book is the moving account of one man's successful battles against three cancers... No Such Thing as a Bad Day is warmly recommended.
6
XQuery 1.0
Functional, strongly-typed query language
XQuery 1.0 = XPath 2.0 for navigation, selection, extraction
+ A few more expressions
  For-Let-Where-Order By-Return (FLWOR)
  XML construction
  Operators on types
+ User-defined functions & modules
+ Strong typing
7
XSLT vs. XQuery
XSLT 1.0: XML → XML, HTML, Text
–Loosely-typed scripting language
–Format XML in HTML for display in browser
–Must be highly tolerant of variability/errors in data
XQuery 1.0: XML → XML
–Strongly-typed query language
–Large-scale database access
–Must guarantee safety/correctness of operations on data
Over time, XSLT & XQuery may both serve the needs of many application domains
XQuery will become a hidden, commodity language
8
Navigation, Selection, Extraction
Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher="Longstreet Press"]/title
  No Such Thing As A Bad Day
Publications with Jerome Simeon as author or editor
  $cat//*[(author|editor) = "Jerome Simeon"]
  XQuery from the Experts …
  XQuery Formal Semantics …
9
Transformation & Construction
First author & title of books published by A/W
  for $b in $cat//book[publisher = "Addison Wesley"]
  return { $b/author[1], $b/title }
  Don Chamberlin
  XQuery from the Experts
10
Literals & Constants
Strings
  "hello world"
Booleans
  fn:true() fn:false()
–Avoid lexical conflicts with, e.g., //false $v/flag/true
Numbers
  12 [integer], 10.3E2 [double], 1.0 [decimal]
  xs:decimal("1.0") xs:unsignedLong("22222222222222")
Dates, times, & (totally ordered) durations
  xs:date("2000-01-01") xs:time("04:20:00")
  xdt:dayTimeDuration("P21D") xdt:yearMonthDuration("P1Y2M")
User-defined atomic types
  mycompany:inventory-id("XXX-123")
11
Functions & Operators
Arithmetic & comparison operators
–Numerics
  1.0 + 10.E2 (-, *, div, idiv, mod)
  1900, =, =)
–Dates/Times/Durations
  xs:date("2004-01-01") + xdt:dayTimeDuration("P10D")
  xs:date("2004-01-01") >= xs:date("2002-10-10")
–Nodes
  //incision >)
Built-in functions
–Strings
  fn:starts-with("WWW 2004", "WWW")
  fn:matches("WWW 2004", "^W*")
–Sequences
  fn:avg((1,2,3,4))
  fn:distinct-values(//price)
–All other XML Schema primitive types …
12
Selection & Projection
Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher="Longstreet Press"]/title
  => No Such Thing As A Bad Day
Publications with Jérôme Siméon as author or editor
  $cat//*[(author|editor) = "Jérôme Siméon"]
  => XQuery from the Experts …, XQuery 1.0 Formal Semantics …
Books with "good" reviews
  $cat//book[fn:contains(review/text(), "2 thumbs up")]
13
Sources of Input
Several ways to access inputs
Document function
  fn:doc("http://www.books.com/catalog.xml")
  fn:doc( Expr )
Variables
–Bound in for expression or in host language
  $cat/catalog/book
14
Sequences & Iteration
Sequence constructor
  Return all books followed by all W3C specifications
  ($cat/catalog/book, $cat/catalog/W3Cspec)
XPath expression
  Return all books & W3C specifications in doc order
  $cat/catalog/(book|W3Cspec)
For expression
–Similar to map: apply function to each item in sequence
  Return number of authors in each book
  for $b in $cat/catalog/book return fn:count($b/authors)
  => (3,1,2,…)
15
Conditional & Quantified
Conditional
  if //show[year >= 2000] then "A-OK!" else "Error!"
Existential quantification
–Implicit meaning of predicate expressions
  //show[year >= 2000]
–Explicit expression:
  //show[some $y in ./year satisfies $y >= 2000]
Universal quantification
  //show[every $y in year satisfies $y >= 2000]
16
Putting It Together
For each author, return the number of books and the receipts of books published in the past 2 years, ordered by name
  let $cat := fn:doc("www.bn.com/catalog.xml"),   (: Join :)
      $sales := fn:doc("www.publishersweekly.com/sales.xml")
  for $author in distinct-values($cat//author)   (: Grouping :)
  let $books := $cat//book[@year >= 2000 and author = $author],   (: S.J. :)
      $receipts := $sales/book[@isbn = $books/@isbn]/receipts
  order by $author   (: Ordering :)
  return   (: XML Construction :)
    { $author }
    { fn:count($books) }   (: Aggregation :)
    { fn:sum($receipts) }
17
Recursive Processing
Recursive functions support recursive data
  =>
  declare function partCount($p as element(part)) as element(partCt) {
    { $p/@id, for $p2 in $p/part return partCount($p2) }
  }
18
XML Schema Languages
Many variants…
–DTDs, XML Schema, RELAX NG, XDuce
… with similar goals to define
–Types of literal (terminal) data
–Names of elements & attributes
XQuery designed to support (all of) XML Schema
–Structural & name constraints over types
–Regular tree expressions over elements, attributes, atomic types
19
TeXQuery: Full-text extensions
Text search & querying of structured content
Limited support in XQuery 1.0
–String operators with collation sequences
  $cat//book[contains(review/text(), "two thumbs up")]
Stop words, proximity searching, ranking
  Ex: "Tony Blair" within two words of "George Bush"
Phrases that span tags and annotations
  Ex: Match "Mr. English sponsored the bill" in
  Mr. English for himself and Mr. Coyne sponsored the bill in the Committee for Financial Services
22
XQuery Systems: 2 Approaches
Tree-based
–Tree is basic data structure
  Also on disk (if an XQuery DBMS)
–Navigational approach
  Galax [Simeon..], Flux [Koch..], X-Hive
–Tree algebra approach
  TIMBER [Jagadish..]
Relational
–Data shredded in relational tables
–XQuery translated into database query (e.g. SQL)
24
The Pathfinder Project
Challenge / Goal:
–Turn RDBMSs into efficient XQuery engines
People:
–Maurice van Keulen, University of Twente
–Torsten Grust, Jens Teubner, University of Konstanz
–Jan Rittinger, University of Konstanz & CWI
Task: generate code for MonetDB
25
MonetDB: Applied CS Research at CWI
A decade of "query-intensive" application experience
  image retrieval: Peter Bosch, ImageSpotter
  audio/video retrieval: Alex van Ballegooij, RAM
  XML text retrieval: de Vries / Hiemstra, TIJAH
  biological sequences: Arno Siebes, BRICKS
  XML databases: Albrecht Schmidt, XMark; Grust / van Keulen, Pathfinder
  GIS: Wilco Quak, MAGNUM
  data warehousing / OLAP / data mining: SPSS DataDistilleries; Univ. Massachusetts, PROXIMITY
CWI research group successfully spun off DataDistilleries (now SPSS)
26
Pathfinder and MonetDB
[Figure: compiler architecture. Recoverable labels: Pathfinder, MonetDB, Parser, Sem. Analysis, Core Translation, Typechecking, Relational Algebra, SQL, Database, Core to MIL Translation, MIL (Query Algebra)]
27
Open Source
MonetDB + Pathfinder on Sourceforge
–Mozilla License
Project homepage
–http://monetdb.cwi.nl
Developers website
–http://sf.net/projects/monetdb
Roadmap
  14-apr-04: initial beta release of MonetDB/SQL
  30-sep-04: first official release of MonetDB/SQL
  30-may-05: beta release of MonetDB/XQuery (i.e. Pathfinder)
28
MonetDB
29
MonetDB Particulars
Column-wise fragmentation
–BAT: Binary Association Tables [oid,X]
–Don't touch what you don't need
30
Binary Association Tables (BATs)
31
BAT storage as thin arrays
32
MonetDB Particulars
Column-wise fragmentation
–BAT: Binary Association Tables [oid,X]
–Don't touch what you don't need
Void (virtual-oid) columns
–Contain dense sequence 0,1,2,3,4,…
–Require no space
–Positional access (nice for XPath skipping)
pre = void
33
DBMS Architecture
34
Monet: DBMS Microkernel
36
MonetDB: extensible architecture
Front-end/back-end:
  support multiple data models
  support multiple end-user languages
  support diverse application domains
Pathfinder: XQuery frontend
37
Architecture
39
Outline
Basic XML / XQuery
Introduction of Pathfinder and MonetDB projects
Relational XQuery
–XPath steps in the pre/post plane
–Translating for-loops, and beyond
MonetDB Implementation
–Data structures
Optimizations
–Order prevention
–Loop-Lifted Staircase join
–Join recognition
Outlook
–Conclusions
40
XPath on an RDBMS
Node-based relational encoding of XQuery's data model
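The node-based encoding can be illustrated with a small sketch. It assumes a hypothetical tree given as nested (tag, children) tuples and assigns each node its preorder and postorder rank; the descendant and ancestor axes then become range conditions in the pre/post plane. The function names and the sample document are illustrative only, not part of Pathfinder or MonetDB.

```python
def encode(tree):
    """Return one row per node with its pre and post rank.

    tree is a (tag, [children]) tuple; rows come back in document (pre) order.
    """
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        tag, children = node
        row = {"pre": counters["pre"], "tag": tag}
        counters["pre"] += 1
        rows.append(row)
        for child in children:
            visit(child)
        # The post rank is assigned once all descendants have been visited.
        row["post"] = counters["post"]
        counters["post"] += 1

    visit(tree)
    return rows


def descendants(rows, ctx):
    """Descendant axis: nodes after ctx in preorder and before it in postorder."""
    return [r["tag"] for r in rows
            if r["pre"] > ctx["pre"] and r["post"] < ctx["post"]]


def ancestors(rows, ctx):
    """Ancestor axis: nodes before ctx in preorder and after it in postorder."""
    return [r["tag"] for r in rows
            if r["pre"] < ctx["pre"] and r["post"] > ctx["post"]]


# Hypothetical sample document: a(b(c, d), e(f))
doc = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])
table = encode(doc)
```

In this sketch each axis step is a plain selection over the table, which is the property the relational approach relies on; the later slides refine this with tree-aware pruning and skipping.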
41
Tree Knowledge 1: Pruning
42
Tree Knowledge 2: Partitioning
43
Staircase Join Algorithm
44
Tree Knowledge 3: Skipping
45
Pre/Post → Pre/Level/Size
Done for better skipping and updates
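A simplified sketch of how pruning and skipping might look under the pre/level/size encoding, for the descendant axis only. It assumes size[pre] holds the number of descendants of the node with preorder rank pre, so the descendants of a node are exactly the ranks pre+1 … pre+size. This is an illustration under those assumptions, not the published staircase join algorithm; all names are hypothetical.

```python
def encode(tree):
    """Return (tags, size, level) arrays indexed by preorder rank.

    tree is a (tag, [children]) tuple.
    """
    tags, size, level = [], [], []

    def visit(node, depth):
        tag, children = node
        pre = len(tags)
        tags.append(tag)
        size.append(0)
        level.append(depth)
        for child in children:
            visit(child, depth + 1)
        # Every node appended since pre is a descendant of pre.
        size[pre] = len(tags) - pre - 1

    visit(tree, 0)
    return tags, size, level


def staircase_descendant(size, context):
    """Descendant step for a set of context nodes (preorder ranks)."""
    result = []
    covered_until = -1  # last preorder rank already emitted
    for c in sorted(set(context)):
        if c <= covered_until:
            # Pruning: c lies inside a subtree that was already emitted,
            # so its descendants are in the result already.
            continue
        # Skipping: jump straight to the subtree of c by position.
        result.extend(range(c + 1, c + size[c] + 1))
        covered_until = c + size[c]
    return result


# Hypothetical sample document: a(b(c, d), e(f))
doc = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])
tags, size, level = encode(doc)
```

Because the context is processed in document order and overlapping subtrees are pruned, the result comes out duplicate-free and in document order without a separate sort or duplicate elimination.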
47
Updates
Dense pre-numbers are nice for XPath
–Positional skipping in Staircase join!
But how to handle updates?
[Figure labels: Dense, Not Dense]
48
Planned Update Solution
51
XPath → XQuery
52
Sequence Representation
sequence = table of items
add pos column for maintaining order
ignore polymorphism for the moment
(10, "x", <a/>, 10) →
  pos | item
  1   | 10
  2   | "x"
  3   | pre(a)
  4   | 10
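A minimal sketch of the pos/item representation, assuming rows are plain dictionaries and a node item is stored as a reference to its preorder rank (here a hypothetical ("pre", 7) tuple). The helper names are illustrative.

```python
def sequence_to_table(items):
    """Encode an ordered sequence as rows with an explicit 1-based pos column."""
    return [{"pos": pos, "item": item} for pos, item in enumerate(items, start=1)]


def table_to_sequence(rows):
    """Recover the sequence; order comes from pos, not from physical row order."""
    return [row["item"] for row in sorted(rows, key=lambda row: row["pos"])]


# (10, "x", <a/>, 10): the element is stored as a reference to its pre rank.
table = sequence_to_table([10, "x", ("pre", 7), 10])
```

Since order lives in the pos column, the rows may be stored or processed in any physical order and the sequence can still be reconstructed.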
53
For-loops: the iter column
55
Loop-lifting
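The idea of the iter column and loop-lifting can be sketched as follows, assuming every intermediate result is a list of (iter, pos, item) tuples and that the query is the hypothetical `for $v in (10, 20) return ($v, $v + 1)`. Each operator processes all iterations at once instead of being re-run per iteration. The function names are illustrative and do not correspond to Pathfinder internals.

```python
def lift_for(seq):
    """Bind a loop variable: the item at position i becomes iteration i."""
    return [(i, 1, item) for i, item in enumerate(seq, start=1)]


def lift_const(value, iters):
    """A constant evaluated inside the loop: one copy per iteration."""
    return [(it, 1, value) for it in iters]


def add(left, right):
    """Loop-lifted '+' on singleton operands: join the operands on iter."""
    right_by_iter = {it: item for it, _, item in right}
    return [(it, 1, item + right_by_iter[it])
            for it, _, item in left if it in right_by_iter]


def seq_concat(left, right):
    """Sequence construction (left, right): union, then renumber pos per iter."""
    tagged = [(it, 0, pos, item) for it, pos, item in left]
    tagged += [(it, 1, pos, item) for it, pos, item in right]
    tagged.sort(key=lambda t: t[:3])  # by iter, then operand, then old pos
    out, counts = [], {}
    for it, _, _, item in tagged:
        counts[it] = counts.get(it, 0) + 1
        out.append((it, counts[it], item))
    return out


def flatten(table):
    """Result of the for expression: items ordered by (iter, pos)."""
    return [item for _, _, item in sorted(table, key=lambda t: t[:2])]


v = lift_for([10, 20])                      # $v in iterations 1 and 2
iters = [it for it, _, _ in v]
body = seq_concat(v, add(v, lift_const(1, iters)))
```

The loop body is evaluated once over the whole table; the iter column keeps the iterations apart and the final (iter, pos) order yields the sequence (10, 11, 20, 21).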
57
Full Example
[Figure labels: join, calc, project]
58
Mapping Rules
XQuery construct → relational algebra
See VLDB'04 / TDM'04 [Grust, Teubner]
–Sequence construction → union
–If-Then-[Else] → select, [union]
–For loop → map with cartesian product (all combinations)
–Calculations → projection expressions
–List-functions (e.g. fn:first) → select(pos=1)
–Element construction → updates using descendant
–Path steps → selections on the pre/post plane
Staircase join [VLDB03]:
–Single-pass for a *set* of context nodes
–elaborate skipping!
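Two of the rules above can be sketched on (iter, pos, item) tables: a list function such as fn:first as select(pos=1), and If-Then-Else as a selection per branch followed by a union. The sketch assumes both branches have already been evaluated for every iteration and filters afterwards, which is a simplification chosen for brevity; the names are hypothetical.

```python
def first(table):
    """fn:first as select(pos = 1) on an (iter, pos, item) table."""
    return [row for row in table if row[1] == 1]


def if_then_else(cond, then_rows, else_rows):
    """cond is a list of (iter, bool); keep then-rows where it holds, else-rows otherwise."""
    true_iters = {it for it, flag in cond if flag}
    false_iters = {it for it, flag in cond if not flag}
    chosen = [row for row in then_rows if row[0] in true_iters]      # select
    chosen += [row for row in else_rows if row[0] in false_iters]    # select + union
    return sorted(chosen, key=lambda row: (row[0], row[1]))


cond = [(1, True), (2, False)]
then_rows = [(1, 1, "A-OK!"), (2, 1, "A-OK!")]
else_rows = [(1, 1, "Error!"), (2, 1, "Error!")]
picked = if_then_else(cond, then_rows, else_rows)
heads = first([(1, 1, "a"), (1, 2, "b"), (2, 1, "c")])
```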
59
XMark Query 2
60
XMark Query 2 (common subexpr)
64
Order Prevention
To encode order, we use the pos column
New pos columns are created using the DENSE RANK (SQL) primitive
  Needs [pos] | [iter] order
  More commonly [iter,pos]
This requires a lot of sorting! Often not necessary
65
Order Prevention
[VLDB03 Wang & Cherniack]
  Order properties of relations
  Order propagation rules for relational operators
  Decoration of physical plans with order properties → eliminate sort
New ideas:
  RefineSort: pipelined algorithm that extends sort order
  Order property [C1] | [C2]: "for each equal value of [C2] in order of appearance, the values in [C1] are monotonically increasing"
  Hash-based DENSE RANK only requires [pos] | [iter] → sorts on [iter,pos] avoided
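The last point can be sketched as a single-pass dense rank that keeps its state in a hash table keyed by iter. It assumes the input satisfies the [pos] | [iter] property, i.e. within each iter the pos values appear in non-decreasing order, while rows of different iters may interleave freely. This is an illustrative sketch, not MonetDB's actual operator.

```python
def dense_rank_per_iter(rows):
    """Assign a dense rank to pos within each iter, in one pass and without sorting.

    rows is a list of (iter, pos); returns (iter, pos, rank) in input order.
    """
    state = {}  # iter -> (last pos seen, rank assigned to that pos)
    out = []
    for it, pos in rows:
        last = state.get(it)
        if last is None:
            rank = 1                # first row of this iter
        elif pos == last[0]:
            rank = last[1]          # same pos value, same rank
        elif pos > last[0]:
            rank = last[1] + 1      # next distinct pos value
        else:
            raise ValueError("input violates the [pos] | [iter] order property")
        state[it] = (pos, rank)
        out.append((it, pos, rank))
    return out


# Iterations 1 and 2 interleave, but pos grows within each of them.
ranked = dense_rank_per_iter([(2, 5), (1, 3), (2, 5), (1, 8), (2, 9)])
```

A sort-based dense rank would first need the rows ordered on [iter,pos]; here the weaker per-iter order is enough, which is what allows the sort to be dropped.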
67
Order Prevention
XQuery strategies
  Generate logical plan (SQL) → RDBMS optimizer must be order-aware
  Generate physical plan (MIL) → XQuery generator is order-aware (current Pathfinder/MonetDB approach)
70
Join Recognition
–For loop → map with all combinations: O(N*N)
–If a `simple' condition exists on two loop variables → join
–Only make a map with the matching combinations
–E.g. with Hash-Table: O(N)
Performed on the XCore tree
Recognize if-then expressions
Open question: where to optimize best?
  for $p in $auction/site/people/person
  for $t in $auction/site/closed_auctions/closed_auction
  where $t/buyer/@person = $p/@id
  return $t
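The difference between the two plans can be sketched on plain Python records, assuming persons and closed auctions have been extracted into hypothetical dictionaries with "id" and "buyer" fields. The nested-loop version builds every combination before filtering; the hash version only touches matching pairs.

```python
def nested_loop(people, auctions):
    """Literal translation of the two for loops: all N*N combinations."""
    return [t for p in people for t in auctions if t["buyer"] == p["id"]]


def hash_join(people, auctions):
    """Recognized join: build a hash table once, then probe it per person."""
    by_buyer = {}
    for t in auctions:
        by_buyer.setdefault(t["buyer"], []).append(t)
    # Probing in person order with auctions kept in document order gives
    # the same result order as the nested loops.
    return [t for p in people for t in by_buyer.get(p["id"], [])]


# Hypothetical sample data standing in for the XMark auction document.
people = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
auctions = [
    {"buyer": "p2", "price": 10},
    {"buyer": "p1", "price": 20},
    {"buyer": "p2", "price": 30},
]
```

Both functions return the same sequence; only the amount of intermediate work differs.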
72
Join Optimization
  for $x in $foo
  for $y in $bar
  where $x/p1/@a < $y/p2/@a
  return $x
[Figure: two query plans. Labels of the first: p1, /p1, /p2, theta-join, project. Labels of the second: p1, /p1, /p2, theta-join, Aggr(min), Aggr(max)]
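Why min and max aggregates can precede the theta-join: an XQuery general comparison such as `<` on two sequences is existentially quantified, so it holds if any pair of values satisfies it, and that is the case exactly when the smallest left value is below the largest right value. The sketch below checks this on hypothetical numeric attribute values, one list per $x and per $y.

```python
def theta_pairs_naive(foo, bar):
    """Compare every value of every $x with every value of every $y."""
    return [(i, j)
            for i, x_vals in enumerate(foo)
            for j, y_vals in enumerate(bar)
            if any(a < b for a in x_vals for b in y_vals)]


def theta_pairs_minmax(foo, bar):
    """Aggregate first (min per $x, max per $y), then join on single values."""
    lo = [min(vals) if vals else None for vals in foo]
    hi = [max(vals) if vals else None for vals in bar]
    return [(i, j)
            for i, a in enumerate(lo)
            for j, b in enumerate(hi)
            if a is not None and b is not None and a < b]


# Hypothetical @a values reached via $x/p1 and $y/p2.
foo = [[5, 1], [7], []]
bar = [[0, 2], [6, 3], [8]]
```

After aggregation each loop variable contributes a single value, so the theta-join no longer has to handle sequences on either side.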
74
Loop-lifted staircase join
Staircase join [VLDB03]:
–Single-pass for a *set* of context nodes
–elaborate skipping!
Loop-lifting → multiple iters → multiple sets of context nodes
–Loop-Lifted Staircase Join
  In a single pass: process multiple input context node lists
–Use a stack
–Exploit axis properties for pruning
75
Staircase join
[Figure labels: document, list of context nodes]
76
Loop-lifted staircase join
[Figure labels: document, multiple lists of context nodes, active stack]
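A rough sketch of the loop-lifted idea for the descendant axis, assuming the pre/size encoding (size[pre] is the number of descendants of node pre) and context nodes given as (iter, pre) pairs. One pass over the document serves all iterations: a stack tracks the context subtrees that are currently open, context nodes nested inside an open subtree of the same iter are pruned, and the scan skips ahead when nothing is open. This illustrates the slide's bullet points under those assumptions and is not the MonetDB implementation.

```python
def loop_lifted_descendant(size, context):
    """Descendant step for many iterations in one document pass.

    Returns (iter, pre) pairs; within each iter they are in document order.
    """
    ctx_at = {}
    for it, pre in context:
        ctx_at.setdefault(pre, set()).add(it)
    starts = sorted(ctx_at)   # context nodes in document order
    next_start = 0
    stack = []                # (last pre rank of subtree, iters opened there)
    active = set()            # iters that currently have an open subtree
    result = []
    pre = 0
    while pre < len(size):
        # Close subtrees that ended before the current node.
        while stack and stack[-1][0] < pre:
            _, closed = stack.pop()
            active -= closed
        if not stack:
            # Skipping: nothing is open, jump to the next context node.
            while next_start < len(starts) and starts[next_start] < pre:
                next_start += 1
            if next_start == len(starts):
                break
            pre = starts[next_start]
        # The current node is a descendant for every active iteration.
        result.extend((it, pre) for it in sorted(active))
        # Pruning: an iter that is already active gains nothing from a
        # nested context node, so only new iters are opened.
        opened = ctx_at.get(pre, set()) - active
        if opened:
            stack.append((pre + size[pre], opened))
            active |= opened
        pre += 1
    return result


# Hypothetical document a(b(c, d), e(f)) in pre/size form.
size = [5, 2, 0, 0, 1, 0]
```

Calling the function with context [(1, 1), (1, 2), (2, 4)] evaluates the step for two iterations at once; sorting the returned pairs groups them by iteration.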
80
Scalability
Test platform: Opteron 1.6GHz, 8GB RAM, Red Hat Linux 64-bit
Can process 11GB document!
Mostly linear scaling with document size
Some swapping in the join queries
Q11 + Q12 generate quadratic result
81
XMark 10MB: Pathfinder vs X-Hive & Galax
82
XMark 1GB: Pathfinder vs X-Hive
[Figure label: did not finish]
84
Conclusions
Relational approach can be scalable & fast
Crucial optimizations
–Join recognition
–Loop-lifted XPath steps
–Order awareness
Future
  Roadmap (beta: May 30, Holland Open)
  Algebraic query optimization
  Updates (not in release)