MonetDB/XQuery (http://monetdb.cwi.nl), BioWise InfoMgmt 2009
Querying XML Data Sources using MonetDB/XQuery
Peter Boncz (CWI Amsterdam)
Overview
- Crash course: XML and XQuery + XQuery Update Facility
- XML databases
- MonetDB/XQuery: open-source XML database by me and my group at CWI. Start downloading now: monetdb.cwi.nl
- Extensions: keyword search in XML (PF/Tijah), StandOff annotation, distributed XML (XRPC)
- More internals: XML indexing and why it is hard; runtime query optimization
What is XML?
The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
Base specifications:
- XML 1.0, W3C Recommendation, Feb 1998
- Namespaces in XML, W3C Recommendation, Jan 1999
A Little Bit of History
Database world:
- 1980: relational databases
- 1990: nested relational model and object-oriented databases
- 2000: semi-structured databases
Documents world:
- 1974: SGML (Standard Generalized Markup Language)
- 1990: HTML (Hypertext Markup Language)
- 1992: URL (Uniform Resource Locator)
Data + documents = information:
- 1996: XML (Extensible Markup Language), URI (Uniform Resource Identifier)
XML: A First Look
An XML document describing a catalog of books, with the text content:
No Such Thing as a Bad Day; Hamilton Jordan; Longstreet Press, Inc
Publisher: "This book is the moving account of one man's successful battles against three cancers... No Such Thing as a Bad Day is warmly recommended."
The review illustrates mixed content: text interleaved with child elements.
XML Semantics: a Tree!
Figure: the catalog document as a tree of element nodes (catalog, book, title, author, publisher, price, review, reviewer), text nodes (e.g. "No Such Thing As A Bad Day", "Hamilton Jordan", "Longstreet Press Inc.") and attribute nodes (isbn, and currency with value USD).
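As a rough illustration of the tree view, the sketch below parses a simplified version of the catalog with Python's xml.etree.ElementTree and counts element, attribute and text nodes. The markup is assumed (only the text content is given above), the ISBN value is a placeholder, and the price element is left empty because its value is not given. ElementTree stores text as .text/.tail strings rather than as separate nodes, so the function counts each non-whitespace string as one text node.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed markup for the catalog (the ISBN value is a placeholder).
DOC = (
    '<catalog><book isbn="ISBN">'
    '<title>No Such Thing as a Bad Day</title>'
    '<author>Hamilton Jordan</author>'
    '<publisher>Longstreet Press, Inc</publisher>'
    '<price currency="USD"/>'
    '<review><reviewer>Publisher</reviewer>: This book is the moving account of '
    "one man's successful battles against three cancers... "
    '<title>No Such Thing as a Bad Day</title> is warmly recommended.</review>'
    '</book></catalog>'
)


def count_nodes(elem, counts=None):
    """Count element, attribute and (non-whitespace) text nodes at and below elem."""
    if counts is None:
        counts = {"element": 0, "attribute": 0, "text": 0}
    counts["element"] += 1
    counts["attribute"] += len(elem.attrib)
    if elem.text and elem.text.strip():
        counts["text"] += 1  # text before the first child
    for child in elem:
        count_nodes(child, counts)
        if child.tail and child.tail.strip():
            counts["text"] += 1  # text after a child element: mixed content
    return counts


print(count_nodes(ET.fromstring(DOC)))  # {'element': 9, 'attribute': 2, 'text': 7}
```

The review element contributes four of the seven text nodes, which is exactly the mixed-content case.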
XML Food Chain
- Data exchange format: XML
- Schema, formatting, querying: XML Schema, XPath, XSLT, XQuery, … (W3C Recommendations)
- Web services infrastructure: SOAP, WSDL, UDDI
- Business process frameworks: BPEL, BPML, ebXML, RosettaNet, CommerceXML; BEA, IBM WebSphere, WebMethods, …
XML is an enabling technology: it does NOT solve the hard problems, but it removes the excuses for ignoring them.
Killer XML Advantages
- Code/schema/data independence
- Covers the continuous spectrum from totally structured data to documents, i.e. from data management to information management
- A single model for representing data, metadata and code
XQuery 1.0
A functional, strongly-typed query language:
- XQuery 1.0 = XPath 2.0 for navigation, selection and extraction
- + a few more expressions: For-Let-Where-Order by-Return (FLWOR), XML construction, operators on types
- + user-defined functions & modules: modularize large queries, process recursive data
- + strong typing: checks that values have the type required by an operator or function, and guarantees that the result value is an instance of the output type; enforced statically or dynamically
FLWOR ("Flower") Expressions, compared with SQL
XQuery:
- FOR... sequence expression
- LET... variable definition
- WHERE... condition
- ORDER BY...
- RETURN... result expression
SQL:
- SELECT... result expressions
- FROM... tables
- WHERE... condition
- GROUP BY...
- ORDER BY...
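The FOR/LET/WHERE/ORDER BY/RETURN pipeline can be mimicked in plain Python: FOR and LET produce a stream of variable-binding tuples (much like the rows produced by SQL's FROM), WHERE filters it, ORDER BY sorts it and RETURN maps each tuple to output. This is a simplified sketch with one for-variable and one let-variable; the function flwor and the record layout are hypothetical, not a full model of XQuery evaluation.

```python
def flwor(for_seq, let, where, order_by, ret):
    """for $x in for_seq let $y := let($x) where where($x, $y)
    order by order_by($x, $y) return ret($x, $y)"""
    tuples = [(x, let(x)) for x in for_seq]              # FOR binds $x, LET adds $y
    tuples = [t for t in tuples if where(*t)]            # WHERE filters the tuple stream
    tuples = sorted(tuples, key=lambda t: order_by(*t))  # ORDER BY sorts it (stable sort)
    return [ret(*t) for t in tuples]                     # RETURN builds the result sequence


# Hypothetical records; titles and publishers are those named in the slides.
books = [
    {"title": "XQuery from the Experts", "publisher": "Addison Wesley"},
    {"title": "No Such Thing as a Bad Day", "publisher": "Longstreet Press"},
]

# for $b in $books let $p := $b/publisher where exists($p) order by $b/title return $b/title
titles = flwor(
    books,
    let=lambda b: b.get("publisher"),
    where=lambda b, p: p is not None,
    order_by=lambda b, p: b["title"],
    ret=lambda b, p: b["title"],
)
print(titles)  # ['No Such Thing as a Bad Day', 'XQuery from the Experts']
```

Like SQL's SELECT, RETURN is evaluated once per surviving tuple; unlike SELECT, it may produce arbitrary sequences or newly constructed XML.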
Navigation, Selection, Extraction
Titles of all books published by Longstreet Press:
$cat/catalog/book[publisher = "Longstreet Press"]/title
→ No Such Thing As A Bad Day
Publications with Don Chamberlin as author or editor:
$cat//*[(author|editor) = "Don Chamberlin"]
→ XQuery from the Experts, XQuery Formal Semantics, …
Transformation & Construction
First author & title of books published by Addison Wesley (A/W):
for $b in $cat//book[publisher = "Addison Wesley"]
return { $b/author[1], $b/title }
→ Don Chamberlin, XQuery from the Experts
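Python's xml.etree.ElementTree supports only a small XPath subset, but it is enough to mimic the navigation query and the construction query on a toy catalog. The catalog below is hypothetical (built from the titles, authors and publishers named in the slides), and the name of the constructed element is an assumption, since the constructor's element name is not given. Copying the selected nodes mirrors the fact that XQuery element constructors copy their content.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical toy catalog.
CAT = ET.fromstring(
    "<catalog>"
    "<book><title>No Such Thing As A Bad Day</title><author>Hamilton Jordan</author>"
    "<publisher>Longstreet Press</publisher></book>"
    "<book><title>XQuery from the Experts</title><author>Don Chamberlin</author>"
    "<publisher>Addison Wesley</publisher></book>"
    "</catalog>"
)

# $cat/catalog/book[publisher = "Longstreet Press"]/title  (CAT is the catalog element)
titles = [t.text for t in CAT.findall("./book[publisher='Longstreet Press']/title")]


def first_author_and_title(cat, publisher):
    """for $b in ...book[publisher = P] return a new element { $b/author[1], $b/title }"""
    result = []
    for b in cat.findall(f"./book[publisher='{publisher}']"):
        new = ET.Element("book")  # hypothetical name for the constructed element
        for child in (b.find("author"), b.find("title")):  # find() returns the first match
            node = copy.deepcopy(child)  # constructors copy content: new node identity
            node.tail = None
            new.append(node)
        result.append(new)
    return result


print(titles)  # ['No Such Thing As A Bad Day']
print(ET.tostring(first_author_and_title(CAT, "Addison Wesley")[0], encoding="unicode"))
# <book><author>Don Chamberlin</author><title>XQuery from the Experts</title></book>
```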
TeXQuery: Full-Text Extensions
Text search & querying of structured content.
Limited support in XQuery 1.0: string operators with collation sequences, e.g.
$cat//book[contains(review/text(), "two thumbs up")]
TeXQuery adds:
- stop words, proximity searching, ranking. Ex: "Tony Blair" within two words of "George Bush"
- phrases that span tags and annotations. Ex: match "Mr. English sponsored the bill" in the marked-up text "Mr. English for himself and Mr. Coyne sponsored the bill in the Committee for Financial Services"
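XQuery 1.0's contains() only tests substrings. A proximity predicate such as "Tony Blair" within two words of "George Bush" needs word positions. The sketch below tokenizes the text and counts the words between two phrases, in either order. It illustrates the idea only: it is not TeXQuery syntax, it ignores stop words, stemming and markup boundaries, and "within two words" is assumed to mean at most two intervening words.

```python
import re


def phrase_starts(tokens, phrase):
    """Token positions where the word sequence `phrase` begins."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def within(text, p1, p2, max_gap):
    """True if p1 and p2 occur with at most max_gap words between them (either order)."""
    tokens = re.findall(r"\w+", text.lower())
    a, b = p1.lower().split(), p2.lower().split()
    for i in phrase_starts(tokens, a):
        for j in phrase_starts(tokens, b):
            # words strictly between the end of the first phrase and the start of the second
            gap = j - (i + len(a)) if j >= i else i - (j + len(b))
            if 0 <= gap <= max_gap:
                return True
    return False


print(within("Tony Blair met George Bush", "Tony Blair", "George Bush", 2))  # True
print(within("Tony Blair later flew to meet George Bush", "Tony Blair", "George Bush", 2))  # False
```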
XML Schema Languages
Many variants (DTDs, XML Schema, RELAX NG, XDuce, …) with similar goals, namely to define:
- types of literal (terminal) data
- names of elements & attributes
- "vertical" & "horizontal" structure of documents
XQuery is designed to support (all of) XML Schema:
- structural & name constraints over types
- regular tree expressions over elements, attributes and atomic types
XQuery Update Facility (XUF) 1.0, W3C Recommendation
Internal primitives: upd:insertBefore, upd:insertAfter, upd:insertInto, upd:insertIntoAsLast, upd:insertAttributes, upd:delete, upd:replaceValue, upd:rename
Pending update list concept: update primitives are collected first and applied together by upd:applyUpdates
Example
insert (an item element with the values Brazil, 200, XML in a nutshell, Credit Card, Personal check, Will ship internationally)
as last into fn:doc("xmark.xml")/site/regions/samerica
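The pending update list can be sketched as follows: update expressions only append primitives to a list, the rest of the query keeps reading the unchanged snapshot, and a final applyUpdates step executes everything. The string "nodes", the path-keyed dictionary and all helper names are hypothetical; real XQUF primitives operate on node identities, and the specification fixes a more detailed application order than the two phases used here.

```python
def make_doc():
    # Hypothetical, highly simplified document: target path -> list of child "nodes".
    return {"/site/regions/samerica": ["item0"]}


def insert_as_last(pul, target, node):
    pul.append(("insertIntoAsLast", target, node))  # recorded, NOT applied yet


def delete(pul, target, node):
    pul.append(("delete", target, node))  # recorded, NOT applied yet


def apply_updates(doc, pul):
    # Simplified ordering: all inserts first, then all deletes (XQUF also applies deletes last).
    for op, target, node in pul:
        if op == "insertIntoAsLast":
            doc[target].append(node)
    for op, target, node in pul:
        if op == "delete":
            doc[target].remove(node)


doc, pul = make_doc(), []
insert_as_last(pul, "/site/regions/samerica", "item1")
delete(pul, "/site/regions/samerica", "item0")
seen = len(doc["/site/regions/samerica"])  # the query still sees its snapshot: 1 child
apply_updates(doc, pul)                    # like upd:applyUpdates at the end of the query
print(seen, doc["/site/regions/samerica"])  # 1 ['item1']
```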
XSLT vs. XQuery
XSLT 1.0: XML → XML, HTML, text
- loosely-typed scripting language
- format XML in HTML for display in a browser
- must be highly tolerant of variability/errors in data
XQuery 1.0: XML → XML
- strongly-typed query language
- large-scale database access
- must guarantee safety/correctness of operations on data
Over time, XSLT & XQuery may both serve the needs of many application domains; XQuery will become a hidden, commodity language.
XSLT vs. XQuery Updates
XSLT:
- transformations, with complexity at least linear in the document size
- large cost when making a small update to a huge document
- effective when the full document is transformed
XQuery Update Facility:
- concurrent access, transactions ("ACID" properties)
- systems optimized for small, non-conflicting transactions
- systems break down when the update volume is large
XQuery Systems: 2 Approaches
Native: the tree is the basic data structure
- tree-storage manager
- tree-query processing (algebra)
- tree-query optimization
Relational: leverage RDBMS storage, query processing & optimization
- XML shredded into tables
- XQuery translated into SQL
Systems: X-Hive, Timber, VIPER (IBM), BDB-XML, Galax (native); Microsoft, Oracle, MonetDB/XQuery (relational)
MonetDB/XQuery
MonetDB + the Pathfinder project (Torsten Grust, Jens Teubner, Jan Rittinger, Maurice van Keulen)
Open source (Mozilla license), download at monetdb-xquery.org
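XPath Accelerator [SIGMOD02]
Node-based relational encoding of XQuery's data model: each node is stored with its preorder rank (pre) and postorder rank (post).
node: a b c d e f g h i j
pre:  0 1 2 3 4 5 6 7 8 9
post: 9 3 2 0 1 8 4 7 5 6
Relative to a context node, the pre/post plane splits into four regions: descendant (larger pre, smaller post), ancestor (smaller pre, larger post), following (larger pre, larger post) and preceding (smaller pre, smaller post).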
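The encoding can be reproduced for the example tree a(b(c(d, e)), f(g, h(i, j))), which is the tree implied by the pre/post table above. The sketch assigns both ranks in one depth-first traversal and evaluates each axis as a region query on the pre/post plane; the names are illustrative, and a real system would use an index rather than scanning every node.

```python
# The example tree a(b(c(d, e)), f(g, h(i, j))) as (name, [children]) tuples.
TREE = ("a", [("b", [("c", [("d", []), ("e", [])])]),
              ("f", [("g", []), ("h", [("i", []), ("j", [])])])])


def prepost(tree):
    """Return {name: (pre, post)} with pre = preorder rank, post = postorder rank."""
    ranks, counter = {}, {"pre": 0, "post": 0}

    def visit(node):
        name, children = node
        pre = counter["pre"]
        counter["pre"] += 1
        for child in children:
            visit(child)
        ranks[name] = (pre, counter["post"])  # post rank is assigned after all children
        counter["post"] += 1

    visit(tree)
    return ranks


# Each axis is a rectangular region of the pre/post plane relative to the context node.
AXES = {
    "descendant": lambda p, q, pv, qv: p > pv and q < qv,
    "ancestor":   lambda p, q, pv, qv: p < pv and q > qv,
    "following":  lambda p, q, pv, qv: p > pv and q > qv,
    "preceding":  lambda p, q, pv, qv: p < pv and q < qv,
}


def axis(ranks, v, name):
    pv, qv = ranks[v]
    hits = [n for n, (p, q) in ranks.items() if AXES[name](p, q, pv, qv)]
    return sorted(hits, key=lambda n: ranks[n][0])  # document order = pre order


R = prepost(TREE)
print(R["f"], axis(R, "f", "descendant"))  # (5, 8) ['g', 'h', 'i', 'j']
```

Together with the context node itself, the four regions partition the document, which is why a single two-dimensional index serves all four major axes.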
XPath Evaluation (SQL)
Example query: /descendant::open_auction[./bidder]/annotation
SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post
  AND b.level = oa.level + 1
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post
  AND a.level = oa.level + 1
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
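To see this SQL translation run end to end, the sketch shreds a small hypothetical document into a doc(pre, post, level, name, kind) table in an in-memory SQLite database and runs the query. In this toy encoding the root row (pre = 0) is the site element rather than a separate document node. DISTINCT matters here: the first open_auction has two bidders, so without it its annotation would be returned twice.

```python
import sqlite3

# Hypothetical toy document as (tag, [children]); only element nodes are shredded.
TREE = ("site", [
    ("open_auction", [("bidder", []), ("bidder", []), ("annotation", [])]),
    ("open_auction", [("annotation", [])]),
])


def shred(tree):
    """Encode every element as a (pre, post, level, name, kind) row."""
    rows, counter = [], {"pre": 0, "post": 0}

    def visit(node, level):
        name, children = node
        pre = counter["pre"]
        counter["pre"] += 1
        for child in children:
            visit(child, level + 1)
        rows.append((pre, counter["post"], level, name, "elem"))
        counter["post"] += 1

    visit(tree, 0)
    return rows


QUERY = """
SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post AND b.level = oa.level + 1
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post AND a.level = oa.level + 1
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
"""


def run(tree):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc (pre INTEGER, post INTEGER, level INTEGER, name TEXT, kind TEXT)")
    conn.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", shred(tree))
    result = [pre for (pre,) in conn.execute(QUERY)]
    conn.close()
    return result


print(run(TREE))  # [4]: only the first open_auction has a bidder
```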
Loop-lifted Staircase Join [SIGMOD06], building on Staircase Join [VLDB03]
pre|post are not random numbers: exploit the tree properties encoded in them.
Given the document and a list of context nodes, the join proceeds as seek, scan, skip, seek, scan, skip, ...:
- pruning: reduce the context set a priori
- partitioning: a single sequential pass over the document
- skipping: avoid touching node ranges that cannot contain results
The result is duplicate-free and in document order.
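A minimal sketch of the descendant step, using the pre/post ranks of the example tree a(b(c(d, e)), f(g, h(i, j))): pruning drops context nodes that lie inside another context node, and for each remaining node the join seeks to it, scans its contiguous run of descendants and skips ahead. This is the plain staircase join for one context set; the loop-lifted variant processes the context sets of many iterations in one pass, which this sketch does not attempt.

```python
# post rank indexed by pre rank, for the example tree a(b(c(d, e)), f(g, h(i, j))).
POST = [9, 3, 2, 0, 1, 8, 4, 7, 5, 6]


def staircase_descendant(post, context):
    """Descendant step for a set of context nodes given as pre ranks."""
    # Pruning: a context node inside an earlier kept node contributes no new descendants.
    pruned = []
    for c in sorted(set(context)):
        if pruned and post[c] < post[pruned[-1]]:
            continue
        pruned.append(c)
    result = []
    for c in pruned:
        v = c + 1  # seek: jump straight to the first candidate after c
        while v < len(post) and post[v] < post[c]:
            result.append(v)  # scan: the descendants of c are contiguous in pre order
            v += 1
        # skip: nodes up to the next pruned context node cannot be descendants
    return result  # duplicate-free and sorted by pre, i.e. document order


print(staircase_descendant(POST, [1, 2, 7]))  # [2, 3, 4, 8, 9], i.e. c, d, e, i, j
```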
Relational Algebra
Sequence Representation
- sequence = table of items
- add a pos column for maintaining order
- ignore polymorphism for the moment
For-loops: the iter column
Loop-lifting
Full Example: join, calc, project
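The iter|pos|item representation and loop-lifting can be sketched for the query for $x in (10, 20, 30) return $x + 1. Python lists stand in for relational tables, the helper names are hypothetical, and the comments note which relational operator each step corresponds to (cartesian product, join, calc, project/rank). This is a sketch of the idea, not Pathfinder's actual plan.

```python
# A sequence is a table of (iter, pos, item) rows; at top level there is one iteration, iter = 1.
def enter_for(seq):
    """for $x in seq: every (iter, pos) binding becomes a new inner iteration."""
    rows = sorted(seq)
    x = [(i, 1, item) for i, (_, _, item) in enumerate(rows, start=1)]  # $x: one item per iter
    back = {i: outer_iter for i, (outer_iter, _, _) in enumerate(rows, start=1)}
    return x, back


def lift_const(loop, value):
    """Loop-lifting: a constant is materialized once per live iteration (cartesian product)."""
    return [(it, 1, value) for it in loop]


def plus(a, b):
    """$a + $b for singletons: join on iter, then compute the new item (calc)."""
    b_item = {it: item for it, _, item in b}
    return [(it, 1, item + b_item[it]) for it, _, item in a]


def leave_for(body, back):
    """Map inner iterations back to the outer one; new pos = rank by (inner iter, pos)."""
    out, next_pos = [], {}
    for it, _, item in sorted(body):
        outer = back[it]
        next_pos[outer] = next_pos.get(outer, 0) + 1
        out.append((outer, next_pos[outer], item))
    return out


# for $x in (10, 20, 30) return $x + 1
seq = [(1, 1, 10), (1, 2, 20), (1, 3, 30)]
x, back = enter_for(seq)
loop = [it for it, _, _ in x]
result = leave_for(plus(x, lift_const(loop, 1)), back)
print(result)  # [(1, 1, 11), (1, 2, 21), (1, 3, 31)]
```

The key design point is that the loop body is evaluated once, set-at-a-time, over all iterations, instead of once per binding of $x.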
XQuery on SQL Hosts [VLDB04]
XQuery construct → relational mapping:
- sequence construction → A union B
- if-then-else → select(cond=true, B) union select(cond=false, C)
- for-loops → cartesian product
- calculations → project(A, x=expr(Y1, ..., Yn))
- list functions, e.g. fn:first() → select(A, pos=1)
- element construction → updates in temporary tables
- XPath steps → staircase join
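The if-then-else row of this mapping, in loop-lifted form, can be sketched as follows: the condition is a table with one boolean per iteration, and the result is the then-branch rows of the true iterations united with the else-branch rows of the false iterations. The names and the example query are hypothetical; for brevity both branches are computed for every iteration here, whereas a real plan would evaluate each branch only for its own iterations.

```python
def if_then_else(cond, then_rows, else_rows):
    """Loop-lifted: if $c then B else C.
    cond: {iter: bool}; then_rows / else_rows: lists of (iter, pos, item) rows."""
    true_iters = {it for it, c in cond.items() if c}
    false_iters = {it for it, c in cond.items() if not c}
    # select(cond=true, B) union select(cond=false, C)
    rows = ([r for r in then_rows if r[0] in true_iters]
            + [r for r in else_rows if r[0] in false_iters])
    return sorted(rows, key=lambda r: (r[0], r[1]))


# for $x in (1, 2, 3) return if ($x mod 2 = 1) then "odd" else ("even", $x)
cond = {1: True, 2: False, 3: True}
then_rows = [(it, 1, "odd") for it in cond]
else_rows = [r for it in cond for r in ((it, 1, "even"), (it, 2, it))]
print(if_then_else(cond, then_rows, else_rows))
# [(1, 1, 'odd'), (2, 1, 'even'), (2, 2, 2), (3, 1, 'odd')]
```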
XMark Query 8
let $auction := doc("auctions.xml") return
for $p in $auction/site/people/person
let $a := for $t in $auction/site/closed_auctions/closed_auction
          where $t/buyer/@person = $p/@id
          return $t
return <item person="{$p/name/text()}">{count($a)}</item>
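Read literally, Query 8 rescans all closed auctions for every person. After loop-lifting, the dependency on $p becomes an iter column, and the correlated inner loop can be evaluated as a join grouped per person. The sketch contrasts the two strategies on hypothetical toy data (names and ids are invented for illustration). Persons who bought nothing must still appear with count 0, so the join has to behave like an outer join.

```python
from collections import Counter

# Hypothetical toy data: (person id, name) pairs and closed auctions with a buyer id.
PERSONS = [("person0", "Alice"), ("person1", "Bob"), ("person2", "Carol")]
CLOSED = [{"buyer": "person0"}, {"buyer": "person2"}, {"buyer": "person0"}]


def q8_nested_loop(persons, closed):
    """Literal reading: for each $p, rescan every closed_auction (|persons| x |closed| work)."""
    return [(name, sum(1 for t in closed if t["buyer"] == pid)) for pid, name in persons]


def q8_join(persons, closed):
    """Decorrelated: one pass groups auctions by buyer, then each person probes the counts."""
    counts = Counter(t["buyer"] for t in closed)
    # Persons without purchases must still appear with count 0 (outer-join behaviour).
    return [(name, counts.get(pid, 0)) for pid, name in persons]


print(q8_join(PERSONS, CLOSED))  # [('Alice', 2), ('Bob', 0), ('Carol', 1)]
```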
Peephole Optimization: plan simplification
Performance Evaluation
- Extensive performance evaluation on XMark, document sizes 110 KB, 1 MB, 11 MB, 110 MB, 1.1 GB and 11 GB; systems: MonetDB/XQuery, Galax 0.6, X-Hive 6.0, Berkeley DB XML 2.0, eXist; 8 GB RAM
- Extensive literature overview of XMark performance: IPSI-XQ v1.1.1b, Dynamic Interval Encoding, Kweelt, QuiP, FluX, TurboXPath, Timber, Qizx/Open (version 0.4/p1), Saxon (version 8.0), BEA/XQRL, VX; crude comparison (normalized by CPU SPECint)
XMark Benchmark [SIGMOD 2006]
Figure: XMark results on 1 MB and 1 GB XML documents for Galax, X-Hive, Berkeley DB XML and eXist.
XML Annotation Use Cases
- multimedia objects (MPEG-7, SMIL): which text is spoken in which video shot?
- hard-drive dumps (forensic data analysis): which telephone numbers occur in e-mails from X at date Y?
- DNA sequences (bioinformatics): who annotated the same DNA region?
- text (natural language processing): return phrases that have "Eiffel Tower" as subject and contain "meter" in the object
StandOff Joins
Loop-lifted StandOff Join
Figure: annotation regions r1, c1, c2, r2, c3, r3, c4, r4 over the base data; input iter1 → r1, result iter1 → r4.
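In StandOff annotation, markup refers to regions (start, end) of a shared base object, such as a byte range of a disk image or a time span of a video, instead of being nested around it. A core StandOff join asks which candidate regions lie inside some context region. Below is a sort-based sweep sketch plus a wrapper that runs it per iteration. The function names are hypothetical, regions are assumed to be closed integer intervals, and the per-iteration wrapper is simpler than a true single-pass loop-lifted join.

```python
from collections import defaultdict


def contained(contexts, candidates):
    """Candidate regions (start, end) lying inside at least one context region."""
    ctx = sorted(contexts)
    result, i, max_end = [], 0, float("-inf")
    for start, end in sorted(candidates):
        # Absorb every context region that starts at or before this candidate.
        while i < len(ctx) and ctx[i][0] <= start:
            max_end = max(max_end, ctx[i][1])
            i += 1
        if max_end >= end:  # some context starts no later and ends no earlier
            result.append((start, end))
    return result


def loop_lifted_contained(contexts, candidates):
    """contexts: (iter, start, end) rows; returns (iter, start, end) result rows."""
    by_iter = defaultdict(list)
    for it, start, end in contexts:
        by_iter[it].append((start, end))
    return [(it, s, e) for it in sorted(by_iter)
            for s, e in contained(by_iter[it], candidates)]


CANDIDATES = [(2, 5), (8, 12), (21, 29), (31, 35)]
print(loop_lifted_contained([(1, 0, 10), (2, 20, 30)], CANDIDATES))
# [(1, 2, 5), (2, 21, 29)]
```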
Evaluation
BioInformatics RDF
Excerpt: … ]> <rdf:RDF xmlns:H3K4Me3="ENCODEChIPchipDataModel.owl#" xmlns:TFBS="TFBSdataModel.owl#" xmlns:rdf="…" xmlns:rdfs="…" xmlns:xsd="…"> … chr …
Goals of XRPC
XQuery → distributed XQuery (or even P2P)
Design Goals
- Orthogonal: all XQuery semantics, typing and validation, also including the XQuery Update Facility
- Interoperable: a network-layer specification! SOAP-based
- Potential for efficiency: the protocol exposes set-at-a-time opportunities
- Keep it simple: a minimal extension
XRPC Syntax Extension
execute at { Expr } { FunApp(ParamList) }
XRPC: XQuery + RPC
Distributed XQuery with XRPC: Possibilities
Peer1 holds doc1.xml, Peer2 holds doc2.xml, and the query runs at Peer3: for each person from Vienna in doc1.xml who has bought something (item) from doc2.xml, return... something.
Data Shipping Strategy: Peer3 calls doc("doc1.xml") and doc("doc2.xml"), so each peer ships its FULL DOC, and Peer3 evaluates //item etc. locally.
Distributed XQuery with XRPC: Possibilities
Function Shipping using XRPC: a Distributed Semi-Join!
for $person in execute at {Peer1} getViennaPersons()
return execute at {Peer2} getPersonItems(...)
- getViennaPersons(): evaluated at Peer1
- getPersonItems($pid): evaluated at Peer2, selecting items with [... = $pid]
Distributed XQuery with XRPC: Possibilities
for $person in execute at {Peer1} getViennaPersons()
return execute at {Peer2} getPersonItems(...)
getViennaPersons() runs at {Peer1}; the loop then calls {Peer2} once per $person.
Distributed Semi-Join ➠ push loop-dependent parameters:
- Selection Pushdown
- Bulk RPC
- Selection Semi-Join
- Result Semi-Join
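Inside the for loop, a naive implementation sends one XRPC request to Peer2 per $person. Bulk RPC instead ships all loop-dependent parameters in one request and gets one result sequence per iteration back. The class below is a hypothetical in-process stand-in for Peer2 (no SOAP, no network) that only counts round trips, to make the difference visible.

```python
class PeerStub:
    """Hypothetical in-process stand-in for Peer2 that counts request round trips."""

    def __init__(self, purchases):
        self.purchases = purchases  # person id -> items bought (toy data)
        self.round_trips = 0

    def get_person_items(self, pid):
        self.round_trips += 1  # one request per call
        return list(self.purchases.get(pid, []))

    def get_person_items_bulk(self, pids):
        self.round_trips += 1  # one request carrying every iteration's parameter
        return [list(self.purchases.get(pid, [])) for pid in pids]


def per_iteration(peer, persons):
    return [peer.get_person_items(p) for p in persons]


def bulk(peer, persons):
    return peer.get_person_items_bulk(persons)


data = {"p1": ["item7"], "p3": ["item2", "item9"]}
a, b = PeerStub(data), PeerStub(data)
persons = ["p1", "p2", "p3"]  # e.g. the result of getViennaPersons() (toy ids)
print(per_iteration(a, persons) == bulk(b, persons), a.round_trips, b.round_trips)  # True 3 1
```

The result per iteration is identical; only the number of network messages changes, which is what makes the protocol's set-at-a-time design worthwhile for loops with many iterations.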
Remote Function Calls
- Functions are the unit of distributed execution in XRPC: defined in an XQuery module, identified by a URL
- XQuery is a compositional, functional language: each subexpression is a function of its free variables, so each such function can be executed with XRPC on any peer
- Automatic query decomposition is described in ICDE 2009: "Efficient Distribution of Full-Fledged XQuery", Zhang, Tang, Boncz
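Why any subexpression can be shipped: once its free variables are known, it can be wrapped in a function that takes those variables as parameters. A minimal free-variable computation over a tiny hypothetical expression tree (only variables, constants, calls, for and let) looks like this. It is not the decomposition algorithm of the ICDE 2009 paper, and the call getPersonItems($person) is a simplified stand-in for the elided argument above.

```python
def free_vars(e):
    """Free variables of a tiny expression tree:
    ("var", name) | ("const", value) | ("call", fname, [args]) | ("for" | "let", var, bound, body)"""
    kind = e[0]
    if kind == "var":
        return {e[1]}
    if kind == "const":
        return set()
    if kind == "call":
        return set().union(*(free_vars(a) for a in e[2]))
    if kind in ("for", "let"):
        _, var, bound, body = e
        return free_vars(bound) | (free_vars(body) - {var})  # var is bound only inside body
    raise ValueError(f"unknown expression kind: {kind}")


# for $person in getViennaPersons() return getPersonItems($person)
body = ("call", "getPersonItems", [("var", "person")])
query = ("for", "person", ("call", "getViennaPersons", []), body)
print(free_vars(body))   # {'person'}: ship as a function with parameter $person
print(free_vars(query))  # set(): closed, so it can run on any peer as a whole
```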
XML Node Values (W3C spec)
Indexes for XML: a Challenge
- Building an index on all nodes that have a value of a certain type is easy, e.g. a B-tree containing all node-ids that have a numeric value
- However, how to maintain the index when a node changes? A node change affects the value of all its ancestors, so the root node always changes: a locking hotspot. Moreover, computing the string value of the root is... very expensive
- Not well thought through by the W3C committee
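The maintenance problem follows from how the XPath data model defines an element's string value: the concatenation of all descendant text nodes in document order. The sketch below (toy document and names are hypothetical) updates one text node and lists whose string value changed: the node itself and every ancestor up to the root. A value index keyed on string values would need an entry update for each of them.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """String value of an element: all descendant text concatenated in document order."""
    return "".join(elem.itertext())


root = ET.fromstring("<site><person><name>Ann</name><age>41</age></person></site>")
before = {e.tag: string_value(e) for e in root.iter()}  # tags are unique in this toy doc
root.find("./person/age").text = "42"                   # change a single text node
after = {e.tag: string_value(e) for e in root.iter()}
changed = sorted(tag for tag in before if before[tag] != after[tag])
print(changed)  # ['age', 'person', 'site']: the node and all its ancestors
```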
Questions?
[SIGMOD 2009] R. Abdel Kader, P. A. Boncz, S. Manegold, M. van Keulen. ROX: Run-time Optimization of XQueries.
[DataX 2009] L. Sidirourgos, P. A. Boncz. Generic and Updatable XML Value Indices Covering Equality and Range Lookups.
[ICDE 2009] Y. Zhang, N. Tang, P. A. Boncz. Efficient Distribution of Full-Fledged XQuery.
[VLDB 2007] Y. Zhang, P. A. Boncz. XRPC: Interoperable and Efficient Distributed XQuery.
[Digital Forensics 2007] W. Alink, R. Bhoedjang, A. P. de Vries, P. A. Boncz. XIRAF: Ultimate Forensic Querying.
[SIGMOD 2006] P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine.