MonetDB/XQuery, http://monetdb.cwi.nl, BioWise InfoMgmt 2009. Peter Boncz (CWI Amsterdam): Querying XML Data Sources using MonetDB/XQuery.

Presentation transcript:


Overview
- Crash course XML and XQuery + XQ Update Facility
- XML Databases
- MonetDB/XQuery
  - Open-source XML database by me and my group at CWI
  - Start downloading now: monetdb.cwi.nl
- Extensions
  - Keyword Search in XML (PF/Tijah)
  - StandOff Annotation
  - Distributed XML (XRPC)
- More Internals
  - XML Indexing and why it is hard
  - Runtime Query Optimization


What is XML?
The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
Base specifications:
- XML 1.0, W3C Recommendation Feb '98
- Namespaces, W3C Recommendation Jan '99

A Little Bit Of History
Database world
- 1980 relational databases
- 1990 nested relational model and object oriented databases
- 2000 semi-structured databases
Documents world
- 1974 SGML (Standard Generalized Markup Language)
- 1990 HTML (Hypertext Markup Language)
- 1992 URL (Uniform Resource Locator)
Data + documents = information
- 1996 XML (Extensible Markup Language)
- URI (Uniform Resource Identifier)

XML: A First Look
XML document describing catalog of books
<catalog>
  <book isbn="ISBN ...">
    <title>No Such Thing as a Bad Day</title>
    <author>Hamilton Jordan</author>
    <publisher>Longstreet Press, Inc</publisher>
    <price currency="USD">...</price>
    <review>
      <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers... <title>No Such Thing as a Bad Day</title> is warmly recommended.
    </review>
  </book>
</catalog>
The review element has mixed content.

XML Semantics: a Tree!
The catalog document above, seen as a tree of element, attribute and text nodes:
catalog (element node)
  book (element node)
    isbn = "ISBN ..." (attribute node)
    title (element node)
      "No Such Thing As A Bad Day" (text node)
    author (element node)
      "Hamilton Jordan" (text node)
    publisher (element node)
      "Longstreet Press Inc." (text node)
    price (element node)
      currency = "USD" (attribute node)
    review (element node)
      reviewer (element node)
      title (element node)

XML Food Chain
- Data Exchange Format: XML
- Schema, Formatting, Querying: XML Schema, XPath, XSLT, XQuery, … (W3C Recommendations)
- Web Services Infrastructure: SOAP, WSDL, UDDI
- Business Process Frameworks: BPEL, BPML, ebXML, RosettaNet, CommerceXML
- Vendors: BEA, IBM WebSphere, WebMethods…
Enabling Technology:
- Does NOT solve hard problems
- Removes excuses for ignoring them

Killer XML advantages
- Code/schema/data independence
- Covers the continuous spectrum from totally structured data to documents: from data management to information management
- Unique model for representing data, metadata and code

XQuery 1.0
Functional, strongly-typed query language
XQuery 1.0 =
- XPath 2.0 for navigation, selection, extraction
- + A few more expressions
  - For-Let-Where-Order By-Return (FLWOR)
  - XML construction
  - Operators on types
- + User-defined functions & modules
  - Modularize large queries
  - Process recursive data
- + Strong typing
  - Checks values of required type (operator, function)
  - Guarantees result value instance of output type
  - Enforced statically or dynamically


FLWR ("Flower") Expressions
XQuery:
  FOR ... sequence expression
  LET ... variable definition
  WHERE ... condition
  ORDER BY ...
  RETURN ... result expression
SQL, for comparison:
  SELECT ... result expressions
  FROM ... tables
  WHERE ... condition
  GROUP BY ...
  ORDER BY ...

Navigation, Selection, Extraction
Titles of all books published by Longstreet Press:
  $cat/catalog/book[publisher="Longstreet Press"]/title
  Result: No Such Thing As A Bad Day
Publications with Don Chamberlin as author or editor:
  $cat//*[(author|editor) = "Don Chamberlin"]
  Result: XQuery from the Experts …, XQuery Formal Semantics …
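The two path expressions can be approximated in Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset. This is a sketch under assumptions: the sample catalog is invented for illustration (the `spec` element name in particular is hypothetical), and the union `(author|editor)` has to be spelled out in Python because ElementTree has no union operator.

```python
import xml.etree.ElementTree as ET

# Illustrative catalog; element names beyond those on the slides are assumptions.
CATALOG = """<catalog>
  <book><title>No Such Thing As A Bad Day</title><author>Hamilton Jordan</author><publisher>Longstreet Press</publisher></book>
  <book><title>XQuery from the Experts</title><author>Don Chamberlin</author><publisher>Addison Wesley</publisher></book>
  <spec><title>XQuery Formal Semantics</title><editor>Don Chamberlin</editor></spec>
</catalog>"""

cat = ET.fromstring(CATALOG)  # cat is the <catalog> element itself


def titles_by_publisher(catalog, publisher):
    # Counterpart of /catalog/book[publisher="..."]/title
    path = "./book[publisher='%s']/title" % publisher
    return [t.text for t in catalog.findall(path)]


def publications_by(catalog, person):
    # Counterpart of //*[(author|editor) = "..."]: any element that has an
    # author or editor child whose text equals the given name.
    hits = []
    for elem in catalog.iter():
        if any(c.tag in ("author", "editor") and c.text == person for c in elem):
            hits.append(elem.findtext("title"))
    return hits


print(titles_by_publisher(cat, "Longstreet Press"))
print(publications_by(cat, "Don Chamberlin"))
```

The predicate in square brackets filters the context nodes before the final step extracts the title, which is the same navigate, select, extract order as in the XPath expressions.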

Transformation & Construction
First author & title of books published by A/W:
  for $b in $cat//book[publisher = "Addison Wesley"]
  return { $b/author[1], $b/title }
Result: Don Chamberlin, XQuery from the Experts
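The for/return query above iterates over the selected books and constructs one new element per iteration. A minimal Python sketch of the same shape follows; the sample data, the second author "A. N. Other" and the wrapper element name `result` are assumptions made for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative data; the second author is a placeholder to show author[1].
CATALOG = """<catalog>
<book><title>XQuery from the Experts</title><author>Don Chamberlin</author><author>A. N. Other</author><publisher>Addison Wesley</publisher></book>
<book><title>No Such Thing As A Bad Day</title><author>Hamilton Jordan</author><publisher>Longstreet Press</publisher></book>
</catalog>"""


def first_author_and_title(catalog, publisher):
    results = []
    for b in catalog.iter("book"):                    # for $b in $cat//book
        if b.findtext("publisher") != publisher:      # [publisher = "..."]
            continue
        out = ET.Element("result")                    # hypothetical constructor name
        out.append(copy.deepcopy(b.find("author")))   # $b/author[1]
        out.append(copy.deepcopy(b.find("title")))    # $b/title
        results.append(ET.tostring(out, encoding="unicode"))
    return results


cat = ET.fromstring(CATALOG)
print(first_author_and_title(cat, "Addison Wesley"))
```

The nodes are deep-copied before they are placed in the new element, which mirrors XQuery's rule that element construction copies its content rather than moving it.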

TeXQuery: Full-text extensions
- Text search & querying of structured content
- Limited support in XQuery 1.0: string operators with collation sequences
  $cat//book[contains(review/text(), "two thumbs up")]
- Stop words, proximity searching, ranking
  Ex: "Tony Blair" within two words of "George Bush"
- Phrases that span tags and annotations
  Ex: Match "Mr. English sponsored the bill" in: Mr. English for himself and Mr. Coyne sponsored the bill in the Committee for Financial Services

XML Schema Languages
- Many variants: DTDs, XML Schema, RELAX NG, XDuce
- … with similar goals, to define:
  - Types of literal (terminal) data
  - Names of elements & attributes
  - "Vertical" & "horizontal" structure of documents
- XQuery is designed to support (all of) XML Schema
  - Structural & name constraints over types
  - Regular tree expressions over elements, attributes, atomic types

XQuery Update Facility (XUF)
V1.0 W3C Recommendation
Internal primitives:
- upd:insertBefore
- upd:insertAfter
- upd:insertInto
- upd:insertIntoAsLast
- upd:insertAttributes
- upd:delete
- upd:replaceValue
- upd:rename
Pending update list concept:
- upd:applyUpdates
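The pending update list idea can be sketched in a few lines of Python: update expressions only record primitives, and nothing changes until the list is applied at the end. This is an illustrative sketch, not the W3C algorithm and not MonetDB code. The function names are hypothetical stand-ins for three of the `upd:` primitives, and the application order (deletes last) is a simplification of the order `upd:applyUpdates` prescribes.

```python
import xml.etree.ElementTree as ET

# Each helper only *records* a primitive; the tree is left untouched.
def insert_into_as_last(pul, target, node):
    pul.append(("insertIntoAsLast", target, node))

def replace_value(pul, target, text):
    pul.append(("replaceValue", target, text))

def delete(pul, target):
    pul.append(("delete", target, None))

def apply_updates(root, pul):
    """Apply all recorded primitives in one step (simplified ordering)."""
    # ElementTree has no parent pointers, so build a child -> parent map
    # from the tree as it was before any update is applied.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    order = {"insertIntoAsLast": 0, "replaceValue": 1, "delete": 2}
    for kind, target, arg in sorted(pul, key=lambda u: order[u[0]]):
        if kind == "insertIntoAsLast":
            target.append(arg)
        elif kind == "replaceValue":
            target.text = arg
        else:  # delete
            parent_of[target].remove(target)

root = ET.fromstring("<samerica><item><name>old</name></item></samerica>")
pul = []
insert_into_as_last(pul, root, ET.fromstring("<item><name>XML in a nutshell</name></item>"))
delete(pul, root[0])
names_before = [i.findtext("name") for i in root]   # still the old snapshot
apply_updates(root, pul)
names_after = [i.findtext("name") for i in root]
print(names_before, names_after)
```

Because every primitive is evaluated against the same snapshot, the order in which the update expressions were written does not affect the outcome.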

Example
insert <item> … Brazil … 200 … XML in a nutshell … Credit Card, Personal check … Will ship internationally … </item>
as last into fn:doc("xmark.xml")/site/regions/samerica

XSLT vs. XQuery
XSLT 1.0: XML → XML, HTML, Text
- Loosely-typed scripting language
- Format XML in HTML for display in browser
- Must be highly tolerant of variability/errors in data
XQuery 1.0: XML → XML
- Strongly-typed query language
- Large-scale database access
- Must guarantee safety/correctness of operations on data
Over time, XSLT & XQuery may both serve needs of many application domains
XQuery will become a hidden, commodity language

XSLT vs. XQuery Updates
XSLT
- Transformations, complexity at least linear
- Large cost when making a small update to a huge document
- Effective when full document is transformed
XQuery Update Facility
- Concurrent access, transactions ("ACID" properties)
- Systems optimized for small non-conflicting transactions
- Systems break down when update volume is large


XQuery Systems: 2 Approaches
Native
- Tree is basic data structure
- tree-storage manager
- tree-query processing (algebra)
- tree-query optimization
Relational
- Leverage RDBMS storage, query processing & optimization
- XML shredded into tables
- XQuery translated into SQL
Systems: X-Hive, Timber, VIPER (IBM), BDB-XML, Galax, Microsoft, Oracle, MonetDB/XQuery

MonetDB/XQuery
- open-source, Mozilla license
- download at monetdb-xquery.org
- Pathfinder Project: Torsten Grust, Jens Teubner, Jan Rittinger, Maurice van Keulen

XPath Accelerator [SIGMOD02]
Node-based relational encoding of XQuery's data model
node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6
Relative to a context node, the pre/post plane splits into four regions: descendant, ancestor, following, preceding.
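The pre/post table can be reproduced with one depth-first traversal, and each of the four axes then becomes a pair of range comparisons. The sketch below assumes the tree shape a(b(c(d, e)), f(g, h(i, j))), which is the shape implied by the numbers on the slide; function names are illustrative.

```python
# Tree shape inferred from the pre/post numbers: a(b(c(d, e)), f(g, h(i, j)))
TREE = ("a", [("b", [("c", [("d", []), ("e", [])])]),
              ("f", [("g", []), ("h", [("i", []), ("j", [])])])])


def encode(tree):
    """Return {name: (pre, post)} from a single depth-first traversal."""
    table = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        name, children = node
        pre = counters["pre"]          # rank when the node is entered
        counters["pre"] += 1
        for child in children:
            visit(child)
        table[name] = (pre, counters["post"])  # rank when the node is left
        counters["post"] += 1

    visit(tree)
    return table


def axis(table, context, name):
    """Evaluate one of the four major axes as a window in the pre/post plane."""
    cpre, cpost = table[context]
    tests = {
        "descendant": lambda pre, post: pre > cpre and post < cpost,
        "ancestor":   lambda pre, post: pre < cpre and post > cpost,
        "following":  lambda pre, post: pre > cpre and post > cpost,
        "preceding":  lambda pre, post: pre < cpre and post < cpost,
    }
    keep = tests[name]
    in_doc_order = sorted(table.items(), key=lambda kv: kv[1][0])
    return [n for n, (pre, post) in in_doc_order if keep(pre, post)]


table = encode(TREE)
print(table["f"], axis(table, "f", "descendant"), axis(table, "f", "preceding"))
```

Every node other than the context node falls into exactly one of the four windows, which is why a plain relational range selection is enough to evaluate an axis step.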

XPath evaluation (SQL)
Example query: /descendant::open_auction[./bidder]/annotation
SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post
  AND b.level = oa.level + 1
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post
  AND a.level = oa.level + 1
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
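To see this style of query run, the sketch below shreds a tiny document into a `doc(pre, post, level, name, kind)` table in SQLite and evaluates /descendant::open_auction[./bidder]/annotation with pre/post range predicates. Assumptions: the sample document is invented, only element nodes are stored, the root element takes pre = 0 (standing in for the document node), and SQLite is used purely for illustration rather than MonetDB.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML = """<site><open_auctions>
  <open_auction><bidder/><annotation/></open_auction>
  <open_auction><annotation/></open_auction>
</open_auctions></site>"""


def shred(root):
    """Return (pre, post, level, name, kind) rows for all element nodes."""
    rows, next_post = [], [0]

    def visit(elem, level):
        row = [len(rows), None, level, elem.tag, "elem"]  # pre = arrival order
        rows.append(row)
        for child in elem:
            visit(child, level + 1)
        row[1] = next_post[0]                             # post = departure order
        next_post[0] += 1

    visit(root, 0)
    return [tuple(r) for r in rows]


QUERY = """
SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post AND b.level = oa.level + 1
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post AND a.level = oa.level + 1
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
"""


def run(xml_text):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE doc (pre INTEGER, post INTEGER, level INTEGER,"
               " name TEXT, kind TEXT)")
    db.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)",
                   shred(ET.fromstring(xml_text)))
    return [pre for (pre,) in db.execute(QUERY)]


result = run(XML)
print(result)
```

Only the annotation of the first open_auction qualifies, because the second auction has no bidder child. The child steps combine the descendant window with a level test, so no parent column is needed.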

Staircase Join [VLDB03] and Loop-lifted Staircase Join [SIGMOD06]
pre|post are not random numbers: exploit the tree properties encoded in them
Inputs: the document and a list of context nodes
The join moves through the document as: seek → scan → skip → seek → scan → skip ...
- pruning: reduce the context set a priori
- partitioning: single sequential pass over the document
- skipping: avoid touching node ranges that cannot contain results
- Generates a duplicate-free result in document order
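A minimal sketch of pruning, partitioning and skipping for the descendant axis is given below. It follows the basic staircase join idea only; it is not the loop-lifted variant and not MonetDB's implementation. The `POST` list is assumed to hold the post rank of each node indexed by its pre rank, here using a ten-node example tree a(b(c(d, e)), f(g, h(i, j))).

```python
# post rank indexed by pre rank for the tree a(b(c(d,e)), f(g, h(i,j)))
POST = [9, 3, 2, 0, 1, 8, 4, 7, 5, 6]


def prune_descendant(post, context):
    """Drop context nodes that are descendants of another context node."""
    kept = []
    for c in sorted(context):
        # In pre order, c lies inside the last kept subtree iff its post is smaller.
        if not kept or post[c] > post[kept[-1]]:
            kept.append(c)
    return kept


def staircase_descendant(post, context):
    """Return pre ranks of all descendants, duplicate-free and in document order."""
    result = []
    kept = prune_descendant(post, context)
    for idx, c in enumerate(kept):
        # Partition: this context node owns the range up to the next context node.
        end = kept[idx + 1] if idx + 1 < len(kept) else len(post)
        for v in range(c + 1, end):          # scan
            if post[v] < post[c]:
                result.append(v)
            else:
                break                        # skip: rest of the partition follows c
    return result


def naive_descendant(post, context):
    found = {v for c in context for v in range(len(post))
             if v > c and post[v] < post[c]}
    return sorted(found)


context = [1, 2, 7]  # nodes b, c and h
print(staircase_descendant(POST, context))
```

The naive version needs a set and a sort to remove duplicates and restore document order; the staircase version gets both properties for free from pruning and the single forward pass.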

Relational Algebra

Sequence Representation
- sequence = table of items
- add pos column for maintaining order
- ignore polymorphism for the moment

For-loops: the iter column

Loop-lifting
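A small sketch of the iter|pos|item representation may help here. It evaluates `for $x in (10, 20, 30) return ($x, $x + 1)` by processing all iterations at once as tables. The function names and the exact operator breakdown are illustrative assumptions, not the Pathfinder algebra itself.

```python
# A sequence is a table of (pos, item); inside a for-loop every value lives
# in a table of (iter, pos, item), one iter value per loop iteration.

def lift_for(seq):
    """Bind the loop variable: each item becomes a singleton in its own iter."""
    return [(i, 1, item) for i, (_, item) in enumerate(sorted(seq), start=1)]


def calc(table, fn):
    """Apply a calculation to all iterations at once (a projection)."""
    return [(it, pos, fn(item)) for it, pos, item in table]


def sequence_construct(a, b):
    """(A, B): union both inputs, then renumber pos within each iter."""
    tagged = [(it, 0, pos, item) for it, pos, item in a] + \
             [(it, 1, pos, item) for it, pos, item in b]
    tagged.sort(key=lambda r: r[:3])        # order by iter, input, pos
    out, counts = [], {}
    for it, _, _, item in tagged:
        counts[it] = counts.get(it, 0) + 1
        out.append((it, counts[it], item))
    return out


def back_map(table):
    """Leave the loop: concatenate per-iteration results in iter order."""
    ordered = sorted(table, key=lambda r: (r[0], r[1]))
    return [(pos, item) for pos, (_, _, item) in enumerate(ordered, start=1)]


seq = [(1, 10), (2, 20), (3, 30)]            # the sequence (10, 20, 30)
x = lift_for(seq)                            # $x in every iteration
body = sequence_construct(x, calc(x, lambda v: v + 1))
result = back_map(body)
print(result)
```

No Python loop runs "once per iteration of the XQuery for": each step is a bulk operation over a table, which is the form a relational engine can execute efficiently.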

Full Example
[Figure: relational plan for the example, built from join, calc and project operators]

XQuery On SQL Hosts [VLDB04]
XQuery Construct                 | Relational Mapping
sequence construction            | A union B
if-then-else                     | select(cond=true,B) union select(A=false,C)
for-loops                        | cartesian product
calculations                     | project(A,x=expr(Y1,..Yn))
list functions, e.g. fn:first()  | select(A,pos=1)
element construction             | updates in temporary tables
XPath steps                      | staircase-join

XMark Query 8
let $auction := doc("auctions.xml") return
for $p in $auction/site/people/person
let $a := for $t in $auction/site/closed_auctions/closed_auction
          where $t/buyer/@person = $p/@id
          return $t
return <item person="{$p/name/text()}">{count($a)}</item>
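Read literally, XMark Query 8 is a nested loop: for every person, scan all closed auctions. A relational engine can instead treat it as a join followed by a count. The Python sketch below shows that reading on a tiny invented document. It assumes the standard XMark layout in which `closed_auction/buyer` carries a `person` attribute that references a person's `id`, and it is not the plan MonetDB/XQuery actually generates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for an XMark document; names and ids are invented.
AUCTIONS = """<site>
  <people>
    <person id="p1"><name>Ann</name></person>
    <person id="p2"><name>Bob</name></person>
  </people>
  <closed_auctions>
    <closed_auction><buyer person="p1"/></closed_auction>
    <closed_auction><buyer person="p1"/></closed_auction>
  </closed_auctions>
</site>"""


def q8(site):
    """Per person: number of closed auctions in which that person was the buyer."""
    # Build side of the join: one pass over closed auctions, grouped by buyer.
    bought = Counter(ca.find("buyer").get("person")
                     for ca in site.findall("closed_auctions/closed_auction"))
    # Probe side: one pass over persons; persons without purchases count 0.
    return [(p.findtext("name"), bought[p.get("id")])
            for p in site.findall("people/person")]


site = ET.fromstring(AUCTIONS)
print(q8(site))
```

Persons who bought nothing must still appear with a count of zero, so the join has outer-join semantics on the person side.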

Peephole Optimization
Plan simplification

Performance Evaluation
Extensive performance evaluation on XMark
- data sizes 110KB, 1MB, 11MB, 110MB, 1.1GB, 11GB
- MonetDB/XQuery, Galax 0.6, X-Hive 6.0, Berkeley DB XML 2.0, eXist
- 8GB RAM
Extensive XMark performance literature overview
- IPSI-XQ v1.1.1b, Dynamic Interval Encoding, Kweelt, QuiP, FluX, TurboXPath, Timber, Qizx/Open (Version 0.4/p1), Saxon (Version 8.0), BEA/XQRL, VX
- Crude comparison (normalized by CPU SPECint)

XMark Benchmark (SIGMOD 2006)
[Figure: results on 1MB XML and 1GB XML for Galax, X-Hive, Berkeley DB XML and eXist]


XML Annotation Use Cases
- multimedia objects (MPEG-7, SMIL): which text is spoken in which video shot?
- hard-drive dumps (forensic data analysis): which telephone numbers occur in e-mails from X at date Y?
- DNA sequences (bioinformatics): who annotated the same DNA region?
- text (Natural Language Processing): return phrases that have "Eiffel Tower" as subject and in the object contain "meter"

StandOff Joins

Loop-lifted StandOff Join
[Figure: regions r1, c1, c2, r2, c3, r3, c4, r4; input (iter1, r1), result (iter1, r4)]
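The join behind stand-off annotations compares regions rather than tree positions. The sketch below assumes each annotation carries a start and end offset into the annotated object and shows two region predicates, containment and overlap, as plain nested loops. The names `select_narrow` and `select_wide`, the tuple layout and the sample data are illustrative assumptions; an efficient implementation would use a sorted merge rather than nested loops.

```python
# An annotation is (start, end, label); offsets are inclusive positions
# in the annotated object, e.g. base positions in a DNA sequence.

def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def select_narrow(context, candidates):
    """Candidates whose region lies entirely inside some context region."""
    return [c for c in candidates if any(contains(r, c) for r in context)]


def select_wide(context, candidates):
    """Candidates whose region overlaps some context region."""
    return [c for c in candidates if any(overlaps(r, c) for r in context)]


# Invented sample: one gene region and annotations made by three people.
genes = [(100, 200, "gene-1")]
notes = [(120, 150, "alice"), (190, 230, "bob"), (300, 320, "carol")]

print(select_narrow(genes, notes))
print(select_wide(genes, notes))
```

Because the regions need not nest, one annotation can relate to many others without being their ancestor or descendant, which is what ordinary XPath axes cannot express.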

Evaluation

BioInformatics RDF
… ]> <rdf:RDF xmlns:H3K4Me3="ENCODEChIPchipDataModel.owl#" xmlns:TFBS="TFBSdataModel.owl#" xmlns:rdf="…" xmlns:rdfs="…" xmlns:xsd="…" … chr …


Goals of XRPC
From XQuery to Distributed XQuery (or even P2P)

Design Goals
- Orthogonal: all XQuery semantics, typing and validation, also including XQuery Update Facility
- Interoperable: network-layer specification! SOAP-based
- Potential for efficiency: protocol exposes set-at-a-time opportunities
- Keep it simple: a minimal extension

XRPC Syntax Extension
XRPC: XQuery + RPC
execute at { Expr } { FunApp(ParamList) }

Distributed XQuery with XRPC: Possibilities (Data Shipping Strategy)
Peer1 holds doc1.xml, Peer2 holds doc2.xml, and the query is posed at Peer3:
  for each person from Vienna in doc1.xml who has bought something (item) from doc2.xml return ... something
The query refers to doc("doc1.xml") ... and doc("doc2.xml")//item, so Peer1 and Peer2 each ship their FULL DOC.

Distributed XQuery with XRPC: Possibilities (Function Shipping using XRPC: Distributed Semi-Join!)
Peer1 holds doc1.xml, Peer2 holds doc2.xml, and the query is posed at Peer3:
  for $person in execute at {Peer1} getViennaPersons()
  return execute at {Peer2} …
Functions: getViennaPersons(): … getPersonItems($pid): … = $pid]

Distributed XQuery with XRPC: Possibilities (Bulk RPC)
  for $person in execute at {Peer1} getViennaPersons()
  return execute at {Peer2} …
- getViennaPersons() at {Peer1}: Selection Pushdown, returns the Selection Result
- ... at {Peer2}, at {Peer2}, ...: the per-iteration calls are sent together as one Bulk RPC
- Distributed Semi-Join ➠ push loop-dependent parameters; Peer2 returns the Semi-Join Result
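The difference between one remote call per loop iteration and a single Bulk RPC can be shown with simulated peers that count requests. This is a sketch: the `Peer` class, its `execute` method and the sample data are hypothetical, and no SOAP messages are involved. Only the function names getViennaPersons and getPersonItems come from the slides.

```python
# Invented sample data: persons live on Peer1, purchased items on Peer2.
PERSONS = [("p1", "Vienna"), ("p2", "Paris"), ("p3", "Vienna")]
ITEMS = [("p1", "lamp"), ("p3", "desk"), ("p3", "chair"), ("p2", "sofa")]


class Peer:
    """Simulated peer; each execute() call stands for one network request."""

    def __init__(self, functions):
        self.functions = functions
        self.requests = 0

    def execute(self, func, calls):
        # One request may carry many parameter tuples (set-at-a-time).
        self.requests += 1
        return [self.functions[func](*args) for args in calls]


def make_peers():
    peer1 = Peer({"getViennaPersons":
                  lambda: [pid for pid, city in PERSONS if city == "Vienna"]})
    peer2 = Peer({"getPersonItems":
                  lambda pid: [item for buyer, item in ITEMS if buyer == pid]})
    return peer1, peer2


def one_call_per_iteration(peer1, peer2):
    persons = peer1.execute("getViennaPersons", [()])[0]
    return [item for pid in persons
            for item in peer2.execute("getPersonItems", [(pid,)])[0]]


def bulk_rpc(peer1, peer2):
    persons = peer1.execute("getViennaPersons", [()])[0]
    # Push all loop-dependent parameters to Peer2 in a single request.
    per_person = peer2.execute("getPersonItems", [(pid,) for pid in persons])
    return [item for items in per_person for item in items]


a1, a2 = make_peers()
b1, b2 = make_peers()
naive_result, bulk_result = one_call_per_iteration(a1, a2), bulk_rpc(b1, b2)
print(naive_result, a2.requests, bulk_result, b2.requests)
```

Both strategies ship only the Vienna person ids to Peer2 and only matching items back, which is the semi-join. Bulk RPC additionally keeps the number of requests to Peer2 constant instead of proportional to the number of persons.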

Remote Function Calls
- Functions: unit of distributed execution in XRPC
  - Defined in an XQuery module, identified by a URL
- XQuery: a compositional, functional language
  - each subexpression = function of its free variables
  - each function can be executed with XRPC on any peer
- Automatic query decomposition is described in ICDE 2009, "Efficient Distribution of Full-Fledged XQuery", Zhang, Tang, Boncz


XML node values (W3C spec)

Indexes for XML: a challenge
- Building an index on all nodes that have a value of a certain type is easy
  - e.g. a B-tree containing all node-ids that have a numeric value
- However, how to maintain the index if a node is changed?
  - a node change affects the value of all its ancestors
  - the root node always changes! Locking hotspot.
  - computing the string value of the root is... very expensive
  - not well thought through by the W3C committee
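The maintenance problem follows from how the string value of an element is defined: it is the concatenation of all descendant text nodes. The sketch below uses Python's standard `xml.etree.ElementTree` on an invented document to show that changing one leaf changes the value of every ancestor up to the root; the helper names are illustrative.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """String value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def nodes_affected_by(root, leaf, new_text):
    """Return the tags (in document order) whose string value changes."""
    before = {node: string_value(node) for node in root.iter()}
    leaf.text = new_text
    return [node.tag for node in root.iter() if string_value(node) != before[node]]


# Invented sample document.
root = ET.fromstring(
    "<catalog><book><price>17</price><title>T</title></book></catalog>")
price = root.find("book/price")

root_value_before = string_value(root)
affected = nodes_affected_by(root, price, "18")
print(root_value_before, string_value(root), affected)
```

A value index that covers element nodes therefore has to touch one entry per ancestor on every leaf update, and the root's entry is touched by every update anywhere in the document.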


Questions?
[SIGMOD 2009] R. Abdel Kader, P. A. Boncz, S. Manegold, M. van Keulen. ROX: Run-time Optimization of XQueries.
[DataX 2009] L. Sidirourgos, P. A. Boncz. Generic and Updatable XML Value Indices Covering Equality and Range Lookups.
[ICDE 2009] Y. Zhang, N. Tang, P. A. Boncz. Efficient Distribution of Full-Fledged XQuery.
[VLDB 2007] Y. Zhang, P. A. Boncz. XRPC: Interoperable and Efficient Distributed XQuery.
[Digital Forensics 2007] W. Alink, R. Bhoedjang, A. P. de Vries, P. A. Boncz. XIRAF: Ultimate Forensic Querying.
[SIGMOD 2006] P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine.