CSE 6331 © Leonidas Fegaras XML and Relational Databases 1 XML and Relational Databases Leonidas Fegaras.

Slides:



Advertisements
Similar presentations
XML Data Management 8. XQuery Werner Nutt. Requirements for an XML Query Language David Maier, W3C XML Query Requirements: Closedness: output must be.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
IMPLEMENTATION OF INFORMATION RETRIEVAL SYSTEMS VIA RDBMS.
Intermediate Code Generation
Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li *, Junichi Tatemura Wang-Pin Hsiung,
Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
GridVine: Building Internet-Scale Semantic Overlay Networks By Lan Tian.
DBLABNational Taiwan Ocean University1/35 A Document-based Approach to Indexing XML Data Ya-Hui Chang and Tsan-Lung Hsieh Department of Computer Science.
TIMBER A Native XML Database Xiali He The Overview of the TIMBER System in University of Michigan.
Relational Databases for Querying XML Documents: Limitations & Opportunities VLDB`99 Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., Naughton,
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
XML Views El Hazoui Ilias Supervised by: Dr. Haddouti Advanced XML data management.
1 COS 425: Database and Information Management Systems XML and information exchange.
Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude.
Run time vs. Compile time
XML To Relational Model. Key Index – Forward Traversal Backward Traversal.
2005rel-xml-i1 Relational to XML Transformations  Background & Issues  Preliminaries  Execution strategies  The SilkRoute System.
Database Systems and XML David Wu CS 632 April 23, 2001.
Storing and Querying Ordered XML Using a Relational Database System By Khang Nguyen Based on the paper of Igor Tatarinov and Statis Viglas.
Database Systems More SQL Database Design -- More SQL1.
CIS607, Fall 2005 Semantic Information Integration Article Name: Clio Grows Up: From Research Prototype to Industrial Tool Name: DH(Dong Hwi) kwak Date:
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
Indexing XML Data Stored in a Relational Database VLDB`2004 Shankar Pal, Istvan Cseri, Gideon Schaller, Oliver Seeliger, Leo Giakoumakis, Vasili Vasili.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Xpath Query Evaluation. Goal Evaluating an Xpath query against a given document – To find all matches We will also consider the use of types Complexity.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation An Introduction to XQuery.
Efficient Evaluation of XQuery over Streaming Data Xiaogang Li Gagan Agrawal The Ohio State University.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Ohio State University Department of Computer Science and Engineering Automatic Data Virtualization - Supporting XML based abstractions on HDF5 Datasets.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,
XML as a Boxwood Data Structure Feng Zhou, John MacCormick, Lidong Zhou, Nick Murphy, Chandu Thekkath 8/20/04.
Database Management 9. course. Execution of queries.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
Optimization in XSLT and XQuery Michael Kay. 2 Challenges XSLT/XQuery are high-level declarative languages: performance depends on good optimization Performance.
Database Systems Part VII: XML Querying Software School of Hunan University
5/2/20051 XML Data Management Yaw-Huei Chen Department of Computer Science and Information Engineering National Chiayi University.
1 XQuery to SQL by XML Algebra Tree Brad Pielech, Brian Murphy Thanks: Xin.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Efficiently Publishing Relational Data as XML Documents Jayavel Shanmugasundaram et al. Proceedings -VLDB 2000, Cairo.
XML and Database.
Leonidas FegarasThe Joy of SAX1 The Joy of SAX Leonidas Fegaras University of Texas at Arlington
IKI 10100I: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100I: Data.
Dec. 13, 2002 WISE2002 Processing XML View Queries Including User-defined Foreign Functions on Relational Databases Yoshiharu Ishikawa Jun Kawada Hiroyuki.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
Advance Database Systems Query Optimization Ch 15 Department of Computer Science The University of Lahore.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Module 3: Using XML. Overview Retrieving XML by Using FOR XML Shredding XML by Using OPENXML Introducing XQuery Using the xml Data Type.
CSE 6331 © Leonidas Fegaras XQuery 1 XQuery Leonidas Fegaras.
Holistic Twig Joins Optimal XML Pattern Matching Nicolas Bruno Columbia University Nick Koudas Divesh Srivastava AT&T Labs-Research SIGMOD 2002.
1 Holistic Twig Joins: Optimal XML Pattern Matching Nicolas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 2002 Presented by Jun-Ki Min.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture13.
DATA STRUCURES II CSC QUIZ 1. What is Data Structure ? 2. Mention the classifications of data structure giving example of each. 3. Briefly explain.
XML: Extensible Markup Language
Efficient Evaluation of XQuery over Streaming Data
Top 50 Data Structures Interview Questions
A Simple Syntax-Directed Translator
SilkRoute: A Framework for Publishing Rational Data in XML
Early Profile Pruning on XML-aware Publish-Subscribe Systems
Advance Database Systems
Lecture 13: Query Execution
XQuery Leonidas Fegaras.
Wednesday, May 22, 2002 XML Publishing, Storage
Presentation transcript:

CSE 6331 © Leonidas Fegaras XML and Relational Databases 1 XML and Relational Databases Leonidas Fegaras

CSE 6331 © Leonidas Fegaras XML and Relational Databases 2 Two Approaches XML Publishing –treat existing relational data sets as if they were XML define an XML view of the relational data pose XML queries over this view –global as view (GAV) vs local as view (LAV) –materializing (parts of) the view XML Storage –use an RDBMS to store and query existing XML data choose a relational schema for storing XML data translate XML queries to SQL

CSE 6331 © Leonidas Fegaras XML and Relational Databases 3 Publishing without Views Constructs XML data in main memory on the fly Based on language extensions to SQL and modified query engine Requires user-defined functions for XML element construction Example: define XML constructor ARTICLE ( artId:integer, title:varachar(20), authorList:xml ) AS { $title $authorList } Special function to concatenate input fragments –list vs set

CSE 6331 © Leonidas Fegaras XML and Relational Databases 4 Publishing with Support for Views Provide XML views over relational data –a view is not necessarily materialized Queries are XML queries over these views –goal: retrieve only the required fragments of relational data by pushing the computation into the relational engine as much as possible –we don't want to reconstruct the entire XML document from all the relational data and then extract the answer from the document

CSE 6331 © Leonidas Fegaras XML and Relational Databases 5 Case Study: XPERANTO Reference: Carey et al: “XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents.” in VLDB Automatically creates a default XML view from relational tables –top-level elements correspond to table names –row elements are nested under the table elements –for each row element, a column corresponds to an element whose tag name is the column name and text is the column value Example Relational schema: Department ( deptno, dname, address) Employee ( ssn, dno,name, phone, salary ) DTD of the default view:...

CSE 6331 © Leonidas Fegaras XML and Relational Databases 6 XPERANTO (cont.) The default view may be refined by a user view –the view is defined using an XQuery { for $d in view(“default”)/db/Departments for $e in view(“default”)/db/Employees[dno=$d/deptno] return {$e/name,$d/dname} } Query can be on the view for $e in return $e/name

CSE 6331 © Leonidas Fegaras XML and Relational Databases 7 XPERANTO (cont.) It uses the XML Query Graph Model (XQGM) as internal representation –enables the translation from XQuery to SQL –exploits an XML query algebra It removes all XML navigation operators –to avoid intermediate results It pushes joins and selections down to the relational query engine –query decorrelation

CSE 6331 © Leonidas Fegaras XML and Relational Databases 8 Relational Schemas for XML Various approaches –generic mapping regardless of any schema or data knowledge same for all kinds of XML data –user-defined mapping from XML to relational tables –mapping is inferred from DTD or XML Schema –mapping is derived from conceptual model –mapping is deduced from ontologies or domain knowledge –mapping is derived from query workload

CSE 6331 © Leonidas Fegaras XML and Relational Databases 9 Generic Mapping XML data can be seen as a graph Three ways of storing graph edges: –edge approach: store all edges in a single table –binary approach: group all edges with the same label into a separate table –universal table: an outer join between all tables from the binary approach Two ways of mapping values: –using a separate value table –inlining the values into the edge table(s) Usually binary approach with inlining

CSE 6331 © Leonidas Fegaras XML and Relational Databases 10 A Single Table create table element ( tagname varchar(20), content varchar(100), begin int not null, end int not null, level int not null ) text1 text <-- begin/end positions tagname content begin end level A null B null B null null text null text

CSE 6331 © Leonidas Fegaras XML and Relational Databases 11 A Single Table (cont.) For example, the XPath query: //book/title is translated into the following SQL query: select e2 from element e1, element e2 where e1.tagname = 'book' and e2.begin > e1.begin and e2.end < e1.end and e2.level = e1.level+1 and e2.tagname = 'title'

CSE 6331 © Leonidas Fegaras XML and Relational Databases 12 A Single Table (cont.) The XPath query: /books//book[author/name="Smith"]/title is translated into: select e6 from element e1, element e2, element e3, element e4, element e5, element e6 where e1.level = 0 and e1.tagname = 'books' and e2.begin > e1.begin and e2.end < e1.end and e2.level > e1.level and e2.tagname = 'book' and e3.begin > e2.begin and e3.end < e2.end and e3.level = e2.level+1 and e3.tagname = 'author' and e4.begin > e3.begin and e4.end < e3.end and e4.level = e3.level+1 and e4.tagname = 'name' and e5.begin > e4.begin and e5.end < e4.end and e5.level = e4.level+1 and e5.content = 'Smith' and e6.begin > e2.begin and e6.end < e2.end and e6.level = e2.level+1 and e6.tagname = 'title'

CSE 6331 © Leonidas Fegaras XML and Relational Databases 13 Inferring the Relational Schema from DTD A DTD graph is generated from the DTD –one node for each DTD –a node '*' for repetition –an arrow connects a parent element to a child element in DTD Two approaches: –Shared inlining an element node corresponds to one relation … but element nodes with one parent are inlined … but nodes below a '*' node correspond to a separate relations mutual recursive elements are always mapped to separate relations –Hybrid inlining may inline elements even with multiple parents, below '*', or recursive

CSE 6331 © Leonidas Fegaras XML and Relational Databases 14 Example Shared inlining: proceeding(ID) article(ID,parent,author) title(ID,parent,title) book(ID,editor) Hybrid inlining: proceeding(ID) article(ID,parent,author,title) book(ID,editor,title) book proceeding article * author title editor

CSE 6331 © Leonidas Fegaras XML and Relational Databases 15 XML Indexing Many approaches Data guides –based on structural summary is the minimum graph that captures every valid path to data deterministic: from each node you can go to only one node via a tagname –the leaves are sets of nodes (the indexed data) –designed for evaluating XPath efficiently –may take the form of a DFA or a tree depts department student faculty name firstname lastname gpasalary name firstname lastname

CSE 6331 © Leonidas Fegaras XML and Relational Databases 16 Inverted Index Inverted indexes are used in Information Retrieval (IR) in mapping words to sets of text documents that contain the word –typically implemented as a B+-tree having the word as a key Each XML element is assigned two numbers. Two choices: –(begin,end) which are the positions of the start/end tags of the element –(order,size) which are order=begin and size=end-begin We will use the following representation of an XML element: (docnum,begin:end,level) where level is the depth level of the element Words in PCData are represented by: (docnum,position,level) Two indexes: –E-index for indexing tagnames –T-index for indexing words

CSE 6331 © Leonidas Fegaras XML and Relational Databases 17 Example Computer Science Science and Engineering <-- begin/end positions E-index: { (1,0:9,0) } { (1,1:4,1), (1,5:8,1) } T-index: Computer { (1,2,3) } Science { (1,3,3), (1,6,3) } Engineering { (1,7,3) } E-index is implemented as a table with secondary index on tag Table element: tagname doc begin end level A B B

CSE 6331 © Leonidas Fegaras XML and Relational Databases 18 Containment Join XPath steps are evaluated using containment joins –a join that indicates that the inner element should be 'contained' inside the outer element For example, the XPath query //book/title is translated into the following SQL query: select e2 from element e1, element e2 where e1.tagname = “book” and e2.doc = e1.doc and e2.begin > e1.begin and e2.end < e1.end and e2.level = e1.level+1 and e2.tagname = “title” It uses the E-index twice

CSE 6331 © Leonidas Fegaras XML and Relational Databases 19 Evaluating XPath Steps From path/A, we generate the SQL query select e2 from PATH e1, element e2 where e2.tagname = “A” and e2.doc = e1.doc and e2.begin > e1.begin and e2.end < e1.end and e2.level = e1.level+1 where PATH is the SQL query that evaluates path From path//A, we get: select e2 from PATH e1, element e2 where e2.tagname = “A” and e2.doc = e1.doc and e2.begin > e1.begin and e2.end < e1.end

CSE 6331 © Leonidas Fegaras XML and Relational Databases 20 Problems Advantages: –you can use an existing relational query evaluation engine –the query optimizer will use the E-index Disadvantages: –many levels of query nesting as many as the XPath steps need query decorellation –even after query unnesting, we get a join over a large number of tables these are self joins because we are joining over the same table (element) most commercial optimizers can handle up to 12 joins Need a special evaluation algorithm for containment join –based on sort-merge join –requires that the indexes deliver the data sorted by major order of docnum and minor order of begin/position –facilitates pipelining

CSE 6331 © Leonidas Fegaras XML and Relational Databases 21 Pipeline Processing of XPath Queries A pipeline is a sequence of iterators class Iterator { Tuple current();// current tuple from stream void open ();// open the stream iterator Tuple next ();// get the next tuple from stream boolean eos ();// is this the end of stream? } An iterator reads data from the input stream(s) and delivers data to the output stream Connected through pipelines –an iterator (the producer) delivers a stream element to the output only when requested by the next operator in pipeline (the consumer) –to deliver one stream element to the output, the producer becomes a consumer by requesting from the previous iterator as many elements as necessary to produce a single element, etc, until the end of stream

CSE 6331 © Leonidas Fegaras XML and Relational Databases 22 Pipelines Pass one Tuple at a Time For XPath evaluation, a Tuple is a Fragment class Fragment { int document;// document ID short begin;// the start position in document short end;// the end position in document short level;// depth of term in document } E-index delivers Fragments sorted by major order of 'document' and minor order of 'begin'

CSE 6331 © Leonidas Fegaras XML and Relational Databases 23 XPath Steps are Iterators class Child extends Iterator { String tag; Iterator input; IndexIterator ti; void open () { ti = new IndexIterator(tag); } Fragment next () { while (!ti.eos() && !input.eos()) { Fragment f = input.current(); Fragment h = ti.current(); if (lf.document < p.document) input.next(); else if (lf.document > p.document) ti.next(); else if (f.begin h.end && h.level == f.level+1) { ti.next(); return h; } else if (lf.begin < h.begin) input.next(); else ti.next();

CSE 6331 © Leonidas Fegaras XML and Relational Databases 24 Example X Y Z W (1,1:8,0)(1,2:4,1)(1,10:14,1) (1,9:18,0)(1,5:7,1) (1,11:13,2) (1,15:17,1) Query: //a/b

CSE 6331 © Leonidas Fegaras XML and Relational Databases 25 XPath Evaluation Based on Iterators Iterators implement containment joins using sort-merge joins –they maintain the invariant that all fragments are sorted by document (major) and begin/position (minor) order They can support two modes for path evaluation 1)starting from a specific document, evaluate an XPath query document(“book.xml”)//book/author 2)evaluate an XPath query against all indexed documents document(“*”)//book/author The sorted lists derived from E-index/T-index may be very long –improvement jump over the list elements that do not contribute to the result can be accomplished if the index is a B+-tree

CSE 6331 © Leonidas Fegaras XML and Relational Databases 26 A Problem Pure sort-merge join may not work in some extreme cases –Example: //a/b text text This can be easily fixed by using a stack that holds the 'open' elements of the left input –when we advance from (1,1:10,0) to (1,2:6,1) we push (1,1:10,0) –very little space overhead: max size of stack = depth of the XML tree (1,1:10,0)(1,3:5,2) (1,2:6,1)(1,7:9,1) will miss text1

CSE 6331 © Leonidas Fegaras XML and Relational Databases 27 Preorder/Postorder Encoding Each node is assigned a (pre,post) pair –replaces (begin,end) –Preorder is the document order of the opening tags –Postorder is the document order of the closing tags 0 A 9 1 B 3 2 C 2 3 D 04 E 1 5 F 8 6 G 4 7 H 7 8 I 59 J 6 pre post F A We can now check for all XPath axes (steps) using pre, post, & level

CSE 6331 © Leonidas Fegaras XML and Relational Databases 28 Pipelines for XQuery Same iterators... class Iterator { Tuple current();// current tuple from stream void open ();// open the stream iterator Tuple next ();// get the next tuple from stream boolean eos ();// is this the end of stream? }... but for XQuery, a Tuple may contain multiple fragments –one fragment for each for-loop variable class Tuple { Fragment[] components; } –when accessing a for-variable, you access the ith component known at compile time Some operations are blocking –sorting –concatenation

CSE 6331 © Leonidas Fegaras XML and Relational Databases 29 For-Loops using Iterators Need a stepper for a for-loop: Loop leftright pipeline right set Step class Step extends Iterator { boolean first; Tuple tuple; void open () { first = true; current = tuple; } Tuple next () { first = false; return current; } void set ( Tuple t ) { tuple = t; } boolean eos () { return !first; } } Tuple Loop.next () { if (!left.eos()) { while (right.eos()) { left.next(); right_step.set(left.current()); right.open(); }; current = left.current().append(right.current()); right.next(); return current; } right_step class Loop extends Iterator { Iterator left; Step right_step; Iterator right; }

CSE 6331 © Leonidas Fegaras XML and Relational Databases 30 Let-Bindings using Iterators Let-bindings are the hardest to implement: the let-value may be a sequence one producer -- many consumers we do not want to materialize the let-value in memory queue head tail slowest consumer fastest consumer Some cases are hopeless: let $v:=e return ($v,$v) backlog