Database Systems and XML David Wu CS 632 April 23, 2001.



Researched Papers 1) J. Shanmugasundaram, et al. "Efficiently Publishing Relational Data as XML Documents," VLDB Conference, September. 2) J. Shanmugasundaram, et al. "Relational Databases for Querying XML Documents: Limitations and Opportunities," VLDB Conference, September 1999.

Efficiently Publishing Relational Data as XML Documents

Motivation Relational database systems and XML are heavily used on the Web. Would like some way to publish relational data as XML.

What is Needed Language to specify the conversion from relational data to XML. Implementation to efficiently carry out the conversion.

SQL Based Language

Implementation Alternatives Main differences between relations and XML: XML documents have tags; XML has nested structure.

Early Tagging, Early Structuring Stored Procedure Approach (outside engine) –Performs a nested-loop join by issuing queries for each nested structure in the desired XML. –High overhead due to the number of queries. –Fixed join order.
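The stored procedure approach can be sketched as an application-side nested loop. This is a minimal illustration, assuming a hypothetical two-table schema (`customer`, `account`) and using Python's `sqlite3` module in place of a real stored-procedure environment; none of the table or column names come from the slides.

```python
import sqlite3
from xml.sax.saxutils import escape

def publish_customers(conn):
    """Tag and structure early, from outside the engine, one query per nesting level."""
    parts = []
    # Outer loop: a single query for the parent elements.
    for cust_id, name in conn.execute("SELECT id, name FROM customer ORDER BY id"):
        parts.append(f"<customer><name>{escape(name)}</name>")
        # Inner loop: one additional query per parent row. This is the source of
        # the high query overhead, and it hard-codes the join order.
        for (acct,) in conn.execute(
            "SELECT acct_num FROM account WHERE cust_id = ? ORDER BY acct_num",
            (cust_id,),
        ):
            parts.append(f"<account>{escape(acct)}</account>")
        parts.append("</customer>")
    return "".join(parts)

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (acct_num TEXT, cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES ('A1', 1), ('A2', 1), ('B1', 2);
""")
xml_text = publish_customers(conn)
```

With N customers this issues N + 1 queries, which is the overhead the slide refers to.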

Early Tagging, Early Structuring Correlated CLOB Approach (inside engine) –One large query with sub-queries is run within the engine. –Must add XML constructor support to the engine. –XML fragments from the constructors are stored as CLOBs (Character Large Objects), which are costly to handle. De-Correlated CLOB Approach (inside engine) –Perform query de-correlation to give the optimizer more flexibility.

Late Tagging, Late Structuring Two phases: 1)Content creation 2)Tagging and structuring

Late Tagging, Late Structuring Content Creation: Redundant Relation Approach –Join all source tables –Both content and process redundancy

Late Tagging, Late Structuring Content creation: Outer Union Approach –Separate the children of the same parent (e.g. one tuple should represent either an account or a purchaseOrder, not both). –At the end, take the outer union of the results. –Still some data redundancy (e.g. parent info).
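To make the difference between the two content-creation approaches concrete, here is a small sketch in plain Python. The row layouts and sample values are assumptions for illustration; in the real system both results would be produced by SQL inside the engine.

```python
def redundant_relation(customers, accounts, orders):
    """Join all source tables: every account is paired with every order."""
    return [
        (cid, name, acct, po)
        for cid, name in customers
        for a_cid, acct in accounts if a_cid == cid
        for o_cid, po in orders if o_cid == cid
    ]

def outer_union(customers, accounts, orders):
    """One wide tuple per source row: (type, cust_id, name, acct_num, po_id).

    Columns that do not apply to a tuple's type are padded with None.
    The parent id is still repeated in every child tuple.
    """
    rows = [("customer", cid, name, None, None) for cid, name in customers]
    rows += [("account", cid, None, acct, None) for cid, acct in accounts]
    rows += [("purchaseOrder", cid, None, None, po) for cid, po in orders]
    return rows

# Hypothetical sample: one customer with 3 accounts and 4 purchase orders.
customers = [(1, "Ann")]
accounts = [(1, "A1"), (1, "A2"), (1, "A3")]
orders = [(1, "P1"), (1, "P2"), (1, "P3"), (1, "P4")]

joined = redundant_relation(customers, accounts, orders)
unioned = outer_union(customers, accounts, orders)
```

For this sample the join yields 3 × 4 = 12 tuples, while the outer union yields 1 + 3 + 4 = 8, and the gap widens as the number of sibling children grows.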

Late Tagging, Late Structuring Outer Union Plan:

Late Tagging, Late Structuring Structuring/Tagging: Hash-based Tagger –Group tuples by hashing. –Extract the tuples and tag them.
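A hash-based tagger might look like the following sketch. It assumes unsorted wide tuples of the hypothetical form `(type, cust_id, name, acct_num, po_id)`; the element names are illustrative. Note that the whole result must be held in the hash table before any XML can be emitted.

```python
from collections import defaultdict

def hash_tag(rows):
    """Group child tuples under their parent id via a hash table, then tag."""
    names = {}                    # cust_id -> customer name
    children = defaultdict(list)  # cust_id -> already tagged child fragments
    for kind, cid, name, acct, po in rows:
        if kind == "customer":
            names[cid] = name
        elif kind == "account":
            children[cid].append(f"<account>{acct}</account>")
        else:
            children[cid].append(f"<purchaseOrder>{po}</purchaseOrder>")
    # Only after all tuples have been seen can the document be assembled.
    out = []
    for cid in sorted(names):
        body = "".join(children[cid])
        out.append(f"<customer><name>{names[cid]}</name>{body}</customer>")
    return "".join(out)

# Hypothetical unsorted input.
rows = [
    ("account", 1, None, "A1", None),
    ("customer", 2, "Bob", None, None),
    ("purchaseOrder", 1, None, None, "P1"),
    ("customer", 1, "Ann", None, None),
]
xml_text = hash_tag(rows)
```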

Late Tagging, Early Structuring The Late Tagging, Late Structuring approach requires a lot of memory for the hash table. Fix this by creating “structured content” first and then tagging it.

Late Tagging, Early Structuring Structured content: Sorted Outer Union Approach –Desired format: 1) Parent information comes before or with its children. 2) All information about a node and its descendants occurs together. 3) The relative order of the tuples matches the user-specified order. –Achieved by sorting the result of the outer union on the ids.

Late Tagging, Early Structuring Tagging Sorted Data: Constant Space Tagger –Can append tags as soon as data is seen. –Only needs to remember the parent ids of the last tuple seen to know when to append closing tags.
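The sorted outer union plus constant space tagger can be sketched as a streaming generator. The tuple layout `(type, cust_id, name, acct_num, po_id)` and the sort key are assumptions for illustration; in the real system the sort happens inside the engine.

```python
RANK = {"customer": 0, "account": 1, "purchaseOrder": 2}

def sort_outer_union(rows):
    """Sort so that each parent tuple precedes its children."""
    return sorted(rows, key=lambda r: (r[1], RANK[r[0]]))

def constant_space_tag(sorted_rows):
    """Emit tags as tuples arrive; the only state is the currently open parent."""
    open_parent = None
    for kind, cid, name, acct, po in sorted_rows:
        if kind == "customer":
            if open_parent is not None:
                yield "</customer>"  # a new parent closes the previous one
            open_parent = cid
            yield f"<customer><name>{name}</name>"
        elif kind == "account":
            yield f"<account>{acct}</account>"
        else:
            yield f"<purchaseOrder>{po}</purchaseOrder>"
    if open_parent is not None:
        yield "</customer>"

# Hypothetical unsorted outer-union result.
rows = [
    ("account", 1, None, "A1", None),
    ("customer", 2, "Bob", None, None),
    ("purchaseOrder", 1, None, None, "P1"),
    ("customer", 1, "Ann", None, None),
]
xml_text = "".join(constant_space_tag(sort_outer_union(rows)))
```

Because the tagger is a generator, output can be written out incrementally, so its memory use does not grow with the size of the result.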

Experiment (figure panels: Inside Engine, Outside Engine)

Breakdown of Construction

Summary of Results Constructing XML inside the relational engine is more efficient. When processing can be done in main memory, the Unsorted Outer Union approach wins. When main memory is not enough, the Sorted Outer Union approach is best.

Relational Databases for Querying XML Documents

Why Bother? XML is becoming the standard for data representation on the WWW. A query engine designed to tap information from XML documents is valuable. Relational database systems are a mature technology and could be used to support XML querying.

Basic Idea Step 1: Generate a relational schema from the DTD. Step 2: Parse the XML document and load the data into tuples of the relational tables. Step 3: Translate the semi-structured XML queries into SQL over the relational data. Step 4: Convert the result back to XML.

Translating XML to Relational Schema Main Issues: 1) DTD complexity. 2) Arbitrary nesting of XML DTDs vs. the two-level nature of relational schemas. 3) Set-valued attributes and recursion.

1)Flattening transformation 2)Simplification transformation of unary operations 3)Grouping transformation

Techniques to translate an XML DTD to relations: Basic Inlining Technique, Shared Inlining Technique, Hybrid Inlining Technique.

Basic Inlining Technique Inline as many descendants of an element as possible into a single relation (author: firstname, lastname, address). Every element will have a relation corresponding to it (firstname, lastname, and address will all have relations).

Basic Inlining Technique (cont.) Complications: 1) Set-valued attributes (e.g. article): solve by using foreign keys and other tables. 2) Recursion: solve with relational keys and relational recursive processing to retrieve the relationship.

Tools used in creating relations DTD Graph –Nodes are elements, attributes, and operators. –Each element appears once. –Attributes and operators appear as many times as they do in the DTD. –Cycles in the graph indicate recursion.

Tools used in creating relations Element Graphs –Generated from the DTD graph –Created by doing a DFS from an element node
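As a rough sketch of how recursion shows up as cycles in the DTD graph, the code below models a DTD as a dictionary from each element to its `(child, operator)` pairs and runs a depth-first search from every element. The representation is a simplification (operators are edge labels rather than separate nodes), and the sample DTD is an assumed example in the spirit of the slides, not a copy of the paper's figure.

```python
def reachable(dtd, start):
    """Return all elements reachable from start via a depth-first search."""
    seen = set()
    stack = [child for child, _ in dtd.get(start, [])]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(child for child, _ in dtd.get(node, []))
    return seen

def find_recursive(dtd):
    """An element is recursive if it can reach itself, i.e. it lies on a cycle."""
    return {elem for elem in dtd if elem in reachable(dtd, elem)}

# Hypothetical DTD: element -> list of (child, operator); "" means exactly one.
dtd = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*"), ("contactauthor", "")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", ""), ("address", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}
recursive = find_recursive(dtd)
```

Here `monograph` and `editor` reference each other, so both are reported as recursive.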

Creating a Relation Given an element graph, the root is made into a relation with all descendants inlined into it, except: 1) Children directly below a “*” are made into separate relations; 2) Each node with a backpointer edge is made into a separate relation. These additional relations are named by their path from the root and have parentID fields that serve as foreign keys (e.g. article.author has the attribute article.author.parentID).

Problems with Basic –Creates a large number of relations. –Not efficient for certain queries: Good: “list all authors of books”. Bad: “list all authors having first name Jack”.

Shared Inlining Technique Idea: Identify commonly used element nodes and share them by creating separate relations for them.

Shared Inlining Technique Rules for creating relations: –Nodes with in-degree>1 have relations made. –Nodes with in-degree=1 are inlined. –Nodes with in-degree=0 have relations made. –Nodes following “*” have relations made. –Among mutually recursive nodes that all have in-degree=1, one of them is made into a relation.

Shared Inlining Technique Rules for designing the schema: –Relation X inlines all nodes Y that it can reach such that the path from X to Y does not contain a node that is to be made a separate relation. –Inlined elements are flagged as being a root with the isRoot field.
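The Shared rules for choosing relations can be sketched as follows. The DTD is modeled as a dictionary from element to `(child, operator)` pairs, which is a simplification of the DTD graph; the sample DTD and the choice of which node breaks a cycle (the alphabetically first) are assumptions for illustration.

```python
from collections import Counter

def reachable(dtd, start):
    """All elements reachable from start in the simplified DTD graph."""
    seen, stack = set(), [c for c, _ in dtd.get(start, [])]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(c for c, _ in dtd.get(node, []))
    return seen

def shared_relations(dtd):
    """Return the set of elements that get their own relation under Shared."""
    nodes = set(dtd) | {c for kids in dtd.values() for c, _ in kids}
    indeg = Counter(c for kids in dtd.values() for c, _ in kids)
    starred = {c for kids in dtd.values() for c, op in kids if op == "*"}
    # In-degree 0, in-degree > 1, or directly below "*": separate relation.
    rels = {n for n in nodes if indeg[n] == 0 or indeg[n] > 1 or n in starred}
    # Mutually recursive nodes with no relation yet: pick one of them.
    for n in sorted(nodes):
        cycle = {m for m in reachable(dtd, n) if n in reachable(dtd, m)}
        if n in cycle and not (cycle & rels):
            rels.add(n)
    return rels

# Hypothetical DTD: element -> list of (child, operator); "" means exactly one.
dtd = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*"), ("contactauthor", "")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", ""), ("address", "")],
    "name": [("firstname", "?"), ("lastname", "")],
}
relations = shared_relations(dtd)
```

Every element not in the returned set is inlined into the nearest relation above it.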

Problems with Shared Too many joins required!

Hybrid Inlining Technique Same as Shared except Hybrid also inlines elements that… –have in-degree>1 AND –are not recursive AND –are not reached through a “*” node.
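The Hybrid rule can be expressed as a filter over the Shared result. In this sketch the inputs (the Shared relation set, in-degrees, the nodes below “*”, and the recursive nodes) are supplied by hand for an assumed example DTD; the function name and parameters are hypothetical.

```python
def hybrid_relations(shared, indegree, starred, recursive):
    """Keep a Shared relation unless Hybrid's three conditions all hold."""
    keep = set()
    for node in shared:
        inlinable = (
            indegree.get(node, 0) > 1   # shared by several parents
            and node not in recursive   # not part of a cycle
            and node not in starred     # not reached through "*"
        )
        if not inlinable:
            keep.add(node)
    return keep

# Hand-computed inputs for a hypothetical DTD.
shared = {"book", "article", "author", "title", "monograph"}
indegree = {"book": 0, "article": 0, "author": 3, "title": 2, "monograph": 1}
starred = {"author", "monograph"}
recursive = {"monograph", "editor"}

hybrid = hybrid_relations(shared, indegree, starred, recursive)
```

In this example only `title` changes status: it is shared by two parents but is neither recursive nor below a “*”, so Hybrid inlines it into each parent and removes the join Shared would need.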

Evaluation Metric For path expressions of length N, data was gathered on: –The average number of SQL queries generated. –The average number of joins in each SQL query. –The average total number of joins needed to process the path expression.

Results for N=3 For Basic, 1/3 of the DTD tests didn’t run to completion due to lack of virtual memory. Basic is thus ignored.

Results for N=3

Group 1: Hybrid reduces joins per query and increases the number of queries by a smaller amount => Hybrid requires fewer joins than Shared. Group 2: Hybrid reduces joins per query and increases the number of queries by a comparable amount => Hybrid and Shared are the same.

Results for N=3 Group 3: Hybrid reduces some joins per query but increases the number of queries by a lot => Hybrid generates more joins than Shared. Hybrid and Shared perform similarly in both joins per query and number of queries => Hybrid and Shared are about the same.

Semi-Structured Queries to SQL Semi-structured query languages –Allow path expressions with various operators and wildcards. –Examples: XML-QL, Lorel.

Simple Path to SQL 1) The relations corresponding to the start of the root path are added to the FROM clause. 2) If needed, the path expressions are translated to joins.
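A toy translator for simple paths is sketched below. It assumes a schema in which each separate relation has `ID` and `parentID` columns and inlined elements become columns named by their underscore-joined path; these naming conventions and the function itself are hypothetical, chosen only to illustrate the two steps.

```python
def simple_path_to_sql(path, relations):
    """Translate a dotted path into SQL given the set of separate relations."""
    steps = path.split(".")
    if steps[0] not in relations:
        raise ValueError("path must start at an element stored as a relation")
    # Step 1: the relation at the start of the path goes into FROM.
    from_clause, joins = [steps[0]], []
    current, column = steps[0], [steps[0]]
    for step in steps[1:]:
        if step in relations:
            # Step 2: crossing into another relation becomes a join.
            joins.append(f"{step}.parentID = {current}.ID")
            from_clause.append(step)
            current, column = step, [step]
        else:
            # Inlined element: just extend the column name.
            column.append(step)
    sql = f"SELECT {current}.{'_'.join(column)} FROM {', '.join(from_clause)}"
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    return sql

sql = simple_path_to_sql("book.author.name.lastname", {"book", "author"})
```

The more elements a mapping inlines, the fewer joins such a path needs, which is the trade-off between the Shared and Hybrid techniques.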

Simple Recursive Path to SQL 1) Find the initialization of the recursion (e.g. *.monograph.editor with condition monograph.title = “Subclass Cirripedia”). 2) Find the actual recursive path expression (e.g. monograph.editor). 3) Union the two.
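The three steps map naturally onto a recursive common table expression. The sketch below uses SQLite's `WITH RECURSIVE` through Python's `sqlite3` module; the `monograph` table layout (editor inlined, `parent_id` pointing at the enclosing monograph) and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monograph (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,      -- enclosing monograph, NULL at the top level
        title TEXT,
        editor_name TEXT        -- editor inlined into monograph
    );
    INSERT INTO monograph VALUES
        (1, NULL, 'Subclass Cirripedia', 'E1'),
        (2, 1,    'Part A',              'E2'),
        (3, 2,    'Part B',              'E3'),
        (4, NULL, 'Unrelated',           'E4');
""")

query = """
WITH RECURSIVE q(id, editor_name) AS (
    -- Step 1: initialization of the recursion
    SELECT id, editor_name FROM monograph
    WHERE title = 'Subclass Cirripedia'
    UNION ALL
    -- Step 2: the recursive path expression; step 3 is the UNION itself
    SELECT m.id, m.editor_name
    FROM monograph AS m JOIN q ON m.parent_id = q.id
)
SELECT editor_name FROM q ORDER BY id
"""
editors = [row[0] for row in conn.execute(query)]
```

The result contains the editor of the matching monograph and of every monograph nested below it, but not the unrelated one.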

Arbitrary Path to Simple Recursive Path Use a general technique to translate path expressions to many simple (recursive) path expressions.

Relational Results to XML: Simple Structuring Requires only attaching appropriate tags to each tuple.

Relational Results to XML: Tag Variables Have the relational query return the tag value in the result tuple. Then just convert it to a tag during XML generation.
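A minimal sketch of the tag-variable idea, assuming each result tuple has the hypothetical form `(tag, value)`:

```python
from xml.sax.saxutils import escape

def tag_rows(rows):
    """Use the first field of each tuple as the element name for the second."""
    return "".join(
        f"<{tag}>{escape(str(value))}</{tag}>" for tag, value in rows
    )

# Hypothetical query result in which the tag name travels with the data.
xml_text = tag_rows([("title", "XML & SQL"), ("author", "Ann")])
```

The sketch escapes element content only; it assumes the tag values returned by the query are already valid XML names.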

Grouping a) Could sort the result tuples by the group-by field and scan through them in order when generating the XML. b) Could do a grouping operation.
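Option (a) can be sketched with a sort followed by a single ordered scan. The `(author, title)` tuple layout and element names are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def group_to_xml(rows):
    """Sort by the group-by field, then emit one element per run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))  # stable: keeps title order per author
    out = []
    for author, group in groupby(ordered, key=itemgetter(0)):
        titles = "".join(f"<title>{title}</title>" for _, title in group)
        out.append(f"<author><name>{author}</name>{titles}</author>")
    return "".join(out)

# Hypothetical (author, title) result tuples.
rows = [("Bob", "T3"), ("Ann", "T1"), ("Ann", "T2")]
xml_text = group_to_xml(rows)
```

In a real system the sort would be an `ORDER BY` in the SQL query, leaving only the scan to the XML generator.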

Other Cases Complex Element Construction –e.g. asking for all article elements, where each may contain multiple sub-elements (e.g. author and title). –Difficult to do in the traditional relational model. Heterogeneous Results –e.g. asking for either the title or the author of an article. –Could be done in two queries and then merged.

Other Cases Nested Queries –Could be rewritten in terms of SQL queries using outer joins.

Conclusion Suggested modifications to relational systems: –Untyped/variable-typed references. –Information-retrieval-style indices. –Flexible comparison operators. –Multiple-query optimization/execution. –More powerful recursion support.