Storing and Querying XML Documents Using Relational Databases Mustafa Atay Wayne State University Detroit, MI February 28, 2006.


Outline of Talk
- What is XML?
- HTML vs. XML
- Problem Statement
- Schema-based Relational Approach
  - Schema Mapping
  - Data Mapping
  - Query Mapping
  - Reconstruction
- Conclusions

What is XML?
- eXtensible Markup Language
- primarily created by Jon Bosak of Sun Microsystems
- officially recommended by W3C (World Wide Web Consortium) since 1998
- a simplified form of SGML (Standard Generalized Markup Language)

What is XML? (cont.)
- a meta language
  - allows you to create and format your own document markups
  - separates content from format
- a method for putting structured data into a text file; these files are
  - easy to read
  - unambiguous
  - extensible
  - platform-independent

HTML vs. XML
(Example markup not preserved; surviving cell text: First Value, Second Value)

HTML vs. XML (cont.)
(Example markup not preserved; surviving text values: front door, back door, double hung1, double hung2, kitchen, hallway, double hung1, living_room)

HTML vs. XML (cont.)
HTML
- uses tags and attributes
- content and formatting can be placed together
- tags and attributes are predetermined and rigid
- describes what a document looks like
- doesn't allow the user to define content rules
XML
- uses tags and attributes
- content and format are separate; formatting is contained in a stylesheet
- allows the user to create his/her own set of tags and attributes
- describes the information in a document
- allows the user to define content rules (DTD)

Why Store and Query XML?
- XML has emerged as the standard for representing and exchanging data on the World Wide Web.
- The growing number of XML documents creates a need to store and query XML data efficiently.

A Sample XML Dataset
- European Bioinformatics Institute Databases
- ftp://ftp.ebi.ac.uk/pub/databases/interpro/
- match.xml, ~700 MB

Approaches to Storing and Querying XML Documents
- using native XML repositories
  - Software AG's Tamino
  - eXcelon's XIS
- using XML-enabled commercial database systems
  - Oracle XML DB
  - DB2 XML Extender
  - Microsoft SQLXML
- using an RDBMS/ORDBMS to store and query XML documents (Relational Approach)

Why Store XML in an RDBMS?
- to take advantage of mature RDBMS technology: efficient storage, indexing, and optimization techniques
- to enable companies or researchers to store and query XML data using their existing RDBMS
- to enable processing of transformed XML data using both XML and relational queries from a middleware environment

Relational Approach
- XML-Publishing
  - XPERANTO (Carey et al., WebDB'00)
  - SilkRoute (M. Fernandez et al., WWW'00)
- Schema-less approach
  - Edge (D. Florescu et al., IEEE DEB'99)
  - STORED (A. Deutsch et al., SIGMOD'99)
- Schema-based approach
  - Basic, Shared and Hybrid inlining (J. Shanmugasundaram et al., VLDB'99)
  - ODTDMap (M. Atay et al., IS'06)

Schema-based Relational Approach
- Schema Mapping
  - The XML data model is mapped into the relational model
- Data Mapping
  - XML documents are shredded and composed into tuples to be inserted into the relational database
- Query Mapping
  - XML queries are translated into SQL queries
- Reverse Data Mapping (Reconstruction)
  - The original XML document is recovered from the RDBMS

Schema Mapping
- The schema mapping algorithm ODTDMap consists of the following steps:
  - Simplifying DTDs
  - Creating and inlining DTD graphs
  - Generating the relational schema and the schema mapping file
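
The inlining step can be illustrated with a small sketch. This is not ODTDMap: it applies a deliberately simplified rule in the general spirit of inlining (a child gets its own table if it can repeat or has more than one parent; otherwise it becomes a column of its parent's table). The DTD below is hypothetical, is assumed to be already simplified and non-recursive, and is not the univ.dtd from the talk.

```python
from collections import Counter

# Hypothetical simplified DTD: element -> [(child, occurrence)], where "1" means
# exactly once and "*" means zero or more. This is NOT the univ.dtd from the talk.
DTD = {
    "univ": [("colleges", "1")],
    "colleges": [("college", "*")],
    "college": [("cName", "1"), ("dep", "*")],
    "dep": [("dName", "1")],
    "cName": [],
    "dName": [],
}

def inline(dtd, root):
    """Return {table: [columns]} using a simplified inlining rule (assumes no recursion)."""
    indegree = Counter(child for kids in dtd.values() for child, _ in kids)
    tables = {}

    def collect(elem, prefix, columns):
        # Walk the children of elem; inline single-occurrence, single-parent children.
        for child, occ in dtd[elem]:
            if occ == "*" or indegree[child] > 1:
                make_table(child)  # repeating or shared element: separate table
            else:
                columns.append(prefix + child)  # inlined as a column of the parent table
                collect(child, prefix + child + ".", columns)

    def make_table(elem):
        if elem in tables:
            return
        tables[elem] = columns = ["ID", "parentID"]
        collect(elem, "", columns)

    make_table(root)
    return tables

schema = inline(DTD, "univ")
for table, columns in schema.items():
    print(table, columns)
```

Under this rule, colleges is absorbed into the univ table, which is consistent with the later query slides reading colleges from univ.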

Sample DTD - univ.dtd
<!DOCTYPE univ [ ]>

Creating DTD Graph
<!DOCTYPE univ [ ]>

Inlining DTD Graph

Generating Relational Schema

Data Mapping
- Challenging issues of data mapping
  - Must respect the schema mapping
  - Varying document structure
  - Scalability
- We introduced two efficient linear-time algorithms
  - OXInsert: main-memory data mapping algorithm, DOM-based
  - SDM: streaming data mapping algorithm, SAX-based
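
A minimal sketch of streaming (SAX-based) shredding follows. It is not the SDM algorithm itself: to stay short it writes every element into one generic table named node, a hypothetical stand-in for the schema-derived tables, and it only shows the stack-based, single-pass style of processing. The sample document is also made up.

```python
import sqlite3
import xml.sax

class Shredder(xml.sax.ContentHandler):
    """Assign document-order IDs and emit one (ID, parentID, tag, text) row per element."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.next_id = 1
        self.stack = []  # currently open elements: [ID, parentID, tag, text parts]

    def startElement(self, name, attrs):
        parent_id = self.stack[-1][0] if self.stack else 0  # 0 marks "no parent"
        self.stack.append([self.next_id, parent_id, name, []])
        self.next_id += 1

    def characters(self, content):
        if self.stack:
            self.stack[-1][3].append(content)  # text belongs to the innermost open element

    def endElement(self, name):
        elem_id, parent_id, tag, parts = self.stack.pop()
        self.conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (elem_id, parent_id, tag, "".join(parts).strip()),
        )

def shred(xml_text):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (ID INTEGER PRIMARY KEY, parentID INTEGER, tag TEXT, text TEXT)")
    xml.sax.parseString(xml_text.encode("utf-8"), Shredder(conn))
    return conn

# Hypothetical sample document, not the univ.xml from the talk.
DOC = "<univ><colleges><college><dep>CS</dep><dep>Math</dep></college></colleges></univ>"
conn = shred(DOC)
rows = conn.execute("SELECT ID, parentID, tag, text FROM node ORDER BY ID").fetchall()
for row in rows:
    print(row)
```

Because only the stack of open elements is held in memory, the memory footprint depends on document depth rather than document size, which is the usual motivation for a SAX-based approach over a DOM-based one.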

Sample XML document - univ.xml

XMLTree for univ.xml

XMLTree for univ.xml

Database state after univ.xml is mapped

Performance of OXInsert and SDM

Data Mapping Across Different Schema Mappings

Query Mapping
- We translate simple XPath expressions to SQL
  - XPath is the core of XML query languages
- We identified 3 algorithms for query mapping
  - Naïve
  - Cluster
  - Containment Join

Naïve
- Takes an XPath expression and creates a nested SQL query composed of one SQL query per XPath step
- e.g. XPath: /univ/colleges/college
  SQL:
    Select dep.ID from dep
    where dep.dName='CS' and dep.parentID in
      (Select college.ID from college where college.parentID in
        (Select colleges.ID from univ where colleges.parentID in
          (Select univ.ID from univ where univ.parentID=0)))
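
The one-subquery-per-step idea can be sketched as a small translator. This is an illustration under simplifying assumptions, not the algorithm from the talk: every element type is assumed to have its own table with ID and parentID columns (no inlining), the path uses only child steps without predicates, and the function name naive_sql and the sample rows are hypothetical.

```python
import sqlite3

def naive_sql(xpath):
    """Translate a child-axis path such as /a/b/c into nested IN subqueries, one per step."""
    steps = [s for s in xpath.split("/") if s]
    # Innermost query: the root element, whose parentID is 0 by convention.
    sql = f"SELECT {steps[0]}.ID FROM {steps[0]} WHERE {steps[0]}.parentID = 0"
    for step in steps[1:]:
        # Wrap the query built so far: keep rows whose parent was selected by it.
        sql = f"SELECT {step}.ID FROM {step} WHERE {step}.parentID IN ({sql})"
    return sql

# Hypothetical data: one table per element type, rows are (ID, parentID).
conn = sqlite3.connect(":memory:")
sample = {
    "univ": [(1, 0)],
    "colleges": [(2, 1)],
    "college": [(3, 2)],
    "dep": [(4, 3), (5, 3), (8, 7)],  # dep 8 hangs under a school, not a college
}
for table, rows in sample.items():
    conn.execute(f"CREATE TABLE {table} (ID INTEGER, parentID INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

sql = naive_sql("/univ/colleges/college/dep")
dep_ids = sorted(row[0] for row in conn.execute(sql))
print(sql)
print(dep_ids)
```

Step names are interpolated directly into the SQL text, so this sketch is only safe for trusted paths whose steps are valid table names.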

Cluster
- A cluster is a sequence of consecutive elements stored in the same table
- Takes an XPath expression and creates a nested SQL query composed of one SQL query per XPath cluster
- e.g. XPath: /univ/colleges/college
  SQL:
    Select dep.ID from dep
    where dep.dName='CS' and dep.parentID in
      (Select college.ID from college where college.parentID in
        (Select colleges.ID from univ))
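
Grouping steps into clusters can be sketched as follows. The element-to-table assignment TABLE_OF is an assumption chosen to match the queries on these slides (colleges stored in the univ table); in practice it would come from the schema mapping file.

```python
from itertools import groupby

# Assumed element-to-table assignment: colleges is inlined into the univ table.
TABLE_OF = {"univ": "univ", "colleges": "univ", "college": "college", "dep": "dep"}

def clusters(xpath, table_of):
    """Group consecutive path steps that are stored in the same table."""
    steps = [s for s in xpath.split("/") if s]
    return [(table, list(group)) for table, group in groupby(steps, key=table_of.__getitem__)]

result = clusters("/univ/colleges/college/dep", TABLE_OF)
print(result)
```

Here four steps collapse into three clusters, so a cluster-based translation needs one subquery fewer than a step-by-step translation of the same path.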

Containment Join
- Relies on the well-formedness of XML documents
- Requires precomputing, for each element instance, the maximum ID among its descendants (endID)
- Facilitates efficient evaluation of recursive XML queries
- e.g. XPath: /univ/colleges/college
  SQL:
    Select dep.ID from dep, college, univ
    where dep.dName='CS'
      and dep.ID>=college.ID and dep.ID<=college.endID
      and college.parentID=univ.colleges.ID
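
The endID precomputation can be sketched with a preorder traversal. This assumes IDs are assigned in document order starting at 1, and that a leaf's endID equals its own ID; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def label(root):
    """Assign preorder IDs and compute endID, the largest ID inside each element's subtree."""
    rows = []
    next_id = 1

    def visit(elem, parent_id):
        nonlocal next_id
        my_id = next_id
        next_id += 1
        row = {"ID": my_id, "parentID": parent_id, "tag": elem.tag, "endID": my_id}
        rows.append(row)
        for child in elem:
            visit(child, my_id)
        row["endID"] = next_id - 1  # last ID handed out while visiting this subtree

    visit(root, 0)
    return rows

def is_descendant(desc, anc):
    """Interval test: desc lies strictly inside anc's [ID, endID] range."""
    return anc["ID"] < desc["ID"] <= anc["endID"]

# Hypothetical document, not the univ.xml from the talk.
DOC = ("<univ><colleges><college><dep/><dep/></college></colleges>"
       "<schools><school><dep/></school></schools></univ>")
rows = label(ET.fromstring(DOC))
labels = [(r["tag"], r["ID"], r["endID"]) for r in rows]
for entry in labels:
    print(entry)
```

Because a well-formed document nests properly, every descendant's ID falls inside its ancestor's ID range, so an ancestor/descendant test reduces to two comparisons.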

A Recursive Query Example
- XPath: /univ//dep
- Sub queries of the recursive query
  - /univ/colleges/college/dep
  - /univ/schools/school/dep
- Naïve: 8 SQL queries + 6 joins + 1 union
- Cluster: 6 SQL queries + 4 joins + 1 union
- Containment Join: 1 SQL query + 1 join
    Select dep.ID from dep, univ
    where dep.ID>=univ.ID and dep.ID<=univ.endID
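
The containment-join query can be tried out on a toy database. The rows below are hypothetical: they assume preorder IDs with a precomputed endID per element, and only the columns needed for the range test are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("univ", "college", "dep"):
    conn.execute(f"CREATE TABLE {table} (ID INTEGER, endID INTEGER)")

# Hypothetical labels: univ spans IDs 1..8, one college spans 3..5,
# and three dep elements sit at IDs 4, 5 (inside the college) and 8 (inside a school).
conn.execute("INSERT INTO univ VALUES (1, 8)")
conn.execute("INSERT INTO college VALUES (3, 5)")
conn.executemany("INSERT INTO dep VALUES (?, ?)", [(4, 4), (5, 5), (8, 8)])

# /univ//dep as one query with one join, using only ID range comparisons.
all_deps = [row[0] for row in conn.execute(
    "SELECT dep.ID FROM dep, univ "
    "WHERE dep.ID >= univ.ID AND dep.ID <= univ.endID ORDER BY dep.ID"
)]

# The same range test against college keeps only the deps nested under it.
college_deps = [row[0] for row in conn.execute(
    "SELECT dep.ID FROM dep, college "
    "WHERE dep.ID >= college.ID AND dep.ID <= college.endID ORDER BY dep.ID"
)]

print(all_deps)
print(college_deps)
```

The query does not need to know which paths lead from univ to dep, which is why a single join can replace the union of per-path subqueries.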

Reconstruction
- In the query mapping stage, the elements selected by an XML query can be returned in one of two modes:
  - Select mode: returns IDs
  - Reconstruct mode: returns XML subtrees
- Algorithm Reconstruct reconstructs the XML subtree rooted at a given element
- Reconstruction matters for two reasons:
  - XML subtree reconstruction has a great impact on query response time in reconstruct mode
  - It demonstrates that our mapping scheme is lossless
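
Subtree reconstruction can be sketched over generic rows. This is not the Reconstruct algorithm from the talk: it assumes each element is available as a hypothetical (ID, parentID, tag, text) tuple, that IDs follow document order, and that attributes and mixed content can be ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored rows: (ID, parentID, tag, text), IDs in document order.
ROWS = [
    (1, 0, "univ", ""),
    (2, 1, "colleges", ""),
    (3, 2, "college", ""),
    (4, 3, "dep", "CS"),
    (5, 3, "dep", "Math"),
]

def reconstruct(rows, root_id):
    """Rebuild the XML subtree rooted at root_id and return it as a string."""
    by_id = {row[0]: row for row in rows}
    children = {}
    for row in sorted(rows):  # sorting by ID restores sibling order
        children.setdefault(row[1], []).append(row[0])

    def build(node_id):
        _, _, tag, text = by_id[node_id]
        elem = ET.Element(tag)
        elem.text = text or None
        for child_id in children.get(node_id, []):
            elem.append(build(child_id))
        return elem

    return ET.tostring(build(root_id), encoding="unicode")

subtree = reconstruct(ROWS, 3)
print(subtree)
```

In select mode a query would return only the ID 3; in reconstruct mode it would return the string produced here.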

Conclusions
- Schema mapping [1,5]
  - lossless and order preserving
  - processing set-valued XML attributes
  - simple processing of recursion
- Data mapping [1,3]
  - We described the first linear-time schema-based data mapping algorithms
  - We justified their effectiveness on different schema mapping algorithms

Conclusions (cont.)
- Query mapping
  - We identified 3 algorithms
  - Our Containment Join (CJ) algorithm outperforms the only published recursive query mapping algorithm, by Krishnamurthy et al., IEEE ICDE'04
- Reconstruction [2]
  - We introduced an efficient reconstruction algorithm
  - It can be used in relational schema-based mapping, unlike its rivals, which are used in XML-publishing

Future Work
- Extending the schema mapping to XML Schema
- Extending the query mapping to XQuery
- Introducing DTD/Schema constraints into the proposed mapping scheme
- Incorporating access control methods into the proposed mapping scheme

Acknowledgements
- Dr. Shiyong Lu
- Dr. Farshad Fotouhi
- Artem Chebotko
- Dapeng Liu
- Yezhou Sun

Publications
1. Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Efficient Schema-based XML-to-Relational Data Mapping", International Journal of Information Systems (to appear).
2. Artem Chebotko, Dapeng Liu, Mustafa Atay, Shiyong Lu, Farshad Fotouhi, "Reconstructing XML Subtrees from Relational Storage of XML Documents", in Proc. of the 2nd International Workshop on XML Schema and Data Management (XSDM'05), in conjunction with ICDE'2005, Tokyo, Japan, April 2005.
3. Mustafa Atay, Yezhou Sun, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Mapping XML Data to Relational Data: DOM-based Approach", in Proc. of the 8th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA'2004), Kauai, Hawaii, USA, August 2004.
4. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "On the consistency of XML DTDs", International Journal of Data and Knowledge Engineering.
5. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A new inlining algorithm for mapping XML DTDs to relational schemas", in Proc. of the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER2003), Chicago, Illinois, USA, October 2003.
6. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A Sufficient and Necessary Condition for the Consistency of XML DTDs", in Proc. of the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER'2003), Chicago, Illinois, USA, October 2003.