XML To Relational Model. Key Index – Forward Traversal Backward Traversal.



Binary Approach
- B_name(source, ordinal, flag, target)
- Create as many binary tables as there are distinct subelement and attribute names in the XML document
- This partitions the Edge table by name
- Universal table: take the outer join of all binary tables
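
The partitioning step can be sketched in a few lines of Python. This is an illustration only: it assumes an Edge table whose rows are (source, ordinal, name, flag, target) tuples, and the sample rows, the flag values and the function name partition_edges are hypothetical.

```python
from collections import defaultdict


def partition_edges(edges):
    """Split Edge rows into one binary table per element/attribute name.

    Each input row is assumed to be (source, ordinal, name, flag, target).
    The name column is dropped because it becomes part of the table name.
    """
    tables = defaultdict(list)
    for source, ordinal, name, flag, target in edges:
        tables["B_" + name].append((source, ordinal, flag, target))
    return dict(tables)


# Hypothetical Edge rows for a tiny document.
EDGES = [
    (0, 1, "Company", "ref", 1),
    (1, 1, "Name", "string", "Skynet Hitech"),
    (1, 2, "Department", "ref", 2),
    (2, 1, "Name", "string", "Research"),
]

BINARY_TABLES = partition_edges(EDGES)
```

Here both Name rows land in the same table B_Name, regardless of which parent they belong to.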

Universal Table with Overflow

Converting Ordered XML to Relations

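Skynet Hitech. Company
<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>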

Ordered XML model for Skynet Hitech. Company

Schema of the storing table
Attributes:
- ID: the unique index for each tuple
- DID: the document ID
- Path: the path from the root to the leaf node, used to locate a particular node
- Surrogate Pattern: numeric representation of the nodes along the path
- Value: the text value associated with each node

Numbering nodes

Tuple that stores "Linda White"
ID:
DID: 501
Path: Company/Department/Manager
Surrogate Pattern: 1[1]2[2]2[1]
Value: Linda White

Old Skynet file stored in the RDBMS

OLD
Path                          Surrogate Pattern   Value
Company/Name                  1[1]1[1]            Skynet Hitech
Company/Department/Name       1[1]2[1]1[1]        Research
Company/Department/Manager    1[1]2[1]2[1]        John Smith
Company/Department/Employee   1[1]2[1]3[1]        Tom Jackson
Company/Department/Name       1[1]2[2]1[1]        Sales
Company/Department/Manager    1[1]2[2]2[1]        Linda White
Company/Department/Employee   1[1]2[2]3[1]        Kevin Lee
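
The rows above can be reproduced with a short Python sketch. The numbering rule used here is inferred from the table and is an assumption, not a stated definition: each step k[i] is read as "k-th distinct child tag name under the parent, i-th occurrence of that tag". The XML text and the function name shred are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """
<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>
"""


def shred(root):
    """Return (path, surrogate_pattern, value) for every leaf element."""
    rows = []

    def walk(elem, path, pattern):
        children = list(elem)
        if not children:
            rows.append((path, pattern, (elem.text or "").strip()))
            return
        tag_number = {}   # tag name -> order of first appearance (assumed rule)
        occurrence = {}   # tag name -> how many siblings with this tag so far
        for child in children:
            if child.tag not in tag_number:
                tag_number[child.tag] = len(tag_number) + 1
            occurrence[child.tag] = occurrence.get(child.tag, 0) + 1
            step = f"{tag_number[child.tag]}[{occurrence[child.tag]}]"
            walk(child, path + "/" + child.tag, pattern + step)

    walk(root, root.tag, "1[1]")
    return rows


ROWS = shred(ET.fromstring(XML))
```

Under this reading, the second Department is step 2[2] and its Manager is step 2[1], which gives the pattern 1[1]2[2]2[1] for "Linda White".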

<!ELEMENT book (booktitle, author)>

Basic Inline Algorithm
- A relation is created for the root element of the element graph
- All of the element's descendants are inlined into that relation, with two exceptions:
  - Children below a "*" node are made into separate relations; this corresponds to creating a new relation for a set-valued child
  - Each node that has a backpointer edge pointing to it is made into a separate relation

Drawbacks
- Grossly inefficient for many queries: "List all authors having first name Jack" has to be executed as the union of 5 separate queries
- It creates a large number of relations

To determine the set of relations to be created for an element, we construct an element graph as follows:
- Do a DFS traversal of the DTD graph, starting at the element node for which we are constructing relations
- Each node is marked as "visited" the first time it is reached and is unmarked once all its children have been traversed
- If an unmarked node in the DTD graph is reached during the DFS, a new node bearing the same name is created in the element graph
- A regular edge is created from the most recently created node in the element graph with the same name as the DFS parent of the current DTD node to the newly created node
- If an attempt is made to traverse an already marked DTD node, a backpointer edge is added from the most recently created node in the element graph to the most recently created node in the element graph with the same name as the marked DTD node
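
A minimal Python sketch of this construction follows. It makes several assumptions: the DTD graph is a plain dict from node name to child names, operator nodes such as "*" are ignored, and the source of a backpointer edge is taken to be the element-graph node of the DFS parent, which is one reading of "the most recently created node". The sample graph and the function name build_element_graph are hypothetical.

```python
def build_element_graph(dtd, start):
    """Return (nodes, regular_edges, backpointer_edges) for the element graph.

    nodes[i] is the name of element-graph node i; edges are (from_id, to_id).
    """
    nodes, regular, back = [], [], []
    marked = set()   # DTD nodes currently on the DFS stack
    latest = {}      # name -> id of the most recently created node with that name

    def visit(name, parent):
        if name in marked:
            # Already marked: add a backpointer edge instead of descending.
            back.append((latest[parent], latest[name]))
            return
        marked.add(name)
        node_id = len(nodes)
        nodes.append(name)
        if parent is not None:
            regular.append((latest[parent], node_id))
        latest[name] = node_id
        for child in dtd.get(name, []):
            visit(child, name)
        marked.discard(name)  # unmark once all children have been traversed

    visit(start, None)
    return nodes, regular, back


# Hypothetical DTD graph with a cycle editor -> monograph -> editor.
DTD = {
    "editor": ["monograph", "name"],
    "monograph": ["title", "editor"],
    "title": [],
    "name": [],
}

NODES, REGULAR, BACK = build_element_graph(DTD, "editor")
```

Because nodes are unmarked after their subtree is finished, a DTD node reachable along two different paths appears twice in the element graph, while a node reached again along the current path only produces a backpointer edge.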

Fragmentation: Example
- Results in 5 relations
- Just retrieving the first and last names of an author requires three joins!

author (authorID: integer, id: string)
name (nameID: integer, authorID: integer)
firstname (firstnameID: integer, nameID: integer, value: string)
lastname (lastnameID: integer, nameID: integer, value: string)
address (addressID: integer, authorID: integer, value: string)
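
The three joins can be seen in a small sketch using Python's built-in sqlite3 module. The table and column names follow the relations listed above; the SQL column types and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author    (authorID INTEGER, id TEXT);
    CREATE TABLE name      (nameID INTEGER, authorID INTEGER);
    CREATE TABLE firstname (firstnameID INTEGER, nameID INTEGER, value TEXT);
    CREATE TABLE lastname  (lastnameID INTEGER, nameID INTEGER, value TEXT);

    -- Hypothetical sample data: one author with one name.
    INSERT INTO author    VALUES (1, 'a1');
    INSERT INTO name      VALUES (10, 1);
    INSERT INTO firstname VALUES (100, 10, 'Jack');
    INSERT INTO lastname  VALUES (200, 10, 'Smith');
""")

# author-name, name-firstname and name-lastname: three joins in total.
QUERY = """
    SELECT f.value, l.value
    FROM author a
    JOIN name n      ON n.authorID = a.authorID
    JOIN firstname f ON f.nameID   = n.nameID
    JOIN lastname l  ON l.nameID   = n.nameID
    ORDER BY a.authorID
"""

FULL_NAMES = conn.execute(QUERY).fetchall()
```

With the names inlined into the author relation, the same lookup would be a single-table SELECT.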

Shared Inlining Method
Relations are created for:
- All elements in the DTD graph whose nodes have an in-degree greater than one (nodes with an in-degree of one are inlined)
- Elements that have an in-degree of zero
- Elements below a "*" node
- Of mutually recursive elements all having in-degree one, one is made a separate relation
Each element node X that is a separate relation inlines all nodes Y that are reachable from it such that the path from X to Y does not contain a node that is to be made a separate relation
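
The selection rules can be sketched in Python as below. This is a simplified illustration, not a full implementation: the DTD graph is assumed to be a dict mapping each element to (child, is_starred) pairs, operator nodes are folded into the is_starred flag, and when a cycle of in-degree-one elements is found the element that closes the cycle is the one made a relation (the choice among them is arbitrary). The sample graph and the name shared_relations are hypothetical.

```python
def shared_relations(dtd):
    """Return the set of element names that get their own relation."""
    in_degree = {element: 0 for element in dtd}
    starred = set()
    for children in dtd.values():
        for child, is_starred in children:
            in_degree[child] = in_degree.get(child, 0) + 1
            if is_starred:
                starred.add(child)

    # In-degree zero, in-degree greater than one, or directly below a "*".
    relations = {e for e, d in in_degree.items() if d == 0 or d > 1} | starred

    # Break cycles made up only of elements that would otherwise be inlined.
    state = {}

    def dfs(node):
        state[node] = "open"
        for child, _ in dtd.get(node, []):
            if child in relations:
                continue
            if state.get(child) == "open":
                relations.add(child)      # the cycle closes here
            elif child not in state:
                dfs(child)
        state[node] = "done"

    for element in sorted(dtd):
        if element not in relations and element not in state:
            dfs(element)
    return relations


# Hypothetical DTD graph; True marks a child that sits below a "*".
DTD = {
    "book": [("booktitle", False), ("author", False)],
    "article": [("title", False), ("author", True)],
    "monograph": [("title", False), ("author", False), ("editor", False)],
    "editor": [("monograph", True)],
    "author": [("name", False)],
    "booktitle": [], "title": [], "name": [],
}

RELATIONS = shared_relations(DTD)
```

For this sample graph the result is book, article, monograph, title and author; booktitle, editor and name are inlined into their parents.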

Issues with Sharing Elements
- Parent of elements not fixed at schema level
- Need to store type and ids of parents
  - parentCODE field (type of parent)
  - parentID field (id of parent)
- No foreign key relationship
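
A small Python sketch of what this means for a lookup: the parent table has to be chosen at run time from parentCODE, which is why a foreign key cannot be declared. The code-to-table mapping, the sample rows and the function name find_parent are hypothetical.

```python
# Hypothetical mapping from parentCODE values to parent relations.
PARENT_TABLES = {1: "article", 2: "monograph"}

# Hypothetical rows, keyed by ID within each parent relation.
TABLES = {
    "article": {7: {"articleID": 7}},
    "monograph": {3: {"monographID": 3}},
}

TITLES = [
    {"titleID": 1, "parentID": 7, "parentCODE": 1, "title": "XML Storage"},
    {"titleID": 2, "parentID": 3, "parentCODE": 2, "title": "Trees"},
]


def find_parent(row):
    """Resolve a shared element's parent using parentCODE, then parentID."""
    table_name = PARENT_TABLES[row["parentCODE"]]
    return table_name, TABLES[table_name][row["parentID"]]


PARENTS = [find_parent(row) for row in TITLES]
```

The two title rows resolve to different relations even though they are stored in the same table.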

Hybrid
- Same as Shared, except that it inlines some elements not inlined in Shared
- Inlines elements with in-degree greater than one that are not recursive or reached through a "*" node
- Set sub-elements and recursive elements are treated as in Shared

book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle: string)
article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer, monograph.editor.isroot: boolean, monograph.editor.name: string)
title (titleID: integer, title.parentID: integer, title.parentCODE: integer, title: string)
author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean, author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean, author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)

Shared Inline

Hybrid