XML To Relational Model
Key Index – Forward Traversal Backward Traversal
Binary Approach
B_name (source, ordinal, flag, target)
- Create as many tables as there are distinct subelement and attribute names in the XML document
- Partition the Edge table by name
- Universal table: take the outer join of all binary tables
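The partitioning step can be sketched in a few lines. This is a minimal illustration, assuming the Edge table has the columns (source, ordinal, name, flag, target); the slide only gives the binary-table schema, so the Edge column order and the sample rows are hypothetical.

```python
from collections import defaultdict

def partition_by_name(edges):
    """Split Edge rows into one binary table per element/attribute name.

    Assumed row layout: (source, ordinal, name, flag, target).
    Each resulting table B_name holds (source, ordinal, flag, target).
    """
    tables = defaultdict(list)
    for source, ordinal, name, flag, target in edges:
        tables[name].append((source, ordinal, flag, target))
    return dict(tables)

# Hypothetical Edge rows: "ref" targets another node, "string" targets a value.
edges = [
    (0, 1, "book", "ref", 1),
    (1, 1, "booktitle", "string", "XML Basics"),
    (1, 2, "author", "ref", 2),
    (0, 2, "book", "ref", 3),
]
binary_tables = partition_by_name(edges)
```

Here `binary_tables["book"]` holds the two book edges, and every other name gets its own table.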
Universal Table with Overflow
Converting Ordered XML to Relations
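Skynet Hitech. Company
<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>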
Ordered XML model for Skynet Hitech. Company
Schema of the storing table
Attributes:
- ID: the unique index for each tuple
- DID: the document ID
- Path: the path from the root to the leaf node, used to locate a particular node
- Surrogate Pattern: numeric representation of the nodes along the path
- Value: text value associated with each node
Numbering nodes
Tuple that stores "Linda White"
ID:
DID: 501
Path: Company/Department/Manager
Surrogate Pattern: 1[1]2[2]2[1]
Value: Linda White
Old Skynet file stored in the RDBMS
OLD
Path                          Surrogate Pattern   Value
Company/Name                  1[1]1[1]            Skynet Hitech
Company/Department/Name       1[1]2[1]1[1]        Research
Company/Department/Manager    1[1]2[1]2[1]        John Smith
Company/Department/Employee   1[1]2[1]3[1]        Tom Jackson
Company/Department/Name       1[1]2[2]1[1]        Sales
Company/Department/Manager    1[1]2[2]2[1]        Linda White
Company/Department/Employee   1[1]2[2]3[1]        Kevin Lee
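The rows above can be reproduced with a short traversal. This is a sketch under an assumed numbering rule inferred from the example rows, not a rule stated on the slides: each path step is written as n[k], where n numbers the distinct tag names among siblings in order of first appearance and k counts occurrences of the same tag among those siblings. The function name `leaf_tuples` is hypothetical.

```python
import xml.etree.ElementTree as ET

def leaf_tuples(root):
    """Return (path, surrogate_pattern, value) for every leaf element."""
    rows = []

    def visit(node, path, pattern):
        children = list(node)
        if not children:  # leaf: store its text value
            rows.append((path, pattern, (node.text or "").strip()))
            return
        tag_number = {}   # tag name -> number by first appearance
        occurrence = {}   # tag name -> how many siblings seen so far
        for child in children:
            if child.tag not in tag_number:
                tag_number[child.tag] = len(tag_number) + 1
            occurrence[child.tag] = occurrence.get(child.tag, 0) + 1
            step = f"{tag_number[child.tag]}[{occurrence[child.tag]}]"
            visit(child, f"{path}/{child.tag}", pattern + step)

    visit(root, root.tag, "1[1]")  # the root is assumed to be 1[1]
    return rows

doc = """<Company><Name>Skynet Hitech</Name>
<Department><Name>Research</Name><Manager>John Smith</Manager>
<Employee>Tom Jackson</Employee></Department>
<Department><Name>Sales</Name><Manager>Linda White</Manager>
<Employee>Kevin Lee</Employee></Department></Company>"""
rows = leaf_tuples(ET.fromstring(doc))
```

Under this assumed rule, the tuple for Linda White comes out with path Company/Department/Manager and pattern 1[1]2[2]2[1], matching the table.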
<!ELEMENT book (booktitle, author)>
Basic Inline Algorithm
- A relation is created for the root element of the element graph.
- All of that element's descendants are inlined into the relation, except:
  - Children below a "*" node are made into separate relations; this corresponds to creating a new relation for a set-valued child.
  - Each node that has a backpointer edge pointing to it is made into a separate relation.
Drawbacks
- Grossly inefficient for many queries: "List all authors having first name Jack" has to be executed as the union of 5 separate queries.
- Creates a large number of relations.
To determine the set of relations to be created for an element, we construct an element graph as follows:
- Do a DFS traversal of the DTD graph, starting at the element node for which we are constructing relations.
- Each node is marked as "visited" the first time it is reached and is unmarked once all its children have been traversed.
- If an unmarked node in the DTD graph is reached during the DFS, a new node bearing the same name is created in the element graph.
- A regular edge is created from the most recently created element-graph node with the same name as the DFS parent of the current DTD node to the newly created node.
- If an attempt is made to traverse an already marked DTD node, a backpointer edge is added from the most recently created node in the element graph to the most recently created element-graph node with the same name as the marked DTD node.
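The construction steps above can be sketched as a recursive DFS. This is an illustrative reading of the slide, not a reference implementation: the DTD graph is assumed to be a dict from node name to child names, operator nodes such as "*" are left out, and the backpointer source is taken to be the element-graph node of the DFS parent. The function name and the sample DTDs are hypothetical.

```python
def build_element_graph(dtd, start):
    """Return (nodes, regular_edges, backpointer_edges) for one start element.

    nodes[i] is the name of element-graph node i; edges are pairs of node ids.
    """
    nodes, regular, back = [], [], []
    latest = {}     # name -> id of the most recently created node with that name
    marked = set()  # DTD nodes currently on the DFS stack

    def dfs(name, parent):
        if name in marked:
            # Already marked: add a backpointer instead of a new node.
            back.append((latest[parent], latest[name]))
            return
        marked.add(name)
        node_id = len(nodes)
        nodes.append(name)
        if parent is not None:
            regular.append((latest[parent], node_id))
        latest[name] = node_id
        for child in dtd.get(name, []):
            dfs(child, name)
        marked.discard(name)  # unmark once all children are traversed

    dfs(start, None)
    return nodes, regular, back

# A recursive DTD yields a backpointer edge.
rec_nodes, rec_regular, rec_back = build_element_graph(
    {"a": ["b"], "b": ["a", "c"]}, "a")

# A shared child is duplicated, once per path that reaches it.
dup_nodes, dup_regular, dup_back = build_element_graph(
    {"book": ["author", "editor"], "author": ["name"], "editor": ["name"]},
    "book")
```

Because nodes are unmarked after their subtree is finished, an element reached along two different paths appears twice in the element graph, while a cycle produces a backpointer.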
Fragmentation: Example
Results in 5 relations. Just retrieving the first and last names of an author requires three joins!
author (authorID: integer, id: string)
name (nameID: integer, authorID: integer)
firstname (firstnameID: integer, nameID: integer, value: string)
lastname (lastnameID: integer, nameID: integer, value: string)
address (addressID: integer, authorID: integer, value: string)
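To make the join cost concrete, the five relations can be loaded into an in-memory SQLite database and queried. This is a minimal sketch: the column types are mapped to SQLite equivalents, and the inserted author row is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author    (authorID INTEGER PRIMARY KEY, id TEXT);
CREATE TABLE name      (nameID INTEGER PRIMARY KEY, authorID INTEGER);
CREATE TABLE firstname (firstnameID INTEGER PRIMARY KEY, nameID INTEGER, value TEXT);
CREATE TABLE lastname  (lastnameID INTEGER PRIMARY KEY, nameID INTEGER, value TEXT);
CREATE TABLE address   (addressID INTEGER PRIMARY KEY, authorID INTEGER, value TEXT);

-- Hypothetical sample author.
INSERT INTO author    VALUES (1, 'a1');
INSERT INTO name      VALUES (10, 1);
INSERT INTO firstname VALUES (100, 10, 'Jack');
INSERT INTO lastname  VALUES (200, 10, 'Doe');
""")

# Three joins are needed just to pair first and last names with an author.
names = conn.execute("""
SELECT f.value, l.value
FROM author a
JOIN name n      ON n.authorID = a.authorID
JOIN firstname f ON f.nameID = n.nameID
JOIN lastname l  ON l.nameID = n.nameID
""").fetchall()
```

With inlining, the same lookup would read two columns from a single author relation.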
Shared Inlining Method
Relations are created for:
- All elements in the DTD graph whose nodes have an in-degree greater than one. (Nodes with an in-degree of one are inlined.)
- Elements with an in-degree of zero.
- Elements below a "*" node.
- Of mutually recursive elements all having in-degree one, one is made a separate relation.
Each element node X that is a separate relation inlines all nodes Y reachable from it such that the path from X to Y does not contain a node that is to be made a separate relation.
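The first three rules reduce to an in-degree count over the DTD graph. The sketch below assumes the graph is given as (parent, child, starred) edges, where starred means the child sits below a "*" node; the edge list is a hypothetical DTD chosen for illustration, and the mutual-recursion rule is deliberately omitted.

```python
from collections import Counter

def shared_relations(edges):
    """Elements that get their own relation under the first three Shared rules."""
    in_degree = Counter()
    elements, starred_children = set(), set()
    for parent, child, starred in edges:
        elements.update((parent, child))
        in_degree[child] += 1
        if starred:
            starred_children.add(child)
    return {
        e for e in elements
        if in_degree[e] == 0 or in_degree[e] > 1 or e in starred_children
    }

# Hypothetical DTD graph: (parent, child, child is below a "*" node).
dtd_edges = [
    ("book", "booktitle", False), ("book", "author", False),
    ("article", "title", False), ("article", "author", True),
    ("article", "contactauthor", False),
    ("monograph", "title", False), ("monograph", "author", False),
    ("monograph", "editor", False), ("editor", "monograph", True),
    ("author", "name", False), ("author", "address", False),
    ("name", "firstname", False), ("name", "lastname", False),
]
shared = shared_relations(dtd_edges)
```

For this assumed DTD, book and article have in-degree zero, title and author are shared, and monograph sits below a "*" node, so those five become relations and everything else is inlined.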
Issues with Sharing Elements
- Parent of elements not fixed at schema level
- Need to store type and ids of parents
  - parentCODE field (type of parent)
  - parentID field (id of parent)
- No foreign key relationship
Hybrid
- Same as Shared, except that it inlines some elements that Shared does not.
- Inlines elements with in-degree greater than one that are not recursive or reached through a "*" node.
- Set sub-elements and recursive elements are treated as in Shared.
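The difference from Shared can be sketched by adding a recursion check to the in-degree test. As before, this assumes a hypothetical (parent, child, starred) edge list and omits the rule for mutually recursive elements that all have in-degree one; "recursive" is read here as "can reach itself in the DTD graph", which is an interpretation rather than a definition given on the slide.

```python
from collections import Counter, defaultdict

def hybrid_relations(edges):
    """Elements that keep their own relation under the Hybrid rules (sketch)."""
    children = defaultdict(set)
    in_degree = Counter()
    elements, starred_children = set(), set()
    for parent, child, starred in edges:
        elements.update((parent, child))
        children[parent].add(child)
        in_degree[child] += 1
        if starred:
            starred_children.add(child)

    def is_recursive(element):
        # True if the element can reach itself through child edges.
        stack, seen = list(children[element]), set()
        while stack:
            node = stack.pop()
            if node == element:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return False

    return {
        e for e in elements
        if in_degree[e] == 0
        or e in starred_children
        or (in_degree[e] > 1 and is_recursive(e))
    }

# Hypothetical DTD graph: (parent, child, child is below a "*" node).
dtd_edges = [
    ("book", "booktitle", False), ("book", "author", False),
    ("article", "title", False), ("article", "author", True),
    ("article", "contactauthor", False),
    ("monograph", "title", False), ("monograph", "author", False),
    ("monograph", "editor", False), ("editor", "monograph", True),
    ("author", "name", False), ("author", "address", False),
    ("name", "firstname", False), ("name", "lastname", False),
]
hybrid = hybrid_relations(dtd_edges)
```

In this example title has in-degree two but is neither recursive nor below a "*" node, so Hybrid inlines it into its parents, while author stays separate because it is reached through a "*" node.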
book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle: string)
article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer, monograph.editor.isroot: boolean, monograph.editor.name: string)
title (titleID: integer, title.parentID: integer, title.parentCODE: integer, title: string)
author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean, author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean, author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)
Shared Inline
Hybrid