LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar.

Presentation transcript:

LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar

Managing XML Data
- XML has become widely used for exchanging data over the Web.
- XML is extensible and flexible: it can be used in applications with widely different requirements.
- There is no one-size-fits-all solution for all applications.
- What are the procedures to store, query and publish XML data? Adaptable and flexible solutions are needed.
- LegoDB is a component-based XML data management system.
- The database-anywhere paradigm: portable and adaptable to any data and any environment.

Motivation & Challenges
Motivation:
- Reuse of well-developed RDBMS features: concurrency control, crash recovery, query processors
- Wide variety of XML applications
- Integration with existing data stored in an RDBMS
Challenges:
- Mismatch between the nested tree structure of XML and the flat tuples of the relational model
- Providing the flexibility to handle a wide domain of applications
Storing and querying XML data in an RDBMS is a non-trivial task.

Mapping XML Schema to Relations
Question: Can different XML schemas validate the exact same set of documents? Yes.
- Different but equivalent regular expressions can describe the contents of a given element: (a,(b|c*)) == ((a,b)|(a,c*))
- Sub-elements of an element can be referred to directly, or by a type name.
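The equivalence claimed on this slide can be checked mechanically. The sketch below is not part of the original slides; it assumes the element names a, b, c are single letters so that a content model can be written as an ordinary Python regular expression (sequence becomes concatenation). It compares the two expressions on every word up to a fixed length, which is a bounded sanity check and not a proof. The helper names `accepts` and `same_language` are made up for this illustration.

```python
import re
from itertools import product

# The two content models from the slide, over one-letter element names.
FACTORED = re.compile(r"a(b|c*)")
DISTRIBUTED = re.compile(r"(ab)|(ac*)")


def accepts(pattern, word):
    """Return True when the whole child sequence matches the content model."""
    return pattern.fullmatch(word) is not None


def same_language(p, q, alphabet="abc", max_len=6):
    """Compare two patterns on every word up to max_len (bounded check only)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(p, word) != accepts(q, word):
                return False
    return True


print(same_language(FACTORED, DISTRIBUTED))
```

Both expressions accept the same child sequences, yet they can lead to different relational layouts, which is what the later transformation slides exploit.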

Sample XML Dataset: Internet Movie Database
Fugitive, The 1993 Roger Ebert Two thumbs up! This is a fun action movie, Harrison Ford at his best. The standard Hollywood summer movie strikes back. 183,752,965
X Files, The Fallen Angel Larry Shaw

DTD and XML Schema

Question: Can you find more storage mappings? Yes, by performing a sequence of transformations (i.e. rewritings) that preserve the semantics of the schema.

Mapping XML Schema into Tables
- Inline as many elements as possible.
- Partition the reviews table: one for NYTimes and one for the rest.
- Split the show table into TV and Movies.

Querying XML
The presence of a schema for XML documents is needed:
- for applications to interpret the data
- for issuing queries
Example: find the title, year and box office proceeds for all 2001 movies.
    for $v in document("imdbdata")/imdb/show
    where $v/year = 2001
    return $v/title, $v/year, $v/box_office
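To make the query concrete, here is a minimal sketch that evaluates the same selection directly over an XML document with Python's standard `xml.etree.ElementTree`. The two-show document and the function name `shows_in_year` are invented for illustration; only the element names (show, title, year, box_office) come from the query on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the titles and figures are placeholders.
DOC = """
<imdb>
  <show><title>Example Movie A</title><year>2001</year><box_office>1000</box_office></show>
  <show><title>Example Movie B</title><year>1993</year><box_office>2000</box_office></show>
</imdb>
"""


def shows_in_year(xml_text, year):
    """Return (title, year, box_office) for every show released in `year`."""
    root = ET.fromstring(xml_text)
    rows = []
    for show in root.findall("show"):          # for $v in .../show
        if show.findtext("year") == str(year):  # where $v/year = year
            rows.append((                       # return title, year, box_office
                show.findtext("title"),
                int(show.findtext("year")),
                show.findtext("box_office"),
            ))
    return rows


print(shows_in_year(DOC, 2001))
```

This walks the tree in memory. The rest of the presentation is about answering the same query from relational tables instead.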

XML and Relational Databases
There is a mismatch between the relational model and that of XML:
- Relational: normalized, flat and fragmented
- XML: un-normalized, nested and monolithic
How to store XML data in relational tables?
- Need to map the nested and irregular XML data into flat and regular tables
How to evaluate XML queries over relational tables?
- Need to map XQuery into SQL

Problem: Storing XML in an RDBMS
(Taken from Juliana Freire’s presentation)

Queries

Mapping an XML Schema into Tables
Different applications require specific mappings for best performance. Each workload assigns a weight to every query:
- Publish: W1 = {Q1=0.4, Q2=0.4, Q3=0.1, Q4=0.1}
- Lookup: W2 = {Q1=0.1, Q2=0.1, Q3=0.4, Q4=0.4}
(Taken from Juliana Freire’s presentation)
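One way to read the two workloads is as weights in a weighted sum of per-query costs. The sketch below uses the weights W1 and W2 from the slide; the per-query costs and the mapping names "inlined" and "partitioned" are hypothetical numbers chosen only to show that the preferred mapping can change with the workload.

```python
# Workload weights taken from the slide.
W1 = {"Q1": 0.4, "Q2": 0.4, "Q3": 0.1, "Q4": 0.1}  # publish
W2 = {"Q1": 0.1, "Q2": 0.1, "Q3": 0.4, "Q4": 0.4}  # lookup

# Hypothetical per-query costs for two candidate mappings (not from the slides).
COSTS = {
    "inlined": {"Q1": 10.0, "Q2": 10.0, "Q3": 40.0, "Q4": 40.0},
    "partitioned": {"Q1": 30.0, "Q2": 30.0, "Q3": 5.0, "Q4": 5.0},
}


def workload_cost(weights, query_costs):
    """Weighted sum of the per-query costs under one mapping."""
    return sum(weight * query_costs[q] for q, weight in weights.items())


def best_mapping(weights, costs):
    """Name of the candidate mapping with the lowest weighted cost."""
    return min(costs, key=lambda name: workload_cost(weights, costs[name]))


print(best_mapping(W1, COSTS), best_mapping(W2, COSTS))
```

With these assumed costs, the publish workload favors one mapping and the lookup workload the other.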

The LegoDB Storage Mapping Engine
An optimization approach:
- automatically explores a space of possible mappings
- selects the mapping that has the lowest cost for a given application
Basic principles:
- Application-driven: takes into account schema, data statistics and query workload
- Logical/physical independence: the interface is XML-based (XML Schema, XQuery, XML data statistics)
- Leverage existing technology: XML standards; XML-specific operations for generating the space of mappings; relational optimizer for evaluating configurations

LegoDB
- Create a p-schema for the input XML schema.
- Obtain cost estimates from the input data statistics and XQuery workload.
- Search the space of alternative storage configurations to find an efficient mapping for a given application.

Architecture of the Mapping Engine
[Figure: architecture diagram. Labels: PSi: Physical Schema; RSi: Relational Schema/Queries/Stats; Cost(SQi)]

XML Schema to Relations
How to transform an XML schema into relations? Start from the P-Schema:
Type Show = show [String], year [Integer], title [String<#50,#34798>], Review* ]
Type Review = review [String]

XML Schema to Relations
For a type T and relation R:
R1 - Create one relation R for each T.
R2 - Create a key for each T.
R3 - Create a foreign key in R for each parent of T.
R4 - Create a column in R for every physical type in T.
R5 - Allow null values in R for every optional type in T.

XML Schema to Relations
Type Show = show [String], year [Integer], title [String<#50,#34798>], Review* ]
Type Review = review [String]
Resulting relations:
Show (Show_ID, Type, Year, Title)
Review (Review_ID, Review, To_Show_Key (FK))
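Rules R1 to R5 can be sketched as a small DDL generator. The dictionary format below is a simplified, hypothetical stand-in for a p-schema (it is not LegoDB's actual data structure), and the column naming (`<Type>_id`, `to_<Parent>_key`) is an assumption modeled on the tables shown on the slide. SQLite from the Python standard library is used only to show that the generated statements are valid SQL.

```python
import sqlite3

# Hypothetical simplified p-schema: type name -> parent type and scalar fields.
# Each field is (column name, SQL type, is_optional).
P_SCHEMA = {
    "Show": {
        "parent": None,
        "fields": [("type", "TEXT", False), ("year", "INTEGER", False),
                   ("title", "TEXT", False)],
    },
    "Review": {
        "parent": "Show",
        "fields": [("review", "TEXT", False)],
    },
}


def to_ddl(schema):
    """Turn each type into one CREATE TABLE statement following R1-R5."""
    statements = []
    for name, spec in schema.items():                      # R1: one relation per type
        cols = [f"{name}_id INTEGER PRIMARY KEY"]          # R2: a key per type
        for col, sql_type, optional in spec["fields"]:     # R4: one column per physical type
            null = "" if optional else " NOT NULL"         # R5: nulls only for optional types
            cols.append(f"{col} {sql_type}{null}")
        parent = spec["parent"]
        if parent:                                         # R3: foreign key to the parent
            cols.append(
                f"to_{parent}_key INTEGER NOT NULL REFERENCES {parent}({parent}_id)"
            )
        statements.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    return statements


conn = sqlite3.connect(":memory:")
for statement in to_ddl(P_SCHEMA):
    conn.execute(statement)

# Load one show and one review, then join them back together.
conn.execute(
    "INSERT INTO Show (Show_id, type, year, title) "
    "VALUES (1, 'Movie', 1993, 'Fugitive, The')"
)
conn.execute("INSERT INTO Review (review, to_Show_key) VALUES ('Two thumbs up!', 1)")
rows = conn.execute(
    "SELECT s.title, r.review FROM Show s JOIN Review r ON r.to_Show_key = s.Show_id"
).fetchall()
print(rows)
```

The join at the end is the relational price of keeping Review in its own table; the transformations on the following slides change which such joins a query needs.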

Types of XML Transformations
- Inlining/Outlining
- Union Factorization/Distribution
- Repetition Merge/Split
- Wildcard Rewritings

Inlining/Outlining
An element can be “outlined” by removing it from a relation and using a foreign key to relate it to a separate table. Inlining is the exact opposite.
Outlined:
Type TV = seasons[Integer], Description, Episode*
Type Description = description[String]
Inlined:
Type TV = seasons[Integer], description[String], Episode*
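The TV/Description example can be expressed as two inverse functions over a toy schema representation. This is a sketch under assumptions: the tuple encoding (`("elem", name, base_type)` for a scalar element, `("ref", type_name, occurrence)` for a type reference) and the function names are invented here, and `inline` assumes the target type is referenced from nowhere else.

```python
def inline(schema, owner, target):
    """Replace a single-occurrence reference to `target` with its content."""
    new_items = []
    for item in schema[owner]:
        if item == ("ref", target, "1"):
            new_items.extend(schema[target])   # pull the content into the owner
        else:
            new_items.append(item)
    # Assumes `target` is not referenced elsewhere, so its type can be dropped.
    result = {name: list(items) for name, items in schema.items() if name != target}
    result[owner] = new_items
    return result


def outline(schema, owner, element_name, new_type):
    """Move one scalar element of `owner` into its own named type."""
    new_items, moved = [], None
    for item in schema[owner]:
        if item[0] == "elem" and item[1] == element_name:
            moved = item
            new_items.append(("ref", new_type, "1"))
        else:
            new_items.append(item)
    if moved is None:
        raise KeyError(element_name)
    result = {name: list(items) for name, items in schema.items()}
    result[owner] = new_items
    result[new_type] = [moved]
    return result


OUTLINED = {
    "TV": [("elem", "seasons", "Integer"), ("ref", "Description", "1"),
           ("ref", "Episode", "*")],
    "Description": [("elem", "description", "String")],
}
INLINED = inline(OUTLINED, "TV", "Description")
print(INLINED)
```

Under the R1 to R5 rules, the outlined form yields a separate Description table with a foreign key, while the inlined form stores description as a column of TV.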

Union Factorization/Distribution
(a,(b|c)) == ((a,b)|(a,c))
a[t1|t2] == a[t1]|a[t2]
Factored:
Type Show = show [String], title[String], year[Integer], (Movie|TV) ]
Type Movie = box_office[Integer], video_sales[Integer]
Type TV = seasons[Integer], description[String], Episode*
Distributed:
Type Show = show[ (title[String], year[Integer], box_office[Integer], video_sales[Integer]) | (title[String], year[Integer], seasons[Integer], description[String], Episode*) ]

Repetition Merge/Split
(a+ == a,a* == a,a*,a*), etc.
Merged:
Type Show = show [String], title[String], year[Integer], Aka{1,*} ]
Split:
Type Show = show [String], title[String], year[Integer], Aka, Aka{0,*} ]

Wildcard Rewritings
We might want to access specific elements matched by a wildcard, such as NYTReview.
Before:
Type Review = review[~[String]*]
After:
Type Reviews = review[ (NYTReview | OtherReview)* ]
Type NYTReview = nyt[String]
Type OtherReview = (~!nyt)[String]

Finding the Best p-Schema
Use a greedy search, and search until a “good” result is found:
1. Get the initial (current) schema.
2. Get the cost of the current schema.
3. Apply each transformation to the current schema.
4. Select the transformed schema with the lowest cost.
5. If that cost is lower than the cost of the current schema, mark the transformed schema as the current schema and continue the search. Otherwise stop searching.
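The five steps translate into a short loop. In the sketch below, `greedy_search` follows the steps as listed, while everything it is run on is a toy assumption: a schema is modeled as the set of elements currently inlined, a transformation toggles one element between inlined and outlined, and the cost deltas are invented numbers standing in for the relational optimizer's estimates.

```python
def greedy_search(initial, neighbors, cost):
    """Follow the cheapest single transformation until none lowers the cost."""
    current, current_cost = initial, cost(initial)          # steps 1 and 2
    while True:
        candidates = [(cost(s), s) for s in neighbors(current)]  # step 3
        if not candidates:
            return current, current_cost
        best_cost, best = min(candidates, key=lambda pair: pair[0])  # step 4
        if best_cost >= current_cost:                        # step 5: no improvement
            return current, current_cost
        current, current_cost = best, best_cost              # step 5: keep searching


# Toy model (hypothetical): a schema is the set of elements that are inlined.
ELEMENTS = ("description", "review", "aka")
DELTA = {"description": -30, "review": 20, "aka": -10}  # invented cost changes


def toy_cost(state):
    """Stand-in for the optimizer's cost estimate of a configuration."""
    return 100 + sum(DELTA[e] for e in state)


def toy_neighbors(state):
    """Every schema reachable by inlining or outlining exactly one element."""
    return [state ^ frozenset([e]) for e in ELEMENTS]


best_state, best_cost = greedy_search(frozenset(), toy_neighbors, toy_cost)
print(sorted(best_state), best_cost)
```

Because the loop stops at the first point where no single transformation helps, it can end in a local minimum; that is the trade-off of a greedy strategy against an exhaustive one.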

Example Search

Problem?
Given the way this algorithm is set up, can you find a major oversight? Remember what the relational mapping is built from:
- Sample XML data
- Sample XML queries

Problem…
The relative frequency of each query type is not taken into consideration. For example, what happens if 90% of the queries issued retrieve a review for a website, while that query is only 1 of the 25 queries in the workload? Query distribution is not uniform!

Problem...Solved…Kind Of... If we take into account the frequency of a query…

Related Work
- STORED: storing semistructured data
- SilkRoute: converting relational data to XML
- StatiX: XML Schema statistics framework

Conclusions
LegoDB is an effective way to take query cost into account when mapping XML documents to the relational model. Although LegoDB does an excellent job of transforming XML compared to static mappings, more work can be done on analyzing how query frequency affects the cost of the relational configuration.