LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 LegoDB: Cost-based XML to Relational “Shredding” Jerome Simeon Bell Labs – Lucent Technologies joint.

Presentation transcript:

LegoDB: Cost-based XML to Relational “Shredding”
Data Binding Workshop, Avaya Labs, June 2003
Jerome Simeon, Bell Labs – Lucent Technologies
Joint work with:
– Juliana Freire (Bell Labs, OGI)
– Jayant Haritsa (IISc, Bangalore)
– Maya Ramanath (IISc, Bangalore)
– Prasan Roy (Bell Labs, IIT Bombay)

XML Data Management
• Application is based on XML
  – XML documents to represent information
  – XML Schema to model information
  – XPath / XQuery to access and manipulate information
[Diagram labels: XML documents, XQuery engine, XML queries, XML results, RDBMS]
Example: find the title, year and box office proceeds for all 2001 movies
  for $v in document("imdbdata")/imdb/show
  where $v/year = 2001
  return ($v/title, $v/year, $v/box_office)
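To make the semantics of the example query concrete, here is a minimal sketch that evaluates the same selection in Python with the standard library's ElementTree instead of an XQuery engine. The sample document and the helper name `shows_in_year` are assumptions made for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the imdb/show documents in the query.
IMDB_XML = """
<imdb>
  <show><title>Movie A</title><year>2001</year><box_office>100</box_office></show>
  <show><title>Series B</title><year>2001</year><seasons>3</seasons></show>
  <show><title>Movie C</title><year>1999</year><box_office>50</box_office></show>
</imdb>
"""

def shows_in_year(xml_text, year):
    """Return (title, year, box_office) for every show of the given year."""
    root = ET.fromstring(xml_text)
    rows = []
    for show in root.findall("show"):          # /imdb/show
        if show.findtext("year") == str(year):  # where $v/year = 2001
            rows.append((
                show.findtext("title"),
                int(show.findtext("year")),
                show.findtext("box_office"),    # None when the element is absent
            ))
    return rows

result_2001 = shows_in_year(IMDB_XML, 2001)
```

A show without a box_office element (a TV series, for instance) still qualifies; XQuery would simply return nothing for that path, which the sketch represents as `None`.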

Why Store XML in an RDBMS?
• For some applications it makes sense to use a relational database backend:
  – Leverage many years of development of relational technology
  – Concurrency control / transaction support
  – Scalability
  – Safety (crash recovery, duplication)
  – Integrate with existing data stored in an RDBMS
  – Performance matters!
• But storing and querying XML data in an RDBMS is a non-trivial task.

XML and Relational Databases
• Mismatch between the relational model and XML.
• How to store XML data in relational tables?
  – Mapping (“shredding”) XML data into flat and regular tables
• How to evaluate XML queries over relational tables?
  – Mapping XQuery into SQL (or SQL/XML?)...
• The literature is filled with various mapping proposals:
  – Many variations over binary tables (Florescu et al., Grust et al.).
  – Shanmugasundaram et al. try to inline as many nested elements as possible in the same table.
• There are many alternative mappings!

Various mappings...

XML Schema:
  define group Show {
    element show {
      element title type string,
      element year type integer?,
      element review { element * type string* },
      (element box type string | element seasons type string)
    }
  }

(I) Inline as many elements as possible
  TABLE Show (show_id INT, title STRING, year INT, box_office INT, seasons INT)
  TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT)

(II) Partition the reviews table: one for NYT, one for the rest
  TABLE Show (show_id INT, title STRING, year INT, box_office INT, seasons INT)
  TABLE NYTReview (review_id INT, review STRING, parent_Show INT)
  TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT)

(III) Split the Show table into TV and Movies
  TABLE Show1 (show1_id INT, title STRING, year INT, box_office INT)
  TABLE Show2 (show2_id INT, title STRING, year INT, seasons INT)
  TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT)

...have various performances
• No given mapping is best
[Figure: performance of the mappings on three queries]
  – Q1: simple selection query
  – Q2: join involving a fragment of the reviews
  – Q3: publishing query

The LegoDB Storage "Shredding" Engine
• An optimization approach:
  – Automatically explores a space of possible mappings
  – Selects the mapping which has the lowest cost for a given application
• Important features:
  – Application-driven: takes into account schema, data statistics and query workload
  – Logical/physical independence: interface is XML-based (XML Schema, XQuery, XML data statistics)
  – Leverages existing technology: XML standards; XML-specific operations for generating the space of mappings; relational optimizer for evaluating configurations

Mapping an XML Schema into Relations

XML Schema:
  define group Show {
    element show {
      element title type string,
      element year type integer,
      group Reviews*,
      ...
    }
  }
  define group Reviews {
    element review type string
  }

Relational schema:
  TABLE Show_Table (Show_id INT, title STRING, year INT)
  TABLE Review_Table (Review_id INT, review STRING, parent_Show INT)
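As an illustration of what loading data under this mapping could look like, the following sketch shreds a small document into Show_Table and Review_Table using Python's sqlite3 and ElementTree. It is not LegoDB's loader: the sample document, the `shred` helper, and the use of SQLite types (TEXT, INTEGER) in place of the slide's STRING and INT are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document that conforms to the Show / Reviews schema.
DOC = """
<imdb>
  <show><title>Harry Potter</title><year>2001</year>
    <review>Potter casts a spell</review><review>Too long</review></show>
  <show><title>Quiet Film</title><year>1999</year></show>
</imdb>
"""

def shred(xml_text, conn):
    """Store each show as one row and each review as a row pointing at its show."""
    conn.execute("CREATE TABLE Show_Table "
                 "(Show_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE Review_Table "
                 "(Review_id INTEGER PRIMARY KEY, review TEXT, parent_Show INTEGER)")
    for show in ET.fromstring(xml_text).iter("show"):
        cur = conn.execute(
            "INSERT INTO Show_Table (title, year) VALUES (?, ?)",
            (show.findtext("title"), int(show.findtext("year"))),
        )
        show_id = cur.lastrowid  # key generated for the parent row
        for review in show.findall("review"):
            conn.execute(
                "INSERT INTO Review_Table (review, parent_Show) VALUES (?, ?)",
                (review.text, show_id),
            )

conn = sqlite3.connect(":memory:")
shred(DOC, conn)
show_count = conn.execute("SELECT COUNT(*) FROM Show_Table").fetchone()[0]
review_rows = conn.execute(
    "SELECT review, parent_Show FROM Review_Table ORDER BY Review_id"
).fetchall()
```

The repeated element (`group Reviews*`) is what forces the second table: a show can have any number of reviews, so each review row carries a `parent_Show` foreign key instead of being a column of the show.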

A word about mapping queries
XQuery:
  for $x in //show/review
  where contains($x, "Potter")
  return $x
SQL:
  SELECT review
  FROM Show_Table, Review_Table
  WHERE parent_Show = Show_id    -- join here!
    AND review LIKE '%Potter%'
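The translation above can be tried end to end with a small sketch. The helper `contains_query_sql` is hypothetical and handles only this one query shape (a parent/child path with a `contains` predicate); it assumes the table and column naming used on the slides, and it uses SQLite's `instr` function for a case-sensitive substring test. Table contents are made up.

```python
import sqlite3

def contains_query_sql(parent, child):
    """Build SQL for: for $x in //parent/child where contains($x, ?) return $x.

    Hypothetical helper: assumes tables named <Parent>_Table / <Child>_Table
    and a parent_<Parent> foreign key. Identifiers are interpolated directly,
    so they must come from a trusted schema, never from user input.
    """
    p, c = parent.capitalize(), child.capitalize()
    return (
        f"SELECT c.{child} FROM {p}_Table AS p "
        f"JOIN {c}_Table AS c ON c.parent_{p} = p.{p}_id "
        f"WHERE instr(c.{child}, ?) > 0"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Show_Table (Show_id INT, title TEXT, year INT);
    CREATE TABLE Review_Table (Review_id INT, review TEXT, parent_Show INT);
    INSERT INTO Show_Table VALUES (1, 'Harry Potter', 2001), (2, 'Quiet Film', 1999);
    INSERT INTO Review_Table VALUES
        (1, 'Potter casts a spell', 1), (2, 'Too long', 1), (3, 'Slow but lovely', 2);
""")

sql = contains_query_sql("show", "review")
potter_reviews = conn.execute(sql, ("Potter",)).fetchall()
```

The path step `//show/review` becomes the join between the two tables, which is why the choice of mapping directly affects how many joins a query needs.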

Transforming Schemas
• Key idea: a given document can be validated by different XML Schemas:
  – Different but equivalent regular expressions can be used to define an element
  – The presence or absence of a type name does not change the semantics of an XML Schema
• Applying transformations that manipulate the types (but preserve the element structure of the schema) leads to a space of distinct relational configurations
• Define XML Schema transformations that
  – Exploit the structure of the schema, and
  – Lead to useful relational configurations

Adding a group corresponds to splitting a table

Original schema:
  define group Show {
    element show {
      element title type string,
      element year type integer,
      element review type string*,
      ...

Relational schema:
  TABLE Show (Show_id INT, title STRING, year INT, review STRING)

Transformed schema:
  define group Show {
    element show {
      element title type string,
      element year type integer,
      group Reviews*,
      ...
    }
  }
  define group Reviews {
    element review type string
  }

Transformed relational schema:
  TABLE Show (Show_id INT, title STRING, year INT)
  TABLE Review (Review_id INT, review STRING, parent_Show INT)
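A rough sketch of the rule at work on this slide: every named group becomes a table with an id column, scalar elements become columns, and a group referenced from another group gets a foreign key to its parent. The dictionary encoding of the schema and the `to_tables` helper are assumptions made for illustration, and the group is named `Review` here (the slide names the group `Reviews` and the table `Review`).

```python
SCALAR_SQL = {"string": "STRING", "integer": "INT"}

def to_tables(groups):
    """Map {group: [(element, type)]} to TABLE declarations.

    A type is either a scalar name or ("group", name) for a group reference.
    Hypothetical encoding; real XML Schemas are far richer than this.
    """
    # Record which group references which, to emit parent_<X> foreign keys.
    parent_of = {}
    for name, fields in groups.items():
        for _, ftype in fields:
            if isinstance(ftype, tuple):
                parent_of[ftype[1]] = name

    ddl = []
    for name, fields in groups.items():
        cols = [f"{name}_id INT"]
        cols += [f"{el} {SCALAR_SQL[t]}" for el, t in fields
                 if not isinstance(t, tuple)]
        if name in parent_of:
            cols.append(f"parent_{parent_of[name]} INT")
        ddl.append(f"TABLE {name} ({', '.join(cols)})")
    return ddl

# Original schema: review is an inline element of show.
original = {"Show": [("title", "string"), ("year", "integer"),
                     ("review", "string")]}

# Transformed schema: review has been moved into its own named group.
transformed = {"Show": [("title", "string"), ("year", "integer"),
                        (None, ("group", "Review"))],
               "Review": [("review", "string")]}

original_ddl = to_tables(original)
transformed_ddl = to_tables(transformed)
```

Both schemas validate the same documents; only the presence of the type name differs, and that alone changes the relational configuration from one table to two.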

Regular expression rewritings

(group A | group B), group C  ==>  group D | group E
    define group D = (group A, group C)
    define group E = (group B, group C)
    Effect: horizontal table partition

group A+  ==>  group A, group A*
    Effect: put the first item in a table, put the rest in a different table

element * type string  ==>  element a type string | element (^a) type string
    Effect: extracts a certain element into a separate table
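The first rewriting (distributing a choice over a sequence) can be sketched in a few lines. The encoding below is an assumption made for illustration: a sequence is a list, and a choice between alternatives is a tuple of names. Each resulting plain sequence corresponds to one horizontal partition.

```python
from itertools import product

def distribute(seq):
    """Rewrite a sequence containing choices into a choice of plain sequences.

    seq: list whose items are a name (str) or a tuple of alternative names.
    (A | B), C  ==>  (A, C) | (B, C)
    """
    # Treat a plain name as a choice with a single alternative.
    alternatives = [item if isinstance(item, tuple) else (item,) for item in seq]
    # Every combination of alternatives yields one choice-free sequence.
    return [list(combo) for combo in product(*alternatives)]

# The rule exactly as stated in the rewriting.
partitions = distribute([("A", "B"), "C"])

# Applied to the show element: box_office for movies, seasons for TV shows.
show_partitions = distribute(["title", "year", ("box_office", "seasons")])
```

The second call yields one column list with `box_office` and one with `seasons`, which mirrors the Show1 / Show2 split of mapping (III).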

LegoDB: Storage Design
[Architecture diagram. Inputs: XML Schema, XML data statistics, XQuery workload. Components: Physical Schema Generation, Physical Schema Transformation, Query/Schema/Stats Translation, Traditional Relational Query Optimizer. Other labels: physical schema; relational schema, stats and workload; cost estimate; good configuration.]

Searching for a good configuration
• Cost is key: use a relational optimizer as a black box
  – Supports different cost models
  – Quality of the selected configuration depends on the accuracy of the optimizer!
• The set of possible configurations that result from applying the rewritings is very large, possibly infinite!
• How to search for the optimal solution?
  – LegoDB uses a greedy search
• Importance of statistics for cost evaluation
  – Large collections vs. small collections (e.g., many New York Times reviews or not?)
  – Selectivity of predicates
  – XML structure distribution (distribution of children / parent relation)
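The slides do not spell out the search procedure, so the following is only a generic greedy descent that matches the description: at each step, cost every configuration reachable by one rewriting, move to the cheapest one, and stop when nothing improves. The rewriting names and cost numbers are invented; in LegoDB the `cost` function would be a call to the relational optimizer on the translated schema, statistics and workload.

```python
def greedy_search(initial, neighbors, cost):
    """Follow the cheapest single rewriting until no rewriting lowers the cost."""
    current, current_cost = initial, cost(initial)
    while True:
        candidates = [(cost(c), c) for c in neighbors(current)]
        if not candidates:
            return current, current_cost
        best_cost, best = min(candidates, key=lambda pair: pair[0])
        if best_cost >= current_cost:   # local optimum reached
            return current, current_cost
        current, current_cost = best, best_cost

# Toy stand-in for the optimizer: a configuration is the set of rewritings
# applied so far, and each rewriting shifts the workload cost by a fixed amount.
# All names and numbers below are hypothetical.
COST_DELTA = {
    "outline_review": -30,
    "split_show_tv_movie": -10,
    "partition_nyt_reviews": +5,
}

def cost(config):
    return 100 + sum(COST_DELTA[r] for r in config)

def neighbors(config):
    # One more rewriting applied on top of the current configuration.
    return [config | {r} for r in COST_DELTA if r not in config]

best_config, best_cost = greedy_search(frozenset(), neighbors, cost)
```

Greedy search only guarantees a local optimum, which is the trade-off accepted in exchange for not enumerating a very large (possibly infinite) space of configurations.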

LegoDB: Runtime Support
[Architecture diagram. Labels: good configuration and mapping specification; XQuery workload; XML document; DB Loader; Query Translation; mapping specification; tuples; SQL query/results; Commercial RDBMS; XML result.]

Conclusion
• Cost-based approach for generating relational storage for XML
  – Takes application characteristics into account
    » schema + data statistics + queries
• Performance results show that the choice of storage can yield significant performance improvements
• The same is likely to be true for other indexing techniques:
  – Full-text index on XML (best for text queries?)
  – Native XML indexes (best for XPath queries without schema?)
  – Files! (Best when there are a lot of small documents and no need for concurrency control)
• There is no one best way to store XML... We should try to hide that from the user!

A few other things I've done...
• Mapping XQuery to SQL/XML
  – Work with Jonathan Robie
  – SQL/XML is an extension of SQL to XML
    » Standard mapping from table to XML document
    » Standard mapping from relational schema to XML Schema
    » Extension of the SQL query language to build XML elements
  – Goal is to evaluate XQueries on top of an ODBC driver supporting XQuery
  – Approach: identify a fragment of XQuery which has a direct syntactic mapping into SQL/XML
  – Surprise: the syntactic approach worked really well, because of the way XQuery is designed (i.e., FLWOR is close to an SQL statement).
• Galax: XQuery 1.0 implementation

My take on the Data Binding problem
• The main problem will be the mismatch between type systems: SQL vs. XML Schema vs. Object-Oriented
• I've never seen a good proposal that brings any two type systems together (then builds a language on top of it)
• Where I see the best chance of success:
  – Use XML as the data model (can represent pretty much anything).
  – XML Schema can represent relational schema (see the SQL/XML standard)
  – Can it represent Object-Oriented?