From XML Schema to Relations: A Cost-Based Approach to XML Storage. Presented by Xinwan Bian and Danyu Wu, 02-21-02.

Presentation transcript:

1 From XML Schema to Relations: A Cost-Based Approach to XML Storage Presented by Xinwan Bian and Danyu Wu

2 Introduction Where to store an XML document? XML database. Object-oriented database. Object-relational database. Relational database.

3 Difficulties of Storing XML Documents in a Relational Database XML has a more complex tree structure than flat relational tables. XML contains richer data types. XML data must be integrated with legacy tables.

4 Different Approaches to Schema Mapping Fixed XML-to-relational mappings. Commercial RDBMS utility tools. The Bell Laboratories cost-based approach.

5 LegoDB, an XML storage mapping system Three design principles. Cost-based search. Logical/physical independence. Reuse of existing technology

6 The Basic Approach of LegoDB Create a p-schema for the input XML Schema. Obtain cost estimates from data statistics and an XQuery workload. Explore alternative storage configurations and select the lowest-cost mapping found.

7 Architecture of the Mapping Engine
[Figure: architecture diagram. Inputs: XML Schema, XML data statistics, XQuery workload. Components: Generate Physical Schema (produces PS0), Physical Schema Transformation (produces PSi), Query/Schema Translation (produces RSi), Query Optimizer (returns cost(SQi)). Output: Optimal Configuration. Legend: RSi = relational schema/queries/stats; PSi = physical schema.]

8 Questions Its Advantages? Its Disadvantages?

9 Example of P-Schema Creation
(a) Initial XML schema:
  type Show = show [ @type[String], title[String], year[Integer], reviews[String]*, … ]
(b) P-schema:
  type Show = show [ @type[String], title[String], year[Integer], Reviews*, … ]
  type Reviews = reviews[String]
(c) Relational tables:
  TABLE Show ( show_id INT, type STRING, title STRING, year INT )
  TABLE Review ( Review_id, review String, parent_show INT )

10 What's a P-Schema? Physical schemas (p-schemas) extend XML Schemas in two significant ways: they contain data statistics, and they can be easily mapped into relational tables.

11 Example of P-Schema with statistics
  type Show = show [ @type[ String ], year[ Integer ], title[ String ], Review* ]
  type Review = review[ String ]
Figure labels: Scalar, String, *

12 Stratified Physical Types
  Scalar type    s  ::= Integer | String | Boolean
  Physical type  ps ::= ps
  Named type     nt ::= X (type name) | nt | nt (choice) | () (empty) | nt{n,m} (repetition)
  Optional type  ot ::= nt (named type) | s (optional scalar) | L[ot] (optional element) | ot, ot (optional sequence) | () (empty)
  Physical type  pt ::= nt (named type) | ot{0,1} (optional type) | s (scalar) | L[pt] (element) | pt, pt (sequence) | () (empty)
  Schema item    si ::= type X = pt (type declaration)
  Schema            ::= schema Sn = si, si, … end (schema)

13 Mapping of p-schema to relations
Create one relation R_T for each type name T.
For each R_T, create a key that stores the node id.
For each R_T, create a foreign key to all relations R_PT such that PT is a parent type of T.
Create a column in R_T for each sub-element of T that is a physical type.
If the data type is contained within an optional type, the corresponding column can be null.
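The mapping rules above can be sketched in Python. This is only an illustration under stated assumptions, not LegoDB's implementation: the dictionary encoding of a p-schema, the names `PSCHEMA`, `pschema_to_tables` and `to_ddl`, and the exact column naming are all hypothetical.

```python
# Hypothetical encoding of a p-schema (an assumption for this sketch):
# each type name maps to its scalar sub-elements, given as
# (column name, SQL type, is_optional), and to the type names it references.
PSCHEMA = {
    "Show": {
        "columns": [("type", "STRING", False),
                    ("title", "STRING", False),
                    ("year", "INT", False)],
        "children": ["Review"],
    },
    "Review": {
        "columns": [("review", "STRING", False)],
        "children": [],
    },
}


def pschema_to_tables(pschema):
    """Return {table name: [column definitions]}, one table per type name."""
    # The parent types of T are the types that reference T.
    parents = {name: [] for name in pschema}
    for name, spec in pschema.items():
        for child in spec["children"]:
            parents[child].append(name)

    tables = {}
    for name, spec in pschema.items():
        cols = [f"{name}_id INT PRIMARY KEY"]          # key storing the node id
        for col, sql_type, optional in spec["columns"]:
            null = "NULL" if optional else "NOT NULL"  # optional types may be null
            cols.append(f"{col} {sql_type} {null}")
        for parent in parents[name]:                   # one foreign key per parent type
            cols.append(f"parent_{parent} INT REFERENCES {parent}({parent}_id)")
        tables[name] = cols
    return tables


def to_ddl(tables):
    """Render the tables as CREATE TABLE statements."""
    return "\n".join(f"CREATE TABLE {t} ({', '.join(c)});" for t, c in tables.items())


print(to_ddl(pschema_to_tables(PSCHEMA)))
```

Each type name becomes one table, so changing which elements carry their own type name (see the inlining/outlining transformation) changes the number of tables and joins.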

14 More details of P_Schema to relational mappings

15 Schema Transformations Advantages of transformations at the XML Schema level: Much of the XML Schema semantics is not present in a given relational schema. Rewriting is more natural at the XML level. The framework is more easily extensible to other non-relational stores.

16 Inlining/Outlining Transformation One can either associate a type name with a given nested element (outlining) or nest its definition directly within its parent element (inlining).
Outlined:
  type TV = seasons[Integer], Description, Episode*
  type Description = description[String]
=> Inlined:
  type TV = seasons[Integer], description[String], Episode*
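A minimal sketch of the two rewritings, assuming a toy encoding that is not part of the slides: a schema is a dictionary from type name to a list of items, where an item is either `("elem", name, scalar)` or `("ref", type_name, repetition)`. The function names `inline` and `outline` are hypothetical.

```python
def inline(schema, parent, child):
    """Nest the definition of `child` directly inside `parent`.

    Assumption: `child` is referenced exactly once, without repetition,
    and only from `parent`, so its type name can be dropped afterwards.
    """
    new_body = []
    for item in schema[parent]:
        if item == ("ref", child, "1"):
            new_body.extend(schema[child])   # splice the child's definition in place
        else:
            new_body.append(item)
    result = {name: body for name, body in schema.items() if name != child}
    result[parent] = new_body
    return result


def outline(schema, parent, element, new_type):
    """Give the nested element `element` of `parent` its own type name."""
    moved = [i for i in schema[parent] if i[0] == "elem" and i[1] == element]
    new_body = [("ref", new_type, "1") if i in moved else i for i in schema[parent]]
    result = dict(schema)
    result[parent] = new_body
    result[new_type] = moved
    return result


# The TV type from the slide, inlined form. Episode's own definition is
# omitted here, as it is on the slide.
inlined = {
    "TV": [("elem", "seasons", "Integer"),
           ("elem", "description", "String"),
           ("ref", "Episode", "*")],
}
outlined = outline(inlined, "TV", "description", "Description")
print(outlined)
print(inline(outlined, "TV", "Description") == inlined)
```

In this encoding the two functions undo each other, which is what lets a search procedure move in either direction between configurations.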

17 Union Factorization/Distribution Transformation
The first law: (a,(b|c)) == (a,b | a,c)
Before:
  type Show = show [ @type[String], title[String], year[Integer], Aka{1,10}, Review*, (Movie | TV) ]
  type Movie = box_office[Integer], video_sales[Integer]
  type TV = seasons[Integer], description[String], Episode*
=> After:
  type Show = show [ (@type[String], title[String], year[Integer], Aka{1,10}, Review*, box_office[Integer], video_sales[Integer])
                   | (@type[String], title[String], year[Integer], Aka{1,10}, Review*, seasons[Integer], description[String], Episode*) ]
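The first law can be illustrated with a small Python sketch. It assumes a deliberately simple encoding that is not from the slides: a sequence is a list of strings and a union is a list of alternative sequences. The names `distribute` and `factorize` are hypothetical.

```python
def distribute(prefix, alternatives):
    """(a, (b | c))  ->  (a, b) | (a, c): copy the shared prefix into each branch."""
    return [prefix + alt for alt in alternatives]


def factorize(branches):
    """Inverse direction: pull the longest common prefix out of all branches."""
    prefix = []
    for items in zip(*branches):
        if all(item == items[0] for item in items):
            prefix.append(items[0])
        else:
            break
    return prefix, [branch[len(prefix):] for branch in branches]


common = ["title[String]", "year[Integer]", "Aka{1,10}", "Review*"]
movie = ["box_office[Integer]", "video_sales[Integer]"]
tv = ["seasons[Integer]", "description[String]", "Episode*"]

branches = distribute(common, [movie, tv])
for branch in branches:
    print(", ".join(branch))
```

Distribution trades one shared table plus per-variant tables for one wider table per variant, so queries that touch only one kind of show avoid a join.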

18 Corresponding relational configuration
Before (Description outlined):
  TABLE TV ( TV_id INT, seasons INT, parent_show )
  TABLE Description ( Description_id INT, description String, parent_TV )
=> After (description inlined):
  TABLE TV ( TV_id INT, seasons INT, description String, parent_Show )

19 Union Factorization/Distribution (continued)
The second law: a[t1|t2] == a[t1] | a[t2]
Before:
  type Show = show [ (@type[String], title[String], year[Integer], Aka{1,10}, Review*, box_office[Integer], video_sales[Integer])
                   | (@type[String], title[String], year[Integer], Aka{1,10}, Review*, seasons[Integer], description[String], Episode*) ]
=> After:
  type Show = (Show_Part1 | Show_Part2)
  type Show_Part1 = show [ @type[String], title[String], year[Integer], Aka{1,10}, Review*, box_office[Integer], video_sales[Integer] ]
  type Show_Part2 = show [ @type[String], title[String], year[Integer], Aka{1,10}, Review*, seasons[Integer], description[String], Episode* ]

20 Corresponding relational configurations
Before:
  TABLE Show ( Show_id INT, type String, title String, year INT )
  TABLE Movie ( Movie_id INT, box_office INT, video_sales INT, parent_show INT )
  TABLE TV ( TV_id INT, seasons INT, description String, parent_show INT )
=> After:
  TABLE Show_Part1 ( Show_Part1_id INT, type String, title String, year INT, box_office INT, video_sales INT )
  TABLE Show_Part2 ( Show_Part2_id INT, type String, title String, year INT, seasons INT, description String )

21 Wildcard rewritings
'~': any element name can be used. '~!a': any name but "a" can be used.
Before:
  type Reviews = review[ ~[String]* ]
=> After:
  type Reviews = review[ (NYTReview | OtherReview)* ]
  type NYTReview = nyt[String]
  type OtherReview = (~!nyt)[String]

22 XQuery Query Examples
Q1: FOR $v IN imdb/show WHERE $v/year = 1999 RETURN ($v/title, $v/year, $v/nyt_reviews)
Q2: FOR $v IN imdb/show RETURN $v
Q3: FOR $v IN imdb/show WHERE $v/title = c3 RETURN $v/description
Q4: FOR $v IN imdb/show RETURN { $v/title, $v/year, (FOR $e IN $v/episode WHERE $e/guest_director = c4 RETURN $e) }

23 XQuery Workload Examples
Publish = { Q1: 0.4, Q2: 0.4, Q3: 0.1, Q4: 0.1 }
Lookup = { Q1: 0.1, Q2: 0.1, Q3: 0.4, Q4: 0.4 }
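A minimal sketch of how such weights could be used, assuming the cost of a workload is the weighted sum of its per-query costs. The slides give the weights but not the formula, and the per-query costs below are made-up numbers for one hypothetical storage configuration; `workload_cost` is a hypothetical name.

```python
# Workload weights taken from the slide.
PUBLISH = {"Q1": 0.4, "Q2": 0.4, "Q3": 0.1, "Q4": 0.1}
LOOKUP = {"Q1": 0.1, "Q2": 0.1, "Q3": 0.4, "Q4": 0.4}


def workload_cost(workload, query_cost):
    """Weighted sum of per-query costs (an assumed cost model)."""
    return sum(weight * query_cost[q] for q, weight in workload.items())


# Made-up optimizer estimates for one hypothetical configuration.
query_cost = {"Q1": 10.0, "Q2": 100.0, "Q3": 5.0, "Q4": 20.0}

print("publish:", workload_cost(PUBLISH, query_cost))
print("lookup: ", workload_cost(LOOKUP, query_cost))
```

The same configuration gets a different score under each workload, which is why the chosen mapping depends on the workload and not only on the schema.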

24 Search Algorithm
Procedure GreedySearch
Input: xSchema: schema, xWkld: query workload, xStats: data statistics
Output: pSchema: an efficient physical schema
begin
  minCost = infinity
  pSchema = GetInitialPhysicalSchema(xSchema)
  cost = GetPSchemaCost(pSchema, xWkld, xStats)
  while (cost < minCost) do
    minCost = cost
    pSchemaList = ApplyTransformations(pSchema)
    for each pSchema' ∈ pSchemaList do
      cost' = GetPSchemaCost(pSchema', xWkld, xStats)
      if cost' < cost then
        cost = cost'; pSchema = pSchema'
      endif
    endfor
  endwhile
  return pSchema
end
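The procedure above translates almost line for line into Python. In this sketch the transformation generator and the cost function are passed in as callables, and the toy instance at the bottom (a schema encoded as the set of inlined elements, with made-up cost gains) is purely an assumption for illustration.

```python
import math


def greedy_search(initial_schema, apply_transformations, get_cost):
    """Greedy descent: keep the cheapest neighbour until nothing improves."""
    min_cost = math.inf
    schema = initial_schema
    cost = get_cost(schema)
    while cost < min_cost:
        min_cost = cost
        # Candidates are generated once per round from the current schema.
        for candidate in apply_transformations(schema):
            candidate_cost = get_cost(candidate)
            if candidate_cost < cost:
                cost, schema = candidate_cost, candidate
    return schema, cost


# Toy instance (hypothetical): a schema is the set of inlined elements,
# and each element has a made-up gain (positive = inlining helps).
GAIN = {"title": 3, "year": 2, "description": -1, "review": -4}


def inline_one_more(schema):
    """All schemas reachable by inlining one more element."""
    return [schema | {e} for e in sorted(GAIN) if e not in schema]


def toy_cost(schema):
    return 10 - sum(GAIN[e] for e in schema)


best, best_cost = greedy_search(frozenset(), inline_one_more, toy_cost)
print(sorted(best), best_cost)
```

Starting from the empty set corresponds to an all-outlined initial schema with inlining moves; starting from the full set with outlining moves would give the opposite search direction.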

25 Experimental Settings Two variations of the greedy search: greedy-so and greedy-si. Greedy-so: Initial physical schema: all elements outlined (except base types). During search: inlining transformations applied. Greedy-si: Initial physical schema: all elements inlined (except elements with multiple occurrences). During search: outlining transformations applied.

26 Efficiency of Greedy Search 5 lookup queries and 3 publish queries

27 Results For lookup queries: greedy-so converges to the final configuration much faster. For publish queries: the opposite holds, and greedy-si converges faster.

28 Reasons: The traversals made by lookup queries are localized. The final configuration has only a few inlined elements. Greedy-so can reach this configuration earlier than greedy-si. The publish queries traverse a larger number of elements. The final configuration has several inlined elements. Greedy-si can reach this configuration earlier than greedy-so.

29 Sensitivity of configurations to varied workloads Create a spectrum of workloads that combine the lookup queries and publish queries in the ratio k : (1 - k), where k ∈ [0,1] is the fraction of lookup queries in the particular workload. Three workloads corresponding to k = 0.25, 0.50, and 0.75 result in three configurations.
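One way to build such a spectrum is sketched below. It assumes the weight k is split evenly among the lookup queries and 1 - k evenly among the publish queries; the slides do not say how weight is distributed within each class, and the query names and the function `mixed_workload` are hypothetical.

```python
def mixed_workload(lookup_queries, publish_queries, k):
    """Combine two query classes in the ratio k : (1 - k).

    Assumption: weight is shared uniformly within each class.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    workload = {}
    for q in lookup_queries:
        workload[q] = k / len(lookup_queries)
    for q in publish_queries:
        workload[q] = (1.0 - k) / len(publish_queries)
    return workload


# Hypothetical names for the 5 lookup and 3 publish queries of the experiment.
lookup = ["L1", "L2", "L3", "L4", "L5"]
publish = ["P1", "P2", "P3"]

for k in (0.25, 0.50, 0.75):
    print(k, mixed_workload(lookup, publish, k))
```

Running the search once per value of k yields one configuration per workload, which can then be evaluated across the whole spectrum.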

30 Figure 11: Sensitivity to variations in the workload

31 Inlining is a bad idea for some queries (a) The query does limited, localized traversals and/or does not access all the attributes involved. (b) The query has highly selective selection predicates. (c) The query involves a join of attributes that are not structurally adjacent in the XML Schema (e.g. actor and director).

32 Effectiveness of XML transformations: Union Distribution

33 Results of the union-transformed configuration The curves for C[0.25] and C[0.75] overlap with OPT. C[0.25] and C[0.75] cross at a small angle. C[All-inlined] performed 2 to 5 times worse than optimal.

34 Wildcards Find the NYTimes reviews for shows produced in 1999:

35 Questions The optimal mapping in this paper is cost-based. What else needs to be considered?

36 References
P. Bohannon, J. Freire, P. Roy, and J. Siméon. From XML schema to relations: A cost-based approach to XML storage. Technical report, Bell Laboratories. Full version.
A. Deutsch, M. Fernandez, and D. Suciu. Storing semi-structured data with STORED. In Proc. of SIGMOD.
D. Florescu and D. Kossmann. A performance evaluation of alternative mapping schemes for storing XML in a relational database. Technical Report 3680, INRIA, 1999.
M. Klettke and H. Meyer. XML and object-relational database systems: enhancing structural mappings based on statistics. In Proc. of WebDB, pp. 63-68.
A. Schmidt, M. Kersten, M. Windhouwer, and F. Waas. Efficient relational storage and retrieval of XML documents. In Proc. of WebDB, pp. 47-52.
J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In Proc. of VLDB, 1999.