LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar.

LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar

Managing XML Data XML has become widely used for exchange of data over the Web XML is extensible and flexible: it can be used in applications with widely different requirements There is no one-size-fits-all solution for all applications What are procedures to store, query and publish XML data? Need adaptable and flexible solutions LegoDB is acomponent-based XML data management system The database-anywhere paradigm: portable and adaptable to any data and any environment

Motivation & challenge’s Motivation: reuse of well developed features concurrency control crash recovery query processors wide variety of XML applications Integrate with existing data stored in an RDBMS Challenges: mismatch between nested tree structure and flat tuples of the relational model Inducing the flexibility to handle the wide domain of application But storing and querying XML data in an RDBMS is a non-trivial task

Mapping XML schema to relations Question ? Can different XML schemas validate the exact same set of documents? Yes Different but equivalent regular expression can describe the contents of a given element. (a(b|c*)) ((a,b) |(a,c*)) Sub-elements of an element can be referred to directly, or can be referred to by a type name

Sample XML Dataset: Internet Movie Database Fugitive, The 1993 Roger Ebert Two thumbs up! This is a fun action movie, Harrison Ford at his best. The standard Hollywood summer movie strikes back. 183,752,965 X Files,The 1994 4 Fallen Angel Larry Shaw

DTD and XML Schema

Question ? Can you find more storage mapping relations? By performing a sequence of transformations (i.e. rewritings) which preserve the semantics of the schema.

Mapping XML Schema into tables Inline as many elements as possible Partition reviews table one for NYTimes, and one for rest Split show table Into TV and Movies

Querying XML Presence of schema for XML documents For applications to interpret data For issuing queries Find the title, year and box office proceeds For all 2001 movies For $v in document (“imbdata”)/imbd/show Where $v/year=2001 Return $v/title, $v/year, $v/box_office

XML and Relational Databases There is a mismatch between the relational model and that of XML Relational: Normalized, flat and fragmented XML: Un-normalized, nested and monolithic How to store XML data into relational tables? – Need to map the nested and irregular XML data into flat and regular tables How to evaluate XML queries over relational tables? – Need to map XQuery into SQL

Problem: Storing XML in RDMS Taken from Juliana Freire’s presentation

Queries

Mapping an XML Schema into Tables Different applications requires specific mappings for best performance Publish W1={Q1 :0.4, Q2=0.4, Q3=0.1, Q4=0.1} Lookup W2={Q1=0.1, Q2=0.1, Q3=0.4, Q4=0.4} Taken from Juliana Freire’s presentation

The LegoDB Storage Mapping Engine An optimization approach: automatically explores a space of possible mappings selects the mapping which has the lowest cost for a given application Basic Principles: Application-driven: takes into account schema, data statistics and query workload Logical/physical independence: interface is XML-based ( XML Schema, XQuery, XML data statistics) Leverage existing technology: XML standards; XML-specific operations for generating space of mappings; relational optimizer for evaluating configurations

LegoDB Create a p-schema for input XML schema Obtain cost estimates with input of data statistics and XQuery workload. Search space of alternative storage configurations to achieved an efficient mapping for a given application.

Architecture of the Mapping Engine Cost (SQi) RSi : Relational Schema/Queries/Stats PSi: Physical Schema

XML Schema to Relations How to transform a XML schema to a relation? P-Schema Type Show = show [ @type [String ], year [Integer ], title [String<#50,#34798.], Review* ] Type Review = review [ String ]

XML Schema to Relations For a type T and relation R R1- Create one relation R for each T. R2- Create a key for each T R3- Create a foreign key for all parents of T R4- Create columns for R for every physical type in T R5- Allow null values in R for every optional type in T

XML Schema to Relations Type Show = show [ @type [String ], year [Integer ], title [String<#50,#34798.], Review* ] Type Review = review [ String ] Show Show_ID Type Year Title Review Review_ID Review To_Show_Key (FK)

Types of XML Transformations Inlining/Outlining Union Factorization/Distribution Repetition Merge/Split Wildcard Rewritings

Inlining/Outlining Attributes can be “outlined” by removing them from a relation and using a foreign key to relate them to a table. Inlining is the exact opposite. Type TV = seasons[Integer], Description, Episode* Type Description = description[String] Type TV = seasons[Integer] description[String] Episode*

Union Factorization/Distribution ((a,(b|c)) == (a,b|a,c)) (a[t1|t2] == a[t1]|a[t2]) Type Show = show [@type [String], title[String], year[Integer], (Movie|TV) ] Type Movie = box_office[Integer], video_sales[Int], Type TV = seasons[Integer], description[String], Episode* Type Show = show[ (@type[String], title[String], year[Integer], box_office[Integer], video_sales[Integer]) |(@type[String], title[String], year[Integer], seasons[Integer], description[String], Episode*) ]

Repetition Merge/Split (a+ == a,a* == a,a*,a*) …etc Type Show = show [@type [String], title[String], year[Integer], Aka{1,*}] Type Show = show [@type [String], title[String], year[Integer], Aka, Aka{0,*} ]

Wildcard Rewritings We might want to access specific elements in a wildcard, such as NYTReview Type Review = review[~[String]*] Type Reviews = review[ (NYTReview | OtherReview)*] Type NYTReview = nyt[String] Type OtherReview = (~!nyt)[String]

Finding the best pSchema Use a Greedy Search Search until a “good” result is found 1. Get Initial/Current Schema 2. Get schema cost 3. Apply transformations to the schema 4. Select the best schema cost from the transformations 5. If the cost is better than the current schema, continue the search, mark this schema as the current schema. Otherwise stop searching.

Example Search

Problem? With the way that this algorithm is set up can you find a major oversight? Remember how the relational data is created: Sample XML Data Sample XML Queries

Problem… A problem can be that the relative number of each query type is not taken into consideration. For example, what will happen if 90% of queries are to gather a review for a website, while that is only 1 of 25 queries in the system. Query distribution is not uniform!

Problem...Solved…Kind Of... If we take into account the frequency of a query…

Related Work STORED- Storing Semistructured Data SilkRoute- Converting Relational Data to XML StatiX- XML Schema statistics framework

Conclusions LegoDB is an excellent way to take Cost of a query into account when transforming an XML document to the relational model Although LegoDB does an excellent job of transforming XML compared to static models, more work can be done on how to analyze how the frequency of queries affect the cost of the relational model.

LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar.

Similar presentations

Presentation on theme: "LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar.

Similar presentations

Presentation on theme: "LegoDB Customizing Relational Storage for XML Documents Timothy Sutherland Sachin Patidar."— Presentation transcript:

Similar presentations

About project

Feedback