Download presentation
Presentation is loading. Please wait.
Published byBryce Norman Modified over 9 years ago
2
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 LegoDB: Cost-based XML to Relational “Shredding” Jerome Simeon Bell Labs – Lucent Technologies joint work with: Juliana Freire (Bell Labs, OGI) Jayant Haritsa (IISc, Bangalore) Maya Ramanath (IISc, Bangalore) Prasan Roy (Bell Labs, ITT Bombay)
3
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 XML Data Management Application is based on XML – XML documents to represent information – XML Schema to model information – XPath / XQuery to access and manipulate information XML Documents Xquery Engine XML queries XML results Find the title, year and box office proceeds for all 2001 movies for $v in document(“imdbdata”)/imdb/show where $v/year=2001 return($v/title, $v/year, $v/box_office) RDBMS
4
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Why Storing XML in a RDBMS? For some applications it makes sense to use a relational database backend: – Leverage many years of development of relational technology – concurrency control/transaction support – Scalability – Safety (crash recovery, duplication) – Integrate with existing data stored in an RDBMS – Performance matters!! But storing and querying XML data in an RDBMS is a non-trivial task
5
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 XML and Relational Databases Mismatch between the relational model and XML. How to store XML data into relational tables? – Mapping (“Shredding”) XML data into flat and regular tables How to evaluate XML queries over relational tables? – Mapping XQuery into SQL (or SQL-XML?)... Litterature filled with various mapping proposals: – Many variations over binary tables (Florescu et al, Grust et al). – Shanmugasundaram et. All try to inline as much nested elements as possible in the same table. There are many alternative mappings!
6
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Various mappings... TABLE Show (show_id INT, title STRING, year INT, box_office INT, seasons INT) TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT) (I) Inline as many elements as possible TABLE Show (show_id INT, title STRING, year INT, box_office INT, seasons INT) TABLE NYTReview (review_id INT, review STRING, parent_Show INT) TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT) (II)Partition reviews table-one for NYT,one for the rest TABLE Show1 (show1_id INT, title STRING, year INT, box_office INT) TABLE Show2 (show2_id INT, title STRING, year INT, seasons INT) TABLE Review (review_id INT, tilde STRING, review STRING, parent_Show INT) (III)Split Show table into TV and Movies Define group Show { element show { element title type string, element year type integer?, element review { element * type string* }, (element box type string | element seasons type string) }
7
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003...have various performances No given mapping is best Q1: Simple selection query Q2: Join involving a fragment of the reviews Q3: publishing query
8
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 The LegoDB Storage "Shredding" Engine An optimization approach: – automatically explores a space of possible mappings – selects the mapping which has the lowest cost for a given application Important features: – Application-driven: takes into account schema, data statistics and query workload – Logical/physical independence: interface is XML-based (XML Schema, XQuery, XML data statistics) – Leverage existing technology: XML standards; XML-specific operations for generating space of mappings; relational optimizer for evaluating configurations
9
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Mapping an XML-Schema into Relations Define group Show { element show { element title type string, element year type integer, group Reviews*,... } define group Reviews { element review type string } XML Schema TABLE Show_Table( Show_id INT, title STRING, year INT ) TABLE Review_Table ( Review_id INT, review STRING, parent_Show INT ) Relational schema
10
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 A word about mapping queries XQuery for $x in //show/review where contains($x/review, “Potter”) SQL select review from Show_Table, Review_Table where Parent_show = Show_id // join here! and review contains “Potter”
11
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Transforming Schemas Key idea: A given document can be validated by different XML Schemas: – Different but equivalent regular expressions can be used to define an element – The presence or absence of a type name does not change the semantics of an XML Schema Applying transformations that manipulate the types (but preserve the element structure of schema) leads to a space of distinct relational configurations Define XML Schema transformations that – Exploit the structure of the schema, and – lead to useful relational configurations
12
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Adding a group corresponds to splitting a table define group Show { element show { element title type string, element year type integer, element review type string*,... Original schema Define group Show { element show { element title type string, element year type integer, group Reviews*,... } define group Reviews { element review type string } Transformed schema TABLE Show( Show_id INT, title STRING, year INT ) TABLE Review (Review_id INT, review STRING, parent_Show INT ) Transformed Relational schema TABLE Show( Show_id INT, title STRING, year INT, review STRING) Relational schema
13
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Regular expression rewritings (group A|group B),group C ==> group D|group E horizontal table partition define group D=(group A,group C) define group E=(group B,group C) group A+ ==> put first item in a table, put the group A, group A* rest in a different table element * type string extracts certain element ==> in separate table element a type string | element (^a) type string...........
14
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 LegoDB: Storage Design Physical Schema Transformation Query/Schema/ Stats Translation Traditional Relational Query Optimizer XML data statistics XML Schema Relational schema, stats and workload Good configuration Physical schema XQuery workload cost estimate Physical Schema Generation Physical schema
15
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Searching for a good configuration Cost is key: use a relational optimizer as a black box – Support different cost-models – Quality of selected configuration depends on the accuracy of the optimizer! Set of possible configurations that result from applying the rewritings is very large- possibly infinite! How to search for the optimal solution? – LegoDB use a greedy search Importance of statistics for cost evaluation – Large collections vs. small collections (e.g., many new york time review or not?) – Selectivity of predicates – XML structure distribution (distribution of children / parent relation)
16
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 LegoDB: Runtime Support good configuration and mapping specification XQuery workload DB Loader XML document Query Translation mapping specification XML result Commercial RDBMS tuplesSQL query/results
17
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 Conclusion Cost-based approach for generating relational storage for XML – takes application characteristics into account » schema + data 'statistics' + queries Performance shows that storage significant performance improvements The same is likely to be true for other indexing techniques: – Full text index on XML (best for text queries?) – Native XML indexes (best for Xpath queries without schema?) – Files!!! (Best when a lot of small documents an no need for concurrency control) There is no one best way to store XML... We should try to hide that from the user!
18
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 A few other things I've done... Mapping XQuery to SQL/XML – Work with Jonathan Robie – SQL/XML is an extension of SQL to XML » Standard mapping from table to XML document » Standard mapping from relational schema to XML Schema » Extension of SQL query language to builds XML elements – Goal is to evaluation XQueries on top of an ODBC driver supporting XQuery – Approach: identify a fragment of XQuery which has a direct syntactic mapping into SQL / XML – Surprise: the syntactic approach worked really well, because of the way XQuery is designed. (i.e., FLWOR close to SQL statement). Galax : XQuery 1.0 implementation
19
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 My take on the Data Binding problem The main problem will be about the mismatch between type systems: SQL vs. XML Schema vs. Object-Oriented I've never seen a good proposal that brings any two type systems together (then builds a language one top of it) Where I see the best chance of success: – Use XML has the data model (can represent pretty much anything). – XML Schema can represent relational schema (see SQL/XML standard) – Can it represent Object-Oriented?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.