Lecture A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy
Lecture 24 – Part 2: XML Query Processing
Phil Gibbons, April 15, 2003
XML Query Processing: Outline
XML vs. relational
XML on a relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
LegoDB, STORED, Edge (2 slides)
To be continued in my next lecture…
XML vs. SQL for Sensor Databases
IrisNet represents data in XML (a semi-structured model): hierarchical documents, queries in XPath.
TinyDB represents data in the relational model: tables, queries in SQL.
What are the pros and cons of each approach? How does the choice depend on the sensing context?
Why IrisNet Uses XML
Rich, heterogeneous data: hard to capture in a rigid data model; self-describing tags are useful.
Schema evolution: XML supports on-the-fly schema changes.
Wide-area sensing implies a hierarchical organization: a good match for XML, a bad one for the relational model.
Standard data exchange format.
Disadvantages of XML
Query languages are lacking some minimal features, e.g., aggregates and updates.
Query processors are not available for XQuery.
Query processing is SLOW.
Key research question: can we store XML in a relational DB and use a relational database system to process queries?
Why Use Relational DB Systems?
Highly reliable, scalable, optimized for performance, advanced functionality: the result of 30+ years of research and development.
XML database systems are not “industrial strength” … and not expected to be in the foreseeable future.
Existing data and applications: XML applications have to interoperate with existing relational data and applications, and there is not enough incentive to move all existing business applications to XML database systems.
Lessons from object-oriented database systems?
Adapted from slides ©Jayavel Shanmugasundaram
Storing and Querying XML Documents
[Figure: an XML translation layer on top of a relational database system. The layer maps the XML schema to a relational schema (recording translation information), XML documents to tuples, and XML queries to SQL queries, and turns the relational result back into an XML result.]
Relational Data
PurchaseOrder (Id, Customer, Day, Month, Year)
  200I | Cars R Us | 10 | June | 1999
  300I | Bikes R Us | null | July | 1999
Item (Pid, Name, Cost, Quantity)
  200I | Firestone Tire | … | …
  200I | Goodyear Tire | … | …
  300I | Trek Tire | … | …
  300I | Schwinn Tire | … | …
Payment installments (Pid, Percentage)
  200I | 40%
  200I | 60%
  300I | 100%
SQL Query
Find all the items bought by “Cars R Us” in the year 1999:
  SELECT it.name
  FROM PurchaseOrder po, Item it
  WHERE po.customer = 'Cars R Us'   -- selection predicates
    AND po.year = 1999
    AND po.id = it.pid              -- join
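The query can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: it loads only the PurchaseOrder and Item rows of the relational data above and leaves out the Cost and Quantity columns (and the Payment table), which the query does not need. Note the single quotes around 'Cars R Us': in standard SQL, double quotes delimit identifiers, not string literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column names, modeled on the PurchaseOrder and Item tables above.
conn.executescript("""
    CREATE TABLE PurchaseOrder (id TEXT PRIMARY KEY, customer TEXT,
                                day INTEGER, month TEXT, year INTEGER);
    CREATE TABLE Item (pid TEXT REFERENCES PurchaseOrder(id), name TEXT);
""")
conn.executemany("INSERT INTO PurchaseOrder VALUES (?, ?, ?, ?, ?)", [
    ("200I", "Cars R Us", 10, "June", 1999),
    ("300I", "Bikes R Us", None, "July", 1999),   # Day is NULL
])
conn.executemany("INSERT INTO Item VALUES (?, ?)", [
    ("200I", "Firestone Tire"), ("200I", "Goodyear Tire"),
    ("300I", "Trek Tire"), ("300I", "Schwinn Tire"),
])

QUERY = """
    SELECT it.name
    FROM PurchaseOrder po, Item it
    WHERE po.customer = 'Cars R Us' AND po.year = 1999  -- selection predicates
      AND po.id = it.pid                                -- join
"""
# Without ORDER BY, SQL row order is unspecified, so sort before printing.
names = sorted(row[0] for row in conn.execute(QUERY))
print(names)  # ['Firestone Tire', 'Goodyear Tire']
```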
XML Document
[Figure: the purchase-order data as an XML document, e.g., Day 10, Month June, Payments 40% and 60%.]
Features highlighted: nested structure, self-describing tags, nested sets, order.
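A hypothetical rendering of the same two purchase orders as XML, processed with Python's xml.etree.ElementTree, makes those features concrete. The element and attribute names (PurchaseOrder with id/customer attributes, Date/Day/Month/Year children, Item with a name attribute, Payment text) are assumptions modeled on the schema in these slides, and cost and quantity are omitted. The function walks the tree by hand rather than running a full XPath query, since ElementTree supports only a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for the example orders; cost/quantity omitted.
DOC = """<Orders>
  <PurchaseOrder id="200I" customer="Cars R Us">
    <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
    <Item name="Firestone Tire"/>
    <Item name="Goodyear Tire"/>
    <Payment>40%</Payment>
    <Payment>60%</Payment>
  </PurchaseOrder>
  <PurchaseOrder id="300I" customer="Bikes R Us">
    <Date><Month>July</Month><Year>1999</Year></Date>
    <Item name="Trek Tire"/>
    <Item name="Schwinn Tire"/>
    <Payment>100%</Payment>
  </PurchaseOrder>
</Orders>"""

def items_bought(root, customer, year):
    """Roughly PurchaseOrder[@customer=c][Date/Year=y]/Item/@name."""
    names = []
    for po in root.findall("PurchaseOrder"):
        if po.get("customer") != customer:
            continue
        year_el = po.find("Date/Year")
        if year_el is not None and year_el.text == str(year):
            # findall returns children in document order
            names.extend(item.get("name") for item in po.findall("Item"))
    return names

root = ET.fromstring(DOC)
print(items_bought(root, "Cars R Us", 1999))  # ['Firestone Tire', 'Goodyear Tire']
```

Document order survives here (items come back in the order they appear), and the missing Day of the second order is simply an absent element rather than a null column.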
XML Schema
PurchaseOrder -> Date (Item)* (Payment)*
Date -> Day? Month Year
Day -> {integer}
Month -> {string}
Year -> {integer}
Item -> Quantity … and so on
Schemas to Relations: Issues
Complex schema specifications.
The two-level nature of the relational schema (tuples and attributes) vs. the arbitrary nesting of an XML schema.
Recursion.
Naïve Approach
[Figure: the document as a tree of element and attribute nodes: PurchaseOrder with attribute nodes Id (200I) and Customer (Cars R Us) and element nodes Date (with Day 10, Month June, Year 1999), Item, Payment (40%), …]
Naïve Approach (Contd.)
Store each edge of the tree as a row of an Edges table.
Problem: queries need many joins (one per hop), e.g., PurchaseOrder/Date/Year.
Edges (Id, ParentId, Name, Type, Value, Ordinal)
  0 | null | PurchaseOrder | Element | null | …
  1 | 0 | Id | Attribute | 200I | 0
  2 | 0 | Customer | Attribute | Cars R Us | 1
  3 | 0 | Date | Element | null | 2
  4 | 3 | Day | Element | 10 | 0
  5 | 3 | Month | Element | June | 1
  6 | 3 | Year | Element | 1999 | 2
  … | … | … | … | … | …
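The cost of the edge representation shows up as soon as a path is translated to SQL. The sketch below, using Python's sqlite3, loads the first seven Edges rows from the slide (the root's ordinal is left NULL, as an assumption) and turns /PurchaseOrder/Date/Year into a query with one self-join per hop, two for a three-step path. `path_to_sql` is a hypothetical helper, not part of any cited system.

```python
import sqlite3

# Edges rows: (Id, ParentId, Name, Type, Value, Ordinal); root ordinal assumed NULL.
ROWS = [
    (0, None, "PurchaseOrder", "Element", None, None),
    (1, 0, "Id", "Attribute", "200I", 0),
    (2, 0, "Customer", "Attribute", "Cars R Us", 1),
    (3, 0, "Date", "Element", None, 2),
    (4, 3, "Day", "Element", "10", 0),
    (5, 3, "Month", "Element", "June", 1),
    (6, 3, "Year", "Element", "1999", 2),
]

def path_to_sql(path):
    """Translate /a/b/c into SQL over Edges: one table alias per step,
    one join condition (child.ParentId = parent.Id) per hop."""
    steps = path.strip("/").split("/")
    tables = [f"Edges e{i}" for i in range(len(steps))]
    conds = ["e0.ParentId IS NULL"]          # first step must be a root node
    for i in range(len(steps)):
        conds.append(f"e{i}.Name = ?")
        if i > 0:
            conds.append(f"e{i}.ParentId = e{i - 1}.Id")
    sql = (f"SELECT e{len(steps) - 1}.Value FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(conds)}")
    return sql, steps                        # placeholders follow step order

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edges (Id INTEGER PRIMARY KEY, ParentId INTEGER, "
             "Name TEXT, Type TEXT, Value TEXT, Ordinal INTEGER)")
conn.executemany("INSERT INTO Edges VALUES (?, ?, ?, ?, ?, ?)", ROWS)

sql, params = path_to_sql("/PurchaseOrder/Date/Year")
print(sql)
print(conn.execute(sql, params).fetchall())  # [('1999',)]
```

A path of length n needs n - 1 joins of Edges with itself, which is why deep paths get expensive in this representation.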
Desired Properties of the Generated Relational Schema R
All XML documents conforming to the XML schema should be “mappable” to tuples in R.
All queries over XML documents should be “mappable” to SQL queries over R.
Not required: the ability to regenerate the XML schema from R.
XML Schema: Further Examples
PurchaseOrder -> Date? (Item | Payment)*
PurchaseOrder -> (Date | Payment*) (Item (Item Item)* Payment)*
PurchaseOrder -> Date Item (PurchaseOrder)* Payment   (recursive)
Simplifying XML Schemas
XML schemas can be “simplified” for translation purposes without undermining storage and query functionality, e.g.:
PurchaseOrder -> Date? (Item)* (Payment)*
PurchaseOrder -> (Date | (Payment)*) (Item (Item Item)* Payment)*
Simplification Desiderata
Simplify the structure, but preserve the differences that matter in the relational model:
single occurrence (attribute);
zero or one occurrence (nullable attribute);
zero or more occurrences (relation).
Example: PurchaseOrder -> (Date | (Payment)*) (Item (Item Item)* Payment)* simplifies to PurchaseOrder -> Date? (Item)* (Payment)*
Simplification Rules
Flattening transformations: (e1 e2)* -> e1* e2*; (e1 e2)? -> e1? e2?; (e1 | e2) -> e1? e2?
Simplification transformations: e** -> e*; e*? -> e*; e?* -> e*; e?? -> e?
Grouping transformations: e1* e2* e1* -> e1* e2* …etc.
e+ -> e*
What is lost?
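The rules can be applied compositionally: each subexpression is reduced to a map from element name to '1', '?' or '*', which is the shape of the translation normal form. This is a minimal Python sketch under an assumed tuple encoding of content models; grouping is modeled as "any name that occurs twice becomes e*", which covers e1* e2* e1* -> e1* e2* and, as an assumption, the cases hidden behind "…etc".

```python
from typing import Dict, Tuple, Union

# Hypothetical content-model encoding:
#   "Date"                         an element name
#   ("seq", e1, e2, ...)           sequence  e1 e2 ...
#   ("alt", e1, e2, ...)           choice    e1 | e2 | ...
#   ("*", e), ("?", e), ("+", e)   repetition / option
Expr = Union[str, Tuple]

# Effect of wrapping a child already quantified as q in an outer ? or *.
OPT = {"1": "?", "?": "?", "*": "*"}    # e?, e?? -> e?, e*? -> e*
STAR = {"1": "*", "?": "*", "*": "*"}   # e*, e?* -> e*, e** -> e*

def merge(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Concatenate two normal forms; a repeated name becomes e* (grouping)."""
    out = dict(a)
    for name, q in b.items():
        out[name] = "*" if name in out else q
    return out

def simplify(e: Expr) -> Dict[str, str]:
    """Map a content model to {child: '1' | '?' | '*'}."""
    if isinstance(e, str):
        return {e: "1"}
    op, *args = e
    if op == "seq":
        result: Dict[str, str] = {}
        for arg in args:
            result = merge(result, simplify(arg))
        return result
    if op == "alt":                      # (e1 | e2) -> e1? e2?
        return {n: OPT[q] for n, q in simplify(("seq", *args)).items()}
    inner = simplify(args[0])
    if op == "?":                        # (e1 e2)? -> e1? e2?
        return {n: OPT[q] for n, q in inner.items()}
    if op in ("*", "+"):                 # e+ -> e*, (e1 e2)* -> e1* e2*
        return {n: STAR[q] for n, q in inner.items()}
    raise ValueError(f"unknown operator {op!r}")

# PurchaseOrder -> (Date | Payment*) (Item (Item Item)* Payment)*
complex_po = ("seq", ("alt", "Date", ("*", "Payment")),
              ("*", ("seq", "Item", ("*", ("seq", "Item", "Item")), "Payment")))
# PurchaseOrder -> Date? (Item | Payment)*
other_po = ("seq", ("?", "Date"), ("*", ("alt", "Item", "Payment")))
print(simplify(complex_po))  # {'Date': '?', 'Payment': '*', 'Item': '*'}
print(simplify(other_po))    # {'Date': '?', 'Item': '*', 'Payment': '*'}
```

Both productions collapse to Date? Item* Payment*. What is lost is the relative order of the children and exact occurrence constraints, such as each Payment being preceded by an odd number of Items.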
Result: Translation Normal Form
An XML schema production is either of the form
$P \rightarrow \{\mathit{type}\}$
or of the form
$P \rightarrow a_1 \ldots a_p \; a_{p+1}? \ldots a_q? \; a_{q+1}^{*} \ldots a_r^{*}$, where $a_i \neq a_j$ for $i \neq j$.
Simplified XML Schema
PurchaseOrder -> Date (Item)* (Payment)*
Date -> Day? Month Year
Day -> {integer}
Month -> {string}
Year -> {integer}
Item -> Quantity … and so on
Relational Schema Generation
[Figure: simplified schema graph. PurchaseOrder (attributes id, customer) has children Date (1), Item (*), and Payment (*); Date has children Day (?), Month (1), and Year (1); Item (attributes name, cost) has child Quantity (1).]
Minimize: the number of joins for simple path expressions (of the form /a/b/c).
Satisfy: tables are normalized.
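A sketch of the inlining step for tree-shaped schemas: children that occur once or optionally ('1' or '?') are folded into the parent's table as columns, and each set-valued ('*') child gets its own table linked back to its parent. The dictionary encoding and the parentId/ordinal column names are assumptions for illustration; the resulting tables line up with the generated relational schema on the next slide.

```python
# Hypothetical encoding of the simplified PurchaseOrder schema:
# element -> (attribute names, [(child element, quantifier '1' | '?' | '*')])
SCHEMA = {
    "PurchaseOrder": (["id", "customer"],
                      [("Date", "1"), ("Item", "*"), ("Payment", "*")]),
    "Date": ([], [("Day", "?"), ("Month", "1"), ("Year", "1")]),
    "Item": (["name", "cost"], [("Quantity", "1")]),
    "Day": ([], []), "Month": ([], []), "Year": ([], []),
    "Quantity": ([], []), "Payment": ([], []),
}

def generate(schema, root):
    """Return {table: [columns]}; a trailing '?' marks a nullable column.

    Non-root tables get parentId/ordinal columns pointing at (and ordering
    within) the parent row. Assumes the schema graph is a tree.
    """
    tables = {}

    def add_table(elem, is_root):
        tables[elem] = [] if is_root else ["parentId", "ordinal"]
        attrs, children = schema[elem]
        tables[elem] += attrs
        if not attrs and not children:      # text-only element, e.g. Payment
            tables[elem].append("value")
        inline(elem, elem, nullable=False)

    def inline(table, elem, nullable):
        for child, q in schema[elem][1]:
            if q == "*":                    # set-valued child: separate relation
                add_table(child, is_root=False)
                continue
            opt = nullable or q == "?"      # anything under a '?' is nullable
            mark = "?" if opt else ""
            attrs, grandchildren = schema[child]
            tables[table] += [a + mark for a in attrs]
            if not attrs and not grandchildren:  # text leaf becomes one column
                tables[table].append(child + mark)
            inline(table, child, opt)

    add_table(root, is_root=True)
    return tables

for name, cols in generate(SCHEMA, "PurchaseOrder").items():
    print(name, cols)
```

With this mapping the path /PurchaseOrder/Date/Year is just a column of PurchaseOrder and needs no join. The recursion in `inline` assumes a tree: a shared element would be duplicated into each parent's table and a recursive one would recurse without end, which is what the shared inlining technique addresses.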
Generated Relational Schema and Shredded XML Document
PurchaseOrder (Id, Customer, Day, Month, Year)
  200I | Cars R Us | 10 | June | 1999
Payment (Pid, Order, Value)
  200I | … | 40%
  200I | … | 60%
Item (Pid, Order, Name, Cost, Quantity)
  200I | … | Firestone Tire | … | …
  200I | … | Goodyear Tire | … | …
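Shredding a document into those tables is a single walk over the tree. The sketch below uses xml.etree.ElementTree; the XML shape is hypothetical, ordinal is assumed to be the child's position among all children of PurchaseOrder, and Cost/Quantity are left out because their values are not given.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for order 200I.
DOC = """<PurchaseOrder id="200I" customer="Cars R Us">
  <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
  <Item name="Firestone Tire"/>
  <Item name="Goodyear Tire"/>
  <Payment>40%</Payment>
  <Payment>60%</Payment>
</PurchaseOrder>"""

def shred(xml_text):
    """Split one PurchaseOrder into rows for PurchaseOrder, Item, Payment."""
    po = ET.fromstring(xml_text)
    pid = po.get("id")
    tables = {
        # findtext returns None for an absent optional element (-> NULL)
        "PurchaseOrder": [(pid, po.get("customer"), po.findtext("Date/Day"),
                           po.findtext("Date/Month"), po.findtext("Date/Year"))],
        "Item": [],
        "Payment": [],
    }
    for ordinal, child in enumerate(po):   # direct children, document order
        if child.tag == "Item":
            tables["Item"].append((pid, ordinal, child.get("name")))
        elif child.tag == "Payment":
            tables["Payment"].append((pid, ordinal, child.text))
    return tables

for table, rows in shred(DOC).items():
    print(table, rows)
```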
Example Schema Graph
[Figure: an example schema graph that is not just a tree.]
Shared Inlining Technique
Thus far, the approach works well for trees only.
Intuition: inline as many sub-elements as possible; do not inline a sub-element only if it is shared, recursive, or set-valued.
Technique: necessary and sufficient condition for a shared/recursive element: in-degree >= 2 in the (simplified) schema graph.
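The in-degree test takes a few lines over the edge list of the simplified schema graph. The graph here is hypothetical: Date is made shared (used by both PurchaseOrder and Payment) and Item recursive, to exercise both cases. Elements outside the returned set would be inlined into their parent's table.

```python
from collections import Counter

# Hypothetical simplified schema graph: (parent, child, quantifier) edges.
EDGES = [
    ("PurchaseOrder", "Date", "1"),
    ("PurchaseOrder", "Item", "*"),
    ("PurchaseOrder", "Payment", "*"),
    ("Payment", "Date", "1"),      # Date shared by two parents
    ("Item", "Item", "*"),         # Item recursive (sub-items)
    ("Item", "Quantity", "1"),
    ("Date", "Year", "1"),
]

def own_table_elements(edges, root):
    """Elements that shared inlining gives their own relation."""
    in_degree = Counter(child for _, child, _ in edges)
    result = {root}
    for _, child, q in edges:
        # set-valued, or shared/recursive (in-degree >= 2)
        if q == "*" or in_degree[child] >= 2:
            result.add(child)
    return result

print(sorted(own_table_elements(EDGES, "PurchaseOrder")))
# ['Date', 'Item', 'Payment', 'PurchaseOrder']
```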
Relational Schema Generation and XML Document Shredding
Any XML schema X can be mapped to a relational schema R, and any XML document XD conforming to X can be converted to tuples in R.
Further, XD can be recovered from the tuples in R.
What do you think of the approach for IrisNet?
Exercise: what would the Parking Space Finder relational schema look like? Would queries need many or few joins?
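Recovery works because the tuples keep parent ids and sibling ordinals. A minimal sketch of the reverse mapping for the PurchaseOrder tables, using hard-coded hypothetical tuples (deliberately stored out of order) and xml.etree.ElementTree for output; it restores child order and omits absent optional elements such as a null Day.

```python
import xml.etree.ElementTree as ET

# Hypothetical shredded tuples for order 200I (ordinal = position in parent).
PURCHASE_ORDER = [("200I", "Cars R Us", "10", "June", "1999")]
ITEM = [("200I", 2, "Goodyear Tire"), ("200I", 1, "Firestone Tire")]
PAYMENT = [("200I", 4, "60%"), ("200I", 3, "40%")]

def unshred(pid):
    """Rebuild one PurchaseOrder element from its tuples."""
    _, customer, day, month, year = next(r for r in PURCHASE_ORDER if r[0] == pid)
    po = ET.Element("PurchaseOrder", id=pid, customer=customer)
    date = ET.SubElement(po, "Date")
    for tag, value in (("Day", day), ("Month", month), ("Year", year)):
        if value is not None:                # Day? maps to a nullable column
            ET.SubElement(date, tag).text = value
    # Merge set-valued children and sort by ordinal to restore document order.
    children = [(o, "Item", n) for p, o, n in ITEM if p == pid]
    children += [(o, "Payment", v) for p, o, v in PAYMENT if p == pid]
    for _, tag, value in sorted(children):
        if tag == "Item":
            ET.SubElement(po, "Item", name=value)
        else:
            ET.SubElement(po, "Payment").text = value
    return ET.tostring(po, encoding="unicode")

print(unshred("200I"))
```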
[Figure: Path Expression with Length 3]
[Figure: Varying Path Expression Length; panels: Group 1 DTD, Group 3 DTD]
XPERANTO
[Figure: XPERANTO architecture over a relational database system (Table 1 … Table n). “Create XML document repository” goes through the relational schema generator, which creates tables and records relational schema information; “Store XML documents” goes through the XML document shredder, which stores rows in tables; “Query over stored XML documents” goes to the query processor for XML views of relational data, which queries the tables through an XML view over the tables that reconstructs the shredded XML documents.]
LegoDB [Bohannon et al., ICDE’02]
An optimization approach: it automatically explores a space of possible mappings and selects the mapping with the lowest cost for a given application.
Important features:
Application-driven: takes into account the schema, data statistics, and query workload.
Logical/physical independence: the interface is XML-based (XML Schema, XQuery, XML data statistics).
Leverages existing technology: XML standards; XML-specific operations for generating the space of mappings; a relational optimizer for evaluating configurations.
Adapted from slides ©Juliana Freire
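LegoDB itself prices each candidate mapping with a relational optimizer over the application's statistics and workload. The toy below only illustrates why the best mapping is application-driven: it swaps in a made-up cost (columns scanned plus a fixed penalty per join), and the widths, join penalty, and mapping names are all hypothetical.

```python
# Hypothetical per-element column counts and a made-up join penalty.
WIDTH = {"PurchaseOrder": 2, "Date": 3}
JOIN_COST = 4

# Two candidate mappings: element -> name of the table that stores it.
MAPPINGS = {
    "inline Date": {"PurchaseOrder": "PO", "Date": "PO"},
    "outline Date": {"PurchaseOrder": "PO", "Date": "DateTable"},
}

def query_cost(mapping, elements):
    """Toy cost: width of every table touched plus a penalty per join."""
    tables = {mapping[e] for e in elements}
    scanned = sum(WIDTH[e] for e, t in mapping.items() if t in tables)
    return scanned + JOIN_COST * (len(tables) - 1)

def best_mapping(workload):
    """workload: list of (frequency, elements touched by the query)."""
    def total(name):
        return sum(f * query_cost(MAPPINGS[name], elems) for f, elems in workload)
    return min(MAPPINGS, key=total)

print(best_mapping([(1.0, ["PurchaseOrder", "Date"])]))  # inline Date
print(best_mapping([(1.0, ["PurchaseOrder"])]))          # outline Date
```

A workload that always reads dates favors inlining (no join), while one that never touches Date favors moving it out of the way; a real system makes this trade-off with real cost estimates.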
But What If There’s No Schema?
Revert to one row per edge (the Edge table of the naïve approach) [Florescu, Kossmann, Data Engineering Bulletin, 1999].
STORED [Deutsch, Fernandez, Suciu, SIGMOD’99]: looks at the data and finds highly supported patterns to use as tables.
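A rough illustration of the "highly supported patterns" idea: count, for each root-to-leaf label path, the fraction of documents that contain it, and keep the paths above a support threshold as candidate table columns, leaving the rest to an overflow store such as the edge table. This is a simplified stand-in, not the STORED mining algorithm, and the sensor-reading documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOCS = [  # hypothetical schema-less sensor readings
    "<reading><sensor>s1</sensor><temp>21</temp></reading>",
    "<reading><sensor>s2</sensor><temp>19</temp><humidity>40</humidity></reading>",
    "<reading><sensor>s3</sensor><temp>22</temp></reading>",
]

def leaf_paths(elem, prefix=""):
    """Yield label paths such as /reading/temp for every leaf element."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)

def supported_paths(docs, min_support):
    """Paths present in at least a min_support fraction of the documents."""
    counts = Counter()
    for d in docs:
        counts.update(set(leaf_paths(ET.fromstring(d))))  # once per document
    return sorted(p for p, c in counts.items() if c / len(docs) >= min_support)

print(supported_paths(DOCS, 0.5))  # ['/reading/sensor', '/reading/temp']
```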
XML Query Processing: Outline
XML vs. relational
XML on a relational DB: Shanmugasundaram et al., “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow-on papers
LegoDB, STORED, Edge (2 slides)
To be continued… (Thurs): updates, native XML DBMS
Also in the next lecture: historical queries