Download presentation
Presentation is loading. Please wait.
1
Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001
2
Outline Introduction An XML Primer Several related systems and results An XML middle-ware system: SilkRoute XML and Spreadsheets XML and OLAP
3
Introduction XML -- Basic ideas are simple but potential impact is significant Easy to read Simple and flexible Easy to extract useful information Research opportunities XML brings to Database Management XML will “turn the Web into a database” Thus general Database Management issues arise for XML
4
Introduction Why using XML to view Relational Data XML is emerging as the standard data-exchange format between applications on the Web While most existing data is stored in relational databases This scenario is common This scenario is challenging Relational data is flat, normalized, its schema is often proprietary XML data is nested, unnormalized, its schema is public So, the mapping is inherently complex and maybe difficult to compute efficiently
5
An XML(eXtensible Markup Language) Primer Example XML file (From Apache Tomcat 4.0 configuration file): 1. 2. 3. 4. 5. 6. 7. 8. 9. mail.smtp.host 10. localhost 11. 12. 13. 14. 15. 16. 17.
6
An XML Primer Example DTD(Document Type Definitions) 1. 2. 3. 4. 5. 6. <!ATTLIST Connector className #REQUIRED 7. port #REQUIRED> 8. 9. 10. 11.
7
An XML Primer XML is a method for putting structured data in a text file XML looks a bit like HTML but isn't HTML XML is a family of technologies http://www.w3c.org XML is new, but not that new XML is license-free, platform-independent and well- supported
8
Several Related Systems and Results An XML middle-ware system in AT&T research labs: SilkRoute Automate the conversion of realtional data into XML A new paper published to optimize the query processing algorithm IBM research center: Efficiently Publishing Relational Data as XML Documents Language specification is based on SQL with minor extension So standard APIs like ODBC can be used Query performance is worse than the revised SilkRoute
9
Several Related Systems and Results UCSD: MIX--Mediation of Information using XML DTD inference Concentrate on Information Integration A more complicated architecture University of Wisconsin-Madison: Relational Databases for Querying XML Documents Objective is different, but part of techniques is related Limitations and opportunities: some valuable points
10
SilkRoute Introduction Public DTDs: Numerious industries are working on it http://www.oasis-open.org/cover Construct XML views which conform to the public DTDs from vast stores of relational data automatically The system is general, dynamic, and efficient
11
Motivating Example A simple example from electronic commerce: Suppliers provide product information to resellers For mutual benefit, they have agreed on a particular DTD Supplier's business data is organized according to a relational schema Supplier: convert its relational data into an XML view conforms to the DTD and make the XML view available to resellers Assume supplier wants to export a subset of its inventory (e.g. only its winter-outerwear stock) Resellers: access that data by formulating queries over the XML view Reseller is typically only interested in a small subset of the info (e.g. sale price less than half of the retail price) Relational schemas differ from supplier to supplier
12
Architecture SilkRoute’s Architecture
13
The View Query: RXL Full power on both sides Joins, selection conditions, aggregates, and nested queries Generate XML data with arbitrary levels of nesting RXL has three powerful features which make it possible to create arbitrary complex XML structures Nested queries, Skolem functions, and Block structure
14
1. construct 2. 3. ”Acme Clothing” 4. { 5. from Clothing $c 6. where $c.category = “outerwear” 7. construct 8. 9. $c.item 10. $c.category 11. $c.price 12. {from SalePrice $s 13. where $s.pid = $c.pid 14. construct 15. $s.price 16. } 17. {from Problems $p 18. where $p.pid = $c.pid 19. construct 20. 21. $p.comments 22. 23. } 24. 25. } 26. RXL (Relational to XML transformation Language) view query ( V )
15
The User Query: XML-QL 1. Construct 2. 3. { where 4. $company 5. 6. $name 7. $retail 8. $sale 9. 10. in http://acme.com/products.xml,http://acme.com/products.xml 11. $sale < 0.5 * $retail 12. construct 13. 14. $company 15. $name 16. 17. } 18. XML-QL user query (U)
16
The Query Composer The composed RXL query is equivalent to the user query evaluated on the materialized view Composed query often contain constraints on scalar values that can be evaluated using indexes in the relational database
17
The Query Composer 1. Construct 2. from Clothing $c, SalePrice $s 3. { from Clothing $c, SalePrice $s 4. where $c.category = “outerwear”, 5. $c.pid = $s.pid, $s.price < 0.5 * $c.retail 6. $s.price < 0.5 * $c.retail 7. construct 8. 9. ”Acme Clothing” 10. $c.item 11. } 12. Composed RXL query ( C )
18
Composition Algorithm Problem Statement: * C = UοV * XD = V ( RDB ) * A = U ( XD ) = U ( V ( RDB ) ) * C ( RDB ) = A = U ( V ( RDB ) ) = ( UοV ) ( RDB )
19
Composition Algorithm Key Idea: Match U's pattern on V directly, without constructing XD First step: match U's pattern with V's template (Next slide shows V again with U's patterns matched in it highlighted) Second step: Construct C C's construct clause is the same as U's construct clause, with variable substitutions C's from and where clauses consist of all the "relevant" from and where in V and all the where filters in U, with variable renaming
20
1. construct 2. 2. ”Acme Clothing” 3. ”Acme Clothing” 4. { 5. from Clothing $c 6. where $c.category = “outerwear” 7. construct 8. 9. $c.item 9. $c.item 10. $c.category $c.price 11. $c.price 12. {from SalePrice $s 13. where $s.pid = $c.pid 14. construct $s.price 15. $s.price 16. } 17. {from Problems $p 18. where $p.pid = $c.pid 19. construct 20. 21. $p.comments 22. 23. } 24. 25. } 26. RED RXL view query ( V ) with patterns from XML-QL query in RED
21
Composition Algorithm Diagram of Query Composition
22
Translator and XML Generator The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template Initial SilkRoute uses full partition strategy The IBM research paper: sorted, outer union strategy The 2001 SIGMOD paper gives an optimal algorithm XML generator merges the result tuples into XML document in a single pass
23
Other Scenarios Minor changes to the information flow permit other scenarios Export the entire database as one, large XML document by materializing the view query The result of query composition can be kept virtual for later composition with other user queries
24
Alternative Approaches Materialized XML view Precompute or compute on demand Feasible when the XML view is small and the applicaton needs to load the entire view in memory Data may become stale Use a native XML database engine Stanford DB group: Lore Project One can materialize an XML view using SilkRoute and store the result in an XML engine Avoid the cost of query composition Performance is unlikely to compete with SQL engine anytime soon Can't guarantee data freshness and incur a high space cost
25
XML and Spreadsheets XML support in Microsoft Excel 2002 for Office XP "..these new features mean that Microsoft Excel is set to play an important role in any organization's application environment." Bi-direction transformation Excel can recognize and open XML documents including XSL processing XML flattening Any Excel Spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"
26
XML and Spreadsheets Well, Microsoft enable Spreadsheets to manipulate XML documents We can enable Spreadsheets to manipulate relational data using XML view!
27
XML and Spreadsheets Can we execute Spreadsheet style processing directly on XML files ? XML is hierarchical, and it's unnormalized, which is exactly what people would like to see in Spreadsheets Given system like SilkRoute Can we define Spreadsheet function map on XML view? Think the function executions as XML view queries Define one function on the other is similar as define a new XML view from an existing composed view It will also arise challenge in the view generation systems
28
XML and OLAP (OnLine Analytical Processing) Physically integrating unexpected data into OLAP systems is time-consuming Logical integration is the better choice XML’s increasing use in data-exchange suggests that the required data can be available through XML views Possibilities: Reference external XML data in OLAP queries XML data can be presented along with dimensional data in the result of an OLAP query Use XML data for selection and grouping Microsoft and Hyperion published "Open XML for Analysis Specification" in April 2001
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.