Universal Database Systems


1 Universal Database Systems
Part 4: Databases and XML

2 Overview
Introduction to XML
DTDs and Schemas for XML Documents
Languages for XML, in particular XSL
Querying and Storing XML
Summary and Outlook
UDBS Part 4 - Winter 2001/2

3 What Else Makes XML Relevant to Databases?
Document storage, retrieval, and indexing
Views and data warehouses
Transformations between relational data representations and XML
Mediators, data integration
Schema aspects, integrity constraints, design, reverse engineering

4 Big Picture
[Figure: applications; relational, object-relational, and legacy data; XML Data; WEB (HTTP); Integrate, Transform, Warehouse]
Enterprise systems: 3-tier, strongly typed
  Distributed Object Technology (DCOM, CORBA)
Web applications: multi-tier, loosely typed
  Simplicity wins: exploit XML's common illusion
Access data from
  multiple sources,
  on varied platforms,
  across many enterprises
Demand standards
Portals demanding query support from providers

5 What Do We Look At?
Storage of XML documents
Publication of XML documents
Sample vendor approaches

6 Storage vs Publication
Storage
  Where and how to store native XML documents
  Here: Storage in relational DBs
Publication
  Wrap arbitrary content in XML documents
  Here: Publication of relational content

7 Storing XML Data
Scenario:
  receive a large XML data instance
  want to store it, manage it, query it
Solutions:
  build an XML management system from scratch (e.g., Tamino)
  preferably: use existing database systems
The Storage Problem: map XML data into relational form

8 Approaches
Table-oriented:
  As a column value
  Across multiple tables
As a BLOB or CLOB
  With appropriate (object) functionality, e.g., Extender, Data Blade

9 Options
XML document as a text file (or CLOB)
Using ternary relations
Using a DTD or XML Schema for deriving a database schema
Alternatively: derivation of a schema through mining from the data

10 Text File/CLOB
Advantages
  simple
  less space than you think
  reasonable clustering
Disadvantages
  no updates
  needs specific query processor

11 Ternary Relation (edge-oriented)
[Figure: graph with edge "paper" from &o1 to &o2; from &o2, edges "title" to &o3, "author" to &o4, "author" to &o5, "year" to &o6; leaf values "The Calculus", "…", "…", "1986"]

Ref
Source  Label   Dest
&o1     paper   &o2
&o2     title   &o3
&o2     author  &o4
&o2     author  &o5
&o2     year    &o6

Val
Node  Value
&o3   The Calculus
&o4   …
&o5   …
&o6   1986

Order, attributes, comments?
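As an illustration of the edge-oriented mapping, here is a minimal Python sketch that shreds a small document into Ref and Val rows. The function name shred_to_edges, the &oN numbering scheme, and the author values "A" and "B" (the slide elides them) are assumptions made for this example. Like the slide's tables, it ignores attributes and comments, and it keeps sibling order only implicitly through the numbering.

```python
import xml.etree.ElementTree as ET

def shred_to_edges(xml_text):
    """Return (ref, val): ref rows are (source, label, dest), val rows are (node, value)."""
    root = ET.fromstring(xml_text)
    ref, val = [], []
    counter = 1  # &o1 is a virtual root that points at the document element

    def visit(elem, parent_oid):
        nonlocal counter
        counter += 1
        oid = f"&o{counter}"
        ref.append((parent_oid, elem.tag, oid))   # one edge per element
        children = list(elem)
        if children:
            for child in children:
                visit(child, oid)
        else:
            val.append((oid, (elem.text or "").strip()))  # leaves carry the values

    visit(root, "&o1")
    return ref, val

# Hypothetical sample document; the author names are placeholders.
doc = ("<paper><title>The Calculus</title><author>A</author>"
       "<author>B</author><year>1986</year></paper>")
ref, val = shred_to_edges(doc)
for row in ref:
    print(row)
```

Both lists can be bulk-inserted into two relational tables as they are. Rebuilding the document then needs one self-join of Ref per nesting level.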

12 Other Relational Representations
More detailed edge representation, e.g., (Source, Dest, Name, Type, Ordinal) with additional tables for all types
Universal relation for storing edges (as full outer join of all ternary relations as above)
Separate value tables per type
Inline representation of the various types, e.g., (Source, Dest, Name, Int, String, Ordinal) with null values for the non-applicable types

13 Derive Schema from DTD (1)
<!ELEMENT bib (paper*)>
<!ELEMENT paper (author*,title,year)>
<!ELEMENT author (firstname, lastname)>
Relational Schema:
  Paper(pid, title, year)
  Author(aid, fn, ln)
  PaperAuthor(pid, aid)
Sometimes this is poor. E.g.
  80% of papers have <= 2 authors
  18% have 3 authors
  2% have 4 or more…
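To make the DTD-derived schema concrete, the following Python sketch loads a tiny bib document into the three tables using the standard sqlite3 module. The sample document, the author names, and the helper load_bib are assumptions for illustration. SQLite stands in for whatever relational system is used, and authors are not de-duplicated.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data conforming to the DTD above.
BIB = """<bib>
  <paper>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
    <title>The Calculus</title><year>1986</year>
  </paper>
</bib>"""

def load_bib(conn, xml_text):
    """Create Paper, Author, PaperAuthor and fill them from a bib document."""
    conn.executescript("""
        CREATE TABLE Paper(pid INTEGER PRIMARY KEY, title TEXT, year TEXT);
        CREATE TABLE Author(aid INTEGER PRIMARY KEY, fn TEXT, ln TEXT);
        CREATE TABLE PaperAuthor(pid INTEGER, aid INTEGER);
    """)
    for paper in ET.fromstring(xml_text).findall("paper"):
        cur = conn.execute("INSERT INTO Paper(title, year) VALUES (?, ?)",
                           (paper.findtext("title"), paper.findtext("year")))
        pid = cur.lastrowid
        for author in paper.findall("author"):
            cur = conn.execute("INSERT INTO Author(fn, ln) VALUES (?, ?)",
                               (author.findtext("firstname"),
                                author.findtext("lastname")))
            conn.execute("INSERT INTO PaperAuthor VALUES (?, ?)", (pid, cur.lastrowid))

conn = sqlite3.connect(":memory:")
load_bib(conn, BIB)
# Listing a paper's authors already needs a two-way join.
rows = conn.execute("""
    SELECT p.title, a.ln FROM Paper p
    JOIN PaperAuthor pa ON pa.pid = p.pid
    JOIN Author a ON a.aid = pa.aid
    ORDER BY a.aid
""").fetchall()
print(rows)
```

The join in the final query is the cost the slide alludes to: when most papers have only one or two authors, inlining a couple of author columns into Paper could avoid it for the common case.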

14 Derive Schema from DTD (2)
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
ODMG classes:
  class Employee public type tuple (name:string, address:Address, project:List(Project))
  class Address public type tuple (street:string, …)

15 Storage vs Publication
Storage
  Where and how to store native XML documents
Publication
  Wrap arbitrary content in XML documents
  Here: Publication of relational content
    Current business data is relational data
    Scalability, reliability, performance

16 Publication Scenario
[Figure: Relational Database, "publish", XML Documents, "transform" by Web Server (XSL), HTML]

17 Pragmatic ("Rowset") Approach
Tables represented as simple XML trees:
  table = root
  each row becomes a nested element
  each value becomes another nested element
Tables:
  SUPPLIERS(SNO, SNAME)
  PARTS(PNO, DESCRIP)
  CATALOG(SNO, PNO, PRICE)
XML trees:
  <suppliers>
    <s_tuple>
      <sno> <sname>
  <parts>
    <p_tuple>
      <pno> <descrip>
  <catalog>
    <c_tuple>
      <sno> <pno> <price>
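The rowset mapping is mechanical enough to sketch in a few lines of Python. The helper table_to_rowset and the sample row are assumptions for illustration; SQLite stands in for the relational source. Column names become element names, lower-cased as on the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_rowset(conn, table, row_tag):
    """Publish one table as <table><row_tag><col>value</col>...</row_tag>...</table>."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0].lower() for d in cur.description]
    root = ET.Element(table.lower())          # table = root
    for row in cur:
        row_elem = ET.SubElement(root, row_tag)   # each row = nested element
        for name, value in zip(columns, row):
            ET.SubElement(row_elem, name).text = str(value)  # each value = nested element
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SUPPLIERS(SNO INTEGER, SNAME TEXT)")
conn.execute("INSERT INTO SUPPLIERS VALUES (1, 'Acme')")  # hypothetical sample row
rowset_xml = ET.tostring(table_to_rowset(conn, "SUPPLIERS", "s_tuple"),
                         encoding="unicode")
print(rowset_xml)
```

Nothing in this function knows about keys or relationships between tables, so the output stays flat.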

18 Pragmatic ("Rowset") Approach (2)
No "natural" XML
  No nesting, no hierarchies, no mapping for (foreign) keys
May use XSLT to obtain "real" XML
Better: Publish structured documents

19 Publish XML Documents
Following presentation based on Jayavel Shanmugasundaram et al.: Efficiently Publishing Relational Data as XML Documents, VLDB 2000
Two issues
  Language for conversion
    Flat relational data to nested XML
  Implementation of conversion
    Efficient conversions

20 Sample Conversion Language
Relational data → SQL extension
Natural extension → UDFs
More specific
  Nesting through subqueries
  UDFs to construct XML elements/attributes from SQL data
  Aggregate functions to group children

21 Sample Relational Database
Department
DeptId  DeptName
10      Purchasing

Employee
EmpId  DeptId  EmpName  Salary
101    10      John     50K
91     10      Mary     70K

Project
ProjId  DeptId  ProjName
888     10      Internet
795     10      Recycling

Task: Publish single XML document with information on departments (including their employees and projects)

22 Publication Query – Structure
Select DEPT(d.name,
            <subquery to produce emplist>,
            <subquery to produce projlist>)
From Department d

23 XML Constructor
Create Function DEPT(dname: varchar(20), emplist: xml, projlist: xml) As (
  <department name={dname}>
    <emplist> {emplist} </emplist>
    <projlist> {projlist} </projlist>
  </department>
)

24 Publication Query
Select DEPT(d.name,
            (Select XMLAGG(EMP(e.name))
             From Employee e
             Where e.deptno = d.deptno),
            (Select XMLAGG(PROJ(p.name))
             From Project p
             Where p.deptno = d.deptno))
From Department d

25 Query Result
<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>
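The publication query relies on UDFs (DEPT, EMP, PROJ) and an XMLAGG aggregate that are SQL extensions. As a rough emulation only, the Python sketch below runs a query of the same shape on SQLite, with string concatenation standing in for the constructor UDFs and the built-in group_concat standing in for XMLAGG. Column names follow the sample tables (DeptId, DeptName, ...) rather than the d.name/deptno spelling of the query slide, and no XML escaping is done.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department(DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee(EmpId INTEGER, DeptId INTEGER, EmpName TEXT, Salary TEXT);
    CREATE TABLE Project(ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (101, 10, 'John', '50K'), (91, 10, 'Mary', '70K');
    INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
""")

# One correlated subquery per child list; coalesce covers departments with no children.
PUBLISH = """
SELECT '<department name="' || d.DeptName || '"><emplist>'
       || coalesce((SELECT group_concat('<employee>' || e.EmpName || '</employee>', '')
                    FROM Employee e WHERE e.DeptId = d.DeptId), '')
       || '</emplist><projlist>'
       || coalesce((SELECT group_concat('<project>' || p.ProjName || '</project>', '')
                    FROM Project p WHERE p.DeptId = d.DeptId), '')
       || '</projlist></department>'
FROM Department d
"""
documents = [row[0] for row in conn.execute(PUBLISH)]
dept = ET.fromstring(documents[0])  # parses, so the fragment is well-formed
print(documents[0])
```

The order of items inside each list is whatever group_concat happens to produce, so the sketch makes no promise about it.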

26 Implementation of Conversion
Two main differences:
  Nesting (structuring)
  Tagging
Space of alternatives:
                     Early Tagging                  Late Tagging
  Early Structuring  Inside Engine, Outside Engine  Inside Engine, Outside Engine
  Late Structuring   Not applicable                 Inside Engine, Outside Engine

27 Options
Late vs early tagging
  Late tagging: final step of query processing
  Early tagging: otherwise
Late vs early structuring
  Late structuring: final step of query processing
  Early structuring: otherwise
Inside vs outside engine
  Inside: completely inside db engine
  Outside: otherwise (ignored in following)

28 Early Tagging, Early Structuring, Outside Engine: Stored Procedure Approach
Issue queries for sub-structures and tag them
Could be a Stored Procedure
[Figure: DBMS Engine with Department, Employee, Project; results (10, Purchasing), (John), (Mary), (Internet), (Recycling)]
Problem: Too many SQL queries!
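A minimal Python sketch of this outside-engine approach, assuming SQLite as the engine and a hypothetical helper publish_departments: the client issues one query for the departments and then two more per department, tagging each result as it arrives. The query counter makes the stated problem visible.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def publish_departments(conn):
    """Tag results as they arrive; returns (xml_text, number_of_queries_issued)."""
    queries = 0

    def run(sql, args=()):
        nonlocal queries
        queries += 1
        return conn.execute(sql, args).fetchall()

    parts = []
    for dept_id, dept_name in run("SELECT DeptId, DeptName FROM Department"):
        parts.append(f"<department name={quoteattr(dept_name)}><emplist>")
        for (name,) in run("SELECT EmpName FROM Employee WHERE DeptId = ? "
                           "ORDER BY EmpName", (dept_id,)):
            parts.append(f"<employee>{escape(name)}</employee>")
        parts.append("</emplist><projlist>")
        for (name,) in run("SELECT ProjName FROM Project WHERE DeptId = ? "
                           "ORDER BY ProjName", (dept_id,)):
            parts.append(f"<project>{escape(name)}</project>")
        parts.append("</projlist></department>")
    return "".join(parts), queries

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department(DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee(DeptId INTEGER, EmpName TEXT);
    CREATE TABLE Project(DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
    INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
""")
xml_text, query_count = publish_departments(conn)
print(query_count, xml_text)
```

With N departments this issues 1 + 2N queries, and every additional nesting level multiplies the count again.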

29 Early Tagging, Early Structuring, Inside Engine: Correlated CLOB Approach
Query seen above (with UDFs to create XML)
Problem: Correlated execution of sub-queries

30 Early Tagging, Early Structuring, Inside Engine: De-Correlated CLOB Approach
Compute employee lists associated with all departments
Compute project lists associated with all departments
Join results above on department id
Problem: CLOBs during query processing

31 Late Tagging, Late Structuring: Redundant Relation Approach
How do we represent nested content as relations?
Department: (10, Purchasing)
Employee: (10, John), (10, Mary)
Project: (10, Internet), (10, Recycling)
Joined relation:
  (Purchasing, John, Internet)
  (Purchasing, John, Recycling)
  (Purchasing, Mary, Internet)
  (Purchasing, Mary, Recycling)
Problem: Large relation due to data redundancy!

32 Late Tagging, Late Structuring: Outer Union Approach
How do we represent nested content as relations?
Department: (10, Purchasing)
Department, Employee: (Purchasing, John), (Purchasing, Mary)
Department, Project: (Purchasing, Internet), (Purchasing, Recycling)
Union:
  (Purchasing, null, Internet, 0)
  (Purchasing, null, Recycling, 0)
  (Purchasing, John, null, 1)
  (Purchasing, Mary, null, 1)
Problem: Wide tuples (having many columns)

33 Late Tagging, Late Structuring: Hash-based Tagger
Results not structured early
  In arbitrary order
  Tagger has to enforce order during tagging
Hash-based approach
Inside/Outside engine tagger
Problem: Requires memory for entire document
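A hash-based tagger can be sketched in Python as follows. The input uses the outer-union tuple layout (department, employee, project, type), with type 0 for project rows and 1 for employee rows, arriving in arbitrary order. The function name hash_based_tag is an assumption, and this is an illustration of the idea rather than the implementation from the VLDB paper.

```python
from xml.sax.saxutils import escape, quoteattr

def hash_based_tag(rows):
    """Group unsorted outer-union rows per department in a hash table, then tag."""
    groups = {}  # department name -> children collected so far
    for dept, emp, proj, kind in rows:
        group = groups.setdefault(dept, {"employee": [], "project": []})
        if kind == 1:
            group["employee"].append(emp)
        else:
            group["project"].append(proj)
    # Tagging can only start once every row has been seen.
    parts = []
    for dept, group in groups.items():
        parts.append(f"<department name={quoteattr(dept)}>")
        for list_tag, item_tag in (("emplist", "employee"), ("projlist", "project")):
            parts.append(f"<{list_tag}>")
            parts.extend(f"<{item_tag}>{escape(v)}</{item_tag}>"
                         for v in group[item_tag])
            parts.append(f"</{list_tag}>")
        parts.append("</department>")
    return "".join(parts)

rows = [  # arbitrary arrival order
    ("Purchasing", "Mary", None, 1),
    ("Purchasing", None, "Internet", 0),
    ("Purchasing", "John", None, 1),
    ("Purchasing", None, "Recycling", 0),
]
tagged = hash_based_tag(rows)
print(tagged)
```

The groups dictionary holds every child of every department before the first tag is written, which is the memory problem named on the slide.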

34 Late Tagging, Early Structuring: Sorted Outer Union Approach
[Figure: element tree over A; B, C; D, E, F, G]
Outer-union tuples (n = null):
  A  B  n  D  n  n  n
  A  B  n  n  E  n  n
  A  n  C  n  n  F  n
  A  n  C  n  n  n  G
Sort By: Aid, Bid, Cid
Problem: Only partial ordering required

35 Late Tagging, Early Structuring: Constant Space Tagger
Detects changes in XML document hierarchy
Adds appropriate opening/closing tags
Inside/outside engine
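The sorted outer union and the constant-space tagger fit together as in the Python sketch below. It assumes SQLite as the engine, the sample Department/Employee/Project tables, and a type column "kind" (0 = department, 1 = employee, 2 = project) chosen for this example. The tagger is a generator that remembers only whether a department is open and which child list is open.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# One wide tuple layout for all three sources, sorted so that each department
# row is followed by its employees and then its projects.
SORTED_OUTER_UNION = """
SELECT DeptId, DeptName, NULL AS EmpName, NULL AS ProjName, 0 AS kind FROM Department
UNION ALL
SELECT DeptId, NULL, EmpName, NULL, 1 FROM Employee
UNION ALL
SELECT DeptId, NULL, NULL, ProjName, 2 FROM Project
ORDER BY 1, 5, 3, 4
"""

def constant_space_tag(rows):
    """Yield XML fragments; state is limited to the currently open elements."""
    dept_open, open_list = False, None
    for _dept_id, dept_name, emp, proj, kind in rows:
        if kind == 0:  # hierarchy change: a new department begins
            if open_list:
                yield f"</{open_list}>"
            if dept_open:
                yield "</department>"
            yield f"<department name={quoteattr(dept_name)}>"
            dept_open, open_list = True, None
            continue
        wanted = "emplist" if kind == 1 else "projlist"
        if open_list != wanted:  # hierarchy change: switch child list
            if open_list:
                yield f"</{open_list}>"
            yield f"<{wanted}>"
            open_list = wanted
        tag, text = ("employee", emp) if kind == 1 else ("project", proj)
        yield f"<{tag}>{escape(text)}</{tag}>"
    if open_list:
        yield f"</{open_list}>"
    if dept_open:
        yield "</department>"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department(DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee(DeptId INTEGER, EmpName TEXT);
    CREATE TABLE Project(DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
    INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
""")
sorted_xml = "".join(constant_space_tag(conn.execute(SORTED_OUTER_UNION)))
print(sorted_xml)
```

One simplification to note: a department without employees gets no emplist element at all, because the list is opened only when its first row arrives.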

36 Performance of Alternatives
Constructing XML inside engine more efficient than outside
When processing can be done in main memory:
  Late tagging, late structuring with outer union
Otherwise:
  Late tagging, early structuring with sorted outer union

37 Sample DB Vendors and XML
Oracle 9i
IBM DB2 UDB V7

38 Oracle – CLOB
Datatype XMLType (internally a CLOB)
Predefined functions
  createXML: creates XMLType-instance from string (if well-formed)
  extract: applies XPath expression to XMLType-instance and returns XMLType
  existsNode: checks whether XMLType-instance has non-empty result for given XPath expression

39 Oracle – Generate XML
Functions
  SYS_XMLGEN: takes single argument and converts it to an element
  SYS_XMLAGG: concatenates XML fragments
Utility
  XSU (XML SQL Utility): Implements "Rowset" approach

40 DB2 – CLOB
Three datatypes
  XMLCLOB (outside table)
  XMLVARCHAR (inside table)
  XMLFILE (external file)
Checks against DTD possible
PAGE tables? Via DADs

41 Basic Approach
XML document is stored
  completely (in a column of type XML, or an "XML column"),
  or as file reference,
  or in multiple tables as result of a mapping (as an "XML collection")
A Document Access Definition (DAD) specifies
  how XML documents are stored and published,
  how XML maps to tables and vice versa

42 XML Column
[Figure: XML document (<?xml?> <!DOCTYPE ...> <Order key = "1"> </Order>) stored in DB2 as XMLCLOB/XMLVARCHAR; CLOB = Character Large Object]
UDTs:
  - XMLCLOB
  - XMLVARCHAR
  - XMLFILE
UDFs:
  - Import/Storage
  - Retrieval
  - Extract
  - Update
These data types are used to identify the storage type of XML documents in the application table
You are not required to store XML documents inside DB2

43 Legend
XMLFile for external file names
XMLVarchar for internal short documents
XMLCLOB for internal long documents
Extract
  Extracts XML element/attribute values from documents
  Converts values from XML documents into SQL data types
  Provides scalar as well as tabular UDFs

44 Example
OrderTable
InvoiceNumber  Order
355            . . .
356            ..<order>..<part>..<extendedPrice>1000</...
357

Select db2xml.extractDouble(Order, '/order/part/extendedPrice')
from OrderTable
where InvoiceNumber = 356
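To show what such an extracting UDF does, the Python sketch below emulates it on SQLite. This is not DB2's implementation: extract_double is a hypothetical stand-in registered through sqlite3's create_function, the stored order document is an assumed completion of the elided one on the slide, and only simple absolute child paths are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

def extract_double(xml_text, path):
    """Return the first value at an absolute /a/b/c path as a float, or None."""
    if xml_text is None:
        return None
    root = ET.fromstring(xml_text)
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:          # first step must name the document element
        return None
    node = root if len(steps) == 1 else root.find("/".join(steps[1:]))
    return float(node.text) if node is not None and node.text else None

conn = sqlite3.connect(":memory:")
conn.create_function("extractDouble", 2, extract_double)
# "Order" is a reserved word in SQLite, hence the quoting.
conn.execute('CREATE TABLE OrderTable(InvoiceNumber INTEGER, "Order" TEXT)')
conn.execute("INSERT INTO OrderTable VALUES (?, ?)",
             (356, "<order><part><extendedPrice>1000</extendedPrice></part></order>"))
price = conn.execute(
    """SELECT extractDouble("Order", '/order/part/extendedPrice')
       FROM OrderTable WHERE InvoiceNumber = 356"""
).fetchone()[0]
print(price)
```

The document is parsed once per row and per call. Without an index on the extracted value, a query that filters on it has to parse every stored document.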

45 XML Collection
[Figure: XML document (<?xml?> <!DOCTYPE ...> <Order key = "1"> </Order>); Stored Procedures (Composition, Decomposition); DAD; DB2 Collection]
DAD file specifies how elements and attributes are mapped to one or more relational tables
Stored procedures with XML collections, e.g.:
  dxxGenXML(): uses a DAD file for an XML collection to compose XML documents
  dxxShredXML(): uses a DAD file for an XML collection to decompose XML documents
Relational storage of XML data with complex elements in multiple tables:
  Many joins to answer queries
  Management of large number of tables
  → simple reconstruction of XML document may become quite expensive (bad performance)

46 Sample DAD
<DAD>
  <Xcollection>
    <SQL_stmt>
      SELECT book_id, price_date, price_text
      FROM book_table
      ORDER BY price_date
    </SQL_stmt>
    <doctype>
      <root_node> </root_node>
    </doctype>
  </Xcollection>
</DAD>

47 Overview
Introduction to XML
DTDs and Schemas for XML Documents
Languages for XML, in particular XSL
Querying and Storing XML
Summary and Outlook

48 Discussion
Do XML documents have to be stored directly in the database?
XML documents are highly redundant (from a database perspective)
The efficiency of a relational system (partially) comes from normalization
Compromise: XML as an "intermediary" between the database and, say, a Web server

49 Summary
XML asks for query languages, database-style
Database vendors experiment with XML extensions
  architectures
  languages
  internal data models
Many open issues, e.g.,
  Graphical query languages
  Updates → ACM SIGMOD 2001
  Views defined in the query language
  Referential integrity, triggers, rules
  Distributed XML storage systems

50 Outlook for DBMS
XML is an important database topic (both for practitioners and for theoreticians)
Declarative querying SQL-style is attractive
Will there be a renaissance of hierarchical DBMS?
Workshop: WebDB, annually, German counterpart as GI-Arbeitskreis "Web und Datenbanken"
Initiatives found on the Web: XML:DB and Xindice

51 xLx Competition - Results
2. Place, 248 Points: Christian Birmes, Victor Pankratius, Tobias Rieke
1. Place, 250 Points: Kai Honsel

52 UDBS Winter 2001/2
Thank You For Listening!

