Management of XML Documents without Schema in Relational Database Systems. Workshop "Objects, XML and Databases", OOPSLA 2001, Tampa, 10/14/2001. Thomas Kudrass.

Thomas Kudrass
Leipzig University of Applied Sciences
Department of Computer Science and Mathematics

Overview
* Introduction
  - Motivation
  - Main Issues
* Structure-Oriented Approach
  - Storage / Data Model
  - Queries
  - Evaluation
* Opaque Approach
  - Storage
  - Queries (vs. XPath)
  - Evaluation
* Prototype Implementation
  - Interface
  - Experience
* Outlook

Motivation
* XML is used in:
  - data publishing (document-centric documents)
  - data exchange (data-centric documents)
* Why XML Documents without Schema?
  - generated by programs
  - mostly data-centric documents, e.g., account statements
  - high update frequency of the document structure -> evolving schemas
* Problems
  - How to deal with XML documents without DTD / XML Schema in databases?
  - Evaluate approaches
  - Use relational database systems

Main Issues
* Evaluate Storage and Retrieval Methods
  - no defined document schema
  - platform: Oracle 8i
* XML-to-Relational Mapping Approaches
  - structure-oriented decomposition
  - opaque approach
* Identify Parameters of a Unified XML-DB Interface
* Implement a Testbed
  - qualitative assessment
  - performance of both approaches

Structure-Oriented Approach
* Characteristics
  - decomposition of an XML document into smaller units (elements)
  - depends on the document structure only
  - target system: relational DBMS, generic schema
* Variety of Mapping Methods
  - model XML documents as directed, ordered, labeled graphs and map them to tables
  - proposed algorithms:
      edge tables [Florescu, Kossmann]
      universal table
      inlining techniques [Shanmugasundaram et al.]
      model-based fragmentation, Monet XML model

Storing XML Data in Relations
* XML-QL Data Model
  [Figure: graph representation of a sample XML document; surviving labels: Peter, Mary, Fruitdale Ave.]

Data Model
  tblDocs  (DocId, url)
  tblEdge  (SourceId, TargetId, LeafId, AttrId, DocId, EdgeName, Type, Depth)
  tblLeafs (LeafId, Value)
  tblAttrs (AttrId, Value)
  Cardinality labels in the diagram: 1, n, 1, 0/1, 1 (the assignment of labels to relationship lines is not recoverable from the transcript)
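The four tables can be written down as DDL. The following is only a sketch: the slide gives column names but no types, so the types, the keys, and the CHECK constraint are assumptions, and SQLite (via Python's standard library) stands in for Oracle 8i.

```python
import sqlite3

# Assumed DDL: the slide lists only table and column names.
# Types, primary keys, and the CHECK constraint are illustrative guesses.
SCHEMA = """
CREATE TABLE tblDocs  (DocId  INTEGER PRIMARY KEY, url   TEXT);
CREATE TABLE tblLeafs (LeafId INTEGER PRIMARY KEY, Value TEXT);
CREATE TABLE tblAttrs (AttrId INTEGER PRIMARY KEY, Value TEXT);
CREATE TABLE tblEdge (
    SourceId INTEGER NOT NULL,
    TargetId INTEGER NOT NULL,
    LeafId   INTEGER REFERENCES tblLeafs(LeafId),
    AttrId   INTEGER REFERENCES tblAttrs(AttrId),
    DocId    INTEGER NOT NULL REFERENCES tblDocs(DocId),
    EdgeName TEXT NOT NULL,
    Type     TEXT NOT NULL CHECK (Type IN ('ref', 'leaf', 'attr')),
    Depth    INTEGER NOT NULL
);
"""


def create_schema(conn):
    """Create the generic, document-independent schema."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The schema is generic: it is the same for every document, which is what allows documents without a DTD or XML Schema to be stored.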

Import Algorithm
  Sample document (markup reconstructed from the edge table below; the transcript keeps only the text values):
    <tree>
      <person age="36">
        <name>Peter</name>
        <address>
          <street>Main Road 4</street>
          <zip>04236</zip>
          <city>Leipzig</city>
        </address>
      </person>
    </tree>

  tblEdge
    SourceId  TargetId  LeafId  AttrId  DocId  EdgeName  Type  Depth
    0         1         NULL    NULL    1      tree      ref   3
    1         2         NULL    NULL    1      person    ref   2
    2         3         NULL    1       1      age       attr  0
    2         4         1       NULL    1      name      leaf  0
    2         5         NULL    NULL    1      address   ref   1
    5         6         2       NULL    1      street    leaf  0
    5         7         3       NULL    1      zip       leaf  0
    5         8         4       NULL    1      city      leaf  0

  tblDocs
    DocId  url
    1      Sample.xml

  tblLeafs
    LeafId  Value
    1       Peter
    2       Main Road 4
    3       04236
    4       Leipzig

  tblAttrs
    AttrId  Value
    1       36
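One way to produce such rows is a depth-first walk over the parsed document. The sketch below is not the prototype's import code: it assumes a sample document whose markup is consistent with the edge table on this slide, assumes that Depth is the height of the subtree below an element, and uses hypothetical names (`decompose`, `height`). Like the approach described here, it keeps at most one text value per element.

```python
import xml.etree.ElementTree as ET

# Assumed markup; only the text values appear in the slide transcript.
SAMPLE = (
    '<tree><person age="36"><name>Peter</name><address>'
    "<street>Main Road 4</street><zip>04236</zip><city>Leipzig</city>"
    "</address></person></tree>"
)


def height(elem):
    """Height of the element subtree: 0 for an element without children."""
    return 0 if len(elem) == 0 else 1 + max(height(child) for child in elem)


def decompose(xml_text, doc_id=1):
    """Return (edges, leafs, attrs) row lists for one document."""
    edges, leafs, attrs = [], [], []
    last_id = [0]  # node id 0 is the virtual document root

    def new_id():
        last_id[0] += 1
        return last_id[0]

    def visit(elem, source_id):
        target_id = new_id()
        if len(elem) == 0:
            # Leaf element: store its text and point the edge at it.
            leafs.append((len(leafs) + 1, elem.text or ""))
            edges.append((source_id, target_id, len(leafs), None,
                          doc_id, elem.tag, "leaf", 0))
        else:
            edges.append((source_id, target_id, None, None,
                          doc_id, elem.tag, "ref", height(elem)))
        for name, value in elem.attrib.items():
            attrs.append((len(attrs) + 1, value))
            edges.append((target_id, new_id(), None, len(attrs),
                          doc_id, name, "attr", 0))
        for child in elem:
            visit(child, target_id)

    visit(ET.fromstring(xml_text), 0)
    return edges, leafs, attrs


edges, leafs, attrs = decompose(SAMPLE)
for row in edges:
    print(row)
```

Each tuple follows the column order SourceId, TargetId, LeafId, AttrId, DocId, EdgeName, Type, Depth, with `None` in place of NULL.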

Query Processing
* Query Language
  - XML query language most appropriate
  - data model of our solution is based on XML-QL -> XML-QL preferred choice
* XML-Relational Mismatch
  - relational DBMS "understands" SQL only -> requires translation from XML-QL to SQL
  - generate result document from the tuples retrieved

Query Processing
  [Figure: processing pipeline]
  XML-QL Query -> Parser -> Object Structure -> Generate SQL Statement -> SQL Statement -> Execute SQL Statement (DB) -> Row Set -> Construct Result Document -> XML Document

Generate an SQL Statement
* XML-QL Query (element tags were lost in the transcript; per the SQL below, the pattern binds $n to the name and $a to the address of a person)
    CONSTRUCT { WHERE $n $a CONSTRUCT $n $a }
* SQL Statement
    SELECT DISTINCT
           B.Type AS n_Type, B.TargetId AS n_TargetId,
           B.Depth AS n_Depth, C.Value AS n_Value,
           D.Type AS a_Type, D.TargetId AS a_TargetId,
           D.Depth AS a_Depth, E.Value AS a_Value
    FROM   tblEdge A, tblEdge B, tblLeafs C, tblEdge D, tblLeafs E
    WHERE  (A.EdgeName = 'person')
      AND  (A.TargetId = B.SourceId)
      AND  (B.EdgeName = 'name')
      AND  (B.LeafId = C.LeafId(+))
      AND  (A.TargetId = D.SourceId)
      AND  (D.EdgeName = 'address')
      AND  (D.LeafId = E.LeafId(+))
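The generated statement can be tried on the sample rows from the import example. This is a sketch under assumptions: SQLite replaces Oracle 8i, so Oracle's `(+)` outer-join notation is rewritten as ANSI `LEFT JOIN`; the untyped column definitions are a shortcut for brevity.

```python
import sqlite3

# Rows from the import example (None stands for NULL).
EDGES = [
    (0, 1, None, None, 1, "tree", "ref", 3),
    (1, 2, None, None, 1, "person", "ref", 2),
    (2, 3, None, 1, 1, "age", "attr", 0),
    (2, 4, 1, None, 1, "name", "leaf", 0),
    (2, 5, None, None, 1, "address", "ref", 1),
    (5, 6, 2, None, 1, "street", "leaf", 0),
    (5, 7, 3, None, 1, "zip", "leaf", 0),
    (5, 8, 4, None, 1, "city", "leaf", 0),
]
LEAFS = [(1, "Peter"), (2, "Main Road 4"), (3, "04236"), (4, "Leipzig")]

# Same joins as the Oracle statement; LEFT JOIN replaces the (+) notation.
QUERY = """
SELECT DISTINCT
       B.Type AS n_Type, B.TargetId AS n_TargetId,
       B.Depth AS n_Depth, C.Value AS n_Value,
       D.Type AS a_Type, D.TargetId AS a_TargetId,
       D.Depth AS a_Depth, E.Value AS a_Value
FROM tblEdge A
JOIN tblEdge B       ON A.TargetId = B.SourceId AND B.EdgeName = 'name'
LEFT JOIN tblLeafs C ON B.LeafId = C.LeafId
JOIN tblEdge D       ON A.TargetId = D.SourceId AND D.EdgeName = 'address'
LEFT JOIN tblLeafs E ON D.LeafId = E.LeafId
WHERE A.EdgeName = 'person'
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblEdge (SourceId, TargetId, LeafId, AttrId, "
    "DocId, EdgeName, Type, Depth)"
)
conn.execute("CREATE TABLE tblLeafs (LeafId, Value)")
conn.executemany("INSERT INTO tblEdge VALUES (?, ?, ?, ?, ?, ?, ?, ?)", EDGES)
conn.executemany("INSERT INTO tblLeafs VALUES (?, ?)", LEAFS)

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The outer joins matter because `address` is a `ref` edge without a leaf value: with inner joins the person would drop out of the result.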

Construct Result Document
* Result Tuple
    n_Type  n_TargetId  n_Depth  n_Value  a_Type  a_TargetId  a_Depth  a_Value
    leaf    4           0        Peter    ref     5           1        (empty)
* Subtree Reconstruction
    SELECT A.EdgeName, A.Type,
           Al.Value AS A_LeafVal, Aa.Value AS A_AttrVal
    FROM   tblEdge A, tblLeafs Al, tblAttrs Aa
    WHERE  A.SourceId = 5
      AND  A.leafId = Al.leafId(+)
      AND  A.attrId = Aa.attrId(+)

    EdgeName  Type  A_LeafVal    A_AttrVal
    street    leaf  Main Road 4  (empty)
    zip       leaf  04236        (empty)
    city      leaf  Leipzig      (empty)
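Subtree reconstruction can be sketched as a recursive lookup: for each `ref` node in a result tuple, fetch its outgoing edges and descend until only leaves and attributes remain. The code below is an illustration, not the prototype's implementation. It assumes SQLite instead of Oracle 8i (hence `LEFT JOIN` instead of `(+)`), assumes that ordering by TargetId restores document order, and uses the hypothetical helpers `fill` and `rebuild`.

```python
import sqlite3
import xml.etree.ElementTree as ET

EDGES = [
    (2, 4, 1, None, 1, "name", "leaf", 0),
    (2, 5, None, None, 1, "address", "ref", 1),
    (5, 6, 2, None, 1, "street", "leaf", 0),
    (5, 7, 3, None, 1, "zip", "leaf", 0),
    (5, 8, 4, None, 1, "city", "leaf", 0),
]
LEAFS = [(1, "Peter"), (2, "Main Road 4"), (3, "04236"), (4, "Leipzig")]

# Subtree query from the slide, parameterized on SourceId.
SUBTREE_SQL = """
SELECT A.EdgeName, A.Type, A.TargetId,
       Al.Value AS A_LeafVal, Aa.Value AS A_AttrVal
FROM tblEdge A
LEFT JOIN tblLeafs Al ON A.LeafId = Al.LeafId
LEFT JOIN tblAttrs Aa ON A.AttrId = Aa.AttrId
WHERE A.SourceId = ?
ORDER BY A.TargetId
"""


def fill(conn, parent, source_id):
    """Attach all edges leaving source_id to the parent element."""
    rows = conn.execute(SUBTREE_SQL, (source_id,)).fetchall()
    for name, edge_type, target_id, leaf_val, attr_val in rows:
        if edge_type == "attr":
            parent.set(name, attr_val)
        else:
            child = ET.SubElement(parent, name)
            child.text = leaf_val           # None for 'ref' edges
            fill(conn, child, target_id)    # descend; also picks up attributes


def rebuild(conn, name, target_id):
    """Rebuild the element called `name` whose node id is target_id."""
    root = ET.Element(name)
    fill(conn, root, target_id)
    return root


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblEdge (SourceId, TargetId, LeafId, AttrId, "
    "DocId, EdgeName, Type, Depth)"
)
conn.execute("CREATE TABLE tblLeafs (LeafId, Value)")
conn.execute("CREATE TABLE tblAttrs (AttrId, Value)")
conn.executemany("INSERT INTO tblEdge VALUES (?, ?, ?, ?, ?, ?, ?, ?)", EDGES)
conn.executemany("INSERT INTO tblLeafs VALUES (?, ?)", LEAFS)

address_xml = ET.tostring(rebuild(conn, "address", 5), encoding="unicode")
print(address_xml)
```

The call `rebuild(conn, "address", 5)` corresponds to the `a_TargetId = 5` entry of the result tuple, whose `a_Type` is `ref`.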

Query Result
* XML Result Document (element tags were lost in the transcript; surviving text values follow)
    Peter
    Main Road 4
    04236
    Leipzig

Advantages
* Vendor Independence
  - no specific DBMS features needed
* Stability
* High Flexibility of Queries
  - retrieve and update single values
  - full SQL functionality can be used
* Well-Suited for Structure-Oriented Queries
  - structures are represented in tables

Drawbacks
* Information Loss
  - Comments
  - Processing Instructions
  - Prolog
  - CDATA Sections
  - Entities
* Restrictions
  - only one text (content) per element: Text1 and Text2 in the same element -> lost
  - element text as VARCHAR(n); n <= 4000
* Increased Load Time
  - sample document: 3.3 MB, 130,000 tuples, 13 minutes
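The one-text-per-element restriction concerns mixed content. The slide does not say which part of the text the prototype drops, so the following is only an illustration with an assumed element `<p>Text1<b/>Text2</p>`: a decomposition that stores just the element's leading text keeps Text1 and loses Text2.

```python
import xml.etree.ElementTree as ET

# Assumed mixed-content element: two text runs separated by a child element.
elem = ET.fromstring("<p>Text1<b/>Text2</p>")

# A one-value-per-element mapping has room for a single string only;
# here we assume it stores the text before the first child.
stored_text = elem.text

# The full character content of the element, for comparison.
full_text = "".join(elem.itertext())

lost_text = full_text[len(stored_text):]
print(stored_text, full_text, lost_text)
```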

Opaque Approach
* Characteristics
  - XML document stored as Large Object (LOB)
  - document completely preserved
* Storage (the XML markup around "Mary" was lost in the transcript)
    INSERT INTO tblXMLClob VALUES (1, 'person.xml', ' Mary ');

    DocId  url         content
    1      person.xml  Mary
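A minimal sketch of the opaque storage, with SQLite's TEXT column standing in for an Oracle CLOB. The document markup, including the prolog and comment, is an assumption chosen to show that nothing is stripped on the way in or out.

```python
import sqlite3

# Hypothetical document; the markup is assumed for illustration.
DOC = (
    '<?xml version="1.0"?>\n'
    "<!-- comments and prolog are kept verbatim -->\n"
    "<person><name>Mary</name></person>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblXMLClob (DocId INTEGER PRIMARY KEY, url TEXT, content TEXT)"
)
conn.execute("INSERT INTO tblXMLClob VALUES (?, ?, ?)", (1, "person.xml", DOC))

(stored,) = conn.execute(
    "SELECT content FROM tblXMLClob WHERE DocId = ?", (1,)
).fetchone()
print(stored == DOC)
```

Because the database never interprets the content, export returns the document character for character; the price is that the structure is invisible to plain SQL.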

Oracle interMedia Text
* Query Facilities of interMedia Text
  - full-text retrieval (word matching only)
  - path expression only together with content search
  - no range queries
* Example in interMedia Text:
    SELECT DocId
    FROM   tblXMLClob
    WHERE  CONTAINS(content, '(Mary WITHIN name) WITHIN person') > 0
* XML Full-Text Index
  - Autosectioner Index
  - XML Sectioner Index
  - WITHIN operator: text_subquery WITHIN elementname
    searches the entire text content of the named tag

Comparison of Queries
* interMedia Text
  - return document IDs
  - word matching (default)
  - no existence test for elements or attributes
  - restricted set of path expressions using WITHIN, e.g.: (xml WITHIN title) WITHIN book
  - provides limited attribute value searches, no nesting of attribute searches
  - numeric and date values not type-converted
  - no range searches on attribute values
* XPath
  - return document fragments
  - substring matching
  - search for existing elements or attributes
  - path expressions, structure-oriented queries, e.g.: //Book/Title[contains(., 'xml')]
  - searches for attribute values and element text can be combined
  - considers also decimal values
  - range searches possible using filters

XPath Queries with PL/SQL
* Prerequisite
  - XDK for PL/SQL installed on the server
* Parse CLOB into DOM representation
* Processing steps (server-side, from the flow diagram):
  1. Receive the XPath query
  2. Search the document IDs of all CLOBs of the XML table
  3. Execute the XPath query on the DOM tree for each CLOB object of a document ID
  4. Return the DocIDs with the result XML documents
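The per-document loop can be sketched outside the database. This is not the PL/SQL implementation: Python's `xml.etree.ElementTree` replaces the XDK DOM parser and supports only a subset of XPath, SQLite replaces Oracle, and the two stored documents and the function name `xpath_search` are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stored documents.
DOCS = [
    (1, "mary.xml", "<person><name>Mary</name></person>"),
    (2, "peter.xml", "<person><name>Peter</name></person>"),
]


def xpath_search(conn, path):
    """Parse every stored document and run the path query on its tree.

    Returns {DocId: [matching fragments as XML strings]}.
    """
    hits = {}
    cursor = conn.execute("SELECT DocId, content FROM tblXMLClob ORDER BY DocId")
    for doc_id, content in cursor.fetchall():
        root = ET.fromstring(content)   # the costly step: one parse per CLOB
        holder = ET.Element("doc")      # wrapper so the path can match the root
        holder.append(root)
        found = holder.findall(path)
        if found:
            hits[doc_id] = [ET.tostring(e, encoding="unicode") for e in found]
    return hits


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblXMLClob (DocId INTEGER PRIMARY KEY, url TEXT, content TEXT)"
)
conn.executemany("INSERT INTO tblXMLClob VALUES (?, ?, ?)", DOCS)

result = xpath_search(conn, ".//person[name='Mary']")
print(result)
```

Unlike the full-text query, the result carries document fragments, not only document IDs; since every document is parsed for every query, response time grows with the total size of the stored documents.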

Advantages
* Information Preservation
* Handling of Large Documents
  - appropriate for document-centric documents with little structure and prose-rich elements
* Different XML Document APIs
  - interMedia Text: restricted set of XPath functionality
  - generate a DOM of the document before using XPath queries

Drawbacks
* Restricted Expressiveness of Text Queries
* Performance vs. Accuracy of Query Results
  - interMedia Text queries on CLOBs faster than the DOM API
  - sample document: 12.5 MB, parse time 3 min, load time 5 min
* Restrictions of Indexes
  - maximum length of tag names for indexing (incl. namespace): 64 bytes
* Problems with Markup
  - character entities
* Vendor Dependence
  - text engines are proprietary, e.g., Oracle interMedia
* Stability
  - maximum document size 50 MB
  - memory errors may occur with smaller documents

User Interface
  [Figure: user interface of the prototype; image not included in the transcript]

XML Database Interface
  [Figure: a client interacting with the XML database interface; surviving labels: XML Document, XML Query, Doc List, DocID / Doc Name]

Comparison
* Structure-Oriented Approach
  - Import
      loss of information
      time-consuming
      produces lots of tuples
  - Queries
      XML-QL: fast; new document as result
* Opaque Approach
  - Import
      no loss of information
      faster than structure-oriented decomposition
  - Queries
      interMedia Text: fast; only document IDs as result
      XPath: high response time; flexible granularity of results

Implementation Experience
* Import Problems
  - structure-oriented decomposition
      SAX parser produces FillBuf error for XML documents > 3 MB
      limitations of VARCHAR columns (max. 4000 bytes)
  - opaque approach
      OutOfMemory error during import of XML documents > 4 MB
      reason: Java heap size too small
      remedy: use the JVM start option -Xmx
* Queries
  - opaque approach
      OutOfMemory error when parsing CLOB into DB
      increase java_pool_size (100-150 MB)
      increase shared_pool_size (150-200 MB)
* Export and Delete without Problems

Outlook
* Experience
  - problems with larger documents in both approaches
  - no universal solution for all requirements
* Co-Existence of Multiple Storage Approaches
  - integrate different storage engines
  - combine structure-oriented decomposition and opaque approach
  - need for a generic XML data type
* New Data Model for Structure-Oriented Approach
  - reduce loss of information
* XML Database Interface / Middleware
  - combines different approaches
  - parameterize the XML database interface
