1
Binary XML Storage and Query Processing in Oracle
Sam Idicula, Oracle XML DB Development Team
VLDB 2009
2
Outline
- Motivation
- Binary XML Overview
- Storage Format Details
- Query Processing
- Performance Evaluation
- Conclusion
3
Previous Oracle XML Storage Models: CLOB Storage
- Text representation preserves the exact form of the original document (including white space)
- Very good performance for insert and full retrieval
- Size bloat (tags, string representation of dates, numbers, etc.)
- Need to parse the document for all XML processing
- Query and DML processing are not efficient
- Memory overhead with DOM
- Mid-tier does not take advantage of parsing and validation already done on the DB tier (and vice versa)
4
Previous Oracle XML Storage Models: Object-Relational Storage (OR)
- XML Schema-based mapping to object-relational tables
- Preserves DOM fidelity (more than traditional shredding)
- Simple XPaths translate to table/column access
- Very good query performance for highly structured use cases
- Flexibility is limited due to schema dependency
- Insert, full retrieval, etc. perform poorly
5
Motivation/Goals for Binary XML
- Bridge the gap between two extremes
  - Structure-unaware text representation: full flexibility, poor query performance
  - Object-relational mapping: heavily dependent on rigid structure
  - Several customer use cases fall in between these extremes
- Native format that can:
  - Handle the full spectrum of XML database use cases
  - Be optimized for semi-structured use cases
  - Provide good performance for a wide variety of operations
- Retain the flexibility advantage of the XML data model while providing good performance
6
[Figure: customer use cases placed along two axes, structured to unstructured and low to high flexibility. The majority of customer use cases are semi-structured.]
7
Motivation/Goals for Binary XML
- XML Schema usage
  - Need to be efficient for query processing on schemaless documents and loosely structured schemas
  - Ability to use schema constraints for more efficient processing
- Provide good performance for a wide range of operations
  - Query
  - DML: insert/load, partial (piecewise) update
  - Full-document and fragment retrieval
  - Schema validation and evolution
  - Mid-tier integration
8
Oracle Binary XML Overview
- Compact, schema-aware XML format
- Pre-parsed, tokenized binary representation
- Addresses the space bloat associated with XML 1.x serialization
- Intended for use in all tiers of the Oracle stack
  - Oracle XML DB
  - Oracle iAS / XDK Java
- Exploits XML Schema information if available
- Also supports non-schema-based encoding
- Preserves Infoset or data model fidelity, not bytes
- Can create an XML Index for query optimization
9
Oracle Binary XML
- Mid-tier processing: Oracle XDK Java support
- Binary XML allows direct access to fragments/sub-trees
- XML processing optimization: scalable mid-tier DOM
[Figure: Binary XML exchanged among App Server, Web Cache, Database, and Client.]
10
Format Details
- Opcodes roughly corresponding to SAX events
- Each opcode has a fixed number of operands
- Document-ordered serialization of opcodes
- Stored as a BLOB
- Tag names are tokenized into qname IDs
  - Central repository or inlined definitions
- Optimized opcodes for simple elements, repeating elements, etc.
- Uses native data types in the presence of an XML schema
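To make the idea of a tokenized, document-ordered opcode stream concrete, here is a minimal Python sketch. It is not Oracle's on-disk format: the opcode numbers, the `TokenizingHandler` class, and the `encode` helper are hypothetical names chosen for illustration, and the stream is kept as a Python list rather than a BLOB.

```python
import xml.sax

# Hypothetical opcode numbers; the slides do not list the real opcode set.
OP_START, OP_END, OP_TEXT = 1, 2, 3


class TokenizingHandler(xml.sax.ContentHandler):
    """Turn SAX events into (opcode, operand) pairs with tokenized tag names."""

    def __init__(self):
        super().__init__()
        self.qname_ids = {}  # tag name -> small integer token (inlined token table)
        self.stream = []     # opcodes in document order

    def _token(self, name):
        # Assign the next free ID the first time a tag name is seen.
        return self.qname_ids.setdefault(name, len(self.qname_ids))

    def startElement(self, name, attrs):
        self.stream.append((OP_START, self._token(name)))

    def endElement(self, name):
        self.stream.append((OP_END, self._token(name)))

    def characters(self, content):
        text = content.strip()
        if text:  # ignore whitespace-only text in this sketch
            self.stream.append((OP_TEXT, text))


def encode(xml_text):
    """Return (qname ID table, opcode stream) for an XML string."""
    handler = TokenizingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.qname_ids, handler.stream


ids, stream = encode("<po><item>pen</item><item>ink</item></po>")
```

Each repeated tag name costs one small integer in the stream instead of its full text, which is where most of the size reduction in this toy version comes from.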
11
Streaming Capabilities
- Streaming XPath evaluation
  - XPathTable with NFA: multiple XPaths evaluated in a single pass
  - Forward axes
- Streaming partial updates
  - Most common update scenarios are handled in a streaming manner, e.g. updateXML('/purchaseOrder/Reference/text()', 'XXXX')
  - Can be applied directly on disk, avoiding expensive DOM construction
  - Takes advantage of Oracle SecureFile LOB storage to perform delta updates
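The streaming partial update can be illustrated with a small Python generator that rewrites one text node while copying every other event through unchanged, so no DOM is ever built. This is a sketch under assumptions: the `(kind, value)` event tuples and the `update_text` function are hypothetical, and it does not model the on-disk delta update or the SecureFile LOB layer.

```python
def update_text(events, element_path, new_value):
    """Yield events unchanged, except text directly under `element_path`.

    `events` is an iterable of (kind, value) pairs where kind is
    "start", "end", or "text". `element_path` is a path such as
    "/purchaseOrder/Reference" (the text() step is implied).
    """
    target = element_path.strip("/").split("/")
    open_elements = []  # names of the currently open elements
    for kind, value in events:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
        elif kind == "text" and open_elements == target:
            value = new_value  # the only event that is rewritten
        yield kind, value


events = [
    ("start", "purchaseOrder"),
    ("start", "Reference"), ("text", "R-001"), ("end", "Reference"),
    ("start", "User"), ("text", "SMITH"), ("end", "User"),
    ("end", "purchaseOrder"),
]
updated = list(update_text(events, "/purchaseOrder/Reference", "XXXX"))
```

Only the state needed to know the current path is held in memory, which is what makes the single-pass approach cheap compared with building a tree.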
12
Query Processing Architecture
[Figure: architecture diagram with the components XQuery, SQL/XML, DB XQuery Rewrite, XMLIndex (path-based and table-based), Functional Evaluation (Streaming XPath), and Binary XML.]
13
Document-level Summary
- Long-term goal: efficient tree-oriented navigation
  - Important for query execution
  - Pure streaming is too costly over large documents
- Current implementation
  - Start and end offsets for large subtrees
  - Threshold for “large” can be adjusted
  - Used for skipping to the end of a subtree
- Working on significant enhancements
  - Handling all axes
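The current implementation described above (start and end offsets for large subtrees, used for skipping) can be sketched in Python as follows. The event tuples, the `build_summary` and `skip_subtree` functions, and measuring subtree size in number of events are all assumptions made for illustration; the slides do not say how the real summary is laid out or how "large" is measured.

```python
def build_summary(stream, threshold):
    """Map start offset -> end offset for subtrees spanning >= threshold events."""
    summary, open_starts = {}, []
    for pos, (kind, _) in enumerate(stream):
        if kind == "start":
            open_starts.append(pos)
        elif kind == "end":
            start = open_starts.pop()
            if pos - start + 1 >= threshold:
                summary[start] = pos
    return summary


def skip_subtree(stream, summary, start):
    """Return the offset just past the subtree whose start event is at `start`."""
    if start in summary:
        return summary[start] + 1  # large subtree: jump without decoding it
    depth, pos = 0, start          # small subtree: fall back to a linear scan
    while True:
        kind = stream[pos][0]
        if kind == "start":
            depth += 1
        elif kind == "end":
            depth -= 1
        pos += 1
        if depth == 0:
            return pos


stream = [
    ("start", "po"),
    ("start", "items"),
    ("start", "item"), ("text", "pen"), ("end", "item"),
    ("start", "item"), ("text", "ink"), ("end", "item"),
    ("end", "items"),
    ("start", "ref"), ("text", "R1"), ("end", "ref"),
    ("end", "po"),
]
summary = build_summary(stream, threshold=5)
```

Raising the threshold shrinks the summary at the cost of more linear scanning, which mirrors the adjustable threshold mentioned on the slide.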
14
Search-based Decoder
- Goal: search for a simple XPath or XPath location step in a Binary XML stream
- Main search parameters are (axis, qname ID)
  - Supports wildcards
  - OR of multiple qname IDs allowed
- Returns only when there is a result or the search is done
- Skips irrelevant subtrees
  - Uses the summary if possible
- Schema-aware search
  - Can search for kidnum or child position instead of qname ID
  - Can terminate the search earlier based on the schema
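A search for one location step on the child axis might look like the Python sketch below. It is a simplified illustration, not the actual decoder: `find_children`, the event tuples, and the `summary` dictionary (start offset to end offset of a large subtree) are hypothetical, only the child axis is handled, and the schema-aware variants are omitted.

```python
def find_children(stream, parent, wanted=None, summary=None):
    """Yield offsets of child elements of the element starting at `parent`.

    `wanted` is a set of qname IDs (an OR of several names); None acts as
    a wildcard. `summary` optionally maps the start offset of a large
    subtree to its end offset so the subtree can be skipped in one jump.
    """
    summary = summary or {}
    pos = parent + 1  # first event inside the parent element
    depth = 1         # 1 means "directly inside the parent"
    while pos < len(stream):
        kind, operand = stream[pos]
        if kind == "start":
            if depth == 1 and (wanted is None or operand in wanted):
                yield pos
            if pos in summary:
                pos = summary[pos] + 1  # skip the subtree without decoding it
                continue
            depth += 1
        elif kind == "end":
            depth -= 1
            if depth == 0:
                return  # reached the parent's end: search is done
        pos += 1


# Hypothetical qname IDs: po=0, items=1, item=2, ref=3
stream = [
    ("start", 0),
    ("start", 1),
    ("start", 2), ("text", "pen"), ("end", 2),
    ("start", 2), ("text", "ink"), ("end", 2),
    ("end", 1),
    ("start", 3), ("text", "R1"), ("end", 3),
    ("end", 0),
]
refs = list(find_children(stream, 0, wanted={3}, summary={1: 8}))
```

Because the function is a generator, the caller regains control only when there is a result or the search has finished, which matches the behaviour listed on the slide.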
15
Schema-aware NFA
- Goal: evaluate multiple XPaths in a single pass over the document
- Uses a Y-Filter-like approach to build the NFA
- Works in conjunction with the search-based decoder
  - Translates transitions to searches when possible
  - Pushes unbranched linear state transition paths into the search-based decoder
- Uses the XML schema when available
  - Use of kidnum instead of qname ID
  - Sequence and occurrence constraints
  - Derives a “strict sequential” constraint
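The single-pass evaluation of several XPaths can be sketched in Python with a prefix-sharing automaton, in the spirit of the Y-Filter-like approach named above. This is a deliberately reduced version under stated assumptions: it supports only child-axis steps and the `*` wildcard, it uses no schema information, and `build_nfa`, `match_paths`, and the event tuples are hypothetical names.

```python
def build_nfa(paths):
    """Build a prefix-sharing automaton; common path prefixes share states."""
    root = {"next": {}, "accept": []}
    for path in paths:
        node = root
        for step in path.strip("/").split("/"):
            node = node["next"].setdefault(step, {"next": {}, "accept": []})
        node["accept"].append(path)
    return root


def match_paths(events, paths):
    """Return (path, event_index) pairs found in one pass over `events`."""
    stack = [[build_nfa(paths)]]  # active states for each open element depth
    hits = []
    for index, (kind, name) in enumerate(events):
        if kind == "start":
            active = []
            for state in stack[-1]:
                for label in (name, "*"):  # exact name or wildcard transition
                    nxt = state["next"].get(label)
                    if nxt is not None:
                        active.append(nxt)
                        hits.extend((p, index) for p in nxt["accept"])
            stack.append(active)
        elif kind == "end":
            stack.pop()  # restore the parent's active states
    return hits


events = [
    ("start", "po"),
    ("start", "ref"), ("end", "ref"),
    ("start", "items"),
    ("start", "item"), ("end", "item"),
    ("start", "item"), ("end", "item"),
    ("end", "items"),
    ("end", "po"),
]
hits = match_paths(events, ["/po/ref", "/po/items/item", "/po/*"])
```

In this toy version, an unbranched chain of states such as `/po/items/item` is the kind of linear path that the slide describes pushing down into the search-based decoder instead of stepping through event by event.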
16
Performance: Query - XMark
- Ratio of elapsed time (geometric mean) for a 100M XMark doc
- SB: schema-based; NSB: non-schema-based
- CLOB is 144x
- No indexes
17
Performance: Insert
- Ratio of elapsed time for an XMark 10M doc
- SB: schema-based
18
Performance: Full Retrieval
- Ratio of elapsed time for an XMark 10M doc
- SB: schema-based
19
Performance: Compression
- D1: structured; D2: semi-structured; D3: document-centric
- Based on actual customer datasets; mix of XML document sizes
- Further compression possible via SecureFile LOB compression
20
Summary
- Binary XML
  - Native XML storage format
  - Handles the full spectrum of XML use cases
- Schema-aware query processing optimizations
  - Search-based decoder
  - Document-level summary
- Performance results
21
For more information
- Contact: Sam.Idicula@Oracle.com
- Downloads, technical documentation: http://www.oracle.com/technology/tech/xml/xmldb/index.html