An Algorithm for Streaming XPath Processing with Forward and Backward Axes
Charles Barton, Philippe Charles, Deepak Goyal, Mukund Raghavachari (IBM T. J. Watson Research Center)
Marcus Fontoura, Vanja Josifovski (IBM Almaden Research Center)
Published at ICDE 2003
Presented by Amir Bar-or, Technion

Overview
Background Information
–Evolution of query processing
–XML processing
Example Document
Used Concepts
–X-tree
–X-dag
XAOS
–Algorithm
–Filtering Events
–Building Matching-Structures
–Emitting Output
–Walk through
Experimental results

The evolution of query processing
Classical
–Update model: transactional, low to medium update rate, disk-resident data
–Query model: transactional, instant, accurate, static optimizations, index
Publish/subscribe
–Update model: transactional, low to medium update rate, disk-resident data
–Query model: transactional or non-transactional, continuous, accurate, static optimizations, index

The evolution of query processing
Streaming
–Update model: non-transactional, high update rate, data too large to be stored efficiently on disk
–Query model: non-transactional, continuous, approximate, dynamic optimizations, limited buffering
The close relatives of streaming algorithms are the one-pass algorithms.

XML processing
DOM approach
–Build in-core representations
–Process as needed through a standard API
–Disadvantages:
  Scalability: cannot process large documents
  Locality: multiple traversals
  Algorithmic inefficiency: APIs perform unnecessary traversals
SAX approach
–Use a streaming, event-based API for on-the-fly parsing of XML
–Disadvantages:
  Programmability: low-level event handling
  Lack of support for XPath (especially with parent/ancestor axes)
DOM pipeline (figure): XML parser, then build DOM tree, then process DOM tree (XPath, XQuery, ...)

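The contrast between the two approaches can be sketched with Python's standard library. This is only an illustration, not part of the presented system: the sample document and the helper names `count_dom`, `count_sax`, and `CountHandler` are assumptions made for the example. The DOM version materializes the whole tree before answering, while the SAX version reacts to start-tag events and retains nothing but a counter.

```python
import xml.sax
from xml.dom import minidom

# Small sample document (an assumption for this sketch).
XML = b"<X><Y><Z><V/><V/><W><W/></W></Z><U/></Y><Y><Z><W/></Z></Y></X>"


def count_dom(data, name):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName(name))


class CountHandler(xml.sax.ContentHandler):
    """SAX style: react to each start-tag event; no tree is kept."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.count = 0

    def startElement(self, tag, attrs):
        if tag == self.name:
            self.count += 1


def count_sax(data, name):
    """Stream the document through the handler and return its counter."""
    handler = CountHandler(name)
    xml.sax.parseString(data, handler)
    return handler.count


if __name__ == "__main__":
    print(count_dom(XML, "W"), count_sax(XML, "W"))
```

Counting by tag name is easy in both styles. A query with a parent or ancestor step is where the SAX style becomes awkward, because the relevant context may already have streamed past.
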
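XAOS Approach
XAOS (pronounced "chaos"): an acronym for XML Analysis, Optimization, and Stuff.
Architecture (figure): an XML parser reads the XML document and emits parsing events (SAX, DOM, or custom) to a specialized XPath processor, which takes the XPath expression, filters and matches the events, and produces the results.
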
Background Information
Restricted XPath Set:
–loc path: / step
–predicate: [ ]
–nodetest
–axis specifier: ancestor, parent, child, descendant

Example document
Each node is shown as Nodename (id, level).
Root (0,0)
  X (1,1)
    Y (2,2)
      Z (3,3)
        V (4,4)
        V (5,4)
        W (6,4)
          W (7,5)
      U (8,3)
    Y (9,2)
      Z (10,3)
        W (11,4)

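The (id, level) labels can be produced in a single streaming pass. The sketch below assumes that ids are assigned in document order of start tags and that the virtual Root takes id 0 at level 0, which is consistent with the labels in the example. The XML serialization and the names `NumberingHandler` and `number_elements` are assumptions made for illustration.

```python
import xml.sax

# One possible serialization of the example document (an assumption).
XML = b"<X><Y><Z><V/><V/><W><W/></W></Z><U/></Y><Y><Z><W/></Z></Y></X>"


class NumberingHandler(xml.sax.ContentHandler):
    """Assigns (id, level) to every element as its start tag arrives."""

    def __init__(self):
        super().__init__()
        self.next_id = 1                 # id 0 is reserved for the virtual Root
        self.level = 0                   # Root sits at level 0
        self.nodes = [("Root", 0, 0)]

    def startElement(self, name, attrs):
        self.level += 1                  # entering an element goes one level down
        self.nodes.append((name, self.next_id, self.level))
        self.next_id += 1

    def endElement(self, name):
        self.level -= 1                  # leaving an element goes one level up


def number_elements(data):
    """Return a list of (name, id, level) tuples in document order."""
    handler = NumberingHandler()
    xml.sax.parseString(data, handler)
    return handler.nodes


if __name__ == "__main__":
    for name, ident, level in number_elements(XML):
        print(f"{name} ({ident},{level})")
```
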
X-Tree
An XPath expression is transformed into a rooted tree, the X-tree.
Vertices of an X-tree are called X-nodes.
Node tests in the expression are translated into X-nodes.
Each X-node other than Root has a unique incoming edge, labeled with the specified axis.
One X-node is marked as the output X-node.
Example: /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
X-tree (figure): Root to Y (descendant), Y to U (child), Y to W (descendant), W to Z (ancestor), Z to V (child); W is the output X-node.

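The translation from an expression to an X-tree can be sketched as a small recursive-descent parser over the restricted XPath subset. This is a minimal illustration, not the authors' implementation. The names `parse_xtree`, `TOKEN`, and `AXES` are hypothetical, and the sketch assumes every node test in the expression is distinct so that the node test itself can serve as the X-node's identifier.

```python
import re

# Tokens: '::', '/', '[', ']', and names. Whitespace is skipped implicitly.
TOKEN = re.compile(r"::|[/\[\]]|[A-Za-z_][\w-]*")
AXES = {"ancestor", "parent", "child", "descendant"}


def parse_xtree(expr):
    """Return (edges, output), with edges as (source, axis, target) triples."""
    tokens = TOKEN.findall(expr)
    pos = 0
    edges = []

    def step(context):
        # step := axis '::' nodetest ( '[' path ']' )*
        nonlocal pos
        if pos + 3 > len(tokens):
            raise ValueError("incomplete step")
        axis, sep, test = tokens[pos:pos + 3]
        if axis not in AXES or sep != "::":
            raise ValueError(f"bad step near token {pos}")
        pos += 3
        edges.append((context, axis, test))
        while pos < len(tokens) and tokens[pos] == "[":
            pos += 1
            path(test)                   # a predicate hangs off this X-node
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing ']'")
            pos += 1
        return test

    def path(context):
        # path := step ( '/' step )*
        nonlocal pos
        node = step(context)
        while pos < len(tokens) and tokens[pos] == "/":
            pos += 1
            node = step(node)
        return node

    if not tokens or tokens[0] != "/":
        raise ValueError("expected an absolute location path")
    pos = 1
    output = path("Root")                # last step of the main path is the output
    return edges, output


if __name__ == "__main__":
    query = "/descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]"
    print(parse_xtree(query))
```

Representing the X-tree as a flat edge list keeps the sketch short. A real implementation would need distinct identifiers for X-nodes so that repeated node tests do not collide.
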
X-Dag
The X-dag is generated from the X-tree by rewriting reverse axes as forward axes:
Reverse the direction of each reverse-axis edge
–Ancestor becomes Descendant
–Parent becomes Child
Handle orphan nodes
–Add a descendant edge from Root to each orphan node (a node left with no incoming edge)

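The two rules above can be sketched as a function over an edge list. This is a hedged illustration under stated assumptions: the representation of edges as (source, axis, target) triples and the names `to_xdag` and `REVERSE` are choices made for the example, not taken from the presentation.

```python
# Each reverse axis and the forward axis that replaces it once the edge is flipped.
REVERSE = {"ancestor": "descendant", "parent": "child"}

# X-tree of /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
XTREE = [
    ("Root", "descendant", "Y"),
    ("Y", "child", "U"),
    ("Y", "descendant", "W"),
    ("W", "ancestor", "Z"),
    ("Z", "child", "V"),
]


def to_xdag(edges, root="Root"):
    """Flip reverse-axis edges, then attach orphan nodes to the root."""
    dag = []
    for source, axis, target in edges:
        if axis in REVERSE:
            # "target is an ancestor of source" becomes
            # "source is a descendant of target".
            dag.append((target, REVERSE[axis], source))
        else:
            dag.append((source, axis, target))

    # Collect nodes in first-seen order so the output is deterministic.
    nodes = []
    for source, _, target in dag:
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)

    has_incoming = {target for _, _, target in dag}
    for node in nodes:
        if node != root and node not in has_incoming:
            dag.append((root, "descendant", node))
    return dag


if __name__ == "__main__":
    for edge in to_xdag(XTREE):
        print(edge)
```

For the example expression, flipping the ancestor edge leaves Z without an incoming edge, so Z is the only orphan and receives a descendant edge from Root.
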
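Example: /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]
X-tree (figure): Root to Y (descendant), Y to U (child), Y to W (descendant), W to Z (ancestor), Z to V (child)
X-dag (figure): Root to Y (descendant), Y to U (child), Y to W (descendant), Z to W (descendant), Z to V (child), Root to Z (descendant)
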
Matching
A matching for an X-tree X is a partial mapping from the X-nodes to the elements of a document D such that:
–Every mapped X-node is mapped to an element that satisfies its node test
–For every edge between two mapped X-nodes, the mapped elements stand in the relationship given by the edge's axis
A total matching exists if all the nodes of the X-tree are mapped.
It is easy to show that an element e is in the result of evaluating the XPath expression iff there is a total matching for the corresponding X-tree that maps the output X-node to e. The same holds for the X-dag.
A total matching at an X-tree node v is composed of total matchings at each of the children of v. This does not hold for an X-dag node.

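The definition can be checked by brute force on the example document. The sketch below is not the XAOS algorithm (it is neither streaming nor efficient); it simply enumerates candidate mappings and keeps the total matchings. The names `DOC`, `related`, and `total_matchings` are hypothetical, and node tests are assumed to be distinct so they can double as X-node identifiers.

```python
from itertools import product

# Example document: id -> (name, parent id); Root is id 0.
DOC = {
    0: ("Root", None), 1: ("X", 0), 2: ("Y", 1), 3: ("Z", 2),
    4: ("V", 3), 5: ("V", 3), 6: ("W", 3), 7: ("W", 6),
    8: ("U", 2), 9: ("Y", 1), 10: ("Z", 9), 11: ("W", 10),
}

# Edges are (source X-node, axis, target X-node).
XTREE = [("Root", "descendant", "Y"), ("Y", "child", "U"),
         ("Y", "descendant", "W"), ("W", "ancestor", "Z"), ("Z", "child", "V")]
XDAG = [("Root", "descendant", "Y"), ("Y", "child", "U"),
        ("Y", "descendant", "W"), ("Z", "descendant", "W"),
        ("Z", "child", "V"), ("Root", "descendant", "Z")]


def ancestors(elem):
    """Ids of all proper ancestors of elem, nearest first."""
    out = []
    parent = DOC[elem][1]
    while parent is not None:
        out.append(parent)
        parent = DOC[parent][1]
    return out


def related(axis, a, b):
    """True if element b is reached from element a along the given axis."""
    if axis == "child":
        return DOC[b][1] == a
    if axis == "parent":
        return DOC[a][1] == b
    if axis == "descendant":
        return a in ancestors(b)
    if axis == "ancestor":
        return b in ancestors(a)
    raise ValueError(axis)


def total_matchings(edges):
    """Yield every mapping of all X-nodes that satisfies every edge."""
    xnodes = sorted({n for s, _, t in edges for n in (s, t)})
    # Only elements that pass the node test are candidates for an X-node.
    candidates = [[e for e, (name, _) in DOC.items() if name == x]
                  for x in xnodes]
    for combo in product(*candidates):
        mapping = dict(zip(xnodes, combo))
        if all(related(axis, mapping[s], mapping[t]) for s, axis, t in edges):
            yield mapping


if __name__ == "__main__":
    # Elements mapped to the output X-node W in some total matching.
    print(sorted({m["W"] for m in total_matchings(XTREE)}))
    print(sorted({m["W"] for m in total_matchings(XDAG)}))
```

On this document both edge sets select the same W elements, which illustrates the claim that the result can be characterized through either the X-tree or the X-dag.
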
(Figure repeated: X-tree and X-dag for /descendant::Y[child::U]/descendant::W[ancestor::Z/child::V]. The X-tree has the ancestor edge from W to Z; the X-dag replaces it with a descendant edge from Z to W and adds a descendant edge from Root to Z.)

XAOS properties
Streaming
–Update model: non-transactional, high update rate, data too large to be stored efficiently on disk
–Query model: non-transactional, continuous, approximate, dynamic optimizations, limited buffering