Early Profile Pruning on XML-aware Publish-Subscribe Systems
Mirella M. Moro, Petko Bakalov, Vassilis J.
UCR
Full version appears in the Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007)

1 Motivation

Publish-subscribe systems are increasingly popular. They are an important class of content-based dissemination systems in which message transmission is determined by the message content rather than by its destination IP address.

Theorem: If a query tree Q is a subgraph of a document tree D, then the Prüfer sequence of Q is a subsequence of the Prüfer sequence of D.

We can derive two new sequences:
Upper bound U: for each position, take the largest element.
Lower bound L: for each position, take the smallest element.
L and U form a sequence envelope.

Sequence envelopes can be nested, forming a BoXFilter tree.

3 FSM-based bottom-up approach for XML filtering (BUFF)

2.1 Bottom-up vs. top-down filtering

Top-down approach (i.e., in-order traversal or depth-first order): the state machine is advanced for each XML element (or attribute) read.
Bottom-up approach: takes into consideration the fact that an XML document has its more selective elements in the leaves.

[Figure: (a) Document, (b) Queries Q1 to Q6, (c) NFA, (d) BUFF]

2.2 BUFF algorithm

The document is parsed by a SAX parser, which triggers events for specific marks (tags) in the XML document. The machine keeps a runtime stack that stores the document path currently being processed. For each opening tag, the corresponding element is pushed onto the stack. For each closing tag, an element is popped from the stack and used to trigger a set of transitions within the NFA.
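The stack handling described for the BUFF algorithm can be illustrated with a minimal sketch using Python's standard xml.sax module. It is not the authors' implementation: the class PathStackHandler, the function closing_order, and the on_close hook (a stand-in for the NFA transitions, which are not modeled here) are hypothetical names introduced for illustration. The sketch only shows that closing tags arrive leaves-first, which is the order a bottom-up filter consumes.

```python
import xml.sax


class PathStackHandler(xml.sax.ContentHandler):
    """Keeps the current document path on a runtime stack."""

    def __init__(self, on_close):
        super().__init__()
        self.stack = []            # current root-to-node path
        self.on_close = on_close   # hypothetical hook standing in for NFA transitions

    def startElement(self, name, attrs):
        # Opening tag: push the element onto the stack.
        self.stack.append(name)

    def endElement(self, name):
        # Closing tag: pop the element and pass it to the transition hook,
        # together with the path that remains above it.
        popped = self.stack.pop()
        self.on_close(popped, tuple(self.stack))


def closing_order(xml_text):
    """Return (element, remaining path) pairs in the order closing tags fire."""
    events = []
    handler = PathStackHandler(lambda elem, path: events.append((elem, path)))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return events


# Hypothetical toy document, not the one from the poster figure.
doc = "<a><b><c/><d/></b><e><f/></e></a>"
for elem, path in closing_order(doc):
    print(elem, "closed under", "/".join(path) or "(root)")
```

In this toy document the leaves c and d are reported before their parent b, and the root a is reported last, so the most selective elements reach the hook first.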
2 System Description

Participants in the system:
Publisher: generates messages outside of the system.
Subscribers: announce their interest by submitting profiles.
Matching process: in charge of finding which messages satisfy which profile.

[Figure: system overview with the labels Publisher, Matching algorithm, Documents, Profile (Submit, Modify), Result Documents]

The data is exchanged in XML format.
Nodes correspond to elements, attributes or text values.
Edges represent immediate element-subelement or element-value relationships.

[Figure: (a) Document and BUFF; panels (b) to (h) trace the processing of opening and closing tags for queries Q1 and Q2]

5 Results

4 Bounding-based XML Filtering (BoXFilter)

[Figure: architecture with the labels Profile Index, Profiles (P1, P2, P3), Prüfer Sequence, Profile Manager, Matching Algorithm, Input Documents (queries), Matched Module]

The profiles in the system are organized in a BoXFilter tree. Documents are traversed through the tree.

There are two variations of the filtering algorithm:
Sequential: documents are processed one by one.
Batch processing: documents are organized in a tree like the queries, and both trees are joined.
After the traversal, there is a verification step.

Prüfer sequence: a unique sequential encoding of an enumerated and labeled tree.
Algorithm: iteratively remove nodes from the tree. At each iteration, the algorithm finds and removes the leaf with the smallest number and appends the number of that leaf's parent to the Prüfer sequence.
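The leaf-removal procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the tree is passed as a hypothetical parent map (node number to parent number, with the root absent from the keys), and removal is assumed to continue until only the root remains, giving a sequence of length n-1. The poster does not state the stopping rule; the classical definition stops when two nodes are left, giving length n-2.

```python
import heapq


def prufer_sequence(parent):
    """Build a Prüfer sequence from a rooted, numbered tree.

    parent maps every non-root node number to its parent's number.
    Assumption: removal continues until only the root is left.
    """
    # Count the remaining children of each internal node.
    child_count = {}
    for par in parent.values():
        child_count[par] = child_count.get(par, 0) + 1

    # Leaves are nodes that are nobody's parent; a min-heap yields
    # the smallest-numbered leaf at each step.
    leaves = [node for node in parent if node not in child_count]
    heapq.heapify(leaves)

    sequence = []
    while leaves:
        leaf = heapq.heappop(leaves)      # leaf with the smallest number
        par = parent[leaf]
        sequence.append(par)              # record the parent's number
        child_count[par] -= 1
        # The parent becomes a leaf once all its children are removed;
        # the root is never removed because it has no entry in parent.
        if child_count[par] == 0 and par in parent:
            heapq.heappush(leaves, par)
    return sequence


# Hypothetical example tree: root 6 with children 3 and 5,
# node 3 with children 1 and 2, node 5 with child 4.
example = {1: 3, 2: 3, 3: 6, 4: 5, 5: 6}
print(prufer_sequence(example))  # [3, 3, 6, 5, 6]
```

The labeled variant used for matching would append the parent's label instead of (or alongside) its number; the sketch records numbers only, as in the description above.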
[Figure: example tree with nodes labeled A to F and numbered 1 to 9, with its Prüfer sequence]

(a) Document

<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author>
      <last>DeWitt</last>
      <mi>J</mi>
      <first>David</first>
    </author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <last>Florescu</last>
    <first>Daniela</first>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>

(b) Tree representation

[Figure: tree representation of the document, with the nodes Bib, article, title, author, last, mi, first, journal, proceedings and year, the attributes vol and no, and their values]

The user profiles are expressed in an XML query language (XPath, XQuery). An XML query contains:
structural constraints
value-based constraints

[Figure: example tree pattern over the elements article, proceedings, conf, author, last; the values 1.03, 0.95 and 0.35 appear in the figure]

Structural constraints: tree pattern
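The difference between structural and value-based constraints in a profile can be illustrated with a small sketch. It uses the limited XPath subset of Python's standard xml.etree.ElementTree, not the filtering algorithms of the poster. The three profile strings and the helper matches are hypothetical, and the document is a variant of the bibliography example in which the second article's author element is an assumption.

```python
import xml.etree.ElementTree as ET

# Variant of the bibliography document; the <author> wrapper in the
# second article is assumed for this illustration.
DOC = """<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author><last>DeWitt</last><mi>J</mi><first>David</first></author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <author><last>Florescu</last><first>Daniela</first></author>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>"""


def matches(root, profile):
    """Return the text of every element selected by the profile path."""
    return [elem.text for elem in root.findall(profile)]


root = ET.fromstring(DOC)

# Structural constraint only: a parent-child path.
structural = "./article/author/last"
# Structural plus value-based constraint on an element's text.
with_value = "./article[year='1996']/author/last"
# Structural plus value-based constraint on an attribute.
with_attribute = "./article[@vol='7']/title"

print(matches(root, structural))      # ['DeWitt', 'Florescu']
print(matches(root, with_value))      # ['DeWitt']
print(matches(root, with_attribute))  # ['t1']
```

A publish-subscribe system evaluates many such profiles against each incoming document, which is why the matching step is indexed rather than run profile by profile as in this sketch.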