1
Early Profile Pruning on XML-aware Publish-Subscribe Systems
Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras University of California, Riverside 2/2/2019
2
Overview
Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter)
  Core Modules
  Filtering algorithms
Experimental results
3
Motivation
Publish-subscribe systems: message transmission is determined by the message content.
Examples: notification websites such as hotwire.com or ticketmaster.com.
[Figure: publishers send documents to the matching algorithm; subscribers submit, update, and delete profiles and receive results.]
4
Publish-subscribe systems
The data is exchanged in XML format.
Nodes correspond to elements, attributes, or text values.
Edges represent immediate element-subelement or element-value relationships.
(a) Document:
<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author>
      <last>DeWitt</last>
      <mi>J</mi>
      <first>David</first>
    </author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <last>Florescu</last>
    <first>Daniela</first>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>
(b) Tree representation: [Figure: the document drawn as a tree rooted at Bib, with element, attribute (vol, no), and value nodes.]
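As a rough illustration of the node and edge model on this slide, the sketch below turns the first article of the document into a list of parent-child edges. The function name `tree_edges` and the `@` prefix for attribute nodes are conventions chosen for this example, not something the talk defines.

```python
import xml.etree.ElementTree as ET

# First article of the slide's document, copied as-is.
DOC = """<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author><last>DeWitt</last><mi>J</mi><first>David</first></author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
</Bib>"""

def tree_edges(elem):
    """Return (parent, child) label pairs for elements, attributes, and text values."""
    edges = []
    # Attribute nodes hang off their element; the value hangs off the attribute.
    for name, value in elem.attrib.items():
        edges.append((elem.tag, "@" + name))
        edges.append(("@" + name, value))
    # A non-empty text value becomes a child node of its element.
    text = (elem.text or "").strip()
    if text:
        edges.append((elem.tag, text))
    # Element-subelement edges, followed by the subtree of each child.
    for child in elem:
        edges.append((elem.tag, child.tag))
        edges.extend(tree_edges(child))
    return edges

edges = tree_edges(ET.fromstring(DOC))
for parent, child in edges:
    print(parent, "->", child)
```

Every printed pair is one immediate element-subelement, element-attribute, or element-value edge of the tree.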
5
Publish-subscribe systems (cont.)
User profiles are expressed in an XML query language (XPath, XQuery).
An XML query contains:
  structural constraints
  value-based constraints
Structural constraints form a tree pattern.
[Figure: tree pattern with nodes article, author, proceedings, last, conf.]
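To make the two kinds of constraint concrete, here is a small sketch using the limited XPath subset of Python's `xml.etree.ElementTree`. The two-article document is a hypothetical one modeled on the earlier slide, and the two profiles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely based on the Bib example.
DOC = """<Bib>
  <article><title>t1</title><author><last>DeWitt</last></author>
    <journal>TPDS</journal><year>1996</year></article>
  <article><title>t2</title><author><last>Florescu</last></author>
    <proceedings>SIGMOD</proceedings><year>2006</year></article>
</Bib>"""
root = ET.fromstring(DOC)

# Structural constraint only: articles with a proceedings child
# and an author/last descendant path.
structural = [a for a in root.findall("./article[proceedings]")
              if a.find("./author/last") is not None]

# Structural plus value-based constraint: articles whose year is 2006.
with_value = root.findall("./article[year='2006']")

titles = [a.findtext("title") for a in structural]
print(titles, [a.findtext("title") for a in with_value])
```

In a publish-subscribe setting the roles are reversed compared to a database: the profiles are stored and each incoming document is checked against them.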
6
Related Work/Our Contribution
Current work:
  Construction of overlay network
  Dissemination/indexing of profiles (queries)
  Processing of stream of messages
We focus on the matching process that takes place within a broker:
  Improve the performance of regular FSM by using a bottom-up evaluation of the document
  Develop an index-based filtering technique that performs early pruning of the query profiles
7
Overview
Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter)
  Core Modules
  Filtering algorithms
Experimental results
8
Bottom-up vs. Top-down filtering
State machines are among the most common methods for the XML matching process.
Top-down approach (i.e., reading the document in depth-first order): advances the state machine for each XML element (or attribute) read. It does not consider any form of early pruning.
Bottom-up approach: takes into consideration the (usual) fact that an XML document has its more selective elements located at its leaves.
9
Example
Top-down approach: groups the queries according to their common prefixes.
Bottom-up approach: groups them according to their common suffixes.
[Figure: (a) Document, (b) Queries Q1 to Q6, (c) Top-down state machine, (d) Bottom-up state machine.]
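The difference between prefix and suffix grouping can be sketched with two tries. The four queries below are hypothetical (the slide's own Q1 to Q6 are not recoverable from the figure), and `build_trie` is a helper written for this example rather than the state machine from the talk.

```python
# Hypothetical linear path queries, written root-to-leaf.
QUERIES = {
    "Q1": ["a", "b", "c", "d"],
    "Q2": ["a", "e", "f"],
    "Q3": ["a", "e", "h"],
    "Q4": ["e", "g", "h"],
}

def build_trie(paths):
    """Nested-dict trie; the '$queries' key lists the queries ending at a node."""
    root = {}
    for name, path in paths.items():
        node = root
        for label in path:
            node = node.setdefault(label, {})
        node.setdefault("$queries", []).append(name)
    return root

# Top-down: shared prefixes collapse (Q1, Q2, Q3 all start under 'a').
top_down = build_trie(QUERIES)

# Bottom-up: reverse each path so shared suffixes collapse (Q3, Q4 both end in 'h').
bottom_up = build_trie({q: list(reversed(p)) for q, p in QUERIES.items()})

print(sorted(top_down))   # first labels of the queries
print(sorted(bottom_up))  # last labels of the queries
```

With the bottom-up grouping, a leaf label that appears in no query rules out every query at once, which is the intuition behind evaluating from the leaves.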
10
BUFF
FSM-based bottom-up approach for XML filtering.
BUFF avoids translating documents and queries to Prüfer sequences (as the other algorithms do), and employs a more direct evaluation algorithm.
The document is parsed through a SAX parser, which triggers events for specific marks (tags) in the XML document.
The machine keeps a runtime stack that stores the current document path being processed.
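The SAX events and the runtime stack can be sketched as follows. This is a deliberately simplified stand-in, not the BUFF state machine: it handles only linear parent-child paths and compares each query against the suffix of the stack when an element closes. The class name, the sample document, and the queries are all assumptions made for this example.

```python
import xml.sax

class PathStackHandler(xml.sax.ContentHandler):
    """Keeps the current root-to-element path on a stack while SAX events arrive."""

    def __init__(self, queries):
        super().__init__()
        self.queries = queries  # query name -> list of labels, root-to-leaf
        self.stack = []         # current document path
        self.matched = set()

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        # When an element closes, its full path is on the stack, so a query
        # matches if its labels equal the tail of the stack (bottom-up check).
        for qname, labels in self.queries.items():
            if self.stack[-len(labels):] == labels:
                self.matched.add(qname)
        self.stack.pop()

def filter_document(xml_text, queries):
    handler = PathStackHandler(queries)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched

# Hypothetical document and queries.
DOC = "<a><b><c><d/></c></b><b><c><e/><f/></c></b></a>"
QUERIES = {"Q1": ["b", "c", "d"], "Q2": ["c", "f"], "Q3": ["a", "d"]}
result = filter_document(DOC, QUERIES)
print(sorted(result))
```

Q3 is not reported because `d` is not a direct child of `a` in the sample document; a real filter would also need to handle descendant axes, wildcards, and branching patterns.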
11
BUFF Example
[Figure: (a) Document (nodes a1, b2, c3, d4, b5, c6, d7, e8, e9, f10) and BUFF machine for queries Q1 and Q2; (b) to (g) processing steps at successive parser events.]
12
Overview
Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter)
  Core Modules
  Filtering algorithms
Experimental results
13
Bounding-based XML Filtering
Two major processes working asynchronously:
  Profile Management
  Profile Matching
[Figure: system architecture with components Profile Manager, Prüfer Sequence encoding, Profile Index (profiles P1, P2, P3), and Matching Module (Matching Algorithm); inputs: profiles (queries) and input documents; output: matched documents.]
14
Prüfer Sequence
A unique sequential encoding of a labeled tree.
Algorithm: iteratively removes nodes from the tree until all nodes but the last two have been removed. At each iteration, the algorithm finds and removes the leaf with the smallest label and adds to the Prüfer sequence the label of that leaf's parent.
Theorem: if a query tree Q is a subgraph of a document tree D, then the Prüfer sequence of Q is a subsequence of the Prüfer sequence of D.
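The construction described on this slide can be sketched directly. The code below assumes an undirected tree given as an edge list with distinct, comparable node labels (integers here); how the talk numbers the nodes of an XML tree is not stated, so that part is left out. `is_subsequence` is the kind of test the theorem calls for, applied to whichever query and document sequences are being compared.

```python
import heapq
from collections import defaultdict

def prufer_sequence(edges):
    """Repeatedly remove the smallest-labeled leaf and record its neighbor,
    stopping when only two nodes remain."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the smallest label comes out first.
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    heapq.heapify(leaves)
    remaining = len(adj)
    seq = []
    while remaining > 2:
        leaf = heapq.heappop(leaves)
        parent = next(iter(adj[leaf]))  # a leaf has exactly one neighbor
        seq.append(parent)
        adj[parent].discard(leaf)
        adj[leaf].clear()
        remaining -= 1
        if len(adj[parent]) == 1:       # the neighbor just became a leaf
            heapq.heappush(leaves, parent)
    return seq

def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting symbols."""
    it = iter(long)
    return all(sym in it for sym in short)

# Small example tree: 1, 2, 3 hang off 4; 4 - 5 - 6 is a path.
example = prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)])
print(example)
```

A tree with n nodes yields a sequence of length n - 2, so the six-node example produces four symbols.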
15
Sequence Envelope
Assume a set of k Prüfer sequences representing user profiles S1, ..., Sk.
We can derive two new sequences:
  Upper bound U: for each position, take the largest element
  Lower bound L: for each position, take the smallest element
L and U form the smallest possible bounding envelope that encompasses all members of the set of sequences from above and below.
16
Example
Assume 3 sequences with 11 symbols each:
abcabababcd
cdcdecdcdec
dedededebab
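The envelope of these three sequences can be computed position by position. The sketch assumes that symbols are ordered alphabetically and that all sequences have the same length, as in the example; how sequences of different lengths are handled is not covered by the slide. The function name `envelope` is chosen for this illustration.

```python
def envelope(sequences):
    """Return (L, U): the position-wise smallest and largest symbols."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("this sketch needs a non-empty set of equal-length sequences")
    columns = list(zip(*sequences))  # one tuple of symbols per position
    lower = "".join(min(col) for col in columns)
    upper = "".join(max(col) for col in columns)
    return lower, upper

profiles = ["abcabababcd", "cdcdecdcdec", "dedededebab"]
lower, upper = envelope(profiles)
print("L =", lower)
print("U =", upper)
```

Every profile lies between L and U at every position, which is the property that lets the envelope stand in for the whole set during filtering.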
17
Sequence Envelope (Cont.)
A key property of the sequence envelope structure is that it can be used as an aggregate of the underlying set of sequences.
18
BoXFilter Tree
Sequence envelopes can be nested, forming a BoXFilter tree.
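Nesting can be sketched by building a parent envelope from the envelopes of its children. The `Node` class, the helper names, and the way profiles are split into groups are assumptions for this example; the slide does not say how the tree is built or balanced.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lower: str                                    # lower bound L of this envelope
    upper: str                                    # upper bound U of this envelope
    children: list = field(default_factory=list)  # nested envelopes
    profiles: list = field(default_factory=list)  # sequences stored at a leaf

def leaf(profiles):
    """Leaf envelope over a group of equal-length profile sequences."""
    cols = list(zip(*profiles))
    return Node("".join(map(min, cols)), "".join(map(max, cols)),
                profiles=list(profiles))

def parent(children):
    """Parent envelope: the position-wise min of child L's and max of child U's."""
    lower = "".join(map(min, zip(*(c.lower for c in children))))
    upper = "".join(map(max, zip(*(c.upper for c in children))))
    return Node(lower, upper, children=list(children))

# Hypothetical profiles, grouped by hand into two leaves.
left = leaf(["abca", "abcb"])
right = leaf(["cdcd", "dede"])
root = parent([left, right])
print(root.lower, root.upper)
```

Because a parent envelope contains each child envelope, ruling out a parent also rules out everything below it.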
19
Filtering algorithms
The profiles in the system are organized in a BoXFilter tree.
Documents are traversed through the tree.
There are two variations of the filtering algorithm:
  Sequential: documents are processed one by one
  Batch processing: documents are organized in a tree like the queries, and both trees are joined
After the traversal of the BoXFilter tree, there is a verification step.
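The sequential variant can be sketched as a recursive traversal with pruning followed by verification. The pruning test below is an assumption made for this example, not the test used in the talk: a profile can only be a subsequence of the document if each of its symbols occurs in the document, so an envelope is skipped when some position has no document symbol between L and U. Verification is likewise a plain subsequence check standing in for the real one. The nodes are plain dictionaries with hypothetical contents.

```python
def is_subsequence(short, long):
    it = iter(long)
    return all(sym in it for sym in short)

def may_contain_match(lower, upper, doc):
    """Assumed pruning test: every position needs some document symbol in [L[i], U[i]]."""
    return all(any(lo <= s <= hi for s in doc) for lo, hi in zip(lower, upper))

def filter_doc(node, doc):
    """Return the profiles under `node` that survive pruning and verification."""
    if not may_contain_match(node["lower"], node["upper"], doc):
        return []  # the whole subtree is pruned without looking at its profiles
    # Verification step for the profiles stored at this node.
    matches = [p for p in node.get("profiles", []) if is_subsequence(p, doc)]
    for child in node.get("children", []):
        matches.extend(filter_doc(child, doc))
    return matches

# Hypothetical two-leaf tree over length-2 profile sequences.
left = {"lower": "ab", "upper": "ac", "profiles": ["ab", "ac"]}
right = {"lower": "xy", "upper": "xz", "profiles": ["xy", "xz"]}
root = {"lower": "ab", "upper": "xz", "children": [left, right]}

matches = filter_doc(root, "cab")
print(matches)
```

For the document sequence `cab`, the right leaf is pruned as a whole, while in the left leaf `ac` passes the envelope test but fails verification. That is why a verification step is still needed after the traversal.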
20
Overview
Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter)
  Core Modules
  Filtering algorithms
Experimental results
21
Experimental Results
We have generated datasets with 1000, and small documents (with up to 8KB).
We generated up to queries with selectivity fixed to 50%.
[Figure: result plots, panels (a), (b), (c).]
22
Experimental Results (cont.)
In this set of experiments, we vary the number of documents that match any of the profile queries (selectivity 1% means that one percent of the documents satisfy any of the queries).
23
Thank You!