Early Profile Pruning on XML-aware Publish-Subscribe Systems

Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras
University of California, Riverside
2/2/2019

Overview
- Motivation
- Bottom-up Filtering FSM (BUFF)
- Bounding-based XML Filtering (BoXFilter)
  - Core Modules
  - Filtering algorithms
- Experimental results

Motivation
Publish-subscribe systems: message transmission is determined by the message content.
Examples: notification websites such as hotwire.com or ticketmaster.com.
[Figure: publishers send documents to the matching algorithm; subscribers submit, update, and delete profiles and receive results]

Publish-subscribe systems
The data is exchanged in XML format, which can be viewed as a tree:
- Nodes correspond to elements, attributes, or text values
- Edges represent immediate element-subelement or element-value relationships

(a) Document

    <Bib>
      <article vol="7" no="11">
        <title>t1</title>
        <author>
          <last>DeWitt</last>
          <mi>J</mi>
          <first>David</first>
        </author>
        <journal>TPDS</journal>
        <year>1996</year>
      </article>
      <article>
        <title>t2</title>
        <last>Florescu</last>
        <first>Daniela</first>
        <proceedings>SIGMOD</proceedings>
        <year>2006</year>
      </article>
    </Bib>

(b) Tree representation
[Figure: tree rooted at Bib with one subtree per article, showing the elements, attributes, and values of the document in (a)]

Publish-subscribe systems (cont.)
User profiles are expressed in an XML query language (XPath, XQuery).
An XML query contains:
- structural constraints
- value-based constraints
Structural constraints form a tree pattern.
[Figure: tree pattern over the nodes article, author, proceedings, last, conf]

Related Work/Our Contribution
Current work:
- Construction of the overlay network
- Dissemination and indexing of profiles (queries)
- Processing of the stream of messages
We focus on the matching process that takes place within a broker:
- We improve the performance of a regular FSM by evaluating the document bottom-up
- We develop an index-based filtering technique that performs early pruning of the query profiles

Bottom-up vs. Top-down filtering
State machines are among the most common methods for XML matching.
Top-down approach: traverses the document depth-first (in document order), advancing the state machine for each XML element (or attribute) read. It does not consider any form of early pruning.
Bottom-up approach: takes advantage of the (usual) fact that an XML document has its most selective elements at its leaves.

Example
The top-down approach groups the queries according to their common prefixes; the bottom-up approach groups them according to their common suffixes.
[Figure: (a) Document, (b) Queries Q1 to Q6, (c) Top-down state machine, (d) Bottom-up state machine]
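The difference between prefix and suffix grouping can be illustrated with a minimal sketch. The three linear path queries below (Qa, Qb, Qc) are hypothetical and are not the queries Q1 to Q6 of the figure; the helper names are also assumptions made for illustration.

```python
from collections import defaultdict

def group_by_prefix(queries, k=1):
    """Group linear path queries by their first k steps (top-down sharing)."""
    groups = defaultdict(list)
    for name, steps in queries.items():
        groups[tuple(steps[:k])].append(name)
    return dict(groups)

def group_by_suffix(queries, k=1):
    """Group linear path queries by their last k steps (bottom-up sharing)."""
    groups = defaultdict(list)
    for name, steps in queries.items():
        groups[tuple(steps[-k:])].append(name)
    return dict(groups)

# Hypothetical queries, each written as a list of element names from root to leaf.
queries = {
    "Qa": ["a", "b", "c"],
    "Qb": ["a", "e", "h"],
    "Qc": ["e", "g", "h"],
}

prefix_groups = group_by_prefix(queries)  # Qa and Qb share the prefix "a"
suffix_groups = group_by_suffix(queries)  # Qb and Qc share the suffix "h"
```

With suffix grouping, a document that contains no leaf-level h element rules out Qb and Qc in a single check.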

BUFF: an FSM-based bottom-up approach for XML filtering
BUFF avoids translating documents and queries to Prüfer sequences (as the other algorithms do) and employs a more direct evaluation algorithm.
The document is parsed by a SAX parser, which triggers events for specific markup (tags) in the XML document.
The machine keeps a runtime stack that stores the document path currently being processed.
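The SAX events and the runtime stack can be sketched with Python's standard xml.sax module. This is not the BUFF state machine itself: it only shows how start and end events maintain the current document path and expose each root-to-leaf path. The class and function names are assumptions made for illustration.

```python
import xml.sax

class PathStackHandler(xml.sax.ContentHandler):
    """Keeps a runtime stack with the current document path (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []        # element names from the root to the current element
        self.has_child = []    # parallel stack: has the open element seen a child?
        self.leaf_paths = []   # root-to-leaf paths, recorded when a leaf closes

    def startElement(self, name, attrs):
        if self.has_child:
            self.has_child[-1] = True   # the enclosing element is not a leaf
        self.stack.append(name)
        self.has_child.append(False)

    def endElement(self, name):
        if not self.has_child.pop():
            # A leaf element is closing: the stack holds its full path.
            self.leaf_paths.append(tuple(self.stack))
        self.stack.pop()

def leaf_paths(xml_text):
    """Parse xml_text and return every root-to-leaf element path."""
    handler = PathStackHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.leaf_paths

paths = leaf_paths("<a><b><c/><d/></b><e/></a>")
# paths == [("a", "b", "c"), ("a", "b", "d"), ("a", "e")]
```

A bottom-up matcher would start from the leaf at the top of the stack and walk the stack downward, which is why the full path is kept while parsing.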

BUFF Example
[Figure: (a) Document and BUFF state machine for queries Q1 and Q2; (b) to (g) processing steps triggered by successive SAX events]

Bounding-based XML Filtering
Two major processes work asynchronously:
- Profile Management
- Profile Matching
[Figure: architecture with the Profile Manager, Prüfer Sequence encoding, Profile Index (profiles P1, P2, P3), and the Matching Module with its Matching Algorithm; inputs are profiles (queries) and documents, output is the matched documents]

Prüfer Sequence
A unique sequential encoding of a labeled tree.
Algorithm: iteratively remove nodes from the tree until only two nodes remain. At each iteration, find and remove the leaf with the smallest label and append the label of that leaf's parent to the Prüfer sequence.
Theorem: if a query tree Q is a subgraph of a document tree D, then the Prüfer sequence of Q is a subsequence of the Prüfer sequence of D.
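The construction above can be sketched as follows. This is a minimal sketch under stated assumptions: the tree is rooted, a leaf is a non-root node without children, nodes carry numeric ids (for example assigned in post-order, so the root has the largest id), and an optional label map turns ids into element names. The function names and the example trees are hypothetical.

```python
import heapq

def prufer_sequence(parent, label=None):
    """parent maps each non-root node id to its parent id.

    Repeatedly removes the leaf with the smallest id and records its
    parent's label, stopping when two nodes remain.
    """
    label = label or {}
    nodes = set(parent) | set(parent.values())
    child_count = {n: 0 for n in nodes}
    for p in parent.values():
        child_count[p] += 1
    leaves = [n for n in nodes if child_count[n] == 0]
    heapq.heapify(leaves)
    sequence = []
    remaining = len(nodes)
    while remaining > 2:
        leaf = heapq.heappop(leaves)
        p = parent[leaf]
        sequence.append(label.get(p, p))
        child_count[p] -= 1
        remaining -= 1
        if child_count[p] == 0 and p in parent:  # p became a (non-root) leaf
            heapq.heappush(leaves, p)
    return sequence

def is_subsequence(short, long):
    """True if short can be obtained from long by deleting elements."""
    it = iter(long)
    return all(symbol in it for symbol in short)

# Hypothetical document tree: a(5) has children b(3) and e(4); b has c(1) and d(2).
doc_seq = prufer_sequence({1: 3, 2: 3, 3: 5, 4: 5}, {3: "b", 5: "a"})
# Hypothetical query tree: a(3) -> b(2) -> c(1).
query_seq = prufer_sequence({1: 2, 2: 3}, {2: "b", 3: "a"})
candidate = is_subsequence(query_seq, doc_seq)
```

The theorem gives the subsequence test as a necessary condition only, so a document that passes it is a candidate match rather than a confirmed one.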

Sequence Envelope
Assume a set of k Prüfer sequences S1, ..., Sk representing user profiles. We can derive two new sequences:
- Upper bound U: for each position, take the largest element
- Lower bound L: for each position, take the smallest element
L and U form the smallest possible bounding envelope that encloses all members of the set of sequences from above and below.

Example
Assume 3 sequences with 11 symbols each:
abcabababcd
cdcdecdcdec
dedededebab
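For these three sequences, the envelope can be computed position by position. The sketch below assumes that symbols are ordered alphabetically and that all sequences have the same length; the function names are hypothetical.

```python
def sequence_envelope(sequences):
    """Return (lower, upper): the position-wise minimum and maximum."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    lower = "".join(min(column) for column in zip(*sequences))
    upper = "".join(max(column) for column in zip(*sequences))
    return lower, upper

def inside_envelope(sequence, lower, upper):
    """True if every symbol lies between the bounds at its position."""
    return len(sequence) == len(lower) and all(
        lo <= s <= up for lo, s, up in zip(lower, sequence, upper)
    )

profiles = ["abcabababcd", "cdcdecdcdec", "dedededebab"]
lower, upper = sequence_envelope(profiles)
# lower == "abcabababab", upper == "dedeeededed"
```

Every member of the set lies inside the envelope by construction, while a sequence that leaves the envelope at any position cannot be a member.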

Sequence Envelope (Cont.)
The key property of the sequence envelope is that it can be used as an aggregate of the underlying set of sequences.

BoXFilter Tree
Sequence envelopes can be nested, forming a BoXFilter tree.
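One way to picture the nesting is that a parent envelope encloses the envelopes of its children. The sketch below assumes each envelope is a (lower, upper) pair of equal-length strings with alphabetical symbol order; how BoXFilter actually groups envelopes into tree nodes is not specified here, and the function name and sample envelopes are hypothetical.

```python
def merge_envelopes(envelopes):
    """Return the smallest envelope that encloses all given envelopes."""
    lowers = [lower for lower, _ in envelopes]
    uppers = [upper for _, upper in envelopes]
    merged_lower = "".join(min(column) for column in zip(*lowers))
    merged_upper = "".join(max(column) for column in zip(*uppers))
    return merged_lower, merged_upper

# Two hypothetical leaf-level envelopes and the parent that encloses them.
leaf_1 = ("abc", "cde")
leaf_2 = ("bbb", "dee")
parent_envelope = merge_envelopes([leaf_1, leaf_2])
# parent_envelope == ("abb", "dee")
```

Because the parent bounds are never tighter than those of its children, anything that falls outside the parent envelope also falls outside every child, which is what makes pruning an entire subtree possible.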

Filtering algorithms
The profiles in the system are organized in a BoXFilter tree, and documents are passed through the tree.
There are two variations of the filtering algorithm:
- Sequential: documents are processed one by one
- Batch processing: documents are organized in a tree like the queries, and the two trees are joined
After the traversal of the BoXFilter tree, there is a verification step.

Experimental Results
We have generated datasets with 1000, and small documents (with up to 8KB)
We generated up to queries with selectivity fixed to 50%
[Figure: result plots, panels (a), (b), (c)]

Experimental Results (cont.)
In this set of experiments, we vary the number of documents that match any of the profile queries (a selectivity of 1% means that one percent of the documents satisfy any of the queries).

Thank You!
