Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian.


1 Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian

2 Outline: Problem definition and formulation; Related work; Exact subscription subsumption checking; Approximate subscription subsumption checking; Experimental evaluation; Conclusions.

3 Event-based pub/sub systems. [Diagram of a publish/subscribe system; label: Event]

4 Types of pub/sub systems: topic-based vs. content-based; centralized vs. distributed.

5 Information dissemination in pub/sub systems: publication/subscription routing in distributed pub/sub. [Diagram labels: Subscriber 1, Subscriber 2, Publisher]

6 Reducing dissemination traffic. Goal: prevent the dissemination of redundant subscriptions. [Diagram labels: Subscriber 1, Subscriber 2, Subscriber 3, Publisher] Preventing redundant subscription dissemination reduces subscription forwarding traffic, reduces the subscription table size in each broker, and speeds up publication matching.

7 Detection of redundant subscriptions: covering and subsumption. Subscription covering is a pairwise relationship between subscriptions: subscription s_2 covers subscription s_1 iff all publications matching s_1 also match s_2. Subscription subsumption is a generalization of covering: subscription s is subsumed by the subscription set T = {s_1, s_2, ..., s_n} iff every publication matching s also matches at least one of the subscriptions in T. [Figure: s_3 is subsumed by s_1 ∪ s_2 but not covered by either of them.]
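The difference between covering and subsumption can be illustrated with a small sketch. The rectangle encoding (a tuple of `(lo, hi)` pairs, treated as half-open intervals), the helper names `covers` and `matches`, and the coordinates of `s1`, `s2`, and `s3` are all assumptions chosen for illustration; they do not come from the slides.

```python
from itertools import product

# A rectangle is a tuple of (lo, hi) pairs, one per dimension.
# Treating each interval as half-open [lo, hi) is an assumption of this sketch.

def covers(outer, inner):
    """True if `inner` lies entirely inside `outer` (pairwise covering)."""
    return all(ol <= il and ih <= oh
               for (ol, oh), (il, ih) in zip(outer, inner))

def matches(rect, point):
    """True if the publication `point` falls inside subscription `rect`."""
    return all(lo <= x < hi for (lo, hi), x in zip(rect, point))

# Hypothetical 2-dimensional subscriptions.
s1 = ((0, 6), (0, 4))
s2 = ((4, 10), (0, 4))
s3 = ((2, 8), (1, 3))

# Pairwise covering: is s3 inside s1 alone or inside s2 alone?
covered_pairwise = any(covers(t, s3) for t in (s1, s2))

# All bounds here are integers, so every unit cell is entirely inside or
# outside each rectangle. Testing the lower corner of each unit cell of s3
# is therefore an exact subsumption check for this particular example.
cells = product(range(2, 8), range(1, 3))
subsumed = all(any(matches(t, p) for t in (s1, s2)) for p in cells)

print(covered_pairwise, subsumed)  # prints: False True
```

Pairwise covering would forward `s3` even though every publication matching it already matches `s1` or `s2`; this is the redundancy that subsumption checking is meant to catch.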

8 Problem formulation. Content space: a d-dimensional space where each dimension represents a numeric attribute. Subscriptions are d-dimensional rectangles. Publications are d-dimensional points. Given a set of d-dimensional rectangles T = {s_1, s_2, ..., s_n}, is a new rectangle s contained in the disjunction (union) of the rectangles in T?

9 Outline: Problem definition and formulation; Related work; Exact subscription subsumption checking; Approximate subscription subsumption checking; Experimental evaluation; Conclusions.

10 Related work. Pairwise covering: for a new subscription s, check whether any previous subscription covers it; if not, forward the subscription to all other brokers in the network. Probabilistic subsumption checking: for a new subscription s, randomly select d points in s; if all of these points are covered by previous subscriptions, assume s is subsumed. Complexity is O(k·m·d), where k is the number of subscriptions, m is the number of dimensions, and d is the number of test points. False negatives may be generated, i.e., subscriptions that are not subsumed may be falsely assumed to be subsumed, which may result in incorrect content routing.
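A minimal sketch of the probabilistic check described on this slide follows. The function name `probably_subsumed`, the parameter `num_tests` (the slide's number of test points d), the closed-interval matching, and the uniform sampling are assumptions made for illustration, not details taken from the cited work.

```python
import random

def matches(rect, point):
    """True if `point` lies inside `rect` (closed intervals in this sketch)."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))

def probably_subsumed(s, existing, num_tests, rng):
    """Monte Carlo check: sample points in s and test them against `existing`.

    Returning False is always correct (an uncovered point was found).
    Returning True may be wrong: an uncovered region can be missed.
    Cost is O(k * m * num_tests) for k subscriptions and m dimensions.
    """
    for _ in range(num_tests):
        point = [rng.uniform(lo, hi) for lo, hi in s]
        if not any(matches(t, point) for t in existing):
            return False  # witness point: s is definitely not subsumed
    return True  # no witness found; s is only assumed to be subsumed

# Hypothetical example: a thin uncovered strip between x = 5.0 and x = 5.01.
s = ((0.0, 10.0), (0.0, 10.0))
existing = [((0.0, 5.0), (0.0, 10.0)), ((5.01, 10.0), (0.0, 10.0))]
guess = probably_subsumed(s, existing, num_tests=20, rng=random.Random(0))
```

With only a few samples the thin strip in this example is unlikely to be hit, so `guess` can come out `True` even though `s` is not subsumed. That is the error case the slide warns about.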

11 Outline: Problem definition and formulation; Related work; Exact subscription subsumption checking; Approximate subscription subsumption checking; Experimental evaluation; Conclusions.

12 Exact subscription subsumption checking: key observation. Checking whether a new subscription is covered by the union of previous subscriptions is equivalent to checking whether the new subscription intersects the uncovered region. We partition the content space into positive and negative spaces. The positive space is the part of the space that is covered by at least one existing subscription. The negative space is the part of the space that is not covered by any existing subscription. Both can be represented by a set of non-overlapping rectangles. Subscription s is subsumed iff it does not intersect the negative space.

13 Representation of negative space and subsumption evaluation. We represent the negative space as a set of non-overlapping d-dimensional rectangles. If a new subscription intersects any of these rectangles, it is not subsumed. [Figure: negative space partitioned into rectangles r_1 through r_8]

14 Data structures and complexity. The algorithm always detects whether a new subscription is subsumed or not. For efficient subsumption checking, the set of negative rectangles is indexed using an R-tree or k-d tree for fast retrieval. For n subscriptions in d-dimensional space, the algorithm generates O(n^d) negative rectangles. For a high-dimensional content space, the number of negative rectangles can grow quickly. To control this growth, we propose an approximate subsumption checking algorithm.

15 Outline: Problem definition and formulation; Related work; Exact subscription subsumption checking; Approximate subscription subsumption checking; Experimental evaluation; Conclusions.

16 Approximation algorithm. On adding a new subscription, restrict the number of new negative rectangles added to at most k. This yields at most O(k·n) negative rectangles after n active subscriptions. It leads to no false negatives but may generate some false positives (correctness is not compromised). [Figure: negative rectangles r_1 through r_9 before and after adding a subscription; in the example, k = 3]

17 Top-k rectangle selection criteria. We propose a benefit/cost model for selecting these rectangles. The benefit of partitioning a negative rectangle with respect to a subscription is the volume of the intersecting region. The cost is the number of new negative rectangles created. We choose the top-k negative rectangles with the highest benefit-to-cost ratio for splitting and add the resulting rectangles to the representation of the negative space.
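The benefit/cost ranking can be sketched as follows. The helper names, the half-open interval convention, the way the cost is counted (one piece per side that sticks out past the subscription, as produced by slicing one dimension at a time), and the `max(cost, 1)` guard for a rectangle that lies fully inside the subscription are assumptions of this sketch; the slides define only benefit as intersection volume and cost as the number of new rectangles.

```python
def intersection(a, b):
    """Overlap of two rectangles, or None if they do not overlap."""
    box = tuple((max(al, bl), min(ah, bh))
                for (al, ah), (bl, bh) in zip(a, b))
    return box if all(lo < hi for lo, hi in box) else None

def volume(rect):
    v = 1.0
    for lo, hi in rect:
        v *= hi - lo
    return v

def split_cost(r, s):
    """Number of pieces left when s is cut out of r (r and s must overlap)."""
    return sum(int(rl < sl) + int(sh < rh)
               for (rl, rh), (sl, sh) in zip(r, s))

def rank_by_benefit_cost(negative, s):
    """Return (ratio, rectangle) pairs for rectangles hit by s, best first."""
    scored = []
    for r in negative:
        overlap = intersection(r, s)
        if overlap is None:
            continue  # s does not touch this negative rectangle
        benefit = volume(overlap)
        cost = split_cost(r, s)
        scored.append((benefit / max(cost, 1), r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical negative rectangles and subscription.
r1 = ((0, 10), (0, 10))
r2 = ((10, 20), (0, 10))
r3 = ((0, 10), (10, 20))
r4 = ((30, 40), (30, 40))
s = ((5, 12), (5, 12))
ranked = rank_by_benefit_cost([r1, r2, r3, r4], s)
top = ranked[0]  # (12.5, r1): overlap volume 25 split into 2 pieces
```

Taking the first k entries of `ranked` gives the rectangles that would be split for this subscription.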

18 Subscription forwarding in the approximate algorithm. If the new subscription does not intersect any negative rectangle, it is subsumed. Otherwise: find all negative rectangles that intersect the subscription and sort them by benefit/cost; select the first k negative rectangles and subtract the subscribed region from them; update the representation of the negative space by replacing the k original rectangles with the new ones. (Algorithms for unsubscribing can be found in the paper.)
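The steps on this slide can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: rectangles are tuples of half-open `(lo, hi)` intervals, `subtract` slices one dimension at a time, the negative space is a plain list (the slides suggest an R-tree or k-d tree index instead), and all function names are hypothetical.

```python
def intersects(a, b):
    return all(al < bh and bl < ah for (al, ah), (bl, bh) in zip(a, b))

def subtract(r, s):
    """Disjoint rectangles whose union is r minus s."""
    if not intersects(r, s):
        return [r]
    pieces, rest = [], list(r)
    for d, ((rl, rh), (sl, sh)) in enumerate(zip(r, s)):
        if rl < sl:  # part of r below s in dimension d
            pieces.append(tuple(rest[:d] + [(rl, sl)] + rest[d + 1:]))
        if sh < rh:  # part of r above s in dimension d
            pieces.append(tuple(rest[:d] + [(sh, rh)] + rest[d + 1:]))
        rest[d] = (max(rl, sl), min(rh, sh))  # clip before the next dimension
    return pieces

def overlap_volume(r, s):
    v = 1.0
    for (rl, rh), (sl, sh) in zip(r, s):
        v *= min(rh, sh) - max(rl, sl)
    return v

def add_subscription_approx(negative, s, k):
    """Return (subsumed, updated_negative), splitting at most k rectangles."""
    hits = [r for r in negative if intersects(r, s)]
    if not hits:
        return True, negative  # s misses the negative space: subsumed

    def ratio(r):  # benefit / cost; max(..., 1) is a guard assumed here
        return overlap_volume(r, s) / max(len(subtract(r, s)), 1)

    chosen = sorted(hits, key=ratio, reverse=True)[:k]
    updated = [r for r in negative if r not in chosen]
    for r in chosen:
        updated.extend(subtract(r, s))
    return False, updated

# Hypothetical negative space and subscription, with k = 1.
r1, r2, r3 = ((0, 10), (0, 10)), ((10, 20), (0, 10)), ((0, 10), (10, 20))
s = ((5, 12), (5, 12))
subsumed, negative = add_subscription_approx([r1, r2, r3], s, k=1)
```

Rectangles that intersect `s` but are not among the chosen k stay in the list unchanged, so the stored negative space can only overestimate the truly uncovered region. A later subscription may then be forwarded unnecessarily, but a subscription that is not subsumed is never suppressed.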

19 Outline: Problem definition and formulation; Related work; Exact subscription subsumption checking; Approximate subscription subsumption checking; Experimental evaluation; Conclusions.

20 Experimental evaluation. Simulation setup: 10K subscriptions; 2-, 3-, 4-, and 5-dimensional spaces; each dimension in the range [0, 1000]; Zipfian distribution.

21 Experimental evaluation. Measuring the advantage of subsumption checking: subscription subsumption vs. covering. More than 50% improvement in redundant subscription detection. [Plot panels: exact algorithm; approximate algorithm (k = 50)]

22 Storage overhead comparison (exact vs. approximate). [Plot: negative rectangle creation rate]

23 Experimental evaluation. Effect of k in the approximate algorithm: a larger k value results in a greater reduction in redundant subscriptions.

24 Experimental evaluation. Other selection metric: value function. Considering both benefit and cost results in better subsumption checking.

25 Conclusions. Efficient query subsumption checking can greatly improve the performance of pub/sub systems by reducing subscription routing traffic between brokers. Maintaining the negative space as a set of disjoint rectangles leads to efficient subsumption checking by converting it to an intersection detection problem. We proposed exact and approximate subsumption checking algorithms and compared their relative performance.

26 Thank you! Questions?

27 Related work. Ouksel et al. present a Monte Carlo type probabilistic algorithm for subsumption checking: for a new subscription s, randomly select d points in s; if all of these points are covered by previous subscriptions, assume s is subsumed. The complexity is O(k·m·d), where k is the number of subscriptions, m is the number of dimensions, and d is the number of tests. False negatives are possible: subscriptions that are not subsumed may be assumed to be subsumed, which may result in incorrect content routing. [Figure: the algorithm may mistakenly detect that s_3 is subsumed by s_1 and s_2.] Our proposed approach prevents false negatives.

28 Exact subscription subsumption checking: the subsumption checking algorithm.
Input: the set of negative rectangles R = {r_1, r_2, ..., r_m} and a subscription s.
Find R_intersect, the set of negative rectangles that intersect s.
If R_intersect = ∅, s is subsumed.
Otherwise, for every r_i ∈ R_intersect: set R = R − {r_i}; compute R_i = r_i − s, represented as a set of non-overlapping rectangles; set R = R ∪ R_i.
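The algorithm above can be sketched in a few lines. The rectangle encoding (tuples of half-open `(lo, hi)` intervals), the dimension-by-dimension slicing used for `r_i − s`, the linear scan in place of an index, and the function names are assumptions of this sketch; the slide specifies only that `r_i − s` is represented as non-overlapping rectangles.

```python
def intersects(a, b):
    """True if two half-open rectangles share interior volume."""
    return all(al < bh and bl < ah for (al, ah), (bl, bh) in zip(a, b))

def subtract(r, s):
    """Disjoint rectangles whose union is r minus s (R_i = r_i - s)."""
    if not intersects(r, s):
        return [r]
    pieces, rest = [], list(r)
    for d, ((rl, rh), (sl, sh)) in enumerate(zip(r, s)):
        if rl < sl:  # slab of r lying below s in dimension d
            pieces.append(tuple(rest[:d] + [(rl, sl)] + rest[d + 1:]))
        if sh < rh:  # slab of r lying above s in dimension d
            pieces.append(tuple(rest[:d] + [(sh, rh)] + rest[d + 1:]))
        rest[d] = (max(rl, sl), min(rh, sh))  # clip to s before moving on
    return pieces

def add_subscription(negative, s):
    """Return (subsumed, updated_negative) for a new subscription s."""
    hit = [r for r in negative if intersects(r, s)]       # R_intersect
    if not hit:
        return True, negative                             # s is subsumed
    updated = [r for r in negative if not intersects(r, s)]  # R - {r_i}
    for r in hit:
        updated.extend(subtract(r, s))                    # R = R U R_i
    return False, updated

# Hypothetical 2-dimensional content space [0, 10) x [0, 10).
negative = [((0, 10), (0, 10))]  # nothing is covered yet
sub1, negative = add_subscription(negative, ((0, 6), (0, 10)))
after_first = list(negative)
sub2, negative = add_subscription(negative, ((4, 10), (0, 10)))
sub3, negative = add_subscription(negative, ((2, 8), (1, 3)))
print(sub1, sub2, sub3)  # prints: False False True
```

In this example the first two subscriptions together cover the whole space, so the negative space becomes empty and the third subscription is reported as subsumed without being compared against either earlier subscription.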

29 Approximate subscription subsumption checking. On adding a new subscription, the number of new negative rectangles added is at most k. This yields at most O(k·n) negative rectangles after n active subscriptions. [Figure: example with k = 3, showing negative rectangles r_1 through r_12]

30 Experimental evaluation. Simulation setup: 10K subscriptions; 2-, 3-, 4-, and 5-dimensional spaces; each dimension in the range [0, 1000]; Zipfian distribution; for the approximate algorithm, the default value of k is 50.

31 Problem definition and formulation. Content space: a d-dimensional space where each dimension represents a numeric attribute. Subscriptions are d-dimensional rectangles. Publications are d-dimensional points. [Figure: covering and subsumption in 2-dimensional space; s_2 is covered by s_1; s_3 is subsumed by s_1 ∪ s_2 but not covered by either of them.]

