1
Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian
2
2 Outline
Problem definition and formulation
Related work
Exact subscription subsumption checking
Approximate subscription subsumption checking
Experimental evaluation
Conclusions
3
3 Event-based pub/sub systems Publish subscribe systems Event
4
4 Types of pub/sub systems Topic-based vs. Content-based Centralized vs. Distributed
5
5 Information dissemination in pub/sub systems
Publication/Subscription routing in distributed pub/sub
[Figure labels: Subscriber 1, Subscriber 2, Publisher]
6
6 Reducing dissemination traffic
Goal: preventing dissemination of redundant subscriptions
[Figure labels: Subscriber 1, Subscriber 2, Subscriber 3, Publisher]
Preventing redundant subscription dissemination:
Reduces subscription forwarding traffic
Reduces subscription table size in the broker
Speeds up publication matching
7
7 Detection of redundant subscriptions: Covering and Subsumption
Subscription covering is a pair-wise relationship between subscriptions.
Subscription s2 covers subscription s1 iff all publications matching s1 also match s2.
Subscription subsumption is a generalization of covering.
Subscription s is subsumed by subscription set T = {s1, s2, ..., sn} iff all publications matching s also match at least one of the subscriptions in T.
[Figure: rectangles s1 and s2 (covering), and rectangles s1, s2 and s3 (subsumption)]
s3 is subsumed by s1 ∪ s2 but not covered by either of them.
8
8 Problem formulation
Content space: d-dimensional space where each dimension represents a numeric attribute
Subscriptions are d-dimensional rectangles
Publications are d-dimensional points
Given a set of d-dimensional rectangles T = {s1, s2, ..., sn}, is a new rectangle s contained in the disjunction (union) of the rectangles in T?
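As a minimal sketch of the definitions above (not code from the presentation), the following Python models a subscription as one interval per dimension and a publication as a point. The names `matches` and `covers`, the half-open interval convention, and the example coordinates are assumptions made for illustration. The final check is a brute-force test over integer grid points only, so it illustrates the subsumption relationship rather than proving it.

```python
from itertools import product
from typing import Sequence, Tuple

Interval = Tuple[float, float]   # assumed half-open: lo <= x < hi
Rect = Tuple[Interval, ...]      # one interval per dimension


def matches(pub: Sequence[float], sub: Rect) -> bool:
    """A publication (point) matches a subscription if it lies inside the rectangle."""
    return all(lo <= x < hi for x, (lo, hi) in zip(pub, sub))


def covers(outer: Rect, inner: Rect) -> bool:
    """Pair-wise covering: inner lies inside outer in every dimension."""
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))


# Hypothetical 2-dimensional example: s3 straddles s1 and s2.
s1 = ((0, 6), (0, 10))
s2 = ((4, 10), (0, 10))
s3 = ((2, 8), (2, 8))

covered_by_one = covers(s1, s3) or covers(s2, s3)

# Illustration only: every integer grid point inside s3 matches s1 or s2.
grid = product(range(2, 8), range(2, 8))
all_matched = all(matches(p, s1) or matches(p, s2) for p in grid)

print(covered_by_one, all_matched)  # prints: False True
```

Pair-wise covering misses s3 here, which is the case subsumption checking is meant to catch.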
10
10 Related work
Pair-wise covering
For a new subscription s, check whether any previous subscription covers it
If not, forward the subscription to all other brokers in the network
Probabilistic subsumption checking
For a new subscription s, randomly select d points in s
If all of these points are covered by previous subscriptions, assume s is subsumed
Complexity O(k·m·d), where k = number of subscriptions, m = number of dimensions and d = number of test points (on this slide d counts test points, not dimensions)
False negatives may be generated, i.e., subscriptions that are not subsumed may be wrongly assumed to be subsumed
This may result in incorrect content routing
[Figure: rectangles s1, s2 and s3]
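The probabilistic check described above can be sketched as follows. This is an illustrative reading of the slide, not the original authors' implementation: the function name `probably_subsumed`, the uniform sampling, the half-open intervals and the example rectangles are all assumptions. Each of the sampled points is compared against up to k subscriptions over m dimensions, which is where the O(k·m·d) cost comes from.

```python
import random


def matches(pub, sub):
    """True if point pub lies inside rectangle sub (half-open intervals assumed)."""
    return all(lo <= x < hi for x, (lo, hi) in zip(pub, sub))


def probably_subsumed(s, subs, num_points, rng):
    """Monte Carlo test: sample points in s and look for an uncovered one."""
    for _ in range(num_points):
        point = [rng.uniform(lo, hi) for lo, hi in s]
        if not any(matches(point, t) for t in subs):
            return False   # uncovered witness found: s is definitely not subsumed
    return True            # no witness found: s is only *assumed* to be subsumed


rng = random.Random(0)
s = ((0, 10), (0, 10))
# Hypothetical subscriptions that leave the thin strip 4.99 <= x < 5 uncovered
# (0.1% of the area of s), so a handful of samples will usually miss it.
with_gap = [((0, 4.99), (0, 10)), ((5, 10), (0, 10))]
answer = probably_subsumed(s, with_gap, 10, rng)
print("assumed subsumed:", answer)
```

A `False` answer is always correct, since it comes with an uncovered point. A `True` answer can be wrong, which is the error the slide warns about.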
12
12 Exact Subscription Subsumption Checking: Key Observation
Checking whether a new subscription is covered by the union of previous subscriptions ≡ checking whether the new subscription intersects the uncovered region.
We partition the content space into positive and negative spaces
Positive space: the parts of the space that are covered by at least one existing subscription
Negative space: the parts of the space that are not covered by any of the existing subscriptions
Both can be represented by a set of non-overlapping rectangles
Subscription s is subsumed iff it does not intersect the negative space
13
13 Representation of Negative Space & Subsumption Evaluation
We represent the negative space as a set of non-overlapping d-dimensional rectangles
If a new subscription intersects any of these rectangles, it is not subsumed
[Figure: negative space partitioned into rectangles r1 to r8]
14
14 Data structures & Complexity
The algorithm always detects whether a new subscription is subsumed or not
For efficient subsumption checking, the set of negative rectangles is indexed using an R-Tree or KD-Tree for fast retrieval
For n subscriptions in d-dimensional space, the algorithm generates O(n^d) negative rectangles
For a high-dimensional content space the number of negative rectangles can grow fast
To control the growth of the number of negative rectangles we propose an approximate subsumption checking algorithm
16
16 Approximation algorithm
On adding a new subscription, restrict the number of new negative rectangles added to at most k
At most O(k·n) negative rectangles after n active subscriptions
Leads to no false negatives; may generate some false positives, i.e., a subsumed subscription may be treated as not subsumed (correctness is not compromised)
[Figure: negative rectangles r1 to r9; in the example, k = 3]
17
17 Top-k rectangle selection criteria
We propose a benefit/cost model for selecting these rectangles.
The benefit of partitioning a negative rectangle with respect to a subscription is the volume of the intersecting region.
The cost is the number of new negative rectangles created.
We choose the top-k negative rectangles with the highest benefit-to-cost ratio for splitting and add the resulting pieces to the representation of the negative space.
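One way to compute the benefit and cost described above is sketched here. The slide does not say how a split rectangle is decomposed, so this sketch assumes a slab decomposition that creates at most two new pieces per dimension (one on each side where the negative rectangle sticks out beyond the subscription). The function names and the treatment of a zero cost as an infinite ratio are likewise assumptions.

```python
def benefit_and_cost(r, s):
    """Return (benefit, cost) of splitting negative rectangle r by subscription s.

    benefit: volume of the intersection of r and s.
    cost:    number of new rectangles under the assumed slab decomposition.
    """
    benefit = 1.0
    cost = 0
    for (rlo, rhi), (slo, shi) in zip(r, s):
        overlap = min(rhi, shi) - max(rlo, slo)
        if overlap <= 0:
            return 0.0, 0            # r and s do not intersect: nothing to split
        benefit *= overlap
        # One new piece for each side on which r extends beyond s.
        cost += (rlo < slo) + (shi < rhi)
    return benefit, cost


def ratio(r, s):
    """Benefit-to-cost ratio used to rank intersecting negative rectangles."""
    benefit, cost = benefit_and_cost(r, s)
    if benefit == 0:
        return 0.0
    # Assumption: r lies entirely inside s, so it disappears at no cost.
    return float("inf") if cost == 0 else benefit / cost


# Hypothetical example: a small subscription in the middle of a large rectangle.
print(benefit_and_cost(((0, 10), (0, 10)), ((2, 4), (3, 5))))  # prints: (4.0, 4)
```

In this example the split removes little volume but creates four pieces, so the rectangle ranks low; a rectangle that loses a large strip at the cost of a single new piece ranks high.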
18
18 Subscription Forwarding in Approximate Algorithm
If the new subscription does not intersect any negative rectangle, it is subsumed
Otherwise:
Find all negative rectangles that intersect the subscription and sort them by benefit/cost
Select the first k negative rectangles and subtract the subscribed region from them
Update the representation of the negative space by replacing the k original rectangles with the new ones
(Algorithms for unsubscribing can be found in the paper)
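The steps above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it uses a linear scan where the slides suggest an index, assumes half-open intervals and a slab-style rectangle subtraction, and all function names are hypothetical. Rectangles that intersect the subscription but fall outside the top k are left unchanged, so the negative space is over-estimated and never under-estimated.

```python
def intersects(a, b):
    """True if rectangles a and b overlap in every dimension (half-open intervals)."""
    return all(alo < bhi and blo < ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def subtract(r, s):
    """Return r minus s as non-overlapping rectangles (assumed slab decomposition)."""
    if not intersects(r, s):
        return [r]
    pieces, core = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:                      # part of r below s in dimension i
            pieces.append(tuple(core[:i] + [(rlo, slo)] + core[i + 1:]))
            rlo = slo
        if shi < rhi:                      # part of r above s in dimension i
            pieces.append(tuple(core[:i] + [(shi, rhi)] + core[i + 1:]))
            rhi = shi
        core[i] = (rlo, rhi)               # clip r to s before the next dimension
    return pieces


def ratio(r, s):
    """Benefit (intersection volume) divided by cost (number of new rectangles)."""
    benefit = 1.0
    for (rlo, rhi), (slo, shi) in zip(r, s):
        benefit *= min(rhi, shi) - max(rlo, slo)
    cost = len(subtract(r, s))
    return float("inf") if cost == 0 else benefit / cost


def add_subscription_approx(negative, s, k):
    """Return (subsumed, updated_negative), splitting at most k rectangles."""
    hit = [r for r in negative if intersects(r, s)]
    if not hit:
        return True, negative
    hit.sort(key=lambda r: ratio(r, s), reverse=True)
    updated = [r for r in negative if not intersects(r, s)] + hit[k:]
    for r in hit[:k]:
        updated.extend(subtract(r, s))
    return False, updated


# Hypothetical example: two negative rectangles, subscription overlapping both.
negative = [((0, 5), (0, 10)), ((5, 10), (0, 10))]
s = ((4, 7), (0, 10))
later = ((4, 5), (0, 10))                  # lies entirely inside s

_, neg_k1 = add_subscription_approx(negative, s, k=1)
_, neg_k2 = add_subscription_approx(negative, s, k=2)
print(add_subscription_approx(neg_k1, later, k=1)[0])  # prints: False
print(add_subscription_approx(neg_k2, later, k=2)[0])  # prints: True
```

With k = 1 only the higher-ranked rectangle is split, so the later subscription is forwarded even though it is subsumed. With k = 2 both rectangles are split and the subsumption is detected. In neither case is a non-subsumed subscription suppressed.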
20
20 Experimental evaluation
Simulation setup
10K subscriptions
2-, 3-, 4- and 5-dimensional space
Each dimension in range [0, 1000]
Zipfian distribution
21
21 Experimental evaluation
Measuring the advantage of subsumption checking: subscription subsumption vs. covering
More than 50% improvement in redundant subscription detection
[Figure panels: Exact algorithm; Approximate algorithm (k = 50)]
22
22 Storage overhead comparison (Exact vs Approximate) Negative rectangle creation rate
23
23 Experimental evaluation Effect of k in the approximate algorithm Larger k value results in more reduction in redundant subscriptions
24
24 Experimental evaluation Other Selection Metric Value Function Considering both Benefit and Cost results in better subsumption checking
25
25 Conclusions
Efficient subscription subsumption checking can greatly improve the performance of pub/sub systems by reducing subscription routing traffic between brokers.
Maintaining the negative space as a set of disjoint rectangles leads to efficient subsumption checking by converting it into an intersection detection problem.
We proposed exact and approximate subsumption checking algorithms and compared their relative performance.
26
26 Thank You! Questions?
27
27 Related work
Ouksel et al. present a Monte Carlo type probabilistic algorithm for subsumption checking
For a new subscription s, randomly select d points in s
If all of these points are covered by previous subscriptions, assume s is subsumed
Has complexity O(k·m·d), where k is the number of subscriptions, m is the number of dimensions and d is the number of tests
False negatives: subscriptions that are not subsumed may be assumed to be subsumed
This may result in incorrect content routing
[Figure: rectangles s1, s2 and s3] The algorithm may mistakenly detect that s3 is subsumed
Our proposed approach prevents false negatives
28
28 Exact Subscription Subsumption Checking
Subsumption checking algorithm
Input: set of negative rectangles R = {r1, r2, …, rm}; subscription s
Find R_intersect: the set of negative rectangles that intersect s
If R_intersect = ∅, s is subsumed
Otherwise, for every ri ∈ R_intersect:
R = R − {ri}
Ri = ri − s, with Ri represented as a set of non-overlapping rectangles
R = R ∪ Ri
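A minimal Python sketch of the algorithm above follows. It is an illustration under stated assumptions rather than the authors' implementation: rectangles use half-open intervals, R is scanned linearly instead of through an R-Tree or KD-Tree, and ri − s is computed with a slab decomposition that yields at most two pieces per dimension. The function names and the example content space are hypothetical.

```python
def intersects(a, b):
    """True if rectangles a and b overlap in every dimension (half-open intervals)."""
    return all(alo < bhi and blo < ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def subtract(r, s):
    """Return r minus s as a list of non-overlapping rectangles."""
    if not intersects(r, s):
        return [r]
    pieces, core = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:                      # slab of r below s in dimension i
            pieces.append(tuple(core[:i] + [(rlo, slo)] + core[i + 1:]))
            rlo = slo
        if shi < rhi:                      # slab of r above s in dimension i
            pieces.append(tuple(core[:i] + [(shi, rhi)] + core[i + 1:]))
            rhi = shi
        core[i] = (rlo, rhi)               # remaining part is clipped to s
    return pieces


def add_subscription_exact(negative, s):
    """Return (subsumed, updated_negative) for a new subscription s."""
    hit = [r for r in negative if intersects(r, s)]        # R_intersect
    if not hit:
        return True, negative                              # s is subsumed
    updated = [r for r in negative if not intersects(r, s)]
    for r in hit:
        updated.extend(subtract(r, s))                     # R = (R - {ri}) U Ri
    return False, updated


# Hypothetical 2-dimensional content space [0, 10) x [0, 10), initially all negative.
negative = [((0, 10), (0, 10))]
results = []
for sub in [((0, 6), (0, 10)), ((4, 10), (0, 10)), ((2, 8), (2, 8))]:
    subsumed, negative = add_subscription_exact(negative, sub)
    results.append(subsumed)

print(results)  # prints: [False, False, True]
```

The third subscription is not covered by either of the first two alone, yet it is reported as subsumed because no negative rectangle remains for it to intersect.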
29
29 Approximate Subscription Subsumption Checking
On adding a new subscription, the number of new negative rectangles added is at most k
At most O(k·n) negative rectangles after n active subscriptions
[Figure: negative rectangles r1 to r12; in the example, k = 3]
30
30 Experimental evaluation
Simulation setup
10K subscriptions
2-, 3-, 4- and 5-dimensional space
Each dimension in range [0, 1000]
Zipfian distribution
For the approximate algorithm, the default value of k is 50
31
31 Problem definition and formulation
Content space: d-dimensional space where each dimension represents a numeric attribute
Subscriptions are d-dimensional rectangles
Publications are d-dimensional points
Example: Covering & Subsumption in 2-dimensional space
[Figure: rectangles s1 and s2 (covering), and rectangles s1, s2 and s3 (subsumption)]
s2 is covered by s1
s3 is subsumed by s1 ∪ s2 but not covered by either of them