1
A Detailed Investigation of Memory Requirements for Publish/Subscribe Filtering Algorithms
Sven Bittner and Annika Hinze, 2 November 2005
Talk at the 13th International Conference on Cooperative Information Systems (CoopIS 2005)
2
Motivation: Publish/Subscribe
Subscribers register subscriptions
Publishers send event messages
System informs using notifications
[Figure: publishers (eBay, TradeMe) send pub(item,price,timeLeft,…) to the pub/sub system; a user registers a subscription; the system filters the events and notifies the user about items of interest]
Annika Hinze – Expressive Event Filtering in Distributed Systems
3
Motivation: Application Scenario
A subscriber is interested in French books whose title contains the phrase “Harry Potter”.
According to the condition of the copy of the book (new, used), she wants to pay at most NZ$10.0 or NZ$15.0.
To avoid unnecessary notifications, the subscriber will be notified not earlier than one day before the auction ends.
[Figure: subscription tree combining the predicates title like “Harry Potter”, endingWithin < 1 day, language = FRENCH, condition = NEW, condition = USED, price < 10.0 and price < 15.0 with AND and OR operators]
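The scenario above can be written down as a small Boolean expression tree. The following Python sketch is illustrative only: the tree shape (one top-level AND, one OR, two inner ANDs) is an assumption chosen to agree with the predicates on the slide, and the names `Pred`, `And`, `Or`, `matches` and the event fields are hypothetical. The `like` predicate is approximated by a substring test, and `endingWithin` is assumed to be measured in days.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]


@dataclass(frozen=True)
class Pred:
    """Leaf node: a named predicate over a single event."""
    name: str
    test: Callable[[Event], bool]


@dataclass(frozen=True)
class And:
    children: Tuple[Any, ...]


@dataclass(frozen=True)
class Or:
    children: Tuple[Any, ...]


def matches(node, event: Event) -> bool:
    """Evaluate the subscription tree directly, without any conversion."""
    if isinstance(node, Pred):
        return node.test(event)
    if isinstance(node, And):
        return all(matches(child, event) for child in node.children)
    if isinstance(node, Or):
        return any(matches(child, event) for child in node.children)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Assumed tree shape for the example subscription.
subscription = And((
    Pred('title like "Harry Potter"', lambda e: "Harry Potter" in e["title"]),
    Pred("endingWithin < 1 day", lambda e: e["endingWithin"] < 1.0),
    Pred("language = FRENCH", lambda e: e["language"] == "FRENCH"),
    Or((
        And((Pred("condition = NEW", lambda e: e["condition"] == "NEW"),
             Pred("price < 15.0", lambda e: e["price"] < 15.0))),
        And((Pred("condition = USED", lambda e: e["condition"] == "USED"),
             Pred("price < 10.0", lambda e: e["price"] < 10.0))),
    )),
))

# Hypothetical event message published by an auction site.
event = {
    "title": "Harry Potter à l'école des sorciers",
    "endingWithin": 0.5,
    "language": "FRENCH",
    "condition": "USED",
    "price": 9.5,
}
print(matches(subscription, event))  # True
```

Evaluating the tree as it stands is what a non-canonical filter does in principle; canonical approaches first rewrite the tree into conjunctions.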
4
Motivation: Research Question
Current approaches only support conjunctions
Canonical conversion (DNF) required
  + Fast filtering process (no Boolean expressions)
  − High memory usage (exponentially-sized DNF)
Effective in DBMS, but also in pub/sub?
5
Motivation: Canonical Conversion
[Figure: the original subscription tree (seven predicates combined with AND and OR) is converted into two conjunctive subscriptions]
Converted subscription 1: title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = NEW AND price < 15.0
Converted subscription 2: title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = USED AND price < 10.0
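The conversion on this slide can be reproduced with a short recursive routine. This is a minimal sketch, not the conversion code used by the authors: the nested-tuple encoding, the function name `to_dnf` and the assumed tree shape are all illustrative. An OR concatenates the conjunctions of its children, while an AND takes the cross product of them, which is where the exponential growth of the DNF comes from.

```python
from itertools import product


def to_dnf(node):
    """Return the DNF of a tree as a list of conjunctions.

    A tree is either a predicate name (str) or a tuple
    ("and" | "or", child, child, ...). Each conjunction is a
    tuple of predicate names. Duplicates are not removed.
    """
    if isinstance(node, str):
        return [(node,)]
    op, *children = node
    child_dnfs = [to_dnf(child) for child in children]
    if op == "or":
        # Union of the children's conjunctions.
        return [conj for dnf in child_dnfs for conj in dnf]
    if op == "and":
        # Cross product: pick one conjunction per child and concatenate.
        return [sum(combo, ()) for combo in product(*child_dnfs)]
    raise ValueError(f"unknown operator: {op!r}")


# Assumed shape of the example subscription.
subscription = (
    "and",
    'title like "Harry Potter"',
    "endingWithin < 1 day",
    "language = FRENCH",
    ("or",
     ("and", "condition = NEW", "price < 15.0"),
     ("and", "condition = USED", "price < 10.0")),
)

dnf = to_dnf(subscription)
print(len(dnf))  # 2
for conjunction in dnf:
    print(" AND ".join(conjunction))

# A conjunction of n two-way disjunctions yields 2**n conjunctions.
wide = ("and", *[("or", f"a{i}", f"b{i}") for i in range(10)])
print(len(to_dnf(wide)))  # 1024
```

The second example shows why the research question matters: ten small disjunctions in one subscription already produce 1024 conjunctive subscriptions after conversion.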
6
Motivation: Goal
Analyse influence of conversions on memory scalability (and efficiency)
  – Define scheme to characterise subscriptions
      Describe structure of subscriptions
      Abstraction from specific application scenario
Derive memory requirements of algorithms
7
Structure
Motivation
Characterisation Scheme/Algorithms
Theoretical Analysis and Comparison
Practical Analysis
Summary and Outlook
9
Characterisation Scheme (1)
Fourteen parameters in four classes
  – Subscription-related (S)
      Characteristics of subscriptions
  – Algorithm-related (A)
      Influence internal storage
  – Conversion-related (C)
      Canonical conversions
  – Subscription-event-related (E)
      Relation between events and subscriptions
10
Characterisation Scheme (2)
[Slide body not preserved in the transcript]
11
Characterisation Scheme: Example
|p| = 7 (number of predicates)
|op| = 4 (number of Boolean operators)
op_r = |op|/|p| = 4/7 ≈ 0.6 (relative number of Boolean operators)
S_s = 2 (disjunctively combined elements after conversion)
s_p = (3*2 + 4*1)/7 = 10/7 ≈ 1.4 (conjunctive elements per predicate)
s_r = s_p/S_s = (10/7)/2 = 5/7 ≈ 0.7 (relative conjunctive elements per predicate)
[Figure: original subscription and the two converted subscriptions; the predicates title like “Harry Potter”, endingWithin < 1 day and language = FRENCH each occur in 2 conjunctions, the condition and price predicates in 1 each]
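The arithmetic on this slide can be checked mechanically. The sketch below recomputes the example parameters from the predicate list, the operator count stated on the slide, and the two converted subscriptions. The function name `characterise` and the dictionary keys are hypothetical, and exact fractions are used so the results can be compared with 4/7, 10/7 and 5/7 directly.

```python
from collections import Counter
from fractions import Fraction


def characterise(predicates, num_operators, conjunctions):
    """Compute the example's subscription parameters.

    predicates    -- predicates of the original subscription
    num_operators -- number of Boolean operators in the original tree
    conjunctions  -- converted subscriptions (lists of predicates)
    """
    p = len(predicates)
    # How many conjunctions each predicate ends up in after conversion.
    occurrences = Counter(q for conj in conjunctions for q in conj)
    s_s = len(conjunctions)
    s_p = Fraction(sum(occurrences[q] for q in predicates), p)
    return {
        "|p|": p,
        "|op|": num_operators,
        "op_r": Fraction(num_operators, p),
        "S_s": s_s,
        "s_p": s_p,
        "s_r": s_p / s_s,
    }


common = ['title like "Harry Potter"', "endingWithin < 1 day", "language = FRENCH"]
predicates = common + ["condition = NEW", "condition = USED",
                       "price < 10.0", "price < 15.0"]
conjunctions = [
    common + ["condition = NEW", "price < 15.0"],
    common + ["condition = USED", "price < 10.0"],
]

params = characterise(predicates, 4, conjunctions)
for name, value in params.items():
    print(name, "=", value)
# op_r = 4/7, s_p = 10/7, s_r = 5/7
```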
12
Analysed Algorithms
Three filtering algorithms
  – Canonical approaches (conjunctions)
      Counting algorithm [Ashayer02, Yan94]
      Cluster algorithm [Fabret01, Hanson90]
  – Non-canonical approach (Boolean subscriptions)
      Subscription-tree-based filtering approach [Bittner05a, Bittner05b]
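The slide only names the algorithms. As a rough illustration of the counting idea for conjunctive subscriptions, the sketch below counts the fulfilled predicates per subscription and reports a match when the count equals the number of predicates in the subscription. It is a toy version under stated assumptions, not the implementation analysed in the talk: the class `CountingMatcher` and its methods are hypothetical, and predicate matching is done by naive evaluation rather than by index structures.

```python
from collections import defaultdict


class CountingMatcher:
    """Toy counting matcher for purely conjunctive subscriptions."""

    def __init__(self):
        self.tests = {}                         # predicate name -> test function
        self.subs_by_pred = defaultdict(list)   # predicate name -> subscription ids
        self.required = {}                      # subscription id -> predicate count

    def subscribe(self, sub_id, conjunction):
        """Register a conjunction given as {predicate name: test function}."""
        self.required[sub_id] = len(conjunction)
        for name, test in conjunction.items():
            self.tests[name] = test             # shared predicates are stored once
            self.subs_by_pred[name].append(sub_id)

    def match(self, event):
        """Return the ids of all subscriptions fulfilled by the event."""
        hits = defaultdict(int)
        for name, test in self.tests.items():
            if test(event):                     # naive predicate matching step
                for sub_id in self.subs_by_pred[name]:
                    hits[sub_id] += 1
        return sorted(s for s, n in hits.items() if n == self.required[s])


# The two converted subscriptions of the running example.
common = {
    'title like "Harry Potter"': lambda e: "Harry Potter" in e["title"],
    "endingWithin < 1 day": lambda e: e["endingWithin"] < 1.0,
    "language = FRENCH": lambda e: e["language"] == "FRENCH",
}
matcher = CountingMatcher()
matcher.subscribe("new-copy", {
    **common,
    "condition = NEW": lambda e: e["condition"] == "NEW",
    "price < 15.0": lambda e: e["price"] < 15.0,
})
matcher.subscribe("used-copy", {
    **common,
    "condition = USED": lambda e: e["condition"] == "USED",
    "price < 10.0": lambda e: e["price"] < 10.0,
})

event = {"title": "Harry Potter à l'école des sorciers", "endingWithin": 0.5,
         "language": "FRENCH", "condition": "USED", "price": 9.5}
print(matcher.match(event))  # ['used-copy']
```

Note that the single Boolean subscription has become two registered subscriptions, each with its own counter and its own entries in the predicate-to-subscription lists. This per-subscription bookkeeping is the kind of memory cost the talk compares against storing the original tree.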
14
Memory Usage: Analysis
Counting algorithm
Cluster algorithm
Non-canonical algorithm
[Memory formulae for the three algorithms not preserved in the transcript]
15
Memory Usage: Comparison (1)
All formulae
  – grow linearly with |s|
  – intersect the ordinate at zero
Comparison of first derivatives with respect to |s| is sufficient
Assumptions (fewer parameters)
  – Reasonable values for algorithm-related parameters (A)
  – Usage of relative parameters
Determine turning point when the non-canonical algorithm (NCA) requires less memory than canonical solutions
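Why a comparison of first derivatives suffices can be spelled out in two lines. The symbols $m_X$ and $a_X$ are placeholders introduced here for the memory requirement and slope of an algorithm $X$; the actual formulae from the analysis slide are not reproduced in this transcript.

```latex
\begin{align*}
m_X(|s|) &= a_X \, |s|, \qquad m_X(0) = 0, \qquad
  \frac{\mathrm{d}\, m_X}{\mathrm{d}\, |s|} = a_X, \\
m_{\mathrm{NCA}}(|s|) < m_{\mathrm{can}}(|s|)
  \;&\Longleftrightarrow\; a_{\mathrm{NCA}} < a_{\mathrm{can}}
  \qquad \text{for all } |s| > 0 .
\end{align*}
```

Because every formula is linear in $|s|$ and passes through the origin, the ordering of the slopes fixes the ordering of the memory requirements for any positive $|s|$. The turning point is therefore the parameter setting at which the two slopes are equal.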
16
Memory Usage: Comparison (2)
Description of turning point by number of disjunctive elements in DNF (S_s)
  – Beneficial behaviour of NCA
  – Boolean subscriptions worthwhile
Counting requires less memory than cluster algorithm
17
Memory Usage: Example
|p| = 7 (number of predicates)
op_r = 4/7 (relative number of Boolean operators)
s_r = 5/7 (relative conjunctive elements per predicate)
[formula not preserved] = 89/49 ≈ 1.82
[formula not preserved] = 89/56 ≈ 1.59
Practice: S_s = 2, so NCA uses less memory (turning point less than one disjunction)
[Figure: the example subscription tree]
18
Memory Usage: Illustration
Setting
  – half as many operators as predicates (op_r)
  – conjunctions per predicate vary (s_r)
A single disjunction per subscription already results in lower memory requirements for the non-canonical approach.
[Figures: Counting vs. non-canonical; Cluster vs. non-canonical]
20
Practical Analysis
Verification of theoretical results
More memory required for management of data structures, e.g.,
  – Lists
  – Dynamic arrays
  – Hash tables
Overhead for different algorithms similar?
21
Practical Analysis: Results (1)
s_r = 0.3 (predicates in few conjunctions)
Consistent behaviour in theory/practice
22
Practical Analysis: Efficiency
Nearly similar efficiency properties
Overhead of converted (= more) subscriptions outweighs more efficient filtering (time and space)
24
Summary (1)
Characterisation scheme
  – Describe subscriptions
  – Calculate memory requirements of filter algorithms
Theoretical analysis and comparison
  – Three algorithms
  – Determination of point when NCA requires less memory
Even one disjunction might favour NCA
25
Summary (2)
Practical analysis
  – Memory in practical settings
  – Correlation of efficiency properties
Theoretical results hold in practice
NCA is equally or more time efficient
NCA is the preferable algorithm if subscriptions include disjunctions
26
Future Work
Distribute algorithm
  – Optimise event and subscription routing
  – Problem: Current routing optimisations only work for conjunctive subscriptions (covering, merging)
Design novel routing optimisations
      Support arbitrary subscriptions
      Subscription tree pruning
      Predicate replacement
27
Thank you for your attention! Contact: Sven Bittner, Annika Hinze {s.bittner, a.hinze}@cs.waikato.ac.nz
28
References
[Ashayer02] G. Ashayer, H.-A. Jacobsen, and H. Leung. Predicate Matching and Subscription Matching in Publish/Subscribe Systems. In Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’02), pages 539–548, Vienna, Austria, July 2–5 2002.
[Bittner05a] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’05), pages 451–457, Columbus, USA, June 6–10 2005.
[Bittner05b] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 13th International Conference on Cooperative Information Systems (CoopIS 2005), Agia Napa, Cyprus, October 31–November 4 2005.
[Fabret01] F. Fabret, A. Jacobsen, F. Llirbat, J. Pereira, K. Ross, and D. Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), pages 115–126, Santa Barbara, USA, May 21–24 2001.
[Hanson90] E. N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A Predicate Matching Algorithm for Database Rule Systems. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), pages 271–280, Atlantic City, USA, May 23–25 1990.
[Yan94] T. W. Yan and H. Garcia-Molina. Index Structures for Selective Dissemination of Information Under the Boolean Model. ACM Transactions on Database Systems (TODS), 19(2):332–364, 1994.