A Detailed Investigation of Memory Requirements for Publish/Subscribe Filtering Algorithms
Sven Bittner and Annika Hinze, 2 November 2005
Talk at the 13th International Conference on Cooperative Information Systems (CoopIS 2005)

2/26 Motivation: Publish/Subscribe
Subscribers register subscriptions.
Publishers send event messages.
The system informs subscribers using notifications.
[Figure: eBay and TradeMe publish events pub(item, price, timeLeft, …) to the pub/sub system; the system filters them against the user's subscription and notifies the user about items of interest.]
Annika Hinze – Expressive Event Filtering in Distributed Systems

3/26 Motivation: Application Scenario
A subscriber is interested in French books whose title contains the phrase “Harry Potter”.
Depending on the condition of the copy, she is willing to pay less than NZ$15.00 for a new copy or less than NZ$10.00 for a used one.
To avoid unnecessary notifications, she wants to be notified no earlier than one day before the auction ends.
Subscription: title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
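To make the scenario concrete, here is a minimal Python sketch that stores the subscription as a Boolean expression tree and evaluates it directly against an event, without any canonical conversion. The attribute names follow the slide; the `Pred`/`Node` classes, the dictionary event format and the substring semantics chosen for `like` are assumptions for illustration, not the data structures of the subscription-tree algorithm [Bittner05a, Bittner05b].

```python
import operator
from dataclasses import dataclass
from typing import Any, Tuple, Union

# Hypothetical operator table; "like" is simplified to a substring test.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "like": lambda value, pattern: pattern in value,
}


@dataclass(frozen=True)
class Pred:
    attr: str
    op: str
    value: Any

    def matches(self, event: dict) -> bool:
        # An event without the attribute cannot satisfy the predicate.
        return self.attr in event and OPS[self.op](event[self.attr], self.value)


@dataclass(frozen=True)
class Node:
    kind: str  # "AND" or "OR" (n-ary)
    children: Tuple[Union["Node", Pred], ...]


def evaluate(expr: Union[Node, Pred], event: dict) -> bool:
    """Evaluate a Boolean subscription tree against one event."""
    if isinstance(expr, Pred):
        return expr.matches(event)
    results = (evaluate(child, event) for child in expr.children)
    return all(results) if expr.kind == "AND" else any(results)


SUBSCRIPTION = Node("AND", (
    Pred("title", "like", "Harry Potter"),
    Pred("endingWithin", "<", 1.0),  # days until the auction ends
    Pred("language", "=", "FRENCH"),
    Node("OR", (
        Node("AND", (Pred("condition", "=", "NEW"), Pred("price", "<", 15.0))),
        Node("AND", (Pred("condition", "=", "USED"), Pred("price", "<", 10.0))),
    )),
))

if __name__ == "__main__":
    event = {"title": "Harry Potter à l'école des sorciers", "endingWithin": 0.5,
             "language": "FRENCH", "condition": "USED", "price": 12.0}
    print(evaluate(SUBSCRIPTION, event))                          # False: used copy, not < 10.0
    print(evaluate(SUBSCRIPTION, {**event, "condition": "NEW"}))  # True: new copy, < 15.0
```

Each predicate is stored once, exactly as it appears in the subscription; the disjunction is handled inside the tree rather than by splitting the subscription.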

4/26 Motivation: Research Question
Current approaches only support conjunctions
→ Canonical conversion to disjunctive normal form (DNF) required
  + Fast filtering process (no Boolean expressions to evaluate)
  − High memory usage (the DNF can be exponentially larger than the original subscription)
Effective in DBMS, but also in pub/sub?
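Why the DNF can grow exponentially is easiest to see with a textbook worst case (not a subscription from the talk): a conjunction of n two-way disjunctions, (a1 OR b1) AND ... AND (an OR bn), has 2n predicates, but its DNF contains 2^n conjunctions of n predicates each. This Python sketch, using hypothetical predicate names a_i and b_i, enumerates that DNF by distributing AND over OR.

```python
from itertools import product
from typing import List


def dnf_of_and_of_ors(n: int) -> List[List[str]]:
    """DNF of (a1 OR b1) AND ... AND (an OR bn).

    Distributing AND over OR picks one literal from every clause,
    so each element of the Cartesian product is one conjunction.
    """
    clauses = [(f"a{i}", f"b{i}") for i in range(1, n + 1)]
    return [list(choice) for choice in product(*clauses)]


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        conjunctions = dnf_of_and_of_ors(n)
        stored = sum(len(c) for c in conjunctions)
        print(f"n={n}: {2 * n} predicates -> {len(conjunctions)} conjunctions, "
              f"{stored} predicate references")
```

For n = 10, the 20 original predicates turn into 1,024 conjunctions holding 10,240 predicate references.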

5/26 Motivation: Canonical Conversion
Original subscription: title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
After canonical conversion (two conjunctive subscriptions):
(1) title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = NEW AND price < 15.0
(2) title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = USED AND price < 10.0
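The canonical conversion on this slide can be reproduced with a short recursive DNF transformation. In this minimal Python sketch, predicates are plain strings and operators are ("AND" | "OR", children) tuples, both assumptions for illustration; the sketch performs no simplification such as removing duplicate or subsumed conjunctions.

```python
from itertools import product
from typing import List, Tuple, Union

# An expression is a predicate string or an (operator, children) tuple.
Expr = Union[str, Tuple[str, list]]


def to_dnf(expr: Expr) -> List[List[str]]:
    """Return the DNF of expr as a list of conjunctions (lists of predicates)."""
    if isinstance(expr, str):
        return [[expr]]
    op, children = expr
    child_dnfs = [to_dnf(child) for child in children]
    if op == "OR":
        # A disjunction of DNFs is the concatenation of their conjunctions.
        return [conj for dnf in child_dnfs for conj in dnf]
    if op == "AND":
        # Distribute AND over OR: choose one conjunction per child and merge them.
        return [[p for conj in combo for p in conj] for combo in product(*child_dnfs)]
    raise ValueError(f"unknown operator: {op!r}")


SUBSCRIPTION = ("AND", [
    'title like "Harry Potter"',
    "endingWithin < 1 day",
    "language = FRENCH",
    ("OR", [
        ("AND", ["condition = NEW", "price < 15.0"]),
        ("AND", ["condition = USED", "price < 10.0"]),
    ]),
])

if __name__ == "__main__":
    for conjunction in to_dnf(SUBSCRIPTION):
        print(" AND ".join(conjunction))
```

The three predicates shared by both branches now appear in both conjunctions; this duplication is what the memory analysis quantifies.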

6/26 Motivation: Goal
Analyse the influence of canonical conversions on memory → scalability (and efficiency)
– Define a scheme to characterise subscriptions:
    Describe the structure of subscriptions
    Abstract from specific application scenarios
→ Derive the memory requirements of the algorithms

7/26 Structure
Motivation
Characterisation Scheme/Algorithms
Theoretical Analysis and Comparison
Practical Analysis
Summary and Outlook

9/26 Characterisation Scheme (1)
Fourteen parameters in four classes:
– Subscription-related (S): characteristics of the subscriptions
– Algorithm-related (A): influence the internal storage
– Conversion-related (C): canonical conversions
– Subscription-event-related (E): relation between events and subscriptions

10/26 Characterisation Scheme (2)

11/26 Characterisation Scheme: Example
$|p| = 7$ (number of predicates)
$|op| = 4$ (number of Boolean operators)
$op_r = |op|/|p| = 4/7 \approx 0.6$ (relative number of Boolean operators)
$S_s = 2$ (disjunctively combined elements after conversion)
$s_p = (3 \cdot 2 + 4 \cdot 1)/7 = 10/7 \approx 1.4$ (conjunctive elements per predicate: the 3 predicates on title, endingWithin and language appear in both converted subscriptions, the other 4 in one each)
$s_r = s_p/S_s = (10/7)/2 = 5/7 \approx 0.7$ (relative conjunctive elements per predicate)
Original subscription: title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
Converted subscriptions:
(1) title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = NEW AND price < 15.0
(2) title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = USED AND price < 10.0
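The example values can be recomputed mechanically. The Python sketch below counts $|p|$ and $|op|$ on the original tree (treating each AND/OR node as one n-ary operator, which reproduces $|op| = 4$) and derives $S_s$, $s_p$ and $s_r$ from the converted subscriptions listed on the slide. The tuple-based tree format and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

TITLE = 'title like "Harry Potter"'
ENDING = "endingWithin < 1 day"
LANG = "language = FRENCH"

# Original subscription: predicates are strings, operators are (kind, children) tuples.
TREE = ("AND", [TITLE, ENDING, LANG, ("OR", [
    ("AND", ["condition = NEW", "price < 15.0"]),
    ("AND", ["condition = USED", "price < 10.0"]),
])])

# Converted (DNF) subscriptions, as listed on the slide.
CONVERTED = [
    [TITLE, ENDING, LANG, "condition = NEW", "price < 15.0"],
    [TITLE, ENDING, LANG, "condition = USED", "price < 10.0"],
]


def count_tree(expr):
    """Return (|p|, |op|): predicate leaves and n-ary Boolean operator nodes."""
    if isinstance(expr, str):
        return 1, 0
    predicates, operators = 0, 1  # this node itself is one Boolean operator
    for child in expr[1]:
        p, op = count_tree(child)
        predicates += p
        operators += op
    return predicates, operators


def characterise(tree, converted):
    """Compute the example parameters; assumes all predicates in the tree are distinct."""
    p, op = count_tree(tree)
    S_s = len(converted)  # disjunctively combined elements after conversion
    # Sum over predicates of the number of conjunctions containing each one.
    memberships = sum(len(set(conj)) for conj in converted)
    s_p = Fraction(memberships, p)
    return {"|p|": p, "|op|": op, "op_r": Fraction(op, p),
            "S_s": S_s, "s_p": s_p, "s_r": s_p / S_s}


if __name__ == "__main__":
    for name, value in characterise(TREE, CONVERTED).items():
        print(f"{name} = {value} (~{float(value):.2f})")
```

Exact fractions are used so the results can be compared directly with 4/7, 10/7 and 5/7 on the slide.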

12/26 Analysed Algorithms
Three filtering algorithms:
– Canonical approaches (conjunctions only):
    Counting algorithm [Ashayer02, Yan94]
    Cluster algorithm [Fabret01, Hanson90]
– Non-canonical approach (Boolean subscriptions):
    Subscription-tree-based filtering approach [Bittner05a, Bittner05b]
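For readers unfamiliar with the counting approach, the following is a simplified Python sketch of its core idea for conjunctive subscriptions: evaluate every distinct predicate once per event, increment a hit counter for each conjunction containing a satisfied predicate, and report the conjunctions whose counter reaches their size. The class, method and conjunction names are hypothetical, and the sketch omits the predicate indexes used by the published algorithms [Ashayer02, Yan94]; it is not their implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Test = Callable[[dict], bool]


class CountingMatcher:
    """Counting-style matching of conjunctive subscriptions (simplified sketch)."""

    def __init__(self) -> None:
        self.tests: Dict[str, Test] = {}                           # distinct predicates, stored once
        self.containing: Dict[str, List[str]] = defaultdict(list)  # predicate -> conjunction ids
        self.size: Dict[str, int] = {}                             # conjunction id -> #predicates

    def add(self, conj_id: str, conjunction: Dict[str, Test]) -> None:
        self.size[conj_id] = len(conjunction)
        for key, test in conjunction.items():
            self.tests[key] = test
            self.containing[key].append(conj_id)

    def match(self, event: dict) -> Set[str]:
        hits: Dict[str, int] = defaultdict(int)
        for key, test in self.tests.items():
            if test(event):
                for conj_id in self.containing[key]:
                    hits[conj_id] += 1
        # A conjunction matches when all of its predicates were satisfied.
        return {c for c, n in hits.items() if n == self.size[c]}


INF = float("inf")
PREDICATES: Dict[str, Test] = {
    'title like "Harry Potter"': lambda e: "Harry Potter" in e.get("title", ""),
    "endingWithin < 1 day": lambda e: e.get("endingWithin", INF) < 1.0,
    "language = FRENCH": lambda e: e.get("language") == "FRENCH",
    "condition = NEW": lambda e: e.get("condition") == "NEW",
    "condition = USED": lambda e: e.get("condition") == "USED",
    "price < 15.0": lambda e: e.get("price", INF) < 15.0,
    "price < 10.0": lambda e: e.get("price", INF) < 10.0,
}
SHARED = ['title like "Harry Potter"', "endingWithin < 1 day", "language = FRENCH"]

matcher = CountingMatcher()
# The Boolean subscription must first be split into its two DNF conjunctions.
matcher.add("s1/new", {k: PREDICATES[k] for k in SHARED + ["condition = NEW", "price < 15.0"]})
matcher.add("s1/used", {k: PREDICATES[k] for k in SHARED + ["condition = USED", "price < 10.0"]})

if __name__ == "__main__":
    event = {"title": "Harry Potter", "endingWithin": 0.5, "language": "FRENCH",
             "condition": "USED", "price": 8.0}
    print(matcher.match(event))  # {'s1/used'}
```

In this sketch the predicates themselves are stored once, but `containing` lists each shared predicate under both converted conjunctions, and a notification for the original subscription would need deduplication whenever several of its conjunctions match.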

14/26 Memory Usage: Analysis
Counting algorithm
Cluster algorithm
Non-canonical algorithm (NCA)

15/26 Memory Usage: Comparison (1)
All formulae
– grow linearly with $|s|$
– pass through the origin
→ Comparing the first derivatives with respect to $|s|$ is sufficient
Assumptions (to reduce the number of parameters):
– Reasonable values for the algorithm-related parameters (A)
– Use of relative parameters
Determine the turning point at which the NCA requires less memory than the canonical solutions
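The argument on this slide can be written out in one line. Write $M_X(|s|)$ for the memory formula of algorithm $X$ and $c_X$ for its slope; these symbols are introduced here for illustration and are not the notation of the original formulae. If every formula is linear in $|s|$ and passes through the origin, then

$$
M_X(|s|) = c_X \, |s| \quad\Rightarrow\quad \frac{\mathrm{d}M_X}{\mathrm{d}|s|} = c_X,
\qquad
M_{\mathrm{NCA}}(|s|) < M_{\mathrm{canon}}(|s|)\ \ \forall\, |s| > 0 \iff c_{\mathrm{NCA}} < c_{\mathrm{canon}}.
$$

Because the slopes depend only on the remaining parameters, the turning point is the parameter value at which $c_{\mathrm{NCA}} = c_{\mathrm{canon}}$, independently of $|s|$.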

16/26 Memory Usage: Comparison (2)
The turning point (where the NCA starts to require less memory) is described by the number of disjunctive elements in the DNF ($S_s$):
– beneficial behaviour of the NCA
– Boolean subscriptions are worthwhile
→ The counting algorithm requires less memory than the cluster algorithm

17/26 Memory Usage: Example
$|p| = 7$ (number of predicates)
$op_r = 4/7$ (relative number of Boolean operators)
$s_r = 5/7$ (relative conjunctive elements per predicate)
Turning points for the two canonical algorithms: $89/49 \approx 1.82$ and $89/56 \approx 1.59$
In practice: $S_s = 2$ → the NCA uses less memory (both turning points lie below $S_s = 2$, i.e. below one disjunction)

18/26 Memory Usage: Illustration
Setting:
– half as many operators as predicates ($op_r = 0.5$)
– varying number of conjunctions per predicate ($s_r$)
A single disjunction per subscription already makes the non-canonical approach require less memory.
[Figures: counting vs. non-canonical; cluster vs. non-canonical]

20/26 Practical Analysis
Verification of the theoretical results
Additional memory is required to manage data structures, e.g.:
– lists
– dynamic arrays
– hash tables
Is this overhead similar for the different algorithms?

21/26 Practical Analysis: Results (1)
$s_r = 0.3$ (predicates appear in few conjunctions)
Consistent behaviour in theory and practice

22/26 Practical Analysis: Efficiency
Nearly identical efficiency properties
→ The overhead of the converted (and therefore more numerous) subscriptions outweighs the more efficient filtering of conjunctions (in time and space)

24/26 Summary (1)
Characterisation scheme
– Describe subscriptions
– Calculate the memory requirements of filtering algorithms
Theoretical analysis and comparison
– Three algorithms
– Determination of the point at which the NCA requires less memory
→ Even a single disjunction may favour the NCA

25/26 Summary (2)
Practical analysis
– Memory in practical settings
– Correlation of efficiency properties
→ Theoretical results hold in practice
→ The NCA is equally or more time-efficient
→ The NCA is the preferable algorithm if subscriptions include disjunctions

26/26 Future Work
Distribute the algorithm
– Optimise event and subscription routing
– Problem: current routing optimisations (covering, merging) only work for conjunctive subscriptions
→ Design novel routing optimisations:
    Support arbitrary subscriptions
    Subscription tree pruning
    Predicate replacement
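To illustrate why covering-based routing is tied to conjunctions, here is a minimal Python sketch of a conservative covering test between two conjunctive subscriptions: S1 covers S2 if every predicate of S1 is implied by some predicate of S2, so every event matching S2 also matches S1. The (attribute, operator, value) predicate format, the restriction to = and <, and the function names are assumptions; the test is sound but not complete, and it is not taken from any specific routing system.

```python
from typing import Any, List, Tuple

Pred = Tuple[str, str, Any]  # (attribute, operator, value); operator is "=" or "<"


def implies(q: Pred, p: Pred) -> bool:
    """Conservatively decide whether predicate q implies predicate p."""
    (qa, qop, qv), (pa, pop, pv) = q, p
    if qa != pa:
        return False
    try:
        if qop == "=" and pop == "=":
            return qv == pv
        if qop == "<" and pop == "<":
            return qv <= pv  # x < qv and qv <= pv give x < pv
        if qop == "=" and pop == "<":
            return qv < pv   # x == qv and qv < pv give x < pv
    except TypeError:        # incomparable values, e.g. str vs float
        return False
    return False             # "<" never implies "=" in this sketch


def covers(s1: List[Pred], s2: List[Pred]) -> bool:
    """True if every event matching conjunction s2 also matches conjunction s1."""
    return all(any(implies(q, p) for q in s2) for p in s1)


if __name__ == "__main__":
    broad = [("price", "<", 15.0)]
    narrow = [("condition", "=", "USED"), ("price", "<", 10.0)]
    print(covers(broad, narrow), covers(narrow, broad))  # True False
```

A Boolean subscription such as the auction example would first have to be converted to its DNF before a test of this kind applies, which is the limitation the slide points out.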

Thank you for your attention! Contact: Sven Bittner, Annika Hinze {s.bittner,

References
[Ashayer02] G. Ashayer, H.-A. Jacobsen, and H. Leung. Predicate Matching and Subscription Matching in Publish/Subscribe Systems. In Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’02), pages 539–548, Vienna, Austria, July 2002.
[Bittner05a] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’05), pages 451–457, Columbus, USA, June 2005.
[Bittner05b] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 13th International Conference on Cooperative Information Systems (CoopIS 2005), Agia Napa, Cyprus, October–November 2005.
[Fabret01] F. Fabret, A. Jacobsen, F. Llirbat, J. Pereira, K. Ross, and D. Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), Santa Barbara, USA, May 2001.
[Hanson90] E. N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A Predicate Matching Algorithm for Database Rule Systems. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), Atlantic City, USA, May 1990.
[Yan94] T. W. Yan and H. Garcia-Molina. Index Structures for Selective Dissemination of Information Under the Boolean Model. ACM Transactions on Database Systems (TODS), 19(2):332–364, 1994.