VLDB2005
CMS-ToPSS: Efficient Dissemination of RSS Documents
Milenko Petrovic, Haifeng Liu, Hans-Arno Jacobsen
University of Toronto
VLDB05 2 Information Dissemination
- Easy-to-use web publishing tools (blogs, wikis) are fueling an increase in the number of web publishers
- RSS is frequently used to disseminate updates to interested users
  - CNN.com, Yahoo! News, Amazon.com, MSN search (beta)
- Problem: polling-based architecture
[Figure: RSS publishers, RSS aggregator, RSS readers]
VLDB05 3 Solution!
[Figure: current RSS dissemination architecture vs. G-ToPSS RSS dissemination architecture]
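VLDB05 4 Interaction Model: Publish/Subscribe
[Figure: Publisher, Broker, Subscriber]
- Publishers send RSS feeds to the broker
- Subscribers register queries over all RSS feeds
- The broker delivers the matching RSS feeds to subscribers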
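The publish/subscribe interaction on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not the CMS-ToPSS broker: the `Broker` class, its method names, and the modeling of a subscription as a plain predicate over a dictionary are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; not part of CMS-ToPSS.
Item = Dict[str, str]            # one RSS item, e.g. {"title": ..., "category": ...}
Query = Callable[[Item], bool]   # a subscription, modeled here as a predicate


class Broker:
    """Toy broker: stores subscriptions and pushes matching items."""

    def __init__(self) -> None:
        self._subs: List[Tuple[str, Query]] = []
        self.delivered: Dict[str, List[Item]] = {}

    def subscribe(self, subscriber: str, query: Query) -> None:
        # Register a standing query on behalf of a subscriber.
        self._subs.append((subscriber, query))
        self.delivered.setdefault(subscriber, [])

    def publish(self, item: Item) -> None:
        # Linear scan over all subscriptions; a scalable broker would
        # index subscriptions instead of testing each one in turn.
        for subscriber, query in self._subs:
            if query(item):
                self.delivered[subscriber].append(item)


broker = Broker()
broker.subscribe("alice", lambda it: it.get("category") == "databases")
broker.subscribe("bob", lambda it: "RSS" in it.get("title", ""))

broker.publish({"title": "RSS filtering at scale", "category": "databases"})
broker.publish({"title": "Sports roundup", "category": "sports"})
```

The point of the design is that subscribers never poll: the broker evaluates each published item once against the stored queries and pushes it only to the subscribers whose queries match.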
VLDB05 5 Research challenges
1. Need a subscription (query) language suitable for filtering RSS documents
2. Need an efficient matching algorithm based on a graph representation
   - Structural matching
   - Constraint matching
3. Scalability to a large number of subscriptions and a high publishing rate
VLDB05 6 CMS-ToPSS System Architecture
VLDB05 7 Subscription Scalability
VLDB05 8 Memory Scalability
VLDB05 9 Matching Semantics
Publication:
  PAPER17 --AUTHOR--> “Arno Jacobsen”
  PAPER17 --CONFERENCE--> SIGMOD
  PAPER17 --LOCATION--> “California”
  PAPER17 --YEAR--> “2001”
Subscription:
  ?y --AUTHOR--> “Arno Jacobsen”
  ?y --CONFERENCE--> SIGMOD
  ?y --YEAR--> ?z
  Constraints: ?y <= Publication, ?z > 2000
VLDB05 10 Data Model (RSS Documents)
- Publications are represented as directed graphs with node and edge labels
- Node labels are typed
  - Literal value
  - Class
- Edge labels are typed
  - Class
- Classes can be related using a multiple-inheritance ontology
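As a rough illustration of a class ontology with multiple inheritance, the sketch below stores each class with its direct parents and answers "is class A a kind of class B?" by walking the parent links. The class names and the `PARENTS`, `ancestors`, and `is_a` identifiers are invented for this example and are not taken from the system.

```python
from typing import Dict, List, Set

# Hypothetical ontology: class name -> list of direct parent classes.
PARENTS: Dict[str, List[str]] = {
    "Publication": [],
    "Event": [],
    "Paper": ["Publication"],
    "Conference": ["Event"],
    "DemoPaper": ["Paper", "Event"],  # two parents: multiple inheritance
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with every class reachable via parent links."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or inherits from it, directly or not."""
    return target in ancestors(cls)
```

With this toy ontology, `is_a("DemoPaper", "Publication")` and `is_a("DemoPaper", "Event")` both hold, while `is_a("Conference", "Publication")` does not. A check of this kind is what a class constraint on a node or edge label would rely on.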
VLDB05 11 Query Language (GQL)
- Queries are represented as directed graph patterns with node and edge labels
- Node labels are variables, class instances, or literal values
  - Variables can be constrained by classes
- Edge labels are class instances
- Mapping (matching) semantics
  - A pattern graph maps to a data graph if the topology (structure) of the two graphs matches and all variable constraints are satisfied
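The mapping semantics can be sketched as a small backtracking matcher over triples. This is a naive illustration under stated assumptions, not the matching algorithm used by G-ToPSS: graphs are lists of (subject, edge, object) triples, variables are strings starting with `?`, constraints are plain Python predicates, and the class constraint on `?y` is approximated by membership in a hypothetical `PUBLICATIONS` set.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Constraint = Callable[[str], bool]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def bind(term: str, value: str, binding: Binding,
         constraints: Dict[str, Constraint]) -> bool:
    """Try to map one pattern term onto one data value."""
    if not is_var(term):
        return term == value                 # constants must match exactly
    if term in binding:
        return binding[term] == value        # variable already bound
    check = constraints.get(term, lambda v: True)
    if not check(value):
        return False                         # variable constraint violated
    binding[term] = value
    return True


def match(pattern: List[Triple], data: List[Triple],
          constraints: Dict[str, Constraint],
          binding: Optional[Binding] = None) -> Optional[Binding]:
    """Return a binding that maps the pattern onto the data, or None."""
    binding = dict(binding or {})
    if not pattern:
        return binding
    (ps, pe, po), rest = pattern[0], pattern[1:]
    for ds, de, do in data:
        if pe != de:
            continue                         # edge labels must agree
        trial = dict(binding)
        if bind(ps, ds, trial, constraints) and bind(po, do, trial, constraints):
            result = match(rest, data, constraints, trial)
            if result is not None:
                return result
    return None


# Example modeled on the publication/subscription pair from the slides.
PUBLICATIONS = {"PAPER17"}                   # hypothetical instances of Publication
data = [
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "LOCATION", "California"),
    ("PAPER17", "YEAR", "2001"),
]
pattern = [
    ("?y", "AUTHOR", "Arno Jacobsen"),
    ("?y", "CONFERENCE", "SIGMOD"),
    ("?y", "YEAR", "?z"),
]
constraints = {
    "?y": lambda v: v in PUBLICATIONS,       # stands in for ?y <= Publication
    "?z": lambda v: int(v) > 2000,           # ?z > 2000
}
result = match(pattern, data, constraints)
```

Here `result` binds `?y` to `PAPER17` and `?z` to `"2001"`. The subscription matches even though the publication carries an extra LOCATION edge, because only the pattern's edges need a counterpart in the data graph. Tightening the year constraint to `int(v) > 2005` makes `match` return `None`.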
VLDB05 12 Conclusion and Future Work
- Conclusion
  - Proposed a prototype for graph-based metadata filtering
  - G-ToPSS supports a high matching rate for an expressive subscription language
- Future work
  - Extend G-ToPSS with full RDF language features
  - Optimize constraint processing during matching