Presentation is loading. Please wait.

Presentation is loading. Please wait.

Towards an Internet-Scale XML Dissemination Service

Similar presentations


Presentation on theme: "Towards an Internet-Scale XML Dissemination Service"— Presentation transcript:

1 Towards an Internet-Scale XML Dissemination Service
Yanlei Diao Shariq Rizvi Michael J. Franklin EECS, U.C. Berkeley 12/9/2018

2 XML dissemination services
Outline XML dissemination services System model Core techniques Status and conclusions Yanlei Diao /9/2018

3 Applications of XML Dissemination
News feeds via RSS (Really Simple Syndication) My Yahoo!: updated headlines from BBC, CNet, NPR. Mobile services Mobile operators: connect content providers with millions of clients running a multitude of operating systems. Stock tickers QuoteMedia: fast access to real-time and historical stock data. Online auctions: freebidingtools.com: create your own feed for your favorite eBay search. Network monitoring: Ganglia: a distributed monitoring system for clusters and grids. Yanlei Diao /9/2018

4 YFilter: An XML Dissemination Service
XML streams user queries query results Data Source YFilter User queries: Specification of data interests, written in an XML query language. Data sources: Continuously publish XML data items. The service: Delivers to each user the XML data items that match her data interests; the delivered results are presented in a customized format. Yanlei Diao /9/2018

5 ONYX: Large-Scale XML Dissemination
Operator Network using YFilter for XML Dissemination Data Source Broker YFilter An overlay network of information brokers running YFilter. U1 U2 U3 U4 U5 Underlying infrastructures: A dedicated network Peer-to-peer Collaboration among administrative domains Yanlei Diao /9/2018

6 Design Space: Expressiveness
Expressiveness: data model + query language a service supports Subject-based Messages: a subject label Queries: a specific label or a wildcard Predicate-based Messages: attribute-value pairs Queries: a set of predicates XML filtering Messages: XML Queries: subset of XPath 1.0 XML filtering and transformation Queries: subset of XQuery Yanlei Diao /9/2018

7 Design Space: Why Distributed Processing?
Privacy Regulations: e.g., CA Senate Bill No Policies: e.g., customers’ data stay behind the firewall. Locality of data interests Disseminate regional data directly to local subscribers. Scalability Data volume: number of messages per second up to thousands, message size from 1 KB to 20 KB. Query population: up to millions. Frequency of query updates: from a daily basis to every few minutes. Result Volume: can amplify the input data volume by a large factor. Yanlei Diao /9/2018

8 Related Systems Yanlei Diao /9/2018

9 Content of the Paper Content-driven routing
Need to handle both structural and value-based constraints. Leverage YFilter: NFA-based operator networks, distributed construction. Filtering power of routing (i.e., fraction of messages filtered) Filtering power can be inherently limited. Use query partitioning (if possible) to improve it. Distributed transformation Currently either at the publishers’ side or at the edge brokers. Perform cascading message transformation during routing. Efficient XML transmission Verbosity of XML, and XML parsing at each routing step. Investigate different XML formats for XML transmission. Detailed architectural design Other optimization techniques… Yanlei Diao /9/2018

10 Outline System model Core techniques Status and conclusions
XML dissemination services System model Core techniques Status and conclusions Yanlei Diao /9/2018

11 Operations on Data/Query flows
Data Source Broker 1 Broker 2 Broker 4 Broker 3 Broker 5 Broker 6 data flow Broker 2: Broker 4: Broker 5: or Broker 6: =“Sports”] [a transformation query*] Q1: Q2: Q3: Forecasts”]] Q5: … Q4: … query flow Yanlei Diao /9/2018

12 System Tasks on Data/Query Planes
Processing planes: query plane and data plane Planes System Tasks Query Plane Data Plane Content-driven routing Incremental transformation Final query processing Build routing tables Lookup in routing tables Build transformation plans Execute transformation Build query plans Execute query plans Yanlei Diao /9/2018

13 Outline Core techniques XML dissemination services System model
Status and conclusions Next, let me talk about some core techniques Yanlei Diao /9/2018

14 Routing Table Design A routing table: mapping from output links to routing queries. a routing query: the data interests of queries down from an output link. data interest of a query: XPath expressions, for and where clauses of FLWOR expressions. Routing table design a canonical form of routing queries; a representation of routing tables; and an algorithm constructing them from a distributed query population. Two (conflicting) goals High filtering power of routing Fraction of messages filtered in routing. High routing efficiency Number of messages routed per second. Yanlei Diao /9/2018

15 YFilter Basics An XML filtering and transformation engine that processes multiple queries in a shared fashion. A Non-Deterministic Finite Automaton (NFA)-based operator network. Benefits for routing: Fast structure matching. A small maintenance cost for query updates. Extensibility for supporting new operators. Q1: /nitf pubdata toject. subject nitf head * 1 2 3 5 4 6 Q1 Q2: /nitf Q2 Y. Diao and M.J. Franklin. Query Processing for High-Volume XML Message Brokering. VLDB 2003. Y. Diao, et al. Path Sharing and Predicate Evaluation for High-Performance XML Filtering. TODS, Dec  YFilter v1.0 release: Coming later this month! Yanlei Diao /9/2018

16 Our Solution Routing queries are a disjunction of path expressions
Each XPath expression (equivalent of the for and where clauses of FLOWR expressions) is a routing query. Multiple routing queries can be connected by or. Routing table representation Merge routing queries into a single combined operator network. Construction algorithm Map(): a user query  a routing query in the canonical form. Collect(): routing queries sent from child brokers  a routing table. Aggregate(): all the routing queries (at a node) a new routing query. Yanlei Diao /9/2018

17 (mapping from links to routing queries)
An Example Scenario A new routing query Aggregate( ) (4d) Broker 4 from Broker 5 from Broker 6 Routing Table (mapping from links to routing queries) Collect( ) (4c) Broker 5 Q1: ] Q2: A new routing query Aggregate( ) (5b) Broker 6 Q3: [.//series/series.name=“Tide Forecasts” ] A new routing query Aggregate( ) (6b) Routing queries Map( ) (5a) Routing queries Map( ) (6a) Yanlei Diao /9/2018

18 Example (continued)  Broker5 Broker6 Broker6 Broker5 Q1 Q2 Q3
pubdata toject. subject nitf head * 1 2 3 5 4 6 Broker5 (6b) pubdata nitf head * 1 2 3 4 5 series 7 Broker6 Collect( ) Map( ) & Aggregate( ) Broker6 (4c’) pubdata toject. subject nitf head * 1 2 3 5 4 6 series 7 Broker5 Map( ) & Aggregate( ) (5a) Q1 Q2 pubdata toject. subject nitf head * 1 2 3 5 4 6 (6a) Q3 pubdata nitf head * 1 2 3 4 5 series 7 Yanlei Diao /9/2018

19 Sharing and Short-cut Evaluation
A problem with sharing Separate routing query representations: short-cut evaluation. Combined one: sharing may sacrifice the short-cut evaluation strategy. (4c) Broker 5: Broker 6: (5b) (6b) (5b) pubdata toject. subject nitf head * 1 2 3 5 4 6 Broker5 (6b) series 7 Broker6 Solution: dynamic pruning of the operator network at runtime Each operator/NFA state has a static set of broker ids that it can reach. System keeps a dynamic set of broker ids that have been reached. YFilter execution is extended to prune the operator network using these sets. (4c’) pubdata toject. subject nitf head * 1 2 3 5 4 6 series 7 Broker5 Broker6 Yanlei Diao /9/2018

20 Other Routing Considerations:
Content Generalization Large routing tables can be a problem. Introduce content generation as an additional step in Collect( ) or Aggregate( ). Generalization methods. Trade off filtering power for routing (space) efficiency. Filtering Power of Routing Fraction of messages filtered by routing. Selectivity of the union of the user queries at the node. Loss in precision in the routing queries representing this node. If inherently low, partition the query population to improve it. An Exclusiveness Pattern: e.g., Identify a set of such patterns, and partition queries using them. Yanlei Diao /9/2018

21 Status and Conclusions
Queries bring intelligence to the network routing fabric. We present a detailed architectural design of ONYX. We address fundamental issues. YFilter’s NFA-based operator networks are good for routing! Locality of data interests is key to filtering power! Status: YFilter release, XML transmission, other implementation underway. This is an area full of opportunities for optimization. Improving routing efficiency. Improving filtering power of routing. Incremental message transformation. Sharing among different processing tasks. Schema-based optimization… Yanlei Diao /9/2018

22 Questions ONYX Yanlei Diao /9/2018


Download ppt "Towards an Internet-Scale XML Dissemination Service"

Similar presentations


Ads by Google