Download presentation
Presentation is loading. Please wait.
Published byJeffery Evans Modified over 6 years ago
1
Towards an Internet-Scale XML Dissemination Service
Yanlei Diao Shariq Rizvi Michael J. Franklin EECS, U.C. Berkeley 12/9/2018
2
XML dissemination services
Outline XML dissemination services System model Core techniques Status and conclusions Yanlei Diao /9/2018
3
Applications of XML Dissemination
News feeds via RSS (Really Simple Syndication) My Yahoo!: updated headlines from BBC, CNet, NPR. Mobile services Mobile operators: connect content providers with millions of clients running a multitude of operating systems. Stock tickers QuoteMedia: fast access to real-time and historical stock data. Online auctions: freebidingtools.com: create your own feed for your favorite eBay search. Network monitoring: Ganglia: a distributed monitoring system for clusters and grids. Yanlei Diao /9/2018
4
YFilter: An XML Dissemination Service
XML streams user queries query results Data Source YFilter User queries: Specification of data interests, written in an XML query language. Data sources: Continuously publish XML data items. The service: Delivers to each user the XML data items that match her data interests; the delivered results are presented in a customized format. Yanlei Diao /9/2018
5
ONYX: Large-Scale XML Dissemination
Operator Network using YFilter for XML Dissemination Data Source Broker YFilter An overlay network of information brokers running YFilter. U1 U2 U3 U4 U5 Underlying infrastructures: A dedicated network Peer-to-peer Collaboration among administrative domains Yanlei Diao /9/2018
6
Design Space: Expressiveness
Expressiveness: data model + query language a service supports Subject-based Messages: a subject label Queries: a specific label or a wildcard Predicate-based Messages: attribute-value pairs Queries: a set of predicates XML filtering Messages: XML Queries: subset of XPath 1.0 XML filtering and transformation Queries: subset of XQuery Yanlei Diao /9/2018
7
Design Space: Why Distributed Processing?
Privacy Regulations: e.g., CA Senate Bill No Policies: e.g., customers’ data stay behind the firewall. Locality of data interests Disseminate regional data directly to local subscribers. Scalability Data volume: number of messages per second up to thousands, message size from 1 KB to 20 KB. Query population: up to millions. Frequency of query updates: from a daily basis to every few minutes. Result Volume: can amplify the input data volume by a large factor. Yanlei Diao /9/2018
8
Related Systems Yanlei Diao /9/2018
9
Content of the Paper Content-driven routing
Need to handle both structural and value-based constraints. Leverage YFilter: NFA-based operator networks, distributed construction. Filtering power of routing (i.e., fraction of messages filtered) Filtering power can be inherently limited. Use query partitioning (if possible) to improve it. Distributed transformation Currently either at the publishers’ side or at the edge brokers. Perform cascading message transformation during routing. Efficient XML transmission Verbosity of XML, and XML parsing at each routing step. Investigate different XML formats for XML transmission. Detailed architectural design Other optimization techniques… Yanlei Diao /9/2018
10
Outline System model Core techniques Status and conclusions
XML dissemination services System model Core techniques Status and conclusions Yanlei Diao /9/2018
11
Operations on Data/Query flows
Data Source Broker 1 Broker 2 Broker 4 Broker 3 Broker 5 Broker 6 data flow Broker 2: Broker 4: Broker 5: or Broker 6: =“Sports”] … [a transformation query*] Q1: Q2: Q3: Forecasts”]] Q5: … Q4: … query flow Yanlei Diao /9/2018
12
System Tasks on Data/Query Planes
Processing planes: query plane and data plane Planes System Tasks Query Plane Data Plane Content-driven routing Incremental transformation Final query processing Build routing tables Lookup in routing tables Build transformation plans Execute transformation Build query plans Execute query plans Yanlei Diao /9/2018
13
Outline Core techniques XML dissemination services System model
Status and conclusions Next, let me talk about some core techniques Yanlei Diao /9/2018
14
Routing Table Design A routing table: mapping from output links to routing queries. a routing query: the data interests of queries down from an output link. data interest of a query: XPath expressions, for and where clauses of FLWOR expressions. Routing table design a canonical form of routing queries; a representation of routing tables; and an algorithm constructing them from a distributed query population. Two (conflicting) goals High filtering power of routing Fraction of messages filtered in routing. High routing efficiency Number of messages routed per second. Yanlei Diao /9/2018
15
YFilter Basics An XML filtering and transformation engine that processes multiple queries in a shared fashion. A Non-Deterministic Finite Automaton (NFA)-based operator network. Benefits for routing: Fast structure matching. A small maintenance cost for query updates. Extensibility for supporting new operators. Q1: /nitf pubdata toject. subject nitf head * 1 2 3 5 4 6 Q1 Q2: /nitf Q2 Y. Diao and M.J. Franklin. Query Processing for High-Volume XML Message Brokering. VLDB 2003. Y. Diao, et al. Path Sharing and Predicate Evaluation for High-Performance XML Filtering. TODS, Dec YFilter v1.0 release: Coming later this month! Yanlei Diao /9/2018
16
Our Solution Routing queries are a disjunction of path expressions
Each XPath expression (equivalent of the for and where clauses of FLOWR expressions) is a routing query. Multiple routing queries can be connected by or. Routing table representation Merge routing queries into a single combined operator network. Construction algorithm Map(): a user query a routing query in the canonical form. Collect(): routing queries sent from child brokers a routing table. Aggregate(): all the routing queries (at a node) a new routing query. Yanlei Diao /9/2018
17
(mapping from links to routing queries)
An Example Scenario A new routing query Aggregate( ) (4d) Broker 4 from Broker 5 from Broker 6 Routing Table (mapping from links to routing queries) Collect( ) (4c) Broker 5 Q1: ] Q2: A new routing query Aggregate( ) (5b) Broker 6 Q3: [.//series/series.name=“Tide Forecasts” ] A new routing query Aggregate( ) (6b) Routing queries Map( ) (5a) Routing queries Map( ) (6a) Yanlei Diao /9/2018
18
Example (continued) Broker5 Broker6 Broker6 Broker5 Q1 Q2 Q3
pubdata toject. subject nitf head * 1 2 3 5 4 6 Broker5 (6b) pubdata nitf head * 1 2 3 4 5 series 7 Broker6 Collect( ) Map( ) & Aggregate( ) Broker6 (4c’) pubdata toject. subject nitf head * 1 2 3 5 4 6 series 7 Broker5 Map( ) & Aggregate( ) (5a) Q1 Q2 pubdata toject. subject nitf head * 1 2 3 5 4 6 (6a) Q3 pubdata nitf head * 1 2 3 4 5 series 7 Yanlei Diao /9/2018
19
Sharing and Short-cut Evaluation
A problem with sharing Separate routing query representations: short-cut evaluation. Combined one: sharing may sacrifice the short-cut evaluation strategy. (4c) Broker 5: Broker 6: (5b) (6b) (5b) pubdata toject. subject nitf head * 1 2 3 5 4 6 Broker5 (6b) series 7 Broker6 Solution: dynamic pruning of the operator network at runtime Each operator/NFA state has a static set of broker ids that it can reach. System keeps a dynamic set of broker ids that have been reached. YFilter execution is extended to prune the operator network using these sets. (4c’) pubdata toject. subject nitf head * 1 2 3 5 4 6 series 7 Broker5 Broker6 Yanlei Diao /9/2018
20
Other Routing Considerations:
Content Generalization Large routing tables can be a problem. Introduce content generation as an additional step in Collect( ) or Aggregate( ). Generalization methods. Trade off filtering power for routing (space) efficiency. Filtering Power of Routing Fraction of messages filtered by routing. Selectivity of the union of the user queries at the node. Loss in precision in the routing queries representing this node. If inherently low, partition the query population to improve it. An Exclusiveness Pattern: e.g., Identify a set of such patterns, and partition queries using them. Yanlei Diao /9/2018
21
Status and Conclusions
Queries bring intelligence to the network routing fabric. We present a detailed architectural design of ONYX. We address fundamental issues. YFilter’s NFA-based operator networks are good for routing! Locality of data interests is key to filtering power! Status: YFilter release, XML transmission, other implementation underway. This is an area full of opportunities for optimization. Improving routing efficiency. Improving filtering power of routing. Incremental message transformation. Sharing among different processing tasks. Schema-based optimization… Yanlei Diao /9/2018
22
Questions ONYX Yanlei Diao /9/2018
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.