Download presentation
Presentation is loading. Please wait.
1
Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan
2
Carnegie Mellon University Outline Publish-subscribe systems Subscription Languages Routing Protocols M ERCURY architecture Preliminary evaluation Scalability Performance Future work
3
Carnegie Mellon University Publish-subscribe systems Challenges Subscription language - how to express “interests”? Routing mechanism - how is content routed and where is it matched? Matcher S P ? Publication Subscription Matched Publication
4
Carnegie Mellon University Subscription language How to express interests? Channels or Subjects All content attributes Operators? Exact matches Range queries Regular expressions Subject = MATH Name = ‘A’ & Age < 25 Price = 350$ 300$ ≤ Price ≤ 350$ Name ~= Ash[win]*B*
5
Carnegie Mellon University M ERCURY subscription language Subscription = list of (type, attribute, rel-op, value) Can implement boolean expressions (AND/ORs) ( int, price, LESS_THAN, 300 ) ( string, name, EQUALS, “Opensig” ) Publication = list of (type, attribute, value)
6
Carnegie Mellon University Example: Virtual reality User x ≥ 50 x ≤ 150 y ≥ 150 y ≤ 250 Interests x 100 y 200 Events (100,200) (150,150) (50,250) Arena Virtual World
7
Carnegie Mellon University Routing mechanism Centralized? Easy to make publications “meet” subscriptions Single point of failure – not robust! Distributed? Where are subscriptions stored? How do publications “meet” subscriptions? Broadcast-based solutions not scalable
8
Carnegie Mellon University Distributed routing - goals Scalability is a key goal Flooding anything is bad, bad, bad… System should not have hot-spot in terms of: Computational load – matching ) Subscriptions should be evenly distributed Number of packets routed or received ) Publications should be evenly distributed Yet – we should have low delivery delays!!
9
Carnegie Mellon University Previous work Systems like Scribe use DHTs for scalability Why can’t we ? Exact matches vs. Range queries! int x 1 int x 10 hash Subscription int x = 9 hash Publication ? How about generating 10 subscriptions ? Too many subscriptions Works for discrete-valued attributes only
10
Carnegie Mellon University Attribute Hubs Divide range of an attribute into bins Each node responsible for range of attribute values H price [240, 320) [80, 160) [160, 240) [0, 80) Attribute Hub Hub-nodes connected through a circular overlay Circle only for connectivity One hub per attribute Routing algo compare value in content to my range
11
Carnegie Mellon University Routing illustrated Send subscription to any one attribute hub Send publications to all attribute hubs [0, 80) [210, 320) 50 ≤ x ≤ 150 150 ≤ y ≤ 250 x 100 y 200 HxHx [240, 320) [80, 160) [160, 240) HyHy [0, 105) [105, 210) Subscription Publication Rendezvous point
12
Carnegie Mellon University Efficient routing Reduce number of hops Each hub-node maintains “small” number of pointers to distant parts of the hub How to maintain these pointers? Send ACKs for publication receipts Various caching policies determine the structure of the pointer table e.g., LRU, Uniform-spacing, Exponential-spacing
13
Carnegie Mellon University Routing illustrated [0, 80) [240, 320) [80, 160) [160, 240) HxHx x 100 y 200 Publication [105, 210) HyHy [210, 320) [0, 105) ACK
14
Carnegie Mellon University Evaluation Workload Experimental setup Metrics
15
Carnegie Mellon University Workload One of our target apps multi-player games Model Virtual world as square Subscriptions as rectangles around current positions
16
Carnegie Mellon University Experimental setup Player movements simulated using mobility models from ns-2 Two hubs – x and y co-ordinates Half the nodes in each hub Uniform partition of range
17
Carnegie Mellon University Metrics Scalability metric load Number of publications routed by a node Averaged over time Performance metric publication delivery delay Time between sending of a publication and its receipt by all subscribers Averaged over all subscribers of a publication Averaged over all publications
18
Carnegie Mellon University Results: scalability Ideal graph: delta function Observed variation: ±12%
19
Carnegie Mellon University Results: performance Without caching: linear scaling Caching reduces delays to near optimal Workload effects ? 47.25 Cache size: log(n)
20
Carnegie Mellon University Conclusions Expressive subscription language Decentralized architecture Scalability Avoids flooding of subscriptions and publications – reduces network traffic Distributes publications and subscriptions throughout the network – prevents swamping
21
Carnegie Mellon University Future Work Load balancing Sensitive to data value distribution Adapt ranges dynamically according to the distribution Affects pointer management, caching, etc. x Pr(X=x)
22
Carnegie Mellon University Future Work Perform sensitivity analysis for different kinds of workloads Generic API for building applications on top of M ERCURY To be released soon Build a full-fledged distributed Quake-II
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.