Carnegie Mellon University. Complex queries in distributed publish-subscribe systems. Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan.

1 Carnegie Mellon University. Complex queries in distributed publish-subscribe systems. Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

2 Carnegie Mellon University Outline: Publish-subscribe systems (subscription languages, routing protocols); Mercury architecture; Preliminary evaluation (scalability, performance); Future work

3 Carnegie Mellon University Publish-subscribe systems. Challenges: Subscription language - how to express “interests”? Routing mechanism - how is content routed and where is it matched? [Figure: a Matcher between publisher P and subscriber S; labels: Publication, Subscription, Matched Publication.]

4 Carnegie Mellon University Subscription language. How to express interests? Channels or subjects (e.g., Subject = MATH); all content attributes (e.g., Name = ‘A’ & Age < 25). Operators? Exact matches (Price = 350$); range queries (300$ ≤ Price ≤ 350$); regular expressions (Name ~= Ash[win]*B*).

5 Carnegie Mellon University Mercury subscription language. Subscription = list of (type, attribute, rel-op, value), e.g., ( int, price, LESS_THAN, 300 ), ( string, name, EQUALS, “Opensig” ). Can implement boolean expressions (AND/ORs). Publication = list of (type, attribute, value).
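
A minimal sketch of how such tuples could be matched, treating a subscription as a conjunction (AND) of its predicates. Only LESS_THAN and EQUALS appear on the slide; the other operator names, the REL_OPS table, and the matches function are assumptions made for illustration, not Mercury's API.

```python
import operator

# Only LESS_THAN and EQUALS come from the slide; the rest are assumed names.
REL_OPS = {
    "LESS_THAN": operator.lt,
    "LESS_EQUAL": operator.le,
    "GREATER_THAN": operator.gt,
    "GREATER_EQUAL": operator.ge,
    "EQUALS": operator.eq,
}


def matches(subscription, publication):
    """Return True if the publication satisfies every predicate (an AND)."""
    # Index the publication's values by (type, attribute).
    values = {(typ, attr): val for typ, attr, val in publication}
    for typ, attr, rel_op, bound in subscription:
        if (typ, attr) not in values:
            return False  # the publication lacks this attribute
        if not REL_OPS[rel_op](values[(typ, attr)], bound):
            return False
    return True


subscription = [
    ("int", "price", "LESS_THAN", 300),
    ("string", "name", "EQUALS", "Opensig"),
]
publication = [("int", "price", 250), ("string", "name", "Opensig")]
print(matches(subscription, publication))
```

An OR of several conjunctions could then be expressed as a list of such subscriptions, any one of which may match.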

6 Carnegie Mellon University Example: Virtual reality. Interests: x ≥ 50, x ≤ 150, y ≥ 150, y ≤ 250. Events: x = 100, y = 200. [Figure: Virtual World containing the User and an Arena; labeled points (100,200), (150,150), (50,250).]

7 Carnegie Mellon University Routing mechanism. Centralized? Easy to make publications “meet” subscriptions; single point of failure – not robust! Distributed? Where are subscriptions stored? How do publications “meet” subscriptions? Broadcast-based solutions are not scalable.

8 Carnegie Mellon University Distributed routing - goals. Scalability is a key goal: flooding anything is bad, bad, bad… System should not have hot-spots in terms of: computational load – matching ⇒ subscriptions should be evenly distributed; number of packets routed or received ⇒ publications should be evenly distributed. Yet – we should have low delivery delays!!

9 Carnegie Mellon University Previous work. Systems like Scribe use DHTs for scalability. Why can’t we? Exact matches vs. range queries! Subscription: int x ≥ 1, int x ≤ 10 → hash. Publication: int x = 9 → hash. Do they meet? How about generating 10 subscriptions? Too many subscriptions; works for discrete-valued attributes only.
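
To illustrate the problem, here is a rough sketch of exact-match hashing. The dht_key function, the 16-node key space, and the SHA-1 choice are all assumptions; they do not describe Scribe or any particular DHT. A range subscription has to be expanded into one hashed key per value before a hashed publication can meet it.

```python
import hashlib


def dht_key(attribute, value, num_nodes=16):
    """Hypothetical DHT placement: hash 'attribute=value' onto one of num_nodes."""
    digest = hashlib.sha1(f"{attribute}={value}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


def expand_range(attribute, low, high):
    """One key per integer in [low, high]; impossible for continuous values."""
    return [dht_key(attribute, v) for v in range(low, high + 1)]


subscription_keys = expand_range("x", 1, 10)  # 10 separate subscriptions
publication_key = dht_key("x", 9)

print(len(subscription_keys))
print(publication_key in subscription_keys)
```

The number of keys grows with the width of the range, and a real-valued attribute has no finite expansion at all.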

10 Carnegie Mellon University Attribute Hubs. Divide the range of an attribute into bins; each node is responsible for a range of attribute values. [Figure: attribute hub H_price with nodes [0, 80), [80, 160), [160, 240), [240, 320).] Hub-nodes are connected through a circular overlay (circle only for connectivity). One hub per attribute. Routing algorithm: compare the value in the content to my range.
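
A small sketch of the binning idea, using the price ranges from the figure. The AttributeHub class and its method names are hypothetical, and the lookup here is a direct table search rather than hop-by-hop routing around the circle.

```python
from bisect import bisect_right


class AttributeHub:
    """Hypothetical hub: node i owns [boundaries[i], boundaries[i + 1])."""

    def __init__(self, attribute, boundaries):
        self.attribute = attribute
        self.boundaries = boundaries  # sorted, e.g. [0, 80, 160, 240, 320]

    def node_for(self, value):
        """Index of the node whose range contains value."""
        if not self.boundaries[0] <= value < self.boundaries[-1]:
            raise ValueError(f"{value} is outside the hub's range")
        return bisect_right(self.boundaries, value) - 1

    def range_of(self, node):
        """Half-open range [low, high) owned by the given node."""
        return (self.boundaries[node], self.boundaries[node + 1])


h_price = AttributeHub("price", [0, 80, 160, 240, 320])
node = h_price.node_for(100)
print(node, h_price.range_of(node))
```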

11 Carnegie Mellon University Routing illustrated. Send subscription to any one attribute hub; send publications to all attribute hubs. [Figure: hub Hx with nodes [0, 80), [80, 160), [160, 240), [240, 320); hub Hy with nodes [0, 105), [105, 210), [210, 320); Subscription: 50 ≤ x ≤ 150, 150 ≤ y ≤ 250; Publication: x = 100, y = 200; Rendezvous point.]
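
The rendezvous in this figure can be sketched as follows, with the hub ranges taken from the slide. Storing the subscription at every node whose range overlaps it is an assumption (the slide only says it is sent to one hub), and subscribe, publish, and the stored table are hypothetical names.

```python
# Hub boundaries from the slide: node i owns [b[i], b[i + 1]).
HUBS = {"x": [0, 80, 160, 240, 320], "y": [0, 105, 210, 320]}
stored = {}  # (hub attribute, node index) -> subscriptions held there


def nodes_overlapping(attr, low, high):
    """Nodes in hub `attr` whose range intersects the closed range [low, high]."""
    b = HUBS[attr]
    return [i for i in range(len(b) - 1) if b[i] <= high and low < b[i + 1]]


def subscribe(sub, hub_attr):
    """Send the subscription to ONE hub (assumed: stored at each overlapping node)."""
    low, high = sub[hub_attr]
    for node in nodes_overlapping(hub_attr, low, high):
        stored.setdefault((hub_attr, node), []).append(sub)


def publish(pub):
    """Send the publication to ALL hubs; return the (hub, node) rendezvous points."""
    rendezvous = []
    for attr, b in HUBS.items():
        node = next(i for i in range(len(b) - 1) if b[i] <= pub[attr] < b[i + 1])
        for sub in stored.get((attr, node), []):
            if all(lo <= pub[a] <= hi for a, (lo, hi) in sub.items()):
                rendezvous.append((attr, node))
    return rendezvous


subscribe({"x": (50, 150), "y": (150, 250)}, hub_attr="x")
print(publish({"x": 100, "y": 200}))
```

Because every publication visits every hub, it is guaranteed to reach whichever single hub holds a matching subscription.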

12 Carnegie Mellon University Efficient routing. Reduce number of hops: each hub-node maintains a “small” number of pointers to distant parts of the hub. How to maintain these pointers? Send ACKs for publication receipts. Various caching policies determine the structure of the pointer table, e.g., LRU, uniform-spacing, exponential-spacing.
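
A rough sketch of why distant pointers reduce hops. It assumes a ring of n nodes and a fixed pointer table with exponentially spaced distances; in the slide the pointers are learned from ACKs and kept by a caching policy, which this sketch does not model. The hops function is hypothetical.

```python
def hops(src, dst, n, pointer_distances):
    """Count hops for greedy clockwise routing on a ring of n nodes.

    pointer_distances must include 1 (the successor link), so a step always exists.
    """
    count = 0
    cur = src
    while cur != dst:
        remaining = (dst - cur) % n
        # Take the longest pointer that does not overshoot the destination.
        step = max(d for d in pointer_distances if d <= remaining)
        cur = (cur + step) % n
        count += 1
    return count


n = 64
successor_only = [1]                      # no cached pointers
exponential = [2 ** k for k in range(6)]  # distances 1, 2, 4, 8, 16, 32

print(hops(0, 63, n, successor_only))
print(hops(0, 63, n, exponential))
```

With only successor links the hop count grows linearly with the ring size, while about log2(n) exponentially spaced pointers keep it logarithmic.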

13 Carnegie Mellon University Routing illustrated. [Figure: Publication x = 100, y = 200 sent to hub Hx with nodes [0, 80), [80, 160), [160, 240), [240, 320) and hub Hy with nodes [0, 105), [105, 210), [210, 320); label: ACK.]

14 Carnegie Mellon University Evaluation Workload Experimental setup Metrics

15 Carnegie Mellon University Workload. One of our target apps: multi-player games. Model: virtual world as a square; subscriptions as rectangles around current positions.

16 Carnegie Mellon University Experimental setup. Player movements simulated using mobility models from ns-2. Two hubs – x and y co-ordinates. Half the nodes in each hub. Uniform partition of range.

17 Carnegie Mellon University Metrics. Scalability metric: load (number of publications routed by a node, averaged over time). Performance metric: publication delivery delay (time between sending of a publication and its receipt by all subscribers; averaged over all subscribers of a publication and over all publications).
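
One way the delivery-delay metric could be computed from a log of send and receipt times. The log layout and the mean_delivery_delay name are assumptions, and the numbers are made up purely to exercise the function; they are not results from the talk.

```python
from statistics import mean


def mean_delivery_delay(deliveries):
    """Average delay over subscribers of each publication, then over publications.

    deliveries maps a publication id to (send_time, [receipt_time, ...]).
    Publications with no receipts are skipped.
    """
    per_publication = [
        mean(receipt - sent for receipt in receipts)
        for sent, receipts in deliveries.values()
        if receipts
    ]
    return mean(per_publication)


# Illustrative values only.
log = {
    "p1": (0.0, [1.0, 3.0]),  # delays 1.0 and 3.0, mean 2.0
    "p2": (10.0, [14.0]),     # delay 4.0
}
print(mean_delivery_delay(log))
```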

18 Carnegie Mellon University Results: scalability. Ideal graph: delta function. Observed variation: ±12%.

19 Carnegie Mellon University Results: performance. Without caching: linear scaling. Caching reduces delays to near optimal. Workload effects? [Plot labels: 47.25; Cache size: log(n).]

20 Carnegie Mellon University Conclusions. Expressive subscription language. Decentralized architecture. Scalability: avoids flooding of subscriptions and publications – reduces network traffic; distributes publications and subscriptions throughout the network – prevents swamping.

21 Carnegie Mellon University Future Work. Load balancing: sensitive to data value distribution; adapt ranges dynamically according to the distribution; affects pointer management, caching, etc. [Plot: Pr(X=x) versus x.]
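
One possible reading of "adapt ranges according to the distribution" is to place node boundaries at quantiles of observed values, so each node sees roughly the same number of publications. This is only an illustrative sketch of that idea; the slide lists it as future work and does not specify a method. The balanced_boundaries function and the sample values are assumptions.

```python
def balanced_boundaries(samples, num_nodes, low, high):
    """Pick range boundaries so each node covers about the same number of samples."""
    ordered = sorted(samples)
    # Interior cut points at the 1/num_nodes, 2/num_nodes, ... quantiles.
    cuts = [ordered[(i * len(ordered)) // num_nodes] for i in range(1, num_nodes)]
    return [low] + cuts + [high]


# Skewed, made-up observations: most values are small.
observed = [1, 2, 3, 4, 5, 6, 7, 8, 100, 200, 300, 310]
print(balanced_boundaries(observed, num_nodes=4, low=0, high=320))
```

With a uniform partition of [0, 320) into four bins, the first node would receive 8 of these 12 values; the quantile boundaries give each node 3.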

22 Carnegie Mellon University Future Work. Perform sensitivity analysis for different kinds of workloads. Generic API for building applications on top of Mercury (to be released soon). Build a full-fledged distributed Quake-II.
