The 30th International Conference on Distributed Computing Systems June 2010, Genoa, Italy Parameterized Maximum and Average Degree Approximation in Topic-based.


1 Parameterized Maximum and Average Degree Approximation in Topic-based Publish-Subscribe Overlay Network Design
Melih Onus, TOBB University of Economics and Technology
Andrea W. Richa, Arizona State University
The 30th International Conference on Distributed Computing Systems, June 2010, Genoa, Italy

2 Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a message bus, with Subscription(N1) = {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}; message M1 is published on topic A via Publish(M1, A).]
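A minimal sketch of topic-based delivery over a centralized message bus, using the subscriptions from the figure. The `publish` function and the dictionary layout are hypothetical illustrations, not the API of any system named in these slides:

```python
# Subscriptions from the figure: node -> set of subscribed topics.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def publish(subscriptions, message, topic):
    """Hypothetical message bus: hand `message` to every subscriber of `topic`."""
    return {node: message for node, topics in subscriptions.items() if topic in topics}


deliveries = publish(SUBSCRIPTIONS, "M1", "A")
print(sorted(deliveries))  # ['N2', 'N3', 'N4', 'N5']
```

Publishing M1 on topic A reaches N2, N3, N4 and N5; N1, which does not subscribe to A, receives nothing.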

3 Scalability of Pub/Sub
Most traditional pub/sub systems are geared towards small-scale deployment
–E.g., Isis MDS, TIB, MQSeries, Gryphon
New generation of applications…
–Large data centers: Amazon, Google, Yahoo, eBay, …
–RSS, feed/news readers, online stock trading and banking
–Web 2.0, Second Life
…drive dramatic growth in scale
–10,000s of nodes, 1000s of topics, Internet-wide distribution
Emerging systems address this trend using P2P techniques

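4 Overlay-Based Pub/Sub
[Figure: the same nodes, N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}, connected by an overlay instead of a bus; message (M1, A) is forwarded through a relay node.]
Examples: SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...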

5 Overlay Topologies for Pub/Sub
A “good” overlay allows efficient and simple publication routing
–Small routing tables, low load on relays
–Low latency
Ideally, the overlay is topic-connected: for each topic, the subgraph induced by the nodes interested in that topic forms a single connected component
–Most existing implementations construct topic-connected overlays
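To illustrate why topic-connectivity keeps routing simple, here is a hedged sketch that floods a publication only between subscribers of its topic, so no uninterested node ever acts as a relay. The ring overlay is a hypothetical example chosen to be topic-connected for the slide 2 subscriptions (it is not taken from the paper), `route` is an illustrative name, and the publisher is assumed to subscribe to the topic:

```python
from collections import deque

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical topic-connected overlay: the ring N1-N2-N4-N5-N3-N1.
OVERLAY = [("N1", "N2"), ("N2", "N4"), ("N4", "N5"), ("N5", "N3"), ("N3", "N1")]


def route(interest, edges, source, topic):
    """Flood from `source`, forwarding only to neighbors that subscribe to `topic`."""
    adjacency = {v: set() for v in interest}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    reached = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if topic in interest[w] and w not in reached:
                reached.add(w)
                queue.append(w)
    return reached


print(sorted(route(INTEREST, OVERLAY, "N2", "A")))  # ['N2', 'N3', 'N4', 'N5']
```

Because every topic's subscribers are connected among themselves, the flood reaches all of them; in an overlay that is not topic-connected, the same rule would miss subscribers unless non-subscribers relayed the message.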

6 Topic-Connectivity
Topics B, C, X, E are connected; topics A and D are disconnected
[Figure: example overlay on N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
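Topic-connectivity can be checked with one breadth-first search per topic, restricted to that topic's subscribers. The edge set below (N1-N2, N2-N4, N4-N5, with N3 isolated) is an assumption chosen to reproduce the slide's outcome, since the figure's actual edges are not recoverable; `disconnected_topics` is an illustrative helper:

```python
from collections import deque

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical overlay edges (assumption; not recovered from the figure).
EDGES = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]


def disconnected_topics(interest, edges):
    """Return the topics whose subscriber-induced subgraph is not connected."""
    adjacency = {v: set() for v in interest}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    result = set()
    for topic in {t for ts in interest.values() for t in ts}:
        members = {v for v in interest if topic in interest[v]}
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        # BFS that only walks through nodes subscribed to `topic`.
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            result.add(topic)
    return result


print(sorted(disconnected_topics(INTEREST, EDGES)))  # ['A', 'D']
```

Topic A fails because N3 has no edge to another A subscriber, and topic D fails because N1 and N3 are not adjacent.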

7 Topic-Connectivity: Simple Solution
[Figure: the same five nodes, N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}, with each topic connected separately]
–Node degree grows linearly with the subscription size
–Roughly twice the subscription size for rings/trees
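A sketch of the simple solution, assuming each topic's subscribers are joined in their own ring (a hypothetical `ring_per_topic` helper; the slides do not fix the ring order, so sorted order is used here):

```python
from collections import Counter

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def ring_per_topic(interest):
    """Baseline: connect the subscribers of every topic in their own ring."""
    edges = set()
    for topic in sorted({t for ts in interest.values() for t in ts}):
        members = sorted(v for v in interest if topic in interest[v])
        if len(members) < 2:
            continue  # a single subscriber is trivially connected
        for i, u in enumerate(members):
            v = members[(i + 1) % len(members)]
            edges.add(tuple(sorted((u, v))))  # undirected, deduplicated across topics
    return edges


edges = ring_per_topic(INTEREST)
degree = Counter(node for edge in edges for node in edge)
print(len(edges), max(degree.values()))  # 8 4
```

On this small instance the union of rings has 8 edges and maximum degree 4 (at N2 and N4), whereas the ring N1-N2-N4-N5-N3-N1 is also topic-connected with maximum degree 2, which is the gap the following slides try to close.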

8 Scalability of the Simple Solution
Negative impact on performance due to
–CPU load: neighbor monitoring, message processing
–Connection maintenance and header overhead
–Memory overhead: per-link state associated with the routing and/or compression schemes in use, etc.
This creates a scalability barrier for large systems offering a wide range of subscription choices
Can we do better?

9 The MinMax-TCO Problem
Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
–Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
–Construct a topic-connected overlay G with the minimum possible maximum degree
TCO (decision version):
–Decide whether there is a topic-connected overlay with maximum degree at most k (for a given k)
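For intuition, MinMax-TCO can be solved exactly on tiny inputs by exhaustive search over edge subsets. This is only an illustrative sketch with hypothetical helper names; its running time is exponential, consistent with the NP-hardness result on the next slides. The `is_topic_connected` check is the kind of polynomial-time verification that places the decision version in NP:

```python
from collections import Counter
from itertools import combinations

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def is_topic_connected(interest, edges):
    """Polynomial-time check: every topic's subscribers form one connected component."""
    for topic in {t for ts in interest.values() for t in ts}:
        parent = {v: v for v in interest if topic in interest[v]}

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        for u, v in edges:
            if u in parent and v in parent:  # only edges inside the topic subgraph count
                parent[find(u)] = find(v)
        if len({find(v) for v in parent}) > 1:
            return False
    return True


def min_max_degree(interest):
    """Exact MinMax-TCO by exhaustive search; exponential, so tiny inputs only."""
    nodes = sorted(interest)
    # Edges between nodes with no common topic never help topic-connectivity.
    useful = [(u, v) for u, v in combinations(nodes, 2) if interest[u] & interest[v]]
    best_degree, best_edges = None, None
    for mask in range(1 << len(useful)):
        chosen = [e for i, e in enumerate(useful) if mask >> i & 1]
        degree = Counter(node for edge in chosen for node in edge)
        d = max(degree.values(), default=0)
        if best_degree is not None and d >= best_degree:
            continue  # cannot improve on the best overlay found so far
        if is_topic_connected(interest, chosen):
            best_degree, best_edges = d, chosen
    return best_degree, best_edges


print(min_max_degree(INTEREST)[0])  # 2
```

Maximum degree 1 is impossible here because the four subscribers of topic A need at least three edges among themselves, while a matching on four nodes has at most two.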

10 GM Algorithm
The GM algorithm can produce an overlay with maximum degree Θ(n) even when an overlay with constant maximum degree exists.
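GM chooses, at each step, the edge with maximum benefit, where the benefit of an edge is the number of topics for which it merges two connected components. The sketch below assumes one union-find forest per topic and breaks ties by the largest (u, v) pair; the slides do not specify tie-breaking, and `greedy_merge` is an illustrative name. This small instance does not exhibit the Θ(n) worst case:

```python
from itertools import combinations

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def greedy_merge(interest):
    """Sketch of GM: repeatedly add the edge that merges components for the most topics."""
    topics = {t for ts in interest.values() for t in ts}
    # One union-find forest per topic, over the nodes interested in that topic.
    parent = {t: {v: v for v in interest if t in interest[v]} for t in topics}

    def find(t, v):
        while parent[t][v] != v:
            parent[t][v] = parent[t][parent[t][v]]  # path halving
            v = parent[t][v]
        return v

    def benefit(u, v):
        return sum(1 for t in interest[u] & interest[v] if find(t, u) != find(t, v))

    edges = set()
    while True:
        # Ties go to the larger (u, v) pair (an assumption).
        w, u, v = max(((benefit(a, b), a, b) for a, b in combinations(sorted(interest), 2)),
                      default=(0, None, None))
        if w == 0:
            break  # no edge merges anything: one component per topic
        edges.add((u, v))
        for t in interest[u] & interest[v]:
            parent[t][find(t, u)] = find(t, v)
    return edges


edges = greedy_merge(INTEREST)
print(sorted(edges))
# [('N1', 'N2'), ('N1', 'N3'), ('N2', 'N4'), ('N3', 'N5'), ('N4', 'N5')]
```

The loop stops only when no pair of co-subscribers lies in different components for any shared topic, which is exactly topic-connectivity.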

11 Complexity of MinMax-TCO
Lemma: MinMax-TCO(V,T,Interest,k) ∈ NP
Proof: Given a candidate overlay, its maximum degree and its topic-connectivity are verifiable in polynomial time
Lemma: MinMax-TCO(V,T,Interest,k) is NP-hard
Proof:
1. Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is at most d
2. Set Cover is polynomially reducible to SN-TCO
3. SN-TCO is polynomially reducible to TCO
Theorem: MinMax-TCO is NP-complete

12 Approximating MinMax-TCO
The idea: exploit subscription overlaps
–Connecting nodes with overlapping interests improves the connectivity of several topics at once
Overlay Design Algorithm (ODA):
–Start from a singleton connected component for each (v, t) ∈ V × T with Interest(v, t) = true
–At each iteration: among the edges that increase the maximum degree the least, add the one that reduces the number of connected components for the largest number of topics
–Stop once there is a single connected component for each topic
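The ODA rule above can be sketched as follows, assuming per-topic union-find forests, counting only edges with positive benefit as candidates, and breaking remaining ties by the first edge in sorted order (the slides leave tie-breaking open; `oda` is an illustrative name):

```python
from itertools import combinations

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def oda(interest):
    """Sketch of ODA: least increase of the maximum degree first, then largest benefit."""
    topics = {t for ts in interest.values() for t in ts}
    # Singleton component for each (v, t) with Interest(v, t) = true.
    parent = {t: {v: v for v in interest if t in interest[v]} for t in topics}

    def find(t, v):
        while parent[t][v] != v:
            parent[t][v] = parent[t][parent[t][v]]  # path halving
            v = parent[t][v]
        return v

    def benefit(u, v):
        return sum(1 for t in interest[u] & interest[v] if find(t, u) != find(t, v))

    degree = {v: 0 for v in interest}
    edges = set()
    while True:
        candidates = [(u, v, benefit(u, v)) for u, v in combinations(sorted(interest), 2)]
        candidates = [c for c in candidates if c[2] > 0]
        if not candidates:
            break  # one connected component per topic
        current = max(degree.values())
        # Maximum degree of the overlay if this edge were added.
        new_max = {(u, v): max(current, degree[u] + 1, degree[v] + 1) for u, v, _ in candidates}
        least = min(new_max.values())
        u, v, _ = max((c for c in candidates if new_max[c[:2]] == least), key=lambda c: c[2])
        edges.add((u, v))
        degree[u] += 1
        degree[v] += 1
        for t in interest[u] & interest[v]:
            parent[t][find(t, u)] = find(t, v)
    return edges


edges = oda(INTEREST)
print(sorted(edges))
# [('N1', 'N2'), ('N1', 'N3'), ('N2', 'N4'), ('N3', 'N5'), ('N4', 'N5')]
```

On this instance ODA and GM happen to produce the same ring; they diverge on inputs such as the lower-bound constructions on slides 10 and 15.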

13 ODA Running Time
O(|V|^4 · |T|)
–At most |V|^2 iterations
–At most |V|^2 edges inspected at each iteration
–At most |T| steps to inspect an edge
Can be optimized to run in O(|V|^2 · |T|)
–For each e ∈ V × V, weight(e) = the number of connected components merged by e
–At each iteration, output the heaviest edge and adjust the other edge weights accordingly
–Stop once there are no more edges with weight > 0
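The unoptimized bound is simply the product of the three per-step costs listed above:

```latex
\underbrace{|V|^2}_{\text{iterations}} \cdot
\underbrace{|V|^2}_{\text{edges per iteration}} \cdot
\underbrace{|T|}_{\text{steps per edge}}
= O\!\left(|V|^4 \cdot |T|\right)
```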

14 Approximability Results
Lemma: The number of edges in the overlay constructed by GM ≤ log(|V| · |T|) · OPT
Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
–Uses maximum weighted matching
–Uses edge coloring
Theorem: No algorithm can approximate MinMax-TCO within a constant factor (unless P = NP)
Proof: Such an algorithm would imply a constant-factor approximation for Set Cover, which is known to be impossible (unless P = NP)

15 ODA Algorithm
The ODA algorithm can produce an overlay with average degree Θ(n) even when an overlay with constant average degree exists.
[Figure: lower-bound instance on nodes v1, v2, v3, …, vn]

16 ODA and GM Algorithms
GM Algorithm: choose the edge with maximum benefit
–Average degree: O(log nt) approximation
–Maximum degree: O(n) approximation
ODA Algorithm: choose the edge with maximum benefit among the edges that increase the maximum degree the least
–Average degree: O(n) approximation
–Maximum degree: O(log nt) approximation
How can we approximate both the average and the maximum degree at the same time?

17 Parameterized Algorithm
e1: edge with maximum benefit
e2: edge with maximum benefit among the edges that increase the maximum degree the least
If w(e2) > w(e1) / k, choose e2; otherwise, choose e1 (1 < k < n)
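A sketch of the parameterized selection rule, computing GM's candidate e1 and ODA's candidate e2 in every round and keeping e2 only when its benefit exceeds w(e1)/k. The helper name `p_oda`, the per-topic union-find, and first-in-sorted-order tie-breaking are assumptions, not details from the slides:

```python
from itertools import combinations

INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def p_oda(interest, k):
    """Sketch of P-ODA: prefer the degree-aware edge unless it is k times worse."""
    topics = {t for ts in interest.values() for t in ts}
    parent = {t: {v: v for v in interest if t in interest[v]} for t in topics}

    def find(t, v):
        while parent[t][v] != v:
            parent[t][v] = parent[t][parent[t][v]]  # path halving
            v = parent[t][v]
        return v

    def benefit(u, v):
        return sum(1 for t in interest[u] & interest[v] if find(t, u) != find(t, v))

    degree = {v: 0 for v in interest}
    edges = set()
    while True:
        candidates = [(u, v, benefit(u, v)) for u, v in combinations(sorted(interest), 2)]
        candidates = [c for c in candidates if c[2] > 0]
        if not candidates:
            break
        current = max(degree.values())
        new_max = {(u, v): max(current, degree[u] + 1, degree[v] + 1) for u, v, _ in candidates}
        least = min(new_max.values())
        e1 = max(candidates, key=lambda c: c[2])  # maximum benefit (GM's choice)
        e2 = max((c for c in candidates if new_max[c[:2]] == least),
                 key=lambda c: c[2])  # ODA's choice
        u, v, _ = e2 if e2[2] > e1[2] / k else e1
        edges.add((u, v))
        degree[u] += 1
        degree[v] += 1
        for t in interest[u] & interest[v]:
            parent[t][find(t, u)] = find(t, v)
    return edges


for k in (1, 2):
    result = p_oda(INTEREST, k)
    print(k, max(sum(n in e for e in result) for n in INTEREST))
# 1 3
# 2 2
```

With k = 1 (outside the slide's range 1 < k < n) the condition can never hold, so the rule always takes e1 and behaves like a max-benefit greedy; on this instance that gives maximum degree 3, while k = 2 already yields maximum degree 2.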

18 Algorithms
GM Algorithm:
–Average degree: O(log nt) approximation
–Maximum degree: O(n) approximation
ODA Algorithm:
–Average degree: O(n) approximation
–Maximum degree: O(log nt) approximation
P-ODA Algorithm:
–Average degree: O(k · log nt) approximation
–Maximum degree: O((n/k) · log nt) approximation
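Reading n as the number of nodes and t as the number of topics (an assumption; the slides write only "log nt"), the two P-ODA bounds trade off through k, and balancing them at k = √n gives the same guarantee for both:

```latex
O(k \log nt) \cdot O\!\left(\tfrac{n}{k} \log nt\right) = O\!\left(n \log^2 nt\right),
\qquad
k = \sqrt{n} \;\Rightarrow\; \text{both bounds are } O\!\left(\sqrt{n}\,\log nt\right)
```

Multiplying k by some factor lowers the maximum-degree bound by that factor and raises the average-degree bound by the same factor.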

19 Constant Diameter Overlays
Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
–Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
–Construct a topic-connected, constant-diameter overlay G with the minimum possible average degree
The GM algorithm can produce an overlay with diameter Θ(n), where n is the number of nodes in the pub/sub system.

20 Constant Diameter Overlay Algorithm
Constant Diameter Overlay Design Algorithm:
–At each iteration: find the number of neighbors of each node; add the star that connects the maximum number of nodes; remove the topics that are connected by the star
–Stop once there is a single connected component for each topic
Number of neighbors of node u:
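The slide's formula for the number of neighbors is not recoverable, so this sketch assumes that the neighbors of u are the other nodes sharing at least one still-unconnected topic with u, and that the star is centered at the node with the most such neighbors. `star_overlay` is a hypothetical name, and ties go to the first node in sorted order:

```python
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def star_overlay(interest):
    """Greedy star cover: each chosen star connects every topic of its center."""
    remaining = {t for ts in interest.values() for t in ts}
    edges = set()
    while remaining:
        def neighbors(u):
            # ASSUMED definition: other nodes sharing a not-yet-connected topic with u.
            return {v for v in interest if v != u and interest[u] & interest[v] & remaining}

        # Only nodes that still subscribe to an unconnected topic are useful centers.
        centers = [u for u in sorted(interest) if interest[u] & remaining]
        center = max(centers, key=lambda u: len(neighbors(u)))
        for v in neighbors(center):
            edges.add(tuple(sorted((center, v))))
        # Every subscriber of these topics is now adjacent to `center`, so each
        # such topic induces a connected subgraph of diameter at most 2.
        remaining -= interest[center]
    return edges


edges = star_overlay(INTEREST)
print(len(edges))  # 6
```

Here the first star is centered at N2 and connects topics A, B, C and E at once; two one-edge stars (N1-N3 for D, N4-N5 for X) finish the overlay.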

21 Constant Diameter Overlay Algorithm I
Constant Diameter Overlay Design Algorithm I:
–At each iteration: find the weight of each node; add the star centered at the node with maximum weight; remove the topics that are connected by the star
–Stop once there is a single connected component for each topic
Weight of node u:

22 Constant Diameter Overlay Algorithm II
Constant Diameter Overlay Design Algorithm II:
–At each iteration: find the number of neighbors of each node; add the star centered at the node with maximum density; remove the topics that are connected by the star
–Stop once there is a single connected component for each topic
Density of node u:

23 Experimental Results I
[Plot: average node degree vs. number of nodes; #topics: 100, #subscriptions: 10, uniform distribution]
Only 2.3 times more edges

24 Experimental Results II
[Plot: average node degree vs. number of topics; #nodes: 100, #subscriptions: 20, uniform distribution]
Only 1.9 times more edges

25 Experimental Results III
[Plot: average node degree vs. number of subscriptions; #nodes: 100, #topics: 100, uniform distribution]
Only 1.8 times more edges

26 Conclusions
–Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
–Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs
–Parameterized algorithm with low maximum and average degree
–Defined the CD-TCO problem; empirical results

27 Future Directions
–Study the dynamic case
–Investigate other overlay design problems
–Study the distributed case
 –Partial knowledge of other nodes' interests
 –Dynamically changing interest assignments
–Prove the diameter results theoretically

28 Thank You!

