Minimum Maximum Degree Publish-Subscribe Overlay Network Design
Melih Onus
TOBB Ekonomi ve Teknoloji Üniversitesi, 28 May 2009
Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a message bus; Publish(M1, A) places message M1 on topic A]
Subscriptions: N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}
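The topic-based delivery on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not any real pub/sub product: the class `MessageBus` and its methods are hypothetical names introduced only for illustration, and the subscriptions are the ones listed on the slide.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-memory topic-based bus (hypothetical, for illustration only)."""

    def __init__(self):
        # topic -> set of node names subscribed to that topic
        self.subscribers = defaultdict(set)

    def subscribe(self, node, topics):
        for topic in topics:
            self.subscribers[topic].add(node)

    def publish(self, message, topic):
        # Return the nodes that would receive `message`, in sorted order.
        return sorted(self.subscribers.get(topic, set()))


# Subscriptions as listed on the slide.
subscriptions = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

bus = MessageBus()
for node, topics in subscriptions.items():
    bus.subscribe(node, topics)

receivers = bus.publish("M1", "A")
print(receivers)
```

Under this model, Publish(M1, A) reaches exactly the nodes whose subscription contains topic A.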
Scalability of Pub/Sub
Most traditional pub/sub systems are geared towards small-scale deployment
–E.g., Isis MDS, TIB, MQSeries, Gryphon
New generation of applications…
–Large data centers: Amazon, Google, Yahoo, eBay, …
–RSS, feed/news readers, on-line stock trading and banking
–Web 2.0, Second Life
…drive dramatic growth in scale
–10,000s of nodes, 1000s of topics, Internet-wide distribution
Emerging systems address this trend using P2P techniques
Overlay-Based Pub/Sub
[Figure: overlay over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}; message (M1, A) is forwarded through a relay node]
Example systems: SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA...
Overlay Topologies for Pub/Sub
A “good” overlay allows for efficient and simple publication routing
–Small routing tables, low load on relays, low latency
Ideally, the overlay is topic-connected, i.e., each topic-induced subgraph forms a single connected component
–Most existing implementations construct topic-connected overlays
Topic-Connectivity
[Figure: example overlay over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topics B, C, X, E are connected
Topics A and D are disconnected
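A minimal sketch of how topic-connectivity can be checked: for each topic, run a breadth-first search restricted to that topic's subscribers. The edges of the slide's figure are not recoverable from the text, so `example_edges` below is a hypothetical edge set chosen only because it reproduces the stated outcome (B, C, X, E connected; A and D disconnected). The function name `connected_topics` is likewise an assumption.

```python
from collections import deque


def connected_topics(interest, edges):
    """Return the topics whose subscribers induce a connected subgraph."""
    adj = {v: set() for v in interest}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    result = set()
    for topic in set().union(*interest.values()):
        members = {v for v in interest if topic in interest[v]}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Only follow overlay edges between subscribers of this topic.
            for w in adj[u] & members:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen == members:
            result.add(topic)
    return result


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

# Hypothetical edges (the figure's actual edges are not in the text).
example_edges = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]

print(sorted(connected_topics(interest, example_edges)))
```

With this edge set N3 has no neighbors, so the two topics it subscribes to (A and D) are the ones left disconnected.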
Topic-Connectivity: Simple Solution
[Figure: overlay over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Node degree grows linearly with the subscription size
–Roughly twice as big as the subscription size for rings/trees
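To make the degree growth concrete, here is a small sketch of one simple construction, a ring per topic, applied to the running example. The member ordering inside each ring (sorted by node name) and the merging of duplicate edges are assumptions of this sketch; the slide does not specify them.

```python
def ring_per_topic(interest):
    """Connect the subscribers of every topic in a ring (sorted by name)."""
    edges = set()
    for topic in sorted(set().union(*interest.values())):
        members = sorted(v for v in interest if topic in interest[v])
        if len(members) < 2:
            continue  # a single subscriber is trivially connected
        for i, u in enumerate(members):
            v = members[(i + 1) % len(members)]
            # frozenset merges duplicates such as (N1, N2) and (N2, N1).
            edges.add(frozenset((u, v)))
    return edges


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

ring_edges = ring_per_topic(interest)
ring_degree = {v: sum(1 for e in ring_edges if v in e) for v in interest}
print(ring_degree)
```

Even with shared edges merged, the nodes with the largest subscriptions end up with the highest degree in this construction.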
Scalability of the Simple Solution
Negative impact on performance due to
–CPU load: neighbor monitoring, message processing
–Connection maintenance and header overhead
–Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?
The MinMax-TCO Problem
Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
–For a set of nodes $V$, a set of topics $T$, and $\mathit{Interest}: V \times T \to \{\mathit{true}, \mathit{false}\}$
–Construct a topic-connected overlay $G$ with the minimum possible maximum degree
TCO (decision version):
–Decide whether there is a topic-connected overlay with maximum degree at most $k$ (for a given $k$)
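The problem statement can also be written as an optimization problem. The notation $V_t$ (subscribers of topic $t$), $E$ (overlay edge set), and $G[V_t]$ (subgraph induced by $V_t$) is introduced here for illustration and does not appear on the slides.

```latex
V_t = \{\, v \in V : \mathit{Interest}(v, t) = \mathit{true} \,\}, \qquad t \in T

\min_{E \subseteq V \times V} \; \max_{v \in V} \deg_{G}(v)
\quad \text{subject to} \quad
G[V_t] \text{ is connected for every } t \in T,
\qquad G = (V, E)
```

The decision version TCO then asks whether the optimum of this program is at most $k$.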
GM Algorithm
The GM algorithm can produce an overlay with maximum degree $\Theta(n)$ ($n$ = number of nodes), even when an overlay with constant maximum degree exists.
Complexity of TCO
Lemma: TCO($V$, $T$, $\mathit{Interest}$, $k$) $\in$ NP
Proof: Topic-connectivity is verifiable in polynomial time
Lemma: TCO($V$, $T$, $\mathit{Interest}$, $k$) is NP-hard
Proof:
1. Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is at most $d$
2. Set Cover is polynomially reducible to SN-TCO
3. SN-TCO is polynomially reducible to TCO
Theorem: TCO is NP-complete
Approximating MinMax-TCO
The idea: exploiting subscription overlaps
–Connecting nodes with overlapping interests improves the connectivity of several topics at once
Overlay Design Algorithm (ODA):
–Start from a singleton connected component for each $(v, t) \in V \times T$
–At each iteration: among the edges that increase the maximum degree the least, add the one that reduces the number of connected components for the largest number of topics
–Stop once there is a single connected component for each topic
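The iteration rule above can be sketched as follows. This is one possible reading of the slide, not a reference implementation: the function name `greedy_overlay`, the per-topic union-find bookkeeping, and the tie-breaking (first candidate in sorted pair order) are all assumptions, so the order in which edges are added may differ from the order shown in the slides' example.

```python
from itertools import combinations


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def greedy_overlay(interest):
    nodes = sorted(interest)
    topics = set().union(*interest.values())
    # One union-find structure per topic, over that topic's subscribers.
    parent = {t: {v: v for v in nodes if t in interest[v]} for t in topics}
    degree = {v: 0 for v in nodes}
    edges = []

    def gain(u, v):
        # Number of topics whose component count drops if (u, v) is added.
        return sum(
            1
            for t in interest[u] & interest[v]
            if find(parent[t], u) != find(parent[t], v)
        )

    while True:
        max_deg = max(degree.values())
        best = None
        for u, v in combinations(nodes, 2):
            g = gain(u, v)
            if g == 0:
                continue
            new_max = max(max_deg, degree[u] + 1, degree[v] + 1)
            key = (new_max, -g)  # smallest max degree first, then largest gain
            if best is None or key < best[0]:
                best = (key, u, v)
        if best is None:
            break  # every topic is a single component
        _, u, v = best
        for t in interest[u] & interest[v]:
            ru, rv = find(parent[t], u), find(parent[t], v)
            if ru != rv:
                parent[t][ru] = rv
        degree[u] += 1
        degree[v] += 1
        edges.append((u, v))
    return edges, degree


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

overlay_edges, overlay_degree = greedy_overlay(interest)
print(overlay_edges, max(overlay_degree.values()))
```

This sketch rescans all node pairs in every iteration, which corresponds to the unoptimized variant; it is meant to show the selection rule, not to be efficient.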
Greedy Merge (example run)
Nodes: N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}
Number of connected components per topic after each step:
Step         A  B  C  D  X  E
0 (initial)  4  3  2  2  2  1
1            3  2  2  2  2  1
2            3  2  2  1  2  1
3            3  1  1  1  2  1
4            2  1  1  1  1  1
5            1  1  1  1  1  1
Maximum degree of 2 vs. almost 4 for ring-per-topic!
ODA Running Time
$O(|V|^4 |T|)$
–At most $|V|^2$ iterations
–At most $|V|^2$ edges inspected at each iteration
–At most $|T|$ steps to inspect an edge
Can be optimized to run in $O(|V|^2 |T|)$
–For each $e \in V \times V$, weight($e$) = the number of connected components merged by $e$
–At each iteration, output the heaviest edge and adjust the other edge weights accordingly
–Stop once there are no more edges with weight > 0
Approximability Results
Lemma: The number of edges in the overlay constructed by GM $\le \log(|V||T|) \cdot \mathit{OPT}$
Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
–Uses Maximum Weighted Matching
–Uses Edge Coloring
Theorem: No polynomial-time algorithm can approximate MinMax-TCO within a constant factor (unless P=NP)
Proof: The existence of such an algorithm would imply a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)
Experimental Results I
[Plot: maximum node degree; #topics: 100, #subscriptions: 10, uniform distribution]
Experimental Results II
[Plot: average node degree; #topics: 100, #subscriptions: 10, uniform distribution]
Experimental Results III
[Plot: maximum node degree; #topics: 100, #nodes: 100, uniform distribution]
Constant Diameter Overlays
Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
–For a set of nodes $V$, a set of topics $T$, and $\mathit{Interest}: V \times T \to \{\mathit{true}, \mathit{false}\}$
–Construct a topic-connected, constant-diameter overlay $G$ with the minimum possible average degree
The GM algorithm can produce an overlay with diameter $\Theta(n)$, where $n$ is the number of nodes in the pub/sub system.
Constant Diameter Overlay Algorithm
The idea: adding stars
–Make topics connected with star structures
Constant Diameter Overlay Design Algorithm:
–Start from a singleton connected component for each $(v, t) \in V \times T$
–At each iteration: add a star that connects the maximum number of nodes, then remove the topics that the star connects
–Stop once there is a single connected component for each topic
Number of neighbors of node u:
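A minimal sketch of the star-based idea, under stated assumptions: the center of each star is the node that shares a still-unconnected topic with the most other nodes, the star links the center to all of those nodes, and every topic of the center is then removed. The function name `star_overlay`, this exact selection rule, and the tie-breaking by node name are assumptions; the slide gives only the outline.

```python
def star_overlay(interest):
    """Greedy star construction; each topic ends up as a star (diameter <= 2)."""
    remaining = set().union(*interest.values())  # topics not yet connected
    edges = set()
    while remaining:
        def covered(u):
            # Nodes sharing at least one still-unconnected topic with u.
            return {
                v for v in interest
                if v != u and interest[v] & interest[u] & remaining
            }

        # Only nodes that still subscribe to an unconnected topic can be centers.
        candidates = [u for u in sorted(interest) if interest[u] & remaining]
        center = max(candidates, key=lambda u: len(covered(u)))
        for leaf in covered(center):
            edges.add(frozenset((center, leaf)))
        # Every remaining topic of the center is now a star around it.
        remaining -= interest[center]
    return edges


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

star_edges = star_overlay(interest)
print(sorted(tuple(sorted(e)) for e in star_edges))
```

In this sketch each topic's subscribers are all adjacent to one common subscriber, so any two subscribers of a topic are at most two hops apart within that topic; the price is a higher degree at the star centers.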
Experimental Results
[Plot: average node degree; #topics: 100, #nodes: 100, uniform distribution]
Only 2.3 times more edges
Conclusions
Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
Defined the problem (MinMax-TCO) capturing the cost of constructing topic-connected overlays
–NP-completeness, polynomial approximation, inapproximability results
Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs
Defined the problem (CD-TCO), empirical results
Future Directions
Study dynamic case
Investigate other overlay design problems
Study distributed case
–Partial knowledge of other nodes' interests
–Dynamically changing interest assignments
Thank You!