Storing and Replication in Topic-Based Pub/Sub Networks

V. Sourlas, P. Flegkas, G. S. Paschos, D. Katsaros and L. Tassiulas
Department of Computer & Communication Engineering, University of Thessaly, Greece; CERTH-ITI
vsourlas, pflegkas, gpasxos, dkatsar
(Globecom 2010)

Intro (1)
Clients: publish and subscribe to classes of events they are interested in.
Broker (event dispatcher): collects subscriptions and forwards events to subscribers.
Clients use filters that allow sophisticated matching on the event content.
Messages are guaranteed to reach all interested "active" clients.

Intro (2)
Dynamic distributed environment.
Clients may join the network after the publication time of a message they are interested in.
Existing pub/sub architectures (IBM Gryphon, Siena, REDS) do not provide historic data retrieval.
Storing is one of the most challenging problems in pub/sub.

Contribution
Enhance pub/sub with an advertisement mechanism and a request/response mechanism.
Propose a new algorithm for selecting M storage points among the N brokers (M < N), based on a) the locality of interest, b) the targeted "replication degree" of each topic and c) the storage capacity "SC" of each storage.
Evaluate, through simulations, the storing technique and the new placement and replication algorithm.

Objective
Minimize the clients' response latency, subject to installing the minimum number of storages in the network.

Related work
No previous work on storing in pub/sub networks, only a couple of caching schemes for historic data retrieval.
The placement problem has been thoroughly investigated in the context of CDNs and Web proxies.
The placement problem is NP-hard when striving for optimality.
Several approximate solutions exist, e.g. for the k-median problem.

Advertise and Store

Request and Response

Placement/Replication strategy
Let r_i be the traffic (in requests/sec) from clients attached to node i.
Let P_ij be the percentage of the overall traffic accessing the target server j that passes through node i.
Let D_ij be the propagation delay (in hops) from node i to the target server j.
If a storage is placed at node i, the Gain is defined as G_ij = P_ij · D_ij: the P_ij share of the traffic no longer needs to traverse the distance from node i to server j.
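As a small illustration of the Gain definition, the sketch below evaluates G_ij = P_ij · D_ij for one fixed server j and picks the node with the largest Gain. The function names, the three-node line topology and the traffic shares are hypothetical, and P_ij is expressed as a fraction rather than a percentage.

```python
def gain(traffic_share, hops):
    """G_ij = P_ij * D_ij for one candidate node i and a fixed server j."""
    return traffic_share * hops


def best_single_storage(p_to_server, d_to_server):
    """Return the candidate node with the largest Gain.

    p_to_server[i] is P_ij (as a fraction), d_to_server[i] is D_ij in hops.
    The server itself is assumed not to be among the candidates.
    """
    return max(p_to_server, key=lambda i: gain(p_to_server[i], d_to_server[i]))


# Hypothetical line topology: server 0 -- node 1 -- node 2.
# Node 1 carries its own traffic plus the transit traffic of node 2.
p_to_server = {1: 0.7, 2: 0.5}
d_to_server = {1: 1, 2: 2}

best = best_single_storage(p_to_server, d_to_server)
print(best, gain(p_to_server[best], d_to_server[best]))
```

Node 2 wins here even though less traffic passes through it, because that traffic is spared two hops instead of one.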

Greedy algorithm
1st round: evaluates each of the N nodes to determine its suitability to become a storage. Computes the Gain associated with each node and selects the one that maximizes the Gain.
2nd round: searches for a second storage which, in conjunction with the storage already picked, yields the highest Gain.
Completion: iterates until k storages have been chosen for the specific server.
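The rounds above can be sketched as follows. This is a minimal sketch under a stated assumption: the Gain of adding a node is measured as the reduction in total traffic-weighted hop count when every node fetches from the nearest of the server and the storages chosen so far. The slides define the Gain as P_ij · D_ij, so this cost model, the function names and the example topology are illustrative only.

```python
def total_cost(dist, rates, server, storages):
    """Traffic-weighted hops when each node uses its nearest source."""
    sources = [server, *storages]
    return sum(rate * min(dist[i][s] for s in sources)
               for i, rate in enumerate(rates))


def greedy_placement(dist, rates, server, k):
    """Pick up to k storages for one server, one per round."""
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(rates))
                      if i != server and i not in chosen]
        if not candidates:
            break
        base = total_cost(dist, rates, server, chosen)
        # Gain of a candidate: cost saved when it joins the chosen set.
        best = max(candidates,
                   key=lambda i: base - total_cost(dist, rates, server,
                                                   chosen + [i]))
        chosen.append(best)
    return chosen


# Hypothetical line topology 0-1-2-3-4 with hop distance |i - j|.
n = 5
dist = [[abs(i - j) for j in range(n)] for i in range(n)]
rates = [1, 1, 4, 1, 6]  # requests/sec generated at each node (made up)

print(greedy_placement(dist, rates, server=0, k=2))
```

Each round re-evaluates the remaining candidates against the storages already picked, which is what makes the second choice depend on the first.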

Modified Greedy algorithm
In pub/sub there is no knowledge of the location of the server; put differently, there is no server at all.
Repeat the Greedy algorithm N times (each time server j is a different node of the network).
This yields N vectors of k possible storages ([ ], N=5, k=2, store at nodes 3 and 5).
Choose as storages the k nodes that appear most often in the per-element summation of the N vectors.
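The voting step can be sketched as below. The sketch takes the per-server Greedy run as a callable (`greedy_for_server`, a hypothetical name) and only shows the summation and selection. Breaking ties in favor of the lower node index is an assumption, since the slides do not say how ties are resolved. Nodes are 0-indexed and the per-server results are made up.

```python
def modified_greedy(n, k, greedy_for_server):
    """Run Greedy once per candidate server and keep the k most-voted nodes.

    greedy_for_server(j) must return the storages Greedy picks when node j
    plays the role of the server.
    """
    votes = [0] * n
    for j in range(n):
        for node in greedy_for_server(j):
            votes[node] += 1  # per-element summation of the N vectors
    ranked = sorted(range(n), key=lambda i: (-votes[i], i))
    return sorted(ranked[:k]), votes


# Hypothetical Greedy outcomes for N=5, k=2 (0-indexed nodes).
per_server = {0: [2, 4], 1: [2, 4], 2: [4, 1], 3: [2, 4], 4: [2, 0]}

storages, votes = modified_greedy(5, 2, lambda j: per_server[j])
print(storages, votes)
```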

Placement algorithm for pub/sub networks (1)
Parameters used:
r_t,i : request rate for topic t at broker i
N : number of nodes (brokers) in the network
M : (M < N) number of storages in the network
k : (k ≤ M) replication degree of each topic in the network
SC : storage capacity of each storage point in the network
T : number of classes of content (topics)
w_t : weight of each topic in the network
SBV : storage brokers vector
PS_t : possible stores vector for each topic t

Placement algorithm for pub/sub networks (2)
Steps:
1. For each topic t, execute the modified greedy algorithm; this gives T vectors of possible storages PS_t.
2. Weight each vector PS_t by w_t (the significance of topic t with respect to the traffic in the network).
3. Select as storages the M nodes that appear most often in the per-element weighted summation of the T vectors (the SBV vector).
4. For each topic t, starting from the most significant (based on the weight), assign k storages following the procedure below (generalized assignment problem → NP-complete knapsack problem):
   For each entry in the PS_t of topic t calculated in step 1, assign a storage if that entry also appears in the SBV calculated in step 3 and fewer than SC topics have been assigned to that storage so far, until k storages have been assigned (the replication degree of topic t).
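Steps 2 to 4 can be sketched as follows, taking the PS_t vectors from step 1 as input. Representing each PS_t as a list of node ids in preference order is an assumption, as is the tie-break on node id. The input values are hypothetical and chosen only for illustration. A topic that cannot find k eligible entries simply ends up with fewer replicas in this sketch; the slides do not describe a fallback.

```python
def place_topics(ps, weights, m, k, sc):
    """Select M storages (SBV) and assign up to k of them to each topic."""
    # Steps 2-3: per-element weighted summation, then keep the top M nodes.
    scores = {}
    for topic, nodes in ps.items():
        for node in nodes:
            scores[node] = scores.get(node, 0.0) + weights[topic]
    sbv = sorted(scores, key=lambda node: (-scores[node], node))[:m]

    # Step 4: serve topics in decreasing weight, respecting the capacity SC.
    load = {node: 0 for node in sbv}
    assignment = {}
    for topic in sorted(ps, key=lambda t: -weights[t]):
        chosen = []
        for node in ps[topic]:
            if len(chosen) == k:
                break
            if node in load and load[node] < sc:
                chosen.append(node)
                load[node] += 1
        assignment[topic] = chosen
    return sbv, assignment


# Hypothetical inputs: three topics, node ids are 1-indexed.
ps = {"a": [2, 3], "b": [3, 5], "c": [2, 5]}
weights = {"a": 0.34, "b": 0.54, "c": 0.12}

sbv, assignment = place_topics(ps, weights, m=3, k=2, sc=2)
print(sbv)
print(assignment)
```

Processing topics in weight order means the heaviest topic gets first pick of the limited capacity, which is the differentiation among topics that the algorithm is after.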

Placement algorithm for pub/sub networks - example
N=6, k=2, SC=2 and T=3 → M=3
Step 1: PS_a=[ ], PS_b=[ ], PS_c=[ ]
Step 2: w_a=17/50=0.34, w_b=27/50=0.54, w_c=6/50=0.12; PS_a=[ ], PS_b=[ ], PS_c=[ ]
Step 3: per-element sum [ ], SBV=[3 5 2]
Step 4: b in [3 5], a in [2 3], c in [2 5] (assigned in order of weight w)
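The arithmetic of the example can be checked with a few lines. The per-topic totals 17, 27 and 6 are read off the weight fractions above. Deriving M as the smallest number of storages able to hold T·k topic replicas at SC topics each is an assumption that happens to agree with M=3 here; the slides only state the result.

```python
# Per-topic request totals taken from the fractions in step 2.
requests = {"a": 17, "b": 27, "c": 6}
total = sum(requests.values())

# w_t: share of the overall request traffic that belongs to topic t.
weights = {topic: count / total for topic, count in requests.items()}

# Order in which step 4 serves the topics (heaviest first).
order = sorted(weights, key=lambda t: -weights[t])

# Assumed relation: M = ceil(T * k / SC), using integer ceiling division.
T, k, SC = 3, 2, 2
M = -(-T * k // SC)

print(weights, order, M)
```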

Performance Evaluation (1)
Compare "pub/sub" to:
"grd_opt": each topic is assigned to the k storages produced by the first step of the placement algorithm.
"rnd": no differentiation among topics; random assignment after the selection of the storages.
Metric: mean hop distance between the requesting client and the storage (indicative of the response latency).
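A minimal sketch of the metric follows, assuming each request is answered by the nearest storage that holds the requested topic. That serving rule, the function name and the toy data are assumptions made for illustration, not the simulation setup of the evaluation.

```python
def mean_hop_distance(dist, requests, assignment):
    """Average hops between each requesting node and its nearest storage.

    requests is a list of (client_node, topic) pairs; assignment maps a
    topic to the list of storages holding it.
    """
    total = sum(min(dist[node][s] for s in assignment[topic])
                for node, topic in requests)
    return total / len(requests)


# Hypothetical line topology 0-1-2-3 with hop distance |i - j|.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
assignment = {"a": [0, 3], "b": [1]}
requests = [(2, "a"), (3, "b"), (0, "a"), (1, "a")]

print(mean_hop_distance(dist, requests, assignment))
```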

Performance Evaluation (3)

Conclusion and future work
Put forward a new mechanism for storing in topic-based pub/sub networks.
Presented a new placement and replication algorithm that differentiates classes of content (1%-5% worse than greedy, using 50%-80% fewer storages).
Future work: optimize different objectives; dynamic assignment when request rates change.

Thank you!
Questions and suggestions?
