A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in Enterprise Networks Danny Bickson, Ezra N. Hoch, Nir Naaman and Yoav Tock.

IBM Haifa Research Lab, Israel

Outline
- Motivation
- The channelization problem
- Our hybrid approach
- Experimental results
- Conclusions

Motivation: large-scale publish-subscribe application
- Large number of information flows (topics) and subscribers
- Each flow must be delivered to a subset of interested subscribers
- Example: financial market data dissemination
  - Publisher divides the data feed into a large number of information flows (~100K), e.g. stock symbols, futures, commodities
  - Many stand-alone subscribers (~1K)
  - Subscribers display interest heterogeneity: they are interested in different yet overlapping subsets of the topics
  - Any single topic may be delivered to a large number of subscribers (hot / cold topics)
[Figure labels: Publisher (data vendor), WAN, enterprise LAN, subscribers, multiple information flows (topics)]

Common approaches
- Use unicast (point-to-point) connections
  - Limitations: poor utilization of network resources (duplicate transmissions)
- Use broadcast (single multicast channel)
  - Limitations: receivers must filter unwanted content
- Utilize multicast to transmit data
  - Topics are mapped into multicast groups. Each user joins the groups that cover his topic-interest.
  - Reduces receiver filtering
  - Limitations: limited number of multicast addresses
    - Network element state problem
    - Receiver resources (NICs)

Our novel contribution
- Create a hybrid approach that combines both multicast and unicast
  - Flexible allocation of transmissions
  - Topics with high interest enjoy the efficiency of multicast
  - Topics with low interest are transmitted in unicast
- Formalize as an optimization problem
- Propose a two-step alternating method for computing the resource allocation

The Channelization Problem
- n flows
- Flow rates λ
- k multicast groups
- m users
- Interest matrix W
The task: find mapping matrices X, Y that minimize the communication cost
- The cost of transmission: takes into account transmission to multiple groups
- The cost of reception: minimize excess filtering
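One plausible way to write the task down, offered as a sketch and not as the authors' exact cost function. Assumed here, not stated on the slide: W is an m×n 0/1 matrix with w_{ji} = 1 when user j wants flow i, X is an n×k 0/1 matrix (flow i is carried by group g), Y is an m×k 0/1 matrix (user j joins group g), and α, β are hypothetical weights for the two cost terms.

```latex
\begin{aligned}
\min_{X,\,Y}\quad
  & \alpha \sum_{i=1}^{n}\sum_{g=1}^{k} \lambda_i\, x_{ig}
    \;+\;
    \beta \sum_{j=1}^{m}\sum_{g=1}^{k} y_{jg} \sum_{i=1}^{n} \lambda_i\, x_{ig}\,(1 - w_{ji}) \\
\text{s.t.}\quad
  & w_{ji} \;\le\; \sum_{g=1}^{k} y_{jg}\, x_{ig} \qquad \forall\, j,\, i, \\
  & x_{ig},\; y_{jg} \in \{0, 1\}.
\end{aligned}
```

The first term charges a flow once for every group that carries it, the second term is the rate of unwanted traffic each user has to filter, and the constraint says every wanted flow must reach its user through at least one joined group.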

The Hybrid Channelization Problem
[Figure: flows F1..Fn, multicast groups G1..Gk and users U1..Um, each user with a set of wanted flows; interest extraction produces W]
- X: flow-to-group map
- Y: user subscription map
- T: unicast transmission map

The Hybrid Channelization Problem
- Modified cost function, with three labelled terms:
  - Cost of multicast reception
  - Cost of multicast transmission
  - Cost of unicast reception & transmission
- Problem objective is [equation not preserved in the transcript]
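The three cost terms can be illustrated with a small Python sketch. It is not the authors' cost function: the data layout (`interest`, `group_of`, `unicast`), the rule that a user joins every group carrying a wanted flow, and the unit weights (including charging unicast once for sending and once for receiving) are all assumptions made for illustration.

```python
def hybrid_cost(interest, rate, group_of, unicast):
    """Return (multicast reception, multicast transmission, unicast) costs.

    interest: dict user -> set of wanted flows (plays the role of W)
    rate:     dict flow -> rate (lambda)
    group_of: dict flow -> group id, for flows carried by multicast (X)
    unicast:  set of (user, flow) pairs sent point-to-point (T)
    """
    members = {}
    for flow, group in group_of.items():
        members.setdefault(group, set()).add(flow)

    mc_reception = 0.0
    for user, wanted in interest.items():
        # Flows this user still has to get over multicast.
        via_mc = {f for f in wanted if (user, f) not in unicast}
        # The user joins every group that carries one of them (Y).
        joined = {group_of[f] for f in via_mc}
        # Everything on a joined group arrives, wanted or not.
        mc_reception += sum(rate[f] for g in joined for f in members[g])

    # Each multicast flow is transmitted once.
    mc_transmission = sum(rate[f] for f in group_of)
    # Assumed weighting: one unit to send and one to receive each copy.
    uc_cost = sum(2 * rate[f] for _, f in unicast)
    return mc_reception, mc_transmission, uc_cost


interest = {"a": {"x", "y"}, "b": {"x"}, "c": {"x"}}
rate = {"x": 10, "y": 1}
multicast_only = hybrid_cost(interest, rate, {"x": 0, "y": 0}, set())
hybrid = hybrid_cost(interest, rate, {"x": 0}, {("a", "y")})
print(sum(multicast_only), sum(hybrid))
```

In this toy case the cold flow `y` has a single subscriber, so sending it by unicast lowers the total from 44 to 42: users `b` and `c` no longer filter it.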

Proposed Solution
- Unfortunately, the hybrid problem is NP-hard
- We propose a two-step heuristic solution
  - First step: solve the channelization problem (multicast mapping)
  - Second step:
    - Choose flow-user pairs for unicast
    - Remove redundant assignments from the multicast mapping
    - Recalculate the cost
  - Iterate until convergence or until the unicast BW limit is exceeded
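The "remove redundant assignments" part of the second step could look like the sketch below. The function name `prune_multicast` and the dictionary layout are hypothetical; the slide does not say how the authors implement this step.

```python
def prune_multicast(interest, group_of, unicast):
    """Drop multicast assignments made redundant by unicast pairs.

    interest: dict user -> set of wanted flows
    group_of: dict flow -> multicast group id
    unicast:  set of (user, flow) pairs already moved to unicast

    Returns (live_group_of, joined): the flows that still need a multicast
    group, and the groups each user still has to join.
    """
    joined = {}
    live = set()
    for user, wanted in interest.items():
        # Flows the user still receives over multicast.
        via_mc = {f for f in wanted if (user, f) not in unicast}
        live |= via_mc
        joined[user] = {group_of[f] for f in via_mc}
    # A flow with no remaining multicast audience leaves its group.
    live_group_of = {f: g for f, g in group_of.items() if f in live}
    return live_group_of, joined
```

After pruning, the cost is recalculated on the smaller mapping. A group left empty could be reused in the next multicast-mapping pass.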

First step: channelization problem solution
- We have experimented with the following algorithms
- K-Means (2005) performs best

K-Means Mapping Algorithm
- Input
  - Interest matrix, topic rate vector
- Basic insight
  - Put “similar” topics in the same group
  - “Similar” topics have a similar audience, which causes less filtering
  - Take the rate into account
- Iterative Clustering Algorithm (K-means)
  - Init: topics are assigned into a fixed number of groups
  - Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
  - Cost: after each epoch, compute the total filtering cost
  - Stop: cost doesn't improve | time elapsed | max # iter.
[Figure: interest matrix over users and topics, showing a user's interest vector and a topic's audience vector; rate vector R1 … RK; topics T1 to T9 split into groups, with T5 as the topic being placed]
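The Init / Move / Cost / Stop loop can be sketched in a few lines of Python. This is a simplified local search in the spirit of the slide, not the K-Means (2005) algorithm itself: the round-robin initialization, the exact filtering-cost definition and the names `filtering_cost` and `cluster_topics` are assumptions, and the time-based stop rule is left out.

```python
def filtering_cost(group, audience, rate):
    """Unwanted traffic when the topics in `group` share one multicast group."""
    listeners = set().union(*(audience[t] for t in group))
    # Every listener who does not want topic t still receives it at rate[t].
    return sum(rate[t] * len(listeners - audience[t]) for t in group)


def cluster_topics(audience, rate, k, max_epochs=20):
    topics = sorted(audience)
    # Init: round-robin assignment into a fixed number of groups.
    groups = [set(topics[i::k]) for i in range(k)]

    def total():
        return sum(filtering_cost(g, audience, rate) for g in groups)

    best = total()
    for _ in range(max_epochs):
        for t in topics:
            # Move: take the topic out, then put it where it adds least cost.
            for g in groups:
                g.discard(t)
            target = min(
                groups,
                key=lambda g: filtering_cost(g | {t}, audience, rate)
                - filtering_cost(g, audience, rate),
            )
            target.add(t)
        # Cost: total filtering cost after the epoch.
        cost = total()
        # Stop: no improvement (or max_epochs reached).
        if cost >= best:
            break
        best = cost
    return groups, best


audience = {"t1": {"u1", "u2"}, "t2": {"u1", "u2"}, "t3": {"u3"}, "t4": {"u3"}}
rate = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
groups, cost = cluster_topics(audience, rate, k=2)
```

On this toy input the topics with identical audiences end up together and the filtering cost drops to zero. Because each move may return the topic to its original group, the total cost never increases from one step to the next.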

Second step: choosing user-flow pairs for unicast
- Experimented with several heuristics
  - Heavy users: all transmission to a specific heavy user is sent using unicast
  - Lightweight flows: flows with low bandwidth are sent using unicast
  - Greedy flows: move to unicast the flow which best minimizes the total cost
  - Greedy users: move to unicast the user which best minimizes the total cost
- An additional heuristic
  - Greedy user-flow pairs: move to unicast the user-flow pair which best minimizes the total cost; very slow, impractical run-time
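As an illustration of the greedy-flows idea, the sketch below repeatedly moves to unicast the flow that lowers the total cost most, while staying within a unicast bandwidth limit. The cost model (unit weights, unicast charged once for sending and once for receiving), the data layout and the names `total_cost`, `unicast_bw` and `greedy_flows` are assumptions for illustration, not the authors' implementation.

```python
def total_cost(interest, rate, group_of, unicast_flows):
    """Total cost when every flow in `unicast_flows` is sent per user by unicast."""
    members = {}
    for f, g in group_of.items():
        if f not in unicast_flows:
            members.setdefault(g, set()).add(f)
    cost = 0.0
    for wanted in interest.values():
        joined = {group_of[f] for f in wanted if f not in unicast_flows}
        # Multicast reception: all traffic on the joined groups.
        cost += sum(rate[f] for g in joined for f in members[g])
        # Unicast transmission plus reception of each dedicated copy.
        cost += 2 * sum(rate[f] for f in wanted if f in unicast_flows)
    # Multicast transmission: each remaining flow is sent once.
    cost += sum(rate[f] for flows in members.values() for f in flows)
    return cost


def unicast_bw(interest, rate, unicast_flows):
    """Bandwidth the publisher spends on unicast copies."""
    return sum(rate[f] for wanted in interest.values()
               for f in wanted if f in unicast_flows)


def greedy_flows(interest, rate, group_of, bw_limit):
    chosen = set()
    best = total_cost(interest, rate, group_of, chosen)
    while True:
        candidates = []
        for f in group_of:
            if f in chosen:
                continue
            trial = chosen | {f}
            if unicast_bw(interest, rate, trial) <= bw_limit:
                candidates.append((total_cost(interest, rate, group_of, trial), f))
        if not candidates:
            break
        cost, flow = min(candidates)
        if cost >= best:
            break  # no remaining move improves the total cost
        chosen.add(flow)
        best = cost
    return chosen, best
```

The greedy-users variant would follow the same pattern with users instead of flows as the unit being moved. Each round re-evaluates the cost for every candidate, which hints at why the per-pair variant is described as impractically slow.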

Experimental results
- Construction of user-interest matrix W
  - Random, uniform
  - Market distribution: based on a model of NYSE stock volume
  - IBM WebSphere cell: a real system

Channelization algorithms
- K-Means (2005) performs best
  - Takes rate into account
  - Gradient descent on the true cost function

Effect of the interest matrix on channelization performance
- The interest and rate have a significant effect on channelization performance
- Some interests have patterns that are easy to “channelize”
- Interests with less entropy, more order, are easier

Hybrid Algorithm Heuristics
- Market dist.: Greedy users
  - Can use more unicast BW
- WebSphere dist.: Greedy flows
  - Doesn't need more than 20% unicast BW
- Unicast BW limit: the algorithm will use the optimal amount up to the limit

Hybrid using greedy flow: unicast / multicast tradeoff
- Unicast BW allocation: exact amount of unicast BW used
- Every interest and rate distribution has an optimal amount of unicast BW it can use
- The hybrid approach improves upon both unicast-only and multicast-only

Conclusions
- We have presented a novel hybrid approach for publish-subscribe
- We have shown, using extensive and realistic simulation results, that our approach reduces consumed network and host resources
- K-Means (2005) performs best for channelization among the algorithms we tested
- Greedy hybrid heuristics performed best in our tests
- Relative competitiveness of the greedy-flows and greedy-users heuristics depends on the structure of the interest matrix and rate
~ The End ~

Backup (Model): Real Life Messaging Load Model
- Model based on statistical analysis of NYSE daily trade data
- 20K topics
- 500 subscribers
  - Avg. ~70 flows / user
  - Min 15 flows / user
  - Max 115 flows / user
- Avg. message fan-out ~10.1 clients
  - Multicast: message is transmitted once
  - Unicast transmitter data rate is x10 of multicast!
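The link between fan-out and the x10 figure is simple arithmetic: with unicast every message is sent once per interested client, so the unicast-to-multicast transmit ratio equals the message-weighted average audience size. If "flows / user" counts subscribed topics, the plain per-topic average is only 500 × 70 / 20,000 = 1.75 subscribers, so a per-message fan-out of ~10.1 presumably reflects busy topics also being the popular ones. The sketch below uses made-up numbers (one hot and one cold topic), not data from the model.

```python
def unicast_to_multicast_ratio(rate, audience_size):
    """Ratio of publisher transmit rate under unicast-only vs. multicast-only."""
    multicast = sum(rate.values())  # each message sent once
    unicast = sum(rate[t] * audience_size[t] for t in rate)  # once per client
    return unicast / multicast


# Hypothetical example: the hot topic carries most messages and most clients.
rate = {"hot": 90, "cold": 10}
audience_size = {"hot": 11, "cold": 2}
ratio = unicast_to_multicast_ratio(rate, audience_size)
```

Here the plain average audience is 6.5 clients per topic, while the message-weighted fan-out is (90 × 11 + 10 × 2) / 100 = 10.1.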

Backup (Model): Messaging Load Model, Based on Market Research
- Financial front office
  - Hundreds of users, requiring stock quotes and financial information from several markets
- Topic space structure
  - Within each market, symbol popularity and rate are exponentially distributed (NYSE market research)
  - Several different markets, with avg. popularity and size proportional to ~1/m (assumption)
  - 20K flows, 10 markets, 500 users
- User interest
  - Each user selects some markets, then selects a percentage of the symbols from each chosen market, according to the said distributions
[Figure annotation: ~10% of symbols, ~55% of trade]
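A generator in the spirit of this model might look like the following sketch. Only the 1/m market sizing and the exponential popularity come from the slide; the function names, the `fraction` and `decay` parameters, and the way a rank is drawn are hypothetical choices, and the sketch makes no attempt to reproduce the 15 to 115 flows-per-user range.

```python
import random


def market_sizes(total_flows, n_markets):
    """Split flows across markets with sizes roughly proportional to 1/m."""
    weights = [1.0 / m for m in range(1, n_markets + 1)]
    scale = total_flows / sum(weights)
    return [max(1, round(w * scale)) for w in weights]


def sample_user(sizes, n_chosen, fraction, decay, rng):
    """Pick markets, then symbols in each, favouring low (popular) ranks.

    Assumes fraction is small; the rejection loop slows down as the
    requested share of a market approaches 1.
    """
    wanted = set()
    for m in rng.sample(range(len(sizes)), n_chosen):
        size = sizes[m]
        target = max(1, round(fraction * size))
        picked = set()
        while len(picked) < target:
            # Exponentially distributed rank: rank 0 is the most popular symbol.
            rank = int(rng.expovariate(decay / size))
            if rank < size:
                picked.add(rank)
        wanted |= {(m, r) for r in picked}
    return wanted


rng = random.Random(0)
sizes = market_sizes(20000, 10)
users = [sample_user(sizes, 3, 0.01, 5.0, rng) for _ in range(500)]
```

Each user is a set of `(market, rank)` pairs, which can be turned into a row of the interest matrix W by numbering the pairs.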

Backup (Algorithm): Mapping Algorithm
- Input
  - Interest matrix, topic rate vector
- Basic insight
  - Put “similar” topics in the same group
  - “Similar” topics have a similar audience
  - A group with a homogeneous audience causes less filtering
  - Take the rate into account
- The cost of putting two topics in the same group
- The cost of adding a new topic to a group of topics
[Figure: interest matrix over users and topics, highlighting topics with identical audience and topics with similar audience; example with topics 1 and 2 and users 1 to 4, whose per-user filtering costs are R2, 0, R1, 0, giving a total filtering cost of R1 + R2, where Rk is the rate of topic k]
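The two-topic example from the figure can be reproduced with a short per-user cost function. The function name and data layout are illustrative, and the rates are made-up values standing in for R1 and R2.

```python
def per_user_filtering(group, audience, rate):
    """Unwanted rate each listener receives when `group` shares one channel."""
    listeners = set().union(*(audience[t] for t in group))
    return {u: sum(rate[t] for t in group if u not in audience[t])
            for u in listeners}


# User 1 wants only topic 1, user 2 wants both, user 3 wants only topic 2.
# User 4 wants neither, never joins the group, and so pays nothing.
audience = {1: {"user1", "user2"}, 2: {"user2", "user3"}}
rate = {1: 5.0, 2: 3.0}  # hypothetical R1 and R2
costs = per_user_filtering({1, 2}, audience, rate)
total = sum(costs.values())
```

User 1 filters topic 2 (cost R2), user 3 filters topic 1 (cost R1) and user 2 filters nothing, so the total is R1 + R2. Two topics with identical audiences would cost zero to merge.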

Backup (Algorithm): Iterative Clustering Algorithm (K-means)
- Init: topics are assigned into a fixed number of groups
- Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
- Cost: after each epoch, compute the total filtering cost
- Stop: time elapsed | cost does not improve | exceeded max number of iterations
- The best group for topic K is the group with the lowest cost
[Figure: the cost of adding candidate topic 5 to topic group {1,2,3}, computed per user from the group audience vector, with entries R1+R2+R3, 0, R5, 0 and the sum R1+R2+R3+R5; topics T1 to T9 split into groups, with T5 as the topic being placed]
