Community Clustering in Distributed Publish/Subscribe System
  Wei Li 1,2, Songlin Hu 1, Jintao Li 1, Hans-Arno Jacobsen 3
  1 Institute of Computing Technology, Chinese Academy of Sciences
  2 Graduate University of Chinese Academy of Sciences, Beijing, China
  3 University of Toronto, Toronto, Canada
  IEEE Cluster 2012
Agenda Background Algorithms Experiments Conclusions
Background
  Distributed publish/subscribe systems
    Clients (publishers & subscribers)
    Routers (a.k.a. brokers)
  [Figure: distributed router system; successive builds show advertisement, subscription, and publication messages]
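The three message types named on the slides above can be illustrated with a minimal sketch. It assumes a single in-process router and plain topic equality in place of real content-based matching; the `Broker` class and its method names are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


class Broker:
    """Toy single-router model (an assumption; real systems use a router overlay)."""

    def __init__(self):
        self.advertised = defaultdict(set)   # topic -> ids of publishers that advertised it
        self.subscribers = defaultdict(set)  # topic -> ids of subscribed clients

    def advertise(self, publisher, topic):
        # A publisher announces what it intends to publish.
        self.advertised[topic].add(publisher)

    def subscribe(self, subscriber, topic):
        # A subscriber registers its interest.
        self.subscribers[topic].add(subscriber)

    def publish(self, publisher, topic, payload):
        # Only previously advertised topics may be published.
        if publisher not in self.advertised.get(topic, ()):
            raise ValueError(f"{publisher} has not advertised {topic}")
        # Deliver the payload to every matching subscriber.
        return {s: payload for s in sorted(self.subscribers.get(topic, ()))}


broker = Broker()
broker.advertise("p1", "stock")
broker.subscribe("s1", "stock")
broker.subscribe("s2", "weather")
delivered = broker.publish("p1", "stock", {"price": 42})
```

Here only `s1` receives the publication, because `s2` subscribed to a different topic.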
Background
  Distributed Publish/Subscribe Systems
    Loosely coupled communication abstraction
    Widely used in industry, for example
      GooPS at Google
      PNUTS at Yahoo!
Client Placement
  Client placement affects the performance of the system
  Current solutions
    Connecting to the closest broker [Chen_05]
    Interest clustering of subscribers [Querzoni_08, Riabov_02]
    Dynamic publisher placement [Cheung_10]
  Limitations
    Complex communication relationships among interacting clients are not considered
    The cost of client relocation is not considered
Algorithms
  Problem definition
    Network of interacting clients
    Distributed routers
    The allocation of clients to routers
      Maximize the performance of the system
      Minimize the cost of client allocation
Algorithms Overview
Algorithms
  Steps
    Phase 1: Network construction among clients
    Phase 2: Community division of client network
      Newman’s algorithm: modularity-based [Newman_04]
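Phases 1 and 2 can be sketched as follows. This assumes the client network is an undirected graph whose edge weights are hypothetical message counts between clients, and it uses a naive greedy merge that maximizes modularity. It is written in the spirit of modularity-based agglomeration; whether it matches the exact variant cited as [Newman_04] is an assumption, and it is far less efficient than a production implementation.

```python
from itertools import combinations


def modularity(edges, communities):
    """Modularity Q of a partition.

    edges: dict {(u, v): weight} for an undirected graph.
    communities: list of disjoint sets covering all nodes.
    """
    m = sum(edges.values())  # total edge weight
    label = {n: i for i, c in enumerate(communities) for n in c}
    inside = [0.0] * len(communities)  # weight of edges inside each community
    degree = [0.0] * len(communities)  # summed weighted degree per community
    for (u, v), w in edges.items():
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            inside[label[u]] += w
    return sum(i / m - (d / (2 * m)) ** 2 for i, d in zip(inside, degree))


def greedy_communities(edges):
    """Start from singletons; repeatedly apply the merge that raises Q the most."""
    nodes = {n for edge in edges for n in edge}
    comms = [{n} for n in sorted(nodes)]
    best = modularity(edges, comms)
    while len(comms) > 1:
        candidates = []
        for a, b in combinations(range(len(comms)), 2):
            rest = [c for k, c in enumerate(comms) if k not in (a, b)]
            merged = rest + [comms[a] | comms[b]]
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda t: t[0])
        if q <= best:  # no merge improves modularity: stop
            break
        best, comms = q, merged
    return comms, best


# Phase 1 (assumed form): two tightly connected client groups joined by one link.
pairs = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
client_network = {pair: 1 for pair in pairs}
communities, q = greedy_communities(client_network)
```

On this toy network the two triangles come out as separate communities, which is the kind of grouping Phase 3 would then place on routers.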
Algorithms Steps Phase 3: Heuristic community clustering Majority-place Mp:
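The definition of Mp did not survive in this transcript, so the following is only one plausible reading of a majority-based placement heuristic, not the authors' formula. It assumes each community is placed on the router to which most of its members are already connected, so that the fewest clients need to be relocated. The function name and the `current_router` mapping are hypothetical.

```python
from collections import Counter


def majority_place(community, current_router):
    """Pick the router hosting most community members (assumed reading of Mp).

    community: iterable of client ids.
    current_router: dict mapping client id -> router id it is connected to now.
    Returns (chosen router, number of clients that would have to move).
    """
    counts = Counter(current_router[c] for c in community)
    # Highest count wins; ties are broken by router id to stay deterministic.
    router = min(counts, key=lambda r: (-counts[r], r))
    moves = sum(1 for c in community if current_router[c] != router)
    return router, moves


placement = majority_place(
    {"a", "b", "c"},
    {"a": "R1", "b": "R1", "c": "R2"},
)
```

In this example the community stays on `R1` and only client `c` would be relocated.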
Algorithms
  Steps
    Phase 2 and Phase 3 are iterative:
      Re-divide several communities into smaller ones
      Performance loss vs. deployment cost decrease
      Achieve a trade-off between performance and deployment cost
    Phase 4: Load balancing
Experiments
  Community clustering vs. interest clustering
  Experiment settings
    Different relationship modes of clients
      Random
      Small-world
      Scale-free
    Differently structured router overlays
Evaluation Different relationship modes among clients Message distribution
Evaluation Different relationship modes among clients Message latency & load reduction
Evaluation Different cluster compositions
Conclusions
  A community clustering method is proposed for distributed publish/subscribe systems
  Community clustering effectively improves performance under different experimental settings